Logistic Regression Optimization

Logistic Regression Optimization Parameters Explained

These are the most commonly adjusted parameters with Logistic Regression. Let’s take a deeper look at what they are used for and how to change their values:

penalty: (default: “l2”) Defines the penalization norm. Not every solver supports every penalty, so the penalty and the solver have to be chosen together:

l1: supported by the liblinear and saga solvers.

l2: supported by all solvers: newton-cg, lbfgs, liblinear, sag and saga.

elasticnet: supported only by the saga solver.

none: no regularization is applied. Not supported by the liblinear solver. (In scikit-learn 1.2 and later, pass penalty=None instead of the string "none".)
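The pairings above can be captured in a small lookup table, which is handy for checking a configuration before training. This is a hypothetical helper written only for illustration: the names SUPPORTED_PENALTIES and is_supported are not part of scikit-learn, which does its own validation and raises an error at fit time for unsupported combinations.

```python
# Hypothetical lookup table mirroring the solver/penalty pairs listed above.
# None stands for "no penalty" (the string "none" in older scikit-learn versions).
SUPPORTED_PENALTIES = {
    "lbfgs": {"l2", None},
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2", None},
    "sag": {"l2", None},
    "saga": {"l1", "l2", "elasticnet", None},
}


def is_supported(solver, penalty):
    """Return True if the given solver accepts the given penalty."""
    if solver not in SUPPORTED_PENALTIES:
        raise ValueError(f"unknown solver: {solver!r}")
    return penalty in SUPPORTED_PENALTIES[solver]


print(is_supported("saga", "elasticnet"))  # True
print(is_supported("liblinear", None))     # False: liblinear needs a penalty
print(is_supported("lbfgs", "l1"))         # False: lbfgs has no l1 support
```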

solver: (default: “lbfgs”) Selects the optimization algorithm. The default works well in most situations; the descriptions below suggest alternatives for specific cases, such as classification problems with large or very large datasets.

For particular cases, it is a good idea to compare different solvers and monitor how each one performs on training and test data. This also helps build an understanding of how the solvers differ, which is an interesting topic in itself.

lbfgs: Stands for limited-memory BFGS. Instead of computing the Hessian, it builds an approximation of it from gradient information, which makes it computationally cheaper. Unlike regular BFGS, it keeps only a limited number of recent updates in memory: older gradient information is discarded as new iterations arrive, so memory usage stays small.
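The "limited memory" idea can be seen in the classic two-loop recursion, which turns the stored (s, y) pairs into a search direction without ever forming a Hessian. This is a minimal pure-Python sketch of that textbook recursion; the helper names are hypothetical, and scikit-learn's lbfgs solver relies on SciPy's L-BFGS-B optimizer rather than code like this.

```python
from collections import deque


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lbfgs_direction(grad, history):
    """Return the search direction -H * grad via the L-BFGS two-loop recursion.

    history holds (s, y) pairs: s = x_new - x_old, y = grad_new - grad_old.
    Only these pairs are used; the Hessian is never formed.
    """
    q = list(grad)
    saved = []
    # First loop: newest pair to oldest.
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((rho, alpha, s, y))
    # Scale the initial inverse-Hessian guess with the newest pair.
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for rho, alpha, s, y in reversed(saved):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


# Limited memory: deque(maxlen=m) silently drops the oldest pair when full.
history = deque(maxlen=3)
for k in range(5):
    history.append(([float(k + 1)], [2.0 * (k + 1)]))
print(len(history))  # 3: only the most recent pairs are kept

# 1-D quadratic f(x) = 2 * x**2 (Hessian 4): one stored pair recovers the Newton step.
x_old, x_new = 1.0, 0.5
s, y = [x_new - x_old], [4 * x_new - 4 * x_old]
print(lbfgs_direction([4 * x_new], deque([(s, y)], maxlen=3)))  # [-0.5]
```

With no stored pairs the direction is plain steepest descent (the negative gradient); each stored pair refines it toward a Newton-like step.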

liblinear: An efficient choice for small datasets. For multiclass problems it only supports the one-versus-rest (ovr) scheme, not the multinomial loss available with the other solvers here. It supports the l1 and l2 penalties, but not elasticnet or none.

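newton-cg: A Newton method that uses second-order (Hessian) information. Each Newton step is solved with the conjugate gradient method using Hessian-vector products rather than the full Hessian matrix, so an iteration costs more than a plain gradient step and becomes expensive in high dimensions.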

sag: Stands for Stochastic Average Gradient. An efficient solver for large datasets.

saga: A variant of sag that also supports l1 (and elasticnet) regularization. It is time-efficient and usually the go-to solver for very large datasets.
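The idea behind sag is to remember the most recent gradient of every training sample and step along the average of those stored gradients, so each update touches one sample but still uses information from all of them. Below is a toy pure-Python sketch of that update for unregularized logistic loss; the data, learning rate and helper names are illustrative assumptions, and scikit-learn's compiled sag solver uses its own step-size rule. saga changes the update and adds l1 support, which is not shown here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, X, y):
    """Average logistic loss for labels y in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def sag(X, y, lr=0.5, epochs=50, seed=0):
    """Stochastic Average Gradient: remember the latest gradient of every
    sample and step along the average of all remembered gradients."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d
    memory = [[0.0] * d for _ in range(n)]  # last gradient seen for each sample
    grad_sum = [0.0] * d                    # running sum of the stored gradients
    for _ in range(epochs * n):
        i = rng.randrange(n)
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
        new_grad = [err * xj for xj in X[i]]
        # Swap sample i's stale gradient for the fresh one inside the running sum.
        grad_sum = [s - old + new for s, old, new in zip(grad_sum, memory[i], new_grad)]
        memory[i] = new_grad
        w = [wj - lr * s / n for wj, s in zip(w, grad_sum)]
    return w


# Hypothetical toy data: the first column is a constant 1 acting as the intercept.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, -0.5], [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 0, 1, 1]
w = sag(X, y)
print(log_loss([0.0, 0.0], X, y), log_loss(w, X, y))  # loss drops from about 0.693
```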

dual: (default: False) Selects the dual or primal formulation. The dual formulation is only implemented for the l2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features.

tol: (default: 1e-4, i.e. 0.0001) Tolerance for the stopping criterion: the solver stops once its convergence measure falls below this value.

C: (default: 1.0) Inverse of the regularization strength; must be a positive float. The smaller C is, the stronger the regularization.
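The inverse relationship is easiest to see in the l2-penalized objective C · Σ loss + ½‖w‖²: C scales the data-fit term, so a small C lets the penalty dominate and pulls the weights toward zero. The sketch below minimizes that objective for a single weight (no intercept) with plain gradient descent; the fit_weight helper and the toy data are hypothetical and only illustrate the trend, not scikit-learn's solvers.

```python
import math


def fit_weight(xs, ys, C, steps=5000):
    """Minimize C * sum(log-loss) + 0.5 * w**2 for one weight (no intercept)
    with plain gradient descent."""
    lr = 1.0 / (1.0 + 0.25 * C * sum(x * x for x in xs))  # 1 / smoothness bound
    w = 0.0
    for _ in range(steps):
        grad = w  # derivative of the 0.5 * w**2 penalty term
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))
            grad += C * (p - y) * x  # derivative of the C-weighted log-loss
        w -= lr * grad
    return w


# Hypothetical, deliberately non-separable toy data.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
for C in (0.01, 1.0, 100.0):
    print(C, fit_weight(xs, ys, C))  # the weight grows as C grows (weaker penalty)
```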

fit_intercept: (default: True) Specifies whether a constant (the bias, or intercept) should be added to the decision function.
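The decision function is z = w · x + b, and the predicted probability is the sigmoid of z; scikit-learn stores w in coef_ and b in intercept_. With fit_intercept=False, b is fixed at 0, which forces the decision boundary through the origin. The predict_proba helper below is a hypothetical stand-alone sketch of that computation.

```python
import math


def predict_proba(x, weights, intercept=0.0):
    """P(y=1 | x) = sigmoid(w . x + b); with fit_intercept=False, b stays 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + intercept
    return 1.0 / (1.0 + math.exp(-z))


# Without an intercept the boundary passes through the origin,
# so the all-zero input is always scored 0.5, whatever the weights.
print(predict_proba([0.0, 0.0], [3.0, -1.5]))                 # 0.5
# An intercept shifts the boundary away from the origin.
print(predict_proba([0.0, 0.0], [3.0, -1.5], intercept=2.0))  # about 0.88
```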

random_state: (default: None) Controls the randomness used to shuffle the data when the solver is sag, saga or liblinear.

None: the global NumPy random state (numpy.random) is used.

int: the integer is used as the seed, so results are reproducible across runs.

RandomState instance: random_state itself is used as the random number generator.
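The three cases above follow a common dispatch pattern, sketched here with Python's random module standing in for NumPy's RandomState (an assumption made to keep the example dependency-free). The shuffled_order helper is hypothetical; it shows why an integer seed makes the sample order, and therefore the solver's path, reproducible.

```python
import random


def shuffled_order(n_samples, random_state=None):
    """Return a shuffled sample order.

    None -> module-level (global) generator, not reproducible;
    int -> a fresh generator seeded with that integer;
    generator instance -> used as-is.
    """
    if random_state is None:
        rng = random
    elif isinstance(random_state, int):
        rng = random.Random(random_state)
    else:
        rng = random_state
    order = list(range(n_samples))
    rng.shuffle(order)
    return order


print(shuffled_order(6, random_state=42) == shuffled_order(6, random_state=42))  # True
```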


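from sklearn.linear_model import LogisticRegression
# Each line creates a separate model to illustrate one parameter at a time.
# elasticnet is only supported by saga and requires l1_ratio to be set.
LRM = LogisticRegression(solver="saga", penalty="elasticnet", l1_ratio=0.5)
# Looser stopping tolerance than the default 1e-4.
LRM = LogisticRegression(tol=0.0009)
# Add an intercept to the decision function (already the default).
LRM = LogisticRegression(fit_intercept=True)
# Print progress; only used by the liblinear and lbfgs solvers.
LRM = LogisticRegression(verbose=2)
# Reuse the previous solution as the starting point when fit is called again.
LRM = LogisticRegression(warm_start=True)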

More parameters

More Logistic Regression Optimization Parameters for fine tuning

The following parameters allow further fine-tuning, for example to control convergence, weight imbalanced classes, choose the multiclass strategy, or tune the regularization mix to avoid overfitting:

  • max_iter
  • warm_start
  • verbose
  • class_weight
  • multi_class
  • l1_ratio
  • n_jobs


max_iter: (default: 100)

Maximum number of iterations the solver may take to converge.
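tol and max_iter work together: a solver stops as soon as its convergence measure drops below tol, and otherwise gives up after max_iter iterations (scikit-learn then emits a ConvergenceWarning). The toy loop below shows both exits. The convergence test used here (largest gradient component below tol) is an assumption for illustration; each scikit-learn solver uses its own measure.

```python
import warnings


def gradient_descent(grad, w0, lr=0.1, tol=1e-4, max_iter=100):
    """Toy solver loop: stop once every gradient component is below tol,
    or give up after max_iter iterations and warn."""
    w = list(w0)
    for it in range(1, max_iter + 1):
        g = grad(w)
        if max(abs(gi) for gi in g) < tol:
            return w, it  # converged
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    warnings.warn(f"did not converge in max_iter={max_iter} iterations")
    return w, max_iter


def grad(w):
    """Gradient of f(w) = (w - 3)**2."""
    return [2.0 * (w[0] - 3.0)]


print(gradient_descent(grad, [0.0]))              # converges well before 100 iterations
print(gradient_descent(grad, [0.0], max_iter=5))  # stops early and warns
```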


class_weight: (default: None) dict or “balanced”

Weights associated with classes in the form {class_label: weight}. If not given, all classes have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)).
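The “balanced” formula can be checked by hand on a small imbalanced label vector. The balanced_class_weight helper below is a hypothetical pure-Python version of n_samples / (n_classes * count); scikit-learn also exposes this computation through sklearn.utils.class_weight.compute_class_weight.

```python
from collections import Counter


def balanced_class_weight(y):
    """n_samples / (n_classes * count(class)) for every class present in y."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in sorted(counts.items())}


print(balanced_class_weight([0, 0, 0, 0, 0, 0, 1, 1]))  # {0: 0.666..., 1: 2.0}
```

The rare class gets the larger weight, and the weighted class totals come out equal (6 × 2/3 = 2 × 2 = 4).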


warm_start: (default: False)

Useful only with solvers other than liblinear. When enabled, the solution of the previous call to fit is used as the initialization for the next one, hence the name warm start.

False: The previous solution is discarded.

True: The previous solution is reused as the initialization for the next fit.
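Warm starting pays off when a model is refit with slightly changed settings, such as a sweep over C: starting from the previous solution usually leaves fewer iterations to go. The sketch below mimics this with a hypothetical single-weight fit helper and plain gradient descent; in scikit-learn one would instead create the model with warm_start=True, change C with set_params, and call fit again.

```python
import math


def fit(xs, ys, C, w_init=0.0, tol=1e-6, max_iter=10_000):
    """Gradient descent on C * sum(log-loss) + 0.5 * w**2 (one weight, no intercept).
    Returns the weight and the number of iterations used."""
    lr = 1.0 / (1.0 + 0.25 * C * sum(x * x for x in xs))  # safe step size
    w = w_init
    for it in range(1, max_iter + 1):
        grad = w + C * sum((1.0 / (1.0 + math.exp(-w * x)) - y) * x for x, y in zip(xs, ys))
        if abs(grad) < tol:
            break
        w -= lr * grad
    return w, it


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w_prev, _ = fit(xs, ys, C=1.0)               # first fit, from scratch
_, cold = fit(xs, ys, C=1.2)                 # like warm_start=False: start from 0
_, warm = fit(xs, ys, C=1.2, w_init=w_prev)  # like warm_start=True: reuse w_prev
print(cold, warm)  # the warm-started fit needs fewer iterations
```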


multi_class: (default: "auto")

"auto": Selects between ovr and multinomial automatically. ovr is selected if the solver is liblinear or the data is binary; otherwise multinomial is selected.

"ovr": One-versus-rest; a separate binary problem is fit for each label.

"multinomial": The multinomial loss is minimized over the whole probability distribution. (Not available with the liblinear solver.)

Note: multi_class was deprecated in scikit-learn 1.5. Newer versions use multinomial for three or more classes, and one-versus-rest behavior can be obtained with OneVsRestClassifier.
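The two strategies also turn decision scores into probabilities differently: multinomial applies a softmax across classes, while one-versus-rest applies a separate sigmoid per class and, as scikit-learn documents for predict_proba, normalizes those values so they sum to 1. The scores below are hypothetical; in practice each class's score would come from the fitted model.

```python
import math


def softmax(scores):
    """Multinomial probabilities: exp(score_k) / sum_j exp(score_j)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ovr_probabilities(scores):
    """One-vs-rest: a sigmoid per class, then rescaled so the row sums to 1."""
    sig = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sig)
    return [p / total for p in sig]


scores = [2.0, 0.5, -1.0]  # hypothetical decision scores for three classes
print(softmax(scores))
print(ovr_probabilities(scores))
```

Both rank the classes the same way, but the multinomial probabilities are noticeably sharper for the top class.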



l1_ratio: (default: None)

Only used when penalty is “elasticnet”, which requires the saga solver. It sets the mix between the l1 and l2 penalties:

l1_ratio=0: the penalty is equivalent to l2.

l1_ratio=1: the penalty is equivalent to l1.

0 < l1_ratio < 1: the penalty is a combination of l1 and l2, where l1_ratio is the weight of l1 in the mix.
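In the scikit-learn user guide the elastic-net term is written as l1_ratio · ‖w‖₁ + (1 − l1_ratio)/2 · ‖w‖₂², which reduces to the l2 term (½‖w‖₂²) at 0 and to the l1 term at 1. The hypothetical helper below evaluates that expression for a fixed weight vector so the three cases above can be compared directly.

```python
def elasticnet_penalty(w, l1_ratio):
    """l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2."""
    l1 = sum(abs(wi) for wi in w)
    l2_sq = sum(wi * wi for wi in w)
    return l1_ratio * l1 + (1.0 - l1_ratio) * 0.5 * l2_sq


w = [3.0, -4.0]
print(elasticnet_penalty(w, 0.0))  # 12.5 -> pure l2 term: 0.5 * (9 + 16)
print(elasticnet_penalty(w, 1.0))  # 7.0  -> pure l1 term: 3 + 4
print(elasticnet_penalty(w, 0.5))  # 9.75 -> half of each
```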


verbose: (default: 0)

Controls how much progress information is printed while the solver runs.

For logistic regression, it only has an effect with the liblinear and lbfgs solvers.

0: No output.

1: Some information is displayed.

2: More detailed information is displayed.


n_jobs: (default: None)

Number of CPU cores used to parallelize over classes.

Only used when multi_class is "ovr"; ignored when the solver is liblinear.

None: 1 core is used (unless running inside a joblib parallel_backend context).

-1: All available CPU cores are used.

int: The given number of cores work in parallel.
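Parallelizing over classes works because one-versus-rest splits the task into independent binary problems, one per class. The sketch below fans those problems out to a thread pool whose size follows the n_jobs rules above. Everything here is a hypothetical illustration: scikit-learn uses joblib for this, and pure-Python threads would not actually speed up CPU-bound work because of the GIL; they only show the per-class split.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def fit_binary(xs, targets, lr=0.5, steps=500):
    """Fit one binary logistic model (weight, intercept) by gradient descent."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, t in zip(xs, targets):
            err = 1.0 / (1.0 + math.exp(-(w * x + b))) - t
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def fit_ovr(xs, labels, n_jobs=None):
    """One-vs-rest: one binary problem per class, spread over n_jobs workers."""
    if n_jobs is None:
        workers = 1  # None -> a single worker
    elif n_jobs == -1:
        workers = os.cpu_count() or 1  # -1 -> every available core
    else:
        workers = n_jobs
    classes = sorted(set(labels))
    targets = [[1 if y == c else 0 for y in labels] for c in classes]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        models = list(pool.map(lambda t: fit_binary(xs, t), targets))
    return dict(zip(classes, models))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = ["a", "a", "b", "b", "c", "c"]
print(fit_ovr(xs, labels, n_jobs=-1))
```

Because each binary fit is independent and deterministic, the result is the same whatever the number of workers.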

Research References:

BFGS solver: stands for Broyden–Fletcher–Goldfarb–Shanno

L-BFGS solver: stands for Limited-memory Broyden–Fletcher–Goldfarb–Shanno

1- Broyden, C. G. (1970), “The convergence of a class of double-rank minimization algorithms.”

2- Fletcher, R. (1970), “A new approach to variable metric algorithms.”

3- Goldfarb, D. (1970), “A family of variable metric updates derived by variational means.”

4- Shanno, D. F. (July 1970), “Conditioning of quasi-Newton methods for function minimization.”

5- Fletcher, R. (1987), Practical Methods of Optimization (2nd edition).

Official scikit-learn documentation: sklearn.linear_model.LogisticRegression