Logistic Regression

Pros & Cons

Advantages

1- Probability Prediction

Unlike some other machine learning algorithms that only return a classification label (think kNN), Logistic Regression also provides a probability for each prediction.

This is very useful whenever you need a probability rather than a hard label, especially if you want to integrate the model with another system that works on probability measures.

A good example: you might be after a "spam | no spam" classifier but want its strictness to be adjustable based on a probability (similar to Google reCAPTCHA v3). Having probabilities rather than only labels is what makes such a project possible.

Bank loans are another field where you may want a probability for each client rather than a strict binary answer.
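Here is a minimal sketch of the adjustable "spam | no spam" idea, assuming scikit-learn is installed. The synthetic dataset and both threshold values are made up for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a spam dataset (label 1 = spam); purely illustrative.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)

# predict_proba returns one column per class; column 1 holds P(class 1).
spam_probability = model.predict_proba(X)[:, 1]

# Hypothetical thresholds: move them to make the filter stricter or more lenient.
lenient_flags = spam_probability >= 0.9  # flag only near-certain spam
strict_flags = spam_probability >= 0.3   # flag anything remotely suspicious

print("flagged at 0.9:", int(lenient_flags.sum()))
print("flagged at 0.3:", int(strict_flags.sum()))
```

With labels alone you would be stuck with the default 0.5 cut-off; with probabilities the cut-off becomes a knob you can turn.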

2- Thrives with Little Training

One of the great advantages of Logistic Regression is that when you have a hard but linear problem and not a whole lot of data, it's still able to produce pretty useful predictions. This is a pro that comes with Logistic Regression's mathematical foundations: with only one weight per feature to estimate, it needs fewer examples than many more flexible Machine Learning models.

3- Efficient Computation

Logistic Regression is not a resource-hungry model (unlike many others, think NNs, SVMs, kNN), and this makes it suitable for simple applications.
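To see why it is so cheap, note that once a model is fitted, a prediction is just a weighted sum pushed through the sigmoid function. The sketch below uses only the standard library; the weights and bias are hypothetical numbers, not the output of any real training run.

```python
import math

def predict_probability(weights, bias, features):
    """Return P(class 1) for one sample: sigmoid of the weighted sum."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    # Two algebraically equal forms of the sigmoid, chosen to avoid overflow.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)

# Hypothetical fitted parameters for a three-feature model.
weights = [0.8, -1.2, 0.05]
bias = -0.3

print(predict_probability(weights, bias, [1.0, 0.5, 10.0]))
```

One multiplication and one addition per feature, then a single exponential: that is the whole prediction cost.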

4- Reputation is King

Logistic Regression struggles to find real use cases in real-world problems because of how selective it is.

However, it's still respected and good to know. The leap from Linear Regression models to Logistic Regression was incredible when it was first invented. Today it's easy to understand, especially if you have a technical background, and it opens your mind to how smart the idea was (and is). But I bet it wasn't that easy to come up with before it existed.

So it's not really a practical advantage, but at least for its place in history Logistic Regression is like a museum piece you don't want to skip.

This doesn't mean it has absolutely no use case in the industry; you'll just need very specific cases that it applies to.

5- Unlikely to Overfit

Logistic Regression won't overfit easily as it's a linear model. On top of that, the C regularization parameter in scikit-learn (the inverse of regularization strength, so smaller values mean stronger regularization) lets you take control of any overfitting anxiety you might have.
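A quick sketch of how C behaves, assuming scikit-learn and NumPy are available. The dataset is synthetic and the three C values are arbitrary picks for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# Smaller C = stronger regularization = smaller coefficients.
norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} coefficient norm={norms[C]:.3f}")
```

The coefficient norm shrinks as C gets smaller, which is the model being held back from chasing every wiggle in the training data.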

6- Large Data is Welcome

Since Logistic Regression comes with a fast, resource-friendly algorithm, it scales pretty nicely. While many algorithms struggle with large datasets (such as SVMs, kNN and sometimes tree-based models), Logistic Regression will let you harvest your millions of rows without your hair losing its original color. Oh wait, unless its original color is white! Anyway, I think you get the point.

7- Model Flexibility (Regularization)

Inside the borders of linearity, Logistic Regression actually has some nice fitting flexibility. By using the regularization parameters, one can apply different regularization techniques to Logistic Regression to reduce the error in the model or fine-tune the fit.

Lasso (L1), Ridge (L2) or Elastic Net regularization can be applied in this sense. Regularization will make Logistic Regression behave more similarly to Naive Bayes, in the sense that it becomes a more generalist model and tends to avoid fitting noise and outliers.
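As a rough sketch, here is how Lasso-style and Ridge-style penalties differ in practice, assuming scikit-learn and NumPy. The data is synthetic, C=0.1 is an arbitrary choice, and the `penalty` argument follows the scikit-learn releases I have used (newer releases may prefer other parameter names and emit a warning).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, of which only 3 carry signal; the rest are noise.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=3, n_redundant=0, random_state=0
)

# Ridge-style (L2) shrinks all weights; Lasso-style (L1) can zero some out.
ridge = LogisticRegression(penalty="l2", C=0.1, solver="liblinear").fit(X, y)
lasso = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)

zero_l2 = int(np.sum(ridge.coef_ == 0))
zero_l1 = int(np.sum(lasso.coef_ == 0))

print("coefficients set to zero with L2:", zero_l2)
print("coefficients set to zero with L1:", zero_l1)
```

Elastic Net sits between the two; in scikit-learn it is typically requested with `penalty="elasticnet"`, `solver="saga"` and an `l1_ratio` between 0 and 1.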

Disadvantages

1- Overfitting Possibility

Logistic Regression is still prone to overfitting, although less so than some other models. To counter this tendency, more training data and regularization can be introduced.

2- Regularization

Just as no regularization can be a con, regularization can be a con too. The need for regularization in Logistic Regression means a few more parameters to optimize, advanced topics to dive into and cross-validation to carry out (Life of a modern human! Who can relate?).
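For a feel of what that extra work looks like, here is a small sketch that cross-validates a handful of C values with scikit-learn's GridSearchCV. The candidate grid and the synthetic data are assumptions for illustration; a real project would pick its own.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=15, random_state=0)

# Hypothetical grid of regularization strengths to try.
candidate_C = [0.01, 0.1, 1.0, 10.0]

# 5-fold cross-validation over every candidate value of C.
search = GridSearchCV(
    LogisticRegression(max_iter=5000),
    param_grid={"C": candidate_C},
    cv=5,
)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("mean cross-validated accuracy:", round(search.best_score_, 3))
```

Four candidates times five folds already means twenty model fits, which is the "con" in a nutshell.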

3- Limited Use Case

Logistic Regression is strictly a classification method and it has lots of competition (SVMs, Naive Bayes, Random Forests, kNN, etc.).

4- Linear Decision Boundary

Logistic Regression inherently runs on a linear model: its decision boundary is a straight line (or a hyperplane in higher dimensions). This restricts even further where logistic regression can be applied.

If you have a non-linear problem at hand, you'll have to look for another model, but no worries, there are plenty (think Naive Bayes, SVM, kNN).
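A small sketch of what the limitation looks like, assuming scikit-learn and NumPy. The XOR-style dataset below is invented for illustration: a point belongs to class 1 when both coordinates share the same sign, which no single straight line can separate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR-like toy data: class 1 when x1 and x2 have the same sign.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Even on its own training data the linear boundary is close to a coin flip.
accuracy = model.score(X, y)
print(f"training accuracy: {accuracy:.2f}")
```

An accuracy near 0.5 on a two-class problem means the model has learned essentially nothing, even though the pattern is trivial for a human to spot.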

5- High Data Maintenance

Data preparation can be tedious in Logistic Regression, as both scaling and normalization are important requirements, especially with regularization, where the penalty treats every coefficient alike regardless of its feature's scale.
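One common way to keep this manageable is to bundle the scaling step with the model, sketched below with scikit-learn's StandardScaler and make_pipeline. The data is synthetic, and blowing up the first feature by 10,000 is just a way to fake mismatched units.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X[:, 0] *= 10_000  # exaggerate one feature's scale on purpose

# The scaler is fitted together with the model, so new data gets the same treatment.
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(X, y)

# Inspect what the model actually sees: zero mean, unit variance per feature.
scaled = pipeline.named_steps["standardscaler"].transform(X)
print("means:", scaled.mean(axis=0).round(3))
print("stds: ", scaled.std(axis=0).round(3))
```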

6- Can't Handle Missing Data

Unlike some other machine learning models, such as decision trees and random forests, which are based on trees, Logistic Regression can't handle missing data on its own.

This usually means extra work on processing missing values before training.
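A minimal sketch of that extra work, assuming scikit-learn and NumPy. The missing values are injected artificially, and median imputation is only one of several possible strategies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X[::10, 2] = np.nan  # knock out every tenth value in one column

# Fitting directly on data with NaNs is rejected by scikit-learn.
try:
    LogisticRegression().fit(X, y)
    raw_fit_failed = False
except ValueError:
    raw_fit_failed = True

# Fill the gaps with the column median first, then fit as usual.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X, y)

print("raw fit failed:", raw_fit_failed)
print("accuracy after imputing:", round(pipeline.score(X, y), 3))
```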

Logistic Regression Pros & Cons Summary

Why Logistic Regression?

Historical, mathematical but maybe not so practical.

Logistic Regression is a very old and respected model and it has been heavily used for many decades now! The fact that it still finds some room for application is almost mind-blowing.

But still, it's pretty restrictive and there are strong alternative candidates out there.

Being fast, large-data friendly, scalable, and regularization-able are Logistic Regression's major strong suits.

Besides, when you have very little training data and a linear classification problem, and a hard one at that, this good ol' chap can still show you that he ain't dead yet! Ahoy! (Okay, less Pirates of the Caribbean for me from now on.)

Probability Reports

Sometimes plain results just won't cut it. You'll want to know how confident the model is behind each answer. Logistic Regression's probability calculations are very welcome in those cases.

Fast

Logistic Regression is not as computationally costly as most other models.

Large Data Friendly

Logistic Regression's scalability means it copes nicely with millions of rows.

Applicability

It's just not so common to come across linear decision boundary problems that require a Machine Learning implementation, especially if we also look for feature independence.

Small Data Accuracy

Logistic Regression doesn't require tons of data to get smart. It can produce good results with small data when others can't.

Tedious Data Prep

Normalization and Scaling are realities of Logistic Regression. On top of that you will have to take care of missing values in the data.