
Linear Regression

Pros & Cons


Advantages

1- Fast

Like most linear models, Ordinary Least Squares is a fast, efficient algorithm. You can implement it with a dusty old machine and still get pretty good results.
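To illustrate just how little machinery OLS needs, here is a minimal sketch that fits a line with nothing but NumPy's least-squares solver (the dataset and coefficients are made up for the example):

```python
import numpy as np

# Tiny synthetic dataset: y = 3x + 2 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.1, size=100)

# Add an intercept column, then solve the least-squares problem
# (equivalent to the normal equations beta = (X^T X)^{-1} X^T y).
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(beta)  # roughly [2.0, 3.0]: intercept ~2, slope ~3
```

One matrix factorization and you are done, which is why OLS runs comfortably even on very modest hardware.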

2- Proven

Like Logistic Regression (which came along later in history), Linear Regression was a breakthrough in statistical applications.

It has been used to identify countless patterns and predict countless values in countless domains all over the world over the last couple of centuries.

Computationally efficient and usually accurate, Ordinary Least Squares and other Linear Regression extensions remain popular in both academia and industry.

3- General Tendencies

If you want to study outliers, or to predict unexpected, black-swan-like scenarios, this is not the model for you.

Like most regression models, OLS Linear Regression is a generalist algorithm that produces trend-conforming results, which is exactly what you want when the general tendency is the question.

4- Strong Statistical Reporting

With linear models such as OLS (and similarly with Logistic Regression), you can get rich statistical insights that some other advanced or advantageous models can't provide.

If you are after sophisticated discoveries for direct interpretation, or to create inputs for other systems and models, the Ordinary Least Squares algorithm can generate a plethora of insightful results ranging from variance and covariance to partial regression, residual plots and influence measures.
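Libraries such as statsmodels produce these reports out of the box, but the core diagnostics are simple enough to compute by hand. As a sketch (on made-up data), here are residuals, R², and coefficient standard errors from the classical formula cov(beta) = sigma² (XᵀX)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.3, size=200)

# Fit OLS with an intercept column
Xb = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)

resid = y - Xb @ beta
n, p = Xb.shape
sigma2 = resid @ resid / (n - p)              # unbiased residual variance
cov_beta = sigma2 * np.linalg.inv(Xb.T @ Xb)  # coefficient covariance matrix
se = np.sqrt(np.diag(cov_beta))               # standard error per coefficient
r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

print("R^2:", r2)                 # close to 1 on this nearly-linear data
print("std errors:", se)
```

Standard errors like these are what give OLS its interpretability edge: they let you attach confidence intervals and significance tests to each coefficient.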

For this feature, OLS can be viewed as a perfect supportive Machine Learning algorithm that will complement, and compete with, most modern algorithms. Just keep the limitations in mind and keep on exploring!


Disadvantages

1- Technical Learning Curve

Linear Regression in general is nothing like k-Nearest Neighbors. It could be considered a very distant relative of Naive Bayes through its mathematical roots; however, there are many technical aspects to learn in the regression world.

This is an opportunity to learn about statistics and the intricacies of datasets; however, it definitely takes away from practicality and will discourage some of the time-conscious, result-oriented folks.

2- Only Linear Problems

Ordinary Least Squares won't work well with non-linear data. If you are unsure about linearity, or if you know your data has non-linear relations, that is a strong sign that Ordinary Least Squares won't perform well for you at this time.
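A quick way to see this failure mode is to fit the same straight line to linear and non-linear targets and compare R². In this sketch (synthetic data, made up for the example), the line explains almost all of the linear signal and almost none of a symmetric quadratic one:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 200)

def r2_of_line_fit(y):
    # Fit intercept + slope by least squares, return R^2
    Xb = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

y_linear = 2 * x + rng.normal(0, 0.2, x.size)
y_nonlin = x ** 2 + rng.normal(0, 0.2, x.size)  # clearly non-linear

print(r2_of_line_fit(y_linear))  # near 1.0: the line fits well
print(r2_of_line_fit(y_nonlin))  # near 0: the line explains almost nothing
```

When R² collapses like this, the usual remedies are feature transformations (e.g. polynomial terms) or switching to a non-linear model entirely.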

3- General Tendencies

If you'd like to predict outliers, or unexpected black-swan-like scenarios, this is not the model for you.

Like most regression models, OLS Linear Regression is a generalist algorithm that will produce trend-conforming results.

4- Overfitting Tendencies

Just because OLS is not likely to predict outlier scenarios doesn't mean it won't overfit on outliers. Ordinary Least Squares is inherently sensitive to extreme values, and taming that sensitivity through regularized variants requires careful tweaking of regularization parameters.
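Plain OLS has no regularization knob of its own; a common remedy is Ridge (L2) regression, which adds a penalty on coefficient size. Here is a minimal sketch, on made-up collinear data where unregularized OLS coefficients blow up:

```python
import numpy as np

rng = np.random.default_rng(3)
# Few samples and a near-duplicate feature column: a setting
# where plain OLS produces wild, unstable coefficients.
X = rng.normal(size=(20, 5))
X[:, 1] = X[:, 0] + rng.normal(0, 0.01, 20)  # almost identical to column 0
y = X[:, 0] + rng.normal(0, 0.1, 20)

def ridge(X, y, alpha):
    # Closed-form ridge: beta = (X^T X + alpha I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)    # alpha = 0 recovers plain least squares
beta_ridge = ridge(X, y, 1.0)  # alpha > 0 shrinks the coefficients

# Ridge tames the unstable coefficients on the collinear pair
print(np.round(beta_ols[:2], 2), np.round(beta_ridge[:2], 2))
```

The shrinkage is guaranteed in one precise sense: for any positive alpha, the ridge coefficient vector has a smaller norm than the OLS one.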

5- Complicated Optimization

When you enter the world of regularization, you may realize that it requires intimate knowledge of your data and a really hands-on approach.

There is no one regularization method that fits all, and it isn't intuitive to grasp quickly. This is not to say there is no merit in these efforts and discussions, but it may discourage someone seeking a more practical application, or the general crowd.

It's also worth noting that getting regularization exactly right can be difficult to validate and time-consuming. On the other hand, it's quite important to get it right: under-regularize and you risk overfitting on irrelevant features; over-regularize and you risk suppressing features that might be valuable for future predictions.
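That tension shows up directly in the coefficients. In this sketch (made-up data where only one feature truly matters), a Ridge penalty that is too small leaves noise coefficients in place, while one that is far too large crushes the genuinely useful coefficient as well:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
y = 2.0 * X[:, 0] + rng.normal(0, 0.5, 50)  # only feature 0 matters

def ridge(X, y, alpha):
    # Closed-form ridge: beta = (X^T X + alpha I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

for alpha in (0.0, 1.0, 100.0, 10000.0):
    print(alpha, np.round(ridge(X, y, alpha), 3))
# alpha = 0 keeps small spurious coefficients on features 1 and 2;
# alpha = 10000 shrinks even the true coefficient on feature 0 toward 0.
```

In practice the penalty strength is picked by cross-validation, which is exactly the time-consuming validation work the paragraph above warns about.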


Linear Regression Pros & Cons Summary

Why Linear Regression?

As one of the main foundations of the field of statistics, Linear Regression offers a long proven track record, reputable scientific research, and many interesting extensions to choose from and benefit from.

Like its many regression cousins, it is fast, scientific, efficient, scalable and powerful.

Don't let its initial simplicity trick you: Ordinary Least Squares and other Linear Regression models in general require a serious understanding of the data at hand, as well as data processing and regularization methods such as scaling, normalization, missing-data handling, and L1/L2 regularization.
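Two of those preparation steps, missing-data handling and scaling, can be sketched in a few lines of NumPy (the feature matrix below is made up; real pipelines would fit these statistics on training data only):

```python
import numpy as np

# Toy feature matrix with a missing value (NaN) and wildly different scales
X = np.array([[1.0, 2000.0],
              [2.0, np.nan],
              [3.0, 1000.0],
              [4.0, 4000.0]])

# 1) Impute missing values with the column mean (ignoring NaNs)
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

# 2) Standardize: zero mean, unit variance per column, so no single
#    large-scale feature dominates the regression or its penalty term
mu = X_imputed.mean(axis=0)
sigma = X_imputed.std(axis=0)
X_scaled = (X_imputed - mu) / sigma

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```

Scaling matters especially once L1/L2 penalties are involved, since those penalties treat all coefficients as if the features were on comparable scales.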

Besides that, OLS can generate very useful statistical reports that might expand your technical horizons in the field.

When you are dealing with linear decision boundaries and need to predict continuous numerical values through regression, OLS and other extensions are highly recommended to dabble with, even if you end up with a different Machine Learning algorithm in the end.

Fast

Linear Regression is fast and scalable. It's not very resource-hungry.

Large Data Friendly

Scalability also means you can work on big data problems.
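One reason OLS scales so well is that the fit only depends on the accumulated statistics XᵀX and Xᵀy, so data can be processed in chunks that never all sit in memory at once. A minimal sketch on synthetic streamed data (chunk sizes and coefficients are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(5)
true_beta = np.array([1.0, -2.0, 0.5])
p = len(true_beta)

# Accumulate the sufficient statistics X^T X and X^T y chunk by chunk,
# so the full dataset never has to fit in memory at once.
XtX = np.zeros((p, p))
Xty = np.zeros(p)
for _ in range(100):                      # 100 chunks of 1000 rows each
    Xc = rng.normal(size=(1000, p))
    yc = Xc @ true_beta + rng.normal(0, 0.1, 1000)
    XtX += Xc.T @ Xc
    Xty += Xc.T @ yc

# Solve the normal equations once at the end
beta = np.linalg.solve(XtX, Xty)
print(np.round(beta, 3))  # close to [1.0, -2.0, 0.5]
```

The same accumulation trick underlies distributed implementations: each worker sums its own XᵀX and Xᵀy, and the partial sums are simply added together.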

Statistical Reports

The statistical output you can produce with Ordinary Least Squares far outweighs the trouble of data preparation (given that you are after statistical output and deep exploration of your data and all its relations and causalities).

Modifiable

You don't survive two hundred or so years of heavy academic and industry utilization without accumulating modifications. Once you open the box of Linear Regression, you discover a world of optimization, modification and extensions (OLS, WLS, ALS, Lasso, Ridge and Logistic Regression, just to name a few).

Implementation Restrictions

If your problem has non-linear tendencies, Linear Regression is instantly irrelevant.

Overfitting

Another problem is that when data has noise or outliers, Linear Regression tends to overfit. Discovering and getting rid of overfitting can be another pain point for the unwilling practitioner, and even if you are willing, it can at times be difficult to reach the optimal setup.

Data Preparation

Regularization, handling missing values, scaling, normalization and data preparation can be tedious.

Learning Curve

Even interpreting the results of Linear Regression in a meaningful way, as they are intended, can take some education, which makes it a bit less appealing to a non-statistical audience.