Random Forest

Pros & Cons

Pros



1- Excellent Predictive Powers

If you like Decision Trees, Random Forests are like decision trees on 'roids.

Combining many decision trees amplifies a random forest's predictive power and makes it useful for applications where accuracy really matters.

2- No Normalization

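Random Forests also don't require normalization. Each tree splits on one feature at a time by comparing it against a threshold, so rescaling a feature changes the threshold value but not which samples land on each side of the split.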
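To see why scaling doesn't matter for tree splits, here is a minimal pure-Python sketch. It is not how any particular library implements splitting; `gini`, `best_split` and the income data are made up for illustration. It finds the best threshold on a raw feature and on its min-max scaled version, then compares which rows end up on the left side.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, left_mask) for the split with the lowest weighted impurity."""
    best = None
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        mask = [v <= threshold for v in values]
        left = [y for y, m in zip(labels, mask) if m]
        right = [y for y, m in zip(labels, mask) if not m]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[0]:
            best = (score, threshold, mask)
    return best[1], best[2]


# Hypothetical toy data: yearly income and whether the customer bought.
incomes = [18_000, 25_000, 31_000, 52_000, 67_000, 90_000]
bought = [0, 0, 0, 1, 1, 1]

# Min-max scale the feature to [0, 1], as you might for a distance-based model.
low, high = min(incomes), max(incomes)
scaled = [(v - low) / (high - low) for v in incomes]

raw_threshold, raw_mask = best_split(incomes, bought)
scaled_threshold, scaled_mask = best_split(scaled, bought)

# The threshold value differs, but the rows sent left and right are identical.
print(raw_mask == scaled_mask)
```

The threshold moves from the income scale to the 0-1 scale, but the partition of the rows is the same, which is all a tree cares about.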

3- Easy Data Preparation

Overall, having no requirement for normalization or scaling makes Random Forests much more convenient than some other machine learning algorithms when it comes to data preparation and pre-processing.

4- Missing Data

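Just like decision trees, random forests can cope with missing values, which brings us back to the point "Easy Data Preparation". How much work this takes depends on the implementation: some libraries accept missing values directly, while others expect you to impute them first.

With random forests you can also expect more accurate results than from a single tree. Each tree is trained on a random bootstrap sample of the data and considers a random subset of features at each split, and the forest then averages the trees' predictions (or takes a majority vote for classification).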
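The "many trees, combined at random" idea can be sketched in a few lines of plain Python. This is a toy illustration under simplifying assumptions, not a real random forest: each "tree" is a single-split stump on one feature, there is no random feature selection, and `fit_stump`, `fit_forest`, `predict` and the data are hypothetical names made up for this example.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-split 'tree' on (x, label) rows by picking the threshold with the fewest errors."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [label for x, label in rows if x <= threshold]
        right = [label for x, label in rows if x > threshold]
        left_vote = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left vote.
        right_vote = Counter(right).most_common(1)[0][0] if right else left_vote
        errors = sum(l != left_vote for l in left) + sum(l != right_vote for l in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_vote, right_vote)
    _, threshold, left_vote, right_vote = best
    return lambda x: left_vote if x <= threshold else right_vote


def fit_forest(rows, n_trees, rng):
    """Train each stump on a bootstrap sample (same size, drawn with replacement)."""
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(fit_stump(sample))
    return forest


def predict(forest, x):
    """Combine the trees by majority vote."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Slightly noisy toy data: mostly "no" for small x, mostly "yes" for large x.
rows = [(1, "no"), (2, "no"), (3, "no"), (4, "yes"),
        (5, "no"), (6, "yes"), (7, "yes"), (8, "yes")]
forest = fit_forest(rows, n_trees=25, rng=rng)

print(predict(forest, 1), predict(forest, 8))
```

Individual stumps can disagree because each one saw a different resample of the data; the vote smooths those differences out.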

5- Reasonably Fast Training

A single decision tree trains faster than a random forest. This makes sense since a random forest has to build many trees while a decision tree only builds one.

Still, each tree in the forest is built independently of the others, so training can be spread across CPU cores, which keeps training time reasonable compared to some other algorithms.

6- Optimization Options

This can be a blessing and a curse depending on what you want.

Random forest offers lots of parameters to tweak and improve your machine learning model.

oob_score, n_estimators, random_state, warm_start and n_jobs are just a few that come to mind, and several of them affect computational efficiency.
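As a rough sketch of how those parameters are typically used, here is an example assuming scikit-learn's `RandomForestClassifier` (the parameter names above match its API). The synthetic dataset and the specific values are arbitrary choices for illustration, not tuning advice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, only here so the example runs on its own.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest
    oob_score=True,     # estimate accuracy on the samples each tree did not see
    n_jobs=-1,          # build trees in parallel on all available cores
    random_state=0,     # make the randomness reproducible
)
forest.fit(X, y)
oob_accuracy = forest.oob_score_
print(f"out-of-bag accuracy: {oob_accuracy:.3f}")

# warm_start keeps the trees that are already fitted and only adds new ones.
growing = RandomForestClassifier(n_estimators=50, warm_start=True, random_state=0)
growing.fit(X, y)
growing.set_params(n_estimators=100)
growing.fit(X, y)  # fits 50 additional trees instead of starting over
n_trees = len(growing.estimators_)
print(f"trees after warm start: {n_trees}")
```

The out-of-bag score is handy because it gives you a validation-style estimate without setting aside a separate test split.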

7- Suitable for large data

If you have a large database or a huge dataset you need to work on, no worries. Usually, the random forest algorithm handles large datasets pretty well.

Furthermore, you can take advantage of optimization parameters and make your model more efficient than it would be out of the box.

8- Sophisticated Output

Random Forest also provides a very nice and sophisticated output in the form of variable importance scores, which rank the features by how much they contribute to the model's predictions.

This interpretation can help you go beyond an accurate prediction and comment on which variables matter most in achieving the prediction.
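Here is a small sketch of reading variable importance, assuming scikit-learn, where a fitted forest exposes impurity-based scores through `feature_importances_`. The dataset is synthetic: with `shuffle=False`, `make_classification` puts the informative features in the first columns, so you would expect features 0 to 2 to rank near the top.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 8 features, of which only the first 3 carry signal (the rest are noise).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# One score per feature; the scores are normalized to sum to 1.
importances = forest.feature_importances_

# Sort features from most to least important.
ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
for index, score in ranked:
    print(f"feature {index}: {score:.3f}")
```

These scores tell you which variables the forest leaned on, not the direction of their effect, so treat them as a starting point for interpretation.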



Cons


1- Overfitting Risk

Although the risk is much lower than with a single decision tree, overfitting is still possible with random forests and something you should monitor.

2- Limited with Regression

Random Forest regression is a thing. However, with so many regression models available in the data science world, random forests may not be the go-to regression approach in every application.

A random forest will not be able to predict any value outside the range of target values it saw during training, since each tree predicts an average of training targets and the forest averages the trees.

So, while random forest is very strong in many classification problems, it can be limited with regression, especially when the data follows a linear trend that continues beyond the training range.
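The extrapolation limit is easy to demonstrate. The sketch below assumes scikit-learn and uses a made-up, perfectly linear dataset (y = 2x for x from 0 to 10), then asks both a random forest and a linear regression to predict at x = 20, well outside the training range.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Training data on a straight line: x in 0..10, y = 2x, so y ranges from 0 to 20.
X_train = np.arange(0, 11, dtype=float).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

# A point far outside the training range; the true value would be 40.
X_new = np.array([[20.0]])

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

forest_pred = forest.predict(X_new)[0]
linear_pred = linear.predict(X_new)[0]

# The forest cannot exceed the largest target it was trained on.
print(f"forest: {forest_pred:.2f}, linear: {linear_pred:.2f}")
```

The linear model follows the trend out to 40, while the forest's prediction stays at or below 20, the largest target value in the training data.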

3- Parameter Complexity

Tuning parameters is nowhere near as hard as writing a random forest algorithm by hand, let alone inventing it.

However, some algorithms are still more complex than others when it comes to optimization and parameters. Although random forests are known to work well without too much optimization, they still have lots of hyper-parameters that can (and in some cases should) be adjusted.

On top of decision tree parameters (such as leaf, node, splitting, tree size etc.), random forests also have parameters for the number of trees in the forest (n_estimators), how the trees are built (bootstrap) and the out-of-bag score (oob_score).

4- Biased towards variables with more levels

If your data has categorical variables with different numbers of levels, this can be a big problem: the random forest algorithm tends to favor the variables with more levels, both when choosing splits and in its importance scores, which can pose a prediction risk.

5- Trademark Situation

If you are going to use random forests in business or a commercial application, you totally can.

But know that you can't use the name Random Forest, or many of its variations, as your product name or as part of your product name without official permission from the trademark owners.

Random Forests are a bit unique in this sense. However, the algorithm is still shared under a generous license, and this is an interesting anecdote rather than a significant bottleneck.

I've never heard of anyone getting in legal trouble for using the random forest name inappropriately, but you still want to respect the agreements and the will of the inventors.

Nevertheless, I've seen and heard of people unnecessarily avoiding random forests because of this point, either because it's a turn-off for them or because they don't understand the difference between a trademark and a patent.


Random Forest Pros & Cons Summary

Why Random Forests?

Random Forest technology takes decision trees, combines them in a sophisticated way and brings the whole idea to another level that's applicable to real, modern-day problems.

With so much power, prediction accuracy, insightful output, the ability to work with large datasets or databases and the ability to handle missing data so well, it's no wonder that random forests are one of the most commonly used and popular machine learning algorithms out there.

It makes one appreciate that founders Leo Breiman and Adele Cutler only trademarked the technology rather than patenting it, and shared it under a license that's free for all to use. If you've never heard of the interesting history of Random Forests, why don't you check it out?

Great Accuracy

First thing that comes to mind with Random Forests is their prediction accuracy.

Unlikely to overfit

Since the predictions of many decision trees are averaged, you can expect less overfitting and less risk of getting stuck in local minima.

Easy Data Prep

Data prep is no big deal. No scaling, no normalization, missing data is OK.

Great Reports

Random Forests give out great reports with variable importance information.

Parameter Complexity

On top of decision trees' parameters (think nodes, leaves, splitting, tree size), you can expect more parameters with random forests, such as the number of trees.

Relatively Slow

Random Forests can get sluggish, especially if you grow your forest with too many trees and don't optimize well.

Limited Regression

Don't let random forests' superpowers trick you: they can perform pretty badly in specific regression problems, such as data with a linear trend.