Decision Trees with Superpowers

An algorithmic breakthrough of the new millennium

Leo Breiman, Photo: Salford Systems

Randomized tree inspiration sources

Random Forests™ is a trademark of Leo Breiman and Adele Cutler (you can read more about that at the bottom of this article).

In 2000, Leo Breiman of the University of California, Berkeley pointed out that decision trees are the same as kernels with true margins. He published his discovery in a phenomenal research paper in 2001: paper here or here.

His paper was heavily inspired by 

1- Amit and Geman’s 1997 paper “Shape quantization and recognition with randomized trees”

2- Dietterich’s 1998 paper “Comparison of three methods for constructing ensembles of decision trees: Bagging, boosting and randomization” in Machine Learning
and 

3- His own earlier inventions around bagging:

Breiman, L. (1996a). Bagging predictors.
Breiman, L. (1996b). Out-of-bag estimation.
Breiman, L. (1998a). Arcing classifiers (discussion paper).
Breiman, L. (1998b). Randomizing outputs to increase prediction accuracy.
Breiman, L. (1999). Using adaptive bagging to debias regressions.
Breiman, L. (2000). Some infinity theory for predictor ensembles.

Historical RF™ Advancements

After Breiman and Cutler’s discovery of Random Forests™, quite a few improvements have been made.

Lin and Jeon made an adaptive nearest-neighbour contribution to random forests™, showing that random forests™ can be viewed as adaptive kernel estimates. Research paper here.
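To make this kernel view concrete, here is a minimal sketch (my own illustration, not the authors’ code), assuming scikit-learn’s RandomForestRegressor and a toy regression dataset: the “kernel” value for two points is simply the fraction of trees in which they land in the same leaf, and a kernel-weighted average of the training labels roughly tracks the forest’s own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression        # toy data, purely for illustration
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def forest_kernel(forest, A, B):
    """K[i, j] = fraction of trees in which A[i] and B[j] fall in the same leaf."""
    leaves_A = forest.apply(A)   # shape (n_A, n_trees): leaf index of each point in each tree
    leaves_B = forest.apply(B)   # shape (n_B, n_trees)
    return (leaves_A[:, None, :] == leaves_B[None, :, :]).mean(axis=2)

x_new = X[:3]
K = forest_kernel(forest, x_new, X)                  # kernel weights against the training set
kernel_pred = (K * y).sum(axis=1) / K.sum(axis=1)    # kernel-weighted average of training labels
print(kernel_pred)               # close to (not identical with) the forest's own predictions
print(forest.predict(x_new))
```

Because the forest averages per-tree leaf means fitted on bootstrap samples, the kernel-weighted average is an approximation of the forest prediction rather than an exact match, which is precisely the gap the later kernel papers study.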

Davies and Ghahramani then proposed that the Random Forest Kernel can outperform other kernel methods. Paper here.

Erwan Scornet introduced KeRF (kernel based on random forests) estimates and made the link between KeRFs and random forests™ explicit. Paper here.

Arlot and Genuer demonstrated that the bias of an infinite forest decreases at a faster rate than that of a single tree, and that infinite forests achieve significantly better risk rates than single trees. Paper here.

 

The Random Forest algorithm consists of multiple individual decision trees
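As a rough illustration of that idea, the toy sketch below (my own code, not Breiman’s implementation) hand-rolls the core recipe: bootstrap resampling of the training data, one decision tree per resample with a random subset of features considered at each split, and an averaged prediction across the trees. The dataset and parameter values are assumptions made purely for the example.

```python
import numpy as np
from sklearn.datasets import make_regression    # toy data, purely for illustration
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=0.3, random_state=0)
rng = np.random.default_rng(0)

n_trees = 50
trees = []
for _ in range(n_trees):
    # Bootstrap sample: draw n rows with replacement from the training set.
    idx = rng.integers(0, len(X), size=len(X))
    # Each tree only considers a random subset of features at every split.
    tree = DecisionTreeRegressor(max_features="sqrt",
                                 random_state=int(rng.integers(10**6)))
    trees.append(tree.fit(X[idx], y[idx]))

# The forest's prediction is simply the average of the individual tree predictions.
forest_pred = np.mean([t.predict(X[:5]) for t in trees], axis=0)
print(forest_pred)
```

Real implementations such as scikit-learn’s RandomForestRegressor add much more (out-of-bag estimates, parallelism, feature importances), but the bagging-plus-random-features recipe above is the heart of the method.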

Trademarks, copyrights and patents

Chances are you don’t deal with intellectual-property terminology in your everyday life, and the terms can be a little confusing. It may even sound slightly offensive to the open-source community to hear that Random Forest™ is a trademark. But the reality is different. So,

-Why did Leo Breiman and Adele Cutler trademark the Random Forest brand?

Before answering that question let’s answer this:

-What’s the difference between a patent, a copyright and a trademark?

Patents and copyrights are very different from trademarks.

A patent gives intellectual property rights to its owner: no one can use a patented invention, service or product without permission, such as a deal or a license.

Copyright © is similar to a patent, but instead of scientific or industrial inventions it usually covers creative work such as art, design, literature, caricature, cartoon, music, song, beat, lyric, content and so on.

A trademark (registered trademark, service mark) ®, ™, ℠ only concerns the name of a product or service. When something is trademarked it gets a ™ designation, and you can’t commercialize anything under a trademarked name without permission from the owner(s).

An interesting note: while patents are only valid for 20 years, companies can keep a trademark alive indefinitely by renewing it every 10 years. Copyrights are valid during the artist’s life and for 70 years after their passing.

Coming back to random forests: Leo Breiman and Adele Cutler trademarked the names RF™, RandomForests™, RandomForest™ and Random Forest™.

This allowed them to protect the naming of their invention, and it also let them license their work to Salford Systems and collect consulting fees for their services.

On top of that, they published random forests™ under a GNU license, allowing everyone to work with the method or commercialize their own work built on random forests™. Everyone wins.

Finishing Thoughts

Random Forest™ is a recent invention considering how long humans have worked with decision tree structures (you can read an interesting historical article on decision trees here). I think that, although Leo Breiman is already strongly praised, we will only fully appreciate how big an achievement Random Forest™ is in the years to come.

Random Forest technology made a very old, very common algorithm more accurate and more popular than ever. Even more importantly, random forests enabled working with large datasets and big data on the foundation of decision tree methods.

Random Forest shines as one of the most useful, accurate and efficient machine learning algorithms, and it will very likely stay popular as datasets grow larger and the technology becomes more common and abundant.