A gradual development

From Babylon to scikit-learn

Babylonian Pythagoras Theorem Tablet 1750 BC - Manuscripts in the Schøyen Collection of Cuneiform Texts I

Ancient Times

With older algorithms, it is difficult to pinpoint exactly who invented what. Decision trees are relatively new; however, their roots can be traced all the way back to the Babylonians. So, go figure. Here is an attempt to summarize some of the inventions that preceded decision tree structures and, in particular, decision tree algorithms.

Since a decision tree is such a common and practical structure that can be used almost everywhere, we can stretch the argument a little, and there are some interesting references in history.

As mentioned above, the Babylonians were a very advanced civilization that developed many mathematical concepts. Among these were methods for solving quadratic and cubic equations.

The Babylonians were able to use a procedure equivalent to the standard quadratic formula to solve an equation like this:

Quadratic Equation
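As a rough illustration of that idea, here is a minimal Python sketch. It assumes the equation is written in the form x^2 + bx = c with positive b and c, and it completes the square to get the positive root, which is equivalent to applying the quadratic formula. The function name and the sample numbers are made up for this example; they are not taken from any particular tablet.

```python
import math


def babylonian_quadratic(b: float, c: float) -> float:
    """Positive root of x**2 + b*x = c, found by completing the square.

    Assumes b > 0 and c > 0 (the form assumed in this sketch).
    """
    half_b = b / 2
    # (x + b/2)**2 = (b/2)**2 + c, so x = sqrt((b/2)**2 + c) - b/2
    return math.sqrt(half_b ** 2 + c) - half_b


# Illustrative values: x**2 + x = 0.75
root = babylonian_quadratic(1, 0.75)
print(root)
```

Plugging the result back in gives 0.25 + 0.5 = 0.75, so the root checks out.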

They also used tables of n^3 + n^2 values to solve a cubic equation like this:

Cubic Equation
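The table trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cubic has the form ax^3 + bx^2 = c with positive integer coefficients, and the rescaled right-hand side happens to be an exact entry in the table (no interpolation between entries is attempted). The function name, table size and sample coefficients are hypothetical.

```python
from fractions import Fraction


def solve_cubic_with_table(a: int, b: int, c: int, max_n: int = 30):
    """Solve a*x**3 + b*x**2 = c by looking up n**3 + n**2 in a table.

    Multiplying the equation by a**2 / b**3 and substituting n = a*x/b
    turns it into n**3 + n**2 = c * a**2 / b**3.
    Returns x as a Fraction, or None if the value is not in the table.
    """
    # Lookup table: value of n**3 + n**2 -> n
    table = {n ** 3 + n ** 2: n for n in range(1, max_n + 1)}

    key = Fraction(c * a ** 2, b ** 3)
    if key.denominator != 1:
        return None  # not an integer, so it cannot be an exact table entry
    n = table.get(key.numerator)
    if n is None:
        return None
    # Undo the substitution n = a*x/b
    return Fraction(b * n, a)


# Illustrative values: 2*x**3 + 3*x**2 = 81
x = solve_cubic_with_table(2, 3, 81)
print(x)
```

Here the rescaled value is 81 * 4 / 27 = 12, the table gives n = 2 (since 8 + 4 = 12), and undoing the substitution yields x = 3.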

So, we know that they were able to deal with roots efficiently, and non-linearity was nothing new to them.

The Babylonians thrived from roughly 2000 BC to 500 BC in Mesopotamia, and we have about 400 clay tablets documenting Babylonian mathematics. How cool is that?

Aristotle's Genius

If we move a bit closer in the history of civilization, we see more cool innovations such as Aristotle’s “Categories”. I find this 15-chapter work breathtaking, since Aristotle created a hierarchical structure of human speech and events in 10 categories. It’s almost as if he captured something of the nature of Artificial Intelligence in those days. He included phrases like “man argues” and “horse runs”, which look rather like the output of a decision tree machine learning model, and the fact that he had the imagination to enumerate the outcomes people anticipate is beyond amazing.

You can see the 1853 translation of Aristotle’s Categories here.

We are what we repeatedly do; excellence, then, is not an act but a habit.
Will Durant, summarizing Aristotle
Historian and philosopher
A section of Aristotle's recovered Papyrus ~350 BC - Courtesy of Ägyptische Museum in Berlin

Modern Days

When we are dealing with such a critical pillar of civilization, it’s difficult to tell exactly when decision trees came about. However, there are traces in modern scientific literature that may mark the modern beginnings of decision tree algorithms.

It helps to understand that decision tree algorithms come in different variants under different names; there is no single algorithm. Some of the most common decision tree algorithms today are CART, ID3, C4.5 and CHAID. These models differ in complexity and performance, and they evolve as development continues. With that said, let’s continue tracking down their roots:

First, we have Ronald Fisher’s 1936 paper, in which he developed “Linear Discriminant Analysis” and applied it to a 2-class problem. In 1948, C.R. Rao extended it to multiclass problems.

In the 1950s, Automatic Interaction Detection (AID) sees lots of development, which leads to more advanced models throughout the 60s and 70s. See, for example, “The Application of Automatic Interaction Detection (AID) in Operational Research”:

https://www.jstor.org/stable/3008458?seq=1

And in 1980, Gordon V. Kass developed CHAID (Chi-squared Automatic Interaction Detection), a decision tree model based on AID.

In my opinion, the June 1959 paper by William A. Belson, “Matching and Prediction on the Principle of Biological Classification” (Journal of the Royal Statistical Society, Series C (Applied Statistics)), can be regarded as the inception of the modern-day decision tree algorithm:

https://www.jstor.org/stable/2985543?seq=1

Since CHAID is a major decision tree implementation and is based on the AID model, this theory actually holds some water. If we don’t count AID and see CHAID as the first decision tree algorithm to be implemented, then the CHAID paper published by Gordon Kass in 1980 can be referred to as the first decision tree algorithm.

It probably isn’t the very first application of its kind in history, but as far as documented scientific literature goes, which the British did fantastically in the old days, this is what we can count as the root of decision tree algorithms, no pun intended.

Then we also have the ID3 and CART decision tree implementations.

CART (Classification & Regression Trees) is another major decision tree implementation, which constructs the tree by recursively splitting the data according to a numerical splitting criterion.
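To make the idea of a numerical splitting criterion more concrete, here is a minimal Python sketch of a single split step. It assumes Gini impurity as the criterion (a common choice for CART classification trees) and a single numeric feature; the function names and the toy data are made up for illustration. A full tree would apply the same search recursively to each side of the chosen split, which is not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric feature that minimizes the
    size-weighted Gini impurity of the two resulting groups.

    Returns (threshold, weighted_gini); threshold is None if no split
    improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: small values belong to class "a", large values to class "b"
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, score = best_split(values, labels)
print(threshold, score)
```

On this toy data the search settles on the midpoint between 3.0 and 10.0, where both sides become pure.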

The CART methodology was developed by Leo Breiman, Jerome Friedman, Richard Olshen and Charles Stone, who published it in 1984 in “Classification and Regression Trees”.

ID3 was developed by J. Ross Quinlan and published in the March 1986 paper “Induction of Decision Trees” in the journal Machine Learning.

CART and ID3 were both major breakthroughs for classification and regression using decision trees; however, they came 4 years and 6 years, respectively, after Gordon Kass’ paper from South Africa.

You can see other relevant research papers by Breiman on his Berkeley page here, and the original 1984 CART paper here.

Finishing Thoughts

So what do you think? Did decision tree algorithms start with AID in 1959, or with Kass’ CHAID paper in 1980? Or do you think Fisher’s 1936 Linear Discriminant Analysis was the seed for decision tree methods? How about the Babylonians and the Ancient Greeks? Should we forget about their contribution? Don’t you think science and the evolution of civilization are a relay race? They certainly are, and as important as it is to praise our individual heroes, it’s also important to remember the contribution of our ancient ancestors and stay humble.