Roots in the USAF

Technical report becomes groundbreaking kNN discovery

Leo Breiman, Photo: Salford Systems

Berkeley statisticians collaborate on USAF report

kNN is a truly awesome approach to classifying data and its beauty is its simplicity.

But who first came up with the idea of k-Nearest Neighbors?

Evelyn Fix (1904-1965) was a mathematician/statistician who did her Phd at Berkeley and continued to teach statistics there.

Joseph Lawson Hodges Jr. (1922-2000) was also a statistician at Berkeley and was involved in statistical collaborations with the Twentieth Air Force of USAF (United States Air Forces) starting from 1944.

These two brilliant minds met on a technical analysis report in 1951 produced for USAF where they introduced a non-parametric classification method (discriminant analysis). They never published the paper officially probably because of the nature of the work and confidentiality involved especially considering the global atmosphere in those days shortly after WWII. 

This is the first known introduction of non-parametric classification which became the famous kNN algorithm later on.

Journal Article: E. Fix and J.L. Hodges: An Important Contribution to Nonparametric Discriminant Analysis and Density Estimation: Commentary on Fix and Hodges (1951) can be found here.

Continued Contributions

Next, Thomas Cover and Peter Hart proved an upper bound error rate with multiclass kNN classifications in 1967 (twice the Bayes error rate).

Research paper: Nearest neighbor pattern classification (1967)

Cover & Hart’s discovery paved the way for a number of new rejection studies such as:

There has been many improvements to kNNs since those days and new approaches continue to emerge to this day.

Fun Fact: Elvis Presley also served at Ernest Harmon Air Force Base to which Twentieth Air Force was assigned to

Further Reading

Naomi Altman’s article & research paper are also definitely worth a good read:

Finishing Thoughts

Seven decades after its initial discovery kNN algorithm remains popular and it is still widely used by academia as well as industry professional.

k-Nearest Neighbors algorithm was special in a way that no similar non-parametric supervised machine learning algorithm existed when it was initially introduced.

Research continues to make kNN algorithm more precise and more efficient. Based on its intuitive algorithmic mechanism that’s simple as counting to understand kNN’s popularity and usage is likely to remain.

Thank you for visiting this Machine Learning tutorial. If you have enjoyed it you can find similar articles about other algorithms in the link below: