- k Nearest Neighbor (kNN) History
- 27 July 2020
- by HolyPython
Roots in the USAF
Technical report becomes groundbreaking kNN discovery
Berkeley statisticians collaborate on USAF report
kNN is a truly awesome approach to classifying data, and its beauty lies in its simplicity.
But who first came up with the idea of k-Nearest Neighbors?
Evelyn Fix (1904-1965) was a mathematician and statistician who did her PhD at Berkeley and continued to teach statistics there.
Joseph Lawson Hodges Jr. (1922-2000) was also a statistician at Berkeley and was involved in statistical collaborations with the Twentieth Air Force of the United States Air Force (USAF) starting in 1944.
These two brilliant minds collaborated on a 1951 technical report for the USAF in which they introduced a non-parametric classification method (discriminant analysis). They never officially published the paper, probably because of the confidential nature of the work, especially considering the global atmosphere in those days, shortly after WWII.
This is the first known introduction of non-parametric classification, which later became the famous kNN algorithm.
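For readers who want to see the rule itself, here is a minimal sketch of plain majority-vote kNN classification in Python (assuming NumPy is available); the function and variable names are our own illustration, not anything from the original 1951 report.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point x to every training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k nearest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Tiny usage example with made-up 2-D data
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.95, 1.0]), k=3))  # prints 1
```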
Journal article: "E. Fix and J.L. Hodges (1951): An Important Contribution to Nonparametric Discriminant Analysis and Density Estimation: Commentary on Fix and Hodges (1951)" by B.W. Silverman and M.C. Jones.
Continued Contributions
Next, in 1967, Thomas Cover and Peter Hart proved an upper bound on the error rate of nearest neighbor classification in the multiclass setting: asymptotically, its error rate is at most twice the Bayes error rate.
Research paper: Nearest neighbor pattern classification (1967)
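Stated in standard notation (ours, following the usual presentation of the 1967 result), if R* denotes the Bayes error rate and R the asymptotic error rate of the nearest neighbor rule in an M-class problem, the bound reads:

```latex
R^{*} \;\le\; R \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}
```

In other words, with unlimited data the simple nearest neighbor rule can never do worse than double the error of the best possible classifier.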
Cover & Hart's discovery paved the way for a number of follow-up studies, such as:
- In 1970: Martin E. Hellman examined "the (k, k′) nearest neighbor rule with a reject option".
- In 1975: Fukunaga and Hostetler made refinements with respect to the Bayes error rate.
- In 1976: Dudani, and in 1978 Bailey and Jain, published distance-weighted approaches (a sketch of the idea follows this list).
- In 1983: Adam Jozwik introduced "A learning scheme for a fuzzy k-NN rule".
- In 1985: James Keller et al. developed fuzzy kNN (FkNN) in "A fuzzy k-nearest neighbor algorithm".
- In 2000: Bermejo and Cabestany published "Adaptive soft k-nearest-neighbour classifiers".
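To illustrate the distance-weighted idea mentioned in the list above, here is a sketch in the spirit of those approaches, where each neighbor's vote is scaled by the inverse of its distance. The inverse-distance weighting and all names here are our own simplification; Dudani's original rule uses a different (linear) weighting scheme.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, k=3, eps=1e-9):
    # Euclidean distances to all training points
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Inverse-distance weights; eps guards against division by zero
    weights = 1.0 / (dists[nearest] + eps)
    # Accumulate each class's total weight and return the heaviest class
    votes = {}
    for label, w in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```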
There have been many improvements to kNN since those days, and new approaches continue to emerge to this day.
Further Reading
Naomi Altman's 1992 research paper, "An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression", is also definitely worth a good read.
Finishing Thoughts
Seven decades after its initial discovery, the kNN algorithm remains popular, and it is still widely used in academia as well as by industry professionals.
The k-Nearest Neighbors algorithm was special in that no similar non-parametric supervised machine learning algorithm existed when it was first introduced.
Research continues to make the kNN algorithm more precise and more efficient. Given an algorithmic mechanism that is as simple to understand as counting, kNN's popularity and usage are likely to endure.
Thank you for visiting this machine learning tutorial. If you have enjoyed it, you can find similar articles about other algorithms via the link below: