k-nearest Neighbor

Pros & Cons

k Nearest Neighbor Pros


1- Simplicity

kNN is probably the simplest Machine Learning algorithm, and it might also be the easiest to understand. In a sense it's even simpler than Naive Bayes, because Naive Bayes still comes with a mathematical formula.

So, if you're totally new to technical fields, or if your audience requires a very simple explanation, kNN might be the perfect place to start.

2- Non-parametric

Non-parametric means kNN doesn't make assumptions regarding the dataset. If you don't know much about the dataset initially, this feature can be a lifesaver.

Also, as new values are added, kNN adjusts automatically based on the n_neighbors parameter you have provided.
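As a minimal sketch (assuming scikit-learn, whose KNeighborsClassifier exposes the n_neighbors parameter mentioned above; the toy points are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Two obvious clusters; kNN needs no assumptions about their distribution.
X = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y = [0, 0, 0, 1, 1, 1]

# n_neighbors is the main knob: how many nearby points get a vote.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict([[1.5, 1.5], [8.5, 8.5]]))  # -> [0 1]
```

Adding more training points and refitting is all it takes to "update" the model, since kNN simply memorizes the data.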

Making no assumptions can mean discovering hidden relations in your data, which can lead to a whole new perspective or surprising results. That's usually good, depending on the surprise.

You can refer to this page to read more about k-Nearest Neighbor optimization parameters.

3- Great Sidekick

Due to its comprehensible nature, many people love to use kNN as a warm-up tool. It's perfect to test the waters with or make a simple prediction.

k Nearest Neighbor can also be used to create input for another machine learning algorithm, or to process the results of one.

Finally, kNN's uniqueness offers great value as a cross-check. It's a model that's sensitive to outliers and complex features, which makes it a great candidate to challenge the output of other machine learning algorithms such as Naive Bayes or Support Vector Machines.
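A hedged sketch of the "input to another algorithm" idea, using scikit-learn (the dataset and model choices below are illustrative, not prescribed by the article): kNN's predicted probabilities can be appended as an extra feature for a second model, a simple form of stacking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data just for demonstration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

# Append kNN's class-1 probability as an extra column for the next model
X_tr2 = np.column_stack([X_tr, knn.predict_proba(X_tr)[:, 1]])
X_te2 = np.column_stack([X_te, knn.predict_proba(X_te)[:, 1]])

stacked = LogisticRegression().fit(X_tr2, y_tr)
print(round(stacked.score(X_te2, y_te), 2))
```

In practice you would generate the kNN feature with out-of-fold predictions to avoid leakage; this sketch keeps things minimal.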

4- Very Sensitive

If you want to explore features with complex relations, or if your data has outliers that you'd like to keep in consideration, kNN can do a great job in this sense.

Especially with the comfort and simplicity of adjusting the n_neighbors parameter, everything becomes intuitive and practical.

5- Versatility

kNN is a great tool for classification but it can be used for regression as well.

Paired with its other strengths such as intuitiveness, simplicity, practicality and accuracy, it's definitely handy to be able to use kNN for regression every now and then.
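A minimal regression sketch, assuming scikit-learn's KNeighborsRegressor (the toy data is made up): the prediction is simply the mean of the k nearest targets.

```python
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D data where the target equals the input
X = [[0], [1], [2], [3], [4], [5]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)
# The two nearest training points to 2.5 are x=2 and x=3,
# so the prediction is their mean target:
print(reg.predict([[2.5]]))  # -> [2.5]
```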

6- Non-Linear Performance

Another versatile trait of k Nearest Neighbor is how well it performs in non-linear situations. Even when other non-linear options are available (SVM with non-linear kernels comes to mind), kNN's simplicity makes it a straightforward option to try first.

No wonder it's common to see professionals apply kNN first to get a sense or different view of data.
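As a quick sketch of that non-linear ability (the dataset choice is illustrative): scikit-learn's two-moons data has a curved class boundary that defeats a linear model, yet plain kNN separates it well.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Two interleaving half-circles: a classic non-linear boundary
X, y = make_moons(n_samples=400, noise=0.15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=7).fit(X_tr, y_tr)
print(round(knn.score(X_te, y_te), 2))  # typically well above 0.9
```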



k Nearest Neighbor Cons


1- Costly Computation

Unfortunately, k Nearest Neighbor is a hungry machine learning algorithm, since it has to calculate the distance to every other point in the dataset for every single prediction.

This doesn't mean it's completely unusable; it just falls out of favor and becomes impractical when you enter the world of big data or similar applications. Something to keep in mind with this otherwise likable algorithm.
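The cost can be sketched directly with NumPy (array sizes scaled down from the numbers discussed later in the article, so the sketch runs quickly): predicting a single point with brute-force search means touching every training row.

```python
import numpy as np

# Scaled-down illustration: 50k rows, 100 features
rng = np.random.default_rng(0)
X_train = rng.random((50_000, 100))
query = rng.random(100)

# One query already touches all 50,000 x 100 values:
# a subtraction and a square per feature, a sum and a root per row.
dists = np.sqrt(((X_train - query) ** 2).sum(axis=1))
nearest = np.argsort(dists)[:5]  # indices of the 5 nearest neighbors
print(nearest)
```

Tree- or graph-based neighbor search can reduce this in low dimensions, but the memorize-everything nature of kNN remains.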

2- RAM Monster

It's not just the CPU that takes a hit with k Nearest Neighbor; RAM also gets occupied while this little monster works. kNN stores the entire training set in RAM, and while you might not notice it with small implementations, try working on a large dataset and memory quickly becomes a bottleneck.

3- Significant Parameters

Although kNN has few parameters to tune, this can trick the analyst. The k parameter (the number of neighbors) and the parameter controlling how distance is calculated can make a huge difference in the outcomes.

Luckily, it's extremely easy and straightforward to play with these parameters and experiment with the way they affect the results. The real risk is not being aware of the fact that they will make a huge impact.
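A tiny sketch of that impact, assuming scikit-learn's KNeighborsClassifier (the toy points are made up for illustration): the very same query point can get a different label just by changing k.

```python
from sklearn.neighbors import KNeighborsClassifier

# A class-1 point sits close to the query, but class 0 dominates
# slightly farther away, so the choice of k flips the label.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [2, 2]]
y = [0, 0, 0, 1, 1, 1, 1]

for k in (1, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.predict([[2.5, 2.5]])[0])  # label flips between k=1 and k=3
```

The `metric` and `weights` parameters can shift results in a similar way, so it pays to experiment with all of them.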

4- Small Dimensions Only

If you want to work on datasets with many features, this can be problematic with kNN.

Let's say you have 1 million rows with 100 features. With a 30/70 test/training split, classifying a single test point means computing its distance to each of the 700,000 training rows, and each of those distances takes 100 subtractions, 100 squares, and 1 square root. That gives an idea of why kNN gets difficult with large datasets and high feature counts.

5- Equal Treatment

Equal treatment is almost always good, but here it can be a drawback.

Since kNN is non-parametric and it doesn't make any assumptions, this means all the attributes will be treated as equally important for the results.

This is simply not always appropriate, and if you need to navigate around noise in your data, kNN may not be suitable.

6- Handling Missing Values

kNN can't handle data with missing values unless you apply a process called imputation. This means missing values in your data will be filled in with substitute values such as averages, ones, zeros, etc.

This can be a tedious extra task and it can also introduce wrong bias to the data.

Luckily, there are readily available tools to impute data in a practical way, such as scikit-learn's sklearn.impute.KNNImputer, and dealing with missing data is usually just a reality of Data Science.
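A minimal sketch of that imputer in action (scikit-learn's sklearn.impute.KNNImputer, with made-up toy data): each missing value is filled using the average of the nearest complete rows.

```python
import numpy as np
from sklearn.impute import KNNImputer

# One missing value in the first column
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [np.nan, 6.0],
              [8.0, 8.0]])

imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)  # the NaN is replaced by the mean of its 2 nearest rows
```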


k Nearest Neighbor Pros & Cons Summary

Why k Nearest Neighbor?

So intuitive and easy to comprehend.

kNN is a go-to Machine Learning algorithm for many people, not because it's extremely competent but because it's so practical. It's like the person you sometimes favor simply because they're family.

Everybody can understand or explain how kNN works in a couple of minutes and the results it gives are usually surprisingly accurate.

Its obvious shortcomings are that it takes up computation resources and that it isn't suitable for too many features or very large datasets. These might cause problems in some industrial applications, but for many cases kNN will do just fine.

Very Intuitive

When you discover the way kNN functions, everything makes sense and its logic is easy to follow. Not that other machine learning algorithms are that hard to understand, but kNN is readily understandable regardless of one's background.


Great for discovering hidden patterns or working with unstructured data


It might not always be the most accurate algorithm, but kNN is usually quite accurate.

Very Sensitive

You get to include outliers and anomalies in your analysis.

Large Data Problems

kNN will struggle with large datasets, especially if the data has high dimensionality.

Very Sensitive

Although this is rather an advantage, it can be a problem if you'd like to take outliers and noisy data into consideration. If you're looking for an insensitive algorithm in this sense you might want to look into Naive Bayes Classifier.

Missing Data

kNN doesn't handle missing data as well as Naive Bayes does. If you have too much missing data in your dataset, this can be a significant problem for kNN.