kNN Optimization
K Nearest Neighbor Optimization Parameters Explained
- n_neighbors
- weights
- algorithm
These are the most commonly adjusted parameters of k-Nearest Neighbors algorithms. Let’s take a closer look at what they do and how to change their values:
n_neighbors: (default: 5) This is the most fundamental parameter of kNN algorithms. It sets how many of the nearest training samples are consulted (and vote) when a new item is classified.
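To make the effect of n_neighbors concrete, here is a small from-scratch sketch in plain Python. It is not scikit-learn's implementation. It assumes Euclidean distance, equal ("uniform") votes, and a hypothetical helper named knn_predict. The same query gets a different label depending on how many neighbors are allowed to vote:

```python
import math
from collections import Counter


def knn_predict(train, query, n_neighbors=5):
    """Classify `query` by majority vote among its n_neighbors closest points.

    `train` is a list of ((x, y), label) pairs; Euclidean distance is assumed.
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:n_neighbors]
    # Each neighbor casts one equal vote ("uniform" weighting).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((0.0, 0.0), "red"),
    ((1.0, 0.0), "blue"),
    ((0.0, 1.0), "blue"),
    ((5.0, 5.0), "red"),
]
query = (0.1, 0.1)
print(knn_predict(train, query, n_neighbors=1))  # red: only the closest point votes
print(knn_predict(train, query, n_neighbors=3))  # blue: two blue points outvote one red
```

A small k follows the local data closely (and is sensitive to noise), while a larger k smooths the decision at the cost of blurring fine detail.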
weights: (default: “uniform”) Another important parameter, weights, determines how much each neighbor’s vote counts.
“uniform”: All neighbors are weighted equally.
“distance”: Each neighbor is weighted by the inverse of its distance, so closer neighbors have more influence than neighbors farther away.
[callable]: You can also pass a user-defined function. It receives an array of distances and must return an array of the same shape containing the weights.
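The three weighting options can be compared with a plain-Python sketch of the voting step. The function name weighted_vote and the Gaussian-style kernel are hypothetical and used only for illustration. Neighbors are given as (distance, label) pairs, and a callable maps a list of distances to a list of weights, mirroring the contract described above:

```python
import math
from collections import defaultdict


def weighted_vote(neighbors, weights="uniform"):
    """Return the winning label among (distance, label) neighbor pairs.

    weights: "uniform", "distance", or a callable that maps a list of
    distances to a list of weights of the same length.
    """
    distances = [d for d, _ in neighbors]
    if weights == "uniform":
        w = [1.0] * len(distances)
    elif weights == "distance":
        if any(d == 0 for d in distances):
            # In this sketch, exact matches take all the weight (avoids 1/0).
            w = [1.0 if d == 0 else 0.0 for d in distances]
        else:
            w = [1.0 / d for d in distances]
    elif callable(weights):
        w = list(weights(distances))
    else:
        raise ValueError(f"unknown weights: {weights!r}")

    # Sum the weights per label and pick the label with the largest total.
    scores = defaultdict(float)
    for (_, label), weight in zip(neighbors, w):
        scores[label] += weight
    return max(scores, key=scores.get)


neighbors = [(0.5, "red"), (2.0, "blue"), (2.5, "blue")]
print(weighted_vote(neighbors, "uniform"))   # blue: 2 votes vs 1
print(weighted_vote(neighbors, "distance"))  # red: 1/0.5 = 2.0 vs 0.5 + 0.4 = 0.9
# Hypothetical custom kernel: weight = exp(-d**2)
print(weighted_vote(neighbors, lambda ds: [math.exp(-d * d) for d in ds]))  # red
```

With "uniform", the two farther blue points win by count. With "distance", the single close red point dominates.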
algorithm: (default: “auto”) Signifies the algorithm that will be used to compute nearest neighbors.
“auto”: Attempts to choose the most suitable algorithm automatically, based on the data passed to fit.
“ball_tree”: Uses the BallTree algorithm.
“kd_tree”: Uses the KDTree algorithm.
“brute”: Uses brute-force search.
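For intuition about what "brute" means, here is a minimal brute-force neighbor search in plain Python. The helper brute_force_kneighbors is hypothetical and is not scikit-learn’s code. It computes the distance from the query to every training point and keeps the k smallest. The tree-based options exist to avoid doing this full scan on every query:

```python
import heapq
import math


def brute_force_kneighbors(points, query, k):
    """Return (distance, index) pairs for the k points closest to query.

    Every training point is compared with the query, so a single query
    costs O(n * d) work for n points in d dimensions.
    """
    return heapq.nsmallest(
        k, ((math.dist(p, query), i) for i, p in enumerate(points))
    )


points = [(0, 0), (3, 4), (1, 1), (6, 8)]
print(brute_force_kneighbors(points, (0, 0), k=2))
# [(0.0, 0), (1.4142135623730951, 2)]
```

Brute force needs no index-building step, which can make it competitive on small or very high-dimensional datasets. Tree methods pay an upfront construction cost to prune most distance computations later.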
Examples:
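from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=40)                      # vote among the 40 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=40, weights="distance")  # closer neighbors count more
knn = KNeighborsClassifier(algorithm="brute")                   # force brute-force search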
More parameters
More kNN Optimization Parameters for fine-tuning
The following parameters allow further tuning. They can help avoid speed and memory inefficiencies as well as suboptimal results:
- leaf_size
- p
- n_jobs
leaf_size
(default: 30)
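leaf_size is passed to the BallTree or KDTree algorithm and has no effect with brute-force search. It affects the speed of tree construction and queries, as well as the memory required to store the tree, and the best value depends on the data. (The distance function itself is configured separately, through the metric and metric_params parameters.)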
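To see why leaf_size trades memory against query work, consider a simplified KD-tree builder. This is a teaching sketch with hypothetical helper names, and scikit-learn’s trees are built differently in detail. Here a node stops splitting once it holds at most leaf_size points, and those points are later scanned by brute force. A smaller leaf_size yields a deeper tree with more nodes. A larger one yields fewer nodes but more points to scan in each leaf:

```python
def build_kdtree(points, leaf_size=30, depth=0):
    """Split points at the median until a node holds <= leaf_size points.

    Returns a nested dict; leaves keep their points for brute-force scanning.
    Simplified sketch, not scikit-learn's implementation.
    """
    if leaf_size < 1:
        raise ValueError("leaf_size must be >= 1")
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])  # cycle through the coordinate axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kdtree(pts[:mid], leaf_size, depth + 1),
        "right": build_kdtree(pts[mid:], leaf_size, depth + 1),
    }


def count_leaves(node):
    """Count leaf nodes in the nested-dict tree."""
    if "leaf" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])


points = [(x, y) for x in range(10) for y in range(10)]  # 100 grid points
for leaf_size in (5, 30, 100):
    print(leaf_size, count_leaves(build_kdtree(points, leaf_size)))
```

On these 100 points, leaf_size=100 gives a single leaf (equivalent to brute force), while leaf_size=5 produces 32 small leaves.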
p
(default: 2)
p is the power parameter of the Minkowski metric (metric="minkowski", the default).
1: manhattan_distance (l1)
2: euclidean_distance (l2)
Any other value of p gives the general minkowski_distance (l_p).
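The Minkowski distance is (sum_i |x_i - y_i|^p)^(1/p). The sketch below is a plain-Python function, hypothetically named minkowski_distance, that shows how p = 1 and p = 2 reduce to the Manhattan and Euclidean distances. It restricts p to values of at least 1, where the formula is a true metric:

```python
def minkowski_distance(x, y, p=2):
    """Minkowski (l_p) distance: (sum_i |x_i - y_i| ** p) ** (1 / p), for p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1 for a valid metric")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


a, b = (0, 0), (3, 4)
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan: 3 + 4)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean: sqrt(9 + 16))
print(minkowski_distance(a, b, p=3))  # about 4.498
```

Because p changes which points count as "near", it can change both the neighbors that are found and the final predictions.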
n_jobs
(default: None)
The number of parallel jobs to run for the neighbors search.
None: means 1 (unless running inside a joblib.parallel_backend context).
-1: All processors will be used.
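Because each query’s neighbor search is independent, queries can be split across workers. The sketch below is not scikit-learn’s internal mechanism. It uses the standard-library ThreadPoolExecutor and hypothetical helpers. resolve_n_jobs maps an n_jobs value to a worker count following the joblib-style convention (None means 1, -1 means all CPUs, -2 means all but one). parallel_nearest then answers queries on that many threads:

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_jobs(n_jobs):
    """Map an n_jobs value to a worker count (joblib-style convention)."""
    cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ... (never fewer than 1)
        return max(1, cpus + 1 + n_jobs)
    return n_jobs


def nearest_index(points, query):
    """Brute-force index of the training point closest to query."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))


def parallel_nearest(points, queries, n_jobs=None):
    """Answer independent nearest-neighbor queries on a pool of workers."""
    with ThreadPoolExecutor(max_workers=resolve_n_jobs(n_jobs)) as pool:
        # pool.map preserves the order of the queries in its results.
        return list(pool.map(lambda q: nearest_index(points, q), queries))


points = [(0, 0), (5, 5), (10, 10)]
print(parallel_nearest(points, [(1, 1), (9, 9), (4, 6)], n_jobs=-1))  # [0, 2, 1]
```

In CPython, pure-Python threads do not speed up this CPU-bound loop because of the GIL. The example only illustrates how the work is divided. Real speedups come from compiled code or process-based workers.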
Official scikit-learn documentation: sklearn.neighbors.KNeighborsClassifier