These are the most commonly adjusted parameters of k-Nearest Neighbors (kNN) algorithms. Let’s take a closer look at what they do and how to change their values:
n_neighbors: (default: 5) This is the most fundamental parameter of kNN algorithms. It sets how many neighbors are consulted when an item is classified.
weights: (default: “uniform”) Another important parameter, weights, determines how much each neighbor contributes to the prediction.
“uniform”: All neighbors are weighted equally.
“distance”: Neighbors are weighted by the inverse of their distance, so closer neighbors have more influence than distant ones.
[callable]: You can also pass your own function. It receives an array of distances and must return an array of the same shape containing the weights.
algorithm: (default: “auto”) Selects the algorithm used to compute the nearest neighbors.
“auto”: Attempts to choose the most suitable algorithm based on the data passed to fit.
“ball_tree”: Uses the BallTree algorithm.
“kd_tree”: Uses the KDTree algorithm.
“brute”: Uses a brute-force search.
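from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=40)  # consult 40 neighbors instead of 5
knn = KNeighborsClassifier(n_neighbors=40, weights="distance")  # closer neighbors count more
knn = KNeighborsClassifier(algorithm="brute")  # force a brute-force search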
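To see how the weights options compare in practice, here is a minimal sketch that fits the same classifier with “uniform”, “distance” and a callable. The Iris dataset, the train/test split, n_neighbors=15 and the helper gaussian_weights are all assumptions made for illustration; they are not part of the original examples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def gaussian_weights(distances, bandwidth=1.0):
    # Hypothetical custom weighting: takes an array of distances and
    # returns an array of the same shape containing the weights.
    return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

scores = {}
options = [
    ("uniform", "uniform"),
    ("distance", "distance"),
    ("gaussian", gaussian_weights),
]
for name, weights in options:
    knn = KNeighborsClassifier(n_neighbors=15, weights=weights)
    knn.fit(X_train, y_train)
    scores[name] = knn.score(X_test, y_test)  # mean accuracy on the test split
    print(name, round(scores[name], 3))
```

On a small, clean dataset like this the three options will likely score close to each other; the choice tends to matter more when neighbors sit at very different distances from the query point.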
The following parameters can be used for further tuning, to avoid speed and memory inefficiencies as well as suboptimal results:
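leaf_size: (default: 30)
leaf_size is passed to BallTree or KDTree, so it only takes effect when one of those algorithms is used. The distance computation itself is configured separately through the metric and metric_params parameters.
leaf_size is an important parameter that can affect the speed of building and querying the tree, as well as the memory needed to store it.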
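Since the tree search is exact, leaf_size should change how fast the search runs and how much memory the tree takes, not which neighbors are found. The sketch below checks this on synthetic data; the dataset, the split and the three leaf_size values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data (an assumption for this sketch).
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, y_train = X[:800], y[:800]
X_test = X[800:]

predictions = {}
for leaf_size in (5, 30, 200):
    knn = KNeighborsClassifier(
        n_neighbors=5, algorithm="kd_tree", leaf_size=leaf_size
    )
    knn.fit(X_train, y_train)
    predictions[leaf_size] = knn.predict(X_test)

# Compare every run against the default leaf_size of 30.
same_results = all(
    bool((predictions[30] == predictions[size]).all()) for size in predictions
)
print("identical predictions:", same_results)
```

To pick a value, time fit and predict on your own data for a few candidates, since the best leaf_size depends on the dataset.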
p: (default: 2)
The p parameter is the power used by the Minkowski metric.
1: manhattan_distance (l1)
2: euclidean_distance (l2)
For arbitrary p, minkowski_distance (l_p) is used.
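To make the role of p concrete, the Minkowski distance between two points a and b is the p-th root of the sum of |a_i - b_i|^p. The plain-Python sketch below is written for illustration only; the function minkowski_distance here is our own helper, not scikit-learn's implementation.

```python
def minkowski_distance(a, b, p=2):
    # Illustrative helper: (sum_i |a_i - b_i|^p)^(1/p)
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


a = (0.0, 0.0)
b = (3.0, 4.0)

d1 = minkowski_distance(a, b, p=1)  # Manhattan: 3 + 4 = 7
d2 = minkowski_distance(a, b, p=2)  # Euclidean: sqrt(9 + 16) = 5
d3 = minkowski_distance(a, b, p=3)  # cube root of 27 + 64 = 91

print(d1, d2, round(d3, 4))
```

For the same pair of points the distance shrinks as p grows, so changing p can change which neighbors count as closest.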
n_jobs: (default: None)
Sets the number of parallel jobs to run for the neighbor search.
None: Equivalent to 1, unless a joblib.parallel_backend context is active.
-1: All processors are used.
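n_jobs only changes how the neighbor search is scheduled, so the predictions should be the same with or without parallelism. Here is a minimal sketch under that assumption; the synthetic dataset and the split are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data (an assumption for this sketch).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, y_train = X[:1500], y[:1500]
X_test = X[1500:]

# Default: n_jobs=None, which normally means a single job.
knn_single = KNeighborsClassifier(n_neighbors=5)
knn_single.fit(X_train, y_train)

# n_jobs=-1: use all available processors for the neighbor search.
knn_parallel = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
knn_parallel.fit(X_train, y_train)

same_predictions = bool(
    (knn_single.predict(X_test) == knn_parallel.predict(X_test)).all()
)
print("identical predictions:", same_predictions)
```

On small datasets the overhead of parallelism can outweigh the gain, so it is worth timing both settings before committing to n_jobs=-1.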
Official Scikit-learn documentation: sklearn.neighbors.KNeighborsClassifier