kNN Optimization

K Nearest Neighbor Optimization Parameters Explained

These are the most commonly adjusted parameters of k Nearest Neighbor algorithms. Let’s take a closer look at what they do and how to change their values:

n_neighbors: (default: 5) This is the most fundamental parameter of kNN algorithms. It sets how many neighbors are consulted when an item is being classified.
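As a rough illustration of how n_neighbors can change a prediction, the sketch below uses made-up one-dimensional data in which a single point carries a noisy label. The data and variable names are assumptions chosen for demonstration only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (hypothetical): every point is class 0 except a noisy one at x=2
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 0, 0])

query = np.array([[2.1]])

# With a single neighbor, the noisy point at x=2 decides the outcome
pred_k1 = KNeighborsClassifier(n_neighbors=1).fit(X, y).predict(query)[0]

# With three neighbors (x=2, x=3, x=1), the two class-0 points outvote it
pred_k3 = KNeighborsClassifier(n_neighbors=3).fit(X, y).predict(query)[0]

print(pred_k1, pred_k3)
```

Small values follow the training data closely, including its noise, while larger values smooth the decision at the cost of blurring class boundaries.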

weights: (default: “uniform”) Another important parameter, weights, determines how much each neighbor's vote counts in the prediction.

“uniform”: All neighbors are weighted equally.

“distance”: Neighbors are weighted by the inverse of their distance, so closer neighbors have more influence than distant ones.

[callable]: You can also define a function and assign it to this parameter. The function receives an array of distances and must return an array of the same shape containing the weights.
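To make the callable option more concrete, here is a minimal sketch that compares uniform weights with a custom function. The function name inverse_square, the small constant, and the toy data are all assumptions for illustration, not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def inverse_square(distances):
    # Hypothetical weighting function: takes an array of distances and
    # returns weights of the same shape. The small constant guards
    # against division by zero when a query matches a training point.
    return 1.0 / (distances ** 2 + 1e-9)

# Toy data (hypothetical): four class-0 points on the left, two class-1 on the right
X = np.array([[0.0], [1.0], [2.0], [3.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 0, 1, 1])
query = np.array([[8.0]])

knn_uniform = KNeighborsClassifier(n_neighbors=5, weights="uniform").fit(X, y)
knn_custom = KNeighborsClassifier(n_neighbors=5, weights=inverse_square).fit(X, y)

# Uniform: three distant class-0 neighbors outvote two close class-1 neighbors
pred_uniform = knn_uniform.predict(query)[0]
# Custom: the two close class-1 neighbors dominate the weighted vote
pred_custom = knn_custom.predict(query)[0]

print(pred_uniform, pred_custom)
```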

algorithm: (default: “auto”) Signifies the algorithm that will be used to compute nearest neighbors.

“auto”: Attempts to choose the most suitable algorithm automatically based on the data passed to fit.

“ball_tree”: Uses the BallTree algorithm.

“kd_tree”: Uses the KDTree algorithm.

“brute”: Uses a brute-force search.
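The algorithm setting controls how neighbors are found, not which neighbors are found. The sketch below, using randomly generated data as an assumption, checks that all three explicit options return the same neighbor distances.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (hypothetical), seeded for reproducibility
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
queries = rng.normal(size=(10, 3))

distances = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm).fit(X, y)
    # kneighbors returns (distances, indices) of the nearest training points
    dist, _ = knn.kneighbors(queries)
    distances[algorithm] = dist

# All three searches are exact, so distances should agree up to rounding
same = (np.allclose(distances["brute"], distances["kd_tree"])
        and np.allclose(distances["brute"], distances["ball_tree"]))
print(same)
```

Since the results match, the choice mainly matters for speed and memory on larger datasets.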

Examples:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=40)
knn = KNeighborsClassifier(n_neighbors=40, weights="distance")
knn = KNeighborsClassifier(algorithm="brute")
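Rather than trying values by hand, one possible approach is to let a cross-validated grid search compare several combinations. This is only a sketch: the iris dataset and the candidate values in param_grid are assumptions picked for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values (hypothetical) for the two most commonly tuned parameters
param_grid = {
    "n_neighbors": [3, 5, 11, 21],
    "weights": ["uniform", "distance"],
}

# 5-fold cross-validation over every combination in the grid
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print(best_params, best_score)
```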

More parameters

More kNN Optimization Parameters for fine tuning

These additional parameters can be used for finer tuning, to avoid speed and memory inefficiencies as well as suboptimal results:

  • leaf_size
  • p
  • n_jobs

leaf_size

(default: 30)

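leaf_size is passed to BallTree or KDTree, so it only matters when one of these algorithms is used. The neighbor search can also be adjusted with the metric and metric_params parameters.

leaf_size can affect the speed of tree construction and queries, as well as the memory required to store the tree. The best value depends on the nature of the problem.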
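The sketch below, on synthetic data chosen as an assumption, fits two KDTree-based classifiers with very different leaf_size values. It suggests that leaf_size is a performance setting: the predictions stay the same while the tree layout changes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (hypothetical), seeded for reproducibility
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
queries = rng.normal(size=(20, 4))

# Small leaves: deeper tree; large leaves: shallower tree with more
# brute-force work inside each leaf
small_leaves = KNeighborsClassifier(algorithm="kd_tree", leaf_size=5).fit(X, y)
large_leaves = KNeighborsClassifier(algorithm="kd_tree", leaf_size=100).fit(X, y)

same_predictions = np.array_equal(
    small_leaves.predict(queries), large_leaves.predict(queries)
)
print(same_predictions)
```

To pick a value, timing fit and predict on your own data is likely more informative than any general rule.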

p

(default: 2)

p is the power parameter for the Minkowski metric.

1: manhattan_distance (l1)

2: euclidean_distance (l2)

minkowski_distance (l_p) can be used for arbitrary p.
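To show what p does to a distance, here is a small worked sketch in plain Python. The helper minkowski_distance is written for this example and is not the scikit-learn implementation.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length sequences."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

a, b = (0.0, 0.0), (3.0, 4.0)

manhattan = minkowski_distance(a, b, p=1)  # |3| + |4| = 7
euclidean = minkowski_distance(a, b, p=2)  # sqrt(3^2 + 4^2) = 5
cubic = minkowski_distance(a, b, p=3)      # (3^3 + 4^3)^(1/3)

print(manhattan, euclidean, cubic)
```

For the same pair of points the distance shrinks as p grows, so changing p can change which neighbors count as nearest. In the classifier this would be set with, for example, KNeighborsClassifier(p=1).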

n_jobs

(default: None)

Sets the number of parallel jobs to run for the neighbors search.

None: Means 1, unless set otherwise by a joblib.parallel_backend context.

-1: All processors will be used.
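As a minimal sketch on synthetic data (an assumption for illustration), the example below fits one classifier with the default n_jobs and one with n_jobs=-1 and checks that both give the same predictions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (hypothetical), seeded for reproducibility
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] > 0).astype(int)
queries = rng.normal(size=(50, 5))

single = KNeighborsClassifier(n_jobs=None).fit(X, y)   # one job by default
parallel = KNeighborsClassifier(n_jobs=-1).fit(X, y)   # all available processors

# n_jobs only changes how the search is scheduled, not its result
same_predictions = np.array_equal(single.predict(queries), parallel.predict(queries))
print(same_predictions)
```

On small datasets the overhead of parallelism can outweigh the gain, so it is worth measuring before relying on it.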

Official Scikit Learn Documentation: sklearn.neighbors.KNeighborsClassifier