# kNN Optimization

#### K Nearest Neighbor Optimization Parameters Explained

These are the most commonly adjusted parameters of k-nearest neighbors (kNN) algorithms. Let’s take a deeper look at what they are used for and how to change their values:

n_neighbors: (default: 5) This is the most fundamental parameter of kNN algorithms. It sets how many neighbors are consulted when an item is being classified.
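There is no universally good value for n_neighbors, so one common approach is to score several candidates with cross-validation and keep the best one. The sketch below assumes scikit-learn's bundled iris dataset and 5-fold cross-validation; the candidate range of k is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each candidate k
scores = {}
for k in range(1, 31, 2):  # odd values avoid voting ties in two-class problems
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best n_neighbors={best_k}, accuracy={scores[best_k]:.3f}")
```

A small k follows the training data closely (more variance), while a large k smooths the decision boundary (more bias); the cross-validated score is one way to pick a point between the two.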

weights: (default: "uniform") Another important parameter, weights determines how much each neighbor's vote counts toward the prediction.

"uniform": All neighbors are weighted equally.

"distance": Neighbors are weighted by the inverse of their distance, so closer neighbors have more influence than distant ones.

[callable]: You can also pass your own function. It receives an array of distances and must return an array of the same shape containing the weights.
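To make the callable option concrete, the sketch below compares the two built-in settings with a custom weight function. The Gaussian kernel `gaussian_weights` and its `sigma` value are hypothetical choices for illustration, not something scikit-learn provides; the iris dataset and the train/test split are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def gaussian_weights(distances, sigma=1.0):
    """Hypothetical kernel: the weight decays smoothly as distance grows.

    scikit-learn passes an array of neighbor distances and expects an
    array of the same shape containing the weights.
    """
    return np.exp(-(distances ** 2) / (2 * sigma ** 2))

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, stratify=y)

# Same k for every run, so only the weighting scheme changes
for w in ("uniform", "distance", gaussian_weights):
    knn = KNeighborsClassifier(n_neighbors=15, weights=w).fit(X_train, y_train)
    name = w if isinstance(w, str) else w.__name__
    print(f"{name:>16}: {knn.score(X_test, y_test):.3f}")
```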

algorithm: (default: "auto") The algorithm used to compute the nearest neighbors.

"auto": Attempts to pick the most suitable algorithm based on the data passed to fit.

"ball_tree": Uses the BallTree algorithm.

"kd_tree": Uses the KDTree algorithm.

"brute": Uses brute-force search.
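All of these options are exact searches, so they should find the same neighbors and differ only in speed and memory use. The sketch below checks this on synthetic data; the dataset, its labeling rule, and its size are arbitrary assumptions, and the timings will vary by machine.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic continuous data: exact distance ties are practically impossible,
# so every exact search strategy should return the same neighbors.
X_train = rng.normal(size=(5000, 8))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # hypothetical labels
X_query = rng.normal(size=(500, 8))

predictions = {}
for algo in ("auto", "ball_tree", "kd_tree", "brute"):
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo).fit(X_train, y_train)
    start = time.perf_counter()
    predictions[algo] = knn.predict(X_query)
    print(f"{algo:>9}: {time.perf_counter() - start:.4f} s")

# The search strategy should affect speed, not the result
same = all(np.array_equal(predictions["brute"], p) for p in predictions.values())
print("identical predictions:", same)
```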

## Examples:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=40)
knn = KNeighborsClassifier(n_neighbors=40, weights="distance")
knn = KNeighborsClassifier(algorithm="brute")
```
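Instead of setting these values one at a time, several combinations can be scored together. The sketch below assumes scikit-learn's GridSearchCV and the iris dataset; the grid values are arbitrary examples, and algorithm is left out because it changes speed rather than predictions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Every combination below is scored with 5-fold cross-validation
param_grid = {
    "n_neighbors": [3, 5, 11, 21, 41],
    "weights": ["uniform", "distance"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
knn = search.best_estimator_  # refit on all of X, y with the best settings
```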

## More parameters

#### More kNN Optimization Parameters for Fine-Tuning

The following parameters allow further tuning, helping to avoid slow queries, excessive memory use, and suboptimal results:

- leaf_size
- p
- n_jobs

### leaf_size

(default: 30)

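leaf_size is passed to BallTree or KDTree, so it only has an effect when one of these tree algorithms is used (either chosen explicitly or selected by "auto").

leaf_size is an important parameter because it affects the speed of tree construction and queries, as well as the memory required to store the tree. The best value depends on the data.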
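Because the tree search is exact, changing leaf_size should change timing and memory but not the neighbors that are found. The sketch below tries a few values on synthetic data; the data, its labeling rule, and the leaf sizes are arbitrary assumptions, and the timings depend on the machine.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10000, 5))
y_train = (X_train.sum(axis=1) > 0).astype(int)  # hypothetical labels
X_query = rng.normal(size=(1000, 5))

results = {}
for leaf_size in (5, 30, 100):
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree", leaf_size=leaf_size)
    start = time.perf_counter()
    knn.fit(X_train, y_train)  # builds the KDTree
    results[leaf_size] = knn.predict(X_query)  # queries it
    print(f"leaf_size={leaf_size:>3}: fit + predict {time.perf_counter() - start:.3f} s")

# leaf_size should change speed and memory, not which neighbors are found
same = all(np.array_equal(results[30], r) for r in results.values())
print("identical predictions:", same)
```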

### p

(default: 2)

p is the power parameter of the Minkowski distance, the default metric.

1: manhattan_distance (l1)

2: euclidean_distance (l2)

Any other value of p: minkowski_distance (l_p).
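The effect of p is easiest to see on a single pair of points. The helper below is a hypothetical, hand-written version of the Minkowski formula, (sum of |a_i - b_i|^p)^(1/p), used only to show how p=1 and p=2 reduce to Manhattan and Euclidean distance; scikit-learn computes this internally.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def minkowski_distance(a, b, p):
    """Hypothetical helper: (sum_i |a_i - b_i|^p)^(1/p)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a, b = [0, 0], [3, 4]
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan: 3 + 4)
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean: sqrt(9 + 16))
print(minkowski_distance(a, b, p=3))  # about 4.498 (cube root of 27 + 64)

# Selecting Manhattan distance in the classifier itself
knn = KNeighborsClassifier(n_neighbors=5, p=1)
```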

### n_jobs

(default: None)

The number of parallel jobs to run for the neighbor search.

None: treated as 1 (unless running inside a joblib parallel_backend context).

-1: All processors will be used.
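A quick way to check whether parallelism helps on a given machine is to time the same query with and without it. The sketch below assumes synthetic data and the KDTree algorithm; depending on the scikit-learn version and the algorithm selected, part of the search may already be multithreaded, so the speed-up can be small or absent. The predictions themselves should not change.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20000, 10))
y_train = (X_train[:, 0] > 0).astype(int)  # hypothetical labels
X_query = rng.normal(size=(5000, 10))

preds = {}
for n_jobs in (None, -1):  # None -> 1 job, -1 -> all available processors
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree", n_jobs=n_jobs)
    knn.fit(X_train, y_train)
    start = time.perf_counter()
    preds[n_jobs] = knn.predict(X_query)
    print(f"n_jobs={n_jobs}: predict took {time.perf_counter() - start:.3f} s")

print("identical predictions:", np.array_equal(preds[None], preds[-1]))
```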

Official scikit-learn documentation: sklearn.neighbors.KNeighborsClassifier