K-nearest Neighbors (KNN)

2024 | AI Dictionary

A supervised algorithm that classifies data based on nearest neighbors.

What is K-nearest Neighbors (KNN)?

K-nearest neighbors (KNN) is a simple, instance-based supervised machine learning algorithm used for both classification and regression tasks. The main idea behind KNN is that data points that are close to each other in the feature space tend to have similar labels or values. To predict for a new point, the algorithm finds the \(K\) training points nearest to it and returns their majority class (for classification) or the average of their values (for regression).

How KNN Works

  1. Distance Metric: The algorithm calculates the distance between the query point (the point being predicted) and every point in the training dataset. The most commonly used metric is Euclidean distance, but Manhattan distance can also be used. Minkowski distance generalizes both: with exponent \(p=2\) it is Euclidean distance, and with \(p=1\) it is Manhattan distance.

  2. Choosing K: The parameter \(K\) is the number of nearest neighbors to consider. A small \(K\) makes the model sensitive to noise, because a single mislabeled or unusual neighbor can decide the prediction. A larger \(K\) makes predictions more stable but can smooth over local patterns.

  3. Prediction:

    • For classification, the target label is determined by the majority class of the \(K\) nearest neighbors.
    • For regression, the target value is the average of the values of the \(K\) nearest neighbors.

  4. No Training Phase: KNN is a lazy learner, meaning it has no explicit training phase beyond storing the training data. The computation happens at prediction time, when the distances from the query point to the training points are calculated.
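The steps above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names (minkowski_distance, k_nearest, knn_classify, knn_regress) and the toy data are made up for this example, and the search is a simple brute-force scan over all training points.

```python
from collections import Counter


def minkowski_distance(a, b, p=2):
    """Minkowski distance; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def k_nearest(train_X, train_y, query, k, p=2):
    """Return the targets of the k training points closest to `query`."""
    # Brute force: measure the distance from the query to every training point.
    distances = [
        (minkowski_distance(x, query, p), target)
        for x, target in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    return [target for _, target in distances[:k]]


def knn_classify(train_X, train_y, query, k=3, p=2):
    """Classification: majority class among the k nearest neighbors."""
    neighbors = k_nearest(train_X, train_y, query, k, p)
    # On a tied vote, Counter returns the label it saw first,
    # which here is the label of the closest tied neighbor.
    return Counter(neighbors).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3, p=2):
    """Regression: average value of the k nearest neighbors."""
    neighbors = k_nearest(train_X, train_y, query, k, p)
    return sum(neighbors) / len(neighbors)


# Toy data (hypothetical, for illustration only): two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(points, labels, (1.2, 1.4), k=3))  # prints "a"
print(knn_regress(points, values, (6.4, 6.9), k=3))   # prints 11.0
```

Note that there is no fit step here: the "model" is just the stored training data, and all the work happens when a query arrives.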

Applications of KNN

Example of K-nearest Neighbors (KNN)

An example of KNN applied to classification using Python’s scikit-learn library:

# Importing necessary libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Loading the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Creating the KNN classifier model
knn = KNeighborsClassifier(n_neighbors=3)

# Fitting the model (for KNN this stores the training data and may build a search index)
knn.fit(X_train, y_train)

# Making predictions
y_pred = knn.predict(X_test)

# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

In this example, the KNN algorithm is applied to the Iris dataset. The dataset is split into training and test sets, the KNN model is trained with \(K=3\), and the accuracy of the predictions on the test set is evaluated.

Accuracy is the fraction of test samples that the model classifies correctly, so it indicates how well the model performs on data it has not seen.
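A single train/test split yields only one accuracy figure, and \(K=3\) is only one possible choice. As a rough sketch of how \(K\) might be compared more systematically, the snippet below scores several candidate values with 5-fold cross-validation using scikit-learn's cross_val_score. The candidate list and the number of folds are arbitrary assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values of K; this particular list is an arbitrary illustration.
candidate_ks = [1, 3, 5, 7, 9, 11]

mean_scores = {}
for k in candidate_ks:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: accuracy averaged over five train/test splits.
    scores = cross_val_score(knn, X, y, cv=5, scoring="accuracy")
    mean_scores[k] = scores.mean()

for k, score in mean_scores.items():
    print(f"K={k}: mean accuracy {score:.3f}")

# Pick the candidate with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
print(f"Best K among the candidates: {best_k}")
```

On a small dataset such as Iris the differences between candidates may be slight, so the "best" value should be read as a guide rather than a definitive answer.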

KNN is easy to implement and understand, but it can become computationally expensive as the dataset grows, since a brute-force search computes the distance from each query point to every training point.
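One way to reduce that cost is to search with a spatial index instead of scanning every training point. The sketch below uses the algorithm parameter of scikit-learn's KNeighborsClassifier to compare a brute-force search with KD-tree and ball-tree indexes, reusing the same Iris split as the example above. Iris is far too small to show a speed difference, so this only illustrates the option; how much an index helps depends on the dataset size and the number of features.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    # "brute" compares each query with every training point;
    # "kd_tree" and "ball_tree" build an index when fit() is called.
    knn = KNeighborsClassifier(n_neighbors=3, algorithm=algorithm)
    knn.fit(X_train, y_train)
    results[algorithm] = knn.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{algorithm}: accuracy {results[algorithm] * 100:.2f}%")
```

The search strategy changes how neighbors are found, not which neighbors count as nearest, so the accuracies are expected to agree apart from possible differences in how exact distance ties are broken.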
