K-nearest Neighbors (KNN): Meaning, Applications & Example
A supervised, instance-based algorithm that predicts a label or value from a data point's nearest neighbors.
What is K-nearest Neighbors (KNN)?
K-nearest neighbors (KNN) is a simple, instance-based supervised machine learning algorithm used for both classification and regression tasks. The main idea behind KNN is that data points that are close to each other in the feature space tend to have similar labels or values. The algorithm makes predictions based on the majority vote (for classification) or the average value (for regression) of the \(K\)-nearest data points to the point being predicted.
How KNN Works
Distance Metric: The algorithm calculates the distance between the target data point and all other points in the training dataset. The most commonly used distance metric is Euclidean distance, but other metrics like Manhattan or Minkowski distance can also be used.
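To make these metrics concrete, here is a minimal sketch computing all three distances between two hypothetical 2-D points (the vectors a and b are purely illustrative):
import numpy as np
# Two hypothetical feature vectors
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
# Euclidean distance: square root of the sum of squared differences
euclidean = np.sqrt(np.sum((a - b) ** 2))          # 5.0
# Manhattan distance: sum of absolute differences
manhattan = np.sum(np.abs(a - b))                  # 7.0
# Minkowski distance of order p (p=2 gives Euclidean, p=1 gives Manhattan)
p = 3
minkowski = np.sum(np.abs(a - b) ** p) ** (1 / p)
print(euclidean, manhattan, minkowski)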
Choosing K: The parameter \(K\) represents the number of nearest neighbors to consider. A smaller \(K\) makes the model sensitive to noise, while a larger \(K\) makes the model more stable but possibly less sensitive to local patterns.
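In practice, \(K\) is often chosen by cross-validation. A minimal sketch using scikit-learn (the candidate values for \(K\) are illustrative):
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# Score each candidate K with 5-fold cross-validation and compare
for k in [1, 3, 5, 7, 9, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy {scores.mean():.3f}")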
Prediction (a short sketch follows this list):
- For classification, the target label is determined by the majority class of the \(K\) nearest neighbors.
- For regression, the target value is the average of the values of the \(K\) nearest neighbors.
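Both rules reduce to a few lines of code. A minimal sketch with hypothetical neighbor labels and values:
from collections import Counter
import numpy as np
# Hypothetical labels/values of the K=3 nearest neighbors of a query point
neighbor_labels = ["cat", "dog", "cat"]    # classification
neighbor_values = [2.5, 3.0, 2.0]          # regression
# Classification: majority vote among the K neighbors
predicted_label = Counter(neighbor_labels).most_common(1)[0][0]   # "cat"
# Regression: average of the K neighbors' values
predicted_value = np.mean(neighbor_values)                        # 2.5
print(predicted_label, predicted_value)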
No Training Phase: KNN is a lazy learner, meaning it has no explicit training phase beyond storing the training data. The real computation happens at prediction time, when the distances to the nearest neighbors are calculated.
Applications of KNN
- Classification: KNN is commonly used for classification tasks, such as categorizing emails into spam or not spam, recognizing handwritten digits, or classifying animals based on features like size and weight.
- Recommendation Systems: KNN is used in collaborative filtering to recommend products or content based on the preferences of users who are similar to the target user.
- Anomaly Detection: KNN can identify outliers by checking whether a data point has few close neighbors, suggesting it differs from the majority of the data (see the sketch after this list).
- Image Recognition: KNN is often used in computer vision tasks, where it can classify images based on pixel values or other extracted features.
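As an illustration of the anomaly-detection use mentioned above, the sketch below scores each point by its distance to its \(K\)-th nearest neighbor using scikit-learn's NearestNeighbors (the data is synthetic and purely illustrative):
import numpy as np
from sklearn.neighbors import NearestNeighbors
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                 # hypothetical "normal" data
X = np.vstack([X, [[8.0, 8.0]]])              # one obvious outlier
# Use the distance to the K-th nearest neighbor as an outlier score
nn = NearestNeighbors(n_neighbors=4).fit(X)   # 4 = the point itself + 3 neighbors
distances, _ = nn.kneighbors(X)
scores = distances[:, -1]                     # distance to the 3rd true neighbor
print("Most anomalous point:", X[np.argmax(scores)])  # ~[8.0, 8.0]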
Example of K-nearest Neighbors (KNN)
Below is an example of KNN applied to a classification task using Python's scikit-learn library:
# Importing necessary libraries
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Loading the iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Creating the KNN classifier model
knn = KNeighborsClassifier(n_neighbors=3)
# Fitting the model (KNN simply stores the training data)
knn.fit(X_train, y_train)
# Making predictions
y_pred = knn.predict(X_test)
# Evaluating the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
In this example, the KNN algorithm is applied to the Iris dataset: the data is split into training and test sets, a KNN model with \(K=3\) is fitted, and the accuracy of its predictions on the test set is measured. The accuracy metric indicates how well the model classifies the held-out test data.
KNN is easy to implement and understand, but it can become computationally expensive as the dataset grows, since each prediction requires computing distances from the query point to every stored training point.
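One common mitigation is a tree-based neighbor index rather than a brute-force scan; scikit-learn exposes this through the algorithm parameter of KNeighborsClassifier. A minimal sketch:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
X, y = load_iris(return_X_y=True)
# A k-d tree index avoids computing distances to every training point,
# which helps for large, low-dimensional datasets
knn = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree")
knn.fit(X, y)
print(knn.predict(X[:5]))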