K-means Clustering: Meaning, Applications & Example

An unsupervised algorithm that groups data into k clusters.

What is K-means Clustering?

K-means clustering is a popular unsupervised machine learning algorithm for partitioning a dataset into distinct groups, or clusters. It assigns data points to \(K\) clusters so that each point belongs to the cluster with the nearest mean (centroid). K-means aims to minimize the within-cluster variance, that is, the sum of squared distances between each point and its cluster’s centroid, so that points in the same cluster are as similar to one another as possible. Because the total variance of the data is fixed, reducing the variance within clusters also increases the separation between clusters.
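
The quantity being minimized is commonly written as the within-cluster sum of squares. The symbols \(x_i\) (a data point), \(C_k\) (the set of points assigned to cluster \(k\)), and \(\mu_k\) (the centroid of cluster \(k\)) are notation assumed here for illustration:

```latex
\[
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
\]
```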

Key Concepts of K-means Clustering

  1. Centroids: Each cluster is represented by a centroid, which is the mean of all data points in the cluster. The centroids are recalculated during the algorithm’s iterations to improve the clustering.
  2. Clusters: The data points are grouped into \(K\) clusters. The value of \(K\) must be chosen before running the algorithm.
  3. Assignment Step: Each data point is assigned to the nearest centroid based on a distance metric, typically Euclidean distance.
  4. Update Step: After the assignment, the centroid of each cluster is recalculated as the mean of all the points assigned to that cluster.
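  5. Convergence: The algorithm repeats the assignment and update steps until the centroids no longer change or a maximum number of iterations is reached. The result is a local optimum that depends on the initial centroids, so different starting points can yield different clusterings.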
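
The steps above can be sketched in plain NumPy. This is a minimal illustration, not scikit-learn's implementation: the function name `kmeans_sketch`, its parameters, the random initialization from data points, and the handling of empty clusters are all assumptions made for the example.

```python
import numpy as np


def assign_labels(X, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans_sketch(X, k, max_iter=100, seed=0):
    """Illustrative K-means loop; name, defaults, and initialization are assumptions."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        labels = assign_labels(X, centroids)
        # Update step: each centroid becomes the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Convergence: stop once the centroids no longer move
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so the labels match the returned centroids
    return centroids, assign_labels(X, centroids)


# Usage: two well-separated groups of points around (0, 0) and (5, 5)
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 2)),
    rng.normal(5.0, 0.1, size=(50, 2)),
])
centroids_demo, labels_demo = kmeans_sketch(X_demo, k=2)
print(centroids_demo)
```

With groups this far apart, the two centroids should land near the group centers. On less clearly separated data, the outcome can vary with the seed, which is one reason library implementations typically run several initializations.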

Applications of K-means Clustering

Example of K-means Clustering

The following example applies K-means clustering using Python’s scikit-learn library:

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

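# Generating some random data (seeded so the run is repeatable)
rng = np.random.default_rng(0)
X = rng.random((100, 2))

# Applying K-means with 3 clusters
# n_init and random_state are set explicitly for consistent results across runs
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)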

# Getting the cluster centroids
centroids = kmeans.cluster_centers_

# Getting the labels (cluster assignments)
labels = kmeans.labels_

# Plotting the data points and centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=200)
plt.title('K-means Clustering')
plt.show()

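In this example, 100 two-dimensional data points are randomly generated, and K-means is applied to group them into 3 clusters. The centroids of the clusters are displayed as red ‘X’ markers. Because the points are drawn uniformly at random, they have no inherent cluster structure, so the algorithm simply divides the plane into three regions around the centroids.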

This visual representation allows you to observe how the algorithm groups the data and how close the data points are to their respective centroids.
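
The closeness seen in the plot can also be measured numerically. The sketch below repeats the setup from the example with fixed seeds (an assumption, for repeatability) and computes the distance from each point to its assigned centroid; the variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same setup as the example above, with fixed seeds (an assumption)
rng = np.random.default_rng(0)
X = rng.random((100, 2))
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Distance from each point to the centroid of its assigned cluster
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
distances = np.linalg.norm(X - assigned_centroids, axis=1)

# Sum of squared distances; scikit-learn reports this quantity as inertia_
within_cluster_ss = np.sum(distances ** 2)

print(f"Mean distance to centroid: {distances.mean():.3f}")
print(f"Within-cluster sum of squares: {within_cluster_ss:.3f}")
print(f"kmeans.inertia_: {kmeans.inertia_:.3f}")
```

The last two printed values should agree up to floating-point error, since `inertia_` is the sum of squared distances from each sample to its closest cluster center.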
