Naive Bayes: Meaning, Applications & Example
A simple probabilistic classifier based on Bayes' theorem.
What is Naive Bayes?
Naive Bayes is a classification algorithm based on Bayes’ Theorem, which uses probability theory to predict the category of an input data point. It assumes that the features of the data are conditionally independent of one another given the class (the “naive” assumption), which simplifies the calculation of probabilities. Despite its simplicity, Naive Bayes performs well in many real-world classification tasks.
How Naive Bayes Works
Bayes’ Theorem: Naive Bayes calculates the probability of a class label given the features of the data. It applies Bayes’ Theorem to compute the posterior probability of each class from the observed features.
\[ P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)} \]
Where:
- \( P(C|X) \) is the probability of class \( C \) given the features \( X \).
- \( P(X|C) \) is the likelihood of the features \( X \) given class \( C \).
- \( P(C) \) is the prior probability of class \( C \).
- \( P(X) \) is the probability of the features (the evidence). It is the same for every class, so it does not change which class scores highest.
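Feature Independence: Naive Bayes assumes that all features are conditionally independent given the class, meaning that once the class is known, the presence or absence of one feature tells us nothing about the others. This assumption simplifies the calculation of the likelihood, which becomes a product of per-feature terms: \( P(X|C) = \prod_{i} P(x_i|C) \), where \( x_i \) are the individual features that make up \( X \).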
Prediction: For each class, Naive Bayes calculates the posterior probability and assigns the class with the highest probability as the predicted class.
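The three steps above can be traced with a small numeric sketch. The class names, priors, and per-feature likelihoods below are made-up values chosen only for illustration, and `posteriors` is a hypothetical helper, not part of any library.

```python
from math import prod

# Hypothetical prior probabilities P(C) for two classes (made-up numbers).
priors = {"A": 0.6, "B": 0.4}

# Hypothetical per-feature likelihoods P(x_i | C) for two observed features.
likelihoods = {
    "A": [0.5, 0.2],
    "B": [0.1, 0.9],
}

def posteriors(priors, likelihoods):
    """Return P(C | X) for each class under the independence assumption."""
    # Numerator of Bayes' Theorem: P(C) times the product of P(x_i | C).
    joint = {c: priors[c] * prod(likelihoods[c]) for c in priors}
    # Evidence P(X): the sum of the numerators over all classes.
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

post = posteriors(priors, likelihoods)
# Prediction: pick the class with the highest posterior probability.
predicted = max(post, key=post.get)
print(post, predicted)
```

With these numbers the numerators are 0.06 for class A and 0.036 for class B, so the posteriors are about 0.625 and 0.375 and the predicted class is A.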
Applications of Naive Bayes
- Spam Detection: Used in email filtering to classify emails as spam or not based on the occurrence of certain words or patterns in the content.
- Sentiment Analysis: Classifies text into sentiment categories (positive, negative, neutral) by analyzing the frequency of words or phrases.
- Document Classification: Categorizes documents into predefined categories (e.g., news articles into topics such as sports or politics).
- Medical Diagnosis: Helps predict disease categories based on patient symptoms and medical history.
Example of Naive Bayes
In spam email detection, Naive Bayes can classify an email as “spam” or “not spam” by analyzing the frequency of certain words, like “free,” “win,” or “offer.” The algorithm calculates the probability of the email belonging to each class and assigns it to the class with the highest probability, effectively filtering out unwanted spam messages.
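A minimal sketch of this spam example, written in plain Python, might look like the following. The training emails are invented for illustration, and `fit` and `predict` are hypothetical helper names. Two details go beyond the description above and are assumptions of this sketch: add-one (Laplace) smoothing, so that a word unseen in one class does not force its probability to zero, and summing log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow.

```python
import math
from collections import Counter

# Made-up training emails, for illustration only.
train = [
    ("win a free offer now", "spam"),
    ("free prize win win", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("please review the project offer", "not spam"),
]

def fit(examples):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(text, class_counts, word_counts, vocab):
    """Return the class with the highest log-posterior score."""
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the log prior, log P(C).
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add log P(word | C), estimated with add-one smoothing.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = fit(train)
print(predict("free offer just for you", *model))
print(predict("agenda for the project meeting", *model))
```

On this toy data the first message is labeled "spam" and the second "not spam". A real filter would need far more training data and better tokenization than a whitespace split.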