One-Hot Encoding: Meaning, Applications & Example
Technique for representing categorical variables as binary vectors.
What is One-Hot Encoding?
One-Hot Encoding is a technique used to convert categorical variables into a binary vector representation. Each category is represented as a vector where only one element is 1 (indicating the presence of that category), and all other elements are 0.
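The definition above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `one_hot_encode` and the choice to sort the categories alphabetically are assumptions made here for clarity.

```python
def one_hot_encode(values):
    """Return (categories, vectors) for a list of categorical values."""
    # Sort the distinct values so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}

    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1  # exactly one position is set to 1
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot_encode(["Red", "Green", "Blue", "Green"])
print(categories)
for row in vectors:
    print(row)
```

Each vector has as many elements as there are distinct categories, and its elements always sum to 1.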
Key Features of One-Hot Encoding
- Binary Representation: Each category is mapped to a unique binary vector.
- No Ordinal Relationship: Suitable for categorical data where there is no inherent order (e.g., colors, types).
- Increased Dimensionality: Each distinct category adds one feature, so a variable with k categories becomes k binary columns.
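To make the dimensionality point concrete, the sketch below counts how many features one-hot encoding would produce for a set of columns. The helper `encoded_width` and the `zip_code` column are hypothetical and used only for illustration.

```python
def encoded_width(columns):
    """Total number of one-hot features for a dict of column name -> values."""
    # Each column contributes one feature per distinct value it contains.
    return sum(len(set(values)) for values in columns.values())


colors = ["Red", "Green", "Blue", "Green"]
# Hypothetical high-cardinality column with 500 distinct values.
zip_codes = [f"zip_{i}" for i in range(500)]

narrow = encoded_width({"color": colors})
wide = encoded_width({"color": colors, "zip_code": zip_codes})
print(narrow, wide)
```

A single column with many distinct values can dominate the feature count, which is worth checking before encoding.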
Applications of One-Hot Encoding
- Machine Learning: Often used in preprocessing categorical data for models like logistic regression, decision trees, or neural networks.
- Natural Language Processing (NLP): Applied to represent words as vectors in tasks like text classification.
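For the NLP case, one possible sketch treats every distinct word in a small corpus as a category. The helpers `build_vocabulary` and `one_hot_word`, the lowercase whitespace tokenization, and the sample sentences are all assumptions made for this example.

```python
def build_vocabulary(sentences):
    """Map each distinct lowercase word to a position in the vector."""
    words = sorted({word for s in sentences for word in s.lower().split()})
    return {word: i for i, word in enumerate(words)}


def one_hot_word(word, vocabulary):
    """Return a vector with a single 1 at the word's vocabulary position."""
    vector = [0] * len(vocabulary)
    vector[vocabulary[word]] = 1
    return vector


sentences = ["the cat sat", "the dog ran"]
vocabulary = build_vocabulary(sentences)
encoded = [one_hot_word(w, vocabulary) for w in "the cat sat".split()]
print(vocabulary)
print(encoded)
```

The vector length equals the vocabulary size, so this representation grows with every new word in the corpus.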
Example of One-Hot Encoding
One-hot encoding can be applied to a column of categories with pandas:
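import pandas as pd

data = pd.DataFrame({'Color': ['Red', 'Green', 'Blue', 'Green']})
# One indicator column per distinct color; dtype=int gives 0/1 values
# (recent pandas versions return True/False by default).
one_hot = pd.get_dummies(data['Color'], dtype=int)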
This converts the Color column into three indicator columns (Blue, Green, Red), with each row marking exactly one color.