Quantization: Meaning, Applications & Example

Technique to reduce model size by using fewer bits per weight.

What is Quantization?

Quantization in machine learning refers to the process of reducing the precision of a model's weights and/or activations to shrink the model size and improve inference speed, often for deployment on resource-constrained devices. By approximating continuous values with a small set of discrete levels, quantization can significantly decrease the memory footprint and computation requirements.
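
To make the idea concrete, here is a minimal sketch of 8-bit affine (scale and zero-point) quantization in NumPy. The `quantize`/`dequantize` helpers and the sample array are purely illustrative, not taken from any particular library.

```python
import numpy as np

def quantize(x, num_bits=8):
    """Map float values to unsigned integers via an affine transform."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)    # step size between levels
    zero_point = int(round(qmin - x.min() / scale))  # integer mapped to 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integers."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(4).astype(np.float32)  # stand-in for model weights
q, scale, zp = quantize(weights)
print(weights)
print(dequantize(q, scale, zp))  # close to the originals, but not identical
```

The round trip shows the trade-off at the heart of quantization: each weight now occupies one byte instead of four, at the cost of a small rounding error.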

Types of Quantization

  1. Weight Quantization: Reducing the precision of the model’s weights (e.g., from 32-bit to 8-bit integers).
  2. Activation Quantization: Reducing the precision of the activation values during the forward pass.
  3. Post-training Quantization: Applied after the model has been trained, adjusting the weights without requiring retraining (a minimal sketch follows this list).
  4. Quantization-Aware Training (QAT): A technique where quantization is incorporated during training, allowing the model to adapt to the lower precision during the learning process.
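
As an example of post-training quantization, the snippet below uses PyTorch's `torch.quantization.quantize_dynamic`, which converts the weights of selected layer types (here `nn.Linear`) to int8 while leaving the rest of the model in float. The toy two-layer model is purely illustrative.

```python
import torch
import torch.nn as nn

# A toy float32 model standing in for a real trained network.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Post-training dynamic quantization: Linear weights are stored as int8,
# and activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 128)
print(quantized(x).shape)  # same interface as before, smaller weights
```

Because no retraining is involved, this is the lowest-effort entry point; quantization-aware training is typically reserved for cases where post-training quantization costs too much accuracy.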

Applications of Quantization

Quantization is used wherever memory, latency, or power budgets are tight: running models on mobile phones and embedded hardware, speeding up inference on commodity CPUs, and shrinking large models so they fit in limited device memory.

Example of Quantization

In a computer vision application, a deep neural network for image classification can be quantized from 32-bit floating-point weights to 8-bit integer weights. This cuts the weight storage roughly fourfold, allowing the model to run faster on mobile devices without a significant loss in accuracy.
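
The back-of-the-envelope arithmetic below shows where the fourfold figure comes from; the 25-million parameter count is an assumed size chosen for illustration (roughly that of a ResNet-50).

```python
params = 25_000_000          # assumed parameter count, for illustration only
fp32_mb = params * 4 / 1e6   # float32 weights: 4 bytes each
int8_mb = params * 1 / 1e6   # int8 weights: 1 byte each
print(f"{fp32_mb:.0f} MB -> {int8_mb:.0f} MB")  # 100 MB -> 25 MB, a 4x reduction
```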
