What is Q-learning? Meaning, Applications & Example
A reinforcement learning algorithm that learns the expected future reward of each action in each state.
What is Q-learning?
Q-learning is a model-free reinforcement learning algorithm used to find the optimal action-selection policy for an agent interacting with an environment. It enables an agent to learn the best action to take in each state by estimating the quality of each state-action pair. The algorithm updates a Q-table that stores the value (Q-value) of each state-action pair, which is then used to determine the best action in any given state.
How Q-learning Works
- Q-table Initialization: The algorithm starts by initializing a Q-table, where each state-action pair has a Q-value. Initially, all Q-values are set to zero or small random values.
- Exploration and Exploitation: The agent explores the environment by choosing actions based on a balance of exploration (trying new actions) and exploitation (choosing the best-known action).
- Q-value Update: After each action, the Q-value for the corresponding state-action pair is updated based on the reward received and the estimated future rewards. The update uses the following formula (a code sketch follows the symbol definitions below):
\[
Q(s, a) = Q(s, a) + \alpha \times [r + \gamma \times \max_{a'}Q(s', a') - Q(s, a)]
\]
Where:
- \(Q(s, a)\) is the Q-value for the state-action pair.
- \(r\) is the reward received after taking action \(a\) in state \(s\).
- \(s'\) is the next state, and \(\max_{a'} Q(s', a')\) is the highest Q-value available from it.
- \(\gamma\) is the discount factor, controlling the importance of future rewards.
- \(\alpha\) is the learning rate.
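The steps above translate directly into code. Below is a minimal sketch in Python using NumPy; the table sizes, hyperparameter values, and function names (`choose_action`, `update_q`) are illustrative assumptions, not part of any standard API:

```python
import numpy as np

# Illustrative sizes: 6 states, 4 actions (assumptions for this sketch).
n_states, n_actions = 6, 4
Q = np.zeros((n_states, n_actions))    # Q-table, initialized to zero

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate
rng = np.random.default_rng()

def choose_action(state):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    return int(np.argmax(Q[state]))          # exploit: best-known action

def update_q(state, action, reward, next_state):
    """Apply the Q-learning update rule from the formula above."""
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```

Note that the update bootstraps from `max(Q[next_state])` regardless of which action the agent actually takes next, which is what makes Q-learning an off-policy method.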
Applications of Q-learning
- Robotics: Q-learning can be used to teach robots how to navigate through environments, such as finding the shortest path to a goal.
- Game Playing: Q-learning can be applied to teach AI agents to play games by optimizing strategies to maximize rewards (e.g., learning how to play chess or video games).
- Self-Driving Cars: Q-learning helps in decision-making processes for autonomous vehicles, such as choosing the best route or avoiding obstacles.
Example of Q-learning
In a maze-solving problem, an agent starts at the entrance and must find the exit. Using Q-learning, the agent will explore different paths, gradually learning the best route by updating its Q-table. Initially, the agent might try random actions and get stuck in dead ends, but over time it will learn the optimal sequence of moves to reach the exit, maximizing its cumulative reward.
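The following sketch makes this example concrete. It assumes a tiny 4x4 grid (a simplification of a maze with no interior walls) where the agent starts at state 0, the exit is at state 15, each move costs a reward of -1, and reaching the exit pays +10. All of these details are illustrative assumptions:

```python
import numpy as np

SIZE = 4
N_STATES, N_ACTIONS = SIZE * SIZE, 4   # actions: 0=up, 1=down, 2=left, 3=right
GOAL = N_STATES - 1                    # exit at the bottom-right corner

def step(state, action):
    """Move on the grid; bumping into the boundary leaves the state unchanged.
    Reward is -1 per move (encouraging short paths) and +10 at the exit."""
    row, col = divmod(state, SIZE)
    if action == 0:   row = max(row - 1, 0)
    elif action == 1: row = min(row + 1, SIZE - 1)
    elif action == 2: col = max(col - 1, 0)
    else:             col = min(col + 1, SIZE - 1)
    next_state = row * SIZE + col
    if next_state == GOAL:
        return next_state, 10.0, True
    return next_state, -1.0, False

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy exploration, as in the earlier sketch.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update (same formula as above).
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                     - Q[state, action])
        state = next_state

# After training, following the greedy policy traces the learned route.
state, path = 0, [0]
while state != GOAL and len(path) < 50:  # cap guards against an unconverged table
    state, _, _ = step(state, int(np.argmax(Q[state])))
    path.append(state)
print("Learned path:", path)
```

Early episodes wander (the dead ends of the prose example correspond to wasted -1 steps here), but the per-move penalty steadily lowers the Q-values of long detours, so the greedy path converges to a shortest route to the exit.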