A gentle introduction to Reinforcement Learning

Djurdjura Mountains in the Kabylie region, Algeria. Photo taken by me.

What is the difference between RL and other Machine Learning fields?

As we know, Machine Learning can be split into three major domains: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. In both supervised and unsupervised learning, we learn from the data available in a dataset in order to build a model that generalizes to similar data not seen during training. In contrast, in Reinforcement Learning there is no dataset to learn from. Instead, we have an environment that we can interact with. After each interaction, the environment returns feedback. Through this process of trial and error, we gradually come to understand how the environment works.

How does Reinforcement Learning work?

As mentioned in the previous section, the goal is to build an agent that learns how the environment works through trial and error. At each time step, the agent performs an action in the environment. If the action is good, the agent receives positive feedback (a reward); otherwise, it receives a negative reward. The agent's objective is to maximize the total reward it gets from the environment. The environment can be represented as a set of states, and from each state the agent can reach some other states. For those who have studied Markov Decision Processes, this is exactly the setting we are working in.
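To make the interaction loop concrete, here is a minimal sketch in Python. The corridor environment, the reward values, and the names step, run_episode, random_policy and always_right are all made up for illustration; they do not come from any particular library.

```python
import random

# Hypothetical toy environment: a corridor of 5 states (0..4).
# The agent starts in state 0; reaching state 4 ends the episode.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True   # positive reward for reaching the goal
    return next_state, -0.1, False     # small negative reward for any other step


def run_episode(policy, max_steps=100):
    """Let the agent interact with the environment and sum up its rewards."""
    state, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """Trial and error in its simplest form: pick any action at random."""
    return random.choice(ACTIONS)


def always_right(state):
    """A policy that already 'knows' how the corridor works."""
    return +1


if __name__ == "__main__":
    random.seed(0)
    print("random policy:", run_episode(random_policy))
    print("always right :", run_episode(always_right))
```

The random policy wanders back and forth and collects many small penalties, while the policy that always moves right reaches the goal in four steps. A learning agent would start close to the first behaviour and, guided by the rewards, move toward the second.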

From Nikolay Atanasov's lecture

How does the agent learn?

As we saw previously, the goal of the agent is to maximize the total reward it gets from the environment. A natural first idea is that, in state s, the agent picks the action with the highest immediate reward. But doing so ignores the impact of that action on future rewards, since the agent only looks one step ahead. To overcome this problem, the agent instead chooses the action that maximizes its reward in the long term. We therefore introduce the discounted expected return, which is the quantity the agent has to maximize.
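As a rough illustration, the discounted return is commonly written as G = r_1 + γ r_2 + γ² r_3 + ..., where γ is a discount factor between 0 and 1. The article does not fix a value for γ, so the 0.9 used below, the two reward sequences, and the function name discounted_return are assumptions chosen only for this sketch.

```python
def discounted_return(rewards, gamma=0.9):
    """Return rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ..."""
    g = 0.0
    # Work backwards so each reward is discounted once per step of delay.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Two hypothetical reward sequences starting from the same state.
greedy_path = [1.0, 0.0, 0.0]    # best immediate reward, nothing afterwards
patient_path = [0.0, 0.0, 5.0]   # nothing at first, a larger reward later

if __name__ == "__main__":
    for gamma in (0.0, 0.9):
        print(
            "gamma =", gamma,
            "| greedy:", discounted_return(greedy_path, gamma),
            "| patient:", discounted_return(patient_path, gamma),
        )
```

With γ = 0 the agent only sees the first reward and prefers the greedy path. With γ = 0.9 the later reward still counts for most of its value, so the patient path comes out ahead. This is the sense in which maximizing the discounted return makes the agent care about the long term.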

Bellman Optimality Equation
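The equation itself is not reproduced in the text, so what follows is the standard textbook form, written in notation that is an assumption here: P(s' | s, a) is the probability of moving to state s' after taking action a in state s, R(s, a, s') is the reward received for that transition, γ is a discount factor between 0 and 1, and V* and Q* are the optimal state-value and action-value functions.

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^{*}(s') \bigr]

Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr]
```

Read in words: the value of a state under optimal behaviour is the best achievable sum of the immediate reward and the discounted value of the state that follows.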

Amrani Amine

Fifth-year computer science and statistics engineering student at Polytech Lille. I am fascinated by machine learning and Graph Neural Networks.