Reinforcement Learning

You might have seen robots doing mundane tasks like cleaning a room or serving beer to people. However, these actions are usually remote-controlled by a human. These robots are physically capable of following a set of instructions given to them, but they lack the basic intelligence to decide and act by themselves. Embedding intelligence is a software challenge, and reinforcement learning, a subfield of machine learning, provides a promising direction towards developing intelligent robots.
Reinforcement learning is concerned with how an agent uses feedback to evaluate its actions and plan future actions in a given environment so as to maximize its cumulative reward. In reinforcement learning, the agent is empowered to decide how to perform a task. This sets it apart from conventional programs, where the machine follows a fixed set of instructions, and from supervised learning, where the model is shown the correct answer for every example. The machine acts on its own, not according to a set of pre-written commands. Thus, reinforcement learning denotes those algorithms that learn from the feedback on their actions and decide how to accomplish a complex task.
These algorithms are rewarded when they make the right decision and punished when they make the wrong one. Under favourable conditions, they can achieve superhuman performance. Here is a comprehensive tutorial on reinforcement learning along with a case study.

Importance of Reinforcement Learning

We need technological assistance to simplify life, improve productivity and make better business decisions. To achieve this goal, we need intelligent machines. While it is easy to write programs for simple tasks, we need another way to build machines that carry out complex tasks. The way to achieve this is to create machines that are capable of learning things by themselves. Reinforcement learning does this.

Reinforcement Learning Basics

The basics of reinforcement learning include:

  • Input: an initial state from which the model starts to act
  • Outputs: there can be many possible solutions to a given problem, which means there can be many outputs
  • Training: the model produces an output for the input, and the user (or the environment) rewards or punishes the model depending on that output. The model settles on the solution that earns the maximum reward.
  • Learning: the model takes the rewards and punishments into account and continues to learn from them.
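The reward-and-punish loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not a method taken from this article: the environment is a hypothetical three-action problem in which each action pays +1 (reward) or -1 (punishment) with a made-up probability, and the names `train_bandit`, `reward_probs` and `epsilon` are invented for the example. The learner keeps a running average of the reward per action and mostly picks the action with the best average so far.

```python
import random


def train_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely from reward feedback."""
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        # Occasionally explore a random action, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])

        # Hypothetical environment: +1 is a reward, -1 is a punishment.
        reward = 1.0 if rng.random() < reward_probs[action] else -1.0

        # Incremental update of the average reward observed for this action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


# Assumed success probabilities for three possible outputs (actions).
estimates = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated average reward per action:", estimates)
print("action the model settles on:", best_action)
```

The small amount of random exploration (`epsilon`) is a design choice of this sketch: without it, the model could lock onto the first action that happened to be rewarded and never discover a better one.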

Reinforcement Learning: Types 

Reinforcement is of two types: positive and negative.

A reinforcement is considered positive when an event that follows a behaviour increases the frequency and strength of that behaviour.
Positive reinforcement has the following advantages:

  • It can maximize performance
  • It sustains the change for a long time

Positive reinforcement has a disadvantage as well: too much reinforcement can cause an overload and weaken the results.

A reinforcement is considered negative when a behaviour is strengthened because it stops or avoids a negative condition.

Deep Reinforcement Learning

Deep reinforcement learning combines reinforcement learning with deep neural networks. The network approximates the policy or the value function, which lets the agent cope with large or high-dimensional inputs such as raw images. The learning process itself is the same dynamic loop as in reinforcement learning in general: the agent receives continuous feedback about its actions and adjusts its future actions accordingly to acquire the maximum reward.

Fields of Application

  • Gaming
  • Robotics
  • E-commerce
  • Self-driving cars
  • Industrial automation
  • Stock price forecasting
  • News
  • Design training systems
  • Web search engines like Google
  • Photo tagging applications
  • Spam detector applications
  • Weather forecasting application

Definitions in Reinforcement Learning

There are several concepts and definitions in reinforcement learning. The major ones are listed below:
Agent: The agent is the one that takes actions. For instance, Super Mario is an agent as he navigates a video game.
Action (A): A is the set of all possible moves the agent is capable of making. At each step, the agent chooses one action from this set.
Discount factor: The discount factor is a number between 0 and 1 by which future rewards are multiplied, so that a reward received later counts for less than the same reward received now. The lower the discount factor, the more the agent favours short-term gratification over delayed rewards.
Environment: Just as the word implies, the ‘environment’ is the surroundings through which the agent moves. The environment takes the agent’s current state and action as input and returns the agent’s next state and reward as output.
State: This refers to the current situation in which the agent finds itself, such as a specific place and moment. A state relates the agent to other relevant things such as obstacles, rewards, enemies and tools.
Reward: This denotes the feedback given for an action taken by the agent. The feedback evaluates the agent’s action and indicates whether it was a success or a failure.
Policy: This denotes the agent’s strategy for deciding the next action based on the current state. It aims to choose the actions that bring the highest reward.
Value: This denotes the expected long-term return from the current state, in contrast to the short-term reward.
Q-value or action-value: It is very similar to the concept of value, except that it takes the current action into account as well. The Q-value maps a state-action pair to the expected long-term return.
Trajectory: This denotes a sequence of states and the actions that influence them.
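To make the discount factor and the idea of long-term return concrete, here is a minimal sketch. It assumes the usual convention that the return is r0 + γ·r1 + γ²·r2 + ..., where γ is the discount factor; the function name `discounted_return` and the reward numbers are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum a reward sequence, shrinking each later reward by another factor of gamma."""
    total = 0.0
    # Work backwards so each step is: reward now + gamma * (everything after).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical trajectory: nothing for two steps, then a reward of 10.
rewards = [0.0, 0.0, 10.0]

patient = discounted_return(rewards, gamma=1.0)        # 10.0: the delay costs nothing
short_sighted = discounted_return(rewards, gamma=0.5)  # 2.5: 10 * 0.5 * 0.5
print(patient, short_sighted)
```

With a discount factor of 0.5, the same delayed reward is worth only a quarter as much, so an agent with a low discount factor will prefer smaller rewards that arrive sooner.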
In the reinforcement learning feedback loop, the agent takes an action based on its state, that is, on the situation it is in within the environment. The environment then considers the agent’s action and generates feedback, which indicates whether that action was a success or a failure. The goal differs from scenario to scenario.

  • The goal in a video game may be to finish the game with maximum points. Hence, each additional point gained in the game will affect the subsequent action of the agent.
  • The goal in the real world may be to travel between two points, say, from A to B. Every unit the robot moves closer to point B could earn it points.
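The A-to-B goal can be turned into a toy experiment. The sketch below is an assumption-laden illustration, not the article's own method: it uses tabular Q-learning (one common reinforcement learning algorithm) on a hypothetical one-dimensional corridor where state 0 is point A and the last state is point B. The reward of +1 for a step closer to B and -1 otherwise, and all names and parameter values, are invented for the example.

```python
import random


def train_corridor(length=6, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor: state 0 is A, state length - 1 is B."""
    rng = random.Random(seed)
    moves = (-1, +1)                         # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(length)]  # q[state][action]
    goal = length - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            # Epsilon-greedy choice: mostly the best known action, sometimes a random one.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1

            # The corridor has walls at both ends.
            next_state = min(max(state + moves[action], 0), goal)

            # Assumed reward: +1 for getting one unit closer to B, -1 otherwise.
            reward = 1.0 if next_state > state else -1.0
            done = next_state == goal

            # Q-learning update: move the estimate towards
            # reward + gamma * (best estimated value of the next state).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = train_corridor()
# Greedy policy for every state except B itself.
greedy_policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(greedy_policy)
```

After training, the greedy policy should point towards B in every state, even though the agent was never told which direction is correct; it only ever saw the points it earned.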

Pros and Cons of Reinforcement Machine Learning


Pros:

  • It helps to solve very complex problems that conventional techniques fail to solve.
  • It can achieve long-term results that are otherwise very difficult to accomplish.
  • The model learns by trial and error, much as humans do, and so it can come close to optimal behaviour on the task it is trained for.
  • The model is capable of learning from its errors and correcting them, so there is little chance of it repeating the same error.
  • It learns from experience, so a labelled dataset is not needed to guide its actions.
  • It provides scope for an intelligent examination of the situation-action relation and creates the ideal behaviour within a given context, which leads to maximum performance.


Cons:

  • Too much reinforcement may cause an overload, which could weaken the results.
  • Reinforcement learning is suited to complex problems; for simple ones it is usually more than is needed.
  • It requires plenty of interaction data and involves a lot of computation.
  • The maintenance cost is high.

Challenges Faced by Reinforcement Learning

As mentioned earlier, reinforcement learning uses feedback to find the best possible actions. This makes it suitable for many complex problems, and it has found application in many domains. But it faces many challenges as well. The main one is creating the simulation environment, which depends heavily on the chosen task. In games such as chess or Go, where the model is expected to reach superhuman performance, the environment is simple. It is far more complex for a real-life application such as an autonomous car, where you need a highly realistic simulator. This is crucial because the car will eventually drive on the street. The model must be capable of figuring out how and when to apply the brake or how to avoid a collision. A mistake costs little in a virtual world, but it becomes a serious problem once the model has to face the real world. Things get tricky when you transfer the model from the safe training environment into the real world.
Another challenge lies in tweaking and scaling the neural network that controls the agent. It is complex because the only way to communicate with the network is through rewards and penalties. A major risk here is catastrophic forgetting: as the network acquires new knowledge, some of its old knowledge may get erased.
Another challenge is that the agent sometimes settles for a behaviour that earns reward but is not the optimal or intended one. For example, a model that we expect to learn to walk may instead learn to jump like a kangaroo.
Last but not least, the agent may optimize for the reward without actually doing the task. An OpenAI video gives an example of this: the agent learned to collect rewards without completing the race.
There is no doubt that reinforcement learning has huge potential to change the world. The biggest advantage of this cutting-edge technology is that it is capable of learning by itself through trial and error, just like human beings. It makes mistakes, corrects them and learns from them to avoid making the same mistake in the future. It can be combined with other machine learning technologies for better performance. No wonder that it is used in many real-world applications such as robotics and gaming, to mention a few. It is a good way to bring creativity and innovation to the way a task is performed. Reinforcement learning surely has the potential to become a revolutionary technology in the future development of artificial intelligence.
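A little arithmetic shows how an agent can prefer collecting rewards to finishing. The numbers below are entirely hypothetical and are not taken from the video mentioned above; they only illustrate how a reward signal can make looping more profitable than completing the task.

```python
# All values are made-up assumptions for illustration.
FINISH_BONUS = 50.0   # one-time reward for crossing the finish line
PICKUP_REWARD = 3.0   # reward for a bonus item that reappears after being collected
EPISODE_STEPS = 100   # length of one episode
STEPS_PER_LOOP = 4    # steps needed to circle back to the bonus item


def return_when_finishing():
    """Total reward of an agent that heads straight for the finish line."""
    return FINISH_BONUS


def return_when_looping():
    """Total reward of an agent that circles the bonus item for the whole episode."""
    return PICKUP_REWARD * (EPISODE_STEPS // STEPS_PER_LOOP)


finishing_return = return_when_finishing()  # 50.0
looping_return = return_when_looping()      # 75.0
print(finishing_return, looping_return)
```

Under these assumed numbers, looping earns more than finishing, so an agent that only maximizes reward has no reason to complete the race. Avoiding this requires designing the reward so that it actually reflects the task.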
