You've probably heard about reinforcement learning (RL) already. But do you know what it is and what its use cases are? In this blog post, we'll explore how reinforcement learning works and show some of its applications. We'll also discuss some of the challenges and limitations of applying RL in practice. Plus, we'll provide examples of how RL can be used to solve real-world problems and transform your business.
As a primer, consider watching this multi-agent hide-and-seek training from OpenAI which illustrates the strengths of RL pretty well.
A Short Reinforcement Learning Glossary
We’ll be using some reinforcement learning lingo in this article, so if you’re not familiar, here is a quick refresher. If you don’t need it, feel free to skip to the next section.
The following is a definition of terms within the context of RL:
Agent: The agent can be considered the protagonist in an RL approach. Its job is to figure out actions that maximize the reward. To make this decision, the agent observes the current state.
Reward: The reward is the feedback an agent receives for choosing a good action over a bad one, i.e. an action that brings it closer to its goal.
Environment: An agent acts within an environment, which imposes the problem boundaries. It defines the action space and state space.
Interpreter: Evaluates the environment and derives a reward.
Action: To interact with the environment, the agent has the option to choose an action. Possible actions are defined in the action space.
State: The state describes the status of the environment and, therefore, how it was affected by the agent's action. Usually, the state of an environment is only partially observable.
Episode: An episode consists of multiple steps, i.e. the sequence of steps taken until a game ends or a terminal state is reached. Once that happens, the game is reset so that a new episode can begin.
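To make these terms concrete, here is a minimal sketch of how they fit together in code. The corridor environment, class name, and reward values below are invented for illustration and not taken from any specific RL library:

```python
import random

# A toy 1-D corridor environment: the agent starts at position 0 and the
# terminal state (the goal) sits at position 4.
class CorridorEnv:
    GOAL = 4

    def reset(self):
        self.state = 0                       # the environment's state
        return self.state

    def step(self, action):                  # action: -1 (left) or +1 (right)
        self.state = max(0, self.state + action)
        done = self.state == self.GOAL       # terminal state reached?
        reward = 1.0 if done else -0.1       # interpreter: derive a reward
        return self.state, reward, done

env = CorridorEnv()
state = env.reset()
done, total_reward, steps = False, 0.0, 0
while not done:                              # one episode = steps until terminal state
    action = random.choice([-1, 1])          # a (for now) random agent
    state, reward, done = env.step(action)
    total_reward += reward
    steps += 1
print(f"episode finished after {steps} steps, total reward {total_reward:.1f}")
```

The agent here acts randomly; the learning part, i.e. using rewards to pick better actions, is what the rest of the article is about.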
What Is Reinforcement Learning?
Reinforcement learning is a sub-field of machine learning (ML) that focuses on decision-making in an environment, guided by feedback in the form of rewards or penalties. This approach is particularly useful for problems in which we know the desired outcome but are not sure about the steps along the way. Large state and action spaces are also good indicators that a problem is well suited to RL.
Reinforcement Learning Compared to Other ML Techniques
While supervised learning uses labeled data to fit the parameters of a function that maps input data to output values, unsupervised learning mainly derives patterns from similarities within a dataset.
In contrast, reinforcement learning targets a different problem set and doesn't need labeled input/output pairs. The goal is to train an agent that can observe and interact with an environment in order to reach a desired outcome. The agent aims to optimize its decision-making process while receiving state observations and rewards from the environment. Based on previously learned experience, the agent can derive new actions and improve its decision-making. This cycle of observation-action-reward (also known as a step) repeats until an episode (a collection of steps) terminates. This typically happens once a target reward or the step limit has been reached. Then, a new episode starts from scratch, except that the agent's previous experience is stored and can inform future decision-making.
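This observation-action-reward cycle can be sketched with tabular Q-learning, one of the simplest RL algorithms, where the agent's stored experience is a table of action-value estimates. The 5-state chain environment, rewards, and hyperparameters below are illustrative assumptions, not a specific paper's setup:

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                        # move left / move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2     # learning rate, discount, exploration

# The agent's "experience": a value estimate for each (state, action) pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for episode in range(200):
    state, done = 0, False
    while not done:                                   # one step per loop iteration
        if random.random() < EPSILON:                 # explore...
            action = random.choice(ACTIONS)
        else:                                         # ...or exploit past experience
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        done = next_state == GOAL
        reward = 1.0 if done else -0.1                # feedback from the environment
        # Q-learning update: fold the new experience into the value estimate.
        best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy should move right in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Early episodes wander randomly; over time the stored Q-values steer the agent straight toward the goal, which is exactly the "improve decision-making from experience" loop described above.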
Many advanced applications in the reinforcement learning space use a deep neural network, i.e. deep learning (DL), to enhance learning capabilities. This variation is called deep reinforcement learning (DRL). For simplicity, we follow the literature in this domain and use RL as an umbrella term that includes DRL.
Now that we went through a basic introduction to the topic, where does reinforcement learning actually play a role?
Machines That Outperform Humanity
What do Korean board game players and world-class sailors have in common? They were beaten by reinforcement learning.
Infamous AlphaGo
The most prominent example of RL beating humanity is already a couple of years old. Lee Sedol, one of the world's best Go players at the time, took part in the Google DeepMind Challenge Match in 2016 and represented humanity in a series of Go games against an RL agent. Go is widely considered one of the most complex strategy games in the world, and was therefore a good stage to demonstrate the decision-making power of RL. His opponent: AlphaGo, a reinforcement learning algorithm created by Google DeepMind.
Sedol was sure of victory and prophesied that no AI or machine could beat him in Go. But his confidence disappeared once he faced defeat in the first match. In total, Sedol and AlphaGo played 5 matches on 5 different days, 4 of which were won by the reinforcement learning algorithm. The award-winning documentary is available on YouTube and is definitely worth watching.
Regatta Sailing With RL
Emirates Team New Zealand competes in the America's Cup, a challenge that has always combined technology, innovation, and sailing. To test the designs of new vessels before building them, the team came up with a simulator. This strategy contributed significantly to the team's victory in 2017, but it required multiple team members to sit in front of the simulator and simultaneously execute maneuvers using the digital twin.
In 2019, the team decided to test designs 24/7 to shorten design iterations. To do so, a McKinsey-built RL agent learned to sail like a world-class sailor, scaling the tests without depending on human labor. Soon, the agent even outperformed its human counterparts, and consequently, sailors started learning maneuvers from the agent. As a result, the learned tactics and RL-designed hydrofoils led to another win for Emirates Team New Zealand in 2021.
Is Reinforcement Learning the Future?
There are many controversial discussions about the future of RL. One of the most prominent blog posts is from 2018, in which Alex Irpan (a software engineer at Google) tries to explain why deep reinforcement learning doesn't work yet.
It was written at a time when the hype around reinforcement learning was huge and RL was publicized as a way to reach artificial general intelligence (AGI) in the near future. I'm sure his claim below held true in 2018.
Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time. (Alex Irpan)
But the same holds true for machine learning and data science in general. According to KDnuggets, the majority of data scientists say that 80% of models built with the intention of being deployed never reach production. But does this mean machine learning does not work? Not really.
As always in life, a single tool cannot solve each of your problems. Instead, having a toolbox full of different ML approaches ready to solve specific needs is what companies should and do look for.
Know Your Solution Space
Of course, traditional algorithms without a learning component are often enough to do the job. In our experience at Motius, whenever someone asks if ML should be used to solve a specific problem, the answer is that it can probably be done in a simpler way, or that there are at least better approaches out there.
Looking into a solution space before precisely understanding the problem is typically driven by the hype surrounding a specific solution and the subsequent fear of missing out. This is why we try to focus on the problem space first and then come up with a potential solution, considering a variety of approaches. Sometimes ML is the right attempt and sometimes it isn't, but our take is to never use ML/DL/RL just for the sake of doing so.
So, Is Reinforcement Learning Dead?
Even though it seems like the initial hype around this technology is gone, valuable real-world use cases are still emerging.
For instance, Amazon uses deep reinforcement learning to increase the efficiency of its inventory system by 12%. According to the involved researchers, their “model is able to handle lost sales, correlated demand, stochastic vendor lead-times, and exogenous price matching”. To achieve this incredible result in inventory management, Amazon generated a proxy of the demand for products in times of uncertainty or missing data.
However, Irpan claims:
DQN can solve a lot of the Atari games, but it does so by focusing all of learning on a single goal – getting really good at one game. The final model won’t generalize to other games, because it hasn’t been trained that way. - Alex Irpan
This might have been valid back in 2018, but Google DeepMind proved it wrong just a few weeks ago. Their newly published paper, Mastering Diverse Domains through World Models, addresses the problem of solving tasks across different domains and adds generalization to the strengths of specialization. The same holds true for their research efforts toward a Generalist Agent. On the other hand, how well does a supervised learning approach work on a dataset from a totally different domain without retraining?
And to finally prove our thesis, even ChatGPT incorporates RL. By leveraging feedback on model outputs, the OpenAI team can learn to predict output quality and use this information to align the model's output with what users deem high quality. To do this, they fine-tune the model with an approach called reinforcement learning from human feedback (RLHF).
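As a rough sketch of the first RLHF ingredient, a reward model can be fit on pairwise human preferences (Bradley-Terry style). Everything below, i.e. the features, the simulated preference data, and the hyperparameters, is a toy assumption; real RLHF trains a neural reward model on text and then fine-tunes the language model against it:

```python
import math
import random

random.seed(1)
DIM = 3
true_w = [2.0, -1.0, 0.5]             # hidden "human taste" we try to recover

def score(w, x):                       # reward model: linear score over features
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Simulated preference data: pairs (a, b) where a is the preferred output.
pairs = []
for _ in range(500):
    a = [random.uniform(-1, 1) for _ in range(DIM)]
    b = [random.uniform(-1, 1) for _ in range(DIM)]
    if score(true_w, a) < score(true_w, b):
        a, b = b, a
    pairs.append((a, b))

# Fit the reward model by gradient ascent on the preference log-likelihood.
w, lr = [0.0] * DIM, 0.1
for _ in range(50):
    for a, b in pairs:
        p = sigmoid(score(w, a) - score(w, b))   # P(model prefers a over b)
        for i in range(DIM):
            w[i] += lr * (1.0 - p) * (a[i] - b[i])

agree = sum(score(w, a) > score(w, b) for a, b in pairs) / len(pairs)
print(f"learned reward model agrees with preferences on {agree:.0%} of pairs")
```

Once such a reward model exists, an RL algorithm can use its score as the reward signal to push the policy toward outputs humans prefer.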
There are many more known examples of working RL approaches and probably even more that are undisclosed and optimize things behind closed doors.
Is RL Relevant for Your Business?
Since Amazon is using RL, should you leverage its strengths as well? Besides inventory management, there are numerous businesses that can benefit from reinforcement learning. But before you can assess whether you have a suitable use case, you might want to check the showstoppers.
Showstoppers for Reinforcement Learning
Missing environment: An environment (usually incorporating a simulation) is typically a hard requirement for RL, even though offline reinforcement learning is a recent effort with promising advantages. The agent also needs to be able to interact with the environment and observe its state.
Unclear target: Is it clear what needs to be optimized? Everything that is not included in the reward function will simply be neglected. Therefore, make sure you know what the outcome should look like with respect to all parameters of interest.
The problem statement cannot be framed as a game: The aim of a reinforcement learning approach is to reach a certain goal, or at least get as close to it as possible. So, can you think of some kind of score that should be maximized?
It can be easily solved with a traditional approach: This is not a showstopper per se, but non-ML approaches are often less complex than ML and especially RL approaches. So make sure it is really necessary to add the extra layer of complexity.
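To make the "unclear target" point concrete, here is a toy reward function for a hypothetical delivery drone; the metrics and weights are invented for illustration:

```python
# The agent only optimizes what the reward function measures.
def reward(time_saved, packages_delivered, battery_used, noise_level):
    # Only delivery speed is rewarded here; battery use and noise are
    # implicitly weighted zero, so an agent will happily ignore them.
    return 2.0 * packages_delivered + 0.5 * time_saved

fast_and_loud  = dict(time_saved=10, packages_delivered=5, battery_used=90, noise_level=9)
slow_and_quiet = dict(time_saved=2,  packages_delivered=5, battery_used=20, noise_level=2)

print(reward(**fast_and_loud), reward(**slow_and_quiet))
```

Both policies deliver the same packages, but the reckless one scores higher because the reward never mentions battery or noise, which is exactly how unstated objectives get neglected.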
How to Assess Suitable RL Use Cases?
Didn’t find any showstoppers for using reinforcement learning? Congrats, you can now assess if you have a suitable use case for RL.
To do so, we suggest considering the following:
- Do you have a large and complex input space?
- Do you have a large and complex output space?
- Would a rule-based system fail because the relations cannot be represented reasonably?
- Is a continuous learning process required?
- Do you need to adapt to changing conditions?
- Is there a well-defined simulation?
If most questions are answered with “Yes!”, chances are high that it’s worth looking further into RL.
When it works, reinforcement learning is a really strong tool. But the path to getting it running is not trivial. While supervised learning requires labeled data, reinforcement learning needs a properly defined environment and, the biggest challenge, an accurately defined reward function.
Reinforcement Learning Work at Motius
Over the past few years, reinforcement learning has been a research-heavy topic. But soon, more and more RL approaches will make it into real-life applications. At Motius, we are fascinated by the opportunities that reinforcement learning offers and see a growing interest from customers, too.
There are plenty of exciting use cases: leveraging consumer data, modeling internal assets & processes, robotic fleet managers, or electricity optimizations.
We expect reinforcement learning to become an area of ML with lots of upsides, but also hurdles to overcome. That's why we decided to invest and start working on it early, focusing on the 'R' in our day-to-day "R&D" work. And the current customer sentiment has proven this choice correct.
Save Resources When Implementing RL
On top of that, we built our own RL-MoTool (Motius + Tool), which allows us to reduce the time to get started in RL projects from days or weeks to a couple of hours.
How the RL-MoTool works: The RL-MoTool allows us to spin up reinforcement learning projects in no time. It emerged from an internal research project whose code base was abstracted and combined into a reusable tool. This setup avoids time-intensive preparatory work, while MLOps features like model versioning, performance monitoring, and even scaling computation to a Kubernetes cluster are already integrated. Thanks to our RL-MoTool, we can dive straight into the problem statement instead of wasting time on cumbersome system setups.
Alongside some of our projects applying RL to traffic light control and household appliances, we worked on an approach to optimize wireless communication with deep reinforcement learning and recently published our results in a journal paper.
Ready to Explore RL?
These are just a few examples in which we already used reinforcement learning successfully. We are sure that the future of RL will be bright. Thus, we look forward to solving problems with creative solutions figured out by AI. Let us know if you want to look deeper into this field. Together, we can assess the feasibility of applying RL or other approaches and understand the ROI of optimizing processes or making decisions based on RL.