Reinforcement Learning
Hierarchical structures can be the key to safe and efficient AI systems

The success of Deep Learning has increased interest in using artificial intelligence in more and more systems, and Reinforcement Learning has experienced a similar upswing. However, despite these impressive recent successes, it still faces the challenge of being accepted and successfully deployed. This is especially true for safety-critical tasks such as robotics, autonomous driving, and industrial control. One approach that can help address this problem is Hierarchical Reinforcement Learning.

Street crossing from above
In safety-related applications, Artificial Intelligence (AI) has not been widely accepted as an alternative to traditional engineering solutions. There are several reasons for this. For example, neural networks still lack formal guarantees, so their behavior cannot be relied upon in all situations.

A notorious problem that neural networks must deal with is adversarial attacks: small perturbations of the sensor inputs, caused by ordinary noise or even crafted by a malicious attacker, that under certain circumstances are sufficient to change the decisions made by the neural network [1]. It is therefore an urgent task for deep learning to increase its robustness and to develop appropriate testing methods.
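To make the idea concrete, the sketch below implements the fast gradient sign method (FGSM), one of the simplest ways such a perturbation can be constructed. It is a minimal PyTorch illustration assuming only a generic differentiable classifier `model`; it is not the certification approach of [1].

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.01):
    """Fast Gradient Sign Method: one gradient step that nudges the
    input x in the direction that most increases the classifier's loss.
    `model` is a hypothetical torch.nn.Module; epsilon bounds the
    perturbation so it stays visually imperceptible."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + epsilon * x.grad.sign()).detach()
```

Even with a very small epsilon, the perturbed input often looks unchanged to a human observer while flipping the model's prediction.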

Furthermore, neural networks also need to become more explainable to achieve safer AI, as they are typically treated as black boxes. This makes formalizing the requirements that specify the intended behavior of such systems a non-trivial task: the reason AI is used in the first place is that the desired behavior cannot easily be described in explicit logic [2]. Nonetheless, being able to justify the decisions made by the model is a necessary step in the validation and verification of such systems.

Sample efficiency in model-based Reinforcement Learning

Most state-of-the-art deep learning models are data inefficient. However, collecting the massive amounts of data needed to train such models is not always feasible: consider autonomous driving. The approach of the leading players (e.g. Tesla, Waymo) is to train their models on thousands of hours of recorded driving data. AlphaGo and OpenAI Five, celebrated AI models capable of playing the games of Go and Dota 2, respectively, also required extremely large amounts of collected data to achieve their impressive results.

This problem stems from a well-known issue in Reinforcement Learning: the curse of dimensionality. The number of states grows exponentially with task complexity, quickly making the problem computationally intractable. Improving sample efficiency is therefore paramount to deploying Reinforcement Learning models in complex scenarios.
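A back-of-the-envelope calculation shows how quickly this growth bites. The snippet below is purely illustrative: a task whose state is described by d independent variables, each taking n discrete values, has n^d states.

```python
# Illustrative only: state-space size for d variables with n values each.
n = 10
for d in (1, 2, 4, 8):
    print(f"{d} variables -> {n ** d:,} states")
# 1 -> 10, 2 -> 100, 4 -> 10,000, 8 -> 100,000,000
```

Adding just a few state variables turns an easily enumerable problem into one that no table-based method can handle.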

Sample efficiency refers to the amount of data the agent needs to experience to reach a chosen target level of performance [3]. The fewer interactions with the environment required for the agent to learn a good control strategy, the more efficient the learning method. Sample efficiency can be increased by better structuring the model to process the collected data more efficiently and by using better strategies to interact with the environment. A promising approach is data-efficient Hierarchical Reinforcement Learning (HRL).
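One common way to operationalize this definition is to count environment interactions until the agent first reaches the target performance, as in the hypothetical harness below. `agent.train_step` and the `evaluate` callable are placeholder interfaces invented for illustration, not a real library API.

```python
def sample_efficiency(agent, env, evaluate, target_return, budget=1_000_000):
    """Count environment steps until evaluate(agent, env) first reaches
    target_return; return None if the interaction budget runs out.
    agent.train_step(env) is a hypothetical method that performs one
    training iteration and returns the number of steps it consumed."""
    steps = 0
    while steps < budget:
        steps += agent.train_step(env)   # interactions consumed so far
        if evaluate(agent, env) >= target_return:
            return steps                 # fewer steps = more sample-efficient
    return None
```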

What is Hierarchical Reinforcement Learning?

Hierarchical Reinforcement Learning addresses problems such as sample inefficiency, scalability, and generalization. Efficiency is increased by decomposing the problem into modules at various levels of abstraction. This contrasts with end-to-end learning, which optimizes a single model that is solely responsible for processing the input coming from the sensors and outputting the decision to be sent to the actuators.

Several studies in neuroscience and behavioral psychology suggest that our brains work in a hierarchical manner. For example, even infants use temporal abstraction to generate subgoals when solving tasks [4]. Guiding our behavior in accordance with goals, plans, and broader contextual knowledge is what distinguishes humans and allows us to solve complex problems [5].

Inspired by such biological evidence, the core idea of Hierarchical Reinforcement Learning is to learn how to solve a task by learning specific skills (also called abstract actions) that are combined to achieve higher-level goals. A significant impact on sample efficiency comes from the fact that the set of learned skills can be used to solve variations of the task or even completely new ones.
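The control loop below sketches this idea in its simplest form: a high-level policy proposes a subgoal every k steps (temporal abstraction), while a low-level policy supplies the skill of reaching it. This is a generic sketch of the pattern with hypothetical `high_policy`, `low_policy`, and `env` interfaces, not a specific published algorithm.

```python
class TwoLevelAgent:
    """Minimal hierarchical control loop. high_policy.select(state)
    returns a subgoal; low_policy.select(state, subgoal) returns a
    primitive action. All interfaces are hypothetical placeholders."""

    def __init__(self, high_policy, low_policy, k=10):
        self.high, self.low, self.k = high_policy, low_policy, k

    def run_episode(self, env):
        state, done, t = env.reset(), False, 0
        subgoal = None
        while not done:
            if t % self.k == 0:                      # temporal abstraction:
                subgoal = self.high.select(state)    # new subgoal every k steps
            action = self.low.select(state, subgoal) # learned skill
            state, reward, done = env.step(action)
            t += 1
```

Because the low-level skills are conditioned on subgoals rather than on one fixed task, the same set of skills can be reused when the overall task changes.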

Industries such as automotive and avionics have traditionally designed their safety-critical systems in a modular fashion. This modular approach facilitates maintainability, the implementation of redundancy modules, and the traceability of a detected failure in both hardware and software systems. This can serve as a motivation for designing AI models that solve complex tasks by breaking the problem down into subproblems that are much easier to understand and verify. Data-efficient Hierarchical Reinforcement Learning is therefore a viable approach to achieving safer AI-based systems.

Although AI research has come a long way and achieved impressive results, there is still much to be done to deploy learning-based models in real-world, complex, safety-critical applications. Model-based Hierarchical Reinforcement Learning is a promising approach that can help achieve this audacious goal.

[1] Lütjens, Björn, Michael Everett, and Jonathan P. How. “Certified adversarial robustness for deep reinforcement learning.” Conference on Robot Learning. PMLR, 2020.

[2] Alves, Erin E., et al. “Considerations in assuring safety of increasingly autonomous systems.” No. NASA/CR-2018-220080. 2018.

[3] Botvinick, Matthew, et al. “Reinforcement learning, fast and slow.” Trends in cognitive sciences 23.5 (2019): 408-422.

[4] Ribas-Fernandes, Jose J. F., et al. “A neural signature of hierarchical reinforcement learning.” Neuron 71.2 (2011): 370-379.

[5] Badre, David, et al. “Hierarchical cognitive control deficits following damage to the human frontal lobe.” Nature neuroscience 12.4 (2009): 515-522.


This work was funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.
