Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to optimize its actions with respect to its surrounding environment. A typical task is a robot that must navigate from the start to the end of a maze. Discretizing its movement into time steps, at each step the robot observes the current environment state and takes one of several possible actions, which transitions the environment into the next state.
To quantify how helpful the chosen action was, the environment provides a positive or negative reward as feedback to the agent. After receiving the reward, the agent can process and memorize this experience and, if needed, adjust its behavior to maximize future reward. This learning cycle continues until the agent converges or until it exhausts the available number of training iterations.
RL algorithms need to memorize the reward one can expect for each environment state and/or each state-action pair. This information can be retained and updated in a map-like data structure, but such a table grows too large for industrially relevant use cases.
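The map-like approach can be sketched in a few lines. The following toy example (not from the paper; the corridor environment, reward values, and hyperparameters are illustrative assumptions) stores one expected-reward entry per state-action pair and updates it with the classic Q-learning rule:

```python
import random

# Toy corridor: state 0 is the start, state 3 is the goal.
N_STATES = 4
ACTIONS = ["left", "right"]
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters

# The map-like structure: one expected-reward entry per state-action pair.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment dynamics: move along the corridor, reward 1 at the goal."""
    nxt = max(state - 1, 0) if action == "left" else min(state + 1, N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

random.seed(0)
for episode in range(200):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy: mostly exploit the stored values, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)
        # Q-learning update: move the stored value toward the observed
        # reward plus the discounted best value of the next state.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy policy moves right from every non-goal state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

With only four states the dictionary has eight entries; a realistic industrial state space makes this table astronomically large, which is exactly the scaling problem described above.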
State-of-the-art RL algorithms instead use neural networks to approximate the expected rewards and update their weights analogously to supervised learning. This approach has already been shown to support healthcare workers with decision-making in disease treatment and drug design, and to help guide autonomous systems. The potential benefit to industry is therefore a great motivator for research in RL.
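The "analogous to supervised learning" part boils down to a gradient step on the temporal-difference (TD) error. As a minimal sketch (a linear model stands in for the neural network here; sizes and learning rate are illustrative assumptions, not from the paper):

```python
import numpy as np

N_STATES, N_ACTIONS = 4, 2
GAMMA, LR = 0.9, 0.1  # illustrative discount factor and learning rate

rng = np.random.default_rng(0)
# One weight vector per action over a one-hot state encoding; in deep RL
# this would be a multi-layer network instead.
weights = rng.normal(scale=0.01, size=(N_ACTIONS, N_STATES))

def features(state):
    """One-hot encoding of the state."""
    x = np.zeros(N_STATES)
    x[state] = 1.0
    return x

def q_values(state):
    """Approximated expected reward for every action in this state."""
    return weights @ features(state)

def td_update(state, action, reward, next_state, done):
    """Gradient step on the squared TD error, as in supervised learning:
    the 'label' is the observed reward plus the discounted value estimate
    of the next state."""
    target = reward if done else reward + GAMMA * q_values(next_state).max()
    error = target - q_values(state)[action]
    weights[action] += LR * error * features(state)
    return error

# A single interaction is learned with one such step:
td_update(state=0, action=1, reward=0.0, next_state=1, done=False)
```

The approximator replaces the lookup table: memory no longer grows with the number of states, at the cost of the convergence subtleties that motivate much of deep RL research.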
How and why should quantum computing be used in RL?
Quantum reinforcement learning (QRL) is the umbrella term for all methods found at the intersection of reinforcement learning and quantum computing. Multiple approaches lie on the spectrum between these two worlds. At one end are quantum-inspired algorithms: classical methods that borrow principles from quantum physics. At the other end, some solutions are tailored exclusively for fault-tolerant quantum computers.
However, the quantum hardware currently developed and accessible does not yet fulfill the fault-tolerance criteria: devices offer a rather low number of computational quantum bits (qubits), and these are still affected by various types of error, hence the name noisy intermediate-scale quantum (NISQ) devices. For these devices, one can look at the middle ground of the QRL spectrum: hybrid quantum-classical (HQC) reinforcement learning. HQC algorithms replace only parts of the pipeline with quantum submodules, which allows one to simultaneously exploit the advantages of quantum and classical computing.
Quantum reinforcement learning supports
The Fraunhofer Institute for Cognitive Systems IKS is investigating QRL methods as part of the Munich Quantum Valley project, supported by the Bavarian state government with funds from the Hightech Agenda Bayern. For this purpose, an HQC RL algorithm was developed in a paper presented this year at the 15th International Conference on Agents and Artificial Intelligence. In this paper, a robot must navigate a slippery frozen lake, avoid the holes, and reach the goal position. The challenge of this environment is that there is a fixed probability for the agent to slip and not take the intended action, so the agent also has to anticipate slipping.
A classical state-of-the-art RL algorithm was used as the basis, with the neural networks that compute the expected reward replaced by quantum circuits of various architectures. All circuits were shallow, shared the same input-data embedding, and were suitable for NISQ devices. Therefore, while they were benchmarked in simulation, they could also be run on the quantum hardware available today.
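To give a feeling for what such a quantum submodule computes, here is a deliberately minimal single-qubit illustration (the paper's circuits are multi-qubit and more varied; this sketch and its gate choices are the author's illustrative assumption). The input is angle-embedded, a trainable rotation is applied, and a Pauli-Z expectation value is read out. The circuit is simulated with plain linear algebra; on real NISQ hardware, a framework such as Qiskit or PennyLane would build the equivalent circuit:

```python
import numpy as np

def ry(angle):
    """Rotation about the Y axis of the Bloch sphere."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

def circuit_expectation(x, theta):
    """Embed input x as a rotation, apply trainable rotation theta,
    and measure the Pauli-Z expectation value."""
    state = np.array([1.0, 0.0])           # start in |0>
    state = ry(theta) @ ry(x) @ state      # data embedding, then trainable layer
    pauli_z = np.array([[1.0, 0.0], [0.0, -1.0]])
    return float(state @ pauli_z @ state)
```

For this one-qubit case the expectation equals cos(x + theta), a smooth function of the trainable parameter, so it can be optimized with gradient descent exactly like a network weight, which is what makes such circuits drop-in replacements for the classical value network.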
Many of the 19 HQC architectures employed reached rewards comparable to their classical counterparts and needed fewer training steps. Choosing a quantum architecture involves several non-trivial decisions: the data encoding, the architecture of the trainable quantum circuit, the measurement type, and the classical post-processing of the training results. So far, quantum metrics such as entanglement capability fail to explain the observed differences in performance, as discussed in the paper.
This discussion shows the two sides of quantum reinforcement learning. Firstly, there are still open questions about how such a pipeline should be built for a given task, which the existing literature has not yet exhaustively answered and thus continues to investigate. Secondly, HQC is a promising path for reinforcement learning, as it may converge faster to a stable trained agent. This benefit reinforces the case for further research on healthcare and robot-navigation tasks, where convergence speed is vital.
Drăgan, Theodora-Augustina, et al. “Quantum Reinforcement Learning for Solving a Stochastic Frozen Lake Environment and the Impact of Quantum Architecture Choices.” Proceedings of the 15th International Conference on Agents and Artificial Intelligence, 2023. Crossref, https://doi.org/10.5220/0011673400003393.