Towards resilient architectures for cognitive systems

Cyber-physical systems (CPS) integrate software algorithms and physical structures. To go beyond today's simple embedded systems, CPS need several properties, in particular adaptability, resilience, and safety. Lean management of CPS with resilient architectures satisfies these requirements by combining data distribution with service-oriented structures. A practical use case: a moving robot in a warehouse.


Cognitive systems share the environment with other participants, such as humans, animals, and other systems. Therefore, the problem of participant detection and future behavior forecasting in realistic scenarios is at the core of safe motion planning.

To perceive the surrounding environment, cognitive systems rely on a large number of sensors that generate different types of data over time. This data contains uncertainties, and the system must be able to estimate them in order to manage and interact with the physical world. In this context, resilient architectures come into play: they create the infrastructure for cognitive systems and classify the data correctly to provide new solutions. Researchers have been studying and proposing architectures over the last four decades, which has produced a multitude of approaches. The goal of the architecture described here is to manage the cognitive abilities of the system, such as perception and reasoning, through safety contracts.

Resilient software architecture

In CPS, cognitive abilities are typically embedded in the MAPE-K (Monitor-Analyze-Plan-Execute over a shared Knowledge) feedback loop. Integrating and organizing these abilities so that they carry out the loop requires self-adaptive behavior from the architecture. In particular, when adaptation is distributed, the architecture needs to define how information and processing are organized into components, and how information flows between components. Developing resilient architectures therefore poses the following challenges:

  • ADAPTATION: resilient architectures must be dynamic in terms of services and abilities. On the one hand, services can evolve over time, i.e. new services appear while others disappear. On the other hand, abilities are described by information that varies over time, and require an adaptive classification from the architecture.
  • REPRESENTATION: another challenge is whether the ability is directly supported by the embedded process, or is instead provided by the architecture. Design decisions affect which abilities are learned from experience and which are based on specialized services.
  • OPTIMIZATION: the architecture aims to find the optimal configuration for a given problem. This requires the combinatorial optimization of services, which is in general a hard problem. Therefore, with a given optimization problem, we must run the correct service configuration for a given time.
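To make the OPTIMIZATION challenge concrete, the following is a minimal sketch of choosing a service configuration by exhaustive search. The service names, the abilities they provide, and the cost values are hypothetical and chosen only for illustration; the article does not specify how the configuration is actually computed.

```python
from itertools import combinations

# Hypothetical service catalogue: name -> (abilities provided, cost of running it).
SERVICES = {
    "camera_cnn": ({"object_classification"}, 5.0),
    "lidar_scan": ({"obstacle_detection"}, 2.0),
    "lidar_cluster": ({"obstacle_detection", "object_classification"}, 8.0),
    "control": ({"motion_control"}, 1.0),
}


def best_configuration(required, services=SERVICES):
    """Return (service set, cost) of the cheapest configuration covering `required`.

    Every non-empty subset of services is tried, so the run time grows
    exponentially with the number of services.
    """
    best, best_cost = None, float("inf")
    names = sorted(services)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            provided = set().union(*(services[n][0] for n in subset))
            cost = sum(services[n][1] for n in subset)
            if required <= provided and cost < best_cost:
                best, best_cost = set(subset), cost
    return best, best_cost


needed = {"object_classification", "obstacle_detection", "motion_control"}
config, cost = best_configuration(needed)
```

With this catalogue, the search selects the camera CNN, the lidar scan, and the control service. If the camera CNN is removed from the catalogue, the more expensive lidar-only configuration is chosen instead. Exhaustive search is only feasible for a handful of services, which is one way to see why the problem is hard in general.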

Some work has already been carried out in these directions; however, possible errors during the detection phase cannot be recovered, and the quantification of uncertainties is still an unresolved problem.

A deeper look into the resilient node

In our previous blog post, ResilientSOA, we described service orchestration as a periodic reconfiguration of the nodes. Each node has a known interface that is executed according to a finite state machine; the developer is free to modify only the custom states, such as the inactive and active states. In this blog post, the terms node and service are used interchangeably.

States are divided into primary and transition states. A transition state is reached when a node in a primary state receives a command; which command is sent depends on the strategy chosen by the service manager. The transition ends with a positive or a negative response, depending on whether or not the node has reached a new primary state.


Primary and transition states of a resilient node: primary states are colored blue and transition states green.

ACTIVE STATE: the node is running and able to provide its capabilities as long as it fulfills a contract. The manager is responsible for managing the node's capabilities and validating the contract requirements. Whenever these requirements are not met, it may deactivate or shut down the node, depending on the response time.

INACTIVE STATE: the node has been configured and is waiting to be run. Using specific services, the manager checks the status of the node and asks for the node to be activated or cleaned up, depending on whether or not the configuration meets the contract requirements.

UNCONFIGURED STATE: the node has been created and turns to the manager for registration. The node provides its name, type, and capabilities through the registration service; specific services are then created for managing the node and checking its status.

FINALIZED STATE: this is the last state before the node is destroyed. Different reasons, such as redundancy, nonresponsiveness, or degradation, can lead to the finalization of the node.
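The state descriptions above can be sketched as a small finite state machine. This is an illustrative sketch only: the four primary states come from the text, while the command names, the transition state names, and the `ResilientNode` class are assumptions made for this example and are not the actual ResilientSOA interface.

```python
from enum import Enum


class State(Enum):
    UNCONFIGURED = "unconfigured"
    INACTIVE = "inactive"
    ACTIVE = "active"
    FINALIZED = "finalized"


# (primary state, command) -> (transition state, target primary state).
# Command and transition state names are hypothetical.
TRANSITIONS = {
    (State.UNCONFIGURED, "configure"): ("configuring", State.INACTIVE),
    (State.INACTIVE, "activate"): ("activating", State.ACTIVE),
    (State.INACTIVE, "cleanup"): ("cleaning_up", State.UNCONFIGURED),
    (State.ACTIVE, "deactivate"): ("deactivating", State.INACTIVE),
}
# In this sketch, any primary state except FINALIZED can be shut down.
for _state in (State.UNCONFIGURED, State.INACTIVE, State.ACTIVE):
    TRANSITIONS[(_state, "shutdown")] = ("shutting_down", State.FINALIZED)


class ResilientNode:
    def __init__(self, name):
        self.name = name
        self.state = State.UNCONFIGURED
        self.history = []  # transition states the node has passed through

    def handle(self, command, work=lambda: True):
        """Process one manager command.

        Returns True (positive response) if a new primary state was reached,
        False (negative response) otherwise. `work` stands in for the
        developer's custom code executed during the transition.
        """
        key = (self.state, command)
        if key not in TRANSITIONS:
            return False  # command not allowed in the current primary state
        transition, target = TRANSITIONS[key]
        self.history.append(transition)
        try:
            succeeded = bool(work())
        except Exception:
            succeeded = False
        if succeeded:
            self.state = target
        return succeeded


node = ResilientNode("lidar")
node.handle("configure")   # unconfigured -> inactive
node.handle("activate")    # inactive -> active
```

A failed transition, for example `node.handle("activate", work=lambda: False)` on an inactive node, leaves the node in its previous primary state and yields a negative response, which the manager can use to decide on its next command.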

Use case: moving robot

In one project, the Fraunhofer Institute for Cognitive Systems IKS considers a moving robot in warehouse logistics that is equipped with a camera and a lidar. In addition to its required operations, such as pick and place, the system interacts with the physical world in safety-critical circumstances where many problems could occur. For example, if an unknown (out-of-distribution) object is placed in front of the system, the camera is unable to perceive it, and only the remaining sensors can provide information.

Therefore, the architecture needs to consider the output from all the sensors in order to classify the relations between the data, detect uncertainties, and predict possible behaviors. With our approach, each node carries out an ability under a contract, and the manager consistently validates the contract and associates a reliability with the current data.
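As a rough illustration of run-time contract validation, the sketch below lets a manager decide per node whether to keep it active, deactivate it, or shut it down. The contract fields, the thresholds, and the rule that a timed-out node is shut down while a merely unreliable one is deactivated are assumptions for this example; the article states only that the decision depends on the response time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    min_confidence: float  # lowest acceptable confidence of the node output
    max_response_s: float  # above this, the output arrives too late to be used
    timeout_s: float       # above this, the node counts as unresponsive


@dataclass(frozen=True)
class Report:
    confidence: float  # reliability the node reports for its current data
    response_s: float  # measured response time in seconds


def validate(contract, report):
    """Return the manager decision for one node: keep_active, deactivate, or shutdown."""
    if report.response_s > contract.timeout_s:
        return "shutdown"
    too_slow = report.response_s > contract.max_response_s
    unreliable = report.confidence < contract.min_confidence
    if too_slow or unreliable:
        return "deactivate"
    return "keep_active"


def usable_sources(contracts, reports):
    """Names of the nodes whose current data satisfies their contract."""
    return sorted(
        name for name, report in reports.items()
        if validate(contracts[name], report) == "keep_active"
    )


# Hypothetical situation: an out-of-distribution object in front of the robot
# makes the camera CNN report low confidence, while the lidar is unaffected.
contracts = {
    "camera_cnn": Contract(min_confidence=0.7, max_response_s=0.1, timeout_s=1.0),
    "lidar": Contract(min_confidence=0.5, max_response_s=0.05, timeout_s=1.0),
}
reports = {
    "camera_cnn": Report(confidence=0.2, response_s=0.04),
    "lidar": Report(confidence=0.9, response_s=0.02),
}
sources = usable_sources(contracts, reports)
```

In this situation only the lidar remains a usable source, which mirrors the use case in which the remaining sensors take over when the camera cannot be trusted.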

Adaptability as a main advantage

One of the main advantages of this setup for cyber-physical systems is adaptability. The architecture can adjust itself according to the current situation and can activate or deactivate any services if the data provided is not satisfactory.

Moreover, building the system with a service-oriented architecture ensures scalability and usability, because services can easily be replaced with new versions and duplicated for redundancy. Another important benefit is the resilience of the architecture, provided by run-time validation of safety contracts. This property guarantees safe system behavior because every data item is confined to a safe operational design domain and is constantly monitored by the service manager.
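One simple way to picture monitoring against an operational design domain (ODD) is a range check on each monitored quantity. The quantity names and bounds below are hypothetical and serve only as an example; a real ODD description would be considerably richer.

```python
# Hypothetical ODD: monitored quantity -> (lower bound, upper bound).
ODD = {
    "lidar_range_m": (0.1, 10.0),
    "speed_m_s": (0.0, 1.5),
}


def odd_violations(sample, odd=ODD):
    """Return the sorted names of quantities that are missing or outside the ODD."""
    violations = []
    for name, (low, high) in odd.items():
        value = sample.get(name)
        if value is None or not (low <= value <= high):
            violations.append(name)
    return sorted(violations)


ok = odd_violations({"lidar_range_m": 2.0, "speed_m_s": 0.5})
bad = odd_violations({"lidar_range_m": 12.0, "speed_m_s": 0.5})
```

Here `ok` is an empty list, while `bad` names the lidar range as being outside the domain. A manager could react to a non-empty result by deactivating the service that produced the data.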



In the screencast, on the left-hand side, we see a Festo Robotino moving in the Webots simulation environment. The lidar and convolutional neural network (CNN) measurements appear in the RViz data visualization tool, on the right-hand side and at the bottom respectively. The status of each node (CNN, lidar, and control) is mapped to a specific color and shown at the top of the screencast in a graph visualization plot. In this context, blue corresponds to the »unconfigured« state, light green to the »inactive« state, and dark green to the »active« state.

At the beginning of the simulation, the nodes start up and go from the »unconfigured« to the »active« state. After a while, the CNN is activated by the manager and provides a better understanding of the surroundings, for example by classifying the boxes on the shelf. The boxes are labeled with a green frame if the objects are in-distribution and with a red frame if they are out-of-distribution.

This activity is being funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.

Florian Wörter