Flexilience: balancing key requirements in autonomous systems

Reconciling safety, performance and reliability in cloud-based cognitive systems is no easy task. Especially in safety-critical autonomous systems such as driverless cars, users rightly expect all three requirements to be met at the highest level. In practice, however, resources are limited. Fraunhofer IKS has developed an approach for optimizing safety, performance and reliability depending on the current situation and the concrete application: flexilience.


When it comes to cloud-based intelligent systems such as self-driving cars, autonomous industrial robots or smart buildings, we demand the highest levels of safety, performance and reliability. Unfortunately, optimizing all three at once would take unlimited resources. In the real world, we have to allocate the resources that are available, taking into account not only system requirements but also the concrete application. Allocating resources properly makes it possible to introduce varying levels of flexibility, intelligence and resilience into a system. Combined, these three qualities give the approach its name: flexilience – persistent dependability and optimized performance in cognitive systems when facing changes.

Flexibility: how systems adapt to new situations

A flexible system is able to modify its behavior in response to the current situation. Crucially, flexibility is about more than just a free choice of one of a number of preprogrammed actions. Flexibility is the ability of the system to modify itself – to react to constraining circumstances and to shape its goals and ways of achieving them.

WHY FLEXIBILITY? An ideal system should be able to correctly handle any potential situation – changes in customer requirements, say, or a dynamically shifting goal. In the course of the design phase, however, it is impossible to test all of these scenarios. This is because intelligent, autonomous systems are complex, operate in the real world, and interact with humans and other systems.

Instead of hopelessly trying to predict the unpredictable, it is therefore much better to equip our system with the means to adapt itself to new situations. For example, a change in the volume of user demands may lead to overload and necessitate an interim imposition of additional constraints on the system, thereby limiting its functionality for a short period. On the other hand, should conditions become more favorable, a system may temporarily increase its performance in order to exploit this opportunity.
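The overload case described above can be pictured as a small mode selector. The following Python sketch is only an illustration: the mode names, the load thresholds and the function `select_mode` are assumptions made for this example and are not taken from the white paper.

```python
def select_mode(load: float, overload_at: float = 0.9, idle_below: float = 0.4) -> str:
    """Pick an operating mode from the current load (0.0 = idle, 1.0 = saturated).

    The thresholds are hypothetical; a real system would derive them from
    measurements and requirements.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    if load >= overload_at:
        # Overload: impose additional constraints and limit functionality for a while.
        return "restricted"
    if load < idle_below:
        # Favorable conditions: temporarily increase performance.
        return "boosted"
    return "normal"


print(select_mode(0.95))  # restricted
print(select_mode(0.20))  # boosted
```

A fixed threshold is the simplest possible policy. A flexible system in the sense of this article would also adjust the thresholds themselves over time.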

Flexibility enables a reduction in risk, since it means that only the requisite functionality needs to be delivered. Later, the system can extend itself and adapt to new needs. It can also maximize potential by incorporating new services and seamlessly adjusting to new requirements, thereby increasing reliability. In other words, the system always operates at the highest level of performance that the current conditions allow.

Intelligence: systems observe themselves and draw conclusions

A cognitive system utilizes artificial intelligence (AI) in order to solve problems autonomously and develop strategies for human tasks. It requires cognitive capabilities such as context comprehension, interaction, adaptation and learning. In this way, a system can be imbued with intelligence, thereby enabling it to analyze its actions and the results of those actions so as to learn and apply this knowledge in the future.

WHY INTELLIGENCE? It is not enough merely to adapt to new situations and to meet safety requirements. Continuous adaptation should also focus on performance enhancement. The ideal system would be perfectly optimized to meet any situation. In the real world, however, this is hard to achieve. A solution here is to introduce intelligence into the system, so that it can observe itself and its environment, perform actions, analyze the results, and apply this knowledge in the future – even in scenarios that system designers have never considered. By enhancing the system with tools that enable it to analyze and learn from past experience, we can achieve surprising results and greatly improve performance.

Resilience: systems react safely in every possible situation

A resilient system remains dependable even if the environmental conditions change, resources become unavailable, or system faults occur. Dependability represents a promise to users that a system will continue to operate correctly.

WHY RESILIENCE? It is not enough that a system adapt to changing needs and optimize its performance. Users also expect a resilient system to deliver uninterrupted service of the desired quality and with satisfactory levels of safety, reliability and availability. In practice, this means that a system should be able to predict potential problems, including a lack of resources and other unfavorable conditions, and to implement proper countermeasures in advance in order to keep providing a correct level of service.

Once again, in the real world, with all its complexity, volatility and uncertainty, it is impossible to conceive of every possible scenario during the design phase, and engineers cannot devise countermeasures for every potential situation. First, there is no way of imagining every possible scenario; second, designing such a system would be too laborious; and, lastly, even a small variation in a scenario could cause the system to behave in an unpredicted way.

Therefore, in order to avoid compromising the safety and reliability of the system, an ability to continuously adapt to the current situation must be embedded within the system. To ensure this adaptability, the system must be able to redirect all currently available resources so as to support key functionalities. And the system must learn how, and when, this should be done – and refine this knowledge over time.
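Redirecting available resources towards key functionalities can be sketched as a priority-based allocation. The Python example below is a minimal illustration under stated assumptions: the function `allocate`, the priority numbers and the example functionalities are hypothetical, and the learning aspect mentioned above (how and when to redirect) is deliberately left out.

```python
def allocate(budget: float, demands: list[tuple[str, int, float]]) -> dict[str, float]:
    """Serve functionalities in order of priority until the budget is used up.

    Each demand is (name, priority, needed); a lower priority number means
    a more critical functionality. All values are illustrative.
    """
    remaining = budget
    granted: dict[str, float] = {}
    for name, _priority, needed in sorted(demands, key=lambda d: d[1]):
        share = min(needed, remaining)  # never hand out more than is left
        granted[name] = share
        remaining -= share
    return granted


demands = [
    ("infotainment", 3, 4.0),
    ("emergency_braking", 1, 5.0),
    ("route_planning", 2, 4.0),
]
print(allocate(10.0, demands))
# {'emergency_braking': 5.0, 'route_planning': 4.0, 'infotainment': 1.0}
```

When the budget shrinks, the least critical functionality is the first to lose resources, while the key functionality keeps its full share for as long as possible.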

This blog post is based on the white paper "Flexilient End-to-End Architectures". It is available for download on the website of the Fraunhofer IKS.


Flexilience: the best of three worlds

Flexibility, intelligence and resilience: each is necessary, each is responsible for safe and performant operation of the system, and each requires resources – the more, the better. Resources, however, are always limited and expensive, and here they are allocated to satisfy conflicting goals. For example, if all resources are directed towards resilience, flexibility and intelligence are neglected. A system designer must therefore establish a unique balance between these features – flexilience – according to the application, user needs and actual requirements.

The figure below – a triangle of qualities – depicts the dilemma of the design engineer. A single point is to be selected for your system at any one time. The distance between this point and the triangle’s corners represents the degree to which resources have been allocated to each particular feature. The closer this point is to any one corner, the greater the allocation of resources to that specific feature. The desired location of a point can change over time, thereby reflecting how the system adapts to dynamic conditions.

[Figure: triangle of qualities]

Key qualities of dependable cloud-based cyber-physical systems.
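One way to make the triangle of qualities concrete is to read the selected point in barycentric coordinates: each corner receives a share between 0 and 1, the shares sum to 1, and a share grows as the point approaches its corner. The Python sketch below illustrates this reading. The corner positions (resilience at the top, flexibility and intelligence at the bottom) and the function `allocation_shares` are assumptions made for the example, not part of the published approach.

```python
# Assumed corner layout of the triangle (hypothetical coordinates).
CORNERS = {
    "resilience": (0.5, 1.0),    # top corner
    "flexibility": (0.0, 0.0),   # bottom left
    "intelligence": (1.0, 0.0),  # bottom right
}


def allocation_shares(x: float, y: float) -> dict[str, float]:
    """Return the barycentric coordinates of (x, y) as resource shares."""
    x1, y1 = CORNERS["resilience"]
    x2, y2 = CORNERS["flexibility"]
    x3, y3 = CORNERS["intelligence"]
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    shares = {"resilience": w1, "flexibility": w2, "intelligence": w3}
    if min(shares.values()) < -1e-9:
        raise ValueError("point lies outside the triangle")
    return shares


# A point at the top corner puts every resource into resilience;
# the centroid splits the resources evenly between the three qualities.
top = allocation_shares(0.5, 1.0)
center = allocation_shares(0.5, 1.0 / 3.0)
```

Moving the point over time then simply means recomputing the shares for the new position.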

For example, a point located at the bottom corner of the smaller triangle delivers great performance, thanks to enhanced flexibility and intelligence. However, neglecting resilience will result in possible downtimes due to poor resistance to changing conditions. One solution would be to move the point upwards so as to increase resilience when necessary, while still maintaining good performance. Similarly, a point located at the top corner of the larger triangle delivers high resilience to changing conditions and uninterrupted service almost all the time. However, this service will offer very unsatisfactory performance. To improve it, the point needs to be shifted downwards so as to redirect part of the resources towards increasing flexibility and intelligence.

[Figure: triangle of qualities with example points]

Depending on the context and application, resources can be dynamically allocated to achieve different levels of resilience, flexibility and intelligence.

Practical examples: an autonomous car …

By way of illustration, an autonomous car requires high levels of safety and reliability: human life depends on the correct and uninterrupted delivery of service, even at the cost of decreased performance. In this case, computational power, sensors, network bandwidth and other resources should therefore be used mainly to ensure the resilience of the self-driving car.

While not a top priority, performance is also important, since this is what users observe and evaluate. When a car is not driving, it only has to deliver a single functionality – namely, being “parked” – and can therefore be perfectly safe and reliable. When a car is in use, the point within the triangle of qualities must be shifted downwards so as to increase performance. If the system detects a fault, it must then increase safety. In summary, a point within the triangle of qualities can be moved within a certain area depending on context, thereby reallocating resources so that user demands and system requirements are satisfied.

[Figure: triangle of qualities for autonomous driving]

A changing context requires dynamic resource allocation. The green area represents all possible resource allocations for an autonomous car, and the black points are examples of concrete scenarios requiring different levels of safety and performance.
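The car example can be sketched as a context-dependent target that is never allowed to leave a permitted range. In the Python sketch below, every number and name is a hypothetical assumption chosen for illustration: the contexts, the target shares, the lower bound `MIN_RESILIENCE` and the function `resilience_share` do not come from the white paper.

```python
# Hypothetical lower bound: a car never drops below this resilience share.
MIN_RESILIENCE = 0.6

# Hypothetical resilience share that is enforced once a fault is detected.
FAULT_RESILIENCE = 0.9

# Hypothetical target resilience shares per context.
TARGETS = {"parked": 1.0, "driving": 0.7}


def resilience_share(context: str, fault_detected: bool = False) -> float:
    """Return the share of resources directed towards resilience."""
    share = TARGETS[context]
    if fault_detected:
        # A detected fault shifts the point back towards safety.
        share = max(share, FAULT_RESILIENCE)
    # Stay inside the permitted area of the triangle.
    return max(share, MIN_RESILIENCE)


print(resilience_share("parked"))                        # 1.0
print(resilience_share("driving"))                       # 0.7
print(resilience_share("driving", fault_detected=True))  # 0.9
```

The remaining share (one minus the returned value) would be available for flexibility and intelligence, that is, for performance.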

… and an industrial robot

In the second example, an industrial robot operating in an enclosed, structured environment does not require such a high level of safety as an autonomous car, since there is no interaction with external agents displaying unknown behavior, such as children or nervous drivers. Here, the highest priority is to maximize performance and minimize downtime. During normal operation, therefore, most of the resources should be directed towards flexibility and intelligence so as to maximize the performance of the system. During cooperation with human workers, the point must be shifted upwards to ensure the safety and reliability of the system. When maintenance is performed, safety becomes the top priority. The point is therefore moved to the right.

[Figure: triangle of qualities for an industrial robot]

Performance or safety can be increased by moving the point that represents the current resource allocation. This makes it possible to satisfy the requirements of the different operating modes of an industrial robot.

To provide a system with true self-awareness and enable constant improvement, it is not sufficient to engineer flexilience merely during the design phase. The key challenge is to adjust resource allocation and to continuously search for the desired flexilience during run time. In this way, the goals of safety, performance and reliability can be adjusted over time, depending on current requirements and context.


This activity is being funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.

Anna Kosmalska
Cognitive systems / Fraunhofer IKS