Designing and evaluating dependable cloud-based systems

Addressing the wide range of challenges in dependable cloud-based cyber-physical systems of systems (CPSoS) requires new approaches that are automated and efficient enough to shift some of the system’s elements to runtime. To design and evaluate the system architecture, Fraunhofer IKS has developed an iterative and automated process.


As outlined in a previous blog post, cloud-based CPSoS face a variety of challenges that include achieving affordable safety and performance levels, providing interoperable connectivity and predictable real-time behavior, as well as ensuring security and maintainability.

Automated and efficient approaches that are designed to shift some of the system’s elements to runtime require greater effort during the design phase. However, they can provide more system flexibility and dependability, thereby reducing costs by avoiding maintenance downtime at a later point. The Fraunhofer Institute for Cognitive Systems IKS proposes an iterative and automated process including feedback loops to design and evaluate the system architecture.

Figure: Design process

Through various feedback cycles, the design process allows continuous improvement of the designed architecture and yields optimal solutions that fulfill all system requirements and goals.

Starting with safety goals

A list of performance and safety goals serves as the initial input for the process that shapes the entire design. The goals define the tasks and qualities of the system, as well as potential hazards and risks. They are used to derive the system requirements, starting from the top-level tasks that define the overall system architecture and main subcomponents.

Systematic analysis identifies failure modes and system weaknesses in order to refine the requirements. For each weakness, suitable countermeasures are proposed and applied, and the analysis is repeated until the remaining weaknesses are at an acceptably low level. This process, called weakness-driven requirements refinement, helps identify flaws such as safety hazards, failures, security threats, or breaches of performance thresholds. Countermeasures are then integrated at the subsystem level.
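The refinement loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the process used at Fraunhofer IKS: the `Weakness` class, the 1-to-5 severity scale, the `refine` function, and the example entries (borrowed from the valet parking scenario discussed later in this post) are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Weakness:
    name: str
    severity: int        # assumed scale: 1 (negligible) to 5 (critical)
    countermeasure: str  # requirement that mitigates this weakness
    # Weaknesses that the countermeasure itself may introduce.
    side_effects: list = field(default_factory=list)


def refine(requirements, weaknesses, acceptable_severity=1, max_rounds=10):
    """Add countermeasures until no open weakness exceeds the threshold."""
    requirements = list(requirements)
    open_weaknesses = list(weaknesses)
    for _ in range(max_rounds):
        critical = [w for w in open_weaknesses if w.severity > acceptable_severity]
        if not critical:
            break  # remaining weaknesses are acceptably low
        worst = max(critical, key=lambda w: w.severity)
        requirements.append(worst.countermeasure)
        open_weaknesses.remove(worst)
        # A countermeasure can create new weaknesses for the next round.
        open_weaknesses.extend(worst.side_effects)
    return requirements, open_weaknesses


# Hypothetical example: stopping on connection loss introduces a new weakness.
blocked_lane = Weakness(
    "stopped vehicle blocks the lane", 2,
    "vehicle switches to local control after a waiting period",
)
connection_loss = Weakness(
    "loss of connectivity", 4,
    "vehicle stops until the connection is recovered",
    side_effects=[blocked_lane],
)
requirements, remaining = refine(
    ["vehicles are guided by a cloud-based safety system"], [connection_loss]
)
```

In this toy run, the first countermeasure introduces a follow-up weakness that is handled in the next round, so the requirements list grows by two entries and no open weaknesses remain.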

The system requirements limit the options of the configuration space model, a set of tools to (semi-)formally describe, analyze, and collect information about the system. An integral, yet distinct part of this approach is a safety model, which can be used to carry out safety-oriented analyses and set limits to performance-related optimizations. Exploration of the configuration space provides a set of solutions. These are essentially system configurations that meet the requirements, from which optimal system configurations are identified and evaluated.
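To make the idea of exploring a configuration space more concrete, here is a small Python sketch. It assumes a hypothetical two-parameter space (`max_speed_kmh`, `update_period_ms`) and a toy safety model (`is_safe`) that bounds the distance a vehicle travels between two cloud updates. The parameter names, the 0.3 m bound, and the ranking rule in `best` are illustrative assumptions and do not come from the white paper.

```python
from itertools import product

# Hypothetical configuration space: every combination is a candidate.
SPACE = {
    "max_speed_kmh": [5, 10, 15],
    "update_period_ms": [50, 100, 200],
}


def is_safe(config):
    """Toy safety model: distance travelled between two updates <= 0.3 m."""
    speed_m_per_s = config["max_speed_kmh"] / 3.6
    blind_distance_m = speed_m_per_s * config["update_period_ms"] / 1000
    return blind_distance_m <= 0.3


def explore(space):
    """Enumerate all configurations and keep those the safety model accepts."""
    keys = list(space)
    configs = [dict(zip(keys, values)) for values in product(*space.values())]
    return [c for c in configs if is_safe(c)]


def best(solutions):
    """Performance goal: highest speed; tie-break: longest update period."""
    return max(solutions, key=lambda c: (c["max_speed_kmh"], c["update_period_ms"]))


solutions = explore(SPACE)
optimal = best(solutions)
```

The safety model acts as a hard filter here, while performance is only optimized among the configurations that pass it. This mirrors the role of the safety model as a limit on performance-related optimization.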

Requirements compliance is only one aspect of success. Detailed analyses and simulations must be used to validate the solutions and demonstrate that they fulfill the initial goals of the system and ensure dependability. Validation also makes it possible to forecast faults and helps to identify critical scenarios. If validation identifies new weaknesses, suitable countermeasures are applied. The system requirements can then be refined for the next iteration.

Cloud-based systems have the potential to provide true self-awareness

An essential result of the design process is monitoring and recovery concepts, which can be integrated into the operating system. By monitoring the properties defined by the requirements, the state of the system can be determined. If necessary, recovery plans defined by optimal solutions for the identified contexts can then be triggered.
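One way to picture the link between monitored properties and recovery plans is a simple table of monitors. The Python sketch below is an assumption-laden illustration: the `Monitor` class, the two example properties, their thresholds, and the plan names (`reduce_speed`, `emergency_stop`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Monitor:
    name: str
    holds: Callable[[dict], bool]  # True while the property is satisfied
    recovery_plan: str             # plan to trigger when it is violated


# Hypothetical monitors derived from requirements.
MONITORS = [
    Monitor("speed_limit", lambda s: s["speed_kmh"] <= 10, "reduce_speed"),
    Monitor("min_distance", lambda s: s["distance_to_obstacle_m"] >= 1.0,
            "emergency_stop"),
]


def check(state, monitors=MONITORS):
    """Return the recovery plans of all monitors whose property is violated."""
    return [m.recovery_plan for m in monitors if not m.holds(state)]


plans = check({"speed_kmh": 12, "distance_to_obstacle_m": 2.5})
```

Keeping the property and its recovery plan side by side makes it easy to trace each runtime reaction back to the requirement it protects.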

More advanced concepts and prototypes, including the necessary monitors and recovery mechanisms, can help establish the system’s self-awareness. Self-awareness is the ability of a system to determine its own state and that of its environment, to detect faults, and to identify possible actions. Cloud-based systems have the potential to provide true self-awareness as they offer an abundance of processing power and other resources.

If the system has an adaptable architecture – meaning the boundaries of the system can change dynamically from local embedded subsystems to an end-to-end architecture made of multiple CPSoS, networks, and edge and cloud services – self-awareness should be provided at each level. However, independent decisions must be coordinated to avoid conflicts.

Solutions can range from predefined rules that guarantee conformance to more interactive patterns that require communication. This is why monitoring properties and recovery plans are defined by the system requirements and optimal solutions during the design process. The aim is to adapt them dynamically for a given context.

Application scenario: automated valet parking

To illustrate the design process, let’s analyze a concrete application scenario: an automated valet parking system. The input (the initial system goals) is to maximize performance, such as the speed of the vehicles in the parking area, while ensuring safety, such as avoiding collisions. The derived requirements define the domain and initial architecture of the system (the parking lot is a semi-closed area and vehicles should be guided by a cloud-based safety system).

Potential vulnerabilities, such as loss of connectivity, can be identified in advance. One possible countermeasure would be to stop the vehicle until the connection is recovered, which in turn leads to further vulnerabilities. The cycle of requirements refinement, weakness identification, and countermeasure design is repeated until all foreseeable and relevant vulnerabilities are eliminated. This results in a more mature system architecture.

In the next step, the possible system configurations, constrained by the requirements, are explored in order to find optimal solutions with the potential to meet performance and safety requirements. They are then validated using various techniques such as Monte Carlo simulation. The defined runtime monitoring and recovery plans ensure that the system is flexible enough to manage unexpected situations. This includes, for instance, monitoring the connection status or, more precisely, determining whether the most recently received information is outdated.
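As a rough illustration of Monte Carlo validation, the following Python sketch estimates how often a vehicle at a given speed could not stop before an obstacle. Everything numerical is an assumption for the sake of the example: the latency range, the obstacle distance range, the deceleration of 3 m/s², and the simple stopping-distance formula. A real validation would use far more detailed models.

```python
import random


def stopping_distance_m(speed_kmh, latency_s, decel_m_s2=3.0):
    """Distance covered during the reaction latency plus the braking distance."""
    v = speed_kmh / 3.6
    return v * latency_s + v * v / (2 * decel_m_s2)


def estimate_violation_rate(speed_kmh, runs=10_000, seed=0):
    """Fraction of sampled scenarios in which the vehicle cannot stop in time."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    violations = 0
    for _ in range(runs):
        latency_s = rng.uniform(0.05, 0.5)   # assumed end-to-end latency
        obstacle_m = rng.uniform(1.0, 10.0)  # assumed distance to the obstacle
        if stopping_distance_m(speed_kmh, latency_s) > obstacle_m:
            violations += 1
    return violations / runs


rates = {speed: estimate_violation_rate(speed) for speed in (3, 15, 30)}
```

Under these assumptions, a speed of 3 km/h never violates the safety goal and 30 km/h always does, because the braking distance alone already exceeds the largest sampled obstacle distance. Speeds in between produce a rate that can be compared against a target threshold.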

This blog post is based on the white paper "Flexilient End-to-End Architectures". It is available for download on the website of the Fraunhofer IKS.


If the connection is lost, the vehicle can switch to local control and proceed using its own sensors. Rules regarding when the vehicle can proceed on its own ensure that the cloud and other vehicles have sufficient time to adapt to the independent vehicle. Once connectivity is available again, another recovery plan is triggered that reintegrates the vehicle back under control of the cloud.
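The connectivity handling described here can be sketched as a small mode-selection rule based on the age of the last cloud message. The mode names, the function `select_mode`, and the two thresholds (`STALE_AFTER_S`, `PROCEED_AFTER_S`) are hypothetical. The waiting period stands in for the rules that give the cloud and other vehicles time to adapt before the vehicle proceeds on its own.

```python
from enum import Enum


class Mode(Enum):
    CLOUD_GUIDED = "cloud_guided"
    STOPPED = "stopped"
    LOCAL_CONTROL = "local_control"


STALE_AFTER_S = 0.5    # assumed: older cloud data counts as outdated
PROCEED_AFTER_S = 5.0  # assumed: waiting time before driving on local sensors


def select_mode(now_s, last_message_s):
    """Choose the operating mode from the age of the last cloud message."""
    age_s = now_s - last_message_s
    if age_s <= STALE_AFTER_S:
        # Fresh data: the vehicle is (re)integrated under cloud control.
        return Mode.CLOUD_GUIDED
    if age_s < PROCEED_AFTER_S:
        # Data is outdated, but others still need time to adapt: wait.
        return Mode.STOPPED
    # Connection has been lost long enough: proceed using local sensors.
    return Mode.LOCAL_CONTROL


timeline = [select_mode(now, last) for now, last in
            [(10.0, 9.8), (12.0, 10.0), (16.0, 10.0), (20.0, 19.9)]]
```

The sample timeline walks through normal operation, a short outage, a long outage, and reconnection, which maps to cloud guidance, stopping, local control, and reintegration.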

Adaptive dependability management ensures resilience

An essential feature of cloud-based systems that provide various services is resilience. This entails ensuring dependability at all times regardless of the context, whether it involves a dynamically changing environment or failures. Even the most comprehensive design process cannot predict all possible situations. Most conceivable scenarios are considered during exploration of the configuration space model. This permits a system design that remains dependable and performant most of the time.

However, the real world is full of unimagined scenarios, and this is where adaptive dependability management comes into play. Automating the cycle of monitoring, updating the system’s self-monitoring in the form of a safety model, and continual model-based analysis with system optimization and adaptation allows the system to adapt to new contexts. A safe but potentially less efficient configuration is then used as a fallback.
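The analysis-and-adaptation step of this cycle, including the fallback, might look roughly like the Python sketch below. It assumes that the fallback is chosen whenever the safety model accepts none of the known configurations in the observed context, which is one plausible reading and not a statement from the white paper. The configurations, the visibility-based `safety_model`, and the `adapt` function are hypothetical.

```python
# Assumed fallback: safe in every context, but slow.
FALLBACK = {"name": "fallback", "max_speed_kmh": 2}

# Hypothetical configurations identified at design time.
DESIGNED = [
    {"name": "fast", "max_speed_kmh": 15},
    {"name": "normal", "max_speed_kmh": 10},
    {"name": "slow", "max_speed_kmh": 5},
]


def safety_model(config, context):
    """Toy safety model: the allowed speed shrinks with observed visibility."""
    return config["max_speed_kmh"] <= context["visibility_m"] / 2


def adapt(context, configs=DESIGNED):
    """Pick the fastest configuration the safety model accepts, else fall back."""
    accepted = [c for c in configs if safety_model(c, context)]
    if not accepted:
        return FALLBACK  # no known configuration is safe in this context
    return max(accepted, key=lambda c: c["max_speed_kmh"])


chosen = [adapt({"visibility_m": v})["name"] for v in (40, 25, 6)]
```

In a running system, the context would come from the monitors and the safety model itself would be updated over time. The sketch only shows how a shrinking set of acceptable configurations eventually leads to the fallback.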

Iterative and continuous development of the system using techniques such as weakness-driven requirements refinement, plus increasing the system’s self-awareness, can help overcome the various challenges involved in the development of dependable, cloud-based end-to-end architectures. Autonomous systems can then be enhanced with an abundance of cloud services and resources to create completely new opportunities and concepts.


This activity is being funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.

Oleg Oleinichenko