Cloud-based systems face safety and efficiency challenges – ways out of a catch-22 situation

The future will rely on widespread, massively-connected, highly-intelligent systems. To implement this vision, we need dependable cloud-based cyber-physical systems. Research in this direction encounters as many opportunities as challenges. Such systems exist in changing environments and interact with other systems and humans, which requires intelligence, autonomy, safety and adaptability – in fact, conflicting goals. An example of this is autonomous driving.


Compared with cars using only local sensors and functions, connected vehicles can present a much higher level of optimization and demonstrate completely new opportunities. Obviously, such a system of systems would need to be safe and reliable to ensure proper handling of hazardous situations and to provide uninterrupted services and an excellent user experience. Connected autonomous vehicles have great potential, not only as personal self-driving cars, but also for fleets of autonomous taxis, buses or trucks, providing cheaper and safer transportation.

Dependable, autonomous, cyber-physical systems of systems (CPSoS) face a wealth of challenges that need to be overcome before they can be used in everyday life. First, satisfactory levels of safety and efficiency must be achieved at affordable costs. One way to achieve this is by dynamically adjusting the system’s performance to an acceptable risk, taking into account the current situation. Second, stable control across the end-to-end architecture must be ensured through predictable real-time behaviour with sufficiently low latency and jitter. Third, given that many different vendors are currently developing autonomous systems, interoperable connectivity is essential. Fourth, to strengthen security, the design needs to broaden the system boundaries to identify and mitigate additional threats to the system’s safety. Finally, seamless data uploads, updates and maintainability must be provided to gather training data and distribute changes without interfering with normal operation.

In summary, in order to address these challenges, we need flexible, resilient and intelligent architectures that offer information and resources by utilizing various services, but which have the capability to gracefully degrade if they are no longer available. These architectures should be dynamic and stretch across the entire communication chain, from the end user (autonomous vehicles), to edge infrastructures (roadside units or cameras) and on to the cloud services.

Self-driving cars on the highway

Let’s analyse a simple automotive scenario – such as multiple autonomous vehicles on a highway – that can be improved using cloud-based systems. The goal is to maximize speed and ensure safety. The on-board sensors and embedded software, which are designed for city traffic, operate safely only at speeds of up to 80 km/h. However, the road infrastructure includes powerful sensors, such as high-quality cameras, that provide a complete view of the road. The cloud service collects, combines and analyses the data from the cameras, allowing the vehicles to increase their speed to 130 km/h as long as there is a reliable exchange of data between the vehicle and the cloud. This illustrates how performance can be improved without compromising safety.

However, it’s imperative that the system continues to operate despite connectivity disruptions or cloud unavailability, such as when the vehicle is out of range of the roadside unit, or if information from the cloud is too delayed to make it useful. In this case, the system has to degrade the performance to an acceptable level of risk, such as reducing the speed of the vehicle so that it operates safely.
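The degradation rule in this scenario can be sketched in a few lines of Python. The 80 km/h and 130 km/h limits come from the example above; the 200 ms freshness bound and the names `CloudView` and `allowed_speed_kmh` are assumptions made purely for illustration, not part of any real system.

```python
from dataclasses import dataclass
from typing import Optional

LOCAL_LIMIT_KMH = 80.0    # on-board sensors only (from the scenario)
CLOUD_LIMIT_KMH = 130.0   # with reliable cloud data (from the scenario)
MAX_DATA_AGE_S = 0.2      # assumed freshness bound, not from the article


@dataclass
class CloudView:
    """Hypothetical road view received from the cloud service."""
    timestamp_s: float    # time at which the view was produced
    road_clear: bool      # whether the cameras report a clear road


def allowed_speed_kmh(view: Optional[CloudView], now_s: float) -> float:
    """Return the speed limit for the current control cycle."""
    if view is None:
        # No connection at all: rely on local sensors only.
        return LOCAL_LIMIT_KMH
    age_s = now_s - view.timestamp_s
    if age_s < 0 or age_s > MAX_DATA_AGE_S:
        # Data is too old (or the clocks disagree): treat it as unavailable.
        return LOCAL_LIMIT_KMH
    if not view.road_clear:
        return LOCAL_LIMIT_KMH
    return CLOUD_LIMIT_KMH
```

The function defaults to the local limit in every doubtful case, so a missing or late message can only reduce the speed, never raise it.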

Moreover, transitions between different modes of operation are critical. The design must take into account the time it takes for the system to detect and recover from malfunctions, including any delays. The system must constantly monitor its ability to safely detect faults and decelerate. In our example here, the cloud service provides information regarding safe passing distances while the local vehicle system calculates the safe velocity.
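To make the split of responsibilities concrete, the following sketch shows one way the vehicle could derive a safe velocity from a distance supplied by the cloud. It assumes a deliberately conservative model that is not taken from the article: the vehicle must be able to come to a full stop within the free distance d, after a fault detection delay t, with constant deceleration a. Solving d = v·t + v²/(2a) for v gives the formula used below. The function name and parameters are hypothetical.

```python
import math


def safe_velocity_ms(free_distance_m: float,
                     detection_delay_s: float,
                     max_decel_ms2: float) -> float:
    """Largest speed (m/s) from which the vehicle can still stop in time.

    Assumed model: travel at constant speed during the detection delay,
    then brake with constant deceleration until standstill.
    """
    if max_decel_ms2 <= 0 or detection_delay_s < 0:
        raise ValueError("deceleration must be > 0 and delay must be >= 0")
    if free_distance_m <= 0:
        return 0.0
    a, t, d = max_decel_ms2, detection_delay_s, free_distance_m
    # Positive root of v^2 / (2a) + v*t - d = 0.
    return a * (math.sqrt(t * t + 2.0 * d / a) - t)
```

A longer detection delay directly lowers the permitted speed, which is why the delays of fault detection and mode transitions have to be part of the design.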

The cloud service in this example extends the sensor range and must therefore provide information that is as reliable and timely as that from local sensors. Interoperability is required to allow the use of a common road infrastructure and cloud services by vehicles from different manufacturers. In addition to compatible APIs, the systems must coordinate guarantees and demands for safety. The new connections also create new targets for attacks on the vehicle subsystems, which increases the potential for putting human lives at risk. Each level of the system – from the embedded subsystems and road infrastructure to the cloud services – must therefore ensure security and mitigate possible safety risks. To maintain continuous operation of the vehicle, updates have to be performed without interrupting the functionality of the system.

Safety requires resilient systems

Our approach to overcoming the above challenges is to find a balance between resilience, flexibility and intelligence. Safety, the most important requirement, must always be ensured. This requires that the system is resilient enough to handle unfavourable conditions, such as cloud unavailability, by downgrading to local operation or performing a minimal risk manoeuvre, e.g. safely stopping the vehicle. It also has to be intelligent enough to continuously adapt to the current environment and to unexpected situations. Reliability enables uninterrupted operation and an excellent user experience. On the other hand, while users of such vehicles expect safety and reliability, it is the performance and efficiency of the vehicle, which require flexibility for easy reconfiguration, that influence their experience the most. Our approach is to find a compromise between these demands given the system requirements and the available resources.

We propose several techniques to achieve this compromise, the first of which is weakness-driven requirements refinement. This entails an iterative process to uncover the system’s weaknesses – in other words, any deviation from the system’s intended function – and integrate countermeasures into the system. With this technique, satisfactory levels of safety and efficiency can be achieved during the design phase. In our example, one of the primary risks is loss of connectivity, which can be resolved by monitoring the status of the connection and using degradation to transition to local operation if necessary.

Our services

Cloud-based approaches often promise manifold advantages over local solutions. However, additional information is usually required in order to decide which option to take. We can create an (abstract) case study by executing the design process and exploring specific scenarios, or we can contribute our experience with similar cases in the past.

The second technique is adaptable architectures, which allow changes to the system structure over time, depending on the availability of services and resources. In our example, the system dynamically changes its boundaries from a single vehicle to a system of systems consisting of multiple vehicles, edge systems and cloud networks, thus increasing flexibility and performance.

Another technique is graceful upgrades and downgrades, which allow the system to keep operating, albeit on a limited basis, when services or resources fail or become unavailable. Switching between cloud and local operation without interruption provides the best possible performance in the current situation. We also propose safe handling of updates during the system life cycle, so that software updates can be completed without interrupting the system. Finally, our approach focuses on increasing the system’s self-awareness, meaning its capability to assess itself in order to detect and manage failures properly, thus maintaining the required performance and safety regardless of the situation. In our example, constant monitoring of the connectivity and of potential faults is required to successfully adapt the system to the current conditions.
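Graceful upgrades and downgrades can be illustrated with a small mode manager. This is a minimal sketch under stated assumptions, not the architecture described in this article: the three modes, the class `ModeManager`, the health flags `cloud_ok` and `local_ok`, and the rule that an upgrade needs several consecutive healthy cycles are all hypothetical choices. The flags stand in for the result of the system’s self-assessment.

```python
from enum import Enum


class Mode(Enum):
    MINIMAL_RISK = 0    # e.g. bring the vehicle to a safe stop
    LOCAL = 1           # on-board sensors only
    CLOUD_ASSISTED = 2  # extended view from the cloud service


class ModeManager:
    """Downgrade immediately, upgrade only after a stable healthy period."""

    def __init__(self, upgrade_after: int = 5) -> None:
        self.mode = Mode.LOCAL
        self._upgrade_after = upgrade_after  # assumed hysteresis, in cycles
        self._healthy_streak = 0

    def step(self, cloud_ok: bool, local_ok: bool) -> Mode:
        """Evaluate one monitoring cycle and return the resulting mode."""
        if not local_ok:
            # Local functions are the safety baseline: fall back fully.
            self.mode = Mode.MINIMAL_RISK
            self._healthy_streak = 0
            return self.mode
        if self.mode is Mode.MINIMAL_RISK:
            # Assumption: local recovery allows a return to local operation.
            self.mode = Mode.LOCAL
        if not cloud_ok:
            # Cloud lost or too slow: downgrade without delay.
            self.mode = Mode.LOCAL
            self._healthy_streak = 0
            return self.mode
        self._healthy_streak += 1
        if self._healthy_streak >= self._upgrade_after:
            self.mode = Mode.CLOUD_ASSISTED
        return self.mode
```

The asymmetry is intentional: a single bad cycle is enough to downgrade, whereas an upgrade requires a run of healthy cycles, which avoids rapid toggling between modes when the connection is unstable.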


This activity is being funded by the Bavarian Ministry for Economic Affairs, Regional Development and Energy as part of a project to support the thematic development of the Institute for Cognitive Systems.
