Safetronic 2024: Preview
Explainable Statistical Evaluation of an Automated Driving System Functionality

Algorithms used in automated driving systems are complex and use non-deterministic deep neural networks. Some variants of deep neural networks can be explained theoretically, but their behavior across all safety-critical scenarios that the automated driving system must cover in its operational design domain is generally not well understood. This article outlines a new approach to how explainability could work.

Frauke Blossey

Rainer Faller

September 6, 2024

Explainability requires models that can be explained under all operating conditions. Physical models are typically much easier to explain and understand than abstract statistical data models. Either the automated driving system (ADS) vendor or independent functional safety and SOTIF assessors need to confirm that the ADS meets the Tolerable Risk Target (TRT). Hence the concept we propose is not to explain the ADS algorithms and their implementation, but to keep the physical and statistical analysis and testing arguments independent of the complex algorithms and easy to explain. Powerful tools for this are directed acyclic Bayesian Networks (BN) and Monte Carlo simulation (MCS), used as extensions of the well-known fault tree analysis (FTA).

How safe must ADS be – Tolerable Risk Target

The European Commission states in a report [1] that Connected and Automated Vehicles must decrease, or at least not increase, harm compared to conventional driving. This principle is commonly called Positive Risk Balance. The rate of fatalities on the road is therefore a reasonable starting point for a TRT.

The traffic accident data of the Federal Statistical Office of Germany (DESTATIS) [2] shows for the years 2014 to 2019 (excluding the COVID-19 lockdown) an average rate of 1.41 × 10^-7/h of fatalities and 2.3 × 10^-6/h of serious injuries on German highways, assuming a mean velocity of 100 km/h. Further studies [4] show that 15 % of serious injuries may lead to death in hospital. Consequently, the sum of the frequencies of safety goal violations resulting in fatalities shall be less than the actual risk derived from the DESTATIS data. Some fatalities could be caused by random hardware failures of the ADS, so the hardware failure rate, typically 1 × 10^-8/h for ISO 26262-5 ASIL D [6], is deducted from the TRT once for all safety-critical scenarios (SCS).
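The arithmetic behind this budget can be sketched as follows. This is a minimal sketch of one possible reading of the paragraph above: the constants are the figures quoted in the text, while the function names and the way the terms are combined (adding the share of serious injuries that may end fatally, then deducting the hardware failure rate once) are assumptions made for illustration, not the authors' published calculation.

```python
# Figures quoted in the text (German highways, 2014 to 2019, mean velocity 100 km/h).
FATALITY_RATE_PER_H = 1.41e-7        # fatalities per hour
SERIOUS_INJURY_RATE_PER_H = 2.3e-6   # serious injuries per hour
DEATH_IN_HOSPITAL_SHARE = 0.15       # share of serious injuries that may lead to death [4]
HARDWARE_FAILURE_RATE_PER_H = 1e-8   # typical ISO 26262-5 ASIL D value [6]
MEAN_VELOCITY_KMH = 100.0


def fatal_outcome_rate(fatalities, serious_injuries, hospital_share):
    """Assumed combination: direct fatalities plus serious injuries that end fatally."""
    return fatalities + hospital_share * serious_injuries


def remaining_budget(reference_rate, hardware_rate):
    """Deduct the random hardware failure rate once from the reference rate."""
    return reference_rate - hardware_rate


def per_hour_to_per_km(rate_per_h, velocity_kmh):
    """Convert an hourly rate into a per-kilometre rate at a constant mean velocity."""
    return rate_per_h / velocity_kmh


human_rate = fatal_outcome_rate(
    FATALITY_RATE_PER_H, SERIOUS_INJURY_RATE_PER_H, DEATH_IN_HOSPITAL_SHARE
)
budget = remaining_budget(human_rate, HARDWARE_FAILURE_RATE_PER_H)

print(f"human reference rate:            {human_rate:.2e} /h")
print(f"budget after hardware deduction: {budget:.2e} /h")
print(f"fatalities per km at 100 km/h:   "
      f"{per_hour_to_per_km(FATALITY_RATE_PER_H, MEAN_VELOCITY_KMH):.2e}")
```

How such an overall budget is allocated to individual scenarios is outside the scope of this sketch.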

Of the multiple highway scenarios, one SCS with a high demand frequency is considered here, with an interim TRT of 2 × 10^-8/h, because SCS with a low demand frequency have minimal impact on dangerous failure rates [3].

Explainable Statistical Evaluation

Exida considers the European Commission requirement for a quantitative risk target helpful. Throughout the history of functional safety, quantification has helped to focus design improvements on the weak parts of a system. However, the SOTIF standard, ISO 21448, describes quantitative methods only in its informative Annex C.

To close the gap in the quantitative analysis of statistical and AI models, exida proposes the workflow shown in Fig. 1. It comprises the following activities, some of which are explained in more detail below:

(1) ADS item and function definition.
(2) Definition of the operational design domain (ODD), the TRT, and the SCS.
(3) Physical model for each SCS and analysis of the dominant factors of the physical model.
(4) Collection of data, with particular care taken over the dominant factors.
(5) Bayesian Network modelling that refines the FTA and the physical model; MCS using the collected data distributions.
(6) Analysis of the dominant factors as per the Bayesian Network and MCS. If needed, collection of data for additional dominant factors.
(7) If the results of the Bayesian Network and MCS show that the hypothesis that the ADS is worse than human drivers cannot be rejected, improve the ADS.
(8) Otherwise, use the physical model and Bayesian Network as a reference model for the statistical testing of the ADS implementation.

Figure 1: Explainable Statistical Evaluation workflow

Assumed ADS architecture

We assume a 3-layer ADS architecture:

Layer 1: Control subsystem.
Layer 2: Protective subsystem.
Layer 3: Traffic monitoring and warning subsystem.

High demand frequency scenario – vehicle cut-in

The Bayesian Network (Fig. 2) is used to model and evaluate the dependencies between the scenario variables (traffic and environmental conditions, ADS safety performance, accident severity). Injuries are determined by a crash model that yields an injury level from IL0 to IL4 (IL0 = no impact, IL1 = property damage, IL2 = minor injury, IL3 = serious injury, IL4 = fatality).

Figure 2: MCS of the Bayesian Network with fault-free ADS and exemplary full-range distributions
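To make the idea of sampling a Bayesian Network concrete, the following is a minimal sketch of ancestral (forward) sampling through a tiny directed acyclic network that ends in an injury level. The node names, the network structure and every probability are invented for illustration; they are not taken from the network in Fig. 2 or from any fleet data.

```python
import random
from collections import Counter

# Hypothetical conditional probability tables (illustrative values only).
P_DENSE_TRAFFIC = 0.3
P_HARD_CUT_IN = {True: 0.2, False: 0.05}       # given dense traffic yes/no
P_ADS_AVOIDS = {True: 0.99, False: 0.9999}     # given hard cut-in yes/no
P_INJURY_IF_IMPACT = {"IL1": 0.6, "IL2": 0.3, "IL3": 0.08, "IL4": 0.02}
INJURY_LEVELS = ("IL0", "IL1", "IL2", "IL3", "IL4")


def sample_injury_level(rng):
    """Sample each node after its parents (topological order) and return the leaf."""
    dense = rng.random() < P_DENSE_TRAFFIC
    hard_cut_in = rng.random() < P_HARD_CUT_IN[dense]
    avoided = rng.random() < P_ADS_AVOIDS[hard_cut_in]
    if avoided:
        return "IL0"  # no impact
    levels = list(P_INJURY_IF_IMPACT)
    weights = list(P_INJURY_IF_IMPACT.values())
    return rng.choices(levels, weights=weights, k=1)[0]


def simulate(n_runs, seed=0):
    """Return the relative frequency of each injury level over n_runs samples."""
    rng = random.Random(seed)
    counts = Counter(sample_injury_level(rng) for _ in range(n_runs))
    return {level: counts[level] / n_runs for level in INJURY_LEVELS}


frequencies = simulate(100_000)
for level, share in frequencies.items():
    print(f"{level}: {share:.5f}")
```

Because each node is sampled only after its parents, the dependencies stay visible in the code, which is the property that makes this kind of model easy to review.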

Analysis of the dominant factors per the physical model

To focus the data collection effort (activity (4) of Fig. 1), an initial analysis of the dominant factors per the physical model is performed (activity (3)). The analysis numerically computes the partial derivative of the model output with respect to each variable, with the other variables held at different operating set-points. Note: This follows the concept of the Birnbaum importance measure used in FTA [5]. The variables with high importance are the dominant factors. A positive value means that increasing the variable increases the risk, a negative value means that increasing the variable decreases the risk, and zero means that the variable does not contribute to the risk according to the physical model.
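A numerical partial derivative at several set-points could look like the sketch below. The physical model used here, a constant-deceleration estimate of the impact speed after a cut-in, is a hypothetical stand-in chosen for illustration, as are all variable names and set-point values; it is not the authors' model.

```python
import math


def impact_speed(gap_m, closing_speed_mps, reaction_time_s, decel_mps2):
    """Hypothetical model: relative impact speed (m/s), lead vehicle at constant speed."""
    remaining_gap = gap_m - closing_speed_mps * reaction_time_s
    if remaining_gap <= 0.0:
        return closing_speed_mps  # impact before braking starts
    v_squared = closing_speed_mps ** 2 - 2.0 * decel_mps2 * remaining_gap
    return math.sqrt(v_squared) if v_squared > 0.0 else 0.0


def sensitivities(model, set_point, rel_step=1e-3):
    """Central-difference partial derivative for each variable at one set-point."""
    result = {}
    for name, value in set_point.items():
        h = abs(value) * rel_step or rel_step
        upper = dict(set_point, **{name: value + h})
        lower = dict(set_point, **{name: value - h})
        result[name] = (model(**upper) - model(**lower)) / (2.0 * h)
    return result


# Two illustrative set-points: a short gap and a comfortable gap.
SET_POINTS = [
    {"gap_m": 10.0, "closing_speed_mps": 10.0, "reaction_time_s": 0.5, "decel_mps2": 6.0},
    {"gap_m": 30.0, "closing_speed_mps": 10.0, "reaction_time_s": 0.5, "decel_mps2": 6.0},
]

for point in SET_POINTS:
    print(f"gap = {point['gap_m']} m")
    for name, slope in sensitivities(impact_speed, point).items():
        print(f"  d(impact speed)/d({name}) = {slope:+.3f}")
```

At the short-gap set-point the gap and the deceleration come out negative and the closing speed and reaction time positive, while at the comfortable gap all derivatives are zero because this toy model predicts no impact there. This is why the derivative has to be evaluated at several set-points.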

Monte-Carlo simulations

The variables in the Bayesian Network have been populated with experience-based distributions; see Fig. 2 for examples and the resulting injury levels. Distributions from fleet data evaluation should be collected for the dominant factors (activity (4) of Fig. 1), but were not released for publication by OEMs. The results of the MCS have been compared with the null hypothesis that "the ADS is equal to or worse than the human drivers", i.e., that it does not provide a positive risk balance (activities (5) and (7)). The MCS of the Bayesian Network shows the following results:

The null hypothesis cannot be rejected for the entire range of variables, i.e., the TRT is far from being reached.
If the ranges of the dominant variables with a negative impact are dynamically limited, the null hypothesis can be rejected at a 95 % upper confidence limit; a one-sample t-test yields a p-value < 1 × 10^-34.
The importance of the variables generally matches the importance analysis of the physical model if the ranges of the variables are very limited, as under Swedish regulations, or very open, as on German highways. In addition, the MCS identifies the importance of the absolute velocity.
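The comparison against the null hypothesis can be illustrated with the sketch below. It uses synthetic log-normal samples as a stand-in for per-batch MCS results, and a normal approximation (reasonable for large sample counts) in place of the exact t-distribution, so that only the Python standard library is needed. The function names, the sample generator and its medians are assumptions; only the interim TRT of 2 × 10^-8/h is taken from the text.

```python
import math
import random
import statistics
from statistics import NormalDist

INTERIM_TRT_PER_H = 2e-8  # interim target quoted in the text


def upper_limit_test(samples, reference_rate, confidence=0.95):
    """One-sided test of H0: mean rate >= reference_rate (normal approximation)."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(n)
    z = NormalDist().inv_cdf(confidence)
    upper_limit = mean + z * std_error
    return {
        "mean": mean,
        "upper_limit": upper_limit,
        "t_statistic": (mean - reference_rate) / std_error,
        "reject_null": upper_limit < reference_rate,
    }


def synthetic_mcs_rates(median_rate, n_batches=1000, seed=1):
    """Hypothetical stand-in for MCS output: log-normal rates around a median."""
    rng = random.Random(seed)
    return [rng.lognormvariate(math.log(median_rate), 0.5) for _ in range(n_batches)]


# Illustrative medians only: limited ranges versus the full range of variables.
limited_result = upper_limit_test(synthetic_mcs_rates(5e-9), INTERIM_TRT_PER_H)
full_result = upper_limit_test(synthetic_mcs_rates(2e-7), INTERIM_TRT_PER_H)

for label, result in (("limited ranges", limited_result), ("full ranges", full_result)):
    print(f"{label}: upper limit {result['upper_limit']:.2e} /h, "
          f"reject H0: {result['reject_null']}")
```

The null hypothesis is rejected only when the whole upper confidence limit lies below the reference rate, which keeps the burden of proof on the ADS.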

Conditions to meet the Tolerable Risk Target

In contrast to the FTA in [3], the MCS of the Bayesian Network shows that the TRT can be met with a diverse redundant ADS architecture, provided the environmental and traffic conditions are favorable.

Some of the dominant factors and conditions are part of the static and geographic ODD definition and monitoring, but others are dynamic in nature. To ensure that the dynamic dominant factors are monitored not only for the ego vehicle but also for the surrounding vehicles, a dynamic traffic monitor is proposed. Cut-in speed difference, time to collision (TTC) and cut-in frequency can be estimated by monitoring the dynamic traffic flow on adjacent lanes. If the estimated ranges of the dominant factors fall outside the ranges assumed for the safety and SOTIF case, then the SAE L3 ADS should alert the driver and fall back to SAE L2+, while still offering its full functionality to the alerted driver (layer 3 of the assumed ADS architecture). Aspects of such a dynamic traffic monitor functionality are implemented by OEMs.
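The fallback rule described above can be sketched as a simple range check. The factor names, the assumed ranges and the mode labels below are hypothetical placeholders, not values from an actual safety or SOTIF case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value):
        return self.low <= value <= self.high


# Hypothetical ranges assumed in the safety and SOTIF case.
ASSUMED_RANGES = {
    "cut_in_speed_difference_mps": Range(0.0, 8.0),
    "time_to_collision_s": Range(2.0, float("inf")),
    "cut_in_frequency_per_h": Range(0.0, 20.0),
}


def violated_factors(estimates, ranges=ASSUMED_RANGES):
    """List the dominant factors whose estimate is missing or outside its range."""
    violations = []
    for name, allowed in ranges.items():
        value = estimates.get(name)
        if value is None or not allowed.contains(value):
            violations.append(name)
    return violations


def select_mode(estimates):
    """Stay in SAE L3 only while every dominant factor is inside its assumed range."""
    violations = violated_factors(estimates)
    if violations:
        return "SAE L2+ (driver alerted)", violations
    return "SAE L3", violations


calm = {"cut_in_speed_difference_mps": 3.0, "time_to_collision_s": 4.5,
        "cut_in_frequency_per_h": 6.0}
aggressive = dict(calm, time_to_collision_s=1.2)

print(select_mode(calm))
print(select_mode(aggressive))
```

Treating a missing estimate as a violation is a conservative design choice in this sketch: if the monitor cannot observe a dominant factor, it cannot confirm that the assumptions of the safety case still hold.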

Safetronic 2024

Learn more about Safetronic 2024 and register now.

Safetronic Website

Diverse redundant safety architectures may meet the TRT for SAE ≥ L3 if the ranges of the dominant factors that have a negative impact on the violation of the safety goal are monitored and limited. This requires dynamic traffic condition monitoring (dynamic ODD), aspects of which are implemented by OEMs.

Neither the FTA nor the Bayesian Network modelling herein gives evidence that architectures with little or no diverse redundancy, including sensor diversity between the control and protection subsystems, can meet the TRT. Likewise, architectures with a high potential for common cause failures between the control and protection subsystems, such as sensors sensitive to high-noise or low-contrast scenarios, have not been shown to meet the TRT.

Exida intends to prompt an open discussion on safety and SOTIF reasoning using statistical argumentation based on models that are easy to understand by functional safety and SOTIF practitioners and assessors.

Magazine of the Fraunhofer Institute for Cognitive Systems IKS
