Artificial Intelligence and Safety: is it all about data?

HungaroControl, 2021. 04. 15.

Zoltan Molnar – HungaroControl

Automation is an important phenomenon in many areas of our lives, and it has recently been gaining momentum in the ATM industry. Although automation is not restricted to Artificial Intelligence (AI) and Machine Learning (ML) solutions, the promising performance of AI in numerous fields can significantly accelerate the process. ML has not yet been widely used in safety-critical applications in our industry, but this will change soon, as can be seen in other sectors (e.g. autonomous vehicles). ML-based solutions are expected to become widespread in this decade, predominantly to assist human operators and help them raise performance and safety standards.

For “traditional” software products, safety assurance seems to be a straightforward process: the necessary methods and standards already exist (e.g. ED-153, ED-109A), and ANSPs and manufacturers have significant experience with them. Assurance can be tricky, however, at higher automation levels, when the software directly supports decision-making or even makes decisions on its own. The vast majority of software used in ATM should meet SWAL 3 or 4 requirements. For SWAL 1, the software development process is much more complex. SWAL 1 is so demanding that it is unlikely anyone would be willing to do it today for a reasonable price (although this depends heavily on the complexity of the tool).

The use of ML-based applications in ATM raises additional questions, because data has a pivotal role in ML development: the operation is based on patterns and rules identified in training datasets. This is one of the greatest benefits of ML, as it can identify patterns that are not visible to experts. However, it can also be a weakness, because the behaviour of the application is seldom fully understandable and explainable to human operators. Almost every ML method contains a certain degree of “black box” behaviour and faces explainability issues. Existing software assurance techniques are based on the assumption that the software is fully explainable and that the functional and safety requirements are traceable throughout the development lifecycle, so their applicability to ML is limited. The data and the algorithm are intertwined: an inadequate training dataset can ruin the whole application, while improper validation and test datasets weaken the assurance process. A paradigm shift is inevitable, and we have to reconsider how we think about software safety assurance.

The solution depends on the criticality of the task that the ML component executes. At the ML component level, actions can be taken in several phases of the development process, mostly in the model and data selection, data preparation and learning assurance phases. ML methods differ in how explainable they are: regressions and decision trees are good examples of the more explainable methods, deep neural networks of the less explainable ones. Explainable AI (XAI) is also a developing and promising field, which can help interpret AI models. During the system definition phase, the safety criticality of the application shall be considered, and the ML method should be selected accordingly. The size, correctness and representativeness of the training, validation and test datasets are key factors in ML safety, and existing concepts such as the Data Assurance Level (DAL) can be part of the solution. Training datasets shall cover abnormal and non-nominal situations, to support balanced operational performance and safety in a dynamically changing environment. During training, the loss function can be tailored to take safety aspects into account.
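As an illustration of a safety-tailored loss function, the sketch below weights a binary cross-entropy so that a missed hazard costs more than a false alarm. The function name, the label convention (1 = hazard present) and the weight of 5.0 are assumptions made for this example only; they do not come from any standard or from an operational system.

```python
import math


def safety_weighted_log_loss(y_true, y_prob, miss_weight=5.0, eps=1e-12):
    """Binary cross-entropy in which missed hazards are penalized more.

    y_true: iterable of 0/1 labels (1 = hazard present, an assumed convention).
    y_prob: predicted probabilities of the hazard class.
    miss_weight: hypothetical factor applied to the loss of hazard samples.
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            # Low probability on a real hazard is the costly error.
            total += -miss_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)
```

With `miss_weight=1.0` this reduces to the ordinary log loss. Choosing the weight is itself a safety decision and would have to be justified in the safety assessment rather than tuned for accuracy alone.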

At the sub-system level, several ML components can be used in parallel for the same task, and a voting logic can be applied to their outputs. An ML component can also be paired with a non-ML component in order to enhance safety.
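A minimal sketch of such a voting logic, paired with a non-ML check, could look like the following. The labels ("CONFLICT", "CLEAR", "REFER_TO_HUMAN"), the agreement threshold and the rule that the deterministic check always wins are hypothetical choices for illustration, not a description of any deployed architecture.

```python
from collections import Counter


def vote(predictions, min_agreement=2, fallback="REFER_TO_HUMAN"):
    """Return the majority label, or the fallback when agreement is too weak."""
    if not predictions:
        return fallback
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    # A tie between the two most frequent labels is treated as no decision.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return fallback
    return top_label if top_count >= min_agreement else fallback


def guarded_output(ml_predictions, rule_says_conflict):
    """Combine the ML vote with a deterministic (non-ML) monitor.

    The rule-based monitor is assumed to be conservative: if it flags a
    conflict, its result overrides whatever the ML components voted.
    """
    if rule_says_conflict:
        return "CONFLICT"
    return vote(ml_predictions)


# Example: three ML components disagree, the rule-based check is silent.
print(guarded_output(["CLEAR", "CLEAR", "CONFLICT"], rule_says_conflict=False))
```

Returning a fallback on ties or weak agreement hands the decision back to the human operator instead of forcing an answer, which is one possible way to keep the ML components in a supporting role.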

At the functional system level, the automation of specific tasks can strongly influence human-machine cooperation, and the role of the human in the system altogether. ML applications aimed at decision support or resolution advisories might decrease ATCO workload on the one hand, but also decrease their situational awareness on the other. These applications help the ATM system handle greater and more complex traffic, but potentially make the system less resilient at the same time.

These problems seem to be solvable, but further studies should be conducted to analyse and minimize the effect of automation and ML on safety and system resilience, and situational awareness should be reinterpreted in the new environment. One can even imagine that the certification of AI models will be somewhat similar to the evaluation of human operators: the model should execute its tasks in a simulated or real environment (e.g. in passive shadow mode) for a pre-defined time period. Testing and validation at different levels of abstraction of the functional system, together with performance monitoring, are also very important in ML safety assurance.
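To make the shadow-mode idea concrete, the sketch below compares logged model advisories with the actions the operator actually took over an evaluation period. The record fields, the action labels and the 0.95 agreement threshold are assumptions for this example; a real evaluation would also need to judge who was right in each disagreement, which simple agreement counting cannot do.

```python
from dataclasses import dataclass


@dataclass
class ShadowRecord:
    case_id: str          # hypothetical identifier of the traffic situation
    operator_action: str  # what the human operator actually did
    model_advice: str     # what the model would have advised (not acted on)


def shadow_report(records, min_agreement=0.95):
    """Summarize how often the passive model matched the operator."""
    if not records:
        raise ValueError("no shadow-mode records to evaluate")
    disagreements = [
        r.case_id for r in records if r.operator_action != r.model_advice
    ]
    agreement = 1.0 - len(disagreements) / len(records)
    return {
        "agreement": agreement,
        "disagreements": disagreements,  # cases to review manually
        "meets_threshold": agreement >= min_agreement,
    }
```

The list of disagreeing cases is arguably the most useful output: each one is a candidate for expert review, much as an instructor would debrief a trainee after a session.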

In the meantime, it is very important to understand our current functional system even better, and to identify the main contributing factors to safety and resilience as the Safety-II approach spreads, so that we can change (or sometimes not change) the system wisely. This is one more task in which data can help us.

About the author:

Zoltan Molnar is an accomplished safety and risk management expert at HungaroControl. He is responsible for the safety assessment of HungaroControl’s remote tower program in Budapest, and he also plays a pivotal role in remote tower related consultancy and R&D activities. He is active in other ATM projects, such as cross-border free route airspace implementation, and participates in ATM safety monitoring activities and AI safety working groups.