ISO/IEC 24029-2:2023
Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods
This document provides methodology for the use of formal methods to assess robustness properties of neural networks. The document focuses on how to select, apply and manage formal methods to prove robustness properties.
Intelligence artificielle (IA) — Evaluation de la robustesse de réseaux neuronaux — Partie 2: Méthodologie pour l'utilisation de méthodes formelles
General Information
Overview
ISO/IEC 24029-2:2023 - "Artificial intelligence (AI) - Assessment of the robustness of neural networks - Part 2: Methodology for the use of formal methods" provides a structured methodology for applying formal methods to prove robustness properties of neural networks. It gives guidance on selecting, applying and managing formal verification techniques (e.g. solvers, abstract interpretation, reachability analysis, model checking) to obtain mathematical proofs, counterexamples or inconclusive results about network behavior. It is intended to complement statistical testing and increase trust in neural network robustness across the AI life cycle.
Key topics and technical requirements
- Robustness assessment framework: Definitions and criteria for robustness-related properties including stability, sensitivity, relevance and reachability, and how to express these as formal requirements.
- Domain and attributes: Guidance on defining the input domain for verification (attributes, bounding conditions) so formal analysis targets relevant operating regions.
- Applicability of formal methods: Considerations for different neural network architectures and input data types, and mapping of techniques to tasks:
  - Solvers (SMT/LP/MILP) for exact reasoning where feasible
  - Abstract interpretation for scalable over-approximations
  - Reachability analysis (deterministic and non-deterministic environments)
  - Model checking for stateful or symbolic models
- Life-cycle integration: Recommendations for applying formal methods during design & development, verification & validation, deployment and operation/monitoring - including identifying recognized features, checking separability, covering input domains, and measuring perturbation impact.
- Practical constraints: Notes on scalability and trade-offs between precision and computational cost; formal methods may be complementary rather than universally decisive.
Practical applications and intended users
ISO/IEC 24029-2:2023 is targeted at:
- AI developers and ML engineers seeking formal assurance of neural network behavior.
- Verification and safety engineers in safety-critical domains (autonomous driving, medical devices, industrial control).
- Regulatory and compliance teams needing documented, rigorous robustness evidence.
- Researchers and tool developers building formal verification tools for neural networks.
Typical applications:
- Proving that small input perturbations do not change classifier labels (adversarial robustness).
- Verifying reachability properties for control systems.
- Complementing statistical testing during V&V and monitoring robustness changes in operation.
Related standards
- ISO/IEC TR 24029-1 (overview of robustness assessment techniques)
- ISO/IEC 22989:2022 (AI concepts and terminology)
- ISO/IEC 23053:2022 (framework for AI systems using ML)
- ISO/IEC 25059:2023 (AI system quality model - robustness as reliability sub-characteristic)
Keywords: ISO/IEC 24029-2:2023, formal methods, robustness assessment, neural networks, formal verification, model checking, abstract interpretation, reachability analysis.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 24029-2
First edition
2023-08
Artificial intelligence (AI) — Assessment of the robustness of neural networks —
Part 2: Methodology for the use of formal methods
Intelligence artificielle (IA) — Evaluation de la robustesse de réseaux neuronaux —
Partie 2: Méthodologie pour l'utilisation de méthodes formelles
© ISO/IEC 2023
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Robustness assessment
  5.1 General
  5.2 Notion of domain
  5.3 Stability
    5.3.1 Stability property
    5.3.2 Stability criterion
  5.4 Sensitivity
    5.4.1 Sensitivity property
    5.4.2 Sensitivity criterion
  5.5 Relevance
    5.5.1 Relevance property
    5.5.2 Relevance criterion
  5.6 Reachability
    5.6.1 Reachability property
    5.6.2 Reachability criterion
6 Applicability of formal methods on neural networks
  6.1 Types of neural network concerned
    6.1.1 Architectures of neural networks
    6.1.2 Neural networks input data type
  6.2 Types of formal methods applicable
    6.2.1 General
    6.2.2 Solver
    6.2.3 Abstract interpretation
    6.2.4 Reachability analysis in deterministic environments
    6.2.5 Reachability analysis in non-deterministic environments
    6.2.6 Model checking
  6.3 Summary
7 Robustness during the life cycle
  7.1 General
  7.2 During design and development
    7.2.1 General
    7.2.2 Identifying the recognized features
    7.2.3 Checking separability
  7.3 During verification and validation
    7.3.1 General
    7.3.2 Covering parts of the input domain
    7.3.3 Measuring perturbation impact
  7.4 During deployment
  7.5 During operation and monitoring
    7.5.1 General
    7.5.2 Robustness on a domain of operation
    7.5.3 Changes in robustness
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance
are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria
needed for the different types of document should be noted. This document was drafted in
accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or
www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of
any claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC
had not received notice of (a) patent(s) which may be required to implement this document. However,
implementers are cautioned that this may not represent the latest information, which may be obtained
from the patent database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall
not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 42, Artificial intelligence, in collaboration with the European Committee for
Standardization (CEN) Technical Committee CEN/CLC/JTC 21, Artificial Intelligence, in accordance with
the Agreement on technical cooperation between ISO and CEN (Vienna Agreement).
A list of all parts in the ISO/IEC 24029 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
Neural networks are widely used to perform complex tasks in various contexts, such as image or natural language processing and predictive maintenance. AI system quality models comprise certain characteristics, including robustness. For example, ISO/IEC 25059:2023,[1] which extends the SQuaRE International Standards[2] to AI systems, considers in its quality model that robustness is a sub-characteristic of reliability. Demonstrating the ability of a system to maintain its level of performance under varying conditions can be done using statistical analysis, but proving it requires some form of formal analysis. In that regard, formal methods can be complementary to other methods in order to increase trust in the robustness of the neural network.
Formal methods are mathematical techniques for rigorous specification and verification of software
and hardware systems with the goal to prove their correctness. Formal methods can be used to
formally reason about neural networks and prove whether they satisfy relevant robustness properties.
For example, consider a neural network classifier that takes as input an image and outputs a label from
a fixed set of classes (such as car or airplane). Such a classifier can be formalized as a mathematical
function that takes the pixel intensities of an image as input, computes the probabilities for each
possible class from the fixed set, and returns a label corresponding to the highest probability. This
formal model can then be used to mathematically reason about the neural network when the input
image is modified. For example, suppose when given a concrete image for which the neural network
outputs the label “car” the following question can be asked: “does the network output a different label
if the value of an arbitrary pixel in the image is modified?” This question can be formulated as a formal
mathematical statement that is either true or false for a given neural network and image.
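To make the question concrete, the following minimal Python sketch (the toy linear classifier and all names are hypothetical, not from the standard) enumerates single-pixel modifications and looks for one that flips the label. Note the limitation: a script can only sample candidate pixel values (here just the two intensity extremes), whereas a formal method reasons symbolically over every possible value and can return a proof rather than a mere failure to find a counterexample.

```python
import numpy as np

def classify(image: np.ndarray) -> int:
    """Toy stand-in for a trained classifier: one fixed linear layer
    followed by argmax over 10 class scores (weights are hypothetical)."""
    rng = np.random.default_rng(0)          # fixed seed = fixed "weights"
    w = rng.normal(size=(10, image.size))
    return int(np.argmax(w @ image.flatten()))

def single_pixel_counterexample(image: np.ndarray, levels=(0.0, 1.0)):
    """The question from the text, made executable: does setting any one
    pixel to a sampled value change the predicted label? Returns the
    first (pixel index, value) that flips the label, or None."""
    original = classify(image)
    for idx in range(image.size):
        for v in levels:
            perturbed = image.copy().reshape(-1)
            perturbed[idx] = v
            if classify(perturbed.reshape(image.shape)) != original:
                return idx, v
    return None

image = np.random.default_rng(1).uniform(size=(8, 8))
print(single_pixel_counterexample(image))
```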
A classical approach to using formal methods consists of three main steps that are described in this
document. First, the system to be analyzed is formally defined in a model that precisely captures all
possible behaviours of the system. Then, a requirement is mathematically defined. Finally, a formal
method, such as a solver, abstract interpretation or model checking, is used to assess whether the system
meets the given requirement, yielding either a proof, a counterexample or an inconclusive result.
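Hedged illustration: one common way to write such a requirement (the second step of the approach) as a formal statement, here for perturbations bounded in the infinity norm; the notation is generic and not taken from the standard.

```latex
% Step 2: the requirement as a formal statement. C(x) is the label the
% modelled network assigns, x_0 the given image and \varepsilon the
% allowed perturbation size (generic notation).
\forall x' \in \mathbb{R}^n :\quad
  \lVert x' - x_0 \rVert_\infty \le \varepsilon
  \;\Longrightarrow\; C(x') = C(x_0)
```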
This document covers several available formal method techniques. At each stage of the life cycle,
the document presents criteria that are applicable to assess the robustness of neural networks and
to establish how neural networks are verified by formal methods. Formal methods can have issues in terms of scalability; however, they are still applicable to all types of neural networks performing
various tasks on several data types. While formal methods have long been used on traditional software
systems, the use of formal methods on neural networks is fairly recent and is still an active field of
investigation.
This document is aimed at helping AI developers who use neural networks and who are tasked with
assessing their robustness throughout the appropriate stages of the AI life cycle. ISO/IEC TR 24029-1
provides a more detailed overview of the techniques available to assess the robustness of neural
networks, beyond the formal methods described in this document.
INTERNATIONAL STANDARD ISO/IEC 24029-2:2023(E)
Artificial intelligence (AI) — Assessment of the robustness
of neural networks —
Part 2:
Methodology for the use of formal methods
1 Scope
This document provides methodology for the use of formal methods to assess robustness properties of
neural networks. The document focuses on how to select, apply and manage formal methods to prove
robustness properties.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts
and terminology
ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 22989:2022,
ISO/IEC 23053:2022 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
domain
set of possible inputs to a neural network characterized by attributes of the environment
EXAMPLE 1 A neural network performing a natural language processing task manipulates texts composed
of words. Even though the number of possible different texts is unbounded, the maximum length of each sentence
is always bounded. An attribute describing this domain can therefore be the maximum length allowed for each
sentence.
EXAMPLE 2 Face capture domain requirements can rely on attributes such as the size of faces being at least 40 pixels by 40 pixels. Half-profile faces are detectable at a lower level of accuracy, provided most of the facial features are still visible. Similarly, partial occlusions are handled to some extent. Detection typically requires that more than 70 % of the face is visible. Views where the camera is at the same height as the face perform best, and performance degrades as the view moves more than 30 degrees above or 20 degrees below straight on.
Note 1 to entry: An attribute is used to describe a bounded object even though the domain can be unbounded.
3.2
attribute
property or characteristic of an object that can be distinguished quantitatively or qualitatively by
human or automated means
[SOURCE: ISO/IEC/IEEE 15939:2017, 3.2, modified — "entity" replaced with "object".]
3.3
bounded domain
set containing a finite number of objects
EXAMPLE 1 The domain of all valid 8-bit RGB images with n pixels is bounded by its size, which is at most 256^(3×n).
EXAMPLE 2 The number of all valid English sentences is infinite, therefore this domain is unbounded.
Note 1 to entry: The number of objects in an unbounded domain is infinite.
3.4
bounded object
object represented by a finite number of attributes
Note 1 to entry: Contrary to a bounded object, an unbounded object is represented with an infinite number of
attributes.
3.5
stability
extent to which the output of a neural network remains the same when its inputs are changed
Note 1 to entry: A more stable neural network is less likely to change its output when input changes are noise.
3.6
sensitivity
extent to which the output of a neural network varies when its inputs are changed
Note 1 to entry: A more sensitive neural network is more likely to change its outputs when input changes are informative.
3.7
architecture
fundamental concepts or properties of a system in its environment embodied in its elements,
relationships, and in the principles of its design and evolution
3.8
relevance
ordered relative importance of an input's impact on the output of a neural network as compared to all
other inputs
3.9
criterion
rule on which a judgment or decision can be based, or by which a product, service, result, or process can
be evaluated
[SOURCE: ISO/IEC/IEEE 15289:2019 3.1.6]
3.10
time series
sequence of values sampled at successive points in time
[SOURCE: ISO/IEC 19794-1:2011, 3.54]
3.11
reachability
property describing whether a set of states can be reached by an AI agent in a given environment
3.12
piecewise linear neural network
neural network using piecewise linear activation functions
Note 1 to entry: Examples of piecewise linear activation functions are the rectified linear unit (ReLU) or MaxOut.
3.13
binarized neural network
neural network having parameters that are primarily binary
3.14
recurrent neural network
neural network maintaining an internal state which encodes what the neural network has learned after
processing a subsequence of the input data
3.15
transformer neural network
transformer
neural network using a self-attention mechanism to weight the effect of different parts of the input data
during processing
3.16
model
formal expression of a theory
3.17
structural-based testing
glass-box testing
white-box testing
structural testing
dynamic testing in which the tests are derived from an examination of the structure of the test item
Note 1 to entry: Structure-based testing is not restricted to use at component level and can be used at all levels,
e.g. menu item coverage as part of a system test.
Note 2 to entry: Techniques include branch testing, decision testing, and statement testing.
[SOURCE: ISO/IEC/IEEE 29119-1:2022, 3.80]
3.18
closed-box testing
specification-based testing
black-box testing
testing in which the principal test basis is the external inputs and outputs of the test item, commonly
based on a specification, rather than its implementation in source code or executable software
[SOURCE: ISO/IEC/IEEE 29119-1:2022, 3.75]
4 Abbreviated terms
AI artificial intelligence
BNN binarized neural networks
GNN graph neural networks
MILP mixed-integer linear programming
MRI magnetic resonance imaging
PLNN piecewise linear neural networks
ReLU rectified linear unit
RNN recurrent neural networks
SAR synthetic aperture radar
SMC satisfiability modulo convex
SMT satisfiability modulo theories
5 Robustness assessment
5.1 General
In the context of neural networks, robustness specifications typically represent different conditions
that can naturally or adversarially change in the domain (see 5.2) in which the neural network is
deployed.
EXAMPLE 1 Consider a neural network that processes medical images, where inputs fed to the neural network
are collected with a medical device that scans patients. Taking multiple images of the same patient naturally
does not produce identical images. This is because the orientation of the patient can slightly change, the lighting
in the room can change, an object can be reflected or random noise can be added by image post-processing steps.
EXAMPLE 2 Consider a neural network that processes the outputs of sensors and onboard cameras of a
self-driving vehicle. Due to the dynamic nature of the outside world, such as weather conditions, pollution and
lighting conditions, the input to the neural network is expected to have wide variations of various attributes.
Importantly, these variations introduced by the environment are typically not expected to change
the neural network’s robustness. The robustness of the neural network can then be verified against
changes to such environmental conditions by using relevant proxy specifications within the neural
network’s domain of use.
Robustness properties can be local or global.[10] It is more common to verify local robustness properties
than global robustness properties, as the former are easier to specify. Local robustness properties are
specified with respect to a sample input from the test dataset. For example, given an image correctly
classified as a car, the local robustness property can specify that all images generated by rotating the
original image within 5 degrees are also classified as a car. A drawback of verifying local robustness
properties is that the guarantees are local to the provided test sample and do not extend to other samples
in the dataset. In contrast, global robustness properties define guarantees that hold deterministically over all possible inputs.[11] For domains where input features have semantic meaning, for example, air
traffic collision avoidance systems, the global properties can be specified by defining valid input values
for the input features expected in a real-world deployment. Defining meaningful input values is more
challenging in settings where the individual features have no semantic meaning. The set of robustness
properties described in this clause is not exhaustive and it is possible that new robustness properties
occur in the future.
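As a rough formalization (a common convention from the verification literature, not notation defined by this document), the two kinds of property can be written as follows, with F the classifier, f the underlying function, B_ε(x_0) a perturbation ball around a test sample and D the whole input domain:

```latex
% Local: a guarantee around one test sample x_0 only.
\text{Local:}\quad \forall x \in B_\varepsilon(x_0) :\; F(x) = F(x_0)

% Global: a guarantee over all pairs of close inputs in the domain D
% (a Lipschitz-style bound on the output variation).
\text{Global:}\quad \forall x, x' \in D :\;
  \lVert x - x' \rVert \le \varepsilon
  \;\Longrightarrow\; \lVert f(x) - f(x') \rVert \le \delta
```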
5.2 Notion of domain
Most AI systems, including neural networks, are intended to operate in a particular environment
where their performance characteristics can be defined and evaluated (typical metrics of evaluation
can be found in ISO/IEC TR 24029-1:2021, Table 1). Robustness, being one of the key performance
characteristics, is inseparable from the domain where a neural network is operating. The existence of
a bounded domain is implicit in many neural network applications (e.g. image classification expects
images of certain quality and in a certain format).
The agent paradigm shown in Figure 1 (reproduced from ISO/IEC 22989:2022, Figure 1) postulates that
an agent senses its environment and acts on this environment towards achieving certain goals. The
distinct concepts AI agent and environment are emphasized in this paradigm. The notion of domain
captures the limitations of current technology where a neural network, being a particular type of AI
agent, is technically capable of achieving its goal only if it is operating on appropriate inputs.
Figure 1 — The agent paradigm
The concept of domain rests on the following pillars:
— a domain shall be determined by a set of attributes which are clearly defined (i.e. the domain
contains bounded objects);
— the specification of domain should be sufficient for the AI system to conduct one or more given tasks
as intended;
— data used for training should be representative of data expected to be used for inference.
Establishing a domain involves specifying all data attributes essential for the neural network to be
capable of achieving its goal.
Several popular domains of application of neural networks cover applications in vision, speech
processing and robotics. To describe these domains, and more importantly their variability, the
attributes used are generally numerical. Examples include the shape of an object in an image, the
intensity of some pixels or the amplitude of an audio signal.
However, other domains can be expressed through non-numerical attributes including natural language
processing, graphs and Big Code (automated learning from existing code). In these cases,
the attributes can be non-numerical, for example, the words in a sentence or the edges in a graph.
The attributes allow the AI developer to generate another instance in the domain from an existing
instance. The attributes should be bounded in the robustness specification.
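A minimal Python sketch of these pillars, using made-up attribute names and the illustrative bounds from EXAMPLE 2 of 3.1 (face size, visibility, camera elevation); it only shows how clearly defined, bounded attributes let tooling decide whether an input lies in the operating region that formal analysis should target:

```python
from dataclasses import dataclass

@dataclass
class Attribute:
    """One clearly defined, bounded attribute of the domain
    (names and bounds here are illustrative, not from the standard)."""
    name: str
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

# A hypothetical face-capture domain expressed through bounded attributes,
# echoing EXAMPLE 2 of 3.1.
FACE_CAPTURE_DOMAIN = [
    Attribute("face_width_px", 40, 4096),
    Attribute("face_height_px", 40, 4096),
    Attribute("visible_face_fraction", 0.7, 1.0),
    Attribute("camera_elevation_deg", -20.0, 30.0),
]

def in_domain(sample: dict) -> bool:
    """Check that an input's attributes all fall inside the domain,
    i.e. that analysis targets the intended operating region."""
    return all(a.contains(sample[a.name]) for a in FACE_CAPTURE_DOMAIN)
```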
5.3 Stability
5.3.1 Stability property
A stability property expresses the extent to which a neural network output remains the same when
its inputs vary over a specific domain. Checking the stability over a domain where the behaviour is
supposed to hold allows for checking whether or not the performance can hold too. A stability property
can be expressed either in a closed-end form (e.g. “is the variation under this threshold?”) or an open-
ended form (e.g. “what is the largest stable domain?”).
In order to prove that a neural network remains performant in the presence of noisy inputs, a stability
property shall be expressed. A stability property should be used on domains of use which, in terms
of expected behaviour, present some regularity properties. A stability property should not be used on
a chaotic system as it is not relevant. However, even when the regularity of the domain is not easy to
affirm (e.g. chaotic system), the stability property can be used to compare neural networks.
5.3.2 Stability criterion
A stability criterion establishes whether a stability property holds within a specific domain, not just
for a specific set of examples or for a subset of the domain such as training or validation datasets. A
stability criterion can be checked using formal methods described in 6.2.
A stability criterion shall define at least the domain value space and output value space on which it has
been measured and the stability property expected.
A stability criterion may be used as one of the criteria to compare models.
For a comparison to be accurate, the following requirements shall be met:
— the neural networks perform the same task;
— the stability criterion is used on the same domain;
— the stability criterion proves the same objective.
For example, for a neural network doing classification, a stability criterion assesses whether or not
a particular decision holds for every input in the domain. For a neural network doing regression, a
stability criterion assesses whether or not the regression remains stable on the domain.
To be applicable, a stability criterion relies on pre-existing information of the expected output of the
neural network. This information can be known by the AI developer or can be determined by another
means (using simulation or solver systems). It is well-suited to assess the robustness over a domain
where the expected answer is known to be similar. For this reason, a stability criterion is recommended
for any decision-making process handled by a neural network (e.g. classification, identification).
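As a hedged illustration of checking a closed-end stability criterion, the sketch below propagates an input box through a small ReLU network with interval arithmetic, a minimal form of the abstract interpretation mentioned in 6.2.3. The weight/bias layout and function names are assumptions for the example; a `True` result is a proof over the whole box, while `False` is inconclusive because intervals over-approximate.

```python
import numpy as np

def interval_forward(lo, hi, weights, biases):
    """Propagate the input box [lo, hi] through a ReLU network with
    interval arithmetic (a minimal abstract-interpretation step, 6.2.3).
    Sound: the returned output box contains every true output."""
    for i, (W, b) in enumerate(zip(weights, biases)):
        Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
        lo, hi = Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b
        if i < len(weights) - 1:            # ReLU on hidden layers only
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def stable_on_box(lo, hi, weights, biases, label):
    """Closed-end stability criterion over the box: True is a proof that
    `label` keeps the top score everywhere in the box; False is merely
    inconclusive, since interval bounds over-approximate."""
    out_lo, out_hi = interval_forward(lo, hi, weights, biases)
    return out_lo[label] > np.delete(out_hi, label).max()
```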
5.4 Sensitivity
5.4.1 Sensitivity property
A sensitivity property on a neural network expresses the extent to which the output of a neural network
varies when its inputs are changed. In order to assess the robustness on a domain, it is sometimes
necessary to check the variability of a system. A sensitivity analysis can be carried out to determine
how much the system varies and the inputs which can influence that variance. This analysis is then
compared to a pre-existing understanding of the expected performance of the system.
When a sensitivity analysis is used to determine whether a neural network stays bounded, the
sensitivity analysis shall be used over a domain. As is the case for the stability property, sensitivity
analysis is more suited for domains of use which present some regularity properties.
5.4.2 Sensitivity criterion
As a sensitivity criterion expresses a property over a domain (and not a specific set of examples) it can
be checked using formal methods described in 6.2.
A sensitivity criterion shall define at least the domain on which it has been measured and the sensitivity thresholds to be checked.
A sensitivity criterion may be used to compare different neural network architectures or trained
models. For a comparison to be accurate, the following requirements shall be met:
— the neural networks shall perform the same task;
— the sensitivity criterion shall be used on the same domain;
— the sensitivity criterion shall prove the same objective.
A sensitivity criterion is especially well-suited for neural networks performing interpolation or
regression tasks. For these kinds of tasks, it often allows a direct proof against a ground truth that can
hold over a domain.
A sensitivity criterion is usually expressed in a closed-form as a threshold of variation over a specific
domain of variation of the inputs.
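In that closed form, a sensitivity criterion can be sketched as the following statement (generic notation, not from the standard), with ε bounding the input variation over the domain D and τ the sensitivity threshold to be checked:

```latex
% Closed-form sensitivity criterion: over the domain D, any input
% variation \delta bounded by \varepsilon moves the output by at most
% the threshold \tau.
\forall x \in D,\; \forall \delta :\;
  \lVert \delta \rVert \le \varepsilon
  \;\Longrightarrow\; \lvert f(x + \delta) - f(x) \rvert \le \tau
```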
5.5 Relevance
5.5.1 Relevance property
A relevance property on a neural network expresses an ordering of the impact of the inputs on the
outputs. For each output, a relevance can be calculated. It expresses the individual impact of each input
on the result obtained for this output. For each output the individual impact of each input can be sorted
in an ordered fashion. A relevance property checks if the ordering obtained satisfies a requirement
order expressed by the AI developer. A relevance property can be checked using a variety of methods to
evaluate the impact of each input. Contrary to stability and sensitivity properties, a relevance property
can lead to a debate between the experts in charge of its evaluation. Indeed, two neural networks can
have very different relevance property results, both of which are still considered acceptable. A method
for resolving conflicting results should be included in the comparison protocol. For example, a protocol
can use a voting system in order to resolve the situation.
A relevance property should be used in cases where the neural network performs a task that can
be done by a human. For these cases, the justification of the output of the neural network should be
understood and verified. A relevance property asserts if the performance of the system can be assured
for the correct reasons. If that is the case, then the robustness of the system can be justified and not just
asserted. This verification can be done manually by a human operator or automatically using references
that have been checked before.
5.5.2 Relevance criterion
A relevance criterion expresses a relevance property over a domain which requires demonstration of
a link between each input and the outputs. For that, it requires a method able to separate the influence
of each input. Formal methods relying on symbolic calculus, logical calculus or computational methods
can be used to achieve such a goal. Examples of formal methods available to check a relevance criterion
are provided in 6.2.
A relevance criterion should present the domain on which it has been measured and the expected
results. If the expected results cannot be defined a priori, the relevance criterion should present at least
the methodology to evaluate the results.
A relevance criterion may be used to compare different neural network architectures or training
outputs. For a comparison to be accurate, the following requirements shall be met:
— the neural networks shall perform the same task;
— the relevance criterion shall be used on the same domain;
— the relevance criterion shall prove the same objective.
EXAMPLE For a neural network performing a classification task, a relevance criterion can be used to check
if the most relevant pixels are located on a specific part of the object to be identified (e.g. the wheels in order to
identify a vehicle). For a neural network performing predictive analysis of a time series, a relevance criterion can
be used to check if the predicted event matches a consequential logic acceptable to the AI developer (e.g. the prediction of a soon-to-be-faulty engine can be triggered by an over-heating alarm).
A relevance criterion can be expressed on a variety of tasks, as long as the result can be analyzed by an
AI developer. A relevance criterion can be used for example, on classification, detection, interpolation
or regression tasks. Checking a relevance criterion can be automated or the checking can rely on human
assessment to see if the result obtained is acceptable. When the checking relies on human assessment,
the decision can be transferred as a new requirement to automate tests to the degree possible.
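A minimal sketch of one such automated check: finite-difference gradients as a stand-in for the per-input impact measure (only one of the "variety of methods" the text allows; function names are illustrative), followed by a test that the developer's expected top-ranked inputs indeed come first.

```python
import numpy as np

def input_relevance(f, x):
    """Rank inputs of scalar-valued f by impact on the output, using
    finite-difference gradients (one possible impact measure among
    many; not prescribed by the standard)."""
    eps, base = 1e-4, f(x)
    impact = np.array([abs(f(x + eps * np.eye(x.size)[i]) - base) / eps
                       for i in range(x.size)])
    return np.argsort(impact)[::-1]         # most relevant input first

def relevance_holds(f, x, required_top):
    """Relevance criterion: the developer's expected most-important
    inputs (e.g. wheel pixels for a vehicle) must occupy the top ranks."""
    top = set(input_relevance(f, x)[:len(required_top)].tolist())
    return top == set(required_top)
```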
5.6 Reachability
5.6.1 Reachability property
A reachability property on a neural network expresses the multi-step performance of the network in
conjunction with its operating environment. This type of property applies to systems operating in the
agent paradigm as shown in Figure 1. A reachability property checks whether an AI agent can reach
a set of states when using the neural network to control itself in a given environment. A reachability
property can specify either a set of failure states that the AI agent shall avoid or a set of goal states that
the AI agent shall reach.
Expressing this type of property requires defining an environment model that describes the effect of an
AI agent’s action on its next state. The environment can evolve either deterministically or stochastically.
For a deterministic environment, the reachability property expresses whether or not it is possible for
the AI agent to reach a particular set of states.
5.6.2 Reachability criterion
A reachability criterion expresses a reachability property over a given set of initial states. For a
deterministic environment, it can be checked using methods described in 6.2.4. For a stochastic
environment, the criterion expresses a probability of reaching a set of states. This probability can be
determined using methods in 6.2.5.
A reachability criterion should be satisfied for a given set of initial states. The set of initial states can
be specified as part of the criterion. Alternatively, formal methods can be used to determine the set of
initial states for which the neural network satisfies the criterion. An advantage of using a reachability
criterion to evaluate a neural network is that it provides a metric on the performance of the network
in a closed-loop environment. Therefore, it can be used to express high-level safety properties that go
beyond input-output properties.
For example, in the case of an aircraft collision avoidance neural network, the reachability criterion can
express a requirement to avoid reaching a set of collision states given a particular environment model.
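A hedged sketch of the deterministic case in Python: simulating the closed loop of a hypothetical neural policy and an environment model over a bounded horizon. This concrete simulation can only find counterexample trajectories; proving the criterion over continuous state spaces requires the set-based methods of 6.2.4. All names are illustrative.

```python
def avoids_failure(policy, step, initial_states, failure_states, horizon=50):
    """Bounded-horizon reachability check in a deterministic environment:
    follow the closed loop of `policy` (the neural controller) and
    `step` (the environment model) from every initial state, reporting
    whether any trajectory enters the failure set within the horizon."""
    for s0 in initial_states:
        state = s0
        for _ in range(horizon + 1):
            if state in failure_states:
                return False                # counterexample trajectory found
            state = step(state, policy(state))
    return True                             # no failure within the horizon
```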
6 Applicability of formal methods on neural networks
6.1 Types of neural network concerned
6.1.1 Architectures of neural networks
6.1.1.1 General
Neural networks can be designed and built using different kinds of architectures. Formal verification
techniques for neural networks depend on their architecture. Subclause 6.1.1 describes formal
techniques that have been developed for the following architectures: piecewise linear neural networks,
binarized neural networks, recurrent neural networks and transformer networks. While this list is
not exhaustive, and new architectures and relevant formal verification techniques can emerge, this list
covers a large number of current neural network architectures and the techniques that apply. More
details about the techniques mentioned are available in 6.2.
6.1.1.2 Piecewise linear neural networks
PLNNs do not use non-linear functions such as sigmoid or tanh.[12] PLNNs can use linear transformations such as fully connected or convolutional layers, pooling units such as MaxPooling, and operations such as batch-normalization or dropout that preserve piecewise linearity. The majority of current neural networks are PLNNs.
Formal verification methods have been proposed that first transform a PLNN into a mathematically equivalent set of linear classifiers, and then interpret each linear classifier by the features that dominate its prediction.[13] Other verification methods view the PLNN as a global optimization problem and use a method like a satisfiability modulo theories (SMT) solver. Some have even posed formal verification of robustness as a mixed integer linear program.[14] Other methods are presented in ISO/IEC TR 24029-1. Additional verification methods include Fast-Lin – Fast-Lip,[15] CROWN[16] and formal safety analysis.[17]
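For illustration, the standard big-M style encoding of a single ReLU unit, as commonly used when posing robustness verification as a MILP (generic notation, not from this document):

```latex
% Big-M encoding of one ReLU unit y = max(x, 0), with binary indicator
% a and known pre-activation bounds l \le x \le u (l < 0 < u):
y \ge x, \qquad y \ge 0, \qquad
y \le x - l\,(1 - a), \qquad y \le u\,a, \qquad a \in \{0, 1\}
% a = 1 forces y = x (active unit); a = 0 forces y = 0 (inactive unit).
```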
6.1.1.3 Binarized neural networks
In binarized neural networks (BNN) all activations are binary, making these networks memory efficient and computationally efficient, enabling the use of specialized algorithms for fast binary matrix multiplication. Various embedded applications ranging from image classification to object detection have been built using such an architecture.[18]
Formal verification of such BNNs has been achieved by creating an exact representation of the BNN as a Boolean formula such that all valid pairs of inputs and outputs of a given network are solutions of the Boolean formula. Verification is then achieved by using methods like Boolean satisfiability[19] and integer linear programming.[18]
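As a sketch of such an exact Boolean representation (generic notation, assuming weights and activations in {-1, +1} and the convention sign(0) = +1): each product w_i x_i is +1 exactly when weight and input agree, so one binarized neuron reduces to a cardinality constraint that SAT-based methods handle directly.

```latex
% One binarized neuron y = sign(w \cdot x), w_i, x_i \in \{-1, +1\},
% with sign(0) = +1, expressed as a cardinality constraint over the
% number of weight/input agreements:
y = +1 \;\Longleftrightarrow\;
  \bigl|\{\, i : w_i = x_i \,\}\bigr| \;\ge\; \lceil n/2 \rceil
```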
6.1.1.4 Recurrent neural networks
Recurrent neural networks (RNN) allow accurate and efficient processing of sequential data in many
domains including speech, finance and text. At each timestep, a RNN updates its internal state based
on the input at that step and the internal state from previous steps. The final output is obtained after
processing the whole input in sequence.
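For concreteness, one simple instantiation of that internal state update (an Elman-style cell; the notation is illustrative, not from the standard):

```latex
% Elman-style recurrent cell: the internal state h_t encodes what the
% network has learned from the subsequence x_1, ..., x_t; the final
% output is read off the last state h_T.
h_t = \sigma\left( W_h\, h_{t-1} + W_x\, x_t + b \right), \qquad
\hat{y} = g\left( h_T \right)
```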
A recurrent neural network, used as a finite classifier, can be viewed as an infinite-state machine.[20] For such an infinite state system, a finite-state automaton can be trained using automated learning techniques such as a shadow model approximating the system at hand. The shadow model can then be used to check whether the RNN meets its specification, for example, using model checking techniques. Besides model checking, abstract interpretation can be applied for proving local robustness of RNNs used in image, audio and motion sensor data classification.[21]
6.1.1.5 Transformer networks
Transformer networks can be deep learning networks with an encoder-decoder architecture.[22] The
transformer starts by generating representations or embeddings for each distinct part of the input
through the encoder. While doing this, it uses self-attention to aggregate information from all of the
other parts of the input to generate a new internal representation for the input. This step is then
repeated multiple times in parallel for all parts of the input, successively generating new internal
representations. The decoder operates similarly and generates one part of the output at a time. While
doing this, the decoder attends to the other previously generated parts of the output and also factors
in the internal representations generated by the encoder. Transformers, thus, have complex self-
attention layers that pose many challenges for verification, including cross-nonlinearity and cross-
position dependency. Self-attention layers are the most challenging parts for formal verification of the
robustness of transformers.
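The self-attention operation at the core of these challenges is the standard scaled dot-product attention; the softmax couples all input positions (cross-position dependency) and composes nonlinearities (cross-nonlinearity):

```latex
% Scaled dot-product self-attention: Q, K, V are linear projections of
% the input sequence and d_k is the key dimension. The softmax ties
% every position to every other position.
\mathrm{Attention}(Q, K, V) =
  \mathrm{softmax}\!\left( \frac{Q K^{\top}}{\sqrt{d_k}} \right) V
```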
In Reference [23], a method is proposed to formally verify the robustness of transformers. A transformer
layer is decomposed into a number of sub-layers, such that in each sub-layer, some operations are
performed on the neurons in that sub-layer. The operations that are performed fall broadly into three
categories:
— linear transformations;
— unary nonlinear functions;
— operations in self-attention.
Each sub-layer is viewed as containing n positions in the sequence with each position containing a
group
...
Frequently Asked Questions
ISO/IEC 24029-2:2023 is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Artificial intelligence (AI) — Assessment of the robustness of neural networks — Part 2: Methodology for the use of formal methods". The standard provides methodology for the use of formal methods to assess robustness properties of neural networks, focusing on how to select, apply and manage formal methods to prove robustness properties.
ISO/IEC 24029-2:2023 is classified under the following ICS (International Classification for Standards) categories: 35.020 - Information technology (IT) in general. The ICS classification helps identify the subject area and facilitates finding related standards.