ISO/IEC DTS 42119-3
Artificial intelligence - Testing of AI - Part 3: Verification and validation analysis of AI systems
This document describes approaches and provides guidance on processes for the verification and validation analysis of AI systems (comprising AI system components and the interaction of non-AI components with the AI system components) including formal methods, simulation and evaluation. This document is applicable for AI systems verification and validation in the context of the AI system life cycle stages described in ISO/IEC 22989. This document is applicable to all types of organizations engaged in the development, deployment and use of AI systems.
Intelligence artificielle — Test des IA — Partie 3: Analyse de la vérification et de la validation des systèmes d'IA
General Information
Overview
ISO/IEC DTS 42119-3:2025 is a draft Technical Specification providing guidance on verification and validation (V&V) analysis of artificial intelligence (AI) systems. It focuses on applying formal methods, simulation, and evaluation techniques throughout the AI system life cycle as outlined in ISO/IEC 22989. This document supports organizations engaged in developing, deploying, or using AI systems by ensuring that AI system components, and their interactions with non-AI components, are rigorously assessed for correctness, reliability, and performance. V&V analysis complements but does not replace traditional testing approaches.
Key Topics
Verification and Validation (V&V) Analysis: The document defines V&V analysis as the application of formal methods, simulation, and evaluation to AI systems for ensuring they meet system and requirements specifications, as well as end-user needs. It highlights the challenges AI systems pose, such as non-deterministic outputs, environment changes, and evolving models in autonomous deployment.
V&V in AI System Life Cycle: V&V activities occur continuously, from project scoping, design, and development to deployment and monitoring in production. Approaches vary depending on AI system complexity, type of machine learning, and system environment.
Formal Methods: Formal verification techniques are used to mathematically prove that AI system properties hold, especially for safety-critical applications and to demonstrate robustness. These include theorem proving, model checking, and symbolic analysis, which help identify issues early in development.
Simulation: Simulation methods, particularly relevant for cyber-physical systems (CPS) like autonomous vehicles, enable evaluation of AI behaviors under varying environmental conditions that may be impractical to test physically.
Evaluation Methods: Evaluation encompasses statistical analysis, benchmarking, calibration error measurement, and explainability assessments. These help quantify AI system performance, reliability, and interpretability, which are vital for trustworthiness (a short calibration-error sketch follows this list).
Complement to Testing: V&V analysis approaches provide insights beyond conventional testing by accounting for AI-related complexities such as emergent behavior, dynamic adaptation, and continuous learning, enabling generation of additional test cases from analysis results.
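As a concrete illustration of the calibration error measurement named above, the following Python sketch computes the expected calibration error (ECE), one of the measures covered by Annex A of the standard; the bin count and toy data are illustrative assumptions, not values prescribed by the document.

```python
# A minimal ECE sketch: bin predictions by confidence, then take the
# sample-weighted average gap between mean confidence and accuracy per bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by fraction of samples
    return ece

# toy data: per-prediction confidence and whether the prediction was correct
conf = [0.95, 0.9, 0.8, 0.75, 0.6, 0.55]
hit = [1, 1, 1, 0, 1, 0]
print(f"ECE = {expected_calibration_error(conf, hit):.3f}")
```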
Applications
AI System Development: Organizations can apply the standard’s guidance during design and development phases to rigorously verify AI models and their integration within larger systems.
Safety-Critical Systems: Applicable in domains like autonomous mobility, healthcare, and industrial automation, where formal verification and validation assure compliance with stringent safety and performance requirements.
Continuous Monitoring: Post-deployment, V&V analysis aids in ongoing validation and re-evaluation of AI systems adapting to environmental changes or software updates to maintain reliability.
Explainability and Trust: Incorporates qualitative evaluation methods to improve AI explainability, a key factor in ethical AI use and regulatory compliance.
Benchmarking and Calibration: Provides methods to benchmark AI systems and measure calibration errors in machine learning models, ensuring fair and accurate predictions.
Related Standards
ISO/IEC 22989: Defines AI system concepts and life cycle stages referenced for contextual V&V analysis.
ISO/IEC TS 25058: Offers guidance on quality evaluation methods for AI systems, complementing V&V activities.
ISO/IEC/IEEE 29119-1: Provides foundations for software testing processes including V&V.
ISO/IEC TR 24029-1: Addresses formal assessment of neural network robustness.
ISO/IEC TS 6254: Pertains to explainability methods relevant for AI system evaluation.
ISO 26262, IEC 61508, EN 50128: Safety standards recommending formal verification methods for safety-critical software.
By following ISO/IEC DTS 42119-3:2025, organizations can implement comprehensive verification and validation strategies tailored for AI systems’ unique properties, supporting robust, safe, and trustworthy AI technologies across diverse applications.
Frequently Asked Questions
ISO/IEC DTS 42119-3 is a draft published by the International Organization for Standardization (ISO). Its full title is "Artificial intelligence - Testing of AI - Part 3: Verification and validation analysis of AI systems". This standard covers: This document describes approaches and provides guidance on processes for the verification and validation analysis of AI systems (comprising AI system components and the interaction of non-AI components with the AI system components) including formal methods, simulation and evaluation. This document is applicable for AI systems verification and validation in the context of the AI system life cycle stages described in ISO/IEC 22989. This document is applicable to all types of organizations engaged in the development, deployment and use of AI systems.
ISO/IEC DTS 42119-3 is classified under the following ICS (International Classification for Standards) categories: 35.020 - Information technology (IT) in general. The ICS classification helps identify the subject area and facilitates finding related standards.
You can purchase ISO/IEC DTS 42119-3 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.
Standards Content (Sample)
FINAL DRAFT Technical Specification
ISO/IEC DTS 42119-3
ISO/IEC JTC 1/SC 42
Secretariat: ANSI
Voting begins on: 2025-07-18
Voting terminates on: 2025-09-12

Artificial intelligence — Testing of AI —
Part 3: Verification and validation analysis of AI systems

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviations
5 Overview of verification and validation (V&V) analysis
5.1 General
5.2 V&V analysis in the AI system life cycle
6 V&V analysis approaches
6.1 Formal methods
6.2 Simulation
6.3 Evaluation
7 Processes used in V&V analysis approaches for AI systems
7.1 Introduction
7.2 Formal methods
7.2.1 Introduction
7.2.2 Selection of type of formal methods
7.2.3 Types of formal methods
7.3 Simulation in the context of cyber physical systems
7.3.1 Cyber physical systems (CPS)
7.3.2 AI cyber physical systems (AI-CPS)
7.3.3 High-level approaches to simulation
7.3.4 Criteria for using simulation
7.3.5 Simulation across AI life cycle
7.4 Evaluation
7.4.1 General
7.4.2 Specialized evaluation analyses
Annex A (informative) Calibration error measures for machine learning, deep learning, and sequence-to-sequence models
Annex B (informative) Qualitative evaluation of explainability of input classification models
Annex C (informative) Benchmarking
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 42, Artificial intelligence.
A list of all parts in the ISO/IEC 42119 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
Some AI systems can be verified against system specifications and requirements specifications. Similarly,
some AI systems can be validated to determine if they meet the implied and explicit needs of end-users and
organizations. This verification and validation can be done at all stages of the AI system life cycle.
The purpose of this document is to guide AI stakeholders to apply appropriate verification and validation
(V&V) analysis approaches for AI systems. V&V analysis comprises formal methods, simulation and
evaluation. These methods complement testing.
An AI system is typically composed of AI components and non-AI components. The V&V analysis of an AI
system considers potential complexities brought on by training data and non-deterministic model outputs.
This document aligns with the AI system life cycle stages described in ISO/IEC 22989 and ISO/IEC 5338, and
it expands on AI system V&V analysis approaches described in ISO/IEC/IEEE 29119-1:2022, Annex A. This
document does not specify a single V&V analysis approach meant to apply to all AI systems.
Testing of AI systems is addressed in ISO/IEC TR 29119-11, the SQuaRE quality models for AI systems are
detailed in ISO/IEC 25059 and quality evaluation of AI systems is addressed in ISO/IEC TS 25058. Testing
methods and quality evaluation for AI systems are important mechanisms for the V&V of AI systems.
Although this document uses the phrase “verification and validation” and the abbreviation “V&V”, and
ISO/IEC 17029 uses the similar phrase “validation and verification”, the two documents are aimed at
different audiences and have different Scopes.
Clause 5 provides an overview of V&V analysis.
Clause 6 discusses V&V analysis approaches.
Clause 7 discusses processes used in V&V analysis approaches for AI systems.
Annex A discusses calibration error measures for classification machine learning (ML) models.
Annex B discusses qualitative evaluation of explainability of input classification models.
Annex C discusses benchmarking.
FINAL DRAFT Technical Specification ISO/IEC DTS 42119-3:2025(en)
Artificial intelligence — Testing of AI —
Part 3:
Verification and validation analysis of AI systems
1 Scope
This document describes approaches and provides guidance on processes for the verification and validation
analysis of AI systems, including AI systems that are part of larger systems. Verification and validation
analysis is comprised of formal methods, simulation and evaluation.
This document is applicable for AI systems verification and validation analysis in the context of the AI
system life cycle stages described in ISO/IEC 22989.
This document is applicable to all types of organizations engaged in the development, deployment and use of
AI systems.
2 Normative references
The following documents are referred to in the text in such a way that some or all their content constitutes
the requirements of this document. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989, Information technology — Artificial intelligence — Artificial intelligence concepts and
terminology
ISO/IEC TS 25058, Systems and software engineering — Systems and software Quality Requirements and
Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 22989 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
requirements specification
documentation that specifies the requirements for a system or component
Note 1 to entry: Typically included are functional requirements, performance requirements, interface requirements,
design requirements, and development standards.
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3447]
3.2
system specification
documented set of mandatory requirements for a system
Note 1 to entry: The more generic term is used for consistent terminology within this standard and includes terms
such as System Performance Specification (SPS), System Requirements Document (SRD), and System/Subsystem
Specification (SSS).
[SOURCE: ISO/IEC/IEEE 24748-8:2019, 3.1.7]
3.3
calibration measure
ratio of the confidence of the predictions by a machine learning model with respect to the accuracy of its
predictions
3.4
verification and validation analysis
V&V analysis
application of formal methods, simulation and evaluation within the broader context of verification and
validation
3.5
explainability
property of an AI system that enables a given human audience to comprehend the reasons for the system's
behaviour
Note 1 to entry: Explainability methods are not limited to the production of explanations, but also include the enabling
of interpretations.
[SOURCE: ISO/IEC TS 6254:20—1), 3.22]
1) Under preparation. Stage at the time of publication: ISO/IEC TS 6254:2025.
4 Abbreviations
ACE adaptive calibration error
AI artificial intelligence
AUC area under the curve
AV autonomous vehicle
CARLA car learning to act
CPS cyber physical systems
DNN deep neural network
ECE expected calibration error
EEMBC embedded microprocessor benchmark consortium
HELM holistic framework for evaluating language models
HIL hardware in the loop
IG integrated gradients
LIME local interpretable model-agnostic explanations
LRA linear real arithmetic
LRP layer-wise relevance propagation
MCE maximum calibration error
ML machine learning
NLP natural language processing
NP nondeterministic polynomial time
RL reinforcement learning
ROC receiver operating characteristic
SCE static calibration error
SHAP Shapley additive explanations
SIL system in the loop
SPS system performance specification
SRD system requirements document
SSS system/subsystem specification
XAI explainability of AI
5 Overview of verification and validation (V&V) analysis
5.1 General
Verification and validation (V&V) analysis is a form of V&V that comprises formal methods, simulation and
evaluation. It complements but does not involve testing. V&V analysis is described in ISO/IEC/IEEE 29119-1
and is shown within the V&V hierarchy in Figure 1.
Figure 1 — V&V analysis in the V&V hierarchy
V&V analysis can be challenging due to several factors, such as:
— The environment in which the AI system is deployed can change over time, such that input data is no longer aligned with training data. Especially for cyber-physical systems, it is difficult to test for environments that cannot be visualized in advance. For example, autonomous driving technology has been tested in an environment dominated by non-autonomous systems. With time, as more autonomous systems are deployed, the deployment environment changes, such that it can be difficult to know the expected result of a test beforehand.
— AI systems can demonstrate inconsistent behaviour in response to unexpected input or over time. AI systems can also display instability in case of slight deviations in input data (e.g. adversarial examples).[8] Continuous testing of AI systems is therefore important. This leads to complex requirements on monitoring the environment, fault tolerance, continuous learning to adapt to changing environments and other safety and quality measures.
— Some AI systems, while in autonomous mode, can change during the deployment phase, for example
due to software updates, improved algorithms or new data. This can affect conditions under which V&V
analysis is conducted or repeated. These changes should be considered when developing V&V analysis
approaches.
— For AI systems, consistency in behaviour is not always desirable due to challenges based on changing
environments.
Organizations can potentially gain insights into these challenges, such as reduced determinism, unpredictable and difficult-to-explain outputs, and unanticipated, emergent behaviour that can lead to unintended consequences, by complementing AI system testing with V&V analysis approaches. Further, V&V analysis can generate additional test cases by analysing the results of a given set of test cases.
V&V analysis approaches can be tailored for AI systems that perform complex perceptual tasks, that process
high-dimensional inputs and whose functions are difficult to formally describe. V&V analysis approaches
can be tailored as a function of complexity of AI systems, tasks they perform and changes to the deployment
environment.
Formal methods of assessment of the robustness of neural networks are addressed in ISO/IEC TR 24029-1 and ISO/IEC 24029-2. IEC 61508, EN 50128 and ISO 26262 recommend the use of formal methods for certain safety-critical systems.
Evaluation in the context of V&V analysis is addressed through ISO/IEC TS 25058, as discussed in 7.4.1.
5.2 V&V analysis in the AI system life cycle
Figure 2 provides an example of the continuous application of V&V analysis at various stages of AI system
development.
Figure 2 — Example of V&V analysis mapped to the ISO/IEC 22989 AI system life cycle
V&V analysis can apply from the stage where the AI project is scoped to the stages where it is deployed
then monitored in production. V&V analysis, when applied to a trained ML model, can be iterated in the
design and development stage. Post deployment, V&V analysis can be used in continuous validation and
re-evaluation stages, leading to transition back to the design and development stage. For example, formal methods can be used early in and throughout the process, while simulation approaches (6.2) can only be used after prototype development.
EXAMPLE Applicable V&V analysis approaches during the design and development stage can vary based on the type of learning applied to the ML model. Supervised machine learning makes use of labelled data during training, while unsupervised machine learning makes use of unlabelled data during training.
6 V&V analysis approaches
6.1 Formal methods
Formal methods are V&V analysis approaches that provide mechanisms for establishing properties that an
AI system is to satisfy to meet its required functionality. Formal methods such as model checking, theorem
proving and static analysis can be used to verify that a system as designed has been implemented according
to that design, without extensive testing and real-time use of the AI system. Formal methods can be applied
in cases that do not allow for complete testing to occur, complementing real-time use of the AI system as well
as other V&V methods such as testing. For example, formal methods can be applied to autonomous systems to provide insights into conditions under which they are more likely or less likely to behave unpredictably.[14]
Conventional testing can be used as a baseline, with formal methods being introduced such as run-time
monitoring, model checking and theorem proving. Those methods vary in the strength of the results they
generate as well as the level of expertise they require. More thorough approaches generally require more
expertise, as illustrated in the assurance to effort spectrum shown in Figure 3.
Figure 3 — The assurance to effort spectrum
Examples of the objectives of formal methods include the following:
— data quality assessment, formal verification and gradient checking;
— bias and fairness testing (when mathematically modelled), data provenance tracking, regulatory
compliance checks and risk assessment;
— analysis and algorithmic bias testing (when using statistical or formal methods), backpropagation
verification, and performance monitoring and alerting (when based on predetermined formal
constraints).
6.2 Simulation
Simulation is a V&V analysis approach that attempts to evaluate the effect of diverse, heterogeneous
environments on AI systems, as well as the effect of AI systems on these heterogeneous environments. AI
systems are often applied in complex socio-technical environments exhibiting dynamic behaviour, with
many heterogeneous components such as humans and other systems. Simulation allows for replication of
real-world scenarios enabling assessment of such AI systems for accuracy, reliability and robustness before
their deployment in real world environment. V&V analysis using simulation helps in:
— qualifying the given AI system in diverse scenarios, especially focusing on boundary conditions;
— manipulation of key system variables in a controlled environment to check for AI system performance
and enabling a robust qualification of the AI system;
— faster development by enabling early-stage validation of AI system before real-world deployment.
Simulation can help with more cost-effective, faster development and qualification of an AI system while ensuring improved performance and safety of the AI system. While the benefits of simulation in V&V analysis apply to any AI system, they are more pronounced for complex systems like cyber-physical systems. Examples of such systems include advanced driver assistance systems. In 7.3, simulation is discussed with a focus on cyber-physical systems, given the complexity involved in such systems.
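To make the controlled manipulation of key system variables concrete, the following Python sketch sweeps a deliberately simplified simulator over two environment parameters and records boundary-condition failures. The stopping-distance model, the parameter grid and the 50 m requirement are illustrative assumptions, not drawn from this document.

```python
# A minimal scenario-sweep sketch for simulation-based V&V analysis:
# vary environment parameters in a controlled way and log failures.
import itertools

def simulated_stopping_distance(speed_mps: float, friction: float) -> float:
    """Toy physics stand-in for a full simulator: v^2 / (2 * mu * g)."""
    return speed_mps ** 2 / (2 * friction * 9.81)

REQUIRED_DISTANCE_M = 50.0   # assumed requirement for the example

failures = []
for speed, friction in itertools.product([10, 20, 30], [0.2, 0.5, 0.9]):
    distance = simulated_stopping_distance(speed, friction)
    if distance > REQUIRED_DISTANCE_M:
        failures.append((speed, friction, round(distance, 1)))

print("boundary-condition failures (speed, friction, distance):", failures)
```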
6.3 Evaluation
ISO/IEC TS 25058 provides guidance for evaluation of AI systems using the SQuaRE quality model.
Complementing the evaluation methods in ISO/IEC TS 25058, examples of evaluation analysis methods
include the following:
— cross-validation, hyperparameter tuning validation, performance benchmarking and confusion matrix analysis[15] (see the sketch after this list);
— ROC (receiver operating characteristic)-AUC (area under the curve) analysis and precision-recall curve
analysis;
— SHAP (SHapley Additive exPlanations), LIME (local interpretable model-agnostic explanations), saliency
maps and partial dependence plots;
— counterfactual analysis, user acceptance testing and explainability testing with users;
— model drift detection, resource usage monitoring, feedback loop integration, and ongoing monitoring
and periodic re-validation.
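As a brief illustration of two of the evaluation analyses listed above, the following Python sketch runs k-fold cross-validation and a confusion-matrix analysis with scikit-learn; the dataset and model are stand-ins chosen for the example, not methods prescribed by this document.

```python
# A minimal sketch of cross-validation and confusion-matrix analysis.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k-fold cross-validation: variance across folds hints at (in)stability
scores = cross_val_score(model, X, y, cv=5)
print("fold accuracies:", scores.round(3))

# confusion matrix computed over out-of-fold predictions
y_pred = cross_val_predict(model, X, y, cv=5)
print(confusion_matrix(y, y_pred))
```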
7 Processes used in V&V analysis approaches for AI systems
7.1 Introduction
Depending on its requirements and the nature of the AI system to which V&V analysis is applied, an
organization can apply formal methods, simulation or evaluation, as well as any combination of the three.
The scope of V&V analysis should be identified prior to initiating V&V analysis activities to ensure resource
availability and to meet stakeholder timelines.
7.2 Formal methods
7.2.1 Introduction
Formal methods involve the use of mathematical models and logical processes to verify that a system as
designed has been implemented according to that design. By using mathematical models of a system, it is
possible to verify a system’s properties more effectively than through non-exhaustive empirical testing.
Applying formal methods in AI system V&V analysis involves the following:
a) Defining formal specifications to define the properties and behaviour that the AI system is expected
to exhibit. These include both functional (expected behaviour) and non-functional (performance,
reliability, etc.) requirements;
b) Creating an abstract model (as the basis of verification) of the AI system that captures its essential
features and behaviours;
c) Employing verification tools to verify that the AI system meets the formal specifications;
d) Continuously monitoring the AI system’s performance in real-world conditions to ensure it remains
compliant with the formal specifications.
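As an illustration of step a), a local-robustness property for an image classifier $f$ can be written as a formal specification; the norm, the bound $\epsilon$ and the reference input $c$ are assumptions chosen for the example:

$$\forall x:\; \lVert x - c \rVert_\infty \le \epsilon \;\Longrightarrow\; \arg\max_i f(x)_i = \arg\max_i f(c)_i$$

This states that every input within an $\epsilon$-ball of the reference input receives the same classification, mirroring the precondition/postcondition structure of the triple T used in 7.2.3.3.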
7.2.2 Selection of type of formal methods
Soundness and completeness are desired of any verification process for AI systems with respect to a
property specification. In the context of AI systems, soundness refers to the ability of a process to correctly
identify all valid behaviours of the AI system. Completeness refers to the ability of a process to correctly
identify all invalid behaviours of the AI system. A verification process is sound if, whenever it reports that
a property holds, the property actually holds. A verification process is complete if, whenever a property is
true of the AI system, it reports that the property holds. However, formal verification of AI systems can be a
computationally hard problem. For example, formal verification of deep neural networks (DNNs) belongs to the class of nondeterministic polynomial time (NP)-hard problems.[16] Non-linearity and non-convexity of DNNs impose mathematical limits on the significance and precision with which they can be solved as well as their scalability in terms of the size of networks that can be verified. This makes the verification of even basic properties computationally hard. Hence, any process that aims to verify an AI system that uses DNNs balances soundness and completeness.
EXAMPLE A sound process that over-approximates valid DNN behaviours using a compact representation of
the network is likely to be more scalable than a complete process that attempts to exactly capture the valid set of
behaviours, at the cost of a more complex representation.
As discussed in 7.2.3.2, abstract interpretation-based methods are sound but potentially incomplete.
Abstract interpretation-based methods sacrifice precision for scalability.
As discussed in 7.2.3.3, constraint solving methods are typically complete but potentially unsound in specific
cases. Constraint solving based methods are also generally more precise but less scalable than abstract
interpretation-based methods.
7.2.3 Types of formal methods
7.2.3.1 General
Abstract interpretation, solver-based methods and model checking are types of formal methods applicable
as part of V&V analysis.
7.2.3.2 Abstract interpretation
Abstract interpretation is a formal method that helps build a queryable verification analysis specification.[1] It can be applied to training phases and runtime performance, enabling live data tracing and monitoring.
In the specific case of neural networks, each neuron in an ML model can take values over large ranges. A typical AI model can have millions of such parameters, such that it is difficult if not impossible to track outcomes for all possible combinations of values. Abstraction helps in studying properties of an AI model by reducing values to certain bins or by classifying values as positive, negative or zero.
Abstract interpretation guarantees that information about an AI model approximates its semantics. Properties are achieved by establishing relationships between formal methods and static analysis.
Generic steps in abstract interpretation are as follows (a worked sign-domain sketch follows this list):
— rule of sign: abstract the expressions using only literals, addition and subtraction;
— define the abstraction function;
— define the concretization function;
— define the Galois connection;
— for each edge, define an abstract transfer function;
— prove that the abstract function is consistent with the concretization function;
— if the abstract interpretation outcome is verified, the semantics of the ML model are also considered verified.
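A minimal sketch of the rule-of-sign abstraction from the list above, written in Python for a generic numeric computation rather than a full neural network; the abstraction function and transfer functions are the illustrative pieces, everything else is scaffolding.

```python
# Sign abstract domain: values become NEG, ZERO, POS or TOP (unknown),
# with abstract transfer functions for addition and multiplication.
from enum import Enum

class Sign(Enum):
    NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"

def alpha(x: float) -> Sign:
    """Abstraction function: map a concrete value to its sign."""
    return Sign.NEG if x < 0 else Sign.POS if x > 0 else Sign.ZERO

def add(a: Sign, b: Sign) -> Sign:
    """Abstract transfer function for +."""
    if a == Sign.ZERO:
        return b
    if b == Sign.ZERO:
        return a
    return a if a == b else Sign.TOP   # e.g. NEG + POS is unknown

def mul(a: Sign, b: Sign) -> Sign:
    """Abstract transfer function for *."""
    if Sign.ZERO in (a, b):
        return Sign.ZERO
    if Sign.TOP in (a, b):
        return Sign.TOP
    return Sign.POS if a == b else Sign.NEG

# consistency check on one concrete pair: the abstract result must
# agree with the sign of the concrete result
x, y = -3.0, 4.0
assert alpha(x * y) == mul(alpha(x), alpha(y))
print(mul(alpha(x), alpha(y)))   # Sign.NEG
```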
Over-approximation is an abstraction-based framework that can be applied to neural network verification to reduce the size of the network. It is assumed that properties verified for the smaller, abstract network hold true for the larger network as well.
Steps in this process are as follows (a schematic CEGAR loop follows this list):
— apply approximation and create a smaller abstract network that can be verified more efficiently;
— if the abstract network does not satisfy the specification, the verification procedure provides a counterexample, which either demonstrates that the original network violates the specification or is spurious;
— if the counterexample is spurious, refine the abstract network to make it more accurate and repeat the process (a method known as counterexample guided abstraction refinement);
— during the abstraction step, apply the generic steps at the neuron level.
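The loop described in the list above can be sketched schematically as follows; `abstract`, `verify`, `is_spurious` and `refine` are hypothetical stand-ins for the components of a real neural-network verifier, so this is a structural outline rather than a working tool.

```python
# Schematic counterexample-guided abstraction refinement (CEGAR) loop.
def cegar(network, spec, abstract, verify, is_spurious, refine, max_iters=100):
    abs_net = abstract(network)              # smaller abstract network
    for _ in range(max_iters):
        cex = verify(abs_net, spec)          # None if the spec holds
        if cex is None:
            return "verified"                # holds for the original network too
        if not is_spurious(cex, network, spec):
            return ("violated", cex)         # genuine counterexample
        abs_net = refine(abs_net, cex)       # tighten the abstraction, retry
    return "unknown"                         # iteration budget exhausted
```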
7.2.3.3 Solver-based methods and applicability to DNNs
In constraint solver-based verification of AI systems, a correctness property and the ML model are expressed as a set of mathematical constraints, which are then solved to decide if the property holds.
As an example, a DNN can be imagined as a graph $G = (V, E)$ where each node or neuron $v \in V$ either computes an affine expression of the form

$$\sum_{i=1}^{n} a_i v_i + b_i$$

where
— $a_i$ are the weights of the DNN;
— $b_i$ are the biases of the DNN;
— $v_i$ are the inputs from the previous stage in the DNN.

For affine nodes, the $v_i$ values are received on edges incident on $v$ from neurons in the previous layer, and $a_i$, $v_i$ and $b_i$ are real-valued. Otherwise, the node computes an activation function: the affine computation is followed by an application of an activation function at the activation nodes, which can be a piece-wise linear function such as a ReLU or a non-linear function such as sigmoid or tanh.
Correctness properties that a DNN is expected to satisfy can be defined in terms of its inputs and outputs. For example, for an image classifier DNN, given any input image x (of a dog) that is slightly darker or lighter than a reference image c (of a dog), the output classification is expected to always be the same (dog). Such properties can be expressed as a specification on the DNN, formally written as

$$T = \{\text{precondition on inputs } x\}\; y = f(x)\; \{\text{postcondition on outputs } y\}$$

where the DNN is represented as a function $f: \mathbb{R}^n \to \mathbb{R}^m$, for $n$ and $m$ neurons in the input and output layers, respectively.
The DNN graph $G$ and the specification can be encoded as mixed linear integer or linear real arithmetic (LRA) constraints. A sample LRA encoding, $\varphi_v$, for a single DNN neuron $v$, which computes the affine function $f_v: \mathbb{R}^{n_v} \to \mathbb{R}$, can be expressed as

$$\varphi_v = \bigwedge_{j=1}^{k} \left( C_i \Rightarrow v_{\mathrm{out}} = \sum_{i=1}^{n_v} a_{ij}\, v_{\mathrm{in}}^{j} + b_i \right)$$

where
— $n_v$ is the number of inputs to $v$ and equal to the number of edges incident on $v$;
— $v_{\mathrm{out}}$ is a variable that encodes the output of $v$;
— $v_{\mathrm{in}}^{j}$ represents neurons in the preceding layer;
— $a_{ij}$ is the weight of the $j$th incident edge;
— $b_i$ represents bias;
— $C_i$ allows modelling of conditions on inputs from the preceding layer.
One can similarly encode the activation nodes. For example, the LRA encoding for a ReLU activation node $v^R$ can be given as

$$\varphi_R = \left( v^R_{\mathrm{in}} > 0 \Rightarrow v^R_{\mathrm{out}} = v^R_{\mathrm{in}} \right) \wedge \left( v^R_{\mathrm{in}} \le 0 \Rightarrow v^R_{\mathrm{out}} = 0 \right)$$
Finally, one can encode the edges between two neurons as

$$\varphi_v = \bigwedge_{i=1}^{n_v} v_{\mathrm{in}}^{i} = w_{\mathrm{out}}$$

where, for each edge $(w, v)$ incident on $v$, the output $w_{\mathrm{out}}$ is equal to the $i$th input of $v$. The full DNN is then encoded as the conjunction of these constraints over the graph $G$, denoted $\varphi_G$.
This encoding along with the LRA encoding of the specification is solved using off-the-shelf solvers,[17] which use a combination of Davis–Putnam–Logemann–Loveland-based satisfiability solvers that handle propositional constraints and an algorithm such as Simplex to handle the LRA constraints.
Figure 4 shows a typical constraint-solving process for DNN verification.
Figure 4 — Constraint solving for DNN verification
The first step is to obtain a trained DNN model and define a suitable correctness specification like the triple
T. Following this, a verification condition is constructed as
$$\Phi = \alpha \wedge \varphi_G \wedge \neg\beta$$

where
— $\Phi$ represents the verification condition;
— $\alpha$ represents the LRA encoding of the pre-conditions given by the correctness specification;
— $\beta$ represents the LRA encoding of the post-conditions given by the correctness specification;
— $\varphi_G$ is the LRA encoding of the DNN graph.
The post-condition is negated to check for violations of the property that is represented by satisfying
assignments to the formula. Alternatively, if the formula is unsatisfiable then this effectively proves that the
correctness property holds.
In the next step an off-the-shelf Satisfiability Modulo Theories or Mixed-Integer Linear Programming solver is called to solve the verification condition $\Phi$. If the solver is unable to either prove or disprove the formula, then it is likely that the verification condition is too weak and it is necessary to strengthen it. This is done by strengthening either the pre-condition or the post-condition or both. For example, if the output constraint $\beta$ initially captured the fact that the $i$th output neuron has a value in the range $[0, 1]$ for a given set of inputs for a correct classification, then the post-condition can be strengthened by adding constraints that limit the range of the other output neurons to achieve the same classification. This is done iteratively until the solver can either prove or disprove the verification condition. Another way to refine the verification condition is to add additional information about the DNN as constraints to the solver. Examples include a constraint that the weights of the DNN are within a certain range or that some ReLU nodes are always inactive or active. However, inferring such information about the DNN requires additional analysis of the DNN, potentially using a different set of processes.
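The following Python sketch applies this process to a deliberately tiny example using the open-source Z3 SMT solver (z3-solver package): the one-neuron "network" f(x) = ReLU(2x − 1), its bounds and the toy property are illustrative assumptions, not content of this document.

```python
# Encode a toy triple T = {0 <= x <= 0.4} y = relu(2x - 1) {y <= 0.2}
# as constraints and check Phi = alpha AND phi_G AND NOT(beta) with Z3.
from z3 import And, If, Not, Real, Solver, sat

x = Real("x")   # input neuron
a = Real("a")   # affine node: a = 2x - 1
y = Real("y")   # ReLU output: y = max(a, 0)

s = Solver()
s.add(a == 2 * x - 1)            # phi_v: affine constraint
s.add(y == If(a > 0, a, 0))      # phi_R: piecewise-linear ReLU encoding
s.add(And(x >= 0, x <= 0.4))     # alpha: precondition on inputs
s.add(Not(y <= 0.2))             # NOT(beta): negated postcondition

# sat => a counterexample violates the property; unsat => property proved
if s.check() == sat:
    print("counterexample:", s.model())
else:
    print("property holds for all x in [0, 0.4]")
```

Here the solver reports unsat, proving the toy postcondition for every input satisfying the precondition.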
7.2.3.4 Model checking
Model checking is an automated formal verification approach whereby specifications or desired properties are rigorously checked for a formal model or implementation given as a state–transition system, such as a Kripke structure.[18] If the specification does not hold for the formal model, the model checker generally provides a counterexample in the form of a path to the property violation.
In the definition of a Kripke structure for the formal model M:
— S is the set of all possible states;
— I is the set of possible initial states;
— δ is the transition relation between the states.
A path in the model M is a sequence of states such that the transition relation holds between consecutive states. Paths can be finite or infinite. Infinite paths result from self-loops or cycles in the model. This contributes to infinite behaviours in the model and leads to the state-space explosion problem. The explicit declaration of state variables can also result in a large number of states to encapsulate all possible behaviours in the formal model, which again can lead to state-space explosion. Numerous abstraction approaches are available to reduce the size of the model, including partial order reduction, bisimulation and counterexample guided abstraction refinement.
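To make the notions of M = (S, I, δ) and counterexample paths concrete, the following Python sketch performs explicit-state reachability checking of a safety property over a small hypothetical transition system; production model checkers handle far larger state spaces using the abstraction approaches named above.

```python
# Minimal explicit-state safety check: search reachable states breadth-first
# and return a counterexample path if a bad state can be reached.
from collections import deque

S = {"idle", "sensing", "acting", "fault"}   # states
I = {"idle"}                                 # initial states
delta = {                                    # transition relation
    "idle": {"sensing"},
    "sensing": {"acting", "fault"},
    "acting": {"idle"},
    "fault": {"fault"},
}
bad = {"fault"}                              # safety property: never reach fault

def check_safety(initial, transitions, bad_states):
    frontier = deque((s, [s]) for s in initial)
    visited = set(initial)
    while frontier:
        state, path = frontier.popleft()
        if state in bad_states:
            return path                      # path to the property violation
        for nxt in transitions.get(state, ()):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

cex = check_safety(I, delta, bad)
if cex:
    print("counterexample path:", " -> ".join(cex))
else:
    print("safe: no bad state reachable")
```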
FANNet employs a model checking based approach for formal analysis of neural networks, obtaining the noise tolerance of trained neural networks.[19] Steps in this process are as follows:
— generating the Kripke structure of the neural network;
— analysis of the formal network models.
In Figure 5, the trained FANNet model and the corresponding test data are put through the following formal verification stages:
— The trained model is functionally validated through formal modelling, which can be an iterative process.
— After completing functional validation, the model undergoes behavioural and robustness verification with a different set of noise bounds measuring its tolerance. Noise tolerance levels for the trained model are measured at this stage.
— If the model fails in the behavioural and robustness verification stage, the model undergoes counterexample analysis. The checker provides counterexamples to determine whether the input node is sensitive or to detect the presence of training bias.
— The model is then evaluated through safety verification to check if safety specifications are met.
Figure 5 — FANNet model checking methodology
Optimization of formal neural network analysis can be accomplished as follows:
— state space reduction;
— input domain segmentation;
— coarse-grain verification;
— random input segmentation.
7.3 Simulation in the context of cyber physical systems
7.3.1 Cyber physical systems (CPS)
Safety critical cyber physical systems (CPS) such as autonomous vehicles (AVs) or surgical robots are
increasingly built using AI components. Deployment of such systems entails extensive validation including
proper risk assessment. CPS is one of many contexts in which V&V analysis through simulation can be
effective.
7.3.2 AI cyber physical systems (AI-CPS)
Validation of AI cyber physical systems (AI-CPS) can utilize both conventional validation methods and simulation-based processes to address potential complications associated with the AI components. Validating an AI-CPS using solely conventional methods can result in undiscovered risks. Simulation as a V&V analysis approach is recommended to be applied to AI-CPS as it allows the following key requirements to be addressed:[20]
a) Safety. Validating an AV system directly on real-world roads can be dangerous. Simulation allows a
controlled setting for the initial rounds of validation.
b) Efficiency. Real-world validation is not ideal for providing rapid validation feedback during development.
Simulation can generate feedback suited toward iterative development.
c) Coverage. In real-world validation settings, it can be difficult to exercise edge cases and adversarial
scenarios when estimating risk. Simulation provides a managed environment in which behaviour under
such scenarios can be validated.
d) Adaptability. The simulation environment allows an organization to modify (or simulate) the deployment
environment constantly to ensure the AI-CPS is not overfitting to a certain deployment scenario.
e) Unbiasedness. The adaptability of the simulation environment can reduce unwanted bias in model
evaluation and validation results, improving trustworthiness.
f) Prioritized failures. Adversarial conditions are easy to create in a simulated environment, allowing for a prioritized validation of adversarial failure scenarios.
g) Black-box interaction. Simulation environments can treat the AI-CPS as a black box when not all components of the AI-CPS are available. This is a powerful mechanism to validate an AI-CPS throughout its development phase, ensuring the integrity of the overall validation.
AI cyber physical systems (AI-CPS) can be designed to accommodate ambiguity and adjust to uncertainties.
These capabilities are provided through:
— Ability to manage data dynamically. The data is a combination of what the AI-CPS senses from its
environment and its own internal system state. For example, an autonomous car can factor in the current
traffic density when deciding the right speed for the vehicle on the freeway.
— Ability to make decisions that factor in the context in which that decision needs to be taken. For the
AI-CPS, the context is a combination of the environmental conditions at the time of taking the decision
and its own internal system context. For example, an autonomous car takes the decision of whether to
proceed or not at an intersection by considering the other vehicles at the intersection, pedestrians at
the intersection and other factors. The rule for navigating the intersection is not fixed, but adaptive in
nature based on the conditions at specific intersections.
— Ability to adapt its action based on the environmenta
...
Date:2025-05-21
ISO/IEC JTC1JTC 1/SC 42/JWG 2
Secretariat: ANSI
Date: 2025-07-04
Artificial intelligence — Testing of AI — —
Part 3:
Verification and validation analysis of AI systems
FDIS stage
© ISO/IEC 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication
may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying,
or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO
at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: + 41 22 749 01 11
EmailE-mail: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2025 – All rights reserved
ii
Contents
Foreword . iv
Introduction . v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abbreviations . 2
5 Overview of verification and validation (V&V) analysis . 3
5.1 General . 3
5.2 V&V analysis in the AI system life cycle . 4
6 V&V analysis approaches . 6
6.1 Formal methods . 6
6.2 Simulation . 7
6.3 Evaluation . 7
7 Processes used in V&V analysis approaches for AI systems . 7
7.1 Introduction . 7
7.2 Formal methods . 8
7.3 Simulation in the context of cyber physical systems . 14
7.4 Evaluation . 17
Annex A (informative) Calibration error measures for machine learning, deep learning, and
sequence-to-sequence models . 18
Annex B (informative) Qualitative evaluation of explainability of input classification models . 23
Annex C (informative) Benchmarking . 28
Bibliography . 30
© ISO/IEC 2025 – All rights reserved
iii
Foreword
ISO (the International Organization for Standardization) is a and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide federation of national standardsstandardization.
National bodies (that are members of ISO member bodies). The workor IEC participate in the development of
preparing International Standards is normally carried out through ISO technical committees. Each member
body interested in a subject for which a technical committee has been established has the right to be
represented on that committee. Internationalby the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of
ISO documentsdocument should be noted. This document was drafted in accordance with the editorial rules
of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
Attention is drawnISO and IEC draw attention to the possibility that some of the elementsimplementation of
this document may beinvolve the subjectuse of (a) patent(s). ISO and IEC take no position concerning the
evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date of publication
of this document, ISO and IEC had not received notice of (a) patent(s) which may be required to implement
this document. However, implementers are cautioned that this may not represent the latest information,
which may be obtained from the patent database available at www.iso.org/patents and https://patents.iec.ch
rights. ISO. ISO and IEC shall not be held responsible for identifying any or all such patent rights. Details of any
patent rights identified during the development of the document will be in the Introduction and/or on the ISO
list of patent declarations received (see ).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT),) see
www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 7, Software and systems engineering and Subcommittee SC 42, Artificial intelligence.
A list of all parts in the ISO/IEC 42119 series can be found on the ISO website. The ISO/and IEC 42119 series
Commented [eXtyles1]: Invalid reference: "ISO/IEC
42119 series"
includes the following:websites.
ISO/IEC TS 42119-2 – Artificial intelligence — Testing of AI — Part 2: Overview of testing AI systems
Commented [eXtyles2]: ISO/IEC TS 42119-2: current
stage is 50.00
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-
committees.
© ISO/IEC 2025 – All rights reserved
iv
Introduction
Some AI systems can be verified against system specifications and requirements specifications. Similarly,
some AI systems can be validated to determine if they meet the implied and explicit needs of end-users and
organizations. This verification and validation can be done at all stages of the AI system life cycle.
The purpose of this document is to guide AI stakeholders to apply appropriate verification and validation
(V&V) analysis approaches for AI systems. V&V analysis comprises formal methods, simulation and
evaluation. These methods complement testing.
An AI system is typically composed of AI components and non-AI components. The V&V analysis of an AI
system considers potential complexities brought on by training data and non-deterministic model outputs.
This document aligns with the AI system life cycle stages described in ISO/IEC 22989 and ISO/IEC 5338, and
[1]
it expands on AI system V&V analysis approaches described in ISO/IEC/IEEE 29119-1:2022, Annex A. . This
document does not specify a single V&V analysis approach meant to apply to all AI systems.
[2]
Testing of AI systems is addressed in ISO/IEC TR 29119-11, , the SQuaRE quality models for AI systems are
[3] [4]
detailed in ISO/IEC 25059 and quality evaluation of AI systems is addressed in ISO/IEC TS 25058. . Testing
methods and quality evaluation for AI systems are important mechanisms for the V&V of AI systems.
Although this document uses the phrase “verification and validation” and the abbreviation “V&V”, and ISO/IEC
17029:2019 uses the similar phrase “validation and verification”, the two documents are aimed at different
audiences and have different Scopes.
Clause 4Clause 4 provides an overview of V&V analysis.
Clause 5Clause 5 discusses V&V analysis approaches.
Clause 6Clause 6 discusses processes used in V&V analysis approaches for AI systems.
Annex AAnnex A discusses calibration error measures for classification machine learning (ML) models.
Annex BAnnex B discusses qualitative evaluation of explainability of input classification models.
Annex CAnnex C discusses benchmarking.
© ISO/IEC 2025 – All rights reserved
v
Artificial intelligence — Testing of AI — —
Part 3:
Verification and validation analysis of AI systems
1 Scope
This document describes approaches and provides guidance on processes for the verification and validation
analysis of AI systems, including AI systems that are part of larger systems. Verification and validation analysis
is comprised of formal methods, simulation and evaluation.
This document is applicable for AI systems verification and validation analysis in the context of the AI system
life cycle stages described in ISO/IEC 22989.
This document is applicable to all types of organizations engaged in the development, deployment and use of
AI systems.
2 Normative references
The following documents are referred to in the text in such a way that some or all their content constitutes the
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts and
terminology
ISO/IEC /TS 25058:2024, Systems and software engineering — Systems and software Quality Requirements and
Evaluation (SQuaRE) — Guidance for quality evaluation of artificial intelligence (AI) systems
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 22989:2022 apply and the
following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— — ISO Online browsing platform: available at https://www.iso.org/obp
— — IEC Electropedia: available at https://www.electropedia.org/
3.1 3.1
requirements specification
documentation that specifies the requirements for a system or component
Note 1 to entry: Typically included are functional requirements, performance requirements, interface requirements,
design requirements, and development standards.
[5]
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3447 ]]
3.2 3.2
system specification
documented set of mandatory requirements for a system
© ISO/IEC 2025 – All rights reserved
Note 1 to entry: The more generic term is used for consistent terminology within this standard and includes terms such
as System Performance Specification (SPS), System Requirements Document (SRD), and System/Subsystem Specification
(SSS).
[6]
[SOURCE: ISO/IEC/IEEE 24748-8:2019, 3.1.7 ]]
3.3 3.3
calibration measure
ratio of the confidence of the predictions by a machine learning model with respect to the accuracy of its
predictions
3.4 3.4
verification and validation analysis
V&V analysis
application of formal methods, simulation and evaluation within the broader context of verification and
validation
3.5 3.5
explainability
property of an AI system that enables a given human audience to comprehend the reasons for the system's
behaviour
Note 1 to entry: Explainability methods are not limited to the production of explanations, but also include the enabling
of interpretations.
[7] 1)
[SOURCE: ISO/IEC TS 6254 –:20—, 3.22 ] ]
4 Abbreviations
ACE adaptive calibration error
AI artificial intelligence
AUC area under the curve
AV autonomous vehicle
CARLA car learning to act
CPS cyber physical systems
DNN deep neural network
ECE expected calibration error
EEMBC embedded microprocessor benchmark consortium
HELM holistic framework for evaluating language models
HIL hardware in the loop
IG integrated gradients
LIME local interpretable model-agnostic explanations
LRA linear real arithmetic
LRP layer-wise relevance propagation
MCE maximum calibration error
1)
Under preparation. Stage at the time of publication: ISO/IEC TS 6254:2025.
© ISO/IEC 2025 – All rights reserved
ML machine learning
NLP natural language processing
NP nondeterministic polynomial time
RL reinforcement learning
ROC receiver operating characteristic
SCE static calibration error
SHAP shapley additive explanations
SIL system in the loop
SPS system performance specification
SRD system requirements document
SSS system/subsystem specification
XAI explainability of aiAI
5 Overview of verification and validation (V&V) analysis
5.1 General
Verification and validation (V&V) analysis is a form of V&V that comprises formal methods, simulation and
evaluation. It complements but does not involve testing. V&V analysis is described in ISO/IEC/IEEE 29119-1
and is shown within the V&V hierarchy in Figure 1Figure 1.
17847_ed1fig1.EPS
Figure 1 — V&V analysis in the V&V hierarchy
V&V analysis can be challenging due to several factors, such as:
— The environment in which the AI system is deployed can change over time, such that input data is no longer aligned with training data. Especially for cyber-physical systems, it is difficult to test for environments that cannot be visualized in advance. For example, autonomous driving technology has been tested in an environment dominated by non-autonomous systems. With time, as more autonomous systems are deployed, the deployment environment changes, such that it can be difficult to know the expected result of a test beforehand.
— AI systems can demonstrate inconsistent behaviour in response to unexpected input or over time. AI systems can also display instability in case of slight deviations in input data (e.g. adversarial examples) [8]. Continuous testing of AI systems is therefore important. This leads to complex requirements on monitoring the environment, fault tolerance, continuous learning to adapt to changing environments and other safety and quality measures.
— Some AI systems, while in autonomous mode, can change during the deployment phase, for example due to software updates, improved algorithms or new data. This can affect the conditions under which V&V analysis is conducted or repeated. These changes should be considered when developing V&V analysis approaches.
— For AI systems, consistency in behaviour is not always desirable due to challenges based on changing environments.
Organizations can potentially gain insights into these challenges, such as reduced determinism, unpredictable and difficult-to-explain outputs, and unanticipated, emergent behaviour that can lead to unintended consequences, by complementing AI system testing with V&V analysis approaches. Further, V&V analysis can generate additional test cases after analysing the results of a given set of test cases.
V&V analysis approaches can be tailored for AI systems that perform complex perceptual tasks, that process high-dimensional inputs and whose functions are difficult to formally describe. V&V analysis approaches can be tailored as a function of the complexity of AI systems, the tasks they perform and changes to the deployment environment.
Formal methods for the assessment of the robustness of neural networks are addressed in ISO/IEC TR 24029-1 [9] and ISO/IEC 24029-2 [10]. IEC 61508 [11], EN 50128 [12] and ISO 26262 [13] recommend the use of formal methods for certain safety-critical systems.
Evaluation in the context of V&V analysis is addressed through ISO/IEC TS 25058, as discussed in 7.4.1.
5.2 V&V analysis in the AI system life cycle
Figure 2 provides an example of the continuous application of V&V analysis at various stages of AI system development.
Figure 2 — Example of V&V analysis mapped to the ISO/IEC 22989 AI system life cycle
V&V analysis can apply from the stage where the AI project is scoped to the stages where it is deployed and then monitored in production. V&V analysis, when applied to a trained ML model, can be iterated in the design and development stage. Post deployment, V&V analysis can be used in continuous validation and re-evaluation stages, leading to a transition back to the design and development stage. For example, formal methods can be used early on in and throughout the process, while simulation approaches (6.2) can only be used after prototype development.
EXAMPLE Applicable V&V analysis approaches during the design and development stage can vary based on the type of learning applied to the ML model. Supervised machine learning makes use of labelled data during training, while unsupervised machine learning makes use of unlabelled data during training.
6 V&V analysis approaches
6.1 Formal methods
Formal methods are V&V analysis approaches that provide mechanisms for establishing properties that an AI
system is to satisfy to meet its required functionality. Formal methods such as model checking, theorem
proving and static analysis can be used to verify that a system as designed has been implemented according
to that design, without extensive testing and real-time use of the AI system. Formal methods can be applied in
cases that do not allow for complete testing to occur, complementing real-time use of the AI system as well as
other V&V methods such as testing. For example, formal methods can be applied to autonomous systems to provide insights into conditions under which they are more likely or less likely to behave unpredictably [14].
Conventional testing can be used as a baseline, with formal methods such as run-time monitoring, model checking and theorem proving then being introduced. These methods vary in the strength of the results they generate as well as the level of expertise they require. More thorough approaches generally require more expertise, as illustrated in the assurance to effort spectrum shown in Figure 3.
Figure 3 — The assurance to effort spectrum
Examples of the objectives of formal methods include the following:
— data quality assessment, formal verification and gradient checking;
— bias and fairness testing (when mathematically modelled), data provenance tracking, regulatory compliance checks and risk assessment;
— analysis and algorithmic bias testing (when using statistical or formal methods), backpropagation verification, and performance monitoring and alerting (when based on predetermined formal constraints).
6.2 Simulation
Simulation is a V&V analysis approach that attempts to evaluate the effect of diverse, heterogeneous
environments on AI systems, as well as the effect of AI systems on these heterogeneous environments. AI
systems are often applied in complex socio-technical environments exhibiting dynamic behaviour,
with many heterogeneous components such as humans and other systems. Simulation allows for replication of real-world scenarios, enabling assessment of such AI systems for accuracy, reliability and robustness before their deployment in real-world environments. V&V analysis using simulation helps in:
— qualifying the given AI system in diverse scenarios, especially focusing on boundary conditions;
— manipulating key system variables in a controlled environment to check AI system performance, enabling a robust qualification of the AI system;
— faster development by enabling early-stage validation of the AI system before real-world deployment.
Simulation can help with more cost-effective, faster development and qualification of an AI system while ensuring improved performance and safety of the AI system. While the benefits of simulation in V&V analysis apply to any AI system, they are more pronounced for complex systems like cyber-physical systems. Examples of such systems include advanced driver assistance systems. In 7.3, simulation is discussed with a focus on cyber-physical systems, given the complexity involved in such systems and the corresponding benefits.
6.3 Evaluation
ISO/IEC TS 25058 provides guidance for evaluation of AI systems using the SQuaRE quality model.
Complementing the evaluation methods in ISO/IEC TS 25058, examples of evaluation analysis methods
include the following:
— cross-validation, hyperparameter tuning validation, performance benchmarking and confusion matrix analysis [15];
— ROC-AUC (receiver operating characteristic; area under the curve) analysis and precision-recall curve analysis;
— SHAP (SHapley Additive exPlanations), LIME (local interpretable model-agnostic explanations), saliency maps and partial dependence plots;
— counterfactual analysis, user acceptance testing and explainability testing with users;
— model drift detection, resource usage monitoring, feedback loop integration, and ongoing monitoring and periodic re-validation.
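As an illustration of the first two kinds of methods in the list above, the following Python sketch computes a confusion matrix, a ROC-AUC score and a precision-recall curve using scikit-learn; the labels and scores are hypothetical.

    import numpy as np
    from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_auc_score

    # Hypothetical ground-truth labels and classifier scores.
    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    y_score = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2, 0.65, 0.55])
    y_pred = (y_score >= 0.5).astype(int)  # threshold the scores

    print(confusion_matrix(y_true, y_pred))  # rows: true class; columns: predicted class
    print(roc_auc_score(y_true, y_score))    # area under the ROC curve over all thresholds
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)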
7 Processes used in V&V analysis approaches for AI systems
7.1 Introduction
Depending on its requirements and the nature of the AI system to which V&V analysis is applied, an
organization can apply formal methods, simulation or evaluation, as well as any combination of the three. The
scope of V&V analysis should be identified prior to initiating V&V analysis activities to ensure resource
availability and to meet stakeholder timelines.
7.2 Formal methods
7.2.1 Introduction
Formal methods involve the use of mathematical models and logical processes to verify that a system as
designed has been implemented according to that design. By using mathematical models of a system, it is
possible to verify a system’s properties more effectively than through non-exhaustive empirical testing.
Applying formal methods in AI system V&V analysis involves the following:
a) defining formal specifications for the properties and behaviour that the AI system is expected to exhibit, including both functional (expected behaviour) and non-functional (performance, reliability, etc.) requirements;
b) creating an abstract model (as the basis of verification) of the AI system that captures its essential features and behaviours;
c) employing verification tools to verify that the AI system meets the formal specifications;
d) continuously monitoring the AI system's performance in real-world conditions to ensure it remains compliant with the formal specifications.
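As a minimal illustration of steps a) and d), the following Python sketch encodes one robustness property as a precondition/postcondition pair and checks it at run time; the model f, reference input c and tolerance eps are hypothetical names, not defined by this document.

    import numpy as np

    def satisfies_specification(f, c, x, eps):
        # Precondition: the observed input x stays within +/- eps of the
        # reference input c (element-wise).
        if not np.all(np.abs(x - c) <= eps):
            return True  # outside the precondition, the property is vacuously met
        # Postcondition: the predicted class does not change.
        return int(np.argmax(f(x))) == int(np.argmax(f(c)))

    # Step d): evaluate the check on each production input and log violations.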
7.2.2 Selection of type of formal methods
Soundness and completeness are desired of any verification process for AI systems with respect to a property specification. In the context of AI systems, soundness refers to the ability of a process to correctly identify all valid behaviours of the AI system, while completeness refers to the ability of a process to correctly identify all invalid behaviours. A verification process is sound if, whenever it reports that a property holds, the property actually holds. A verification process is complete if, whenever a property is true of the AI system, it reports that the property holds. However, formal verification of AI systems can be a computationally hard problem. For example, formal verification of deep neural networks (DNNs) belongs to the class of nondeterministic polynomial time (NP)-hard problems [16]. The non-linearity and non-convexity of DNNs impose mathematical limits on the significance and precision with which they can be solved, as well as on their scalability in terms of the size of networks that can be verified. This makes the verification of even basic properties computationally hard. Hence, any process that aims to verify an AI system that uses DNNs balances soundness and completeness.
EXAMPLE A sound process that over-approximates valid DNN behaviours using a compact representation of the
network is likely to be more scalable than a complete process that attempts to exactly capture the valid set of behaviours,
at the cost of a more complex representation.
As discussed in 7.2.3.2, abstract interpretation-based methods are sound but potentially incomplete. Abstract interpretation-based methods sacrifice precision for scalability.
As discussed in 7.2.3.3, constraint solving methods are typically complete but potentially unsound in specific cases. Constraint solving based methods are also generally more precise but less scalable than abstract interpretation-based methods.
7.2.3 Types of formal methods
7.2.3.1 General
Abstract interpretation, solver-based methods and model checking are types of formal methods applicable as
part of V&V analysis.
7.2.3.2 Abstract interpretation
Abstract interpretation is a formal method that helps build a queryable verification analysis specification [1]. It can be applied to training phases and runtime performance, enabling live data tracing and monitoring.
In the specific case of neural networks, each neuron in an ML model can take values over large ranges. A typical AI model can have millions of such parameters, such that it is difficult if not impossible to track outcomes for all possible combinations of values. Abstraction helps in studying properties of an AI model by reducing values to certain bins or by classifying values as positive, negative or zero.
Abstract interpretation guarantees that information about an AI model approximates its semantics. Properties are achieved by establishing relationships between formal methods and static analysis.
Generic steps in abstract interpretation are as follows:
— rule of sign: abstract the expressions using only literals, addition and subtraction;
— define the abstraction function;
— define the concretization function;
— define the Galois connection;
— for each edge, define an abstract transfer function;
— prove that the abstract function is consistent with the concretization function;
— if the abstract interpretation outcome is verified, the semantics of the ML model are also considered verified.
Over-approximation is an abstraction-based framework that can be applied to neural network verification to reduce the size of the network. It is assumed that properties verified for the smaller, abstract network hold true for the larger network as well.
Steps in this process are as follows:
— apply approximation and create a smaller abstract network that can be verified more efficiently;
— if the abstract network does not satisfy the specification, the verification procedure provides a counterexample, which either demonstrates that the original network violates the specification or is spurious;
— if the counterexample is spurious, refine the abstract network to make it more accurate and repeat the process (a method known as counterexample guided abstraction refinement);
— during the abstraction step, apply the generic steps at the neuron level.
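A common concrete instance of these ideas is the interval (box) abstract domain, in which each neuron is abstracted by a lower and an upper bound. The following Python sketch is a minimal, sound interval bound propagation through affine and ReLU layers; the example network and the non-negativity property are hypothetical.

    import numpy as np

    def interval_bounds(layers, lo, hi):
        # Propagate an input box [lo, hi] through affine + ReLU layers.
        # Sound but incomplete: the result over-approximates the true range.
        for i, (W, b) in enumerate(layers):
            mid, rad = (lo + hi) / 2.0, (hi - lo) / 2.0
            mid, rad = W @ mid + b, np.abs(W) @ rad  # exact for affine maps on boxes
            lo, hi = mid - rad, mid + rad
            if i < len(layers) - 1:                  # ReLU on hidden layers only
                lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
        return lo, hi

    # Hypothetical two-layer network; inputs range over the unit box.
    layers = [(np.array([[1.0, -1.0], [0.5, 2.0]]), np.zeros(2)),
              (np.array([[1.0, 1.0]]), np.array([0.1]))]
    lo, hi = interval_bounds(layers, np.zeros(2), np.ones(2))
    print(lo, hi)  # if lo >= 0, the property is proven;
                   # otherwise the analysis is inconclusive (not a counterexample)

Because the output bounds over-approximate the true range, a proof obtained this way is sound, while an inconclusive result illustrates the incompleteness discussed in 7.2.2.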
7.2.3.3 Solver-based methods and applicability to DNNs
In constraint solver-based verification of AI systems, a correctness property and the ML model are expressed
as a set of mathematical constraints, which are then solved to decide if the property holds.
As an example, a DNN can be imagined as a graph $G = (V, E)$ where each node or neuron $v \in V$ either computes an affine expression of the form

$$\sum_{i=1}^{n} a_i v_i + b$$

where

$a_i$ are the weights of the DNN;

$b$ is the bias;

$v_i$ are the inputs from the previous stage in the DNN,

or computes an activation function. For affine nodes, the $v_i$ values are received on edges incident on $v$ from neurons in the previous layer, and $a_i$, $v_i$ and $b$ are real-valued. This computation is followed by an application of an activation function at the activation nodes, which can be a piece-wise linear function such as a ReLU or a non-linear function such as sigmoid or tanh.
Correctness properties that a DNN is expected to satisfy can be defined in terms of its inputs and outputs. For example, for an image classifier DNN, given any input image $x$ (of a dog) that is slightly darker or lighter than a reference image $c$ (of a dog), the output classification is expected to always be the same (dog). Such properties can be expressed as a specification on the DNN, formally written as

$$T = \{\text{precondition on inputs } x\}\; y = f(x)\; \{\text{postcondition on outputs } y\}$$

where the DNN is represented as a function $f: \mathbb{R}^n \to \mathbb{R}^m$, for $n$ and $m$ neurons in the input and output layers, respectively.
The DNN graph $G$ and the specification can be encoded as mixed-integer linear or linear real arithmetic (LRA) constraints. A sample LRA encoding, $\phi_v$, for a single DNN neuron $v$, which computes the affine function $f_v: \mathbb{R}^{n_v} \to \mathbb{R}$, can be expressed as

$$\phi_v = \bigwedge_{j=1}^{k} \left( C_j \Rightarrow \left( v_{out} = \sum_{i=1}^{n_v} a_{ij}\, v_{in}^{i} + b_j \right) \right)$$

where

$n_v$ is the number of inputs to $v$ and equal to the number of edges incident on $v$;

$v_{out}$ is a variable that encodes the output of $v$;

$v_{in}^{i}$ represents neurons in the preceding layer;

$a_{ij}$ is the weight of the $i$-th incident edge;

$b_j$ represents the bias;

$C_j$ allows modelling of conditions on inputs from the preceding layer.
One can similarly encode the activation nodes. For example, the LRA encoding for a ReLU activation node $v^R$ can be given as

$$\phi_R = \left( v_{in}^{R} > 0 \Rightarrow v_{out}^{R} = v_{in}^{R} \right) \wedge \left( v_{in}^{R} \le 0 \Rightarrow v_{out}^{R} = 0 \right)$$
Finally, one can encode the edges between two neurons as

$$\varphi_v = \bigwedge_{i=1}^{n_v} \left( v_{in}^{i} = w_{out} \right)$$

where, for each edge $(w, v)$ incident on $v$, the output $w_{out}$ of $w$ is equal to the $i$-th input of $v$. The full DNN is then encoded as the conjunction of the constraints for all nodes and edges of the graph $G$.
This encoding, along with the LRA encoding of the specification, is solved using off-the-shelf solvers [17], which use a combination of Davis–Putnam–Logemann–Loveland-based satisfiability solvers that handle propositional constraints and an algorithm such as simplex to handle the LRA constraints.
Figure 4 shows a typical process for constraint solving based DNN verification.
Figure 4 — Constraint solving for DNN verification
The first step is to obtain a trained DNN model and define a suitable correctness specification like the triple $T$. Following this, a verification condition is constructed as

$$\Phi = \alpha \wedge \phi_G \wedge \neg\beta$$

where

$\Phi$ represents the verification condition;

$\alpha$ represents the LRA encoding of the pre-conditions given by the correctness specification;

$\beta$ represents the LRA encoding of the post-conditions given by the correctness specification;

$\phi_G$ is the LRA encoding of the DNN graph.

The post-condition is negated so that satisfying assignments to the formula represent violations of the property. Conversely, if the formula is unsatisfiable, then this effectively proves that the correctness property holds.
In the next step, an off-the-shelf Satisfiability Modulo Theories or Mixed-Integer Linear Programming solver is called to solve the verification condition $\Phi$. If the solver is unable to either prove or disprove the formula, then it is likely that the verification condition is too weak and it is necessary to strengthen it. This is done by strengthening the pre-condition, the post-condition or both. For example, if the output constraint $\beta$ initially captured the fact that the $i$-th output neuron has a value in the range $[0,1]$ for a given set of inputs for a correct classification, then the post-condition can be strengthened by adding constraints that limit the range of the other output neurons to achieve the same classification. This is done iteratively until the solver can either prove or disprove the verification condition. Another way to refine the verification condition is to add additional information about the DNN as constraints to the solver. Examples include a constraint that the weights of the DNN are within a certain range or that some ReLU nodes are always inactive or active. However, inferring such information about the DNN requires additional analysis of the DNN, potentially using a different set of processes.
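The following Python sketch, using the Z3 solver, illustrates this process on a deliberately tiny "network" with one affine node and one ReLU node; the property and the numeric values are hypothetical.

    from z3 import Real, Solver, And, If, sat

    # Encode y = ReLU(2x - 1) and check the triple {0 <= x <= 1} y = f(x) {y <= 1}.
    x, h, y = Real('x'), Real('h'), Real('y')

    s = Solver()
    s.add(And(x >= 0, x <= 1))   # alpha: LRA encoding of the precondition
    s.add(h == 2 * x - 1)        # phi_G: affine node
    s.add(y == If(h > 0, h, 0))  # phi_G: two-case ReLU encoding
    s.add(y > 1)                 # not beta: negated postcondition

    if s.check() == sat:
        print('counterexample:', s.model())  # a satisfying assignment violates the property
    else:
        print('property holds')              # unsat: no violating input exists

Here an unsatisfiable result proves the correctness property, exactly as described above; a satisfiable result yields a concrete violating input.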
7.2.3.4 Model checking
Model checking is an automated formal verification approach whereby specifications or desired properties are rigorously checked for a formal model or implementation given as a state-transition system, such as a Kripke structure [18]. If the specification does not hold for the formal model, the model checker generally provides a counterexample in the form of a path to the property violation.
In the definition of a Kripke structure, for the formal model M:
— S is the set of all possible states;
— I is the set of possible initial states;
— δ is the transition relation between the states.
A path in the model M is a sequence of states such that the transition relation holds between consecutive states. Paths can be finite or infinite. Infinite paths result from self-loops or cycles in the model. This contributes to infinite states in the model and leads to the state-space explosion problem. The explicit declaration of state variables can also result in a large number of states to encapsulate all possible behaviours in the formal model, which again can lead to state-space explosion. Numerous abstraction approaches are available to reduce the size of the model, including partial order reduction, bisimulation and counterexample guided abstraction refinement.
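The following Python sketch illustrates explicit-state model checking of a simple invariant ("no bad state is reachable") over a toy Kripke structure; on violation it returns the counterexample path described above. The structure and the labelling of bad states are hypothetical.

    from collections import deque

    S = {'s0', 's1', 's2', 's3'}                # states
    I = {'s0'}                                  # initial states
    delta = {'s0': {'s1', 's2'}, 's1': {'s1'},
             's2': {'s3'}, 's3': {'s0'}}        # transition relation
    bad = {'s3'}                                # states violating the invariant

    def check_invariant(I, delta, bad):
        # Breadth-first search over the reachable states.
        frontier = deque((s, [s]) for s in I)
        visited = set(I)
        while frontier:
            state, path = frontier.popleft()
            if state in bad:
                return path                     # counterexample: path to the violation
            for nxt in delta.get(state, ()):
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [nxt]))
        return None                             # invariant holds

    print(check_invariant(I, delta, bad))       # e.g. ['s0', 's2', 's3']

The enumeration of visited states makes the state-space explosion problem tangible: the search grows with the number of reachable states, which motivates the abstraction approaches listed above.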
FANNet employs a model checking based approach for formal analysis of neural networks, obtaining the noise tolerance of trained neural networks [19]. Steps in this process are as follows:
— generating a Kripke structure of the neural network;
— analysis of the formal network models.
In Figure 5, the trained model and the corresponding test data are put through the following FANNet formal verification stages:
— The trained model is functionally validated through formal modelling, which can be an iterative process.
— After completing functional validation, the model undergoes behavioural and robustness verification with a different set of noise bounds measuring its tolerance. Noise tolerance levels for the trained model are measured at this stage.
— If the model fails in the behavioural and robustness verification stage, the model undergoes counterexample analysis. The checker provides counterexamples to determine whether an input node is sensitive or to detect the presence of training bias.
— The model is then evaluated through safety verification to check if safety specifications are met.
Figure 5 — FANNet model checking methodology
Optimization of formal neural network analysis can be accomplished as follows:
— state space reduction;
— input domain segmentation;
— coarse-grain verification;
— random input segmentation.
7.3 Simulation in the context of cyber physical systems
7.3.1 Cyber physical systems (CPS)
Safety critical cyber physical systems (CPS) such as autonomous vehicles (AVs) or surgical robots are
increasingly built using AI components. Deployment of such systems entails extensive validation including
proper risk assessment. CPS is one of many contexts in which V&V analysis through simulation can be
effective.
7.3.2 AI cyber physical systems (AI-CPS)
Validation of AI-CPS can utilize both conventional validation methods and simulation-based processes to address potential complications associated with the AI components. Validating an AI-CPS using solely conventional methods can result in undiscovered risks. Simulation as a V&V analysis approach is recommended to be applied to AI-CPS as it allows the following key requirements to be addressed [20]:
a) Safety. Validating an AV system directly on real-world roads can be dangerous. Simulation allows a controlled setting for the initial rounds of validation.
b) Efficiency. Real-world validation is not ideal for providing rapid validation feedback during development. Simulation can generate feedback suited toward iterative development.
c) Coverage. In real-world validation settings, it can be difficult to exercise edge cases and adversarial scenarios when estimating risk. Simulation provides a managed environment in which behaviour under such scenarios can be validated.
d) Adaptability. The simulation environment allows an organization to modify (or simulate) the deployment environment constantly to ensure the AI-CPS is not overfitting to a certain deployment scenario.
e) Unbiasedness. The adaptability of the simulation environment can reduce unwanted bias in model evaluation and validation results, improving trustworthiness.
f) Prioritized failures. Adversarial conditions are easy to create in a simulated environment, allowing for a prioritized validation of adversarial failure scenarios.
g) Black-box interaction. Simulation environments can treat the AI-CPS as a black box when not all components of the AI-CPS are available. This is a powerful mechanism to validate the AI-CPS throughout its development phase, ensuring the integrity of the overall validation.
AI-CPS can be designed to accommodate ambiguity and adjust to uncertainties. These capabilities are provided through:
— Ability to manage data dynamically. The data is a combination of what the AI-CPS senses from its environment and its own internal system state. For example, an autonomous car can factor in the current traffic density when deciding the right speed for the vehicle on the freeway.
— Ability to make decisions that factor in the context in which the decision needs to be taken. For the AI-CPS, the context is a combination of the environmental conditions at the time of taking the decision and its own internal system context. For example, an autonomous car takes the decision of whether or not to proceed at an intersection by considering the other vehicles at the intersection, pedestrians at the intersection and other factors. The rule for navigating the intersection is not fixed, but adaptive in nature based on the conditions at specific intersections.
— Ability to adapt its actions based on the environmental conditions, factoring in ambiguities and uncertainties. For example, an AV can adapt its behaviour based on whether pedestrians choose to cross the road and other similarly ambiguous scenarios.
Figure 6 — Generic illustration of AI-CPS
In Figure 6, the sensors help the system sense its environment. The communication interfaces help the AI-CPS interact with external actors as well as provide required contextual information critical to decision making.
The knowledge bases represent the cognitive decision-making nerve centre of the AI-CPS. Episodic memory is critical to learning from experience, as it serves as a long-term memory for events affecting the AI-CPS and is the backbone of the adaptive nature of an AI-CPS. The discourse memory stores all the communication details of the AI-CPS with its environment and helps resolve ambiguities in the context. The plan library is a repository of successfully executed plans for common classes of problems, which helps drive efficiency in achieving the purpose of the AI-CPS. The model libraries contain all the models specific to the given domain (for example, object detection for vehicles differs from object detection in a retail shop), the tasks relevant to the use case, and a user model when it is relevant to interact with users in the environment. The knowledge base is continuously updated by the self-regulation block. This block controls the cognitive information processing and adapts using AI reasoning starting from an initial configuration.
Simulation can bring significant value when the AI-CPS is deciding and acting. In the decision stage, physics simulation provides raw sensor data and ground truth information, which eliminates the need for complex image processing, reduces training time and increases accuracy. When the AI-CPS is acting, simulation offers sensitivity, robustness and reliability analyses [21]. Simulation also helps improve AI-CPS using reinforcement learning (RL) based models. Simulation helps train RL models by creating diverse simulated environments, which improves the quality of the decide and act components of an RL-based AI-CPS.
Simulation approaches for the AI-CPS and the environment in which it is deployed can differ depending on the validation objectives.
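As a minimal illustration of simulation-based validation with randomized deployment conditions, supporting the coverage and adaptability requirements above, the following Python sketch runs a simple braking model over randomized scenarios and collects violating edge cases; the physics model, parameter ranges and threshold are hypothetical.

    import random

    def stopping_distance(speed, friction, reaction_time, g=9.8):
        # Simple physics model: reaction distance plus braking distance.
        return speed * reaction_time + speed ** 2 / (2 * friction * g)

    random.seed(0)
    failures = []
    for _ in range(1000):
        scenario = {                                    # randomized conditions
            'speed': random.uniform(5.0, 30.0),         # m/s
            'friction': random.uniform(0.3, 1.0),       # wet to dry road
            'reaction_time': random.uniform(0.1, 0.5),  # s
        }
        if stopping_distance(**scenario) > 60.0:        # required stopping envelope
            failures.append(scenario)                   # edge case to prioritize

    print(f'{len(failures)} of 1000 scenarios violated the stopping requirement')

Collected failing scenarios can then be prioritized for further validation, in line with the prioritized failures requirement in the list above.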
7.3.3 High-level approaches to simulation
7.3.3.1 General
Simulation approaches include [22]
...