ISO/IEC TS 42112
Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization
This document describes the characteristics that impact machine learning model training efficiency and then provides the optimization approaches that apply to these characteristics. This document provides AI providers and AI producers with a set of characteristics and the related optimizations that they can use to enhance their machine learning model training efficiency. AI providers and AI producers can also use this information to evaluate different machine learning model training approaches. This document does not specify any training acceleration mechanisms provided and implemented within the machine learning computing devices defined in ISO/IEC TR 17903 and their libraries.
Technologies de l'information — Intelligence artificielle — Recommandations relatives à l'optimisation de l'efficacité de l'entraînement du modèle d'apprentissage automatique
General Information
- Status
- Not Published
- Technical Committee
- ISO/IEC JTC 1/SC 42 - Artificial intelligence
- Current Stage
- 6000 - International Standard under publication
- Start Date
- 24-Apr-2026
- Completion Date
- 25-Apr-2026
Overview
ISO/IEC TS 42112: Information technology - Artificial intelligence - Guidance on machine learning model training efficiency optimization provides structured guidance for optimizing the efficiency of machine learning (ML) model training. Rapid growth in the scale and complexity of ML models and datasets has increased the time, resources, and cost required for effective model training. This technical specification supports both AI providers (offering ML platforms or infrastructure) and AI producers (developing ML-based solutions) by outlining factors that impact training efficiency and presenting relevant optimization strategies. Its ultimate goal is to help stakeholders reduce model training time and resource consumption without sacrificing the quality or scalability of AI systems.
Key Topics
ISO/IEC TS 42112 explores a range of topics essential to enhancing machine learning training efficiency:
- Efficiency-Impacting Characteristics: Discussion of how training data quality, dataset size, model parameters, and computing resource management affect training durations and resource usage.
- Training Data Management: Guidance on ensuring data quality, optimal dataset size, and efficient data processing workflows.
- Model Parameter Handling: Strategies for managing large parameter sets, especially in distributed training environments.
- Communication Optimization: Addressing network bottlenecks, synchronization challenges, and solutions to minimize data transfer delays.
- Failure Detection and Recovery: Recommendations for robust monitoring, checkpointing, and recovery mechanisms to reduce downtime and loss in the event of hardware or software failures.
- Optimization Approaches: Presentation of various practical methods such as optimized data preparation, feature engineering, parallelism strategies, and resource allocation techniques.
- Role of Stakeholders: Differentiation between AI providers and producers, highlighting their respective roles and optimization objectives.
Applications
The value of ISO/IEC TS 42112 extends across a variety of ML and AI applications, including:
- Machine Learning Model Development: Enables AI producers to design efficient training processes, leading to faster deployment and iteration of new models.
- AI Platform Optimization: Assists AI providers in maximizing the utilization of computing and networking resources across multiple customers and tasks, thus reducing operational costs.
- Enterprise AI Solutions: Offers actionable guidance for businesses seeking to minimize time-to-market for AI-enabled products and services through optimized ML workflows.
- E-Commerce and Large-Scale Systems: Applicable to industries such as e-commerce, where rapid and accurate model retraining on large datasets is critical for personalized recommendations and user experience improvements.
- Robustness and Reliability Enhancements: Supports implementation of robust failover and recovery strategies, improving ML infrastructure’s stability and reliability.
Related Standards
ISO/IEC TS 42112 complements and aligns with several existing international standards, ensuring interoperability and best practice adoption across the AI lifecycle:
- ISO/IEC 22989: Artificial intelligence concepts and terminology – foundational definitions relevant for ML system stakeholders.
- ISO/IEC 23053: Framework for AI systems using machine learning – provides context for ML pipelines referenced in this specification.
- ISO/IEC TR 17903: Overview of ML computing devices – referenced for hardware and library-specific acceleration mechanisms.
- ISO/IEC TS 4213: Assessment of ML classification performance – aids in evaluating model outcomes post-training.
- ISO/IEC 25010 & 25059: Quality model standards – define performance efficiency in software and AI systems.
By adhering to ISO/IEC TS 42112, AI providers and AI producers can systematically identify efficiencies, streamline ML workflows, and better evaluate competing training strategies. This specification is a practical resource for organizations aiming to balance scalability, speed, and resource optimization in artificial intelligence development and deployment.
Frequently Asked Questions
ISO/IEC TS 42112 is a technical specification developed by ISO/IEC JTC 1/SC 42, currently under publication. Its full title is "Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization". The document describes the characteristics that impact machine learning model training efficiency and provides optimization approaches that apply to these characteristics, giving AI providers and AI producers information they can use to enhance training efficiency and to evaluate different training approaches. It does not specify the training acceleration mechanisms provided and implemented within the machine learning computing devices defined in ISO/IEC TR 17903 and their libraries.
ISO/IEC TS 42112 is classified under the following ICS (International Classification for Standards) categories: 35.240.01 - Application of information technology in general. The ICS classification helps identify the subject area and facilitates finding related standards.
Standards Content (Sample)
FINAL DRAFT
Technical Specification
ISO/IEC DTS 42112
ISO/IEC JTC 1/SC 42
Secretariat: ANSI
Voting begins on: 2026-02-26
Voting terminates on: 2026-04-23
Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization
Technologies de l'information — Intelligence artificielle — Recommandations relatives à l'optimisation de l'efficacité de l'entraînement du modèle d'apprentissage automatique
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number: ISO/IEC DTS 42112:2026(en)
© ISO/IEC 2026
ISO/IEC DTS 42112:2026(en)
© ISO/IEC 2026
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Overview of ML model training
5.1 Model training in ML pipeline
5.2 Stakeholders of model training
6 ML model training efficiency
7 Characteristics impacting ML model training efficiency
7.1 Training data
7.2 Model parameter management
7.3 Communication challenges
7.4 Failure detection in model training
7.5 Failure recovery in model training
7.6 Quality of the ML model
7.7 Management of computing resources
8 Model training efficiency optimization methods
8.1 Overview
8.2 Training data preparation and model optimization
8.2.1 General consideration
8.2.2 Training data quality optimization
8.2.3 Feature engineering
8.2.4 Feature selection
8.2.5 Feature scaling
8.2.6 Training algorithm selection
8.2.7 Training process optimization
8.2.8 Ensemble learning
8.3 Parallelism strategies
8.3.1 Data parallelism
8.3.2 Model parallelism
8.3.3 Hybrid parallelism
8.4 Communication optimization
8.4.1 Collective communication
8.4.2 Data compression
8.4.3 Asynchronous communication
8.4.4 Network topology-aware scheduling
8.5 Model checkpoint optimization
8.5.1 Hierarchical checkpoint saving
8.5.2 Overlapping model copy and computation
8.5.3 Network-aware asynchronous storage
8.6 Resource management for model training
8.7 Failure detection optimization
8.8 Continuous monitoring and anomaly detection
8.9 Environment and infrastructure assessment
Annex A (informative) Use case: Deep learning recommendation system for an e-commerce platform
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 42, Artificial intelligence.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
Machine learning (ML) is a key branch of artificial intelligence (AI). To apply ML across diverse domains, ML
models are trained, validated and deployed in production environments. As model complexity and dataset
size continue to grow, the time, hardware resources, human effort and financial costs associated with ML
model training are escalating.
ML platforms, services and products are now widely available and adopted. Both AI providers (who offer
ML platforms, services or products) and AI producers (who build ML-based solutions) seek to optimize
training efficiency to minimize resource consumption and cost, without compromising model or dataset
scale. For instance, AI providers aim to reduce hardware usage per training task to support more customers
and improve resource utilization. AI producers prioritize faster training cycles to accelerate deployment of
validated models.
This document provides guidance to help AI providers and producers achieve faster training and reduced resource consumption, given specific models, datasets and infrastructure.
FINAL DRAFT Technical Specification ISO/IEC DTS 42112:2026(en)
Information technology — Artificial intelligence — Guidance
on machine learning model training efficiency optimization
1 Scope
This document outlines key factors affecting machine learning model training efficiency and presents
corresponding optimization approaches.
It provides guidance for AI providers and producers through a structured set of characteristics and related
optimizations to improve training efficiency. This information can support the evaluation and comparison
of various ML training strategies.
This document does not specify any training acceleration mechanisms provided and implemented within the machine learning computing devices described in ISO/IEC TR 17903.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts and
terminology
ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)
ISO/IEC TR 17903:2024, Information technology — Artificial intelligence — Overview of machine learning
computing devices
ISO/IEC TS 4213, Information technology — Artificial intelligence — Assessment of machine learning
classification performance
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 22989, ISO/IEC 23053,
ISO/IEC TR 17903, ISO/IEC TS 4213 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
artificial intelligence platform
AI platform
set of services provided by an AI platform provider that enable other stakeholders to produce artificial intelligence (AI) services or products
3.2
asynchronous
pertaining to two or more processes that do not depend upon the occurrence of specific events such as
common timing signals
[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.14]
3.3
computation graph
graph used to represent a mathematical computation process, where nodes represent operations or variables
and edges represent data flows and dependencies
3.4
failure
event during the model training process in which the training task fails to complete as expected or
experiences significant degradation in performance
3.5
fault
defect, error or abnormal condition in infrastructure, hardware, software, data or configuration that can
lead to a failure (3.4) in the model training process
3.6
grid search
hyperparameter tuning technique that explores all possible combinations of hyperparameters within a
predefined set to identify the combination that can maximize ML model performance
Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.
3.7
principal-component analysis
PCA
factor analysis involving the extraction of orthogonal factors that successively capture the largest amount of
variance in the dataset
[SOURCE: ISO 18115-1:2023, 22.14, modified — Notes to entry removed.]
3.8
random search
hyperparameter tuning technique that randomly samples a fixed number of hyperparameter combinations
within a predefined set to identify a favourable combination
Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.
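As an informal illustration (not part of the standard's normative text), the difference between grid search (3.6) and random search (3.8) can be sketched with the Python standard library; the hyperparameter names and scoring function here are hypothetical stand-ins for a real train-and-validate cycle:

```python
import itertools
import random

# Hypothetical hyperparameter set (names are illustrative only).
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
}

def score(params):
    # Stand-in for training plus validation; a real score would come
    # from evaluating the trained model on a validation set.
    return -abs(params["learning_rate"] - 0.01) - abs(params["batch_size"] - 64) / 1000

# Grid search: evaluate every combination in the predefined set.
combos = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
best_grid = max(combos, key=score)

# Random search: evaluate only a fixed number of randomly sampled combinations.
random.seed(0)
best_random = max(random.sample(combos, k=4), key=score)
```

Grid search is exhaustive over the predefined set, while random search trades completeness for a bounded, fixed evaluation budget.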
3.9
synchronous
pertaining to two or more processes that depend upon the occurrence of specific events such as common
timing signals
[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.13]
4 Abbreviated terms
AI artificial intelligence
AdaBoost adaptive boosting
Bagging bootstrap aggregating
BMI body mass index
CPU central processing unit
CNN convolutional neural network
D2H device-to-host
FP floating-point
GPU graphics processing unit
KNN k-nearest neighbours
Lasso least absolute shrinkage and selection operator
LDA linear discriminant analysis
LSTM long short-term memory
ML machine learning
Nested CV nested cross validation
NLP natural language processing
PCA principal component analysis
RNN recurrent neural network
SMOTE synthetic minority over-sampling technique
SSD solid-state drive
SVM support vector machine
t-SNE t-distributed stochastic neighbour embedding
XGBoost extreme gradient boosting
5 Overview of ML model training
5.1 Model training in ML pipeline
The ML pipeline is defined in ISO/IEC 23053:2022, Clause 8. Within this pipeline, training data are prepared
and an appropriate ML algorithm is selected prior to model training. During model training, an ML model is
trained using the training data to establish its parameters. After training, model selection is conducted to
tune hyperparameters, followed by model evaluation and verification.
This document focuses exclusively on optimizing efficiency during the ML model training phase.
NOTE This document does not cover the ML optimization methods specified in ISO/IEC 23053:2022, 6.5.4.
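As a non-normative sketch of the pipeline position described above, the stages before and after model training might be expressed as follows; every function and the toy "model" are hypothetical placeholders, not defined by this document or ISO/IEC 23053:

```python
# Non-normative sketch of where model training sits in an ML pipeline.

def prepare_data(raw):
    """Data preparation: clean records and split before training."""
    cleaned = [x for x in raw if x is not None]
    split = int(0.8 * len(cleaned))
    return cleaned[:split], cleaned[split:]

def train_model(train_data):
    """Model training: establish parameters from the training data."""
    # Toy "model": a single parameter, the mean of the training data.
    return {"mean": sum(train_data) / len(train_data)}

def evaluate(model, eval_data):
    """Model evaluation after training and hyperparameter tuning."""
    return sum((x - model["mean"]) ** 2 for x in eval_data) / len(eval_data)

train, val = prepare_data([1.0, 2.0, None, 3.0, 4.0, 5.0])
model = train_model(train)
mse = evaluate(model, val)
```

This document's guidance targets only the `train_model` stage of such a pipeline.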
5.2 Stakeholders of model training
AI stakeholder roles and their sub-roles, as defined in ISO/IEC 22989:2022, 5.19, are applicable to the context
of ML model training in this document.
In this context, an AI provider supplies tools, infrastructure or services, typically via an AI platform, to
enable organizations to train ML models. The AI platform facilitates various aspects of the ML workflow,
including data preparation, model training, evaluation and deployment. Key features provided by the AI
platform include:
— computing resources allowing access to hardware (CPUs, GPUs or other types of processors) necessary
for model training;
— storage solutions for secure and efficient management of datasets and models;
— training automation to streamline and manage aspects of the training process;
— collaboration tools supporting version control, project management and team coordination;
— framework support ensuring compatibility with various ML frameworks (optional).[9]–[11]
An AI producer is responsible for designing, training and validating ML models, and may utilize the tools, infrastructure or services offered by an AI provider.
6 ML model training efficiency
Performance efficiency is a key characteristic in the AI system quality model (see ISO/IEC 25059). It is
defined in terms of time behaviour, resource utilization and capacity (see ISO/IEC 25010). In the context of
ML model training, efficiency optimization refers to:
— time behaviour optimization, which aims to reduce the duration of model training;
— resource utilization optimization, which focuses on minimizing hardware usage or improving utilization
rate.
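As a non-normative illustration, both dimensions can be observed directly around a training loop with the Python standard library; the stand-in `train_step` and the derived metrics are illustrative, not metrics defined by ISO/IEC 25010 or 25059:

```python
import time

def train_step(batch):
    # Stand-in for one training step on one batch.
    return sum(batch)

batches = [[1, 2, 3]] * 1000

busy = 0.0
start = time.perf_counter()
for batch in batches:
    t0 = time.perf_counter()
    train_step(batch)
    busy += time.perf_counter() - t0
elapsed = time.perf_counter() - start

# Time behaviour: total wall-clock duration of the training run.
# Resource utilization: fraction of the elapsed time actually spent computing.
utilization = busy / elapsed
```

Time-behaviour optimization drives `elapsed` down; resource-utilization optimization drives `utilization` up (for example by overlapping data loading with computation).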
7 Characteristics impacting ML model training efficiency
7.1 Training data
Several characteristics of training data influence ML training efficiency and optimization potential.
The quality of the training data impacts the efficiency of ML model training and the inference performance of
the trained model. ISO/IEC 5259-2 specifies data quality characteristics and methods for their quantitative
assessment. Low-quality training data can lead to longer training time, increased resource consumption
and degraded inference performance. If data quality requirements are not met, retraining can be necessary,
which can significantly increase overall training duration and resource usage.
The size of the training dataset has a complex effect on both training efficiency and model inference
performance. Larger datasets demand more computation, memory and network resources, and typically
require longer training durations. Conversely, datasets that are too small to be representative, complete,
balanced, effective or fair can impair model performance, even if they require fewer resources and shorter
training time.
The method of data processing also impacts training efficiency. Sequential data processing can be
prohibitively time-consuming or infeasible for large datasets.
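For example, processing records in parallel rather than strictly in sequence can remove this bottleneck. The sketch below uses only the Python standard library and a hypothetical `normalize` step; it is an illustration of the principle, not a prescribed method:

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(record):
    # Hypothetical per-record preprocessing step (min-max scaling).
    lo, hi = min(record), max(record)
    return [(x - lo) / (hi - lo) for x in record]

records = [[1, 2, 3], [10, 20, 30], [5, 50, 95]]

# Sequential processing: one record at a time.
sequential = [normalize(r) for r in records]

# Parallel processing: records are dispatched to a worker pool.
# (For CPU-bound preprocessing, a process pool or a dedicated
# data-loading framework would typically replace the thread pool.)
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(normalize, records))
```

The two approaches produce identical results; only the time behaviour differs.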
7.2 Model parameter management
A large number of model parameters can require distributed storage and frequent updates across multiple
ML computing devices.
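A minimal, non-normative sketch of this idea follows: named parameter tensors (the names and values are hypothetical) are distributed across devices so that each device stores and updates only its own shard:

```python
# Non-normative sketch: sharding model parameters across ML computing
# devices so each device stores and updates only part of the model.

def shard_parameters(params, num_devices):
    """Assign each named parameter to a device round-robin."""
    shards = [{} for _ in range(num_devices)]
    for i, (name, tensor) in enumerate(sorted(params.items())):
        shards[i % num_devices][name] = tensor
    return shards

# Hypothetical parameter set (name -> flat list of values).
params = {
    "layer1.weight": [0.1] * 4,
    "layer1.bias": [0.0] * 2,
    "layer2.weight": [0.2] * 4,
    "layer2.bias": [0.0] * 2,
}

shards = shard_parameters(params, num_devices=2)
# An update to "layer2.weight" now touches only the device that owns
# that shard, but cross-device communication is needed whenever another
# device requires those values.
```

Real systems use far more sophisticated placement (for example, balancing by tensor size), but the storage-versus-communication trade-off is the same.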
7.3 Communication challenges
Centralized, synchronous communication between multiple ML computing devices across which training data are split can become an efficiency bottleneck. Latency between ML computing devices further reduces training efficiency.
Concurrent model training tasks running on shared computing infrastructure can lead to network congestion. When multiple tasks transmit data simultaneously, they can exceed network bandwidth limits, causing congestion. High-volume data transmission from certain training tasks can obstruct network traffic for others, slowing down communication and negatively impacting overall training efficiency.
7.4 Failure detection in model training
ML model training typically runs on computing infrastructure equipped with supporting libraries, software and tools. When training large models with extensive datasets, the process can be prolonged. Hardware failures, system faults, power outages or other disruptions can interrupt the training process.
When failures occur, troubleshooting is necessary. If the issue is not clearly linked to the AI producer's code, the AI provider's operations team should intervene to diagnose and isolate the fault. Ultimately, it can be necessary for the AI producer to resubmit the training task, which can be time-consuming.
In large-scale distributed synchronous deep learning tasks, hardware faults can leave many GPU cards idle,
resulting in substantial resource waste.
Effective failure detection is therefore critical to maintaining ML model training efficiency.
7.5 Failure recovery in model training
To prevent restarting the entire training process after a failure, intermediate model states can be saved during training. For large models trained on extensive datasets, a checkpoint mechanism should be implemented to periodically save model states, including weights, optimizer configurations and other relevant parameters. Checkpoints enable AI producers to resume training from a saved state rather than starting from scratch. However, generating checkpoints introduces overhead, and failure recovery using this method can affect overall training efficiency.
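A minimal, non-normative sketch of such a checkpoint mechanism is shown below; the state fields and file layout are illustrative, and production systems would use a framework's own checkpoint format:

```python
import os
import pickle
import tempfile

def save_checkpoint(path, step, weights, optimizer_state):
    """Periodically persist model state so training can resume after a failure."""
    state = {"step": step, "weights": weights, "optimizer": optimizer_state}
    # Write to a temporary file first, then rename, so a crash mid-write
    # cannot corrupt the previously saved checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # atomic rename

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt")
save_checkpoint(ckpt, step=1000, weights=[0.5, -0.2], optimizer_state={"lr": 0.01})

# After a failure, resume from the saved state instead of step 0.
state = load_checkpoint(ckpt)
```

The write-then-rename pattern addresses exactly the overhead/robustness trade-off described above: each save costs I/O time, but a valid checkpoint is always available for recovery.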
7.6 Quality of the ML model
The quality of an ML model, including correctness, robustness, bias mitigation, information security and
protection from exploitation, can significantly influence training efficiency.
Correctness directly affects training efficiency, as it reflects how well the model performs in terms of metrics
like accuracy and error rates. High correctness indicates effective learning and generalization, reducing the
need for retraining and improving resource efficiency. However, striving for high correctness can lead to
overfitting, where the model becomes overly tailored to the training data and performs poorly on unseen
data. Mitigating overfitting requires techniques such as cross-validation, regularization and additional
tuning, which can extend the training process.
Robustness reflects the model’s ability to handle fluctuations and interruptions in training data and to
perform reliably with imperfect or faulty data. Training for robustness involves exposing the model to noisy
or incomplete data and evaluating its ability under such conditions. A robust model can adapt to fluctuations
without significant performance decrease, reduce the need for repeated validation and conserve resources.
Achieving robustness can require additional training phases, such as incorporating negative examples or
optimizing hyperparameters to handle perturbations. This additional complexity can lengthen the training
process and require more computing resources. Techniques like data augmentation and regularization can
improve resilience but also increase training time and computational costs.
Bias mitigation requires careful data curation and pre-processing to ensure representativeness and fairness. Techniques such as re-weighting or data augmentation can be necessary to address underrepresented groups. A model free from unwanted bias generalizes better across diverse data and reduces retraining
due to ethical or legal concerns. However, ensuring fairness can increase training pipeline complexity and
require additional fine-tuning and post-training bias audits, further extending the training cycle.
Information security impacts training efficiency by introducing requirements for data privacy and model
confidentiality. Measures such as encryption and access control can complicate data preparation and
training workflows. Maintaining infrastructure availability, data confidentiality and model integrity helps
avoid costly retraining. However, secure handling of sensitive data and model parameters can require
additional steps, such as secure data storage and protected access environments, which introduce overhead
that affects overall training efficiency.
Safeguarding against exploitation also influences training efficiency. Secure models help prevent system
failures and reduce retraining caused by adversarial vulnerabilities. Techniques such as output obfuscation
and review of model predictions help prevent unintended information exposure and adversarial exploitation.
Proactively building adversarial robustness saves time and resources otherwise spent on reactive defences.
However, simulating attack scenarios and validating robustness can significantly increase training time and
resource demands.
7.7 Management of computing resources
Poor management of computing resources can lead to workload imbalance, where some ML computing devices are overloaded while others remain underutilized. This results in inefficient resource usage and can cause instability or failure in overloaded devices.
When multiple model training tasks run concurrently on the same underlying infrastructure, resource
congestion can occur. These tasks simultaneously compete for limited computing capacity, and critical
training tasks can be delayed or interrupted due to insufficient resource allocation, ultimately slowing down
the overall training process.
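As a non-normative sketch, one simple way to avoid such imbalance is to assign each task greedily to the currently least-loaded device. The task names and relative compute costs below are hypothetical:

```python
import heapq

def assign_tasks(task_costs, num_devices):
    """Greedy balancing: each task goes to the least-loaded device."""
    heap = [(0.0, d) for d in range(num_devices)]  # (current load, device index)
    assignment = [[] for _ in range(num_devices)]
    # Place the most expensive tasks first for a tighter balance.
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, device = heapq.heappop(heap)
        assignment[device].append(task)
        heapq.heappush(heap, (load + cost, device))
    return assignment

# Hypothetical training tasks with relative compute costs.
tasks = {"A": 8, "B": 7, "C": 3, "D": 2, "E": 2}
assignment = assign_tasks(tasks, num_devices=2)
loads = [sum(tasks[t] for t in device_tasks) for device_tasks in assignment]
# Device loads end up close to balanced rather than concentrating the
# heaviest tasks on one device while another sits idle.
```

Production schedulers additionally account for memory, network topology and task priorities, but the least-loaded heuristic captures the core idea.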
8 Model training efficiency optimization methods
8.1 Overview
Model training efficiency optimization aims to reduce training time while improving computing resource
utilization. It is a multifaceted process that involves various approaches targeting different aspects of
the training process. From a stakeholder perspective, different roles emphasize distinct optimization
approaches.
— AI producers primarily focus on optimizing training data preparation and model design, including
training data quality optimization, feature engineering, feature scaling, training algorithm selection,
training process optimization and ensemble learning.
— AI providers, especially AI platform providers, focus on system-level optimizations, including parallel
computing, communication, checkpoint, failure detection, resource management, continuous monitoring
and anomaly detection, and environment and infrastructure assessment.
An example of the application of the optimization techniques described in Clause 8 is illustrated in Annex A.
8.2 Training data preparation and model optimization
8.2.1 General consideration
The training data preparation and model optimization process consists of a comprehensive sequence of
methods aimed at improving the quality of training data, enhancing the informativeness of features and
maximizing model performance. These methods follow the machine learning (ML) pipeline and should be
selected and combined according to the unique characteristics of the dataset and the modelling goals.
To achieve consistent and transparent optimization, a structured machine learning pipeline approach should
be implemented. In this approach, each optimization task is treated as a distinct pipeline step that addresses
a specific aspect of the modelling process. Pipeline steps are applied gradually and synergistically, leading
to improved generalization, performance metrics and operational robustness.
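EXAMPLE The pipeline-step approach described in 8.2.1 can be sketched as follows in Python (informative; the stages shown, deduplication, outlier removal and min-max scaling, are illustrative placeholders for the optimization tasks an AI producer would actually select):

```python
def run_pipeline(records, steps):
    """Apply named optimization steps to a dataset, one pipeline stage at a time."""
    for name, step in steps:
        # Each named stage can be logged and validated independently,
        # keeping the optimization process consistent and transparent.
        records = step(records)
    return records

# Hypothetical stages: deduplication, outlier removal, min-max scaling.
dedupe = lambda xs: sorted(set(xs))
drop_outliers = lambda xs: [x for x in xs if 0 <= x <= 100]
# Note: min()/max() are recomputed per element here for brevity (O(n^2)).
scale = lambda xs: [(x - min(xs)) / (max(xs) - min(xs)) for x in xs]

clean = run_pipeline([3, 3, 50, 250, 100], [
    ("deduplicate", dedupe),
    ("drop_outliers", drop_outliers),
    ("min_max_scale", scale),
])
```

Because each stage is a distinct, named step, intermediate results can be inspected and validated independently, which supports the consistency and transparency aims stated above.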
ISO/IEC DTS 42112:2026(en)
ISO/IEC JTC 1/SC 42/WG 5
Secretariat: ANSI
Date: 2026-02-11

Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization

Technologies de l'information — Intelligence artificielle — Recommandations relatives à l'optimisation de l'efficacité de l'entraînement du modèle d'apprentissage automatique

FDIS stage

Warning for WDs and CDs

This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.
© ISO/IEC 2026

All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO's member body in the country of the requester.

ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
E-mail: copyright@iso.org
Website: www.iso.org

Published in Switzerland
Contents
Foreword . vii
Introduction . viii
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abbreviated terms . 3
5 Overview of ML model training . 4
5.1 Model training in ML pipeline . 4
5.2 Stakeholders of model training . 4
6 ML model training efficiency . 5
7 Characteristics impacting ML model training efficiency . 5
7.1 Training data . 5
7.2 Model parameter management . 5
7.3 Communication challenges . 6
7.4 Failure detection in model training . 6
7.5 Failure recovery in model training . 6
7.6 Quality of the ML model . 6
7.7 Management of computing resources . 7
8 Model training efficiency optimization methods . 7
8.1 Overview . 7
8.2 Training data preparation and model optimization . 8
8.3 Parallelism strategies . 11
8.4 Communication optimization . 12
8.5 Model checkpoint optimization . 13
8.6 Resource management for model training . 14
8.7 Failure detection optimization . 15
8.8 Continuous monitoring and anomaly detection . 15
8.9 Environment and infrastructure assessment . 15
Annex A (informative) Use case: Deep learning recommendation system for an e-commerce
platform . 1
Bibliography . 6
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).

ISO and IEC draw attention to the possibility that the implementation of this document may involve the use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not received notice of (a) patent(s) which may be required to implement this document. However, implementers are cautioned that this may not represent the latest information, which may be obtained from the patent database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO's adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.

This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 42, Artificial intelligence.

Any feedback or questions on this document should be directed to the user's national standards body. A complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-committees.
Introduction
Machine learning (ML) is a key branch of artificial intelligence (AI). To apply ML across diverse domains, ML models are trained, validated and deployed in production environments. As model complexity and dataset size continue to grow, the time, hardware resources, human effort and financial costs associated with ML model training are escalating.

ML platforms, services and products are now widely available and adopted. Both AI providers (who offer ML platforms, services or products) and AI producers (who build ML-based solutions) seek to optimize training efficiency to minimize resource consumption and cost, without compromising model or dataset scale. For instance, AI providers aim to reduce hardware usage per training task to support more customers and improve resource utilization. AI producers prioritize faster training cycles to accelerate deployment of validated models.

This document provides guidance to help AI providers and producers achieve faster training and reduced resource consumption, given specific models, datasets and infrastructure.
Information technology — Artificial intelligence — Guidance on machine learning model training efficiency optimization

1 Scope

This document outlines key factors affecting machine learning model training efficiency and presents corresponding optimization approaches.
It provides guidance for AI providers and producers through a structured set of characteristics and related optimizations to improve training efficiency. This information can support the evaluation and comparison of various ML training strategies.

This document does not specify any training accelerating mechanisms provided and implemented within machine learning computing devices described in ISO/IEC TR 17903.
2 Normative references

The following documents are referred to in the text in such a way that some or all of their content constitutes requirements of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO/IEC 22989:2022, Information technology — Artificial intelligence — Artificial intelligence concepts and terminology

ISO/IEC 23053:2022, Framework for Artificial Intelligence (AI) Systems Using Machine Learning (ML)

ISO/IEC TR 17903:2024, Information technology — Artificial intelligence — Overview of machine learning computing devices

ISO/IEC TS 4213, Information technology — Artificial intelligence — Assessment of machine learning classification performance
3 Terms and definitions

For the purposes of this document, the terms and definitions given in ISO/IEC 22989:2022, ISO/IEC 23053:2022, ISO/IEC TR 17903:2024, ISO/IEC TS 4213:2022 and the following apply.

ISO and IEC maintain terminology databases for use in standardization at the following addresses:

— ISO Online browsing platform: available at https://www.iso.org/obp

— IEC Electropedia: available at https://www.electropedia.org/

3.1
artificial intelligence platform
AI platform
set of services that are provided by an AI platform provider and enable other stakeholders to produce artificial intelligence (AI) services or products
3.2
asynchronous
pertaining to two or more processes that do not depend upon the occurrence of specific events such as common timing signals

[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.14]

3.3
at-risk group
subset of stakeholders that can be adversely affected by unwanted bias

Note 1 to entry: At-risk groups can also emerge from intersections of groups as described in ISO/IEC TR 24027.

Note 2 to entry: Unforeseen at-risk groups can emerge due to the use of AI systems, as described in ISO/IEC TS 12791, 5.1.5.

[SOURCE: ISO/IEC TS 12791:2024, 3.3.11, modified — "5.1.5" in Note 2 to entry replaced with "ISO/IEC TS 12791, 5.1.5".]
3.4
computation graph
graph used to represent a mathematical computation process, where nodes represent operations or variables and edges represent data flows and dependencies

3.5
failure
event during the model training process in which the training task fails to complete as expected or experiences significant degradation in performance

3.6
fault
defect, error or abnormal condition in infrastructure, hardware, software, data or configuration that can lead to a failure (3.5) in the model training process

3.7
grid search
hyperparameter tuning technique that explores all possible combinations of hyperparameters within a predefined set to identify the combination that can maximize ML model performance

Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.

3.8
principal component analysis
PCA
factor analysis involving the extraction of orthogonal factors that successively capture the largest amount of variance in the dataset

[SOURCE: ISO 18115-1:2023, 22.14, modified — Notes to entry removed.]

3.9
random search
hyperparameter tuning technique that randomly samples a fixed number of hyperparameter combinations within a predefined set to identify a favourable combination

Note 1 to entry: Hyperparameter tuning is specified in ISO/IEC 23053:2022, 6.5.3.1.
3.10
synchronous
pertaining to two or more processes that depend upon the occurrence of specific events such as common timing signals

[SOURCE: ISO/IEC 20944-1:2013, 3.6.1.13]
4 Abbreviated terms
AI artificial intelligence
AdaBoost adaptive boosting
Bagging bootstrap aggregating
BMI body mass index
CPU central processing unit
CNN convolutional neural network
D2H device-to-host
FP floating-point
GPU graphics processing unit
KNN k-nearest neighbours
Lasso least absolute shrinkage and selection operator
LDA linear discriminant analysis
LSTM long short-term memory
ML machine learning
Nested CV nested cross validation
NLP natural language processing
PCA principal component analysis
RNN recurrent neural network
SMOTE synthetic minority over-sampling technique
SSD solid-state drive
SVM support vector machine
t-SNE t-distributed stochastic neighbour embedding
XGBoost extreme gradient boosting
5 Overview of ML model training

5.1 Model training in ML pipeline
The ML pipeline is defined in ISO/IEC 23053:2022, Clause 8. Within this pipeline, training data are prepared and an appropriate ML algorithm is selected prior to model training. During model training, an ML model is trained using the training data to establish its parameters. After training, model selection is conducted to tune hyperparameters, followed by model evaluation and verification.

This document focuses exclusively on optimizing efficiency during the ML model training phase.

NOTE This document does not cover the ML optimization methods specified in ISO/IEC 23053:2022, 6.5.4.
5.2 Stakeholders of model training

AI stakeholder roles and their sub-roles, as defined in ISO/IEC 22989:2022, 5.19, are applicable to the context of ML model training in this document.

In this context, an AI provider supplies tools, infrastructure or services, typically via an AI platform, to enable organizations to train ML models. The AI platform facilitates various aspects of the ML workflow, including data preparation, model training, evaluation and deployment. Key features provided by the AI platform include:
— computing resources allowing access to hardware (CPUs, GPUs or other types of processors) necessary for model training;

— storage solutions for secure and efficient management of datasets and models;

— training automation to streamline and manage aspects of the training process;

— collaboration tools supporting version control, project management and team coordination;

— framework support ensuring compatibility with various ML frameworks [9]–[11] (optional).
An AI producer is responsible for designing, training and validating ML models, and may utilize the tools, infrastructure or services offered by an AI provider.
6 ML model training efficiency

Performance efficiency is a key characteristic in the AI system quality model (see ISO/IEC 25059). It is defined in terms of time behaviour, resource utilization and capacity (see ISO/IEC 25010:2023, 3.2). In the context of ML model training, efficiency optimization refers to:

— time behaviour optimization, which aims to reduce the duration of model training;

— resource utilization optimization, which focuses on minimizing hardware usage or improving the utilization rate.
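EXAMPLE The two optimization targets can be instrumented directly. The following Python sketch (informative; all identifiers are illustrative) records per-epoch wall-clock time (time behaviour) and computes a device utilization ratio (resource utilization):

```python
import time

def timed_epochs(train_step, n_epochs):
    """Record the wall-clock duration of each training epoch (time behaviour)."""
    durations = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        train_step()
        durations.append(time.perf_counter() - start)
    return durations

def utilization(busy_seconds, elapsed_seconds):
    """Fraction of elapsed time a computing device spent on useful work."""
    return busy_seconds / elapsed_seconds

# A toy stand-in for one epoch of training work.
durations = timed_epochs(lambda: sum(i * i for i in range(10000)), n_epochs=3)
util = utilization(busy_seconds=7.2, elapsed_seconds=9.0)
```

Tracking both metrics over time makes the effect of an optimization measurable: a change that shortens epochs but leaves devices idle improves time behaviour while worsening resource utilization.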
7 Characteristics impacting ML model training efficiency

7.1 Training data

Several characteristics of training data influence ML training efficiency and optimization potential.

The quality of the training data impacts the efficiency of ML model training and the inference performance of the trained model. ISO/IEC 5259-2 specifies data quality characteristics and methods for their quantitative assessment. Low-quality training data can lead to longer training time, increased resource consumption and degraded inference performance. If data quality requirements are not met, retraining can be necessary, which can significantly increase overall training duration and resource usage.

The size of the training dataset has a complex effect on both training efficiency and model inference performance. Larger datasets demand more computation, memory and network resources, and typically require longer training durations. Conversely, datasets that are too small to be representative, complete, balanced, effective or fair can impair model performance, even if they require fewer resources and shorter training time.

The method of data processing also impacts training efficiency. Sequential data processing can be prohibitively time-consuming or infeasible for large datasets.
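EXAMPLE The limitation of sequential data processing noted above can be illustrated with a minimal Python sketch (informative; `preprocess` is a hypothetical per-record preparation step). Distributing records across a worker pool yields the same output as sequential processing while allowing throughput to scale:

```python
from concurrent.futures import ThreadPoolExecutor

def preprocess(record):
    """Stand-in for a per-record preparation step (parsing, normalization, etc.)."""
    return record.strip().lower()

def preprocess_sequential(records):
    return [preprocess(r) for r in records]

def preprocess_parallel(records, workers=4):
    # Fanning records out across workers avoids the bottleneck of strictly
    # sequential processing; map() preserves the input order of results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, records))

records = ["  Alpha ", "BETA", " gamma"]
```

For I/O-bound preparation a thread pool as shown is typically sufficient; for CPU-bound steps a process pool (or a distributed data-processing framework) would usually be used instead.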
7.2 Model parameter management

A large number of model parameters can require distributed storage and frequent updates across multiple ML computing devices.
7.3 Communication challenges

Centralized, synchronous communication between multiple ML computing devices across which training data are split can be an efficiency bottleneck. Latency between ML computing devices further contributes to reduced training efficiency.

Concurrent model training tasks running on shared computing infrastructure can lead to network congestion. When multiple tasks transmit data simultaneously, they can exceed network bandwidth limits, causing congestion. High-volume data transmission from one training task can obstruct network traffic for others, slowing down communication and negatively impacting overall training efficiency.
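EXAMPLE The bottleneck effect can be approximated with a simple cost model. The following Python sketch (informative; the formula and figures are illustrative assumptions, not measurements) models centralized synchronous aggregation over a shared link, where transfer volume grows with the number of participating devices:

```python
def sync_aggregation_time(n_devices, bytes_per_device, latency_s, bandwidth_bps):
    """Rough cost model for centralized synchronous parameter aggregation.

    All devices send their updates to one coordinator over a shared link,
    so transfer volume scales with the number of devices, and the slowest
    exchange gates every participant in a synchronous step.
    """
    transfer_s = n_devices * bytes_per_device * 8 / bandwidth_bps
    return latency_s + transfer_s

# Doubling the device count roughly doubles time on the shared link,
# illustrating why centralized synchronous exchange becomes a bottleneck.
t8 = sync_aggregation_time(8, 100e6, 0.001, 10e9)
t16 = sync_aggregation_time(16, 100e6, 0.001, 10e9)
```

Decentralized collectives, compression and asynchronous exchange (see 8.4) each attack a different term of this model.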
7.4 Failure detection in model training

ML model training typically runs on computing infrastructure equipped with supporting libraries, software and tools. When training large models with extensive datasets, the process can be prolonged. Hardware failures, system faults, power outages or other disruptions can interrupt the training process.

When failures occur, troubleshooting is necessary. If the issue is not clearly linked to the AI producer's code, the AI provider's operations team should intervene to diagnose and isolate the fault. Ultimately, it can be necessary for the AI producer to resubmit the training task, which can be time-consuming.

In large-scale distributed synchronous deep learning tasks, hardware faults can leave many GPU cards idle, resulting in substantial resource waste.

Effective failure detection is therefore critical to maintaining ML model training efficiency.
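EXAMPLE One common detection mechanism is heartbeat monitoring. The following Python sketch (informative; class and identifier names are illustrative) flags workers whose periodic heartbeat has gone stale, so that diagnosis can begin before an entire synchronous task stalls:

```python
import time

class HeartbeatMonitor:
    """Flag training workers whose heartbeat has gone stale.

    Each worker is expected to report a heartbeat periodically; a worker
    that stays silent for longer than `timeout_s` is treated as suspected
    failed so that fault isolation can start promptly.
    """
    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}

    def beat(self, worker_id, now=None):
        self.last_seen[worker_id] = time.monotonic() if now is None else now

    def suspected_failed(self, now=None):
        now = time.monotonic() if now is None else now
        return [w for w, t in self.last_seen.items() if now - t > self.timeout_s]

monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.beat("worker-0", now=0.0)
monitor.beat("worker-1", now=0.0)
monitor.beat("worker-0", now=25.0)       # worker-1 stays silent
failed = monitor.suspected_failed(now=40.0)
```

In production, the suspected-failed list would trigger automated diagnosis and task rescheduling rather than manual resubmission.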
7.5 Failure recovery in model training

To prevent restarting the entire training process after a failure, intermediate model states can be saved during training. For large models trained on extensive datasets, a checkpoint mechanism should be implemented to periodically save model states, including weights, optimizer configurations and other relevant parameters. Checkpoints enable AI producers to resume training from a saved state rather than starting from scratch. However, generating checkpoints introduces overhead, and failure recovery using this method can affect overall training efficiency.
7.6 Quality of the ML model Formatted: Font: English (United Kingdom)
Formatted: Font: English (United Kingdom)
The quality of an ML model, including correctness, robustness, bias mitigation, information security and
Formatted: Font: English (United Kingdom)
protection from exploitation, can significantly influence training efficiency.
Formatted: Body Text
Correctness directly affects training efficiency, as it reflects how well the model performs in terms of
Formatted: Font: English (United Kingdom)
metrics like accuracy and error rates. High correctness indicates effective learning and generalization,
Formatted: Font: English (United Kingdom)
reducing the need for retraining and improving resource efficiency. However, striving for high
correctness maycan lead to overfitting, where the model becomes overly tailored to the training data and
Formatted: Font: English (United Kingdom)
performs poorly on unseen data. Mitigating overfitting requires techniques such as cross-validation,
regularization and additional tuning, which can extend the training process.
Robustness reflects the model’s ability to handle fluctuations and interruptions in training data and to perform reliably with imperfect or faulty data. Training for robustness involves exposing the model to noisy or incomplete data and evaluating its ability under such conditions. A robust model can adapt to fluctuations without significant performance decrease, reduce the need for repeated validation and conserve resources. Achieving robustness can require additional training phases, such as incorporating negative examples or optimizing hyperparameters to handle perturbations. This additional complexity can lengthen the training process and require more computing resources. Techniques like data augmentation and regularization can improve resilience but also increase training time and computational costs.
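The trade-off described above can be seen in a simple, hypothetical noise-injection augmentation sketch (illustrative only; the function name and parameters are assumptions, not defined by this document): each added noisy copy improves exposure to perturbed inputs but multiplies the amount of data the training process must consume.

```python
import random

def augment_with_noise(samples, noise_std=0.05, copies=2, seed=0):
    """Create noisy copies of each sample (simple Gaussian-noise
    augmentation) so the model sees perturbed inputs during training."""
    rng = random.Random(seed)
    augmented = [list(x) for x in samples]  # keep the originals
    for _ in range(copies):
        for x in samples:
            augmented.append([v + rng.gauss(0.0, noise_std) for v in x])
    return augmented

data = [[1.0, 2.0], [3.0, 4.0]]
bigger = augment_with_noise(data)
# Training set grows by a factor of (1 + copies): here 2 -> 6 samples.
```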
Bias mitigation requires careful data curation and pre-processing to ensure representativeness and fairness. Techniques such as re-weighting or data augmentation can be necessary to address underrepresented groups. A model free from unwanted bias generalizes better across diverse data and reduces retraining due to ethical or legal concerns. However, ensuring fairness can increase training pipeline complexity and require additional fine-tuning and post-training bias audits, further extending the training cycle.
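One common form of the re-weighting mentioned above is inverse-frequency sample weighting. The following is a minimal, hypothetical Python sketch (illustrative only; names and the weighting scheme are assumptions): samples from underrepresented groups receive proportionally larger weights so that each group contributes equally to the training loss.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Assign each sample a weight inversely proportional to its group's
    frequency, so underrepresented groups contribute equally to the loss."""
    counts = Counter(labels)
    n_groups = len(counts)
    total = len(labels)
    # Weight = total / (n_groups * count); a balanced dataset yields 1.0.
    return [total / (n_groups * counts[lab]) for lab in labels]

labels = ["a"] * 8 + ["b"] * 2          # group "b" is underrepresented
weights = inverse_frequency_weights(labels)
```

With this scheme each group's total weight is equal (here 5.0 each), which is what counteracts the imbalance during training.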
Information security impacts training efficiency by introducing requirements for data privacy and model confidentiality. Measures such as encryption and access control can complicate data preparation and training workflows. Maintaining infrastructure availability, data confidentiality and model integrity helps avoid costly retraining. However, secure handling of sensitive data and model parameters can require additional steps, such as secure data storage and protected access environments, which introduces overhead that affects overall training efficiency.
Safeguarding against exploitation also influences training efficiency. Secure models help prevent system failures and reduce retraining caused by adversarial vulnerabilities. Techniques such as output obfuscation and review of model predictions help prevent unintended information exposure and adversarial exploitation. Proactively building adversarial robustness saves time and resources otherwise spent on reactive defences. However, simulating attack scenarios and validating robustness can significantly increase training time and resource demands.
7.7 Management of computing resources

Poor management of computing resources can lead to workload imbalance, where some ML computing devices are overloaded while others remain underutilized. This results in inefficient resource usage and may cause instability or failure in overloaded devices.
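As a hypothetical illustration of avoiding the workload imbalance described above (the approach and names are assumptions for illustration, not prescribed by this document), a simple greedy scheduler can assign each training task to the currently least-loaded device:

```python
import heapq

def assign_tasks(task_costs, n_devices):
    """Greedy longest-processing-time assignment: give each task to the
    currently least-loaded device to reduce workload imbalance."""
    heap = [(0.0, d) for d in range(n_devices)]   # (current load, device id)
    heapq.heapify(heap)
    assignment = {d: [] for d in range(n_devices)}
    for cost in sorted(task_costs, reverse=True):  # largest tasks first
        load, d = heapq.heappop(heap)              # least-loaded device
        assignment[d].append(cost)
        heapq.heappush(heap, (load + cost, d))
    return assignment

loads = assign_tasks([5, 3, 3, 2, 2, 1], 2)
```

In this example both devices end up with a total load of 8, whereas a naive round-robin assignment would leave them unevenly loaded.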
When multiple model training tasks run concurrently on the same underlying infrastructure, resource