ISO/IEC 5259-4:2024
Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 4: Data quality process framework
This document establishes general common organizational approaches, regardless of the type, size or nature of the applying organization, to ensure data quality for training and evaluation in analytics and machine learning (ML). It includes guidance on the data quality process for:
— supervised ML with regard to the labelling of data used for training ML systems, including common organizational approaches for training data labelling;
— unsupervised ML;
— semi-supervised ML;
— reinforcement learning;
— analytics.
This document is applicable to training and evaluation data that come from different sources, including data acquisition and data composition, data preparation, data labelling, evaluation and data use. This document does not define specific services, platforms or tools.
Intelligence artificielle — Qualité des données pour les analyses de données et l’apprentissage automatique — Partie 4: Cadre pour le processus de qualité des données
Standards Content (Sample)
International Standard
ISO/IEC 5259-4
First edition
2024-07
Artificial intelligence — Data quality for analytics and machine learning (ML) —
Part 4: Data quality process framework
Intelligence artificielle — Qualité des données pour les analyses de données et l’apprentissage automatique —
Partie 4: Cadre pour le processus de qualité des données
Reference number
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2024 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols and abbreviated terms
5 Data quality process principles
6 Data quality process framework
6.1 General
6.2 Data quality planning
6.3 Data quality evaluation
6.4 Data quality improvement
6.5 Data quality process validation
6.6 Using the DQPF
7 Data quality process for ML
7.1 General
7.2 Data requirements
7.3 Data planning
7.4 Data acquisition
7.5 Data preparation
7.5.1 General
7.5.2 Supervised ML
7.5.3 Unsupervised ML
7.5.4 Semi-supervised ML
7.5.5 Dataset composition
7.5.6 Data labelling
7.5.7 Data annotation
7.5.8 Data quality assessment
7.5.9 Data quality improvement
7.5.10 Data de-identification
7.5.11 Data encoding
7.6 Data provisioning
7.6.1 General
7.6.2 Supervised ML
7.6.3 Unsupervised ML
7.6.4 Semi-supervised ML
7.7 Data decommissioning
8 Data labelling methods and process
8.1 General
8.2 Data labelling principles
8.3 Data labelling methods
8.4 Data labelling process
8.4.1 General
8.4.2 Labelling specifications
8.4.3 Labelling participant roles
8.4.4 Labelling tools or platforms
8.4.5 Labelling task establishment
8.4.6 Labelling task assignment
8.4.7 Labelling process control
8.4.8 Labelling result quality checking
8.4.9 Labelling result revision
9 Roles of participants
9.1 General
9.2 Data planner
9.3 Data originator
9.4 Data collector
9.5 Data engineer
9.6 Data holder
9.7 Data user
10 Data quality process for semi-supervised ML
10.1 General
10.2 Data requirements
10.3 Data planning
10.4 Data acquisition
10.5 Data preparation
10.6 Data provisioning
...