Data governance and quality for AI within the European context

This document provides an overview of AI-related standards, with a focus on data and data life cycles, for organizations, agencies, enterprises, developers, universities, researchers, focus groups, users, and other stakeholders that are experiencing this era of digital transformation.
It describes links among the many international standards and regulations published or under development, with the aim of promoting a common language and a greater culture of quality, and of providing an information framework.
It addresses the following areas:
-   data governance;
-   data quality;
-   elements for data and properties of data sets that provide unbiased evaluation and information for testing.

Datenmanagement und -qualität für KI im europäischen Kontext

Gouvernance et qualité des données pour l'IA dans le contexte européen

Upravljanje in kakovost podatkov za UI v evropskem okviru


General Information

Status
Published
Publication Date
05-Nov-2024
Current Stage
6060 - Definitive text made available (DAV) - Publishing
Start Date
06-Nov-2024
Due Date
30-Apr-2025
Completion Date
06-Nov-2024

Overview

CEN/CLC/TR 18115:2024 - Data governance and quality for AI within the European context is a CEN-CENELEC Technical Report that maps European regulations and international standards relevant to data governance, data quality and the data life cycle for Artificial Intelligence (AI). The report provides an information framework and common terminology to help organizations align AI development, testing and deployment with legal, ethical and technical expectations across Europe.

Key topics

  • Data governance framework: multi-level governance approaches, roles and flows for managing data across organizations and European initiatives.
  • Data quality models: characteristics and measures adapted from ISO/IEC models (e.g., ISO/IEC 25012, ISO/IEC 5259-2, ISO 8000) to support accuracy, robustness, accessibility and trustworthiness of datasets.
  • Data life cycle: lifecycle phases for data collection, curation, validation, labeling, storage and reuse with quality checkpoints for AI training, validation and testing.
  • Unbiased evaluation & testing: properties of datasets and information required to perform fair, reproducible testing and unbiased evaluation of AI systems.
  • Regulatory alignment: mapping between the EU AI Act (notably Articles on data and governance), GDPR and other EU laws (Data Governance Act, Data Act, Open Data Directive) to standardization needs.
  • Practical guidance: case studies, best practices and examples (including public sector experiences and sector-specific notes such as healthcare) to illustrate implementation challenges and solutions.

Applications

This Technical Report is intended for:

  • Organizations and enterprises implementing or procuring AI systems who need to design trustworthy data pipelines.
  • Developers and data scientists seeking standardized quality criteria for training, validation and testing datasets.
  • Public administrations and data offices aiming to align public data reuse with EU data governance policies.
  • Universities, researchers and focus groups analysing interoperability, ethics and dataset bias.
  • Certification bodies and auditors assessing compliance with data-related requirements of the EU AI Act and GDPR.

Practical uses include designing data governance policies, defining dataset quality metrics for ML pipelines, preparing documentation for AI conformity assessment, and harmonizing local practices with pan‑European data spaces.

Related standards and references

  • EU laws: AI Act (2024), GDPR, Data Governance Act (2022), Data Act (2023), Open Data Directive (EU 2019/1024).
  • ISO/IEC standards cited in the TR: ISO/IEC 25012, ISO/IEC 5259-2, ISO 8000, ISO/IEC 22989 and related AI/data standards and technical reports.
  • Produced by CEN/CLC/JTC 21; the TR synthesizes JRC research, standards clusters and practical case studies for European stakeholders.

For implementation, refer to the full Technical Report and national standards bodies for guidance and adoption pathways.

Technical report
TP CEN/CLC/TR 18115:2025 - BARVE
English language
64 pages

Standards Content (Sample)


SLOVENIAN STANDARD
01-February-2025
Upravljanje in kakovost podatkov za UI v evropskem okviru
Data governance and quality for AI within the European context
Datenmanagement und -qualität für KI im europäischen Kontext
Gouvernance et qualité des données pour l'IA dans le contexte européen
This Slovenian standard is identical to: CEN/CLC/TR 18115:2024
ICS:
35.240.01 Application of information technology in general
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

TECHNICAL REPORT CEN/CLC/TR 18115

RAPPORT TECHNIQUE
TECHNISCHER REPORT
November 2024
ICS 35.240.01
English version
Data governance and quality for AI within the European context
Gouvernance et qualité des données pour l'IA dans le contexte européen
Datenmanagement und -qualität für KI im europäischen Kontext

This Technical Report was approved by CEN on 30 September 2024. It has been drawn up by the Technical Committee
CEN/CLC/JTC 21.
CEN and CENELEC members are the national standards bodies and national electrotechnical committees of Austria, Belgium,
Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy,
Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Republic of North Macedonia, Romania, Serbia,
Slovakia, Slovenia, Spain, Sweden, Switzerland, Türkiye and United Kingdom.

CEN-CENELEC Management Centre:
Rue de la Science 23, B-1040 Brussels
© 2024 CEN/CENELEC All rights of exploitation in any form and by any means reserved worldwide for
CEN national Members and for CENELEC Members.
Ref. No. CEN/CLC/TR 18115:2024 E
Contents
European foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 General
3.2 Data governance
3.3 Data quality
4 Abbreviations
5 JRC research and data-related standards on AI
5.1 General
5.2 Research: Data quality requirements for inclusive, non-biased and trustworthy AI
5.3 Data-related standards on AI for data governance and data quality
5.3.1 General
5.3.2 A short description of the standards mentioned in Figure 4 (taken from www.iso.org)
6 Data governance
7 Data quality
8 Elements for data, data sets, information for testing and evaluation
9 Data governance and data quality for large European contexts
9.1 General
9.2 Italian government: Strategy program on Artificial Intelligence
9.3 Italian agency application of data quality model for public administrations
9.4 Spanish experience on data Governance: Data Office
9.5 European governance relating to the Directive on inclusivity and accessibility
10 General considerations on innovative technology: Ethics, Governance, AI Act
11 Potential challenges
11.1 General
11.2 Stakeholders’ engagement
11.3 Contextualization
11.4 Critical infrastructures
11.5 Ethics and regulatory challenges
11.6 Interoperability
11.7 Big volume of data
12 Best practices from organizations, industries and research activities
12.1 General
12.2 AI in healthcare: the MES-CoBraD approach
12.3 Overview of industries that stand out for their approach to data governance
Bibliography
Figures
Figure 1 — Connections of Legislations, Standards, Guidelines & Monitoring specifications
Figure 2 — Active organizations mentioned in JRC
Figure 3 — Standards and Technical reports mentioned in JRC
Figure 4 — Clusters of standards, TS, TR data-related
Figure 5 — Example of relationships among quality aspects of ISO/IEC 5259-2, ISO/IEC 25059 and AI Act, eliciting new requirements to be harmonized
Figure 6 — European legal references and ISO standards for AI on data quality (or complementary)
Figure 7 — Data governance framework
Figure 8 — Data governance flow at European level
Figure 9 — Data managing integration and synthesis of experiences
Figure 10 — Data Governance summary
Figure 11 — Data Quality Measures and Data Life Cycle Model
Figure 12 — Relationship among quality models, characteristics, QM, QME, property, target entity
Figure 13 — Data life cycle framework
Figure 14 — Example of conceptual perspective visualization of data testing and evaluation
Figure 15 — Visualization of elements for governance of data, data sets, testing
Figure 16 — Example of ontological contextual schema of elements resulting in the conference online held in October 2020 with 100 speakers [13]

Tables
Table 1 — Main documents considered for data governance framework
Table 2 — Type of governance and multi-level point of view
Table 3 — Characteristics of the data quality model adapted from ISO/IEC 25012
Table 4 — Characteristics of data quality models from ISO 8000, ISO/IEC 25012 and ISO/IEC 5259-2

European foreword
This document (CEN/CLC/TR 18115:2024) has been prepared by Technical Committee CEN/CLC/JTC 21
“Artificial Intelligence”, the secretariat of which is held by DS.
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. CEN shall not be held responsible for identifying any or all such patent rights.
Any feedback and questions on this document should be directed to the users’ national standards body.
A complete listing of these bodies can be found on the CEN website.
Introduction
This document aims to provide an overview of the relevant regulations in the European context and
connected international standards, paying particular attention to data governance and data quality
topics. Relevant regulations considered are:
— “Council of Europe” Ad hoc Committee on AI (CAI) that produced “Recommendation CM/Rec (2020)
of the Committee of Ministers to member States on the human rights impact of algorithmic systems”
and the deliverable “possible elements of a legal framework on Artificial Intelligence, based on the
Council of Europe’s standards on human rights, democracy and the rule of law” (2021) [1].
— “European strategy for data” (2020), which is essential to govern new technologies and create
business opportunities.
— “Artificial Intelligence Act” (2024), which aims to ensure that AI systems placed on the market and
used in the EU are safe and respect fundamental rights. Attention is given specifically to:
— Article 10 “Data and data governance” describing the quality criteria specifying aspects of
training, validation and testing of data sets.
— Article 15 “Accuracy, robustness, and cybersecurity” describing essential quality
characteristics that can be extended to a general data quality model; consistency between
terms and definitions is a common goal of this document, as well as of future TS and EN
standards.
— Articles where standard quality characteristics are mentioned (see Figure 5).
— “Data Governance Act” (2022), providing a framework that aims:
— to increase trust in data sharing across areas;
— to develop common European data spaces in strategic domains (e.g. health, environment,
energy, agriculture, mobility, finance, manufacturing, public administration);
— to strengthen mechanisms to increase data availability and overcome technical obstacles to the
reuse of data.
— “Data Act” (2023): key elements include reinforced data portability and data sharing, rules
governing the processing of shared data, model contracts, access to and use of data held by private
companies, data and cloud interoperability, databases containing data from IoT, and restrictions on
data sharing.
— “Open Data Directive” (EU 2019/1024): provides common rules for a European market for
government-held data, including the re-use of public sector information.
In addition, Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection
of natural persons with regard to the processing of personal data and on the free movement of such data,
and repealing Directive 95/46/EC (General Data Protection Regulation, GDPR), is also considered in this
document. The GDPR, which entered into force in May 2016, creates a harmonized set of rules
applicable to the processing of all European personal data. Its objective is to ensure that personal
data enjoys a high standard of protection everywhere in the EU, increasing legal certainty for both
individuals and organizations processing data, and offering a higher degree of protection for individuals
and their fundamental rights. According to ISO/IEC 22989, types of organizations are e.g. commercial
enterprises, government agencies and not-for-profit organizations. The GDPR also aims to provide a
consistent and high level of protection of natural persons regarding the processing of personal data and
the free movement of such data, to remove the obstacles to the flow of personal data within the
Union, and to ensure a common level of protection of the rights and freedoms of natural
persons concerning the processing of such data across all Member States.
GDPR also takes into consideration the processing of personal data by Artificial Intelligence
systems (see processing of personal data in 3.2.10). This is discussed where the data quality
characteristics are explained, because they contain specific requirements on this topic that are strongly
related to some principles of GDPR; it can also be seen in some documents of the Council of Europe [1].
Another important aspect of quality underlined in this document is accessibility for disabled
users. Here too, the concepts are described where the data quality characteristics are explained, in
particular the value of accessibility and understandability of data. The accessibility quality
characteristic, which is related to a European legislative regulation, is a good example of data
governance achieved with a global vision by monitoring the activities in progress in each country. A
similar approach to governance, global and local, can be extended in the future to large applications
of AI by developing specific EN standards or Technical Specifications.
Finally, some considerations on ethics are reported to reinforce some aspects related to data use.
The European Commission and the Member States put forward a ‘Coordinated Plan on Artificial
Intelligence’ - COM (2018) 795 - with the stated goal of maximizing the impact of AI investments at both
European and national levels and strengthening synergies and cooperation among Member States. To
this end, Member States were strongly encouraged to develop their own national AI strategies (e.g. with
guidelines and monitoring specifications) to achieve these aims, in conformance with laws.
Figure 1 — Connections of Legislations, Standards, Guidelines & Monitoring specifications
The EU AI Act and CEN-CENELEC JTC 21 are harmonizing legislation and standards. Guidelines and
monitoring specifications can be developed by Member States and companies: examples are quoted in
Clauses 9 and 12 of this TR. Following these perspectives, the goal of this document is to complement
the overview of a common terminology and language on Artificial Intelligence, in order to facilitate
innovation, communication, coordination, planning and agreements between European countries, national
visions, enterprises, projects and product realization oriented to quality and risk mitigation. For
innovation management, the approach taken in the ISO 56000 family can be considered. For social
motivation and responsibility, ISO 26000 can help to sustain the principles of inclusiveness and ethics.
1 Scope
This document provides an overview of AI-related standards, with a focus on data and data life cycles,
for organizations, agencies, enterprises, developers, universities, researchers, focus groups, users, and
other stakeholders that are experiencing this era of digital transformation.
It describes links among the many international standards and regulations published or under
development, with the aim of promoting a common language and a greater culture of quality, and of
providing an information framework.
It addresses the following areas:
— data governance;
— data quality;
— elements for data and properties of data sets that provide unbiased evaluation and information for testing.
2 Normative references
There are no normative references in this document.
NOTE For the application of this document: users and stakeholders can apply the standards listed depending
on their context of use and in compliance with the laws.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp/
— IEC Electropedia: available at https://www.electropedia.org/
Note 1 to entry: Terms and definitions have been divided into General, Data Governance and Data quality.
3.1 General
3.1.1
Artificial Intelligence
AI
research and development of mechanisms and applications of AI systems
Note 1 to entry: Research and development can take place across any number of fields such as computer science,
data science, humanities, mathematics, and natural sciences
[SOURCE: ISO/IEC 22989:2022]
ISO/IEC 22989:2022/AMD1 is under development.
3.1.2
AI system
engineered system that generates outputs such as content, forecasts, recommendations, or decisions for
a given set of human-defined objectives
Note 1 to entry: The engineered system can use various techniques and approaches related to artificial intelligence
to develop a model to represent data, knowledge, processes, etc. which can be used to conduct tasks.
Note 2 to entry: AI systems are designed to operate with varying levels of automation.
[SOURCE: ISO/IEC 22989:2022]
3.1.3
element
smaller part of an architecture
EXAMPLES records, fields, format, metadata, images, etc.
[SOURCE: ISO/IEC 25024:2015; in ISO/IEC 25024:2015, 4.19, the term is used with reference to the
architecture of data and to computer program domain such as data model or data dictionary.]
3.1.4
framework
reusable design (models or code) that can be refined (specialized) and extended to provide some
portion of the overall functionality of many applications
[SOURCE: IEEE 1320.2-1998 (R2004)]
3.1.7
life cycle
evolution of a system, product, service, project or other human-made entity, from conception through
retirement
[SOURCE: ISO/IEC 22989:2022; ISO/IEC/IEEE 15288:2023]
3.1.8
measure
variable to which a value is assigned as the result of measurement
Note 1 to entry: the term measure is used to refer collectively to base measures, derived measures and indicators.
[SOURCE: ISO/IEC 25024:2015, 4.26, ISO/IEC 25010:2011, 4.4.5, ISO/IEC/IEEE 15939:2017]
3.1.9
measurement
set of operations having the object of determining a value of a measure
[SOURCE: ISO/IEC 25024:2015; ISO 3951-5:2006]

3.1.10
metric
defined measurement method and measurement scale
[SOURCE: ISO/IEC 14102:2008]
3.1.11
process
set of interrelated or interacting activities which transforms inputs into outputs
[SOURCE: ISO/IEC/IEEE 12207:2017]
3.1.12
product
result of a process
[SOURCE: ISO/IEC/IEEE 12207:2017]
3.1.13
property
property to quantify
property of a target entity that is related to a quality measure element and which can be quantified by a
measurement method
[SOURCE: ISO/IEC 25021:2012, Figure 5, reported in Figure 12 of this document]
3.1.14
quality model
defined set of characteristics and of relationships between them, which provides a framework for
specifying quality requirements and evaluating quality
[SOURCE: ISO/IEC 25000:2014]
3.1.15
system
combination of interacting elements organized to achieve one or more stated purposes
[SOURCE: ISO/IEC 25000:2014]
3.2 Data governance
3.2.1
corporate governance
system by which corporations are directed and controlled
[SOURCE: ISO/IEC 38500:2024]
3.2.2
data governance
execution and enforcement of authority over the definition, production, and usage of data related assets
[SOURCE: IEEE 7005:2021]
3.2.3
data governance framework
strategy, policies, decision-making structures and accountabilities, through which the organization’s
governance arrangements operate on data
[SOURCE: ISO/IEC TR 38502:2017, modified – the data are specified]
3.2.4
governance
process for establishing and enforcing strategic goals and objectives, organizational policies, and
performance parameters
[SOURCE: Software Extension to the PMBOK(R) Guide Fifth Edition; ISO/IEC/IEEE 21840:2019]
3.2.5
governing body
person or group of people who are accountable for the performance and conformance of the
organization
[SOURCE: ISO/IEC 5259-5; ISO/IEC 38500:2024]
3.2.6
management
system of controls and processes required to achieve the strategic objectives set by the organization's
governing body
[SOURCE: ISO/IEC/IEEE 21840:2019]
3.2.7
strategy
organization's overall plan of development, describing the effective use of resources in support of the
organization in its future activities. It involves setting objectives and proposing initiatives for action
[SOURCE: ISO/IEC/IEEE 24765]
3.2.8
process
predetermined course of events that occur during the execution of all or part of a program
[SOURCE: ISO/IEC 2382:2015]
3.2.9
personal data
any information relating to an identified or identifiable natural person
(‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in
particular by reference to an identifier such as a name, an identification number, location data, an online
identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic,
cultural or social identity of that natural person
[SOURCE: Regulation (EU) 2016/679 (GDPR) [28], Article 4 (1)]

NOTE ISO/IEC 5259-5, cited in 3.2.5, is under preparation. Current stage: ISO/IEC FDIS 5259-5:2024.
3.2.10
processing of personal data
operation or set of operations which is performed on personal data or on sets of personal data, whether or
not by automated means, such as collection, recording, organization, structuring, adaptation or
alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making
available, alignment or combination, restriction, erasure or destruction
[SOURCE: Regulation (EU) 2016/679 (GDPR) [28], Article 4 (2)]
3.2.11
product
result of a process
[SOURCE: ISO/IEC/IEEE 12207:2017, 3.1.36; ISO/IEC/IEEE 24748-1:2024, 3.34]
3.3 Data quality
3.3.1
analytics
composite concept consisting of data acquisition, validation, processing, including quantification,
visualization and interpretation
[SOURCE: ISO/IEC 5259-1:2024, modified; ISO/IEC 20546:2019, modified]
3.3.2
big data
extensive datasets, primarily in the data characteristics of volume, variety, velocity, and/or variability,
that require a scalable technology for efficient storage, manipulation, management and analysis
[SOURCE: ISO/IEC 20546:2019]
3.3.3
data
reinterpretable representation of information in a formalized manner suitable for communication,
interpretation, or processing
Note 1 to entry: Data can be processed by humans or by automatic means.
[SOURCE: ISO/IEC 25012:2008, ISO/IEC 2382:2015]
Note 2 to entry: The reinterpretable representation is connected to the data attributes that enable the data
to be read and interpreted by users (see ISO/IEC 25012:2008, 5.3.2.7).
3.3.4
data life cycle
cycle composed of 10 stages, i.e. idea conception, business requirements, data planning, data acquisition,
data preparation, building model, system deployment, system operation, data decommissioning, system
decommissioning
[SOURCE: ISO/IEC 8183:2023, Clause 5]
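As an illustration only, the 10 stages named in this definition can be written down as an ordered enumeration, so that software can refer to a stage by name and compare positions in the cycle. The following is a minimal sketch: the class name `DataLifeCycleStage` and the helper `next_stage` are hypothetical, and the strictly linear progression is an assumption made for the example, not something stated in ISO/IEC 8183.

```python
from enum import IntEnum


class DataLifeCycleStage(IntEnum):
    """The 10 stages listed in definition 3.3.4, in the order given there."""
    IDEA_CONCEPTION = 1
    BUSINESS_REQUIREMENTS = 2
    DATA_PLANNING = 3
    DATA_ACQUISITION = 4
    DATA_PREPARATION = 5
    BUILDING_MODEL = 6
    SYSTEM_DEPLOYMENT = 7
    SYSTEM_OPERATION = 8
    DATA_DECOMMISSIONING = 9
    SYSTEM_DECOMMISSIONING = 10


def next_stage(stage):
    """Return the stage following `stage`, or None after the last one.

    Hypothetical helper: a strictly linear progression is an assumption
    made for illustration only.
    """
    if stage is DataLifeCycleStage.SYSTEM_DECOMMISSIONING:
        return None
    return DataLifeCycleStage(stage + 1)


# Example: list the stages in order, then step forward from one of them.
for stage in DataLifeCycleStage:
    print(stage.value, stage.name)
print(next_stage(DataLifeCycleStage.DATA_ACQUISITION).name)
```

An ordered enumeration is used here so that stages can be compared (for example, to check that a quality checkpoint occurs before system deployment); real projects may iterate or revisit stages.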
3.3.5
data management
disciplined process that plans for, acquires, and provides stewardship for business and technical data,
consistent with requirements, throughout the data life cycle
[SOURCE: IEEE 7005:2021, 3.1]
3.3.6
data processing
systematic performance of operations upon data
[SOURCE: ISO/IEC 2382:2015]
3.3.7
data provenance record
record of the ultimate derivation and passage of a piece of data (3.3.3) through its various owners or
custodians
[SOURCE: ISO 8000-2:2022, 3.8.4]
3.3.8
data quality
degree to which the characteristics of data satisfy stated and implied needs when used under specified
conditions
[SOURCE: ISO/IEC 25012:2008]
3.3.9
data quality management
coordinated activities to direct and control an organization with regard to data quality
[SOURCE: ISO 8000-2:2022, 3.8.2]
3.3.10
data quality model
defined set of characteristics which provides a framework for specifying data quality requirements and
evaluating data quality
[SOURCE: ISO/IEC 25012:2008]
3.3.11
data file
set of related data records treated as a unit
Note 1 to entry: In ISO/IEC 25024:2015, data set is a synonym of data file.
3.3.12
dataset
collection of data with a shared format and goal-relevant content
[SOURCE: ISO/IEC 22989:2022, 3.2.5, modified in ISO/IEC 5259-2]

NOTE ISO/IEC 5259-2, cited in 3.3.12, is under preparation. Current stage: ISO/IEC FDIS 5259-2:2024.
3.3.13
data strategy
organization's overall plan of development, describing the effective use of data in support of the
organization in its future activities
Note 1 to entry: It involves setting a policy, objectives and proposing initiatives for action.
3.3.14
information
knowledge concerning objects, such as facts, events, things, or ideas, including concepts, that within a
certain context have a particular meaning
[SOURCE: ISO/IEC 2382:2015, quoted in ISO/IEC 25012:2008, and ISO/IEC 25024:2015]
3.3.15
non-personal data
all data that does not qualify as personal data (3.2.9)
3.3.16
provenance
information on the place and time of origin, derivation or generation of a dataset, proof of the dataset,
or a record of past and present ownership of the dataset
[SOURCE: ISO/IEC 11179-33:2023, 3.11]
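To make this definition more tangible, the sketch below shows one possible way to hold the place and time of origin, the derivation, and the ownership history of a dataset in a single record. It is an assumption-laden illustration: the class `ProvenanceRecord`, its field names and the sample values are all hypothetical and are not taken from ISO/IEC 11179-33.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical container for the elements named in definition 3.3.16."""
    dataset_id: str
    origin_place: str                                  # place of origin
    origin_time: datetime                              # time of origin
    derived_from: list = field(default_factory=list)   # ids of source datasets
    owners: list = field(default_factory=list)         # (owner, since) pairs, oldest first

    def transfer(self, new_owner: str, when: datetime) -> None:
        """Append a new owner, keeping the record of past ownership."""
        self.owners.append((new_owner, when))

    def current_owner(self):
        """Return the most recent owner, or None if none is recorded."""
        return self.owners[-1][0] if self.owners else None


# Example with made-up values: one dataset passing between two owners.
record = ProvenanceRecord(
    dataset_id="example-dataset",
    origin_place="Ljubljana",
    origin_time=datetime(2024, 1, 15, tzinfo=timezone.utc),
)
record.transfer("Agency A", datetime(2024, 1, 15, tzinfo=timezone.utc))
record.transfer("Agency B", datetime(2024, 6, 1, tzinfo=timezone.utc))
print(record.current_owner())
```

Ownership is kept as an append-only list so that past and present owners both remain visible, which mirrors the wording of the definition.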
3.3.17
synthetic data
data that has been generated using a purpose-built mathematical model or algorithm, with the aim of
solving a (set of) data science task(s)
[SOURCE: [38]]
3.3.18
quality in use
extent to which the system or product, when it is used in a specific context of use, satisfies or exceeds
stakeholders’ needs to achieve beneficial goals or outcomes
[SOURCE: ISO/IEC 25019:2023, 3.1.15]
4 Abbreviations
AI Artificial Intelligence
AWI Approved Work Item
CD Committee Draft
CEI Italian Electrotechnical Committee
CEN European Committee for Standardization
CENELEC European Electrotechnical Committee for Standardization
CLC CENELEC
DGA Data Governance Act
DIS Draft International Standard
DLC Data Life Cycle
EU European Union
GDPR General Data Protection Regulation
IEC International Electrotechnical Commission
IEEE Institute of Electrical and Electronics Engineers
ISO International Organization for Standardization
LC Life Cycle
JTC 1 ISO-IEC Joint Technical Committee for Information Technology
JTC 21 CEN-CENELEC Joint Technical Committee for Artificial Intelligence
JRC Joint Research Centre
ML Machine learning
SQuaRE Software Quality Requirements and Evaluation
TR Technical Report
TS Technical Specification
UNI Italian National Unification Body
5 JRC research and data-related standards on AI
5.1 General
The Artificial Intelligence website of the European Commission’s Joint Research Centre (JRC) is the
official EU website for reporting on and monitoring the development, uptake, and impact of Artificial
Intelligence in Europe.
AI Watch can be explored by topic, taking a holistic view of what is impacting AI:
— topics on AI: Enablers, Landscape, Standards, Evolution of Technology, Trustworthy
— tools
— countries: all the European countries
— publications: that can be filtered by a keyword, type, data, area
— collaborations
— data
— events
— news
It offers a dynamic vision according to the evolutions taking place. Below is a static overview of the
situation of data and AI standards.
This clause highlights part of the Conference and Workshop organized online on 8 June 2022 by the
Joint Research Centre (JRC), the European Commission’s science and knowledge service, with the
participation of more than 178 persons from 36 countries, of which 137 were from 21 EU member states.
For this document, this research is considered a good introduction to the tentative extension of
standards on AI concerning data (see 5.2).
5.2 Research: Data quality requirements for inclusive, non-biased and trustworthy AI
The report “Data quality requirements for inclusive, non-biased and trustworthy AI” [2] is available
at https://data.europa.eu/doi/10.2760/365479
Although the data and data quality perspective is horizontal, the parallel sessions in the research
focused on different sectors:
4.1 Education and employment
4.2 Law enforcement and the public sector
4.3 Finance
4.4 AI for media, including social media, content moderation, recommender systems
4.5 Medicine and healthcare
4.6 Industrial automation and robotics
Each sector is covered with a lot of detailed information.
For example, 4.5.2 of the report [2], included in 4.5 (Medicine and healthcare), addresses challenges
that mention data set properties and data quality aspects:
— legal compliance
— completeness and correctness of the data
— currentness
— inter and intra-data consistency
— representativeness of data
— balancedness
— avoidance of bias.
To give the reader a harmonized view of the data quality characteristics according to the
standards under development, as also requested by the JRC [2], it is useful to consider that:
— the inherent data quality characteristics mentioned in 4.5.2 (compliance, completeness,
currentness, consistency) are an essential basis of inherent data quality (system and domain
independent), which can be completed with accuracy and credibility;
— the data quality of a data set is defined in terms of representativeness, balancedness and avoidance of bias.
In Clause 8 of this document, a complete data quality model is defined based on specific international
standards.
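Two of the properties named above, completeness and balancedness, lend themselves to a simple numerical illustration. The sketch below is only one possible reading: the function names, the formulas (share of non-missing values; ratio of the rarest to the most frequent class) and the sample records are assumptions made for this example, and they are not the measures defined in the JRC report or in ISO/IEC 5259-2.

```python
from collections import Counter


def completeness(records, fields):
    """Share of expected field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    present = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return present / total


def balancedness(labels):
    """Ratio of the rarest to the most frequent class count (1.0 = balanced)."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    return min(counts.values()) / max(counts.values())


# Made-up records: two values are missing, and class "B" is under-represented.
records = [
    {"age": 34, "sex": "F", "diagnosis": "A"},
    {"age": None, "sex": "M", "diagnosis": "A"},
    {"age": 51, "sex": "F", "diagnosis": "B"},
    {"age": 47, "sex": "", "diagnosis": "A"},
]

print(completeness(records, ["age", "sex", "diagnosis"]))   # 10 of 12 values present
print(balancedness([r["diagnosis"] for r in records]))      # 1 "B" against 3 "A"
```

Both functions return a value between 0 and 1, which makes it easy to set a threshold per data set; representativeness and avoidance of bias need a reference population and are therefore not sketched here.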
The JRC report is quoted in this document where appropriate, in many cases with precise indications.
“The European Standardisation Organisations CEN and CENELEC recognized the urgent need for AI
standardization and launched last year the Joint Technical Committee 21 ‘Artificial Intelligence’
(JTC 21), responsible for the development and adoption of standards for AI, as well as providing
guidance to other technical committees concerned with AI.” (see JRC document, 2.4)
As reported in the JRC document, “agreements between European Standardisation Organisations (ESO) and
International Standardisation Organisations (ISO), as well as relevant ad-hoc initiatives, can ensure that
international standards can be used at European level (also as harmonized standards)”. “In terms of
international AI standardization ISO/IEC JTC1 SC42 (Joint Technical Committee 1, Subcommittee 42) is
the main source, with a considerable history of AI work, and a substantial number of standards
published or on development at different stages”. (see JRC document, 2.3)
Figure 2, taken from 4.5.1 of the JRC document, shows the state of the art of ongoing
standardization activities.
NOTE The title mentioned in the figure, “Data governance & quality for AI”, was related to “ad hoc group
5” before it was modified to “Data governance & quality for AI within the European context”. Ad hoc group 5 is
now JTC 21 WG 3.
Figure 2 — Active organizations mentioned in JRC
The section ISO/IEC JTC1 has been extended in Figure 2, considering relevant existing international
standards published or under development.
The current international standards or documents in the field of AI are summarized below, listing
relevant documents developed by subcommittees of ISO/IEC JTC 1 on Information Technology:
— SC42 for Artificial Intelligence;
— SC40 for IT Governance;
— SC7 for testing in Software Engineering field.
Before the standards are collected, the following figure, also taken from 4.5.1 of the JRC
document, is presented. As reported in Figure 3, it gives an idea of the initial perspective to be
extended, with particular attention to data quality.
Figure 3 — Standards and Technical reports mentioned in JRC
Relevant ISO/IEC publications (standards and TRs) concern horizontal aspects of AI (e.g. robustness,
bias, machine learning (ML)) as well as health-specific aspects (i.e. ML applications for imaging and other
medical applications).
The following subclauses list core standards and TRs that are published (including those at the DIS,
Draft International Standard, stage) or under development (AWI, WD or CD, Committee Draft, stage).
Because this information evolves quickly, it is recommended to verify the current stage of each
standard at www.iso.org.
5.3 Data-related standards on AI for data governance and data quality
5.3.1 General
Figure 4 shows clusters of documents developed by different committees and working groups.

Figure 4 — Clusters of data-related standards, TS and TR
5.3.2 A short description of the standards mentioned in Figure 4 (taken from www.iso.org)
5.3.2.1 ISO/IEC 5259-1 Artificial intelligence – Data quality for analytics and machine
learning (ML) – Part 1: Overview, terminology, and examples
This document provides the means for understanding and associating the individual documents of the
ISO/IEC 5259 series and is the foundation for conceptual understanding of data quality for analytics and
machine learning.
5.3.2.2 ISO/IEC 5259-2 Artificial intelligence – Data quality for analytics and machine
learning (ML) – Part 2: Data quality measures
This document provides a data quality model containing measures, and guidance on reporting data
quality in the context of analytics and machine learning (ML). This document builds on the ISO 8000 series,
ISO/IEC 25012, and ISO/IEC 25024. The aim of this document is to enable organizations to achieve
their data quality objectives and requirements. ISO/IEC 5259-2 “Data quality measures” satisfies the
needs related to the quality of “individual data” (data items) and “datasets” (groups of data) with regard to
representativeness, control of bias and so on. The annexes provide information on measurement, a
UML model, an overview and categories of quality characteristics, and synthetic data.
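To make the idea of a data set level measure such as representativeness more concrete, the sketch below compares the category shares observed in a sample with the shares of a reference population, using the total variation distance. The choice of this distance, the function names and the figures are assumptions made for illustration; the sketch does not reproduce any measure specified in ISO/IEC 5259-2.

```python
from collections import Counter


def category_shares(values):
    """Relative frequency of each category in a list of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(sample_values, reference_shares):
    """Half the sum of absolute differences between sample and reference shares.

    0.0 means the sample mirrors the reference distribution exactly;
    1.0 means the two distributions do not overlap at all.
    """
    sample_shares = category_shares(sample_values)
    categories = set(sample_shares) | set(reference_shares)
    return 0.5 * sum(
        abs(sample_shares.get(c, 0.0) - reference_shares.get(c, 0.0))
        for c in categories
    )


# Hypothetical age groups in a sample, and assumed population shares.
sample_age_groups = ["18-39"] * 6 + ["40-64"] * 3 + ["65+"] * 1
reference = {"18-39": 0.4, "40-64": 0.4, "65+": 0.2}

tvd = total_variation_distance(sample_age_groups, reference)
print(f"total variation distance: {tvd:.2f}")
```

A lower distance suggests that the sample is closer to the reference population for the attribute considered. Which attributes and which reference population are relevant depends on the intended use of the AI system.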
5.3.2.3 ISO/IEC 5259-3 Artificial intelligence – Data quality for analytics and machine
learning (ML) – Part 3: Data quality management requirements and guidelines
This document specifies requirements and provides guidance for establishing, implementing,
maintaining, and continually improving the quality for data used in the areas of analytics and machine
learning. This document does not define a detailed process, methods or metrics. Rather, it defines the
requirements and guidance for a quality management process along with a reference process and
methods that can be tailored to meet the requirements in this document.
5.3.2.4 ISO/IEC 5259-4 Artificial intelligence – Data quality for analytics and machine
learning (ML) – Part 4: Data quality process framework
This document provides general common organizational approaches, regardless of type, size or nature
of the applying organization, to ensure data quality for training and evaluation in analytics and machine
learning. It is applicable to training and evaluation data that comes from different sources, including
data acquisition and data composition, data preparation, data labelling, evaluation, and data use.
5.3.2.5 ISO/IEC 5259-5 Artificial intelligence – Data quality for analytics and machine
learning (ML) – Part 5: Data quality Governance
This document provides a data quality governance framework for analytics and machine learning to
enable governing bodies of organizations to direct and oversee the implementation and operation of
data quality measures, management, and related processes with adequate controls throughout the data
life cycle. This document can be applied to any analytics and machine learning. This document does not
define specific management requirements or process requirements specified in ISO/IEC 5259-3 and
ISO/IEC 5259-4, respectively.
5.3.2.6 ISO/IEC 8183:2023 Information Technology – Artificial intelligence – Data lifecycle
framework
This document provides an overarching data life cycle framework that is instantiable for any AI system
from data ideation to decommission. This document is applicable to the data processing throughout the
AI system life cycle including the acquisition, creation, development, deployment, maintenance, and
decommissioning.

Under preparation. Current stage: ISO/IEC FDIS 5259-2:2024.
Under preparation. Current stage: ISO/IEC FDIS 5259-5:2024.
5.3.2.7 ISO/IEC 20546:2019 Information Technology – Big data – Overview and vocabulary
This document provides an overview of the field of big data, its relationship to other technical areas
and standards efforts, and the concepts ascribed to big data.
5.3.2.8 ISO/IEC 22989:2022 Information technology – Artificial intelligence – Artificial
intelligence concepts and terminology (Foundational)
This document establishes terminology for AI and describes concepts in the field of AI, including terms
related to data; it can be used in the development of other standards and in support of communications
among diverse and interested parties or stakeholders; it is applicable to all types of organizations (e.g.
commercial enterprises, government agencies, not-for-profit organizations).
5.3.2.9 ISO/IEC TR 24027:2021 Information technology – Artificial intelligence – Bias in AI
systems and AI aided decision making
This document addresses bias in relation to AI systems, especially with regard to AI-aided
decision-making. Measurement techniques and methods for assessing bias are described, with the aim to address
and treat bias-related vulnerabilities. All AI system lifecycle phases are in scope, including but not
limited to data collection, training, continual learning, design, testing, evaluation, and use.
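As a hedged illustration of what a bias measurement can look like in practice, the following sketch computes the difference in favourable-decision rates between groups, a metric often called demographic parity difference. It is shown as one commonly used example and is not presented as the method prescribed by ISO/IEC TR 24027; the function name, the groups and the decisions are invented for this example.

```python
def demographic_parity_difference(decisions, groups):
    """Largest gap in favourable-decision rate between any two groups.

    decisions: iterable of 0/1 outcomes (1 = favourable decision).
    groups: iterable of group identifiers, aligned with decisions.
    Returns the gap and the per-group rates.
    """
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {
        group: sum(outcomes) / len(outcomes)
        for group, outcomes in by_group.items()
    }
    return max(rates.values()) - min(rates.values()), rates


# Hypothetical decisions of an AI-aided process for two groups.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

gap, rates = demographic_parity_difference(decisions, groups)
print(f"rates per group: {rates}")
print(f"demographic parity difference: {gap:.2f}")
```

A gap close to zero indicates similar favourable-decision rates across groups. A single metric of this kind does not establish the absence of bias, and other metrics can lead to different conclusions on the same data.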
5.3.2.10 ISO/IEC TR 24030:2024 Information Technology – Artificial intelligence (AI) – Use
cases
This document provides a collection of representative use cases of AI applications in a variety of
domains.
5.3.2.11 ISO/IEC 25059:2023 Software engineering – Systems and software Quality
Requirements and Evaluation (SQuaRE) – Quality model for AI
This document provides an application-specific extension of ISO/IEC 25010:2011 for AI systems. It
covers, directly or indirectly, many of the characteristics mentioned in the AI Act, such as functional
suitability, interoperability, portability and usability. It also adds detailed characteristics and
sub-characteristics, using terminology for specifying, measuring, and evaluating AI system quality. It
introduces five new characteristics to the product quality model (functional adaptability, user
controllability, transparency, robustness, intervenability). In the quality in use model, it also introduces
two new sub-characteristics for the end user and stakeholders (transparency, and societal and ethical
risk mitigation). Models for data quality are complementary to this model.
5.3.2.12 ISO/IEC 42001 Information Technology – Artificial intelligence – Management system
This document specifies the requirements and provides guidance for establishing, implementing,
maintaining, and continually improving an AI management system within the context of an organization.
This document is intended for use by an organization providing or using products or services that utilize
AI systems. This document helps the organization to develop or use AI systems responsibly in pursuing
its objectives and meet applicable regulatory requirements, obligations related to interested parties and
expectations from them. In Annex A, 7.4 “Quality of data for AI system” is described the importance of
ISO/IEC 25024 “Measurement of data qual
...

Frequently Asked Questions

CEN/CLC/TR 18115:2024 is a technical report published by the European Committee for Standardization (CEN). Its full title is "Data governance and quality for AI within the European context". This standard covers: This document provides an overview on AI-related standards, with a focus on data and data life cycles, to organizations, agencies, enterprises, developers, universities, researchers, focus groups, users, and other stakeholders that are experiencing this era of digital transformation. It describes links among the many international standards and regulations published or under development, with the aim of promoting a common language, a greater culture of quality, giving an information framework. It addresses the following areas: - data governance; - data quality; - elements for data, data sets properties to provide unbiased evaluation and information for testing.

CEN/CLC/TR 18115:2024 is classified under the following ICS (International Classification for Standards) categories: 35.240.01 - Application of information technology in general. The ICS classification helps identify the subject area and facilitates finding related standards.

The CEN/CLC/TR 18115:2024 standard on data governance and data quality for artificial intelligence within the European context represents a significant step forward in the standardization of data management practices. Its scope is particularly relevant at a time when digital transformation involves a multitude of actors, from organizations and agencies to universities and developers. The standard stands out for its ability to establish clear links among the many international standards and regulations under development, thereby promoting a common language and an essential culture of quality. Its scope covers crucial areas such as data governance and data quality, which are fundamental to ensuring objective evaluations and reliable information when testing AI systems. By focusing on the data life cycle and the properties of data sets, the standard highlights practices that enable unbiased evaluation and transparent information sharing. This reinforces its relevance in a context where trust and transparency are major concerns for AI users and developers. CEN/CLC/TR 18115:2024 is therefore an essential document for everyone involved in the development and implementation of artificial intelligence, offering a structure that helps to navigate the complex landscape of standards and to improve the quality of the data used. The standard does not merely present solutions; it also serves as a foundation for building robust and reliable AI systems, which are essential for a prosperous digital future.

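CEN/CLC/TR 18115:2024 provides a comprehensive overview of AI-related standards in Europe, with a particular focus on data governance and data quality. In the era of digital transformation, this standard is highly useful to a wide range of stakeholders, including organizations, agencies, enterprises, developers, universities, researchers, focus groups and users. The document describes the relationships among international standards and regulations, including those already published and those planned for development. In doing so, it aims to promote a common language, improve the culture of quality and provide an information framework. Its approach to key areas such as data governance and data quality is a notable point, as it fosters understanding and cooperation among stakeholders. Another strength of the standard is that, by incorporating elements on the properties of data and data sets, it makes it possible to provide information for fair evaluation and testing. Overall, CEN/CLC/TR 18115:2024 offers a guidepost for data governance and data quality in the context of AI and provides a basis on which the various stakeholders can pursue appropriate management and improvement in this important field.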

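The CEN/CLC/TR 18115:2024 standard is a document on data governance and quality for artificial intelligence (AI) within the European context; it focuses on data and the data life cycle and provides clear information to a variety of stakeholders. The standard comprehensively describes AI-related standards for organizations, agencies, enterprises, developers, universities, researchers and users facing the era of digital transformation. The main strength of the document is that it clearly presents the links among multiple international standards and regulations, thereby promoting a unified language and a culture of quality. This helps to make the information framework concrete and to set clear criteria for data governance and data quality. CEN/CLC/TR 18115:2024 defines the properties of data and data sets in order to provide information for unbiased evaluation and testing. These elements support AI developers and enterprises in making decisions based on trustworthy data. In particular, the provisions related to data governance clarify responsibilities for data management and quality assurance so that all stakeholders can take a consistent approach. In conclusion, the standard provides important guidance on the quality and governance of AI-related data and is establishing itself as an essential document for successfully driving digital transformation across Europe.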

The standard CEN/CLC/TR 18115:2024 excels in establishing a comprehensive framework for data governance and quality within the context of AI in Europe. Its scope is notably broad, catering to a diverse range of stakeholders including organizations, agencies, developers, and researchers, among others. This inclusivity is a significant strength, as it ensures that the guidelines are accessible and applicable across various sectors experiencing digital transformation. One of the primary merits of this standard is its emphasis on data and data life cycles, offering an insightful overview of AI-related standards. By delineating the connections among numerous international standards and regulations, the document fosters a unified approach to handling data across different jurisdictions. This holistic perspective is crucial in promoting a common language in the field, which is often fragmented due to the varying regulations and practices across member states. The standard’s focus on data governance and data quality is particularly relevant in today’s data-driven landscape. It addresses vital components, such as the properties of data and data sets required to facilitate unbiased evaluation and robust testing. By providing clear guidelines and criteria in these areas, CEN/CLC/TR 18115:2024 contributes to a greater culture of quality, empowering organizations to enhance their data management practices. Moreover, the document serves as an essential reference point during this era of rapid technological advancement, where the ethical use of AI and the integrity of data are paramount. The emphasis on establishing sound data governance structures will help stakeholders navigate the complexities of data management in AI, ensuring compliance with existing regulations while also advocating for best practices. Overall, CEN/CLC/TR 18115:2024 stands out as a pivotal resource in the European context, addressing critical issues related to data governance and quality for AI. 
Its relevance is underscored by its potential to guide organizations in effectively managing their data assets and adhering to international standards, ultimately fostering trust and accountability in AI applications.

The CEN/CLC/TR 18115:2024 standard offers a comprehensive overview of standards in the field of artificial intelligence (AI) and their implications for data and data life cycles. It is aimed at a wide range of stakeholders, including organizations, public authorities, enterprises, developers, universities, researchers, focus groups, users and other actors in the era of digital transformation. A key focus of the standard is data governance and data quality. The standard defines clear guidelines and standards that are essential for implementing effective data management practices. Particularly noteworthy is its approach to promoting a common language among different actors, which contributes to a higher quality of data and of its use. The description of the elements for data and data set properties enables unbiased evaluation and provides valuable information for testing purposes. This is especially relevant in a European context, where the exchange and consistency of data are of decisive importance in supporting the role of AI in fostering integration and innovation. In addition, the document offers a structure for linking the numerous international standards and regulations that are published or under development. This linkage promotes a better understanding of, and closer interconnection between, the various standards, which further improves the quality of data management and use. CEN/CLC/TR 18115:2024 is thus a highly relevant and forward-looking instrument for everyone working in the field of AI and data processing, and it plays a decisive role in the standardization of data governance and data quality within the European framework.