ISO/PRF 24635-1
Language resource management — Corpus annotation project management — Part 1: Core model
This standard describes the basic principles and recommended procedures for corpus annotation project management as its core model. The core model consists of a series of recommended work packages that fulfil the basic requirements for error-free corpus annotation, including training the people involved and validating intermediate results. The core model therefore contains recommendations on:
- corpus annotation project organization,
- internal structures and work packages for corpus annotation project management,
- project team members' qualification,
- workflow among the internal structures of the project.
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
Upravljanje jezikovnih virov - Projektno vodenje anotacije korpusa - 1. del: Jedrni model
Standards Content (Sample)
SLOVENIAN STANDARD
1 June 2025
Upravljanje jezikovnih virov - Projektno vodenje anotacije korpusa - 1. del: Jedrni model
Language resource management — Corpus Annotation Project Management — Part 1: Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
This Slovenian standard is identical to: ISO/PRF 24635-1
ICS:
01.020 Terminology (principles and coordination)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.
DRAFT International Standard
ISO/DIS 24635-1
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-11-05
Voting terminates on: 2025-01-28
Language resource management — Corpus Annotation Project Management — Part 1: Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus — Partie 1: Modèle de base
ICS: 01.020
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number
ISO/DIS 24635-1:2024(en)
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms and definitions for corpus annotation
3.2 Terms and definitions for project management
4 Purpose and justification
5 Core Model
5.1 Project organization and role
5.1.1 Project manager
5.1.2 Project technical manager
5.1.3 Work package manager
5.1.4 Process team leader
5.1.5 Team member
5.2 Process groups for corpus annotation project
5.3 Corpus annotation project work package and process
5.3.1 Integrated management of corpus annotation project
5.3.2 Corpus annotation work management
5.3.3 Corpus annotation project quality control
6 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow in the scope of process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent
rights identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and Terminology, Subcommittee
SC 4, Language Resource Management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is a process of annotating additional linguistic information to primary data. The goal
of corpus annotation projects is to achieve high quality deliverables following the annotation specification
within limited resource environments.
Language resource management – Corpus Annotation Project Management is a proposed series of standards
that aims to give recommendations for constructing high quality annotated corpora effectively and efficiently.
The series consists of three parts: Core Model, Validation Model, and Training Model.
Part 1: Core Model presents the basic principles, including considerations of corpus annotation, procedures
of corpus annotation projects, project organization, work packages and tasks, that can be applied to corpus
annotation projects regardless of their scale, complexity and duration.
Part 2: Training Model presents the basic principles for training the project participants and maintaining
their ability to execute the project.
Part 3: Validation Model presents the basic principles for quality control of deliverables, to achieve
error-free annotation that follows the annotation specification.
DRAFT International Standard ISO/DIS 24635-1:2024(en)
Language resource management — Corpus Annotation
Project Management —
Part 1:
Core model
1 Scope
This document is one part of a series of standards for corpus annotation project management. Part 1
describes the core model of project management for corpus annotation and specifies the work packages of
project teams, the required processes and the deliverables. Parts 2 and 3 of the series describe the
training model for the human resources involved and the validation model, respectively.
This document does not specify a methodology for resolving issues such as quality control, human training,
reusability, licensing and copyright. Instead, it presents the components needed to deal with these and
other areas of corpus annotation project management by specifying which work packages, subtasks and
workflows among them are required.
Thus, this core model of project management for corpus annotation specifies recommendations on which
work packages and deliverables a project requires, where its workflows and processes deal with:
— Integration and Communication Among Work Packages: This includes ensuring that all work
packages are well-coordinated, particularly in terms of the adoption of broader annotation standards
and integration with ontologies to enhance interoperability. Effective communication across work
packages is crucial for the seamless sharing of annotated documents with other projects.
— Human Resource Management and Interrater Reliability: This covers the management of human
resources, focusing on training and qualification, as well as the implementation of interrater reliability
practices. These practices include training, testing, and the use of appropriate tools to ensure consistency
across annotations.
— Annotation Guideline Management and Software Utilization: This involves managing the guidelines
for annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques. It includes the cautious application of
AI/ML methods, such as weak supervised learning, to support the annotation process.
— Quality Control, Validation, and Structured Documentation: This encompasses the processes
...
International
Standard
ISO 24635-1
First edition
Language resource management —
Corpus annotation project
management —
Part 1:
Core model
Gestion des ressources linguistiques — Gestion de projet
d'annotation de corpus —
Partie 1: Modèle de base
PROOF/ÉPREUVE
Reference number
ISO 24635-1:2025(en) © ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms related to corpus annotation
3.2 Terms related to project management
4 Core model
4.1 General
4.2 Project organization and role
4.2.1 General
4.2.2 Project manager
4.2.3 Project technical manager
4.2.4 Work package manager
4.2.5 Process team leader
4.2.6 Team member
4.3 Project management process groups for corpus annotation project
4.4 Corpus annotation project work package and process
4.4.1 General
4.4.2 Integrated management of corpus annotation project
4.4.3 Corpus annotation work management
4.4.4 Corpus annotation project quality control
5 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow organized by process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is a process of annotating additional information to primary data. The goal of corpus
annotation projects is to achieve high quality deliverables following the annotation specification within
limited resource environments.
This series gives recommendations on constructing high quality annotated corpora effectively and
efficiently. The series will consist of three parts: a core model, a training model and a validation model.
— This document presents the basic principles, including considerations of corpus annotation, procedures
of corpus annotation projects, project organization, work packages and tasks, that can be applied to corpus
annotation projects regardless of their scale, complexity and duration.
— ISO 24635-2 (see footnote 1) presents the basic principles for training project participants and
maintaining their ability to execute the annotation project tasks.
— ISO 24635-3 (see footnote 2) presents the basic principles for quality control of deliverables ensuring
error-free annotation aligned with the annotation specification.
Corpus annotation principles and guidelines on proper and efficient annotation have a long-established
history, as discussed in References [11] and [13]. This document specifically focuses on providing guidance on
managing corpus annotation projects effectively rather than prescribing specific annotation methodologies.
1) Under preparation. Stage at the time of publication: ISO/WD 24635-2:2024.
2) Planned.
International Standard ISO 24635-1:2025(en)
Language resource management — Corpus annotation project
management —
Part 1:
Core model
1 Scope
This document establishes a core model of project management for corpus annotation, to specify the work
packages of project teams, required processes and deliverables.
This document presents the necessary components for issues such as coordination, human training,
reusability, software, quality control, licensing and copyright. However, it does not specify a methodology to
solve such issues.
This document gives guidance on what work packages and deliverables are required under the project in
which workflows and processes deal with the following:
— Integration and communication among work packages: This includes ensuring that all work packages are
well-coordinated, particularly in terms of the adoption of broader annotation standards and integration
with ontologies to enhance interoperability. Effective communication across work packages is crucial for
the seamless sharing of annotated documents with other projects.
— Human resource management and interrater reliability: This covers the management of human resources,
focusing on training and qualification, as well as the implementation of interrater reliability practices.
These practices include training, testing and the use of appropriate tools to ensure consistency across
annotations.
— Annotation guideline management and software utilization: This involves managing the guidelines for
annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques.
— Quality control, data validation and structured documentation: This encompasses the processes for
quality control and validation of annotation results, alongside the need for structured documentation
and ongoing curation. This ensures that annotated documents remain accurate, relevant and usable over
the long term.
— Licensing, copyrights and metadata management: This focuses on documenting licences and copyrights,
providing metadata to manage the sharing of resources. It is particularly important in areas with
copyright restrictions or licensing concerns, ensuring that data subsets can be appropriately managed
and shared.
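The scope names interrater reliability practices without prescribing a measure. As a minimal sketch, assuming two annotators have labelled the same annotation units, Cohen's kappa is one commonly used agreement statistic. The function name `cohen_kappa` and the sample labels are illustrative assumptions and are not taken from this document.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same units.

    Hypothetical helper for illustration; the standard does not
    mandate this or any other agreement measure.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of units")
    n = len(labels_a)
    # Observed agreement: share of units with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented sample data: part-of-speech labels for six annotation units.
annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

A project could track such a value per annotation layer during training and testing. Which threshold counts as acceptable is a project decision that this sketch does not address.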
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms related to corpus annotation
3.1.1
annotation
information added to primary data (3.1.9), independent of its representation
[SOURCE: ISO 24623-1:2018, 3.1]
3.1.2
annotation layer
layer for corpus annotation (3.1.6)
EXAMPLE Syntactic layer, lexical-semantic layer, entity layer.
3.1.3
annotation scheme
description of the structure of annotations (3.1.1)
3.1.4
annotation unit
specific segment of primary data (3.1.9) that is identified and labelled according to an annotation scheme (3.1.3)
EXAMPLE Word, phrase, clause, sentence, utterance.
3.1.5
corpus
collection of natural language data
[SOURCE: ISO 1087:2019, 3.6.4, modified — The preferred term “text corpus” deleted. Note 1 to entry
deleted.]
3.1.6
corpus annotation
action of adding interpretative linguistic or non-linguistic information to a corpus (3.1.5)
[SOURCE: Leech, G., 2005 [12], modified — “non-linguistic” added.]
3.1.7
corpus annotation project
project (3.2.9) aimed at enhancing a collection of corpora (3.1.5) with metadata or labels that provide
additional linguistic, non-linguistic, semantic, or structural information to facilitate analysis, research and
the development of natural language processing tools
3.1.8
guideline
official recommendation or advice that indicates policies, standards or procedures for how something
should be accomplished
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.1774]
3.1.9
primary data
original, unannotated electronic representation of language data that serves as the foundation for the
annotation process
3.1.10
resource
skilled human re
...
ISO/PRF 24635-1:2025(en)
ISO/TC 37/SC 4/WG 5
Secretariat: KATS
Date: 2025-05-15
Language resource management — Corpus annotation project
management —
Part 1:
Core model
Gestion des ressources linguistiques — Gestion de projet d'annotation de corpus —
Partie 1: Modèle de base
PROOF
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication
may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying,
or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO
at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: + 41 22 749 01 11
E-mail: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
3.1 Terms related to corpus annotation
3.2 Terms related to project management
4 Core model
4.1 General
4.2 Project organization and role
4.3 Project management process groups for corpus annotation project
4.4 Corpus annotation project work package and process
5 Publication and archiving of the corpus annotation (optional)
Annex A (informative) Process flow organized by process groups and work packages
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of
ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent rights
in respect thereof. As of the date of publication of this document, ISO had not received notice of (a) patent(s)
which may be required to implement this document. However, implementers are cautioned that this may not
represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
A list of all parts in the ISO 24635 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
Corpus annotation is a process of annotating additional information to primary data. The goal of corpus
annotation projects is to achieve high quality deliverables following the annotation specification within
limited resource environments.
This series gives recommendations on constructing high quality annotated corpora effectively and efficiently.
The series will consist of three parts: a core model, a training model and a validation model.
— This document presents the basic principles, including considerations of corpus annotation, procedures of
corpus annotation projects, project organization, work packages and tasks, that can be applied to corpus
annotation projects regardless of their scale, complexity and duration.
— ISO 24635-2 (see footnote 1) presents the basic principles for training project participants and
maintaining their ability to execute the annotation project tasks.
— ISO 24635-3 (see footnote 2) presents the basic principles for quality control of deliverables ensuring
error-free annotation aligned with the annotation specification.
Corpus annotation principles and guidelines on proper and efficient annotation have a long-established
history, as discussed in References [11] and [13]. This document specifically focuses on providing guidance
on managing corpus annotation projects effectively rather than prescribing specific annotation methodologies.
1) Under preparation. Stage at the time of publication: ISO/WD 24635-2:2024.
2) Planned.
Language resource management — Corpus annotation project
management —
Part 1:
Core model
1 Scope
This document establishes a core model of project management for corpus annotation, to specify the work
packages of project teams, required processes and deliverables.
This document presents the necessary components for issues such as coordination, human training,
reusability, software, quality control, licensing and copyright. However, it does not specify a methodology to
solve such issues.
This document gives guidance on what work packages and deliverables are required under the project in
which workflows and processes deal with the following:
— Integration and communication among work packages: This includes ensuring that all work packages are
well-coordinated, particularly in terms of the adoption of broader annotation standards and integration
with ontologies to enhance interoperability. Effective communication across work packages is crucial for
the seamless sharing of annotated documents with other projects.
— Human resource management and interrater reliability: This covers the management of human resources,
focusing on training and qualification, as well as the implementation of interrater reliability practices.
These practices include training, testing and the use of appropriate tools to ensure consistency across
annotations.
— Annotation guideline management and software utilization: This involves managing the guidelines for
annotation tasks and utilizing annotation software and tools, particularly in environments leveraging
artificial intelligence (AI) and machine learning (ML) techniques.
— Quality control, data validation and structured documentation: This encompasses the processes for quality
control and validation of annotation results, alongside the need for structured documentation and ongoing
curation. This ensures that annotated documents remain accurate, relevant and usable over the long term.
— Licensing, copyrights and metadata management: This focuses on documenting licences and copyrights,
providing metadata to manage the sharing of resources. It is particularly important in areas with copyright
restrictions or licensing concerns, ensuring that data subsets can be appropriately managed and shared.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1 Terms related to corpus annotation
3.1.1
annotation
information added to primary data (3.1.9), independent of its representation
[SOURCE: ISO 24623-1:2018, 3.1]
3.1.2
annotation layer
layer for corpus annotation (3.1.6)
EXAMPLE Syntactic layer, lexical-semantic layer, entity layer.
3.1.3
annotation scheme
description of the structure of annotations (3.1.1)
3.1.4
annotation unit
specific segment of primary data (3.1.9) that is identified and labelled according to an annotation scheme
(3.1.3)
EXAMPLE Word, phrase, clause, sentence, utterance.
3.1.5
corpus
collection of natural language data
[SOURCE: ISO 1087:2019, 3.6.4, modified — The preferred term “text corpus” deleted. Note 1 to entry deleted.]
3.1.6
corpus annotation
action of adding interpretative linguistic or non-linguistic information to a corpus (3.1.5)
[SOURCE: LEECH, Geoffrey. Adding Linguistic Annotation. In: WYNNE, Martin (ed.). Developing Linguistic
Corpora: A guide to Good Practice. Oxford: Oxbow books, 2005 [12], modified — “non-linguistic” added.]
3.1.7
corpus annotation project
project (3.2.9) aimed at enhancing a collection of corpora (3.1.5) with metadata or labels that provide
additional linguistic, non-linguistic, semantic, or structural information to facilitate analysis, research and the
development of natural language processing tools
3.1.8
guideline
official recommendation or advice that indicates policies, standards or procedures for how something should
be accomplished
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.1774]
3.1.9
primary data
original, unannotated electronic representation of language data that serves as the foundation for the
annotation process
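The terms in 3.1 fit together as a small data model: primary data stays unannotated, annotation units are segments of it, and each annotation layer groups the units of one kind. The following is a minimal sketch of that relationship, assuming a stand-off representation with character offsets. The class names, fields and sample sentence are hypothetical, and the definition of annotation above is explicitly independent of any representation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotationUnit:
    """A labelled segment of primary data, addressed by character offsets."""
    start: int
    end: int
    label: str


@dataclass
class AnnotationLayer:
    """One layer (for example part-of-speech) holding its annotation units."""
    name: str
    units: list = field(default_factory=list)

    def add(self, primary_data, start, end, label):
        # Reject spans that do not lie inside the primary data.
        if not 0 <= start < end <= len(primary_data):
            raise ValueError("span lies outside the primary data")
        unit = AnnotationUnit(start, end, label)
        self.units.append(unit)
        return unit


# The primary data is never modified; annotations only point into it.
primary_data = "Corpus annotation adds information."
pos_layer = AnnotationLayer("part-of-speech")
unit = pos_layer.add(primary_data, 0, 6, "NOUN")
covered_text = primary_data[unit.start:unit.end]
print(covered_text, unit.label)
```

Keeping annotations separate from the primary data is one design choice among several. It lets multiple layers refer to the same text without altering it.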
3.1.10
resource
skilled human resources (specific disciplines either individually or in crews or teams), equipment, services,
supplies, commodities, material, budgets or funds
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3461,2]
3.2 Terms related to project management
3.2.1
activity
identified piece of work that is required to be undertaken to complete a project (3.2.9), programme, portfolio
or other related work
[SOURCE: ISO 21506:2024, 3.2, modified — Note 1 to entry deleted.]
3.2.2
control
comparison of actual performance with planned performance, analysing variances and taking appropriate
corrective and/or preventive action as needed
[SOURCE: ISO 21506:2024, 3.13, modified — “and/or” replaced “and”.]
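The definition of control describes a comparison of actual with planned performance followed by variance analysis. A rough sketch of that comparison for annotation work packages is given below. The function name `control_report`, the 10 % tolerance and the work package figures are assumptions made for illustration only.

```python
def control_report(planned, actual, tolerance=0.10):
    """Compare actual with planned output per work package.

    planned, actual: dicts mapping a work package name to a unit count.
    tolerance: hypothetical shortfall (as a fraction) tolerated before
    corrective or preventive action is flagged.
    """
    report = {}
    for package, planned_units in planned.items():
        if planned_units <= 0:
            raise ValueError(f"planned value for {package!r} must be positive")
        done = actual.get(package, 0)
        # Relative variance: negative means behind plan.
        variance = (done - planned_units) / planned_units
        report[package] = {
            "variance": variance,
            "needs_action": variance < -tolerance,
        }
    return report


# Invented figures for three work packages.
planned = {"guideline drafting": 10, "annotation": 5000, "validation": 1000}
actual = {"guideline drafting": 10, "annotation": 4200, "validation": 980}
report = control_report(planned, actual)
for package, entry in report.items():
    print(package, f"{entry['variance']:+.0%}", entry["needs_action"])
```

The sketch only flags where action may be needed. Deciding on the corrective or preventive action itself remains with the project roles.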
3.2.3
data consistency
adherence to uniform and standardized annotation (3.1.1) guidelines (3.1.8) and criteria across the entire
corpus (3.1.5), ensuring that all annotated elements follow the same rules and conventions, which facilitates
reliable and reproducible analysis
3.2.4
data validation
process (3.2.7) of systematically checking and verifying the accuracy, completeness and consistency of
annotations (3.1.1) within the corpus (3.1.5) to ensure that the data meet predefined quality standards and
guidelines (3.1.8)
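Some of the completeness and consistency checks named in this definition can be automated. The following minimal sketch assumes annotations are stored as `(start, end, label)` tuples over the primary data and that the annotation scheme provides a closed set of labels. These assumptions and all names are illustrative. Accuracy in the sense of a correct interpretation generally still needs human review or a reference annotation.

```python
def validate_annotations(primary_data, units, allowed_labels):
    """Return a list of (unit index, problem description) pairs.

    units: list of (start, end, label) tuples; this layout is an
    assumption for the sketch, not something the standard prescribes.
    """
    problems = []
    for index, (start, end, label) in enumerate(units):
        # Consistency with the primary data: the span must exist.
        if not 0 <= start < end <= len(primary_data):
            problems.append((index, "span outside primary data"))
        # Completeness: every unit needs a label.
        if label is None or label == "":
            problems.append((index, "missing label"))
        # Consistency with the scheme: only known labels are allowed.
        elif label not in allowed_labels:
            problems.append((index, f"label {label!r} not in annotation scheme"))
    return problems


primary_data = "Time flies"
units = [(0, 4, "NOUN"), (5, 10, "VRB"), (5, 12, "VERB"), (0, 4, "")]
problems = validate_annotations(primary_data, units, {"NOUN", "VERB"})
for index, message in problems:
    print(index, message)
```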
3.2.5
deliverable
unique and verifiable element that is required to be produced by a project (3.2.9)
[SOURCE: ISO 21502:2020, 3.9]
3.2.6
output
aggregated tangible or intangible deliverables (3.2.5) that form the project (3.2.9) result
[SOURCE: ISO 21502:2020, 3.14]
3.2.7
process
systematic series of activities (3.2.1) directed towards causing an end result such that one or more inputs will
be acted upon to create one or more outputs (3.2.6)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3037,8]
3.2.8
process group
collection of related processes (3.2.7)
[SOURCE: ISO/IEC/IEEE 24765:2017, 3.3057,1]
3.2.9
project
temporary endeavour to achieve one or more defined objectives
[SOURCE: ISO 21502:2020, 3.20]
3.2.10
project charter
document that states the problem to be solved, the improvement goals, the project scope (3.2.19), the project
(3.2.9) milestones and the project roles and responsibilities
[SOURCE: ISO 13053-2:2011, 2.26]
3.2.11
project communications management
processes (3.2.7) that are requ
...