ISO 24611-1:2025
Language resource management — Morphosyntactic annotation framework (MAF) — Part 1: Core model
This document establishes a framework for the representation of annotations of word-sized units in texts. Such annotations describe tokens, their relationship with lexical units (word-forms), and the relevant morphosyntactic properties. This document proposes a metamodel for morphosyntactic annotation that can be augmented with references to data categories contained in a data category repository conforming to ISO 12620-2. It also defines an XML serialization for morphosyntactic annotations, according to the principles laid out in the TEI Guidelines (see Reference [31]). This document does not apply to structural ambiguities or the structure and composition of morphosyntactic tagsets. This document does not address the linguistic choices that identify tokens or determine the language- or context-particular relationships between tokens and word-forms.
Gestion des ressources linguistiques — Cadre d'annotation morphosyntaxique (MAF) — Partie 1: Modèle de base
Upravljanje z jezikovnimi viri - Ogrodje za oblikoskladenjsko označevanje (MAF) - 1. del: Jedrni model
Standards Content (Sample)
International Standard
ISO 24611-1
First edition
2025-11
Language resource management — Morphosyntactic annotation framework (MAF) — Part 1: Core model
Gestion des ressources linguistiques — Cadre d'annotation morphosyntaxique (MAF) — Partie 1: Modèle de base
© ISO 2025
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 MAF metamodel
4.1 Levels of description in the MAF metamodel
4.2 MAF in the standards landscape
4.3 Metadata
4.4 Structural ambiguities
4.5 MAF metamodel in detail
5 Token-level segmentation
5.1 General remarks
5.2 Formal description:
5.3 Embedding notation
5.4 Stand-off notation
5.5 Normalization and script conversion
5.6 Inline token annotation strategies for token separation
5.6.1 General remarks
5.6.2 Adjacent tokens in embedded mode
5.6.3 Overlapping tokens
6 Word-forms as linguistic units
6.1 General remarks
6.2 Formal description:
6.3 Token attachment
6.3.1 One token: one word-form
6.3.2 Several contiguous tokens: one word-form
6.3.3 Several discontinuous tokens: one word-form
6.3.4 Zero token: one word-form
6.3.5 One token: several word-forms
6.4 Referencing lexical entries
6.5 Compound word-forms
6.6 Identification of word-forms
7 Morphosyntactic content
7.1 General remarks
7.2 Using feature structures
7.3 Compact morphosyntactic tags
7.4 FSR libraries
7.5 Designing morphosyntactic tagsets
8 Handling ambiguities
8.1 General
8.2 Word-form content ambiguities
8.3 Lexical and structural ambiguities
9 Conformance
Annex A (informative) Examples
Annex B (informative) Referencing externally defined data categories
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
This first edition of ISO 24611-1 cancels and replaces ISO 24611:2012, which has been technically revised.
The main changes are as follows:
— the data model is fully serialized in TEI XML;
— definitions and text have been revised;
— conformance conditions have been added;
— most of the former Clause 8, dealing with word lattices, has been removed and delegated to a planned
ISO 24611-2;
— the annex of sample data categories has been removed in favour of an external repository of data
categories.
A list of all parts in the ISO 24611 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
ISO/TC 37/SC 4 focuses on the definition of models and formats for the representation of annotated
language resources. To this end, it has generalized the modelling strategy initiated by its sister committee,
ISO/TC 37/SC 3, for the representation of terminological data (see Reference [21]), through which linguistic
data models are seen as the combination of a generic data pattern (a metamodel), which is further refined
through a selection of data categories that provide the descriptors for this specific annotation level.
Such models are defined independently of any specific formats and ensure that an implementer has the
necessary conceptual instrument with which to design and compare formats with regard to their degrees of
interoperability.
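The combination of a generic pattern with a selection of data categories can be illustrated with a minimal Python sketch. It is not the serialization defined by this document: the class names, the `SELECTED_CATEGORIES` set, and the feature names and values are all hypothetical and are not taken from any data category repository. The sketch only shows tokens attached to word-forms in two of the configurations listed in the table of contents (one token to several word-forms, several contiguous tokens to one word-form), plus a check of feature names against the selected categories.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """A segment of the source text (hypothetical minimal shape)."""
    id: str
    form: str


@dataclass
class WordForm:
    """A lexical unit attached to zero or more tokens (hypothetical shape)."""
    lemma: str
    token_ids: tuple[str, ...] = ()
    features: dict[str, str] = field(default_factory=dict)


# Hypothetical selection of data categories for this annotation level.
SELECTED_CATEGORIES = {"pos", "number", "gender"}


def unknown_categories(word_form: WordForm, selected: set[str]) -> list[str]:
    """Return the feature names that are not among the selected categories."""
    return sorted(name for name in word_form.features if name not in selected)


def word_forms_for(token_id: str, word_forms: list[WordForm]) -> list[str]:
    """Return the lemmas of every word-form attached to the given token."""
    return [wf.lemma for wf in word_forms if token_id in wf.token_ids]


tokens = [
    Token("t1", "du"),
    Token("t2", "pomme"),
    Token("t3", "de"),
    Token("t4", "terre"),
]

word_forms = [
    # One token, several word-forms: French "du" contracts "de" + "le".
    WordForm("de", ("t1",), {"pos": "adposition"}),
    WordForm("le", ("t1",), {"pos": "determiner", "number": "singular"}),
    # Several contiguous tokens, one word-form. The "tense" feature is
    # deliberately outside the hypothetical selection to exercise the check.
    WordForm("pomme de terre", ("t2", "t3", "t4"),
             {"pos": "noun", "tense": "present"}),
]

print(word_forms_for("t1", word_forms))
print(unknown_categories(word_forms[2], SELECTED_CATEGORIES))
```

Keeping the attachment (`token_ids`) separate from the descriptors (`features`) mirrors the split between the generic pattern and the selected data categories. A real implementation would resolve feature names against an external repository rather than a hard-coded set.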
One important aspect of representing any kind of annotation is the capacity to provide a clear and reliable
semantics for the various descriptors used, either in the form of formal features and feature values, or
directly as objects in a representation that is expressed, for instance, in XML. In order to be shared
...
SLOVENIAN STANDARD
01-October-2024
Upravljanje z jezikovnimi viri - Ogrodje za oblikoskladenjsko označevanje (MAF) - 1. del: Jedrni model
Language resource management — Morphosyntactic annotation framework (MAF) — Part 1: Core model
Gestion des ressources linguistiques - Cadre d'annotation morphosyntaxique (MAF) — Partie 1: Modèle de base
This Slovenian standard is identical to: ISO/DIS 24611-1
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.
DRAFT International Standard
ISO/DIS 24611-1
ISO/TC 37/SC 4
Secretariat: KATS
Voting begins on: 2024-07-25
Voting terminates on: 2024-10-17
Language resource management — Morphosyntactic annotation framework (MAF) — Part 1: Core model
Gestion des ressources linguistiques - Cadre d'annotation morphosyntaxique (MAF) — Partie 1: Modèle de base
ICS: 01.020
THIS DOCUMENT IS A DRAFT CIRCULATED FOR COMMENTS AND APPROVAL. IT IS THEREFORE SUBJECT TO CHANGE AND MAY NOT BE REFERRED TO AS AN INTERNATIONAL STANDARD UNTIL PUBLISHED AS SUCH.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
This document is circulated as received from the committee secretariat.
Reference number ISO/DIS 24611-1:2024(en)
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 The MAF metamodel
4.1 Levels of description in the MAF metamodel
4.2 MAF in the standards landscape
4.3 Metadata
4.4 Structural ambiguities
4.5 MAF metamodel in detail
5 Token-level segmentation
5.1 Introduction
5.2 Formal description:
5.3 Embedding notation
5.4 Stand-off notation
5.5 Normalization and script conversion
5.6 Inline token annotation strategies for token separation
5.6.1 Introduction
5.6.2 Adjacent tokens in embedded mode
5.6.3 Overlapping tokens
6 Word-forms as linguistic units
6.1 Introduction
6.2 Formal description:
6.3 Token attachment
6.3.1 One token: one word-form
6.3.2 Several contiguous tokens: one word-form
6.3.3 Several discontinuous tokens: one word-form
6.3.4 Zero token: one word-form
6.3.5 One token: several word-forms
6.4 Referencing lexical entries
6.5 Compound word-forms
6.6 Identification of word-forms
7 Morphosyntactic content
7.1 Introduction
7.2 Using feature structures
7.3 Compact morphosyntactic tags
7.4 FSR libraries
7.5 Designing morphosyntactic tagsets
8 Handling ambiguities
8.1 Introduction
8.2 Word-form content ambiguities
8.3 Lexical and structural ambiguities
9 Conformance
Annex A (informative) Examples
Annex B (informative) Referencing externally defined data categories
Bibliography









