ISO/TC 37/SC 4/WG 6 - Linguistic annotation

This document specifies the structure of an ontology for a fine-grained description of the expressive power of corpus query languages (CQLs) in terms of search needs. The ontology consists of three interrelated taxonomies of concepts: the CQLF metamodel (a formalization of ISO 24623-1); the expressive power taxonomy, which describes different facets of the expressive power of CQLs; and a taxonomy of CQLs. This document specifies: a) the taxonomy of the CQLF metamodel; b) the topmost layer of the expressive power taxonomy (whose concepts are called “functionalities”); c) the structure of the layers of the expressive power taxonomy and the relationships between them, in the form of subsumption assertions; d) the formalization of the linkage between the CQL taxonomy and the expressive power taxonomy, in the form of positive and negative conformance statements. This document does not define the entire contents of the ontology (see Clause 4).

Standard, 23 pages, English language
Standard, 18 pages, English language

ISO

ISO 24623-1:2018(Main)

Language resource management - Corpus query lingua franca (CQLF) - Part 1: Metamodel

SIST ISO 24623-1:2018

ISO 24623-1:2018 describes the abstract metamodel designed to accommodate any corpus query language (QL) and providing a basis for coarse-grained classification. The metamodel consists of several components referred to as CQLF classes, levels, and modules, and is illustrated with examples from the Single-stream class (where a single data stream is used to organize the relevant data structures). Within this class, this document discusses three CQLF levels (Linear, Complex and Concurrent), as well as their subdivisions into modules, dictated by functional and modelling criteria. ISO 24623-1:2018 does not provide a way to specify further details beyond the above-mentioned divisions, and neither does it contain within its scope QLs designed to query more than one concurrent data stream, as in multimodal corpora or in parallel corpora (such QLs can still be classified according to the criteria suggested here for less expressive QLs).

Standard, 17 pages, English language
Standard, 12 pages, English language

ISO

ISO 24615-2:2018(Main)

Language resource management - Syntactic annotation framework (SynAF) - Part 2: XML serialization (Tiger vocabulary)

SIST ISO 24615-2:2018

ISO 24615-2:2018 describes an XML-conformant serialization of the ISO 24615‑1 meta-model, with the objective of supporting interoperability across language resources or language processing components in the domain of syntactic annotations. As an extension of ISO 24615‑1, this document is also coordinated with ISO 24612.

Standard, 17 pages, English language
Standard, 12 pages, English language

ISO

ISO 24624:2016(Main)

Language resource management - Transcription of spoken language

SIST ISO 24624:2018

ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

Standard, 39 pages, English language
Standard, 32 pages, English language
Standard, 34 pages, French language

ISO

ISO 24615-1:2014(Main)

Language resource management - Syntactic annotation framework (SynAF) - Part 1: Syntactic model

SIST ISO 24615-1:2018

ISO 24615-1:2014 describes the syntactic annotation framework (SynAF), a high level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615-1:2014 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.

Standard, 25 pages, English language
Standard, 20 pages, English language
Standard, 20 pages, French language
Standard, 26 pages, Russian language

ISO

ISO 24611:2012(Main)

Language resource management - Morpho-syntactic annotation framework (MAF)

SIST ISO 24611:2013

ISO 24611:2012 provides a framework for the representation of annotations of word-forms in texts; such annotations concern tokens, their relationship with lexical units, and their morpho-syntactic properties. It describes a metamodel for morpho-syntactic annotation that refers to the data categories contained in the ISOCat data category registry (DCR, as defined in ISO 12620). It also describes an XML serialization for morpho-syntactic annotations, with equivalences to the guidelines of the TEI (Text Encoding Initiative).

Standard, 65 pages, English language
Standard, 58 pages, English language
Standard, 63 pages, French language
Standard, 58 pages, Russian language
Standard, 80 pages, Russian language

ISO

ISO 24614-2:2011(Main)

Language resource management - Word segmentation of written texts - Part 2: Word segmentation for Chinese, Japanese and Korean

SIST ISO 24614-2:2014

The basic concepts and general principles of word segmentation as defined in ISO 24614-1 apply to Chinese, Japanese and Korean. Text needs to be segmented into tokens, words, phrases or some other types of smaller textual units in order to perform certain computational applications on language resources, such as natural language processing, information retrieval and machine translation. ISO 24614-2:2011 is restricted to the segmentation of a text into words or other word segmentation units (WSUs). This task is distinct from morphological or syntactic analysis per se, although it greatly depends on morphosyntactic analysis. It is also different from the task of laying out a framework for constructing a lexicon and identifying its lexical entries, namely lemmas and lexemes. The frameworks for the latter tasks are provided by ISO 24611, ISO 24613 and ISO 24615. ISO 24614-2:2011 specifies rules for delineating WSUs for Chinese, Japanese and Korean. Some rules are common to all three languages, though each language also has its own distinct rules for identifying WSUs. The common features are discussed, then the distinct rules are laid out for Chinese, for Japanese and for Korean.

Standard, 49 pages, English language
Standard, 43 pages, English language

ISO

ISO 24614-1:2010(Main)

Language resource management - Word segmentation of written texts - Part 1: Basic concepts and general principles

SIST ISO 24614-1:2013

ISO 24614-1:2010 presents the basic concepts and general principles of word segmentation, and provides language-independent guidelines to enable written texts to be segmented, in a reliable and reproducible manner, into word segmentation units (WSU). The many applications and fields that need to segment texts into words - and thus to which ISO 24614-1:2010 can be applied - include translation, content management, speech technologies, computational linguistics and lexicography.

Standard, 20 pages, English language
Standard, 15 pages, English language

ISO

ISO 24615:2010(Main)

Language resource management - Syntactic annotation framework (SynAF)

SIST ISO 24615:2013

ISO 24615:2010 describes the syntactic annotation framework (SynAF), a high level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. ISO 24615:2010 is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.

Standard, 23 pages, English language
Standard, 18 pages, English language

18-Oct-2010
01.020
ISO/TC 37/SC 4
