Language resource management -- Syntactic annotation framework (SynAF) -- Part 1: Syntactic model
This part of ISO 24615 describes the syntactic annotation framework (SynAF), a high-level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability
across language resources or language processing components. This part of ISO 24615 is complementary to,
and closely related to, ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a
metamodel for syntactic representations as well as reference data categories for representing both
constituency and dependency information in sentences or other comparable utterances and segments.
SIST ISO 24615-1:2018
INTERNATIONAL STANDARD ISO 24615-1
First edition
Language resource management —
Syntactic annotation framework
(SynAF) —
Part 1:
Syntactic model
ISO 24615-1:2014(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO-specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers
to Trade (TBT), see the following URL: Foreword - Supplementary information.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
This first edition of ISO 24615-1 cancels and replaces ISO 24615:2010, of which it constitutes a minor revision.
ISO 24615 (all parts) is designed to coordinate closely with ISO 24612, Language resource management —
Linguistic annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical
markup framework (LMF), and ISO 24611, Language resource management — Morpho-syntactic
annotation framework.
ISO 24615 consists of the following parts, under the general title Language resource management —
Syntactic annotation framework (SynAF):
— Part 1: Syntactic model
The following part is under preparation:
— Part 2: XML serialization ()
Introduction
ISO 24615 is based on numerous projects and pre-standardisation activities that have taken place in
the last few years (see Abeillé, 2001) to provide reference models and formats for the representation
of syntactic information, whether as the output of a syntactic parser or as annotations of language
resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto standard for
treebanking, but more recent work, e.g. the Negra/Tiger initiative (see: http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/)
in Germany or the ISST initiative in Italy [see Montemagni (2003)], demonstrates the viability of a more
coherent framework that can account for both (hierarchical) constituency and dependency phenomena in
syntactic annotation.
The eContent project “LIRICS” was seminal in gathering a group of experts who initiated the
ISO 24615 (SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed
share a common data model that offers a good basis for the SynAF metamodel (see the study made in
Deliverable D.3.1 “Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU
project LIRICS, available at
This part of ISO 24615 proposes a metamodel for syntactic annotation together with a list of relevant
data categories for syntactic annotation. The data categories are available on the ISOCat server (http:// in the syntax profile (as defined in ISO 12620:2009).
Language resource management — Syntactic annotation framework (SynAF) — Part 1: Syntactic model
1 Scope
This part of ISO 24615 describes the syntactic annotation framework (SynAF), a high-level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability
across language resources or language processing components. This part of ISO 24615 is complementary to,
and closely related to, ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a
metamodel for syntactic representations as well as reference data categories for representing both
constituency and dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611:2012, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1:2000, ISO 12620:2009,
ISO 24611:2012 and the following apply.
3.1
adjunct
non-essential element associated with a verb as opposed to syntactic arguments (3.19)
Note 1 to entry: Adverbs can act as adjuncts in a sentence.
3.2
chunk
non-recursive constituent (3.4)
3.3
clause
group of phrases (3.14), usually containing a predicate
Note 1 to entry: A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb.
A main clause alone can form a complete sentence (3.15). In the SynAF model, a clause is a special case of a
constituent (3.4).
3.4
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the basis of structural (or hierarchical) properties
3.5
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical functions
(3.7) that constituents play in relation to each other
3.6
syntactic edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
Note 1 to entry: Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
Note 1 to entry: For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as
a subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject NP and
the main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed
under the concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
3.8
syntactic head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is
feminine, then the gender of the entire constituent will be feminine)
Note 1 to entry: The head of a constituent usually cannot be left out.
3.9
linguistic annotation
feature-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3) which can act on its own as a complete sentence (3.15)
Note 1 to entry: In languages distinguishing finiteness, the main clause is usually finite. Example: “The train is late.”
3.11
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
Note 1 to entry: A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or
post-modifier). Modifiers are optional in a constituent.
3.12
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
3.13
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
Note 1 to entry: A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfil a grammatical
function (3.7), e.g. in a clause (3.3)
Note 1 to entry: Empty phrases are permitted (e.g. non-realised pronouns, sometimes marked as “pro”, that act as
subjects of clauses). A phrase is typically named after its head (3.8), for example noun phrases,
verb phrases, adjective phrases, adverbial phrases and prepositional phrases. Phrases have been informally
described as “bloated words”, in that the parts of the phrase added to the head elaborate and specify the reference
of the head. In the SynAF model, a phrase is a special case of a constituent (3.4).
3.15
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
Note 1 to entry: A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk
about “utterances” rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ⩽ p2, identifying the segment of the document to which an annotation
(3.9) is applied
Note 1 to entry: A multiple span is a sequence of spans where the ending point of each span is less than or equal to
the starting point of the subsequent span.
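The span conditions of 3.16 can be checked mechanically. The sketch below is a minimal illustration, assuming integer character offsets as points (SynAF does not prescribe the unit); the helper names are hypothetical.

```python
from typing import Sequence, Tuple

# A span is a pair of points (p1, p2). Integer character offsets are an
# assumption made for illustration; SynAF does not prescribe the unit.
Span = Tuple[int, int]


def is_valid_span(span: Span) -> bool:
    """Check the condition p1 <= p2."""
    p1, p2 = span
    return p1 <= p2


def is_valid_multiple_span(spans: Sequence[Span]) -> bool:
    """Check that every span is valid and that each span ends at or before
    the starting point of the subsequent span."""
    if not all(is_valid_span(s) for s in spans):
        return False
    return all(prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:]))


# A discontinuous word form covering two separate stretches (offsets hypothetical)
print(is_valid_multiple_span([(4, 8), (20, 22)]))  # True
print(is_valid_multiple_span([(4, 8), (6, 10)]))   # False: the spans overlap
```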
3.17
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
Note 1 to entry: A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with
a verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
Note 1 to entry: The subject, indirect object and direct object are subcategorized grammatical functions (3.7)
within a sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
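As a minimal sketch, a subcategorization frame can be modelled as the sets of grammatical functions a verb requires or permits, with adjuncts left outside the frame. The `SubcatFrame` class and the role labels (`subject`, `directObject`, `indirectObject`, `adjunct`) are illustrative assumptions, not data category identifiers defined here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SubcatFrame:
    """Hypothetical frame: functions that must / may be filled by arguments."""
    required: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()


def check_frame(frame: SubcatFrame, dependents: Dict[str, str]) -> bool:
    """dependents maps each dependent of the verb to its role. Adjuncts are
    not constrained by the frame, so they are ignored; every required role
    must be present and no unlicensed argument role may appear."""
    args = {role for role in dependents.values() if role != "adjunct"}
    return frame.required <= args and args <= frame.required | frame.optional


# "Alfred reads a book today": the direct object is assumed optional for "read"
reads = SubcatFrame(required=frozenset({"subject"}),
                    optional=frozenset({"directObject"}))
print(check_frame(reads, {"Alfred": "subject", "book": "directObject", "today": "adjunct"}))  # True
```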
3.19
syntactic argument
functionally essential element that is required, and given its interpretation, by the head of its phrase
(3.14) or the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase
or verb)
Note 1 to entry: For verbs and verbal phrases, arguments identify the participants in the process referred to by
the verb. In some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has a single parent
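A hypothetical check for the tree condition of 3.21 follows. It reads "each node has a single parent" as: no node has more than one parent, exactly one root has none, and every node is reachable from that root. This reading is an interpretation, not wording from this document. Edges are given as (parent, child) pairs.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (source/parent, target/child)


def is_syntactic_tree(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    nodes = set(nodes)
    parents = defaultdict(list)
    children = defaultdict(list)
    for src, tgt in edges:
        parents[tgt].append(src)
        children[src].append(tgt)
    # No node may have more than one parent.
    if any(len(p) > 1 for p in parents.values()):
        return False
    # Exactly one node (the root) has no parent.
    roots = [n for n in nodes if not parents[n]]
    if len(roots) != 1:
        return False
    # Every node must be reachable from the root; combined with the
    # single-parent check this also excludes cycles.
    seen, stack = set(), [roots[0]]
    while stack:
        n = stack.pop()
        if n in seen:
            return False
        seen.add(n)
        stack.extend(children[n])
    return seen == nodes


nodes = ["S", "NP", "VP", "Alfred", "reads"]
edges = [("S", "NP"), ("S", "VP"), ("NP", "Alfred"), ("VP", "reads")]
print(is_syntactic_tree(nodes, edges))                       # True
print(is_syntactic_tree(nodes, edges + [("VP", "Alfred")]))  # False: two parents
```

Adding a second incoming edge to a node, for example a secondary syntactic edge, turns the structure into a general syntactic graph (3.20) rather than a tree.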
3.22
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing
the relations that exist between those units
3.23
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic
3.24
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements
at the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, as well as relations
between categories of the same kind (such as the head-head relations between nouns in appositions,
or nominal coordinations in some formalisms). The dependency information can exist between
morpho-syntactically annotated items within a phrase (e.g. an adjective is the modifier of the head
noun within an NP) or describe a specific relation between syntactic constituents at the clausal
and sentential level (e.g. an NP being the “subject” of the main verb of a clause or sentence). The
dependency relation can also be stated for empty elements (e.g. the pro element in Romance
languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy
interrelating syntactic annotation for both constituency and dependency, as stated in the SynAF metamodel.
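To illustrate the multi-layered strategy, the sketch below annotates "The train is late" with a constituency layer and a dependency layer that share the same node identifiers. The identifiers, the `Edge` type, the layer names and the function labels are assumptions for illustration only, not SynAF data categories.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    layer: str        # "constituency" or "dependency" (illustrative labels)
    label: str = ""   # e.g. a grammatical function on dependency edges


# Node identifiers shared by both layers: t1..t4 are terminals, n1..n3 non-terminals.
labels = {"t1": "The", "t2": "train", "t3": "is", "t4": "late",
          "n1": "NP", "n2": "VP", "n3": "S"}

edges: List[Edge] = [
    # Constituency layer: S -> NP VP, NP -> The train, VP -> is late
    Edge("n3", "n1", "constituency"), Edge("n3", "n2", "constituency"),
    Edge("n1", "t1", "constituency"), Edge("n1", "t2", "constituency"),
    Edge("n2", "t3", "constituency"), Edge("n2", "t4", "constituency"),
    # Dependency layer over the same terminal nodes
    Edge("t3", "t2", "dependency", "subject"),
    Edge("t2", "t1", "dependency", "determiner"),
    Edge("t3", "t4", "dependency", "predicative"),
]


def layer(name: str) -> List[Edge]:
    """Return the edges belonging to one annotation layer."""
    return [e for e in edges if e.layer == name]


for e in layer("dependency"):
    print(f"{labels[e.source]} -{e.label}-> {labels[e.target]}")
# is -subject-> train
# train -determiner-> The
# is -predicative-> late
```

Because both layers point at the same node identifiers, either layer can be queried without duplicating the underlying word forms.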
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value
pairs, which represent the associated syntactic data categories. The SynAF textual descriptions specify
more complete information about the SynAF classes, relations and extensions than can be included in
the UML diagram. Developers shall define a data category selection (DCS) as specified for SynAF data
category selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
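A data category selection can be thought of as a mapping from each selected data category to its conceptual domain. The sketch below assumes that representation; only /syntacticEdgeType/ and its two values are taken from Annex A, while `DCS` and `conforms` are hypothetical names.

```python
from typing import Dict, Set

# Hypothetical DCS: selected data category -> values of its conceptual domain.
# /syntacticEdgeType/ and its values come from Annex A; the format is assumed.
DCS: Dict[str, Set[str]] = {
    "syntacticEdgeType": {"primarySyntacticEdge", "secondarySyntacticEdge"},
}


def conforms(annotation: Dict[str, str], dcs: Dict[str, Set[str]] = DCS) -> bool:
    """True if every feature is a selected data category and every value
    belongs to that category's conceptual domain."""
    return all(feat in dcs and val in dcs[feat] for feat, val in annotation.items())


print(conforms({"syntacticEdgeType": "primarySyntacticEdge"}))  # True
print(conforms({"edgeKind": "primarySyntacticEdge"}))           # False: not selected
```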
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of
non-terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see
3.6, syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one
or more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
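The class hierarchy described in 4.2.2 and 4.2.3 might be sketched in Python as follows. `TNode` and `NTNode` stand for T_Node and NT_Node; representing spans as (start, end) integer pairs, and giving an empty element a zero-width span (p, p), are assumptions rather than requirements of this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) offsets; the offset unit is an assumption


@dataclass
class SyntacticNode:
    """Generic superclass of terminal and non-terminal nodes."""
    node_id: str
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class TNode(SyntacticNode):
    """Terminal node (T_Node) defined over one or more spans; several spans
    model a discontinuous word form."""
    spans: List[Span] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.spans:
            raise ValueError("a T_Node needs at least one span")
        if any(start > end for start, end in self.spans):
            raise ValueError("each span must satisfy start <= end")
        # Multiple span: each span must end at or before the start of the next.
        for (_, end), (start, _) in zip(self.spans, self.spans[1:]):
            if end > start:
                raise ValueError("spans must be ordered and non-overlapping")


@dataclass
class NTNode(SyntacticNode):
    """Non-terminal node (NT_Node); it stores no spans of its own because
    they can be inferred from the terminals it dominates."""
```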
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntactic tree. Syntactic trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes refer to one or more spans.
Thus, by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The
NT_Nodes are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
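Since only T_Nodes carry spans, the span of an NT_Node can be derived by walking down the tree. A minimal sketch, assuming a `children` mapping from node identifiers to child identifiers and terminal spans given as integer offsets (both hypothetical representations):

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def terminal_spans(node: str, children: Dict[str, List[str]],
                   t_spans: Dict[str, List[Span]]) -> List[Span]:
    """Collect, in document order, the spans of every T_Node dominated by node."""
    if node in t_spans:  # a T_Node: its spans are given explicitly
        return sorted(t_spans[node])
    collected: List[Span] = []
    for child in children.get(node, []):
        collected.extend(terminal_spans(child, children, t_spans))
    return sorted(collected)


def covering_span(node: str, children: Dict[str, List[str]],
                  t_spans: Dict[str, List[Span]]) -> Span:
    """Smallest single span enclosing all dominated terminals. For a
    discontinuous constituent, terminal_spans keeps the gaps visible instead."""
    spans = terminal_spans(node, children, t_spans)
    return spans[0][0], max(end for _, end in spans)


# "The train is late": character offsets (illustrative)
children = {"S": ["NP", "VP"], "NP": ["t1", "t2"], "VP": ["t3", "t4"]}
t_spans = {"t1": [(0, 3)], "t2": [(4, 9)], "t3": [(10, 12)], "t4": [(13, 17)]}
print(covering_span("NP", children, t_spans))  # (0, 9)
```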
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes,
with one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/
(see Annex A), whose conceptual domain includes, but is not limited to, /primarySyntacticEdge/ and /secondarySyntacticEdge/.
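A sketch of the SyntacticEdge class, assuming that edges carry their annotations as a feature-value dictionary and that an edge without a /syntacticEdgeType/ annotation is treated as primary. The latter is an assumption based on Annex A describing the primary edge as the default edge. Plain strings are used instead of an `Enum` because the conceptual domain is open.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRIMARY = "primarySyntacticEdge"      # values taken from Annex A;
SECONDARY = "secondarySyntacticEdge"  # the domain is open to further values


@dataclass
class SyntacticEdge:
    """Source node, target node and optional annotations (3.6)."""
    source: str
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)

    @property
    def edge_type(self) -> str:
        # Unannotated edges default to primary (an assumption, see above).
        return self.annotations.get("syntacticEdgeType", PRIMARY)


def primary_children(edges: List[SyntacticEdge], node: str) -> List[str]:
    """Targets reached from node through primary edges only."""
    return [e.target for e in edges if e.source == node and e.edge_type == PRIMARY]


edges = [
    SyntacticEdge("S1", "NP1"),
    SyntacticEdge("S1", "VP1"),
    SyntacticEdge("S2", "NP1", {"syntacticEdgeType": SECONDARY}),
]
print(primary_children(edges, "S1"))  # ['NP1', 'VP1']
```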
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF-annotated data and,
as shown in Figure 1, the application of morpho-syntactic information to MAF-annotated data.
Annex A
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in
combination with the SynAF metamodel. When necessary, specific applications may define additional
data categories, which shall be described in compliance with ISO 12620 and provided in the ISOCat data
category registry.
A.2 Basic syntactic data categories
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation
among them
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of annotation
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
— Note [en] There is great variation in what is treated as a clitic. In English, cliticized forms
are sometimes restricted to the contracted forms of auxiliaries, as in I’m, she’ll, etc.
However, in some descriptions, articles are also referred to as clitics.
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
Definition [en] union of constituency and dependency
Definition [en] property of a grammatical unit sharing a boundary with another
Definition [en] process of fully decoding the clauses and relations present in a sentence
Definition [en] mechanism allowing the linking of words, or in some formalisms also phrases and
clauses, based on the binary head-dependent division and a possible annotation of
grammatical function
Definition [en] construction consisting of two negative forms in the same clause
— Note [en] Example: In English, “I’m not unhappy”.
Definition [en] annotation that is added in the text
— Note [en] The original organization of the text is modified.
/enclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a preceding word
Definition [en] before anything else according to a given order
Definition [en] hybrid annotation style in which standoff and embedded notation are mixed
/morphosyntacticAnnotation/ - BC: /annotation/
Definition [en] annotation related to the morphology of the words and their part of speech
Definition [en] construction that expresses the contradiction of some or all of a sentence’s, word’s
or phrase’s meaning
— Note [en] Negation may be based on negative particles (like “not”) or on prefixes (like “un-”
or “non-”). Example: In English, “I’m not happy”.
Definition [en] immediately afterwards
Definition [en] the default edge expressing the constituency relationship, originating in a
constituent and terminating in a component of that constituent
Definition [en] a phrase or word in a clause which provides a statement regarding the subject of
that clause. Most clauses can thus be divided into a subject and predicate, where
the predicate is a function expanding on the subject.
— Note [en] Example: “Kevin kicks the ball” is seen as a subject (“Kevin”) associated with a
predicate phrase (“kicks the ball”).
Definition [en] immediately before
Definition [fr] immédiatement avant
Name [en] previous
Name [fr] précédent
/proclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a following word
— Note [en] Example: “the” in “the boy”.
Definition [en] act of spreading a linguistic property from one grammatical unit to another
Definition [en] an indirect edge expressing syntactic constituency. These edges may be used to
express the relationship between a head and a coreferent of its omitted dependent.
— Note [en] Example: In “I saw Bill, but went straight back home afterwards”, “I” may serve as
an explicit subject to the first clause, dominated by a primary syntactic edge, but
in the second clause, a further secondary syntactic edge leading to “I” can make
it clear that it is also the subject of the second clause, without being one of the
explicit parts of that clause, which are dominated by primary edges. This device is
used in some formalisms to avoid the introduction of empty elements standing in
for such ‘missing’ bearers of grammatical function.
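The example in the note above can be encoded as a small edge list. This is a simplified sketch: only some of the words are included, and the labels "primary", "secondary", "subject" and "object" are shorthand assumptions rather than Annex A identifiers.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    edge_type: str      # "primary" or "secondary" (shorthand labels)
    function: str = ""  # grammatical function carried by the edge, if any


# "I saw Bill, but went straight back home afterwards" (simplified)
edges: List[Edge] = [
    Edge("clause1", "I", "primary", "subject"),
    Edge("clause1", "saw", "primary"),
    Edge("clause1", "Bill", "primary", "object"),
    Edge("clause2", "went", "primary"),
    Edge("clause2", "home", "primary"),
    Edge("clause2", "I", "secondary", "subject"),  # shared subject, no empty element
]


def subjects(clause: str, include_secondary: bool = True) -> List[str]:
    """Subjects of a clause, optionally ignoring secondary edges."""
    return [e.target for e in edges
            if e.source == clause and e.function == "subject"
            and (include_secondary or e.edge_type == "primary")]


print(subjects("clause2"))                           # ['I']
print(subjects("clause2", include_secondary=False))  # []
```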
Definition [en] process of identifying the chunks in a sentence
Definition [en] annotation that is recorded externally from the grammatical units and that refers
to these units
— Note [en] The original organization of the text is kept unchanged.
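The difference between embedded and standoff notation can be shown on a small example. In this sketch the part-of-speech labels are illustrative assumptions; the point is that the embedded form rewrites the text, while the standoff form leaves it unchanged and refers to it by character offsets.

```python
from typing import List, Tuple

text = "The train is late"
tokens = [("The", "determiner"), ("train", "noun"), ("is", "verb"), ("late", "adjective")]

# Embedded notation: labels are written into the text itself,
# so the original organization of the text is modified.
embedded = " ".join(f"{w}/{pos}" for w, pos in tokens)

# Standoff notation: annotations live outside the text and point into it
# by (start, end) offsets, leaving the text unchanged.
standoff: List[Tuple[int, int, str]] = []
offset = 0
for word, pos in tokens:
    start = text.index(word, offset)
    standoff.append((start, start + len(word), pos))
    offset = start + len(word)

print(embedded)  # The/determiner train/noun is/verb late/adjective
print(standoff)  # [(0, 3, 'determiner'), (4, 9, 'noun'), (10, 12, 'verb'), (13, 17, 'adjective')]
```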
/syntacticAnnotation/ - BC: /annotation/
Definition [en] annotation describing constituency and/or dependency
— Note [en] Syntactic annotation does not directly deal with the meaning of an utterance.
Definition [en] feature used in the description of the syntax of a language
Conceptual Domain /primarySyntacticEdge/, /secondarySyntacticEdge/
Definition [en] characterizes the syntactic edge according to its role in the syntactic representation
Definition [en] rule that limits what the syntax allows in a particular language
Definition [en] process of annotating the part of speech for every word
Definition [en] property of a clause beginning with a question word
— Note [en] In English, “Who is he?” is a whType question.
Definition [en] property of a clause where only a positive or a negative answer or position is possible
— Note [en] In English, “Are you coming?” is a yesNoType question.
A.3 Constituency related data categories
/adjectiveChunk/ - BC: /chunk/
Definition [en] chunk headed by an adjective
/adjectivePhrase/ - BC: /phrase/
Definition [en] phrase headed by an adjective
/adpositionChunk/ - BC: /chunk/
Definition [en] chunk introduced by one or several adpositions that are not necessarily contiguous
and on the same end of the chunk
/adpositionPhrase/ - BC: /phrase/
Definition [en] phrase introduced by one or several adpositions and containing a complement
such as a noun phrase
— Note [en] The adpositions are not necessarily contiguous and on the same end of the phrase.
/adverbChunk/ - BC: /chunk/
Definition [en] chunk headed by an adverb
/adverbPhrase/ - BC: /phrase/
Definition [en] phrase headed by an adverb
/chunk/ - BC: /grammaticalUnit/
Definition [en] flat sequence of words typically containing more than one word
— Note [en] A chunk cannot contain any sub-structures. A chunk is frequently similar to a
phrase and mostly continuous.
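The constraint that a chunk contains no sub-structures can be checked on a tree stored as a `children` mapping. A minimal sketch with hypothetical function names and node identifiers:

```python
from typing import Dict, List, Set


def is_flat_chunk(node: str, children: Dict[str, List[str]], terminals: Set[str]) -> bool:
    """A chunk may not contain sub-structures: every child must be a terminal."""
    kids = children.get(node, [])
    return bool(kids) and all(k in terminals for k in kids)


children = {"NC1": ["t1", "t2"], "PP1": ["t3", "NC1"]}
terminals = {"t1", "t2", "t3"}
print(is_flat_chunk("NC1", children, terminals))  # True
print(is_flat_chunk("PP1", children, terminals))  # False: contains the chunk NC1
```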
/clause/ - BC: /grammaticalUnit/
Definition [en] unit of grammatical organization smaller than or equal to the sentence but larger
than phrases and words, and generally containing its own predicate
— Note [en] The traditional classification divides clausal units into main (independent or superordinate)
and subordinate (or dependent) clauses, e.g. in “the boy arrived after the rain started”, “the boy arrived”
is the main clause and “after the rain started” is the subordinate clause. A clause may form a whole
sentence, as in “they came”. A clause may contain sub-clauses.
/comparativePhrase/ - BC: /phrase/
Definition [en] phrase expressing a comparative meaning
— Note [en] In English, the comparative can be expressed either by inflection (e.g. larger) or by a
comparative phrase construction (e.g. more beautiful).
/coordinatedPhrase/ - BC: /phrase/
Definition [en]
© ISO 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 SynAF metamodel
4.1 Introduction
4.2 SynAF metamodel
Annex A (normative) Data categories for SynAF
Annex B (informative) Relation to the Linguistic Annotation Framework
Bibliography
syntactic head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is
feminine, then the gender of the entire constituent will be feminine)
Note 1 to entry: The head of a constituent usually cannot be left out.
linguistic annotation
feature-value pair denoting a linguistic property of a linguistic segment
main clause
clause (3.3), which can act on its own as a complete sentence (3.15)
Note 1 to entry: In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
Note 1 to entry: A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-
modifier). Modifiers are optional in a constituent.
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
ISO 24615-1:2014(E)
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
Note 1 to entry: A non-terminal node has an outgoing constituency edge (3.6).
group of word forms (3.24) (usually containing one or more words) which can fulfill a grammatical
function (3.7), e.g. in a clause (3.3)
Note 1 to entry: Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and
having the role of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases,
verb phrases, adjective phrases, adverbial phrases and prepositional phrases. Phrases have been informally
described as “bloated words”, in that the parts of the phrase added to the head elaborate and specify the reference
of the head. In our model, a phrase is a special case of a constituent (3.4).
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
Note 1 to entry: A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk
about “utterances” rather than sentences.
pair of points (p1, p2), where p1 ⩽ p2, identifying the segment of the document to which an annotation
(3.9) is applied
Note 1 to entry: A multiple span is a sequence of spans where the ending point of each span is less than or equal to
the starting point of the subsequent span.
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
Note 1 to entry: A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with
a verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
Note 1 to entry: The subject, indirect object and direct object are subcategorized grammatical functions (3.7)
within a sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase
(3.14) or the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase
or verb)
Note 1 to entry: For verbs and verbal phrases, arguments identify the participants in the process referred to by
the verb. In some frameworks, syntactic arguments are called complements.
syntactic graph
connected set of syntactic nodes (3.12) and edges (3.6)
ISO 24615-1:2014(E)
syntactic tree
syntactic graph (3.20) in which each node has a single parent
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing
the relations that exist between those units
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements
at the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, and also including relations
between categories of the same kind (such as the head-head relations between nouns in appositions,
or nominal coordinations in some formalisms). The dependency information can exist between
morpho-syntactically annotated items within a phrase (an adjective is the modifier of the head
noun within an NP) or describe a specific relation between syntactic constituents at the clausal
and sentential level (i.e. an NP being the “subject” of the main verb of a clause or sentence). The
dependency relation can also be stated for empty elements (e.g. the pro element in romance
languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy
interrelating syntactic annotation for both constituency and dependency as stated in the SynAF
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value
pairs, which represent the associated syntactic data categories. The SynAF textual descriptions specify
more complete information about the SynAF classes, relations and extensions than can be included in
the UML diagram. Developers shall define a data category selection (DCS) as specified for SynAF data
category selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
ISO 24615-1:2014(E)
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of
non-terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see
3.6, syntactic edges).
ISO 24615-1:2014(E)
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one
or more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntax tree. Syntax trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes make reference to a span.
Thus by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The NT_
Nodes are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
4.2.5 SyntacticEdge class
The SynacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes,
with one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/
(see Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/, /
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as
well as (see Figure 1) the application of morphosyntactic information to MAF annotated data.
ISO 24615-1:2014(E)
Annex A
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in
combination with the SynAF metamodel. When necessary, specific applications may define additional
data categories, which shall be described in compliance with ISO 12620 and provided in the ISOCat data
category registry.
A.2 Basic syntactic data categories
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation
among them
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
Conceptual Domain /constituency/ /constituencyAndDependency/ /dependency/
Definition [en] type of annotation
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is pho-
nologically dependent upon a neighboring word for pronunciation
— Note [en] There is a great variation concerning clitics. Sometimes, in English, the cliticized
forms are restricted to the contracted forms of auxiliaries, as in I’m, she’ll, etc.
However in some instances, articles are also referred to as clitics.
ISO 24615-1:2014(E)
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
Definition [en] union of constituency and dependency
Definition [en] property of a grammatical unit sharing a boundary with another
Definition [en] process of fully decoding the clauses and relations present in a s
SIST ISO 24615-1:2018
ISO 24615-1:2014(E)
© ISO 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 SynAF metamodel
4.1 Introduction
4.2 SynAF metamodel
Annex A (normative) Data categories for SynAF
Annex B (informative) Relation to the Linguistic Annotation Framework
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers
to Trade (TBT) see the following URL: Foreword - Supplementary information.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
This first edition of ISO 24615-1 cancels and replaces ISO 24615:2010, of which it constitutes a minor revision.
ISO 24615 (all parts) is designed to coordinate closely with ISO 24612, Language resource management —
Linguistic annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical
markup framework (LMF), and ISO 24611, Language resource management — Morpho-syntactic
annotation framework.
ISO 24615 consists of the following parts, under the general title Language resource management —
Syntactic annotation framework (SynAF):
— Part 1: Syntactic model
The following part is under preparation:
— Part 2: XML serialization ()
Introduction
ISO 24615 is based on numerous projects and pre-standardisation activities that have taken place in
the last few years (see Abeillé, 2001) to provide reference models and formats for the representation
of syntactic information, whether as the output of a syntactic parser or as annotations of language
resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto standard for
treebanking, but more recent works, e.g. the Negra/Tiger initiative in Germany (see
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/) or the ISST initiative in Italy [see Montemagni (2003)],
demonstrate the viability of a more coherent framework that can account for both (hierarchical)
constituency and dependency phenomena in syntactic annotation.
The eContent project “LIRICS” has been seminal in gathering a group of experts, who initiated the
ISO 24615 (SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed
share a common data model that offers a good basis for the SynAF metamodel (see the study made in
Deliverable D.3.1 “Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU
project LIRICS, available at
This part of ISO 24615 proposes a metamodel for syntactic annotation together with a list of relevant
data categories for syntactic annotation. The data categories are available on the ISOCat server (http:// in the syntax profile (as defined in ISO 12620:2009).
Language resource management — Syntactic annotation framework (SynAF) — Part 1: Syntactic model
1 Scope
This part of ISO 24615 describes the syntactic annotation framework (SynAF), a high-level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability
across language resources or language processing components. This part of ISO 24615 is complementary to
and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework), and provides a
metamodel for syntactic representations as well as reference data categories for representing both
constituency and dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611:2012, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1:2000, ISO 12620:2009,
ISO 24611:2012 and the following apply.
adjunct
non-essential element associated with a verb, as opposed to syntactic arguments (3.19)
Note 1 to entry: Adverbs are possible adjuncts for a sentence.
chunk
non-recursive constituent (3.4)
clause
group of phrases (3.14), usually containing a predicate
Note 1 to entry: A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb.
A main clause alone can build a complete sentence (3.15). In the SynAF model, a clause is a special case of a
constituent (3.4).
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the basis of structural (or hierarchical) properties
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical functions
(3.7) that constituents play in relation to each other
syntactic edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
Note 1 to entry: Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
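The following Python sketch is a non-normative illustration of the edge definition above: each edge is held as a (source, target, annotations) triplet, and non-terminal nodes lacking an outgoing constituency edge can be detected. The class name mirrors the SyntacticEdge class of the metamodel, but the string node identifiers and the `layer` annotation key are hypothetical and are not SynAF data categories.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SyntacticEdge:
    """Triplet of a source node, a target node and optional annotations."""
    source: str  # hypothetical string identifiers for syntactic nodes
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)


def missing_constituency_edges(nonterminals: Set[str], edges: List[SyntacticEdge]) -> Set[str]:
    """Return the non-terminal nodes that have no outgoing constituency edge.

    The 'layer' key is a hypothetical label, not a SynAF data category.
    """
    with_edge = {e.source for e in edges if e.annotations.get("layer") == "constituency"}
    return set(nonterminals) - with_edge


edges = [
    SyntacticEdge("NP", "t1", {"layer": "constituency"}),
    SyntacticEdge("t2", "t1", {"layer": "dependency", "function": "subject"}),
]
print(missing_constituency_edges({"NP", "VP"}, edges))  # {'VP'}
```

Here "VP" is reported because it has no outgoing constituency edge, which the note above says every non-terminal node should have; the dependency edge from "t2" does not count.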
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
Note 1 to entry: For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as
a subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject NP and
the main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed
under the concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
syntactic head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is
feminine, then the gender of the entire constituent will be feminine)
Note 1 to entry: The head of a constituent usually cannot be left out.
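The gender example in the definition above can be pictured as features percolating from the head to the whole constituent. This minimal Python sketch assumes each child is a plain dictionary of feature-value pairs and that the position of the head is already known; the function name and the feature names are illustrative only and are not part of SynAF.

```python
from typing import Dict, List, Sequence


def percolate_from_head(
    children: List[Dict[str, str]],
    head_index: int,
    features: Sequence[str] = ("gender", "number"),
) -> Dict[str, str]:
    """Copy the selected features of the head child up to the whole constituent."""
    head = children[head_index]
    return {name: head[name] for name in features if name in head}


# Spanish noun phrase "la casa blanca" ("the white house"), headed by "casa".
np_children = [
    {"form": "la", "gender": "feminine", "number": "singular"},
    {"form": "casa", "gender": "feminine", "number": "singular"},
    {"form": "blanca", "gender": "feminine", "number": "singular"},
]
print(percolate_from_head(np_children, head_index=1))
# {'gender': 'feminine', 'number': 'singular'}
```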
linguistic annotation
feature-value pair denoting a linguistic property of a linguistic segment
main clause
clause (3.3) which can act on its own as a complete sentence (3.15)
Note 1 to entry: In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
Note 1 to entry: A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-
modifier). Modifiers are optional in a constituent.
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
Note 1 to entry: A non-terminal node has an outgoing constituency edge (3.6).
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfill a grammatical
function (3.7), e.g. in a clause (3.3)
Note 1 to entry: Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and
having the role of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases,
verb phrases, adjective phrases, adverbial phrases and prepositional phrases. Phrases have been informally
described as “bloated words”, in that the parts of the phrase added to the head elaborate and specify the reference
of the head. In the SynAF model, a phrase is a special case of a constituent (3.4).
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
Note 1 to entry: A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk
about “utterances” rather than sentences.
span
pair of points (p1, p2), where p1 ⩽ p2, identifying the segment of the document to which an annotation
(3.9) is applied
Note 1 to entry: A multiple span is a sequence of spans where the ending point of each span is less than or equal to
the starting point of the subsequent span.
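The two conditions above (p1 ⩽ p2 for a span, and non-decreasing order for a multiple span) can be checked mechanically. The following Python sketch assumes, for illustration only, that points are integer character offsets with an exclusive end; SynAF itself does not fix the kind of point, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Span = Tuple[int, int]  # (p1, p2); integer offsets are an assumption of this sketch


def is_span(span: Span) -> bool:
    """A span is a pair of points (p1, p2) with p1 <= p2."""
    p1, p2 = span
    return p1 <= p2


def is_multiple_span(spans: Sequence[Span]) -> bool:
    """Each span must be valid and end at or before the start of the next one."""
    return all(is_span(s) for s in spans) and all(
        prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:])
    )


# Character offsets in "She rang him up": "rang" is (4, 8), "up" is (13, 15).
print(is_multiple_span([(4, 8), (13, 15)]))  # True
print(is_multiple_span([(13, 15), (4, 8)]))  # False
```

A multiple span such as [(4, 8), (13, 15)] is how a discontinuous unit like "rang ... up" can be addressed without covering the intervening "him".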
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
Note 1 to entry: A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with
a verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
Note 1 to entry: The subject, indirect object and direct object are subcategorized grammatical functions (3.7)
within a sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
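The EXAMPLE above separates subcategorized arguments ("Alfred", "a book") from an adjunct ("today"). A minimal Python sketch of that distinction follows; the frame table, the function name and the grammatical function labels (including "temporalModifier") are hypothetical illustrations, not SynAF data categories.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical frames: verb lemma -> (required functions, optional functions).
FRAMES: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "read": (frozenset({"subject"}), frozenset({"directObject"})),
}


def classify_dependents(
    lemma: str, dependents: List[Tuple[str, str]]
) -> Tuple[List[str], List[str], bool]:
    """Split (form, function) dependents of a verb into syntactic arguments and
    adjuncts, and report whether every required argument is present."""
    required, optional = FRAMES[lemma]
    allowed = required | optional
    arguments = [form for form, function in dependents if function in allowed]
    adjuncts = [form for form, function in dependents if function not in allowed]
    satisfied = required <= {function for _, function in dependents}
    return arguments, adjuncts, satisfied


print(classify_dependents(
    "read",
    [("Alfred", "subject"), ("a book", "directObject"), ("today", "temporalModifier")],
))
# (['Alfred', 'a book'], ['today'], True)
```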
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase
(3.14) or the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase
or verb)
Note 1 to entry: For verbs and verbal phrases, arguments identify the participants in the process referred to by
the verb. In some frameworks, syntactic arguments are called complements.
syntactic graph
connected set of syntactic nodes (3.12) and edges (3.6)
syntactic tree
syntactic graph (3.20) in which each node has a single parent
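The tree condition can be tested on a graph given as parent-child pairs. This Python sketch reads "each node has a single parent" as "every node except one root has exactly one parent", and additionally checks that all nodes are reachable from that root, since a syntactic graph (3.20) is connected; the node identifiers and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (parent, child); node ids are hypothetical


def is_syntactic_tree(nodes: Iterable[Hashable], edges: Iterable[Edge]) -> bool:
    """Check for exactly one root, at most one parent per node, and connectedness."""
    node_set: Set[Hashable] = set(nodes)
    parents: Dict[Hashable, List[Hashable]] = defaultdict(list)
    children: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
        children[parent].append(child)
    if any(len(p) > 1 for p in parents.values()):
        return False  # some node has more than one parent
    roots = [n for n in node_set if not parents[n]]
    if len(roots) != 1:
        return False
    # Every node must be reachable from the single root.
    seen: Set[Hashable] = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return seen == node_set


nodes = {"S", "NP", "VP", "Alfred", "reads"}
tree_edges = [("S", "NP"), ("S", "VP"), ("NP", "Alfred"), ("VP", "reads")]
print(is_syntactic_tree(nodes, tree_edges))                       # True
print(is_syntactic_tree(nodes, tree_edges + [("VP", "Alfred")]))  # False
```

The second call fails because "Alfred" acquires a second parent; such a graph is still a syntactic graph, just not a tree.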
syntax
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing
the relations that exist between those units
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements
at the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, including relations
between categories of the same kind (such as the head-head relations between nouns in appositions,
or nominal coordinations in some formalisms). The dependency information can hold between
morpho-syntactically annotated items within a phrase (e.g. an adjective modifying the head
noun within an NP) or describe a specific relation between syntactic constituents at the clausal
and sentential level (e.g. an NP being the “subject” of the main verb of a clause or sentence). The
dependency relation can also be stated for empty elements (e.g. the pro element in Romance
languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy
interrelating syntactic annotation for both constituency and dependency as stated in the SynAF
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value
pairs, which represent the associated syntactic data categories. The SynAF textual descriptions specify
more complete information about the SynAF classes, relations and extensions than can be included in
the UML diagram. Developers shall define a data category selection (DCS) as specified for SynAF data
category selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of
non-terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see
3.6, syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one
or more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntax tree. Syntax trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. Since T_Nodes refer to spans,
spans can also be inferred for NT_Nodes by virtue of the syntactic tree representation. NT_Nodes are
annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
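Span inference for NT_Nodes can be sketched as collecting the spans of all terminal descendants and merging those that touch or overlap. The Python below is an illustrative sketch only: `TNode`, `NTNode` and `inferred_spans` are hypothetical names, and points are assumed to be token boundaries (token i covers (i, i + 1)), so that adjacent tokens merge into one span while a discontinuous T_Node keeps its separate spans.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Span = Tuple[int, int]  # token-boundary offsets (an assumption of this sketch)


@dataclass
class TNode:
    spans: List[Span]  # one or more spans; several spans model discontinuity


@dataclass
class NTNode:
    children: List[Union["TNode", "NTNode"]] = field(default_factory=list)


def inferred_spans(node: Union[TNode, NTNode]) -> List[Span]:
    """Return a node's spans: a T_Node's own spans, or for an NT_Node the spans
    of its terminal descendants, sorted and with touching/overlapping spans merged."""
    if isinstance(node, TNode):
        collected = list(node.spans)
    else:
        collected = [s for child in node.children for s in inferred_spans(child)]
    merged: List[Span] = []
    for start, end in sorted(collected):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# "She rang him up": tokens 0..3; "rang ... up" is one discontinuous T_Node.
she, him = TNode([(0, 1)]), TNode([(2, 3)])
rang_up = TNode([(1, 2), (3, 4)])
vp = NTNode([rang_up, him])
s = NTNode([she, vp])
print(inferred_spans(rang_up))  # [(1, 2), (3, 4)]
print(inferred_spans(vp))       # [(1, 4)]
print(inferred_spans(s))        # [(0, 4)]
```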
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes,
with one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/
(see Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/, /
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as
well as (see Figure 1) the application of morphosyntactic information to MAF annotated data.
Annex A
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in
combination with the SynAF metamodel. When necessary, specific applications may define additional
data categories, which shall be described in compliance with ISO 12620 and provided in the ISOCat data
category registry.
A.2 Basic syntactic data categories
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation
among them
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
Conceptual Domain /constituency/ /constituencyAndDependency/ /dependency/
Definition [en] type of annotation
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
— Note [en] There is great variation concerning clitics. In English, the cliticized forms are sometimes
restricted to the contracted forms of auxiliaries, as in I’m, she’ll, etc.
However, in some descriptions, articles are also referred to as clitics.
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
Definition [en] union of constituency and dependency
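As an illustration of the union of constituency and dependency, the sentence "Alfred reads a book today" can carry both layers over the same terminal nodes. The Python sketch below is non-normative: node identifiers, the plain-list storage and the grammatical function labels are hypothetical choices, not a SynAF serialization.

```python
from typing import Dict, List, Tuple

# Terminal nodes (hypothetical ids) for "Alfred reads a book today".
tokens: Dict[str, str] = {"t1": "Alfred", "t2": "reads", "t3": "a", "t4": "book", "t5": "today"}

# Constituency layer: (parent, child) edges, in surface order.
constituency: List[Tuple[str, str]] = [
    ("S", "NP1"), ("S", "VP"),
    ("NP1", "t1"),
    ("VP", "t2"), ("VP", "NP2"), ("VP", "t5"),
    ("NP2", "t3"), ("NP2", "t4"),
]

# Dependency layer: (head, dependent, grammatical function).
dependency: List[Tuple[str, str, str]] = [
    ("t2", "t1", "subject"),
    ("t2", "t4", "directObject"),
    ("t4", "t3", "determiner"),
    ("t2", "t5", "adjunct"),
]


def yield_of(node: str) -> List[str]:
    """Terminal yield of a node in the constituency layer."""
    if node in tokens:
        return [tokens[node]]
    return [w for parent, child in constituency if parent == node for w in yield_of(child)]


def dependents_of(head: str) -> List[Tuple[str, str]]:
    """(function, word) pairs for the dependents of a head in the dependency layer."""
    return [(fn, tokens[dep]) for h, dep, fn in dependency if h == head]


print(yield_of("NP2"))      # ['a', 'book']
print(dependents_of("t2"))  # [('subject', 'Alfred'), ('directObject', 'book'), ('adjunct', 'today')]
```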
Definition [en] property of a grammatical unit sharing a boundary with another
Definition [en] process of fully decoding the clauses and relations present in a sentence
Definition [en] mechanism allowing the linking of words, or in some formalisms also phrases and
clauses, based on the binary head-dependent division and a possible annotation of
grammatical function
Definition [en] construction consisting of two negative forms in the same clause
— Note [en] Example: In English, “I’m not unhappy”.
Definition [en] annotation that is inserted into the text itself
— Note [en] The original organization of the text is modified.
/enclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a preceding word
Definition [en] before anything according to a certain order
Definition [en] hybrid style annotation where standoff and embedded are mixed
/morphosyntacticAnnotation/ - BC: /annotation/
Definition [en] annotation related to the morphology of the words and their part of speech
Definition [en] construction that expresses the contradiction of some or all of a sentence’s, word’s
or phrase’s meaning
— Note [en] Negation may be based on negative particles (like “not”) or on prefixes (like “un-”
or “non-”). Example: In English, “I’m not happy”.
Definition [en] immediately afterwards
Definition [en] the default edge expressing the constituency relationship, originating in a
constituent and terminating in a component of that constituent
Definition [en] a phrase or word in a clause which provides a statement regarding the subject of
that clause. Most clauses can thus be divided into a subject and predicate, where
the predicate is a function expanding on the subject.
— Note [en] Example: “Kevin kicks the ball” is seen as a subject (“Kevin”) associated with a
predicate phrase (“kicks the ball”).
Definition [en] immediately before
Definition [fr] immédiatement avant
Name [en] previous
Name [fr] précédent
/proclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a following word
— Note [en] Example: “the” in “the boy”.
Definition [en] act of spreading a linguistic property from one grammatical unit to another
Definition [en] an indirect edge expressing syntactic constituency. These edges may be used to
express the relationship between a head and a coreferent of its omitted dependent
— Note [en] Example: In “I saw Bill, but went straight back home afterwards”, “I” may serve as
an explicit subject of the first clause, dominated by a primary syntactic edge, but
in the second clause, a further secondary syntactic edge leading to “I” can make
it clear that it is also the subject of the second clause, without being one of the
explicit parts of that clause, which are dominated by primary edges. This device is
used in some formalisms to avoid the introduction of empty elements standing in
for such ‘missing’ bearers of grammatical function.
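The primary/secondary distinction in the note above can be sketched as a small edge list. This is a minimal Python sketch, not part of SynAF. The `Edge` class, the node identifiers (`S`, `CL1`, `CL2`) and the helper functions are hypothetical, and punctuation is omitted. Only the two edge-type values come from the data categories /primarySyntacticEdge/ and /secondarySyntacticEdge/.

```python
from dataclasses import dataclass

# Values of the /syntacticEdgeType/ data category used in this sketch.
PRIMARY = "primarySyntacticEdge"
SECONDARY = "secondarySyntacticEdge"


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: source node id, target node id, edge type."""
    source: str
    target: str
    edge_type: str


# "I saw Bill, but went straight back home afterwards"
# S = sentence, CL1 / CL2 = the two clauses; word forms are identified by their text.
edges = [
    Edge("S", "CL1", PRIMARY),
    Edge("S", "but", PRIMARY),
    Edge("S", "CL2", PRIMARY),
    Edge("CL1", "I", PRIMARY),
    Edge("CL1", "saw", PRIMARY),
    Edge("CL1", "Bill", PRIMARY),
    Edge("CL2", "went", PRIMARY),
    Edge("CL2", "straight", PRIMARY),
    Edge("CL2", "back", PRIMARY),
    Edge("CL2", "home", PRIMARY),
    Edge("CL2", "afterwards", PRIMARY),
    # "I" is also the subject of the second clause, but not one of its explicit parts.
    Edge("CL2", "I", SECONDARY),
]


def explicit_parts(node, edges):
    """Components of `node` dominated by primary edges only."""
    return [e.target for e in edges if e.source == node and e.edge_type == PRIMARY]


def all_parts(node, edges):
    """Components of `node` reached by any edge, primary or secondary."""
    return [e.target for e in edges if e.source == node]


def primary_parent_counts(edges):
    """Number of primary edges entering each node."""
    counts = {}
    for e in edges:
        if e.edge_type == PRIMARY:
            counts[e.target] = counts.get(e.target, 0) + 1
    return counts
```

Every node keeps exactly one incoming primary edge, so the primary edges alone still form a tree. The secondary edge records the shared subject of CL2 without adding an empty element to that clause.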
/shallowParsing/
Definition [en] process of identifying the chunks in a sentence
/standoffNotation/
Definition [en] annotation that is recorded externally to the grammatical units and that refers
to these units
— Note [en] The original organization of the text is kept unchanged.
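The contrast between /standoffNotation/ and /embeddedNotation/ can be illustrated with a short Python sketch. It rests on two assumptions not defined by SynAF: annotations point into the text by (start, end) character offsets, and the embedded form uses hypothetical bracket labels loosely in the style of a bracketed treebank. The standoff list leaves `text` untouched. The hypothetical `to_embedded` helper rewrites the text with inline brackets, and it handles only non-nested, non-overlapping annotations.

```python
# Standoff notation: annotations live outside the text and point back into it.
text = "the boy arrived"
standoff = [
    {"start": 8, "end": 15, "label": "VP"},
    {"start": 0, "end": 7, "label": "NP"},
]


def to_embedded(text, annotations):
    """Render non-nested, non-overlapping standoff annotations as inline brackets.

    The result is an embedded notation: the annotation is added in the text,
    so the original organization of the text is modified.
    """
    out, pos = [], 0
    for a in sorted(annotations, key=lambda a: a["start"]):
        out.append(text[pos:a["start"]])  # unannotated gap before this unit
        out.append(f"[{a['label']} {text[a['start']:a['end']]}]")
        pos = a["end"]
    out.append(text[pos:])  # trailing unannotated text
    return "".join(out)


embedded = to_embedded(text, standoff)
```

Here `embedded` becomes `"[NP the boy] [VP arrived]"`, while `text` itself is unchanged. This is the property the standoff note describes.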
/syntacticAnnotation/ - BC: /annotation/
Definition [en] annotation describing constituency and/or dependency
— Note [en] Syntactic annotation does not directly deal with the meaning of an utterance.
/syntacticFeature/
Definition [en] feature used in the description of the syntax of a language
/syntacticEdgeType/
Conceptual Domain /primarySyntacticEdge/, /secondarySyntacticEdge/
Definition [en] characterizes the syntactic edge according to its role in the syntactic representation
Definition [en] rule that limits what the syntax allows in a particular language
/tagging/
Definition [en] process of annotating the part of speech for every word
/whType/
Definition [en] property of a clause beginning with a question word
— Note [en] In English, “Who is he?” is a whType question.
/yesNoType/
Definition [en] property of a clause to which only a positive or a negative answer or position is possible
— Note [en] In English, “Are you coming?” is a yesNoType question.
A.3 Constituency related data categories
/adjectiveChunk/ - BC: /chunk/
Definition [en] chunk headed by an adjective
/adjectivePhrase/ - BC: /phrase/
Definition [en] phrase headed by an adjective
/adpositionChunk/ - BC: /chunk/
Definition [en] chunk introduced by one or several adpositions that are not necessarily contiguous
or located at the same end of the chunk
/adpositionPhrase/ - BC: /phrase/
Definition [en] phrase introduced by one or several adpositions and containing a complement
such as a noun phrase
— Note [en] The adpositions are not necessarily contiguous or located at the same end of the phrase.
/adverbChunk/ - BC: /chunk/
Definition [en] chunk headed by an adverb
/adverbPhrase/ - BC: /phrase/
Definition [en] phrase headed by an adverb
/chunk/ - BC: /grammaticalUnit/
Definition [en] flat sequence of words typically containing more than one word
— Note [en] A chunk cannot contain any sub-structures. A chunk is frequently similar to a
phrase and mostly continuous.
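As a rough illustration of the flatness constraint in /chunk/, the following Python sketch accepts a unit as a chunk only if all its components are word forms. The `Unit` class and the `is_flat_chunk` helper are hypothetical. Only the labels /adjectiveChunk/ and /adverbChunk/ come from this annex. The sketch does not check contiguity, which the note describes only as typical.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical grammatical unit; a word form is a unit without components."""
    label: str
    children: list = field(default_factory=list)

    def is_word(self):
        return not self.children


def is_flat_chunk(unit):
    """True if `unit` has components and none of them has sub-structure."""
    return bool(unit.children) and all(child.is_word() for child in unit.children)


# "very big" as a flat adjective chunk, and an invalid variant with a nested unit.
flat = Unit("adjectiveChunk", [Unit("very"), Unit("big")])
nested = Unit("adjectiveChunk", [Unit("adverbChunk", [Unit("very")]), Unit("big")])
```

A recursive structure such as `nested` would have to be represented as a phrase (for example /adjectivePhrase/) rather than as a chunk.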
/clause/ - BC: /grammaticalUnit/
Definition [en] unit of grammatical organization smaller than or equal to the sentence but larger
than phrases and words, and generally containing its own predicate
— Note [en] Clauses are traditionally classified as main (independent or superordinate) or
subordinate (dependent) clauses, e.g. “the boy arrived” (main clause) “after the rain started”
(subordinate clause). A clause may form a whole sentence, as in “they came”. A clause may
contain sub-clauses.
/comparativePhrase/ - BC: /phrase/
Definition [en] phrase expressing a comparative meaning
— Note [en] English expresses the comparative both by inflection (e.g. “larger”) and by a
comparative phrase construction (e.g. “more beautiful”).
/coordinatedPhrase/ - BC: /phrase/
Definition [en] phrase expressing a coordination
ISO 24615-1:2014(F)
© ISO 2014
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this
publication may be reproduced or utilized in any form or by any means, electronic or mechanical,
including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can
be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Tel.: +41 22 749 01 11
Fax: +41 22 749 09 47
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 SynAF metamodel
4.1 Introduction
4.2 SynAF metamodel
4.2.1 Overview
4.2.2 SyntacticNode class
4.2.3 T_Node class
4.2.4 NT_Node class
4.2.5 SyntacticEdge class
4.2.6 Annotation class
Annex A (normative) Data categories for SynAF
Annex B (informative) Relationship with the linguistic annotation framework
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify any
such rights or to give notice of their existence. Details of any intellectual property rights or similar rights
identified during the development of the document will be in the Introduction and/or on the ISO list of
patent declarations received (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO-specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO)
principles in the Technical Barriers to Trade (TBT), see the following URL:
www.iso.org/iso/fr/avant-propos.html.
The committee responsible for this document is ISO/TC 37, Language and terminology, Subcommittee
SC 4, Language resource management.
This first edition of ISO 24615-1 cancels and replaces ISO 24615:2010, of which it constitutes a
minor revision.
ISO 24615, in all its parts, has been designed in coordination with ISO 24612, Language resource
management – Linguistic annotation framework (LAF), ISO 24613:2008, Language resource management –
Lexical markup framework (LMF), and ISO 24611, Language resource management – Morpho-syntactic
annotation framework.
ISO 24615 consists of the following parts, under the general title Language resource management –
Syntactic annotation framework (SynAF):
— Part 1: Syntactic model
The following part is under preparation:
— Part 2: XML serialization
Introduction
ISO 24615 is based on numerous pre-standardization projects and activities carried out in recent
years (see Abeillé, 2001) with the aim of establishing reference models and formats for representing
syntactic information, either as parser output or as annotations of language resources (treebanks).
For several years the Penn Treebank initiative has been the de facto standard for syntactic annotation,
but more recent work, for example the Negra/Tiger initiative in Germany (see
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/) or the ISST initiative in Italy (see Montemagni
(2003)), demonstrates the viability of a more coherent framework that takes into account both
hierarchical constituents and syntactic dependency phenomena in syntactic annotation.
The eContent project “LIRICS” was a major project because it brought together a group of experts who
initiated the ISO 24615 (SynAF) project. While preparing SynAF, this group confirmed that current
initiatives do indeed share a common data model that offers a good starting point for the SynAF
metamodel (see the study “Evaluation of initiatives for morpho-syntactic and syntactic annotation”
from the European LIRICS project, available at http://lirics.loria.fr/doc_pub/Del3_1_V2.pdf).
This part of ISO 24615 proposes a metamodel for syntactic annotation together with a list of data
categories relevant to syntactic annotation. The data categories are available on the ISOCat server
(http://www.isocat.org/) within the syntax profile (as defined in ISO 12620:2009).
Language resource management – Syntactic annotation framework (SynAF) –
Part 1:
Syntactic model
1 Scope
This part of ISO 24615 describes the syntactic annotation framework (SynAF), a high-level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability
across language resources or language processing components. This part of ISO 24615 is complementary
and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a
metamodel for syntactic representations as well as reference data categories for representing both
constituency and dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following documents are indispensable for the application of this document. For dated references,
only the edition cited applies. For undated references, the latest edition of the referenced document
(including any amendments) applies.
ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and application
ISO 12620:2009, Terminology and other language and content resources – Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611:2012, Language resource management – Morpho-syntactic annotation framework (MAF)
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1:2000,
ISO 12620:2009 and ISO 24611:2012 and the following apply.
3.1
adjunct
non-essential element associated with a verb, as opposed to the syntactic arguments (3.19)
Note 1 to entry: Adverbs are possible adjuncts in a sentence.
3.2
chunk
non-recursive constituent (3.4)
Note 1 to entry: In the French version, the term “bloc” is used as the translation of the English term “chunk”.
3.3
clause
group of phrases (3.14) that usually includes a predicate
Note 1 to entry: A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages that
distinguish finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of
the verb. A main clause can by itself constitute a complete sentence (3.15). In the SynAF model, a clause is a special
case of constituent (3.4).
3.4
constituent
syntactic grouping of words [into phrases (3.14)], of phrases [into clauses (3.3) or other phrases] or of
clauses [into a sentence (3.15)], on the basis of structural properties (or
3.5
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) based on the grammatical functions
(3.7) that the constituents assume with respect to one another
3.6
syntactic edge
triple consisting of a source node (3.12), a target node and optional annotations (3.9)
Note 1 to entry: Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or a constituent (3.4) in its syntactic environment
Note 1 to entry: For example, a noun phrase (NP) can play the role of subject in a sentence (3.15), and a noun can
play the role of a subject depending on a verb in a dependency graph. There is a grammatical link between the
subject NP and the main verb of a sentence. All grammatical relations (subject–predicate, head–modifier, etc.) are
subsumed under the concept of dependency relation (3.5), whether they hold between terminal or non-terminal nodes.
3.8
syntactic head
part of a constituent (3.4) that determines its distribution (the syntactic environment in which the
constituent can occur) and its grammatical properties (e.g. if the grammatical gender of the head is
feminine, then the gender of the whole constituent is feminine)
Note 1 to entry: The head of a constituent usually cannot be omitted.
3.9
linguistic annotation
attribute-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3) that can function on its own as a complete sentence (3.15)
Note 1 to entry: In languages that distinguish finiteness, the main clause is usually finite.
Example: The train is late.
3.11
modifier
part of a constituent (3.4) that attributes a property to the head (3.8) of that constituent
Note 1 to entry: A modifier can be located before or after the head of the phrase (3.14) (pre-modifier or
post-modifier). Within a constituent, modifiers are optional.
3.12
syntactic node
word form (3.24) or constituent (3.4) considered as an elementary syntactic component of a
syntactic analysis
3.13
non-terminal node
syntactic node (3.12) that is not a word form (3.24)
Note 1 to entry: A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually comprising one or more words) that can fulfil a grammatical
function (3.7), for example within a clause (3.3)
Note 1 to entry: Empty phrases are allowed (unrealized pronouns, sometimes marked as “pro” and acting as subjects
of clauses). A phrase is typically named after its head (3.8), for example noun phrases, verb phrases, adjective
phrases, adverb phrases and prepositional phrases. Informally, phrases are described as “inflated words”, because
the elements that constitute them and that are added to the syntactic head shape and specify its reference. In this
model, a phrase is a special case of constituent (3.4).
3.15
sentence
connected group of word forms (3.24) containing a predication, usually expressing a complete thought
and forming the basic unit of discourse structure
Note 1 to entry: A sentence consists of one or more clauses (3.3). When the aim is to describe discourse, it is
common to speak of utterances rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ⩽ p2, identifying the segments of the document that are the object of the
annotation (3.9)
Note 1 to entry: A multiple span is a sequence of spans in which the end point of each span is less than or equal to
the start point of the next span.
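The span and multiple-span conditions in 3.16 can be checked mechanically. The Python sketch below assumes that points are integer character offsets into the text, which SynAF does not prescribe. The helper names and the German particle-verb example ("ruft ... an", a discontinuous unit) are illustrative only.

```python
def is_span(span):
    """A span (p1, p2) requires p1 <= p2."""
    p1, p2 = span
    return p1 <= p2


def is_multiple_span(spans):
    """Every span is valid and ends at or before the start of the next span."""
    return all(is_span(s) for s in spans) and all(
        prev[1] <= nxt[0] for prev, nxt in zip(spans, spans[1:])
    )


# German particle verb "ruft ... an" covered by one multiple span (character offsets).
text = "Er ruft dich an"
ruft_an = [(3, 7), (13, 15)]
```

Overlapping spans such as `[(0, 5), (3, 8)]` are rejected, because the first span ends after the second one starts.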
3.17
subordinate clause
clause that fulfils a grammatical function (3.7) in a phrase (3.14) [for example, a relative clause (3.3)
modifying the nominal head (3.8) of a noun phrase] or in another clause
Note 1 to entry: Generally, a subordinate clause does not function alone as a sentence; it forms part of a
larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur
with a verb
EXAMPLE Alfred (/syntactic argument/) reads a book (/syntactic argument/) today (/adjunct/).
Note 1 to entry: The subject, the indirect object and the direct object are subcategorized grammatical functions (3.7)
within a sentence; they depend on the verb (i.e. they can appear in frames of
3.19
syntactic argument
functionally essential element that is required and whose interpretation is given by the head of its
phrase (3.14) or by the node (3.12) on which it depends (e.g. the nominal argument of a prepositional
phrase or of a verb)
Note 1 to entry: For verbs and verb phrases, the arguments identify the participants in the process referred to by the
verb. In some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has only one parent
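The tree condition in 3.21 can be tested on a plain list of (parent, child) edges. This Python sketch reads the condition as the usual one: exactly one root without a parent, every other node with exactly one parent, and every node reachable from the root so that the graph stays connected as in 3.20. The function name and the example node labels are hypothetical.

```python
from collections import defaultdict


def is_syntactic_tree(nodes, edges):
    """edges: iterable of (source, target) pairs, where source is the parent."""
    parents = defaultdict(list)
    children = defaultdict(list)
    for src, tgt in edges:
        parents[tgt].append(src)
        children[src].append(tgt)
    roots = [n for n in nodes if not parents[n]]
    # Exactly one root, and no node with more than one parent.
    if len(roots) != 1 or any(len(parents[n]) > 1 for n in nodes):
        return False
    # Connectivity: every node must be reachable from the root.
    seen, stack = set(), [roots[0]]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(children[n])
    return seen == set(nodes)


tree_nodes = ["S", "NP", "VP", "the", "dog", "barks"]
tree_edges = [("S", "NP"), ("S", "VP"), ("NP", "the"), ("NP", "dog"), ("VP", "barks")]
```

Adding a second incoming edge, for example `("VP", "dog")`, turns the structure into a syntactic graph that is no longer a tree.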
3.22
syntax
way in which word forms (3.24) are interconnected and/or grouped together to form phrases, thereby
establishing the relations that exist between these elements
3.23
terminal node
syntactic node (3.12) that is a single word form (3.24) or an empty element involved in a
syntactic relation
3.24
word form
contiguous or non-contiguous entity in a sequence of discourse or text, identified as an autonomous
lexical item
4 SynAF metamodel
4.1 Introduction
Syntactic annotations serve at least two purposes in language processing:
a) to represent linguistic constituency, as in noun phrases (NPs), by describing a structured sequence
of morpho-syntactically annotated items (including empty elements or traces generated by movement
at the constituency level), as well as constituents built from non-contiguous elements; and
b) to represent dependency relations, such as head–modifier relations, including relations between
categories of the same kind (such as head–head relations between nouns in appositions, or nominal
coordination in some formalisms). Dependency information can hold between morpho-syntactically
annotated items within a phrase (an adjective is the modifier of a nominal head within an NP) or can
describe a specific relation between syntactic constituents at the clause or sentence level (e.g. an NP
that is the “subject” of the main verb of the clause or sentence). A dependency relation can also
involve empty elements (for example, the pro element in Romance languages, which assumes a
grammatical function).
Consequently, syntactic annotations shall follow a multilayer annotation strategy that correlates the
constituency-related and dependency-related syntactic annotations, as specified in the SynAF metamodel.
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by attribute-value pairs,
which represent the syntactic data categories. The SynAF textual description specifies more fully the
SynAF classes, relations and extensions that can be included in the UML diagram. Developers shall define
a data category selection (DCS) as specified in the data category selection procedures for SynAF (see Figure 1).
The data categories provided in Annex A shall be used for representing syntactic annotations.
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of
non-terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see 3.6,
syntactic edge).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements where applicable. T_Nodes are defined over one or
more spans (multiple spans can be used for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid at the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of the syntactic tree. Syntactic trees consist
mainly of T_Nodes and NT_Nodes, with empty elements where applicable. T_Nodes refer to a span.
Consequently, by virtue of the syntactic tree representations, spans can also be inferred for NT_Nodes.
NT_Nodes are annotated with syntactic categories valid at the phrase level and above (clause or
sentence level).
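The span inference for NT_Nodes described in 4.2.4 can be sketched in Python by collecting the spans of the dominated T_Nodes. The classes below are simplified, hypothetical stand-ins for T_Node and NT_Node. The merge rule is also an assumption: terminal spans that touch or overlap are joined, and all other spans are kept separate. As a result, a discontinuous constituent keeps a multiple span, and an empty element without a span contributes nothing.

```python
from dataclasses import dataclass, field


@dataclass
class T_Node:
    """Terminal node: a word form (or empty element) anchored on spans."""
    form: str
    spans: list  # (start, end) pairs; may be empty for an empty element


@dataclass
class NT_Node:
    """Non-terminal node: no span of its own, only components."""
    category: str
    children: list = field(default_factory=list)


def derive_spans(node):
    """Infer the (multiple) span of `node` from the T_Nodes it dominates."""
    if isinstance(node, T_Node):
        return sorted(node.spans)
    collected = sorted(s for child in node.children for s in derive_spans(child))
    merged = []
    for start, end in collected:
        if merged and start <= merged[-1][1]:
            # Touching or overlapping: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Under this rule, "I" (0, 1) and the clitic "'m" (1, 3) yield the single span (0, 3), while "ruft" (3, 7) and "an" (13, 15) remain two spans. Words separated by whitespace also stay separate, since the whitespace belongs to no T_Node.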
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target
nodes, with one or more annotations. In particular, a syntactic edge can be annotated with a
/syntacticEdgeType/ (see Annex A), whose conceptual domain can include, without being limited to,
/primarySyntacticEdge/ or /secondarySyntacticEdge/.
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF-annotated data,
as well as the application of morpho-syntactic information to MAF-annotated data (see Figure 1).
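The classes of 4.2.2 to 4.2.6 can be put together in a small Python sketch. It shows one constituency layer and one dependency layer over the same nodes. The sketch mirrors the UML class names, but it is only an illustration, not a normative serialization.

Several names come from Annex A: /syntacticEdgeType/, /primarySyntacticEdge/ and /adjectivePhrase/. The other annotation attributes (`partOfSpeech`, `syntacticCategory`, `grammaticalFunction`) and the helper functions are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """Attribute-value pair; the attribute would normally be a data category."""
    attribute: str
    value: str


# eq=False: nodes and edges compare by identity, so two identical word forms
# in one sentence remain distinct nodes.
@dataclass(eq=False)
class SyntacticNode:
    annotations: list = field(default_factory=list)


@dataclass(eq=False)
class T_Node(SyntacticNode):
    form: str = ""
    spans: list = field(default_factory=list)


@dataclass(eq=False)
class NT_Node(SyntacticNode):
    pass


@dataclass(eq=False)
class SyntacticEdge:
    source: SyntacticNode
    target: SyntacticNode
    annotations: list = field(default_factory=list)


def values(item, attribute):
    """All values of `attribute` annotated on a node or an edge."""
    return [a.value for a in item.annotations if a.attribute == attribute]


def targets(source, edges, attribute, value):
    """Targets of edges leaving `source` that carry attribute=value."""
    return [e.target for e in edges
            if e.source is source and value in values(e, attribute)]


# "very happy": constituency and dependency layers over the same T_Nodes.
very = T_Node(form="very", spans=[(0, 4)],
              annotations=[Annotation("partOfSpeech", "adverb")])
happy = T_Node(form="happy", spans=[(5, 10)],
               annotations=[Annotation("partOfSpeech", "adjective")])
ap = NT_Node(annotations=[Annotation("syntacticCategory", "adjectivePhrase")])

edges = [
    SyntacticEdge(ap, very, [Annotation("syntacticEdgeType", "primarySyntacticEdge")]),
    SyntacticEdge(ap, happy, [Annotation("syntacticEdgeType", "primarySyntacticEdge")]),
    SyntacticEdge(happy, very, [Annotation("grammaticalFunction", "modifier")]),
]
```

Keeping both layers as annotated SyntacticEdge instances over shared nodes is one way to read the multilayer strategy of 4.1. In this sketch, the word-level annotations on the T_Nodes stand in for information that would come from MAF.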
Figure 1 – SynAF metamodel (linked with MAF)
Annex A
Data categories for SynAF
A.1 General
The following data categories shall be used for representing syntactic annotations in combination with
the SynAF metamodel. Where necessary, specific applications may define additional data categories,
which shall be described in accordance with ISO 12620 and stored in the ISOCat data category registry.
A.2 Basic syntactic data categories
/annotation/ annotation
Definition [en] information added to a word, a phrase, a clause, a sentence or a text, or to a relation
between these elements
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness that the annotation describes
/annotationStyle/ annotation style
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of the annotation
/annotationType/ annotation type
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of the annotation
/clitic/ clitic
Definition [en] unmarked word that cannot function on its own as a normal utterance and that
depends phonologically on a neighbouring word for its pronunciation
— Note [en] Usage of the term clitic varies widely. In English, clitic forms are sometimes limited
to the contracted forms of auxiliaries, such as I’m, she’ll, etc. In other contexts, however,
articles are also described as clitics.
/constituency/ constituency
Definition [en] mechanism by which words are built into phrases, phrases into higher-level
phrases, and clauses into sentences
— Note [en] The building of sentences into text is not usually called constituency.
/constituencyAndDependency/ constituency and dependency
Definition [en] union of constituency and dependency
/contiguous/ contiguous
Definition [en] property of a grammatical unit that shares a boundary with another
/deepParsing/ deep parsing
Definition [en] process of fully decoding the clauses and relations present in a sentence
/dependency/ dependency
Definition [en] mechanism for linking words or, in some formalisms, phrases and clauses, based
on the binary head–dependent division and on an optional annotation of the grammatical function
/doubleNegation/ double negation
Definition [en] construction consisting of two negative forms in the same clause
— Note [en] Example: In French, “je ne suis pas mécontent” (“I am not unhappy”).
/embeddedNotation/ embedded notation
Definition [en] annotation that is added in the text
— Note [en] The original organization of the text is modified.
/enclitic/ - BC: /clitic/ enclitic
Definition [en] clitic that depends on a preceding word
Definition [en] before anything else according to a certain order
/mixedNotation/ mixed notation
Definition [en] hybrid annotation style in which the standoff and embedded styles are mixed
/morphosyntacticAnnotation/ morphosyntactic annotation - BC: /annotation/
Definition [en] annotation relating to the morphology of words and to their part of speech
/negation/ negation
Definition [en] construction that expresses the opposite meaning of all or part of a sentence,
a word or a phrase
— Note [en] Negation can be based on negative particles (such as “ne … pas”) or on prefixes
(such as “in-” or “non-”). Example: In French, “je ne suis pas content” (“I am not happy”).
/next/ next
Definition [en] immediately after
/primarySyntacticEdge/ primary syntactic edge
Definition [en] default edge expressing the constituency relation, originating in a constituent
and terminating in a component of that constituent
/predicate/ predicate
Definition [en] phrase or word in a clause that provides a statement about the subject of the
clause. Most clauses can thus be divided into a subject and a predicate, where the
predicate is a function expanding on the subject
— Note [en] Example: “Kevin kicks the ball” is seen as a subject (“Kevin”) associated with a
predicate phrase (“kicks the ball”).
/previous/ previous
Definition [en] immediately before
/proclitic/ - BC: /clitic/ proclitic
Definition [en] clitic that depends on the following word
— Note [en] Example: “the” in “the boy”.
/propagation/ propagation
Definition [en] act of extending a linguistic property from one grammatical unit to another
/secondarySyntacticEdge/ secondary syntactic edge
Definition [en] indirect edge expressing syntactic constituency. These edges can be used to
express the relation between a head and a coreferent of its omitted dependent
— Note [en] Example: In “I saw Bill, but went straight back home afterwards”, “I” can serve
as an explicit subject of the first clause, dominated by a primary syntactic edge,
while in the second clause a secondary syntactic edge leading to “I” can make clear
that it is also the subject of that clause, without being one of its explicit parts,
which are dominated by primary edges. This mechanism is used in some formalisms
to avoid introducing empty elements standing in for such ‘missing’ bearers of
grammatical function.
/shallowParsing/ shallow parsing
Definition [en] process of identifying the chunks in a sentence
/standoffNotation/ standoff notation
Definition [en] annotation that is recorded externally to the grammatical units and that refers
to these units
— Note [en] The original organization of the text remains unchanged.
/syntacticAnnotation/ syntactic annotation - BC: /annotation/
Definition [en] annotation that describes constituency and/or dependency
— Note [en] Syntactic annotation does not deal directly with the meaning of the utterance.
/syntacticFeature/ feature used in the description
