SIST ISO 24615:2013
Language resource management -- Syntactic annotation framework (SynAF)
This International Standard describes the syntactic annotation framework (SynAF), a high level model for representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across language resources or language processing components. This International Standard is complementary and closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for syntactic representations as well as reference data categories for representing both constituency and dependency information in sentences or other comparable utterances and segments.
Gestion de ressources langagières -- Cadre d'annotation syntaxique (SynAF)
Upravljanje z jezikovnimi viri - Ogrodje za skladenjsko označevanje (SynAF)
INTERNATIONAL STANDARD ISO 24615
First edition
2010-10-15
Language resource management —
Syntactic annotation framework (SynAF)
Gestion de ressources langagières — Cadre d'annotation syntaxique
(SynAF)
Reference number
ISO 24615:2010(E)
© ISO 2010
COPYRIGHT PROTECTED DOCUMENT
© ISO 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 SynAF metamodel
4.1 Introduction
4.2 SynAF metamodel
4.2.1 Overview
4.2.2 SyntacticNode class
4.2.3 T_Node class
4.2.4 NT_Node class
4.2.5 SyntacticEdge class
4.2.6 Annotation class
Annex A (normative) Data categories for SynAF
Annex B (informative) Relation to the Linguistic Annotation Framework
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24615 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management, in collaboration with the European
eContent Project “LIRICS” (Linguistic Infrastructure for Interoperable Resources and Systems), under the
contract e-Content-22236-LIRICS.
ISO 24615 is designed to coordinate closely with ISO 24612, Language resource management — Linguistic
annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical markup framework
(LMF), and ISO 24611, Language resource management — Morpho-syntactic annotation framework.
Introduction
This International Standard is based on numerous projects and pre-standardisation activities that have taken
place in the last few years (see Abeillé, 2001 [9]), to provide reference models and formats for the
representation of syntactic information, whether as the output of a syntactic parser, or as annotations of
language resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto
standard for treebanking, but more recent works e.g. the Negra/Tiger initiative (see:
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus/) in Germany or the ISST initiative in Italy
[see Montemagni (2003) [18]] demonstrate the viability of a more coherent framework that can account for both
(hierarchical) constituency and dependency phenomena in syntactic annotation.
The eContent project “LIRICS” has been seminal in gathering a group of experts who initiated the ISO 24615
(SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed share a common
data model that offers a good basis for the SynAF metamodel (see the study made in Deliverable D.3.1
“Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU project LIRICS, available at
http://lirics.loria.fr/doc_pub/Del3_1_V2.pdf).
This International Standard proposes a metamodel for syntactic annotation together with a list of relevant data
categories for syntactic annotation. The data categories are available on the ISOCat server
(http://www.isocat.org/) in the syntax profile (as defined in ISO 12620:2009).
INTERNATIONAL STANDARD ISO 24615:2010(E)
Language resource management — Syntactic annotation
framework (SynAF)
1 Scope
This International Standard describes the syntactic annotation framework (SynAF), a high level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across
language resources or language processing components. This International Standard is complementary and
closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for
syntactic representations as well as reference data categories for representing both constituency and
dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 1087-2:2000, Terminology work — Vocabulary — Part 2: Computer applications
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1, ISO 1087-2,
ISO 12620:2009, ISO 24611 and the following apply.
3.1
adjunct
non-essential element associated with a verb as opposed to syntactic arguments (3.19)
NOTE Adverbs are possible adjuncts for a sentence.
3.2
chunk
non-recursive constituent (3.4)
3.3
clause
group of phrases (3.14), usually containing a predicate
NOTE A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb. A main
clause alone can build a complete sentence (3.15). In the SynAF model, a clause is a special case of a constituent (3.4).
3.4
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the basis of structural (or hierarchical) properties
3.5
dependency
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical
functions (3.7) that constituents play in relation to each other
3.6
syntactic edge
edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
NOTE Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
NOTE For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as a
subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject – NP and the
main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed under the
concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
3.8
syntactic head
head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is feminine,
then the gender of the entire constituent will be feminine)
NOTE The head of a constituent usually cannot be left out.
3.9
linguistic annotation
annotation
feature-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3) which can act on its own as a complete sentence (3.15)
NOTE In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
3.11
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
NOTE A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-modifier).
Modifiers are optional in a constituent.
3.12
node
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
3.13
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
NOTE A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfill a grammatical function
(3.7), e.g. in a clause (3.3)
NOTE Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and having the role
of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases, verb phrases, adjective
phrases, adverbial phrases and prepositional phrases. Phrases have been informally described as “bloated words”, in that
the parts of the phrase added to the head elaborate and specify the reference of the head. In the SynAF model, a phrase
is a special case of a constituent (3.4).
3.15
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
NOTE A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk about
“utterances” rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ≤ p2, identifying the segment of the document to which an annotation (3.9)
is applied
NOTE A multiple span is a sequence of spans where the ending point of each span is less than or equal to the
starting point of the subsequent span.
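The span and multiple-span definitions can be sketched in Python as follows. This is a minimal illustration, not part of the standard: it assumes that points are integer offsets, reads the ordering constraint as p1 ≤ p2, and the names `Span` and `is_multiple_span` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A pair of points (p1, p2) with p1 <= p2 (integer offsets are an assumption)."""
    p1: int
    p2: int

    def __post_init__(self) -> None:
        # Reject pairs that violate the ordering constraint of the definition.
        if self.p1 > self.p2:
            raise ValueError(f"invalid span: p1={self.p1} > p2={self.p2}")


def is_multiple_span(spans: List[Span]) -> bool:
    """True if each span ends at or before the start of the subsequent span."""
    return all(a.p2 <= b.p1 for a, b in zip(spans, spans[1:]))


# Hypothetical offsets for a segment made of two non-contiguous parts.
parts = [Span(0, 4), Span(10, 15)]
print(is_multiple_span(parts))        # True
print(is_multiple_span(parts[::-1]))  # False: the spans are out of order
```

Freezing the dataclass keeps spans hashable and immutable, which is convenient when several annotations refer to the same segment.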
3.17
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
NOTE A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with a
verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
NOTE The subject, indirect object and direct object are subcategorized grammatical functions (3.7) within a
sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
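As an illustration of the example above, the following Python sketch separates syntactic arguments from adjuncts and checks them against a subcategorization frame. The frame given for "read" (required subject, optional direct object), the function labels and all class names are assumptions made for this sketch; the standard does not prescribe them.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Dependent:
    form: str
    role: str                       # "syntacticArgument" or "adjunct"
    function: Optional[str] = None  # e.g. "subject"; None for adjuncts


@dataclass(frozen=True)
class SubcatFrame:
    verb: str
    required: FrozenSet[str]
    optional: FrozenSet[str] = frozenset()

    def accepts(self, dependents: List[Dependent]) -> bool:
        """Check the arguments against the frame; adjuncts are ignored."""
        functions = [d.function for d in dependents if d.role == "syntacticArgument"]
        present = set(functions)
        no_duplicates = len(functions) == len(present)
        allowed = self.required | self.optional
        return no_duplicates and self.required <= present <= allowed


# Hypothetical frame for "read".
READ = SubcatFrame("read", frozenset({"subject"}), frozenset({"directObject"}))

# "Alfred reads a book today."
sentence = [
    Dependent("Alfred", "syntacticArgument", "subject"),
    Dependent("a book", "syntacticArgument", "directObject"),
    Dependent("today", "adjunct"),
]
print(READ.accepts(sentence))      # True
print(READ.accepts(sentence[1:]))  # False: the required subject is missing
```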
3.19
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase (3.14) or
the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase or verb)
NOTE For verbs and verbal phrases, arguments identify the participants in the process referred to by the verb. In
some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has a single parent
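The difference between a syntactic graph and a syntactic tree can be checked mechanically. The Python sketch below rests on assumptions that go beyond the definitions: edges are read as (parent, child) pairs, exactly one root without a parent is required, and every other node must have a single parent and be reachable from the root. The function name `is_syntactic_tree` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]  # (source = parent, target = child), an assumption


def is_syntactic_tree(nodes: Set[Hashable], edges: List[Edge]) -> bool:
    children: Dict[Hashable, List[Hashable]] = defaultdict(list)
    parent_count = {n: 0 for n in nodes}
    for source, target in edges:
        if source not in parent_count or target not in parent_count:
            return False  # an edge refers to an unknown node
        children[source].append(target)
        parent_count[target] += 1
    roots = [n for n, count in parent_count.items() if count == 0]
    if len(roots) != 1 or any(count > 1 for count in parent_count.values()):
        return False
    # Every node must be reachable from the root (connected, no detached cycles).
    seen, stack = set(), [roots[0]]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(children[node])
    return seen == nodes


nodes = {"S", "NP", "VP", "The", "train", "is", "late"}
edges = [("S", "NP"), ("S", "VP"), ("NP", "The"), ("NP", "train"),
         ("VP", "is"), ("VP", "late")]
print(is_syntactic_tree(nodes, edges))                      # True
print(is_syntactic_tree(nodes, edges + [("VP", "train")]))  # False: two parents
```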
3.22
syntax
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing the
relations that exist between those units
3.23
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic relation
3.24
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical item
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements at
the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, and also including relations between
categories of the same kind (such as the head-head relations between nouns in appositions, or nominal
coordinations in some formalisms). The dependency information can exist between morpho-syntactically
annotated items within a phrase (an adjective is the modifier of the head noun within an NP) or describe a
specific relation between syntactic constituents at the clausal and sentential level (e.g. an NP being the
“subject” of the main verb of a clause or sentence). The dependency relation can also be stated for empty
elements (e.g. the pro element in Romance languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy interrelating
syntactic annotation for both constituency and dependency as stated in the SynAF metamodel.
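The two functions can be pictured as two layers over the same word forms. The Python sketch below is illustrative only: the identifiers, the sample sentence "The late train arrives" and the convention that a dependency is stored as (head, dependent, function) are assumptions, not requirements of the metamodel.

```python
from typing import Dict, List, Tuple

# Terminal layer: word forms (identifiers are hypothetical).
terminals: Dict[str, str] = {"t1": "The", "t2": "late", "t3": "train", "t4": "arrives"}

# Constituency layer: each constituent lists its children in order.
constituents: Dict[str, List[str]] = {
    "NP": ["t1", "t2", "t3"],
    "S": ["NP", "t4"],
}

# Dependency layer: (head, dependent, grammatical function).
dependencies: List[Tuple[str, str, str]] = [
    ("t3", "t2", "modifier"),  # adjective modifies the head noun inside the NP
    ("t4", "NP", "subject"),   # the NP is the subject of the main verb
]


def words(node_id: str) -> List[str]:
    """Return the word forms covered by a terminal or a constituent."""
    if node_id in terminals:
        return [terminals[node_id]]
    return [w for child in constituents[node_id] for w in words(child)]


def describe(dep: Tuple[str, str, str]) -> str:
    head, dependent, function = dep
    return f"{' '.join(words(dependent))} --{function}--> {' '.join(words(head))}"


for dep in dependencies:
    print(describe(dep))
# late --modifier--> train
# The late train --subject--> arrives
```

Note that the second dependency links a constituent to a word form, which is only possible because both layers share the same node identifiers.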
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value pairs,
which represent the associated syntactic data categories. The SynAF textual descriptions specify more
complete information about the SynAF classes, relations and extensions than can be included in the UML
diagram. Developers shall define a data category selection (DCS) as specified for SynAF data category
selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of non-
terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see 3.6,
syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one or
more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntactic tree. Syntactic trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes make reference to a span.
Thus by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The NT_Nodes
are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
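The node classes and the inference of spans for NT_Nodes can be sketched in Python as follows. This is a hedged illustration rather than a normative binding: the class names `TNode` and `NTNode`, the use of token-boundary integers as span points, the category labels and the merging of adjacent spans are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]  # token-boundary offsets (an assumption)


@dataclass
class SyntacticNode:
    node_id: str
    category: str


@dataclass
class TNode(SyntacticNode):
    spans: List[Span] = field(default_factory=list)  # one or more spans


@dataclass
class NTNode(SyntacticNode):
    children: List[SyntacticNode] = field(default_factory=list)


def infer_spans(node: SyntacticNode) -> List[Span]:
    """Collect the spans of all terminal descendants and merge adjacent ones."""
    if isinstance(node, TNode):
        collected = list(node.spans)
    elif isinstance(node, NTNode):
        collected = [s for child in node.children for s in infer_spans(child)]
    else:
        raise TypeError("expected a TNode or an NTNode")
    merged: List[Span] = []
    for start, end in sorted(collected):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# "pick it up": the particle verb is one word form over two spans.
pick_up = TNode("t1", "V", [(0, 1), (2, 3)])
it = TNode("t2", "PRON", [(1, 2)])
vp = NTNode("n1", "VP", [pick_up, it])
print(infer_spans(pick_up))  # [(0, 1), (2, 3)]
print(infer_spans(vp))       # [(0, 3)]
```

Spans are stored only on terminal nodes here, so a non-terminal node can never disagree with the material it dominates.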
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes, with
one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/ (see
Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/,
/secondarySyntacticEdge/.
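A syntactic edge with its annotations might be modelled as in the Python sketch below. It assumes that annotations are stored as a feature-value dictionary keyed by data category name; the `grammaticalFunction` feature, the node identifiers and the helper `edges_of_type` are hypothetical and serve only to illustrate filtering by /syntacticEdgeType/.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyntacticEdge:
    """A relation between a source node and a target node, with annotations."""
    source: str
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)


def edges_of_type(edges: List[SyntacticEdge], edge_type: str) -> List[SyntacticEdge]:
    """Select the edges whose syntacticEdgeType annotation has the given value."""
    return [e for e in edges if e.annotations.get("syntacticEdgeType") == edge_type]


edges = [
    SyntacticEdge("S", "NP", {"syntacticEdgeType": "primarySyntacticEdge"}),
    SyntacticEdge("S", "VP", {"syntacticEdgeType": "primarySyntacticEdge"}),
    SyntacticEdge("V", "NP", {"syntacticEdgeType": "secondarySyntacticEdge",
                              "grammaticalFunction": "subject"}),
]
secondary = edges_of_type(edges, "secondarySyntacticEdge")
print([(e.source, e.target) for e in secondary])  # [('V', 'NP')]
```

Because the conceptual domain is not limited to the two values named above, the edge type is kept as a plain string rather than a closed enumeration.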
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as well as
(see Figure 1) the application of morphosyntactic information to MAF annotated data.
Annex A
(normative)
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in combination with
the SynAF metamodel. When necessary, specific applications may define additional data categories, which
shall be described in compliance with ISO 12620 and provided in the ISOCat data category registry.
A.2 Basic syntactic data categories
/annotation/
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation among
them
/annotationDepth/
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
/annotationStyle/
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
/annotationType/
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of annotation
/clitic/
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
— Note [en] There is a great variation concerning clitics. Sometimes, in English, the cliticized forms
are restricted to the contracted forms of auxiliaries, as in I'm, she’ll, etc. However in
some instances, articles are also referred to as clitics.
/constituency/
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
/constituencyAndDependency/
Definition [en] union of constituency and dependency
/contiguous/
Definition [en] property of a grammatical unit sharing a boundary with another
/deepParsing/
Definition [en] process of fully
...
SLOVENIAN STANDARD
SIST ISO 24615:2013
1 July 2013
Upravljanje z jezikovnimi viri - Ogrodje za skladenjsko označevanje (SynAF)
Language resource management -- Syntactic annotation framework (SynAF)
Gestion de ressources langagières -- Cadre d'annotation syntaxique (SynAF)
This Slovenian standard is identical to: ISO 24615:2010
ICS: 01.020 Terminology (principles and coordination)
SIST ISO 24615:2013 en,fr,de
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
---------------------- Page: 1 ----------------------
SIST ISO 24615:2013
---------------------- Page: 2 ----------------------
SIST ISO 24615:2013
INTERNATIONAL ISO
STANDARD 24615
First edition
2010-10-15
Language resource management —
Syntactic annotation framework (SynAF)
Gestion de ressources langagières — Cadre d'annotation syntaxique
(SynAF)
Reference number
ISO 24615:2010(E)
©
ISO 2010
---------------------- Page: 3 ----------------------
SIST ISO 24615:2013
ISO 24615:2010(E)
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
COPYRIGHT PROTECTED DOCUMENT
© ISO 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO 2010 – All rights reserved
---------------------- Page: 4 ----------------------
SIST ISO 24615:2013
ISO 24615:2010(E)
Contents Page
Foreword .iv
Introduction.v
1 Scope.1
2 Normative references.1
3 Terms and definitions .1
4 SynAF metamodel .4
4.1 Introduction.4
4.2 SynAF metamodel .5
4.2.1 Overview.5
4.2.2 SyntacticNode class.6
4.2.3 T_Node class .6
4.2.4 NT_Node class.6
4.2.5 SyntacticEdge class.6
4.2.6 Annotation class.6
Annex A (normative) Data categories for SynAF.7
Annex B (informative) Relation to the Linguistic Annotation Framework .15
Bibliography.17
© ISO 2010 – All rights reserved iii
---------------------- Page: 5 ----------------------
SIST ISO 24615:2013
ISO 24615:2010(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 24615 was prepared by Technical Committee ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management, in collaboration with the European
eContent Project “LIRICS” (Linguistic Infrastructure for Interoperable Resources and Systems), under the
contract e-Content-22236-LIRICS.
ISO 24615 is designed to coordinate closely with ISO 24612, Language resource management — Linguistic
annotation framework (LAF), ISO 24613:2008, Language resource management — Lexical markup framework
(LMF), and ISO 24611, Language resource management — Morpho-syntactic annotation framework.
iv © ISO 2010 – All rights reserved
---------------------- Page: 6 ----------------------
SIST ISO 24615:2013
ISO 24615:2010(E)
Introduction
This International Standard is based on numerous projects and pre-standardisation activities that have taken
[9]
place in the last few years (see Abeillé, 2001 ), to provide reference models and formats for the
representation of syntactic information, whether as the output of a syntactic parser, or as annotations of
language resources (treebanks). For several years, the Penn Treebank initiative has served as a de facto
standard for treebanking, but more recent works e.g. the Negra/Tiger initiative (see: http://www.ims.uni-
stuttgart.de/projekte/TIGER/TIGERCorpus/) in Germany or the ISST initiative in Italy [see Montemagni
[18]
(2003) ] demonstrate the viability of a more coherent framework that can account for both (hierarchical)
constituency and dependency phenomena in syntactic annotation.
The eContent project “LIRICS”, has been seminal in gathering a group of experts, who initiated the ISO 24615
(SynAF) project. While preparing SynAF, this group confirmed that existing initiatives indeed share a common
data model that offers a good basis for the SynAF metamodel (see the study made in Deliverable D.3.1
“Evaluation of initiatives for morpho-syntactic and syntactic annotation” of the EU project LIRICS, available at
http://lirics.loria.fr/doc_pub/Del3_1_V2.pdf).
This International Standard proposes a metamodel for syntactic annotation together with a list of relevant data
categories for syntactic annotation. The data categories are available on the ISOCat server
(http://www.isocat.org/) in the syntax profile (as defined in ISO 12620:2009).
© ISO 2010 – All rights reserved v
---------------------- Page: 7 ----------------------
SIST ISO 24615:2013
---------------------- Page: 8 ----------------------
SIST ISO 24615:2013
INTERNATIONAL STANDARD ISO 24615:2010(E)
Language resource management — Syntactic annotation
framework (SynAF)
1 Scope
This International Standard describes the syntactic annotation framework (SynAF), a high level model for
representing the syntactic annotation of linguistic data, with the objective of supporting interoperability across
language resources or language processing components. This International Standard is complementary and
closely related to ISO 24611 (MAF, morpho-syntactic annotation framework) and provides a metamodel for
syntactic representations as well as reference data categories for representing both constituency and
dependency information in sentences or other comparable utterances and segments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 1087-1:2000, Terminology work — Vocabulary — Part 1: Theory and application
ISO 1087-2:2000, Terminology work — Vocabulary — Part 2: Computer applications
ISO 12620:2009, Terminology and other language and content resources — Specification of data categories
and management of a Data Category Registry for language resources
ISO 24611, Language resource management — Morpho-syntactic annotation framework
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 1087-1, ISO 1087-2,
ISO 12620:2009, ISO 24611 and the following apply.
3.1
adjunct
non-essential element associated with a verb as opposed to syntactic arguments (3.19)
NOTE Adverbs are possible adjuncts for a sentence.
3.2
chunk
non-recursive constituent (3.4)
3.3
clause
group of phrases (3.14), usually containing a predicate
NOTE A clause can be either a main clause (3.10) or a subordinate clause (3.17). In languages distinguishing
finiteness, clauses whose predicate is a verb can be either finite or non-finite, depending on the form of the verb. A main
clause alone can build a complete sentence (3.15). In the SynAF model, a clause is a special case of a constituent (3.4).
© ISO 2010 – All rights reserved 1
---------------------- Page: 9 ----------------------
SIST ISO 24615:2013
ISO 24615:2010(E)
3.4
constituent
syntactic grouping of words [into phrases (3.14)], phrases [into clauses (3.3) or other phrases] or clauses
[into a sentence (3.15)] on the base of structural (or hierarchical) properties
3.5
dependency
dependency relation
syntactic relation between word forms (3.24) or constituents (3.4) on the basis of the grammatical
functions (3.7) that constituents play in relation to each other
3.6
syntactic edge
edge
triplet with a source node (3.12), a target node, and optional annotations (3.9)
NOTE Non-terminal nodes (3.13) have an outgoing constituency syntactic edge.
3.7
grammatical function
grammatical role of a word form (3.24) or constituent (3.4) within its embedding syntactic environment
NOTE For example, a noun phrase (NP) can act as a subject within a sentence (3.15), or a noun may act as a
subject dependent of a verb in a dependency graph. There is a grammatical relation between the subject – NP and the
main verb in a sentence. All grammatical relations (subject – predicate, head – modifier, etc.) are subsumed under the
concept of dependency relations (3.5), whether between terminal or non-terminal nodes.
3.8
syntactic head
head
part of a constituent (3.4) which determines its distribution (the syntactic environments in which the
constituent may appear) and its grammatical properties (e.g. if the grammatical gender of the head is feminine,
then the gender of the entire constituent will be feminine)
NOTE The head of a constituent usually cannot be left out.
3.9
linguistic annotation
annotation
feature-value pair denoting a linguistic property of a linguistic segment
3.10
main clause
clause (3.3), which can act on its own as a complete sentence (3.15)
NOTE In languages distinguishing finiteness, the main clause is usually finite. Example: The train is late.
3.11
modifier
part of a constituent (3.4) which ascribes a property to the head (3.8) of the constituent
NOTE A modifier can be placed before or after the head of the phrase (3.14) (pre-modifier or post-modifier).
Modifiers are optional in a constituent.
3.12
node
syntactic node
word form (3.24) or constituent (3.4) seen as an elementary syntactic component of a syntactic analysis
3.13
non-terminal node
syntactic node (3.12) which is not a word form (3.24)
NOTE A non-terminal node has an outgoing constituency edge (3.6).
3.14
phrase
group of word forms (3.24) (usually containing one or more words) which can fulfil a grammatical function
(3.7), e.g. in a clause (3.3)
NOTE Empty phrases are permitted (being non-realised pronouns, sometimes marked as “pro”, and having the role
of subjects in clauses). A phrase is typically named after its head (3.8), for example noun phrases, verb phrases, adjective
phrases, adverbial phrases and prepositional phrases. Phrases have been informally described as “bloated words”, in that
the parts of the phrase added to the head elaborate and specify the reference of the head. In our model, a phrase is a
special case of a constituent (3.4).
3.15
sentence
related group of word forms (3.24) containing a predication, usually expressing a complete thought and
forming the basic unit of discourse structure
NOTE A sentence consists of one or more clauses (3.3). When describing speech, it is common to talk about
“utterances” rather than sentences.
3.16
span
pair of points (p1, p2), where p1 ≤ p2, identifying the segment of the document to which an annotation (3.9)
is applied
NOTE A multiple span is a sequence of spans where the ending point of each span is less than or equal to the
starting point of the subsequent span.
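The span and multiple-span conditions can be illustrated with a minimal Python sketch. It reads the ordering constraint as p1 ≤ p2 and assumes that points are integer character offsets; the names Span and is_multiple_span are hypothetical and are not defined by this International Standard.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Span:
    """A pair of points (p1, p2) with p1 <= p2 (integer offsets assumed)."""
    p1: int
    p2: int

    def __post_init__(self) -> None:
        if self.p1 > self.p2:
            raise ValueError(f"invalid span: p1={self.p1} > p2={self.p2}")


def is_multiple_span(spans: Sequence[Span]) -> bool:
    """True if each span ends at or before the start of the next one."""
    return all(a.p2 <= b.p1 for a, b in zip(spans, spans[1:]))


# Hypothetical offsets for a discontinuous word form made of two segments
discontinuous = [Span(3, 7), Span(20, 22)]
print(is_multiple_span(discontinuous))
```

Adjacent spans that share a boundary point, such as (0, 3) followed by (3, 5), satisfy the condition; overlapping spans do not.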
3.17
subordinate clause
clause which fulfils a grammatical function (3.7) in a phrase (3.14) [for example a relative clause (3.3)
modifying the head (3.8) noun of a nominal phrase] or in another clause
NOTE A subordinate clause usually does not act on its own as a sentence, but is part of a larger sentence.
3.18
subcategorization frame
set of restrictions indicating the properties of the syntactic arguments (3.19) that can or must occur with a
verb
EXAMPLE Alfred (/syntacticArgument/) reads a book (/syntacticArgument/) today (/adjunct/).
NOTE The subject, indirect object and direct object are subcategorized grammatical functions (3.7) within a
sentence; they are dependents of the verb (i.e. they can appear in subcategorization frames).
3.19
syntactic argument
functionally essential element that is required and given its interpretation by the head of its phrase (3.14) or
the node (3.12) of which it is a dependent (e.g. the nominal argument of a prepositional phrase or verb)
NOTE For verbs and verbal phrases, arguments identify the participants in the process referred to by the verb. In
some frameworks, syntactic arguments are called complements.
3.20
syntactic graph
graph
connected set of syntactic nodes (3.12) and edges (3.6)
3.21
syntactic tree
syntactic graph (3.20) in which each node has a single parent
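The difference between a syntactic graph and a syntactic tree can be sketched as a check on incoming edges. The sketch below assumes that edges are stored as (source, target) pairs in which the source is the parent, and that the root is the one node with no incoming edge. It tests only the single-parent condition, not connectedness or the absence of cycles. The function name has_single_parents is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]  # (source/parent, target/child), an assumption


def has_single_parents(edges: Iterable[Edge]) -> bool:
    """True if no node is the target of more than one edge."""
    incoming = Counter(target for _source, target in edges)
    return all(count == 1 for count in incoming.values())


# Constituency edges for "The train is late" (labels are illustrative)
tree_edges = [("S", "NP"), ("S", "VP"), ("NP", "The"), ("NP", "train"),
              ("VP", "is"), ("VP", "late")]

# A second edge into "train" gives that node two parents
graph_edges = tree_edges + [("VP", "train")]

print(has_single_parents(tree_edges), has_single_parents(graph_edges))
```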
3.22
syntax
way in which word forms (3.24) are interrelated and/or grouped together into phrases, thus capturing the
relations that exist between those units
3.23
terminal node
syntactic node (3.12) which is a single word form (3.24) or an empty element involved in a syntactic relation
3.24
word form
contiguous or non-contiguous entity from a speech or text sequence identified as an autonomous lexical item
4 SynAF metamodel
4.1 Introduction
Syntactic annotations have at least two functions in language processing:
a) to represent linguistic constituency, as in noun phrases (NP), describing a structured sequence of
morpho-syntactically annotated items (including empty elements or traces generated by movements at
the constituency level), as well as constituents built from non-contiguous elements, and
b) to represent dependency relations, such as head-modifier relations, and also including relations between
categories of the same kind (such as the head-head relations between nouns in appositions, or nominal
coordinations in some formalisms). The dependency information can exist between morpho-syntactically
annotated items within a phrase (an adjective is the modifier of the head noun within an NP) or describe a
specific relation between syntactic constituents at the clausal and sentential level (e.g. an NP being the
“subject” of the main verb of a clause or sentence). The dependency relation can also be stated for empty
elements (e.g. the pro element in Romance languages, which serves a grammatical function).
As a consequence, syntactic annotations shall comply with a multi-layered annotation strategy interrelating
syntactic annotation for both constituency and dependency as stated in the SynAF metamodel.
4.2 SynAF metamodel
4.2.1 Overview
The SynAF metamodel is represented as a set of UML classes complemented by UML attribute-value pairs,
which represent the associated syntactic data categories. The SynAF textual descriptions specify more
complete information about the SynAF classes, relations and extensions than can be included in the UML
diagram. Developers shall define a data category selection (DCS) as specified for SynAF data category
selection procedures (see Figure 1). The data categories given in Annex A shall be used for the
representation of syntactic annotations.
Figure 1 — SynAF metamodel (articulated with MAF)
4.2.2 SyntacticNode class
The SyntacticNode class is a generic class subsuming both the class of terminal nodes and the class of non-
terminal nodes. Syntactic nodes can be involved in as many syntactic relations as necessary (see 3.6,
syntactic edges).
4.2.3 T_Node class
The T_Node class represents the terminal nodes of a syntactic tree, consisting of morpho-syntactically
annotated word forms, as well as empty elements when appropriate. The T_Nodes are defined over one or
more spans (multiple spans can account for discontinuous constituents). T_Nodes are annotated with
syntactic categories valid for the word level.
4.2.4 NT_Node class
The NT_Node class represents the non-terminal nodes of a syntax tree. Syntax trees mainly consist of
T_Nodes and NT_Nodes, including empty elements when appropriate. T_Nodes make reference to a span.
Thus by virtue of the syntactic tree representation, spans can also be inferred for NT_Nodes. The NT_Nodes
are annotated with syntactic categories valid at the phrasal level and higher (clausal, sentential).
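The relation between the node classes, and the way spans can be inferred for non-terminal nodes, can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative serialization: the class names TNode and NTNode stand in for T_Node and NT_Node, spans are (p1, p2) pairs of integer character offsets, and the annotation keys "cat" and "pos" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (p1, p2) with p1 <= p2; integer offsets are an assumption


@dataclass
class SyntacticNode:
    """Generic node carrying feature-value annotations."""
    annotations: Dict[str, str] = field(default_factory=dict)


@dataclass
class TNode(SyntacticNode):
    """Terminal node, defined over one or more spans."""
    spans: List[Span] = field(default_factory=list)


@dataclass
class NTNode(SyntacticNode):
    """Non-terminal node; it stores no spans of its own."""
    children: List[SyntacticNode] = field(default_factory=list)


def inferred_spans(node: SyntacticNode) -> List[Span]:
    """Collect, in sorted order, the spans of all terminals under `node`."""
    if isinstance(node, TNode):
        return sorted(node.spans)
    if isinstance(node, NTNode):
        return sorted(s for child in node.children for s in inferred_spans(child))
    return []


# "The train" in "The train is late."
noun_phrase = NTNode({"cat": "NP"}, [
    TNode({"pos": "determiner"}, [(0, 3)]),
    TNode({"pos": "noun"}, [(4, 9)]),
])
print(inferred_spans(noun_phrase))
```

Because only terminal nodes store spans, a discontinuous constituent needs no special handling: its inferred span list simply contains non-adjacent spans.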
4.2.5 SyntacticEdge class
The SyntacticEdge class represents a relation between syntactic nodes (both terminal and non-terminal
nodes). For example, the dependency relation is binary, consisting of a pair of source and target nodes, with
one or more annotations. In particular, a syntactic edge can be annotated by a /syntacticEdgeType/ (see
Annex A), whose conceptual domain can be one of, but is not limited to, /primarySyntacticEdge/,
/secondarySyntacticEdge/.
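An edge with a source, a target and optional annotations can be sketched as below. The sentence is the one used in the note to /secondarySyntacticEdge/ in Annex A. Several points are assumptions of this sketch rather than requirements: nodes are identified by plain strings, the clause structure is flattened (the conjunction is omitted), the clause identifiers are hypothetical, and an edge without a /syntacticEdgeType/ annotation is treated as primary.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PRIMARY = "primarySyntacticEdge"
SECONDARY = "secondarySyntacticEdge"


@dataclass
class SyntacticEdge:
    """Relation between two syntactic nodes, with optional annotations."""
    source: str  # node identifiers are plain strings in this sketch
    target: str
    annotations: Dict[str, str] = field(default_factory=dict)

    @property
    def edge_type(self) -> str:
        # Falling back to the primary type is an assumption of this sketch
        return self.annotations.get("syntacticEdgeType", PRIMARY)


# "I saw Bill, but went straight back home afterwards"
edges = [
    SyntacticEdge("clause1", "I"),
    SyntacticEdge("clause1", "saw"),
    SyntacticEdge("clause1", "Bill"),
    SyntacticEdge("clause2", "went"),
    SyntacticEdge("clause2", "straight back home"),
    SyntacticEdge("clause2", "afterwards"),
    # The shared subject is reached by a secondary edge, not an empty element
    SyntacticEdge("clause2", "I", {"syntacticEdgeType": SECONDARY}),
]


def targets(source: str, edge_type: str) -> List[str]:
    """Targets of all edges of the given type leaving `source`."""
    return [e.target for e in edges
            if e.source == source and e.edge_type == edge_type]


print(targets("clause2", PRIMARY))
print(targets("clause2", SECONDARY))
```

Filtering on the edge type separates the explicit parts of the second clause from the subject it shares with the first.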
4.2.6 Annotation class
The Annotation class represents the application of syntactic information to SynAF annotated data, as well as
(see Figure 1) the application of morphosyntactic information to MAF annotated data.
Annex A
(normative)
Data categories for SynAF
A.1 General
The following data categories shall be used for the representation of syntactic annotations in combination with
the SynAF metamodel. When necessary, specific applications may define additional data categories, which
shall be described in compliance with ISO 12620 and provided in the ISOCat data category registry.
A.2 Basic syntactic data categories
/annotation/
Definition [en] information added to a word, phrase, clause, sentence, a text or to a relation among
them
/annotationDepth/
Conceptual Domain /deepParsing/, /shallowParsing/, /tagging/
Definition [en] level of information richness the annotation describes
/annotationStyle/
Conceptual Domain /embeddedNotation/, /mixedNotation/, /standoffNotation/
Definition [en] style of annotation
/annotationType/
Conceptual Domain /constituency/, /constituencyAndDependency/, /dependency/
Definition [en] type of annotation
/clitic/
Definition [en] unstressed word which cannot stand on its own as a normal utterance and is
phonologically dependent upon a neighboring word for pronunciation
— Note [en] There is great variation concerning clitics. Sometimes, in English, the cliticized forms
are restricted to the contracted forms of auxiliaries, as in I'm, she’ll, etc. However in
some instances, articles are also referred to as clitics.
/constituency/
Definition [en] mechanism allowing the construction of words into phrases, phrases into higher
phrases or clauses, and clauses into sentences
— Note [en] The construction of sentences into text is not usually called constituency.
/constituencyAndDependency/
Definition [en] union of constituency and dependency
/contiguous/
Definition [en] property of a grammatical unit sharing a boundary with another
/deepParsing/
Definition [en] process of fully decoding the clauses and relations present in a sentence
/dependency/
Definition [en] mechanism allowing the linking of words, or in some formalisms also phrases and
clauses, based on the binary head-dependent division and a possible annotation of
grammatical function
/doubleNegation/
Definition [en] construction consisting of two negative forms in the same clause
— Note [en] Example: In English, “I'm not unhappy”.
/embeddedNotation/
Definition [en] annotation that is added in the text
— Note [en] The original organization of the text is modified.
/enclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a preceding word
/first/
Definition [en] before anything according to a certain order
/mixedNotation/
Definition [en] hybrid style annotation where standoff and embedded are mixed
/morphosyntacticAnnotation/ - BC: /annotation/
Definition [en] annotation related to the morphology of the words and their part of speech
/negation/
Definition [en] construction that expresses the contradiction of some or all of a sentence's, word’s or
phrase’s meaning
— Note [en] Negation may be based on negative particles (like “not”) or on prefixes (like “un”, or
“non”). Example: In English, “I'm not happy”.
/next/
Definition [en] immediately afterwards
/primarySyntacticEdge/
Definition [en] the default edge expressing the constituency relationship, originating in a constituent
and terminating in a component of that constituent
/predicate/
Definition [en] a phrase or word in a clause which provides a statement regarding the subject of that
clause. Most clauses can thus be divided into a subject and predicate, where the
predicate is a function expanding on the subject.
— Note [en] Example: “Kevin kicks the ball” is seen as a subject (“Kevin”) associated with a
predicate phrase (“kicks the ball”).
/previous/
Definition [en] immediately before
Definition [fr] immédiatement avant
Name [en] previous
Name [fr] précédent
/proclitic/ - BC: /clitic/
Definition [en] clitic that depends upon a following word
— Note [en] Example: “the” in “the boy”.
/propagation/
Definition [en] act of spreading a linguistic property from a grammatical unit to another
/secondarySyntacticEdge/
Definition [en] an indirect edge expressing syntactic constituency. These edges may be used to
express the relationship between a head and a coreferent of its omitted dependent.
— Note [en] Example: In “I saw Bill, but went straight back home afterwards”, “I” may serve as an
explicit subject to the first clause, dominated by a primary syntactic edge, but in the
second clause, a further secondary syntactic edge leading to “I” can make it clear that it
is also the subject of the second clause, without being one of the explicit parts of that
clause, which are dominated by primary edges. This device is used in some formalisms
to avoid the introduction of empty elements standing in for such ‘missing’ bearers of
grammatical function.
/shallowParsing/
Definition [en] process of identifying the chunks in a sentence
/standoffNotation/
Definition [en] annotation that is recorded externally from the grammatical units and that refers to
these units
— Note [en] The original organization of the text is kept unchanged.
/syntacticAnnotation/ - BC: /annotation/
Definition [en] annotation describing constituency and/or dependency
— Note [en] syntactic annotation does not directly deal with the meaning of an utterance
/syntacticFeature/
Definition [en] feature used in the description of the syntax of a language
/syntacticEdgeType/
Conceptual Domain /primarySyntacticEdge/, /secondarySyntacticEdge/
Definition [en] characterizes the syntactic edge according to its role in the syntactic representation
/syntacticRestriction/
Definition [en] rule that limits what the syntax allows in a particular language
/tagging/
Definition [en] process of annotating the part of speech for every word
/whType/
Definition [en] property for a clause beginning with a question word
— Note [en] In English, “Who is he?” is a whType question.
/yesNoType/
Definition [en] property for a clause where only a positive or a negative answer or position is possible
— Note [en] In English, “Are you coming?” is a yesNoType question.
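The contrast between two of the /annotationStyle/ values defined in this clause, /embeddedNotation/ and /standoffNotation/, can be illustrated with a short Python sketch. The bracket syntax, the record layout and the key names are hypothetical; this International Standard does not prescribe them.

```python
text = "The train is late."

# Embedded notation: labels are written into the text itself, so the
# original organization of the text is modified.
embedded = "[NP The train] [VP is late]."

# Standoff notation: records kept apart from the text and pointing into it
# by character offsets (start inclusive, end exclusive, an assumption).
standoff = [
    {"start": 0, "end": 9, "cat": "NP"},
    {"start": 10, "end": 17, "cat": "VP"},
]

for record in standoff:
    segment = text[record["start"]:record["end"]]
    print(record["cat"], repr(segment))
```

With the standoff records, the text stays byte-for-byte unchanged and several annotation layers can refer to the same offsets.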
A.3 Constituency related data categories
/adjectiveChunk/ - BC: /chunk/
Definition [en] chunk headed by an adjective
/adjectivePhrase/ - BC: /phrase/
Definition [en] phrase headed by an adjective
/adpositionChunk/ - BC: /chunk/
Definition [en] chunk introduced by one or several adpositions that are not necessarily contiguous and
on the same end of the chunk
/adpositionPhrase/ - BC: /phrase/
Definition [en] phrase introduced by one or several adpositions and containing a complement such as
a noun phrase
— Note [en] The adpositions are not necessarily contiguous and on the same end of the phrase.
/adverbChunk/ - BC: /chunk/
Definition [en] chunk headed by an adverb
/adverbPhrase/ - BC: /phrase/
Definition [en] phrase headed by an adverb
/chunk/ - BC: /grammaticalUnit/
Definition [en] flat sequence of words typically containing more than one word
— Note [en] A chunk cannot contain any sub-structures. A chunk is frequently similar to a phrase
and mostly continuous.
/clause/ - BC: /grammaticalUnit/
Definition [en] unit of grammatical organization smaller than or equal to the sentence but larger than
phrases and words, and generally containing its own predicate
— Note [en] The traditional classification is of clausal units into main (independent or superordinate)
and subordinate (or dependent) clauses, e.g. the boy arrived (main clause) after the
rain started (subordinate clause). A clause may form a whole sentence, as in “they
came”. A clause may contain sub-clauses.
/comparativePhrase/ - BC: /phrase/
Definition [en] phrase expressing a comparative meaning
— Note [en] In English, there is both an inflection (e.g. larger) and a comparative phrase
construction (e.g. more beautiful) to express the comparative.
/coordinatedPhrase/ - BC: /phrase/
Definition [en] phrase expressing a coordination
/declarativeClause/ - BC: /clause/
Definition [en] clause referring to the expression of a statement and
...