ISO 24617-9:2019
Language resource management — Semantic annotation framework — Part 9: Reference annotation framework (RAF)
This document provides a comprehensive model for the annotation and representation of referential phenomena in natural language texts and multimodal interactions. Such phenomena can cover simple anaphoric or coreferential mechanisms as well as more complex bridging or multimodal mechanisms. It provides a reference serialisation in XML defined as a customisation of the TEI P5 guidelines. In addition, the document describes the core data categories that are related to referential entities and link structures and that are also needed for the description of annotation schemes and serialisation mechanisms for implementing conformant models as concrete data formats.
Gestion des ressources linguistiques — Cadre d'annotation sémantique — Partie 9: Cadre d'annotation de la référence (RAF)
Upravljanje jezikovnih virov - Ogrodje za semantično označevanje - 9. del: Referenčni okvir označevanja (RAF)
This document is intended to complement the ISO 24617 series and to provide all the necessary
conceptual and technical mechanisms for the annotation of referential phenomena in multimodal
discourse. Reference phenomena are an essential component for the understanding and structuring of
discursive mechanisms, ranging from very basic pronominal relations to complex bridging anaphora.
Annotating such phenomena in an interoperable way improves the re-usability of language resources
in language technology applications such as named entity recognition, text understanding and
synthesis, text summarization, information retrieval, automatic question-answering, man-machine
dialogue, and machine translation.
The content of this document builds upon various projects and software platforms that have
dealt with reference annotation (RA), in particular References [9], [2], [16], [21],
[26], [25], [22], [5], [15] and [13], as well as the TEI P5 guidelines. Based on these and other previous works,
the Reference Annotation Framework (RAF) aims to provide a synthesized way of treating various
reference phenomena in discourse. In continuity with most practices in the field, RAF focuses on
marking up referring expressions in a discourse and the relations that hold between them and the
corresponding entities, whether this is based upon employing crowdsourcing or machine learning
1 Scope
This document provides a comprehensive model for the annotation and representation of referential
phenomena in natural language texts and multimodal interactions. Such phenomena can cover simple
anaphoric or coreferential mechanisms as well as more complex bridging or multimodal mechanisms. It
provides a reference serialisation in XML defined as a customisation of the TEI P5 guidelines. In addition,
the document describes the core data categories that are related to referential entities and link structures
and that are also needed for the description of annotation schemes and serialisation mechanisms for implementing
conformant models as concrete data formats.
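As an informal illustration only, and not as part of the model defined by this document, the idea of referring expressions connected by typed links can be sketched in a few lines of Python. All names below (`ReferringExpression`, `Link`, `coreference_chains`, the relation labels and the sample sentence) are hypothetical and are not taken from this document or from the TEI P5 guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferringExpression:
    """A span of text that refers to something (hypothetical structure)."""
    id: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass(frozen=True)
class Link:
    """A typed relation between two referring expressions (hypothetical)."""
    source: str    # id of the dependent expression, e.g. a pronoun
    target: str    # id of the expression it depends on
    relation: str  # e.g. "coreference" or "bridging"


def coreference_chains(expressions, links):
    """Group expression ids joined by "coreference" links into sorted chains."""
    parent = {e.id: e.id for e in expressions}

    def find(x):
        # Follow parent pointers up to the representative of the group.
        while parent[x] != x:
            x = parent[x]
        return x

    for link in links:
        if link.relation == "coreference":
            parent[find(link.source)] = find(link.target)

    chains = {}
    for e in expressions:
        chains.setdefault(find(e.id), []).append(e.id)
    return sorted(sorted(chain) for chain in chains.values())


text = "Mary bought a car. She likes its colour."
expressions = [
    ReferringExpression("re1", 0, 4),    # "Mary"
    ReferringExpression("re2", 12, 17),  # "a car"
    ReferringExpression("re3", 19, 22),  # "She"
    ReferringExpression("re4", 29, 32),  # "its"
]
links = [
    Link("re3", "re1", "coreference"),
    Link("re4", "re2", "coreference"),
]
chains = coreference_chains(expressions, links)
```

Keeping the expressions as stand-off spans (offsets into the text) and the links as separate objects mirrors the separation between referential entities and link structures described above; a concrete serialisation would replace these classes with the elements of the chosen data format.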
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 24622-1, Language resource management — Component Metadata Infrastructure (CMDI) — Part 1:
The Component Metadata Model
TEI P5, Guidelines for Electronic Text Encoding and Interchange. Version 3.5.0. Last updated on 29th
January 2019. TEI Consortium. http://www.tei-c.org/Guidelines/P5/
Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008.
https://www.w3.org/TR/REC-xml/
IETF BCP 47, Tags for Identifying Languages, September 2009. https://tools.ietf.org/html/bcp47
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
anaphora
linguistic mechanism by which the interpretation of a referring expression (3.7) depends on another
expression mentioned in the same text or discourse
Note 1 to entry: The notion of anaphora is more general than that of coreference (3.3): the interpretation of
anaphora is context-dependent, whereas coreference is determined rather rigidly, independently of its possible
use of context (see Reference [25]).
Note 2 to entry: The
