SIST ISO 24617-8:2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
This document establishes the representation and annotation of local, “low-level” discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure. As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 8).
This document does not consider global, higher-level discourse structure representation, which involves linking local discourse relations to form one or more composite global structures. This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions in a spoken dialogue.
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
L'ISO 24617-8:2016 détermine la représentation et l'annotation des relations du discours locales, de «bas niveau», entre les situations mentionnées dans le discours, chaque relation étant annotée indépendamment des autres relations dans le même discours.
L'ISO 24617-8:2016 fournit un socle d'annotation des relations du discours, en spécifiant un ensemble de base de relations du discours, un grand nombre d'entre elles revêtant des définitions similaires dans des cadres différents. Dans la mesure du possible, le présent document fournit des transpositions de sémantique dans les différents cadres existants.
L'ISO 24617-8:2016 peut être appliqué à deux situations différentes:
- pour l'annotation des relations du discours dans les corpus de langage naturel;
- en tant que représentation cible des méthodes automatiques d'analyse de surface du discours, pour le résumé automatique et autres applications.
Les objectifs de cette spécification sont de fournir:
- un ensemble de référence de catégories de données qui définissent une collection de types de relations du discours avec une sémantique explicite;
- une représentation pivot basée sur un cadre de définition des relations du discours qui peut faciliter la transposition entre différents cadres;
- une base d'élaboration de lignes directrices en vue de créer de nouvelles ressources qui seront immédiatement interopérables avec des ressources pré-existantes.
En ce qui concerne la structure du discours, la limite du présent document aux spécifications d'annotation de relations du discours locales, de «bas niveau», est fondée sur l'idée (a) que l'analyse à ce niveau correspond à ce qui est bien compris et peut être clairement défini, (b) qu'il est possible, s'il y a lieu, de procéder à des extensions complémentaires permettant de représenter une structure de discours globale de niveau plus élevé, et (c) qu'il permettra une compatibilité des annotations en découlant avec les divers cadres, même s'ils reposent sur des théories de structure du discours différentes.
En tant que partie intégrante du cadre d'annotation sémantique (SemAF) de l'ISO 24617, l'ISO 24617-8:2016 DR-core a pour objectif d'être transparente dans sa relation avec les cadres d'annotations des relations du discours existants, mais également d'être compatible avec les autres parties de l'ISO 24617. Certaines relations du discours sont spécifiques au discours interactif et recoupent la Partie 2 de l'ISO 24617 consacrée à l'annotation des actes de dialogue. D'autres relations du discours se rapportent au temps, et leur annotation fait partie intégrante de l'ISO 24617‑1 (temps et événements); d'autres relations du discours encore sont très semblables à certaines relations prédicat-argument («rôles sémantiques»), dont l'annotation est l'objet principal de l'ISO 24617‑4. Puisque les différentes parties sont indispensables pour constituer un ensemble cohérent, le présent document porte une attention particulière aux interactions de l'annotation des relations du discours avec les autres schémas d'annotation sémantique (voir Article 8).
L'ISO 24617-8:2016 ne traite pas de la représentation des structures de discours globales de niveau élevé, qui implique de relier des relations du discours locales pour constituer une ou plusieurs structures globales plus complexes.
L'ISO 24617-8:2016 se limite, en outre, aux relations strictement sémantiques, et exclut donc, par exemple, les relations présentationnelles, qui concernent la façon dont un texte est présenté à ses lecteurs ou la façon dont des locuteurs structurent leurs contributions à un dialogue oral.
Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8. del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)
Ta dokument ureja predstavitev in označevanje odnosov v lokalnem diskurzu »na nizki ravni« med okoliščinami, omenjenimi v diskurzu, kjer je vsak odnos označen neodvisno od drugih odnosov v istem diskurzu.
Ta dokument določa podlago za označevanje odnosov diskurza z določitvijo nabora temeljnih odnosov diskurza, od katerih imajo številni podobne definicije v različnih ogrodjih. Ta dokument, kolikor mogoče, določa preslikave semantike med različnimi ogrodji.
Ta dokument se uporablja v dveh različnih okoliščinah:
— za označevanje odnosov diskurza v korpusu naravnega jezika;
— kot ciljno predstavitev samodejnih metod za plitko razčlenjevanje diskurza, za povzemanje in druge aplikacije.
Cilji te specifikacije so zagotoviti:
— referenčni nabor podatkovnih kategorij, ki definirajo zbirko vrst odnosov diskurza z eksplicitno semantiko;
— ključno predstavitev, ki temelji na ogrodju za definiranje odnosov diskurza, ki lahko omogoči preslikavo med različnimi ogrodji;
— podlago za pripravo smernic za ustvarjanje novih virov, ki bodo takoj interoperabilni s predhodno obstoječimi viri.
Ob upoštevanju strukture diskurza, omejitev tega dokumenta na specifikacije za označevanje lokalnih odnosov diskurza »na nizki ravni« temelji na pogledu, da (a) je analiza na tej ravni tisto, kar je dobro razumljivo in je mogoče jasno definirati; (b) so, kjer je zaželeno, mogoče nadaljnje razširitve za predstavitev globalne strukture diskurza na višji ravni; in (c) da omogoča združljivost označevanja, ki nastane, med ogrodji, tudi kadar ta temeljijo na različnih teorijah strukture diskurza. Kot del ogrodja za semantično označevanje (»SemAF«) iz standarda ISO 24617 trenutni standard CD-jedro poskuša biti transparenten v svojem odnosu do obstoječih ogrodij za označevanje odnosov diskurza, hkrati pa tudi združljiv z drugimi deli standarda ISO 24617. Nekateri odnosi diskurza so značilni za interaktivni diskurz in se prekrivajo z 2. delom standarda ISO 24617, standarda ISO za označevanje dialogov. Drugi odnosi diskurza se nanašajo na čas in njihovo označevanje je del standarda ISO 24617-1 (čas in dogodki); spet drugi odnosi diskurza pa so zelo podobni določenim odnosov med predikatom in argumenti (»semantične vloge«), katerih označevanje je predmet standarda ISO 24617-4. Ker so za oblikovanje konsistentne celote potrebni različni deli, ta dokument posveča posebno pozornost označevanju interakcij v odnosih diskurza in drugim shemam semantičnega označevanja (glej 8. točko).
Ta dokument ne upošteva predstavitve strukture globalnega diskurza na višji ravni, ki zajema povezovanje odnosov lokalnega diskurza za oblikovanje ene ali več sestavljenih globalnih struktur. Ta dokument je dodatno omejen strogo na semantične odnose, pri čemer so na primer predstavitveni odnosi, ki se nanašajo na način, na katerega je besedilo predstavljeno bralcem, ali način, na katerega govorci strukturirajo svoje prispevke v govorjenem dialogu, izključeni.
Standards Content (Sample)
SLOVENSKI STANDARD
SIST ISO 24617-8:2018
01-september-2018
Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8.
del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)
Language resource management -- Semantic annotation framework (SemAF) -- Part 8:
Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie
8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminologija (načela in koordinacija) / Terminology (principles and coordination)
01.140.20 Informacijske vede / Information sciences
35.240.30 Uporabniške rešitve IT v informatiki, dokumentiranju in založništvu / IT applications in information, documentation and publishing
SIST ISO 24617-8:2018 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.
INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management —
Semantic annotation framework
(SemAF) —
Part 8:
Semantic relations in discourse, core
annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique
(SemAF) —
Partie 8: Relations sémantiques dans le discours, schéma d’annotation
de base (DR-core)
Reference number
ISO 24617-8:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization[39][41], complex question answering[74], natural language generation[19][47][56],
machine translation[42], opinion mining and sentiment analysis[11][12], and information retrieval[38]. A
recent overview[76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST)[40], Segmented Discourse Representation Theory
(SDRT)[3], the Penn Discourse Treebank[59], Hobbs’ Theory of Discourse Coherence (HTDC)[17][18] and
the Cognitive Approach to Coherence Relations (CCR)[66]. This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks that will benefit the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 8).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
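The terms defined in 3.2 to 3.5 can be pictured as a small data model. The following Python sketch is illustrative only and is not part of the standard: the class and field names (`Situation`, `DiscourseRelation`, `arg1`, `arg2`, `connective`) are assumptions made here, as is the choice of which situation is stored as the first argument. It encodes EXAMPLE 1 and EXAMPLE 2 of 3.3 as two independent, low-level annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Situation:
    """A situation (3.2), identified here simply by the text that realizes it."""
    text: str


@dataclass(frozen=True)
class DiscourseRelation:
    """One local discourse relation (3.3), stored independently of all others (3.5)."""
    relation: str                     # relation label, e.g. "Cause"
    arg1: Situation                   # argument order is an assumption of this sketch
    arg2: Situation
    connective: Optional[str] = None  # discourse connective (3.4); None if implicit

    @property
    def is_explicit(self) -> bool:
        # A relation counts as explicit when a connective expresses it.
        return self.connective is not None


# EXAMPLE 1 of 3.3: Cause, with no connective in the text.
cause = DiscourseRelation(
    "Cause",
    Situation("Peter came late to the meeting."),
    Situation("He had been in a traffic jam."),
)

# EXAMPLE 2 of 3.3: Concession, expressed by the connective "but".
concession = DiscourseRelation(
    "Concession",
    Situation("Peter was in a traffic jam"),
    Situation("he arrived on time for the meeting"),
    connective="but",
)

print(cause.is_explicit, concession.is_explicit)
```

Keeping each relation as a self-contained record, with no pointer to other relations, mirrors the definition of low-level discourse structure in 3.5.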
4 Basic concepts and metamodel
4.1 Overview
In a discourse, which comes into play when communication involves a sequence of clauses or sentences
in a text, or utterances in a dialogue, a major aspect of understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Such relations, for example Cause, Contrast, and Condition, contribute to what is called
the “coherence” of the discourse. They can be “realized” explicitly, by means of certain words and
phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis
of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized
with expressions from different syntactic classes. In Example 1, the subordinating conjunction “because”
is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the
buying event mentioned in its matrix clause. In Example 2, the adverbial “as a result” is used to relate
two sentences to express the consequence of not seeing many signs about growth coming to a halt. In
Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,
but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction
or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two
sentences, with the second sentence offering an explanation for why some (investors) have raised their
cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse
context needs to be used together with cohesive devices and world knowledge to get at the relation.
Often, when such relations are inferred, it is possible to insert a connective phrase[44] to express the
relation, as shown here with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes and those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
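The four examples differ only in how the same Cause relation is realized. As a rough illustration, the Python sketch below tabulates them and separates explicit from implicit realizations. The record layout and field names are assumptions of this sketch, not something the standard prescribes, and the exact extent of the phrase recorded for Example 3 is likewise assumed.

```python
from collections import Counter
from typing import NamedTuple, Optional


class CauseExample(NamedTuple):
    number: int
    connective: Optional[str]       # expression present in the text, if any
    syntactic_class: Optional[str]  # None when no single class applies
    inserted: Optional[str]         # connective that could be inserted if implicit


EXAMPLES = [
    CauseExample(1, "because", "subordinating conjunction", None),
    CauseExample(2, "as a result", "adverb", None),
    # Phrase extent assumed; the text says it fits no single syntactic class.
    CauseExample(3, "a major reason is", None, None),
    CauseExample(4, None, None, "because"),
]


def realization(example: CauseExample) -> str:
    """Return 'explicit' if the text contains a connective, else 'implicit'."""
    return "explicit" if example.connective is not None else "implicit"


counts = Counter(realization(e) for e in EXAMPLES)
print(dict(counts))
```

Under the broad sense of “connective” used in this document, Examples 1 to 3 all count as explicit, whether or not the expression belongs to a well-defined syntactic class.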
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC)1) by Hobbs[18], Rhetorical Structure Theory (RST) by Mann
and Thompson[40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and others[66],
Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides[3] and the annotation
framework of the Penn Discourse Treebank (PDTB)[59][61]. The comparison highlights and discusses
the main issues that are considered relevant for developing the pivot representation in DR-core. For
each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends
with a summary of the key features of the DR-core specification, and the DR-core metamodel.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far,
appear elsewhere in the literature.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank[10], based on the Rhetorical Structure Theory[40], assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank[78], based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus[64]
and the ANNODIS corpus[1], based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB[59], based loosely on a lexicalized approach to discourse
relations and structure (DLTAG[16][75]), and DiscAn[65], based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informatio
...
INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d’annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d’annotation de base (DR-core)
Reference number: ISO 24617-8:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization [39][41], complex question answering [74], natural language generation [19][47][56],
machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A
recent overview [76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory
(SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and
the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks that will benefit the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows for the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 7).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
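As an informal illustration of definitions 3.2 and 3.3, the three examples above can be written down as simple records. This is only a sketch: the class names `Situation` and `DiscourseRelation`, the field names, and the string labels for situation kinds are hypothetical and are not part of this document or of DRelML.

```python
from dataclasses import dataclass
from typing import Optional

# Situation kinds listed in definition 3.2; the string labels are illustrative.
SITUATION_KINDS = {
    "eventuality", "fact", "proposition", "condition", "belief", "dialogue act",
}


@dataclass(frozen=True)
class Situation:
    text: str  # the expression that realizes the situation
    kind: str  # one of SITUATION_KINDS

    def __post_init__(self):
        if self.kind not in SITUATION_KINDS:
            raise ValueError(f"unknown situation kind: {self.kind}")


@dataclass(frozen=True)
class DiscourseRelation:
    label: str                        # e.g. "Cause", "Concession"
    first: Situation                  # situation mentioned first in the text
    second: Situation                 # situation mentioned second in the text
    connective: Optional[str] = None  # None when the relation is implicit

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None


# EXAMPLE 1: implicit Cause between two events.
example_1 = DiscourseRelation(
    "Cause",
    Situation("Peter came late to the meeting", "eventuality"),
    Situation("He had been in a traffic jam", "eventuality"),
)

# EXAMPLE 2: Concession expressed by the connective "but".
example_2 = DiscourseRelation(
    "Concession",
    Situation("Peter was in a traffic jam", "eventuality"),
    Situation("he arrived on time for the meeting", "eventuality"),
    connective="but",
)

# EXAMPLE 3: same causal relation, but the first argument is a proposition.
example_3 = DiscourseRelation(
    "Cause",
    Situation("Peter did not manage to come to the meeting", "proposition"),
    Situation("he was held up in a terrible traffic jam", "eventuality"),
)
```

Keeping the connective optional lets the same record cover both explicit and implicit relations.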
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
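The ambiguity described in Note 1 can be made concrete with a small lexicon lookup. The sketch below only finds candidate connectives by string matching; the function name `candidate_connectives` and the tiny lexicon (taken from the EXAMPLE above) are assumptions made for illustration. Deciding whether a candidate really expresses a discourse relation is a separate annotation decision that this code does not attempt.

```python
import re

# Lexicon taken from the EXAMPLE in 3.4; a real lexicon would be much larger.
SINGLE_WORD = {"but", "since", "and", "however", "because"}
MULTI_WORD = [("as", "well", "as"), ("such", "as")]  # longest phrase first


def candidate_connectives(sentence):
    """Return (token_index, expression) pairs for every lexicon match."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    found = []
    i = 0
    while i < len(tokens):
        phrase = next(
            (p for p in MULTI_WORD if tuple(tokens[i:i + len(p)]) == p), None
        )
        if phrase is not None:
            found.append((i, " ".join(phrase)))
            i += len(phrase)
        elif tokens[i] in SINGLE_WORD:
            found.append((i, tokens[i]))
            i += 1
        else:
            i += 1
    return found


# "and" is flagged here although it is an intra-clausal conjunction,
# which is exactly why a lookup alone cannot identify discourse connectives.
intra_clausal = candidate_connectives("John and Mary are a lovely couple")
discourse_use = candidate_connectives(
    "Peter was in a traffic jam, but he arrived on time for the meeting."
)
```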
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
4 Basic concepts and metamodel
4.1 Overview
In a discourse, which comes into play when communication involves a sequence of clauses or sentences
in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Understanding such relations, for example Cause, Contrast, and Condition, contributes to what is called
the “coherence” of the discourse. The relations can be “realized” explicitly, by means of certain words and
phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis
of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized
with expressions from different syntactic classes. In Example 1, a subordinating conjunction “because”
is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the
buying event mentioned in its matrix clause. In Example 2, an adverb “as a result” is used to relate
two sentences to express the consequence of not seeing many signs about growth coming to a halt. In
Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,
but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction
or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two
sentences, with the second sentence offering an explanation for why some (investors) have raised their
cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse
context needs to be used together with cohesive devices and world knowledge to get at the relation.
Often, when such relations are inferred, it is possible to insert a connective phrase [44] to express the
relation, as shown here with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes as well as those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
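The four ways of realizing the Cause relation in Examples 1 to 4 can be summarized in a small table of records. This is a hedged sketch: the names `Trigger`, `CauseAnnotation` and `connective_of` are hypothetical, and the choice of “A major reason is” as the explicit phrase in Example 3 is an assumption, since the text above does not quote the phrase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Trigger(Enum):
    EXPLICIT_CLASSIFIED = "explicit, from a well-defined syntactic class"
    EXPLICIT_OTHER = "explicit, not from a single syntactic class"
    IMPLICIT = "implicit, inferred from context"


@dataclass(frozen=True)
class CauseAnnotation:
    example: int
    trigger: Trigger
    expression: Optional[str]         # words in the text that express the relation
    insertable: Optional[str] = None  # connective an annotator could insert


CAUSE_EXAMPLES = [
    CauseAnnotation(1, Trigger.EXPLICIT_CLASSIFIED, "because"),
    CauseAnnotation(2, Trigger.EXPLICIT_CLASSIFIED, "as a result"),
    # Assumed phrase: the overview only says "an explicit phrase is again used".
    CauseAnnotation(3, Trigger.EXPLICIT_OTHER, "A major reason is"),
    CauseAnnotation(4, Trigger.IMPLICIT, None, insertable="because"),
]


def connective_of(annotation):
    """Connective in the broad sense: the expression in the text if there
    is one, otherwise the connective that could be inserted."""
    return annotation.expression or annotation.insertable


explicit_examples = [
    a.example for a in CAUSE_EXAMPLES if a.trigger is not Trigger.IMPLICIT
]
```

Storing the insertable connective separately from the expression keeps it visible that nothing in the text of Example 4 expresses the relation.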
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC, see footnote 1) by Hobbs [18], Rhetorical Structure Theory (RST) by Mann
and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and others [66],
Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation
framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses
the main issues that are considered relevant for developing the pivot representation in DR-core. For
each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends
with a summary of the key features of the DR-core specification, and the DR-core metamodel.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far,
appear elsewhere in the literature.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64]
and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse
relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
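The structural properties mentioned in this subclause (multiple parents, crossing) can be checked mechanically once low-level relations are projected onto a higher-level structure. The following is a minimal sketch under simplifying assumptions that are not part of this document: each segment is identified by its position in the text, each projected link is a `(parent, child)` pair of positions, and “crossing” means that two links interleave in the linear order of the text.

```python
from itertools import combinations


def has_multiple_parents(arcs):
    """True if some segment is the child of more than one arc."""
    children = [child for _, child in arcs]
    return len(children) != len(set(children))


def has_crossing(arcs):
    """True if two arcs interleave over the linear order of segments."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            return True
    return False


# Hypothetical projections over segments 0..3, as (parent, child) pairs.
tree_like = [(0, 1), (0, 2)]           # single parents, no crossing
dag_like = [(0, 2), (1, 2)]            # segment 2 has two parents
graph_like = [(0, 2), (1, 3), (0, 3)]  # two parents for 3, and 0-2 crosses 1-3

report = {
    name: (has_multiple_parents(arcs), has_crossing(arcs))
    for name, arcs in [
        ("tree_like", tree_like),
        ("dag_like", dag_like),
        ("graph_like", graph_like),
    ]
}
```

Because the low-level annotation records each relation independently, checks like these would only be needed by a user who chooses to project the relations onto a particular tree or graph theory.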
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the relation’s arguments, or in
“intentional” terms, i.e. in terms of the intentions of the speaker/writer (W) and intended effects on
the hearer/reader (R). While SDRT, HTDC, PDTB and CCR describe the meaning in informational terms,
RST provides definitions in intentional terms. For instance, Example 5 shows the definition for the
(non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), while Example
6 presents the definition for the same relation in HTDC (where it is called Explanation).
Example 5 Non-Volitional Cause (RST)
Constraints on N: presents a situation that is not a volitional action
Constraints on the N + S combination: S presents a situation that, by means other than motivating a
volitional action, caused the situation presented in N; without the presentation of S, R might not know
the particular cause of the situation; a presentation of N is more central than S to W’s purposes in
putting forth the N-S combination
The effect: R recognizes the situation presented in S as a cause of the situation presented in N
Locus of the effect: N and S.
Example 6 Explanation (HTDC)
Infer that the state/event asserted by S₁ causes or could cause the state/event asserted by S₀.
Despite the different ways of describing DRel semantics, it is important to note that in many cases, the
differences lie in the “level” at which the
...
SLOVENIAN STANDARD
SIST ISO 24617-8:2018
1 September 2018
Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8. del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24617-8:2018 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
---------------------- Page: 1 ----------------------
SIST ISO 24617-8:2018
---------------------- Page: 2 ----------------------
SIST ISO 24617-8:2018
INTERNATIONAL ISO
STANDARD 24617-8
First edition
2016-12-15
Language resource management —
Semantic annotation framework
(SemAF) —
Part 8:
Semantic relations in discourse, core
annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique
(SemAF) —
Partie 8: Relations sémantiques dans le discours, schéma d’annotation
de base (DR-core)
Reference number
ISO 24617-8:2016(E)
©
ISO 2016
---------------------- Page: 3 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2016 – All rights reserved
---------------------- Page: 4 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 2
3 Terms and definitions . 2
4 Basic concepts and metamodel . 3
4.1 Overview . 3
4.2 Representation of discourse structure . 3
4.3 Semantic description of discourse relations . 4
4.4 Pragmatic variants of discourse relations . 4
4.5 Hierarchical classification of discourse relations . 5
4.6 Inference of multiple relations between two segments . 5
4.7 Representation of (a)symmetry of relations . 6
4.8 Representation of the relative importance of arguments for discourse meaning/
structure . 6
4.9 Arity of arguments . 7
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations . 7
4.11 Triggers of discourse relations . 7
4.12 Representation of attribution as a discourse relation . 8
4.13 Representation of entity-based relations. 9
4.14 Representation of non-existence of a discourse relation .10
4.15 Summary: Assumptions of the DR-core annotation scheme .10
4.16 Issues to be taken up in the follow-up of DR-core .11
4.17 Metamodel .11
5 Core discourse relations .12
6 Current approaches and annotation schemes .21
6.1 Overview .21
6.2 Rhetorical structure theory (RST) .21
6.3 RST Treebank .22
6.4 Hobbs’ Theory of Discourse Coherence (HTDC) .24
6.5 GraphBank .24
6.6 SDRT .25
6.7 CCR .26
6.8 Penn Discourse Treebank (PDTB) .26
6.9 Mapping of DR-core discourse relations to existing classifications .28
7 Interactions of this document with other annotation schemes .30
7.1 Overlapping annotation schemes .30
7.2 Discourse relations and semantic roles .31
7.3 Discourse relations and temporal relations .31
7.4 Discourse relations and semantic relations between dialogue acts .32
8 DRelML: Discourse Relations Markup Language .33
8.1 Overview .33
8.2 DRelML abstract syntax and semantics .34
8.3 Concrete syntax .35
Bibliography .39
© ISO 2016 – All rights reserved iii
---------------------- Page: 5 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
iv © ISO 2016 – All rights reserved
---------------------- Page: 6 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization [39][41], complex question answering [74], natural language generation [19][47][56],
machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A
recent overview [76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, and as a basis
for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts, are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory
(SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and
the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation
of these compatibilities and a loose mapping between the definitions of individual discourse relations
specified in the different frameworks, to the benefit of the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extension to represent higher-level, global
discourse structure is possible where desired; and (c) it allows the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 7).
This document does not consider the representation of global, higher-level discourse structure, which
involves linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
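The terms defined in 3.1 to 3.5 can be read as a small data model. The following Python sketch is illustrative only and is not the DRelML representation specified in Clause 8. The class and field names (`Situation`, `DiscourseRelation`, `arg1`, `arg2`, `connective`, `kind`) are assumptions introduced here, and the arguments are simply listed in text order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Situation:
    """A situation (3.2), identified here by the text span that realizes it."""
    text: str
    kind: str = "eventuality"  # hypothetical label, e.g. eventuality, fact, proposition


@dataclass(frozen=True)
class DiscourseRelation:
    """A discourse relation (3.3) between two situations.

    Each record stands alone and holds no link to any other relation,
    which corresponds to low-level discourse structure (3.5).
    """
    relation: str
    arg1: Situation
    arg2: Situation
    connective: Optional[str] = None  # None when the relation is implicit

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None


# EXAMPLE 1 of 3.3: Cause, with no connective in the text.
ex1 = DiscourseRelation(
    relation="Cause",
    arg1=Situation("Peter came late to the meeting."),
    arg2=Situation("He had been in a traffic jam."),
)

# EXAMPLE 2 of 3.3: Concession, expressed by the connective "but".
ex2 = DiscourseRelation(
    relation="Concession",
    arg1=Situation("Peter was in a traffic jam"),
    arg2=Situation("he arrived on time for the meeting"),
    connective="but",
)
```

Keeping the connective optional lets one record type cover both explicit and implicit relations, and the `kind` field leaves room for the distinction drawn in EXAMPLE 3 between an eventuality and a proposition.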
4 Basic concepts and metamodel
4.1 Overview
In a discourse, that is, when communication involves a sequence of clauses or sentences in a text, or
of utterances in a dialogue, a major part of understanding comes from how the events, states, facts,
propositions, and dialogue acts mentioned in the discourse are related to each other. Such relations,
for example Cause, Contrast, and Condition, contribute to what is called the “coherence” of the
discourse. They can be “realized” explicitly, by means of certain words and phrases (often called
“connectives”), or they can be implicit, when they have to be inferred on the basis of the discourse
context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized with expressions
from different syntactic classes. In Example 1, the subordinating conjunction “because” presents a
situation (here, the meaning of the subordinate clause) as the reason for the buying event mentioned
in its matrix clause. In Example 2, the adverbial “as a result” relates two sentences, expressing the
consequence of not seeing many signs that growth is coming to a halt. In Example 3, an explicit phrase
is again used, to explain the claim about the level of investor withdrawal, but here the phrase does
not belong to a single well-defined syntactic class such as conjunction or adverb. Finally, Example 4
shows a causal relation that can be inferred between the two sentences, with the second sentence
offering an explanation for why some (investors) have raised their cash positions, although no word
or phrase in the text expresses this relation. Rather, the discourse context needs to be used together
with cohesive devices and world knowledge to get at the relation.
[44]
Often, when such relations are inferred, it is possible to insert a connective phrase to express the
relation, as shown here with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes as well as those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
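The difference between explicit and implicit realization in Examples 1 to 4 can be sketched with a simple lookup. The Python fragment below is a toy illustration, not a method prescribed by this document: the three-entry lexicon and the function name `find_connective` are assumptions, and the texts are excerpts of the examples above. A plain lookup of this kind cannot tell a discourse connective from an intra-clausal use of the same word (see Note 1 to entry in 3.4), so it only shows where an explicit candidate is present and where none is.

```python
import re
from typing import Iterable, Optional

# Hypothetical toy lexicon; a real connective inventory would be far larger.
CONNECTIVES = ("because", "as a result", "a major reason is that")


def find_connective(text: str, lexicon: Iterable[str] = CONNECTIVES) -> Optional[str]:
    """Return the lexicon entry that occurs earliest in `text`, or None."""
    best = None  # (start offset, phrase) of the earliest match so far
    for phrase in lexicon:
        match = re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), phrase)
    return best[1] if best else None


# Excerpts of Examples 1 to 4.
examples = {
    1: "he bought the shares because he keeps a utility account at the brokerage firm",
    2: "there are few clear signs that growth is coming to a halt. "
       "As a result, Fed officials may be divided over whether to ease credit.",
    3: "A major reason is that investors already have sharply scaled back their purchases",
    4: "Some have raised their cash positions to record levels. "
       "High cash positions help buffer a fund when the market falls.",
}

realization = {}
for number, text in examples.items():
    connective = find_connective(text)
    realization[number] = ("explicit", connective) if connective else ("implicit", None)
```

For Example 4 the lookup finds nothing, which matches the text: the relation is still there, but it has to be inferred rather than read off a connective.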
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC, see footnote 1) by Hobbs [18], Rhetorical Structure Theory (RST)
by Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and
others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the
annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and
discusses the main issues that are considered relevant for developing the pivot representation in
DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue.
The clause ends with a summary of the key features of the DR-core specification, and the DR-core
metamodel.
1) “HTDC” as an acronym for Hobbs’ theory was created for the purpose of this document and does not,
thus far, appear elsewhere in the literature.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64]
and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse
relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
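One of the structural differences described above, whether a node may have more than one parent, can be checked mechanically once a higher-level structure has been projected from the low-level annotations. The Python sketch below is an illustration under stated assumptions and not part of DR-core: the structure is assumed to be given as hypothetical (parent, child) pairs over made-up identifiers for relations (`r1`, `r2`) and segments (`s1` to `s3`). It tests only for multiple parents; it does not test for crossing dependencies or for cycles.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (parent, child); hypothetical identifiers


def multi_parent_nodes(edges: Iterable[Edge]) -> List[str]:
    """Return, in sorted order, the nodes that have more than one distinct parent."""
    parents: Dict[str, Set[str]] = defaultdict(set)
    for parent, child in edges:
        parents[child].add(parent)
    return sorted(node for node, ps in parents.items() if len(ps) > 1)


# Every node has at most one parent, as a tree representation requires.
tree_edges = [("r1", "s1"), ("r1", "r2"), ("r2", "s2"), ("r2", "s3")]

# Segment s2 is an argument of two relations, so a tree cannot represent this.
shared_edges = [("r1", "s1"), ("r1", "s2"), ("r2", "s2"), ("r2", "s3")]
```

With low-level annotation, both inputs are just lists of independent relations; whether the shared segment is acceptable depends on the structure one later chooses to project.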
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the r
...
SLOVENIAN STANDARD
SIST ISO 24617-8:2018
1 September 2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8:
Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie
8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
35.060 Languages used in information technology
SIST ISO 24617-8:2018 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24617-8
First edition 2016-12-15
Language resource management -- Semantic annotation framework (SemAF) --
Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d’annotation sémantique (SemAF) --
Partie 8: Relations sémantiques dans le discours, schéma d’annotation de base (DR-core)
Reference number ISO 24617-8:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
© ISO 2016 – All rights reserved iii
---------------------- Page: 5 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
iv © ISO 2016 – All rights reserved
---------------------- Page: 6 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
[39][41] [74] [19][47][56]
as summarization, complex question answering, natural language generation,
[42] [11][12] [38]
machine translation, opinion mining and sentiment analysis, and information retrieval. A
[76]
recent overview includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
[40]
annotation, e.g. Rhetorical Structure Theory (RST), Segmented Discourse Representation Theory
[3] [59] [17][18]
(SDRT), the Penn Discourse Treebank, Hobbs’ Theory of Discourse Coherence (HTDC) and
[66]
the Cognitive Approach to Coherence Relations (CCR) . This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks that will benefit the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
© ISO 2016 – All rights reserved v
---------------------- Page: 7 ----------------------
SIST ISO 24617-8:2018
---------------------- Page: 8 ----------------------
SIST ISO 24617-8:2018
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure is possible where desired; and (c) that it allows for the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 8).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
© ISO 2016 – All rights reserved 1
---------------------- Page: 9 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
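The definition in 3.3 can be read as a small data structure. The following Python sketch is illustrative only and is not part of the specification: the class names Situation and DiscourseRelation, the field names and the "kind" labels are assumptions made for this example, and the argument order simply follows text order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Situation:
    """A situation (3.2) realized by a stretch of text."""
    text: str
    kind: str  # illustrative label, e.g. "eventuality" or "proposition"


@dataclass(frozen=True)
class DiscourseRelation:
    """A relation (3.3) between two situations mentioned in a discourse."""
    label: str                 # e.g. "Cause", "Concession"
    arg1: Situation            # first argument in text order (assumption)
    arg2: Situation            # second argument in text order (assumption)
    connective: Optional[str]  # None when no connective expresses the relation

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None


# EXAMPLE 1: the Cause relation is implicit.
example_1 = DiscourseRelation(
    "Cause",
    Situation("Peter came late to the meeting.", "eventuality"),
    Situation("He had been in a traffic jam.", "eventuality"),
    connective=None,
)

# EXAMPLE 2: Concession, expressed by the connective "but".
example_2 = DiscourseRelation(
    "Concession",
    Situation("Peter was in a traffic jam", "eventuality"),
    Situation("he arrived on time for the meeting", "eventuality"),
    connective="but",
)

# EXAMPLE 3: same relation as EXAMPLE 1, but the first argument is a
# proposition (an event description with negative polarity).
example_3 = DiscourseRelation(
    "Cause",
    Situation("Peter did not manage to come to the meeting", "proposition"),
    Situation("he was held up in a terrible traffic jam", "eventuality"),
    connective=None,
)
```

Keeping the situation type separate from the relation label mirrors the point of EXAMPLE 3: the relation stays the same while the type of its argument changes.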
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
4 Basic concepts and metamodel
4.1 Overview
In a discourse, that is, when communication involves a sequence of clauses or sentences in a text,
or of utterances in a dialogue, a major aspect of understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Understanding such relations, for example Cause, Contrast, and Condition, contributes to what is
called the “coherence” of the discourse. The relations can be “realized” explicitly, by means of certain
words and phrases (often called “connectives”), or they can be implicit, in which case they have to be
inferred on the basis of the discourse context and world knowledge. Examples 1 to 3 illustrate the
Cause relation realized with expressions from different syntactic classes. In Example 1, the
subordinating conjunction “because” presents a situation (here, the meaning of the subordinate
clause) as the reason for the buying event mentioned in its matrix clause. In Example 2, the adverb
“as a result” relates two sentences to express the consequence of not seeing many signs that growth is
coming to a halt. In Example 3, an explicit phrase is again used, to explain the claim about the level of
investor withdrawal, but here the phrase does not correspond to a single well-defined syntactic class
such as conjunction or adverb. Finally, Example 4 shows that although a causal relation can be inferred
between the two sentences, with the second sentence offering an explanation for why some (investors)
have raised their cash positions, there is no word or phrase in the text to express this inference. Rather,
the discourse context needs to be used together with cohesive devices and world knowledge to get at
the relation. Often, when such relations are inferred, it is possible to insert a connective phrase to
express the relation [44], as shown here with the insertion of “because”. In this document, the term
“connective” is used in a broad sense, to refer to any word or phrase used to express a discourse
relation, including both those drawn from well-defined syntactic classes and those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
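The four examples differ only in how the Cause relation is realized. A minimal Python sketch of that distinction follows. The record layout, the set WELL_DEFINED_CLASSES and the function realization_type are hypothetical, and the connective string given for Example 3 is an assumed reading, since the text above does not single out the exact phrase.

```python
from typing import Optional

# Syntactic classes treated as "well-defined" in this sketch (an assumption).
WELL_DEFINED_CLASSES = {"subordinating conjunction", "adverb"}


def realization_type(connective: Optional[str],
                     syntactic_class: Optional[str]) -> str:
    """Classify how a relation is realized, following the prose of 4.1."""
    if connective is None:
        return "implicit"
    if syntactic_class in WELL_DEFINED_CLASSES:
        return "explicit connective"
    return "explicit, other expression"


# (example number, connective, syntactic class)
EXAMPLES = [
    (1, "because", "subordinating conjunction"),
    (2, "as a result", "adverb"),
    (3, "a major reason is", None),  # assumed phrase; no single syntactic class
    (4, None, None),                 # nothing in the text; "because" is insertable
]

for number, connective, syntactic_class in EXAMPLES:
    print(number, realization_type(connective, syntactic_class))
```

All three explicit cases count as connectives in the broad sense used in this document; the split between the first two labels only mirrors the syntactic distinction drawn in the text.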
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC, see footnote 1) by Hobbs [18], Rhetorical Structure Theory
(RST) by Mann and Thompson [40], the Cognitive Approach of Coherence Relations (CCR) by Sanders
and others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and
the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights
and discusses the main issues that are considered relevant for developing the pivot representation in
DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The
clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes
a tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based
on HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus
[64] and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple
parents, but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with
respect to discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to
discourse relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these
frameworks, individual relations along with their arguments are annotated, without being combined
with other relations to form a composite structure encompassing the entire text.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not,
thus far, appear elsewhere in the literature.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
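Because DR-core annotates each relation independently, structural properties such as shared arguments or crossing can be examined after annotation rather than being imposed during it. The Python sketch below illustrates this under simplifying assumptions: segments are identified by their index in text order, the Rel type and both helper functions are hypothetical, and the crossing test (a < c < b < d) is a rough proxy, not a definition taken from any of the frameworks discussed above.

```python
from itertools import combinations
from typing import List, NamedTuple, Set


class Rel(NamedTuple):
    """One low-level annotation: a relation label and its two arguments."""
    label: str
    arg1: int  # index of the first argument segment in text order
    arg2: int  # index of the second argument segment in text order


def shared_arguments(rels: List[Rel]) -> Set[int]:
    """Segments that serve as an argument of more than one relation."""
    seen: Set[int] = set()
    shared: Set[int] = set()
    for rel in rels:
        for segment in {rel.arg1, rel.arg2}:
            if segment in seen:
                shared.add(segment)
            seen.add(segment)
    return shared


def has_crossing(rels: List[Rel]) -> bool:
    """True if the spans of two relations interleave (a < c < b < d)."""
    spans = [tuple(sorted((rel.arg1, rel.arg2))) for rel in rels]
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            return True
    return False


# Three independently annotated relations over four segments (0..3).
annotations = [
    Rel("Cause", 0, 1),
    Rel("Contrast", 1, 2),
    Rel("Condition", 0, 3),
]

print(shared_arguments(annotations))  # segments used by several relations
print(has_crossing(annotations))      # no interleaving spans in this set
```

A resource that needs a tree could reject annotation sets with shared arguments, while a graph-based resource could accept them; the low-level annotations themselves stay the same in both cases.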
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the relation
...
INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Reference number
ISO 24617-8:2016(F)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized
in any form or by any means, electronic or mechanical, including photocopying, or posting on the
internet or an intranet, without prior written permission. Permission can be requested from either ISO
at the address below or the ISO member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of the symmetry or asymmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of the absence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up to DR-core
4.17 Metamodel
5 Core set of discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical Structure Theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 Abstract syntax and semantics of DRelML
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is normally
carried out through ISO technical committees. Each member body interested in a subject for which a
technical committee has been established has the right to be represented on that committee.
International organizations, governmental and non-governmental, in liaison with ISO, also take part in
the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all
matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify
such rights or to give notice of their existence. Details of any intellectual property rights or similar
rights identified during the development of the document are given in the Introduction and/or in the
list of patent declarations received by ISO (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO's adherence to the World Trade Organization (WTO)
principles concerning Technical Barriers to Trade (TBT), see the following URL:
www.iso.org/iso/fr/avant-propos.html
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in
support of empirical natural language research, both computational and theoretical. At the discourse
level, an interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse. They are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such as
summarization [39][41], complex question answering [74], natural language generation [19][47][56],
machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval
[38]. A recent overview [76] describes the state of the art in discourse and computational processing.
Several international and collaborative efforts have created annotated resources of discourse
relations, across languages and genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions: one
concerns the representation of discourse structure, the other the semantic classification of discourse
relations. As a result, annotations created within one framework are difficult to interpret in another,
and the interoperability of annotated resources is limited. Despite these differences, however, there
are strong compatibilities between the frameworks, which can be made explicit and used to map and
compare resources, as well as to serve as a basis for future annotation.
In a coherent discourse (written or spoken), the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts, are semantically linked by causal, contrastive, temporal
and other relations, called “discourse relations”, “rhetorical relations” or “coherence relations”.
Although discourse relations hold primarily between the meanings of successive sentences or
utterances in the discourse, they can also hold between the meanings of smaller or larger units
(nominalizations, clauses, paragraphs, dialogue segments), and they can also hold between situations
that are not explicitly described but that can be inferred.
The purpose of this document is to specify an interoperable approach to the annotation of local
semantic relations in discourse (DRel) that is consistent with the Linguistic Annotation Framework
(LAF, ISO 24612-2; see also Reference [23]) and with the general principles for semantic annotation
laid down in ISO 24617-6. It reflects the view that strong underlying compatibilities can be observed,
with respect to the semantic description of discourse relations, across the various discourse relation
frameworks used for annotating data, for example Rhetorical Structure Theory (RST) [40], Segmented
Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank (PDTB) [59], Hobbs’ Theory
of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66].
This document aims to make these compatibilities explicit and to propose approximate mappings
between the definitions of individual discourse relations, as specified in the different frameworks, for
the benefit of the community as a whole.
This document aims to (1) set out a list of desiderata for interoperable DRel annotation; (2) specify a
method of DRel annotation that is compatible with existing and future ISO standard annotation
schemes for semantic information; (3) provide clear and mutually consistent definitions of a “core” set
of discourse relations that occur frequently, in one form or another, in many current discourse relation
frameworks. Taken together, objectives (2) and (3) constitute a “core annotation scheme” for DRels.
This document does not aim to provide a fixed and exhaustive set of discourse relations, but rather an
open, extensible core set of relations. The core annotation scheme also touches on some issues in
discourse relation annotation that remain open, because they require further study in collaboration
with other efforts on multilingual discourse annotation, in particular the TextLink action within the
European COST programme. A further part of ISO 24617 is envisaged for the near future. It will
complement this document by providing a complete, interoperable DRel annotation scheme while
addressing the multilingual dimension of the standard. The issues to be taken up in that
complementary part are listed in 4.16.
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 8).
This document does not consider global, higher-level discourse structure representation, which
involves linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting: he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
4 Basic concepts and metamodel
4.1 Overview
In a discourse, that is, when communication involves a sequence of clauses or sentences in a text,
or of utterances in a dialogue, a major aspect of understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Understanding such relations, for example Cause, Contrast, and Condition, contributes to what is
called the “coherence” of the discourse. The relations can be “realized” explicitly, by means of certain
words and phrases (often called “connectives”), or they can be implicit, in which case they have to be
inferred on the basis of the discourse context and world knowledge. Examples 1 to 3 illustrate the
Cause relation realized with expressions from different syntactic classes. In Example 1, the
subordinating conjunction “because” presents a situation (here, the meaning of the subordinate
clause) as the reason for the buying event mentioned in its matrix clause. In Example 2, the adverb
“as a result” relates two sentences to express the consequence of not seeing many signs that growth is
coming to a halt. In Example 3, an explicit phrase is again used, to explain the claim about the level of
investor withdrawal, but here the phrase does not correspond to a single well-defined syntactic class
such as conjunction or adverb. Finally, Example 4 shows that although a causal relation can be inferred
between the two sentences, with the second sentence offering an explanation for why some (investors)
have raised their cash positions, there is no word or phrase in the text to express this inference. Rather,
the discourse context needs to be used together with cohesive devices and world knowledge to get at
the relation. Often, when such relations are inferred, it is possible to insert a connective phrase to
express the relation [44], as shown here with the insertion of “because”. In this document, the term
“connective” is used in a broad sense, to refer to any word or phrase used to express a discourse
relation, including both those drawn from well-defined syntactic classes and those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC, see footnote 1) by Hobbs [18], Rhetorical Structure Theory
(RST) by Mann and Thompson [40], the Cognitive Approach of Coherence Relations (CCR) by Sanders
and others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and
the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights
and discusses the main issues that are considered relevant for developing the pivot representation in
DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The
clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.
4.2 Representation of discourse structure
...