SIST ISO 24617-8:2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
ISO 24617-8:2016 establishes the representation and annotation of local, "low-level" discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.
ISO 24617-8:2016 provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.
ISO 24617-8:2016 is applicable to two different situations:
- for annotating discourse relations in natural language corpora;
- as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.
The objectives of this specification are to provide:
- a reference set of data categories that define a collection of discourse relation types with an explicit semantics;
- a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;
- a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating local, "low-level" discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows for the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework ("SemAF"), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations ("semantic roles"), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 7).
ISO 24617-8:2016 does not consider global, higher-level discourse structure representation which involves linking local discourse relations to form one or more composite global structures.
ISO 24617-8:2016 is, moreover, restricted to strictly semantic relations, to the exclusion of, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions in a spoken dialogue.
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8. del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)
SLOVENIAN STANDARD
1 September 2018
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD ISO 24617-8
First edition 2016-12-15
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2016 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization,[39][41] complex question answering,[74] natural language generation,[19][47][56]
machine translation,[42] opinion mining and sentiment analysis,[11][12] and information retrieval.[38] A
recent overview[76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST),[40] Segmented Discourse Representation Theory
(SDRT),[3] the Penn Discourse Treebank,[59] Hobbs’ Theory of Discourse Coherence (HTDC)[17][18] and
the Cognitive Approach to Coherence Relations (CCR).[66] This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks, that will benefit the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows for the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 7).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
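The definitions in 3.2 to 3.5 can be illustrated with a minimal data-structure sketch. It is not part of this document and does not reproduce DRelML (Clause 8); the names `Situation`, `DiscourseRelation`, `arg1`, `arg2` and `is_explicit`, as well as the order in which the two arguments are assigned, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Kinds of situation listed in 3.2 (labels chosen for this sketch).
SITUATION_KINDS = {
    "eventuality", "fact", "proposition", "condition", "belief", "dialogue act",
}


@dataclass(frozen=True)
class Situation:
    text: str  # expression that realizes the situation
    kind: str  # one of SITUATION_KINDS

    def __post_init__(self):
        if self.kind not in SITUATION_KINDS:
            raise ValueError(f"unknown situation kind: {self.kind}")


@dataclass(frozen=True)
class DiscourseRelation:
    relation: str                     # e.g. "Cause", "Concession"
    arg1: Situation                   # argument order is an assumption here
    arg2: Situation
    connective: Optional[str] = None  # None when nothing expresses the relation

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None


# EXAMPLE 1 to EXAMPLE 3 of 3.3, encoded with the hypothetical classes above.
ex1 = DiscourseRelation(
    "Cause",
    Situation("Peter came late to the meeting", "eventuality"),
    Situation("He had been in a traffic jam", "eventuality"),
)
ex2 = DiscourseRelation(
    "Concession",
    Situation("Peter was in a traffic jam", "eventuality"),
    Situation("he arrived on time for the meeting", "eventuality"),
    connective="but",
)
ex3 = DiscourseRelation(
    "Cause",
    Situation("Peter did not manage to come to the meeting", "proposition"),
    Situation("he was held up in a terrible traffic jam", "eventuality"),
)

# Low-level structure (3.5): a flat collection of relations,
# with no links or dependencies recorded between them.
annotations = [ex1, ex2, ex3]
```

In this sketch each relation only refers to its own two arguments, which mirrors the independence of annotations described in 3.5.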
4 Basic concepts and metamodel
4.1 Overview
In a discourse, which comes into play when communication involves a sequence of clauses or sentences
in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Such relations, for example Cause, Contrast, and Condition, contribute to what is called
the “coherence” of the discourse. They can be “realized” explicitly, by means of certain words and
phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis
of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized
with expressions from different syntactic classes. In Example 1, a subordinating conjunction “because”
is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the
buying event mentioned in its matrix clause. In Example 2, an adverb “as a result” is used to relate
two sentences to express the consequence of not seeing many signs about growth coming to a halt. In
Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,
but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction
or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two
sentences, with the second sentence offering an explanation for why some (investors) have raised their
cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse
context needs to be used together with cohesive devices and world knowledge to get at the relation.
Often, when such relations are inferred, it is possible to insert a connective phrase[44] to express the
relation, as shown here with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes and those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
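The four ways of realizing the Cause relation in Examples 1 to 4 can be summarized in a short sketch. It is illustrative only: the enumeration `Realization`, the tuple `CauseExample` and the function `make_explicit` are hypothetical names, and the exact extent of the connective phrase in Example 3 is an assumption, since the description above does not quote it.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Realization(Enum):
    SUBORDINATING_CONJUNCTION = "subordinating conjunction"
    ADVERB = "adverb"
    OTHER_PHRASE = "phrase outside a single syntactic class"
    IMPLICIT = "implicit (inferred)"


class CauseExample(NamedTuple):
    number: int
    realization: Realization
    connective: Optional[str]  # None when no word or phrase expresses the relation


EXAMPLES = [
    CauseExample(1, Realization.SUBORDINATING_CONJUNCTION, "because"),
    CauseExample(2, Realization.ADVERB, "as a result"),
    # Assumed extent of the phrase in Example 3.
    CauseExample(3, Realization.OTHER_PHRASE, "a major reason is"),
    CauseExample(4, Realization.IMPLICIT, None),
]


def make_explicit(first: str, second: str, connective: str) -> str:
    """Join two sentences by inserting a connective, as done for Example 4."""
    return f"{first.rstrip('.')} {connective} {second[0].lower()}{second[1:]}"


example4 = make_explicit(
    "Some have raised their cash positions to record levels.",
    "High cash positions help buffer a fund when the market falls.",
    "because",
)
implicit_examples = [e.number for e in EXAMPLES if e.connective is None]
```

Under these assumptions, only Example 4 has no connective, and `make_explicit` reproduces the insertion of “because” shown in that example.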
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC)¹⁾ by Hobbs,[18] Rhetorical Structure Theory (RST) by Mann
and Thompson,[40] the Cognitive Approach to Coherence Relations (CCR) by Sanders and others,[66]
Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides[3] and the annotation
framework of the Penn Discourse Treebank (PDTB).[59][61] The comparison highlights and discusses
the main issues that are considered relevant for developing the pivot representation in DR-core. For
each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends
with a summary of the key features of the DR-core specification, and the DR-core metamodel.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank,[10] based on the Rhetorical Structure Theory,[40] assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank,[78] based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus[64]
and the ANNODIS corpus,[1] based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB,[59] based loosely on a lexicalized approach to discourse
relations and structure (DLTAG),[16][75] and DiscAn,[65] based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far,
appear elsewhere in the literature.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the relation’s arguments, or in
“intentional” terms, i.e. in terms of the intentions of the speaker/writer (W) and intended effects on
the hearer/reader (R). While SDRT, HTDC, PDTB and CCR describe the meaning in informational terms,
RST provides definitions in intentional terms. For instance, Example 5 shows the definition for the
(non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), while Example
6 presents the definition for the same relation in HTDC (where it is called Explanation).
Example 5 Non-Volitional Cause (RST)
Constraints on N: presents a situation that is not a volitional action
Constraints on the N + S combination: S presents a situation that, by means other than motivating a
volitional action, caused the situation presented in N; without the presentation of S, R might not know
the particular cause of the situation; a presentation of N is more central than S to W’s purposes in
putting forth the N-S combination
The effect: R recognizes the situation presented in S as a cause of the situation presented in N
Locus of the effect: N and S.
Example 6 Explanation (HTDC)
Infer that the state/event asserted by S1 causes or could cause the state/event asserted by S0.
Despite the different ways of describing DRel semantics, it is important to note that in many cases, the
differences lie in the “level” at which the relation is described, especially when the situations being
related are the same. Thus, for example, a DRel defined in informational terms in one framework can be
effectively mapped to a DRel in another framework where it may be defined in intentional terms. With
this in mind, DRel meaning in the DR-core specification is described in “informational” terms, but in 6.9,
a mapping is provided from the core relation types (presented in Clause 5) to the relations present in
existing classifications, including those that define relations in intentional terms.
4.4 Pragmatic variants of discourse relations
With the exception of HTDC, all frameworks also distinguish relations when one or both of the
arguments involve an implicit belief or a dialogue act (see footnote 2) that takes scope over the semantic content of
the argument. The motivation for this distinction comes from examples like Example 7, where it should
not be inferred that John’s sending of the message somehow led to him being absent from work, but
rather that it causes the speaker/writer to believe that John is not at work. In other words, the meaning
of the subordinate clause provides evidence supporting the claim made by the main clause. Similarly, in
Example 8, the inference should be made that the explanation is being provided not for the content of
the question but for the (dialogue) act of questioning itself.
2) The concept of a dialogue act, as used in ISO 24617-2, can be seen as an empirically based and computationally
well-defined interpretation of the traditional notion of a “speech act”.
Example 7 John is not at work today, because he sent me a message to say he was sick.
Example 8 What are you doing tonight? Because there’s a good movie on.
This kind of distinction has been given various names in the literature, for example the
“semantic-pragmatic” distinction [73][66][46], the “internal-external” distinction [17][44], the
“ideational-pragmatic” distinction [63] and the “content-metatalk” distinction [37]. In other cases, such as in RST, the distinction,
while not being explicitly named, is evidently taken into account in the classification (e.g. Cause vs.
Evidence/Justify in RST distinguishes the semantic and pragmatic interpretations, respectively). What
is difficult to reconcile about the treatment of this distinction across the various frameworks is that
while some, like CCR, allow for it for all relation types, others, like the PDTB and RST, only admit it
for some relations (e.g. Cause, Condition, Contrast, Concession in PDTB). It must be noted, however,
that there doesn’t seem to be any a priori reason for such a restriction to only some relation types,
and the choice is in the end found to result from what was observed in the corpus that was analysed
and/or annotated. In DR-core, the “semantic-pragmatic” distinction is allowed for all relation types, with
the general aim of not being overly restrictive in the absence of well-defined criteria. At the same time,
the scheme does not encode this distinction on the relation, but rather on the arguments of the relation,
the main reason being that in all cases involving either a belief or a dialogue act, what is different is
not the relation, but rather the semantic status of the arguments. A further motivation comes from
recognizing that representing the distinction on the relation would not distinguish cases where the
belief or dialogue act is implicit (as in Examples 7 and 8) from those where they are made explicit with
performative verbs or propositional attitude verbs, as in Examples 9 and 10. Pragmatic interpretations
are therefore represented on arguments using a feature indicating the argument to be of the type
“belief” or the type “dialogue act”. Note that in cases exemplified by Examples 9 and 10 the belief or
dialogue act aspect of the meaning is entirely obtained from the explicit content of the arguments,
rather than from a contextually motivated inference.
Example 9 I believe John is not at work today because he sent me a message to say he was sick.
Example 10 I’m asking you what you are doing tonight because there’s a good movie on.
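The choice described above, marking the belief or dialogue act status on the arguments rather than on the relation, can be sketched in Python as follows. This is a minimal illustration only: the class and field names (`Argument`, `Relation`, `arg_type`) and the default label "situation" are assumptions made for this sketch and are not DRelML element names.

```python
from dataclasses import dataclass

# Assumed value set: the text names "belief" and "dialogue act" as argument types;
# "situation" is a hypothetical default for arguments without a pragmatic reading.
ARG_TYPES = {"situation", "belief", "dialogue act"}


@dataclass(frozen=True)
class Argument:
    text: str
    arg_type: str = "situation"

    def __post_init__(self):
        if self.arg_type not in ARG_TYPES:
            raise ValueError(f"unknown argument type: {self.arg_type}")


@dataclass(frozen=True)
class Relation:
    rel_type: str
    arg1: Argument  # for Cause: the result
    arg2: Argument  # for Cause: the reason


def is_pragmatic(rel: Relation) -> bool:
    """True if at least one argument is typed as a belief or a dialogue act."""
    return any(a.arg_type != "situation" for a in (rel.arg1, rel.arg2))


# Example 7: the message is the reason for the speaker's belief that John is absent.
example7 = Relation(
    "Cause",
    arg1=Argument("John is not at work today", "belief"),
    arg2=Argument("he sent me a message to say he was sick"),
)

# Example 8: the explanation targets the act of questioning, not its content.
example8 = Relation(
    "Cause",
    arg1=Argument("What are you doing tonight?", "dialogue act"),
    arg2=Argument("there's a good movie on"),
)
```

In both cases the relation type stays Cause; only the semantic status recorded on Arg1 differs.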
4.5 Hierarchical classification of discourse relations
In all existing frameworks, discourse relations are grouped together semantically to a greater or lesser
degree; where they differ is in how the groupings are done. For example, while PDTB groups Concession
together with Contrast under the broader Comparison class, CCR places Concession under the Negative
Causal relation group, while placing Contrast under the Negative Additive group. Reconciliation with
respect to these groupings is not possible, since they stem from basic differences in what is taken to
count as semantic closeness. The solution adopted in the DR-core specification is to use a “flat” set of core
relations that can be used in an annotation scheme as just that, or mapped to the appropriate type
within a particular hierarchical scheme adopted. In 6.9, these mappings from the DR-core relations to
the schemes in different frameworks are provided.
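As a rough illustration of how a flat relation label can be projected onto framework-specific groupings, the sketch below encodes only the four groupings mentioned in this subclause. The dictionary layout and the function name `group_of` are assumptions for illustration; the actual mappings are those given in 6.9.

```python
# Only the groupings named in the text are listed; everything else is left unmapped.
GROUPINGS = {
    "PDTB": {"Concession": "Comparison", "Contrast": "Comparison"},
    "CCR": {"Concession": "Negative Causal", "Contrast": "Negative Additive"},
}


def group_of(framework, relation):
    """Return the group a flat relation falls under in a framework, or None if unmapped."""
    return GROUPINGS.get(framework, {}).get(relation)


# The same flat label lands in different groups depending on the target framework.
concession_groups = {fw: group_of(fw, "Concession") for fw in GROUPINGS}
```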
4.6 Inference of multiple relations between two segments
Among the various frameworks, the PDTB is unique in allowing multiple relations to be inferred
between two given situations. The connective “since”, for example, can have both temporal and causal
interpretations, as in Example 11.
Example 11 MiniScribe has been on the rocks since it disclosed earlier this year that its earnings
reports for 1988 weren’t accurate.
The DR-core specification provides for representing multiple relations inferred between two given
situations, both when the relations are realized explicitly and when they are realized implicitly.
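Because each relation is annotated independently, several relations over the same pair of situations can simply coexist as separate records. The sketch below shows one way to group them, assuming a hypothetical tuple layout and span identifiers; the label "Temporal" is a placeholder for a temporal relation and is not claimed to be a DR-core relation name.

```python
from collections import defaultdict

# (relation label, Arg1 span id, Arg2 span id); ids "s1" and "s2" are illustrative
# stand-ins for the two situations linked by "since" in Example 11.
annotations = [
    ("Cause", "s1", "s2"),
    ("Temporal", "s1", "s2"),  # placeholder label for the temporal reading
]


def relations_by_pair(annots):
    """Group relation labels by the unordered pair of situations they connect."""
    grouped = defaultdict(list)
    for label, arg1, arg2 in annots:
        # An unordered key is used because Arg1/Arg2 assignment is relation-specific.
        grouped[frozenset((arg1, arg2))].append(label)
    return dict(grouped)


by_pair = relations_by_pair(annotations)
```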
4.7 Representation of (a)symmetry of relations
Whether a discourse relation is symmetric or asymmetric is a distinction embodied in the
representation of all frameworks. That is, given a relation REL and its arguments A and B, all frameworks
distinguish whether or not (REL, A, B) is equivalent to (REL, B, A). For example, the Contrast relation is
taken to be symmetric whereas the Cause relation is considered asymmetric. Where frameworks differ
is in how this distinction is captured in the scheme. Most classifications, such as RST, CCR, HTDC and
PDTB, encode asymmetry in terms of the textual linear ordering and/or the syntax of the argument
realizations. Thus, in the CCR classification, where the argument span ordering is one of the basic
“cognitive” primitives underlying the scheme, the relation Cause-Consequence captures the “basic”
order for the semantic causal relation, with the cause appearing before the effect, whereas the relation
Consequence-Cause captures the “non-basic” order, with the effect appearing before the cause. In the
PDTB, argument spans are first named as Arg1 and Arg2 according to syntactic criteria, including
syntactic dependency and linear order, and the asymmetrical relations are then defined in terms of
the Arg1 and Arg2 labels (for example, in Cause:Reason, Arg2 is the cause and Arg1 the effect, while in
Cause:Result, Arg1 is the cause and Arg2 is the effect). GraphBank, on the other hand, utilizes a different
mechanism to capture the asymmetry. Rather than making reference to linear order, it makes use of
directed arcs in the annotation, with definitions provided for how to interpret the directionality for
each relation type (for example, for the relation Cause-Effect, the arc is directed from the span stating
the cause to the span stating the effect; for the relation Violated Expectation, the arc is directed from
the span stating the cause to the span stating the absent effect; and so on).
In the DR-core specification, representation of asymmetry abstracts over the linear ordering and
syntactic structure, not only because these are not semantic in nature but also because they may not be
good criteria from the viewpoint of interoperability, given the wide variation in cross-linguistic syntax,
including clause-combination. Instead, asymmetry is represented by specifying the argument roles in
the definition of each relation. Arguments are named Arg1 and Arg2, but they bear relation-specific
semantic roles. For example, in the Cause relation, defined as “Arg2 serves as an explanation for Arg1”
(see Table 1), the text span named Arg2 always provides the reason in the Cause relation, irrespective
of linear order or syntax, and Arg1 always constitutes the result. For human annotators, mnemonic
labels indicating the semantic roles, like “reason” and “result”, are more convenient than “Arg2” and
“Arg1”; the ISO specification therefore also allows the use of these semantic role labels. Table 2 provides
the mapping between Arg1 and Arg2 labels and the corresponding semantic role labels for asymmetric
relations. In symmetric relations, on the other hand, where both arguments play the same semantic role,
arguments are named Arg1 and Arg2 following their linear order in the text.
It is important to note that this representation can be effectively mapped to other schemes for
representing asymmetry and in no way obfuscates the differences in linear ordering of the arguments,
which can be easily determined by pairing the argument roles with the text span annotations. The
ISO scheme acknowledges that linear ordering has a bearing on claims that different versions of an
asymmetric relation may not have the same linguistic constraints, for example, with respect to
linguistic predictions for the following discourse [3].
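The separation of semantic roles from linear order can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `Span`, `Relation`, `role_of` and `linear_order` are hypothetical, character offsets are only one possible way of anchoring text spans, and only the Cause entry of the role mapping is taken from the text (Table 2 holds the full mapping).

```python
from dataclasses import dataclass

# Only the Cause row follows from the text above.
ROLE_LABELS = {"Cause": {"Arg1": "result", "Arg2": "reason"}}


@dataclass(frozen=True)
class Span:
    start: int  # character offset of the first character
    end: int    # character offset one past the last character


@dataclass(frozen=True)
class Relation:
    rel_type: str
    arg1: Span
    arg2: Span


def role_of(rel, arg_name):
    """Look up the relation-specific semantic role of Arg1 or Arg2."""
    return ROLE_LABELS[rel.rel_type][arg_name]


def linear_order(rel):
    """Recover the textual order of the arguments from their span offsets."""
    return ["Arg1", "Arg2"] if rel.arg1.start <= rel.arg2.start else ["Arg2", "Arg1"]


# EXAMPLE 1 of 3.3: the result is stated first, the reason second.
s1 = "Peter came late to the meeting."
s2 = "He had been in a traffic jam."
text = s1 + " " + s2
result_first = Relation("Cause", arg1=Span(0, len(s1)), arg2=Span(text.index(s2), len(text)))

# Hypothetical offsets for a text in which the reason precedes the result.
reason_first = Relation("Cause", arg1=Span(40, 60), arg2=Span(0, 39))
```

In both relations Arg2 carries the role "reason"; the difference in linear order is read off the span annotations rather than encoded in the relation label.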
4.8 Representation of the relative importance of arguments for discourse
meaning/structure
Beyond the representation of asymmetry, some frameworks, namely RST, HTDC, and SDRT, also
explicitly represent the “relative importance” of DRel arguments, taking this relative importance to
impact the meaning or structure of the text as a whole. In RST, one argument of an asymmetric relation
is labelled the “nucleus” whereas the other is labelled “satellite”, based on the following criteria [40]: (a)
The nucleus is more essential to the writer’s purpose than the satellite; (b) In comparison to the nucleus,
the satellite is more easily substitutable without much change to the apparent function of the text (or
discourse) as a whole, and (c) Without the nucleus, the content of the satellite is incomprehensible (in the
text as a whole), a non sequitur. HTDC has a similar approach, using the term “dominance”, with the goal
of deriving a single assertion from a discourse relation connecting two segments, and distinguishing
relations in terms of how this single assertion should be derived. In subordinating relations, in particular,
the assertion associated with the relation is obtained from the “dominant” segment, as specified in the
relation definitions. SDRT, on the other hand, classifies a relation as “subordinating” or “coordinating”,
depending on what structural configuration the arguments create in the discourse graph [4]. In the
DR-core specification, the relative importance of arguments for the text (meaning or structure) as a whole is
not represented directly. However, because of the explicit identification of the roles of arguments in each
relation definition (as described in 4.7), a layer of representation capturing the arguments’ relative
importance can be easily derived. For example, a mapping from ISO categories to RST categories for
Cause would label the Arg2 (corresponding to the reason) argument as the satellite and the Arg1
(corresponding to the result) argument as the nucleus, because there is a one-to-one mapping in RST
between the semantic roles of arguments and their respective functional roles for relative importance,
for each relation. A similar mapping can be shown for SDRT relations as well.
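The derivation of a relative-importance layer from argument roles can be sketched as a two-step lookup. The table and function names below are assumptions for illustration, and only the Cause entries are grounded in the text; other relations would need their own rows.

```python
# Role labels for Cause as described in 4.7.
ROLE_LABELS = {"Cause": {"Arg1": "result", "Arg2": "reason"}}

# RST status per semantic role; only the Cause row follows from the text above.
RST_STATUS = {"Cause": {"reason": "satellite", "result": "nucleus"}}


def rst_status(rel_type, arg_name):
    """Map an ISO argument name to its RST nucleus/satellite status via its semantic role."""
    role = ROLE_LABELS[rel_type][arg_name]
    return RST_STATUS[rel_type][role]


cause_layer = {arg: rst_status("Cause", arg) for arg in ("Arg1", "Arg2")}
```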
4.9 Arity of arguments
Except for RST, all frameworks assume that a discourse relation has two and only two arguments. In RST,
the constraints on the number of arguments for a relation are captured via multinuclear relations;
the relations Joint and Sequence (among others) allow for more than two arguments. In the DR-core
specification, a discourse relation is restricted to two and only two arguments, with the understanding
that a mapping from binary relations to n-ary relations is possible where necessary. For example, two
identical binary relations with shared arguments, R(A, B) and R(B, C), can be collapsed into a single
ternary relation R(A, B, C), if the given framework allows for the relation R to be n-ary.
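The collapse of chained binary relations into an n-ary relation can be sketched as below. The function name `collapse_chains`, the tuple layout and the assumption that relations are listed in chain order are choices made for this illustration; the label "Sequence" stands in for R as a relation that the target framework treats as n-ary.

```python
def collapse_chains(relations, nary_types):
    """Merge R(A, B), R(B, C) into R(A, B, C) for relation types that may be n-ary.

    relations: list of (rel_type, arg1, arg2) tuples, assumed to be in chain order.
    nary_types: set of relation types the target framework allows to be n-ary.
    """
    merged = []
    for rel_type, first, second in relations:
        last = merged[-1] if merged else None
        # Extend the previous relation only if it has the same type and its final
        # argument is the first argument of the current relation.
        if (last is not None and rel_type in nary_types
                and last[0] == rel_type and last[-1] == first):
            merged[-1] = last + (second,)
        else:
            merged.append((rel_type, first, second))
    return merged


binary = [("Sequence", "A", "B"), ("Sequence", "B", "C")]
collapsed = collapse_chains(binary, {"Sequence"})
```

If the target framework does not allow the relation to be n-ary, passing an empty set leaves the binary relations unchanged.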
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
Three important considerations for annotating the arguments of a discourse relation are the following.
The first has to do with the kinds of syntactic forms the realization of an argument can have. That is,
what are the minimal allowable syntactic units corresponding to an argument? While all frameworks
agree that the typical syntactic realization of an argument is a “clause”, some allow for certain non-
clausal phrases as well. In the end, the differences emerge because of different views on the information
status of different syntactic fo
...
SLOVENIAN STANDARD
1 September 2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8:
Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie
8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
35.240.30 IT applications in information, documentation and publishing
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
INTERNATIONAL STANDARD
ISO 24617-8
First edition
2016-12-15
Language resource management —
Semantic annotation framework
(SemAF) —
Part 8:
Semantic relations in discourse, core
annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique
(SemAF) —
Partie 8: Relations sémantiques dans le discours, schéma d’annotation
de base (DR-core)
Reference number
© ISO 2016
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2016 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization [39][41], complex question answering [74], natural language generation [19][47][56],
machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A
recent overview [76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory
(SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and
the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks that will benefit the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows for the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 7).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
4 Basic concepts and metamodel
4.1 Overview
In a discourse, which comes into play when communication involves a sequence of clauses or sentences
in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Such relations, for example Cause, Contrast, and Condition, contribute to what is called
the “coherence” of the discourse, and they can be “realized” explicitly, by means of certain words and
phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis
of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized
with expressions from different syntactic classes. In Example 1, a subordinating conjunction “because”
is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the
buying event mentioned in its matrix clause. In Example 2, an adverb “as a result” is used to relate
two sentences to express the consequence of not seeing many signs about growth coming to a halt. In
Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,
but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction
or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two
sentences, with the second sentence offering an explanation for why some (investors) have raised their
cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse
context needs to be used together with, cohesive devices and world knowledge to get at the relation.
[44]
Often, when such relations are inferred, it is possible to insert a connective phrase to express the
relation, as shown here with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes as well as those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC) 1) by Hobbs,[18] Rhetorical Structure Theory (RST) by Mann
and Thompson,[40] the Cognitive Approach to Coherence Relations (CCR) by Sanders and others,[66]
Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides[3] and the annotation
framework of the Penn Discourse Treebank (PDTB).[59][61] The comparison highlights and discusses
the main issues that are considered relevant for developing the pivot representation in DR-core. For
each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends
with a summary of the key features of the DR-core specification, and the DR-core metamodel.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank,[10] based on the Rhetorical Structure Theory,[40] assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank,[78] based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus[64]
and the ANNODIS corpus,[1] based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB,[59] based loosely on a lexicalized approach to discourse
relations and structure (DLTAG[16][75]), and DiscAn,[65] based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far,
appear elsewhere in the literature.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the relation’s arguments, or in
“intentional” terms, i.e. in terms of the intentions of the speaker/writer (W) and intended effects on
the hearer/reader (R). While SDRT, HTDC, PDTB and CCR describe the meaning in informational terms,
RST provides definitions in intentional terms. For instance, Example 5 shows the definition for the
(non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), while Example
6 presents the definition for the same relation in HTDC (where it is called Explanation).
Example 5 Non-Volitional Cause (RST)
Constraints on N: presents a situation that is not a volitional action
Constraints on the N + S combination: S presents a situation that, by means other than motivating a
volitional action, caused the situation presented in N; without the presentation of S, R might not know
the particular cause of the situation; a presentation of N is more central than S to W’s purposes in
putting forth the N-S combination
The effect: R recognizes the situation presented in S as a cause of the situation presented in N
Locus of the effect: N and S.
Example 6 Explanation (HTDC)
Infer that the state/event asserted by S1 causes or could cause the state/event asserted by S0.
Despite the different ways of describing DRel semantics, it is important to note that in many cases, the
differences lie in the “level” at which the relation is described, especially when the situations being
related are the same. Thus, for example, a DRel defined in informational terms in one framework can be
effectively mapped to a DRel in another framework where it may be defined in intentional terms. With
this in mind, DRel meaning in the DR-core specification is described in “informational” terms, but in 6.9,
a mapping is provided from the core relation types (presented in Clause 5) to the relations present in
existing classifications, including those that define relations in intentional terms.
4.4 Pragmatic variants of discourse relations
With the exception of HTDC, all frameworks also distinguish relations when one or both of the
arguments involve an implicit belief or a dialogue act 2) that takes scope over the semantic content of
the argument. The motivation for this distinction comes from examples like Example 7, where it should
not be inferred that John’s sending of the message somehow led to him being absent from work, but
rather that it causes the speaker/writer to believe that John is not at work. In other words, the meaning
of the subordinate clause provides evidence supporting the claim made by the main clause. Similarly, in
Example 8, the inference should be made that the explanation is being provided not for the content of
the question but for the (dialogue) act of questioning itself.
2) The concept of a dialogue act, as used in ISO 24617-2, can be seen as an empirically based and computationally
well-defined interpretation of the traditional notion of a “speech act”.
Example 7 John is not at work today, because he sent me a message to say he was sick.
Example 8 What are you doing tonight? Because there’s a good movie on.
This kind of distinction has been given various names in the literature, for example the
“semantic-pragmatic” distinction,[73][66][46] the “internal-external” distinction,[17][44] the
“ideational-pragmatic” distinction[63] and the “content-metatalk” distinction.[37] In other cases, such
as in RST, the distinction,
while not being explicitly named, is evidently taken into account in the classification (e.g. Cause vs.
Evidence/Justify in RST distinguishes the semantic and pragmatic interpretations, respectively). What
is difficult to reconcile about the treatment of this distinction across the various frameworks is that
while some, like CCR, allow for it for all relation types, others, like the PDTB and RST, only admit it
for some relations (e.g. Cause, Condition, Contrast, Concession in PDTB). It must be noted, however,
that there does not seem to be any a priori reason for restricting the distinction to only some relation
types; the choice appears to result from what was observed in the corpus that was analysed
and/or annotated. In DR-core, the “semantic-pragmatic” distinction is allowed for all relation types, with
the general aim of not being overly restrictive in the absence of well-defined criteria. At the same time,
the scheme does not encode this distinction on the relation, but rather on the arguments of the relation,
the main reason being that in all cases involving either a belief or a dialogue act, what is different is
not the relation, but rather the semantic status of the arguments. A further motivation comes from
recognizing that representing the distinction on the relation would not distinguish cases where the
belief or dialogue act is implicit (as in Examples 7 and 8) from those where they are made explicit with
performative verbs or propositional attitude verbs, as in Examples 9 and 10. Pragmatic interpretations
are therefore represented on arguments using a feature indicating the argument to be of the type
“belief” or the type “dialogue act”. Note that in cases exemplified by Examples 9 and 10 the belief or
dialogue act aspect of the meaning is entirely obtained from the explicit content of the arguments,
rather than from a contextually motivated inference.
Example 9 I believe John is not at work today because he sent me a message to say he was sick.
Example 10 I’m asking you what you are doing tonight because there’s a good movie on.
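The choice described above, keeping the relation type fixed and marking the pragmatic reading on the argument, can be pictured with a minimal Python sketch. This is an illustration only and not the DRelML representation defined by this document: the class and field names (ArgType, Argument, Relation, arg_type) and the default label "situation" are assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class ArgType(Enum):
    # "belief" and "dialogue act" are the two types named in the text;
    # "situation" is a hypothetical default for plain semantic content.
    SITUATION = "situation"
    BELIEF = "belief"
    DIALOGUE_ACT = "dialogue act"


@dataclass(frozen=True)
class Argument:
    text: str
    arg_type: ArgType = ArgType.SITUATION


@dataclass(frozen=True)
class Relation:
    rel_type: str
    arg1: Argument  # for Cause: the result
    arg2: Argument  # for Cause: the reason


# Example 7: the relation is still Cause; only Arg1 is typed as a belief.
example7 = Relation(
    "Cause",
    arg1=Argument("John is not at work today", ArgType.BELIEF),
    arg2=Argument("he sent me a message to say he was sick"),
)

# Example 8: the reason is given for the act of asking, so Arg1 is a dialogue act.
example8 = Relation(
    "Cause",
    arg1=Argument("What are you doing tonight?", ArgType.DIALOGUE_ACT),
    arg2=Argument("there's a good movie on"),
)
```

Under this sketch, Examples 9 and 10 would carry the same relation type, with the belief or dialogue act read directly from the explicit content of the argument text.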
4.5 Hierarchical classification of discourse relations
In all existing frameworks, discourse relations are grouped together semantically to a greater or lesser
degree; where they differ is in how the groupings are done. For example, while PDTB groups Concession
together with Contrast under the broader Comparison class, CCR places Concession under the Negative
Causal relation group, while placing Contrast under the Negative Additive group. Reconciliation with
respect to these groupings is not possible, since they stem from basic differences in what is taken to
count as semantic closeness. The solution adopted in the DR-core specification is to use a “flat” set of core
relations that can be used in an annotation scheme as just that, or mapped to the appropriate type
within a particular hierarchical scheme adopted. In 6.9, these mappings from the DR-core relations to
the schemes in different frameworks are provided.
4.6 Inference of multiple relations between two segments
Among the various frameworks, the PDTB is unique in allowing multiple relations to be inferred
between two given situations. The connective “since”, for example, can have both temporal and causal
interpretations, as in Example 11.
Example 11 MiniScribe has been on the rocks since it disclosed earlier this year that its earnings
reports for 1988 weren’t accurate.
The DR-core specification provides for representing multiple relations inferred between two given
situations, whether the relations are realized explicitly or implicitly.
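As a rough illustration of what recording multiple relations between the same two situations could look like, the following Python sketch stores each relation as an independent annotation and then groups annotations by the pair of situations they connect. The span identifiers "s1" and "s2", the Annotation tuple, and the helper relations_by_pair are hypothetical; the label "Temporal" is a placeholder for the temporal reading of “since” and is not claimed to be a DR-core category name.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    rel_type: str               # relation label
    arg1: str                   # id of one argument span (hypothetical ids)
    arg2: str                   # id of the other argument span
    connective: Optional[str]   # None would mark an implicit relation


# Example 11: "since" supports both a causal and a temporal reading.
# s1 = "MiniScribe has been on the rocks", s2 = "it disclosed earlier this year ..."
annotations = [
    Annotation("Cause", "s1", "s2", "since"),
    Annotation("Temporal", "s1", "s2", "since"),
]


def relations_by_pair(annos):
    """Group relation labels by the unordered pair of situations they link."""
    grouped = defaultdict(list)
    for anno in annos:
        grouped[frozenset((anno.arg1, anno.arg2))].append(anno.rel_type)
    return dict(grouped)
```

The pair is treated as unordered in this sketch because the role each argument plays is defined per relation, so two relations over the same situations need not assign Arg1 and Arg2 in the same way.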
4.7 Representation of (a)symmetry of relations
Whether a discourse relation is symmetric or asymmetric is a distinction embodied in the
representation of all frameworks. That is, given a relation REL and its arguments A and B, all frameworks
distinguish whether or not (REL, A, B) is equivalent to (REL, B, A). For example, the Contrast relation is
taken to be symmetric whereas the Cause relation is considered asymmetric. Where frameworks differ
is in how this distinction is captured in the scheme. Most classifications, such as RST, CCR, HTDC and
PDTB, encode asymmetry in terms of the textual linear ordering and/or the syntax of the argument
realizations. Thus, in the CCR classification, where the argument span ordering is one of the basic
“cognitive” primitives underlying the scheme, the relation Cause-Consequence captures the “basic”
order for the semantic causal relation, with the cause appearing before the effect, whereas the relation
Consequence-Cause captures the “non-basic” order, with the effect appearing before the cause. In the
PDTB, argument spans are first named as Arg1 and Arg2 according to syntactic criteria, including
syntactic dependency and linear order, and the asymmetrical relations are then defined in terms of
the Arg1 and Arg2 labels (for example, in Cause:Reason, Arg2 is the cause and Arg1 the effect, while in
Cause:Result, Arg1 is the cause and Arg2 is the effect). GraphBank, on the other hand, utilizes a different
mechanism to capture the asymmetry. Rather than making reference to linear order, it makes use of
directed arcs in the annotation, with definitions provided for how to interpret the directionality for
each relation type (for example, for the relation Cause-Effect, the arc is directed from the span stating
the cause to the span stating the effect; for the relation Violated Expectation, the arc is directed from
the span stating the cause to the span stating the absent effect; and so on).
In the DR-core specification, representation of asymmetry abstracts over the linear ordering and
syntactic structure, not only because these are not semantic in nature but also because they may not be
good criteria from the viewpoint of interoperability, given the wide variation in cross-linguistic syntax,
including clause-combination. Instead, asymmetry is represented by specifying the argument roles in
the definition of each relation. Arguments are named Arg1 and Arg2, but they bear relation-specific
semantic roles. For example, in the Cause relation, defined as “Arg2 serves as an explanation for Arg1”
(see Table 1), the text span named Arg2 always provides the reason in the Cause relation, irrespective
of linear order or syntax, and Arg1 always constitutes the result. For human annotators, mnemonic
labels indicating the semantic roles, like “reason” and “result”, are more convenient than “Arg2” and
“Arg1”; the ISO specification therefore also allows the use of these semantic role labels. Table 2 provides
the mapping between Arg1 and Arg2 labels and the corresponding semantic role labels for asymmetric
relations. In symmetric relations, on the other hand, where both arguments play the same semantic role,
arguments are named Arg1 and Arg2 following their linear order in the text.
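The separation between semantic roles and linear order can be sketched in Python as follows. The sketch assumes character-offset spans; the names Span, span_of and reason_precedes_result are hypothetical, and the second sentence is a constructed variant that does not come from this document. Only the role assignment for Cause (Arg1 as result, Arg2 as reason) is taken from the text.

```python
from dataclasses import dataclass

# Role labels for Cause as stated in the text: Arg1 is the result, Arg2 the reason.
ROLE_LABELS = {"Cause": {"Arg1": "result", "Arg2": "reason"}}


@dataclass(frozen=True)
class Span:
    start: int  # character offset of the first character
    end: int    # character offset one past the last character


def span_of(text, fragment):
    """Locate a fragment in the text (hypothetical helper for this sketch)."""
    start = text.index(fragment)
    return Span(start, start + len(fragment))


def reason_precedes_result(arg1, arg2):
    """Linear order is read off the spans, not off the Arg1/Arg2 names."""
    return arg2.start < arg1.start


# Example 1 from 3.3: the result is stated first, the reason second.
text_a = "Peter came late to the meeting. He had been in a traffic jam."
result_a = span_of(text_a, "Peter came late to the meeting.")
reason_a = span_of(text_a, "He had been in a traffic jam.")

# Constructed variant with the clauses reversed (not taken from the standard).
text_b = "Peter had been in a traffic jam, so he came late to the meeting."
reason_b = span_of(text_b, "Peter had been in a traffic jam")
result_b = span_of(text_b, "he came late to the meeting")
```

In both sentences the reason would be annotated as Arg2 and the result as Arg1; the difference in clause order shows up only in the span offsets.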
It is important to note that this representation can be effectively mapped to other schemes for
representing asymmetry and in no way obfuscates the differences in linear ordering of the arguments,
which can be easily determined by pairing the argument roles with the text span annotations. The
ISO scheme acknowledges that linear ordering has a bearing on claims that different versions of an
asymmetric relation may not have the same linguistic constraints, for example, with respect to
linguistic predictions for the following discourse.[3]
4.8 Representation of the relative importance of arguments for discourse
meaning/structure
Beyond the representation of asymmetry, some frameworks, namely RST, HTDC, and SDRT, also
explicitly represent the “relative importance” of DRel arguments, taking this relative importance to
impact the meaning or structure of the text as a whole. In RST, one argument of an asymmetric relation
is labelled the “nucleus” whereas the other is labelled “satellite”, based on the following criteria:[40] (a)
The nucleus is more essential to the writer’s purpose than the satellite; (b) In comparison to the nucleus,
the satellite is more easily substitutable without much change to the apparent function of the text (or
discourse) as a whole, and (c) Without the nucleus, the content of the satellite is incomprehensible (in the
text as a whole), a non sequitur. HTDC has a similar approach, using the term “dominance”, with the goal
of deriving a single assertion from a discourse relation connecting two segments, and distinguishing
relations in terms of how this single assertion should be derived. In subordinating relations, in particular,
the assertion associated with the relation is obtained from the “dominant” segment, as specified in the
relation definitions. SDRT, on the other hand, classifies a relation as “subordinating” or “coordinating”,
depending on what structural configuration the arguments create in the discourse graph.[4] In the
DR-core specification, the relative importance of arguments for the text (meaning or structure) as a whole is
not represented directly. However, because of the explicit identification of the roles of arguments in each
relation definition (as described in 4.7), a layer of representation capturing the arguments’ relative
importance can be easily derived. For example, a mapping from ISO categories to RST categories for
Cause would label the Arg2 (corresponding to the reason) argument as the satellite and the Arg1
(corresponding to the result) argument as the nucleus, because there is a one-to-one mapping in RST
between the semantic roles of arguments and their respective functional roles for relative importance,
for each relation. A similar mapping can be shown for SDRT relations as well.
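The derivation described above amounts to a per-relation lookup from argument names to importance labels. A minimal Python sketch, covering only the Cause case given in the text, might look like this; the table RST_STATUS and the function to_rst are hypothetical names, and entries for other relations are deliberately left out rather than guessed.

```python
# Mapping for Cause only, following the text: Arg1 (result) is the nucleus,
# Arg2 (reason) is the satellite. Other relations would need their own entries.
RST_STATUS = {"Cause": {"Arg1": "nucleus", "Arg2": "satellite"}}


def to_rst(rel_type, arg1, arg2):
    """Return the two arguments keyed by their RST status."""
    status = RST_STATUS[rel_type]  # raises KeyError for unmapped relations
    return {status["Arg1"]: arg1, status["Arg2"]: arg2}


mapped = to_rst(
    "Cause",
    arg1="Peter came late to the meeting.",  # result
    arg2="He had been in a traffic jam.",    # reason
)
```

Because the lookup is keyed on roles and not on text position, the same table applies whichever argument comes first in the text.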
4.9 Arity of arguments
Except for RST, all frameworks assume that a discourse relation has two and only two arguments. In RST,
the constraints on the number of arguments for a relation are captured via multinuclear relations;
the relations Joint and Sequence (among others) allow for more than two arguments. In the DR-core
specification, a discourse relation is restricted to two and only two arguments, with the understanding
that a mapping from binary relations to n-ary relations is possible where necessary. For example, two
identical binary relations with shared arguments, R(A, B) and R(B, C), can be collapsed into a single
ternary relation R(A, B, C), if the given framework allows for the relation R to be n-ary.
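The collapsing step can be sketched in Python as below. This is one possible reading of the mapping and not a procedure defined by this document: the function name collapse_chains, the triple format, and the nary_types parameter (the set of relation types that the target framework treats as n-ary) are assumptions made for the illustration.

```python
def collapse_chains(relations, nary_types):
    """Merge R(A, B) and R(B, C) into R(A, B, C) for relation types in nary_types.

    relations: list of (rel_type, arg1, arg2) triples.
    Returns a list of (rel_type, args) pairs, where args is a tuple.
    """
    merged = [(rel, [a, b]) for rel, a, b in relations]
    changed = True
    while changed:
        changed = False
        for i, (rel_i, args_i) in enumerate(merged):
            if rel_i not in nary_types:
                continue
            for j, (rel_j, args_j) in enumerate(merged):
                # Chain two relations of the same type that share an argument.
                if i != j and rel_i == rel_j and args_i[-1] == args_j[0]:
                    merged[i] = (rel_i, args_i + args_j[1:])
                    del merged[j]
                    changed = True
                    break
            if changed:
                break  # restart the scan after modifying the list
    return [(rel, tuple(args)) for rel, args in merged]


binary = [("Sequence", "A", "B"), ("Sequence", "B", "C"), ("Cause", "C", "D")]
collapsed = collapse_chains(binary, nary_types={"Sequence"})
```

With Sequence declared n-ary, the two Sequence relations become one ternary relation, while the Cause relation stays binary even though it shares the argument C.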
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
Three important considerations for annotating the arguments of a discourse relation are the following.
The first has to do with the kinds of syntactic forms the realization of an argument can have. That is,
what are the minimal allowable syntactic units corresponding to an argument? While all frameworks
agree that the typical syntactic realization of an argument is a “clause”, some allow for certain non-
clausal phrases as well. In the end, the differences emerge because of different views on the information
status of different syntactic forms in discourse and their relevance to discourse
...
INTERNATIONAL STANDARD
ISO 24617-8
First edition
2016-12-15
Language resource management —
Semantic annotation framework
(SemAF) —
Part 8:
Semantic relations in discourse, core
annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique
(SemAF) —
Partie 8: Relations sémantiques dans le discours, schéma d’annotation
de base (DR-core)
Reference number
© ISO 2016
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena
in support of empirical natural language research, both computational and theoretical. At the level of
discourse, interest in discourse processing has led to the development of several corpora annotated for
discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are
relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key
to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.
Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such
as summarization,[39][41] complex question answering,[74] natural language generation,[19][47][56]
machine translation,[42] opinion mining and sentiment analysis,[11][12] and information retrieval.[38] A
recent overview[76] includes a description of the state of the art in discourse and computation. Several
international and collaborative efforts have resulted in annotated resources of discourse relations,
across languages as well as genres, to support the development of such applications.
Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of
which concerns the representation of discourse structure, while the other has to do with the semantic
classification of discourse relations. As a result, annotations constructed using one framework are not
easily interpreted in another framework, and annotated resources are limited in their interoperability.
Notwithstanding their differences, however, there are strong compatibilities between them that can be
clarified and used as the basis for mappings and comparisons between the resources, as well as for use
as a basis for future annotation.
In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,
states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,
temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence
relations”. Although discourse relations hold most prominently between the meanings of successive
sentences or utterances in a discourse, they may also occur between the meanings of smaller or
larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between
situations that are not explicitly described but that can be inferred.
This document aims to specify an interoperable approach to the annotation of local semantic relations
in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also
Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects
the view that strong underlying compatibilities with respect to the semantic description of discourse
relations can be observed in the various discourse relation frameworks being used to support data
annotation, e.g. Rhetorical Structure Theory (RST),[40] Segmented Discourse Representation Theory
(SDRT),[3] the Penn Discourse Treebank,[59] Hobbs’ Theory of Discourse Coherence (HTDC)[17][18] and
the Cognitive Approach to Coherence Relations (CCR).[66] This document aims to provide an explanation
of these compatibilities and a loose mapping between definitions of individual discourse relations, as
specified in the different frameworks, for the benefit of the community as a whole.
The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;
(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard
annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions
of a set of “core” discourse relations which are commonly found in some form in many existing discourse
relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.
This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at
providing an open, extensible set of core relations. The core annotation scheme also discusses certain
issues in discourse relation annotation that it leaves open, as they require further study in collaboration
with other efforts in multilingual discourse annotation, in particular the European COST action
TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a
complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension
of the standard. The issues to be taken up for this complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope
This document establishes the representation and annotation of local, “low-level” discourse relations
between situations mentioned in discourse, where each relation is annotated independently of other
relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse
relations, many of which have similar definitions in different frameworks. To the extent possible, this
document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization,
and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit
semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate
mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable
with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating
local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is
well understood and can be clearly defined; (b) further extensions to represent higher-level, global
discourse structure are possible where desired; and (c) it allows for the resulting annotations to be
compatible across frameworks, even when they are based on different theories of discourse structure.
As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard
aims to be transparent in its relation to existing frameworks for discourse relation annotation, but
also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive
discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act
annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1
(time and events); still other discourse relations are very similar to certain predicate-argument
relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various
parts are required to form a consistent whole, this document pays special attention to the interactions
of discourse relation annotation and other semantic annotation schemes (see Clause 8).
This document does not consider global, higher-level discourse structure representation which involves
linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,
presentational relations, which concern the way in which a text is presented to its readers or the way in
which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically
simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse
segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two
sentences are implicitly related through the discourse relation Cause.
EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the
two clauses are related by the discourse relation Concession, expressed by the connective “but”.
EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal
relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an
eventuality, but a proposition, formed by an event description with negative polarity.
Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence
relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word
discourse connectives include “as well as”, “such as”.
Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal
conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse
relation and its arguments, without further specifying any links or dependencies across these local
structures
2 © ISO 2016 – All rights reserved
4 Basic concepts and metamodel
4.1 Overview
In a discourse, that is, when communication involves a sequence of clauses or sentences
in a text, or utterances in a dialogue, a major aspect of understanding comes from how the events,
states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.
Such relations, for example Cause, Contrast, and Condition, contribute to what is called
the “coherence” of the discourse. They can be “realized” explicitly, by means of certain words and
phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis
of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized
with expressions from different syntactic classes. In Example 1, the subordinating conjunction “because”
presents a situation (here, the meaning of the subordinate clause) as the reason for the
buying event mentioned in its matrix clause. In Example 2, the adverbial “as a result” relates
two sentences to express the consequence of not seeing many signs about growth coming to a halt. In
Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,
but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction
or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two
sentences, with the second sentence offering an explanation for why some (investors) have raised their
cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse
context needs to be used together with cohesive devices and world knowledge to get at the relation [44].
Often, when such relations are inferred, it is possible to insert a connective phrase to express the
relation, as shown in Example 4 with the insertion of “because”. In this document, the term “connective” is used
in a broad sense, to refer to any word or phrase used to express a discourse relation, including both
those drawn from well-defined syntactic classes and those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares
because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had
recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.
As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund
managers said. A major reason is that investors already have sharply scaled back their purchases
of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash
positions help buffer a fund when the market falls.
Existing frameworks for describing and representing discourse relations differ along several lines.
The remainder of this clause provides a comparison of the most important frameworks, focusing on
those that have been used as the basis for annotating discourse relations in corpora, in particular the
Theory of Discourse Coherence (HTDC) 1) by Hobbs [18], Rhetorical Structure Theory (RST) by Mann
and Thompson [40], the Cognitive Approach of Coherence Relations (CCR) by Sanders and others [66],
Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation
framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses
the main issues that are considered relevant for developing the pivot representation in DR-core. For
each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends
with a summary of the key features of the DR-core specification, and the DR-core metamodel.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse
structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes a
tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based on
HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64]
and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents,
but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to
discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse
relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks,
individual relations along with their arguments are annotated, without being combined with other
relations to form a composite structure encompassing the entire text.
1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far,
appear elsewhere in the literature.
These widely different views about the structural representation for discourse are difficult to reconcile
with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of
discourse relations is adopted, with the idea that individual relations can be more reliably annotated
and that they can be further annotated to project a higher-level tree or graph structure, depending on
one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can
also serve as a pivot representation when comparing annotations of different resources grounded in
different theories.
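The idea of annotating relations independently and only later projecting a larger structure can be illustrated with a minimal Python sketch. It is not part of the specification: the class name `Relation`, the function `project_graph` and the situation identifiers `s1` to `s3` are assumptions made purely for illustration, and the projection shown (a simple adjacency map) is only one of many possible higher-level structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One low-level annotation: a relation type and its two arguments."""
    rel_type: str  # e.g. "Cause"
    arg1: str      # identifier of the situation filling Arg1 (hypothetical ids)
    arg2: str      # identifier of the situation filling Arg2


def project_graph(relations):
    """Project independent relations onto an adjacency map.

    Each key is a situation id; each value lists (relation type, other
    situation id) pairs. No tree or DAG constraint is enforced here.
    """
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.arg1].append((rel.rel_type, rel.arg2))
    return dict(graph)


# The flat list is the low-level annotation; each entry stands on its own.
relations = [
    Relation("Cause", "s1", "s2"),
    Relation("Contrast", "s2", "s3"),
    Relation("Cause", "s1", "s3"),
]
graph = project_graph(relations)
```

The flat list can be exchanged between resources as it is, while a theory-specific tool would be free to impose its own well-formedness constraints (tree, directed acyclic graph, general graph) on the projected structure.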
4.3 Semantic description of discourse relations
A second difference among existing frameworks relates to whether the meaning of a discourse relation
is described in “informational” terms, i.e. in terms of the “meaning” of the relation’s arguments, or in
“intentional” terms, i.e. in terms of the intentions of the speaker/writer (W) and intended effects on
the hearer/reader (R). While SDRT, HTDC, PDTB and CCR describe the meaning in informational terms,
RST provides definitions in intentional terms. For instance, Example 5 shows the definition for the
(non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), while Example
6 presents the definition for the same relation in HTDC (where it is called Explanation).
Example 5 Non-Volitional Cause (RST)
Constraints on N: presents a situation that is not a volitional action
Constraints on the N + S combination: S presents a situation that, by means other than motivating a
volitional action, caused the situation presented in N; without the presentation of S, R might not know
the particular cause of the situation; a presentation of N is more central than S to W’s purposes in
putting forth the N-S combination
The effect: R recognizes the situation presented in S as a cause of the situation presented in N
Locus of the effect: N and S.
Example 6 Explanation (HTDC)
Infer that the state/event asserted by S1 causes or could cause the state/event asserted by S0.
Despite the different ways of describing DRel semantics, it is important to note that in many cases, the
differences lie in the “level” at which the relation is described, especially when the situations being
related are the same. Thus, for example, a DRel defined in informational terms in one framework can be
effectively mapped to a DRel in another framework where it may be defined in intentional terms. With
this in mind, DRel meaning in the DR-core specification is described in “informational” terms, but in 6.9,
a mapping is provided from the core relation types (presented in Clause 5) to the relations present in
existing classifications, including those that define relations in intentional terms.
4.4 Pragmatic variants of discourse relations
With the exception of HTDC, all frameworks also distinguish relations when one or both of the
arguments involve an implicit belief or a dialogue act 2) that takes scope over the semantic content of
the argument. The motivation for this distinction comes from examples like Example 7, where it should
not be inferred that John’s sending of the message somehow led to him being absent from work, but
rather that it causes the speaker/writer to believe that John is not at work. In other words, the meaning
of the subordinate clause provides evidence supporting the claim made by the main clause. Similarly, in
Example 8, the inference should be made that the explanation is being provided not for the content of
the question but for the (dialogue) act of questioning itself.
2) The concept of a dialogue act, as used in ISO 24617-2, can be seen as an empirically based and computationally
well-defined interpretation of the traditional notion of a “speech act”.
Example 7 John is not at work today, because he sent me a message to say he was sick.
Example 8 What are you doing tonight? Because there’s a good movie on.
This kind of distinction has been given various names in the literature, for example the “semantic-pragmatic”
distinction [73][66][46], the “internal-external” distinction [17][44], the “ideational-pragmatic”
distinction [63] and the “content-metatalk” distinction [37]. In other cases, such as in RST, the distinction,
while not being explicitly named, is evidently taken into account in the classification (e.g. Cause vs.
Evidence/Justify in RST distinguishes the semantic and pragmatic interpretations, respectively). What
is difficult to reconcile about the treatment of this distinction across the various frameworks is that
while some, like CCR, allow for it for all relation types, others, like the PDTB and RST, only admit it
for some relations (e.g. Cause, Condition, Contrast, Concession in PDTB). It must be noted, however,
that there does not seem to be any a priori reason for restricting the distinction to only some relation
types; the choice ultimately results from what was observed in the corpus that was analysed
and/or annotated. In DR-core, the “semantic-pragmatic” distinction is allowed for all relation types, with
the general aim of not being overly restrictive in the absence of well-defined criteria. At the same time,
the scheme does not encode this distinction on the relation, but rather on the arguments of the relation,
the main reason being that in all cases involving either a belief or a dialogue act, what is different is
not the relation, but rather the semantic status of the arguments. A further motivation comes from
recognizing that representing the distinction on the relation would not distinguish cases where the
belief or dialogue act is implicit (as in Examples 7 and 8) from those where they are made explicit with
performative verbs or propositional attitude verbs, as in Examples 9 and 10. Pragmatic interpretations
are therefore represented on arguments using a feature indicating the argument to be of the type
“belief” or the type “dialogue act”. Note that in cases exemplified by Examples 9 and 10 the belief or
dialogue act aspect of the meaning is entirely obtained from the explicit content of the arguments,
rather than from a contextually motivated inference.
Example 9 I believe John is not at work today because he sent me a message to say he was sick.
Example 10 I’m asking you what you are doing tonight because there’s a good movie on.
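The choice to encode the pragmatic interpretation on the arguments rather than on the relation can be sketched as follows. This is a hedged illustration only: the names `Argument`, `arg_type`, `Relation` and `is_pragmatic` are hypothetical and are not taken from DRelML, and the default type "event" is simply an assumption for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    text: str
    # Hypothetical feature name; "belief" and "dialogue act" mark the
    # pragmatic interpretations discussed in 4.4.
    arg_type: str = "event"


@dataclass(frozen=True)
class Relation:
    rel_type: str
    arg1: Argument  # for Cause: the result
    arg2: Argument  # for Cause: the reason


def is_pragmatic(rel):
    """True if at least one argument is typed as a belief or a dialogue act."""
    return any(a.arg_type in ("belief", "dialogue act") for a in (rel.arg1, rel.arg2))


# Example 1 in 3.3: both arguments are plain events.
semantic = Relation(
    "Cause",
    Argument("Peter came late to the meeting"),
    Argument("He had been in a traffic jam"),
)

# Example 7: the message is a reason for the writer's belief, not for the absence.
example_7 = Relation(
    "Cause",
    Argument("John is not at work today", arg_type="belief"),
    Argument("he sent me a message to say he was sick"),
)

# Example 8: the reason is given for the act of questioning itself.
example_8 = Relation(
    "Cause",
    Argument("What are you doing tonight?", arg_type="dialogue act"),
    Argument("there's a good movie on"),
)
```

In all three annotations the relation type stays "Cause"; only the typing of the arguments differs, which is the point made in the text above.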
4.5 Hierarchical classification of discourse relations
In all existing frameworks, discourse relations are grouped together semantically to a greater or lesser
degree; where they differ is in how the groupings are done. For example, while PDTB groups Concession
together with Contrast under the broader Comparison class, CCR places Concession under the Negative
Causal relation group, while placing Contrast under the Negative Additive group. Reconciliation with
respect to these groupings is not possible, since they stem from basic differences in what is taken to
count as semantic closeness. The solution adopted in the DR-core specification is to use a “flat” set of core
relations that can be used in an annotation scheme as just that, or mapped to the appropriate type
within a particular hierarchical scheme adopted. In 6.9, these mappings from the DR-core relations to
the schemes in different frameworks are provided.
4.6 Inference of multiple relations between two segments
Among the various frameworks, the PDTB is unique in allowing multiple relations to be inferred
between two given situations. The connective “since”, for example, can have both temporal and causal
interpretations, as in Example 11.
Example 11 MiniScribe has been on the rocks since it disclosed earlier this year that its earnings
reports for 1988 weren’t accurate.
The DR-core specification provides for representing multiple relations inferred between two given
situations, whether the relations are realized explicitly or implicitly.
4.7 Representation of (a)symmetry of relations
Whether a discourse relation is symmetric or asymmetric is a distinction embodied in the
representation of all frameworks. That is, given a relation REL and its arguments A and B, all frameworks
distinguish whether or not (REL, A, B) is equivalent to (REL, B, A). For example, the Contrast relation is
taken to be symmetric whereas the Cause relation is considered asymmetric. Where frameworks differ
is in how this distinction is captured in the scheme. Most classifications, such as RST, CCR, HTDC and
PDTB, encode asymmetry in terms of the textual linear ordering and/or the syntax of the argument
realizations. Thus, in the CCR classification, where the argument span ordering is one of the basic
“cognitive” primitives underlying the scheme, the relation Cause-Consequence captures the “basic”
order for the semantic causal relation, with the cause appearing before the effect, whereas the relation
Consequence-Cause captures the “non-basic” order, with the effect appearing before the cause. In the
PDTB, argument spans are first named as Arg1 and Arg2 according to syntactic criteria, including
syntactic dependency and linear order, and the asymmetrical relations are then defined in terms of
the Arg1 and Arg2 labels (for example, in Cause:Reason, Arg2 is the cause and Arg1 the effect, while in
Cause:Result, Arg1 is the cause and Arg2 is the effect). GraphBank, on the other hand, utilizes a different
mechanism to capture the asymmetry. Rather than making reference to linear order, it makes use of
directed arcs in the annotation, with definitions provided for how to interpret the directionality for
each relation type (for example, for the relation Cause-Effect, the arc is directed from the span stating
the cause to the span stating the effect; for the relation Violated Expectation, the arc is directed from
the span stating the cause to the span stating the absent effect; and so on).
In the DR-core specification, representation of asymmetry abstracts over the linear ordering and
syntactic structure, not only because these are not semantic in nature but also because they may not be
good criteria from the viewpoint of interoperability, given the wide variation in cross-linguistic syntax,
including clause-combination. Instead, asymmetry is represented by specifying the argument roles in
the definition of each relation. Arguments are named Arg1 and Arg2, but they bear relation-specific
semantic roles. For example, in the Cause relation, defined as “Arg2 serves as an explanation for Arg1”
(see Table 1), the text span named Arg2 always provides the reason in the Cause relation, irrespective
of linear order or syntax, and Arg1 always constitutes the result. For human annotators, mnemonic
labels indicating the semantic roles, like “reason” and “result”, are more convenient than “Arg2” and
“Arg1”; the ISO specification therefore also allows the use of these semantic role labels. Table 2 provides
the mapping between Arg1 and Arg2 labels and the corresponding semantic role labels for asymmetric
relations. In symmetric relations, on the other hand, where both arguments play the same semantic role,
arguments are named Arg1 and Arg2 following their linear order in the text.
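As an illustration of how role-based arguments can be paired with text spans to recover an order-based label, the following Python sketch derives the CCR labels mentioned above (Cause-Consequence versus Consequence-Cause) from a DR-core style Cause annotation. The names `Span`, `Cause`, `span_of` and `ccr_label` are assumptions made for this sketch, and character offsets stand in for whatever anchoring mechanism a real annotation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset of the first character
    end: int    # character offset just past the last character


@dataclass(frozen=True)
class Cause:
    arg1: Span  # semantic role "result", whatever its position in the text
    arg2: Span  # semantic role "reason", whatever its position in the text


def span_of(text, fragment):
    """Locate a fragment in the text and return its span (first match only)."""
    start = text.index(fragment)
    return Span(start, start + len(fragment))


def ccr_label(rel):
    """Map a role-based Cause to the order-based CCR label."""
    cause_first = rel.arg2.start < rel.arg1.start
    return "Cause-Consequence" if cause_first else "Consequence-Cause"


# Example 1 in 3.3: the effect is stated before the cause.
text_a = "Peter came late to the meeting. He had been in a traffic jam."
rel_a = Cause(
    arg1=span_of(text_a, "Peter came late to the meeting"),
    arg2=span_of(text_a, "He had been in a traffic jam"),
)

# Example 2 in 4.1: the cause is stated before the effect.
text_b = ("Despite the economic slowdown, there are few clear signs that growth "
          "is coming to a halt. As a result, Fed officials may be divided over "
          "whether to ease credit.")
rel_b = Cause(
    arg1=span_of(text_b, "Fed officials may be divided over whether to ease credit"),
    arg2=span_of(text_b, "there are few clear signs that growth is coming to a halt"),
)
```

Because the roles are fixed by the relation definition, the linear order remains recoverable from the spans alone and no separate order-specific relation type is needed.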
It is important to note that this representation can be effectively mapped to other schemes for
representing asymmetry and in no way obfuscates the differences in linear ordering of the arguments,
which can be easily determined by pairing the argument roles with the text span annotations. The
ISO scheme acknowledges that linear ordering has a bearing on claims that different versions of an
asymmetric relation may not have the same linguistic constraints, for example, with respect to
linguistic predictions for the following discourse [3].
4.8 Representation of the relative importance of arguments for discourse meaning/structure
Beyond the representation of asymmetry, some frameworks, namely RST, HTDC, and SDRT, also
explicitly represent the “relative importance” of DRel arguments, taking this relative importance to
impact the meaning or structure of the text as a whole. In RST, one argument of an asymmetric relation
is labelled the “nucleus” whereas the other is labelled the “satellite”, based on the following criteria [40]: (a)
The nucleus is more essential to the writer’s purpose than the satellite; (b) In comparison to the nucleus,
the satellite is more easily substitutable without much change to the apparent function of the text (or
discourse) as a whole, and (c) Without the nucleus, the content of the satellite is incomprehensible (in the
text as a whole), a non sequitur. HTDC has a similar approach, using the term “dominance”, with the goal
of deriving a single assertion from a discourse relation connecting two segments, and distinguishing
relations in terms of how this single assertion should be derived. In subordinating relations, in particular,
the assertion associated with the relation is obtained from the “dominant” segment, as specified in the
relation definitions. SDRT, on the other hand, classifies a relation as “subordinating” or “coordinating”,
depending on what structural configuration the arguments create in the discourse graph [4]. In the DR-core
specification, the relative importance of arguments for the text (meaning or structure) as a whole is
not represented directly. However, because of the explicit identification of the roles of arguments in each
relation definition (as described in 4.7), a layer of representation capturing the arguments’ relative
importance can be easily derived. For example, a mapping from ISO categories to RST categories for
Cause would label the Arg2 (corresponding to the reason) argument as the satellite and the Arg1
(corresponding to the result) argument as the nucleus, because there is a one-to-one mapping in RST
between the semantic roles of arguments and their respective functional roles for relative importance,
for each relation. A similar mapping can be shown for SDRT relations as well.
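The derivation of an RST-style layer from the argument roles can be sketched as a simple lookup. The table below contains only the Cause entry stated in the text; the names `NUCLEARITY` and `to_rst` are hypothetical, and entries for other relation types would have to be taken from the mappings in 6.9 rather than guessed.

```python
# Role-to-nuclearity table; only Cause is given in the text of this subclause.
NUCLEARITY = {
    "Cause": {"Arg1": "nucleus", "Arg2": "satellite"},  # Arg1 = result, Arg2 = reason
}


def to_rst(rel_type, arg1_text, arg2_text):
    """Label the two arguments of a relation as nucleus and satellite.

    Raises KeyError for relation types that have no entry in NUCLEARITY.
    """
    roles = NUCLEARITY[rel_type]
    return {roles["Arg1"]: arg1_text, roles["Arg2"]: arg2_text}


# Example 1 in 3.3, annotated as Cause with Arg1 = result and Arg2 = reason.
rst_view = to_rst(
    "Cause",
    "Peter came late to the meeting",
    "He had been in a traffic jam",
)
```

The relative-importance layer is derived on demand here and is not stored in the annotation itself, in line with the decision not to represent it directly.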
4.9 Arity of arguments
Except for RST, all frameworks assume that a discourse relation has two and only two arguments. In RST,
the constraints on the number of arguments for a relation are captured via multinuclear relations;
the relations Joint and Sequence (among others) allow for more than two arguments. In the DR-core
specification, a discourse relation is restricted to two and only two arguments, with the understanding
that a mapping from binary relations to n-ary relations is possible where necessary. For example, two
identical binary relations with shared arguments, R(A, B) and R(B, C), can be collapsed into a single
ternary relation R(A, B, C), if the given framework allows for the relation R to be n-ary.
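The collapse of chained binary relations into an n-ary relation can be sketched as follows. This is a simplified illustration under stated assumptions: relations are plain tuples given in text order, only consecutive entries are merged, and the function name `collapse_chains` and the parameter `nary_types` (the set of relation types that the target framework allows to be n-ary) are hypothetical.

```python
def collapse_chains(relations, nary_types):
    """Merge R(A, B), R(B, C) into R(A, B, C) for relation types in nary_types.

    relations: list of (rel_type, arg, arg) tuples in text order.
    Only consecutive relations that share the linking argument are merged.
    """
    result = []
    for rel_type, a, b in relations:
        last = result[-1] if result else None
        if (
            last is not None
            and rel_type in nary_types
            and last[0] == rel_type
            and last[-1] == a
        ):
            # Extend the previous relation with the new final argument.
            result[-1] = last + (b,)
        else:
            result.append((rel_type, a, b))
    return result


binary = [("R", "A", "B"), ("R", "B", "C"), ("Cause", "C", "D")]
collapsed = collapse_chains(binary, nary_types={"R"})
unchanged = collapse_chains(binary, nary_types=set())
```

When the target framework does not allow the relation to be n-ary, an empty `nary_types` set leaves the binary annotations untouched.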
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
Three important considerations for annotating the arguments of a discourse relation are the following.
The first has to do with the kinds of syntactic forms the realization of an argument can have. That is,
what are the minimal allowable syntactic units corresponding to an argument? While all frameworks
agree that the typical syntactic realization of an argument is a “clause”, some allow for certain non-
clausal phrases as well. In the end, the differences emerge because of different views on the information
status of different syntactic forms in discourse and their relevance to discourse coherence. Also to be
considered are languages like Turkish where nominalizations (noun phrases denoting eventualities)
are very common [79]. In the DR-core specification, what counts as a DRel argument is constrained by its
semantic status rather than its syntactic form. In particular, a DRel argument must denote a situation as
defined in 3.2, that is, the situation must be one of the following types: event, state, fact, proposition, or
dialogue act [2][68].
The second issue has to do with the extent of arguments. All frameworks allow for argument spans
to be arbitrarily complex, composed of multiple clauses in coordination or subordinate relations, as
well as multiple sentences, as long as they are required for interpreting the relation in which they
participate. PDTB further stipulates that argument spans must contain the “minimal” amount of
information needed to interpret the relation, which is closely related to the third issue concerning the
(non-)requirement for the adjacency of argument spans. Some frameworks, such as RST, require the
text spans of the related arguments to be textually adjacent, whereas others such as the PDTB impose
this constraint only for implicit discourse relations. To a large extent, these differences arise because of
differences in assumptions about the global structure of a text, and the reflection of such assumptions
in the annotation. As with the issue of syntactic form,
...
INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in
discourse, core annotation schema (DR-core)
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of the non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in follow-up work on DR-core
4.17 Metamodel
5 Core set of discourse relations
6 Existing approaches and annotation schemes
6.1 Overview
6.2 Rhetorical Structure Theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 Abstract syntax and semantics of DRelML
8.3 Concrete syntax
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national
standards bodies (ISO member bodies). The work of preparing International Standards is normally
carried out through ISO technical committees. Each member body interested in a subject has the
right to be represented on the technical committee established for that purpose. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of ISO documents should be noted. This document was drafted in accordance with
the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify
such rights or to give notice of their existence. Details of any intellectual property rights or similar
rights identified during the development of the document are given in the Introduction and/or in the
list of patent declarations received by ISO (see www.iso.org/brevets).
Any trade name mentioned in this document is given for information, for the convenience of users,
and does not constitute an endorsement.
For an explanation of the meaning of ISO specific terms and expressions related to conformity
assessment, and for information about ISO’s adherence to the World Trade Organization (WTO)
principles concerning Technical Barriers to Trade (TBT), see the following link:
www.iso.org/iso/fr/avant-propos.html
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction
The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in support of empirical natural language research, both computational and theoretical. At the discourse level, interest in discourse processing has led to the development of several corpora annotated for discourse relations. Discourse relations, also called "coherence relations" or "rhetorical relations", are relations, expressed explicitly or implicitly, between situations mentioned in a discourse; they are essential to a full understanding of the discourse, going beyond the meaning conveyed by clauses and sentences. Discourse relations and discourse structure are considered key ingredients of NLP tasks such as summarization [39][41], complex question answering [74], natural language generation [19][47][56], machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A recent overview [76] describes the state of the art in discourse and computation. Several international and collaborative efforts have created annotated discourse relation resources, across languages and genres, to support the development of such applications.
Existing annotation frameworks differ in two fundamental starting assumptions: one concerns the representation of discourse structure, the other the semantic classification of discourse relations. As a result, annotations produced within one framework are difficult to interpret within another, and the interoperability of annotated resources is limited. Despite these differences, however, there are strong compatibilities between the frameworks that can be made explicit and used to map between resources, as well as to serve as a basis for future annotation.
In a coherent discourse (written or spoken), the situations mentioned in the discourse, such as events, states, facts, propositions and dialogue acts, are semantically linked by causal, contrastive, temporal and other relations, called "discourse relations", "rhetorical relations" or "coherence relations". Although discourse relations hold mainly between the meanings of successive sentences or utterances in the discourse, they can also hold between the meanings of smaller or larger units (nominalizations, clauses, paragraphs, dialogue segments), and between situations that are not explicitly described but can be inferred.
The purpose of this document is to specify an interoperable approach to the annotation of local semantic relations in discourse (DRels) that respects the Linguistic Annotation Framework (LAF, ISO 24612-2; see also Reference [23]) and the general principles for semantic annotation laid down in ISO 24617-6. It reflects the view that strong underlying compatibilities can be observed in the semantic description of discourse relations across the various discourse relation frameworks used for data annotation, such as Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank (PDTB) [59], Hobbs' Theory of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to make these compatibilities explicit and to propose approximate mappings between the definitions of individual discourse relations, as specified in the different frameworks, for the benefit of the community as a whole.
This document aims to (1) establish a set of desiderata for interoperable DRel annotation; (2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard annotation schemes for semantic information; (3) provide clear and mutually consistent definitions of a "core" set of discourse relations that occur frequently, in one form or another, in many current discourse relation frameworks. Taken together, objectives (2) and (3) constitute a "core annotation schema" for DRels.
This document does not aim to provide a fixed and exhaustive set of discourse relations, but rather an open, extensible core set of relations. The core annotation schema also leaves open certain issues in discourse relation annotation, because they require further study in collaboration with other multilingual discourse annotation efforts, in particular the TextLink action in the European COST programme. A further part of ISO 24617 is envisaged for the near future; it will complement this document by providing a complete, interoperable annotation scheme for DRels while addressing the multilingual dimension of the standard. The issues to be taken up in that complementary part are listed in 4.16.
INTERNATIONAL STANDARD ISO 24617-8:2016(F)
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
1 Scope
This document establishes the representation and annotation of local, "low-level" discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
- for annotating discourse relations in natural language corpora;
- as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.
The objectives of this specification are to provide:
- a reference set of data categories that define a collection of discourse relation types with an explicit semantics;
- a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;
- a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating local, "low-level" discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure.
As part of the ISO 24617 semantic annotation framework (SemAF), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with the other parts of ISO 24617. Some discourse relations are specific to interactive discourse and overlap with ISO 24617-2, which covers the annotation of dialogue acts. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still others are very similar to certain predicate-argument relations ("semantic roles"), whose annotation is the main subject of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays particular attention to the interactions of discourse relation annotation with the other semantic annotation schemes (see Clause 8).
This document does not deal with the representation of higher-level, global discourse structures, which involves linking local discourse relations to form one or more composite global structures.
This document is, moreover, restricted to strictly semantic relations and therefore excludes, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions to a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
- IEC Electropedia: available at http://www.electropedia.org/
- ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in a written text, or of utterances in spoken discourse
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, which can be realized by a linguistically simple or complex expression, for example a clause, a nominalization, a sentence/utterance, or a discourse segment comprising multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)
EXAMPLE 1 "Pierre arrived late at the meeting. He was stuck in a traffic jam." The events mentioned in these two sentences are implicitly linked by the discourse relation Cause.
EXAMPLE 2 "Pierre was stuck in a traffic jam, but he arrived at the meeting on time." The events mentioned in these two clauses are linked by the discourse relation Concession, expressed by the connective "but".
EXAMPLE 3 "Pierre did not manage to get to the meeting: he was held up in a very bad traffic jam." In this example, the causal relation is the same as in Example 1, but the argument expressed by the first part is not an eventuality but a proposition, formed by the description of an event with negative polarity.
Note 1 to entry: There are near-synonyms of "discourse relation" with slightly different meanings, namely "coherence relation" and "rhetorical relation".
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)
EXAMPLE Single-word discourse connectives include "but", "since", "and", "however" and "because". Multi-word discourse connectives include "as well as" and "such as".
Note 1 to entry: Many words used as discourse connectives can also be used as conjunctions within a clause, for example the use of "and" in "Jean and Marie make a lovely couple".
3.5
low-level discourse structure
representation of discourse structure that specifies only the local dependencies between a discourse relation and its arguments, without specifying any links or dependencies between these local structures
4 Basic concepts and metamodel
4.1 Overview
In discourse, which arises when communication involves a sequence of clauses or sentences in a text, or of utterances in a dialogue, an essential part of understanding comes from how the events, states, facts, propositions and dialogue acts mentioned in the discourse are related to one another. Understanding these relations, such as Cause, Contrast and Condition, contributes to what is called "discourse coherence". These relations can be "realized" explicitly by certain words and phrases (often called "connectives"), or they can be implicit, in which case they have to be inferred from the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized by expressions from different syntactic classes. In Example 1, a subordinating conjunction, "because", identifies a situation (here, the meaning of the subordinate clause) as the reason for the buying event mentioned in the main clause. In Example 2, an adverbial, "As a result", connects two sentences by expressing the consequence of there being few clear signs that growth is coming to a halt. In Example 3, an explicit expression is again used, to explain the claim about the level of investor withdrawal, but here the expression does not belong to a single well-defined syntactic class such as conjunction or adverb. Finally, Example 4 shows that, although a causal relation can be inferred between the two sentences, with the second sentence explaining why some (investors) have raised their cash positions, no word or phrase in the text expresses this inference. Instead, the discourse context, together with cohesive devices and world knowledge, has to be used to understand the relation. Often, when such relations are inferred, a connective expression can be inserted to express the relation [44], as shown here by the insertion of "because". In this document, the term "connective" is used broadly to refer to words or phrases used to express a discourse relation, including those drawn from well-defined syntactic classes as well as those that are not.
Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares because he keeps an account at the brokerage firm Salomon Brothers Inc., which had recommended the stock as a good buy.
Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt. As a result, Fed officials may be divided over whether to ease credit.
Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund managers said. A major reason is that investors have already sharply scaled back their purchases of stock funds since Black Monday.
Example 4 Some have raised their cash positions to record levels. [Implicit (because)] High cash positions help buffer a fund when the market falls.
Existing frameworks for describing and representing discourse relations differ from one another in several respects. The remainder of this clause compares the most important frameworks, focusing on those that have served as the basis for annotating discourse relations in corpora, in particular Hobbs' Theory of Discourse Coherence (HTDC) 1) [18], the Rhetorical Structure Theory (RST) of Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) of Sanders et al. [66], the Segmented Discourse Representation Theory (SDRT) of Asher and Lascarides [3], and the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and analyses the main aspects considered relevant to the development of the DR-core pivot representation. For each aspect, the analysis is followed by the ISO specification adopted for that aspect. The clause ends with a summary of the key features of the DR-core specification and the DR-core metamodel.
4.2 Representation of discourse structure
One important difference between existing DRel frameworks concerns the representation of discourse structure. For example, the RST TreeBank [10], based on Rhetorical Structure Theory [40], adopts a tree representation that subsumes the entire text of the discourse. The Graphbank [78], based on HTDC, allows general graphs with multiple parents and crossing dependencies, and the DISCOR [64] and ANNODIS [1] corpora, based on SDRT, allow directed acyclic graphs with multiple parents but no crossing dependencies. There are also frameworks that are pre-theoretical or theory-neutral with respect to discourse structure. These include the PDTB [59], loosely based on a lexicalized approach to discourse relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks, individual relations are annotated together with their arguments, without being combined with other relations into a composite structure spanning the entire text.
These very different views of the structural representation of discourse are difficult to reconcile.
The DR-core specification takes a pre-theoretical stance involving low-level annotation of discourse relations, on the view that individual relations can be annotated more reliably and that they can later be annotated further to build a higher-level tree or graph structure, depending on one's theoretical preferences. From the point of view of interoperability, low-level annotation can also serve as a pivot representation for comparing annotations from different resources that are based on different theories.
4.3 Semantic description of discourse relations
A second difference between existing frameworks concerns how the meaning of a discourse relation is described: either in "informational" terms, that is, in terms of the content of the relation's arguments, or in "intentional" terms, that is, in terms of the intentions of the speaker/writer (W) and the intended effects on the hearer/reader (R). Whereas SDRT, HTDC, PDTB and CCR describe meaning in informational terms, RST provides definitions in intentional terms. Thus, Example 5 gives the definition of the (non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), whereas Example 6 gives the definition of the same relation in HTDC (where it is called Explanation).
Example 5 Non-volitional Cause (RST).
Constraints on N: presents a situation that is not a volitional action.
Constraints on the N + S combination: S presents a situation that, by means other than motivating a volitional action, caused the situation presented in N; without the presentation of S, R might not know the particular cause of the situation; a presentation of N is more central than S to W's purposes in putting forth the N-S combination.
The effect: R recognizes the situation presented in S as a cause of the situation presented in N.
Locus of the effect: N and S.
Example 6 Explanation (HTDC).
Infer that the state or event asserted by S1 causes or could cause the state or event asserted by S0.
1) "HTDC", an acronym for Hobbs' theory, has been created for the purposes of this document and does not, to date, appear in the literature.
Despite the different ways of describing the semantics of DRels, it is important to note that in many cases the differences lie in the "level" at which the relation is described, particularly when the situations being related are the same. Thus, for example, a DRel defined in informational terms in one framework can be effectively mapped to a DRel in another framework where it is defined in intentional terms. In this spirit, the meaning of DRels in the DR-core specification is described in "informational" terms, but a mapping from the core relation types (presented in Clause 5) to the relations found in existing classifications, including those that define relations in intentional terms, is provided in 6.9.
4.4 Pragmatic variants of discourse relations
With the exception of HTDC, all frameworks also distinguish relations in which one or both arguments involve an implicit belief or dialogue act 2) that takes scope over the semantic content of the argument. The motivation for this distinction comes from examples such as Example 7, where the inference should not be that Jean's sending the message somehow caused him not to be at work, but rather that it causes the speaker/writer to believe that Jean is not at work. In other words, the meaning of the subordinate clause provides evidence for the claim made in the main clause. Similarly, in Example 8, the inference should be that the explanation is given not for the content of the question but for the (dialogue) act of asking it.
Example 7 Jean is not at work today, because he sent me a message to say he was sick.
Example 8 What are you doing tonight? Because there's a good movie on at the cinema.
This kind of distinction goes by various names in the literature, for example the "semantic-pragmatic" distinction [73][66][46], the "internal-external" distinction [17][44], the "ideational-pragmatic" distinction [63] and the "content-metatalk" distinction [37]. In other cases, as in RST, the distinction is not named explicitly but is clearly reflected in the categorization (for example, Cause versus Evidence/Justify in RST distinguishes the semantic and pragmatic interpretations, respectively). The difficulty in harmonizing the treatment of this distinction across frameworks is that while some, such as CCR, allow it for all relation types, others, such as the PDTB and RST, allow it only for certain relations (for example, Cause, Condition, Contrast and Concession in the PDTB). It should be noted, however, that there seems to be no a priori reason to restrict the distinction to only some relation types, and that the choice ultimately derives from what is observed in the corpus being analysed and/or annotated. In DR-core, the "semantic-pragmatic" distinction is allowed for all relation types, mainly so as not to be overly restrictive in the absence of well-defined criteria. At the same time, the schema encodes this distinction not on the relation but on the arguments of the relation, the main reason being that, in all cases involving a belief or a dialogue act, what differs is not the relation but the semantic status of the arguments. A further reason is that representing the distinction on the relation would not distinguish cases where the belief or dialogue act is implicit (as in Examples 7 and 8) from cases where performative verbs or propositional attitude verbs make it explicit, as in Examples 9 and 10. Pragmatic interpretations are therefore represented on the arguments, by means of a feature indicating that the argument is of type "belief" or of type "dialogue act". Note that in cases like Examples 9 and 10, the belief or dialogue act character of the meaning comes entirely from the explicit content of the arguments rather than from a context-driven inference.
2) The concept of a dialogue act, as used in ISO 24617-2, can be regarded as an empirically grounded and computationally well-defined interpretation of the traditional notion of "speech act".
Example 9 I believe that Jean is not at work today, because he sent me a message to say he was sick.
Example 10 I am asking you what you are doing tonight because there's a good movie on at the cinema.
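The argument-level encoding described in 4.4 can be illustrated with a short, non-normative Python sketch. The class and field names (ArgType, Argument, DiscourseRelation, arg_type) and the default value "situation" are assumptions made for this illustration and are not taken from the DR-core specification; only the "belief" and "dialogue act" types, and the convention that Arg2 explains Arg1 in a Cause relation, come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class ArgType(Enum):
    SITUATION = "situation"        # hypothetical default: plain semantic content
    BELIEF = "belief"              # argument is a belief about its content
    DIALOGUE_ACT = "dialogue act"  # argument is the act of uttering its content


@dataclass(frozen=True)
class Argument:
    text: str
    arg_type: ArgType = ArgType.SITUATION


@dataclass(frozen=True)
class DiscourseRelation:
    relation: str
    arg1: Argument  # for Cause: the result
    arg2: Argument  # for Cause: the reason

    def is_pragmatic(self) -> bool:
        """True when at least one argument is a belief or a dialogue act."""
        return any(a.arg_type is not ArgType.SITUATION
                   for a in (self.arg1, self.arg2))


# Example 7: the message is evidence for the speaker's belief.
example_7 = DiscourseRelation(
    "Cause",
    Argument("Jean is not at work today", ArgType.BELIEF),
    Argument("he sent me a message to say he was sick"),
)
# Example 8: the explanation targets the act of asking.
example_8 = DiscourseRelation(
    "Cause",
    Argument("What are you doing tonight?", ArgType.DIALOGUE_ACT),
    Argument("there's a good movie on at the cinema"),
)
# Example 1 of 3.3: a purely semantic Cause.
semantic_cause = DiscourseRelation(
    "Cause",
    Argument("Pierre arrived late at the meeting"),
    Argument("He was stuck in a traffic jam"),
)
```

In this sketch the relation label is "Cause" in all three cases; only the type of Arg1 changes, which mirrors the choice to encode the distinction on the arguments rather than on the relation.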
4.5 Hierarchical classification of discourse relations
In all existing frameworks, discourse relations are grouped semantically to a greater or lesser degree; the frameworks differ in how the groups are composed. For example, whereas the PDTB groups Concession and Contrast under the broader class Comparison, CCR places Concession in the Negative Causal group of relations and Contrast in the Negative Additive group. These groupings cannot be harmonized, since they result from fundamental differences in what is taken to count as semantic closeness. The solution adopted in the DR-core specification is to use a "flat" core set of relations, which can be used as such in an annotation scheme or mapped to the appropriate type in whichever hierarchical scheme is adopted. These mappings from the DR-core relations to the schemes of the different frameworks are provided in 6.9.
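As an illustration of how a flat relation label can be projected onto a framework-specific grouping, the following Python sketch records only the two groupings mentioned in this subclause. The dictionary layout and the function name group_of are assumptions of this sketch; the actual mappings are those given in 6.9 and are not reproduced here.

```python
from typing import Optional

# Groupings stated in 4.5 only; every other relation is deliberately left unmapped.
GROUPINGS = {
    "PDTB": {"Concession": "Comparison", "Contrast": "Comparison"},
    "CCR": {"Concession": "Negative Causal", "Contrast": "Negative Additive"},
}


def group_of(relation: str, framework: str) -> Optional[str]:
    """Return the framework-specific group of a flat relation label, if recorded."""
    return GROUPINGS.get(framework, {}).get(relation)


# The same pair of flat labels falls into one group in the PDTB and two in CCR.
pdtb_groups = {group_of(r, "PDTB") for r in ("Concession", "Contrast")}
ccr_groups = {group_of(r, "CCR") for r in ("Concession", "Contrast")}
```

Keeping the annotated label flat and applying the grouping afterwards means the same annotation can be read under either hierarchy without being re-annotated.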
4.6 Inference of multiple relations between two segments
Among the various frameworks, the PDTB is unique in allowing multiple relations to be inferred between two given situations. The connective "since", for example, can have both a temporal and a causal interpretation, as in Example 11.
Example 11 MiniScribe has been on the rocks since it disclosed earlier this year that its earnings reports for 1988 weren't accurate.
The DR-core specification provides for the representation of multiple relations inferred between two given situations, whether these relations are realized explicitly or implicitly.
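A minimal, non-normative sketch of what annotating several relations over the same pair of situations might look like is given below in Python. The record layout (DRel), the situation identifiers "s1" and "s2", and the label "Temporal" are placeholders assumed for this illustration; they are not DR-core names. Each relation is stored as an independent record, in line with the low-level approach described in 4.2.

```python
from typing import List, NamedTuple, Optional


class DRel(NamedTuple):
    relation: str               # relation label
    arg1: str                   # identifier of the situation annotated as Arg1
    arg2: str                   # identifier of the situation annotated as Arg2
    connective: Optional[str]   # None when the relation is implicit


def relations_between(annotations: List[DRel], a: str, b: str) -> List[str]:
    """Return the sorted labels of all relations annotated between a and b."""
    return sorted({d.relation for d in annotations
                   if {d.arg1, d.arg2} == {a, b}})


# Example 11: s1 = "MiniScribe has been on the rocks",
#             s2 = "it disclosed earlier this year that ... weren't accurate".
annotations = [
    DRel("Temporal", "s1", "s2", "since"),  # placeholder label for the temporal reading
    DRel("Cause", "s1", "s2", "since"),     # s2 (reason) explains s1 (result)
]
```

With these records, relations_between(annotations, "s1", "s2") returns both labels, so neither reading of "since" has to be discarded.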
4.7 Representation of the symmetry or asymmetry of relations
Whether a discourse relation is symmetric or asymmetric is a distinction built into the representations of all frameworks. That is, given a relation REL and its arguments A and B, all frameworks recognize whether or not (REL, A, B) is equivalent to (REL, B, A). For example, the Contrast relation is considered symmetric, whereas the Cause relation is considered asymmetric. The frameworks differ in how this distinction is built into the scheme. Most classifications, such as RST, CCR, HTDC and the PDTB, encode asymmetry in terms of the textual linear order and/or the syntax of the argument realizations. Thus, in the CCR classification, where the ordering of the argument spans is one of the basic "cognitive" primitives underlying the scheme, the Cause-Consequence relation captures the "basic" order for the causal semantic relation, with the cause appearing before the effect, whereas the Consequence-Cause relation captures the "non-basic" order, with the effect appearing before the cause. In the PDTB, the argument spans are first named Arg1 and Arg2 according to syntactic criteria, including syntactic dependency and linear order, and asymmetric relations are then defined in terms of the Arg1 and Arg2 labels (for example, in Cause:Reason, Arg2 is the cause and Arg1 the effect, whereas in Cause:Result, Arg1 is the cause and Arg2 the effect). The Graphbank, on the other hand, uses a different mechanism to capture asymmetry. Rather than referring to linear order, it uses directed arcs in the annotation, with definitions of how to interpret the direction of each relation type (for example, for the Cause-Effect relation, the arc is directed from the span stating the cause to the span stating the effect; for the Violated Expectation relation, the arc is directed from the span stating the cause to the span stating the absent effect; and so on).
In the DR-core specification, the representation of asymmetry abstracts away from linear order and syntactic structure, not only because they are not semantic in nature, but also because they may not be good criteria from the point of view of interoperability, given the wide cross-linguistic variation in syntax, including clause combining. Instead, asymmetry is represented by specifying the roles of the arguments in the definition of each relation. The arguments are named Arg1 and Arg2, but they carry relation-specific semantic roles. For example, in the Cause relation, defined as "Arg2 serves as an explanation for Arg1" (see Table 1), the span named Arg2 always provides the reason, regardless of linear order or syntax, and Arg1 is always the result. For annotators, mnemonic labels that indicate the semantic roles, such as "reason" and "result", are more convenient than "Arg1" and "Arg2", so the ISO specification also allows these semantic role labels to be used. Table 2 gives the correspondence between the Arg1 and Arg2 labels and the corresponding semantic role labels for asymmetric relations. In symmetric relations, on the other hand, where both arguments play the same role, the arguments are named Arg1 and Arg2 following the linear order of the text.
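The role-based labelling described above can be sketched as follows in Python. This is a non-normative illustration: the Span type, the character offsets and the function names are assumptions of this sketch, and only the Cause entry (Arg1 = result, Arg2 = reason) is taken from the text; the full correspondence is the one given in Table 2, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Span:
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span


# Only the Cause entry comes from the text; other relations are omitted.
ROLE_TO_ARG = {"Cause": {"result": "Arg1", "reason": "Arg2"}}


def label_asymmetric(relation: str, spans_by_role: Dict[str, Span]) -> Dict[str, Span]:
    """Name the arguments by semantic role, ignoring linear order."""
    mapping = ROLE_TO_ARG[relation]
    return {mapping[role]: span for role, span in spans_by_role.items()}


def label_symmetric(a: Span, b: Span) -> Dict[str, Span]:
    """Name the arguments of a symmetric relation by linear order in the text."""
    first, second = sorted((a, b), key=lambda s: s.start)
    return {"Arg1": first, "Arg2": second}


def arg2_precedes_arg1(args: Dict[str, Span]) -> bool:
    """Recover the linear order by comparing the span offsets."""
    return args["Arg2"].start < args["Arg1"].start


# "Pierre arrived late at the meeting. He was stuck in a traffic jam."
result_first = label_asymmetric(
    "Cause", {"result": Span(0, 36), "reason": Span(37, 67)})
# Hypothetical reordering of the same content, with the reason stated first.
reason_first = label_asymmetric(
    "Cause", {"reason": Span(0, 30), "result": Span(31, 67)})
```

In both orderings the reason is Arg2; the difference in linear order remains recoverable from the span offsets, here through arg2_precedes_arg1.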
It is important to note that this representation can be effectively mapped to other schemes for representing asymmetry, and that it in no way obscures differences in the linear order of the arguments, which can easily be determined by pairing the argument roles with the span annotations. The ISO scheme recognizes that linear order supports the view that different versions of an asymmetric relation may not be subject to the same linguistic constraints, for example, with respect to the predictions
...













