Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)

This document establishes the representation and annotation of local, “low-level” discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.
This document provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.
With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure. As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 7).
This document does not consider global, higher-level discourse structure representation which involves linking local discourse relations to form one or more composite global structures. This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions in a spoken dialogue.

Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)


Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8. del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)


General Information

Status: Published
Public Enquiry End Date: 30-Jul-2017
Publication Date: 23-Aug-2018
Current Stage: 6060 - National Implementation/Publication (Adopted Project)
Start Date: 30-Jul-2018
Due Date: 04-Oct-2018
Completion Date: 24-Aug-2018

Standard: SIST ISO 24617-8:2018, English language, 48 pages
Standard: ISO 24617-8:2016 - Language resource management -- Semantic annotation framework (SemAF), English language, 43 pages
Standard: ISO 24617-8:2016 - Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF), French language, 46 pages

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24617-8:2018
01 September 2018
Upravljanje z jezikovnimi viri - Ogrodje za semantično označevanje (SemAF) - 8. del: Semantični odnosi v diskurzu, osnovna shema označevanja (CD-jedro)
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
01.140.20 Information sciences
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24617-8:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management — Semantic annotation framework (SemAF) — Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique (SemAF) — Partie 8: Relations sémantiques dans le discours, schéma d’annotation de base (DR-core)
Reference number: ISO 24617-8:2016(E)
ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction

The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in support of empirical natural language research, both computational and theoretical. At the level of discourse, interest in discourse processing has led to the development of several corpora annotated for discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences. Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such as summarization [39][41], complex question answering [74], natural language generation [19][47][56], machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A recent overview [76] includes a description of the state of the art in discourse and computation. Several international and collaborative efforts have resulted in annotated resources of discourse relations, across languages as well as genres, to support the development of such applications.

Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of which concerns the representation of discourse structure, while the other has to do with the semantic classification of discourse relations. As a result, annotations constructed using one framework are not easily interpreted in another framework, and annotated resources are limited in their interoperability. Notwithstanding their differences, however, there are strong compatibilities between them that can be clarified and used as the basis for mappings and comparisons between the resources, as well as for use as a basis for future annotation.

In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events, states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive, temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence relations”. Although discourse relations hold most prominently between the meanings of successive sentences or utterances in a discourse, they may also occur between the meanings of smaller or larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between situations that are not explicitly described but that can be inferred.

This document aims to specify an interoperable approach to the annotation of local semantic relations in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects the view that strong underlying compatibilities with respect to the semantic description of discourse relations can be observed in the various discourse relation frameworks being used to support data annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation of these compatibilities and a loose mapping between definitions of individual discourse relations, as specified in the different frameworks that will benefit the community as a whole.

The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation; (2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions of a set of “core” discourse relations which are commonly found in some form in many existing discourse relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.

This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at providing an open, extensible set of core relations. The core annotation scheme also discusses certain issues in discourse relation annotation that it leaves open, as they require further study in collaboration with other efforts in multilingual discourse annotation, in particular the European COST action TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension of the standard. The issues to be taken up for this complementary part are listed in 4.16.

1 Scope

This document establishes the representation and annotation of local, “low-level” discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.

This document provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;
— as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.
The objectives of this specification are to provide:
— a reference set of data categories that define a collection of discourse relation types with an explicit semantics;
— a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;
— a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.
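
The pivot idea in the second objective can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not part of the standard: the framework names `fwA` and `fwB`, their labels, and the function `convert` are hypothetical placeholders. Only the relation names Cause and Concession are taken from this document.

```python
from typing import Dict, Optional

# Hypothetical label tables: "fwA" and "fwB" stand for two annotation
# frameworks, and their labels are invented placeholders. Only "Cause" and
# "Concession" are relation names that appear in this document.
TO_CORE: Dict[str, Dict[str, str]] = {
    "fwA": {"reason": "Cause", "although": "Concession"},
    "fwB": {"causal": "Cause", "concessive": "Concession"},
}


def convert(label: str, source: str, target: str) -> Optional[str]:
    """Map a label from one framework to another through the core set."""
    core = TO_CORE[source].get(label)
    if core is None:
        return None  # no core counterpart recorded for this label
    for target_label, target_core in TO_CORE[target].items():
        if target_core == core:
            return target_label
    return None  # the target framework has no label for this core relation
```

With a pivot, each framework needs only one table to and from the core set, rather than a separate table for every pair of frameworks.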

With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure.

As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 7).

This document does not consider global, higher-level discourse structure representation which involves

linking local discourse relations to form one or more composite global structures.

© ISO 2016 – All rights reserved 1
---------------------- Page: 9 ----------------------
SIST ISO 24617-8:2018
ISO 24617-8:2016(E)

This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation

eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)

EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two sentences are implicitly related through the discourse relation Cause.

EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the two clauses are related by the discourse relation Concession, expressed by the connective “but”.

EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an eventuality, but a proposition, formed by an event description with negative polarity.

Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)

EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word discourse connectives include “as well as”, “such as”.

Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
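
The point of Note 1 to entry can be made concrete with a small sketch. The lexicon below contains only the connectives listed in the EXAMPLE of 3.4; the function name `candidate_connectives` and the matching strategy are assumptions made for illustration and are not part of this document. The sketch only finds candidate connectives: as the second call shows, a lexical match such as “and” still has to be checked for whether it really expresses a discourse relation.

```python
import re

# Connectives taken from the EXAMPLE in 3.4; a real lexicon would be larger.
CONNECTIVES = ["but", "since", "and", "however", "because", "as well as", "such as"]

# Try longer (multi-word) expressions first so that "as well as" is matched whole.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in sorted(CONNECTIVES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def candidate_connectives(text):
    """Return (character offset, connective) pairs for every lexicon match in text.

    This is lexical matching only (a hypothetical helper): it cannot tell a
    discourse connective from an intra-clausal conjunction.
    """
    return [(m.start(), m.group(0).lower()) for m in _PATTERN.finditer(text)]


# Discourse use: "but" relates two clauses (EXAMPLE 2 of 3.3).
print(candidate_connectives("Peter was in a traffic jam, but he arrived on time for the meeting."))

# Intra-clausal use: "and" is matched, yet it expresses no discourse relation.
print(candidate_connectives("John and Mary are a lovely couple"))
```

A later disambiguation step, manual or automatic, would be needed to discard matches of the second kind.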
3.5
low-level discourse structure

representation of discourse structure that only specifies local dependencies between a discourse relation and its arguments, without further specifying any links or dependencies across these local structures
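
As an informal illustration of definitions 3.2 to 3.5, the following sketch stores each discourse relation as an independent record. All class and field names (`Situation`, `DiscourseRelation`, `arg1`, `arg2`, `connective`) are hypothetical and chosen for readability; they are not the argument roles or the markup defined later in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Situation:
    """A situation (3.2), represented here only by the text that realizes it."""
    text: str


@dataclass(frozen=True)
class DiscourseRelation:
    """One local discourse relation (3.3) between two situations."""
    label: str                         # relation name, e.g. "Cause"
    arg1: Situation                    # hypothetical field name
    arg2: Situation                    # hypothetical field name
    connective: Optional[str] = None   # None when the relation is implicit

    @property
    def is_explicit(self) -> bool:
        return self.connective is not None


# EXAMPLE 1 of 3.3: an implicit Cause relation.
cause = DiscourseRelation(
    "Cause",
    Situation("Peter came late to the meeting."),
    Situation("He had been in a traffic jam."),
)

# EXAMPLE 2 of 3.3: a Concession relation expressed by the connective "but".
concession = DiscourseRelation(
    "Concession",
    Situation("Peter was in a traffic jam"),
    Situation("he arrived on time for the meeting"),
    connective="but",
)

# Low-level discourse structure (3.5): a flat collection of relations,
# with no links or dependencies recorded between the relations themselves.
annotations = [cause, concession]
```

Keeping the relations in a flat list, rather than nesting them, mirrors the restriction to local dependencies between a relation and its arguments.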
4 Basic concepts and metamodel
4.1 Overview

In a discourse, which comes into play when communication involves a sequence of clauses or sentences in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events, states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other. Understanding relations such as Cause, Contrast, and Condition contributes to what is called the “coherence” of the discourse. These relations can be “realized” explicitly, by means of certain words and phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized with expressions from different syntactic classes. In Example 1, the subordinating conjunction “because” is used to present some situation (here, the meaning of the subordinate clause) as the reason for the buying event mentioned in its matrix clause. In Example 2, the adverbial “as a result” is used to relate two sentences, expressing the consequence of not seeing many signs of growth coming to a halt. In Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal, but here the phrase does not correspond to a well-defined single syntactic class such as conjunction or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two sentences, with the second sentence offering an explanation for why some (investors) have raised their cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse context needs to be used together with cohesive devices and world knowledge to get at the relation.

[44]

Often, when such relations are inferred, it is possible to insert a connective phrase to express the relation, as shown in Example 4 with the insertion of “because”. In this document, the term “connective” is used in a broad sense, to refer to any word or phrase used to express a discourse relation, including both those drawn from well-defined syntactic classes and those that are not.

Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had recommended the stock as a good buy.

Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt. As a result, Fed officials may be divided over whether to ease credit.

Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund managers said. A major reason is that investors already have sharply scaled back their purchases of stock funds since Black Monday.

Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash positions help buffer a fund when the market falls.

Existing frameworks for describing and representing discourse relations differ along several lines. The remainder of this clause provides a comparison of the most important frameworks, focusing on those that have been used as the basis for annotating discourse relations in corpora, in particular the Theory of Discourse Coherence (HTDC) by Hobbs,[18] Rhetorical Structure Theory (RST) by Mann and Thompson,[40] the Cognitive Approach to Coherence Relations (CCR) by Sanders and others,[66] Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides[3] and the annotation framework of the Penn Discourse Treebank (PDTB).[59][61] The comparison highlights and discusses the main issues that are considered relevant for developing the pivot representation in DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.

4.2 Representation of discourse structure

One important difference between existing DRel frameworks concerns the representation of discourse structure. For example, the RST Treebank,[10] based on the Rhetorical Structure Theory,[40] assumes a tree representation to subsume the entire text of the discourse. The Discourse GraphBank,[78] based on HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus[64] and the ANNODIS corpus,[1] based on SDRT, allow directed acyclic graphs that permit multiple parents, but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to discourse structure. These include the PDTB,[59] based loosely on a lexicalized approach to discourse relations and structure (DLTAG[16][75]), and DiscAn,[65] based on CCR. In both of these frameworks, individual relations along with their arguments are annotated, without being combined with other relations to form a composite structure encompassing the entire text.

1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far, appear elsewhere in the literature.

These widely different views about the structural representation for discourse are difficult to reconcile with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of discourse relations is adopted, with the idea that individual relations can be more reliably annotated and that they can be further annotated to project a higher-level tree or graph structure, depending on one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can also serve as a pivot representation when comparing annotations of different resources grounded in different theories.
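
To make the structural properties mentioned in this subclause more tangible, the sketch below checks a set of independently annotated local relations for multiple parents and for crossing. It rests on simplifying assumptions that are not part of this document: every argument is a single segment identified by its position in the text, each relation is reduced to a (parent, child) pair of positions, and two relations “cross” when their spans interleave in text order. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def has_multiple_parents(arcs):
    """True if some segment is the child in more than one (parent, child) arc."""
    child_counts = Counter(child for _parent, child in arcs)
    return any(count > 1 for count in child_counts.values())


def has_crossing(arcs):
    """True if two arcs interleave in text order, i.e. a < c < b < d."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    for (a, b), (c, d) in combinations(spans, 2):
        if a < c < b < d or c < a < d < b:
            return True
    return False


# Segments are numbered by their order in the text (assumed segmentation).
chain = [(0, 1), (1, 2)]     # each segment has at most one parent, no crossing
shared = [(0, 2), (1, 2)]    # segment 2 has two parents
crossed = [(0, 2), (1, 3)]   # the two arcs interleave

for name, arcs in [("chain", chain), ("shared", shared), ("crossed", crossed)]:
    print(name, has_multiple_parents(arcs), has_crossing(arcs))
```

Under these assumptions, a collection for which both checks fail is compatible with a tree-like reading, while the other cases call for the more general graph representations discussed above.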
4.3 Semantic description of discourse relations

A second difference among existing frameworks relates to whether the meaning of a discourse relation

is described in “informatio
...

INTERNATIONAL ISO
STANDARD 24617-8
First edition
2016-12-15
Language resource management —
Semantic annotation framework
(SemAF) —
Part 8:
Semantic relations in discourse, core
annotation schema (DR-core)
Gestion des ressources langagières — Cadre d’annotation sémantique
(SemAF) —
Partie 8: Relations sémantiques dans le discours, schéma d’annotation
de base (DR-core)
Reference number
ISO 24617-8:2016(E)
ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction

The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in support of empirical natural language research, both computational and theoretical. At the level of discourse, interest in discourse processing has led to the development of several corpora annotated for discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences. Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such as summarization,[39][41] complex question answering,[74] natural language generation,[19][47][56] machine translation,[42] opinion mining and sentiment analysis,[11][12] and information retrieval.[38] A recent overview[76] includes a description of the state of the art in discourse and computation. Several international and collaborative efforts have resulted in annotated resources of discourse relations, across languages as well as genres, to support the development of such applications.

Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of which concerns the representation of discourse structure, while the other has to do with the semantic classification of discourse relations. As a result, annotations constructed using one framework are not easily interpreted in another framework, and annotated resources are limited in their interoperability. Notwithstanding their differences, however, there are strong compatibilities between them that can be clarified and used as the basis for mappings and comparisons between the resources, and as a basis for future annotation.

In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events, states, facts, propositions, and dialogue acts, are semantically linked through causal, contrastive, temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence relations”. Although discourse relations hold most prominently between the meanings of successive sentences or utterances in a discourse, they may also occur between the meanings of smaller or larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between situations that are not explicitly described but that can be inferred.

This document aims to specify an interoperable approach to the annotation of local semantic relations in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects the view that strong underlying compatibilities with respect to the semantic description of discourse relations can be observed in the various discourse relation frameworks being used to support data annotation, e.g. Rhetorical Structure Theory (RST),[40] Segmented Discourse Representation Theory (SDRT),[3] the Penn Discourse Treebank,[59] Hobbs’ Theory of Discourse Coherence (HTDC)[17][18] and the Cognitive Approach to Coherence Relations (CCR).[66] This document aims to provide an explanation of these compatibilities and a loose mapping between definitions of individual discourse relations, as specified in the different frameworks, that will benefit the community as a whole.

The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation; (2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions of a set of “core” discourse relations which are commonly found in some form in many existing discourse relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.

This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at providing an open, extensible set of core relations. The core annotation scheme also discusses certain issues in discourse relation annotation that it leaves open, as they require further study in collaboration with other efforts in multilingual discourse annotation, in particular the European COST action TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension of the standard. The issues to be taken up for this complementary part are listed in 4.16.

© ISO 2016 – All rights reserved v
---------------------- Page: 5 ----------------------
INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope

This document establishes the representation and annotation of local, “low-level” discourse relations

between situations mentioned in discourse, where each relation is annotated independently of other

relations in the same discourse.

This document provides a basis for annotating discourse relations by specifying a set of core discourse

relations, many of which have similar definitions in different frameworks. To the extent possible, this

document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;

— as a target representation of automatic methods for shallow discourse parsing, for summarization,

and for other applications.
The objectives of this specification are to provide:

— a reference set of data categories that define a collection of discourse relation types with an explicit

semantics;

— a pivot representation based on a framework for defining discourse relations that can facilitate

mapping between different frameworks;

— a basis for developing guidelines for creating new resources that will be immediately interoperable

with pre-existing resources.

With respect to discourse structure, the limitation of this document to specifications for annotating

local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is

well understood and can be clearly defined; (b) further extensions to represent higher-level, global

discourse structure is possible where desired; and (c) that it allows for the resulting annotations to be

compatible across frameworks, even when they are based on different theories of discourse structure.

As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard

aims to be transparent in its relation to existing frameworks for discourse relation annotation, but

also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive

discourse, and give rise to an overlap with ISO 24617 Part 2, the ISO standard for dialogue act

annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1

(time and events); still other discourse relations are very similar to certain predicate-argument

relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various

parts are required to form a consistent whole, this document pays special attention to the interactions

of discourse relation annotation and other semantic annotation schemes (see Clause 8).

This document does not consider global, higher-level discourse structure representation which involves

linking local discourse relations to form one or more composite global structures.

© ISO 2016 – All rights reserved 1
---------------------- Page: 6 ----------------------
ISO 24617-8:2016(E)

This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,

presentational relations, which concern the way in which a text is presented to its readers or the way in

which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation

eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically

simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse

segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)

EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two

sentences are implicitly related through the discourse relation Cause.

EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the

two clauses are related by the discourse relation Concession, expressed by the connective “but”.

EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal

relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an

eventuality, but a proposition, formed by an event description with negative polarity.

Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence

relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)

EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word

discourse connectives include “as well as”, “such as”.

Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal

conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure

representation of discourse structure that only specifies local dependencies between a discourse

relation and its arguments, without further specifying any links or dependencies across these local

structures
2 © ISO 2016 – All rights reserved
---------------------- Page: 7 ----------------------
ISO 24617-8:2016(E)
4 Basic concepts and metamodel
4.1 Overview

In a discourse, which comes into play when communication involves a sequence of clauses or sentences

in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events,

states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.

Understanding such relations, such as Cause, Contrast, and Condition, contribute to what is called

the “coherence” of the discourse, and they can be “realized” explicitly, by means of certain words and

phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis

of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized

with expressions from different syntactic classes. In Example 1, a subordinating conjunction “because”

is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the

buying event mentioned in its matrix clause. In Example 2, an adverb “as a result” is used to relate

two sentences to express the consequence of not seeing many signs about growth coming to a halt. In

Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,

but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction

or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two

sentences, with the second sentence offering an explanation for why some (investors) have raised their

cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse

context needs to be used together with cohesive devices and world knowledge to get at the relation.

[44]

Often, when such relations are inferred, it is possible to insert a connective phrase to express the

relation, as shown here with the insertion of “because”. In this document, the term “connective” is used

in a broad sense, to refer to any word or phrase used to express a discourse relation, including both

those drawn from well-defined syntactic classes as well as those that are not.

Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares

because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had

recommended the stock as a good buy.

Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.

As a result, Fed officials may be divided over whether to ease credit.

Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund

managers said. A major reason is that investors already have sharply scaled back their purchases

of stock funds since Black Monday.

Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash

positions help buffer a fund when the market falls.
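
The distinction drawn in Examples 1 to 4 between explicitly realized and inferred relations can be sketched as a small table of annotations. The type name CauseAnnotation, its fields, and the function realization_label are hypothetical names used only for this illustration. The connective string given for Example 3 ("a major reason is") is an assumption about which phrase is meant, since the text above does not quote it.

```python
from typing import NamedTuple, Optional


class CauseAnnotation(NamedTuple):
    example: int
    connective: Optional[str]        # phrase present in the text, if any
    inserted: Optional[str] = None   # connective an annotator could insert


ANNOTATIONS = [
    CauseAnnotation(1, "because"),             # subordinating conjunction
    CauseAnnotation(2, "as a result"),         # adverbial expression
    CauseAnnotation(3, "a major reason is"),   # assumed phrase, no single syntactic class
    CauseAnnotation(4, None, inserted="because"),
]


def realization_label(annotation):
    """Render the realization in the bracket style used in Example 4."""
    if annotation.connective is not None:
        return f"explicit ({annotation.connective})"
    return f"implicit ({annotation.inserted})"


labels = [realization_label(a) for a in ANNOTATIONS]
```

All four records carry the same relation, Cause; they differ only in how, or whether, the relation is realized in the text.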

Existing frameworks for describing and representing discourse relations differ along several lines. The remainder of this clause provides a comparison of the most important frameworks, focusing on those that have been used as the basis for annotating discourse relations in corpora, in particular the Theory of Discourse Coherence (HTDC) by Hobbs [18], Rhetorical Structure Theory (RST) by Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses the main issues that are considered relevant for developing the pivot representation in DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.

4.2 Representation of discourse structure

One important difference between existing DRel frameworks concerns the representation of discourse structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes a tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based on HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64] and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents, but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks, individual relations along with their arguments are annotated, without being combined with other relations to form a composite structure encompassing the entire text.

1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far, appear elsewhere in the literature.

These widely different views about the structural representation for discourse are difficult to reconcile

with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of

discourse relations is adopted, with the idea that individual relations can be more reliably annotated

and that they can be further annotated to project a higher-level tree or graph structure, depending on

one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can

also serve as a pivot representation when comparing annotations of different resources grounded in

different theories.
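
The idea that independently annotated relations can later be projected onto a higher-level structure can be sketched as follows. The function names project_to_graph and has_multiple_parents, the segment identifiers, and the choice to treat the first argument as the parent of the second are all assumptions made for this illustration; the document itself does not prescribe a projection method.

```python
from collections import defaultdict


def project_to_graph(relations):
    """Map each segment to the set of segments it depends on.

    relations is an iterable of (relation, arg1_id, arg2_id) triples that
    were annotated independently of each other. Treating arg1 as the
    parent of arg2 is an assumption of this sketch.
    """
    parents = defaultdict(set)
    for _relation, parent_id, child_id in relations:
        parents[child_id].add(parent_id)
    return dict(parents)


def has_multiple_parents(parents):
    """True if some segment has more than one parent, which a strict tree
    representation would not allow but a graph representation would."""
    return any(len(p) > 1 for p in parents.values())


# Three hypothetical segments; s2 is an argument of two different relations.
low_level = [("Cause", "s1", "s2"), ("Contrast", "s3", "s2")]
graph = project_to_graph(low_level)
needs_graph = has_multiple_parents(graph)
```

The low-level triples stay unchanged whichever structure is projected from them, which is what allows them to serve as a pivot between tree-based and graph-based resources.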
4.3 Semantic description of discourse relations

A second difference among existing frameworks relates to whether the meaning of a discourse relation

is described in “informational” terms, i.e. in terms of the “meaning” of the relation’s arguments, or in

“intentional” terms, i.e. in terms of the intentions of the speaker/writer (W) and intended effects on

the hearer/reader (R). While SDRT, HTDC, PDTB and CCR describe the meaning in informational terms,

RST provides definitions in intentional terms. For instance, Example 5 shows the definition for the

(non-volitional) Cause relation in RST (N = nucleus, S = satellite, W = writer, R = reader), while Example

6 presents the definition for the same relation in HTDC (where it is called Explanation).

Example 5 Non-Volitional Cause (RST)
Constraints on N: presents a situation that is not a volitional action

Constraints on the N + S combination: S presents a situation that, by means other than motivating a

volitional action, caused the situation presented in N; without the presentation of S, R might not know

the particular cause of the situation; a presentation of N is more central than S to W’s purposes in

putting forth the N-S combination

The effect: R recognizes the situation presented in S as a cause of the situation presented in N

Locus of the effect: N and S.
Example 6 Explanation (HTDC)

Infer that the state/event asserted by S₁ causes or could cause the state/event asserted by S₀.

Despite the different ways of describing DRel semantics, it is important to note that in many cases, the

differences lie in the “level” at which the
...

SLOVENIAN STANDARD
SIST ISO 24617-8:2018
1 September 2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)
This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24617-8:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24617-8
First edition 2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d’annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d’annotation de base (DR-core)
Reference number: ISO 24617-8:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form

or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior

written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of

the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of (a)symmetry of relations
4.8 Representation of the relative importance of arguments for discourse meaning/structure
4.9 Arity of arguments
4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of non-existence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up of DR-core
4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
6.1 Overview
6.2 Rhetorical structure theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 DRelML abstract syntax and semantics
8.3 Concrete syntax
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,

as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the

Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content

resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction

The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena

in support of empirical natural language research, both computational and theoretical. At the level of

discourse, interest in discourse processing has led to the development of several corpora annotated for

discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are

relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key

to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences.

Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such as summarization [39][41], complex question answering [74], natural language generation [19][47][56], machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A recent overview [76] includes a description of the state of the art in discourse and computation. Several international and collaborative efforts have resulted in annotated resources of discourse relations, across languages as well as genres, to support the development of such applications.

Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of

which concerns the representation of discourse structure, while the other has to do with the semantic

classification of discourse relations. As a result, annotations constructed using one framework are not

easily interpreted in another framework, and annotated resources are limited in their interoperability.

Notwithstanding their differences, however, there are strong compatibilities between them that can be

clarified and used as the basis for mappings and comparisons between the resources, as well as for use

as a basis for future annotation.

In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,

states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,

temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence

relations”. Although discourse relations hold most prominently between the meanings of successive

sentences or utterances in a discourse, they may also occur between the meanings of smaller or

larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between

situations that are not explicitly described but that can be inferred.

This document aims to specify an interoperable approach to the annotation of local semantic relations

in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612; see also

Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects

the view that strong underlying compatibilities with respect to the semantic description of discourse

relations can be observed in the various discourse relation frameworks being used to support data annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation of these compatibilities and a loose mapping between definitions of individual discourse relations, as specified in the different frameworks that will benefit the community as a whole.

The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;

(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard

annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions

of a set of “core” discourse relations which are commonly found in some form in many existing discourse

relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.

This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at

providing an open, extensible set of core relations. The core annotation scheme also discusses certain

issues in discourse relation annotation that it leaves open, as they require further study in collaboration

with other efforts in multilingual discourse annotation, in particular the European COST action

TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a

complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension

of the standard. The issues to be taken up for this complementary part are listed in 4.16.

INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope

This document establishes the representation and annotation of local, “low-level” discourse relations

between situations mentioned in discourse, where each relation is annotated independently of other

relations in the same discourse.

This document provides a basis for annotating discourse relations by specifying a set of core discourse

relations, many of which have similar definitions in different frameworks. To the extent possible, this

document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;

— as a target representation of automatic methods for shallow discourse parsing, for summarization,

and for other applications.
The objectives of this specification are to provide:

— a reference set of data categories that define a collection of discourse relation types with an explicit

semantics;

— a pivot representation based on a framework for defining discourse relations that can facilitate

mapping between different frameworks;

— a basis for developing guidelines for creating new resources that will be immediately interoperable

with pre-existing resources.

With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure.

As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 7).

This document does not consider global, higher-level discourse structure representation which involves

linking local discourse relations to form one or more composite global structures.


This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,

presentational relations, which concern the way in which a text is presented to its readers or the way in

which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation

eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically

simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse

segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)

EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two

sentences are implicitly related through the discourse relation Cause.

EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the

two clauses are related by the discourse relation Concession, expressed by the connective “but”.

EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal

relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an

eventuality, but a proposition, formed by an event description with negative polarity.

Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence

relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)

EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word

discourse connectives include “as well as”, “such as”.

Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal

conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure

representation of discourse structure that only specifies local dependencies between a discourse

relation and its arguments, without further specifying any links or dependencies across these local

structures
4 Basic concepts and metamodel
4.1 Overview

In a discourse, which comes into play when communication involves a sequence of clauses or sentences

in a text, or utterances in a dialogue, a major aspect of the understanding comes from how the events,

states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other.

Understanding such relations, such as Cause, Contrast, and Condition, contributes to what is called

the “coherence” of the discourse, and they can be “realized” explicitly, by means of certain words and

phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis

of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized

with expressions from different syntactic classes. In Example 1, a subordinating conjunction “because”

is used to connect some situation (here, the meaning of the subordinate clause) as the reason for the

buying event mentioned in its matrix clause. In Example 2, an adverb “as a result” is used to relate

two sentences to express the consequence of not seeing many signs about growth coming to a halt. In

Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal,

but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction

or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two

sentences, with the second sentence offering an explanation for why some (investors) have raised their

cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse

context needs to be used together with, cohesive devices and world knowledge to get at the relation.

[44]

Often, when such relations are inferred, it is possible to insert a connective phrase to express the

relation, as shown here with the insertion of “because”. In this document, the term “connective” is used

in a broad sense, to refer to any word or phrase used to express a discourse relation, including both

those drawn from well-defined syntactic classes as well as those that are not.

Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares

because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had

recommended the stock as a good buy.

Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt.

As a result, Fed officials may be divided over whether to ease credit.

Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund

managers said. A major reason is that investors already have sharply scaled back their purchases

of stock funds since Black Monday.

Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash

positions help buffer a fund when the market falls.
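
The distinction between explicit and implicit realization illustrated by Examples 1 and 4 can be sketched as a simple record. The following Python fragment is an informal illustration only and is not part of the DR-core specification: the class name DiscourseRelation, its field names, the text-order naming of the two arguments and the argument spans chosen from the examples are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiscourseRelation:
    """One local relation, recorded independently of any other relation.

    Hypothetical structure for illustration; field names are assumptions.
    """
    relation: str                        # relation type, e.g. "Cause"
    left: str                            # argument that comes first in the text
    right: str                           # argument that comes second in the text
    connective: Optional[str] = None     # connective present in the text, if any
    inserted_connective: Optional[str] = None  # connective an annotator could insert

    @property
    def is_explicit(self) -> bool:
        # A relation is explicit when a connective is actually present in the text.
        return self.connective is not None


# Example 1: Cause realized explicitly by "because" (spans are illustrative).
example_1 = DiscourseRelation(
    relation="Cause",
    left="he bought the shares",
    right="he keeps a utility account at the brokerage firm of Salomon Brothers Inc.",
    connective="because",
)

# Example 4: Cause is inferred; "because" can be inserted but is not in the text.
example_4 = DiscourseRelation(
    relation="Cause",
    left="Some have raised their cash positions to record levels.",
    right="High cash positions help buffer a fund when the market falls.",
    inserted_connective="because",
)

for rel in (example_1, example_4):
    kind = "explicit" if rel.is_explicit else "implicit"
    print(rel.relation, kind, rel.connective or rel.inserted_connective)
```

Keeping the inserted connective in a separate field from the connective found in the text is a design choice of this sketch: it keeps the explicit/implicit distinction recoverable even when both relations receive the same relation type.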

Existing frameworks for describing and representing discourse relations differ along several lines. The remainder of this clause provides a comparison of the most important frameworks, focusing on those that have been used as the basis for annotating discourse relations in corpora, in particular the Theory of Discourse Coherence (HTDC) 1) by Hobbs [18], Rhetorical Structure Theory (RST) by Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses the main issues that are considered relevant for developing the pivot representation in DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.

1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far, appear elsewhere in the literature.

4.2 Representation of discourse structure

One important difference between existing DRel frameworks concerns the representation of discourse structure. For example, the RST Treebank [10], based on Rhetorical Structure Theory [40], assumes a tree representation that subsumes the entire text of the discourse. The Discourse GraphBank [78], based on HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64] and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents, but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks, individual relations along with their arguments are annotated, without being combined with other relations to form a composite structure encompassing the entire text.

These widely different views about the structural representation for discourse are difficult to reconcile with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of discourse relations is adopted, with the idea that individual relations can be more reliably annotated and that they can be further annotated to project a higher-level tree or graph structure, depending on one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can also serve as a pivot representation when comparing annotations of different resources grounded in different theories.
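
How separately annotated relations might later be projected onto a higher-level structure can be sketched as follows. This is a minimal illustration under stated assumptions and is not part of DR-core: it supposes that a further annotation layer has supplied hypothetical (parent, child) links between segments, and the function name classify_structure and its three labels are invented for this sketch. Crossing dependencies are not modelled, so the sketch only separates trees from acyclic structures with multiple parents (or several roots) and from structures that contain a cycle.

```python
from collections import defaultdict, deque


def classify_structure(edges):
    """Label a set of (parent, child) links as 'tree', 'dag' or 'cyclic'.

    Hypothetical helper: the links are assumed to come from an annotation
    layer added on top of the local relations. Crossing is not checked.
    """
    nodes = {node for edge in edges for node in edge}
    children = defaultdict(set)
    parents = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
        parents[child].add(parent)

    # Kahn's algorithm: repeatedly remove nodes with no remaining parent.
    remaining = {node: len(parents[node]) for node in nodes}
    queue = deque(node for node in nodes if remaining[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited != len(nodes):
        return "cyclic"  # some nodes were never freed, so a cycle exists

    roots = [node for node in nodes if not parents[node]]
    single_parent = all(len(parents[node]) <= 1 for node in nodes)
    if single_parent and len(roots) == 1:
        return "tree"
    return "dag"  # acyclic, but with multiple parents or several roots


print(classify_structure([("r", "a"), ("r", "b"), ("a", "c")]))
print(classify_structure([("r", "a"), ("r", "b"), ("a", "c"), ("b", "c")]))
```

Because the local relations themselves stay unchanged, the same low-level annotation could feed either kind of projection; only the added links differ.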
4.3 Semantic description of discourse relations

A second difference among existing frameworks relates to whether the meaning of a discourse relation is described in “informational” terms, i.e. in terms of the “meaning” of the r
...

SLOVENSKI STANDARD
SIST ISO 24617-8:2018
01 September 2018
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)

Gestion des ressources langagières -- Cadre d'annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d'annotation de base (DR-core)

This Slovenian standard is identical to: ISO 24617-8:2016
ICS:
01.020 Terminology (principles and coordination)
35.060 Languages used in information technology
SIST ISO 24617-8:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Gestion des ressources langagières -- Cadre d’annotation sémantique (SemAF) -- Partie 8: Relations sémantiques dans le discours, schéma d’annotation de base (DR-core)
Reference number ISO 24617-8:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form

or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior

written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of

the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
    4.1 Overview
    4.2 Representation of discourse structure
    4.3 Semantic description of discourse relations
    4.4 Pragmatic variants of discourse relations
    4.5 Hierarchical classification of discourse relations
    4.6 Inference of multiple relations between two segments
    4.7 Representation of (a)symmetry of relations
    4.8 Representation of the relative importance of arguments for discourse meaning/structure
    4.9 Arity of arguments
    4.10 Syntactic form, extent, and (non-)adjacency of argument realizations
    4.11 Triggers of discourse relations
    4.12 Representation of attribution as a discourse relation
    4.13 Representation of entity-based relations
    4.14 Representation of non-existence of a discourse relation
    4.15 Summary: Assumptions of the DR-core annotation scheme
    4.16 Issues to be taken up in the follow-up of DR-core
    4.17 Metamodel
5 Core discourse relations
6 Current approaches and annotation schemes
    6.1 Overview
    6.2 Rhetorical structure theory (RST)
    6.3 RST Treebank
    6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
    6.5 GraphBank
    6.6 SDRT
    6.7 CCR
    6.8 Penn Discourse Treebank (PDTB)
    6.9 Mapping of DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
    7.1 Overlapping annotation schemes
    7.2 Discourse relations and semantic roles
    7.3 Discourse relations and temporal relations
    7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
    8.1 Overview
    8.2 DRelML abstract syntax and semantics
    8.3 Concrete syntax
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards

bodies (ISO member bodies). The work of preparing International Standards is normally carried out

through ISO technical committees. Each member body interested in a subject for which a technical

committee has been established has the right to be represented on that committee. International

organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.

ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of

electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are

described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the

different types of ISO documents should be noted. This document was drafted in accordance with the

editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of

patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of

any patent rights identified during the development of the document will be in the Introduction and/or

on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not

constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,

as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the

Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content

resources, Subcommittee SC 4, Language resource management.
A list of all parts in the ISO 24617 series can be found on the ISO website.
Introduction

The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in support of empirical natural language research, both computational and theoretical. At the level of discourse, interest in discourse processing has led to the development of several corpora annotated for discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are relations, expressed explicitly or implicitly, between situations mentioned in a discourse and are key to a complete understanding of the discourse, beyond the meaning conveyed by clauses and sentences. Discourse relations and discourse structure are considered to be key ingredients for NLP tasks such as summarization [39][41], complex question answering [74], natural language generation [19][47][56], machine translation [42], opinion mining and sentiment analysis [11][12], and information retrieval [38]. A recent overview [76] includes a description of the state of the art in discourse and computation. Several international and collaborative efforts have resulted in annotated resources of discourse relations, across languages as well as genres, to support the development of such applications.

Existing annotation frameworks exhibit two major differences in their underlying assumptions, one of

which concerns the representation of discourse structure, while the other has to do with the semantic

classification of discourse relations. As a result, annotations constructed using one framework are not

easily interpreted in another framework, and annotated resources are limited in their interoperability.

Notwithstanding their differences, however, there are strong compatibilities between them that can be

clarified and used as the basis for mappings and comparisons between the resources, as well as for use

as a basis for future annotation.

In a coherent (written or spoken) discourse, the situations mentioned in the discourse, such as events,

states, facts, propositions, and dialogue acts are semantically linked through causal, contrastive,

temporal and other relations, called “discourse relations”, “rhetorical relations”, or “coherence

relations”. Although discourse relations hold most prominently between the meanings of successive

sentences or utterances in a discourse, they may also occur between the meanings of smaller or

larger units (nominalizations, clauses, paragraphs, dialogue segments), and they may occur between

situations that are not explicitly described but that can be inferred.

This document aims to specify an interoperable approach to the annotation of local semantic relations in discourse (DRels), following the Linguistic Annotation Framework (LAF, ISO 24612-2; see also Reference [23]) and the general principles for semantic annotation established in ISO 24617-6. It reflects the view that strong underlying compatibilities with respect to the semantic description of discourse relations can be observed in the various discourse relation frameworks being used to support data annotation, e.g. Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to provide an explanation of these compatibilities and a loose mapping between the definitions of individual discourse relations as specified in the different frameworks, which will benefit the community as a whole.

The main aims of this document are to (1) establish a set of desiderata for interoperable DRel annotation;

(2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard

annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions

of a set of “core” discourse relations which are commonly found in some form in many existing discourse

relation frameworks. Together, (2) and (3) form a “core annotation scheme” for DRels.

This document does not aim at providing a fixed and exhaustive set of discourse relations, but rather at

providing an open, extensible set of core relations. The core annotation scheme also discusses certain

issues in discourse relation annotation that it leaves open, as they require further study in collaboration

with other efforts in multilingual discourse annotation, in particular the European COST action

TextLink. A future part of ISO 24617 is envisaged that will complement this document by providing a

complete interoperable annotation scheme for DRels, while also addressing the multilingual dimension

of the standard. The issues to be taken up for this complementary part are listed in 4.16.

INTERNATIONAL STANDARD ISO 24617-8:2016(E)
Language resource management — Semantic annotation
framework (SemAF) —
Part 8:
Semantic relations in discourse, core annotation schema
(DR-core)
1 Scope

This document establishes the representation and annotation of local, “low-level” discourse relations

between situations mentioned in discourse, where each relation is annotated independently of other

relations in the same discourse.

This document provides a basis for annotating discourse relations by specifying a set of core discourse

relations, many of which have similar definitions in different frameworks. To the extent possible, this

document provides mappings of the semantics across the different frameworks.
This document is applicable to two different situations:
— for annotating discourse relations in natural language corpora;

— as a target representation of automatic methods for shallow discourse parsing, for summarization,

and for other applications.
The objectives of this specification are to provide:

— a reference set of data categories that define a collection of discourse relation types with an explicit

semantics;

— a pivot representation based on a framework for defining discourse relations that can facilitate

mapping between different frameworks;

— a basis for developing guidelines for creating new resources that will be immediately interoperable

with pre-existing resources.

With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure. As a part of the ISO 24617 semantic annotation framework (“SemAF”), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617-2, the ISO standard for dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 7).

This document does not consider global, higher-level discourse structure representation which involves

linking local discourse relations to form one or more composite global structures.


This document is, moreover, restricted to strictly semantic relations, to the exclusion of, for example,

presentational relations, which concern the way in which a text is presented to its readers or the way in

which speakers structure their contributions in a spoken dialogue.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in written text or of utterances in oral speech
3.2
situation

eventuality, fact, proposition, condition, belief or dialogue act, that can be realized by a linguistically

simple or complex expression, such as a clause, a nominalization, a sentence/utterance, or a discourse

segment consisting of multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)

EXAMPLE 1 “Peter came late to the meeting. He had been in a traffic jam.” The events mentioned in the two

sentences are implicitly related through the discourse relation Cause.

EXAMPLE 2 “Peter was in a traffic jam, but he arrived on time for the meeting.” The events mentioned in the

two clauses are related by the discourse relation Concession, expressed by the connective “but”.

EXAMPLE 3 “Peter did not manage to come to the meeting; he was held up in a terrible traffic jam.” The causal

relation in this example is the same as in Example 1, but the argument expressed by the first clause is not an

eventuality, but a proposition, formed by an event description with negative polarity.

Note 1 to entry: Quasi-synonyms for “discourse relation”, with small variations in meaning, are “coherence relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)

EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however”, “because”. Multi-word discourse connectives include “as well as”, “such as”.

Note 1 to entry: Many of the words that can be used as discourse connectives can also be used as intra-clausal conjunctions, as with the use of “and” in “John and Mary are a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that only specifies local dependencies between a discourse relation and its arguments, without further specifying any links or dependencies across these local structures
4 Basic concepts and metamodel
4.1 Overview

In a discourse, which comes into play when communication involves a sequence of clauses or sentences in a text, or utterances in a dialogue, a major aspect of understanding comes from how the events, states, facts, propositions, and dialogue acts mentioned in the discourse are related to each other. Understanding such relations, for example Cause, Contrast, and Condition, contributes to what is called the “coherence” of the discourse. The relations can be “realized” explicitly, by means of certain words and phrases (often called “connectives”), or they can be implicit, when they have to be inferred on the basis of the discourse context and world knowledge. Examples 1 to 3 illustrate the Cause relation realized with expressions from different syntactic classes. In Example 1, the subordinating conjunction “because” is used to present some situation (here, the meaning of the subordinate clause) as the reason for the buying event mentioned in its matrix clause. In Example 2, the adverbial “as a result” is used to relate two sentences, expressing the consequence of not seeing many signs about growth coming to a halt. In Example 3, an explicit phrase (“A major reason is”) is again used, to explain the claim about the level of investor withdrawal, but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction or adverb. Finally, Example 4 shows a case where a causal relation can be inferred between the two sentences, with the second sentence offering an explanation for why some (investors) have raised their cash positions, although there is no word or phrase in the text to express this inference. Rather, the discourse context needs to be used together with cohesive devices and world knowledge to get at the relation. Often, when such relations are inferred, it is possible to insert a connective phrase to express the relation [44], as shown in Example 4 with the insertion of “because”. In this document, the term “connective” is used in a broad sense, to refer to any word or phrase used to express a discourse relation, including both those drawn from well-defined syntactic classes and those that are not.

Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had recommended the stock as a good buy.

Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt. As a result, Fed officials may be divided over whether to ease credit.

Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund managers said. A major reason is that investors already have sharply scaled back their purchases of stock funds since Black Monday.

Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash positions help buffer a fund when the market falls.

Existing frameworks for describing and representing discourse relations differ along several lines. The remainder of this clause provides a comparison of the most important frameworks, focusing on those that have been used as the basis for annotating discourse relations in corpora, in particular the Theory of Discourse Coherence (HTDC) by Hobbs [18], Rhetorical Structure Theory (RST) by Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders and others [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses the main issues that are considered relevant for developing the pivot representation in DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.

4.2 Representation of discourse structure

One important difference between existing DRel frameworks concerns the representation of discourse structure. For example, the RST Treebank [10], based on the Rhetorical Structure Theory [40], assumes a tree representation to subsume the entire text of the discourse. The Discourse GraphBank [78], based on HTDC, allows for general graphs that permit multiple parents and crossing, and the DISCOR corpus [64] and the ANNODIS corpus [1], based on SDRT, allow directed acyclic graphs that permit multiple parents, but not crossing. There are also frameworks that are pre-theoretical or theory-neutral with respect to discourse structure. These include the PDTB [59], based loosely on a lexicalized approach to discourse relations and structure (DLTAG [16][75]), and DiscAn [65], based on CCR. In both of these frameworks, individual relations along with their arguments are annotated, without being combined with other relations to form a composite structure encompassing the entire text.

1) “HTDC” as an acronym for Hobbs’ theory is created for the purpose of this document and does not, thus far, appear elsewhere in the literature.

These widely different views about the structural representation for discourse are difficult to reconcile with each other. In the DR-core specification, a pre-theoretical stance involving low-level annotation of discourse relations is adopted, with the idea that individual relations can be more reliably annotated and that they can be further annotated to project a higher-level tree or graph structure, depending on one’s theoretical inclination. From the point of view of interoperability, the low-level annotation can also serve as a pivot representation when comparing annotations of different resources grounded in different theories.

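
To make the idea of projection concrete, the sketch below stores relations as a flat list and derives a graph view only on demand. It is a minimal illustration under stated assumptions: the segment identifiers (`s1`, `s2`, `s3`), the `Rel` tuple and both helper functions are hypothetical, and reading Arg1 as the source of an edge and Arg2 as its target is an arbitrary convention chosen for the example, not something this document prescribes.

```python
from collections import defaultdict
from typing import NamedTuple


class Rel(NamedTuple):
    """A low-level relation between two segment identifiers (hypothetical layout)."""
    label: str
    arg1: str
    arg2: str


# Flat, low-level annotation: each relation stands on its own.
RELATIONS = [
    Rel("Cause", "s1", "s2"),
    Rel("Contrast", "s2", "s3"),
    Rel("Cause", "s1", "s3"),
]


def project_graph(rels):
    """Project the flat list onto an adjacency mapping (one possible higher-level view)."""
    graph = defaultdict(list)
    for rel in rels:
        graph[rel.arg1].append((rel.label, rel.arg2))
    return dict(graph)


def has_multiple_parents(rels):
    """Report whether any segment is the target of edges from more than one segment."""
    parents = defaultdict(set)
    for rel in rels:
        parents[rel.arg2].add(rel.arg1)
    return any(len(sources) > 1 for sources in parents.values())


print(project_graph(RELATIONS))
print(has_multiple_parents(RELATIONS))
```

Under this convention, a projection in which some segment has several parents cannot be read as a tree, whereas graph-based views that permit multiple parents could accommodate it. The flat list itself stays neutral on that choice.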
4.3 Semantic description of discourse relations

A second difference among existing frameworks relates to whether the meaning of a discourse relation is described in “informational” terms, i.e. in terms of the “meaning” of the relation

...

INTERNATIONAL STANDARD ISO 24617-8
First edition
2016-12-15
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
Reference number
ISO 24617-8:2016(F)
ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Basic concepts and metamodel
4.1 Overview
4.2 Representation of discourse structure
4.3 Semantic description of discourse relations
4.4 Pragmatic variants of discourse relations
4.5 Hierarchical classification of discourse relations
4.6 Inference of multiple relations between two segments
4.7 Representation of the symmetry or asymmetry of relations
4.8 Representation of the relative importance of arguments for the meaning/structure of the discourse
4.9 Arity of arguments
4.10 Syntactic form, extent and (non-)adjacency of argument realizations
4.11 Triggers of discourse relations
4.12 Representation of attribution as a discourse relation
4.13 Representation of entity-based relations
4.14 Representation of the absence of a discourse relation
4.15 Summary: Assumptions of the DR-core annotation scheme
4.16 Issues to be taken up in the follow-up to DR-core
4.17 Metamodel
5 Core set of discourse relations
6 Existing approaches and annotation schemes
6.1 Overview
6.2 Rhetorical Structure Theory (RST)
6.3 RST Treebank
6.4 Hobbs’ Theory of Discourse Coherence (HTDC)
6.5 GraphBank
6.6 SDRT
6.7 CCR
6.8 Penn Discourse Treebank (PDTB)
6.9 Mapping DR-core discourse relations to existing classifications
7 Interactions of this document with other annotation schemes
7.1 Overlapping annotation schemes
7.2 Discourse relations and semantic roles
7.3 Discourse relations and temporal relations
7.4 Discourse relations and semantic relations between dialogue acts
8 DRelML: Discourse Relations Markup Language
8.1 Overview
8.2 Abstract syntax and semantics of DRelML
8.3 Concrete syntax
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify such rights or to give notice of their existence. Details of any intellectual property rights or similar rights identified during the development of the document are given in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see the following URL: www.iso.org/iso/fr/avant-propos.html

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

A list of all parts in the ISO 24617 series can be found on the ISO website.

Introduction

The last decade has seen a proliferation of linguistically annotated corpora coding many phenomena in support of empirical natural language research, both computational and theoretical. At the level of discourse, interest in discourse processing has led to the development of several corpora annotated for discourse relations. Discourse relations, also called “coherence relations” or “rhetorical relations”, are relations expressed explicitly or implicitly between situations mentioned in a discourse; they are key to a full understanding of the discourse, beyond the meaning conveyed by clauses and sentences. Discourse relations and discourse structure are considered key ingredients for NLP tasks such as summarization [39][41], complex question answering [74], natural language generation [19][47][56], machine translation [42], opinion mining and sentiment analysis [11][12] and information retrieval [38]. A recent overview [76] describes the state of the art in discourse and computation. Several international and collaborative efforts have created annotated discourse relation resources, across languages and genres, to support the development of such applications.

Existing annotation frameworks exhibit two major differences in their underlying assumptions: one concerns the representation of discourse structure, the other the semantic classification of discourse relations. As a result, annotations produced within one framework are difficult to interpret in another, and the interoperability of annotated resources is limited. Despite these differences, however, there are strong compatibilities between the frameworks that can be made explicit and used to map and compare resources, and to serve as a basis for future annotation.

In a coherent discourse (written or spoken), the situations mentioned in the discourse, such as events, statements, facts, propositions and dialogue acts, are semantically linked by causal, contrastive, temporal and other relations, called “discourse relations”, “rhetorical relations” or “coherence relations”. Although discourse relations hold mainly between the meanings of successive sentences or utterances in the discourse, they can also hold between the meanings of smaller or larger units (nominalizations, clauses, paragraphs, dialogue segments), and between situations that are not explicitly described but can be inferred.

The purpose of this document is to specify an interoperable approach to the annotation of local semantic relations in discourse (DRel) that respects the Linguistic Annotation Framework (LAF, ISO 24612-2; see also Reference [23]) and the general principles for semantic annotation laid down in ISO 24617-6. It reflects the view that strong underlying compatibilities can be observed in the semantic description of discourse relations across the various discourse relation frameworks used for data annotation, for example Rhetorical Structure Theory (RST) [40], Segmented Discourse Representation Theory (SDRT) [3], the Penn Discourse Treebank (PDTB) [59], Hobbs’ Theory of Discourse Coherence (HTDC) [17][18] and the Cognitive Approach to Coherence Relations (CCR) [66]. This document aims to make these compatibilities explicit and to propose approximate mappings between the definitions of individual discourse relations, as specified in the different frameworks, for the benefit of the community as a whole.

This document aims to (1) establish a set of desiderata for interoperable DRel annotation; (2) specify a way of annotating DRels that is compatible with existing and emerging ISO standard annotation schemes for semantic information; and (3) provide clear and mutually consistent definitions of a “core” set of discourse relations that often appear in one form or another in many current discourse relation frameworks. Together, objectives (2) and (3) constitute a “core annotation scheme” for DRels.

This document does not aim to provide an exhaustive, fixed set of discourse relations, but rather an open, extensible core set of relations. The core annotation scheme also touches on certain issues of discourse relation annotation that remain open, because they require further study in collaboration with other efforts in multilingual discourse annotation, in particular the TextLink action in the European COST programme. A future part of ISO 24617 is envisaged that will complement this document by providing a complete, interoperable annotation scheme for DRels while addressing the multilingual dimension of the standard. The issues to be taken up in that complementary part are listed in 4.16.

INTERNATIONAL STANDARD ISO 24617-8:2016(F)
Language resource management -- Semantic annotation framework (SemAF) -- Part 8: Semantic relations in discourse, core annotation schema (DR-core)
1 Scope

This document establishes the representation and annotation of local, “low-level” discourse relations between situations mentioned in discourse, where each relation is annotated independently of other relations in the same discourse.

This document provides a basis for annotating discourse relations by specifying a set of core discourse relations, many of which have similar definitions in different frameworks. To the extent possible, this document provides mappings of the semantics across the different frameworks.

This document is applicable to two different situations:

— for annotating discourse relations in natural language corpora;

— as a target representation of automatic methods for shallow discourse parsing, for summarization, and for other applications.

The objectives of this specification are to provide:

— a reference set of data categories that define a collection of discourse relation types with an explicit semantics;

— a pivot representation based on a framework for defining discourse relations that can facilitate mapping between different frameworks;

— a basis for developing guidelines for creating new resources that will be immediately interoperable with pre-existing resources.

With respect to discourse structure, the limitation of this document to specifications for annotating local, “low-level” discourse relations is based on the view that (a) the analysis at this level is what is well understood and can be clearly defined; (b) further extensions to represent higher-level, global discourse structure are possible where desired; and (c) it allows the resulting annotations to be compatible across frameworks, even when they are based on different theories of discourse structure.

As a part of the ISO 24617 semantic annotation framework (SemAF), the present DR-core standard aims to be transparent in its relation to existing frameworks for discourse relation annotation, but also to be compatible with other ISO 24617 parts. Some discourse relations are specific to interactive discourse, and give rise to an overlap with ISO 24617 Part 2, which covers dialogue act annotation. Other discourse relations relate to time, and their annotation forms part of ISO 24617-1 (time and events); still other discourse relations are very similar to certain predicate-argument relations (“semantic roles”), whose annotation is the subject matter of ISO 24617-4. Since the various parts are required to form a consistent whole, this document pays special attention to the interactions of discourse relation annotation and other semantic annotation schemes (see Clause 8).

This document does not deal with the representation of global, higher-level discourse structures, which involves linking local discourse relations to form one or more composite global structures.

This document is, moreover, restricted to strictly semantic relations, and therefore excludes, for example, presentational relations, which concern the way in which a text is presented to its readers or the way in which speakers structure their contributions to a spoken dialogue.

2 Normative references
There are no normative references in this document.
3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
discourse
sequence of clauses or sentences in a written text or of utterances in spoken discourse
3.2
situation
eventuality, fact, proposition, condition, belief or dialogue act, which can be realized by a linguistically simple or complex expression, such as a clause, a nominalization, a sentence/utterance or a discourse segment comprising multiple sentences or utterances
3.3
discourse relation
relation between two situations (3.2) mentioned in a discourse (3.1)

EXAMPLE 1 “Pierre arrived late at the meeting. He was stuck in a traffic jam.” The events mentioned in these two sentences are implicitly linked by the discourse relation Cause.

EXAMPLE 2 “Pierre was stuck in a traffic jam, but he arrived at the meeting on time.” The events mentioned in these two clauses are linked by the discourse relation Concession, expressed by the connective “but”.

EXAMPLE 3 “Pierre did not make it to the meeting: he was held up in a very heavy traffic jam.” In this example, the causal relation is the same as in Example 1, but the argument expressed by the first part is not an eventuality but a proposition, formed by the description of an event with negative polarity.

Note 1 to entry: There are near-synonyms of “discourse relation” with slightly different meanings, namely “coherence relation” and “rhetorical relation”.
3.4
discourse connective
word or multi-word expression expressing a discourse relation (3.3)

EXAMPLE Single-word discourse connectives include “but”, “since”, “and”, “however” and “because”. Multi-word discourse connectives include “as well as” and “such as”.

Note 1 to entry: Many words used as discourse connectives can also be used as conjunctions within a clause, for example the use of “and” in “Jean and Marie make a lovely couple”.
3.5
low-level discourse structure
representation of discourse structure that specifies only the local dependencies between a discourse relation and its arguments, without specifying the links or dependencies between these local structures
4 Basic concepts and metamodel
4.1 Overview

In discourse, which arises when communication involves a sequence of clauses or sentences in a text, or of utterances in a dialogue, an essential aspect of understanding comes from how the events, statements, facts, propositions and dialogue acts mentioned in the discourse are related to one another. Understanding such relations, for example Cause, Contrast and Condition, contributes to what is called “discourse coherence”: these relations can be “realized” explicitly by means of certain words and phrases (often called “connectives”), or they can be implicit, in which case they have to be inferred from the discourse context and our world knowledge. Examples 1 to 3 illustrate the Cause relation realized with expressions from different syntactic classes. In Example 1, a subordinating conjunction, “because”, is used to identify a situation (here, the meaning of the subordinate clause) as the reason for the buying event mentioned in the main clause. In Example 2, an adverb, “As a result”, is used to relate two sentences, expressing the consequence of not seeing many signs of growth coming to a halt. In Example 3, an explicit phrase is again used, to explain the claim about the level of investor withdrawal, but here the phrase does not correspond to a well-defined single syntactic class such as a conjunction or adverb. Finally, Example 4 shows that although a causal relation can be inferred between the two sentences, with the second sentence offering an explanation for why some (investors) have raised their cash positions, there is no word or phrase in the text to express this inference. Rather, the discourse context needs to be used together with cohesive devices and world knowledge to get at the relation. Often, when such relations are inferred, it is possible to insert a connective phrase [44] to express the relation, as shown here with the insertion of “because”. In this document, the term “connective” is used in a broad sense, to refer to any word or phrase used to express a discourse relation, including both those drawn from well-defined syntactic classes and those that are not.

Example 1 Mr. Taft, who is also president of Taft Broadcasting Co., said he bought the shares because he keeps a utility account at the brokerage firm of Salomon Brothers Inc., which had recommended the stock as a good buy.

Example 2 Despite the economic slowdown, there are few clear signs that growth is coming to a halt. As a result, Fed officials may be divided over whether to ease credit.

Example 3 But a strong level of investor withdrawal is much more unlikely this time around, fund managers said. A major reason is that investors already have sharply scaled back their purchases of stock funds since Black Monday.

Example 4 Some have raised their cash positions to record levels. [implicit (because)] High cash positions help buffer a fund when the market falls.

Existing frameworks for describing and representing discourse relations differ along several lines. The remainder of this clause provides a comparison of the most important frameworks, focusing on those that have been used as the basis for annotating discourse relations in corpora, in particular Hobbs’ Theory of Discourse Coherence (HTDC) [18], Rhetorical Structure Theory (RST) by Mann and Thompson [40], the Cognitive Approach to Coherence Relations (CCR) by Sanders et al. [66], Segmented Discourse Representation Theory (SDRT) by Asher and Lascarides [3] and the annotation framework of the Penn Discourse Treebank (PDTB) [59][61]. The comparison highlights and discusses the main issues that are considered relevant for developing the pivot representation in DR-core. For each issue, the discussion is followed by the ISO specification adopted for that issue. The clause ends with a summary of the key features of the DR-core specification, and the DR-core metamodel.

4.2 Representation of discourse structure
...
