Language resource management — Transcription of spoken language

ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

Gestion des ressources linguistiques — Transcription du langage parlé

ISO 24624:2016 sets out rules for representing transcriptions of audio and video recordings of spoken interactions in XML documents based on the TEI guidelines. Its secondary objective is to relate transcribed data to standards for annotated corpora. It applies to transcription data for sociolinguistic studies, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology and qualitative studies in the social sciences, and to other transcription data of recorded spoken language. It does not apply to other forms of transcription, in particular transcriptions of manuscripts. Annex A presents a fully encoded example and Annex B provides an element index and an attribute index.

Upravljanje z jezikovnimi viri - Transkripcija govorjenega jezika

This document specifies rules for representing transcriptions of audio and video recordings of spoken communication in XML documents based on the guidelines of the Text Encoding Initiative (TEI). Its secondary purpose is to link transcribed data with standards for annotated corpora. It applies to transcribed data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology and qualitative social studies, and to other transcribed data of recorded spoken language. It does not apply to other forms of transcription, in particular transcriptions of hand-written manuscripts. Annex A gives a fully encoded example, and Annex B provides an element index and an attribute index.

General Information

Status: Published
Publication Date: 24-Jul-2016
Current Stage: 9020 - International Standard under periodical review
Start Date: 15-Jul-2021

Standards Content (sample)

SLOVENIAN STANDARD
SIST ISO 24624:2018
01-October-2018
Upravljanje z jezikovnimi viri - Transkripcija govorjenega jezika
Language resource management -- Transcription of spoken language
Gestion des ressources linguistiques -- Transcription du langage parlé
This Slovenian standard is identical to: ISO 24624:2016
ICS:
01.140.10 Writing and transliteration
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24624:2018 en,fr,de

2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24624
First edition
2016-08-15
Language resource management —
Transcription of spoken language
Gestion des ressources linguistiques — Transcription du langage parlé
Reference number: ISO 24624:2016(E)
© ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
4.1.2 Recording information (<recordingStmt>)
4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
4.2.2 Setting information (<settingDesc>)
4.3 Description of source (<encodingDesc>)
5 Macrostructure
5.1 Timeline (<timeline>)
5.2 Utterances (<u>)
5.3 Free dependent annotations (<spanGrp>, <span>)
5.4 Grouping of utterances and dependent annotations (<annotationBlock>)
5.5 Independent elements outside utterances ( and )
5.6 Inline paralinguistic annotation ()
5.7 Global divisions of a transcription (<div>)
6 Microstructure
6.1 Tokens (<w>)
6.1.1 Characterization
6.1.2 Representation as <w>
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses (<pause>)
6.2.1 Characterization
6.2.2 Representation as <pause>
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (<vocal>, <kinesic> and <incident>)
6.3.1 Characterization
6.3.2 Representation as <vocal>, <kinesic> or <incident>
6.3.3 Examples
6.4 Punctuation (<pc>)
6.4.1 Characterization
6.4.2 Representation as <pc>
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (, , )
6.5.1 Characterization
6.5.2 Representation as or
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the <u> level (<seg>)
6.6.1 Characterization
6.6.2 Representation as <seg>
6.6.3 Further constraints
6.6.4 Examples
Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
Introduction

This document sets out to facilitate the interchange of transcriptions of spoken language between different computational tools and environments for creating, editing, publishing and exploiting such data. Transcription of spoken language in this context means an orthography-based transcription of verbal activity as recorded in an audio or video recording of a natural interaction. The description of activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken language transcription, but this document starts from the assumption that the verbal dimension is the primary focus of a spoken language transcription. Likewise, although this document may also be relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is that orthography-based transcription is the default case.

This document is developed in the context of the joint agreement between ISO and the Text Encoding Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI guidelines [23].

This document takes into account data models and encoding practices supported by widely used transcription software. More specifically, it builds on several interoperability studies [12],[16],[17],[19] involving the following tools:

— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]

This document was developed to be compatible with the formats produced by these tools. The compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement to convert these formats to one of the above-mentioned before adding mandatory information (e.g. speaker assignment) using the respective tools.

This document also aims to be usable with widely used transcription systems (“conventions”). However, in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most of these systems lack an explicit formalization. The following selection of transcription systems was considered for this document:

— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]

Since TEI is the reference framework for this document and metadata is not its main concern, no attempt is made here to address metadata compatibility issues beyond the TEI header. However, it should be noted that there are several TEI profiles for the CMDI framework which are related both to each other and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References [5], [6] and [9]).

This document aims to define both a target format for legacy data conversion and a format suitable for future data processing requirements. The pros and cons of these two demands were carefully weighed up before decisions were taken. At some points, certain techniques are therefore marked as preferred from a data processing point of view while an alternative technique is still allowed if the structure of legacy data makes its use unavoidable.

With regard to the other standards developed within ISO committee TC 37/SC 4, this document is intended to provide the primary layer on top of which further annotation layers may be implemented. In particular, the use of the <w> element for tokenizing a transcription conforms to the TEI-based representation of tokens in ISO 24611 (MAF).

This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off annotations within a TEI document. In particular, this mechanism contains a generic element (<annotationBlock>) that groups together annotations related to the same linguistic segment; this grouping meets the needs of this document in the case of annotations of <u> elements or their children.

Finally, this document is complementary to, and does not overlap with, the speech and multimodal interaction-related standards developed within the W3C. In particular, it does not deal with speech synthesis as is the case for SSML [24], nor does it deal with the representation of the semantic interpretation of multimodal utterances as does EMMA [25].
1 Scope

This document specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts.

Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation which does not refer directly to an audio or video recording, but to another annotation, typically an orthographic or phonetic transcription

3.2
milestone element
empty XML element used to indicate a boundary point

3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of the respective language

3.4
paralinguistic feature
feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume, intonation

3.5
phonetic transcription
representation or modelling of spoken language based on the sound system of the respective language

3.6
spoken language
oral language produced by a person’s vocal system

3.7
transcriber
person who carries out the transcription

3.8
transcription
representation or modelling of spoken language by means of written symbols

3.9
transcription system
theoretically founded set of principles and rules detailing what spoken language phenomena are to be transcribed, and how they are to be transcribed
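
The notion of a milestone element (3.2) can be illustrated with a short Python sketch. It is not taken from the standard: the utterance text, the speaker id and the use of the TEI `<anchor>` element with a `synch` attribute are assumptions made for illustration. The sketch shows that an empty element carries no content of its own and only marks a boundary inside running text.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical utterance: the wording, the speaker id and the timeline ids
# (#T1, #T2) are invented for this sketch.
UTTERANCE = (
    f'<u xmlns="{TEI_NS}" who="#SPK0">'
    'good morning <anchor synch="#T1"/>how are you <anchor synch="#T2"/>today'
    '</u>'
)


def split_at_milestones(u_xml):
    """Return the text chunks between milestones and the boundary targets."""
    u = ET.fromstring(u_xml)
    chunks = [(u.text or "").strip()]
    boundaries = []
    for child in u:
        # An empty element has no text and no children; it only marks a position.
        assert child.text is None and len(child) == 0, "expected an empty element"
        boundaries.append(child.get("synch"))
        # The text that follows a milestone is stored in the element's tail.
        chunks.append((child.tail or "").strip())
    return chunks, boundaries


chunks, boundaries = split_at_milestones(UTTERANCE)
print(chunks)
print(boundaries)
```

Each boundary value points at a place in the recording rather than wrapping any text, which is what distinguishes a milestone from a container element.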
4 Metadata

The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections of the <teiHeader> element. The following section addresses only those pieces of metadata which are either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility of, or necessity for, encoding further metadata inside the <teiHeader> element.

4.1 Description of the electronic file (<fileDesc>)

4.1.1 Distribution information (<publicationStmt>)

The <publicationStmt> element inside the <fileDesc> section of the <teiHeader> should be used to record information about access rights and contact information for the transcription in question.

EXAMPLE 1 Use of <publicationStmt>

Hamburger Zentrum für Sprachkorpora
Available free for research and teaching purposes.
No redistributing allowed.
Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
22765
Hamburg
Germany

4.1.2 Recording information (<recordingStmt>)

The <recordingStmt> element inside the <sourceDesc> section of the <teiHeader> should be used to record information about the transcribed recording(s). Only the actual recording(s), usually digital audio and/or video files, should be described here. General information about the respective interaction which is independent of the recording(s) should be described in the <settingDesc> element (see 4.2.2).

A <media> element inside a <recording> element should be used to refer to the corresponding digital file via a @url attribute (see Reference [2]). A @type attribute on <recording> should be used to indicate the media type of the recording; audio and video are the permissible values for that attribute. The actual digital file type should be encoded as a @mimeType attribute (see Reference [8]) on the <media> element. Where two or more files are derived from the same master recording (e.g. a video file or an extracted audio track), these should be represented as different <media> elements inside the same <recording> element, rather than as different <recording> elements. TEI linking mechanisms, such as <link> or @corresp, can be used to describe relationships between different recordings or between recordings and other elements, such as speakers.
EXAMPLE 2 Use of <recordingStmt>

Parkinson Talkshow on BBC, broadcast on 02 November 2007
Video excerpt downloaded from YouTube with aTube-Catcher, converted into MPG format with Adobe Premiere
Audio extracted from video with Audacity 1.3 beta

Recorded with a ZOOM H4NSP, external lapel microphone clipped to Victoria Beckham’s dress
Synchronized with David Beckham’s recording

Recorded with a ZOOM H4NSP, external lapel microphone clipped to David Beckham’s shirt collar
Synchronized with Victoria Beckham’s recording
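
The rules for recordings in 4.1.2 lend themselves to a mechanical check. The Python sketch below is an illustration under stated assumptions, not part of the standard: the file names and the `check_recordings` helper are hypothetical. It shows one master recording with two derived files as sibling `<media>` elements, and verifies that @type is `audio` or `video` and that every `<media>` carries @url and @mimeType.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
ALLOWED_TYPES = {"audio", "video"}  # the two permissible values named in 4.1.2

# Hypothetical recording statement: file names are invented. Both files
# derive from one master recording, so they share one <recording>.
RECORDING_STMT = """
<recordingStmt xmlns="http://www.tei-c.org/ns/1.0">
  <recording type="video">
    <media mimeType="video/mpeg" url="interview.mpg"/>
    <media mimeType="audio/mpeg" url="interview.mp3"/>
  </recording>
</recordingStmt>
"""


def check_recordings(xml_text):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    stmt = ET.fromstring(xml_text)
    for i, rec in enumerate(stmt.findall("tei:recording", NS), start=1):
        if rec.get("type") not in ALLOWED_TYPES:
            problems.append(f"recording {i}: type must be audio or video")
        media = rec.findall("tei:media", NS)
        if not media:
            problems.append(f"recording {i}: no media element")
        for m in media:
            for attr in ("url", "mimeType"):
                if not m.get(attr):
                    problems.append(f"recording {i}: media lacks @{attr}")
    return problems


print(check_recordings(RECORDING_STMT))
```

A check of this kind could run before a transcription is exchanged between tools, since a missing @url leaves the transcript without a resolvable link to its recording.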



4.2 Description of circumstances (<profileDesc>)

4.2.1 Participant information (<particDesc>)

The participants of the transcribed interaction should be described in <person> elements inside the <particDesc> section of a <profileDesc> element. The use of an @n attribute on the <person> element to define an abbreviated code for the respective participant is mandatory since it can be crucial for many processing purposes. <u> elements inside the body of the transcription refer to the @xml:id attribute of a <person> element, which shall therefore always be provided.

In order to provide additional metadata about participants, the content model of <person> can be fully exploited, for example, to record a person’s age, birth date, language knowledge or role in the recorded conversation.
EXAMPLE 3 Use of <particDesc>

Daniel
Steward
British English
French

Fiona
Baker
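
The link between utterances and participants described in 4.2.1 can be sketched in Python as follows. The participant names come from EXAMPLE 3; the ids, the @n codes, the utterances and the `check_participants` helper are hypothetical, and the fragment is deliberately reduced rather than a complete TEI document. The sketch assumes that utterances point at participants through a @who attribute of the form `#id`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced, hypothetical fragment: ids, @n codes and utterances are invented.
FRAGMENT = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <particDesc>
    <person xml:id="SPK0" n="DS">
      <persName><forename>Daniel</forename> <surname>Steward</surname></persName>
    </person>
    <person xml:id="SPK1" n="FB">
      <persName><forename>Fiona</forename> <surname>Baker</surname></persName>
    </person>
  </particDesc>
  <body>
    <u who="#SPK0">good morning</u>
    <u who="#SPK1">morning</u>
  </body>
</TEI>
"""


def check_participants(xml_text):
    """Map each participant id to its @n code and list unresolved references."""
    root = ET.fromstring(xml_text)
    codes, problems = {}, []
    for person in root.iter(TEI + "person"):
        pid = person.get(XML_ID)
        if pid is None:
            problems.append("person without @xml:id")
            continue
        if person.get("n") is None:
            problems.append(f"person {pid}: missing @n code")
        codes[pid] = person.get("n")
    for u in root.iter(TEI + "u"):
        who = (u.get("who") or "").lstrip("#")
        if who not in codes:
            problems.append(f"u refers to unknown participant {who!r}")
    return codes, problems


codes, problems = check_participants(FRAGMENT)
print(codes)
print(problems)
```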




4.2.2 Setting information (<settingDesc>)

The <settingDesc> element should be used to provide general information about the setting and circumstances of the interaction. This includes such matters as the place and time, spatial organization and artefacts of the interaction. Information pertaining to a specific recording of that interaction should not be recorded here, but in the <recordingStmt> (see 4.1.2).
EXAMPLE 4 Use of <settingDesc>

BBC studio London
Talkshow host Michael Parkinson interviewing David and Victoria Beckham about their relationship



4.3 Description of source (<encodingDesc>)

The <encodingDesc> element is used to record information about the way the TEI encoded text has been derived from a recorded source. This includes information about both the tool which created the transcription inside an <appInfo> element and the convention used in transcribing the data inside a <transcriptionDesc> element. @ident and @version attributes should be used on these elements to provide a machine-readable way of accessing this information.
EXAMPLE 5 Use of <encodingDesc>

Transcription Tool providing a TEI Export
Orthographic transcription according to HIAT
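
The machine-readable access that 4.3 asks for can be sketched in Python as follows. The two descriptive texts come from EXAMPLE 5; the element nesting (`application` inside `appInfo`, with `label` and `desc` children) follows the general TEI guidelines and is an assumption here. The @ident and @version values and the `tool_and_convention` helper are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed markup: only the two descriptive texts are taken from EXAMPLE 5.
# The @ident and @version values are placeholders.
ENCODING_DESC = """
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <appInfo>
    <application ident="ExampleTool" version="1.0">
      <label>Transcription Tool providing a TEI Export</label>
    </application>
  </appInfo>
  <transcriptionDesc ident="HIAT" version="2004">
    <desc>Orthographic transcription according to HIAT</desc>
  </transcriptionDesc>
</encodingDesc>
"""


def tool_and_convention(xml_text):
    """Return (ident, version) pairs for the tool and the convention."""
    desc = ET.fromstring(xml_text)

    def pair(element):
        # A missing element yields None instead of raising an error.
        if element is None:
            return None
        return (element.get("ident"), element.get("version"))

    return {
        "tool": pair(desc.find("tei:appInfo/tei:application", NS)),
        "convention": pair(desc.find("tei:transcriptionDesc", NS)),
    }


print(tool_and_convention(ENCODING_DESC))
```

Reading the attributes rather than the free-text labels lets a converter pick tool-specific or convention-specific handling without parsing prose.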


5 Macrostructure
5.1 Timeline (<timeline>)

<when> elements inside a <timeline> element should be used to define points in the recording; these points are then referred to by @start, @end and @synch attributes of other elements (most importa
...

INTERNATIONAL ISO
STANDARD 24624
First edition
2016-08-15
Language resource management —
Transcription of spoken language
Gestion des ressources linguistiques — Transcription du langage parlé
Reference number
ISO 24624:2016(E)
ISO 2016
---------------------- Page: 1 ----------------------
ISO 24624:2016(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form

or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior

written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of

the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2016 – All rights reserved
---------------------- Page: 2 ----------------------
ISO 24624:2016(E)
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
4.1.2 Recording information (<recordingStmt>)
4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<person>)
4.2.2 Setting information (<settingDesc>)
4.3 Description of source (<encodingDesc>)
5 Macrostructure
5.1 Timeline (<timeline>)
5.2 Utterances (<u>)
5.3 Free dependent annotations (<spanGrp>, <span>)
5.4 Grouping of utterances and dependent annotations (<annotationBlock>)
5.5 Independent elements outside utterances (<pause> and <incident>)
5.6 Inline paralinguistic annotation (<shift>)
5.7 Global divisions of a transcription (<div>)
6 Microstructure
6.1 Tokens (<w>)
6.1.1 Characterization
6.1.2 Representation as <w>
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses (<pause>)
6.2.1 Characterization
6.2.2 Representation as <pause>
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (<vocal>, <kinesic> and <incident>)
6.3.1 Characterization
6.3.2 Representation as <vocal>, <kinesic> or <incident>
6.3.3 Examples
6.4 Punctuation (<pc>)
6.4.1 Characterization
6.4.2 Representation as <pc>
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (<unclear>, <choice>, <gap>)
6.5.1 Characterization
6.5.2 Representation as <unclear> or <gap>
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the <u> level (<seg>)
6.6.1 Characterization
6.6.2 Representation as <seg>
6.6.3 Further constraints
6.6.4 Examples
Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
Introduction

This document sets out to facilitate the interchange of transcriptions of spoken language between different computational tools and environments for creating, editing, publishing and exploiting such data. Transcription of spoken language in this context means an orthography-based transcription of verbal activity as recorded in an audio or video recording of a natural interaction. The description of activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken language transcription, but this document starts from the assumption that the verbal dimension is the primary focus of a spoken language transcription. Likewise, although this document may also be relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is that orthography-based transcription is the default case.

This document is developed in the context of the joint agreement between ISO and the Text Encoding Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI guidelines [23].

This document takes into account data models and encoding practices supported by widely used transcription software. More specifically, it builds on several interoperability studies [12],[16],[17],[19] involving the following tools:
— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]

This document was developed to be compatible with the formats produced by these tools. The compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement to convert these formats to one of the above-mentioned before adding mandatory information (e.g. speaker assignment) using the respective tools.

This document also aims to be usable with widely used transcription systems (“conventions”). However, in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most of these systems lack an explicit formalization. The following selection of transcription systems was considered for this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]

Since TEI is the reference framework for this document and metadata is not its main concern, no attempt is made here to address metadata compatibility issues beyond the TEI header. However, it should be noted that there are several TEI profiles for the CMDI framework which are related both to each other and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References [5], [6] and [9]).

This document aims to define both a target format for legacy data conversion and a format suitable for future data processing requirements. The pros and cons of these two demands were carefully weighed up before decisions were taken. At some points, certain techniques are therefore marked as preferred from a data processing point of view while an alternative technique is still allowed if the structure of legacy data makes its use unavoidable.

With regard to the other standards developed within ISO committee TC 37/SC 4, this document is intended to provide the primary layer on top of which further annotation layers may be implemented. In particular, the use of the <w> element for tokenizing a transcription conforms to the TEI-based representation of tokens in ISO 24611 (MAF).

This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off annotations within a TEI document. In particular, this mechanism contains a generic element (<annotationBlock>) that groups together annotations related to the same linguistic segment; this grouping meets the needs of this document in the case of annotations of <u> elements or their children.

Finally, this document is complementary to, and does not overlap with, the speech and multimodal interaction-related standards developed within the W3C. In particular, it does not deal with speech synthesis as is the case for SSML [24], nor does it deal with the representation of the semantic interpretation of multimodal utterances as does EMMA [25].
INTERNATIONAL STANDARD ISO 24624:2016(E)
Language resource management — Transcription of
spoken language
1 Scope

This document specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts.

Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation which does not refer directly to an audio or video recording, but to another annotation, typically an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to indicate a boundary point
3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of the respective language
3.4
paralinguistic feature
feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume, intonation
3.5
phonetic transcription
representation or modelling of spoken language based on the sound system of the respective language
3.6
spoken language
oral language produced by a person’s vocal system
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system
theoretically founded set of principles and rules detailing what spoken language phenomena are to be transcribed, and how they are to be transcribed
4 Metadata

The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections of the <teiHeader> element. The following section addresses only those pieces of metadata which are either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility of, or necessity for, encoding further metadata inside the <teiHeader> element.
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)

The <publicationStmt> element inside the <fileDesc> section of the <teiHeader> should be used to record information about access rights and contact information for the transcription in question.

EXAMPLE 1 Use of <publicationStmt>
Hamburger Zentrum für Sprachkorpora
Available free for research and teaching purposes.
No redistributing allowed.
Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
22765
Hamburg
Germany
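
As an illustration of the guidance in 4.1.1, the following is a minimal sketch of how the details listed in Example 1 might be marked up. The choice of child elements (authority, availability, licence, distributor, address and its parts) is an assumption based on the general TEI P5 content model of the publication statement, not a reproduction of the normative example.

```xml
<!-- Sketch only: element choices below are assumptions, not the normative markup -->
<publicationStmt>
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <availability>
    <!-- access rights for the transcription -->
    <licence>
      <p>Available free for research and teaching purposes.</p>
      <p>No redistributing allowed.</p>
    </licence>
  </availability>
  <!-- contact information -->
  <distributor>Hamburger Zentrum für Sprachkorpora</distributor>
  <address>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <settlement>Hamburg</settlement>
    <country>Germany</country>
  </address>
</publicationStmt>
```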


4.1.2 Recording information (<recordingStmt>)

The <recordingStmt> element inside the <sourceDesc> section of the <teiHeader> should be used to record information about the transcribed recording(s). Only the actual recording(s), usually digital audio and/or video files, should be described here. General information about the respective interaction which is independent of the recording(s) should be described in the <settingDesc> element (see 4.2.2).

A <media> element inside a <recording> element should be used to refer to the corresponding digital file via a @url attribute (see Reference [2]). A @type attribute on <recording> should be used to indicate the media type of the recording; audio and video are the permissible values for that attribute. The actual digital file type should be encoded as a @mimeType attribute (see Reference [8]) on the <media> element. Where two or more files are derived from the same master recording (e.g. a video file and an extracted audio track), these should be represented as different <media> elements inside the same <recording> element, rather than as different <recording> elements. TEI linking mechanisms, such as @corresp, can be used to describe relationships between different recordings or between recordings and other elements, such as speakers.
EXAMPLE 2 Use of <recordingStmt>
Parkinson Talkshow on BBC, broadcast on 02 November 2007
Video excerpt downloaded from YouTube with aTube-Catcher, converted into MPG format with Adobe Premiere
Audio extracted from video with Audacity 1.3 beta
Recorded with a ZOOM H4NSP, external lapel microphone clipped to Victoria Beckham’s dress
Synchronized with David Beckham’s recording
Recorded with a ZOOM H4NSP, external lapel microphone clipped to David Beckham’s shirt collar
Synchronized with Victoria Beckham’s recording
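
The rule in 4.1.2 that several files derived from one master recording belong in a single recording element can be sketched as follows. This is a hedged illustration only: the file names in @url are hypothetical, the MIME types are assumed values, and the use of broadcast and equipment with ab children is an assumption based on the general TEI P5 content model.

```xml
<!-- Sketch only: file names and MIME types are hypothetical -->
<recordingStmt>
  <!-- one master recording, described once -->
  <recording type="video">
    <!-- the video file and the audio track extracted from it -->
    <media mimeType="video/mpeg" url="parkinson_beckham.mpg"/>
    <media mimeType="audio/wav" url="parkinson_beckham.wav"/>
    <broadcast>
      <ab>Parkinson Talkshow on BBC, broadcast on 02 November 2007</ab>
    </broadcast>
    <equipment>
      <ab>Video excerpt downloaded from YouTube with aTube-Catcher,
        converted into MPG format with Adobe Premiere</ab>
      <ab>Audio extracted from video with Audacity 1.3 beta</ab>
    </equipment>
  </recording>
</recordingStmt>
```

A recording made independently with a separate device (such as each lapel microphone in Example 2) would instead get its own recording element.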



4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<person>)

The participants of the transcribed interaction should be described in <person> elements inside the <particDesc> section of a <profileDesc> element. The use of an @n attribute on the <person> element to define an abbreviated code for the respective participant is mandatory since it can be crucial for many processing purposes. <u> elements inside the body of the transcription refer to the @xml:id attribute of a <person> element, which shall therefore always be provided.

In order to provide additional metadata about participants, the content model of <person> can be fully exploited, for example, to record a person’s age, birth date, language knowledge or role in the recorded conversation.
EXAMPLE 3 Use of <person>
Daniel
Steward
British English
French
Fiona
Baker
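
The two requirements of 4.2.1 (an @n code and an @xml:id on every participant) can be sketched for the speakers named in Example 3 as follows. The identifier values, the speaker codes and the language tags are hypothetical, and the use of persName, langKnowledge and langKnown is an assumption drawn from the general TEI P5 content model for persons.

```xml
<!-- Sketch only: xml:id values, n codes and language tags are hypothetical -->
<particDesc>
  <listPerson>
    <!-- utterances in the body would point here, e.g. <u who="#SPK0"> -->
    <person xml:id="SPK0" n="DS">
      <persName>
        <forename>Daniel</forename>
        <surname>Steward</surname>
      </persName>
      <langKnowledge>
        <langKnown tag="en-GB">British English</langKnown>
        <langKnown tag="fr">French</langKnown>
      </langKnowledge>
    </person>
    <person xml:id="SPK1" n="FB">
      <persName>
        <forename>Fiona</forename>
        <surname>Baker</surname>
      </persName>
    </person>
  </listPerson>
</particDesc>
```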




4.2.2 Setting information (<settingDesc>)

The <settingDesc> element should be used to provide general information about the setting and circumstances of the interaction. This includes such matters as the place and time, spatial organization and artefacts of the interaction. Information pertaining to a specific recording of that interaction should not be recorded here, but in the <recordingStmt> (see 4.1.2).
EXAMPLE 4 Use of <settingDesc>
BBC studio London
Talkshow host Michael Parkinson interviewing David and Victoria Beckham about their relationship
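
One way the setting information of Example 4 might be encoded is sketched below. The split into place and setting, and the use of placeName and activity, are assumptions based on the general TEI P5 content model rather than the normative markup.

```xml
<!-- Sketch only: element choices are assumptions -->
<settingDesc>
  <!-- where the interaction took place -->
  <place>
    <placeName>BBC studio London</placeName>
  </place>
  <!-- what the participants were doing -->
  <setting>
    <activity>Talkshow host Michael Parkinson interviewing David and
      Victoria Beckham about their relationship</activity>
  </setting>
</settingDesc>
```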



4.3 Description of source (<encodingDesc>)

The <encodingDesc> element is used to record information about the way the TEI encoded text has been derived from a recorded source. This includes information about both the tool which created the transcription inside an <appInfo> element and the convention used in transcribing the data inside a <transcriptionDesc> element. @ident and @version attributes should be used on these elements to provide a machine-readable way of accessing this information.
EXAMPLE 5 Use of <encodingDesc>
Transcription Tool providing a TEI Export
Orthographic transcription according to HIAT
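
The machine-readable @ident and @version attributes described in 4.3 can be sketched as follows. The attribute values are hypothetical placeholders, and placing @ident and @version on an application element inside appInfo is an assumption that follows general TEI P5 usage.

```xml
<!-- Sketch only: ident and version values are hypothetical -->
<encodingDesc>
  <!-- the tool that created the transcription -->
  <appInfo>
    <application ident="EXMARaLDA" version="1.0">
      <label>Transcription Tool providing a TEI Export</label>
    </application>
  </appInfo>
  <!-- the transcription convention that was followed -->
  <transcriptionDesc ident="HIAT" version="2004">
    <desc>Orthographic transcription according to HIAT</desc>
  </transcriptionDesc>
</encodingDesc>
```

Software reading such a header can branch on the ident and version values without having to parse the free-text descriptions.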


5 Macrostructure
5.1 Timeline (<timeline>)

<when> elements inside a <timeline> element should be used to define points in the recording; these points are then referred to by @start, @end and @synch attributes of other elements of the transcription to represent its temporal structure. It is therefore obligatory to provide an @xml:id attribute for each <when> element. <when> elements shall be in the same order as the timepoints they refer to. Specifying an @interval attribute is optional, but it is very useful for many processing purposes. Absolute time values in the @interval attribute should be given in seconds from the start of the recording with the appropriate number of decimal places. The first <when> element in the timeline corresponds to the start time of the transcribed recording. If an absolute value is known for this point in time, it can be encoded in an @absolute attribute of the first <when> element and the <timeline> element can point to it via an @origin attribute. If no absolute value for the start of the recording can be provided, the @origin and @absolute attributes should be omitted.
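
The rules above can be sketched in a short timeline. This is an illustrative assumption rather than the normative example: the identifiers, the interval values and the absolute start time are hypothetical, and the @unit and @since attributes are taken from the general TEI P5 definition of timelines.

```xml
<!-- Sketch only: ids, offsets and the absolute start time are hypothetical -->
<timeline unit="s" origin="#T0">
  <!-- first point = start of the recording; absolute time known here -->
  <when xml:id="T0" absolute="21:00:00"/>
  <!-- later points in temporal order, in seconds from the start -->
  <when xml:id="T1" interval="1.52" since="#T0"/>
  <when xml:id="T2" interval="3.07" since="#T0"/>
  <when xml:id="T3" interval="4.90" since="#T0"/>
</timeline>
```

If the absolute start time were unknown, the origin attribute on the timeline and the absolute attribute on the first point would simply be left out.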

EXAMPLE 6 Use of <timeline>

...

SLOVENIAN STANDARD
SIST ISO 24624:2018
1 October 2018
Language resource management - Transcription of spoken language
This Slovenian standard is identical to: ISO 24624:2016
ICS:
01.140.10 Writing and transliteration
35.060 Languages used in information technology
SIST ISO 24624:2018 en,fr,de

Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of any patent rights identified during the development of the document will be in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).

Any trade name used in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the meaning of ISO-specific terms and expressions related to conformity assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see the following URL: www.iso.org/iso/foreword.html.

The committee responsible for this document is ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.
Introduction

This document sets out to facilitate the interchange of transcriptions of spoken language between different computational tools and environments for creating, editing, publishing and exploiting such data. Transcription of spoken language in this context means an orthography-based transcription of verbal activity as recorded in an audio or video recording of a natural interaction. The description of activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken language transcription, but this document starts from the assumption that the verbal dimension is the primary focus of a spoken language transcription. Likewise, although this document may also be relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is that orthography-based transcription is the default case.

This document is developed in the context of the joint agreement between ISO and the Text Encoding Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI guidelines [23].

This document takes into account data models and encoding practices supported by widely used transcription software. More specifically, it builds on several interoperability studies [12],[16],[17],[19] involving the following tools:
— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]

This document was developed to be compatible with the formats produced by these tools. The compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement to convert these formats to one of the above-mentioned formats before adding mandatory information (e.g. speaker assignment) using the respective tools.

This document also aims to be usable with widely used transcription systems (“conventions”). However, in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most of these systems lack an explicit formalization. The following selection of transcription systems was considered for this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]

Since TEI is the reference framework for this document and metadata is not its main concern, no attempt is made here to address metadata compatibility issues beyond the TEI header. However, it should be noted that there are several TEI profiles for the CMDI framework which are related both to each other and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References [5], [6] and [9]).

This document aims to define both a target format for legacy data conversion and a format suitable for future data processing requirements. The pros and cons of these two demands were carefully weighed up before decisions were taken. At some points, certain techniques are therefore marked as preferred from a data processing point of view while an alternative technique is still allowed if the structure of legacy data makes its use unavoidable.

With regard to the other standards developed within ISO committee TC 37/SC 4, this document is intended to provide the primary layer on top of which further annotation layers may be implemented. In particular, the use of the element for tokenizing a transcription is conformable to the TEI-based representation of tokens in ISO 24611 (MAF).

This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off annotations within a TEI document. In particular, this mechanism contains a generic element () that groups together annotations related to the same linguistic segment; this grouping meets the needs of this document in the case of annotations of elements or their children.

Finally, this document is complementary to, and does not overlap with, the speech and multimodal interaction-related standards developed within the W3C. In particular, it does not deal with speech synthesis as is the case for SSML [24], nor does it deal with the representation of the semantic interpretation of multimodal utterances as does EMMA [25].
INTERNATIONAL STANDARD ISO 24624:2016(E)
Language resource management — Transcription of
spoken language
1 Scope

This document specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts.

Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.

2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:

— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation

annotation which does not refer directly to an audio or video recording, but to another annotation, typically an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to indicate a boundary point
3.3
orthographic transcription

representation or modelling of spoken language based on the orthography of the respective language

3.4
paralinguistic feature

feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume, intonation
3.5
phonetic transcription

representation or modelling of spoken language based on the sound system of the respective language

3.6
spoken language
oral language produced by a person’s vocal system
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system

theoretically founded set of principles and rules detailing what spoken language phenomena are to be transcribed, and how they are to be transcribed
4 Metadata

The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections of the element. The following section addresses only those pieces of metadata which are either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility of, or necessity for, encoding further metadata inside the element.
4.1 Description of the electronic file ()
4.1.1 Distribution information ()

The element inside the section of the should be used to record information about access rights and contact information for the transcription in question.

EXAMPLE 1 Use of
Hamburger Zentrum für Sprachkorpora
Available free for research and teaching purposes.
No redistributing allowed.
Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
22765
Hamburg
Germany
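
A minimal sketch of how the distribution information of Example 1 might be encoded. The element names (publicationStmt, authority, availability, distributor, address and its children) are assumptions drawn from the TEI P5 vocabulary; only the text values are taken from the example.

```xml
<publicationStmt>
  <!-- Body responsible for making the transcription available -->
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <!-- Access rights -->
  <availability>
    <p>Available free for research and teaching purposes.
      No redistributing allowed.</p>
  </availability>
  <!-- Contact information -->
  <distributor>Hamburger Zentrum für Sprachkorpora</distributor>
  <address>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <settlement>Hamburg</settlement>
    <country>Germany</country>
  </address>
</publicationStmt>
```

Access rights and contact details are kept in separate children so that a processing tool could read either one without parsing free text.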


4.1.2 Recording information ()

The element inside the section of the should be used to record information about the transcribed recording(s). Only the actual recording(s), usually digital audio and/or video files, should be described here. General information about the respective interaction which is independent of the recording(s) should be described in the element (see 4.2.2).

A element inside a element should be used to refer to the corresponding digital file via a @url attribute (see Reference [2]). A @type attribute on should be used to indicate the media type of the recording; audio and video are the permissible values for that attribute. The actual digital file type should be encoded as a @mimeType attribute (see Reference [8]) on the element. Where two or more files are derived from the same master recording (e.g. a video file or an extracted audio track), these should be represented as different elements inside the same element, rather than as different elements. TEI linking mechanisms, such as or @corresp, can be used to describe relationships between different recordings or between recordings and other elements, such as speakers.
EXAMPLE 2 Use of
Parkinson Talkshow on BBC, broadcast on 02 November 2007
Video excerpt downloaded from YouTube with aTube-Catcher, converted into MPG format with Adobe Premiere
Audio extracted from video with Audacity 1.3 beta
Recorded with a ZOOM H4NSP, external lapel microphone clipped to Victoria Beckham’s dress
Synchronized with David Beckham’s recording
Recorded with a ZOOM H4NSP, external lapel microphone clipped to David Beckham’s shirt collar
Synchronized with Victoria Beckham’s recording
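
A minimal sketch of the first case in Example 2, in which a video file and the audio track extracted from it derive from one master recording. The element names (recordingStmt, recording, media, broadcast, equipment, ab) are assumptions drawn from the TEI P5 vocabulary, and the file names and MIME type values are hypothetical placeholders; the @type, @url and @mimeType attributes are the ones named in the text above.

```xml
<recordingStmt>
  <!-- One master recording: the video file and the extracted audio track
       are sibling media elements, not two separate recordings. -->
  <recording type="video">
    <!-- Hypothetical file names and MIME types -->
    <media mimeType="video/mpeg" url="interview.mpg"/>
    <media mimeType="audio/wav" url="interview.wav"/>
    <broadcast>
      <ab>Parkinson Talkshow on BBC, broadcast on 02 November 2007</ab>
    </broadcast>
    <equipment>
      <ab>Video excerpt downloaded from YouTube with aTube-Catcher,
        converted into MPG format with Adobe Premiere</ab>
      <ab>Audio extracted from video with Audacity 1.3 beta</ab>
    </equipment>
  </recording>
</recordingStmt>
```

For the second case in the example, where two lapel microphones produce independent recordings, each would presumably get its own recording element, with @corresp linking the two.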



4.2 Description of circumstances ()
4.2.1 Participant information ()

The participants of the transcribed interaction should be described in elements inside the section of a element. The use of an @n attribute on the element to define an abbreviated code for the respective participant is mandatory since it can be crucial for many processing purposes. elements inside the body of the transcription refer to the @xml:id attribute of a element, which shall therefore always be provided.

In order to provide additional metadata about participants, the content model of can be fully exploited, for example, to record a person’s age, birth date, language knowledge or role in the recorded conversation.
EXAMPLE 3 Use of
Daniel
Steward
British English
French
Fiona
Baker
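
A minimal sketch of how the two participants of Example 3 might be encoded. The element names (particDesc, person, persName, forename, surname, langKnowledge, langKnown) are assumptions drawn from the TEI P5 vocabulary, and the identifiers, speaker codes and language tags are hypothetical values chosen for illustration; @n and @xml:id are the attributes required by the text above.

```xml
<particDesc>
  <!-- xml:id is the target that utterances in the body point to;
       n carries the abbreviated speaker code. Both values are hypothetical. -->
  <person xml:id="SPK0" n="DS">
    <persName>
      <forename>Daniel</forename>
      <surname>Steward</surname>
    </persName>
    <!-- Optional additional metadata: language knowledge -->
    <langKnowledge>
      <langKnown tag="en-GB">British English</langKnown>
      <langKnown tag="fr">French</langKnown>
    </langKnowledge>
  </person>
  <person xml:id="SPK1" n="FB">
    <persName>
      <forename>Fiona</forename>
      <surname>Baker</surname>
    </persName>
  </person>
</particDesc>
```

An utterance by the first participant would then refer to `#SPK0`, which is why the identifier has to be present on every participant even when no further metadata is recorded.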




4.2.2 Setting information ()

The element should be used to provide general information about the setting and circumstances of the interaction. This includes such matters as the place and time, spatial organization and artefacts of the interaction. Information pertaining to a specific recording of that interaction should not be recorded here, but in the (see 4.1.2).
EXAMPLE 4 Use of
BBC studio London
Talkshow host Michael Parkinson interviewing David and Victoria Beckham about their relationship
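
A minimal sketch of how the setting information of Example 4 might be encoded. The element names (settingDesc, place, placeName, setting, activity) are assumptions drawn from the TEI P5 vocabulary; the text values are those of the example.

```xml
<settingDesc>
  <!-- Where the interaction took place -->
  <place>
    <placeName>BBC studio London</placeName>
  </place>
  <!-- What the interaction consisted of; nothing here is specific
       to any one recording of it -->
  <setting>
    <activity>Talkshow host Michael Parkinson interviewing David and
      Victoria Beckham about their relationship</activity>
  </setting>
</settingDesc>
```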



4.3 Description of source ()

The element is used to record information about the way the TEI encoded text has been derived from a recorded source. This includes information about both the tool which created the transcription inside an element and the convention used in transcribing the data inside a element. @ident and @version attributes should be used on these elements to provide a machine-readable way of accessing this information.
EXAMPLE 5 Use of
Transcription Tool providing a TEI Export
Orthographic transcription according to HIAT
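
A minimal sketch of how the source description of Example 5 might be encoded. The element names (encodingDesc, appInfo, application, label, transcriptionDesc, desc) are assumptions drawn from the TEI P5 vocabulary, and the @ident and @version values are hypothetical placeholders; the two text values come from the example.

```xml
<encodingDesc>
  <!-- Tool that created the transcription; ident and version
       are hypothetical values -->
  <appInfo>
    <application ident="ExampleTool" version="1.0">
      <label>Transcription Tool providing a TEI Export</label>
    </application>
  </appInfo>
  <!-- Transcription convention; the version is a hypothetical value -->
  <transcriptionDesc ident="HIAT" version="2004">
    <desc>Orthographic transcription according to HIAT</desc>
  </transcriptionDesc>
</encodingDesc>
```

Keeping the tool and the convention in attributes rather than only in the free-text label lets a converter select its processing rules without interpreting prose.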


5 Macrostructure
5.1 Timeline ()

elements inside a element should be used to define points in the recording;

these points are then referred to by @start, @end and @synch attributes of other elements (most

importa
...

INTERNATIONAL STANDARD ISO 24624
First edition
2016-08-15
Gestion des ressources linguistiques — Transcription du langage parlé
Language resource management — Transcription of spoken language
Reference number
ISO 24624:2016(F)
ISO 2016
COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file ()
4.1.1 Distribution information ( )
4.1.2 Recording information ()
4.2 Description of circumstances ()
4.2.1 Participant information ()
4.2.2 Setting information ()
4.3 Description of source ()
5 Macrostructure
5.1 Timeline ()
5.2 Utterances ()
5.3 Free and dependent annotations (,)
5.4 Grouping of utterances and dependent annotations ()
5.5 Independent elements outside utterances ( and )
5.6 Inline paralinguistic annotations ()
5.7 Global divisions of a transcription ()
6 Microstructure
6.1 Token ()
6.1.1 Characterization
6.1.2 Representation as
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses ()
6.2.1 Characterization
6.2.2 Representation as
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (, and )
6.3.1 Characterization
6.3.2 Representation as , or
6.3.3 Examples
6.4 Punctuation ()
6.4.1 Characterization
6.4.2 Representation as
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (, , )
6.5.1 Characterization
6.5.2 Representation as or
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the level ()
6.6.1 Characterization
6.6.2 Representation as
6.6.3 Further constraints

Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject has the right to be represented on the technical committee established for it. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on matters of electrotechnical standardization.

The procedures used to develop this document and those intended for its further maintenance are described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types of ISO documents should be noted. This document was drafted in accordance with the editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).

Attention is drawn to the possibility that some of the elements of this document may be the subject of intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify such rights or to give notice of their existence. Details of any intellectual property rights or similar rights identified during the development of the document are given in the Introduction and/or on the ISO list of patent declarations received (see www.iso.org/brevets).

Any trade name mentioned in this document is information given for the convenience of users and does not constitute an endorsement.

For an explanation of the meaning of ISO-specific terms and expressions related to conformity assessment, or for information about ISO’s adherence to the World Trade Organization (WTO) principles concerning Technical Barriers to Trade (TBT), see the following link: www.iso.org/iso/fr/avant-propos.html.

This document was prepared by Technical Committee ISO/TC 37, Terminology and other language and content resources, Subcommittee SC 4, Language resource management.

Introduction

Le présent document vise à faciliter l’échange de transcriptions du langage parlé entre différents outils

et environnements informatiques de création, de révision, de publication et d’exploitation de telles

données. La transcription du langage parlé dans ce contexte implique une transcription orthographique

de l’activité verbale telle qu’elle figure dans un enregistrement audio ou vidéo d’une interaction

naturelle. La description de l’activité selon d’autres modalités (par exemple, langage corporel, gestes et

expressions faciales) peut faire partie intégrante d’une transcription du langage parlé, mais ce document

part du principe que la composante verbale est l’objet premier d’une transcription du langage parlé. De

la même façon, bien que ce document puisse s’avérer pertinent pour une transcription en alphabets

phonétiques comme l’API, ce document repose sur l’hypothèse que la transcription orthographique est

le cas par défaut.

Le présent document est élaboré dans le cadre de l’accord commun entre l’ISO et le Text Encoding

Initiative (TEI) Consortium et, par conséquent, son contenu figure également dans les recommandations

[23]
de la TEI .

Le présent document tient compte des modèles de données et des pratiques d’encodage pris en charge

par des logiciels de transcription d’utilisation courante. Plus précisément, il s’appuie sur plusieurs

[12][16][17][19]
études d’interopérabilité portant sur les outils suivants:
[10]
— ANVIL
[11]
— CLAN
[22]
— ELAN
[20]
— EXMARaLDA
[18]
— FOLKER
[1]
— Transcriber

This document has been designed to be compatible with the formats produced by these tools. Compatibility may extend to the formats of other labelling tools (e.g. Praat [4] or Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly to a lesser degree and/or with the need to convert those formats into one of the formats mentioned above before adding mandatory information (e.g. speaker assignment) with the respective tools.

This document is also intended to be used with widely used transcription systems ("conventions"). Technically, however, compatibility is not easy to define in this area because, unlike software formats, most of these systems lack an explicit formalization. The following transcription systems were taken into account in developing this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]

Because the TEI is the frame of reference for this document and metadata are not its focus, it does not address questions of metadata compatibility beyond the TEI header. It should be noted, however, that several TEI profiles exist for the CMDI framework; they are related to one another and to the CMDI profiles of other metadata formats (e.g. IMDI) through the ISOcat registry (see also References [5], [6] and [9]).


This document aims to define both a target format for converting legacy data and a format suited to future data processing requirements. Decisions were taken only after carefully weighing the advantages and disadvantages with respect to these two requirements. Consequently, in a few places a technique is marked as recommended from a data processing point of view, while an alternative technique remains permitted where the structure of legacy data makes its use unavoidable.

With respect to the other standards developed within ISO/TC 37/SC 4, this document aims to provide a first layer on which further annotation layers can be built. In particular, the use of the <w> element for tokenizing a transcription conforms to the TEI representation of tokens in ISO 24611 (MAF).

This document is also aligned with the mechanisms proposed in the TEI Guidelines for integrating stand-off annotations into a TEI document. These mechanisms include, in particular, a generic element (<annotationBlock>) that groups the annotations relating to the same linguistic segment; this grouping meets the needs of this document for annotations on the <u> element or its children.
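
The layering described above can be pictured with a small fragment. This is a minimal sketch, not an excerpt from the standard: it assumes the TEI elements <annotationBlock>, <u>, <w>, <spanGrp> and <span>, and the identifiers (SPK0, T0, T1, w1 to w3), the @type value and the part-of-speech labels are hypothetical.

```xml
<!-- Hypothetical sketch: one utterance, its word tokens, and one dependent annotation layer -->
<annotationBlock xml:id="ab1" who="#SPK0" start="#T0" end="#T1">
  <!-- First layer: the tokenized orthographic transcription -->
  <u xml:id="u1">
    <w xml:id="w1">good</w>
    <w xml:id="w2">morning</w>
    <w xml:id="w3">everybody</w>
  </u>
  <!-- Second layer: annotations that point to the tokens, not to the recording -->
  <spanGrp type="pos">
    <span from="#w1" to="#w1">ADJ</span>
    <span from="#w2" to="#w2">NOUN</span>
    <span from="#w3" to="#w3">PRON</span>
  </spanGrp>
</annotationBlock>
```

Keeping the tokens and their annotations in the same block means a tool can process one linguistic segment at a time without resolving pointers across the whole document.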

Finally, this document is complementary: it does not overlap with the standards for spoken and multimodal interaction developed within the W3C. In particular, it does not deal with speech synthesis, as SSML [24] does, nor with the representation of the semantic interpretation of multimodal utterances, as EMMA [25] does.
INTERNATIONAL STANDARD ISO 24624:2016(F)
Language resource management — Transcription of spoken language
1 Scope

This document specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the TEI Guidelines. As a secondary objective, it aims to relate transcribed data to standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology and qualitative social studies, and to other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of handwritten manuscripts.

Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.
2 Normative references
There are no normative references in this document.
3 Terms and definitions

For the purposes of this document, the following terms and definitions apply.

ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation that does not refer directly to an audio or video recording, but to another annotation, typically an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to mark a boundary point
3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of that language
3.4
paralinguistic feature
feature of spoken language beyond the sound(s) proper, such as voice quality, pitch, volume or intonation
3.5
phonetic transcription
representation or modelling of spoken language based on the phonological system of that language

3.6
spoken language
oral language produced by the human voice
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system
set of principles and rules, grounded in a theoretical basis, specifying which phenomena of spoken language are to be transcribed and how the transcription is to be carried out

4 Metadata

The TEI Guidelines give detailed guidance on encoding metadata in the various subsections of the <teiHeader> element. This clause deals only with metadata that are either (i) essential for ensuring that transcriptions of spoken language in general can be interpreted and exchanged, or (ii) likely to be relevant in a large majority of cases. This does not rule out the possibility or the need to encode further metadata in the <teiHeader> element.
4.1 Electronic file description (<fileDesc>)
4.1.1 Publication information (<publicationStmt>)

The <publicationStmt> element in the <fileDesc> section of <teiHeader> should be used to record information on access rights and contact details for the transcription in question.
EXAMPLE 1 Use of <publicationStmt>

Hamburger Zentrum für Sprachkorpora

Open access for research and teaching purposes.
No redistribution allowed.

Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
22765
Hamburg
Germany
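
A possible encoding of the publication information above can be sketched as follows. The element choices (<authority>, <availability>, <licence>, <address> and its children) are assumptions based on the TEI Guidelines and are not quoted from the standard; only the text content, rendered here in English, is taken from the example.

```xml
<!-- Hypothetical sketch: element names are assumptions, text content comes from EXAMPLE 1 -->
<publicationStmt>
  <!-- Body responsible for making the transcription available -->
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <!-- Access rights -->
  <availability>
    <licence>
      <p>Open access for research and teaching purposes.</p>
      <p>No redistribution allowed.</p>
    </licence>
  </availability>
  <!-- Contact details -->
  <address>
    <orgName>Hamburger Zentrum für Sprachkorpora</orgName>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <settlement>Hamburg</settlement>
    <country>Germany</country>
  </address>
</publicationStmt>
```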


4.1.2 Recording information (<recordingStmt>)

The <recordingStmt> element in the <sourceDesc> section of <fileDesc> should be used to record information about the transcribed recordings. Only the recording(s) proper, typically digital audio and/or video files, should be described in this element. General information about the interaction concerned that is independent of the recording(s) should be described in the <settingDesc> element (see 4.2.2).

A <media> element inside a <recording> element should be used to refer to the corresponding digital file by means of a @url attribute (see Reference [2]). A @type attribute should be assigned to <recording> to indicate the media type of the recording; the permitted values of this attribute are "audio" and "video". The actual type of the digital file should be encoded as a @mimeType attribute (see Reference [8]) assigned to the <media> element.

Where two or more files derive from the same master recording (e.g. a video file and an extracted audio track), those files should be represented as different <media> elements inside the same <recording> element rather than as different <recording> elements. TEI linking mechanisms, such as <link> or @corresp, may be used to describe relations between different recordings or between recordings and other elements, such as speakers.
EXAMPLE 2 Use of <recordingStmt>

Parkinson talk show on the BBC, broadcast of 2 November 2007

Video clip downloaded from YouTube with aTube-Catcher, converted to MPG format with Adobe Premiere
Audio track extracted from the video with Audacity 1.3 beta

Recorded with a portable ZOOM H4NSP recorder attached to Victoria Beckham's dress
Synchronized with David Beckham's recording

Recorded with a portable ZOOM H4NSP recorder attached to David Beckham's shirt collar
Synchronized with Victoria Beckham's recording
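
The rules in 4.1.2 can be illustrated with a short fragment. This is a hedged sketch under stated assumptions, not the markup of the standard's own example: the file names, the identifiers (REC_MASTER, REC_VB, SPK_VB), the MIME type values and the use of <broadcast>, <bibl> and <equipment> are hypothetical choices based on the TEI Guidelines.

```xml
<!-- Hypothetical sketch: file names, identifiers and MIME types are illustrative only -->
<sourceDesc>
  <recordingStmt>
    <!-- One master recording with two derived files: two <media> in one <recording> -->
    <recording type="video" xml:id="REC_MASTER">
      <media mimeType="video/mpeg" url="parkinson.mpg"/>
      <media mimeType="audio/wav" url="parkinson.wav"/>
      <broadcast>
        <bibl>Parkinson talk show on the BBC,
          broadcast of <date when="2007-11-02">2 November 2007</date></bibl>
      </broadcast>
      <equipment>
        <p>Video clip downloaded from YouTube, converted to MPG format.</p>
        <p>Audio track extracted from the video.</p>
      </equipment>
    </recording>
    <!-- A separate recording; @corresp relates it to a speaker described elsewhere -->
    <recording type="audio" xml:id="REC_VB" corresp="#SPK_VB">
      <media mimeType="audio/wav" url="beckham_victoria.wav"/>
      <equipment>
        <p>Recorded with a portable recorder attached to the speaker's dress.</p>
      </equipment>
    </recording>
  </recordingStmt>
</sourceDesc>
```

The @type attribute states the modality of each recording, while @mimeType on each <media> states the concrete file format, so a video recording can legitimately carry an audio-only derived file.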



4.2 Profile description (<profileDesc>)
4.2.1 Participant information (<particDesc>)

The participants in the transcribed interaction should be described in <person> elements in the <particDesc> section of a <profileDesc> element. An @n attribute assigned to the <person> element to define an abbreviated code for the participant concerned is mandatory, because this code can be indispensable for many processing purposes. <u> elements in the body of the transcription refer to the @xml:id attribute of a <person> element, which shall therefore always be provided.

To supply additional metadata about participants, the full content model of <person> can be used, for example to record a person's age, date of birth, language proficiency or role in the recorded conversation.
EXAMPLE 3 Use of <particDesc>

Daniel
Steward

British English
French

Fiona
Baker
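
The relationship between the abbreviated code, the identifier and the utterances can be sketched as follows. This is a minimal illustration under stated assumptions: the identifiers (SPK0, SPK1), the codes (DS, FB), the language tags and the choice of <persName>, <langKnowledge> and <langKnown> are hypothetical, and only the names and languages come from the example.

```xml
<!-- Hypothetical sketch: identifiers, codes and language tags are illustrative only -->
<profileDesc>
  <particDesc>
    <!-- @xml:id is the pointer target; @n is the abbreviated speaker code -->
    <person xml:id="SPK0" n="DS">
      <persName>
        <forename>Daniel</forename>
        <surname>Steward</surname>
      </persName>
      <langKnowledge>
        <langKnown tag="en-GB">British English</langKnown>
        <langKnown tag="fr">French</langKnown>
      </langKnowledge>
    </person>
    <person xml:id="SPK1" n="FB">
      <persName>
        <forename>Fiona</forename>
        <surname>Baker</surname>
      </persName>
    </person>
  </particDesc>
</profileDesc>

<!-- Elsewhere, in the transcription body: @who points to the @xml:id of a <person> -->
<u who="#SPK0">good morning</u>
```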




4.2.2 Setting information (<settingDesc>)

The <settingDesc> element should be used to provide general information about the context and circumstances of the interaction. This includes aspects such as place and time, spatial arrangement and the artefacts involved in the interaction. Information about a specific recording of the interaction should not be recorded in this element, but in the <recordingStmt> element (see 4.1.2).
EXAMPLE 4 Use of <settingDesc>

BBC studio, London

Talk show host Michael Parkinson interviewing David and Victoria Beckham about their relationship
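
One way the setting above might be encoded is sketched below. The element choices (<place>, <placeName>, <setting>, <activity>) are assumptions based on the TEI Guidelines and are not quoted from the standard; only the text content, rendered here in English, comes from the example.

```xml
<!-- Hypothetical sketch: element names are assumptions, text content comes from EXAMPLE 4 -->
<settingDesc>
  <!-- Where the interaction took place -->
  <place>
    <placeName>BBC studio, London</placeName>
  </place>
  <!-- What the participants were doing -->
  <setting>
    <activity>Talk show host Michael Parkinson interviewing David and
      Victoria Beckham about their relationship</activity>
  </setting>
</settingDesc>
```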



4.3 Encoding description (<encodingDesc>)

The <encodingDesc> element is used to record information about how the TEI-encoded text was derived from a recorded source. This includes information both on the tool that produced the transcription, in an <appInfo> element, and on the convention used to transcribe the data, in a <transcriptionDesc> element. The @ident and @version attributes should be assigned to these elements so that this information can be accessed in a machine-readable way.
EXAMPLE 5 Use of <encodingDesc>

Transcription tool with TEI export




Transcription orthographi
...
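
The two kinds of information described in 4.3 can be sketched as follows. This is an illustration under stated assumptions, not a completion of the example above: the @ident and @version values and the wording of the convention description are hypothetical placeholders, and the nesting of <application> inside <appInfo> follows the TEI Guidelines.

```xml
<!-- Hypothetical sketch: @ident and @version values are placeholders, not real tool or convention names -->
<encodingDesc>
  <!-- The tool that produced the transcription -->
  <appInfo>
    <application ident="EXAMPLE_TOOL" version="1.0">
      <label>Transcription tool with TEI export</label>
    </application>
  </appInfo>
  <!-- The convention used to transcribe the data -->
  <transcriptionDesc ident="EXAMPLE_CONVENTION" version="1.0">
    <desc>Orthographic transcription following a hypothetical convention</desc>
  </transcriptionDesc>
</encodingDesc>
```

Because both elements carry @ident and @version, software can decide how to interpret a file by reading two attribute pairs instead of parsing free-text descriptions.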
