Language resource management — Transcription of spoken language

ISO 24624:2016 specifies rules for representing transcriptions of audio- and video-recorded spoken interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the document aims to relate transcribed data with standards for annotated corpora. It is applicable to transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics, corpus lexicography, language technology, qualitative social studies and other transcription data of recorded spoken language. It is not applicable to other forms of transcription, most importantly transcriptions of hand-written manuscripts. Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.


General Information

Status: Published
Publication Date: 24-Jul-2016
Current Stage: 9093 - International Standard confirmed
Completion Date: 28-Apr-2022


Standards Content (Sample)

SLOVENIAN STANDARD
SIST ISO 24624:2018
1 October 2018
Upravljanje z jezikovnimi viri - Transkripcija govorjenega jezika
Language resource management -- Transcription of spoken language
Gestion des ressources linguistiques -- Transcription du langage parlé
This Slovenian standard is identical to: ISO 24624:2016
ICS:
01.140.10 Writing and transliteration
35.240.30 IT applications in information, documentation and publishing
SIST ISO 24624:2018 en,fr,de
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

INTERNATIONAL STANDARD ISO 24624
First edition
2016-08-15
Language resource management —
Transcription of spoken language
Gestion des ressources linguistiques — Transcription du langage parlé
Reference number
ISO 24624:2016(E)
© ISO 2016

COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
4.1.2 Recording information (<recordingStmt>)
4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
4.2.2 Setting information (<settingDesc>)
4.3 Description of source (<encodingDesc>)
5 Macrostructure
5.1 Timeline (<timeline>)
5.2 Utterances (<u>)
5.3 Free dependent annotations (<spanGrp>, <span>)
5.4 Grouping of utterances and dependent annotations (<annotationBlock>)
5.5 Independent elements outside utterances (<pause> and <incident>)
5.6 Inline paralinguistic annotation ()
5.7 Global divisions of a transcription (<div>)
6 Microstructure
6.1 Tokens (<w>)
6.1.1 Characterization
6.1.2 Representation as <w>
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses (<pause>)
6.2.1 Characterization
6.2.2 Representation as <pause>
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (<vocal>, <kinesic> and <incident>)
6.3.1 Characterization
6.3.2 Representation as <vocal>, <kinesic> or <incident>
6.3.3 Examples
6.4 Punctuation (<pc>)
6.4.1 Characterization
6.4.2 Representation as <pc>
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (, , )
6.5.1 Characterization
6.5.2 Representation as or
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the <u> level (<seg>)
6.6.1 Characterization
6.6.2 Representation as <seg>
6.6.3 Further constraints
6.6.4 Examples
Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
Introduction
This document sets out to facilitate the interchange of transcriptions of spoken language between
different computational tools and environments for creating, editing, publishing and exploiting such
data. Transcription of spoken language in this context means an orthography-based transcription of
verbal activity as recorded in an audio or video recording of a natural interaction. The description of
activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken
language transcription, but this document starts from the assumption that the verbal dimension is
the primary focus of a spoken language transcription. Likewise, although this document may also be
relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is
that orthography-based transcription is the default case.
This document is developed in the context of the joint agreement between ISO and the Text Encoding
Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI
guidelines [23].
This document takes into account data models and encoding practices supported by widely used
transcription software. More specifically, it builds on several interoperability studies [12],[16],[17],[19]
involving the following tools:
— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]
This document was developed to be compatible with the formats produced by these tools. The
compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer,
http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement
to convert these formats to one of the above-mentioned before adding mandatory information (e.g.
speaker assignment) using the respective tools.
This document also aims to be usable with widely used transcription systems (“conventions”). However,
in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most
of these systems lack an explicit formalization. The following selection of transcription systems was
considered for this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]
Since TEI is the reference framework for this document and metadata is not its main concern, no attempt
is made here to address metadata compatibility issues beyond the TEI header. However, it should be
noted that there are several TEI profiles for the CMDI framework which are related both to each other
and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References
[5], [6] and [9]).
This document aims to define both a target format for legacy data conversion and a format suitable for
future data processing requirements. The pros and cons of these two demands were carefully weighed
up before decisions were taken. At some points, certain techniques are therefore marked as preferred
from a data processing point of view while an alternative technique is still allowed if the structure of
legacy data makes its use unavoidable.
With regard to the other standards developed within ISO committee TC 37/SC 4, this document is
intended to provide the primary layer on top of which further annotation layers may be implemented.
In particular, the use of the <w> element for tokenizing a transcription conforms to the TEI-based
representation of tokens in ISO 24611 (MAF).
This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off
annotations within a TEI document. In particular, this mechanism contains a generic element
(<annotationBlock>) that groups together annotations related to the same linguistic segment; this
grouping meets the needs of this document in the case of annotations of <u> elements or their children.
Finally, this document is complementary to, and does not overlap with, the speech and multimodal
interaction-related standards developed within the W3C. In particular, it does not deal with speech
synthesis as is the case for SSML [24], nor does it deal with the representation of the semantic
interpretation of multimodal utterances as does EMMA [25].
INTERNATIONAL STANDARD ISO 24624:2016(E)
Language resource management — Transcription of
spoken language
1 Scope
This document specifies rules for representing transcriptions of audio- and video-recorded spoken
interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the
document aims to relate transcribed data with standards for annotated corpora. It is applicable to
transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics,
corpus lexicography, language technology, qualitative social studies and other transcription data
of recorded spoken language. It is not applicable to other forms of transcription, most importantly
transcriptions of hand-written manuscripts.
Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation which does not refer directly to an audio or video recording, but to another annotation,
typically an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to indicate a boundary point
3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of the respective language
3.4
paralinguistic feature
feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume,
intonation
3.5
phonetic transcription
representation or modelling of spoken language based on the sound system of the respective language
3.6
spoken language
oral language produced by a person’s vocal system
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system
theoretically founded set of principles and rules detailing what spoken language phenomena are to be
transcribed, and how they are to be transcribed
4 Metadata
The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections
of the <teiHeader> element. The following section addresses only those pieces of metadata which are
either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions
in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility
of, or necessity for, encoding further metadata inside the <teiHeader> element.
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
The <publicationStmt> element inside the <fileDesc> section of the <teiHeader> should be used to
record information about access rights and contact information for the transcription in question.
EXAMPLE 1 Use of <publicationStmt>

  Hamburger Zentrum für Sprachkorpora
 
   
   

Available free for research and teaching purposes.
     No redistributing allowed.


 
  Hamburger Zentrum für Sprachkorpora
 

   Max Brauer-Allee 60
   22765
   Hamburg
   Germany
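
The markup of Example 1 is not preserved in this sample; only its text content is. A minimal sketch of how that content might be encoded is given below. The child elements (<authority>, <availability>, <distributor>, <address> and its parts) are assumptions taken from general TEI P5 usage, not a reproduction of the standard's own example, and any attributes the original carried are omitted rather than guessed.

```xml
<!-- Sketch only: element choice is an assumption based on general TEI P5 practice -->
<publicationStmt>
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <availability>
    <!-- access rights, as stated in the example text -->
    <p>Available free for research and teaching purposes.
       No redistributing allowed.</p>
  </availability>
  <distributor>Hamburger Zentrum für Sprachkorpora</distributor>
  <!-- contact information -->
  <address>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <settlement>Hamburg</settlement>
    <country>Germany</country>
  </address>
</publicationStmt>
```

In a complete document this fragment would sit inside <fileDesc> within <teiHeader>, in the TEI namespace.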
 


4.1.2 Recording information (<recordingStmt>)
The <recordingStmt> element inside the <sourceDesc> section of the <fileDesc> should be used to
record information about the transcribed recording(s). Only the actual recording(s), usually digital
audio and/or video files, should be described here. General information about the respective interaction
which is independent of the recording(s) should be described in the <settingDesc> element (see 4.2.2).
A <media> element inside a <recording> element should be used to refer to the corresponding digital
file via a @url attribute (see Reference [2]). A @type attribute on <recording> should be used to
indicate the media type of the recording; audio and video are the permissible values for that attribute.
The actual digital file type should be encoded as a @mimeType attribute (see Reference [8]) on the
<media> element. Where two or more files are derived from the same master recording (e.g. a video
file or an extracted audio track), these should be represented as different <media> elements inside the
same <recording> element, rather than as different <recording> elements. TEI linking mechanisms,
such as <link> or @corresp, can be used to describe relationships between different recordings or
between recordings and other elements, such as speakers.
EXAMPLE 2 Use of <recordingStmt>



  
    
    
    
      Parkinson Talkshow on BBC, broadcast on 02 November 2007
    
    
    
    
    
      Video excerpt downloaded from YouTube with aTube-Catcher, converted
        into MPG format with Adobe Premiere
      Audio extracted from video with Audacity 1.3 beta
    
  




  
    
    
      Recorded with a ZOOM H4NSP, external lapel microphone
      clipped to Victoria Beckham’s dress
      Synchronized with David Beckham’s recording

      Recorded with a ZOOM H4NSP, external lapel microphone
      clipped to David Beckham’s shirt collar
      Synchronized with Victoria Beckham’s recording
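
The markup of Example 2 is likewise not preserved here. The rules in 4.1.2 (one <recording> per master recording, a @type of audio or video, one <media> child per derived file with @url and @mimeType) can be sketched as follows. The file names and MIME values are hypothetical placeholders, and the use of <broadcast> and <equipment> with <p> children is an assumption drawn from TEI P5, not from the standard's own example.

```xml
<!-- Sketch only: file names and mimeType values are hypothetical -->
<recordingStmt>
  <!-- one master recording; @type may be "audio" or "video" -->
  <recording type="video">
    <!-- two files derived from the same master: siblings, not separate recordings -->
    <media mimeType="video/mpeg" url="example_recording.mpg"/>
    <media mimeType="audio/wav" url="example_recording.wav"/>
    <broadcast>
      <p>Parkinson Talkshow on BBC, broadcast on 02 November 2007</p>
    </broadcast>
    <equipment>
      <p>Video excerpt downloaded from YouTube with aTube-Catcher, converted
         into MPG format with Adobe Premiere</p>
      <p>Audio extracted from video with Audacity 1.3 beta</p>
    </equipment>
  </recording>
</recordingStmt>
```

Recordings made independently of each other, such as the two lapel-microphone recordings in the example text, would be separate <recording> elements, each with its own <media> child.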
    
  

4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
The participants of the transcribed interaction should be described in <person> elements inside
the <particDesc> section of a <profileDesc> element. The use of an @n attribute on the <person>
element to define an abbreviated code for the respective participant is mandatory since it can be crucial
for many processing purposes. <u> elements inside the body of the transcription refer to the @xml:id
attribute of a <person> element, which shall therefore always be provided.
In order to provide additional metadata about participants, the content model of <person> can be fully
exploited, for example, to record a person’s age, birth date, language knowledge or role in the recorded
conversation.
EXAMPLE 3 Use of <particDesc>

 
  
    Daniel
    Steward
  
  
  
  
    British English
    French
  
  
 
 
  
    Fiona
    Baker
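
Since the markup of Example 3 is not preserved, the following is a minimal sketch of the two participants named in the example text. The @xml:id and @n values, the language tags and the <persName> substructure are illustrative assumptions; only the requirement that each <person> carries both @xml:id and @n comes from the text above.

```xml
<!-- Sketch only: identifiers, codes and language tags are hypothetical -->
<profileDesc>
  <particDesc>
    <!-- @xml:id is the target of references from utterances; @n is the short code -->
    <person xml:id="SPK_DS" n="DS">
      <persName>
        <forename>Daniel</forename>
        <surname>Steward</surname>
      </persName>
      <langKnowledge>
        <langKnown tag="en-GB">British English</langKnown>
        <langKnown tag="fr">French</langKnown>
      </langKnowledge>
    </person>
    <person xml:id="SPK_FB" n="FB">
      <persName>
        <forename>Fiona</forename>
        <surname>Baker</surname>
      </persName>
    </person>
    <!-- an utterance in the body would point here, e.g. <u who="#SPK_DS"> -->
  </particDesc>
</profileDesc>
```

Keeping the human-readable code in @n and the reference target in @xml:id lets tools display short speaker labels while links stay valid XML identifiers.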
  
  
 

4.2.2 Setting information (<settingDesc>)
The <settingDesc> element should be used to provide general information about the setting and
circumstances of the interaction. This includes such matters as the place and time, spatial organization
and artefacts of the interaction. Information pertaining to a specific recording of that interaction should
not be recorded here, but in the <recordingStmt> (see 4.1.2).
EXAMPLE 4 Use of <settingDesc>

 
  BBC studio London
 
 
   Talkshow host Michael Parkinson interviewing David and Victoria
        Beckham about their relationship
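
As the markup of Example 4 is not preserved, the sketch below shows one plausible encoding of its two pieces of text. The choice of <place>/<placeName> and <setting>/<activity> is an assumption based on the TEI P5 content model for <settingDesc>.

```xml
<!-- Sketch only: element choice is an assumption based on TEI P5 -->
<settingDesc>
  <!-- where the interaction took place -->
  <place>
    <placeName>BBC studio London</placeName>
  </place>
  <!-- what the interaction was, independent of any particular recording of it -->
  <setting>
    <activity>Talkshow host Michael Parkinson interviewing David and Victoria
      Beckham about their relationship</activity>
  </setting>
</settingDesc>
```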
 
 

4.3 Description of source (<encodingDesc>)
The <encodingDesc> element is used to record information about the way the TEI encoded text has
been derived from a recorded source. This includes information about both the tool which created the
transcription inside an <appInfo> element and the convention used in transcribing the data inside a
<transcriptionDesc> element. @ident and @version attributes should be used on these elements to
provide a machine-readable way of accessing this information.
EXAMPLE 5 Use of <encodingDesc>

 
   
   
   
    
     Transcription Tool providing a TEI Export
   
 
 
 
    Orthographic transcription according to HIAT
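
The markup of Example 5 is not preserved either. A minimal sketch under stated assumptions: in TEI P5 the @ident and @version attributes of a tool sit on an <application> child of <appInfo>, and the tool name and both version values below are hypothetical placeholders, not values from the standard.

```xml
<!-- Sketch only: ident and version values are hypothetical -->
<encodingDesc>
  <appInfo>
    <!-- the tool that produced the transcription -->
    <application ident="ExampleTool" version="1.0">
      <label>Example transcription tool</label>
      <desc>Transcription Tool providing a TEI Export</desc>
    </application>
  </appInfo>
  <!-- the transcription convention that was followed -->
  <transcriptionDesc ident="HIAT" version="1.0">
    <desc>Orthographic transcription according to HIAT</desc>
  </transcriptionDesc>
</encodingDesc>
```

A consuming tool can then branch on the @ident and @version pair instead of parsing the free-text descriptions.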
 

5 Macrostructure
5.1 Timeline (<timeline>)
<when> elements inside a <timeline> element should be used to define points in the recording;
these points are then referred to by @start, @end and @synch attributes of other elements (most
importa
...

INTERNATIONAL ISO
STANDARD 24624
First edition
2016-08-15
Language resource management —
Transcription of spoken language
Gestion des ressources linguistiques — Transcription du langage parlé
Reference number
ISO 24624:2016(E)
©
ISO 2016

---------------------- Page: 1 ----------------------
ISO 24624:2016(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO 2016 – All rights reserved

---------------------- Page: 2 ----------------------
ISO 24624:2016(E)

Contents Page
Foreword .v
Introduction .vi
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Metadata . 2
4.1 Description of the electronic file () . 2
4.1.1 Distribution information () . 2
4.1.2 Recording information (). 2
4.2 Description of circumstances () . 4
4.2.1 Participant information () . 4
4.2.2 Setting information () . 4
4.3 Description of source () . 5
5 Macrostructure . 5
5.1 Timeline () . 5
5.2 Utterances () . 6
5.3 Free dependent annotations (, ) . 7
5.4 Grouping of utterances and dependent annotations () . 9
5.5 Independent elements outside utterances ( and ) .10
5.6 Inline paralinguistic annotation () .10
5.7 Global divisions of a transcription (

) .11
6 Microstructure .12
6.1 Tokens () .12
6.1.1 Characterization .12
6.1.2 Representation as .12
6.1.3 Further constraints .13
6.1.4 Examples .13
6.2 Pauses () .14
6.2.1 Characterization .14
6.2.2 Representation as .14
6.2.3 Further constraints .14
6.2.4 Examples .15
6.3 Audible and visible non-speech events (, and ) .15
6.3.1 Characterization .15
6.3.2 Representation as , or .16
6.3.3 Examples .16
6.4 Punctuation () .17
6.4.1 Characterization .17
6.4.2 Representation as .17
6.4.3 Further constraints .17
6.4.4 Examples .18
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (,
, ) .18
6.5.1 Characterization .18
6.5.2 Representation as or .18
6.5.3 Further constraints .18
6.5.4 Examples .19
6.6 Units above the token and below the level () .20
6.6.1 Characterization .20
6.6.2 Representation as .20
6.6.3 Further constraints .20
6.6.4 Examples .20
© ISO 2016 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO 24624:2016(E)

Annex A (informative) Fully encoded example .22
Annex B (informative) Element and attribute index .28
Bibliography .31
iv © ISO 2016 – All rights reserved

---------------------- Page: 4 ----------------------
ISO 24624:2016(E)

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT) see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
© ISO 2016 – All rights reserved v

---------------------- Page: 5 ----------------------
ISO 24624:2016(E)

Introduction
This document sets out to facilitate the interchange of transcriptions of spoken language between
different computational tools and environments for creating, editing, publishing and exploiting such
data. Transcription of spoken language in this context means an orthography-based transcription of
verbal activity as recorded in an audio or video recording of a natural interaction. The description of
activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken
language transcription, but this document starts from the assumption that the verbal dimension is
the primary focus of a spoken language transcription. Likewise, although this document may also be
relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is
that orthography-based transcription is the default case.
This document is developed in the context of the joint agreement between ISO and the Text Encoding
Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI
[23]
guidelines.
This document takes into account data models and encoding practices supported by widely used
[12],[16],[17],[19]
transcription software. More specifically, it builds on several interoperability studies
involving the following tools:
[10]
— ANVIL
[11]
— CLAN
[22]
— ELAN
[20]
— EXMARaLDA
[18]
— FOLKER
[1]
— Transcriber
This document was developed to be compatible with the formats produced by these tools. The
[4]
compatibility may extend to the formats of further labelling tools (e.g. Praat or Wavesurfer, http://
www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement
to convert these formats to one of the above-mentioned before adding mandatory information (e.g.
speaker assignment) using the respective tools.
This document also aims to be usable with widely used transcription systems (“conventions”). However,
in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most
of these systems lack an explicit formalization. The following selection of transcription systems was
considered for this document:
[11]
— Codes for the Human Analysis of Transcripts (CHAT)
[7]
— Discourse Transcription (DT)
[21]
— Gesprächsanalytisches Transkriptionssystem (GAT)
[13]
— Halbinterpretative Arbeitstranskriptionen (HIAT)
Since TEI is the reference framework for this document and metadata is not its main concern, no attempt
is made here to address metadata compatibility issues beyond the TEI header. However, it should be
noted that there are several TEI profiles for the CMDI framework which are related both to each other
and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References
[5], [6] and [9]).
This document aims to define both a target format for legacy data conversion and a format suitable for
future data processing requirements. The pros and cons of these two demands were carefully weighed
up before decisions were taken. At some points, certain techniques are therefore marked as preferred
vi © ISO 2016 – All rights reserved

---------------------- Page: 6 ----------------------
ISO 24624:2016(E)

from a data processing point of view while an alternative technique is still allowed if the structure of
legacy data makes its use unavoidable.
With regard to the other standards developed within ISO committee TC 37/SC 4, this document is
intended to provide the primary layer on top of which further annotation layers may be implemented.
In particular, the use of the element for tokenizing a transcription is conformable to the TEI-based
representation of tokens ISO 24611 (MAF).
This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-
off annotations within a TEI document. In particular, this mechanism contains a generic element
() that groups together annotations related to the same linguistic segment; this
grouping meets the needs of this document in the case of annotations of elements or its children.
Finally, this document is complementary and does not overlap with the speech and multimodal
interaction-related standards developed within the W3C. In particular, it does not deal with speech
[24]
synthesis as is the case for SSML, nor does it deal with the representation of the semantic
[25]
interpretation of multimodal utterances as does EMMA.
© ISO 2016 – All rights reserved vii

---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO 24624:2016(E)
Language resource management — Transcription of
spoken language
1 Scope
This document specifies rules for representing transcriptions of audio- and video-recorded spoken
interactions in XML documents based on the guidelines of the TEI. As a secondary objective, the
document aims to relate transcribed data with standards for annotated corpora. It is applicable to
transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus linguistics,
corpus lexicography, language technology, qualitative social studies and other transcription data
of recorded spoken language. It is not applicable to other forms of transcription, most importantly
transcriptions of hand-written manuscripts.
Annex A gives a fully encoded example and Annex B provides an element index and an attribute index.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation which does not refer directly to an audio or video recording, but to another annotation,
typically an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to indicate a boundary point
3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of the respective language
3.4
paralinguistic feature
feature of spoken language beyond the individual sound(s), such as voice quality, pitch, volume,
intonation
3.5
phonetic transcription
representation or modelling of spoken language based on the sound system of the respective language
3.6
spoken language
oral language produced by a person’s vocal system
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system
theoretically founded set of principles and rules detailing what spoken language phenomena are to be
transcribed, and how they are to be transcribed
4 Metadata
The TEI guidelines formulate extensive suggestions for encoding metadata inside different subsections
of the <teiHeader> element. This clause addresses only those pieces of metadata which are
either (i) crucial for ensuring the interpretability and exchangeability of spoken language transcriptions
in general or (ii) likely to be relevant in a large majority of cases. This does not preclude the possibility
of, or necessity for, encoding further metadata inside the <teiHeader> element.
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
The <publicationStmt> element inside the <fileDesc> section of the <teiHeader> should be used to
record information about access rights and contact information for the transcription in question.
EXAMPLE 1 Use of <publicationStmt>
  Hamburger Zentrum für Sprachkorpora
    Available free for research and teaching purposes.
    No redistributing allowed.
  Hamburger Zentrum für Sprachkorpora
    Max Brauer-Allee 60
    22765
    Hamburg
    Germany
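The following is a minimal sketch of how the values in EXAMPLE 1 could be marked up. It assumes the standard TEI P5 content model of <publicationStmt>; the choice of <authority>, <distributor> and the children of <address> is an assumption made for illustration and is not a reproduction of the original markup.

```xml
<publicationStmt>
  <!-- Assumption: the releasing body is recorded as <authority> -->
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <availability>
    <!-- Access rights stated in prose -->
    <p>Available free for research and teaching purposes.
       No redistributing allowed.</p>
  </availability>
  <!-- Assumption: contact information is given as <distributor> plus <address> -->
  <distributor>Hamburger Zentrum für Sprachkorpora</distributor>
  <address>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <settlement>Hamburg</settlement>
    <country>Germany</country>
  </address>
</publicationStmt>
```

Keeping the access statement inside <availability> and the contact details inside <address> lets a processing tool retrieve either piece of information without parsing free text.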
 


4.1.2 Recording information (<recordingStmt>)
The <recordingStmt> element inside the <sourceDesc> section of the <fileDesc> should be used to
record information about the transcribed recording(s). Only the actual recording(s), usually digital
audio and/or video files, should be described here. General information about the respective interaction
which is independent of the recording(s) should be described in the <settingDesc> element (see 4.2.2).
A <media> element inside a <recording> element should be used to refer to the corresponding digital
file via a @url attribute (see Reference [2]). A @type attribute on <recording> should be used to
indicate the media type of the recording; audio and video are the permissible values for that attribute.
The actual digital file type should be encoded as a @mimeType attribute (see Reference [8]) on the
<media> element. Where two or more files are derived from the same master recording (e.g. a video
file and an audio track extracted from it), these should be represented as different <media> elements
inside the same <recording> element, rather than as different <recording> elements. TEI linking
mechanisms, such as <link> or @corresp, can be used to describe relationships between different
recordings or between recordings and other elements, such as speakers.
EXAMPLE 2 Use of <recordingStmt>
      Parkinson Talkshow on BBC, broadcast on 02 November 2007
      Video excerpt downloaded from YouTube with aTube-Catcher, converted
        into MPG format with Adobe Premiere
      Audio extracted from video with Audacity 1.3 beta

      Recorded with a ZOOM H4NSP, external lapel microphone
      clipped to Victoria Beckham’s dress
      Synchronized with David Beckham’s recording

      Recorded with a ZOOM H4NSP, external lapel microphone
      clipped to David Beckham’s shirt collar
      Synchronized with Victoria Beckham’s recording
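The first case in EXAMPLE 2, one master recording with a video file and an audio track extracted from it, might be encoded along the following lines. This is a sketch under stated assumptions: the file names and MIME type values are hypothetical, and the use of <broadcast>, <equipment> and <ab> follows the general TEI P5 content model rather than the original markup.

```xml
<recordingStmt>
  <!-- One master recording: the video and the audio extracted from it
       are two <media> children of the same <recording> -->
  <recording type="video">
    <!-- Hypothetical file names and MIME types -->
    <media mimeType="video/mpeg" url="parkinson.mpg"/>
    <media mimeType="audio/wav" url="parkinson.wav"/>
    <broadcast>
      <ab>Parkinson Talkshow on BBC, broadcast on 02 November 2007</ab>
    </broadcast>
    <equipment>
      <ab>Video excerpt downloaded from YouTube with aTube-Catcher,
          converted into MPG format with Adobe Premiere</ab>
      <ab>Audio extracted from video with Audacity 1.3 beta</ab>
    </equipment>
  </recording>
</recordingStmt>
```

The second case in the example, two separately recorded audio tracks, would instead use two <recording type="audio"> elements, since neither file is derived from the other.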
    
  

4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
The participants of the transcribed interaction should be described in <person> elements inside
the <particDesc> section of a <profileDesc> element. The use of an @n attribute on the <person>
element to define an abbreviated code for the respective participant is mandatory since it can be crucial
for many processing purposes. <u> elements inside the body of the transcription refer to the @xml:id
attribute of a <person> element, which shall therefore always be provided.
In order to provide additional metadata about participants, the content model of <person> can be fully
exploited, for example, to record a person’s age, birth date, language knowledge or role in the recorded
conversation.
EXAMPLE 3 Use of <particDesc>
    Daniel
    Steward
    British English
    French

    Fiona
    Baker
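A possible encoding of the two participants in EXAMPLE 3 is sketched below. The @xml:id values, the @n codes and the language tags are hypothetical; only the names and languages come from the example. The element names follow the TEI P5 content model of <person>.

```xml
<particDesc>
  <!-- @xml:id is what <u> elements point to; @n carries the abbreviated
       speaker code. All identifiers and codes here are hypothetical. -->
  <person xml:id="SPK0" n="DS">
    <persName>
      <forename>Daniel</forename>
      <surname>Steward</surname>
    </persName>
    <langKnowledge>
      <langKnown tag="en-GB">British English</langKnown>
      <langKnown tag="fr">French</langKnown>
    </langKnowledge>
  </person>
  <person xml:id="SPK1" n="FB">
    <persName>
      <forename>Fiona</forename>
      <surname>Baker</surname>
    </persName>
  </person>
</particDesc>
```

With these identifiers, an utterance by the first participant would carry a pointer of the form who="#SPK0".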
  
  
 

4.2.2 Setting information (<settingDesc>)
The <settingDesc> element should be used to provide general information about the setting and
circumstances of the interaction. This includes such matters as the place and time, spatial organization
and artefacts of the interaction. Information pertaining to a specific recording of that interaction should
not be recorded here, but in the <recordingStmt> (see 4.1.2).
EXAMPLE 4 Use of <settingDesc>
  BBC studio London
  Talkshow host Michael Parkinson interviewing David and Victoria
  Beckham about their relationship
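The setting information of EXAMPLE 4 could be marked up as sketched below. The split into <place> and <setting>/<activity> is an assumption based on the TEI P5 content model of <settingDesc>, not a reproduction of the original markup.

```xml
<settingDesc>
  <!-- Where the interaction took place -->
  <place>
    <placeName>BBC studio London</placeName>
  </place>
  <!-- What the participants were doing -->
  <setting>
    <activity>Talkshow host Michael Parkinson interviewing David and
      Victoria Beckham about their relationship</activity>
  </setting>
</settingDesc>
```

Nothing in this fragment refers to a particular audio or video file; details of that kind belong in the recording information described in 4.1.2.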
 
 

4.3 Description of source (<encodingDesc>)
The <encodingDesc> element is used to record information about the way the TEI-encoded text has
been derived from a recorded source. This includes information about both the tool which created the
transcription, inside an <appInfo> element, and the convention used in transcribing the data, inside a
<transcriptionDesc> element. @ident and @version attributes should be used on these elements to
provide a machine-readable way of accessing this information.
EXAMPLE 5 Use of <encodingDesc>
  Transcription Tool providing a TEI Export
  Orthographic transcription according to HIAT
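The two descriptions in EXAMPLE 5 might be encoded as sketched below. The tool identifier, the tool label and both @version values are hypothetical placeholders; the nesting of <application> inside <appInfo> follows TEI P5.

```xml
<encodingDesc>
  <appInfo>
    <!-- Hypothetical tool identifier and version number -->
    <application ident="EXAMPLE-TOOL" version="1.0">
      <label>Example Transcription Tool</label>
      <desc>Transcription Tool providing a TEI Export</desc>
    </application>
  </appInfo>
  <!-- @ident names the transcription convention; the version is hypothetical -->
  <transcriptionDesc ident="HIAT" version="2004">
    <desc>Orthographic transcription according to HIAT</desc>
  </transcriptionDesc>
</encodingDesc>
```

A consuming tool can then branch on the @ident and @version values without having to interpret the prose descriptions.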
 

5 Macrostructure
5.1 Timeline (<timeline>)
<when> elements inside a <timeline> element should be used to define points in the recording;
these points are then referred to by @start, @end and @synch attributes of other elements (most
importantly <u> elements) of the transcription to represent its temporal structure. It is therefore
obligatory to provide an @xml:id attribute for each <when> element. <when> elements shall be in
the same order as the timepoints they refer to. Specifying an @interval attribute is optional, but it is
very useful for many processing purposes. Absolute time values in the @interval attribute should be
given in seconds from the start of the recording with the appropriate number of decimal places. The
first <when> element in the timeline corresponds to the start time of the transcribed recording. If an
absolute value is known for this point in time, it can be encoded in an @absolute attribute of the first
<when> element and the <timeline> element can point to it via an @origin attribute. If no absolute value for
the start of the recording can be provided, the @origin and @absolute attributes should be omitted.
EXAMPLE 6 Use of <timeline>

  ...
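As an illustration of the rules in 5.1, a short timeline could look like the sketch below. All identifiers, offsets and the absolute start time are hypothetical, and the @unit and @since attributes are taken from TEI P5 rather than from the text above.

```xml
<timeline unit="s" origin="#T0">
  <!-- First point: start of the transcribed recording. @absolute is given
       only because a wall-clock start time is assumed to be known here. -->
  <when xml:id="T0" absolute="2007-11-02T20:00:00"/>
  <!-- Later points in temporal order, as offsets in seconds from the start -->
  <when xml:id="T1" interval="1.52" since="#T0"/>
  <when xml:id="T2" interval="3.85" since="#T0"/>
  <when xml:id="T3" interval="6.10" since="#T0"/>
</timeline>
```

An utterance lasting from the second to the third point would then carry start="#T1" and end="#T2". If the start time of the recording were unknown, both @origin and @absolute would simply be left out.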

SLOVENIAN STANDARD
SIST ISO 24624:2018
1 October 2018
Language resource management - Transcription of spoken language
Slovenian title: Upravljanje z jezikovnimi viri - Transkripcija govorjenega jezika
French title: Gestion des ressources linguistiques - Transcription du langage parlé
This Slovenian standard is identical to: ISO 24624:2016
ICS:
01.140.10 Writing and transliteration
35.060 Languages used in information technology
SIST ISO 24624:2018 en,fr,de
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.

INTERNATIONAL STANDARD ISO 24624
First edition
2016-08-15
Language resource management —
Transcription of spoken language
Gestion des ressources linguistiques — Transcription du langage parlé
Reference number: ISO 24624:2016(E)
© ISO 2016

COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
4.1.2 Recording information (<recordingStmt>)
4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
4.2.2 Setting information (<settingDesc>)
4.3 Description of source (<encodingDesc>)
5 Macrostructure
5.1 Timeline (<timeline>)
5.2 Utterances (<u>)
5.3 Free dependent annotations (<spanGrp>, <span>)
5.4 Grouping of utterances and dependent annotations (<annotationBlock>)
5.5 Independent elements outside utterances (<pause> and <incident>)
5.6 Inline paralinguistic annotation (<shift>)
5.7 Global divisions of a transcription (<div>)
6 Microstructure
6.1 Tokens (<w>)
6.1.1 Characterization
6.1.2 Representation as <w>
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses (<pause>)
6.2.1 Characterization
6.2.2 Representation as <pause>
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (<vocal>, <kinesic> and <incident>)
6.3.1 Characterization
6.3.2 Representation as <vocal>, <kinesic> or <incident>
6.3.3 Examples
6.4 Punctuation (<pc>)
6.4.1 Characterization
6.4.2 Representation as <pc>
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (<unclear>, <choice>, <gap>)
6.5.1 Characterization
6.5.2 Representation as <unclear> or <gap>
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the <u> level (<seg>)
6.6.1 Characterization
6.6.2 Representation as <seg>
6.6.3 Further constraints
6.6.4 Examples
Annex A (informative) Fully encoded example
Annex B (informative) Element and attribute index
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO-specific terms and expressions related to conformity assessment,
as well as information about ISO’s adherence to the World Trade Organization (WTO) principles in the
Technical Barriers to Trade (TBT), see the following URL: www.iso.org/iso/foreword.html.
The committee responsible for this document is ISO/TC 37, Terminology and other language and content
resources, Subcommittee SC 4, Language resource management.
Introduction
This document sets out to facilitate the interchange of transcriptions of spoken language between
different computational tools and environments for creating, editing, publishing and exploiting such
data. Transcription of spoken language in this context means an orthography-based transcription of
verbal activity as recorded in an audio or video recording of a natural interaction. The description of
activity in other modalities (e.g. body language, gestures and facial expression) may be part of a spoken
language transcription, but this document starts from the assumption that the verbal dimension is
the primary focus of a spoken language transcription. Likewise, although this document may also be
relevant for transcription based on phonetic alphabets like the IPA, the assumption for this document is
that orthography-based transcription is the default case.
This document is developed in the context of the joint agreement between ISO and the Text Encoding
Initiative (TEI) consortium, and accordingly, its content is also distributed as part of the TEI
guidelines [23].
This document takes into account data models and encoding practices supported by widely used
transcription software. More specifically, it builds on several interoperability studies [12],[16],[17],[19]
involving the following tools:
— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]
This document was developed to be compatible with the formats produced by these tools. The
compatibility may extend to the formats of further labelling tools (e.g. Praat [4] or Wavesurfer,
http://www.speech.kth.se/wavesurfer/index2.html), but possibly on a lower level and/or with a requirement
to convert these formats to one of the above-mentioned before adding mandatory information (e.g.
speaker assignment) using the respective tools.
This document also aims to be usable with widely used transcription systems (“conventions”). However,
in a technical sense, compatibility is not easily definable in this area since, unlike the tool formats, most
of these systems lack an explicit formalization. The following selection of transcription systems was
considered for this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]
Since TEI is the reference framework for this document and metadata is not its main concern, no attempt
is made here to address metadata compatibility issues beyond the TEI header. However, it should be
noted that there are several TEI profiles for the CMDI framework which are related both to each other
and to CMDI profiles of other metadata formats (e.g. IMDI) via the ISOCAT registry (see also References
[5], [6] and [9]).
This document aims to define both a target format for legacy data conversion and a format suitable for
future data processing requirements. The pros and cons of these two demands were carefully weighed
up before decisions were taken. At some points, certain techniques are therefore marked as preferred
from a data processing point of view while an alternative technique is still allowed if the structure of
legacy data makes its use unavoidable.
With regard to the other standards developed within ISO committee TC 37/SC 4, this document is
intended to provide the primary layer on top of which further annotation layers may be implemented.
In particular, the use of the <w> element for tokenizing a transcription conforms to the TEI-based
representation of tokens in ISO 24611 (MAF).
This document also aligns with the mechanism proposed in the TEI guidelines to embed stand-off
annotations within a TEI document. In particular, this mechanism contains a generic element
(<annotationBlock>) that groups together annotations related to the same linguistic segment; this
grouping meets the needs of this document in the case of annotations of <u> elements or their children.
Finally, this document is complementary to, and does not overlap with, the speech and multimodal
interaction-related standards developed within the W3C. In particular, it does not deal with speech
synthesis, as is the case for SSML [24], nor does it deal with the representation of the semantic
interpretation of multimodal utterances, as does EMMA [25].

INTERNATIONAL STANDARD ISO 24624
First edition
2016-08-15
Gestion des ressources
linguistiques — Transcription du
langage parlé
Language resource management — Transcription of spoken language
Reference number: ISO 24624:2016(F)
© ISO 2016

COPYRIGHT PROTECTED DOCUMENT
© ISO 2016, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized
in any form or by any means, electronic or mechanical, including photocopying, or posting on the
internet or an intranet, without prior written permission. Permission can be requested from either ISO at
the address below or ISO’s member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Metadata
4.1 Description of the electronic file (<fileDesc>)
4.1.1 Distribution information (<publicationStmt>)
4.1.2 Recording information (<recordingStmt>)
4.2 Description of circumstances (<profileDesc>)
4.2.1 Participant information (<particDesc>)
4.2.2 Setting information (<settingDesc>)
4.3 Description of source (<encodingDesc>)
5 Macrostructure
5.1 Timeline (<timeline>)
5.2 Utterances (<u>)
5.3 Free dependent annotations (<spanGrp>, <span>)
5.4 Grouping of utterances and dependent annotations (<annotationBlock>)
5.5 Independent elements outside utterances (<pause> and <incident>)
5.6 Inline paralinguistic annotation (<shift>)
5.7 Global divisions of a transcription (<div>)
6 Microstructure
6.1 Tokens (<w>)
6.1.1 Characterization
6.1.2 Representation as <w>
6.1.3 Further constraints
6.1.4 Examples
6.2 Pauses (<pause>)
6.2.1 Characterization
6.2.2 Representation as <pause>
6.2.3 Further constraints
6.2.4 Examples
6.3 Audible and visible non-speech events (<vocal>, <kinesic> and <incident>)
6.3.1 Characterization
6.3.2 Representation as <vocal>, <kinesic> or <incident>
6.3.3 Examples
6.4 Punctuation (<pc>)
6.4.1 Characterization
6.4.2 Representation as <pc>
6.4.3 Further constraints
6.4.4 Examples
6.5 Uncertainty, alternatives, incomprehensible and omitted passages (<unclear>, <choice>, <gap>)
6.5.1 Characterization
6.5.2 Representation as <unclear> or <gap>
6.5.3 Further constraints
6.5.4 Examples
6.6 Units above the token and below the <u> level (<seg>)
6.6.1 Characterization
6.6.2 Representation as <seg>
6.6.3 Further constraints
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
intellectual property rights or similar rights. ISO shall not be held responsible for failing to identify
such rights or to give notice of their existence. Details of any intellectual property rights or similar
rights identified during the development of the document are given in the Introduction and/or on the
ISO list of patent declarations received (see www.iso.org/brevets).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the meaning of ISO-specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the World Trade Organization (WTO)
principles in the Technical Barriers to Trade (TBT), see the following URL:
www.iso.org/iso/fr/avant-propos.html.
This document was prepared by Technical Committee ISO/TC 37, Terminology and other language and
content resources, Subcommittee SC 4, Language resource management.
Introduction
This document aims to facilitate the exchange of transcriptions of spoken language between different
computer tools and environments for creating, editing, publishing and exploiting such data.
Transcription of spoken language in this context means an orthographic transcription of verbal
activity as captured in an audio or video recording of a natural interaction. The description of
activity in other modalities (for example, body language, gestures and facial expressions) can be
an integral part of a transcription of spoken language, but this document assumes that the verbal
component is the primary object of such a transcription. Likewise, although this document can be
relevant for transcription in phonetic alphabets such as the IPA, it assumes that orthographic
transcription is the default case.
This document was developed within the framework of the joint agreement between ISO and the Text
Encoding Initiative (TEI) Consortium; its content therefore also appears in the TEI Guidelines [23].
This document takes into account the data models and encoding practices supported by widely used
transcription software. More specifically, it draws on several interoperability studies
[12][16][17][19] covering the following tools:
— ANVIL [10]
— CLAN [11]
— ELAN [22]
— EXMARaLDA [20]
— FOLKER [18]
— Transcriber [1]
This document was designed to be compatible with the formats created by these tools.
Compatibility may extend to the formats of other labelling tools (for example, Praat [4] or
Wavesurfer, http://www.speech.kth.se/wavesurfer/index2.html), but possibly to a lesser degree
and/or with the need to convert those formats into one of the formats mentioned above before
mandatory information (for example, speaker assignment) is added with the respective tools.
This document is also intended to be used with widely used transcription systems
("conventions"). From a technical standpoint, however, compatibility is hard to define in this
area because, unlike software formats, most of these systems lack an explicit formalization.
The following transcription systems were taken into account in developing this document:
— Codes for the Human Analysis of Transcripts (CHAT) [11]
— Discourse Transcription (DT) [7]
— Gesprächsanalytisches Transkriptionssystem (GAT) [21]
— Halbinterpretative Arbeitstranskriptionen (HIAT) [13]
Since the TEI is the reference framework for this document and metadata are not its focus,
compatibility of metadata beyond the TEI header is not addressed here. It should be noted, however,
that several TEI profiles exist for the CMDI framework; they are linked to one another and to the
CMDI profiles of other metadata formats (for example, IMDI) through the ISOCAT registry (see also
References [5], [6] and [9]).

This document aims to define both a target format for converting legacy data and a format suited
to future data-processing requirements. Decisions were taken only after carefully weighing the
advantages and drawbacks with respect to these two requirements. Consequently, in a few places a
technique is marked as recommended from a data-processing point of view, while an alternative
technique remains permitted where the structure of the legacy data makes its use unavoidable.
With regard to the other standards developed within ISO/TC 37/SC 4, this document is intended to
provide a first layer on which further annotation layers can be built. In particular, the use of
the element for tokenizing a transcription conforms to the TEI representation of tokens in
ISO 24611 (MAF).
This document is also aligned with the mechanisms that the TEI Guidelines offer for integrating
stand-off annotations into a TEI document. These mechanisms include in particular a generic
element () that groups the annotations relating to the same linguistic segment; this grouping
meets the needs of this document where annotations apply to the element or to its children.
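As an illustration of the layering described above, the following is a minimal sketch of a tokenized utterance with one stand-off annotation layer grouped alongside it. The element names (`annotationBlock`, `u`, `w`, `spanGrp`, `span`) are assumptions taken from the TEI Guidelines, since the names are not legible in the text above; the identifiers, the sample words and the part-of-speech labels are purely hypothetical.

```xml
<!-- Sketch only: element names assumed from the TEI Guidelines;
     ids, words and labels are hypothetical. -->
<annotationBlock xmlns="http://www.tei-c.org/ns/1.0" who="#SPK0" start="#T0" end="#T1">
  <!-- First layer: the tokenized orthographic transcription -->
  <u xml:id="u1">
    <w xml:id="w1">good</w>
    <w xml:id="w2">morning</w>
  </u>
  <!-- Second layer: annotations that point at the tokens above -->
  <spanGrp type="pos">
    <span from="#w1" to="#w1">ADJ</span>
    <span from="#w2" to="#w2">NOUN</span>
  </spanGrp>
</annotationBlock>
```

Because the annotation layer refers to token identifiers rather than to the recording, further layers can be added in the same way without touching the transcription itself.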
Finally, this document is complementary: it does not overlap with the standards on spoken and
multimodal interaction developed within the W3C. In particular, it does not deal with speech
synthesis, as SSML [24] does, nor with the representation of the semantic interpretation of
multimodal utterances, as EMMA [25] does.
INTERNATIONAL STANDARD ISO 24624:2016(F)
Language resource management — Transcription of spoken language
1 Scope
This document specifies rules for representing transcriptions of audio- and video-recorded spoken
interactions in XML documents based on the guidelines of the TEI. As a secondary objective, it
aims to relate transcribed data with standards for annotated corpora. It is applicable to
transcription data for studies in sociolinguistics, conversation analysis, dialectology, corpus
linguistics, corpus lexicography, language technology and qualitative social studies, and to other
transcription data of recorded spoken language. It is not applicable to other forms of
transcription, most importantly transcriptions of manuscripts.
Annex A gives a fully encoded example and Annex B provides an element index and an attribute
index.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following
addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
dependent annotation
annotation that refers not directly to an audio or video recording but to another annotation,
usually an orthographic or phonetic transcription
3.2
milestone element
empty XML element used to indicate a boundary point
3.3
orthographic transcription
representation or modelling of spoken language based on the orthography of that language
3.4
paralinguistic feature
feature of spoken language beyond the sound or sounds themselves, such as voice quality, pitch,
volume or intonation
3.5
phonetic transcription
representation or modelling of spoken language based on the phonological system of that language
3.6
spoken language
oral language produced by the human voice
3.7
transcriber
person who carries out the transcription
3.8
transcription
representation or modelling of spoken language by means of written symbols
3.9
transcription system
set of principles and rules grounded in a theoretical basis, specifying which phenomena of spoken
language are to be transcribed and how the transcription is to be carried out
4 Metadata
The TEI Guidelines give detailed guidance on encoding metadata in various subsections of the
element . The following section covers only metadata that are either (i) essential for making
transcriptions of spoken language interpretable and exchangeable in general, or (ii) likely to be
relevant in the great majority of cases. This does not exclude the possibility of, or the need
for, encoding further metadata in the element .
4.1 Description of the electronic file ()
4.1.1 Publication information ( )
The element in the section of should be used to record information on access rights and
contact details for the transcription in question.
EXAMPLE 1 Use of

  Hamburger Zentrum für Sprachkorpora
 
   
   

Free access for research and teaching purposes.
     No redistribution permitted. 


 
  Hamburger Zentrum für Sprachkorpora
 

   Max Brauer-Allee 60
   22765
   Hamburg
   Germany
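
A minimal sketch of how the content of Example 1 might be marked up. The element names (`publicationStmt`, `authority`, `availability`, `distributor`, `address` and its children) and the nesting are assumptions drawn from the TEI Guidelines, not taken from the text above; only the text content comes from the example.

```xml
<!-- Sketch only: element names and nesting are assumed from the TEI Guidelines. -->
<publicationStmt xmlns="http://www.tei-c.org/ns/1.0">
  <authority>Hamburger Zentrum für Sprachkorpora</authority>
  <!-- Access rights for the transcription -->
  <availability>
    <p>Free access for research and teaching purposes.
       No redistribution permitted.</p>
  </availability>
  <!-- Contact details -->
  <distributor>Hamburger Zentrum für Sprachkorpora</distributor>
  <address>
    <street>Max Brauer-Allee 60</street>
    <postCode>22765</postCode>
    <placeName>Hamburg</placeName>
    <country>Germany</country>
  </address>
</publicationStmt>
```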
 


4.1.2 Recording information ()
The element in the section of should be used to record information about the recordings that
were transcribed. Only the recording or recordings themselves, usually digital audio and/or video
files, should be described in this element. General information about the interaction concerned
that is independent of the recording(s) should be described in the element
(see 4.2.2).
An element inside an element should be used to point to the corresponding digital file by means
of an @url attribute (see Reference [2]). A @type attribute should be assigned to the element
to indicate the media type of the recording; the permitted values for this attribute are "audio"
and "video". The actual type of the digital file should be encoded as a @mimeType attribute (see
Reference [8]) assigned to the element .
Where two or more files are derived from the same master recording (for example, a video file or
an extracted audio track), those files should be represented as separate elements within the
same element , rather than as separate elements. TEI linking mechanisms, such as or @corresp,
may be used to describe relations between different recordings or between recordings and other
elements, such as speakers.
EXAMPLE 2 Use of



  
    
    
    
      Parkinson talk show on the BBC, broadcast on 02 November 2007
    
     gistrement -–>
     sera -–>
     ex. Camcorder) –->
    
      Video clip downloaded from YouTube with aTube-Catcher, converted
        to MPG format with Adobe Premiere
      Audio track extracted from the video with Audacity 1.3 beta
    
  





  
    
    
      Recorded with a ZOOM H4NSP portable recorder
       attached to the dress of Victoria Beckham
      Synchronized with the recording of
David Beckham
    
  
  
    
    
      Recorded with a ZOOM H4NSP portable recorder
      Attached to the shirt collar
      of David Beckham
      Synchronized with
      the recording of Victoria Beckham
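
A minimal sketch of the first recording of Example 2, showing two files derived from one master recording. The attributes @type, @url and @mimeType are those named in the text; the element names (`sourceDesc`, `recordingStmt`, `recording`, `media`, `broadcast`, `equipment`) are assumptions drawn from the TEI Guidelines, and the file names and MIME type values are hypothetical placeholders.

```xml
<!-- Sketch only: element names assumed from the TEI Guidelines;
     file names and MIME types are hypothetical. -->
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <recordingStmt>
    <recording type="video" xml:id="REC0">
      <!-- Two files from the same master recording: one recording, two media -->
      <media mimeType="video/mpeg" url="talkshow_excerpt.mpg"/>
      <media mimeType="audio/wav" url="talkshow_excerpt.wav"/>
      <broadcast>
        <p>Parkinson talk show on the BBC, broadcast on 02 November 2007</p>
      </broadcast>
      <equipment>
        <p>Video clip downloaded from YouTube with aTube-Catcher, converted
           to MPG format with Adobe Premiere</p>
        <p>Audio track extracted from the video with Audacity 1.3 beta</p>
      </equipment>
    </recording>
  </recordingStmt>
</sourceDesc>
```

The two body-worn recordings of the example would each get their own recording entry, since they are separate master recordings rather than derivatives of this one.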
    
  

4.2 Description of the circumstances ()
4.2.1 Participant information ()
The participants in the transcribed interaction should be described in elements in the section
of an element . An @n attribute shall be assigned to the element to define a short code
representing the participant concerned, because this code can be indispensable for many
processing purposes. Elements in the body of the transcription refer to the @xml:id attribute of
an element , which shall therefore always be provided.
To provide further metadata on participants, the full content model of the element can be
used, for example to record a person's age, date of birth, language proficiency or role in the
recorded conversation.
EXAMPLE 3 Use of

 
  
    Daniel
    Steward
  
  
  
  
    British English
    French
  
  
 
 
  
    Fiona
    Baker
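
A minimal sketch of how the two participants of Example 3 might be encoded. The attributes @n and @xml:id are those named in the text; the element names (`particDesc`, `person`, `persName`, `forename`, `surname`, `langKnowledge`, `langKnown`) are assumptions drawn from the TEI Guidelines, and the identifier values, short codes and language tags are hypothetical.

```xml
<!-- Sketch only: element names assumed from the TEI Guidelines;
     ids, short codes and language tags are hypothetical. -->
<particDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- xml:id is the target of references from the transcription body;
       n carries the short speaker code -->
  <person xml:id="SPK0" n="DS">
    <persName>
      <forename>Daniel</forename>
      <surname>Steward</surname>
    </persName>
    <langKnowledge>
      <langKnown tag="en-GB">British English</langKnown>
      <langKnown tag="fr">French</langKnown>
    </langKnowledge>
  </person>
  <person xml:id="SPK1" n="FB">
    <persName>
      <forename>Fiona</forename>
      <surname>Baker</surname>
    </persName>
  </person>
</particDesc>
```

Under these assumptions, an utterance in the body would point back to a participant with a reference such as `who="#SPK0"`.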
  
  
 

4.2.2 Context information ()
The element should be used to give general information on the context and circumstances of
the interaction. This includes aspects such as place and time, spatial arrangement and artefacts
of the interaction. Information about a specific recording of this interaction should not be
recorded in this element but in the element (see 4.1.2).
EXAMPLE 4 Use of

 
   BBC studio, London
 
 
   Talk show host Michael Parkinson interviewing David and Victoria
        Beckham about their relationship
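
A minimal sketch of how the content of Example 4 might be marked up. The element names (`settingDesc`, `place`, `placeName`, `setting`, `activity`) and the nesting are assumptions drawn from the TEI Guidelines; only the text content comes from the example.

```xml
<!-- Sketch only: element names and nesting are assumed from the TEI Guidelines. -->
<settingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Where the interaction took place -->
  <place>
    <placeName>BBC studio, London</placeName>
  </place>
  <!-- What the interaction consisted of -->
  <setting>
    <activity>Talk show host Michael Parkinson interviewing David and Victoria
      Beckham about their relationship</activity>
  </setting>
</settingDesc>
```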
 
 

4.3 Description of the source ()
The element is used to record information on how the TEI-encoded text was derived from a
recorded source. This covers both the tool that produced the transcription, in an element , and
the convention used to transcribe the data, in an element . The attributes @ident and @version
should be assigned to these elements so that this information can be accessed in a
machine-readable way.
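
A minimal sketch of a tool description and a convention description carrying the @ident and @version attributes named above. The element names (`encodingDesc`, `appInfo`, `application`, `label`, `desc`, `transcriptionDesc`) are assumptions drawn from the TEI Guidelines. The tool and convention shown are picked from the lists in the Introduction purely for illustration, and the version numbers and descriptions are hypothetical placeholders.

```xml
<!-- Sketch only: element names assumed from the TEI Guidelines;
     tool, convention and version values are illustrative placeholders. -->
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The tool that produced the transcription -->
  <appInfo>
    <application ident="EXMARaLDA" version="1.5">
      <label>EXMARaLDA</label>
      <desc>Transcription tool with TEI export</desc>
    </application>
  </appInfo>
  <!-- The convention used to transcribe the data -->
  <transcriptionDesc ident="HIAT" version="2004">
    <desc>Transcription following the HIAT conventions</desc>
  </transcriptionDesc>
</encodingDesc>
```

Keeping the identifier and version in attributes rather than in free text lets a converter decide which import rules to apply without parsing the descriptions.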
EXAMPLE 5 Use of

 
   
   
   
    
     Transcription tool with TEI export
   
 
 
 
    Transcription orthographi
...
