SIST ISO 21636-3:2024
Language coding — A framework for language varieties — Part 3: Application of the framework
The ISO 21636 series provides a framework for the identification and description of varieties of all individual human languages (see ISO 639).
It is applicable to sign languages.
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);
— those means of human communication which are neither fully nor largely equivalent to human language (such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely combined into complex expressions).
This document gives guidance on how to apply the framework to identify basic dimensions and sub-dimensions of linguistic variation and the resulting varieties, including major modalities of human communication. It does not include any code or individual identifiers.
This document is structured strictly analogously to ISO/TR 21636-2. For a general description of the dimension and varieties dealt with in each clause, the user can refer to the corresponding clause in that document.
This document focuses only on the identification and description of language varieties, not on the general, formal or technical aspects of the description of human language resources (LRs), which are covered by general metadata frameworks.
NOTE 1 For the general description of a language resource, a user can minimally apply at least the metadata of the Open Language Archives Community (OLAC) metadata standard, which provides an application of the Dublin Core metadata element set as defined by the Dublin Core Metadata Initiative (DCMI). These descriptors have been recognized in ISO 15836-1:2017.
NOTE 2 The Component Metadata Infrastructure (CMDI) provides a best practice guide for the sake of technical and content interoperability between LRs as well as of their sustainability.
Codage des langues — Identification et description des variétés de langues — Partie 3: Exigences et recommandations pour la mise en œuvre
Jezikovno kodiranje - Ogrodje za jezikovne različice - 3. del: Uporaba ogrodja
SLOVENIAN STANDARD
01 November 2024
Jezikovno kodiranje - Ogrodje za jezikovne različice - 3. del: Uporaba ogrodja
Language coding — A framework for language varieties — Part 3: Application of the framework
Codage des langues — Identification et description des variétés de langues — Partie 3: Exigences et recommandations pour la mise en œuvre
This Slovenian standard is identical to: ISO 21636-3:2024
ICS:
01.140.20 Information sciences
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
International Standard
ISO 21636-3
First edition
2024-06
Language coding — A framework for language varieties —
Part 3:
Application of the framework
Codage des langues — Identification et description des variétés de langues —
Partie 3: Exigences et recommandations pour la mise en œuvre
Reference number
© ISO 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
3.2 Abbreviated terms
4 Indication of language varieties according to the dimensions of linguistic variation
4.1 Overview
4.2 Indication of individual language varieties
4.3 Indication of the (geographical) space dimension of linguistic variation
4.4 Indication of the time dimension of linguistic variation
4.5 Indication of the social group dimension of linguistic variation
4.6 Indication of the medium dimension of linguistic variation
4.7 Indication of the situation dimension of linguistic variation
4.8 Indication of the person dimension of linguistic variation
4.9 Indication of the proficiency dimension of linguistic variation
4.10 Indication of the communicative functioning dimension of linguistic variation
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out through
ISO technical committees. Each member body interested in a subject for which a technical committee
has been established has the right to be represented on that committee. International organizations,
governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely
with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of ISO document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives).
ISO draws attention to the possibility that the implementation of this document may involve the use of (a)
patent(s). ISO takes no position concerning the evidence, validity or applicability of any claimed patent
rights in respect thereof. As of the date of publication of this document, ISO had not received notice of (a)
patent(s) which may be required to implement this document. However, implementers are cautioned that
this may not represent the latest information, which may be obtained from the patent database available at
www.iso.org/patents. ISO shall not be held responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO’s adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 37, Language and terminology, Subcommittee
SC 2, Terminology workflow and language coding.
A list of all parts of the ISO 21636 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
An increasing number of digital language resources (LRs) are being created (including via retro-digitization),
archived, processed and analysed. Within this context, the detailed and exact characterization of language
varieties present in a given language use event is quickly gaining importance. Here, language use includes
all modalities such as written, spoken, or signed, and also new forms of language use supported by digital
technology (in social media and similar forms of digital communication). Such modalities demonstrate one
way in which languages vary internally. Others include, for instance, familiar regional (dialectal) and social
variation.
In the past, a primary goal of working with LRs was their archiving and preservation. However, new
goals have now emerged and are still emerging:
— Institutions and individuals need to exchange metadata (i.e. bibliographic description data and other
secondary information) for making the information on existing LRs widely available in a harmonized form.
— Researchers are identifying primary data (i.e. the LRs themselves) for various research purposes,
including research on linguistic variation.
— Researchers and developers need LRs for the development of more advanced language technologies (LTs)
and for testing purposes, because LTs, in particular those concerning speech recognition and language
analysis, are entering more dimensions of human communication.
In order to achieve the above-mentioned goals and purposes, along with others not outlined in the
ISO 21636 series, a standardized set of metadata for the identification of language varieties is important
for guaranteeing the frictionless exchange of secondary information. Well-organized metadata also help
to indicate the degree of interoperability (equalling re-usability and re-purposability of LRs), and the
applicability of LTs to different situations or LRs over time. These metadata are applicable in eBusiness,
eHealth, eGovernment, eInclusion, eLearning, smart environments, ambient assisted living (AAL), and
virtually all other information-rich applications which depend on information about LRs. A clear metadata
approach is also a prerequisite for the durability of LR archiving (in particular in the case of cultural heritage
and scientific research data).
ISO 639 provides a framework for identifying the individual languages used in an LR. The ISO 21636 series
presupposes and complements ISO 639 in that it extends the language coding framework in order to allow for
the identification of different types of language varieties (e.g. geographical, social, modal). The identification
of language varieties can then be included in general metadata, library metadata and archival metadata for
describing LRs (which may also include technical information, time and location of recording, and similar
general information, which are not included in the ISO 21636 series).
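As an illustration of the preceding paragraph, the following minimal Python sketch shows how an indication of language varieties could sit alongside an ISO 639 language identifier in a metadata record. The field names mirror the dimensions listed in the Clause 4 headings. The class names, the record layout and all example values are assumptions made for illustration only; this document defines no codes or identifiers.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class VarietyDescription:
    """Free-text indications, one per dimension named in Clause 4.

    Hypothetical layout: the standard prescribes no data model or codes.
    """
    space: Optional[str] = None
    time: Optional[str] = None
    social_group: Optional[str] = None
    medium: Optional[str] = None
    situation: Optional[str] = None
    person: Optional[str] = None
    proficiency: Optional[str] = None
    communicative_functioning: Optional[str] = None

    def indicated(self) -> dict:
        """Return only the dimensions for which an indication was given."""
        return {k: v for k, v in asdict(self).items() if v is not None}


@dataclass
class LanguageResourceRecord:
    """Hypothetical record: general metadata plus a variety description."""
    title: str
    language: str  # ISO 639 language identifier, e.g. "deu" for German
    variety: VarietyDescription = field(default_factory=VarietyDescription)


# Illustrative values only; they are not taken from the standard.
record = LanguageResourceRecord(
    title="Interview recording (illustrative)",
    language="deu",
    variety=VarietyDescription(space="Bavaria", medium="spoken"),
)
print(record.variety.indicated())
```

Keeping every dimension optional reflects that a given LR may be describable along only some dimensions. Technical details such as recording time and location would stay in the general metadata, outside the variety description.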
The conceptual framework developed in this document for dealing with linguistic variation respects the
major approaches represented in the linguistic literature without simply reproducing them. The framework
is closest though in general orientation and in a number of details, such as the role assigned to idiolects, to
work of a type represented by Lieb [6].
The metadata categories and values addressed in this document can be candidates for a future fine-grained
coding of language varieties based on the comprehensive principles of the ISO 21636 series. Thus, this
document fits within the general framework of the ISO/IEC 11179 series for metadata.
Stakeholders include, but are not limited to:
— information and communication technologies (ICTs) industry (including LTs);
— libraries;
— the media industry (including entertainment);
— internet communities;
— people engaging in language documentation and preservation;
— language archivists;
— researchers (linguists, in particular sociolinguists, ethnologists, sociologists, etc.);
— people and institutions providing language training;
— emerging new user communities.
It is anticipated that these stakeholders will need to refer not only to a certain individual language, but also
to a certain language variety, for instance for oral human-computer interaction, or for tailoring a certain
LR or LT to the needs and specific environment of a target user group. An initial step towards achieving
the needed specificity involves the ability to identify the dimension(s) of linguistic variation internal to
individual languages involved, and the respective relevant language varieties. A conceptually sound uniform
framework of reference as developed in the ISO 21636 series is superior to the proliferation of different
individual ad-hoc solutions.
International Standard ISO 21636-3:2024(en)
Language coding — A framework for language varieties —
Part 3:
Application of the framework
1 Scope
The ISO 21636 series provides a framework for the identification and description of varieties of all individual
human languages (see ISO 639).
It is applicable to sign languages.
It does not apply to:
— artificial means of communication with or between machines (such as programming languages);
— those means of human communication which are neither fully nor largely equivalent to human language
(such as sets of individual symbols or gestures that each carry isolated meanings but cannot be freely
combined into complex expressions).
This document gives guidance on how to apply the framework to identify basic dimensions and
sub-dimensions of linguistic variation and the resulting varieties, including major modalities of human
communication. It does not include any code or individual identifiers.
This document is structured strictly analogously to ISO/TR 21636-2. For a general description of the
dimension and varieties dealt with in each clause, the user can refer to the corresponding clause in that
document.
This document focuses only on the identification and description of language varieties, not on the general,
formal or technical aspects of the description of human language resources (LRs), which are covered by
general metadata frameworks.
NOTE 1 For the general description of a language resource, a user can minimally apply at least the metadata of
the Open Language Archives Community (OLAC) metadata standard [7], which provides an application of the Dublin
Core metadata element set as defined by the Dublin Core Metadata Initiative (DCMI) [8]. These descriptors have been
recognized in ISO 15836-1:2017.
NOTE 2 The Component Metadata Infrastructure (CMDI) [9] provides a best practice guide [10] for the sake of
technical and content interoperability between LRs as well as of their sustainability.
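To make the division of labour in NOTE 1 concrete, the sketch below builds a small general-purpose record from three Dublin Core elements (title, language, description) using Python's standard library. It is a simplified illustration, not a conformant OLAC or CMDI record: the wrapper element `record`, the function name and the example values are assumptions, and placing a variety indication in `dc:description` is only one possible choice that this document does not prescribe.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core metadata element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(title: str, language: str, variety_note: str) -> ET.Element:
    """Build a minimal record with three Dublin Core elements.

    The unqualified wrapper element "record" is hypothetical.
    """
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}language").text = language
    ET.SubElement(record, f"{{{DC_NS}}}description").text = variety_note
    return record


# Illustrative values only; they are not taken from the standard.
record = build_record(
    title="Interview recording (illustrative)",
    language="deu",  # ISO 639 identifier for German
    variety_note="space: Bavaria; medium: spoken",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The general metadata framework carries the title and language; the variety indication is the part that the ISO 21636 series is concerned with.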
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO 15924, Information and documentation — Codes for the representation of names of scripts
ISO 21636-1, Language coding — A framework for language varieties — Part 1: Vocabulary
----------------
...