Information and documentation — Codes for written language conversion systems

This document provides principles for establishing codes for the representation of written language conversion systems. The codes are devised for usage in any application requiring the expression of written language conversion systems, including transliteration and romanization systems, in coded form.

Information et documentation — Codes pour les systèmes de conversion des langues écrites

General Information

Status: Published
Publication Date: 07-Nov-2022
Current Stage: 6060 - International Standard published
Start Date: 08-Nov-2022
Due Date: 21-Jun-2022
Completion Date: 08-Nov-2022

Standard
ISO 24229:2022 - Information and documentation — Codes for written language conversion systems Released:8. 11. 2022
English language
17 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO 24229
First edition
2022-11
Information and documentation — Codes for written language conversion systems
Information et documentation — Codes pour les systèmes de conversion des langues écrites
Reference number
ISO 24229:2022(E)
© ISO 2022

COPYRIGHT PROTECTED DOCUMENT
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Conversion system codes
4.1 Structure of conversion system codes
4.1.1 General
4.1.2 Construction of the conversion system code
4.1.3 Titular segment
4.1.4 Source spelling system segment
4.1.5 Target spelling system segment
4.1.6 Identifying segment
4.2 Requirements for new conversion system codes
4.3 Deprecation of conversion system codes
4.4 User assigned conversion system codes
4.5 Capitalization of conversion system codes
4.6 Abbreviated conversion system codes
4.7 Examples of conversion system codes
5 Conversion system authority
5.1 General
5.2 Requirements
5.2.1 General
5.2.2 Inactive authorities
5.2.3 Varia authorities
5.2.4 Competency
5.3 Registration
5.4 Conversion system authority identifiers
5.4.1 Principles for construction of identifiers
5.4.2 Examples of conversion system authority identifiers
6 Data model and attributes
6.1 Common data model and attributes
6.1.1 General
6.1.2 Data models
6.1.3 Usage of ISO 15924 code elements
6.1.4 Usage of ISO 639 code elements
6.1.5 Usage of ISO 3166 code elements
6.1.6 Usage of ISO 8601 expressions
6.2 System authority data model and attributes
6.2.1 Diagram
6.2.2 Conversion system authority
6.2.3 Authority identifier
6.3 Conversion system data model and attributes
6.3.1 Diagram
6.3.2 Written language conversion system
6.3.3 Spelling system
6.3.4 Conversion system relation
6.3.5 Conversion system code status
6.3.6 Conversion system status
6.3.7 Conversion system relation type
Annex A (normative) Registration authority
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 46, Information and documentation.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
A number of international applications require the identification of written language conversion
systems, including for terminology, lexicography, bibliography, and linguistics, especially for reverse
transliteration, computational linguistics and machine pronunciation.
This document sets out the necessary procedures to maintain the registry of written language
conversion systems.
The chosen term “written language conversion” is intended to refer to all types of conversions, i.e.
transformations of written texts from one spelling system to another. It thus includes both script
conversion (change of script: transliteration, transcription) and conversion of texts without changing
the script (e.g. transcription of foreign names or words using the alphabet of a target language, change
of the orthography in a language, etc.). For the sake of compactness of expression, “written language
conversion” has been shortened to “conversion” in this document where it does not cause ambiguity.
INTERNATIONAL STANDARD ISO 24229:2022(E)
Information and documentation — Codes for written
language conversion systems
1 Scope
This document provides principles for establishing codes for the representation of written language
conversion systems.
The codes are devised for usage in any application requiring the expression of written language
conversion systems, including transliteration and romanization systems, in coded form.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639-2, Codes for the representation of names of languages — Part 2: Alpha-3 code
ISO 639-3, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive
coverage of languages
ISO 639-5, Codes for the representation of names of languages — Part 5: Alpha-3 code for language families
and groups
ISO 3166-1, Codes for the representation of names of countries and their subdivisions — Part 1: Country
code
ISO 5127, Information and documentation — Foundation and vocabulary
ISO 8601 (all parts), Date and time — Representations for information interchange
ISO 15924, Information and documentation — Codes for the representation of names of scripts
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 5127 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org
3.1
script
particular graphic representation or class of representations of a set of characters used to write one or
more languages
[SOURCE: ISO 5127:2017, 3.1.6.02]
3.2
spelling system
set of rules governing the orthography of a language
Note 1 to entry: Typically, a spelling system defines how the spoken form of a language is represented in writing.
Several languages have undergone orthographic reforms, which means they have had different spelling systems over time.
3.3
natural language
language which is or was in active use in a community of people, and the rules of which are mainly
deduced from the usage
[SOURCE: ISO 5127:2017, 3.1.5.02]
3.4
character
member of a set of elements that is used for the representation, organization, or control of data
[SOURCE: ISO 5127:2017, 3.1.4.02]
3.5
written language
natural language (3.3) realized through the writing of characters (3.4)
[SOURCE: ISO 5127:2017, 3.1.5.04]
3.6
written language conversion
process whereby one spelling system (3.2) is converted into another spelling system
Note 1 to entry: This is a general term that includes script conversion but also, e.g. cases when a language changes
its orthography without changing the script.
3.7
transliteration
process which consists of representing the characters of an alphabetical or syllabic system of writing
by the characters of a conversion alphabet
3.8
transcription
process whereby the sounds of a given language are noted by the system of signs of a conversion
language
3.9
romanization
script conversion from non-Roman to Roman script (3.1) by means of transliteration (3.7), transcription
(3.8) or both
[SOURCE: ISO 5127:2017, 3.1.6.14]
3.10
written language conversion system
set of rules for written language conversion (3.6)
3.11
language code
combination of characters used to represent the name of a language or languages
[SOURCE: ISO 5127:2017, 3.2.5.14]
3.12
script code
combination of characters used to represent the name of a script (3.1)
[SOURCE: ISO 15924:2004, 3.8]
3.13
conversion system code
combination of characters used in a structured way to represent a written language conversion system
(3.10)
4 Conversion system codes
4.1 Structure of conversion system codes
4.1.1 General
A conversion system code shall consist of four segments:
— titular segment;
— source spelling system segment;
— target spelling system segment;
— identifying segment.
Each segment shall consist of one or more elements.
4.1.2 Construction of the conversion system code
The following rules are to be adhered to for the construction of a conversion system code:
— The codes shall consist of elements from the following Unicode ranges:
— DIGIT ZERO through DIGIT NINE (U+0030 — U+0039)
— LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z (U+0041 — U+005A)
— LATIN SMALL LETTER A through LATIN SMALL LETTER Z (U+0061 — U+007A)
— Segments shall be separated by a single “COLON” (“:”, Unicode U+003A).
— Elements within a segment shall be separated by a single “HYPHEN-MINUS” (“-”, Unicode U+002D).
— “HYPHEN-MINUS” (“-”, Unicode U+002D) within an element (e.g. 233-3) will also be accepted.
— Other characters in the elements not covered by the above should be omitted or substituted.
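As a non-normative illustration, the construction rules above can be checked mechanically. The following Python sketch is not part of this document: the function name `split_code` is hypothetical, and it assumes that every segment is non-empty and that a hyphen never starts or ends a segment. Because HYPHEN-MINUS can both separate elements and occur inside an element, the sketch validates each segment as a whole and does not attempt to split it into elements.

```python
import re

# A segment: runs of ASCII letters/digits joined by single hyphens.
# (Assumption: no empty segments, no leading/trailing/doubled hyphens.)
_SEGMENT = re.compile(r"[0-9A-Za-z]+(?:-[0-9A-Za-z]+)*")


def split_code(code: str) -> list[str]:
    """Split a conversion system code into its four segments.

    Raises ValueError if the code does not have exactly four
    colon-separated segments or uses characters outside the allowed ranges.
    """
    segments = code.split(":")
    if len(segments) != 4:
        raise ValueError(f"expected 4 segments, got {len(segments)}: {code!r}")
    for segment in segments:
        if not _SEGMENT.fullmatch(segment):
            raise ValueError(f"invalid segment {segment!r} in {code!r}")
    return segments


print(split_code("UN:ara-Arab:Latn:2017"))
```

A caller might use the returned list to address the titular, source, target and identifying segments by position.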
4.1.3 Titular segment
This part contains a reference to the conversion system authority or authorities by means of identifiers,
the list of which is maintained by ISO 24229/RA (see A.1). If an authority cannot be identified but the
conversion system has a national character and/or is used by the government, the 2-letter country code
from ISO 3166-1 should be used as the conversion system authority. If no conversion system authority
can be identified or its identification is not relevant, “Var” (varia) is used as the titular segment. See
Clause 5 for more details.
4.1.4 Source spelling system segment
Except as specified in 4.6, a script code is a mandatory element. Language-specific spelling systems also
have language codes. In order to cover more specific needs, the following four elements in the order
given shall be used:
— language code (3-letter code from ISO 639-2 or ISO 639-3 with preference to terminological codes.
If a synonym is used from ISO 639-2, the ISO 639-2/T associated code should be used. ISO 639-2/T
codes are intended to be used for terminology applications.);
— script code (4-letter code from ISO 15924);
— country code (2-letter code from ISO 3166-1);
— spelling system extension (an ad hoc string to refer to a non-default spelling system of a language,
such as old orthography).
EXAMPLE 1 ind-Latn-pre1972 (Indonesian language using the pre-1972 orthography).
EXAMPLE 2 bos-Arab (Bosnian language using Arabic script).
EXAMPLE 3 uzb-Arab-AF (Uzbek language as used in Afghanistan).
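The element order above lends itself to a simple positional reading. The following Python sketch is a non-normative illustration only: the names `SpellingSystem` and `parse_spelling_system` are hypothetical, and the rule of recognizing elements by their length (3, 4 and 2 letters) is an assumption of this sketch, not a rule stated in this document. It would misread an extension that happens to be a 2-, 3- or 4-letter word in the position of a missing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpellingSystem:
    language: Optional[str] = None   # 3-letter ISO 639 code
    script: Optional[str] = None     # 4-letter ISO 15924 code
    country: Optional[str] = None    # 2-letter ISO 3166-1 code
    extension: Optional[str] = None  # ad hoc string, e.g. "pre1972"


def parse_spelling_system(segment: str) -> SpellingSystem:
    """Assign hyphen-separated elements to slots by length, in the order given."""
    parts = segment.split("-")
    result = SpellingSystem()
    # (attribute name, expected length), in the order required above
    for name, length in (("language", 3), ("script", 4), ("country", 2)):
        if parts and len(parts[0]) == length and parts[0].isascii() and parts[0].isalpha():
            setattr(result, name, parts.pop(0))
    if parts:
        # Whatever remains is treated as the spelling system extension.
        result.extension = "-".join(parts)
    return result


print(parse_spelling_system("ind-Latn-pre1972"))
print(parse_spelling_system("uzb-Arab-AF"))
```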
4.1.5 Target spelling system segment
This part may have the same four elements as listed in 4.1.4.
4.1.6 Identifying segment
This part serves to distinguish conversion systems that otherwise have the same scope, e.g. by
version or year of issue. It may also contain elements necessary for the recognition of the system
itself if the system has some kind of identification element. The following elements may occur (in the
order given):
— identifying numbers, letters or other designations (such as a standard number, e.g. 843);
— version number (e.g. v6, v4-1);
— year of adoption;
— year of issue;
— method identifier (if a standard devises more than one method of conversion, this optional ad hoc
identifier can be used for distinction).
If no element can be used for this part, “na” (not applicable) is used as the substitute.
EXAMPLE 2017 is the identifying segment of the system coded as UN:ara-Arab:Latn:2017.
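For illustration only, an application could read the identifying segment as the last colon-separated segment and treat the “na” placeholder as absent. The function name `identifying_segment` and the code `Var:ara-Arab:Latn:na` used below are hypothetical and do not refer to a registered code.

```python
from typing import Optional


def identifying_segment(code: str) -> Optional[str]:
    """Return the identifying segment, or None for the "na" placeholder."""
    segment = code.split(":")[-1]
    # Capitalization has no distinctive meaning (see 4.5), so "NA" == "na".
    return None if segment.lower() == "na" else segment


print(identifying_segment("UN:ara-Arab:Latn:2017"))
print(identifying_segment("Var:ara-Arab:Latn:na"))
```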
4.2 Requirements for new conversion system codes
Additions to the list of conversion system codes shall be made upon the request of a member of
ISO 24229/AG (see A.2) or of the conversion system authority that manages the system, on the basis
of the information provided with the request.
The ISO 24229/AG decides upon the addition, on the basis of the justification given for the actual
requirements for international interchange. Code elements will be allocated accordingly.
A written language conversion system is eligible for a conversion system code assignment if it fulfils
one of the following criteria.
— The system has been approved for official use at some level of government.
— The system has been developed and used by educational/scientific institutions, published in a peer
reviewed scientific publication.
— The system has been in substantial usage.
Assigning of a conversion system code also requires demonstration of one of the following usage factors:
— necessity of identification of the system in interchange.
— necessity of identification of the system in data encoding.
Systems that are used in isolation or only for temporary usage do not need to have assigned codes.
4.3 Deprecation of conversion system codes
Deprecation of conversion system codes shall be made upon request of a member of ISO 24229/AG or
the conversion system authority that manages the system.
The ISO 24229/AG will decide upon the marking of deprecation, on the basis of the information received.
The corresponding code is reserved for backwards-compatibility.
NOTE Deprecation only applies to the code representation of the written language conversion system, and
not to the system itself. For example, deprecation can be necessary when the authority is renamed.
4.4 User assigned conversion system codes
If users need codes to represent conversion systems not included in the conversion system registry, the
code prefix of zz can be used, which shall be placed at the beginning of the conversion system code, in
the titular segment, and followed by a “HYPHEN-MINUS” character (“-”, Unicode U+002D).
NOTE Users are advised that the above series of codes is not universally used; such code elements are not
necessarily compatible between different entities.
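A receiving application may want to recognize user assigned codes so that it does not look them up in the registry. The following Python sketch shows one way to do so; it is illustrative only, the function name `is_user_assigned` is hypothetical, and `zz-Example:ara-Arab:Latn:2020` is an invented code.

```python
def is_user_assigned(code: str) -> bool:
    """True if the titular segment starts with the user-assigned prefix "zz-"."""
    titular = code.split(":", 1)[0]
    # Capitalization is not distinctive (see 4.5), so "ZZ-" is accepted too.
    return titular.lower().startswith("zz-")


print(is_user_assigned("zz-Example:ara-Arab:Latn:2020"))
print(is_user_assigned("UN:ara-Arab:Latn:2017"))
```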
4.5 Capitalization of conversion system codes
Conversion system codes will use capitalization according to the relevant standards but this does not
have any distinctive meaning. For example, an all lower case code will be an equally valid code.
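Since capitalization is not distinctive, an application that stores codes would typically compare them case-insensitively. The following Python sketch shows one possible approach under that reading; `canonical_key`, `REGISTRY` and `lookup` are hypothetical names, and the one-entry table stands in for whatever store an application actually uses.

```python
def canonical_key(code: str) -> str:
    """Key under which differently capitalized forms of one code coincide."""
    # Codes are restricted to ASCII, so lower() is sufficient here.
    return code.lower()


# Hypothetical in-memory table keyed by the canonical form.
REGISTRY = {
    canonical_key("UN:ara-Arab:Latn:2017"): "UN romanization of Arabic, approved 2017",
}


def lookup(code: str):
    """Return the stored description, or None if the code is unknown."""
    return REGISTRY.get(canonical_key(code))


print(lookup("un:ara-arab:latn:2017"))
```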
4.6 Abbreviated conversion system codes
In case of user demand, abbreviated conversion system codes may additionally be registered. In such
codes, the script code is omitted from a language-specific spelling system if that script can be
considered the default script for the language concerned. Examples are given in 4.7. Sources such as
the Common Locale Data Repository (CLDR) of the Unicode Consortium should be consulted when
determining default scripts for languages.
4.7 Examples of conversion system codes
The examples given here are only indicative and do not guarantee that such codes will be actually
registered.
EXAMPLE 1 UN:ara-Arab:Latn:2017 (possible abbreviation — UN:ara:Latn:2017; United Nations system
for the romanization of Arabic, approved 2017)
EXAMPLE 2 UN:mon-Mong-CN:Latn:1977 (possible abbreviation — UN:mon-CN:Latn:1977; United Nations
system for the romanization of Mongolian in China, approved 1977)
EXAMPLE 3 BGN-PCGN:zho-Hans:Latn:1979 (BGN/PCGN 1979 Agreement — Romanization of Chinese)
EXAMPLE 4 ALA-LC:mal-Mlym:Latn:2012 (possible abbreviation — ALA-LC:mal:Latn:2012; ALA-LC
romanization system that transliterates the Malayalam language from Malayalam script characters into Latin script)
EXAMPLE 5 ISO:Cyrl:Latn:9-1995 (ISO 9:1995 for the transliteration into Latin of Cyrillic characters)
EXAMPLE 6 ICAO:Arab:Latn:2015 (ICAO rules for rendering Arabic-script names in Latin letters, issued in
2015)
EXAMPLE 7 DIN:bel-Cyrl:Latn:1460-1982 (possible abbreviation — DIN:bel:Latn:1460-1982;
DIN 1460 for the transliteration of Belarusian into Latin)
EXAMPLE 8 ESKT:udm-Cyrl:est-Latn:2021 (possible abbreviation — ESKT:udm:est:2021; Estonian
Language Committee’s rules for rendering Udmurt names in Estonian texts, approved 2021)
EXAMPLE 9 LV:eng-Latn:lav-Latn:2006 (possible abbreviation — LV:eng:lav:2006; official instructions
in Latvia on rendering English proper names in Latvian, issued in 2006)
Target spelling systems can also be language-specific. Example 8 denotes a system to represent Udmurt
names in Estonian texts using the Estonian alphabet, not Latin as a whole.
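The abbreviations shown in the examples can be derived mechanically once default scripts are known. The following Python sketch is a non-normative illustration: `DEFAULT_SCRIPTS` is a small hypothetical table filled in by hand for this sketch (a real application would derive it from a source such as CLDR), and the function names are invented. Region-dependent defaults, as in Example 2, are not handled.

```python
# Hypothetical default-script table, for illustration only.
DEFAULT_SCRIPTS = {"ara": "Arab", "mal": "Mlym", "udm": "Cyrl", "est": "Latn"}


def _abbreviate_segment(segment: str) -> str:
    """Drop the script code if it is the default script of the language."""
    parts = segment.split("-")
    # Only language-specific spelling systems (language, then script) qualify.
    if len(parts) >= 2 and DEFAULT_SCRIPTS.get(parts[0].lower()) == parts[1].capitalize():
        del parts[1]
    return "-".join(parts)


def abbreviate(code: str) -> str:
    """Abbreviate the source and target segments of a four-segment code."""
    titular, source, target, identifying = code.split(":")
    return ":".join(
        [titular, _abbreviate_segment(source), _abbreviate_segment(target), identifying]
    )


print(abbreviate("UN:ara-Arab:Latn:2017"))
print(abbreviate("ESKT:udm-Cyrl:est-Latn:2021"))
```

Codes whose source segment carries only a script code, such as ICAO:Arab:Latn:2015, are returned unchanged because there is no language for which a default could be looked up.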
5 Conversion system authority
5.1 General
A conversion system authority is a competent authority that creates, publishes and/or manages written
language conversion systems.
Authorities that are no longer competent will depend on ISO 24229/AG for managing codes, which will
be considered on a case-by-case basis.
5.2 Requirements
5.2.1 General
A conversion system authority should:
a) have at least one written language conversion system eligible for a conversion system code;
b) be competent in managing its written language conversion systems (5.2.4).
5.2.2 Inactive authorities
If a conversion system authority does not meet the requirement outlined in 5.2.1 b), it is considered
“inactive”.
5.2.3 Varia authorities
The “Varia systems” (Var) conversion system authority is managed by ISO 24229/AG to represent
written language conversion systems that:
a) have a need to be represented as determined by ISO 24229/AG;
b) yet do not have a clear extant authority.
5.2.4 Competency
A competent conversion system authority is a recognized institution that has standardized processes
surrounding the management of the written language conversion systems, covering the following
processes:
a) planning of written language conversion systems, including the process of designing and defining
written language conversion systems; and
b) performing well-planned changes to written language conversion systems.
It is recommended for a competent conversion system authority to also establish standardized
processes for the following:
a) public announcement and dissemination of its written language conversion systems; and
b) a public review period, prior to enactment, for people affected by written language conversion
systems under its management.
5.3 Registration
The ISO 24229/AG is tasked with managing a list of conversion system authorities.
5.4 Conversion system authority identifiers
5.4.1 Principles for construction of identifiers
5.4.1.1 Relationship with names
The principle behind the alphabetic identifiers for conversion system authorities is a visual association
between the conversion system authorities’ names and their corresponding identifiers.
In applying this principle, the identifiers will be generally assigned on the basis of the abbreviated
names of the conversion system authorities, thus avoiding, wherever possible, any reflection of their
political status.
5.4.1.2 Construction of the alphabetic identifier
The following rules shall be adhered to for the construction of the alphabetic identifier.
— The maximum length of the identifier shall be 16 characters.
— The identifier shall consist of elements from the following Unicode ranges:
— DIGIT ZERO through DIGIT NINE (U+0030 — U+0039)
— LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z (U+0041 — U+005A)
— LATIN SMALL LETTER A through LATIN SMALL LETTER Z (U+0061 — U+007A)
— The identifier elements shall be separated by a single HYPHEN-MINUS (U+002D).
— The minimal length of the identifier is 3 characters to encourage the creation of descriptive and
...
