Information and documentation — Codes for written language conversion systems

This document provides principles for establishing codes for the representation of written language conversion systems. The codes are devised for usage in any application requiring the expression of written language conversion systems, including transliteration and romanization systems, in coded form.

Information et documentation — Codes pour les systèmes de conversion des langues écrites

General Information

Status: Published
Publication Date: 07-Nov-2022

ICS: 01.140.10 - Writing and transliteration
ICS: 01.140.20 - Information sciences

Technical Committee: ISO/TC 46 - Information and documentation
Drafting Committee: ISO/TC 46/WG 3 - Conversion of written languages

Current Stage: 6060 - International Standard published
Start Date: 08-Nov-2022
Due Date: 21-Jun-2022
Completion Date: 08-Nov-2022

Relations

Consolidated By: ISO 80000-8:2020/Amd 1:2025 - Quantities and units — Part 8: Acoustics — Amendment 1
Effective Date: 10-Jun-2023

Overview

ISO 24229:2022 - "Information and documentation - Codes for written language conversion systems" defines principles and a registry model for assigning codes that identify written language conversion systems (including transliteration, transcription and romanization). The standard specifies how to construct machine-readable conversion-system identifiers and how to maintain a registry of authorities, codes and attributes to support consistent exchange of conversion metadata across applications.

Key Topics and Requirements

Code structure: A conversion system code consists of four segments - titular, source spelling system, target spelling system, and identifying segment. Segments are separated by a single colon (:) and elements within segments by a single hyphen (-).
Character rules: Code elements use limited Unicode ranges: digits 0–9 and Latin letters (A–Z, a–z). Other characters should be omitted or substituted; a hyphen within an element is accepted.
Titular segment / authority: The titular segment references the conversion system authority (ISO 24229/RA maintains the authority list). If no authority is identifiable, a country code (ISO 3166-1) or “Var” (varia) is used.
Governance: Procedures cover registration, requirements for new codes, deprecation, user-assigned codes, capitalization rules and abbreviated forms.
Data model & attributes: A common data model defines attributes for authorities and conversion systems. The model reuses existing ISO code elements (ISO 15924 script codes, ISO 639 language codes, ISO 3166 country codes, ISO 8601 dates) to ensure interoperable metadata.
Authority identifiers: Principles for constructing authority identifiers and competency criteria for authorities are specified to ensure reliable registry administration.

Applications and Who Uses It

ISO 24229:2022 is practical for any system that must express which written-language conversion method is applied:

Libraries, archives and bibliographic services for cataloguing names and titles across scripts.
Lexicographers and terminologists documenting transliteration standards.
Publishers and editors needing consistent romanization/transliteration metadata.
NLP, machine translation and speech systems for reverse transliteration, machine pronunciation and conversion workflows.
Data exchange & metadata standards (catalogs, authority files, bibliographic records, APIs) that require unambiguous, coded identification of conversion rules.

Using standardized conversion-system codes improves interoperability, reproducibility of conversions, and automated processing across multilingual information systems.

Related Standards

ISO 15924 - Codes for the representation of names of scripts
ISO 639-2 / ISO 639-3 / ISO 639-5 - Language codes
ISO 3166-1 - Country codes
ISO 8601 - Date/time representations
ISO 5127 - Information and documentation - Foundation and vocabulary

Keywords: ISO 24229:2022, written language conversion, conversion system code, transliteration, romanization, transcription, script conversion, ISO 15924, ISO 639, registry, metadata.

English language (17 pages)

Frequently Asked Questions

What is ISO 24229:2022?

ISO 24229:2022 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information and documentation — Codes for written language conversion systems".

What is the scope of ISO 24229:2022?

This document provides principles for establishing codes for the representation of written language conversion systems. The codes are devised for usage in any application requiring the expression of written language conversion systems, including transliteration and romanization systems, in coded form.

What ICS categories does ISO 24229:2022 belong to?

ISO 24229:2022 is classified under the following ICS (International Classification for Standards) categories: 01.140.10 - Writing and transliteration; 01.140.20 - Information sciences. The ICS classification helps identify the subject area and facilitates finding related standards.

What standards are related to ISO 24229:2022?

ISO 24229:2022 has the following relationship with other standards: it has an inter-standard link to ISO 80000-8:2020/Amd 1:2025. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)

INTERNATIONAL STANDARD ISO 24229
First edition 2022-11
Information and documentation —
Codes for written language conversion
systems
Information et documentation — Codes pour les systèmes de
conversion des langues écrites
Reference number
© ISO 2022
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Conversion system codes
4.1 Structure of conversion system codes
4.1.1 General
4.1.2 Construction of the conversion system code
4.1.3 Titular segment
4.1.4 Source spelling system segment
4.1.5 Target spelling system segment
4.1.6 Identifying segment
4.2 Requirements for new conversion system codes
4.3 Deprecation of conversion system codes
4.4 User assigned conversion system codes
4.5 Capitalization of conversion system codes
4.6 Abbreviated conversion system codes
4.7 Examples of conversion system codes
5 Conversion system authority
5.1 General
5.2 Requirements
5.2.1 General
5.2.2 Inactive authorities
5.2.3 Varia authorities
5.2.4 Competency
5.3 Registration
5.4 Conversion system authority identifiers
5.4.1 Principles for construction of identifiers
5.4.2 Examples of conversion system authority identifiers
6 Data model and attributes
6.1 Common data model and attributes
6.1.1 General
6.1.2 Data models
6.1.3 Usage of ISO 15924 code elements
6.1.4 Usage of ISO 639 code elements
6.1.5 Usage of ISO 3166 code elements
6.1.6 Usage of ISO 8601 expressions
6.2 System authority data model and attributes
6.2.1 Diagram
6.2.2 Conversion system authority
6.2.3 Authority identifier
6.3 Conversion system data model and attributes
6.3.1 Diagram
6.3.2 Written language conversion system
6.3.3 Spelling system
6.3.4 Conversion system relation
6.3.5 Conversion system code status
6.3.6 Conversion system status
6.3.7 Conversion system relation type
Annex A (normative) Registration authority
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 46, Information and documentation.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
A number of international applications require the identification of written language conversion
systems, including for terminology, lexicography, bibliography, and linguistics, especially for reverse
transliteration, computational linguistics and machine pronunciation.
This document sets out the necessary procedures to maintain the registry of written language
conversion systems.
The chosen term “written language conversion” is intended to refer to all types of conversions, i.e.
transformations of written texts from one spelling system to another. It thus includes both script
conversion (change of script: transliteration, transcription) and conversion of texts without changing
the script (e.g. transcription of foreign names or words using the alphabet of a target language, change
of the orthography in a language, etc.). For the sake of compactness of expression, “written language
conversion” has been shortened to “conversion” in this document where it does not cause ambiguity.
INTERNATIONAL STANDARD ISO 24229:2022(E)
Information and documentation — Codes for written
language conversion systems
1 Scope
This document provides principles for establishing codes for the representation of written language
conversion systems.
The codes are devised for usage in any application requiring the expression of written language
conversion systems, including transliteration and romanization systems, in coded form.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 639-2, Codes for the representation of names of languages — Part 2: Alpha-3 code
ISO 639-3, Codes for the representation of names of languages — Part 3: Alpha-3 code for comprehensive
coverage of languages
ISO 639-5, Codes for the representation of names of languages — Part 5: Alpha-3 code for language families
and groups
ISO 3166-1, Codes for the representation of names of countries and their subdivisions — Part 1: Country
code
ISO 5127, Information and documentation — Foundation and vocabulary
ISO 8601 (all parts), Date and time — Representations for information interchange
ISO 15924, Information and documentation — Codes for the representation of names of scripts
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 5127 and the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org
3.1
script
particular graphic representation or class of representations of a set of characters used to write one or
more languages
[SOURCE: ISO 5127:2017, 3.1.6.02]
3.2
spelling system
set of rules governing the orthography of a language
Note 1 to entry: Typically, a spelling system defines how the spoken form of a language is represented in writing.
Several languages have undergone orthographic reforms which means they have had different spelling systems.
3.3
natural language
language which is or was in active use in a community of people, and the rules of which are mainly
deduced from the usage
[SOURCE: ISO 5127:2017, 3.1.5.02]
3.4
character
member of a set of elements that is used for the representation, organization, or control of data
[SOURCE: ISO 5127:2017, 3.1.4.02]
3.5
written language
natural language (3.3) realized through the writing of characters (3.4)
[SOURCE: ISO 5127:2017, 3.1.5.04]
3.6
written language conversion
process whereby one spelling system (3.2) is converted into another spelling system
Note 1 to entry: This is a general term that includes script conversion but also, e.g. cases when a language changes
its orthography without changing the script.
3.7
transliteration
process which consists of representing the characters of an alphabetical or syllabic system of writing
by the characters of a conversion alphabet
3.8
transcription
process whereby the sounds of a given language are noted by the system of signs of a conversion
language
3.9
romanization
script conversion from non-Roman to Roman script (3.1) by means of transliteration (3.7), transcription
(3.8) or both
[SOURCE: ISO 5127:2017, 3.1.6.14]
3.10
written language conversion system
set of rules for written language conversion (3.6)
3.11
language code
combination of characters used to represent the name of a language or languages
[SOURCE: ISO 5127:2017, 3.2.5.14]
3.12
script code
combination of characters used to represent the name of a script (3.1)
[SOURCE: ISO 15924:2004, 3.8]
3.13
conversion system code
combination of characters used in a structured way to represent a written language conversion system
(3.10)
4 Conversion system codes
4.1 Structure of conversion system codes
4.1.1 General
A conversion system code shall consist of four segments:
— titular segment;
— source spelling system segment;
— target spelling system segment;
— identifying segment.
Each segment shall consist of one or more elements.
4.1.2 Construction of the conversion system code
The following rules are to be adhered to for the construction of a conversion system code:
— The codes shall consist of elements from the following Unicode ranges:
    — DIGIT ZERO through DIGIT NINE (U+0030 — U+0039)
    — LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z (U+0041 — U+005A)
    — LATIN SMALL LETTER A through LATIN SMALL LETTER Z (U+0061 — U+007A)
— Segments shall be separated by a single “COLON” (“:”, Unicode U+003A).
— Elements within a segment shall be separated by a single “HYPHEN-MINUS” (“-”, Unicode U+002D).
— “HYPHEN-MINUS” (“-”, Unicode U+002D) within an element (e.g. 233-3) will also be accepted.
— Other characters in the elements not covered by the above should be omitted or substituted.
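The construction rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the standard: the names `CODE_RE` and `split_code` are hypothetical, and the sketch treats every hyphen as an element separator, because the syntax alone cannot distinguish a separator from a hyphen that belongs inside an element (such as 233-3).

```python
import re

# One element: digits 0-9 and Latin letters A-Z, a-z only (4.1.2).
ELEMENT = r"[0-9A-Za-z]+"
# One segment: one or more elements joined by single hyphens.
SEGMENT = rf"{ELEMENT}(?:-{ELEMENT})*"
# A full code: exactly four segments joined by single colons (4.1.1).
CODE_RE = re.compile(rf"{SEGMENT}(?::{SEGMENT}){{3}}")


def split_code(code: str) -> list[list[str]]:
    """Return the four segments of a code, each as a list of elements.

    Assumption of this sketch: every hyphen is read as an element
    separator, so an element such as "233-3" comes back as two pieces.
    """
    if not CODE_RE.fullmatch(code):
        raise ValueError(f"not a well-formed conversion system code: {code!r}")
    return [segment.split("-") for segment in code.split(":")]


if __name__ == "__main__":
    # Code taken from the example given in 4.1.6.
    print(split_code("UN:ara-Arab:Latn:2017"))
```

A stricter implementation would consult the registry to decide which hyphens belong to an element; this sketch validates only the character set and the separators.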
4.1.3 Titular segment
This part will contain a reference to the conversion system authority or authorities by using identifiers,
the list of which is maintained by ISO 24229/RA (see A.1). If an authority cannot be identified but the
conversion system has a national character and/or is used by the government, the 2-letter country code
from ISO 3166-1 should be used as the conversion system authority. If no conversion system authorities
can be identified or its identification is not relevant, “Var” (varia) is used as the titular segment. See
Clause 5 for more details.
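The fallback order described in 4.1.3 can be sketched as a small function. This is an illustration only: the function name and parameters are hypothetical, and joining several authority identifiers with a hyphen is an assumption derived from the element separator rule in 4.1.2, not something this clause states.

```python
from typing import Optional


def titular_segment(authority_ids: list[str],
                    country_code: Optional[str] = None) -> str:
    """Pick the titular segment following the fallback order in 4.1.3."""
    if authority_ids:
        # Assumption: multiple authorities are joined as hyphen-separated elements.
        return "-".join(authority_ids)
    if country_code:
        # National or governmental system without an identifiable authority:
        # use the 2-letter ISO 3166-1 country code.
        return country_code
    # No authority identifiable, or its identification is not relevant.
    return "Var"
```

For instance, `titular_segment(["UN"])` gives `"UN"`, while `titular_segment([])` falls back to `"Var"`.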
4.1.4 Source spelling system segment
Except as specified in 4.6, a script code is a mandatory element. Language-specific spelling systems also
have language codes. In order to cover more specific needs, the following four elements in the order
given shall be used:
— language code (3-letter code from ISO 639-2 or ISO 639-3 with preference to terminological codes.
If a synonym is used from ISO 639-2, the ISO 639-2/T associated code should be used. ISO 639-2/T
codes are intended to be used for terminology applications.);
— script code (4-letter code from ISO 15924);
— country code (2-letter code from ISO 3166-1);
— spelling system extension (an ad hoc string to refer to a non-default spelling system of a language,
such as old orthography).
EXAMPLE 1 ind-Latn-pre1972 (Indonesian language using the pre-1972 orthography).
EXAMPLE 2 bos-Arab (Bosnian language using Arabic script).
EXAMPLE 3 uzb-Arab-AF (Uzbek language as used in Afghanistan).
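The three examples above can be decomposed into their elements with a short parser. The sketch below is a hedged illustration: `SpellingSystem` and `parse_spelling_system` are hypothetical names, and recognising elements by their length (3 letters for a language code, 4 for a script code, 2 for a country code) is a heuristic assumed here, not a rule of the standard. An extension that happens to be two or three letters long would be misclassified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpellingSystem:
    language: Optional[str] = None   # 3-letter ISO 639-2 / ISO 639-3 code
    script: Optional[str] = None     # 4-letter ISO 15924 code
    country: Optional[str] = None    # 2-letter ISO 3166-1 code
    extension: Optional[str] = None  # ad hoc spelling system extension


def parse_spelling_system(segment: str) -> SpellingSystem:
    """Split a spelling system segment into the elements listed in 4.1.4."""
    elements = segment.split("-")
    result = SpellingSystem()
    # Fixed-length slots, tried in the order the clause prescribes.
    for name, length in (("language", 3), ("script", 4), ("country", 2)):
        if elements and len(elements[0]) == length and elements[0].isalpha():
            setattr(result, name, elements.pop(0))
    # Whatever remains is treated as the spelling system extension.
    if elements:
        result.extension = "-".join(elements)
    return result


if __name__ == "__main__":
    for example in ("ind-Latn-pre1972", "bos-Arab", "uzb-Arab-AF"):
        print(example, "->", parse_spelling_system(example))
```

A production parser would validate each element against the ISO 639, ISO 15924 and ISO 3166-1 code lists instead of relying on length.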
4.1.5 Target spelling system segment
This part may have the same four elements as listed in 4.1.4.
4.1.6 Identifying segment
This part will serve to distinguish by version, year of issue, etc. conversion systems that otherwise
have the same scope. It may also contain elements necessary for the recognition of the system itself if
the system has some kind of identification element. All in all, the following elements may occur (in the
order given):
— identifying numbers, letters or else (such as standard number, e.g. 843);
— version number (e.g. v6, v4-1);
— year of adoption;
— year of issue;
— method identifier (if a standard devises more than one method of conversion, this optional ad hoc
identifier can be used for distinction).
If there are cases when no elements can be used for this part, “na” (not applicable) will be the substitute.
EXAMPLE 2017 is the identifying segment of the system coded as UN:ara-Arab:Latn:2017.
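The ordering of elements and the “na” fallback can be sketched as follows. The function and parameter names are hypothetical, and joining the elements with a hyphen is an assumption based on the separator rule in 4.1.2.

```python
from typing import Optional, Union

Element = Optional[Union[str, int]]


def identifying_segment(identifier: Element = None,
                        version: Element = None,
                        year_of_adoption: Element = None,
                        year_of_issue: Element = None,
                        method: Element = None) -> str:
    """Build the identifying segment with elements in the order given in 4.1.6."""
    ordered = [identifier, version, year_of_adoption, year_of_issue, method]
    # Keep only the elements that were supplied, preserving the order.
    elements = [str(value) for value in ordered if value is not None]
    # "na" (not applicable) stands in when no element can be used.
    return "-".join(elements) if elements else "na"
```

Called with no arguments the function returns `"na"`; called with only a year, for example `identifying_segment(year_of_issue=2017)`, it returns `"2017"`.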
4.2 Requirements for new conversion system codes
Additions to the list of conversion system codes shall be made upon the request of a member of
ISO 24229/AG (see A.2) or the conversion system authority that manages this system.
The ISO 24229/AG decides upon the addition, on the basis of the justification given for the actual
requirements for international interchange. Code elements will be allocated accordingly.
A written language conversion system is eligible for a conversion system code assignment if it fulfils
one of the following criteria.
— The system has been approved for official use at some level of government.
— The system has been developed and used by educational/scientific institutions, published in a peer
reviewed scientific publication.
— The system has been in substantial usage.
Assigning of a conversion system code also requires demonstration of one of the following usage factors:
— necessity of identification of the system in interchange.
— necessity of identification of the system in data encoding.
Systems that are used in isolation or only for temporary usage do not need to have assigned codes.
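The eligibility test in 4.2 combines two groups of conditions, which a registry tool might pre-screen before a request reaches ISO 24229/AG. The sketch below is an assumption-laden illustration: the class and field names are hypothetical, and “one of the following” is read as “at least one”. The actual decision rests with ISO 24229/AG.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Eligibility criteria (at least one expected).
    official_government_use: bool = False
    academic_peer_reviewed: bool = False
    substantial_usage: bool = False
    # Usage factors (at least one expected).
    needed_in_interchange: bool = False
    needed_in_data_encoding: bool = False


def passes_prescreen(candidate: Candidate) -> bool:
    """Return True if one criterion and one usage factor are both met."""
    meets_criterion = (candidate.official_government_use
                       or candidate.academic_peer_reviewed
                       or candidate.substantial_usage)
    shows_usage_factor = (candidate.needed_in_interchange
                          or candidate.needed_in_data_encoding)
    return meets_criterion and shows_usage_factor
```

A system used only in isolation would set neither usage factor and therefore fail the pre-screen, matching the last sentence of the clause.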
4.3 Deprecation of conversion system codes
Deprecation of conversion system codes shall be made upon request of a member of ISO 24229/AG or
the conversion system authority that manages the sy
...
