ISO/IEC TR 30112:2014
Information technology - Specification methods for cultural conventions
ISO/IEC TR 30112:2014 specifies description formats and functionality for the specification of cultural conventions, description formats for character sets, and description formats for binding character names to ISO/IEC 10646, plus a set of default values for some of these items. The specification is upward compatible with POSIX locale specifications: a locale conformant to POSIX specifications will also be conformant to specifications in this Technical Report, while the reverse does not hold. Some of the descriptions are intended to be coded in text files to be used via Application Programming Interfaces that are expected to be developed for a number of systems that comply with ISO/IEC 9945. An effort has been undertaken to align this specification with ISO/IEC 9945.
Technologies de l'information — Méthodes de spécification des conventions culturelles
ISO/IEC TR 30112:2014 is a technical report published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology - Specification methods for cultural conventions".
ISO/IEC TR 30112:2014 is classified under the following ICS (International Classification for Standards) categories: 35.240.20 - IT applications in office work. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC TR 30112:2014 is linked to ISO/IEC 30112:2020. Understanding this relationship helps ensure you are using the most current and applicable version of the standard.
TECHNICAL REPORT ISO/IEC TR
First edition
Information technology —
Specification methods for cultural
conventions
Technologies de l’information — Méthodes de spécification des
conventions culturelles
Reference number
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
© ISO/IEC 2014 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, and notations
3.1 Terms and definitions
3.2 Notations
4 FDCC-set
4.1 FDCC-set description
4.2 LC_IDENTIFICATION
4.3 LC_CTYPE
4.4 LC_COLLATE
4.5 LC_MONETARY
4.6 LC_NUMERIC
4.7 LC_TIME
4.8 LC_MESSAGES
4.9 LC_XLITERATE
4.10 LC_NAME
4.11 LC_ADDRESS
4.12 LC_TELEPHONE
4.13 LC_PAPER
4.14 LC_MEASUREMENT
4.15 LC_KEYBOARD
5 CHARMAP
5.1 Character Set Description Text
5.2 WIDTH section
6 REPERTOIREMAP
Annex A (informative) Differences from the ISO/IEC 9945 standard
Annex B (informative) Rationale
Annex C (informative) BNF Grammar
Annex D (informative) Relation to taxonomy
Annex E (informative) Implementation in glibc
Annex F (informative) Index
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Details of any patent rights identified during the development of the document will be in the Introduction
and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers
to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 35, User
interfaces.
Introduction
This Technical Report defines general mechanisms to specify cultural conventions, and it defines formats
for a number of specific cultural conventions in the areas of character classification and conversion,
sorting, number formatting, monetary formatting, date formatting, message display, addressing of
persons, postal address formatting, and telephone number handling.
This Technical Report offers a number of benefits:
Rigid specification: Using this Technical Report, a user can rigidly specify a number of the cultural
conventions that apply to the information technology environment of the user.
Cultural adaptability: If an application has been designed and built in a culturally neutral manner,
the application may use the specifications as data to its APIs, and thus the same application may
accommodate different users in a way that is culturally acceptable to each of them, without any change
to the application binary.
Productivity: This Technical Report specifies cultural conventions and how to specify data for them.
With that data, an application developer is relieved of gathering the information needed to support all
the cultural environments of the product’s expected customers. The application developer is thus
assured of culturally correct behaviour as specified by the customer, and possibly more markets may
be reached, as customers may be able to provide the data themselves for markets that were
not targeted.
Uniform behaviour: When a number of applications share one cultural specification, which may be
supplied by the user or provided by the application or operating system, their behaviour for cultural
adaptation becomes uniform.
The specification formats are independent of platforms and specific encodings, and are intended to be
usable from a wide range of programming languages.
A number of cultural conventions, such as spelling, hyphenation rules and terminology, are not specifiable
with this Technical Report, but the Technical Report provides mechanisms to define new categories and
also new keywords within existing categories. An internationalized application may take advantage
of information provided with the FDCC-set (such as the language) to provide further internationalized
services to the user.
This Technical Report defines a format compatible with the one used in the International string ordering
standard, ISO/IEC 14651. This Technical Report is upward compatible with parts of the ISO/IEC 9945
POSIX standard, especially those on POSIX locales and charmaps. The major extensions to that text
are listed in Annex A. This Technical Report has enhanced functionality in a number of areas such as
ISO/IEC 10646 support, more classification of characters, transliteration, dual (multi) currency support,
enhanced date and time formatting, personal name writing, postal address formatting, telephone
number handling, keyboard handling, and management of categories. There is enhanced support for
character sets including ISO/IEC 2022 handling and an enhanced method to separate the specification
of cultural conventions from an actual encoding via a description of the character repertoire employed.
A standard set of values for all the categories has been defined covering the repertoire of ISO/IEC 10646.
TECHNICAL REPORT ISO/IEC TR 30112:2014(E)
Information technology — Specification methods for
cultural conventions
1 Scope
This Technical Report specifies description formats and functionality for the specification of cultural
conventions, description formats for character sets, and description formats for binding character
names to ISO/IEC 10646, plus a set of default values for some of these items.
The specification is upward compatible with POSIX locale specifications: a locale conformant to POSIX
specifications will also be conformant to specifications in this Technical Report, while the reverse
does not hold. Some of the descriptions are intended to be coded in text files to be used via
Application Programming Interfaces that are expected to be developed for a number of systems that
comply with ISO/IEC 9945. An effort has been undertaken to align this specification with ISO/IEC 9945.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
This document contains no normative references.
3 Terms and definitions, and notations
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1 Bytes and characters
3.1.1.1
byte
individually addressable unit of data storage that is equal to or larger than an octet, used to store a
character or a portion of a character
Note 1 to entry: A byte is composed of a contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
3.1.1.2
character
member of a set of elements used for the organization, control or representation of data
3.1.1.3
coded character
sequence of one or more bytes representing a single character
3.1.1.4
text file
file that contains characters organized into one or more lines
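The difference between a character and a coded character can be seen by encoding one character under several coded character sets. The following Python sketch is illustrative only (this Technical Report defines no programming interface, and the function name coded_forms is hypothetical); it relies solely on the codecs built into Python to show that the same character is coded as one byte, as two bytes, or not at all, depending on the encoding.

```python
from typing import Dict, List, Optional


def coded_forms(ch: str, encodings: List[str]) -> Dict[str, Optional[bytes]]:
    """Return the coded character (byte sequence) for `ch` in each encoding.

    None marks an encoding whose character set does not contain `ch`.
    """
    forms: Dict[str, Optional[bytes]] = {}
    for enc in encodings:
        try:
            forms[enc] = ch.encode(enc)
        except UnicodeEncodeError:
            forms[enc] = None
    return forms


if __name__ == "__main__":
    for enc, data in coded_forms("ç", ["latin-1", "utf-8", "utf-16-be", "ascii"]).items():
        shown = data.hex(" ") if data is not None else "(not in this character set)"
        print(f"{enc:10} {shown}")
```

For LATIN SMALL LETTER C WITH CEDILLA, the coded character is the single byte e7 in latin-1, the two bytes c3 a7 in UTF-8 and 00 e7 in UTF-16BE, and it does not exist in ASCII. This variation is why FDCC-sets name characters symbolically and leave the actual coding to a charmap.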
3.1.2 Cultural and other major concepts
3.1.2.1
cultural convention
data item for information technology that may vary dependent on language, territory, or other cultural
habits
3.1.2.2
FDCC
Formal Definition of a Cultural Convention
cultural convention put into a formal definition scheme
3.1.2.3
FDCC-set
set of Formal Definitions of Cultural Conventions (FDCCs)
definition of the subset of a user’s information technology environment that depends on language and
cultural conventions
Note 1 to entry: The FDCC-set is a superset of the concept of “locale” in C and POSIX.
3.1.2.4
charmap
definition of a mapping between symbolic character names and character codes, plus related information
3.1.2.5
repertoiremap
definition of a mapping between symbolic character names and characters for the repertoire of
characters used in an FDCC-set
Note 1 to entry: This is further described in clause 6.
3.1.3 Terms related to FDCC categories
3.1.3.1
character class
named set of characters sharing an attribute associated with the name of the class
3.1.3.2
collation
logical ordering of strings according to defined precedence rules
3.1.3.3
collating element
smallest entity used to determine logical ordering
Note 1 to entry: See collating sequence. A collating element consists of either a single character, or two or more
characters collating as a single entity. The LC_COLLATE category in the associated FDCC-set determines the set
of collating elements.
3.1.3.4
multicharacter collating element
sequence of two or more characters that collate as an entity
Note 1 to entry: For example, in some languages two characters are sorted as one letter, as is the case for Danish
and Norwegian “aa”.
3.1.3.5
collating sequence
relative order of collating elements as determined by the setting of the LC_COLLATE category in the
applied FDCC-set
3.1.3.6
equivalence class
set of collating elements with the same primary collation weight
Note 1 to entry: Elements in an equivalence class are typically elements that naturally group together, such
as all accented letters based on the same letter. The collation order of elements within an equivalence class is
determined by the weights assigned on any subsequent levels after the primary weight.
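The terms multicharacter collating element, collating sequence and equivalence class can be illustrated with a toy two-level sort key in Python. The weight tables are invented for this sketch and are not taken from any LC_COLLATE definition. "aa" is treated as one collating element that shares the primary weight of "å", which is placed after "z", "æ" and "ø". "é" and "è" share the primary weight of "e", so they form one equivalence class with "e" and are ordered only by their secondary weights.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified ordering in the spirit of Danish: a..z, then æ, ø, å.
ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
PRIMARY: Dict[str, int] = {ch: i for i, ch in enumerate(ALPHABET)}

# Collating elements beyond the plain letters: element -> (base letter, secondary weight).
# "aa" is a multicharacter collating element with the primary weight of "å";
# "é" and "è" fall in the equivalence class of "e" (same primary weight).
EXTRA: Dict[str, Tuple[str, int]] = {"aa": ("å", 1), "é": ("e", 1), "è": ("e", 2)}
MULTI = sorted((e for e in EXTRA if len(e) > 1), key=len, reverse=True)


def collating_elements(s: str) -> List[str]:
    """Split a lowercase string into collating elements, longest match first."""
    out: List[str] = []
    i = 0
    while i < len(s):
        for m in MULTI:
            if s.startswith(m, i):
                out.append(m)
                i += len(m)
                break
        else:
            out.append(s[i])
            i += 1
    return out


def weights(elem: str) -> Tuple[int, int]:
    """Return (primary, secondary) weights; unknown characters raise KeyError."""
    if elem in EXTRA:
        base, secondary = EXTRA[elem]
        return PRIMARY[base], secondary
    return PRIMARY[elem], 0


def sort_key(s: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Two-level key: all primary weights are compared before any secondary weight."""
    w = [weights(e) for e in collating_elements(s.lower())]
    return tuple(p for p, _ in w), tuple(q for _, q in w)


if __name__ == "__main__":
    print(sorted(["aarhus", "zebra", "abe", "åben", "ære"], key=sort_key))
```

With these assumed weights the list sorts as ["abe", "zebra", "ære", "åben", "aarhus"]. "resume" and "résumé" have identical primary keys, so "resume" precedes "résumé" only because of the secondary weight of "é".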
3.2 Notations
The following notations and common conventions for specifications apply to this Technical Report:
3.2.1 Notation for defining syntax
In this Technical Report, the description of an individual record in an FDCC-set uses the syntax
notation given below.
The syntax notation looks as follows:
“<format>”,[<arg1>,<arg2>,...,<argn>]
The <format> is given in a format string enclosed in double quotes, followed by a number of parameters,
separated by commas. It is similar to the format specification defined in the ISO/IEC 9945 standard and
the format specification used in the C language printf() function. The format of each parameter is given by
an escape sequence as follows:
— %s specifies a string
— %d specifies a decimal integer
— %c specifies a character
— %o specifies an octal integer
— %x specifies a hexadecimal integer
A “ ” (an empty character position) in the format string represents one or more <blank> characters.
All other characters in the format string represent themselves, except:
— %% specifies a single %
— \n specifies an end-of-line
The notation “...” is used to specify that repetition of the previous specification is optional; it is
used in both the format string and the parameter list.
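As a reading aid for this notation, the Python sketch below translates a format string into a regular expression and uses it to match one record. It is a hypothetical helper rather than part of this Technical Report. It assumes that %s matches a run of non-blank characters, maps an empty character position to one or more space or tab characters, and does not implement the “...” repetition notation.

```python
import re
from typing import List, Optional

# Assumed regular expression for each conversion of the 3.2.1 notation.
CONVERSIONS = {
    "s": r"(\S+)",           # %s: a string (here: a run of non-blank characters)
    "d": r"([0-9]+)",        # %d: a decimal integer
    "c": r"(\S)",            # %c: a single character
    "o": r"([0-7]+)",        # %o: an octal integer
    "x": r"([0-9A-Fa-f]+)",  # %x: a hexadecimal integer
}


def format_to_regex(fmt: str) -> "re.Pattern[str]":
    """Translate a syntax-notation format string into a compiled regex."""
    parts: List[str] = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "%":
            spec = fmt[i + 1]  # a lone trailing "%" raises IndexError
            parts.append("%" if spec == "%" else CONVERSIONS[spec])
            i += 2
            continue
        if ch == " ":
            parts.append(r"[ \t]+")  # an empty position: one or more blanks
        elif ch == "\n":
            parts.append("$")        # end-of-line
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def match_record(fmt: str, line: str) -> Optional[List[str]]:
    """Return the parameters of `line` (without its newline), or None if it does not match."""
    m = format_to_regex(fmt).fullmatch(line)
    return list(m.groups()) if m else None
```

For example, match_record("comment_char %c\n", "comment_char %") returns ["%"]. match_record("comment_char %c\n", "comment_char") returns None because the required blank and character are missing.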
3.2.2 Portable character set
A set of symbolic names for characters in Table 1, called the portable character set, is used
in the character description text of this specification. The first eight entries in Table 1 are defined in
ISO/IEC 6429 and the rest are defined in ISO/IEC 9945, with some definitions from ISO/IEC 10646.
Table 1 — Portable character set
Symbolic name Glyph UCS Description
NULL (NUL)
BELL (BEL)
BACKSPACE (BS)
CHARACTER TABULATION (HT)
CARRIAGE RETURN (CR)
LINE FEED (LF)
LINE TABULATION (VT)
FORM FEED (FF)
SPACE
! EXCLAMATION MARK
" QUOTATION MARK
# NUMBER SIGN
$ DOLLAR SIGN
% PERCENT SIGN
& AMPERSAND
' APOSTROPHE
( LEFT PARENTHESIS
) RIGHT PARENTHESIS
* ASTERISK
+ PLUS SIGN
, COMMA
- HYPHEN-MINUS
- HYPHEN-MINUS
. FULL STOP
. FULL STOP
/ SOLIDUS
/ SOLIDUS
0 DIGIT ZERO
1 DIGIT ONE
2 DIGIT TWO
3 DIGIT THREE
4 DIGIT FOUR
5 DIGIT FIVE
6 DIGIT SIX
7 DIGIT SEVEN
8 DIGIT EIGHT
9 DIGIT NINE
: COLON
; SEMICOLON
< LESS-THAN SIGN
= EQUALS SIGN
> GREATER-THAN SIGN
? QUESTION MARK
@ COMMERCIAL AT
A LATIN CAPITAL LETTER A
B LATIN CAPITAL LETTER B
C LATIN CAPITAL LETTER C
D LATIN CAPITAL LETTER D
E LATIN CAPITAL LETTER E
F LATIN CAPITAL LETTER F
G LATIN CAPITAL LETTER G
H LATIN CAPITAL LETTER H
I LATIN CAPITAL LETTER I
J LATIN CAPITAL LETTER J
K LATIN CAPITAL LETTER K
L LATIN CAPITAL LETTER L
M LATIN CAPITAL LETTER M
N LATIN CAPITAL LETTER N
O LATIN CAPITAL LETTER O
p LATIN SMALL LETTER P
q LATIN SMALL LETTER Q
r LATIN SMALL LETTER R
s LATIN SMALL LETTER S
t LATIN SMALL LETTER T
u LATIN SMALL LETTER U
v LATIN SMALL LETTER V
w LATIN SMALL LETTER W
x LATIN SMALL LETTER X
y LATIN SMALL LETTER Y
z LATIN SMALL LETTER Z
{ LEFT CURLY BRACKET
{ LEFT CURLY BRACKET
| VERTICAL LINE
} RIGHT CURLY BRACKET
} RIGHT CURLY BRACKET
~ TILDE
This Technical Report may use symbolic character names other than those above in examples, to
illustrate the range of symbols allowed by the syntax specified in 4.1.1.
4 FDCC-set
An FDCC-set is the definition of the subset of a user’s information technology environment that depends
on language and cultural conventions. An FDCC-set is made up of one or more categories. Each category
is identified by its name and controls specific aspects of the behaviour of components of the system. The
functionality of each category is implied by its description. This Technical Report defines the following
categories:
— LC_IDENTIFICATION Versions and status of categories.
— LC_CTYPE Character classification, case conversion and code transformation.
— LC_COLLATE Collation order.
— LC_TIME Date and time formats.
— LC_NUMERIC Numeric, non-monetary formatting.
— LC_MONETARY Monetary formatting.
— LC_MESSAGES Formats of informative and diagnostic messages and interactive responses.
— LC_XLITERATE Character transliteration.
— LC_NAME Format of writing personal names.
— LC_ADDRESS Format of postal addresses.
— LC_TELEPHONE Format for telephone numbers, and other telephone information.
— LC_PAPER Paper format.
— LC_MEASUREMENT Information on measurement system.
— LC_KEYBOARD Format for identifying keyboards.
NOTE In future editions of this Technical Report further categories may be added.
Other category names beginning with the three characters “LC_” are reserved for future standardization,
except for category names beginning with the five characters “LC_X_”, which will not be used for
categories added to this Technical Report in the future. An application may thus use category names
beginning with “LC_X_” for application-defined categories without clashing with
future standardized categories.
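The naming rule can be expressed as a small classifier. In this Python sketch, which uses a hypothetical function name, the categories listed in this clause are treated as standard. Names beginning with “LC_X_” are treated as available for application-defined categories, and any other name beginning with “LC_” is treated as reserved.

```python
STANDARD_CATEGORIES = frozenset({
    "LC_IDENTIFICATION", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_NUMERIC",
    "LC_MONETARY", "LC_MESSAGES", "LC_XLITERATE", "LC_NAME", "LC_ADDRESS",
    "LC_TELEPHONE", "LC_PAPER", "LC_MEASUREMENT", "LC_KEYBOARD",
})


def classify_category(name: str) -> str:
    """Classify a category name according to the reservation rules of clause 4."""
    if name in STANDARD_CATEGORIES:
        return "standard"
    if name.startswith("LC_X_"):
        return "application-defined"  # never used for future standard categories
    if name.startswith("LC_"):
        return "reserved"             # reserved for future standardization
    return "not a category name"      # category headers begin with "LC_"
```

LC_XLITERATE is a standard category even though it begins with “LC_X”. Only the five-character prefix “LC_X_” is set aside for applications.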
This Technical Report also defines an FDCC-set named “i18n” with values for some of the above categories,
in order to simplify FDCC-set descriptions for a number of cultures. The contents of the “i18n” categories
are not necessarily the most commonly accepted values, although in many cases they may be the
recommended values. The complete “i18n” FDCC-set is defined as the sum of the “i18n”
categories specified in the clauses below. The “i18n” FDCC-set and its parts are released under the GNU
General Public License, version 2, as they are taken from the glibc sources.
4.1 FDCC-set description
FDCC-sets are described with the syntax presented in this subclause. For the purposes of this Technical
Report, the text is referred to as the FDCC-set definition text or FDCC-set source text.
The FDCC-set definition text contains one or more FDCC-set category source definitions, and does not
contain more than one definition for the same FDCC-set category. If the text contains source definitions
for more than one category, application-defined categories, if present, appear after the categories
defined by this clause. A category source definition contains either the definition of a category or a copy
directive. If some of the information for an FDCC-set category, as specified in this Technical
Report, is missing from the FDCC-set source definition, the behaviour of that category, if it is referenced,
is unspecified. An FDCC-set category is the normal way of specifying a single FDCC.
There are no naming conventions for FDCC-sets specified in this Technical Report, but clause 6.8 in
ISO/IEC 15897:1999 specifies naming rules for POSIX locales, charmaps and repertoiremaps, that may
also be applied to FDCC-sets, charmaps and repertoiremaps specified according to this Technical Report.
A category source definition consists of a category header, a category body, and a category trailer. A
category header consists of the character string naming the category, beginning with the characters
“LC_”. The category trailer consists of the string “END”, followed by one or more “blank”s and the string
used in the corresponding category header.
The category body consists of one or more lines of text. Each line is one of the following:
— a line containing an identifier, optionally followed by one or more operands. Identifiers are either
keywords, identifying a particular FDCC, or collating elements, or section symbols,
— one of the transliteration statements defined in 4.3.
In addition to the keywords defined in this Technical Report, the source can contain application-defined
keywords. Each keyword name is unique within its category, although two different categories can
have a keyword with the same name; no keyword starts with the characters “LC_”. Identifiers are separated
from the operands by one or more “blank”s.
Operands are characters, collating elements, section symbols, or strings of characters. Strings are
enclosed in double quotes. Literal double quotes within strings are preceded by the escape character,
described below. When a keyword is followed by more than one operand, the operands are separated by
semicolons; “blank”s are allowed before and/or after a semicolon.
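The category structure described above can be sketched as a small Python reader. That structure consists of a header, a body of identifier lines with semicolon-separated operands, and an “END” trailer. This is an illustrative sketch under stated assumptions, not a conforming parser, and the function names are hypothetical. It does not resolve copy directives, handle line continuation or transliteration statements, or decode strings and escape sequences. Comment and escape characters are parameters that default to “#” and “\” as in 4.1.4.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Entry = Tuple[str, List[str]]  # (identifier, operands)


def split_operands(text: str, escape: str = "\\") -> List[str]:
    """Split operands on semicolons that lie outside double-quoted strings."""
    ops: List[str] = []
    cur: List[str] = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            cur.append(text[i:i + 2])  # keep escape sequences verbatim
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        if ch == ";" and not in_string:
            ops.append("".join(cur).strip())  # blanks around ";" are allowed
            cur = []
        else:
            cur.append(ch)
        i += 1
    ops.append("".join(cur).strip())
    return ops


def parse_categories(lines: Iterable[str], comment_char: str = "#",
                     escape_char: str = "\\") -> Dict[str, List[Entry]]:
    """Group body lines by category header ("LC_...") and trailer ("END LC_...")."""
    categories: Dict[str, List[Entry]] = {}
    current: Optional[str] = None
    for raw in lines:
        if raw.startswith(comment_char) or not raw.strip():
            continue  # comment lines and blank lines are ignored
        fields = raw.split(None, 1)
        if current is None:
            if fields[0].startswith("LC_"):
                current = fields[0]
                if current in categories:
                    raise ValueError(f"second definition of {current}")
                categories[current] = []
            continue  # pre-category statements are not handled here
        if fields[0] == "END" and fields[1:] and fields[1].strip() == current:
            current = None
            continue
        operands = split_operands(fields[1], escape_char) if len(fields) > 1 else []
        categories[current].append((fields[0], operands))
    if current is not None:
        raise ValueError(f"missing trailer: END {current}")
    return categories
```

Because the examples in this Technical Report use “%” as the comment character, a caller reading them would pass comment_char="%".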
4.1.1 Character representation
Individual characters, characters in strings, and collating elements are represented using symbolic
names, UCS notation or characters themselves, or as octal, hexadecimal, or decimal constants as defined
below. When constant notation is used, the resultant FDCC-set definitions need not be portable between
systems.
(0) The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to
represent itself outside a symbolic name it is preceded by the escape character.
(1) A character can be represented via a symbolic name, enclosed within angle brackets (< and >). The
symbolic name, including the angle brackets, exactly matches a symbolic name defined in a charmap or
a repertoiremap to be used, and is replaced by a character value determined from the value associated
with the symbolic name in the charmap or a value associated via a repertoiremap. Repertoiremaps have
predefined symbolic names for UCS characters; see clause 6. An FDCC-set may also use the UCS notation
of clause 6 to represent characters, without a repertoiremap being defined for the FDCC-set. Use of
the escape character or a right angle bracket within a symbolic name is invalid unless the character is
preceded by the escape character.
EXAMPLE ; “”
The items (2), (3), (4) and (5) are deprecated and are retained for compatibility with the POSIX standard.
FDCC-sets should be specified in a coded character set independent way, using symbolic names. To
make actual use of the FDCC-set, it is used together with charmaps and/or repertoiremaps, so that the
symbolic character names can be resolved into the actual character encoding used.
(2) A character can be represented by the character itself, in which case the value of the character is
application-defined. Within a string, the double-quote character, the escape character, and the right
angle bracket character are escaped (preceded by the escape character) to be interpreted as the
character itself. Outside strings, the characters
, ; < > escape_char
are escaped by the escape character to be interpreted as the character itself.
EXAMPLE c ç “May”
(3) A character can be represented as an octal constant. An octal constant is specified as the escape
character followed by two or more octal digits. Each constant represents a byte value.
EXAMPLE \143; \347; “\115”
(4) A character can be represented as a hexadecimal constant. A hexadecimal constant is specified as the
escape character followed by an x followed by two or more hexadecimal digits. Each constant represents
a byte value.
EXAMPLE \x63;\xe7;
(5) A character can be represented as a decimal constant. A decimal constant is specified as the escape
character followed by a d followed by two or more decimal digits. Each constant represents a byte value.
EXAMPLE \d99; \d231;
(6) Multibyte characters can be represented by concatenated constants specified in byte order with the
last constant specifying the least significant byte of the character. Concatenated constants can include a
mix of the above character representations.
EXAMPLE \143\xe7; “\115\xe7\d171”
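Representations (1) to (6) can be illustrated by a decoder that turns an operand into bytes. The Python sketch below is hypothetical and relies on several assumptions that this Technical Report does not state:
- symbolic names are resolved through a caller-supplied dictionary that stands in for a charmap;
- characters written as themselves are encoded in UTF-8;
- each constant is limited to one byte, that is, at most three octal or decimal digits, or two hexadecimal digits.

```python
from typing import Dict, Optional


def decode_chars(text: str, escape: str = "\\",
                 charmap: Optional[Dict[str, bytes]] = None) -> bytes:
    """Decode characters written in the 4.1.1 notations into a byte string.

    `text` is the content of an operand without surrounding double quotes.
    """
    charmap = charmap or {}
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "<":                                # (1) symbolic name such as <c>
            end = text.index(">", i)                 # escaped ">" inside names not handled
            out += charmap[text[i:end + 1]]          # KeyError if the name is unknown
            i = end + 1
        elif ch == escape and i + 1 < n:
            kind = text[i + 1]
            if kind == "x":                          # (4) hexadecimal, at most 2 digits
                digits, base, start, limit = "0123456789abcdefABCDEF", 16, i + 2, 2
            elif kind == "d":                        # (5) decimal, at most 3 digits
                digits, base, start, limit = "0123456789", 10, i + 2, 3
            elif kind in "01234567":                 # (3) octal, at most 3 digits
                digits, base, start, limit = "01234567", 8, i + 1, 3
            else:                                    # escaped literal such as \; or \"
                out += kind.encode("utf-8")
                i += 2
                continue
            j = start
            while j < n and j - start < limit and text[j] in digits:
                j += 1
            if j - start < 2:
                raise ValueError(f"constant at offset {i} needs two or more digits")
            value = int(text[start:j], base)
            if value > 0xFF:
                raise ValueError(f"constant at offset {i} exceeds one byte")
            out.append(value)                        # (6) constants concatenate in byte order
            i = j
        else:                                        # (2) the character itself
            out += ch.encode("utf-8")                # assumed UTF-8 source encoding
            i += 1
    return bytes(out)
```

With the default backslash escape character, the example of item (6), \115\xe7\d171, decodes to the three bytes 4d e7 ab. The constants \347, \xe7 and \d231 from items (3) to (5) all denote the same byte, e7.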
Only characters existing in the character set for which the FDCC-set definition is created are specified,
whether using symbolic names, the characters themselves, or octal, decimal, or hexadecimal constants.
If a charmap is present, only characters defined in the charmap can be specified using octal, decimal, or
hexadecimal constants. Symbolic names not present in the charmap can be specified and are ignored, as
specified under item (1) above.
NOTE The symbolic character notation is recommended for specifying all characters in
an FDCC-set, to facilitate portability of FDCC-sets, as the coded character set in which an FDCC-set is
applied may differ from the coded character set of the FDCC-set source. This is also recommended for format
effectors in strings, such as in LC_TIME or LC_ADDRESS, where the format effectors may be stored
together with the rest of the string in a binary string with a different encoding from that of the source FDCC-set.
4.1.2 Continuation of lines
A line in a specification can be continued by placing an escape character as the last visible graphic
character on the line; this continuation character is discarded from the input. The line is continued to
the next non-comment line.
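The continuation rule can be sketched as a preprocessing step that runs before any other parsing. In this hypothetical Python helper, the comment and escape characters are parameters that default to “#” and “\” as in 4.1.4. Trailing blanks are ignored when looking for the last visible character. As a simplification, a line that ends in an escaped escape character is not distinguished from a continued line.

```python
from typing import Iterable, List, Optional


def join_continuations(lines: Iterable[str], escape: str = "\\",
                       comment: str = "#") -> List[str]:
    """Join lines whose last visible character is the escape character."""
    out: List[str] = []
    pending: Optional[str] = None  # text carried over from a continued line
    for line in lines:
        if pending is not None and line.startswith(comment):
            continue  # a continued line resumes on the next non-comment line
        text = line.rstrip()  # ignore trailing blanks and the newline
        if pending is not None:
            text = pending + text
        if text.endswith(escape):
            pending = text[: -len(escape)]  # discard the continuation character
        else:
            out.append(text)
            pending = None
    if pending is not None:
        out.append(pending)  # the input ended in the middle of a continuation
    return out
```

Comment lines that do not interrupt a continuation are passed through unchanged, so the comment rule of 4.1.4.1 can still be applied afterwards.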
4.1.3 Names for copy keyword
In most of the categories a “copy” keyword is allowed. The name specified with this copy keyword is
one of:
— “i18n”, which indicates the “i18n” FDCC-set defined in this specification,
— the name of a FDCC-set or POSIX locale registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context - not being recommended as an
international specification.
4.1.4 Pre-category statements
In an FDCC-set, the following statements can precede category specifications, and they apply to all
categories in the specified FDCC-set.
4.1.4.1 comment_char
The following line in an FDCC-set modifies the comment character. It has the following syntax, starting
in column 1:
“comment_char %c\n”,
The comment character defaults to the number sign (#). All examples in this Technical Report use
“%” as the comment character, except where otherwise noted. Blank lines and lines containing the
comment character in the first position are ignored. In collating statements, a comment character
occurring where the delimiter “;” may occur terminates the collating statement.
4.1.4.2 escape_char
The following line in an FDCC-set modifies the escape character to be used in the text. It has the following
syntax, starting in column 1:
“escape_char %c\n”,
The escape character is used for representing characters in 4.1.1 and for continuing lines.
The escape character defaults to backslash “\”. All examples in this Technical Report use “/” as the
escape character, except where otherwise noted.
4.1.4.3 repertoiremap
The following line in an FDCC-set specifies the name of a repertoiremap used to define the symbolic
character names in the FDCC-set. There may be at most one “repertoiremap” line. It has the following
syntax, starting in column 1:
“repertoiremap %s\n”,
The name is one of:
— “i18nrep” which indicates the “i18nrep” repertoiremap defined in this specification,
— the name of a repertoiremap registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context - not being recommended as an
international specification.
4.1.4.4 charmap
The following line in an FDCC-set specifies the name of a charmap that may be used with the FDCC-set.
It has the following syntax, starting in column 1:
“charmap %s\n”,
This keyword gives a hint as to which charmaps the FDCC-set is intended to be used with. More than one
charmap specification may be useful with an FDCC-set. It is the application’s responsibility to
decide which charmap specification is to be used with that application.
The name is one of:
— the name of a charmap registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context - not being recommended as an
international specification.
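A small reader like the Python sketch below can collect the four pre-category statements. The class and function names are hypothetical. The sketch recognizes only statements that start in column 1 before the first “LC_” category header. It enforces the single-character operand of comment_char and escape_char and the limit of one repertoiremap line, while accepting any number of charmap hints.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

STATEMENT = re.compile(r"(comment_char|escape_char|repertoiremap|charmap)[ \t]+(\S+)[ \t]*")


@dataclass
class Preamble:
    comment_char: str = "#"        # default comment character
    escape_char: str = "\\"        # default escape character
    repertoiremap: Optional[str] = None
    charmaps: List[str] = field(default_factory=list)


def read_preamble(lines: Iterable[str]) -> Preamble:
    """Collect the pre-category statements that precede the first category."""
    p = Preamble()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LC_"):
            break                  # the first category header ends the preamble
        m = STATEMENT.fullmatch(line)  # statements start in column 1
        if m is None:
            continue               # blank lines, comments and other text
        key, value = m.groups()
        if key in ("comment_char", "escape_char"):
            if len(value) != 1:
                raise ValueError(f"{key} takes a single character")
            setattr(p, key, value)
        elif key == "repertoiremap":
            if p.repertoiremap is not None:
                raise ValueError("at most one repertoiremap line is allowed")
            p.repertoiremap = value
        else:
            p.charmaps.append(value)  # several charmaps may be hinted
    return p
```

For instance, with the lines “comment_char %”, “escape_char /”, “repertoiremap i18nrep” and two charmap lines, the reader returns “%”, “/”, “i18nrep” and both charmap names in order. Charmap names used in such a test are placeholders, not names registered under ISO/IEC 15897.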
4.2 LC_IDENTIFICATION
The LC_IDENTIFICATION category defines properties of the FDCC-set, and which specification methods
the FDCC-set conforms to. Values must be supplied for all keywords unless otherwise noted, and the operands
are strings. The following keywords are defined:
— title Title of the FDCC-set.
— source Organization name of provider of the source.
— address Organization postal address.
— contact Name of contact person. This keyword is optional.
— email Electronic mail address of the organization, or contact person. This keyword is optional.
— tel Telephone number for the organization, in international format. This keyword is optional.
— fax Fax number for the organization, in international format. This keyword is optional.
— language Natural language to which the FDCC-set applies, as specified in ISO 639. If a two-letter
code exists for this language, it is used, else the three-letter code is used. This keyword is optional.
— territory The geographic extent where the FDCC-set applies (where applicable), as two-letter form
of ISO 3166. This keyword is optional.
— script Script that the FDCC-set especially uses, as defined by ISO 15924 and its registry. This
keyword is optional.
— audience If not for general use, an indication of the intended user audience. This keyword is optional.
— application If intended for a specific application, a description of that application. This keyword is
optional.
— abbreviation Short name for provider of the source. This keyword is optional.
— revision Revision number consisting of digits and zero or more full stops (“.”).
— date Revision date in the format according to this example: “1995-02-05” meaning the 5th of
February, 1995.
If the required information is not present in ISO 639 or ISO 3166, the string should be empty, and
the relevant Maintenance Authority should be approached to have the needed item registered.
NOTE Only one language per territory can be addressed with a single FDCC-set; an additional FDCC-set is
required for each additional language for that territory.
— category Indicates that a category is present and the specification to which the category claims
conformance. The first operand is a string in double-quotes that identifies that specification; the
following values are defined:
— “i18n:2004”
— “i18n:2012”
— “posix:1993”
The second operand is the category name, which is one of the category names defined in clause 4.
More than one “category” keyword may be given, but only one per category name.
The “i18n” LC_IDENTIFICATION category is:
LC_IDENTIFICATION
% This is the ISO/IEC TR 30112 "i18n" definition for
% the LC_IDENTIFICATION category.
%
title "ISO/IEC TR 30112 i18n FDCC-set"
source "ISO/IEC Copyright Office"
address "Case postale 56, CH-1211 Geneve 20, Switzerland"
contact ""
email ""
tel ""
fax ""
language ""
territory ""
revision "1.1"
date "2010-07-30"
%
category "i18n:2004";LC_IDENTIFICATION
category "i18n:2012";LC_CTYPE
category "i18n:2004";LC_COLLATE
category "i18n:2004";LC_TIME
category "i18n:2004";LC_NUMERIC
category "i18n:2004";LC_MONETARY
category "i18n:2004";LC_MESSAGES
category "i18n:2004";LC_NAME
category "i18n:2004";LC_ADDRESS
category "i18n:2004";LC_TELEPHONE
category "i18n:2012";LC_PAPER
category "i18n:2012";LC_MEASUREMENT
category "i18n:2012";LC_KEYBOARD
END LC_IDENTIFICATION
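The keyword rules of this clause lend themselves to a mechanical check. The Python sketch below is a hypothetical validator (check_identification, REQUIRED_KEYWORDS and CONFORMANCE_VALUES are illustrative names, not defined by this Technical Report). It assumes the operands have already been parsed and stripped of their double-quotes, and it treats revision and date as mandatory because they are not marked optional above.

```python
import datetime
import re

# Keywords of 4.2 that are not marked "This keyword is optional."
REQUIRED_KEYWORDS = ("title", "source", "address", "revision", "date")
CONFORMANCE_VALUES = frozenset({"i18n:2004", "i18n:2012", "posix:1993"})


def check_identification(fields: dict[str, str],
                         categories: list[tuple[str, str]]) -> list[str]:
    """Return the problems found; an empty list means none were detected.

    fields maps each keyword to its unquoted string operand; categories holds
    the (specification, category name) operand pairs of the "category" lines.
    """
    problems = [f"missing keyword: {k}" for k in REQUIRED_KEYWORDS if k not in fields]

    revision = fields.get("revision")
    # Digits and zero or more full stops, with at least one digit.
    if revision is not None and not re.fullmatch(r"[0-9.]*[0-9][0-9.]*", revision):
        problems.append(f"bad revision: {revision!r}")

    date = fields.get("date")
    if date is not None:
        try:
            if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", date):
                raise ValueError(date)
            # Also rejects impossible calendar dates such as February 30.
            datetime.date(int(date[:4]), int(date[5:7]), int(date[8:]))
        except ValueError:
            problems.append(f"bad date: {date!r}")

    seen: set[str] = set()
    for spec, name in categories:
        if spec not in CONFORMANCE_VALUES:
            problems.append(f"unknown specification {spec!r} for {name}")
        if name in seen:
            problems.append(f"more than one category line for {name}")
        seen.add(name)
    return problems
```

Applied to the values of the “i18n” LC_IDENTIFICATION category, the sketch reports no problems; a second “category” line for the same category name, or a date such as “2010-02-30”, would be reported.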
4.3 LC_CTYPE
The LC_CTYPE category defines character classification, case conversion, character transformation, and
other character attribute mappings. Support for the portable character set is required.
A series of characters in a specification can be represented by the hexadecimal symbolic ellipsis
“..” (two dots), the decimal symbolic ellipsis “....” (four dots), the double increment hexadecimal
symbolic ellipsis “..(2)..”, or the absolute ellipsis “...” (three dots).
The hexadecimal symbolic ellipsis (“..”) specification is only valid between symbolic character names.
The symbolic names consist of zero or more nonnumeric characters from the set shown with visible
glyphs in Table 1 of clause 3.2.2, followed by an integer formed by one or more hexadecimal digits, using
uppercase letters only for the range “A” to “F”. The characters preceding the hexadecimal integer are
identical in the two symbolic names, and the integer formed by the hexadecimal digits in the second
symbolic name is identical to or greater than the integer formed by the hexadecimal digits in the first
name. This is interpreted as a series of symbolic names formed from the common part and each of the
integers from the first to the second integer, inclusive, written in hexadecimal using uppercase letters only,
and with a length of the symbolic names generated that is equal to the length of the first
(and also the second) symbolic name. As an example, . is interpreted as the symbolic
names , , , and , in that order.
The decimal symbolic ellipsis (“....”) specification is only valid between symbolic character names. The
symbolic names consist of zero or more nonnumeric characters from the set shown with visible glyphs
in Table 1 of clause 3.2.2, followed by an integer formed by one or more decimal digits. The characters
preceding the decimal integer are identical in the two symbolic names, and the integer formed by the
decimal digits in the second symbolic name is identical to or greater than the integer formed by the
decimal digits in the first name. This is interpreted as a series of symbolic names formed from the
common part and each of the integers in decimal format between the first and the second integer,
inclusive, and with a length of the symbolic names generated that is equal to the length of the first (and
also the second) symbolic name. As an example, . is interpreted as the symbolic names
, , , and , in that order.
The double increment hexadecimal symbolic ellipsis (“..(2)..”) works like the hexadecimal symbolic
ellipsis, but generates only every other symbolic character name. As an example. .
(2). is interpreted as the symbolic character names , , , and
, in that order.
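The three symbolic ellipses can be expanded with a few lines of code. The Python sketch below is a hypothetical helper (expand_symbolic_ellipsis is not defined by this Technical Report) that takes the names without their angle brackets. It assumes that the integer part of a name is the longest trailing run of digits (uppercase hexadecimal digits for “..” and “..(2)..”), which resolves the ambiguity when the common part itself ends in one of the letters “A” to “F”. The names such as U010A used below are illustrative only.

```python
import re

# Assumption: the integer is the longest trailing run of digits, so a name such
# as "U010A" splits into the common part "U" and the integer "010A".
_HEX_NAME = re.compile(r"(.*?)([0-9A-F]+)")
_DEC_NAME = re.compile(r"(.*?)([0-9]+)")


def expand_symbolic_ellipsis(first: str, ellipsis: str, last: str) -> list[str]:
    """Expand first..last, first....last or first..(2)..last (names without <>)."""
    if ellipsis == "..":
        pattern, base, step = _HEX_NAME, 16, 1
    elif ellipsis == "..(2)..":
        pattern, base, step = _HEX_NAME, 16, 2
    elif ellipsis == "....":
        pattern, base, step = _DEC_NAME, 10, 1
    else:
        raise ValueError(f"unknown ellipsis {ellipsis!r}")
    m1, m2 = pattern.fullmatch(first), pattern.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("both symbolic names must end in an integer")
    if m1.group(1) != m2.group(1) or len(first) != len(last):
        raise ValueError("names must share the common part and have equal length")
    start, end = int(m1.group(2), base), int(m2.group(2), base)
    if end < start:
        raise ValueError("the second integer must not be smaller than the first")
    width = len(m1.group(2))
    digits = "X" if base == 16 else "d"   # uppercase hexadecimal or decimal
    # Zero-pad so every generated name has the length of the endpoints.
    return [f"{m1.group(1)}{i:0{width}{digits}}" for i in range(start, end + 1, step)]
```

For example, expand_symbolic_ellipsis("U0100", "..(2)..", "U0106") yields U0100, U0102, U0104 and U0106, while the plain “..” form between U00FF and U0101 carries across the digit boundary (U00FF, U0100, U0101) because the generated names are zero-padded to a fixed length.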
The absolute ellipsis specification is only valid within a single encoded character set. An ellipsis is
interpreted as including in the list all characters with an encoded value higher than the encoded value
of the character preceding the ellipsis and lower than the encoded value of the character following the
ellipsis. The absolute ellipsis specification is deprecated, as this is only relevant to FDCC-sets not using
symbolic characters.
NOTE \x30;...;\x39 includes in the character class all characters with encoded values between the endpoints.
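The deprecated absolute ellipsis works on encoded values rather than names. As a minimal sketch, the hypothetical function below expands a semicolon-separated list such as the one in the NOTE; to keep it short it accepts only hexadecimal constants with the default escape character, which is an assumption of the sketch rather than a restriction of the text.

```python
import re

_HEX_CONST = re.compile(r"\\x([0-9A-Fa-f]{2,})")


def _value(item: str) -> int:
    m = _HEX_CONST.fullmatch(item)
    if m is None:
        raise ValueError(f"not a hexadecimal constant: {item!r}")
    return int(m.group(1), 16)


def expand_absolute_ellipsis(operands: str) -> list[int]:
    """Expand a ';'-separated list of \\x constants that may contain "..." items."""
    items = [item.strip() for item in operands.split(";")]
    values: list[int] = []
    i = 0
    while i < len(items):
        if items[i] == "...":
            if not values or i + 1 >= len(items):
                raise ValueError("an ellipsis needs a character on each side")
            low, high = values[-1], _value(items[i + 1])
            if high <= low:
                raise ValueError("the ellipsis must run from a lower to a higher value")
            # Every value strictly between the endpoints, then the closing endpoint.
            values.extend(range(low + 1, high + 1))
            i += 2
        else:
            values.append(_value(items[i]))
            i += 1
    return values
```

With this sketch, the list of the NOTE produces the ten values 0x30 to 0x39, that is, the encoded values of the digits in an ASCII-based coded character set.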
4.3.1 Character classification keywords
The following keywords are recognized. In the descriptions, the term “automatically included” means
that it is not an error to either include the referenced characters or to omit them; the interpreting system
provides them if missing an
...
P LATIN CAPITAL LETTER P
Q LATIN CAPITAL LETTER Q
R LATIN CAPITAL LETTER R
S LATIN CAPITAL LETTER S
T LATIN CAPITAL LETTER T
U LATIN CAPITAL LETTER U
V LATIN CAPITAL LETTER V
W LATIN CAPITAL LETTER W
X LATIN CAPITAL LETTER X
Y LATIN CAPITAL LETTER Y
Z LATIN CAPITAL LETTER Z
[ LEFT SQUARE BRACKET
\ REVERSE SOLIDUS
\ REVERSE SOLIDUS
] RIGHT SQUARE BRACKET
^ CIRCUMFLEX ACCENT
^ CIRCUMFLEX ACCENT
_ LOW LINE
_ LOW LINE
` GRAVE ACCENT
a LATIN SMALL LETTER A
b LATIN SMALL LETTER B
c LATIN SMALL LETTER C
d LATIN SMALL LETTER D
e LATIN SMALL LETTER E
f LATIN SMALL LETTER F
Table 1 (continued)
Symbolic name Glyph UCS Description
g LATIN SMALL LETTER G
h LATIN SMALL LETTER H
i LATIN SMALL LETTER I
j LATIN SMALL LETTER J
k LATIN SMALL LETTER K
l LATIN SMALL LETTER L
m LATIN SMALL LETTER M
n LATIN SMALL LETTER N
o LATIN SMALL LETTER O
TECHNICAL REPORT ISO/IEC TR 30112
First edition 2014-06-15
Corrected version 2014-08-01
Information technology — Specification methods for cultural conventions
Technologies de l’information — Méthodes de spécification des conventions culturelles
Reference number ISO/IEC TR 30112:2014(E)
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, and notations
3.1 Terms and definitions
3.2 Notations
4 FDCC-set
4.1 FDCC-set description
4.2 LC_IDENTIFICATION
4.3 LC_CTYPE
4.4 LC_COLLATE
4.5 LC_MONETARY
4.6 LC_NUMERIC
4.7 LC_TIME
4.8 LC_MESSAGES
4.9 LC_XLITERATE
4.10 LC_NAME
4.11 LC_ADDRESS
4.12 LC_TELEPHONE
4.13 LC_PAPER
4.14 LC_MEASUREMENT
4.15 LC_KEYBOARD
5 CHARMAP
5.1 Character Set Description Text
5.2 WIDTH section
6 REPERTOIREMAP
Annex A (informative) Differences from the ISO/IEC 9945 standard
Annex B (informative) Rationale
Annex C (informative) BNF Grammar
Annex D (informative) Relation to taxonomy
Annex E (informative) Implementation in glibc
Annex F (informative) Index
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Details of any patent rights identified during the development of the document will be in the Introduction
and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical Barriers
to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 35, User
interfaces.
This corrected version of ISO/IEC TR 30112:2014 incorporates the following corrections: the cover
page, the header on page 1 and the reference number have been corrected to reflect that this document
is a Technical Report, and not an International Standard.
Introduction
This Technical Report defines general mechanisms to specify cultural conventions, and it defines formats
for a number of specific cultural conventions in the areas of character classification and conversion,
sorting, number formatting, monetary formatting, date formatting, message display, addressing of
persons, postal address formatting, and telephone number handling.
This Technical Report offers a number of benefits:
Rigid specification: Using this Technical Report, a user can rigidly specify a number of the cultural
conventions that apply to the information technology environment of the user.
Cultural adaptability: If an application has been designed and built in a culturally neutral manner,
the application can use the specifications as data for its APIs; the same application binary can thus
accommodate different users in a way that is culturally acceptable to each of them, without being
changed.
Productivity: This Technical Report specifies cultural conventions and how to specify data for them.
With that data, an application developer does not need to gather the information required to support
every cultural environment of the product’s expected customers. The developer can rely on culturally
correct behaviour as specified by the customer, and further markets may be reached, because customers
can supply the data themselves for markets that were not originally targeted.
Uniform behaviour: When a number of applications share one cultural specification, which may be
supplied by the user or provided by the application or operating system, their behaviour for cultural
adaptation becomes uniform.
The specification formats are independent of platforms and specific encoding, and targeted to be usable
from a wide range of programming languages.
A number of cultural conventions, such as spelling, hyphenation rules and terminology, are not specifiable
with this Technical Report, but the Technical Report provides mechanisms to define new categories and
also new keywords within existing categories. An internationalized application may take advantage
of information provided with the FDCC-set (such as the language) to provide further internationalized
services to the user.
This Technical Report defines a format compatible with the one used in the International string ordering
standard, ISO/IEC 14651. This Technical Report is upward compatible with parts of the ISO/IEC 9945
POSIX standard, especially those on POSIX locales and charmaps. The major extensions from that text
are listed in annex A. This Technical Report has enhanced functionality in a number of areas such as
ISO/IEC 10646 support, more classification of characters, transliteration, dual (multi) currency support,
enhanced date and time formatting, personal name writing, postal address formatting, telephone
number handling, keyboard handling, and management of categories. There is enhanced support for
character sets including ISO/IEC 2022 handling and an enhanced method to separate the specification
of cultural conventions from an actual encoding via a description of the character repertoire employed.
A standard set of values for all the categories has been defined covering the repertoire of ISO/IEC 10646.
TECHNICAL REPORT ISO/IEC TR 30112:2014(E)
Information technology — Specification methods for
cultural conventions
1 Scope
This Technical Report specifies description formats and functionality for the specification of cultural
conventions, description formats for character sets, and description formats for binding character
names to ISO/IEC 10646, plus a set of default values for some of these items.
The specification is upward compatible with POSIX locale specifications - a locale conformant to POSIX
specifications will also be conformant to specifications in this Technical Report, while the reverse
condition will not hold. Some of the descriptions are intended to be coded in text files to be used via
Application Programming Interfaces, that are expected to be developed for a number of systems which
comply with ISO/IEC 9945. An alignment effort has been undertaken for this specification to be aligned
with ISO/IEC 9945.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
This document contains no normative references.
3 Terms and definitions, and notations
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1 Bytes and characters
3.1.1.1
byte
individually addressable unit of data storage that is equal to or larger than an octet, used to store a
character or a portion of a character
Note 1 to entry: A byte is composed of a contiguous sequence of bits, the number of which is
implementation-defined. The least significant bit is called the low-order bit; the most significant bit is
called the high-order bit.
3.1.1.2
character
member of a set of elements used for the organization, control or representation of data
3.1.1.3
coded character
sequence of one or more bytes representing a single character
3.1.1.4
text file
file that contains characters organized into one or more lines
3.1.2 Cultural and other major concepts
3.1.2.1
cultural convention
data item for information technology that may vary dependent on language, territory, or other cultural
habits
3.1.2.2
FDCC
Formal Definition of a Cultural Convention
cultural convention put into a formal definition scheme
3.1.2.3
FDCC-set
set of Formal Definitions of Cultural Conventions (FDCCs)
definition of the subset of a user’s information technology environment that depends on language and
cultural conventions
Note 1 to entry: the FDCC-set is a superset of the “locale” term in C and POSIX.
3.1.2.4
charmap
definition of a mapping between symbolic character names and character codes, plus related information
3.1.2.5
repertoiremap
definition of a mapping between symbolic character names and characters for the repertoire of
characters used in a FDCC-set
Note 1 to entry: This is further described in clause 6.
3.1.3 FDCC categories related
3.1.3.1
character class
named set of characters sharing an attribute associated with the name of the class
3.1.3.2
collation
logical ordering of strings according to defined precedence rules
3.1.3.3
collating element
smallest entity used to determine logical ordering
Note 1 to entry: See collating sequence. A collating element consists of either a single character, or two or more
characters collating as a single entity. The LC_COLLATE category in the associated FDCC-set determines the set
of collating elements.
3.1.3.4
multicharacter collating element
sequence of two or more characters that collate as an entity
Note 1 to entry: For example, in some languages two characters are sorted as one letter, as is the case for Danish
and Norwegian “aa”.
3.1.3.5
collating sequence
relative order of collating elements as determined by the setting of the LC_COLLATE category in the
applied FDCC-set
3.1.3.6
equivalence class
set of collating elements with the same primary collation weight
Note 1 to entry: Elements in an equivalence class are typically elements that naturally group together, such
as all accented letters based on the same letter. The collation order of elements within an equivalence class is
determined by the weights assigned on any subsequent levels after the primary weight.
3.2 Notations
The following notations and common conventions for specifications apply to this Technical Report:
3.2.1 Notation for defining syntax
In this Technical Report, the description of an individual record in a FDCC-set is done using the syntax
notation given in the following.
The syntax notation looks as follows:
“<format>”,[<arg1>,<arg2>,...,<argn>]
The <format> is given as a format string enclosed in double quotes, followed by a number of parameters,
separated by commas. It is similar to the format specification defined in the ISO/IEC 9945 standard and
the format specification used in the C language printf() function. The format of each parameter is given by
an escape sequence as follows:
— %s specifies a string
— %d specifies a decimal integer
— %c specifies a character
— %o specifies an octal integer
— %x specifies a hexadecimal integer
A “ ” (an empty character position) in the syntax string represents one or more “blank” characters.
All other characters in the format string represent themselves, except:
— %% specifies a single %
— \n specifies an end-of-line
The notation “...” is used to specify that repetition of the previous specification is optional; it is
used in both the format string and the parameter list.
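One way to read this notation is as a recipe for a line pattern. The Python sketch below is a hypothetical converter (syntax_to_regex is not defined by this Technical Report) that turns a syntax string into a regular expression with one capture group per conversion. The lexical form assumed for each conversion, in particular that %s matches a run of non-blank characters, is an assumption of the sketch, since the text only names the kind of value.

```python
import re

# Capture pattern per conversion; the exact forms are assumptions of this sketch.
_CONVERSIONS = {
    "s": r"(\S+)",           # string: assumed to be a run of non-blank characters
    "d": r"([0-9]+)",        # decimal integer
    "c": r"(\S)",            # single character
    "o": r"([0-7]+)",        # octal integer
    "x": r"([0-9A-Fa-f]+)",  # hexadecimal integer
}


def syntax_to_regex(fmt: str) -> re.Pattern:
    """Translate a 3.2.1 syntax string (backslash-n written literally) into a regex."""
    parts = ["^"]
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        nxt = fmt[i + 1] if i + 1 < len(fmt) else ""
        if ch == "%":
            if nxt == "%":
                parts.append("%")               # %% is a literal percent sign
            elif nxt in _CONVERSIONS:
                parts.append(_CONVERSIONS[nxt])
            else:
                raise ValueError(f"unknown conversion %{nxt}")
            i += 2
        elif ch == "\\" and nxt == "n":
            parts.append("$")                   # \n marks the end of the line
            i += 2
        elif ch == " ":
            parts.append(r"[ \t]+")             # empty position: one or more blanks
            i += 1
        else:
            parts.append(re.escape(ch))         # other characters stand for themselves
            i += 1
    return re.compile("".join(parts))
```

With this sketch, syntax_to_regex(r"comment_char %c\n").match("comment_char %") captures “%”, while a line with trailing text after the single character does not match.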
3.2.2 Portable character set
A set of symbolic names for characters in Table 1, which is called the portable character set, is used
in character description text of this specification. The first eight entries in Table 1 are defined in
ISO/IEC 6429 and the rest are defined in ISO/IEC 9945 with some definitions from ISO/IEC 10646.
Table 1 — Portable character set
Symbolic name Glyph UCS Description
NULL (NUL)
BELL (BEL)
BACKSPACE (BS)
Table 1 (continued)
Symbolic name Glyph UCS Description
CHARACTER TABULATION (HT)
CARRIAGE RETURN (CR)
LINE FEED (LF)
LINE TABULATION (VT)
FORM FEED (FF)
SPACE
! EXCLAMATION MARK
" QUOTATION MARK
# NUMBER SIGN
$ DOLLAR SIGN
% PERCENT SIGN
& AMPERSAND
' APOSTROPHE
( LEFT PARENTHESIS
) RIGHT PARENTHESIS
* ASTERISK
+ PLUS SIGN
, COMMA
- HYPHEN-MINUS
- HYPHEN-MINUS
. FULL STOP
. FULL STOP
/ SOLIDUS
/ SOLIDUS
0 DIGIT ZERO
1 DIGIT ONE
2 DIGIT TWO
3 DIGIT THREE
4 DIGIT FOUR
5 DIGIT FIVE
6 DIGIT SIX
7 DIGIT SEVEN
8 DIGIT EIGHT
9 DIGIT NINE
: COLON
; SEMICOLON
< LESS-THAN SIGN
= EQUALS SIGN
> GREATER-THAN SIGN
? QUESTION MARK
@ COMMERCIAL AT
Table 1 (continued)
Symbolic name Glyph UCS Description
A LATIN CAPITAL LETTER A
B LATIN CAPITAL LETTER B
C LATIN CAPITAL LETTER C
D LATIN CAPITAL LETTER D
E LATIN CAPITAL LETTER E
F LATIN CAPITAL LETTER F
G LATIN CAPITAL LETTER G
H LATIN CAPITAL LETTER H
I LATIN CAPITAL LETTER I
J LATIN CAPITAL LETTER J
K LATIN CAPITAL LETTER K
L LATIN CAPITAL LETTER L
M LATIN CAPITAL LETTER M
N LATIN CAPITAL LETTER N
O LATIN CAPITAL LETTER O
p LATIN SMALL LETTER P
q LATIN SMALL LETTER Q
r LATIN SMALL LETTER R
s LATIN SMALL LETTER S
t LATIN SMALL LETTER T
u LATIN SMALL LETTER U
v LATIN SMALL LETTER V
w LATIN SMALL LETTER W
x LATIN SMALL LETTER X
y LATIN SMALL LETTER Y
z LATIN SMALL LETTER Z
{ LEFT CURLY BRACKET
{ LEFT CURLY BRACKET
| VERTICAL LINE
} RIGHT CURLY BRACKET
} RIGHT CURLY BRACKET
~ TILDE
This Technical Report may use symbolic character names other than the above in examples, to illustrate
the range of symbols allowed by the syntax specified in 4.1.1.
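Read as a set of characters, Table 1 consists of the eight control characters NUL, BEL, BS, HT, CR, LF, VT and FF, followed by SPACE and every graphic character from “!” to “~”. The Python sketch below encodes that reading; PORTABLE_CHARACTERS and is_portable are hypothetical names, and the code points are derived from the glyphs and descriptions in the table.

```python
# The eight control characters of Table 1 (NUL, BEL, BS, HT, CR, LF, VT, FF),
# plus SPACE (0x20) and the graphic characters "!" (0x21) to "~" (0x7E).
PORTABLE_CHARACTERS = frozenset("\0\a\b\t\r\n\v\f") | frozenset(map(chr, range(0x20, 0x7F)))


def is_portable(text: str) -> bool:
    """True if every character of text belongs to the portable character set."""
    return all(ch in PORTABLE_CHARACTERS for ch in text)
```

Such a check can, for instance, flag FDCC-set source text that relies on characters outside the portable set instead of using symbolic names.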
4 FDCC-set
A FDCC-set is the definition of the subset of a user’s information technology environment that depends
on language and cultural conventions. A FDCC-set is made up of one or more categories. Each category
is identified by its name and controls specific aspects of the behaviour of components of the system. The
functionality is implied by the description of the categories. This Technical Report defines the following
categories:
— LC_IDENTIFICATION Versions and status of categories
— LC_CTYPE Character classification, case conversion and code transformation.
— LC_COLLATE Collation order.
— LC_TIME Date and time formats.
— LC_NUMERIC Numeric, non-monetary formatting.
— LC_MONETARY Monetary formatting.
— LC_MESSAGES Formats of informative and diagnostic messages and interactive responses.
— LC_XLITERATE Character transliteration.
— LC_NAME Format of writing personal names.
— LC_ADDRESS Format of postal addresses.
— LC_TELEPHONE Format for telephone numbers, and other telephone information.
— LC_PAPER Paper format
— LC_MEASUREMENT Information on measurement system
— LC_KEYBOARD Format for identifying keyboards.
NOTE In future editions of this Technical Report further categories may be added.
Other category names beginning with the three characters “LC_” are reserved for future standardization,
except for category names beginning with the five characters “LC_X_”, which will not be used for
categories added to this Technical Report in future. An application may thus use category names
beginning with the five characters “LC_X_” for application-defined categories to avoid clashes with
future standardized categories.
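The naming rule can be summarized as a small classification. In the hypothetical Python sketch below, note that LC_XLITERATE is a standard category even though it starts with “LC_X”, because only the five-character prefix “LC_X_” is set aside for applications.

```python
STANDARD_CATEGORIES = frozenset({
    "LC_IDENTIFICATION", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_NUMERIC",
    "LC_MONETARY", "LC_MESSAGES", "LC_XLITERATE", "LC_NAME", "LC_ADDRESS",
    "LC_TELEPHONE", "LC_PAPER", "LC_MEASUREMENT", "LC_KEYBOARD",
})


def classify_category(name: str) -> str:
    """Classify a category name according to the rules of clause 4."""
    if name in STANDARD_CATEGORIES:
        return "standard"
    if name.startswith("LC_X_"):
        return "application-defined"   # never used for future standard categories
    if name.startswith("LC_"):
        return "reserved"              # reserved for future standardization
    return "not a category name"       # category headers begin with "LC_" (4.1)
```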
This Technical Report also defines an FDCC-set named “i18n” with values for some of the above categories
in order to simplify FDCC-set descriptions for a number of cultures. The values in the “i18n” categories
are not necessarily the most commonly accepted values, although in many cases they could be the
recommended values. The complete “i18n” FDCC-set is defined as the sum of the “i18n” categories
specified in the clauses below. The “i18n” FDCC-set and its parts are released under the GNU General
Public License, version 2, as they are taken from glibc sources.
4.1 FDCC-set description
FDCC-sets are described with the syntax presented in this subclause. For the purposes of this Technical
Report, the text is referred to as the FDCC-set definition text or FDCC-set source text.
The FDCC-set definition text contains one or more FDCC-set category source definitions, and does not
contain more than one definition for the same FDCC-set category. If the text contains source definitions
for more than one category, application-defined categories, if present, appear after the categories
defined by this clause. A category source definition contains either the definition of a category or a copy
directive. In the event that some of the information for a FDCC-set category, as specified in this Technical
Report, is missing from the FDCC-set source definition, the behaviour of that category, if it is referenced,
is unspecified. A FDCC-set category is the normal way of specifying a single FDCC.
There are no naming conventions for FDCC-sets specified in this Technical Report, but clause 6.8 in
ISO/IEC 15897:1999 specifies naming rules for POSIX locales, charmaps and repertoiremaps, that may
also be applied to FDCC-sets, charmaps and repertoiremaps specified according to this Technical Report.
A category source definition consists of a category header, a category body, and a category trailer. A
category header consists of the name of the category, which begins with the characters “LC_”. The
category trailer consists of the string “END”, followed by one or more “blank”s and the string used in
the corresponding category header.
The category body consists of one or more lines of text. Each line is one of the following:
— a line containing an identifier, optionally followed by one or more operands. Identifiers are either
keywords, identifying a particular FDCC, or collating elements, or section symbols,
— one of the transliteration statements defined in 4.3.
In addition to the keywords defined in this Technical Report, the source can contain application-defined
keywords. Each keyword within a category has a unique name (different categories can, however,
contain keywords with the same name); no keyword starts with the characters “LC_”. Identifiers are
separated from the operands by one or more “blank”s.
Operands are characters, collating elements, section symbols, or strings of characters. Strings are
enclosed in double-quotes. Literal double-quotes within strings are preceded by the escape character,
described below. When a keyword is followed by more than one operand, the operands are separated by
semicolons; “blank”s are allowed before and/or after a semicolon.
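The header, body and trailer structure, together with the operand rules, can be sketched as follows. The Python functions below are hypothetical (parse_categories and split_operands are not defined by this Technical Report); they assume that comments, continued lines and pre-category statements have already been removed and that the default escape character is in effect. Operands are returned unparsed, with any double-quotes kept.

```python
def split_operands(text: str, escape: str = "\\") -> list[str]:
    """Split on ';' outside double-quoted strings, keeping escaped characters intact."""
    operands, current, in_string, i = [], [], False, 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i:i + 2])   # escaped character: never a delimiter
            i += 2
            continue
        if ch == '"':
            in_string = not in_string
        if ch == ";" and not in_string:
            operands.append("".join(current).strip())   # blanks around ';' are allowed
            current = []
        else:
            current.append(ch)
        i += 1
    operands.append("".join(current).strip())
    return operands


def parse_categories(lines: list[str]) -> dict[str, list[tuple[str, list[str]]]]:
    """Group (identifier, operands) pairs by category, checking header/trailer pairing."""
    categories: dict[str, list[tuple[str, list[str]]]] = {}
    current = None
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        if current is None:
            if not line.startswith("LC_"):
                raise ValueError(f"expected a category header, got {line!r}")
            if line in categories:
                raise ValueError(f"second definition of {line}")
            current = line
            categories[current] = []
            continue
        fields = line.split(None, 1)          # identifier, then operands after blanks
        identifier = fields[0]
        rest = fields[1] if len(fields) > 1 else ""
        if identifier == "END":
            if rest.strip() != current:
                raise ValueError(f"trailer {line!r} does not close {current}")
            current = None
            continue
        if identifier.startswith("LC_"):
            raise ValueError(f"no keyword may start with LC_: {identifier}")
        categories[current].append((identifier, split_operands(rest) if rest else []))
    if current is not None:
        raise ValueError(f"{current} has no END trailer")
    return categories
```

Applied to the “i18n” LC_IDENTIFICATION category, a line such as category "i18n:2004";LC_IDENTIFICATION yields the identifier category with the two operands "i18n:2004" and LC_IDENTIFICATION.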
4.1.1 Character representation
Individual characters, characters in strings, and collating elements are represented using symbolic
names, UCS notation or characters themselves, or as octal, hexadecimal, or decimal constants as defined
below. When constant notation is used, the resultant FDCC-set definitions need not be portable between
systems.
(0) The left angle bracket (<) is a reserved symbol, denoting the start of a symbolic name; when used to
represent itself outside a symbolic name it is preceded by the escape character.
(1) A character can be represented via a symbolic name, enclosed within angle brackets (< and >). The
symbolic name, including the angle brackets, exactly matches a symbolic name defined in a charmap or
a repertoiremap to be used, and is replaced by a character value determined from the value associated
with the symbolic name in the charmap or a value associated via a repertoiremap. Repertoiremaps have
predefined symbolic names for UCS characters, see clause 6. A FDCC-set may also use the UCS notation
of clause 6 to represent characters, without a repertoiremap being defined for the FDCC-set. Use of
the escape character or a right angle bracket within a symbolic name is invalid unless the character is
preceded by the escape character.
EXAMPLE ; “”
The items (2), (3), (4) and (5) are deprecated and are retained for compatibility with the POSIX standard.
FDCC-sets should be specified in a coded character set independent way, using symbolic names. To
make actual use of the FDCC-set, it is used together with charmaps and/or repertoiremaps, so that the
symbolic character names can be resolved into the actual character encoding used.
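Resolving symbolic names can be sketched as a lookup. In the hypothetical Python function below, a plain dictionary stands in for the charmap or repertoiremap in use, and names it does not contain are tried as UCS notation. The form assumed for that notation, “U” followed by four or eight uppercase hexadecimal digits, is an assumption of the sketch, since clause 6 is not reproduced here. Names found in neither place are ignored, following the charmap rule stated later in 4.1.1, and escaped characters inside names are not handled.

```python
import re

_SYMBOL = re.compile(r"<([^<>]+)>")
_UCS = re.compile(r"U([0-9A-F]{4}|[0-9A-F]{8})")   # assumed form of the UCS notation


def resolve_symbolic(text: str, repertoire: dict[str, str]) -> str:
    """Resolve a run of <name> tokens to characters (a sketch, not a full parser)."""
    out = []
    pos = 0
    while pos < len(text):
        m = _SYMBOL.match(text, pos)
        if m is None:
            raise ValueError(f"expected a symbolic name at offset {pos}")
        name = m.group(1)
        if name in repertoire:                          # charmap or repertoiremap entry
            out.append(repertoire[name])
        elif (ucs := _UCS.fullmatch(name)) is not None:  # UCS notation
            out.append(chr(int(ucs.group(1), 16)))
        # otherwise: the name is unknown and is ignored
        pos = m.end()
    return "".join(out)
```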
(2) A character can be represented by the character itself, in which case the value of the character is
application-defined. Within a string, the double-quote character, the escape character, and the right
angle bracket character are escaped (preceded by the escape character) to be interpreted as the
character itself. Outside strings, the characters
, ; < > escape_char
are escaped by the escape character to be interpreted as the character itself.
EXAMPLE c ç “May”
(3) A character can be represented as an octal constant. An octal constant is specified as the escape
character followed by two or more octal digits. Each constant represents a byte value.
EXAMPLE \143; \347; “\115”
(4) A character can be represented as a hexadecimal constant. A hexadecimal constant is specified as the
escape character followed by an x followed by two or more hexadecimal digits. Each constant represents
a byte value.
EXAMPLE \x63;\xe7;
(5) A character can be represented as a decimal constant. A decimal constant is specified as the escape
character followed by a d followed by two or more decimal digits. Each constant represents a byte value.
EXAMPLE \d99; \d231;
(6) Multibyte characters can be represented by concatenated constants specified in byte order with the
last constant specifying the least significant byte of the character. Concatenated constants can include a
mix of the above character representations.
EXAMPLE \143\xe7; “\115\xe7\d171”
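The constant forms of items (3) to (6) can be decoded mechanically. The Python sketch below is a hypothetical decoder for a run of concatenated constants. It assumes 8-bit bytes, and that the digits after the escape character (and after “x” or “d”) are consumed greedily up to the next non-digit, so “\xe7d” would be read as a single hexadecimal constant.

```python
import re


def decode_constants(text: str, escape: str = "\\") -> bytes:
    """Decode concatenated octal, hexadecimal and decimal constants (items 3 to 6)."""
    constant = re.compile(re.escape(escape)
                          + r"(?:x([0-9A-Fa-f]{2,})|d([0-9]{2,})|([0-7]{2,}))")
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = constant.match(text, pos)
        if m is None:
            raise ValueError(f"no constant at offset {pos}: {text[pos:]!r}")
        hex_digits, dec_digits, oct_digits = m.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif dec_digits is not None:
            value = int(dec_digits, 10)
        else:
            value = int(oct_digits, 8)
        if value > 0xFF:
            raise ValueError(f"{m.group(0)!r} does not fit in one 8-bit byte")
        out.append(value)   # bytes stay in source order: the last is least significant
        pos = m.end()
    return bytes(out)
```

Because the constants are given in byte order, int.from_bytes(decode_constants(r"\143\xe7"), "big") yields 0x63E7. The three examples for items (3) to (5) all decode the character ç to the same byte, 0xE7.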
Only characters existing in the character set for which the FDCC-set definition is created are specified,
whether using symbolic names, the characters themselves, or octal, decimal, or hexadecimal constants.
If a charmap is present, only characters defined in the charmap can be specified using octal, decimal, or
hexadecimal constants. Symbolic names not present in the charmap can be specified and are ignored, as
specified under item (1) above.
NOTE The symbolic character notation is recommended for specifying all characters in a FDCC-set,
to facilitate portability of the FDCC-sets, as the coded character set of the application of the FDCC-set
may be different from the coded character set of the FDCC-set source. This is also recommended for format
effectors in strings, such as in LC_TIME or LC_ADDRESS, where the format effectors are allowed to be stored
together with the rest of the string, in a binary string with a different encoding from that of the source FDCC-set.
4.1.2 Continuation of lines
A line in a specification can be continued by placing an escape character as the last visible graphic
character on the line; this continuation character is discarded from the input. The line is continued to
the next non-comment line.
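Line continuation can be applied before any other parsing. The Python sketch below is a hypothetical implementation of this rule: a trailing escape character (ignoring trailing blanks) is discarded and the next non-comment line is appended. Comment lines that are not inside a continuation are passed through unchanged. An escaped escape character at the end of a line is not distinguished from a continuation; that simplification is an assumption of the sketch.

```python
def join_continued_lines(lines: list[str], escape: str = "\\",
                         comment: str = "#") -> list[str]:
    """Join physical lines into logical lines according to 4.1.2 (a sketch)."""
    logical: list[str] = []
    pending = None                     # text accumulated from continued lines
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(comment):
            if pending is None:
                logical.append(line)   # ordinary comment line: passed through
            continue                   # inside a continuation: skipped
        visible = line.rstrip()
        if visible.endswith(escape):
            # Discard the continuation character and keep collecting.
            piece = visible[: -len(escape)]
            pending = piece if pending is None else pending + piece
            continue
        logical.append(line if pending is None else pending + line)
        pending = None
    if pending is not None:            # input ended in the middle of a continuation
        logical.append(pending)
    return logical
```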
4.1.3 Names for copy keyword
In most of the categories a “copy” keyword is allowed. The name specified with this copy keyword is
one of:
— “i18n”, which indicates the “i18n” FDCC-set defined in this specification,
— the name of a FDCC-set or POSIX locale registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context - not being recommended as an
international specification.
4.1.4 Pre-category statements
In a FDCC-set the following statements can precede category specifications, and they apply to all
categories in the specified FDCC-set.
4.1.4.1 comment_char
The following line in a FDCC-set modifies the comment character. It has the following syntax, starting
in column 1:
“comment_char %c\n”,
The comment character defaults to the number sign (#). All examples in this Technical Report use
“%” as the comment character, except where otherwise noted. Blank lines and lines containing the
comment character in the first position are ignored. In collating statements, a comment character
occurring where the delimiter “;” may occur terminates the collating statement.
4.1.4.2 escape_char
The following line in a FDCC-set modifies the escape character to be used in the text. It has the following
syntax, starting in column 1:
“escape_char %c\n”,
The escape character is used in the character representations described in 4.1.1 and for continuing lines as described in 4.1.2.
© ISO/IEC 2014 – All rights reserved 9
The escape character defaults to backslash “\”. All examples in this Technical Report use “/” as the
escape character, except where otherwise noted.
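A sketch of how a reader might pick up the comment_char and escape_char statements before the first category is shown below. It assumes that the pre-category statements end at the first line beginning with "LC_", and that a comment_char statement takes effect for the lines that follow it; read_special_chars is a hypothetical name.

```python
def read_special_chars(lines):
    """Return (comment_char, escape_char) from the pre-category statements."""
    comment_char, escape_char = "#", "\\"  # defaults given in 4.1.4.1 and 4.1.4.2
    for line in lines:
        if line.startswith("LC_"):
            break  # assumed: pre-category statements end at the first category
        if not line.strip() or line.startswith(comment_char):
            continue  # blank lines and comment lines are ignored
        keyword, _, operand = line.partition(" ")  # statements start in column 1
        value = operand.strip()
        if keyword in ("comment_char", "escape_char"):
            if len(value) != 1:
                raise ValueError(f"{keyword} takes exactly one character: {line!r}")
            if keyword == "comment_char":
                comment_char = value
            else:
                escape_char = value
    return comment_char, escape_char
```

For the conventions used in this Technical Report's examples, the lines "comment_char %" and "escape_char /" yield the pair ("%", "/").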
4.1.4.3 repertoiremap
The following line in a FDCC-set specifies the name of a repertoiremap used to define the symbolic
character names in the FDCC-set. There may be at most one “repertoiremap” line. It has the following
syntax, starting in column 1:
“repertoiremap %s\n”,
The name is one of:
— “i18nrep”, which indicates the “i18nrep” repertoiremap defined in this specification,
— the name of a repertoiremap registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context; such names are not recommended
for international specifications.
4.1.4.4 charmap
The following line in a FDCC-set specifies the name of a charmap which may be used with the FDCC-set.
It has the following syntax, starting in column 1:
“charmap %s\n”,
This keyword is a hint as to which charmaps the FDCC-set is intended to be used with. More than one
charmap specification may be useful with a FDCC-set; it is the application’s responsibility to decide
which charmap specification is to be used with that application.
The name is one of:
— the name of a charmap registered by the process defined in ISO/IEC 15897,
— any other name which may be recognized in some local context; such names are not recommended
for international specifications.
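The repertoiremap and charmap statements could be collected as in the following sketch. It enforces the "at most one repertoiremap" rule and, as an assumption, accepts several charmap lines because more than one charmap may be useful with a FDCC-set. The function name read_map_hints and the charmap names in the usage note are illustrative only.

```python
def read_map_hints(lines):
    """Collect the repertoiremap name and the charmap hints of a FDCC-set."""
    repertoiremap = None
    charmaps = []
    for line in lines:
        if line.startswith("LC_"):
            break  # assumed: pre-category statements end at the first category
        keyword, _, operand = line.partition(" ")  # statements start in column 1
        name = operand.strip()
        if keyword == "repertoiremap":
            if repertoiremap is not None:
                raise ValueError("at most one repertoiremap line is allowed")
            repertoiremap = name
        elif keyword == "charmap":
            charmaps.append(name)  # only a hint; the application chooses the charmap
    return repertoiremap, charmaps
```

For example, the lines "repertoiremap i18nrep", "charmap ISO-8859-1" and "charmap UTF-8" give ("i18nrep", ["ISO-8859-1", "UTF-8"]), leaving the final charmap choice to the application.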
4.2 LC_IDENTIFICATION
The LC_IDENTIFICATION category defines properties of the FDCC-set and the specification methods to which
the FDCC-set conforms. Values must be supplied for all keywords unless otherwise noted, and the operands
are strings. The following keywords are defined:
— title Title of the FDCC-set.
— source Name of the organization providing the source.
— address Organization postal address.
— contact Name of contact person. This keyword is optional.
— email Electronic mail address of the organization or of the contact person. This keyword is optional.
— tel Telephone number for the organization, in international format. This keyword is optional.
— fax Fax number for the organization, in international format. This keyword is optional.
— language Natural language to which the FDCC-set applies, as specified in ISO 639. If a two-letter
code exists for this language, it is used; otherwise the three-letter code is used. This keyword is optional.
— territory The geographic extent where the FDCC-set applies (where applicable), as the two-letter code
of ISO 3166. This keyword is optional.
— script Script that the FDCC-set especially uses, as defined by ISO 15924 and its registry. This
keyword is optional.
— audience If not for general use, an indication of the intended user audience. This keyword is optional.
— application If intended for a special application, a description of the application. This keyword is
optional.
— abbreviation Short name of the provider of the source. This keyword is optional.
— revision Revision number consisting of digits and zero or more full stops (“.”).
— date Revision date in the format shown by this example: “1995-02-05”, meaning the 5th of
February 1995.
If the required information is not present in ISO 639 or ISO 3166, an empty string should be given, and
the relevant Maintenance Authority should be approached to have the needed item registered.
NOTE Only one language per territory can be addressed with a single FDCC-set; an additional FDCC-set is
required for each additional language for that territory.
category Indicates that a category is present and the specification to which the category claims
conformance. The first operand is a string in double quotes that identifies that specification; the
following values are defined:
— “i18n:2004”
— “i18n:2012”
— “posix:1993”
The second operand is a string with the category name, using the category names defined in clause 4.
More than one “category” keyword may be given, but only one per category name.
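The following Python sketch checks a dictionary of LC_IDENTIFICATION values against the keyword rules above: presence of the keywords not marked optional, the revision and date formats, and the shape of the language and territory codes. Reading "digits and zero or more full stops" as dot-separated groups of digits is an assumption, as is checking only the shape of the codes rather than their registration in ISO 639 or ISO 3166; check_identification is a hypothetical name.

```python
import re
from datetime import date

# Keywords not marked optional above.
REQUIRED = ("title", "source", "address", "revision", "date")
_REVISION = re.compile(r"[0-9]+(?:\.[0-9]+)*")  # assumed: no empty groups such as "1."
_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def check_identification(fields: dict[str, str]) -> list[str]:
    """Return a list of problems found in LC_IDENTIFICATION keyword values."""
    problems = [f"missing required keyword: {k}" for k in REQUIRED if k not in fields]
    revision = fields.get("revision")
    if revision is not None and not _REVISION.fullmatch(revision):
        problems.append(f"malformed revision: {revision!r}")
    when = fields.get("date")
    if when is not None:
        ok = _DATE.fullmatch(when) is not None
        if ok:
            try:
                date.fromisoformat(when)  # rejects impossible dates such as 2010-02-30
            except ValueError:
                ok = False
        if not ok:
            problems.append(f"malformed date: {when!r}")
    # An empty string is allowed when no code is registered yet.
    language = fields.get("language", "")
    if language and not re.fullmatch(r"[a-z]{2,3}", language):
        problems.append(f"language is not an ISO 639 style code: {language!r}")
    territory = fields.get("territory", "")
    if territory and not re.fullmatch(r"[A-Z]{2}", territory):
        problems.append(f"territory is not a two-letter ISO 3166 code: {territory!r}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a checker report every defect in a FDCC-set in a single pass.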
The “i18n” LC_IDENTIFICATION category is:
LC_IDENTIFICATION
% This is the ISO/IEC TR 30112 “i18n” definition for
% the LC_IDENTIFICATION category.
%
title "ISO/IEC TR 30112 i18n FDCC-set"
source "ISO/IEC Copyright Office"
address "Case postale 56, CH-1211 Geneve 20, Switzerland"
contact ""
email ""
tel ""
fax ""
language ""
territory ""
revision "1.1"
date "2010-07-30"
%
category "i18n:2004";LC_IDENTIFICATION
category "i18n:2012";LC_CTYPE
category "i18n:2004";LC_COLLATE
category "i18n:2004";LC_TIME
category "i18n:2004";LC_NUMERIC
category "i18n:2004";LC_MONETARY
category "i18n:2004";LC_MESSAGES
category "i18n:2004";LC_NAME
category "i18n:2004";LC_ADDRESS
category "i18n:2004";LC_TELEPHONE
category "i18n:2012";LC_PAPER
category "i18n:2012";LC_MEASUREMENT
category "i18n:2012";LC_KEYBOARD
END LC_IDENTIFICATION
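The category lines of the "i18n" definition above can be checked with a sketch such as the following. It enforces the defined specification values and the "only one per category name" rule. It assumes plain ASCII double quotes around the first operand, and it takes the category names from the "i18n" definition; parse_category_lines is a hypothetical name.

```python
SPECIFICATIONS = {"i18n:2004", "i18n:2012", "posix:1993"}
CATEGORIES = {
    "LC_IDENTIFICATION", "LC_CTYPE", "LC_COLLATE", "LC_TIME", "LC_NUMERIC",
    "LC_MONETARY", "LC_MESSAGES", "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE",
    "LC_PAPER", "LC_MEASUREMENT", "LC_KEYBOARD",
}


def parse_category_lines(lines):
    """Map each category name to the specification it claims conformance to."""
    claims = {}
    for line in lines:
        keyword, _, rest = line.strip().partition(" ")
        if keyword != "category":
            continue  # other LC_IDENTIFICATION keywords are not handled here
        spec_text, _, name = rest.partition(";")
        spec = spec_text.strip().strip('"')  # assumes ASCII double quotes
        name = name.strip()
        if spec not in SPECIFICATIONS:
            raise ValueError(f"unknown specification {spec!r}")
        if name not in CATEGORIES:
            raise ValueError(f"unknown category name {name!r}")
        if name in claims:
            raise ValueError(f"more than one category keyword for {name}")
        claims[name] = spec
    return claims
```

Feeding it the thirteen category lines of the "i18n" definition yields one entry per category, for example "LC_CTYPE" mapped to "i18n:2012".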
4.3 LC_CTYPE
The LC_CTYPE category defines character classification, case conversion, character transformation, and
other character attribute mappings. Support for the portable character set is required.
A series of characters in a specification can be represented by the hexadecimal symbolic ellipsis
“..” (two dots), the decimal symbolic ellipsis “....” (four dots), the double increment hexadecimal
symbolic ellipsis “..(2)..”, or the absolute ellipsis “...” (three dots).
The hexadecimal symbolic ellipsis (“..”) specification is only valid between symbolic character names.
The symbolic names consist of zero or more nonnumeric characters from the set shown with visible
glyphs in Table 1 of clause 3.2.2, followed by an integer formed by one or more hexadecimal digits, using
uppercase letters only for the range “A” to “F”. The characters preceding the hexadecimal integer are
identical in the two symbolic names, and the integer formed by the hexadecimal digits in the second
symbolic name is identical to or greater than the integer formed by the hexadecimal digits in the first
name. This is interpreted as a series of symbolic names formed from the common part and each of the
integers between the first and the second integer, inclusive, written in hexadecimal using uppercase
letters only; each generated symbolic name has the same length as the first (and also the second)
symbolic name. As an example, . is interpreted as the symbolic
names , , , and , in that order.
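The hexadecimal symbolic ellipsis can be sketched in Python as below, using illustrative endpoints of its own choosing ("<U010E>" and "<U0111>"). Treating the hexadecimal part as the longest trailing run of digits and uppercase "A" to "F" is an assumption that matters only when the common prefix itself ends in one of those letters; expand_hex_ellipsis is a hypothetical name.

```python
import re

# "<", zero or more non-digit characters, then uppercase hexadecimal digits, then ">".
# The lazy prefix makes the hexadecimal part the longest trailing run of [0-9A-F].
_HEX_NAME = re.compile(r"<([^0-9<>]*?)([0-9A-F]+)>")


def expand_hex_ellipsis(first: str, last: str) -> list[str]:
    """Expand first..last into the inclusive series of symbolic names."""
    m1, m2 = _HEX_NAME.fullmatch(first), _HEX_NAME.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("endpoints must be symbolic names ending in uppercase hex digits")
    prefix, digits = m1.groups()
    if m2.group(1) != prefix or len(first) != len(last):
        raise ValueError("endpoints must share their prefix and have equal length")
    low, high = int(digits, 16), int(m2.group(2), 16)
    if high < low:
        raise ValueError("the second endpoint must not be lower than the first")
    width = len(digits)  # generated names keep the length of the endpoints
    return [f"<{prefix}{n:0{width}X}>" for n in range(low, high + 1)]
```

Zero-padding to the width of the first endpoint is what keeps every generated name the same length as the endpoints, so "<U010E>..<U0111>" yields "<U010E>", "<U010F>", "<U0110>" and "<U0111>".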
The decimal symbolic ellipsis (“....”) specification is only valid between symbolic character names. The
symbolic names consist of zero or more nonnumeric characters from the set shown with visible glyphs
in Table 1 of clause 3.2.2, followed by an integer formed by one or more decimal digits. The characters
preceding the decimal integer are identical in the two symbolic names, and the integer formed by the
decimal digits in the second symbolic name is identical to or greater than the integer formed by the
decimal digits in the first name. This is interpreted as a series of symbolic names formed from the
common part and each of the integers in decimal format between the first and the second integer,
inclusive, and with a length of the symbolic names generated that is equal to the length of the first (and
also the second) symbolic name. As an example, . is interpreted as the symbolic names
, , , and , in that order.
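A corresponding sketch for the decimal symbolic ellipsis follows. The endpoints "<j0098>" and "<j0101>" are illustrative choices, picked so that the series crosses a decimal carry; expand_decimal_ellipsis is a hypothetical name.

```python
import re

# "<", zero or more non-digit characters, one or more decimal digits, ">".
_DEC_NAME = re.compile(r"<([^0-9<>]*)([0-9]+)>")


def expand_decimal_ellipsis(first: str, last: str) -> list[str]:
    """Expand first....last into the inclusive series of symbolic names."""
    m1, m2 = _DEC_NAME.fullmatch(first), _DEC_NAME.fullmatch(last)
    if m1 is None or m2 is None:
        raise ValueError("endpoints must be symbolic names ending in decimal digits")
    prefix, digits = m1.groups()
    if m2.group(1) != prefix or len(first) != len(last):
        raise ValueError("endpoints must share their prefix and have equal length")
    low, high = int(digits), int(m2.group(2))
    if high < low:
        raise ValueError("the second endpoint must not be lower than the first")
    width = len(digits)  # zero-pad so every generated name has the endpoints' length
    return [f"<{prefix}{n:0{width}d}>" for n in range(low, high + 1)]
```

Here "<j0098>....<j0101>" yields "<j0098>", "<j0099>", "<j0100>" and "<j0101>", whereas the same endpoints read as hexadecimal would also include names such as "<j009A>".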
The double increment hexadecimal symbolic ellipsis (“..(2)..”) works like the hexadecimal symbolic
ellipsis, but generates only every second symbolic character name. As an example. .
(2). is interpreted as the symbolic character names , , , and
, in that order.
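The three symbolic ellipsis forms can be recognized and expanded together, as in the following sketch. The operator is matched longest first so that "..(2).." and "...." are not mistaken for "..". Two behaviours are assumptions not stated in the text: when the distance between the endpoints is odd, the double increment form does not produce the second endpoint, and the sketch expects no spaces around the operator. The name expand_range and the example endpoints are hypothetical.

```python
import re

# Operator alternatives are ordered longest first.
_RANGE = re.compile(r"(<[^<>]+>)(\.\.\(2\)\.\.|\.\.\.\.|\.\.)(<[^<>]+>)")
_FORMS = {
    "..": (r"<([^0-9<>]*?)([0-9A-F]+)>", 16, 1),       # hexadecimal symbolic ellipsis
    "....": (r"<([^0-9<>]*)([0-9]+)>", 10, 1),          # decimal symbolic ellipsis
    "..(2)..": (r"<([^0-9<>]*?)([0-9A-F]+)>", 16, 2),   # double increment hexadecimal
}


def expand_range(spec: str) -> list[str]:
    """Expand a symbolic ellipsis such as <U0100>..(2)..<U0106>."""
    m = _RANGE.fullmatch(spec)
    if m is None:
        # Also rejects the absolute ellipsis "...", which needs encoded values.
        raise ValueError(f"not a symbolic ellipsis: {spec!r}")
    first, operator, last = m.groups()
    pattern, base, step = _FORMS[operator]
    m1, m2 = re.fullmatch(pattern, first), re.fullmatch(pattern, last)
    if m1 is None or m2 is None or m1.group(1) != m2.group(1) or len(first) != len(last):
        raise ValueError("endpoints must share their prefix and have equal length")
    prefix, width = m1.group(1), len(m1.group(2))
    low, high = int(m1.group(2), base), int(m2.group(2), base)
    if high < low:
        raise ValueError("the second endpoint must not be lower than the first")
    digit_format = "X" if base == 16 else "d"
    # With step 2 the series starts at the first endpoint and takes every second name.
    return [f"<{prefix}{n:0{width}{digit_format}}>" for n in range(low, high + 1, step)]
```

For example, "<U0100>..(2)..<U0106>" expands to "<U0100>", "<U0102>", "<U0104>" and "<U0106>", a pattern typical of alternating upper-case and lower-case letter pairs.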
The absolute ellipsis specification is only valid within a single encoded character set. An ellipsis is
interpreted as including in the list all characters with an encoded value higher than the encoded value
of the character preceding the ellipsis and lower than the encoded value of the character following the
ellipsis. The absolute ellipsis specification is deprecated, as this is only relevant to FDCC-sets not using
symbolic characters.
NOTE \x30;...;\x39 includes in the character class all characters with encoded values between the endpoints.
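The deprecated absolute ellipsis depends on encoded values, so any sketch has to fix an encoding. The following Python version assumes a single-byte coded character set named by a Python codec (Latin-1 by default) and skips code positions that the character set leaves unassigned. expand_absolute_ellipsis is a hypothetical name.

```python
def expand_absolute_ellipsis(first: str, last: str, encoding: str = "latin-1") -> list[str]:
    """Expand first...last by encoded value in a single-byte coded character set."""
    # encode() raises UnicodeEncodeError if an endpoint is not in the character set.
    low_bytes, high_bytes = first.encode(encoding), last.encode(encoding)
    if len(low_bytes) != 1 or len(high_bytes) != 1:
        raise ValueError("this sketch handles single-byte encodings only")
    low, high = low_bytes[0], high_bytes[0]
    if high < low:
        raise ValueError("the second endpoint must not have a lower encoded value")
    if low == high:
        return [first]
    between = []
    for value in range(low + 1, high):  # strictly between the two endpoints
        try:
            between.append(bytes([value]).decode(encoding))
        except UnicodeDecodeError:
            continue  # code position not assigned in this character set
    return [first, *between, last]
```

With the endpoints of the NOTE, \x30 and \x39, the result is the ten digits "0" to "9". The same call against a different coded character set can give a different list, which is why symbolic names are preferred.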
4.3.1 Character classification keywords
The following keywords are recognized. In the descriptions, the term “automatically included” means
that it is not an error to either include the
...
P LATIN CAPITAL LETTER P
Q LATIN CAPITAL LETTER Q
R LATIN CAPITAL LETTER R
S LATIN CAPITAL LETTER S
T LATIN CAPITAL LETTER T
U LATIN CAPITAL LETTER U
V LATIN CAPITAL LETTER V
W LATIN CAPITAL LETTER W
X LATIN CAPITAL LETTER X
Y LATIN CAPITAL LETTER Y
Z LATIN CAPITAL LETTER Z
[ LEFT SQUARE BRACKET
\ REVERSE SOLIDUS
\ REVERSE SOLIDUS
] RIGHT SQUARE BRACKET
^ CIRCUMFLEX ACCENT
^ CIRCUMFLEX ACCENT
_ LOW LINE
_ LOW LINE
` GRAVE ACCENT
a LATIN SMALL LETTER A
b LATIN SMALL LETTER B
c LATIN SMALL LETTER C
d LATIN SMALL LETTER D
e LATIN SMALL LETTER E
f LATIN SMALL LETTER F
Table 1 (continued)
Symbolic name Glyph UCS Description
g LATIN SMALL LETTER G
h LATIN SMALL LETTER H
i LATIN SMALL LETTER I
j LATIN SMALL LETTER J
k LATIN SMALL LETTER K
l LATIN SMALL LETTER L
m LATIN SMALL LETTER M
n LATIN SMALL LETTER N
o LATIN SMALL LETTER O









