ISO/IEC 10646-1:1993
Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane
Technologies de l'information — Jeu universel de caractères codés à plusieurs octets — Partie 1: Architecture et table multilingue
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 10646-1
First edition
1993-05-01
Information technology - Universal Multiple-Octet Coded Character Set (UCS) -
Part 1:
Architecture and Basic Multilingual Plane
Technologies de l'information - Jeu universel de caractères codés à plusieurs octets -
Partie 1: Architecture et table multilingue
Reference number
ISO/IEC 10646-1:1993(E)
Contents
1 Scope
2 Conformance
3 Normative references
4 Definitions
5 General structure of the UCS
6 Basic structure and nomenclature
7 Special features of the UCS
8 The Basic Multilingual Plane
9 Other planes
10 The Restricted Use zone
11 Private Use groups and planes
12 Revision and updating of the UCS
13 Subsets
14 Coded representation forms of the UCS
15 Implementation levels
16 Use of control functions with the UCS
17 Declaration of identification of features
18 Structure of the code tables and lists
19 Block names
20 Characters in bi-directional context
21 Special characters
22 Order of characters
23 Combining characters
24 Hangul syllable composition method
25 Code tables and lists of character names
26 CJK unified ideographs
Annexes
A Collections of graphic characters for subsets
B List of combining characters
C Mirrored characters in Arabic bi-directional context
D Alternate format characters
E Alphabetically sorted list of character names
F The use of “signatures” to identify UCS
G UCS transformation format (UTF-1)
H Recommendation for combined receiving/originating devices with internal storage
J Notations of octet value representations
K Character naming guidelines
L Sources of characters
M External references to character repertoires
N Scripts under consideration for future editions of ISO/IEC 10646
© ISO/IEC 1993
All rights reserved. No part of this publication may be reproduced or utilized in any form or
by any means, electronic or mechanical, including photocopying and microfilm, without
permission in writing from the publisher.
ISO/IEC Copyright Office • Case Postale 56 • CH-1211 Genève 20 • Switzerland
Printed in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) form the specialized system for
worldwide standardization. National bodies that are members of ISO or
IEC participate in the development of International Standards through
technical committees established by the respective organization to deal
with particular fields of technical activity. ISO and IEC technical
committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC,
also take part in the work.
In the field of information technology, ISO and IEC have established a joint
technical committee, ISO/IEC JTC 1. Draft International Standards adopted
by the joint technical committee are circulated to national bodies for
voting. Publication as an International Standard requires approval by at least
75 % of the national bodies casting a vote.
International Standard ISO/IEC 10646-1 was prepared by Joint Technical
Committee ISO/IEC JTC 1, Information technology, Sub-Committee SC 2,
Character sets and information coding.
ISO/IEC 10646 consists of the following parts, under the general title
Information technology - Universal Multiple-Octet Coded Character Set
(UCS):
- Part 1: Architecture and Basic Multilingual Plane
Additional parts will specify other planes.
Annexes A and B form an integral part of this part of ISO/IEC 10646.
Annexes C to N are for information only.
Introduction
ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set
(UCS). It is applicable to the representation, transmission, interchange,
processing, storage, input and presentation of the written form of the
languages (scripts) of the world as well as additional symbols.
This part of ISO/IEC 10646 specifies the overall architecture and the Basic
Multilingual Plane (BMP) of the UCS.
INTERNATIONAL STANDARD ISO/IEC 10646-1:1993(E)
Information technology - Universal Multiple-Octet
Coded Character Set (UCS) -
Part 1:
Architecture and Basic Multilingual Plane
1 Scope
ISO/IEC 10646 specifies the Universal Multiple-Octet
Coded Character Set (UCS). It is applicable to the
representation, transmission, interchange,
processing, storage, input and presentation of the
written form of the languages of the world as well as
additional symbols.
This part of ISO/IEC 10646 specifies the overall
architecture, and
- defines terms used in ISO/IEC 10646;
- describes the general structure of the coded
character set;
- specifies the Basic Multilingual Plane (BMP) of the
UCS, and defines a set of graphic characters used in
scripts and the written form of languages on a
world-wide scale;
- specifies the names for the graphic characters of
the BMP, and the coded representations;
- specifies the four-octet (32-bit) canonical form of
the UCS: UCS-4;
- specifies a two-octet (16-bit) BMP form of the UCS:
UCS-2;
- specifies the coded representations for control
functions;
- specifies the management of future additions to this
coded character set.
The UCS is a coding system different from that
specified in ISO 2022. The method to designate
UCS from ISO 2022 is specified in 17.2.
2 Conformance
2.1 General
Whenever Private Use characters are used as
specified in ISO/IEC 10646, the characters
themselves shall not be covered by these
conformance requirements.
2.2 Conformance of information interchange
A coded-character-data-element (CC-data-element)
within coded information for interchange is in
conformance with ISO/IEC 10646 if
a) all the coded representations of graphic
characters within that CC-data-element conform to
clauses 6 and 7, to an identified form chosen from
clause 14, and to an identified implementation level
chosen from clause 15;
b) all the graphic characters represented within that
CC-data-element are taken from those within an
identified subset (clause 13);
c) all the coded representations of control functions
within that CC-data-element conform to clause 16.
A claim of conformance shall identify the adopted
form, the adopted implementation level and the
adopted subset by means of a list of collections
and/or characters.
2.3 Conformance of devices
A device is in conformance with ISO/IEC 10646 if it
conforms to the requirements of item a) below, and
either or both of items b), and c).
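As an illustration only, the subset condition of 2.2 b) and the device rule of 2.3 (item a) together with b), c), or both) might be sketched in Python as follows. The names `ConformanceClaim`, `graphic_characters_in_subset` and `device_conforms` are hypothetical and do not appear in ISO/IEC 10646. Modelling the adopted subset as a flat set of integer code positions is an assumption. The conditions that refer to clauses 6, 7, 14, 15 and 16 are not checked here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ConformanceClaim:
    """Hypothetical record of what a claim of conformance identifies (2.2)."""
    form: str               # adopted form, for example "UCS-2" or "UCS-4"
    level: int              # adopted implementation level (clause 15)
    subset: FrozenSet[int]  # adopted subset, modelled as code positions


def graphic_characters_in_subset(code_positions: Iterable[int],
                                 claim: ConformanceClaim) -> bool:
    """Condition 2.2 b): every graphic character is in the identified subset."""
    return all(cp in claim.subset for cp in code_positions)


def device_conforms(meets_a: bool, meets_b: bool, meets_c: bool) -> bool:
    """2.3: item a) is required, together with b), c), or both."""
    return meets_a and (meets_b or meets_c)


# Example claim whose subset is an arbitrary illustrative range of positions.
claim = ConformanceClaim(form="UCS-2", level=1,
                         subset=frozenset(range(0x20, 0x7F)))
print(graphic_characters_in_subset([0x41, 0x42], claim))
print(device_conforms(meets_a=True, meets_b=False, meets_c=True))
```

The sketch keeps the claim immutable so that the adopted form, level and subset are fixed once stated, which mirrors how a claim identifies them up front.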
NOTE - The term device is defined (in 4.17) as a
component of information processing equipment which can
transmit and/or receive coded information within
CC-data-elements. A device may be a conventional
input/output device, or a process such as an application
program or gateway function.
A claim of conformance shall identify the document
that contains the description specified in a) below,
and shall identify the adopted form(s), the adopted
implementation level, the adopted subset (by means
of a list of collections and/or characters), and the
selection of control functions adopted in accordance
with clause 16.
a) Device description: A device that conforms to
ISO/IEC 10646 shall be the subject of a description
that identifies the means by which the user may
supply characters to the device and/or may
recognise them when they are made available to the
user, as specified respectively, in subclauses b), and
c) below.
b) Originating device: An originating device shall
allow its user to supply any characters from an
adopted subset, and be capable of transmitting their
coded representations within a CC-data-element in
accordance with the adopted form and
implementation level.
c) Receiving device: A receiving device shall be
capable of receiving and interpreting any coded
representation of characters that are within a
CC-data-element in accordance with the adopted
form and implementation level, and shall make any
corresponding characters from the adopted subset
available to the user in such a way that the user can
identify them.
Any corresponding characters that are not within the
adopted subset shall be indicated to the user in a
way which need not allow them to be distinguished
from each other.
NOTES
1 An indication to the user may consist of making available
the same character to represent all characters not in the
adopted subset, or providing a distinctive audible or visible
signal when appropriate to the type of user.
2 See also annex H for receiving devices with
re-transmission capability.
3 Normative references
The following standards contain provisions which,
through reference in this text, constitute provisions of
this part of ISO/IEC 10646. At the time of publication,
the editions indicated were valid. All standards are
subject to revision, and parties to agreements based
on this part of ISO/IEC 10646 are encouraged to
investigate the possibility of applying the most recent
editions of the standards listed below. Members of
IEC and ISO maintain registers of currently valid
International Standards.
ISO 2022:1986 Information processing - ISO 7-bit
and 8-bit coded character sets - Code extension
techniques.
ISO/IEC 6429:1992 Information technology -
Control functions for coded character sets.
4 Definitions
For the purposes of ISO/IEC 10646, the following
definitions apply:
4.1 Basic Multilingual Plane (BMP): Plane 00 of
Group 00.
4.2 block: A contiguous collection of characters that
share common characteristics, such as script.
4.3 canonical form: The form with which characters
of this coded character set are specified using four
octets to represent each character.
4.4 CC-data-element (Coded-Character-Data-Element):
An element of interchanged information
that is specified to consist of a sequence of coded
representations of characters, in accordance with
one or more identified standards for coded character
sets.
4.5 cell: The place within a row at which an
individual character may be allocated.
4.6 character: A member of a set of elements used
for the organisation, control, or representation of
data.
4.7 character boundary: Within a stream of octets
the demarcation between the last octet of the coded
representation of a character and the first octet of
that of the next coded character.
4.8 coded character: A character together with its
coded representation.
4.9 coded character set: A set of unambiguous
rules that establishes a character set and the
relationship between the characters of the set and
their coded representation.
4.10 code table: A table showing the characters
allocated to the octets in a code.
4.11 combining character: A member of an
identified subset of the coded character set of
ISO/IEC 10646 intended for combination with the
preceding non-combining graphic character, or with
a sequence of combining characters preceded by a
non-combining character (see also 4.13).
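As a rough illustration of definitions 4.1, 4.3, 4.5 and 4.7, the Python sketch below splits a four-octet canonical value into its group, plane, row and cell octets, tests for the BMP, and cuts a fixed-width octet stream at its character boundaries. The names `CodePosition`, `split_canonical`, `is_bmp` and `split_at_boundaries` are hypothetical. The octet layout (group, plane, row, cell in order of decreasing significance) and the most-significant-octet-first ordering are assumptions here: the structure is specified in clause 6, which this sample does not reproduce.

```python
from typing import List, NamedTuple


class CodePosition(NamedTuple):
    """The four octets of a canonical-form (UCS-4) coded representation."""
    group: int
    plane: int
    row: int
    cell: int


def split_canonical(value: int) -> CodePosition:
    """Split a four-octet value into group, plane, row and cell octets.

    Assumption: the most significant octet is the group octet.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("canonical form uses exactly four octets")
    return CodePosition(
        group=(value >> 24) & 0xFF,
        plane=(value >> 16) & 0xFF,
        row=(value >> 8) & 0xFF,
        cell=value & 0xFF,
    )


def is_bmp(position: CodePosition) -> bool:
    """4.1: the BMP is Plane 00 of Group 00."""
    return position.group == 0 and position.plane == 0


def split_at_boundaries(octets: bytes, width: int) -> List[int]:
    """Cut an octet stream at character boundaries (4.7).

    Assumption: every character occupies `width` octets (2 for UCS-2,
    4 for UCS-4), most significant octet first.
    """
    if width not in (2, 4):
        raise ValueError("width must be 2 (UCS-2) or 4 (UCS-4)")
    if len(octets) % width != 0:
        raise ValueError("stream ends inside a coded representation")
    return [
        int.from_bytes(octets[i:i + width], "big")
        for i in range(0, len(octets), width)
    ]


position = split_canonical(0x00003042)
print(position, is_bmp(position))
print(split_at_boundaries(bytes([0x00, 0x41, 0x30, 0x42]), 2))
```

Because both forms named in the Scope are fixed-width, every character boundary falls at a multiple of the width, so the sketch needs no per-character inspection to find it.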
...