Information technology -- Universal Multiple-Octet Coded Character Set (UCS)

ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.
This document:
- specifies the architecture of ISO/IEC 10646,
- defines terms used in ISO/IEC 10646,
- describes the general structure of the coded character set;
- specifies the Basic Multilingual Plane (BMP) of the UCS,
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP),
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale;
- specifies the names for the graphic characters of the BMP, SMP, SIP, SSP and their coded representations;
- specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
- specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
- specifies the coded representations for control functions;
- specifies the management of future additions to this coded character set.
The UCS is a coding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in clause 16.2. A graphic character will be assigned only one code position in the standard, located either in the BMP or in one of the supplementary planes.

Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)

ISO/IEC 10646:2003 standardizes the Universal Multiple-Octet Coded Character Set (UCS). It applies to the representation, transmission, interchange, processing, storage, input and presentation of the written form of the languages of the world and of additional symbols.
ISO/IEC 10646:2003
- describes the architecture of ISO/IEC 10646:2003,
- defines the terms used in ISO/IEC 10646:2003,
- describes the general structure of the coded character set,
- describes the Basic Multilingual Plane (BMP) of the UCS,
- describes the supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP),
- defines a set of graphic characters used in the written form of languages on a world-wide scale,
- names the graphic characters of the BMP, SMP, SIP and SSP and establishes their coded representations,
- prescribes the four-octet (32-bit) canonical form of the UCS: UCS-4,
- specifies a two-octet (16-bit) BMP form of the UCS: UCS-2,
- establishes the coded representation of control functions, and
- establishes the management of future additions to this coded character set.
The UCS is a coding system different from that described in ISO/IEC 2022. A given graphic character will be assigned to only one code position in ISO/IEC 10646:2003, located either in the BMP or in one of the supplementary planes.
NOTE - Version 4.0 of Unicode defines a set of characters, names and coded representations identical to that of ISO/IEC 10646:2003. It additionally provides information on the properties of these characters and on processing algorithms, as well as definitions useful to developers.
By defining a consistent way of encoding multilingual text, ISO/IEC 10646:2003 enables the international exchange of data. The information technology industry gains data stability and better global interoperability. ISO/IEC 10646:2003 has been adopted by new Internet protocols and implemented in operating systems and computer languages. This edition contains more than 95 000 characters from the scripts of the world.

Informacijska tehnologija – Univerzalni večoktetni nabor znakov (UCS)

General Information

Status: Withdrawn
Publication Date: 15-Oct-2008
Withdrawal Date: 23-Oct-2013
Technical Committee:
Current Stage: 9900 - Withdrawal (Adopted Project)
Start Date: 24-Oct-2013
Due Date: 16-Nov-2013
Completion Date: 24-Oct-2013

Buy Standard

Standard
ISO/IEC 10646:2003 - Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
English language
1418 pages
Standard
ISO/IEC 10646:2008 - This is only Part 1 of the standard plus the SIST cover. The standard is extensive and consists of 12 documents (all labelled with the type "Dokument"). The page count and price apply to the entire SIST standard. The standard for sale is produced in electronic form on CD-ROM.
English language
1418 pages
Standard
ISO/IEC 10646:2003 - Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)
French language
1425 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO/IEC 10646
First edition
2003-12-15

Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)

Reference number
ISO/IEC 10646:2003(E)
© ISO/IEC 2003

PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

This CD-ROM contains
1) the publication ISO/IEC 10646:2003 in portable document format (PDF), which can be viewed using
Adobe® Acrobat® Reader;
2) text files containing lists of
i) source references for CJK ideographs,
ii) Hangul syllables and mapping information,
iii) alphabetically sorted character names.
Adobe and Acrobat are trademarks of Adobe Systems Incorporated.

This first edition cancels and replaces ISO/IEC 10646-1:2000 and ISO/IEC 10
...

SLOVENIAN STANDARD
SIST ISO/IEC 10646:2008
01-November-2008
Informacijska tehnologija – Univerzalni večoktetni nabor znakov (UCS)
Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)
This Slovenian standard is identical to: ISO/IEC 10646:2003
ICS: 35.040 Character sets and information coding (Nabori znakov in kodiranje informacij)
SIST ISO/IEC 10646:2008 en,fr
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.

©  ISO/IEC 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 General structure of the UCS
6 Basic structure and nomenclature
7 General requirements for the UCS
8 The Basic Multilingual Plane
9 Supplementary planes
10 Private use groups, planes, and zones
11 Revision and updating of the UCS
12 Subsets
13 Coded representation forms of the UCS
14 Implementation levels
15 Use of control functions with the UCS
16 Declaration of identification of features
17 Structure of the code tables and lists
18 Block names
19 Characters in bi-directional context
20 Special characters
21 Presentation forms of characters
22 Compatibility characters
23 Order of characters
24 Normalization forms
25 Combining characters
26 Special features of individual scripts
27 Source references for CJK Ideographs
28 Character names and annotations
29 Structure of the Basic Multilingual Plane
30 Structure of the Supplementary Multilingual Plane for Scripts and symbols
31 Structure of the Supplementary Ideographic Plane
32 Supplementary Special-purpose Plane
33 Code tables and lists of character names
NOTE  The code tables and lists of character names are given on pages 29-1348. They are contained in separate files which are accessed by clicking on the appropriate highlighted text in Clause 33.

Annexes
A (normative) Collections of graphic characters for subsets
B (normative) List of combining characters
C (normative) Transformation format for 16 planes of Group 00 (UTF-16)
D (normative) UCS Transformation Format 8 (UTF-8)
E (informative) Mirrored characters in Arabic bi-directional context
F (informative) Alternate format characters
G (informative) Alphabetically sorted list of character names
H (informative) The use of “signatures” to identify UCS
J (informative) Recommendation for combined receiving/originating devices with internal storage
K (informative) Notations of octet value representations
L (informative) Character naming guidelines
M (informative) Sources of characters
N (informative) External references to character repertoires
P (informative) Additional information on characters
Q (informative) Code mapping table for Hangul syllables
R (informative) Names of Hangul syllables
S (informative) Procedure for the unification and arrangement of CJK Ideographs
T (informative) Language tagging using Tag Characters
U (informative) Usage of musical symbols

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 10646 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 2, Coded character sets.
This first edition of ISO/IEC 10646 cancels and replaces ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001. It
also incorporates ISO/IEC 10646-1:2000/Amd.1:2002.

Introduction
ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is
applicable to the representation, transmission, interchange, processing, storage, input
and presentation of the written form of the languages of the world as well as additional
symbols.
By defining a consistent way of encoding multilingual text it enables the exchange of
data internationally. The information technology industry gains data stability, greater
global interoperability and data interchange. ISO/IEC 10646 has been widely adopted in
new Internet protocols and implemented in modern operating systems and computer
languages. This edition covers over 95 000 characters from the world’s scripts.
ISO/IEC 10646 contains material which may only be available to users who obtain their
copy in a machine readable format. That material consists of the following printable files:
⎯ CJKU_SR.txt
⎯ CJKC_SR.txt
⎯ Allnames.txt
⎯ HangulX.txt
⎯ HangulSy.txt

INTERNATIONAL STANDARD ISO/IEC 10646: 2003 (E)
Information technology — Universal Multiple-Octet
Coded Character Set (UCS)

1 Scope
ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.
This document:
- specifies the architecture of ISO/IEC 10646,
- defines terms used in ISO/IEC 10646,
- describes the general structure of the coded character set;
- specifies the Basic Multilingual Plane (BMP) of the UCS,
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP),
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale;
- specifies the names for the graphic characters of the BMP, SMP, SIP, SSP and their coded representations;
- specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
- specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
- specifies the coded representations for control functions;
- specifies the management of future additions to this coded character set.
The UCS is a coding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in clause 16.2.
A graphic character will be assigned only one code position in the standard, located either in the BMP or in one of the supplementary planes.
NOTE – The Unicode Standard, Version 4.0 includes a set of characters, names, and coded representations that are identical with those in this International Standard. It additionally provides details of character properties, processing algorithms, and definitions that are useful to implementers.

2 Conformance
2.1 General
Whenever private use characters are used as specified in ISO/IEC 10646, the characters themselves shall not be covered by these conformance requirements.
2.2 Conformance of information interchange
A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with ISO/IEC 10646 if
a) all the coded representations of graphic characters within that CC-data-element conform to clauses 6 and 7, to an identified form chosen from clause 13 or annex C or annex D, and to an identified implementation level chosen from clause 14;
b) all the graphic characters represented within that CC-data-element are taken from those within an identified subset (see clause 12);
c) all the coded representations of control functions within that CC-data-element conform to clause 15.
A claim of conformance shall identify the adopted form, the adopted implementation level and the adopted subset by means of a list of collections and/or characters.
2.3 Conformance of devices
A device is in conformance with ISO/IEC 10646 if it conforms to the requirements of item a) below, and either or both of items b) and c).
NOTE – The term device is defined (in 4.18) as a component of information processing equipment which can transmit and/or receive coded information within CC-data-elements. A device may be a conventional input/output device, or a process such as an application program or gateway function.
A claim of conformance shall identify the document that contains the description specified in a) below, and shall identify the adopted form(s), the adopted implementation level, the adopted subset (by means of a list of collections and/or characters), and the selection of control functions adopted in accordance with clause 15.
a) Device description: A device that conforms to ISO/IEC 10646 shall be the subject of a description that identifies the means by which the user may supply characters to the device and/or may recognize them when they are made available to the user, as specified respectively, in sub-clauses b), and c) below.
b) Originating device: An originating device shall allow its user to supply any characters from an adopted subset, and be capable of transmitting their coded representations within a CC-data-element in accordance with the adopted form and implementation level.
c) Receiving device: A receiving device shall be capable of receiving and interpreting any coded representation of characters that are within a CC-data-element in accordance with the adopted form and implementation level, and shall make any corresponding characters from the adopted subset available to the user in such a way that the user can identify them.
Any corresponding characters that are not within the adopted subset shall be indicated to the user. The way used for indicating them need not distinguish them from each other.
NOTE 1 – An indication to the user may consist of making available the same character to represent all characters not in the adopted subset, or providing a distinctive audible or visible signal when appropriate to the type of user.
NOTE 2 – See also annex J for receiving devices with retransmission capability.

3 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 2022:1994, Information technology -- Character code structure and extension techniques.
ISO/IEC 6429:1992, Information technology -- Control functions for coded character sets.
Unicode Standard Annex, UAX#9, The Unicode Bidirectional Algorithm, Version 4.0.0, 2003-04-17.
Unicode Standard Annex, UAX#15, Unicode Normalization Forms, Version 4.0.0, 2003-04-17.

4 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
4.1 Basic Multilingual Plane (BMP)
Plane 00 of Group 00.
4.2 Block
A contiguous range of code positions to which a set of characters that share common characteristics, such as a script, are allocated. A block does not overlap another block. One or more of the code positions within a block may have no character allocated to them.
4.3 Canonical form
The form with which characters of this coded character set are specified using four octets to represent each character.
4.4 CC-data-element (coded-character-data-element)
An element of interchanged information that is specified to consist of a sequence of coded representations of characters, in accordance with one or more identified standards for coded character sets.
4.5 Cell
The place within a row at which an individual character may be allocated.
4.6 Character
A member of a set of elements used for the organization, control, or representation of data.
4.7 Character boundary
Within a stream of octets the demarcation between the last octet of the coded representation of a character and the first octet of that of the next coded character.
4.8 Coded character
A character together with its coded representation.
4.9 Coded character set
A set of unambiguous rules that establishes a character set and the relationship between the characters of the set and their coded representation.
4.10 Code table
A table showing the characters allocated to the octets in a code.
4.11 Collection
A set of coded characters which is numbered and named and which consists of those coded characters whose code positions lie within one or more identified ranges.
NOTE – If any of the identified ranges include code positions to which no character is allocated, the repertoire of the collection will change if an additional character is assigned to any of those positions at a future amendment of this International Standard. However it is intended that the collection number and name will remain unchanged in future editions of this International Standard.
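Definition 4.11 describes a collection as a numbered, named set of code-position ranges, and clause 2.3 c) requires a receiving device to indicate characters that fall outside its adopted subset. The following Python sketch illustrates both ideas under stated assumptions: the `Collection` type, the `present` helper and the choice of U+FFFD as the single substitute character are hypothetical, and the two collection entries are modelled on Annex A and should be checked against it before being relied on.

```python
from typing import NamedTuple, Tuple


class Collection(NamedTuple):
    """A numbered, named set of inclusive code-position ranges (see 4.11)."""
    number: int
    name: str
    ranges: Tuple[Tuple[int, int], ...]

    def contains(self, code_position: int) -> bool:
        # A code position belongs to the collection if any range covers it.
        return any(first <= code_position <= last for first, last in self.ranges)


# Illustrative entries only; verify numbers, names and ranges against Annex A.
BASIC_LATIN = Collection(1, "BASIC LATIN", ((0x0020, 0x007E),))
LATIN_1_SUPPLEMENT = Collection(2, "LATIN-1 SUPPLEMENT", ((0x00A0, 0x00FF),))


def present(text: str, adopted_subset, substitute: str = "\uFFFD") -> str:
    """Replace every character outside the adopted subset with one substitute.

    NOTE 1 of clause 2.3 allows the same character to stand for all
    characters that are not in the adopted subset.
    """
    return "".join(
        ch if any(c.contains(ord(ch)) for c in adopted_subset) else substitute
        for ch in text
    )


if __name__ == "__main__":
    subset = [BASIC_LATIN, LATIN_1_SUPPLEMENT]
    # U+03A9 lies outside both collections, so it is shown as the substitute.
    print(present("caf\u00e9 \u03a9", subset))
```

Keeping the ranges as data rather than hard-coding the test means a claim of conformance can list its collections in one place, which is roughly how the adopted subset is identified in clause 2.2.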
4.12 Combining character
A member of an identified subset of the coded character set of ISO/IEC 10646 intended for combination with the preceding non-combining graphic character, or with a sequence of combining characters preceded by a non-combining character (see also 4.14).
NOTE – ISO/IEC 10646 specifies several subset collections which include combining characters.
4.13 Compatibility character
A graphic character included as a coded character of ISO/IEC 10646 primarily for compatibility with existing coded character sets.
4.14 Composite sequence
A sequence of graphic characters consisting of a non-combining character followed by one or more combining characters (see also 4.12).
NOTE 1 – A graphic symbol for a composite sequence generally consists of the combination of the graphic symbols of each character in the sequence.
NOTE 2 – A composite sequence is not a character and therefore is not a member of the repertoire of ISO/IEC 10646.
4.15 Control function
An action that affects the recording, processing, transmission, or interpretation of data, and that has a coded representation consisting of one or more octets.
4.16 Default state
The state that is assumed when no state has been explicitly specified.
4.17 Detailed code table
A code table showing the individual characters, and normally showing a partial row.
4.18 Device
A component of information processing equipment which can transmit and/or receive coded information within CC-data-elements. (It may be an input/output device in the conventional sense, or a process such as an application program or gateway function.)
4.19 Fixed collection
A collection in which every code position within the identified range(s) has a character allocated to it, and which is intended to remain unchanged in future editions of this International Standard.
4.20 Graphic character
A character, other than a control function, that has a visual representation normally handwritten, printed, or displayed.
4.21 Graphic symbol
The visual representation of a graphic character or of a composite sequence.
4.22 Group
A subdivision of the coding space of this coded character set; of 256 x 256 x 256 cells.
4.23 High-half zone
A set of cells reserved for use in UTF-16 (see annex C); an RC-element corresponding to any of these cells may be used in UTF-16 as the first of a pair of RC-elements which represents a character from a plane other than the BMP.
4.24 Interchange
The transfer of character coded data from one user to another, using telecommunication means or interchangeable media.
4.25 Interworking
The process of permitting two or more systems, each employing different coded character sets, meaningfully to interchange character coded data; conversion between the two codes may be involved.
4.26 ISO/IEC 10646-1
A former subdivision of the standard. It is also referred to as Part 1 of ISO/IEC 10646 and contained the specification of the overall architecture and the Basic Multilingual Plane (BMP). There are a First and a Second Edition of ISO/IEC 10646-1.
4.27 ISO/IEC 10646-2
A former subdivision of the standard. It is also referred to as Part 2 of ISO/IEC 10646 and contained the specification of the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP). There is only a First Edition of ISO/IEC 10646-2.
4.28 Low-half zone
A set of cells reserved for use in UTF-16 (see annex C); an RC-element corresponding to any of these cells may be used in UTF-16 as the second of a pair of RC-elements which represents a character from a plane other than the BMP.
4.29 Octet
An ordered sequence of eight bits considered as a unit.
4.30 Plane
A subdivision of a group; of 256 x 256 cells.
4.31 Presentation; to present
The process of writing, printing, or displaying a graphic symbol.
4.32 Presentation form
In the presentation of some scripts, a form of a graphic symbol representing a character that depends on the position of the character relative to other characters.
4.33 Private use plane
A plane within this coded character set; the contents of which is not specified in ISO/IEC 10646 (see clause 10).
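Definitions 4.23 and 4.28 say that a character outside the BMP is represented in UTF-16 by a pair of RC-elements, the first from the high-half zone and the second from the low-half zone. The Python sketch below shows one way that pairing can be computed. The zone boundaries (D800 to DBFF and DC00 to DFFF) and the arithmetic follow the widely published UTF-16 definition rather than text quoted in this sample, and the function names are hypothetical.

```python
# Zone boundaries as commonly published for UTF-16; annex C is the
# normative source and is not reproduced in this sample.
HIGH_HALF = range(0xD800, 0xDC00)   # first RC-element of a pair
LOW_HALF = range(0xDC00, 0xE000)    # second RC-element of a pair


def to_rc_elements(code_position):
    """Return the one or two 16-bit RC-elements for a code position."""
    if not 0 <= code_position <= 0x10FFFF:
        raise ValueError("UTF-16 only reaches Planes 00 to 10 of Group 00")
    if code_position in HIGH_HALF or code_position in LOW_HALF:
        raise ValueError("cells in the half zones do not hold characters")
    if code_position <= 0xFFFF:
        return [code_position]                  # BMP: a single RC-element
    offset = code_position - 0x10000            # 20-bit value, 10 bits per half
    return [HIGH_HALF.start + (offset >> 10),
            LOW_HALF.start + (offset & 0x3FF)]


def from_rc_pair(high, low):
    """Recombine a high-half/low-half pair into a code position."""
    if high not in HIGH_HALF or low not in LOW_HALF:
        raise ValueError("not a high-half/low-half pair")
    return 0x10000 + ((high - HIGH_HALF.start) << 10) + (low - LOW_HALF.start)


if __name__ == "__main__":
    pair = to_rc_elements(0x1D11E)   # a code position in Plane 01 (SMP)
    print([format(element, "04X") for element in pair])
```

Because the two zones do not overlap each other or any assigned BMP character, a receiver can tell from a single RC-element whether it is a complete character, the first half of a pair, or the second half.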
4.34 RC-element
A two-octet sequence comprising the R-octet and the C-octet (see clause 6.2) from the four octet sequence (in the canonical form) that corresponds to a cell in the coding space of this coded character set.
4.35 Repertoire
A specified set of characters that are represented in a coded character set.
4.36 Row
A subdivision of a plane; of 256 cells.
4.37 Script
A set of graphic characters used for the written form of one or more languages.
4.38 Supplementary plane
A plane other than Plane 00 of Group 00; a plane that accommodates characters which have not been allocated to the Basic Multilingual Plane.
4.39 Supplementary Multilingual Plane for scripts and symbols (SMP)
Plane 01 of Group 00.
4.40 Supplementary Ideographic Plane (SIP)
Plane 02 of Group 00.
4.41 Supplementary Special-purpose Plane (SSP)
Plane 0E of Group 00.
4.42 Unpaired RC-element
An RC-element in a CC-data element that is either:
• an RC-element from the high-half zone that is not immediately followed by an RC-element from the low-half zone, or
• an RC-element from the low-half zone that is not immediately preceded by an RC-element from the high-half zone.
4.43 User
A person or other entity that invokes the service provided by a device. (This entity may be a process such as an application program if the “device” is a code converter or a gateway function, for example.)
4.44 Zone
A sequence of cells of a code table, comprising one or more rows, either in whole or in part, containing characters of a particular class (for example see clause 8).

5 General structure of the UCS
The general structure of the Universal Multiple-Octet Coded Character Set (referred to hereafter as “this coded character set”) is described in this explanatory clause, and is illustrated in figures 1 and 2. The normative specification of the structure is given in the following clauses.
The value of any octet is expressed in hexadecimal notation from 00 to FF in ISO/IEC 10646 (see annex K).
The canonical form of this coded character set – the way in which it is to be conceived – uses a four-dimensional coding space, regarded as a single entity, consisting of 128 three-dimensional groups.
NOTE 1 – Thus, bit 8 of the most significant octet in the canonical form of a coded character can be used for internal processing purposes within a device as long as it is set to zero within a conforming CC-data-element.
Each group consists of 256 two-dimensional planes. Each plane consists of 256 one-dimensional rows, each row containing 256 cells. A character is located and coded at a cell within this coding space or the cell is declared unused.
In the canonical form, four octets are used to represent each character, and they specify the group, plane, row and cell, respectively. The canonical form consists of four octets since two octets are not sufficient to cover all the characters in the world, and a 32-bit representation follows modern processor architectures.
The four-octet canonical form can be used as a four-octet coded character set, in which case it is called UCS-4.
NOTE 2 – The use of the term “canonical” for this form does not imply any restriction or preference for this form over transformation formats that a conforming implementation may choose for the representation of UCS characters.
ISO/IEC 10646 defines graphic characters and their coded representation for the following planes:
• The Basic Multilingual Plane (BMP, Plane 00 of Group 00). The Basic Multilingual Plane can be used as a two-octet coded character set identified as UCS-2.
• The Supplementary Multilingual Plane for scripts and symbols (SMP, Plane 01 of Group 00).
• The Supplementary Ideographic Plane (SIP, Plane 02 of Group 00).
• The Supplementary Special-purpose Plane (SSP, Plane 0E of Group 00).
Additional supplementary planes may be defined in the future to accommodate additional graphic characters.
The planes that are reserved for private use are specified in clause 10. The contents of the cells in private use planes and zones are not specified in ISO/IEC 10646.
Each character is located within the coded character set in terms of its Group-octet, Plane-octet, Row-octet, and Cell-octet.
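The group, plane, row and cell structure described in clause 5 can be made concrete with a small Python sketch that splits a code position into its four octets. It assumes the most significant octet comes first, in the order group, plane, row, cell given in the text; the names `Cell`, `locate` and `ucs4_octets` are hypothetical and are not taken from the standard.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """The four octets that locate a character in the coding space."""
    group: int
    plane: int
    row: int
    cell: int


def locate(code_position: int) -> Cell:
    """Split a code position into its G, P, R and C octets."""
    # 128 groups: bit 8 of the most significant octet must be zero.
    if not 0 <= code_position <= 0x7FFFFFFF:
        raise ValueError("outside the 128-group coding space")
    return Cell(*code_position.to_bytes(4, "big"))


def ucs4_octets(code_position: int) -> bytes:
    """Four-octet canonical form (UCS-4), most significant octet first."""
    return bytes(locate(code_position))


if __name__ == "__main__":
    print(locate(0x20AC))              # a BMP character: Group 00, Plane 00
    print(locate(0x0E0041).plane)      # Plane 0E is the SSP
    print(ucs4_octets(0x1D11E).hex())
```

For a character in the BMP the group and plane octets are both 00, so the remaining row and cell octets are exactly the RC-element of definition 4.34 and the two-octet UCS-2 form.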
Subsets of the coding space may be used in order to give a sub-repertoire of graphic characters.
A UCS Transformation Format (UTF-16) is specified in annex C which can be used to represent characters from 16 supplementary planes of Group 00 (Planes 01 to 10), in addition to the BMP (Plane 00), in a form that is compatible with the two-octet BMP form.
This entire coded character set shall be conceived of as comprising 128 groups of 256 planes. Each plane shall be regarded as containing 256 rows of characters, each row containing 256 cells. In a code table representing the contents of a plane (such as in figure 2), the horizontal axis shall represent the least significant octet, with its smaller value to the left; and the vertical axis shall represent the more significant octet, with its smaller value at the top.
Another UC
...

INTERNATIONAL STANDARD ISO/IEC 10646
First edition
2003-12-15

Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)
Information technology -- Universal Multiple-Octet Coded Character Set (UCS)

Reference number
ISO/CEI 10646:2003(F)
© ISO/CEI 2003

PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing conditions, this file may be printed or viewed, but shall not be modified unless the computer used for that purpose holds a licence authorizing the use of these typefaces and they are installed on it. In downloading this file, the parties concerned accept the responsibility of not infringing Adobe's licensing conditions. The ISO Central Secretariat accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info section of the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem of use arises, please inform the Central Secretariat at the address given below.

This CD-ROM contains:
1) the publication ISO/CEI 10646:2003 in PDF (portable document format), which can be viewed using Adobe® Acrobat® Reader;
2) text files containing lists of
i) source references for CJK ideographs,
ii) Hangul syllables and mapping information,
iii) alphabetically sorted character names.
Adobe and Acrobat are trademarks of Adobe Systems Incorporated.

This first edition cancels and replaces
...
