SIST ISO/IEC 10646:2008
Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
ISO/IEC 10646:2003 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written form of the languages of the world as well as additional symbols.
ISO/IEC 10646:2003
specifies the architecture of ISO/IEC 10646:2003;
defines terms used in ISO/IEC 10646:2003;
describes the general structure of the coded character set;
specifies the Basic Multilingual Plane (BMP) of the UCS;
specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP);
defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale;
specifies the names for the graphic characters of the BMP, SMP, SIP, SSP and their coded representations;
specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
specifies a multiple-byte (one to four bytes) transformation, UTF-8, for use with ISO 646 (ASCII) byte-oriented environments;
specifies a form using two 16-bit units, and the associated transformation UTF-16, for supplementary characters;
specifies collection identifiers for a selected set of character subsets;
specifies the coded representations for control functions;
specifies the management of future additions to this coded character set;
incorporates the Unicode bi-directional algorithm and normalization forms by reference.
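The coded representation forms listed above can be illustrated with a short sketch. This is not code from the standard: the function names are hypothetical, big-endian octet order is assumed throughout, and the UTF-8 case simply relies on Python's built-in codec.

```python
def ucs4(cp: int) -> bytes:
    """Four-octet form: the code position as a 32-bit big-endian integer."""
    return cp.to_bytes(4, "big")


def ucs2(cp: int) -> bytes:
    """Two-octet form: only BMP code positions (0000-FFFF) fit."""
    if cp > 0xFFFF:
        raise ValueError("UCS-2 can represent BMP characters only")
    return cp.to_bytes(2, "big")


def utf16(cp: int) -> bytes:
    """UTF-16, big-endian: one 16-bit unit for the BMP, two for supplementary planes."""
    if cp <= 0xFFFF:
        return cp.to_bytes(2, "big")
    v = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 | (v >> 10)        # upper 10 bits go into the first unit
    low = 0xDC00 | (v & 0x3FF)       # lower 10 bits go into the second unit
    return high.to_bytes(2, "big") + low.to_bytes(2, "big")


def utf8(cp: int) -> bytes:
    """UTF-8, delegated to Python's codec (an assumption of this sketch)."""
    return chr(cp).encode("utf-8")


# U+1D11E MUSICAL SYMBOL G CLEF lies in a supplementary plane.
for form in (ucs4, utf16, utf8):
    print(form.__name__, form(0x1D11E).hex(" "))
```

A BMP character such as U+20AC needs one 16-bit unit in UTF-16 and three bytes in UTF-8, while the supplementary character above needs two units and four bytes respectively.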
The UCS is a coding system different from that specified in ISO/IEC 2022. A graphic character will be assigned only one code position in ISO/IEC 10646:2003, located either in the BMP or in one of the supplementary planes.
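Because each graphic character has exactly one code position, the plane it belongs to follows directly from that position. The sketch below is illustrative only: it assumes the usual plane numbers (00 for the BMP, 01 for the SMP, 02 for the SIP, 0E for the SSP), which this page does not state, and the helper names are hypothetical.

```python
# Plane numbers assumed for this sketch; they are not listed on this page.
PLANE_NAMES = {0x00: "BMP", 0x01: "SMP", 0x02: "SIP", 0x0E: "SSP"}


def plane_of(cp: int) -> int:
    """Return the plane number: everything above the low 16 bits of the code position."""
    return cp >> 16


def plane_name(cp: int) -> str:
    """Map a code position to the abbreviation of its plane, if it has one here."""
    return PLANE_NAMES.get(plane_of(cp), "other plane")


for cp in (0x0041, 0x1D11E, 0x20000, 0xE0001):
    print(f"{cp:06X} -> plane {plane_of(cp):02X} ({plane_name(cp)})")
```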
NOTE - The Unicode Standard, Version 4.0 includes a set of characters, names, and coded representations that are identical with those in ISO/IEC 10646:2003. It additionally provides details of character properties, processing algorithms, and definitions that are useful to implementers. Version 4.0 strengthens Unicode support for worldwide communication, software availability, and publishing.
By defining a consistent way of encoding multilingual text, ISO/IEC 10646:2003 enables the exchange of data internationally. The information technology industry gains data stability, greater global interoperability and data interchange. ISO/IEC 10646:2003 has been widely adopted in new Internet and W3C protocols and markup languages such as XML and HTML, and implemented in modern operating systems and computer programming languages. This edition covers over 96 000 characters from the world's scripts.
Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)
Informacijska tehnologija – Univerzalni večoktetni nabor znakov (UCS)
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 10646
First edition
2003-12-15
Information technology — Universal
Multiple-Octet Coded Character Set (UCS)
Technologies de l'information — Jeu universel de caractères codés sur
plusieurs octets (JUC)
Reference number
© ISO/IEC 2003
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
This CD-ROM contains
1) the publication ISO/IEC 10646:2003 in portable document format (PDF), which can be viewed using
Adobe® Acrobat® Reader;
2) text files containing lists of
i) source references for CJK ideographs,
ii) Hangul syllables and mapping information,
iii) alphabetically sorted character names.
Adobe and Acrobat are trademarks of Adobe Systems Incorporated.
This first edition cancels and replaces ISO/IEC 10646-1:2000 and ISO/IEC 10
...
SLOVENIAN STANDARD
01 November 2008
Informacijska tehnologija – Univerzalni večoktetni nabor znakov (UCS)
Information technology -- Universal Multiple-Octet Coded Character Set (UCS)
Technologies de l'information -- Jeu universel de caractères codés sur plusieurs octets (JUC)
This Slovenian standard is identical to: ISO/IEC 10646:2003
ICS:
35.040 Nabori znakov in kodiranje informacij / Character sets and information coding
2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard in whole or in part is not permitted.
© ISO/IEC 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
1 Scope
2 Conformance
3 Normative references
4 Terms and definitions
5 General structure of the UCS
6 Basic structure and nomenclature
7 General requirements for the UCS
8 The Basic Multilingual Plane
9 Supplementary planes
10 Private use groups, planes, and zones
11 Revision and updating of the UCS
12 Subsets
13 Coded representation forms of the UCS
14 Implementation levels
15 Use of control functions with the UCS
16 Declaration of identification of features
17 Structure of the code tables and lists
18 Block names
19 Characters in bi-directional context
20 Special characters
21 Presentation forms of characters
22 Compatibility characters
23 Order of characters
24 Normalization forms
25 Combining characters
26 Special features of individual scripts
27 Source references for CJK Ideographs
28 Character names and annotations
29 Structure of the Basic Multilingual Plane
30 Structure of the Supplementary Multilingual Plane for Scripts and symbols
31 Structure of the Supplementary Ideographic Plane
32 Supplementary Special-purpose Plane
33 Code tables and lists of character names
NOTE The code tables and lists of character names are given on pages 29-1348. They are contained in separate files which are accessed by clicking on the appropriate highlighted text in Clause 33.
Annexes
A (normative) Collections of graphic characters for subsets
B (normative) List of combining characters
C (normative) Transformation format for 16 planes of Group 00 (UTF-16)
D (normative) UCS Transformation Format 8 (UTF-8)
E (informative) Mirrored characters in Arabic bi-directional context
F (informative) Alternate format characters
G (informative) Alphabetically sorted list of character names
H (informative) The use of “signatures” to identify UCS
J (informative) Recommendation for combined receiving/originating devices with internal storage
K (informative) Notations of octet value representations
L (informative) Character naming guidelines
M (informative) Sources of characters
N (informative) External references to character repertoires
P (informative) Additional information on characters
Q (informative) Code mapping table for Hangul syllables
R (informative) Names of Hangul syllables
S (informative) Procedure for the unification and arrangement of CJK Ideographs
T (informative) Language tagging using Tag Characters
U (informative) Usage of musical symbols
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 10646 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 2, Coded character sets.
This first edition of ISO/IEC 10646 cancels and replaces ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001. It
also incorporates ISO/IEC 10646-1:2000/Amd.1:2002.
Introduction
ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is
applicable to the representation, transmission, interchange, processing, storage, input
and presentation of the written form of the languages of the world as well as additional
symbols.
By defining a consistent way of encoding multilingual text it enables the exchange of
data internationally. The information technology industry gains data stability, greater
global interoperability and data interchange. ISO/IEC 10646 has been widely adopted in
new Internet protocols and implemented in modern operating systems and computer
languages. This edition covers over 95 000 characters from the world’s scripts.
ISO/IEC 10646 contains material which may only be available to users who obtain their
copy in a machine readable format. That material consists of the following printable files:
⎯ CJKU_SR.txt
⎯ CJKC_SR.txt
⎯ Allnames.txt
⎯ HangulX.txt
⎯ HangulSy.txt
INTERNATIONAL STANDARD ISO/IEC 10646: 2003 (E)
Information technology — Universal Multiple-Octet
Coded Character Set (UCS)
1 Scope
ISO/IEC 10646 specifies the Universal Multiple-Octet Coded Character Set (UCS). It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written form of the languages of the world as well as of additional symbols.
This document:
- specifies the architecture of ISO/IEC 10646;
- defines terms used in ISO/IEC 10646;
- describes the general structure of the coded character set;
- specifies the Basic Multilingual Plane (BMP) of the UCS;
- specifies supplementary planes of the UCS: the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP) and the Supplementary Special-purpose Plane (SSP);
- defines a set of graphic characters used in scripts and the written form of languages on a world-wide scale;
- specifies the names for the graphic characters of the BMP, SMP, SIP, SSP and their coded representations;
- specifies the four-octet (32-bit) canonical form of the UCS: UCS-4;
- specifies a two-octet (16-bit) BMP form of the UCS: UCS-2;
- specifies the coded representations for control functions;
- specifies the management of future additions to this coded character set.
The UCS is a coding system different from that specified in ISO/IEC 2022. The method to designate UCS from ISO/IEC 2022 is specified in clause 16.2.
A graphic character will be assigned only one code ...
2 Conformance
2.1 General
Whenever private use characters are used as specified in ISO/IEC 10646, the characters themselves shall not be covered by these conformance requirements.
2.2 Conformance of information interchange
A coded-character-data-element (CC-data-element) within coded information for interchange is in conformance with ISO/IEC 10646 if
a) all the coded representations of graphic characters within that CC-data-element conform to clauses 6 and 7, to an identified form chosen from clause 13 or annex C or annex D, and to an identified implementation level chosen from clause 14;
b) all the graphic characters represented within that CC-data-element are taken from those within an identified subset (see clause 12);
c) all the coded representations of control functions within that CC-data-element conform to clause 15.
A claim of conformance shall identify the adopted form, the adopted implementation level and the adopted subset by means of a list of collections and/or characters.
2.3 Conformance of devices
A device is in conformance with ISO/IEC 10646 if it conforms to the requirements of item a) below, and either or both of items b) and c).
NOTE – The term device is defined (in 4.18) as a component of information processing equipment which can transmit and/or receive coded information within CC-data-elements. A device may be a conventional input/output device, or a process such as an application program or gateway function.
A
...
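Item b) of clause 2.2 above (all graphic characters are taken from an identified subset) lends itself to a small check. The following is only a sketch under stated assumptions: the subset is modelled as a list of inclusive code-position ranges, the two ranges shown are illustrative stand-ins for collections from Annex A, and the function names are hypothetical.

```python
# Illustrative subset: two code-position ranges (assumed here to correspond
# to the BASIC LATIN and LATIN-1 SUPPLEMENT collections).
SUBSET = [(0x0020, 0x007E), (0x00A0, 0x00FF)]


def in_subset(ch: str, subset=SUBSET) -> bool:
    """True if the character's code position falls inside one of the ranges."""
    return any(lo <= ord(ch) <= hi for lo, hi in subset)


def outside_subset(text: str, subset=SUBSET) -> list:
    """Return the characters of `text` that are not in the identified subset."""
    return [ch for ch in text if not in_subset(ch, subset)]


print(outside_subset("caf\u00e9"))          # every character is in the subset
print(outside_subset("na\u00efve \u20ac"))  # U+20AC EURO SIGN is outside it
```

An empty result means the text satisfies this one condition only; items a) and c) concern the coded representation form and control functions and are not checked here.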
INTERNATIONAL STANDARD ISO/IEC 10646
First edition
2003-12-15
Technologies de l'information — Jeu universel de caractères codés sur plusieurs octets (JUC)
Information technology — Universal Multiple-Octet Coded Character Set (UCS)
Reference number
ISO/CEI 10646:2003(F)
© ISO/IEC 2003