ISO/IEC 10646-1:1993/Amd 8:1997
Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane — Amendment 8
Technologies de l'information — Jeu universel de caractères codés à plusieurs octets — Partie 1: Architecture et table multilingue — Amendement 8
INTERNATIONAL STANDARD
ISO/IEC 10646-1
First edition
1993-05-01
AMENDMENT 8
1997-12-15
Information technology - Universal
Multiple-Octet Coded Character Set
(UCS) -
Part 1:
Architecture and Basic Multilingual Plane
AMENDMENT 8
Technologies de l'information -
Jeu universel de caractères codés à plusieurs octets -
Partie 1: Architecture et table multilingue
AMENDEMENT 8
Reference number
ISO/IEC 10646-1:1993/Amd.8:1997(E)
Foreword
ISO (the International Organization for Standardization) and IEC (the
International Electrotechnical Commission) form the specialized system for
worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical
committees established by the respective organization to deal with particular
fields of technical activity. ISO and IEC technical committees collaborate in
fields of mutual interest. Other international organizations, governmental and
non-governmental, in liaison with ISO and IEC, also take part in the work.
In the field of information technology, ISO and IEC have established a joint
technical committee, ISO/IEC JTC 1. Draft International Standards adopted by the
joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of
the national bodies casting a vote.
Amendment 8 to International Standard ISO/IEC 10646-1:1993 was prepared by
Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 2, Coded character sets.
© ISO/IEC 1997
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or
utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm, without permission in writing from the publisher.
ISO/IEC Copyright Office • Case postale 56 • CH-1211 Genève 20 • Switzerland
Printed in Switzerland
Information technology - Universal Multiple-Octet Coded
Character Set (UCS) -
Part 1:
Architecture and Basic Multilingual Plane
AMENDMENT 8
Add the following new annex:
Annex T
(informative)
Procedure for the unification and arrangement of CJK Ideographs
The graphic character collection CJK UNIFIED IDEOGRAPHS in ISO/IEC 10646-1:1993
contains 20,902 ideographs (see clause 26). They are derived from over 54,000
ideographs which are found in various different national and regional standards
for coded character sets (the “source codes”).
This Annex describes how the ideographs in this standard are derived from the
source codes by applying a set of unification procedures. It also describes how
the ideographs in this standard are arranged in the sequence of consecutive code
positions to which they are assigned.
The source code standards are shown below in four groups according to their
origins. The groups are identified as the G-, T-, J-, and K-sources.
G-source: GB2312-80, GB12345-90,
          GB7589-87*, GB7590-87*,
          GB8565-88*,
          General Purpose Hanzi List for
          Modern Chinese Language*
T-source: TCA-CNS 11643-1986/1st plane,
          TCA-CNS 11643-1986/2nd plane,
          TCA-CNS 11643-1986/14th plane*
J-source: JIS X 0208-1990, JIS X 0212-1990
K-source: KS C 5601-1989, KS C 5657-1991
(A “*” after the reference number of a standard indicates that some of the
ideographs included in that standard are not introduced into the unified
collection.)
For the purposes of ISO/IEC 10646-1 a unification process is applied to the
ideographic characters taken from the codes in the source groups. In this
process single ideographs from two or more of the source groups are associated
together, and a single code position is assigned to them in this standard. The
associations are made according to a set of procedures that are described below.
Ideographs that are thus associated are described here as “unified”.
T.1 Unification procedure
T.1.1 Scope of unification
Ideographs that are unrelated in historical derivation (non-cognate characters)
have not been unified.
Example: 士, 土
NOTE - The difference of shape between the two ideographs in the above example
is in the length of the lower horizontal line. This is considered an actual
difference of shape. Furthermore these ideographs have different meanings. The
meaning of the first is “Soldier” and of the second is “Soil or Earth”.
An association between ideographs from different sources is made here if their
shapes are sufficiently similar, according to the following system of
classification.
T.1.2 Two level classification
A two-level system of classification is used to differentiate (a) between
abstract shapes and (b) between actual shapes determined by particular
typefaces. Variant forms of an ideograph, which can not be unified, are
identified based on the difference between their abstract shapes.
T.1.3 Procedure
A unification procedure is used to determine whether two ideographs have the
same abstract shape or different ones. The unification procedure has two
stages, applied in the following order:
a) Analysis of component structure;
b) Analysis of component features;
T.1.3.1 Analysis of component structure
In the first stage of the procedure the component structure of each ideograph
is examined. A component of an ideograph is a geometrical combination of
primitive elements. Alternative ideographs can be configured from the same set
of components. Components can be combined to create a new component with a more
complicated structure. An ideograph, therefore, can be defined as a component
tree, where the top node is the ideograph itself, and the bottom nodes are the
primitive elements. This is shown in Figure 1.
Figure 1 - Component structure
T.1.3.2 Analysis of component features
In the second stage of the procedure, the components located at corresponding
nodes of two ideographs are compared, starting from the most superior node, as
shown in Figure 2.
Figure 2 - The most superior node of a component
The following features of each ideograph to be compared are examined:
a: the number of components,
b: the relative position of the components in each complete ideograph,
c: the structure of corresponding components.
If one or more of the features (a to c above) are different between the
ideographs in the comparison, the ideographs are considered to have different
abstract shapes and are therefore not unified.
If all of the features (a to c above) are the same between the ideographs, the
ideographs are considered to have the same abstract shape and are therefore
unified.
T.1.4 Examples of differences of abstract shapes
To illustrate rules derived from a: to c: in T.1.3.2, some typical examples of
ideographs that are not unified, owing to differences of abstract shapes, are
shown below.
T.1.4.1 Different number of components
The examples below illustrate rule a:, since the two ideographs in each pair
have different numbers of components.
T.1.4.2 Different relative positions of components
The examples below illustrate rule b:. Although the two ideographs in each pair
have the same number of components, the relative positions of the components
are different.
T.1.4.3 Different structure of a corresponding component
The examples below illustrate rule c:. The structure of one (or more)
corresponding components within the two ideographs in each pair is different.
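The comparison of features a: to c: in T.1.3.2 can be pictured as a recursive walk over two component trees. The following Python sketch is only an illustration under stated assumptions: the `Component` class, the position labels ("left", "top", and so on) and the primitive names "A", "B", "C" are hypothetical and are not defined by this standard, which specifies the procedure in prose only.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical node of a component tree (not defined by the standard).

    name      -- label of a primitive element (used only at bottom nodes)
    positions -- relative position of each child, e.g. ("left", "right")
    children  -- sub-components, in the same order as `positions`
    """
    name: str = ""
    positions: Tuple[str, ...] = ()
    children: Tuple["Component", ...] = ()


def same_abstract_shape(x: Component, y: Component) -> bool:
    """Compare two trees node by node, starting from the most superior node."""
    # Feature a: the number of components.
    if len(x.children) != len(y.children):
        return False
    # Bottom nodes: compare the primitive elements themselves.
    if not x.children:
        return x.name == y.name
    # Feature b: the relative position of the components.
    if x.positions != y.positions:
        return False
    # Feature c: the structure of corresponding components.
    return all(same_abstract_shape(a, b)
               for a, b in zip(x.children, y.children))


# Made-up trees built from made-up primitives.
a, b, c = Component("A"), Component("B"), Component("C")
left_right = Component(positions=("left", "right"), children=(a, b))
top_bottom = Component(positions=("top", "bottom"), children=(a, b))
three_part = Component(positions=("left", "middle", "right"), children=(a, b, a))
nested = Component(
    positions=("left", "right"),
    children=(a, Component(positions=("top", "bottom"), children=(b, c))),
)

print(same_abstract_shape(left_right, left_right))  # same on a:, b: and c:
print(same_abstract_shape(left_right, three_part))  # differs on a:
print(same_abstract_shape(left_right, top_bottom))  # differs on b:
print(same_abstract_shape(left_right, nested))      # differs on c:
```

In this sketch a single differing feature at any node is enough to report different abstract shapes, which mirrors the "one or more of the features" wording in T.1.3.2. Differences of actual shape (typeface detail) are deliberately not modelled.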
T.1.5 Differences of actual shapes
To illustrate the classification described in T.1.2, some typical examples of
ideographs that are unified are shown below. The two or three ideographs in
each group below have different actual shapes, but they are considered to have
the same abstract shape, and are therefore unified.
The differences are further classified according to the following examples.
c) Differences in contact of strokes
d) Differences in protrusion at the folded corner of strokes
e) Differences in bent strokes
f) Differences in folding back at the stroke termination
g) Differences in accent at the stroke initiation
h) Differences in “rooftop” modification
j) Combinations of the above differences
These differences in actual shapes of a unified ideograph are presented in the
corresponding source columns for each code position entry in the code table in
clause 26 of this International Standard.
T.1.6 Source separation rule
To preserve data integrity through multiple stages of code conversion (commonly
known as “round-trip integrity”), any ideographs that are separately encoded in
any one of the source standards listed above have not been unified.
However, some ideographs encod
...
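The reason the source separation rule in T.1.6 protects round-trip integrity can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the source labels "J" and "K", the code values and the UCS position are all made up and do not come from this standard or from any source standard.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (source standard label, code in that standard) -- made-up values only.
SourceCode = Tuple[str, str]


def separately_encoded_sources(group: List[SourceCode]) -> List[str]:
    """Return the sources that encode more than one member of `group`."""
    counts = Counter(std for std, _ in group)
    return sorted(std for std, n in counts.items() if n > 1)


def round_trips(group: List[SourceCode], ucs: int) -> bool:
    """Check source -> UCS -> source conversion if `group` shared one position."""
    to_ucs: Dict[SourceCode, int] = {sc: ucs for sc in group}
    from_ucs: Dict[Tuple[str, int], str] = {}
    for (std, code), position in to_ucs.items():
        # A second code from the same source overwrites the first one here,
        # which is exactly the information loss the rule is meant to avoid.
        from_ucs[(std, position)] = code
    return all(from_ucs[(std, to_ucs[(std, code)])] == code
               for std, code in group)


# One code per source: unifying them keeps conversion reversible.
group_ok = [("J", "0x1234"), ("K", "0x5678")]
# Two codes in the same source: unifying them would lose the distinction.
group_bad = [("J", "0x1234"), ("J", "0x4321"), ("K", "0x5678")]

print(separately_encoded_sources(group_ok), round_trips(group_ok, 0x4E00))
print(separately_encoded_sources(group_bad), round_trips(group_bad, 0x4E00))
```

Under these assumptions, a candidate group passes only when no single source contributes two codes. When one does, the sketch does not decide how the group should be split, since that depends on the shapes involved.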