ISO/IEC TR 9789:1994
Information technology — Guidelines for the organization and representation of data elements for data interchange — Coding methods and principles
Provides general guidance on the manner in which data can be expressed by codes. Describes the objectives of coding, the characteristics, advantages and disadvantages of different coding methods, and the features of codes, and gives guidelines for the design of codes. Examples of applications are ISO 9735:1988, ISO 8601:1988 and ISO 3166:1993.
Technologies de l'information — Principes directeurs pour l'organisation et la représentation des éléments de données — Méthodes et principes de codage
TECHNICAL REPORT ISO/IEC TR 9789
First edition
1994-12-15
Information technology - Guidelines for the organization and representation of data elements for data interchange - Coding methods and principles
Technologies de l'information - Principes directeurs pour l'organisation et la représentation des éléments de données pour l'échange de données - Méthodes et principes de codage
Reference number: ISO/IEC TR 9789:1994(E)
ISO/IEC TR 9789: 1994(E)
Contents
1 Scope
2 References
2.1 General references
2.2 Examples of applications of this Technical Report
3 Definitions
4 Principles of coding
4.1 Information and coding
4.2 Coding
5 Coding objectives
5.1 Identification
5.2 Classification
5.3 Key to further information
6 Types of codes
6.1 Forms of codes
6.2 Sequential codes
6.3 Random codes
6.4 Mnemonic codes
6.5 Abbreviation based codes
6.6 Matrix codes
6.7 Hierarchical codes
6.8 Juxtaposition codes
6.9 Combination codes
6.10 Value addition code
7 Features of codes
7.1 General features
7.2 Uniqueness
7.3 Expandability
7.4 Conciseness
7.5 Simplicity
7.6 Versatility
7.7 Suitability for sorting
7.8 Stability
7.9 Significance
7.10 Size
7.11 Structure and format
7.12 Capacity
8 Representation of code values
8.1 Character sets
8.2 Representation techniques
9 Design of codes
9.1 General
9.2 Number of characters
9.3 Format of the coded representation
9.4 Character sets
9.5 Assignment conventions
9.6 Error detection by using check characters
9.7 Recommendations for some application areas
9.8 Practical recommendations

© ISO/IEC 1994
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from the publisher.
ISO/IEC Copyright Office · Case postale 56 · CH-1211 Genève 20 · Switzerland
Printed in Switzerland
III
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization.
National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
The main task of technical committees is to prepare International Standards. In exceptional circumstances a technical committee may propose the publication of a Technical Report of one of the following types:
- type 1, when the required support cannot be obtained for the publication of an International Standard, despite repeated efforts;
- type 2, when the subject is still under technical development or where for any other reason there is the future but not immediate possibility of an agreement on an International Standard;
- type 3, when a technical committee has collected data of a different kind from that which is normally published as an International Standard ("state of the art", for example).
Technical Reports of types 1 and 2 are subject to review within three years of publication, to decide whether they can be transferred into International Standards. Technical Reports of type 3 do not necessarily have to be reviewed until the data they provide are considered to be no longer valid or useful.
ISO/IEC TR 9789, which is a Technical Report of type 3, was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee 14, Data element principles.
Introduction
This Technical Report is a guide to develop and implement coded representations.
Coding covers the way and the form in which data are expressed.
The increased use of data processing and electronic data interchange relies heavily on accurate, reliable, controllable and verifiable data recorded in databases.
In formal communication and storage, data are expressed in symbols (usually digits or letters), arithmetic numbers and descriptions, which should have a fixed, stable meaning for everyone involved and thereby be suitable for purposes of processing and communication.
This Technical Report presents the objectives of coding, the characteristics, advantages and disadvantages of different coding methods, a survey of the features of codes and guidelines for the design of codes.
Information technology - Guidelines for the organization and representation of data elements for data interchange - Coding methods and principles
1 Scope
This Technical Report provides general guidance on the manner in which data can be expressed by codes.
It describes the objectives of coding, the characteristics, advantages and disadvantages of different coding methods, and the features of codes, and gives guidelines for the design of codes.
This Technical Report is not directed toward any specific application area, nor is it dependent on any design method for application systems or data interchange.
2 References
2.1 General references
ISO/IEC 646:1991, Information technology - ISO 7-bit coded character set for information interchange.
ISO 2382-4:1987, Information processing systems - Vocabulary - Part 04: Organization of data.
ISO 2375:1985, Data processing - Procedure for registration of escape sequences.
ISO 7064:1983, Data processing - Check character systems.
ISO/IEC 11179-3:1994, Information technology - Specification and standardization of data elements - Part 3: Basic attributes of data elements.
2.2 Examples of applications of this Technical Report
ISO 9735:1988, Electronic data interchange for administration, commerce and transport (EDIFACT) - Application level syntax.
ISO 8601:1988, Data elements and interchange formats - Information interchange - Representation of dates and times.
ISO 3166:1993, Codes for the representation of names of countries.
3 Definitions
For the purpose of this Technical Report, the following definitions apply.
3.1 attribute: A characteristic of an object.
3.2 character set: A finite set of different characters that is
complete for a given purpose.
Example: The international reference version of the character set of ISO 646.
3.3 code: A collection of rules that maps the elements of one set on
to the elements of another set.
NOTES
1. The elements may be characters or character strings.
2. The first set is the coded set and the other set is the code element
set.
3. An element of the code element set may be related to more than one
element of the coded set but the reverse is not true.
3.4 code element: The result of applying a code to an element in a coded set.
Examples:
1 "CDG" as the representation of Paris Charles de Gaulle in the code for three-letter representation of airport names.
2 The seven binary digits representing the delete character in ISO 646.
3.5 code element set: The result of applying a code to all elements of a coded set.
Example: All the three-letter international representations of airport names.
3.6 code set: Synonym of code element set.
3.7 code value: Synonym of code element.
3.8 coded representation: Synonym of code element.
3.9 coded set: A set of elements which is mapped onto another set according to a code.
Example: A list of the names of airports which is mapped onto a corresponding set of three-letter abbreviations.
3.10 coding scheme: Synonym of code.
3.11 data code: See preferred term code element.
3.12 data element instance: An occurrence of a data element type.
3.13 data element type: A category of data which represents a concept and whose properties are expressed as a set of data element type attributes which permit it to support information interchange.
3.14 delimiter: One or more characters used to indicate the beginning or the end of a character string.
3.15 entity: Any concrete or abstract thing of interest, including associations among things.
3.16 field: A specified area on a data medium or in storage, used for
a particular class of data elements.
3.17 identifier: One or more characters used to identify or name a data element type and possibly to indicate certain properties of that data element type.
3.18 key: An identifier within a set of data element types.
3.19 position: Any location in a string that may be occupied by an element and that is identified by a serial number.
3.20 string: A sequence of elements of the same nature, such as characters, considered as a whole.
3.21 table: An arrangement of data each item of which may be identified by means of arguments or keys.
4 Principles of coding
4.1 Information and coding
In daily life, information is understood as facts of and propositions about all the concrete or abstract things of interest, expressed by data, messages and further particulars.
Information is necessary for the proper execution of any conceivable task, be it in administration, commerce, transport, science, etc.
Accurate, objective and unambiguous information is a prerequisite in computer-based information systems and in the data interchange between them.
In formal information systems, data are expressed in symbols (usually digits or letters), arithmetic numbers and descriptions, which should have a fixed, stable meaning for everyone involved and thereby be suitable for purposes of processing and communication.
Users, irrespective of their function or tasks, should be able to understand, interpret and handle their information correctly.
Information shared by different user groups or application systems has to have an agreed definition, e.g. the semantic meaning of a concept (connotation) and all instances of a concept (denotation), and an agreed representation.
Coding covers the way and the form in which data are expressed by codes.
It is necessary to make clear agreements on these representations.
An explanation of the representation forms and code elements is part of the specification of data.
4.2 Coding
By coding is understood the rule-based assignment of code elements to
a named and defined set of elements in an orderly way.
Coding is mostly done by means of symbols (usually digits or letters), resulting in a concise representation.
Example: The assignment of the code element "CDG" as the concise representation of the airport name "Paris Charles de Gaulle". This airport name belongs to the set of airport names maintained by the International Air Transport Association (IATA). IATA has set the rules for establishing the concise representations.
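As an illustration of the definitions of code, code element and coded set in 3.3 to 3.9, the sketch below models a code as a Python mapping from elements of the coded set (airport names) to code elements. Only "CDG" comes from the example above; the other two entries are ordinary IATA codes added purely for illustration, and the function names encode and decode are hypothetical.

```python
# Illustrative code: elements of the coded set (airport names) mapped to
# code elements (three-letter codes). Only "CDG" comes from the text.
AIRPORT_CODE = {
    "Paris Charles de Gaulle": "CDG",
    "London Heathrow": "LHR",
    "New York John F. Kennedy": "JFK",
}


def encode(name: str) -> str:
    """Return the single code element assigned to an element of the coded set."""
    return AIRPORT_CODE[name]


def decode(code_element: str) -> list:
    """Return all elements of the coded set mapped to this code element."""
    return [name for name, code in AIRPORT_CODE.items() if code == code_element]


print(encode("Paris Charles de Gaulle"))  # CDG
print(decode("CDG"))                      # ['Paris Charles de Gaulle']
```

Because a dict holds one value per key, each element of the coded set receives exactly one code element; decode returns a list because, according to note 3 of 3.3, one code element may relate to more than one element of the coded set.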
Coding is a necessary tool for information processing. Coding of information enables it to be recorded, interpreted, processed and transmitted by humans and/or by machines.
All kinds of information can be coded: information about products, persons, processes, documents, countries, currencies, packages, etc.
Before making agreements on the coding methodology, i.e. the representation of information concerning events, actions, and concrete or abstract objects in the real world, it must be investigated which data are relevant for the intended application. Information analysis of the universe of discourse concerned has to determine the role of the data in its information structure.
In doing this, a clear distinction should be made between identification, classification and reference needs.
5 Coding objectives
Information about any abstract or concrete object, action, or event of interest (its characteristics or attributes) can be coded. Before making agreements on the configuration of their representation, i.e. the coding rules, it is necessary to determine the objective of the coding effort. It is not enough to design an ordered short representation of certain data. First the information requirements must be clear.
The following requirements generally occur:
- identification;
- classification;
- key to further information.
When data modelling is applied for the specification of application systems or messages for use in data interchange, the objectives of the users in the application environment will determine the choice of the entities and attributes to be taken into account, as well as their interrelationship. The methods to be used for identification, classification or referencing will depend on those objectives.
5.1 Identification
The purpose of identification is to distinguish elements of a set from each other.
To be able to do this, it must first be determined which characteristics have to be taken into account. Based on the selected characteristics, comparisons can be performed and it can be ascertained whether an element of the set is equal to or different from another element. The degree of detail to which characteristics have to be recorded, to indicate similarity or difference of elements of the set, depends on the area of application for which identification is needed.
Example
For its stock control of stationery, an organization wants to identify various types of sheets of paper. Sometimes it may be sufficient to distinguish the various formats: A3, A4 or A5. Depending on the utilization of the sheets of paper, other characteristics may be added, such as thickness, weight or chemical composition. If no specific requirements have to be met, recording the format may suffice. In other cases, where handling and processing of the sheets of paper are critical, the necessary characteristics have to be recorded.
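The point that identity depends on the selected characteristics can be sketched in Python. The Sheet type, its attribute names and the numeric values are hypothetical; the sketch only shows that two sheets which are "the same" when only the format is recorded become distinguishable once weight is added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sheet:
    paper_format: str   # e.g. "A4"
    weight_gsm: int     # grams per square metre (invented values below)


def identity(sheet: Sheet, characteristics: tuple) -> tuple:
    """Project a sheet onto the characteristics selected for identification."""
    return tuple(getattr(sheet, name) for name in characteristics)


def same(a: Sheet, b: Sheet, characteristics: tuple) -> bool:
    """Two sheets count as equal if all selected characteristics are equal."""
    return identity(a, characteristics) == identity(b, characteristics)


office = Sheet("A4", 80)
card = Sheet("A4", 160)

print(same(office, card, ("paper_format",)))               # True
print(same(office, card, ("paper_format", "weight_gsm")))  # False
```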
So identification can be defined as: the systematic registration of characteristics of elements of a set in such a way that they can be distinguished from each other.
The characteristics to be distinguished are inalienably part of an object or concept of interest.
The extent of the details to be observed is dependent on the user's
objectives and the area of application.
These criteria result in the design or selection of an identification
system.
Examples of details to be distinguished:
In the real world a person has a family name and a given name, is born on a certain date in a country, resides in a country, has eyes of a certain colour, etc.
Application 1: In a governmental application system, one wants to distinguish the colours of eyes of citizens.
Application 2: In an application system for a medical research project, one wants to distinguish the colours of eyes of human beings.
In these applications the required degree of discrimination need not be identical. In the medical research application, it may be required that more eye colours are distinguished than in the application of a governmental body registering the eye colours of citizens.
In practice this may result in a compromise when selecting an identification system. Often the choice of a system is then determined on the basis of the wish to have a minimum number of identification systems to accommodate a maximum number of functions.
The objectives and applications determine which intrinsic characteristics will be taken into consideration.
5.2 Classification
The purpose of classification is to group objects or concepts of interest into classes in accordance with predetermined characteristics, on the basis of which similarities can be ascertained.
Classification is often used to support decision making or to gain insight into trends or developments, without having to examine each instance of a set separately.
So classification can be defined as: a systematic arrangement of elements in groups or categories based on the similarity of predetermined characteristics.
Classification is done by means of control characteristics, i.e. those
characteristics which have been assigned or are related to an object
or concept of interest.
These characteristics may be intrinsic or extrinsic.
Example of control characteristics:
Place of manufacturing of products, turnover speed, market sector, production process for a product.
The information requirements and the business policy are determinative
for the choice of control characteristics.
An organization may choose to apply various classification systems for the same type of objects, depending on different needs.
Example:
A product may be classified according to:
- function, on behalf of sales;
- manufacturing process, on behalf of production;
- value, on behalf of inventory control;
- volume/weight, on behalf of transport;
- type, on behalf of Customs or statistical requirements.
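Applying several classification systems to the same objects can be sketched as grouping the same records by different control characteristics. The product records, field names and values below are invented for illustration and do not come from any real classification.

```python
from collections import defaultdict

# Invented product records; each field is a possible control characteristic.
products = [
    {"id": "P1", "function": "heating", "transport": "bulky"},
    {"id": "P2", "function": "cooling", "transport": "small"},
    {"id": "P3", "function": "heating", "transport": "small"},
]


def classify(items: list, control_characteristic: str) -> dict:
    """Group item ids into classes by the value of one control characteristic."""
    classes = defaultdict(list)
    for item in items:
        classes[item[control_characteristic]].append(item["id"])
    return dict(classes)


print(classify(products, "function"))   # {'heating': ['P1', 'P3'], 'cooling': ['P2']}
print(classify(products, "transport"))  # {'bulky': ['P1'], 'small': ['P2', 'P3']}
```

Each call yields a different, equally valid grouping of the same three products, which is the situation of one organization serving sales and transport with separate classification systems.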
5.3 Key to further information
A key is an identifier within a set of data element types. Within the context of an application or data interchange, a key shall be unique.
In many application systems a reference number is needed as a key to further information. The key in itself can be meaningless, but it gives access to the data required.
Examples:
- an order number may give access to the party to whom the order was sent, the date on which it was sent, and the goods or services ordered;
- an article number may be related to a description, its price, its production process and its place of manufacturing;
- a salary number may refer to an employee, his name and address, his birthdate, his rank and his salary.
Reference numbers may be identifying in one application area and classifying in another.
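The sketch below, assuming a simple in-memory store and invented record contents, shows a meaningless reference number used as a key: the key carries no information of its own, uniqueness is enforced when records are registered, and the key gives access to the data. The names orders and register are hypothetical.

```python
orders: dict = {}  # reference number -> order data (hypothetical store)


def register(order_number: str, record: dict) -> None:
    """Store a record under its key; the key must be unique in this context."""
    if order_number in orders:
        raise ValueError(f"duplicate key {order_number!r}")
    orders[order_number] = record


# The key "000417" means nothing in itself; it only gives access to the data.
register("000417", {"party": "Supplier A", "date": "1994-12-15", "goods": ["paper, A4"]})
print(orders["000417"]["party"])  # Supplier A
```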
6 Types of codes
This chapter provides a description of basic coding methods. It is intended to assist in selecting appropriate code structures based upon specific application requirements and the nature of the elements in the set to be coded. It also provides principles and criteria to be considered in assessing alternative code structures, and mentions advantages and disadvantages of each coding method.
The choice of code structures is fairly extensive. The following information, however, should help to select the best method.
6.1 Forms of codes
The coding methods discussed in this chapter are outlined in the following listing. The set of methods shown is not exhaustive but does include all the significant types.
Many code structures applied in practice are combinations of these basic types.
Non-significant codes
  Sequential
    incremental sequential
    group sequential
    arranged sequential (chronological, alphabetical)
  Random
Significant codes
  Mnemonic
  Abbreviation based
  Matrix
  Hierarchical
  Juxtaposition
  Combination
  Value addition
6.2 Sequential codes
6.2.1 Principle
Elements of a set to be coded are assigned a number taken sequentially from an ordered set of numbers. These numbers are mostly natural integer numbers (e.g. beginning with "1"), but alphabetic characters may also be used, e.g. AAA, AAB, AAC ...
Sequential assignment of code elements may be based on lists built in various ways, for example:
1. The list of natural integer numbers, limited to the number of possibilities required.
2. Lists of numbers arranged on the basis of an algorithm, e.g. only even numbers or multiples of 10.
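The ways of building lists for sequential assignment described above can be sketched with Python's standard itertools module. The function names are hypothetical; alphabetic produces the AAA, AAB, AAC ... series mentioned in 6.2.1, with the rightmost letter varying fastest.

```python
from itertools import count, islice, product
from string import ascii_uppercase


def natural_numbers(limit: int) -> list:
    """List 1: natural integers 1..limit."""
    return list(range(1, limit + 1))


def multiples_of(step: int, how_many: int) -> list:
    """List 2: numbers produced by a rule, here multiples of `step`
    (step=2 gives even numbers, step=10 gives multiples of 10)."""
    return list(islice(count(step, step), how_many))


def alphabetic(length: int = 3):
    """Alphabetic sequence AAA, AAB, AAC, ... in lexical order."""
    for letters in product(ascii_uppercase, repeat=length):
        yield "".join(letters)


print(natural_numbers(5))             # [1, 2, 3, 4, 5]
print(multiples_of(10, 4))            # [10, 20, 30, 40]
print(list(islice(alphabetic(), 3)))  # ['AAA', 'AAB', 'AAC']
```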
6.2.2 Use
Sequential codes are generally used as self-contained codes for identification or referencing purposes, or as part of a composite code, often in addition to a classifying code.
Remark
In a numerically defined field with a fixed number of positions
leading zeroes must be used to fill the field up to the number of
positions required, whenever appropriate.
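A minimal sketch of the leading-zero rule in the remark above, assuming a field of fixed numeric length; to_fixed_numeric is a hypothetical helper built on Python's str.zfill. It rejects values that would not fit rather than truncating them.

```python
def to_fixed_numeric(value: int, positions: int) -> str:
    """Code a non-negative integer in a fixed-length numeric field,
    filling the field with leading zeroes."""
    digits = str(value)
    if value < 0 or len(digits) > positions:
        raise ValueError(f"{value} does not fit in {positions} numeric positions")
    return digits.zfill(positions)


print(to_fixed_numeric(1, 3))   # 001
print(to_fixed_numeric(15, 3))  # 015
```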
Example: In a field defined as 3-numeric, "one" is coded 001 and "fifteen" is coded 015.
6.2.3 Types of sequential codes
There are four types of sequential codes:
-
incremental;
- pure incremental;
- group;
- arranged.
6.2.3.1 Incremental sequential codes
Elements of a set to be coded are assigned a code value determined by increasing the previously assigned code value by a predefined number, e.g. 1 (= pure incremental), 2 in case of even numbers, or 10 if only multiples of 10 may be assigned.
With this method a code value does not express any meaning. Similar
elements of the set are not grouped.
The rationale for assigning code values other than increasing by 1 can
be the requirement to use the intermediate code values for subsequent
modifications of the original coded set.
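The incremental principle can be sketched as a small assigner class; the class and its parameters are hypothetical and not taken from any library. With step=1 it yields a pure incremental code; a larger step leaves values such as 15 between 10 and 20 free for later modifications of the coded set.

```python
class IncrementalCode:
    """Assign code values by adding a fixed increment to the last one."""

    def __init__(self, start: int = 1, step: int = 1):
        self.next_value = start
        self.step = step
        self.assigned = {}  # element -> code value

    def assign(self, element: str) -> int:
        if element in self.assigned:  # an element keeps its first code value
            return self.assigned[element]
        value = self.next_value
        self.assigned[element] = value
        self.next_value += self.step
        return value


code = IncrementalCode(start=10, step=10)
print([code.assign(e) for e in ("red", "green", "blue")])  # [10, 20, 30]
```

The cost of the gaps is capacity: in a two-digit field, step=10 starting at 10 yields only the nine values 10, 20, ..., 90, whereas step=1 starting at 1 yields 99.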
Advantages
- Fast assignment of code values.
- Conciseness.
- Easy validation of coded representations.
Disadvantages
- Classification or grouping of elements of a set is impossible by means of the coded representations.
- The maximum capacity is not fully used.
6.2.3.2 Pure incremental sequential codes
Elements of a set to be coded are assigned a code value determined by
increasing the previously assigned code value by 1.
With this method a code value does not express any meaning. Similar
elements of the set are not grouped.
In addition to the advantages mentioned in 6.2.3.1, another advantage
is that the maximum capacity is used.
6.2.3.3 Group sequential codes
With this type of code, ranges of code values are assigned to categories of elements of the set which have something in common.
Before assigning a code value, the category to which the element of the set belongs must be determined. The element is then assigned the next higher code value in the range of its category.
Example
Code for products in the area of oil refining.
In this code, each product is represented by a 4-digit number taken sequentially from a series that is characteristic of the grouping of products.
These groups of products, which have been determined in advance, and the corresponding ranges of code values are:
- gas: 1000 - 2999
- petrol and fuels: 3000 - 4999
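The group sequential principle applied to the oil-refining example can be sketched as follows. The class name and the choice to raise an error when a range is exhausted are assumptions; the category names and ranges are those given above.

```python
class GroupSequentialCode:
    """Each category owns a predetermined range of code values; a new
    element receives the next higher unused value in its category's range."""

    def __init__(self, ranges: dict):
        self.ranges = ranges
        self.last = {category: None for category in ranges}

    def assign(self, category: str) -> int:
        r = self.ranges[category]  # KeyError for an undefined category
        last = self.last[category]
        value = r.start if last is None else last + 1
        if value not in r:
            raise OverflowError(f"code range for {category!r} is exhausted")
        self.last[category] = value
        return value


codes = GroupSequentialCode({
    "gas": range(1000, 3000),               # 1000 - 2999
    "petrol and fuels": range(3000, 5000),  # 3000 - 4999
})
print(codes.assign("gas"))               # 1000
print(codes.assign("petrol and fuels"))  # 3000
print(codes.assign("gas"))               # 1001
```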
Advantages
- Fast assignment of code values.
- Conciseness.
- Easy validation of coded representations.
Disadvantages
- The maximum capacity is not fully used.
Remark
This system must not be confused with hierarchical codes (see 6.7). Group sequential coding must only be used if the categories are stable and there is no possibility that an element of a set may belong to different categories, now or in the foreseeable future.
6.2.3.4 Arranged sequential codes
This is not a pure sequential code, although the code values may give that impression.
This type can only be used successfully if all elements of a set are known beforehand and the set will not expand.
Before assigning code values, the elements of the set are arranged on the basis of some characteristic, e.g. alphabetical order of name or chronological order (for events, actions). The order thus obtained is then expressed by the code values, which themselves have been chosen sequentially from an ordered list.
This coding scheme makes it possible to easily retrieve and sort data element instances. It is used in cases where it is impossible or difficult to obtain the required results by use of the non-coded data element instances, e.g. because of the presence of spaces separating words or resulting from digit adjustment.
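An arranged sequential code can be sketched as "sort the complete set, then number it". The function below is hypothetical; the sample input uses the first three French départements in alphabetical order, which do carry the codes 01, 02 and 03, and a fixed width of two digits is assumed.

```python
def arranged_sequential(names: list, width: int = 2) -> dict:
    """Sort a complete, closed set of names alphabetically, then number it
    01, 02, ... in that order."""
    ordered = sorted(names, key=str.casefold)
    return {name: str(i).zfill(width) for i, name in enumerate(ordered, start=1)}


print(arranged_sequential(["Allier", "Ain", "Aisne"]))
# {'Ain': '01', 'Aisne': '02', 'Allier': '03'}
```

Adding a name to the input and re-running the function would renumber every element sorted after it, which is why the scheme suits only closed sets.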
Advantages
- Fast assignment of code values.
- Conciseness.
- Easy validation of coded representations.
Disadvantages
- The maximum capacity is not fully used.
Example
In France, the "départements" have been coded numerically based on a list sorted in alphabetical sequence. This code may be used as an element in other codes, e.g. postal code, national identity number, vehicle registration number.
A drawback of this system is that it cannot accommodate extensions, which will probably arise in the future.
Taking the example above: the département "Corse" was assigned code value 20. When "Corse" was divided into "Corse-du-Sud" and "Haute-Corse", letters had to be used (2A and 2B respectively) in a numeric code, thus creating anomalies.
6.2.4 Conclusions
When selecting a sequential code, the pure incremental type offers the best solution for purposes of flexibility, expandability and ease of assignment.
6.3 Random codes
6.3.1 Principle
Elements of a set are assigned a code value taken from a set of possible non-arranged code values or generated by means of an algorithm. There is no correlation whatever between the element of the set and its code value. This method differs from sequential coding in that the latter uses sequentially ordered ranges for code value assignment.
The purpose of this method is to render any interpretation impossible.
6.3.2 Advantages
- Easy and fast assignment of code values, possibly automated.
- Conciseness.
- Use of maximum capacity.
6.3.3 Disadvantages
- Classification or grouping of elements of a set is impossible by
means of the coded representations.
- Need for a pre-established list, or an algorithm to generate random numbers, provided that any duplication of numbers is excluded.
6.3.4 Use
Random codes may be used as self-contained codes for identification purposes or as part of a composite code where the other part(s) are based on other coding principles.
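A minimal sketch of random code assignment, assuming numeric code values of fixed length: values are drawn without replacement from the whole value space with secrets.SystemRandom().sample from the Python standard library, which excludes duplicates and can use the full capacity. The function name and the six-digit default are assumptions.

```python
import secrets


def random_codes(elements: list, digits: int = 6) -> dict:
    """Assign each element a random numeric code value with `digits` digits.

    Values are sampled without replacement from the whole value space, so
    no value is used twice and there is no correlation with the element.
    """
    space = 10 ** digits
    if len(elements) > space:
        raise ValueError("not enough code values for the set")
    values = secrets.SystemRandom().sample(range(space), len(elements))
    return {element: str(v).zfill(digits) for element, v in zip(elements, values)}


print(random_codes(["pump", "valve", "filter"]))  # values differ on every run
```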
6.4 Mnemonic codes
6.4.1 Principle
Elements of a set are assigned a coded representation by taking one or
more characters from their names.
The purpose of this method is to make it easier for users to memorize code values.
6.4.2 Advantages
- Ease of user memorization of code values, thus avoiding frequent consultation of code lists.
- Facilitation of the detection of errors.
6.4.3 Disadvantages
Coding closely depends on the way in which the elements of the set are originally expressed (language, measurement system, etc.). A risk exists of creating mnemonics which, in the user's language, have a meaning with no connection to the occurrences represented.
6.4.4 Use
Mnemonic codes can be usefully applied for limited identifying code
sets which are rather stable and of which the names for elements of
the set are commonly known in the user's environment.
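To illustrate both the principle and the risks of mnemonic codes, the sketch below applies a deliberately naive, hypothetical rule (the first two letters of the name, upper-cased) and refuses to produce a code set in which two names share a code value.

```python
def mnemonic(name: str, length: int = 2) -> str:
    """Hypothetical rule: the first `length` letters of the name, upper-cased."""
    letters = [c for c in name.upper() if c.isalpha()]
    return "".join(letters[:length])


def build_mnemonic_code(names: list, length: int = 2) -> dict:
    """Apply the rule to a set of names, refusing ambiguous code values."""
    code = {}
    seen = {}  # code value -> name that first received it
    for name in names:
        value = mnemonic(name, length)
        if value in seen:
            raise ValueError(f"{name!r} and {seen[value]!r} both give {value!r}")
        seen[value] = name
        code[name] = value
    return code


print(build_mnemonic_code(["France", "Canada"]))  # {'France': 'FR', 'Canada': 'CA'}
```

Under this rule "United States" would become "UN", and "Austria" and "Australia" would both give "AU"; ISO 3166 assigns US, AT and AU (the latter to Australia), which shows that a practical mnemonic code needs assignment conventions beyond a purely mechanical rule.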
Example 1
From: ISO 3166 Codes for the representation of names of countries

Entity name      Country code
Austria          AT
Canada           CA
France           FR
United States    US
Example 2
From: UN/ECE Recommendation 20 Codes for units of measurement

Unit name            Coded representation
Centilitre           CLT
Centimetre           CMT
Cubic centimetre     CMQ
Cubic foot           FTQ
Degree Celsius       CEL
Degree Fahrenheit    FAH
Gallon               GLI
Kilogram             KGM
Kilometre            KMT
Kilowatt             KWT
Metre                MTR
Minute               MIN
Piece                PCE
Square centimetre    CMK
...