Information technology — Data compression for information interchange — Adaptive coding with embedded dictionary — DCLZ Algorithm

Specifies a lossless compression algorithm (DCLZ - Data Compression according to Lempel and Ziv) to reduce the number of bits required to represent information coded by means of 8-bit bytes. This algorithm is particularly useful when information has to be recorded on an interchangeable medium. Its use is not limited to this application.

Technologies de l'information — Compression de données pour l'échange d'information — Codage adaptif avec un dictionnaire incorporé — Algorithme DCLZ

General Information

Status
Published
Publication Date
26-Aug-1992
Current Stage
9093 - International Standard confirmed
Completion Date
01-Jun-2022
Ref Project

Buy Standard

Standard
ISO/IEC 11558:1992 - Information technology -- Data compression for information interchange -- Adaptive coding with embedded dictionary -- DCLZ Algorithm
English language
12 pages
sale 15% off
Preview
sale 15% off
Preview

Standards Content (Sample)

INTERNATIONAL
STANDARD 11558
First edition
1992-09-01
-___---- -----
_ _. _ .-_ -___. --.-----------I_- ---.- -__ _ _.__._____ .----
-_.-___-___________________ _--_ _._ -._ ---
Information technology - Data compression for
information interchange - Adaptive coding with
embedded dictionary - DCLZ Algorithm
Technologies de I’ir,formation - Compression de donnees pour
Codaqe adaptif avec un dictionnaire
Mchange d’in forma tion -
incorpor6 - Algorithme DCLZ c.
Reference number
ISO/I EC 11558: 1992(E)

---------------------- Page: 1 ----------------------
ISO/IEC 11558: 1992 (E)
Page
Contents
1
1 Scope
1
2 Conformance
3 Normative references
4 Definitions
4.1 Code Value
4.2 Codeword
4.3 compression ratio
4.4 dictionary
4.5 empty state
4.6 frozen state
5 Conventions and notations
6 Algorithm identifier
2
7 DCLZ compression algorithm
7.1 Overview
7.2 Principle of operation
7.2.1 Compilation of the dictionary
7.2.2 Frozen dictionary
7.2.3 Resetting the dictionary to the empty state
7.2.4 Boundaries
7.2.5 Recreation of the dictionary
3
7.3 Code Values
7.3.1 Control Codes
7.3.2 Encoded Bytes
7.3.3 Dictionary Codes
4
7.4 Codewords
Annexes
5
A - Example of a generic DCLZ algorithm
9
B - Example of Code Values output for a given input stream
10
C - Bibliography
0 1s0/lEc 1992
All rights reserved. No part of this publication may be reproduced or utilized in any form or by any
means, electronic or mechanical, including photocopying and microfilm, without permission in writing
from the publisher.
l CH-1211 Genkve 20 l Switzerland
ISOEC Copyright Offke l Case postale 56
Printed in Switzerland

---------------------- Page: 2 ----------------------
ISOlIEC 11558: 1992 (E)
Foreword
IS0 (the International Organization for Standardization) and IEC (the Inter-
,
national Electrotechnical Commission) form the specialized system for world-
wide standardization. National bodies that are members of IS0 or IEC partici-
pate in the development of International Standards through technical committees
established by the respective organization to deal with particular Gelds of
technical activity. IS0 and IEC technical committees collaborate in fields of
mutual interest. Other international organizations, governmental and non-
governmental, in liaison with IS0 and IEC, also take part in the work.
In the field of information technology, IS0 and IEC have established a joint
technical committee, ISO/IEC JTC 1. Draft International Standards adopted by
the joint technical committee are circulated to national bodies for voting. Publi-
cation as an International Standard requires approval by at least 75 % of the
national bodies casting a vote.
International Standard ISO/IEC 11558 was prepared by the European Computer
Manufacturers Association (as Standard ECMA-15 1) and was adopted, under a
special “fast-track procedure”., by Joint Technical Committee ISO/IEC JTC 1,
Information technology, in parallel with its approval by national bodies of IS0
and IEC.
Annexes A to C of this International Standard are for information only.
Patents
During the preparation of the ECMA standard, infomlation was gathered on
patents upon which application of the standard might depend. Relevant patents
were identified as belonging to Hewlett Packard Limited. However, neither
ECMA nor ISO/IEC can give authoritative or comprehensive information about
evidence, validity or scope of patent and like rights. The patent holders have
stated that licences will be granted under reasonable and non-discriminatory
temls. Communications on this subject should be addressed to
Hewlett Packard Limited
Computer Peripherals Bristol
Filton road
Stoke Gifford
Bristol BS12 642
United Kingdom
. . .
111

---------------------- Page: 3 ----------------------
ISO/IEC 11558: 1992 (E)
Introduction
In the past decades ISO/IEC have published numerous International Standards for magnetic tapes, magnetic tape cas-
settes and cartridges, as well as for optical disk cartridges. Those media developed recently have a very high physical
recording density. In order to make an optimal use of the resulting data capacity, compression algorithms have been
designed which allow a reduction of the number of bits required for the representation of user data in coded form.
In future, these compression algorithms will be registered by an International Registration Authority to be established
by ISO/IEC. The registration will consist in allocating to each registered algorithm a numerical identifier. For a
recorded medium this should be included in the recorded format to indicate which compression algorithm(s) has been
used.
This International Standard is the first of a series of International Standards for compression algorithms.

---------------------- Page: 4 ----------------------
INTERNATIONAL STANDARD ISO/IEC 11558=1992 (E)
Information technology - Data compression for information interchange -
Adaptive coding with embedded dictionary - DCLZ algorithm
1
Scope
This International Standard specifies a lossless compression algorithm to reduce the number of bits required to
represent information coded by means of 8-bit bytes. This algorithm is known as DCLZ (Data Compression according
to Lempel and Ziv).
This International Standard specifies neither the strategy for resetting the dictionary nor that for freezing it, as these
are implementation-dependent.
This algorithm is particularly useful when information has to be recorded on an inter-changeable medium. Its use is
not limited to this application.
2 Conformance
A compression algorithm shall be in conformance with this International Standard if its output data stream satisfies
the requirements of clause 7.
Normative reference
3
The following standard contains provisions which, through reference in this text, constitute provisions of this
International Standard. At the time of publication, the edition indicated was valid. All standards are subject to
revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility
of applying the most recent edition of the standard listed below. Members of IEC and IS0 maintain registers of
currently valid International Standards.
ISO/IEC 11576, Information technology - Procedure for the registration of algorithms for the lossless compression of
data
4 Definitions
41 0 Code Value: An integer in the range 0 to 4 095 that is generated by the compression algorithm.
42 0 Codeword: A set of 9, 10, 11 or 12 consecutive bits in the output stream to express a Code Value in binary
form.
43 0 compression ratio: The number of bits in the input stream of the compression algorithm divided by the
number of bits in the output stream of the compression algorithm.
44 0 dictionary: A table, comprising 3 832 entries, that is used to retain strings of bytes selected from the input
stream. Each entry is identified by a unique Code Value that is greater than 263.
45 0 empty state: The state in which no data is in the dictionary.
46 0 frozen state: The state in which no further data shall be added to the dictionary.
5
Notations and acronyms
- Numbers in this International Standard are expressed in decimal notation.
- EOR: end of record.
6
Algorithm identifier
The numeric identifier of this algorithm in the International Register is 32.
1

---------------------- Page: 5 ----------------------
ISODEC 11558: 1992 (E)
7
DCLZ compression algorithm
71 0
Overview
The DCLZ compression algorithm shall accept information input, in the form of a stream of &bit data bytes, and shall
output Codewords, in the form of a stream of bits which are organised into 8-bit bytes. The algorithm shall identify
repetition of byte strings in the input stream and shall exclude such redundancy from the output stream.
With many types of information generated, transmitted or recorded by electronic information processing systems and
equipment, the degree of repetition in data is sufficiently high to permit the output stream to contain significantly
fewer bits than the input stream. Under degenerate circumstances, however, the output stream may contain more bits
than the input stream. The compression ratio achieved in practice is dependent on the characteristics of the actual
input data stream.
Compression by this algorithm is lossless, i.e. it is possible to restore exactly the original representation of data by
means of a complementary decompression algorithm.
The algorithm contains features which aid its implementation in data storage and retrieval equipment which handles,
in a sequential manner, data records of varying length.
72 0 Principle of operation
The fundamental principle of operation is the compilation of a dictionary of strings of bytes which occur in the input
stream, the use of that dictionary to detect repetition, and the generation of a Codeword for each repeated string. The
Codeword expresses a Code Value which is the reference to the dictionary entry for the repeated string.
7.2.1 Compilation of the dictionary
Prior to the commencement of operation of the algorithm, the dictionary shall be reset to the empty state (see also
7.3.1.2).
The algorithm shall examine the input stream and shall search for the first occurrence of a unique pair or a unique
string. A unique pair is a 2-byte string which has not yet been allocated a dictionary entry. A unique string of n bytes
(n > 2) is one which has not yet been allocated a dictionary entry; however, the first n-l bytes shall have been already
allocated a dictionary entry. The maximum length of a string for which a dictionary entry can be allocated shall be
128 bytes.
Upon encountering a unique pair, the algorithm shall output a Codeword which expresses the Code Value for the first
byte of the pair. Upon encountering a unique string of n bytes, the algorithm shall output a Codeword which expresses
the Code Value for the first n-l bytes of the string.
It shall then enter the unique pair or unique string into the dictionary and assign the next unused Code Value to the
entry, provided that the dictionary is not frozen (see 7.2.2) and that n does not exceed 128.
Starting with the 2nd byte of the current unique pair or the last byte of the current unique string, the algorithm shall
to examine the
then continue input stream and search for the next unique pair or unique string.
7,2.2 Frozen dictionary
The dictionary shall be considered to be in the frozen state in the following cases:
- all available Code Values have been assigned;
- the implementation of the algorithm has decided not to enter a unique pair or a unique string into the dictionary,
for example because the search for free space in the dictionary takes too much time.
The only means by which the dictionary may be removed from the frozen state is by being reset to the empty state
(see also 7.3.1.1).
2

---------------------- Page: 6 ----------------------
ISO/IEC 11558: 1992 (E)
7.2.3 Resetting the dictionary to the empty state
The algorithm is permitted to reset the dictionary to the empty state at any time, provided that all bytes which have
been input to the algorithm have been expressed by Codewords.
The algorithm may, for example, choose to reset the dictionary if the current degree of compression is not adequate
because the current dictionary entries do not reflect the current repetition characteristics of the input stream to a
sufficient extent.
7.2.4 Boundaries
Within the input stream, natural boundaries may exist between collections of bytes. For example, the stream may
consist of a sequence of records, each comprising one or more bytes; in such a case, a natural boundary exists
between records. The algorithm shall provide a means for identifying such boundaries in the output stream, so that
they are recognized and re-constituted by a decompression algorithm.
Such identification shall be achieved by the output of the EOR Codeword, (see 7.3.1.4), followed by a Codeword
which expresses the Code Value for the single byte, pair of bytes or string of bytes which is being held temporarily
for the purpose of examining the input stream for a unique pair or a unique string. Examination of the input stream
shall then continue from the first byte of the next record. The result is that the data between boundaries in the input
stream is wholly represented by Codewords between corresponding boundaries in the output stream. Such a boundary
in the output stream is considered to be located at the end of the pad bits that follow the Codeword that follows the
EOR Codeword.
7.2.5 Re-creation of the dictionary
The dictionary is not itself included in the output stream as a distinct item. Any appropriate decompression algorithm
will re-create the dictionary and restore the original representation of the data from the output stream of the
compression algorithm.
73 . Code Values
Code Values in the range 0 to 7 shall designate Control Codes. See 7.3.1.
Code Values in the range 8 to 263 shall designate Encoded Bytes. See 7.3.2.
Code Values in the range 264 to 4 095 shall designate Dictionary Codes. See 7.3.3
7.3.1 Control Codes
Four Control Codes are defined, with Code Values 0, 1, 2 and 3 as described below. Values in the range 4 to 7 shall
not be used.
7.3.1,1 Dictionary Frozen
This Control Code shall have the Code Value 0. It shall indicate that the dictionary has been frozen. It is not
mandatory for the algorithm to output this Code Value.
It may be output at any time after the algorithm has decided to freeze the dictionary, provided that all bytes which
have been input to t
...

Questions, Comments and Discussion

Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.