ISO/IEC 12042:1993
(Main)Information technology — Data compression for information interchange — Binary arithmetic coding algorithm
Information technology — Data compression for information interchange — Binary arithmetic coding algorithm
Specifies an algorithm for the reduction of the number of bits required to represent information. This process is known as data compression. The algorithm uses binary arithmetic coding and provides lossless compression and is intended for use in information interchange.
Technologies de l'information — Compression de données pour l'échange d'information — Algorithme de codage arithmétique binaire
General Information
Relations
Standards Content (Sample)
INTERNATIONAL
ISO/IEC
STANDARD
12042
First edition
1993-12-15
Information technology - Data
compression for information
interchange - Binary arithmetic coding
algorithm
Technologies de I’information
- Compression de donnees pour
I’khange d’information
- Algorithme de codage arithmktique
binaire
Reference number
ISO/IEC 12042:1993(E)
---------------------- Page: 1 ----------------------
ISO/IEC 12042: 1993 (E)
Contents
Page
1 scope
2 Normative references
3 Conformance
4 Conventions and notations
5 Algorithm identifier
6 Definitions
61 . block
62 . Code Block
63 . Code String
64 . encoding
65 . input event
66 . Logical Dat;ir Record
67 . trailer
Unique Table Pair
68 .
2
7 List of acronyms
2
8 Compression algorithm
8.1 General
8.2 Encoders
8.3 Formation of a Code Block
8.4 Code String
8.5 Table Pairs
8.6 Encoding
@ ISO/IEC 1993
All rights reserved. Unless otherwise specified, no part of this publication may be
reproduced or utilized in any form or by any means, electronie or mechanical, including
photocopying and microfilm, without Permission in writing from thc publisher.
ISO/IEC Copyright Office l Case postale 56 * CH-121 1 Geneve 20 0 Switzerland
Printed in Switzerland
ii
---------------------- Page: 2 ----------------------
ISOIIEC 12042: 1993 (E)
8.6.1 Normal Mode
8.6.2 Run Mode
8.7 Completion of the encoding of a block
Annex
A Example of a binary arithmetic coding algorithm
10
. . .
111
---------------------- Page: 3 ----------------------
ISO/IEC 12042: 1993 (E)
Foreword
ISO (the International Organization for Standardization) and IEC (the Inter-
national Electrotechnical Commission) form the specialized System for
worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical
committees established by the respective organization to deal with particular
fields of technical activity. ISO and IEC technical committees collaborate in
fields of mutual interest. Other international organizations, governmental and
non-governmental, in liaison with ISO and IEC, also take part in the work.
In the field of information technology, ISO and IEC have established a joint
technical committee, ISO/IEC JTC 1. Draft International Standards adopted by
the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the
national bodies casting a vote.
International Standard ISO/IEC 12042 was prepared by the European Computer
Manufacturers Association (as Standard ECMA-159) and was adopted, under a
special “fast-track procedure” , by Joint Technical Committee ISO/IEC JTC 1,
Information technology, in parallel with its approval by national bodies of ISO
and IEC.
Annex A of this International Standard is for information only.
---------------------- Page: 4 ----------------------
ISOIIEC 120420 1993 (E)
Introduction
In the past decades numerous International Standards for magnetic tapes, magnetic tape cassettes and cartridges, as weil as
for Optical disk cartridges have been published. Media developed recently have a very high physical recording density. In
Order to make an optimal use of the resulting data capacity, compression algorithms have been designed which allow a
reduction of the number of bits required for the representation of Laser dat;l in coded form.
These compression algorithms arc registered by an International Registration Authority set up by lSO/IEC. The registration
will consist in allocating to each registered algorithm a numerical identifier which will be recorded on the medium and, thus,
indicate which compression aIgorithm(s) has been used.
The fiist International Standard for compression algorithms was:
Information terhnology -Data Compression for Information Interchange - Adaptive Coding with Embedded
ISO/lEC 11558,
Dictionary - BCLZ Algorithm
This International Standard is the next one of this series.
---------------------- Page: 5 ----------------------
This page intentionally left blank
---------------------- Page: 6 ----------------------
INTERNATIONAL STANDARD ISO/IEC 12042:1993(E)
Information technology - Data compression for information
interchange - Binary arithmetic coding algorithm
1 Scope
This International Standard specifies an algorithm for the reduction of the number of bits required to represent information.
This process is known as data compression. The algorithm uses binary arithmetic coding. The algorithm provides lossless
compression and is intended for use in information interchange.
2 Normative references
The following standards contain provisions which, through reference in this text, constitute provisions of this International Standard.
At the time of publication, the editions indicated were valid. All Standards are subject to revision, and Parties to agreements based
on this International Standard are encouraged to investigate the possibility of applying the most recent editions of the standards
listed below. Members of EC and ISO maintain registers of currently valid International Standards.
LSOIIEC 115762993, Information technology - Procedure for the registration of algorithrns for the lossless compression of
data.
Intern&onal Register of Algorithm for bssless Cornpression of Data, in accordance with LYOIIEC 11576.
3 Conformance
A compression algorithm shall be in conformance with this International Standard if it satisfies all mandatory requirements
of this International Standard.
4 Conventions and notations
The following conventions and notations apply in this International Standard unless otherwise stated:
- In each field the bytes shall be arranged with Byte 1, the most significant, first. Within each byte the bits shall be
amnged with Bit 1, the most significant bit, first and Bit 8, the least significant bit, last.
- Letters and digits in parentheses represent numtxrs in hexadecimal notation.
- The setting of bits is denoted by ZERO or ONE.
- Numbers in binary notation and bit combinations arc reprcsented by strings of ZEROS and ONEs.
- Numbers in binary notation and bit combinations arc shown with the most significant bit to the left.
5 Algorithm identifier
The numeric identifier of this algorithm in the International Register shall be 16.
6 Definitions
For the purposes of this International Standard. the following dcfinitions apply.
1
---------------------- Page: 7 ----------------------
ISO/IEC f2042:1993 (E)
6.1 block: A portion of the Logical Data Record, usually having a length of 5 112 bytes (see 8.2).
6.2 Code Block: A block after compression with a trailer appended.
6.3 Code String: The encoded Logical Data Record.
6.4 encoding: The process of generating Code Blocks from blocks.
6.5 input event: The sample of the input to an encoder currently being examined; in Run Mode it is a byte; in Normal M&
it is a bit.
6.6 Logical Data Record: The data entity that is the input to the data compressor.
6.7 traiier: data appended to a block after compression and addition of pad bits.
6.8 Unique Table Pah: The last of the 256 Table Pairs, used only in Run Mode.
7
List of acronyms
cv Current Value
EV Estimated Value
LDR Logical Data Record
TP Table Pair
Compression aigorithm
8
81 . General
The LDR is transformed to a Code String by a one-pass, adaptive encoding technique designed to provide lossless d;rta
compression. By the use of a suitable decoding technique the exact original LDR tan be recovered from the Code String.
82 . Encoders
The LDR shall be divided into 512,byte blocks, except for the last block, which may be of any length less than, or equal to,
512 bytes. The blocks shall be routed sequentially to eight encoders, numbered from 0 to 7, commencing with encoder 0. If
the LDR contains more than 4 096 bytes the compressor shall retum to encoder 0 and repeat the process (see figure 2).
83 . Formation of a Code Block
The output of each encoder is a Code Block (see figure 1).
1
I
I
Trailer
1 Pad bits Trailer Pad Byte
for Byte 1 Byte 2 if byte
Compressed Block 1 last
count odd
1 data
(FF) (00)
e
1 w
I
I
1
mm-- -l
Figure 1 - Code Block
---------------------- Page: 8 ----------------------
ISODEC 12042: 1993 (E)
Since the degree sf compression achieved in an encoder depends upon the relative frequency of the bit pattems in the LDR,
and upon the presence of sequences of identicarll bytes, the length of the Compressed Block cannot be predicted. Pad bits set to
ZERO shail be added at the end to ferm an integral number of 8-bit bytes.
The Code Block shall be completed by appending a trailer. The trailer shall consist of two Trailer Bytes possibly followed by
a Pad Byte.
Trailer Byte 1 shall be s-et to (FF).
Trailer Byte 2
Bits 1 to 4 shall be set to 1 l0Q if the Code Block has been generated from the last block of the LDR.
shaU be NO1 for all other Code Blocks.
Bit 5 shall be set to ZERO if the nunaber sf bytes after encoding is even.
shall be set to ONE if the number of bytes after encoding is edd,
Bits 6 to 8 shall specify the number of pad bits that have been added to form an integral number of bytes.
If the number of bytes in the Compressed Block plus the pad bi& is odd, a Pad Byte set to (00) shaU be appended after Trailer
Byte 2 to give an even number of bytes.
84 . Code String
The Code String shaII be assembled from the outputs of the encoders, with the first portion beirigg that generated by encoder 0,
the second that genented by encoder I, and so on (see figure 2).
85 . Table Pah
Esch encoder shaI1 be allocated a table of 256 pairs of numbers, numbered from 1 to 256. The fmt number of each Table Pair
.
shall be the estimated Aue (EV) of the Input Event to be encoded; it shall be 1 or 0. The second number (K’) shall be a
measure of the probability sf the Input Event being equal to the EV. # shall have the value 1, 2, 3 or 4, with the pmbability
shown in table 1.
Table I- Probability values of K
The probabilities shall be a measure of how much more likely it is that the vaIue of the Input Event is equai to the EV rather
than being unequal (e.g. for K=2 the probability that the Input Event is equal to the EV shall be 2 to 4 times as great as the
probability that it would not be equal).
Before commencing the encoding of the LDR all EVs shall be set to ZERO and aII values of K sha.ll k set to ONE,
Encoding
86 .
The data shall fnst be examined on a byte basis. Bytes shall be fetched sequentially from the block, starting with the fúst
byte, and compared with the previous byte. The frst byte in a block shall be compared with (40).
Run Mode (see 8.6.2) shaI1 be disabled when the first byte is fetched.
If the current byte differs from the previous byte and Run Mode is not enabled, then the byte shall be encoded, bit by bit, in
Normal Mode (see 8.6.1).
If the current byte differs from the previous byte and Run Mode is enabled, then encoding shall proceed as defined in 8.6.2.2.
3
---------------------- Page: 9 ----------------------
ISOhEC 12042: 1993 (E)
If the two bytes arc identical and Run Mode is not enabled, then Run Mode shall be enabled and the byte shall be encded, bit
by bit, in Normal Mode.
If the two bytes arc identical and Run Mode is enabled, then encoding shall proce efmed in 8.6.2.1.
8,6.1 Noma!I Mode
The first (most significant) bit of the byte shall be co e EV in the fxst Table P
this comparison, one of two actions, which arc descri 1 result. The choice of wh
the remaining bits sf the byte shall determined by the bits previously encoded in the byte.
The fNst bit sf each byte shall always use the fust Table P
The second bit shdl use the second or thir n ~?~~et~er tie first it was ZERO or ONE, res tively.
The third bit shall use one of the next four Talale Psias de o bits. The fourth Tablec Pair s used
if the first two bits were 00, and so on (see figure 3).
This procedure requires the first 255 Table Pairs. T e remaining Table Pair is the Unique Table Pair; it shall used in the
Run Mode only (see 8.6.2).
The process sf encoding is then performed in two Stages:
- The bit is encoded as in 8.6.1.1.
- The values of K and EV arc revised as described in 8-6.1.2.
8.6.l.l Bit emoding
Two values shall be developed du ,ring the bit eomparison portion of the compression process. Both arc fractional binary
numbers to four b inary places. One value, termed the Current Value
(CV), shaI1 be used to genemte the output (Code Block).
The second value, termed the Width, shall have values in the rmge 0, to 1,llll.
For each block the CV shall be initialized to 0, and the Width value shall be initialized to 1,
IGch Input Event shaI1 Cause the two values to be modifled according to the following ruless
The Input Event is equal to the EV
0
The CV shalI be increased by 2-K
The Width shall be decreased by 2-”
where K is the measure of the probability of the Input Event (see 8.5).
If the Code Block is not null the bit to the left of the point in the CV shall be logically added to the last bit appended to
the Code Block and replaced in the CV by a ZERO. If this addition Causes the most recently generated complete byte in
the Code Block to become (FF), then (0) shaI1 be appended to the Code Block at the end of that byte to prevent a carry
from propagating beyond the last complete byte.
If the Code Block is null the bit to the left sf the point in the CV will already be a ZERO.
The Width shall then be compared with 1,0000.
If it is equal to, or greater than, 1,0000, then the Table Pair shall be revised (see 8.6.1.2) and the next bit fetched for
encoding.
If it is less than l,OOOO, then all the bits of the Width shall be shifted left one position; the leftmost bit, set to ZERO,
shall be dropped, and the rightmost position shall be filled with a ZERO. The bit in the frst position to the right of the
point in the CV shall be appended to the encoder output as part of the Code Block. The remaining bits to the right of
the point in the CV shaI1 be shifted left one position and the rightmost position shall be filled with a ZERO.
To prevent a carry from propagating beyond the last complete byte, each byte in the Code Block shaI1 be examined as it
is completed. If it equals (FF) then (0) shall be appended to the Code Block.
---------------------- Page: 10 ----------------------
ISO/IEC 12042: 1993 (E)
ii) The Input Event is not equal to the EV.
The Width shall be reset to l,OOOO.
The K bits to the right of the point in the CV shaI1 be shifted out and appended to the Code Block, the leftmost bit first.
As each bit is appended to the Code Block a check shaI1 be made to determine whether a byte boundary has been
reached. If it has, then the byte just completed sh311 be checked; if it equals (FF) then (0) shaI1 be appended to the Code
Block.
The rightmost K bit positions of the CV shaI1 be set to ZEROS.
8.6.1.2 Revision of K and EV (see figure 4)
As each Input Event is compared with the EV the values of K and the EV for that Table Pair shall be amended.
A four-Stage binary counter (Mc counter) shall be set to 0000 at the beginning of each block and incremented as described
below.
The Input Event is equal to the EV
0
K shaIl be incremented according to table 2, where bits marked X shall be ignored.
Table 2 - Rules for incrementing K
CuKent CuKent value value state of New value
ofK ofK Counter
ofK
XXll
XllI
1111
For all other states of the counter K shall be unchanged.
The counter shall then be incremented by 1. If the counter vaIue was 1111 incrementing by 1 shall res& in a c
...
Questions, Comments and Discussion
Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.