ISO/TS 12033:2001
Electronic imaging — Guidance for selection of document image compression methods
TECHNICAL SPECIFICATION ISO/TS 12033
First edition
2001-11-15
Electronic imaging — Guidance for
selection of document image compression
methods
Imagerie électronique — Guide pour la sélection des méthodes de
compression d'image
Reference number
ISO/TS 12033:2001(E)
© ISO 2001
© ISO 2001
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic
or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body
in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland
ii © ISO 2001 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 General
5 Type of document and digitization parameters
5.1 General
5.2 Types of documents
5.3 Document classification and digitization
5.3.1 General
5.3.2 Black and white documents
5.3.3 Greyscale documents
5.3.4 Pseudo-grey documents
5.3.5 Colour documents
5.3.6 Mixed documents
6 Compression methods and standards
6.1 RLE compression (Run-Length Encoding)
6.2 LZW compression (Lempel-Ziv-Welch)
6.3 ITU-T algorithms
6.3.1 General
6.3.2 Group 3 one-dimensional method (G3 1D)
6.3.3 Group 3 two-dimensional method (G3 2D) and Group 4 method
6.4 JBIG compression
6.5 JPEG compression
6.5.1 General
6.5.2 Discrete Cosine Transform (DCT)
6.5.3 JPEG steps
6.5.4 Components of JPEG
6.6 Fractal compression
6.7 Wavelet compression
7 Selecting compression parameters
7.1 Pertinence of compression
7.2 Selecting a compression method
7.3 Adjusting JPEG compression
8 Conclusion
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO
member bodies). The work of preparing International Standards is normally carried out through ISO technical
committees. Each member body interested in a subject for which a technical committee has been established has
the right to be represented on that committee. International organizations, governmental and non-governmental, in
liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical
Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
The main task of technical committees is to prepare International Standards. Draft International Standards adopted
by the technical committees are circulated to the member bodies for voting. Publication as an International
Standard requires approval by at least 75 % of the member bodies casting a vote.
In other circumstances, particularly when there is an urgent market requirement for such documents, a technical
committee may decide to publish other types of normative document:
— an ISO Publicly Available Specification (ISO/PAS) represents an agreement between technical experts in an
ISO working group and is accepted for publication if it is approved by more than 50 % of the members of the
parent committee casting a vote;
— an ISO Technical Specification (ISO/TS) represents an agreement between the members of a technical
committee and is accepted for publication if it is approved by 2/3 of the members of the committee casting a
vote.
An ISO/PAS or ISO/TS is reviewed after three years with a view to deciding whether it should be confirmed for a
further three years, revised to become an International Standard, or withdrawn. In the case of a confirmed ISO/PAS
or ISO/TS, it is reviewed again after six years at which time it has to be either transposed into an International
Standard or withdrawn.
Attention is drawn to the possibility that some of the elements of this Technical Specification may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO/TS 12033 was prepared by Technical Committee ISO/TC 171, Document imaging applications, Subcommittee
SC 2, Application issues.
Introduction
With the rapid increase in applications using digitization techniques, compression methods have become increasingly important for managing the volume of stored data.
The effects of the available compression methods vary greatly depending on the source documents. For example, an Electronic Image Management (EIM) system configured for scanning and storing continuous-tone images will have different image compression requirements than an application involving only text.
Practical methods for analyzing user requirements in order to select appropriate and optimal image compression schemes are complex. This Technical Specification has therefore been issued to guide users and system developers in their selection of these methods.
TECHNICAL SPECIFICATION ISO/TS 12033:2001(E)
Electronic imaging — Guidance for selection of document image
compression methods
1 Scope
This Technical Specification provides information to enable a user or EIM integrator to make an informed decision on selecting compression methods for digital images of business documents. It provides technical guidance for analyzing document types and determining which compression methods are most suitable for particular documents, in order to optimize their storage and use.
For the user, this Technical Specification provides information on image compression methods incorporated in hardware or software, to help in the selection of equipment in which these methods are embedded.
For the equipment or software designer, it provides planning information.
This Technical Specification is applicable only to still images in bit-map mode. It only takes into account
compression algorithms based on well-tested mathematical work.
2 Normative references
The following normative documents contain provisions which, through reference in this text, constitute provisions of
this Technical Specification. For dated references, subsequent amendments to, or revisions of, any of these
publications do not apply. However, parties to agreements based on this Technical Specification are encouraged to
investigate the possibility of applying the most recent editions of the normative documents indicated below. For
undated references, the latest edition of the normative document referred to applies. Members of ISO and IEC
maintain registers of currently valid International Standards.
ISO 12651:1999, Electronic imaging — Vocabulary
ITU-T Recommendation T.4:1999, Standardization of Group 3 facsimile terminals for document transmission
ITU-T Recommendation T.6:1988, Facsimile coding schemes and coding control functions for group 4 facsimile
apparatus
3 Terms and definitions
For the purposes of this Technical Specification, the terms and definitions given in ISO 12651 and the following
apply.
3.1
lossless compression
compression algorithm that is capable of restoring all of the original information of a compressed image
3.2
lossy compression
compression algorithm which loses some of the original information during compression, so that the decompressed
image is only an approximation of the original
© ISO 2001 – All rights reserved 1
---------------------- Page: 6 ----------------------
ISO/TS 12033:2001(E)
NOTE This type of algorithm is especially useful in image compression where details can be eliminated, because these
details are not perceptible, or are minimally perceptible, to the human eye. In this case, the compression ratio is dramatically
increased.
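To make the distinction between 3.1 and 3.2 concrete, the following Python sketch contrasts an exact round trip through a lossless codec with a lossy requantization step. It uses the standard-library `zlib` module (DEFLATE) purely as a stand-in lossless codec; DEFLATE is not one of the methods covered by this Technical Specification. The function names, the sample data and the 4-bit requantization are illustrative assumptions.

```python
import zlib

def lossless_roundtrip(pixels: bytes) -> bytes:
    """Compress and decompress with DEFLATE (zlib), a general-purpose lossless codec."""
    return zlib.decompress(zlib.compress(pixels))

def lossy_roundtrip(pixels: bytes, keep_bits: int = 4) -> bytes:
    """Illustrative lossy step: discard the low-order bits of each 8-bit grey value.

    Only `keep_bits` bits per pixel would need to be stored. On reconstruction each
    value is placed at the centre of its quantization interval, so the result
    approximates, but in general does not equal, the original.
    """
    drop = 8 - keep_bits
    half_step = (1 << drop) >> 1  # 0 when no bits are dropped
    return bytes(((p >> drop) << drop) + half_step for p in pixels)

# Hypothetical 8-bit greyscale samples: 0, 5, 10, ..., 255.
original = bytes(range(0, 256, 5))

print(lossless_roundtrip(original) == original)  # True: every bit restored
approx = lossy_roundtrip(original)
print(max(abs(a - b) for a, b in zip(approx, original)))  # 8: bounded, but not zero
```

The lossy step keeps only half of the bits per sample, which is where its size reduction would come from; the price is a reconstruction error of up to half a quantization step.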
3.3
resolution
number of pixels per unit of length
3.4
dots per inch
dpi
number of dots that a scanner (printer) can scan (print) per inch, horizontally or vertically
3.5
brightness
visual sensation that enables an observer to detect luminance
3.6
contrast
difference between the highest and the lowest densities of an image
3.7
bit level
number of bits used to define a pixel
3.8
luminance
Y
luminous flux emitted from a surface
NOTE The former term was photometric brightness.
3.9
chrominance
Cr, Cb
colour portion of the video signal including hue and saturation but not brightness
NOTE Low chroma means the colour picture looks pale or washed out; high chroma means intense colour; black, grey and
white have a chrominance equal to zero.
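Luminance and chrominance (3.8, 3.9) are commonly derived from RGB data with the ITU-R BT.601 weights; the full-range form below is the one used in JFIF files that carry JPEG data. This is an illustrative sketch, not a conversion prescribed by this Technical Specification, and the helper names are assumptions. In the 8-bit form, Cb and Cr are offset by 128, so the zero chrominance of black, grey and white in the NOTE appears as Cb = Cr = 128.

```python
def _clamp(v: float) -> float:
    """Limit a component to the 8-bit range 0-255."""
    return min(255.0, max(0.0, v))

def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Full-range ITU-R BT.601 conversion (as used by JFIF) for 8-bit components.

    Y is the luminance; Cb and Cr are the chrominance components, offset by 128
    so that a neutral (colourless) pixel maps to Cb = Cr = 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return _clamp(y), _clamp(cb), _clamp(cr)

for level in (0, 128, 255):  # black, mid-grey, white: chrominance stays neutral
    print([round(c) for c in rgb_to_ycbcr(level, level, level)])
print([round(c) for c in rgb_to_ycbcr(255, 0, 0)])  # saturated red: [76, 85, 255]
```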
3.10
ITU-T Group 3 and Group 4
standard compression algorithms set by the ITU-T
3.11
Joint Photographic Experts Group
JPEG
popular name of the ISO/IEC 10918 standard
3.12
Comité Consultatif International Téléphonique et Télégraphique
CCITT
former name of the International Telecommunication Union – Telecommunication Standardization Sector (ITU-T)
3.13
compression ratio
ratio between image size before compression and image size after compression
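Definition 3.13 amounts to a single division. The sketch below computes it; the function name and the byte counts are hypothetical figures chosen only for illustration.

```python
def compression_ratio(size_before: int, size_after: int) -> float:
    """Compression ratio as defined in 3.13: image size before / image size after compression."""
    if size_after <= 0:
        raise ValueError("compressed size must be positive")
    return size_before / size_after

# Hypothetical figures: a 4 000 000-byte uncompressed image stored in 250 000 bytes.
ratio = compression_ratio(4_000_000, 250_000)
print(f"{ratio:.0f}:1")  # 16:1
```

A ratio greater than 1 means the stored image is smaller than the original; a ratio below 1 would mean the coded image is larger than the uncompressed one.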
4 General
In a document imaging system, users are concerned about the quality of archived images for two reasons: first, because image quality can affect the future of the imaging system in the medium or even long term; and second, because they must choose imaging tools based on an evolving technology.
The digitization process, which by nature transforms an image conveying comprehensible information into a dematerialized one, changes the observer's perception of that image. The observer may consider the image improved, though more often considers it degraded. In fact, images undergo a number of successive transformations at different points during the digitization process. At each of these stages, attempts are made to keep the image within acceptable legibility limits while also keeping its size within acceptable economic limits.
The specific role of one of the digitization stages, compression, is to reduce the size of the image. Some compression methods are reversible, in that the decompression algorithm restores the initial digital information exactly. These methods are lossless and have no impact on the quality of the image as perceived by the human eye. Other methods are lossy and may cause degradation perceptible to the eye. By adjusting the parameters of a lossy method, the user can keep this degradation within acceptable limits.
While numerous compression methods are described in the technical literature, few have become stable industrial standards. These are based on a limited number of principles: the dominance of certain patterns, the repetition of patterns, and particular mathematical properties. In any individual method, the number of parameters the user can modify is small.
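The first two principles can be seen in run-length coding of a bi-level scan line, where long runs of white pixels dominate. The sketch below is a generic run-length encoder and decoder; it is not the ITU-T T.4 modified Huffman code, which additionally assigns variable-length codewords to the run lengths. The convention 0 = white, 1 = black, the function names and the sample row are assumptions.

```python
from itertools import groupby

def rle_encode(line: list[int]) -> list[tuple[int, int]]:
    """Return (pixel_value, run_length) pairs for a row of bi-level pixels."""
    return [(value, len(list(run))) for value, run in groupby(line)]

def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (pixel_value, run_length) pairs back into the original row."""
    line: list[int] = []
    for value, length in runs:
        line.extend([value] * length)
    return line

# Hypothetical 40-pixel scan line crossing a line of text: mostly white (0),
# with two short black (1) strokes.
row = [0] * 12 + [1] * 3 + [0] * 10 + [1] * 2 + [0] * 13
runs = rle_encode(row)
print(runs)  # [(0, 12), (1, 3), (0, 10), (1, 2), (0, 13)]
assert rle_decode(runs) == row  # run-length coding is lossless
```

Forty pixels are described by five runs; the more uniform the page, the fewer runs are needed.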
The choice of a method and its compression parameters is largely determined by the characteristics of the document. The graphical contents of a document obviously play a key role in determining the method and its parameters. However, other factors characterizing the application context are also very important (see Figure 1).
A document's graphical contents also affect the digitization process itself. Thus, a photograph is not digitized in the same way when it is rendered in greyscale as when it is rendered by a “pseudo-grey” process. In the first case, JPEG compression is used, whereas the second requires ITU-T (Group 3 or Group 4) or JBIG compression.
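The rule of thumb stated above (greyscale rendering calls for JPEG, pseudo-grey, i.e. bi-level, rendering calls for ITU-T or JBIG coding) can be expressed as a simple lookup. The table, its keys and the function name are hypothetical, and the sketch encodes only these two cases; colour and mixed documents are not covered.

```python
# Hypothetical lookup: pixel representation -> candidate compression families.
CANDIDATES = {
    "bi-level": ["ITU-T Group 3/Group 4", "JBIG"],  # includes pseudo-grey renderings
    "greyscale": ["JPEG"],
}

def candidate_methods(representation: str) -> list[str]:
    """Return candidate compression families for a digitized image representation.

    Only the two cases stated in clause 4 are encoded; anything else raises an error.
    """
    try:
        return list(CANDIDATES[representation])
    except KeyError:
        raise ValueError(f"no guidance encoded for {representation!r}") from None

print(candidate_methods("greyscale"))  # photograph digitized in grey levels
print(candidate_methods("bi-level"))   # same photograph digitized in pseudo-grey
```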
Before discussing compression methods, therefore, we need to review the types of documents and how they are
represented following digitization. See Figure 1.
5 Type of document and digitization parameters
5.1 General
A document is a set of organized information intended for presentation to a human user. Documents can be a single page or a set of pages, and can contain arbitrary content types, such as character content, graphical content and various types of image content.
The following types of content may be found in various types of documents. The classification below is somewhat arbitrary, but for a given application these distinctions can help in understanding how to handle a given document.
5.2 Types of documents
Here we present only those documents (generally called “word processing documents”) that are most likely to be archived electronically. These documents include:
1) black text on a white background or, less frequently, coloured text or a coloured background;
2) photographs, in black and white or colour;
3) mixed documents containing both text and photographs reproduced by a printing process, in black and white or colour.
Figure 1 — Interactions with the compression method
5.3 Document classification and digitization
5.3.1 General
For the purpose of determining a compression scheme, documents may be described in the following five ways.
For each type of document, digitization methods are briefly described.
5.3.2 Black and white documents
Digitizing pages printed in black and white (primarily text) generates bi-level images where each pixel is
represented by a bit. This form of representation can also be applied to images in text documents with a coloured
background or characters, as well as to line drawings.
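Representing each pixel by one bit means that eight pixels fit in one byte. The sketch below packs a row of 0/1 values with the first pixel in the most significant bit and zero padding at the end of the row. This is a common convention, but it is an assumption here, as are the 1 = black convention and the function name.

```python
def pack_row(pixels: list[int]) -> bytes:
    """Pack a row of bi-level pixels (0 or 1) into bytes, 8 pixels per byte.

    The first pixel goes into the most significant bit; an incomplete final
    byte is padded with zero bits.
    """
    out = bytearray((len(pixels) + 7) // 8)
    for i, p in enumerate(pixels):
        if p not in (0, 1):
            raise ValueError("bi-level pixels must be 0 or 1")
        if p:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

row = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1]  # 10 pixels -> 2 bytes
print(pack_row(row).hex())  # b0c0
```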
The most important digitization parameter is resolution.
Resolution must be determined according to visual perception needs and the limits of the complete imaging
process (e.g. 200 dpi for word processing documents, 300 dpi for digitized books).
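Because resolution fixes the pixel count, it also fixes the uncompressed size of a bi-level page. The sketch below applies the two example resolutions to an A4 page (210 mm × 297 mm). The page size, the rounding of each dimension to whole pixels, the padding of rows to whole bytes and the function names are assumptions.

```python
def pixels(length_mm: float, dpi: int) -> int:
    """Number of pixels along one dimension: length in inches x dots per inch."""
    return round(length_mm / 25.4 * dpi)

def bilevel_page_bytes(width_mm: float, height_mm: float, dpi: int) -> int:
    """Uncompressed size of a bi-level (1 bit/pixel) page, rows padded to whole bytes."""
    width_px = pixels(width_mm, dpi)
    height_px = pixels(height_mm, dpi)
    return (width_px + 7) // 8 * height_px

for dpi in (200, 300):
    print(dpi, "dpi:", bilevel_page_bytes(210, 297, dpi), "bytes")  # A4 page
# 200 dpi: 484173 bytes
# 300 dpi: 1087480 bytes
```

Raising the resolution from 200 dpi to 300 dpi multiplies the uncompressed size by roughly (300/200)², i.e. about 2,25.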
There are also other parameters, related to image processing, which vary according to the kind of image. If we know, for example, that the images to be digitized are text, we will try to produce black characters sharply defined against a white background. The relevant parameters are therefore brightness (adjusting the colour of a pixel against a threshold) and contrast (adjusting the colour of a pixel against that of the surrounding pixels).
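The two parameters can be illustrated on a row of 8-bit grey values (0 = black, 255 = white). A fixed global threshold plays the role of the brightness setting, and comparison with the mean of neighbouring pixels plays the role of the contrast setting. Both functions, the window radius, the offset and the sample row are illustrative assumptions, not scanner algorithms defined by this Technical Specification.

```python
def threshold_global(row: list[int], threshold: int = 128) -> list[int]:
    """Brightness-style binarization: black (1) where the grey value is below a fixed threshold."""
    return [1 if v < threshold else 0 for v in row]

def threshold_local(row: list[int], radius: int = 2, offset: int = 10) -> list[int]:
    """Contrast-style binarization: black (1) where a pixel is darker than the mean
    of its neighbourhood (window of +/- radius pixels) by more than `offset`."""
    result = []
    for i, v in enumerate(row):
        window = row[max(0, i - radius): i + radius + 1]
        mean = sum(window) / len(window)
        result.append(1 if v < mean - offset else 0)
    return result

# Hypothetical row: a faint stroke (150) on a light background (200), then a
# darker stroke (120) on a dimmer background (170).
grey_row = [200, 200, 150, 200, 200, 170, 170, 120, 170, 170]
print(threshold_global(grey_row))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]: faint stroke lost
print(threshold_local(grey_row))   # [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]: both strokes found
```

With the fixed threshold of 128 the fainter stroke disappears, whereas comparing each pixel with its surroundings keeps both strokes.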
5.3.3 Greyscale documents
This form of representation is applied to photographic documents, printed on paper from a black and white film.
Digitization changes an initially continuous document into a matrix of pixels whose intensity is encoded in a range
of levels. For example, 8-bit encoding produces 256 grey levels.
The number of grey levels, or the bit level, must be determined according to visual perception needs and the limits of
the complete imaging process.
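The bit level n gives 2ⁿ grey levels and scales the uncompressed size proportionally. The sketch assumes a hypothetical photograph of 2 000 × 3 000 pixels and ignores any per-row padding; the function names are also assumptions.

```python
def grey_levels(bit_level: int) -> int:
    """Number of distinct grey levels an n-bit pixel can encode: 2**n."""
    return 2 ** bit_level

def uncompressed_bytes(width_px: int, height_px: int, bit_level: int) -> int:
    """Raw image size in bytes, ignoring any per-row padding."""
    return (width_px * height_px * bit_level + 7) // 8

# Hypothetical 2000 x 3000-pixel photograph at several bit levels.
for bits in (1, 4, 8):
    print(bits, "bit:", grey_levels(bits), "levels,",
          uncompressed_bytes(2000, 3000, bits), "bytes")
# 1 bit: 2 levels, 750000 bytes
# 4 bit: 16 levels, 3000000 bytes
# 8 bit: 256 levels, 6000000 bytes
```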
5.3.4 Pseudo-grey documents
This category includes images that simulate grey using a variable arrangement of black and white pixels. There can
be two cases:
1) the source document is a photograph reproduced in a printed text; it was produced using a printing technique
and is itself a pseudo-grey document (screening uses black dots of variable size);
2) the source document is a true photograph, but was digitized in pseudo-grey for performance reasons: to
reduce the storage volume or transmission times on a network (the “half-tone” technique involves
arranging a variable number of black pixe
...
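The principle of simulating grey with a variable arrangement of black and white pixels can be sketched with ordered dithering, one common way of producing a pseudo-grey image. It is shown for illustration only and is not necessarily the technique intended in the cases above. The 4 × 4 Bayer threshold matrix, the convention 1 = black, the function name and the sample patch are assumptions.

```python
# 4 x 4 Bayer matrix: threshold ranks 0..15 spread evenly over the tile.
BAYER_4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]

def ordered_dither(grey: list[list[int]]) -> list[list[int]]:
    """Convert 8-bit grey values (0 = black, 255 = white) into bi-level pixels
    (1 = black, 0 = white) by comparing each pixel with a tiled threshold matrix."""
    out = []
    for y, row in enumerate(grey):
        out_row = []
        for x, v in enumerate(row):
            # Scale the rank to the 0-255 range, taking the centre of each step.
            threshold = (BAYER_4[y % 4][x % 4] + 0.5) * 256 / 16
            out_row.append(1 if v < threshold else 0)
        out.append(out_row)
    return out

# A uniform mid-grey patch becomes a pattern in which half of the pixels are black.
patch = [[128] * 4 for _ in range(4)]
print(sum(map(sum, ordered_dither(patch))), "of 16 pixels black")  # 8 of 16 pixels black
```

Darker grey values turn more pixels of each tile black, so the proportion of black pixels carries the grey level.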