ISO/IEC 15444-5:2015
Information technology - JPEG 2000 image coding system: Reference software - Part 5:
Rec. ITU-T T.800 | ISO/IEC 15444-1 defines a set of lossless and lossy compression methods for coding continuous-tone, bi-level, greyscale or colour digital still images. This Recommendation | International Standard (ISO/IEC 15444-5:2015) provides three independently created software reference implementations of Rec. ITU-T T.800 | ISO/IEC 15444-1, in order to assist implementers of Rec. ITU-T T.800 | ISO/IEC 15444-1 in testing and understanding its content. The packages are JASPER, JJ2000 and OPENJPEG. The reference software packages are informative only. This Recommendation | International Standard does not define any additional part of the JPEG 2000 image coding system.
Each version of the reference software contains source code, which may be compiled to provide the following functionality:
- transcoding from selected, widely available image formats into a JPEG 2000 codestream;
- transcoding from selected, widely available image formats into the JP2 file format;
- selection of a wide range of JPEG 2000 encoding options (as documented in each reference software);
- decoding from a JPEG 2000 codestream to a range of selected widely available image formats;
- processing of a JP2 file to extract a JPEG 2000 codestream for decoding and conversion to a range of selected widely available image formats;
- the ability to extract metadata from a JP2 file, including the contents of the Image Header box and the colour space;
- the decoding of JP2 files that use the Three-Component Matrix-Based form of the Restricted ICC method for the specification of colour space and the conversion of the decoded image data to the sRGB colour space for display, including limited upsampling of all decoded components to the same resolution;
- the decoding of JP2 files that use the Monochrome form of the Restricted ICC method for the specification of colour space and the conversion of the decoded image data to the sRGB based greyscale space as defined within the JP2 file format;
- the decoding of JP2 files that use the sYCC colour space and the conversion of the decoded image data to the sRGB colour space for display, including upsampling of all decoded components to the same resolution;
- some additional tools to help with evaluation and testing.
The reference software is intended for use as a testing and validation tool for other implementations of JPEG 2000, and to help in the understanding of Rec. ITU-T T.800 | ISO/IEC 15444-1. Although components of the reference software may find application in software intended for product development, this was not an objective of the development of this software, and prospective implementers are cautioned against making any estimations of performance or resource usage based on the reference software.
Technologies de l'information — Système de codage d'images JPEG 2000: Logiciel de référence — Partie 5:
Frequently Asked Questions
ISO/IEC 15444-5:2015 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - JPEG 2000 image coding system: Reference software - Part 5:". This standard provides three independently created reference software implementations (JASPER, JJ2000 and OPENJPEG) of Rec. ITU-T T.800 | ISO/IEC 15444-1, which defines a set of lossless and lossy compression methods for coding continuous-tone, bi-level, greyscale or colour digital still images. The reference software packages are informative only and are intended for testing, validating and understanding implementations of Rec. ITU-T T.800 | ISO/IEC 15444-1.
ISO/IEC 15444-5:2015 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.30 - Coding of graphical and photographical information. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 15444-5:2015 has the following relationships with other standards: it is linked to ISO/IEC 15444-5:2021, ISO/IEC 15444-5:2003/Amd 2:2015, ISO/IEC 15444-5:2003 and ISO/IEC 15444-5:2003/Amd 1:2003. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 15444-5
Second edition
2015-10-15
Information technology - JPEG 2000 image coding system: Reference software
Technologies de l'information - Système de codage d'images JPEG 2000: Logiciel de référence
© ISO/IEC 2015
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Ch. de Blandonnet 8 CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 15444-5 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information, in collaboration with
ITU-T. The identical text is published as ITU-T Rec. T.804.
This second edition cancels and replaces the first edition (ISO/IEC 15444-5:2003), which has been technically
revised. It also incorporates ISO/IEC 15444-5:2003/Amd.1:2003 and ISO/IEC 15444-5:2003/Amd.2:2015.
ISO/IEC 15444 consists of the following parts, under the general title Information technology — JPEG 2000
image coding system:
Part 1: Core coding system
Part 2: Extensions
Part 3: Motion JPEG 2000
Part 4: Conformance testing
Part 5: Reference software
Part 6: Compound image file format
Part 8: Secure JPEG 2000
Part 9: Interactivity tools, APIs and protocols
Part 10: Extensions for three-dimensional data
Part 11: Wireless
Part 12: ISO base media file format
Part 13: An entry level JPEG 2000 encoder
Part 14: XML representation and reference
CONTENTS
1 Scope
2 Normative references
2.1 Identical Recommendations | International Standards
2.2 Additional references
3 Definitions
4 Abbreviations and symbols
4.1 Abbreviations
4.2 Symbols
5 Conventions
6 General description
7 Copyright and licensing
8 Platform requirements
8.1 JasPer requirements
8.2 JJ2000 requirements
8.3 OpenJPEG requirements
9 Reference code structure
9.1 JasPer executables
9.2 JJ2000 executables
9.3 OpenJPEG executables
10 Intellectual Property
11 Software availability and updates
Annex A – JASPER – C reference software – software description
A.1 Introduction
A.2 Software updates
A.3 Version numbering
A.4 Software overview
A.5 JasPer library
A.6 JasPer demo application programs
A.7 Software requirements
A.8 Building the software
A.9 Using the software
Annex B – JJ2000 – Java reference software – software description
B.1 Introduction
B.2 Software updates
B.3 Software architecture
B.4 Installing and running the software
Annex C – OpenJPEG – C reference software – software description
C.1 Introduction
C.2 Getting and updating the software
C.3 Building and using the software
C.4 Testing the software
Electronic attachment: JASPER, JJ2000, OPENJPEG reference packages
INTERNATIONAL STANDARD
ITU-T RECOMMENDATION
Information technology –
JPEG 2000 image coding system: Reference software
1 Scope
Rec. ITU-T T.800 | ISO/IEC 15444-1 defines a set of lossless and lossy compression methods for coding continuous-
tone, bi-level, greyscale or colour digital still images. This Recommendation | International Standard provides three
independently created software reference implementations of Rec. ITU-T T.800 | ISO/IEC 15444-1, in order to assist
implementers of Rec. ITU-T T.800 | ISO/IEC 15444-1 in testing and understanding its content. The packages are JASPER,
JJ2000 and OPENJPEG.
The reference software packages are informative only. This Recommendation | International Standard does not define
any additional part of the JPEG 2000 image coding system.
Each version of the reference software contains source code, which may be compiled to provide the following
functionality:
– transcoding from selected, widely available image formats into a JPEG 2000 codestream;
– transcoding from selected, widely available image formats into the JP2 file format;
– selection of a wide range of JPEG 2000 encoding options (as documented in each reference software);
– decoding from a JPEG 2000 codestream to a range of selected widely available image formats;
– Processing of a JP2 file to extract a JPEG 2000 codestream for decoding and conversion to a range of
selected widely available image formats.
– The ability to extract metadata from a JP2 file, including the contents of the Image Header box and the
colour space.
– The decoding of JP2 files that use the Three-Component Matrix-Based form of the Restricted ICC method
for the specification of colour space and the conversion of the decoded image data to the sRGB colour
space for display, including limited upsampling of all decoded components to the same resolution.
– The decoding of JP2 files that use the Monochrome form of the Restricted ICC method for the specification
of colour space and the conversion of the decoded image data to the sRGB based greyscale space as defined
within the JP2 file format.
– The decoding of JP2 files that use the sYCC colour space and the conversion of the decoded image data to
the sRGB colour space for display, including upsampling of all decoded components to the same
resolution.
– some additional tools to help with evaluation and testing.
The reference software is intended for use as a testing and validation tool for other implementations of JPEG 2000, and
to help in the understanding of Rec. ITU-T T.800 | ISO/IEC 15444-1. Although components of the reference software
may find application in software intended for product development, this was not an objective of the development of this
software, and prospective implementers are cautioned against making any estimations of performance or resource usage
based on the reference software.
2 Normative references
The following Recommendations and International Standards contain provisions which, through reference in this text,
constitute provisions of the Recommendation | International Standard. At the time of publication, the editions indicated
were valid. All Recommendations and Standards are subject to revision, and parties to agreements based on this
Recommendation | International Standard are encouraged to investigate the possibility of applying the most recent edition
of the Recommendations and Standards listed below. Members of IEC and ISO maintain registers of currently valid
International Standards. The Telecommunication Standardization Bureau of the ITU maintains a list of currently valid
ITU-T Recommendations.
____________________
This Specification includes an electronic attachment containing the JASPER, JJ2000 and OPENJPEG reference packages.
Rec. ITU-T T.804 (04/2015) 1
2.1 Identical Recommendations | International Standards
– ITU-T Recommendation T.800 (2002) | ISO/IEC 15444-1:2002, Information technology – JPEG 2000
Image Coding System: Core coding system.
2.2 Additional references
– ISO/IEC 9899:1999, Programming languages – C.
– ISO/IEC 9945-1:1996, Information technology – Portable Operating System Interface (POSIX) – Part 1:
System Application Program Interface (API) (C language).
– ISO/IEC 9945-2:1993, Information technology – Portable Operating System Interface (POSIX) – Part 2:
Shell and utilities.
3 Definitions
For the purposes of this Recommendation | International Standard, the following definitions apply:
3.1 big endian: The bits of a value representation occur in order from most significant to least significant.
3.2 bit: A contraction of the term "binary digit"; a unit of information represented by a zero or a one.
3.3 bit-plane: A two-dimensional array of bits. In this Recommendation | International Standard, a bit-plane refers
to all the bits of the same magnitude in all coefficients or samples. This could refer to a bit-plane in a component, tile-
component, code-block, region of interest, or other.
3.4 bit stream: The actual sequence of bits resulting from the coding of a sequence of symbols. It does not include
the markers or marker segments in the main and tile-part headers or the EOC marker. It does include any packet headers
and any in-stream markers and marker segments not found within the main or tile-part headers.
3.5 box: A portion of the file format defined by a length and unique box type. Boxes of some types may contain
other boxes.
3.6 box contents: Refers to the data wrapped within the box structure. The contents of a particular box are stored
within the DBox field within the Box data structure.
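As an illustration of definitions 3.1, 3.5 and 3.6, the following minimal sketch reads a box header and slices out the box contents. The field names LBox, TBox and XLBox and the layout of the JP2 signature box come from Rec. ITU-T T.800 | ISO/IEC 15444-1, not from this text; the function name `read_box_header` is hypothetical and is not part of any of the reference packages.

```python
import struct

def read_box_header(data, offset=0):
    """Return (box_type, header_len, box_len) for the box starting at offset.

    box_len is None when LBox == 0, meaning the box runs to the end of the file.
    """
    # LBox is a 4-byte big-endian length, TBox a 4-byte box type.
    lbox, tbox = struct.unpack_from(">I4s", data, offset)
    header_len = 8
    if lbox == 1:
        # Extended length: the real length is in the following 8-byte XLBox field.
        (box_len,) = struct.unpack_from(">Q", data, offset + 8)
        header_len = 16
    elif lbox == 0:
        box_len = None
    else:
        box_len = lbox
    return tbox, header_len, box_len

# The 12-byte JP2 signature box: length 12, type 'jP  ', contents 0x0D0A870A.
signature_box = bytes.fromhex("0000000c6a5020200d0a870a")
box_type, header_len, box_len = read_box_header(signature_box)
box_contents = signature_box[header_len:box_len]  # the DBox field
```

Because the length and type are read most significant byte first, the same code works unchanged on little-endian and big-endian hosts.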
3.7 byte: Eight bits.
3.8 channel: One logical component of the image. A channel may be a direct representation of one component from
the codestream, or may be generated by the application of a palette to a component from the codestream.
3.9 code-block: A rectangular grouping of coefficients from the same subband of a tile-component.
3.10 coder: An embodiment of either an encoding or decoding process.
3.11 codestream: A collection of one or more bit streams and the main header, tile-part headers, and the EOC
required for their decoding and expansion into image data. This is the image data in a compressed form with all of the
signalling needed to decode.
3.12 coefficient: The values that are the result of a transformation.
3.13 component: A two-dimensional array of samples. An image typically consists of several components, for
instance representing red, green and blue.
3.14 compressed image data: Part or all of a bit stream. Can also refer to a collection of bit streams in part or all of
a codestream.
3.15 decoder: An embodiment of a decoding process, and optionally a colour transformation process.
3.16 decoding process: A process which takes as its input all or part of a codestream and outputs all or part of a
reconstructed image.
3.17 discrete wavelet transformation (DWT): A transformation that iteratively transforms one signal into two or
more filtered and decimated signals corresponding to different frequency bands. This transformation operates on spatially
discrete samples.
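The splitting of one signal into filtered and decimated low-pass and high-pass signals can be sketched with one level of a reversible 5/3 lifting transform. This is an illustrative sketch only: it assumes an even-length list of integers and mirrors samples at the boundaries, and the function names `fdwt53_1d` and `idwt53_1d` are hypothetical. The normative filters and extension rules are those of Rec. ITU-T T.800 | ISO/IEC 15444-1.

```python
def fdwt53_1d(x):
    """One level of reversible 5/3 lifting on an even-length list of ints."""
    n = len(x)
    assert n >= 2 and n % 2 == 0
    half = n // 2
    # Predict step: high-pass (detail) samples from the odd positions.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]  # mirror at the right edge
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: low-pass samples from the even positions.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # mirror at the left edge
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)
    return s, d

def idwt53_1d(s, d):
    """Undo fdwt53_1d by running the lifting steps in reverse order."""
    half = len(s)
    n = 2 * half
    x = [0] * n
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (prev + d[i] + 2) // 4
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = d[i] + (left + right) // 2
    return x

signal = [10, 12, 15, 20, 18, 14, 9, 7]
low, high = fdwt53_1d(signal)        # two decimated signals, half the length each
restored = idwt53_1d(low, high)      # integer lifting reconstructs the input exactly
```

Applying `fdwt53_1d` again to `low` gives the next decomposition level, which is the iteration the definition refers to.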
3.18 encoder: An embodiment of an encoding process.
3.19 encoding process: A process that takes as its input all or part of the source image data and outputs a codestream.
3.20 file format: A codestream and additional support data and information not explicitly required for the decoding
of the codestream. Examples of such support data include text fields providing titling, security and historical information,
data to support placement of multiple codestreams within a given data file, and data to support exchange between
platforms or conversion to other file formats.
3.21 header: Either a part of the codestream that contains only markers and marker segments (main header and tile-
part header) or the signalling part of a packet (packet header).
3.22 image: The set of all components.
3.23 image area: A rectangular part of the reference grid, registered by offsets from the origin and the extent of the
reference grid.
3.24 image area offset: The number of reference grid points down and to the right of the reference grid origin where
the origin of the image area can be found.
3.25 image data: The components and component samples making up an image. Image data can refer to either the
source image data or the reconstructed image data.
3.26 irreversible: A transformation, progression, system, quantization, or other process that, due to systemic or
quantization error, disallows lossless recovery. An irreversible process can only lead to lossy compression.
3.27 JP2: The name of the file format defined by Rec. ITU-T T.800 | ISO/IEC 15444-1.
3.28 JPEG: Used to refer globally to the encoding and decoding process of the following Recommendations |
International Standards:
– Recommendation ITU-T T.81 (1992) | ISO/IEC 10918-1:1994, Information technology – Digital
compression and coding of continuous-tone still images: Requirements and guidelines.
– Recommendation ITU-T T.83 (1994) | ISO/IEC 10918-2:1995, Information technology – Digital
compression and coding of continuous-tone still images: Compliance testing.
– Recommendation ITU-T T.84 (1996) | ISO/IEC 10918-3:1997, Information technology – Digital
compression and coding of continuous-tone still images: Extensions.
– Recommendation ITU-T T.84 (1996)/Amd. 1 (1999), Information technology – Digital compression and
coding of continuous-tone still images: Extensions – Amendment 1: Provisions to allow registration of
new compression types and versions in the SPIFF header.
– Recommendation ITU-T T.86 (1998) | ISO/IEC 10918-4, Information technology – Digital compression
and coding of continuous-tone still images: Registration of JPEG Profiles, SPIFF Profiles, SPIFF Tags,
SPIFF colour Spaces, APPn Markers, SPIFF Compression types and Registration authorities (REGAUT).
3.29 JPEG 2000: Used to refer globally to the encoding and decoding processes in this Recommendation |
International Standard and their embodiment in applications.
3.30 layer: A collection of compressed image data from coding passes of one, or more, code-blocks of a tile-
component. Layers have an order for encoding and decoding that must be preserved.
3.31 lossless: A descriptive term for the effect of the overall encoding and decoding processes in which the output
of the decoding process is identical to the input to the encoding process. Distortion free restoration can be assured. All of
the coding processes or steps used for encoding and decoding are reversible.
3.32 lossy: A descriptive term for the effect of the overall encoding and decoding processes in which the output of
the decoding process is not identical to the input to the encoding process. There is distortion (measured mathematically).
At least one of the coding processes or steps used for encoding and decoding is irreversible.
3.33 marker: A two-byte code in which the first byte is hexadecimal FF (0xFF) and the second byte is a value
between 1 (0x01) and hexadecimal FE (0xFE).
3.34 marker segment: A marker and associated (not empty) set of parameters.
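The marker definition in 3.33 translates directly into a two-byte test. The sketch below is illustrative; the function name `is_marker` is hypothetical.

```python
def is_marker(two_bytes):
    """True if the two bytes form a marker: 0xFF followed by 0x01..0xFE."""
    return (
        len(two_bytes) == 2
        and two_bytes[0] == 0xFF
        and 0x01 <= two_bytes[1] <= 0xFE
    )

valid = is_marker(bytes([0xFF, 0x4F]))    # second byte is inside the allowed range
invalid = is_marker(bytes([0xFF, 0xFF]))  # 0xFF is outside the allowed range
```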
3.35 packet: A part of the bit stream comprising a packet header and the compressed image data from one layer of
one precinct of one resolution level of one tile-component.
3.36 packet header: Portion of the packet that contains signalling necessary for decoding that packet.
3.37 precinct: A rectangular region of a transformed tile-component, within each resolution level, used for
limiting the size of packets.
3.38 precision: Number of bits allocated to a particular sample, coefficient, or other binary numerical representation.
3.39 progression: The order of a codestream where the decoding of each successive bit contributes to a "better"
reconstruction of the image. What metrics make the reconstruction "better" is a function of the application. Some
examples of progression are increasing resolution or improved sample fidelity.
3.40 quantization: A method of reducing the precision of the individual coefficients to reduce the number of bits
used to entropy code them. This is equivalent to division while compressing and multiplying while decompressing.
Quantization can be achieved by an explicit operation with a given quantization value or by dropping (truncating) coding
passes from the codestream.
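The "division while compressing and multiplying while decompressing" view of quantization can be sketched as follows. This is a simplified illustration with hypothetical function names. It assumes a sign-magnitude rule that rounds the magnitude towards zero and omits any reconstruction offset a real decoder may apply, so it should not be read as the normative procedure.

```python
import math

def quantize(coefficient, step):
    """Divide by the step size and keep the integer part of the magnitude."""
    sign = -1 if coefficient < 0 else 1
    return sign * math.floor(abs(coefficient) / step)

def dequantize(index, step):
    """Multiply the quantization index back by the step size."""
    return index * step

index = quantize(37.5, 8.0)           # 37.5 / 8 = 4.6875, magnitude rounded down
approx = dequantize(index, 8.0)       # reconstruction differs from 37.5: lossy
```

The difference between `approx` and the original coefficient is the quantization error, which is why a process that quantizes with a step size greater than one is irreversible in the sense of 3.26.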
3.41 raster order: A particular sequential order of data of any type within an array. The raster order starts with the
top left data point and moves to the immediate right data point, and so on, to the end of the row. After the end of the row
is reached, the next data point in the sequence is the left-most data point immediately below the current row. This order
is continued to the end of the array.
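Raster order corresponds to a simple mapping between a (row, column) position and a sequential index. A minimal sketch, with hypothetical function names and a zero-based index assumed:

```python
def raster_index(row, col, width):
    """Sequential position of (row, col) in an array that is `width` wide."""
    return row * width + col

def raster_position(index, width):
    """Inverse mapping: (row, col) of the index-th data point."""
    return divmod(index, width)

# Visit a 2 x 3 array in raster order: left to right, then the next row down.
order = [raster_position(i, 3) for i in range(6)]
```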
3.42 reconstructed image: An image that is the output of a decoder.
3.43 reconstructed sample: A sample reconstructed by the decoder. This always equals the original sample value
in lossless coding but may differ from the original sample value in lossy coding.
3.44 reference grid: A regular rectangular array of points used as a reference for other rectangular arrays of data.
Examples include components and tiles.
3.45 reference tile: A rectangular sub-grid of any size associated with the reference grid.
3.46 region of interest (ROI): A collection of coefficients that are considered of particular relevance by some user
defined measure.
3.47 resolution level: Equivalent to decomposition level with one exception: the LL subband is also a separate
resolution level.
3.48 reversible: A transformation, progression, system, or other process that does not suffer systemic or quantization
error and, therefore, allows lossless signal recovery.
3.49 sample: One element in the two-dimensional array that comprises a component.
3.50 source image: An image used as input to an encoder.
3.51 subband: A group of transform coefficients resulting from the same sequence of low-pass and high-pass
filtering operations, both vertically and horizontally.
3.52 subband coefficient: A transform coefficient within a given subband.
3.53 tile: A rectangular array of points on the reference grid, registered with and offset from the reference grid origin
and defined by a width and height. The tiles which overlap are used to define tile-components.
3.54 tile-component: All the samples of a given component in a tile.
3.55 tile index: The index of the current tile ranging from zero to the number of tiles minus one.
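As an illustration of how tile indices cover the range from zero to the number of tiles minus one, the sketch below counts the tiles on the reference grid and numbers them in raster order. The parameter names (Xsiz, Ysiz, XTsiz, YTsiz, XTOsiz, YTOsiz) and the formulas are taken from the SIZ marker segment description in Rec. ITU-T T.800 | ISO/IEC 15444-1 rather than from this text, and the function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)

def tile_grid(xsiz, ysiz, xtsiz, ytsiz, xtosiz=0, ytosiz=0):
    """Number of tiles horizontally and vertically on the reference grid."""
    num_x = ceil_div(xsiz - xtosiz, xtsiz)
    num_y = ceil_div(ysiz - ytosiz, ytsiz)
    return num_x, num_y

def tile_index(p, q, num_x):
    """Index of the tile in column p, row q, numbering tiles in raster order."""
    return p + q * num_x

num_x, num_y = tile_grid(1000, 600, 256, 256)   # a 1000 x 600 grid, 256 x 256 tiles
last = tile_index(num_x - 1, num_y - 1, num_x)  # equals num_x * num_y - 1
```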
3.56 transformation: A mathematical mapping from one signal space to another.
4 Abbreviations and symbols
4.1 Abbreviations
For the purposes of this Recommendation | International Standard, the following abbreviations apply.
ICC International Colour Consortium
ICT Irreversible Colour Transformation
JPEG Joint Photographic Experts Group – The joint ISO/ITU committee responsible for developing
standards for continuous-tone still picture coding. It also refers to the standards produced by this
committee: Rec. ITU-T T.81 | ISO/IEC 10918-1, Rec. ITU-T T.83 | ISO/IEC 10918-2, Rec.
ITU-T T.84 | ISO/IEC 10918-3 and Rec. ITU-T T.87 | ISO/IEC 14495-1.
JURA JPEG Utilities Registration Authority
1D-DWT One-dimensional Discrete Wavelet Transformation
FDWT Forward Discrete Wavelet Transformation
IDWT Inverse Discrete Wavelet Transformation
LSB Least Significant Bit
MSB Most Significant Bit
PCS Profile Connection Space
RCT Reversible Colour Transformation
ROI Region of Interest
SNR Signal to Noise Ratio
UCS Universal Character Set
URI Uniform Resource Identifier
URL Uniform Resource Locator
UTF-8 UCS Transformation Format 8
UUID Universal Unique Identifier
XML Extensible Markup Language
W3C World-Wide Web Consortium
4.2 Symbols
For the purposes of this Recommendation | International Standard, the following symbols apply.
0x---- Denotes a hexadecimal number
\nnn A three-digit number preceded by a backslash indicates the value of a single byte within a character
string, where the three digits specify the octal value of that byte
COC Coding style component marker
COD Coding style default marker
COM Comment marker
CRG Component registration marker
EPH End of packet header marker
EOC End of codestream marker
PLM Packet length, main header marker
PLT Packet length, tile-part header marker
POC Progression order change marker
PPM Packed packet headers, main header marker
PPT Packed packet headers, tile-part header marker
QCC Quantization component marker
QCD Quantization default marker
RGN Region of interest marker
SIZ Image and tile size marker
SOC Start of codestream marker
SOP Start of packet marker
SOD Start of data marker
SOT Start of tile-part marker
TLM Tile-part lengths marker
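The notations and marker symbols above can be illustrated with a short sketch. The two-byte code values are not given in this text; they are taken from Rec. ITU-T T.800 | ISO/IEC 15444-1 and only a subset is listed. The names `MARKER_CODES` and `marker_name` are hypothetical.

```python
# A subset of marker symbols and their codes, written in the 0x---- notation.
MARKER_CODES = {
    "SOC": 0xFF4F,  # Start of codestream
    "SIZ": 0xFF51,  # Image and tile size
    "COD": 0xFF52,  # Coding style default
    "QCD": 0xFF5C,  # Quantization default
    "SOT": 0xFF90,  # Start of tile-part
    "SOD": 0xFF93,  # Start of data
    "EOC": 0xFFD9,  # End of codestream
}
MARKER_NAMES = {code: name for name, code in MARKER_CODES.items()}

def marker_name(two_bytes):
    """Symbol for the marker at the start of two_bytes, or 'unknown'."""
    code = int.from_bytes(two_bytes[:2], "big")
    return MARKER_NAMES.get(code, "unknown")

first = marker_name(bytes.fromhex("ff4fff51"))  # a codestream begins with SOC

# \nnn notation: each three-digit group is the octal value of one byte.
box_type = b"\152\120\040\040"                  # the four bytes 'j', 'P', ' ', ' '
```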
5 Conventions
The source files provided are supplied in the form of an individual zip file for each source tree. File locations given in
this Recommendation | International Standard are expressed relative to the top level of the corresponding sou
...
ISO/IEC JTC 1/SC 29/WG 1 N 2412
Date: 2002-12-25
ISO/IEC JTC 1/SC 29/WG 1
(ITU-T SG 16)
Coding of Still Pictures
JBIG: Joint Bi-level Image Experts Group
JPEG: Joint Photographic Experts Group
TITLE: The JPEG-2000 Still Image Compression Standard
(Last Revised: 2002-12-25)
SOURCE: Michael D. Adams
Assistant Professor
Dept. of Electrical and Computer Engineering
University of Victoria
P. O. Box 3055 STN CSC, Victoria, BC, V8W 3P6, CANADA
E-mail: mdadams@ece.uvic.ca
Web: www.ece.uvic.ca/~mdadams
PROJECT: JPEG 2000
STATUS:
REQUESTED ACTION: None
DISTRIBUTION: Public
Contact:
ISO/IEC JTC 1/SC 29/WG 1 Convener—Dr. Daniel T. Lee
Yahoo! Asia, Sunning Plaza, Rm 2802, 10 Hysan Avenue, Causeway Bay, Hong Kong
Yahoo! Inc, 701 First Avenue, Sunnyvale, California 94089, USA
Tel: +1 408 349 7051/+852 2882 3898, Fax: +1 253 830 0372, E-mail: dlee@yahoo-inc.com
Copyright © 2002 by Michael D. Adams
The JPEG-2000 Still Image Compression Standard
(Last Revised: 2002-12-25)
Michael D. Adams
Dept. of Electrical and Computer Engineering, University of Victoria
P. O. Box 3055 STN CSC, Victoria, BC, V8W 3P6, CANADA
E-mail: mdadams@ece.uvic.ca Web: www.ece.uvic.ca/~mdadams
Abstract: JPEG 2000, a new international standard for still image compression, is discussed at length. A high-level introduction to the JPEG-2000 standard is given, followed by a detailed technical description of the JPEG-2000 Part-1 codec.
Keywords: JPEG 2000, still image compression/coding, standards.

I. INTRODUCTION
Digital imagery is pervasive in our world today. Consequently, standards for the efficient representation and interchange of digital images are essential. To date, some of the most successful still image compression standards have resulted from the ongoing work of the Joint Photographic Experts Group (JPEG). This group operates under the auspices of Joint Technical Committee 1, Subcommittee 29, Working Group 1 (JTC 1/SC 29/WG 1), a collaborative effort between the International Organization for Standardization (ISO) and International Telecommunication Union Standardization Sector (ITU-T). Both the JPEG [1–3] and JPEG-LS [4–6] standards were born from the work of the JPEG committee. For the last few years, the JPEG committee has been working towards the establishment of a new standard known as JPEG 2000 (i.e., ISO/IEC 15444). These labors are now bearing fruit, as JPEG-2000 Part 1 (i.e., ISO/IEC 15444-1 [7]) has recently been approved as a new international standard.
In this paper, we provide a detailed technical description of the JPEG-2000 Part-1 codec, in addition to a brief overview of the JPEG-2000 standard. This exposition is intended to serve as a reader-friendly starting point for those interested in learning about JPEG 2000. Although many details are included in our presentation, some details are necessarily omitted. The reader should, therefore, refer to the standard [7] before attempting an implementation. The JPEG-2000 codec realization in the JasPer software [8–10] may also serve as a practical guide for implementors. (See Appendix A for more information about JasPer.) The reader may also find [11–13] to be useful sources of information on the JPEG-2000 standard.
The remainder of this paper is structured as follows. Section II begins with an overview of the JPEG-2000 standard. This is followed, in Section III, by a detailed description of the JPEG-2000 Part-1 codec. Finally, we conclude with some closing remarks in Section IV. Throughout our presentation, a basic understanding of image coding is assumed.

II. JPEG 2000
The JPEG-2000 standard supports lossy and lossless compression of single-component (e.g., grayscale) and multi-component (e.g., color) imagery. In addition to this basic compression functionality, however, numerous other features are provided, including: 1) progressive recovery of an image by fidelity or resolution; 2) region of interest coding, whereby different parts of an image can be coded with differing fidelity; 3) random access to particular regions of an image without needing to decode the entire code stream; 4) a flexible file format with provisions for specifying opacity information and image sequences; and 5) good error resilience. Due to its excellent coding performance and many attractive features, JPEG 2000 has a very large potential application base. Some possible application areas include: image archiving, Internet, web browsing, document imaging, digital photography, medical imaging, remote sensing, and desktop publishing.

A. Why JPEG 2000?
Work on the JPEG-2000 standard commenced with an initial call for contributions [14] in March 1997. The purpose of having a new standard was twofold. First, it would address a number of weaknesses in the existing JPEG standard. Second, it would provide a number of new features not available in the JPEG standard. The preceding points led to several key objectives for the new standard, namely that it should: 1) allow efficient lossy and lossless compression within a single unified coding framework, 2) provide superior image quality, both objectively and subjectively, at low bit rates, 3) support additional features such as region of interest coding, and a more flexible file format, and 4) avoid excessive computational and memory complexity. Undoubtedly, much of the success of the original JPEG standard can be attributed to its royalty-free nature. Consequently, considerable effort has been made to ensure that a minimally compliant JPEG-2000 codec can be implemented free of royalties.
Footnote on royalty-free implementation: Whether these efforts ultimately prove successful remains to be seen, however, as there are still some unresolved intellectual property issues at the time of this writing.
Author's note: This document is a revised version of the JPEG-2000 tutorial that I wrote which appeared in the JPEG working group document WG1N1734. The original tutorial contained numerous inaccuracies, some of which were introduced by changes in the evolving draft standard while others were due to typographical errors. Hopefully, most of these inaccuracies have been corrected in this revised document. In any case, this document will probably continue to evolve over time. Subsequent versions of this document will be made available from my home page (the URL for which is provided with my contact information).

B. Structure of the Standard
The JPEG-2000 standard is comprised of numerous parts, several of which are listed in Table I. For convenience, we will refer to the codec defined in Part 1 of the standard as the baseline codec. The baseline codec is simply the core (or minimal functionality) coding system for the JPEG-2000 standard. Parts 2 (i.e., [15]) and 3 (i.e., [16]) describe extensions to the baseline codec that are useful for certain specific applications such as intraframe-style video compression. In this paper, we will, for the most part, limit our discussion to the baseline codec. Some of the extensions proposed for inclusion in Part 2 will be discussed briefly. Unless otherwise indicated, our exposition considers only the baseline system.
For the most part, the JPEG-2000 standard is written from the point of view of the decoder. That is, the decoder is defined quite precisely with many details being normative in nature (i.e., required for compliance), while many parts of the encoder are less rigidly specified. Obviously, implementors must make a very clear distinction between normative and informative clauses in the standard. For the purposes of our discussion, however, we will only make such distinctions when absolutely necessary.

Fig. 1. Source image model. (a) An image with N components. (b) Individual component.

III. JPEG-2000 CODEC
Having briefly introduced the JPEG-2000 standard, we are now in a position to begin examining the JPEG-2000 codec in detail. The codec is based on wavelet/subband coding techniques [19, 20]. It handles both lossy and lossless compression using the same transform-based framework, and borrows heavily from the embedded block coding with optimized truncation (EBCOT) scheme [21–23]. In order to facilitate both lossy and lossless coding in an efficient manner, reversible integer-to-integer [24–26] and nonreversible real-to-real transforms are employed. To code transform data, the codec makes use of bit-plane coding techniques. For entropy coding, a context-based adaptive binary arithmetic coder [27] is used, more specifically the MQ coder from the JBIG2 standard [28]. Two levels of syntax are employed to represent the coded image: a code stream and file format syntax. The code stream syntax is similar in spirit to that used in the JPEG standard.

Fig. 2. Reference grid. [Figure: a grid of size Xsiz × Ysiz with origin (0,0); the image area extends from (XOsiz, YOsiz) to (Xsiz−1, Ysiz−1) and has size (Xsiz−XOsiz) × (Ysiz−YOsiz).]
TABLE I
PARTS OF THE STANDARD
Part | Title | Purpose
1 | Core coding system [7] | Specifies the core (or minimum functionality) codec for the JPEG-2000 family of standards.
2 | Extensions [15] | Specifies additional functionalities that are useful in some applications but need not be supported by all codecs.
3 | Motion JPEG 2000 [16] | Specifies extensions to JPEG-2000 for intraframe-style video compression.
4 | Conformance testing [17] | Specifies the procedure to be employed for compliance testing.
5 | Reference software [18] | Provides sample software implementations of the standard to serve as a guide for implementors.

The remainder of Section III is structured as follows. First, Sections III-A to III-C discuss the source image model and how an image is internally represented by the codec. Next, Section III-D examines the basic structure of the codec. This is followed, in Sections III-E to III-M, by a detailed explanation of the coding engine itself. Next, Sections III-N and III-O explain the syntax used to represent a coded image. Finally, Section III-P briefly describes some extensions proposed for inclusion in Part 2 of the standard.

A. Source Image Model
Before examining the internals of the codec, it is important to understand the image model that it employs. From the codec's point of view, an image is comprised of one or more components (up to a limit of $2^{14}$), as shown in Fig. 1(a). As illustrated in Fig. 1(b), each component consists of a rectangular array of samples. The sample values for each component are integer valued, and can be either signed or unsigned with a precision from 1 to 38 bits/sample. The signedness and precision of the sample data are specified on a per-component basis.
All of the components are associated with the same spatial extent in the source image, but represent different spectral or auxiliary information. For example, an RGB color image has three components with one component representing each of the red, green, and blue color planes. In the simple case of a grayscale image, there is only one component, corresponding to the luminance plane. The various components of an image need not be sampled at the same resolution. Consequently, the components themselves can have different sizes. For example, when color images are represented in a luminance-chrominance color space, the luminance information is often more finely sampled than the chrominance data.

B. Reference Grid
Given an image, the codec describes the geometry of the various components in terms of a rectangular grid called the reference grid. The reference grid has the general form shown in Fig. 2. The grid is of size Xsiz × Ysiz with the origin located at its top-left corner. The region with its top-left corner at (XOsiz, YOsiz) and bottom-right corner at (Xsiz − 1, Ysiz − 1) is called the image area, and corresponds to the picture data to be represented. The width and height of the reference grid cannot exceed $2^{32} - 1$ units, imposing an upper bound on the size of an image that can be handled by the codec.
All of the components are mapped onto the image area of the reference grid. Since components need not be sampled at the full resolution of the reference grid, additional information is required in order to establish this mapping. For each component, we indicate the horizontal and vertical sampling period in units of the reference grid, denoted as XRsiz and YRsiz, respectively. These two parameters uniquely specify a (rectangular) sampling grid consisting of all points whose horizontal and vertical positions are integer multiples of XRsiz and YRsiz, respectively. All such points that fall within the image area constitute samples of the component in question. Thus, in terms of its own coordinate system, a component will have the size

$$\left( \left\lceil \frac{\mathrm{Xsiz}}{\mathrm{XRsiz}} \right\rceil - \left\lceil \frac{\mathrm{XOsiz}}{\mathrm{XRsiz}} \right\rceil \right) \times \left( \left\lceil \frac{\mathrm{Ysiz}}{\mathrm{YRsiz}} \right\rceil - \left\lceil \frac{\mathrm{YOsiz}}{\mathrm{YRsiz}} \right\rceil \right)$$

and its top-left sample will correspond to the point $\left( \lceil \mathrm{XOsiz}/\mathrm{XRsiz} \rceil, \lceil \mathrm{YOsiz}/\mathrm{YRsiz} \rceil \right)$. Note that the reference grid also imposes a particular alignment of samples from the various components relative to one another.
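
The component size and top-left sample expressions above can be checked with a few lines of integer arithmetic. The following is a minimal sketch, not code from the standard or from JasPer; the function names `ceil_div` and `component_geometry` are hypothetical, and the parameter names simply mirror the Xsiz, Ysiz, XOsiz, YOsiz, XRsiz, and YRsiz parameters of the text.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for a >= 0 and b > 0, using integer arithmetic only."""
    return -(-a // b)


def component_geometry(xsiz, ysiz, xosiz, yosiz, xrsiz, yrsiz):
    """Return ((x0, y0), (width, height)) of one component in its own coordinates.

    (x0, y0) is the top-left sample; width and height follow the size
    expression given in the text for sampling periods XRsiz and YRsiz.
    """
    x0 = ceil_div(xosiz, xrsiz)
    y0 = ceil_div(yosiz, yrsiz)
    width = ceil_div(xsiz, xrsiz) - x0
    height = ceil_div(ysiz, yrsiz) - y0
    return (x0, y0), (width, height)


# Illustrative values only: a 16 x 12 image area offset by (1, 1) on the grid.
luma = component_geometry(17, 13, 1, 1, 1, 1)    # sampled at full resolution
chroma = component_geometry(17, 13, 1, 1, 2, 2)  # subsampled by 2 in each direction
print(luma, chroma)
```

With these illustrative values the fully sampled component is 16 × 12 and the subsampled one is 8 × 6, both with top-left sample (1, 1) in their own coordinate systems.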
From the diagram, the size of the image area is (Xsiz − XOsiz) × (Ysiz − YOsiz). For a given image, many combinations of the Xsiz, Ysiz, XOsiz, and YOsiz parameters can be chosen to obtain an image area with the same size. Thus, one might wonder why the XOsiz and YOsiz parameters are not fixed at zero while the Xsiz and Ysiz parameters are set to the size of the image. As it turns out, there are subtle implications to changing the XOsiz and YOsiz parameters (while keeping the size of the image area constant). Such changes affect codec behavior in several important ways, as will be described later. This behavior allows a number of basic operations to be performed efficiently on coded images, such as cropping, horizontal/vertical flipping, and rotation by an integer multiple of 90 degrees.

Fig. 3. Tiling on the reference grid. [Figure: tiles T0 to T8 of nominal size XTsiz × YTsiz laid out in raster order on the reference grid, with the tiling grid origin at (XTOsiz, YTOsiz).]

C. Tiling
In some situations, an image may be quite large in comparison to the amount of memory available to the codec. Consequently, it is not always feasible to code the entire image as a single atomic unit. To solve this problem, the codec allows an image to be broken into smaller pieces, each of which is independently coded. More specifically, an image is partitioned into one or more disjoint rectangular regions called tiles. As shown in Fig. 3, this partitioning is performed with respect to the reference grid by overlaying the reference grid with a rectangular tiling grid having horizontal and vertical spacings of XTsiz and YTsiz, respectively. The origin of the tiling grid is aligned with the point (XTOsiz, YTOsiz). Tiles have a nominal size of XTsiz × YTsiz, but those bordering on the edges of the image area may have a size which differs from the nominal size. The tiles are numbered in raster scan order (starting at zero).
By mapping the position of each tile from the reference grid to the coordinate systems of the individual components, a partitioning of the components themselves is obtained. For example, suppose that a tile has an upper left corner and lower right corner with coordinates $(\mathrm{tx}_0, \mathrm{ty}_0)$ and $(\mathrm{tx}_1 - 1, \mathrm{ty}_1 - 1)$, respectively. Then, in the coordinate space of a particular component, the tile would have an upper left corner and lower right corner with coordinates $(\mathrm{tcx}_0, \mathrm{tcy}_0)$ and $(\mathrm{tcx}_1 - 1, \mathrm{tcy}_1 - 1)$, respectively, where

$$(\mathrm{tcx}_0, \mathrm{tcy}_0) = \left( \lceil \mathrm{tx}_0 / \mathrm{XRsiz} \rceil, \lceil \mathrm{ty}_0 / \mathrm{YRsiz} \rceil \right) \tag{1a}$$
$$(\mathrm{tcx}_1, \mathrm{tcy}_1) = \left( \lceil \mathrm{tx}_1 / \mathrm{XRsiz} \rceil, \lceil \mathrm{ty}_1 / \mathrm{YRsiz} \rceil \right). \tag{1b}$$

These equations correspond to the illustration in Fig. 4. The portion of a component that corresponds to a single tile is referred to as a tile-component. Although the tiling grid is regular with respect to the reference grid, it is important to note that the grid may not necessarily be regular with respect to the coordinate systems of the components.

Fig. 4. Tile-component coordinate system. [Figure: the tile-component data extends from $(\mathrm{tcx}_0, \mathrm{tcy}_0)$ to $(\mathrm{tcx}_1 - 1, \mathrm{tcy}_1 - 1)$.]

D. Codec Structure
The general structure of the codec is shown in Fig. 5 with the form of the encoder given by Fig. 5(a) and the decoder given by Fig. 5(b). From these diagrams, the key processes associated with the codec can be identified: 1) preprocessing/postprocessing, 2) intercomponent transform, 3) intracomponent transform, 4) quantization/dequantization, 5) tier-1 coding, 6) tier-2 coding, and 7) rate control. The decoder structure essentially mirrors that of the encoder. That is, with the exception of rate control, there is a one-to-one correspondence between functional blocks in the encoder and decoder. Each functional block in the decoder either exactly or approximately inverts the effects of its corresponding block in the encoder. Since tiles are coded independently of one another, the input image is (conceptually, at least) processed one tile at a time. In the sections that follow, each of the above processes is examined in more detail.

Fig. 5. Codec structure. The structure of the (a) encoder and (b) decoder. [Block diagrams: (a) Original Image → Preprocessing → Forward Intercomponent Transform → Forward Intracomponent Transform → Quantization → Tier-1 Encoder → Tier-2 Encoder → Coded Image, together with a Rate Control block; (b) Coded Image → Tier-2 Decoder → Tier-1 Decoder → Dequantization → Inverse Intracomponent Transform → Inverse Intercomponent Transform → Postprocessing → Reconstructed Image.]

E. Preprocessing/Postprocessing
The codec expects its input sample data to have a nominal dynamic range that is approximately centered about zero. The preprocessing stage of the encoder simply ensures that this expectation is met. Suppose that a particular component has P bits/sample. The samples may be either signed or unsigned, leading to a nominal dynamic range of $[-2^{P-1}, 2^{P-1} - 1]$ or $[0, 2^P - 1]$, respectively. If the sample values are unsigned, the nominal dynamic range is clearly not centered about zero. Thus, the nominal dynamic range of the samples is adjusted by subtracting a bias of $2^{P-1}$ from each of the sample values. If the sample values for a component are signed, the nominal dynamic range is already centered about zero, and no processing is required. By ensuring that the nominal dynamic range is centered about zero, a number of simplifying assumptions could be made in the design of the codec (e.g., with respect to context modeling, numerical overflow, etc.).
The postprocessing stage of the decoder essentially undoes the effects of preprocessing in the encoder. If the sample values for a component are unsigned, the original nominal dynamic range is restored. Lastly, in the case of lossy coding, clipping is performed to ensure that the sample values do not exceed the allowable range.

F. Intercomponent Transform
In the encoder, the preprocessing stage is followed by the forward intercomponent transform stage. Here, an intercomponent transform can be applied to the tile-component data. Such a transform operates on all of the components together, and serves to reduce the correlation between components, leading to improved coding efficiency.
Only two intercomponent transforms are defined in the baseline JPEG-2000 codec: the irreversible color transform (ICT) and reversible color transform (RCT). The ICT is nonreversible and real-to-real in nature, while the RCT is reversible and integer-to-integer. Both of these transforms essentially map image data from the RGB to YCrCb color space. The transforms are defined to operate on the first three components of an image, with the assumption that components 0, 1, and 2 correspond to the red, green, and blue color planes. Due to the nature of these transforms, the components on which they operate must be sampled at the same resolution (i.e., have the same size). As a consequence of the above facts, the ICT and RCT can only be employed when the image being coded has at least three components, and the first three components are sampled at the same resolution. The ICT may only be used in the case of lossy coding, while the RCT can be used in either the lossy or lossless case. Even if a transform can be legally employed, it is not necessary to do so. That is, the decision to use a multicomponent transform is left at the discretion of the encoder. After the intercomponent transform stage in the encoder, data from each component is treated independently.
The ICT is nothing more than the classic RGB to YCrCb color space transform. The forward transform is defined as

$$\begin{bmatrix} V_0(x,y) \\ V_1(x,y) \\ V_2(x,y) \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ -0.16875 & -0.33126 & 0.5 \\ 0.5 & -0.41869 & -0.08131 \end{bmatrix} \begin{bmatrix} U_0(x,y) \\ U_1(x,y) \\ U_2(x,y) \end{bmatrix} \tag{2}$$

where $U_0(x,y)$, $U_1(x,y)$, and $U_2(x,y)$ are the input components corresponding to the red, green, and blue color planes, respectively, and $V_0(x,y)$, $V_1(x,y)$, and $V_2(x,y)$ are the output components corresponding to the Y, Cb, and Cr planes, respectively. The inverse transform can be shown to be

$$\begin{bmatrix} U_0(x,y) \\ U_1(x,y) \\ U_2(x,y) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1.402 \\ 1 & -0.34413 & -0.71414 \\ 1 & 1.772 & 0 \end{bmatrix} \begin{bmatrix} V_0(x,y) \\ V_1(x,y) \\ V_2(x,y) \end{bmatrix}. \tag{3}$$

The RCT is simply a reversible integer-to-integer approximation to the ICT (similar to that proposed in [26]). The forward transform is given by

$$V_0(x,y) = \left\lfloor \tfrac{1}{4} \left( U_0(x,y) + 2U_1(x,y) + U_2(x,y) \right) \right\rfloor \tag{4a}$$
$$V_1(x,y) = U_2(x,y) - U_1(x,y) \tag{4b}$$
$$V_2(x,y) = U_0(x,y) - U_1(x,y) \tag{4c}$$

where $U_0(x,y)$, $U_1(x,y)$, $U_2(x,y)$, $V_0(x,y)$, $V_1(x,y)$, and $V_2(x,y)$ are defined as above. The inverse transform can be shown to be

$$U_1(x,y) = V_0(x,y) - \left\lfloor \tfrac{1}{4} \left( V_1(x,y) + V_2(x,y) \right) \right\rfloor \tag{5a}$$
$$U_0(x,y) = V_2(x,y) + U_1(x,y) \tag{5b}$$
$$U_2(x,y) = V_1(x,y) + U_1(x,y). \tag{5c}$$

The inverse intercomponent transform stage in the decoder essentially undoes the effects of the forward intercomponent transform stage in the encoder. If a multicomponent transform was applied during encoding, its inverse is applied here. Unless the transform is reversible, however, the inversion may only be approximate due to the effects of finite-precision arithmetic.

G. Intracomponent Transform
Following the intercomponent transform stage in the encoder is the intracomponent transform stage. In this stage, transforms that operate on individual components can be applied. The particular type of operator employed for this purpose is the wavelet transform. Through the application of the wavelet transform, a component is split into numerous frequency bands (i.e., subbands). Due to the statistical properties of these subband signals, the transformed data can usually be coded more efficiently than the original untransformed data.
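
Before moving on to the wavelet machinery, the two stages described in Sections III-E and III-F can be checked numerically. The sketch below is an illustration only, not code from the standard or from JasPer: it applies the level shift for unsigned samples and then the RCT of (4), and confirms that the inverse of (5) recovers the input exactly. The function names are hypothetical, and the 8-bit precision is an assumption chosen for the example.

```python
def level_shift(sample: int, precision: int) -> int:
    """Subtract the bias 2**(P-1) from an unsigned P-bit sample."""
    return sample - (1 << (precision - 1))


def rct_forward(u0: int, u1: int, u2: int):
    """Forward RCT, equations (4a)-(4c); // is floor division in Python."""
    v0 = (u0 + 2 * u1 + u2) // 4
    v1 = u2 - u1
    v2 = u0 - u1
    return v0, v1, v2


def rct_inverse(v0: int, v1: int, v2: int):
    """Inverse RCT, equations (5a)-(5c)."""
    u1 = v0 - (v1 + v2) // 4
    u0 = v2 + u1
    u2 = v1 + u1
    return u0, u1, u2


# Round trip over a coarse grid of level-shifted 8-bit RGB triples.
P = 8  # assumed precision for this example
values = [level_shift(s, P) for s in range(0, 256, 17)]
for r in values:
    for g in values:
        for b in values:
            assert rct_inverse(*rct_forward(r, g, b)) == (r, g, b)
print("RCT round trip is exact on the sampled grid")
```

Floor division matters here: Python's `//` rounds toward negative infinity, which matches the floor operator in (4a) and (5a) for the negative values produced by the level shift. A language whose integer division truncates toward zero would need an explicit floor.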
Both reversible integer-to-integer [24, 25, 29–31] and nonreversible real-to-real wavelet transforms are employed by the baseline codec. The basic building block for such transforms is the 1-D 2-channel perfect-reconstruction (PR) uniformly-maximally-decimated (UMD) filter bank (FB) which has the general form shown in Fig. 6. Here, we focus on the lifting realization of the UMDFB [32, 33], as it can be used to implement the reversible integer-to-integer and nonreversible real-to-real wavelet transforms employed by the baseline codec. In fact, for this reason, it is likely that this realization strategy will be employed by many codec implementations. The analysis side of the UMDFB, depicted in Fig. 6(a), is associated with the forward transform, while the synthesis side, depicted in Fig. 6(b), is associated with the inverse transform. In the diagram, the $\{A_i(z)\}_{i=0}^{\lambda-1}$, $\{Q_i(x)\}_{i=0}^{\lambda-1}$, and $\{s_i\}_{i=0}^{1}$ denote filter transfer functions, quantization operators, and (scalar) gains, respectively. To obtain integer-to-integer mappings, the $\{Q_i(x)\}_{i=0}^{\lambda-1}$ are selected such that they always yield integer values, and the $\{s_i\}_{i=0}^{1}$ are chosen as integers. For real-to-real mappings, the $\{Q_i(x)\}_{i=0}^{\lambda-1}$ are simply chosen as the identity, and the $\{s_i\}_{i=0}^{1}$ are selected from the real numbers. To facilitate filtering at signal boundaries, symmetric extension [34, 35] is employed. Since an image is a 2-D signal, clearly we need a 2-D UMDFB. By applying the 1-D UMDFB in both the horizontal and vertical directions, a 2-D UMDFB is effectively obtained. The wavelet transform is then calculated by recursively applying the 2-D UMDFB to the lowpass subband signal obtained at each level in the decomposition.

Fig. 6. Lifting realization of 1-D 2-channel PR UMDFB. (a) Analysis side. (b) Synthesis side.

Suppose that a (R − 1)-level wavelet transform is to be employed. To compute the forward transform, we apply the analysis side of the 2-D UMDFB to the tile-component data in an iterative manner, resulting in a number of subband signals being produced. Each application of the analysis side of the 2-D UMDFB yields four subbands: 1) horizontally and vertically lowpass (LL), 2) horizontally lowpass and vertically highpass (LH), 3) horizontally highpass and vertically lowpass (HL), and 4) horizontally and vertically highpass (HH). A (R − 1)-level wavelet decomposition is associated with R resolution levels, numbered from 0 to R − 1, with 0 and R − 1 corresponding to the coarsest and finest resolutions, respectively. Each subband of the decomposition is identified by its orientation (e.g., LL, LH, HL, HH) and its corresponding resolution level (e.g., 0, 1, ..., R − 1). The input tile-component signal is considered to be the $\mathrm{LL}_{R-1}$ band. At each resolution level (except the lowest) the LL band is further decomposed. For example, the $\mathrm{LL}_{R-1}$ band is decomposed to yield the $\mathrm{LL}_{R-2}$, $\mathrm{LH}_{R-2}$, $\mathrm{HL}_{R-2}$, and $\mathrm{HH}_{R-2}$ bands. Then, at the next level, the $\mathrm{LL}_{R-2}$ band is decomposed, and so on. This process repeats until the $\mathrm{LL}_0$ band is obtained, and results in the subband structure illustrated in Fig. 7. In the degenerate case where no transform is applied, R = 1, and we effectively have only one subband (i.e., the $\mathrm{LL}_0$ band).

Fig. 7. Subband structure.

As described above, the wavelet decomposition can be associated with data at R different resolutions. Suppose that the top-left and bottom-right samples of a tile-component have coordinates $(\mathrm{tcx}_0, \mathrm{tcy}_0)$ and $(\mathrm{tcx}_1 - 1, \mathrm{tcy}_1 - 1)$, respectively. This being the case, the top-left and bottom-right samples of the tile-component at resolution r have coordinates $(\mathrm{trx}_0, \mathrm{try}_0)$ and $(\mathrm{trx}_1 - 1, \mathrm{try}_1 - 1)$, respectively, given by

$$(\mathrm{trx}_0, \mathrm{try}_0) = \left( \left\lceil \mathrm{tcx}_0 / 2^{R-r-1} \right\rceil, \left\lceil \mathrm{tcy}_0 / 2^{R-r-1} \right\rceil \right) \tag{6a}$$
$$(\mathrm{trx}_1, \mathrm{try}_1) = \left( \left\lceil \mathrm{tcx}_1 / 2^{R-r-1} \right\rceil, \left\lceil \mathrm{tcy}_1 / 2^{R-r-1} \right\rceil \right) \tag{6b}$$

where r is the particular resolution of interest. Thus, the tile-component signal at a particular resolution has the size $(\mathrm{trx}_1 - \mathrm{trx}_0) \times (\mathrm{try}_1 - \mathrm{try}_0)$.
Not only are the coordinate systems of the resolution levels important, but so too are the coordinate systems for the various subbands. Suppose that we denote the coordinates of the upper left and lower right samples in a subband as $(\mathrm{tbx}_0, \mathrm{tby}_0)$ and $(\mathrm{tbx}_1 - 1, \mathrm{tby}_1 - 1)$, respectively. These quantities are computed as

$$(\mathrm{tbx}_0, \mathrm{tby}_0) = \begin{cases} \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} \right\rceil \right) & \text{for LL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} \right\rceil \right) & \text{for HL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for LH band} \\ \left( \left\lceil \frac{\mathrm{tcx}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_0}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for HH band} \end{cases} \tag{7a}$$

$$(\mathrm{tbx}_1, \mathrm{tby}_1) = \begin{cases} \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} \right\rceil \right) & \text{for LL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} \right\rceil \right) & \text{for HL band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for LH band} \\ \left( \left\lceil \frac{\mathrm{tcx}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil, \left\lceil \frac{\mathrm{tcy}_1}{2^{R-r-1}} - \frac{1}{2} \right\rceil \right) & \text{for HH band} \end{cases} \tag{7b}$$

where r is the resolution level to which the band belongs, R is the number of resolution levels, and $\mathrm{tcx}_0$, $\mathrm{tcy}_0$, $\mathrm{tcx}_1$, and $\mathrm{tcy}_1$ are as defined in (1). Thus, a particular band has the size $(\mathrm{tbx}_1 - \mathrm{tbx}_0) \times (\mathrm{tby}_1 - \mathrm{tby}_0)$. From the above equations, we can also see that $(\mathrm{tbx}_0, \mathrm{tby}_0) = (\mathrm{trx}_0, \mathrm{try}_0)$ and $(\mathrm{tbx}_1, \mathrm{tby}_1) = (\mathrm{trx}_1, \mathrm{try}_1)$ for the $\mathrm{LL}_r$ band, as one would expect. (This should be the case since the $\mathrm{LL}_r$ band is equivalent to a reduced resolution version of the original data.) As will be seen, the coordinate systems for the various resolutions and subbands of a tile-component play an important role in codec behavior.
By examining (1), (6), and (7), we observe that the coordinates of the top-left sample for a particular subband, denoted $(\mathrm{tbx}_0, \mathrm{tby}_0)$, are partially determined by the XOsiz and YOsiz parameters of the reference grid. At each level of the decomposition, the parity (i.e., oddness/evenness) of $\mathrm{tbx}_0$ and $\mathrm{tby}_0$ affects the outcome of the downsampling process (since downsampling is shift variant). In this way, the XOsiz and YOsiz parameters have a subtle, yet important, effect on the transform calculation.
Having described the general transform framework, we now describe the two specific wavelet transforms supported by the baseline codec: the 5/3 and 9/7 transforms. The 5/3 transform is reversible, integer-to-integer, and nonlinear. This transform was proposed in [24], and is simply an approximation to a linear wavelet transform proposed in [36]. The 5/3 transform has an underlying 1-D UMDFB with the parameters:

$$\lambda = 2, \quad A_0(z) = -\tfrac{1}{2}(z + 1), \quad A_1(z) = \tfrac{1}{4}(1 + z^{-1}), \tag{8}$$
$$Q_0(x) = -\lfloor -x \rfloor, \quad Q_1(x) = \left\lfloor x + \tfrac{1}{2} \right\rfloor, \quad s_0 = s_1 = 1.$$

The 9/7 transform is nonreversible and real-to-real. This transform, proposed in [20], is also employed in the FBI fingerprint compression standard [37] (although the normalizations differ). The 9/7 transform has an underlying 1-D UMDFB with the parameters:

$$\lambda = 4, \quad A_0(z) = \alpha_0 (z + 1), \quad A_1(z) = \alpha_1 (1 + z^{-1}), \tag{9}$$
$$A_2(z) = \alpha_2 (z + 1), \quad A_3(z) = \alpha_3 (1 + z^{-1}),$$
$$Q_i(x) = x \text{ for } i = 0, 1, 2, 3,$$
$$\alpha_0 \approx -1.586134, \quad \alpha_1 \approx -0.052980, \quad \alpha_2 \approx 0.882911,$$
$$\alpha_3 \approx 0.443506, \quad s_0 \approx 1.230174, \quad s_1 = 1/s_0.$$

Since the 5/3 transform is reversible, it can be employed for either lossy or lossless coding. The 9/7 transform, lacking the reversible property, can only be used for lossy coding. The number of resolution levels is a parameter of each transform. A typical value for this parameter is six (for a sufficiently large image). The encoder may transform all, none, or a subset of the components. This choice is at the encoder's discretion.
The inverse intracomponent transform stage in the decoder essentially undoes the effects of the forward intracomponent transform stage in the encoder. If a transform was applied to a particular component during encoding, the corresponding inverse transform is applied here. Due to the effects of finite-precision arithmetic, the inversion process is not guaranteed to be exact unless reversible transforms are employed.

H. Quantization/Dequantization
In the encoder, after the tile-component data has been transformed (by intercomponent and/or intracomponent transforms), the resulting coefficients are quantized. Quantization allows greater compression to be achieved, by representing transform coefficients with only the minimal precision required to obtain the desired level of image quality. Quantization of transform coefficients is one of the two primary sources of information loss in the coding path (the other source being the discarding of coding pass data as will be described later).
Transform coefficients are quantized using scalar quantization with a deadzone. A different quantizer is employed for the coefficients of each subband, and each quantizer has only one parameter, its step size. Mathematically, the quantization process is defined as

$$V(x,y) = \left\lfloor |U(x,y)| / \Delta \right\rfloor \operatorname{sgn} U(x,y) \tag{10}$$

where $\Delta$ is the quantizer step size, $U(x,y)$ is the input subband signal, and $V(x,y)$ denotes the output quantizer indices for the subband. Since this equation is specified in an informative clause of the standard, encoders need not use this precise formula. This said, however, it is likely that many encoders will, in fact, use the above equation.
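
As an illustration of (10), the following minimal sketch quantizes a single coefficient. It is not taken from the standard or from any reference implementation; the function name `deadzone_quantize` is hypothetical, and a real encoder would apply this per subband with that subband's step size.

```python
import math


def deadzone_quantize(u: float, step: float) -> int:
    """Quantizer index for coefficient u with step size `step`, following (10)."""
    if step <= 0:
        raise ValueError("step size must be positive")
    sign = (u > 0) - (u < 0)  # sgn(u): -1, 0, or +1
    return sign * math.floor(abs(u) / step)


# Every coefficient with |u| < step maps to index 0, so the zero bin spans
# (-step, step) and is twice as wide as the other bins: the "deadzone".
for u in (-7.9, -2.0, -1.9, 0.0, 1.9, 2.0, 7.9):
    print(u, deadzone_quantize(u, 2.0))
```

Because the magnitude is floored before the sign is restored, the mapping is symmetric about zero, which a plain `math.floor(u / step)` would not be for negative inputs.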
The baseline codec has two distinct modes of operation, referred to herein as integer mode and real mode. In integer mode, all transforms employed are integer-to-integer in nature (e.g., RCT, 5/3 WT). In real mode, real-to-real transforms are employed (e.g., ICT, 9/7 WT). In integer mode, the quantizer step sizes are always fixed at one, effectively bypassing quantization and forcing the quantizer indices and transform coefficients to be one and the same. In this case, lossy coding is still possible, but rate control is achieved by another mechanism (to be discussed later). In the case of real mode (which implies lossy coding), the quantizer step sizes are chosen in conjunction with rate control. Numerous strategies are possible for the selection of the quantizer step sizes, as will be discussed later in Section III-L.
As one might expect, the quantizer step sizes used by the encoder are conveyed to the decoder via the code stream. In passing, we note that the step sizes specified in the code stream are relative and not absolute quantities. That is, the quantizer step size for each band is specified relative to the nominal dynamic range of the subband signal.
In the decoder, the dequantization stage tries to undo the effects of quantization. Unless all of the quantizer step sizes are less than or equal to one, the quantization process will normally result in some information loss, and this inversion process is only approximate. The quantized transform coefficient values are obtained from the quantizer indices. Mathematically, the dequantization process is defined as
$$U(x,y) = \left( V(x,y) + r \operatorname{sgn} V(x,y) \right) \Delta \qquad (11)$$
where $\Delta$ is the quantizer step size, $r$ is a bias parameter, $V(x,y)$ are the input quantizer indices for the subband, and $U(x,y)$ is the reconstructed subband signal. Although the value of $r$ is not normatively specified in the standard, it is likely that many decoders will use the value of one half.
I. Tier-1 Coding
After quantization is performed in the encoder, tier-1 coding takes place. This is the first of two coding stages. The quantizer indices for each subband are partitioned into code blocks. Code blocks are rectangular in shape, and their nominal size is a free parameter of the coding process, subject to certain constraints, most notably: 1) the nominal width and height of a code block must be an integer power of two, and 2) the product of the nominal width and height cannot exceed 4096.
Suppose that the nominal code block size is tentatively chosen to be $2^{\mathrm{xcb}} \times 2^{\mathrm{ycb}}$. In tier-2 coding, yet to be discussed, code blocks are grouped into what are called precincts. Since code blocks are not permitted to cross precinct boundaries, a reduction in the nominal code block size may be required if the precinct size is sufficiently small. Suppose that the nominal code block size after any such adjustment is $2^{\mathrm{xcb}'} \times 2^{\mathrm{ycb}'}$ where $\mathrm{xcb}' \le \mathrm{xcb}$ and $\mathrm{ycb}' \le \mathrm{ycb}$. The subband is partitioned into code blocks by overlaying the subband with a rectangular grid having horizontal and vertical spacings of $2^{\mathrm{xcb}'}$ and $2^{\mathrm{ycb}'}$, respectively, as shown in Fig. 8. The origin of this grid is anchored at $(0,0)$ in the coordinate system of the subband. A typical choice for the nominal code block size is $64 \times 64$ (i.e., xcb = 6 and ycb = 6).
Fig. 8. Partitioning of subband into code blocks.
Let us, again, denote the coordinates of the top-left sample in a subband as $(\mathrm{tbx}_0, \mathrm{tby}_0)$. As explained in Section III-G, the quantity $(\mathrm{tbx}_0, \mathrm{tby}_0)$ is partially determined by the reference grid parameters XOsiz and YOsiz. In turn, the quantity $(\mathrm{tbx}_0, \mathrm{tby}_0)$ affects the position of code block boundaries within a subband. In this way, the XOsiz and YOsiz parameters have an important effect on the behavior of the tier-1 coding process (i.e., they affect the location of code block boundaries).
After a subband has been partitioned into code blocks, each of the code blocks is independently coded. The coding is performed using the bit-plane coder described later in Section III-J. For each code block, an embedded code is produced, comprised of numerous coding passes. The output of the tier-1 encoding process is, therefore, a collection of coding passes for the various code blocks.
On the decoder side, the bit-plane coding passes for the various code blocks are input to the tier-1 decoder, these passes are decoded, and the resulting data is assembled into subbands.
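The grid-based partitioning of a subband into code blocks described above can be sketched in Python as follows. The function name `partition_into_code_blocks` and the subband coordinates in the example are hypothetical. The sketch assumes the subband covers the half-open range from (tbx0, tby0) to (tbx1, tby1), and that `xcb` and `ycb` are the already-adjusted exponents (xcb' and ycb' in the text).

```python
def partition_into_code_blocks(tbx0, tby0, tbx1, tby1, xcb, ycb):
    """Return code-block rectangles (x0, y0, x1, y1), half-open, in raster order.

    The partitioning grid has spacings 2**xcb and 2**ycb and is anchored at
    (0, 0) in the subband coordinate system, so blocks on the subband border
    may be smaller than the nominal size.
    """
    if xcb + ycb > 12:
        raise ValueError("nominal code block area cannot exceed 4096 samples")
    width, height = 1 << xcb, 1 << ycb
    blocks = []
    grid_y = (tby0 // height) * height  # last grid line at or above tby0
    while grid_y < tby1:
        grid_x = (tbx0 // width) * width  # last grid line at or left of tbx0
        while grid_x < tbx1:
            blocks.append((max(grid_x, tbx0), max(grid_y, tby0),
                           min(grid_x + width, tbx1), min(grid_y + height, tby1)))
            grid_x += width
        grid_y += height
    return blocks


# Hypothetical subband with top-left sample (100, 40), split on a 64 x 64 grid.
blocks = partition_into_code_blocks(100, 40, 300, 170, 6, 6)
print(len(blocks), blocks[0], blocks[-1])
```

Because the grid is anchored at the origin rather than at the top-left sample of the subband, shifting (tbx0, tby0) changes which samples share a code block. This is how XOsiz and YOsiz influence tier-1 coding.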
In this way, we obtain the reconstructed quantizer indices for each subband. In the case of lossy coding, the reconstructed quantizer indices may only be approximations to the quantizer indices originally available at the encoder. This is attributable to the fact that the code stream may only include a subset of the coding passes generated by the tier-1 encoding process. In the lossless case, the reconstructed quantizer indices must be the same as the original indices on the encoder side, since all coding passes must be included for lossless coding.
J. Bit-Plane Coding
The tier-1 coding process is essentially one of bit-plane coding. After all of the subbands have been partitioned into code blocks, each of the resulting code blocks is independently coded using a bit-plane coder. Although the bit-plane coding technique employed is similar to those used in the embedded zerotree wavelet (EZW) [38] and set partitioning in hierarchical trees (SPIHT) [39] codecs, there are two notable differences: 1) no interband dependencies are exploited, and 2) there are three coding passes per bit plane instead of two. The first difference follows from the fact that each code block is completely contained within a single subband, and code blocks are coded independently of one another. By not exploiting interband dependencies, improved error resilience can be achieved. The second difference is arguably less fundamental. Using three passes per bit plane instead of two reduces the amount of data associated with each coding pass, facilitating finer control over rate. Also, using an additional pass per bit plane allows better prioritization of important data, leading to improved coding efficiency.
As noted above, there are three coding passes per bit plane. In order, these passes are as follows: 1) significance, 2) refinement, and 3) cleanup. All three types of coding passes scan the samples of a code block in the same fixed order shown in Fig. 10. The code block is partitioned into horizontal stripes, each having a nominal height of four samples. If the code block height is not a multiple of four, the height of the bottom stripe will be less than this nominal value. As shown in the diagram, the stripes are scanned from top to bottom. Within a stripe, columns are scanned from left to right. Within a column, samples are scanned from top to bottom.
The bit-plane encoding process generates a sequence of symbols for each coding pass. Some or all of these symbols may be entropy coded. For the purposes of entropy coding, a context-based adaptive binary arithmetic coder is used, more specifically the MQ coder from the JBIG2 standard [28]. For each pass, all of the symbols are either arithmetically coded or raw coded (i.e., the binary symbols are emitted as raw bits with simple bit stuffing). The arithmetic and raw coding processes both ensure that certain bit patterns never occur in the output, allowing such patterns to be used for error resilience purposes.
Cleanup passes always employ arithmetic coding. In the case of the significance and refinement passes, two possibilities exist, depending on whether the so-called arithmetic-coding bypass mode (also known as lazy mode) is enabled. If lazy mode is enabled, only the significance and refinement passes for the four most significant bit planes use arithmetic coding, while the remaining such passes are raw coded. Otherwise, all significance and refinement passes are arithmetically coded. The lazy mode allows the computational complexity of bit-plane coding to be significantly reduced, by decreasing the number of symbols that must be arithmetically coded. This comes, of course, at the cost of reduced coding efficiency.
As indicated above, coding pass data can be encoded using one of two schemes (i.e., arithmetic or raw coding). Consecutive coding passes that employ the same encoding scheme constitute what is known as a segment. All of the coding passes in a segment can collectively form a single codeword or each coding pass can form a separate codeword. Which of these is the case is determined by the termination mode in effect. Two termination modes are supported: per-segment termination and per-pass termination. In the first case, only the last coding pass of a segment is terminated. In the second case, all coding passes are terminated. Terminating all coding passes facilitates improved error resilience at the expense of decreased coding efficiency.
Since context-based arithmetic coding is employed, a means for context selection is necessary. Generally speaking, context selection is performed by examining state information for the 4-connected or 8-connected neighbors of a sample as shown in Fig. 9.
Fig. 9. Templates for context selection. The (a) 4-connected and (b) 8-connected neighbors.
In our explanation of the coding passes that follows, we focus on the encoder side as this facilitates easier understanding. The decoder algorithms follow directly from those employed on the encoder side.
J.1 Significance Pass
The first coding pass for each bit plane is the significance pass. This pass is used to convey significance and (as necessary) sign information for samples that have not yet been found to be significant and are predicted to become significant during the processing of the current bit plane. The samples in the code block are scanned in the order shown previously in Fig. 10. If a sample has not yet been found to be significant, and is predicted to become significant, the significance of the sample is coded with a single binary symbol. If the sample also happens to be significant, its sign is coded using a single binary symbol. In pseudocode form, the significance pass is described by Algorithm 1.
Algorithm 1 Significance pass algorithm.
for each sample in code block do
    if sample previously insignificant and predicted to become significant during current bit plane then
        code significance of sample /* 1 binary symbol */
        if sample significant then
            code sign of sample /* 1 binary symbol */
        endif
    endif
endfor
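The fixed scan order shared by all three coding passes (stripes of nominal height four scanned top to bottom, columns within a stripe left to right, samples within a column top to bottom) can be sketched as a small Python generator. The name `scan_order` and the example dimensions are hypothetical and serve only to illustrate the ordering described in the text.

```python
def scan_order(width, height, stripe_height=4):
    """Yield (row, col) positions of a code block in bit-plane scan order."""
    for top in range(0, height, stripe_height):        # stripes, top to bottom
        bottom = min(top + stripe_height, height)      # last stripe may be shorter
        for col in range(width):                       # columns, left to right
            for row in range(top, bottom):             # samples, top to bottom
                yield (row, col)


# A hypothetical code block 2 samples wide and 5 samples high:
# one full stripe of height 4 followed by a bottom stripe of height 1.
order = list(scan_order(2, 5))
print(order)
```

For this example the four samples of column 0 in the first stripe come first, then the four samples of column 1, and finally the two samples of the short bottom stripe.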
If the most significant bit plane is being processed, all samples are predicted to remain insignificant. Otherwise, a sample is predicted to become significant if any 8-connected neighbor has already been found to be significant. As a consequence of this prediction policy, the significance and refinement passes for the most significant bit plane are always empty (and need not be explicitly coded).
The symbols generated during the significance pass may or may not be arithmetically coded. If arithmetic coding is employed, the binary symbol conveying significance information is coded using one of nine contexts. The particular context used is selected based on the significance of the sample’s 8-connected neighbors and the orientation of the subband with which the sample is associated (e.g., LL, LH, HL, HH). In the case that arithmetic coding is used, the sign of a sample is coded as the
...
Fig. 10. Sample scan order within a code block.
...
from top to bottom, and the columns within a stripe are scanned from left to right. For convenience, we will refer to each column within a stripe as a vertical scan. That is, each vertical arrow in the diagram corresponds to a so called vertical scan. As
...
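As a rough illustration of the significance pass (Algorithm 1) combined with the prediction rule stated earlier, the following Python sketch emits the binary symbols that an encoder would pass on to the entropy coder. All names (`significance_pass`, `has_significant_neighbor`, the symbol tuples) and the sample data are hypothetical. The sketch assumes that neighbors outside the code block count as insignificant, and it omits context selection, arithmetic coding, and the bookkeeping needed by the later refinement and cleanup passes.

```python
def has_significant_neighbor(significant, row, col):
    """True if any 8-connected neighbor inside the block is already significant."""
    height, width = len(significant), len(significant[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width and significant[r][c]:
                return True
    return False


def significance_pass(magnitudes, signs, significant, bit_plane):
    """Run one significance pass; update `significant` in place, return symbols."""
    height, width = len(magnitudes), len(magnitudes[0])
    symbols = []
    for top in range(0, height, 4):                      # stripes of height 4
        for col in range(width):
            for row in range(top, min(top + 4, height)):
                if significant[row][col]:
                    continue                             # handled by refinement
                if not has_significant_neighbor(significant, row, col):
                    continue                             # left for cleanup
                bit = (magnitudes[row][col] >> bit_plane) & 1
                symbols.append(("sig", bit))             # 1 binary symbol
                if bit:
                    significant[row][col] = True
                    symbols.append(("sign", signs[row][col]))  # 1 binary symbol
    return symbols


# Hypothetical 2 x 3 block; sample (0, 0) became significant in bit plane 2.
magnitudes = [[5, 2, 0], [0, 3, 1]]
signs = [[0, 1, 0], [0, 0, 1]]           # 1 marks a negative sample
state = [[True, False, False], [False, False, False]]
symbols = significance_pass(magnitudes, signs, state, bit_plane=1)
print(symbols)
```

Because the significance state is updated during the scan, a sample that becomes significant in this pass can cause later samples in the scan order to be predicted significant as well. With an all-insignificant state, as for the most significant bit plane, the pass produces no symbols.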









