Information technology — Scalable compression and coding of continuous-tone still images — Part 1: Core coding system specification

This document specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content. This document defines the core coding system, which forms the basis for the entire ISO/IEC 18477 series.

Technologies de l'information — Compression échelonnable et codage d'images plates en ton continu — Partie 1: Spécification du système de codage de noyau

General Information

Status: Published
Publication Date: 28-May-2020
Current Stage: 9092 - International Standard to be revised
Completion Date: 09-Feb-2023

Standard
ISO/IEC 18477-1:2020 - Information technology -- Scalable compression and coding of continuous-tone still images
English language
17 pages

Standards Content (Sample)

INTERNATIONAL STANDARD ISO/IEC 18477-1
Second edition
2020-05
Information technology — Scalable compression and coding of continuous-tone still images —
Part 1:
Core coding system specification
Technologies de l'information — Compression échelonnable et codage d'images plates en ton continu —
Partie 1: Spécification du système de codage de noyau
Reference number
ISO/IEC 18477-1:2020(E)
© ISO/IEC 2020

ISO/IEC 18477-1:2020(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2020 – All rights reserved


Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Symbols and abbreviated terms
4.1 Symbols
4.2 Abbreviated terms
5 Conventions
5.1 Conformance language
5.2 Operators
5.2.1 Arithmetic operators
5.2.2 Assignment operators
5.2.3 Precedence order of operators
5.2.4 Mathematical functions
6 General
6.1 General definitions
6.2 Functional overview on the decoding process
6.3 Encoder requirements
6.4 Decoder requirements
Annex A (normative) Component subsampling and expansion of subsampling
Annex B (normative) Codestream syntax
Annex C (normative) Multi-component decorrelation
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see http://patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 18477-1:2015), which has been
technically revised.
The main changes compared to the previous edition are as follows:
— Annex A.3 has been revised to adopt centred upsampling by default;
— minor editorial changes throughout.
A list of all parts in the ISO/IEC 18477 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.

Introduction
This document specifies a codestream format for the storage of continuous-tone photographic
content. JPEG XT is a scalable image coding system that builds on the legacy Rec. ITU-T T.81 |
ISO/IEC 10918-1 coding system, also known as JPEG, and extends it in a backwards-compatible way.
This document specifies the commonly deployed components of the JPEG coding system. Additional
parts of the ISO/IEC 18477 series build on this baseline.
JPEG XT has been designed to be backwards compatible with legacy applications while keeping coding
complexity low; whenever possible, JPEG XT uses functional blocks of Rec.
ITU-T T.81 | ISO/IEC 10918-1, Rec. ITU-T T.86 | ISO/IEC 10918-4 and Rec. ITU-T T.871 | ISO/IEC 10918-5
to extend the functionality of the legacy JPEG coding system. It is optimized for good image quality and
compression efficiency while also enabling low-complexity encoding and decoding implementations.
INTERNATIONAL STANDARD ISO/IEC 18477-1:2020(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 1:
Core coding system specification
1 Scope
This document specifies a coding format, referred to as JPEG XT, which is designed primarily for
continuous-tone photographic content. This document defines the core coding system, which forms the
basis for the entire ISO/IEC 18477 series.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
Rec. ITU-T T.81 | ISO/IEC 10918-1:1994, Information technology — Digital compression and coding of
continuous-tone still images — Part 1: Requirements and guidelines
Rec. ITU-T T.86 | ISO/IEC 10918-4, Information technology — Digital compression and coding of
continuous-tone still images — Part 4: Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour
spaces, APPn markers, SPIFF compression types and Registration Authorities (REGAUT)
Rec. ITU-T T.871 | ISO/IEC 10918-5, Information technology — Digital compression and coding of
continuous-tone still images — Part 5: JPEG File Interchange Format (JFIF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
bitstream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.2
block
8×8 array of samples or an 8×8 array of DCT coefficient values of one component
3.3
byte
group of 8 bits
3.4
coder
embodiment of a coding process

3.5
coding
encoding or decoding
3.6
compression
reduction in the number of bits used to represent source image data
3.7
component
two-dimensional array of samples having the same designation in the output or display device
Note 1 to entry: An image typically consists of several components, e.g. red, green and blue.
3.8
continuous-tone image
image whose components have more than one bit per sample
3.9
discrete cosine transform
DCT
either the forward discrete cosine transform or the inverse discrete cosine transform
3.10
downsampling
procedure by which the spatial resolution of a component is reduced
3.11
entropy-coded data segment
independently decodable sequence of entropy encoded bytes of compressed image data
3.12
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.13
marker segment
marker and associated set of parameters
3.14
precision
number of bits allocated to a particular sample or DCT coefficient
3.15
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.16
sample
one element in the two-dimensional array which comprises a component
3.17
sample grid
common coordinate system for all samples of an image, in which the sample at the top-left corner of the image
has the coordinates (0, 0), the first coordinate increases towards the right and the second towards the bottom
3.18
scan
single pass through the data for one or more of the components in an image

3.19
scan header
marker segment that contains a start-of-scan marker and associated scan parameters that are coded at
the beginning of a scan
3.20
upsampling
procedure by which the spatial resolution of a component is increased
3.21
vertical sampling factor
relative number of vertical data units of a particular component with respect to the number of vertical
data units in the other components in the frame
4 Symbols and abbreviated terms
4.1 Symbols
X width of the sample grid in positions
Y height of the sample grid in positions
Nf number of components in an image
$s_{i,x}$ subsampling factor of component i in horizontal direction
$s_{i,y}$ subsampling factor of component i in vertical direction
$H_i$ horizontal subsampling indicator of component i in the frame header
$V_i$ vertical subsampling indicator of component i in the frame header
$v_{x,y}$ sample value at the sample grid position x, y
4.2 Abbreviated terms
ASCII American Standard Code for Information Interchange
DC lowpass
AC highpass
LSB least significant bit
MSB most significant bit
DCT discrete cosine transform
JPEG Joint Photographic Experts Group
5 Conventions
5.1 Conformance language
The keyword "reserved" indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword "forbidden" indicates "reserved" and in addition indicates
that the provision will never be specified in the future.

5.2 Operators
NOTE Many of the operators used in this document are similar to those used in the C programming language.
5.2.1 Arithmetic operators
+ addition
− subtraction (as a binary operator) or negation (as a unary prefix operator)
× multiplication
/ division without truncation or rounding
5.2.2 Assignment operators
= assignment operator
5.2.3 Precedence order of operators
Operators are listed in descending order of precedence. If several operators appear in the same line,
they have equal precedence. When several operators of equal precedence appear at the same level in an
expression, evaluation proceeds according to the associativity of the operator either from right to left
or from left to right.
Operators          Type of operation           Associativity
(), [ ], .         expression                  left to right
−                  unary negation
×, /               multiplication              left to right
+, −               addition and subtraction    left to right
<, >, <=, >=       relational                  left to right
5.2.4 Mathematical functions
⌈x⌉ Ceiling of x. Returns the smallest integer that is greater than or equal to x.
⌊x⌋ Floor of x. Returns the largest integer that is less than or equal to x.
|x| Absolute value; −x for x < 0, otherwise x.
sign(x) Sign of x: zero if x is zero, +1 if x is positive, −1 if x is negative.
clamp(x, min, max) Clamps x to the range [min, max]: returns min if x < min, max if x > max, and x
otherwise.
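The functions above translate directly into Python, which is used for the illustrative sketches added to this text. One point worth noting: although 5.2 describes its operators as similar to C, Python's integer `//` is a true floor division ⌊a/b⌋, whereas integer `/` in C truncates toward zero, so the two differ for negative operands. The helper names below are hypothetical.

```python
import math


def sign(x):
    """sign(x) as in 5.2.4: 0 for zero, +1 for positive, -1 for negative."""
    return (x > 0) - (x < 0)


def clamp(x, lo, hi):
    """clamp(x, min, max) as in 5.2.4."""
    return lo if x < lo else hi if x > hi else x


def floor_div(a, b):
    """Exact floor of a / b for integers a and b > 0 (Python's // floors)."""
    return a // b


def ceil_div(a, b):
    """Exact ceiling of a / b for integers a and b > 0."""
    return -((-a) // b)


# For real-valued x, math.floor and math.ceil give the floor and ceiling.
print(math.floor(-1.5), math.ceil(-1.5), floor_div(-7, 4), ceil_div(7, 2))  # prints -2 -1 -2 4
```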
6 General
6.1 General definitions
The purpose of this clause is to give an informative overview of the elements specified in this document.
Another purpose is to introduce many of the terms which are defined in Clause 3. These terms are
printed in italics upon first usage in this clause.

There are three elements specified in this document:
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source image
data and encoder specifications, and by means of a specified set of procedures generates as output a
codestream.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
might be required that define the interpretation of the sample data, such as the spatial dimensions
of the samples.
6.2 Functional overview on the decoding process
The high-level algorithm for decoding is as follows: the samples are first reconstructed following the
decoder specifications defined in Rec. ITU-T T.81 | ISO/IEC 10918-1. If the resulting component arrays
are subsampled, they are upsampled onto a common sample grid following the specifications in Annex A.
The output data is then processed by an inverse decorrelation transformation. If the data is
already in an RGB-type colour space, e.g. RGB with ITU-R Rec. BT.601 primaries, this transformation
is the identity transformation. Otherwise, the ICT is used to transform the data into RGB.
The inverse decorrelation transformation is defined in Annex C, and the markers that are required to
select the transformation are defined in Annex B.
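Annex C is not part of this excerpt, so the following is only a sketch under an explicit assumption: that the ICT corresponds to the JFIF YCbCr-to-RGB conversion of Rec. ITU-T T.871 for 8-bit samples. It uses the floating-point T.871 coefficients followed by round-half-up and clamping, which may not match the exact arithmetic prescribed in Annex C. The function name `ycbcr_to_rgb` is hypothetical.

```python
import math


def clamp(x, lo, hi):
    return lo if x < lo else hi if x > hi else x


def ycbcr_to_rgb(y, cb, cr):
    """Inverse colour transform for one 8-bit pixel (assumed JFIF/T.871 form)."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    # Round half up to an integer, then clamp to the 8-bit sample range.
    return tuple(clamp(math.floor(c + 0.5), 0, 255) for c in (r, g, b))


print(ycbcr_to_rgb(128, 128, 128))  # neutral grey stays grey: prints (128, 128, 128)
```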
6.3 Encoder requirements
An encoding process converts source image data to compressed image data. This includes first obtaining
a low dynamic range image, and representing it by a coding process specified in Rec. ITU-T T.81 |
ISO/IEC 10918-1:1994, Annex F or Annex G.
In order to comply with this document, an encoder shall satisfy at least one of the following two
requirements. An encoder shall, with appropriate accuracy, convert source image data to compressed
image data which comply with the codestream format syntax specified in Annex B for the encoding
process(es) embodied by the encoder. A limited accuracy sufficient to match the error bounds specified
in the compliance tests is acceptable.
There is no requirement in this document that any encoder which embodies one of the encoding
processes specified here shall be able to operate for all ranges of the parameters which are allowed for
that process. An encoder is only required to meet the compliance tests and to generate the compressed
data format according to Annex B for those parameter values which it does use.
6.4 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. To do so, it has
to follow the decoding operation specified in Rec. ITU-T T.81 | ISO/IEC 10918-1 with sufficient
accuracy, using the baseline, sequential or progressive scan process defined in Rec.
ITU-T T.81 | ISO/IEC 10918-1:1994, Annex F or Annex G. This process generates sample values on a
sample grid, which are then converted into a digital image by following the upsampling specifications
in Annex A and the multi-component decorrelation (ICT) process in Annex C.

Annex A
(normative)

Component subsampling and expansion of subsampling
NOTE In this annex, the flowcharts and tables are normative only in the sense that they are defining an
output that alternative implementations shall duplicate.
A.1 Component dimensions and subsampling factors
An image is defined to consist of Nf components, each of which is identified by a unique identifier $C_i$
defined in the frame header of the codestream format specified in Annex B. The number of components
Nf shall be either 1 or 3. A component consists of a rectangular array of samples, $x_i$ samples wide and
$y_i$ samples high. The component dimensions are derived from the image dimensions X and Y, which are
also parameters recorded in the frame header. These two parameters define a sample grid X grid points wide
and Y grid points high, where the top-left grid coordinate is (0, 0) and coordinates increase from left
to right and from top to bottom. However, the dimensions of a component do not need to coincide
with the dimensions of the image. For each component, two subsampling factors $s_{i,x}$ and $s_{i,y}$ define the
spacing between sample points of component i relative to the sample grid, and hence the size of the
component array. If X and Y are the dimensions of the sample grid, the size of component i with
subsampling factors $s_{i,x}$ and $s_{i,y}$ is

$$x_i = \lceil X / s_{i,x} \rceil \quad \text{and} \quad y_i = \lceil Y / s_{i,y} \rceil$$
Upsampling by interpolation from surrounding samples, as specified in A.2 and A.3, then generates sample
values at all grid points of the sample grid.
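As a small illustration of the size relation above, the component dimensions can be computed with exact integer arithmetic, which avoids floating-point division. The function name and its tuple return value are hypothetical choices for this sketch.

```python
def component_size(X, Y, s_x, s_y):
    """Return (x_i, y_i) = (ceil(X / s_x), ceil(Y / s_y)) using integer arithmetic."""
    # -(-a // b) is the ceiling of a / b for positive integers a and b.
    return -(-X // s_x), -(-Y // s_y)


print(component_size(641, 481, 2, 2))  # prints (321, 241)
```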
The subsampling factors $s_{i,x}$ and $s_{i,y}$ are not directly represented in the binary codestream or any of
its markers, but shall be derived from the parameters $H_i$ and $V_i$ recorded in the frame header. If Nf
equals 1, i.e. the image consists of a single component, $H_1$ and $V_1$ shall be 1, and $s_{1,x}$ and $s_{1,y}$ are
both 1. If Nf equals 3, Table A.1 defines the relation between $H_i$, $V_i$ and $s_{i,x}$, $s_{i,y}$. No combinations
of $H_i$ and $V_i$ other than those listed in Table A.1 shall be used.
Table A.1 — Sampling values
H1  V1  H2  V2  H3  V3 | s1,x  s1,y  s2,x  s2,y  s3,x  s3,y
 1   1   1   1   1   1 |   1     1     1     1     1     1
 2   2   2   1   2   1 |   1     1     1     2     1     2
 2   2   1   2   1   2 |   1     1     2     1     2     1
 2   2   1   1   1   1 |   1     1     2     2     2     2
All other values reserved for ITU/ISO purposes.
NOTE Rec. ITU-T T.81 | ISO/IEC 10918-1 allowed other component arrangements and relations between
grid positions and sample positions that are not valid in this document. However, the definitions given here are
special cases of the more general relations provided in Rec. ITU-T T.81 | ISO/IEC 10918-1 and both definitions
agree whenever both are defined.
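Because only the combinations listed in Table A.1 are permitted, a decoder can derive the subsampling factors by direct table lookup and reject everything else. The sketch below assumes the ($H_i$, $V_i$) pairs have already been read from the frame header in component order; the names `TABLE_A1` and `subsampling_factors` are hypothetical.

```python
# Table A.1, keyed by (H1, V1, H2, V2, H3, V3); values are the (s_x, s_y) pairs per component.
TABLE_A1 = {
    (1, 1, 1, 1, 1, 1): ((1, 1), (1, 1), (1, 1)),
    (2, 2, 2, 1, 2, 1): ((1, 1), (1, 2), (1, 2)),
    (2, 2, 1, 2, 1, 2): ((1, 1), (2, 1), (2, 1)),
    (2, 2, 1, 1, 1, 1): ((1, 1), (2, 2), (2, 2)),
}


def subsampling_factors(hv):
    """Map the frame-header (H_i, V_i) pairs to (s_{i,x}, s_{i,y}) per A.1."""
    if len(hv) == 1:
        if tuple(hv[0]) != (1, 1):
            raise ValueError("single-component images require H1 = V1 = 1")
        return [(1, 1)]
    if len(hv) == 3:
        key = tuple(v for pair in hv for v in pair)
        if key not in TABLE_A1:
            raise ValueError(f"H/V combination {key} is not listed in Table A.1")
        return list(TABLE_A1[key])
    raise ValueError("Nf shall be either 1 or 3")


print(subsampling_factors([(2, 2), (1, 1), (1, 1)]))  # prints [(1, 1), (2, 2), (2, 2)]
```

Note that a header signalling H1 = 2, V1 = 1 and H2 = V2 = H3 = V3 = 1 is not listed in Table A.1 and is therefore rejected; horizontal-only subsampling of the second and third components is expressed by the third row of the table.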
A.2 Expansion of subsampled components
Whenever the subsampling factors $s_{i,x}$ and $s_{i,y}$ are not both 1, interpolation is used to populate all grid
positions of the image sample grid. The following bilinear interpolation algorithm can be used to provide
sample values at all sampling grid positions. Readers should be aware that the algorithm described
here also changes the sample values at sampling grid positions whose values are represented in the
codestream. This may cause a cumulative loss of precision of the subsampled components
over multiple compression-decompression cycles.
A.3 Bilinear expansion of subsampled components
Upsampling is performed in two steps. First, if $s_{i,y}$ is 2, the component is upsampled in the vertical
direction, generating an intermediate image. Second, if $s_{i,x}$ is 2, it is upsampled in the horizontal
direction, generating the final output image from the intermediate image.
In a first step, check for each component i whether $s_{i,y}$ is 2 or 1. If $s_{i,y}$ is 1, copy the reconstructed
samples to the intermediate image $v^{(up,y)}$ directly. Otherwise, compute the intermediate image $v^{(up,y)}$
from the reconstructed samples $v$ by first setting $v_{x,-1}$ to $v_{x,0}$ and $v_{x,\lceil Y/2 \rceil}$ to $v_{x,\lceil Y/2 \rceil - 1}$, and
then setting, for all $x$ such that $0 \le x < X$ and all $y$ such that $0 \le y < \lceil Y/2 \rceil$:

$$v^{(up,y)}_{x,2y} = \left\lfloor \frac{v_{x,y-1} + 3 \times v_{x,y} + 1 + (x \bmod 2)}{4} \right\rfloor$$

$$v^{(up,y)}_{x,2y+1} = \left\lfloor \frac{v_{x,y+1} + 3 \times v_{x,y} + 2 - (x \bmod 2)}{4} \right\rfloor$$

If the image height Y is odd, the outputs $v^{(up,y)}_{x,2\lceil Y/2 \rceil - 1}$, i.e. the row with index Y, which lies
outside the sample grid, are discarded.
In a second step, check for each component i whether $s_{i,x}$ is 2 or 1. If $s_{i,x}$ is 1, copy the intermediate
image to the output image directly. Otherwise, compute the output image $v^{(up,x,y)}$ from the intermediate
image $v^{(up,y)}$ by first setting $v^{(up,y)}_{-1,y}$ to $v^{(up,y)}_{0,y}$ and $v^{(up,y)}_{\lceil X/2 \rceil,y}$ to $v^{(up,y)}_{\lceil X/2 \rceil - 1,y}$, and then
setting, for all $y$ such that $0 \le y < Y$ and all $x$ such that $0 \le x < \lceil X/2 \rceil$:

$$v^{(up,x,y)}_{2x,y} = \left\lfloor \frac{v^{(up,y)}_{x-1,y} + 3 \times v^{(up,y)}_{x,y} + 2}{4} \right\rfloor$$

$$v^{(up,x,y)}_{2x+1,y} = \left\lfloor \frac{v^{(up,y)}_{x+1,y} + 3 \times v^{(up,y)}_{x,y} + 1}{4} \right\rfloor$$

If the image width X is odd, the outputs $v^{(up,x,y)}_{2\lceil X/2 \rceil - 1,y}$, i.e. the column with index X, which
lies outside the sample grid, are discarded.
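The two-step procedure of A.3 can be sketched as follows. The sketch assumes a component is stored as a list of rows of integers and, in the vertical step, iterates over the component's actual width (which is ⌈X/s_{i,x}⌉ when the component is also horizontally subsampled); the function names are hypothetical. Python's `//` implements the floor ⌊·⌋, edge replication implements the boundary settings, and the extra row or column produced for odd Y or X is dropped by slicing.

```python
def upsample_vertical(comp, Y):
    """Vertical step of A.3 (s_{i,y} = 2): ceil(Y/2) input rows -> Y output rows."""
    h = len(comp)
    out = []
    for y in range(h):
        cur = comp[y]
        above = comp[max(y - 1, 0)]      # v_{x,-1} := v_{x,0}
        below = comp[min(y + 1, h - 1)]  # v_{x,ceil(Y/2)} := v_{x,ceil(Y/2)-1}
        out.append([(above[x] + 3 * cur[x] + 1 + x % 2) // 4 for x in range(len(cur))])
        out.append([(below[x] + 3 * cur[x] + 2 - x % 2) // 4 for x in range(len(cur))])
    return out[:Y]  # drops the extra row produced when Y is odd


def upsample_horizontal(rows, X):
    """Horizontal step of A.3 (s_{i,x} = 2): ceil(X/2) input columns -> X output columns."""
    out = []
    for row in rows:
        w = len(row)
        line = []
        for x in range(w):
            left = row[max(x - 1, 0)]       # v_{-1,y} := v_{0,y}
            right = row[min(x + 1, w - 1)]  # v_{ceil(X/2),y} := v_{ceil(X/2)-1,y}
            line.append((left + 3 * row[x] + 2) // 4)
            line.append((right + 3 * row[x] + 1) // 4)
        out.append(line[:X])  # drops the extra column produced when X is odd
    return out


def upsample(comp, s_x, s_y, X, Y):
    """Apply A.3 to one component: vertical step first, then horizontal step."""
    inter = upsample_vertical(comp, Y) if s_y == 2 else [list(r) for r in comp]
    return upsample_horizontal(inter, X) if s_x == 2 else inter


print(upsample([[10, 20], [30, 40]], 2, 2, 4, 4))
# prints [[10, 12, 18, 20], [15, 17, 23, 25], [25, 27, 33, 35], [30, 32, 38, 40]]
```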
A.4 Downsampling of components
This document does not define a normative procedure by which the resolution of components whose
$s_{i,x}$ and $s_{i,y}$ factors are not both 1 shall be reduced. Any procedure that generates components of
size $\lceil X/s_{i,x} \rceil$ and $\lceil Y/s_{i,y} \rceil$ is acceptable, as long as it is compatible with the upsampling
procedure defined above. A very simple downsampling filter is given in the next subclause.
A.5 Downsampling by a box filter
The box filter is the simplest possible downsampling filter and provides only poor quality. Even though
better alternatives exist, the box filter is nevertheless presented here as an example. The input of the box
filter is an X × Y component array of samples, where the sample value at position x, y is denoted by $v_{x,y}$.
For an output position $x_s$, $y_s$, the box boundaries are

$$x_{min} := s_{i,x} \times x_s, \quad x_{max} := \min(s_{i,x} \times x_s + s_{i,x} - 1,\ X - 1), \quad y_{min} := s_{i,y} \times y_s, \quad y_{max} := \min(s_{i,y} \times y_s + s_{i,y} - 1,\ Y - 1)$$

The output of the box filter at position $x_s$, $y_s$ is then defined as:

$$v_{x_s,y_s} := \frac{\sum_{x=x_{min}}^{x_{max}} \sum_{y=y_{min}}^{y_{max}} v_{x,y}}{(x_{max} - x_{min} + 1) \times (y_{max} - y_{min} + 1)}$$

i.e. the average over the box from $x_{min}$, $y_{min}$ to $x_{max}$, $y_{max}$. The array of downsampled sample values
$v_{x_s,y_s}$ is then subject to further processing, e.g. the DCT and entropy coding.
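A direct transcription of the box filter, in which the denominator counts the samples actually inside each (possibly truncated) box, might look as follows. The sketch assumes the input is a list of rows and returns unrounded averages, since "/" in 5.2.1 denotes division without truncation; how an encoder rounds these values before the DCT is left open here. The function name is hypothetical.

```python
def box_downsample(v, s_x, s_y):
    """Box-filter downsampling of an X x Y component (list of rows) per A.5."""
    Y, X = len(v), len(v[0])
    out_w, out_h = -(-X // s_x), -(-Y // s_y)  # ceil(X/s_x), ceil(Y/s_y)
    out = []
    for ys in range(out_h):
        y_min, y_max = s_y * ys, min(s_y * ys + s_y - 1, Y - 1)
        row = []
        for xs in range(out_w):
            x_min, x_max = s_x * xs, min(s_x * xs + s_x - 1, X - 1)
            total = sum(v[y][x] for y in range(y_min, y_max + 1)
                        for x in range(x_min, x_max + 1))
            count = (x_max - x_min + 1) * (y_max - y_min + 1)  # samples in the box
            row.append(total / count)
        out.append(row)
    return out


print(box_downsample([[1, 3, 5], [5, 7, 9], [2, 2, 2]], 2, 2))  # prints [[4.0, 7.0], [2.0, 2.0]]
```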

Annex B
(normative)

Codestream syntax
NOTE 1 This annex defines the compressed codestream syntax which, structurally, consists of an ordered
collection of marker segments and entropy coded data segments. Marker segments specify parameters necessary
to reconstruct the sample values from the entropy coded data segments. Because all of these constituent parts
are represented with byte-aligned codes, each compressed data format consists of an ordered sequence of 8-bit
bytes. For each byte, a most significant bit (MSB) and a least significant bit (LSB) are defined.
NOTE 2 The codestream syntax defined here agrees mostly with the "interchange format" defined in Rec.
ITU-T T.81 | ISO/IEC 10918-1, with some additional constraints on the parameters in the marker segments and
some additional markers carrying information that is irrelevant for the older standard.
B.1 Parameters
Parameters are integers, with values specific to the encoding process, source image characteristics,
and other features selectable by the application. Parameters are assigned either 4-bit, 1-byte, 2-byte or
4-byte codes. Except for certain optional groups of parameters, parameters encode critical information
without which the decoding process cannot properly reconstruct the image. The code assignment for
a parameter shall be an unsigned integer of the specified length in bits with the particular value of the
parameter.
For parameters which are 2 bytes (16 bits) in length, the most significant byte shall come first in the
compressed data’s ordered sequence of bytes. The same holds for parameters that are 4 bytes (32 bits)
in length, whose bytes are ordered in the codestream in decreasing significance. Parameters which are
4 bits in length always come in pairs, and the pair shall always be encoded in a single byte. The first
4-bit parameter of the pair shall occupy the most significant 4 bits of the byte. Within any 32-, 16-, 8-, or
4-bit parameter, the MSB shall come first and the LSB shall come last. This encoding is commonly known as
the "big endian" representation of unsigned integers.
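The byte-order rules above amount to big-endian decoding. A minimal sketch using Python's standard `struct` module (format codes `>H` and `>I` for big-endian unsigned 16- and 32-bit integers) follows; the helper names are hypothetical, and `struct.unpack_from` raises `struct.error` if the buffer is too short.

```python
import struct


def read_u16(buf, pos):
    """2-byte parameter: most significant byte first."""
    return struct.unpack_from(">H", buf, pos)[0]


def read_u32(buf, pos):
    """4-byte parameter: bytes in decreasing order of significance."""
    return struct.unpack_from(">I", buf, pos)[0]


def read_u4_pair(buf, pos):
    """Two 4-bit parameters packed in one byte; the first occupies the high nibble."""
    b = buf[pos]
    return b >> 4, b & 0x0F


print(read_u16(b"\x01\x02", 0), read_u32(b"\x00\x01\x00\x00", 0), read_u4_pair(b"\x21", 0))
# prints 258 65536 (2, 1)
```

In the frame header of Rec. ITU-T T.81 | ISO/IEC 10918-1, for example, the $H_i$ and $V_i$ parameters used in Annex A form such a 4-bit pair, with $H_i$ in the high nibble.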
B.2 Markers
Markers serve to identify the various structural parts of the compressed data formats. Most markers
start marker segments containing a related group of parameters; some markers stand alone. All
markers are assigned two-byte codes: an 0xff byte followed by a byte which is not equal to 0x00 or 0xff.
Any marker may optionally be preceded by any number of fill bytes, which are bytes of the value 0xff.
NOTE Because of this special code-assignment structure, markers make it possible for a decoder to parse
the compressed data and locate its various parts without having to decode other segments of image data.
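The code structure described above lets a parser identify a marker by its 0xff prefix and skip any fill bytes in front of it. The sketch assumes `pos` already points at the first 0xff of a marker outside entropy-coded data; the function name and the error handling are hypothetical.

```python
def read_marker(buf, pos):
    """Return (marker_code, next_pos) for the marker starting at buf[pos]."""
    if buf[pos] != 0xFF:
        raise ValueError("expected 0xff at the start of a marker")
    pos += 1
    # Any number of 0xff fill bytes may precede the marker proper.
    while pos < len(buf) and buf[pos] == 0xFF:
        pos += 1
    if pos >= len(buf):
        raise ValueError("truncated marker")
    if buf[pos] == 0x00:
        raise ValueError("0xff followed by 0x00 is not a marker")
    return 0xFF00 | buf[pos], pos + 1


code, nxt = read_marker(b"\xff\xff\xff\xd8", 0)
print(hex(code), nxt)  # prints 0xffd8 4
```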
B.3 Marker assignments
All markers shall be assigned two-byte codes: a 0xff byte followed by a second byte which is equal to
neither 0x00 nor 0xff. The second byte is specified in Table B.1 for each defined marker. An asterisk (*)
indicates a marker which stands alone, that is, which is not the start of a marker segment. Most of the
marker segments used by this document are defined in Rec. ITU-T T.81 | ISO/IEC 10918-1, though some
marker segments defined there shall not be used in this document. For completeness, these markers are
also included in Tabl
...

FINAL DRAFT INTERNATIONAL STANDARD ISO/IEC FDIS 18477-1
ISO/IEC JTC 1/SC 29
Secretariat: JISC
Voting begins on: 2020-03-13
Voting terminates on: 2020-05-08
Information technology — Scalable compression and coding of continuous-tone still images —
Part 1:
Core coding system specification
RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION OF ANY RELEVANT
PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL, COMMERCIAL AND
USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE TO BE CONSIDERED IN THE LIGHT OF
THEIR POTENTIAL TO BECOME STANDARDS TO WHICH REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
Reference number
ISO/IEC FDIS 18477-1:2020(E)
© ISO/IEC 2020

---------------------- Page: 1 ----------------------
ISO/IEC FDIS 18477-1:2020(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2020 – All rights reserved

---------------------- Page: 2 ----------------------
ISO/IEC FDIS 18477-1:2020(E)

Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Symbols and abbreviated terms . 3
4.1 Symbols . 3
4.2 Abbreviated terms . 3
5 Conventions . 3
5.1 Conformance language . 3
5.2 Operators . 4
5.2.1 Arithmetic operators . 4
5.2.2 Assignment operators . 4
5.2.3 Precedence order of operators . 4
5.2.4 Mathematical functions . 4
6 General . 4
6.1 General definitions . 4
6.2 Functional overview on the decoding process . 5
6.3 Encoder requirements . 5
6.4 Decoder requirements. 5
Annex A (normative) Component subsampling and expansion of subsampling .6
Annex B (normative) Codestream syntax . 8
Annex C (normative) Multi-component decorrelation .15
Bibliography .17
© ISO/IEC 2020 – All rights reserved iii

---------------------- Page: 3 ----------------------
ISO/IEC FDIS 18477-1:2020(E)

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that
are members of ISO or IEC participate in the development of International Standards through
technical committees established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www .iso .org/ patents) or the IEC
list of patent declarations received (see http:// patents .iec .ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www .iso .org/
iso/ foreword .html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This second edition cancels and replaces the first edition (ISO/IEC 18477-1:2015), which has been
technically revised.
The main changes compared to the previous edition are as follows:
— Annex A.3 has been revised to adopt centred upsampling by default;
— minor editorial changes throughout.
A list of all parts in the ISO/IEC 18477 series can be found on the ISO website.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/ members .html.
iv © ISO/IEC 2020 – All rights reserved

ISO/IEC FDIS 18477-1:2020(E)

Introduction
This document specifies a codestream format for storage of continuous-tone photographic
content. JPEG XT is a scalable image coding system that builds on the legacy Rec. ITU-T T.81 |
ISO/IEC 10918-1 coding system, also known as JPEG, but extends it in a backwards compatible way.
This document specifies the commonly deployed components of the JPEG coding system. Additional
parts of the ISO/IEC 18477 series will build on this baseline.
JPEG XT has been designed to be backwards compatible with legacy applications while keeping
coding complexity low; whenever possible, JPEG XT uses functional blocks of Rec. ITU-T T.81 |
ISO/IEC 10918-1, Rec. ITU-T T.86 | ISO/IEC 10918-4 and Rec. ITU-T T.871 | ISO/IEC 10918-5
to extend the functionality of the legacy JPEG coding system. It is optimized for good image quality and
compression efficiency while also enabling low-complexity encoding and decoding implementations.
FINAL DRAFT INTERNATIONAL STANDARD ISO/IEC FDIS 18477-1:2020(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 1:
Core coding system specification
1 Scope
This document specifies a coding format, referred to as JPEG XT, which is designed primarily for
continuous-tone photographic content. This document defines the core coding system, which forms the
basis for the entire ISO/IEC 18477 series.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
Rec. ITU-T T.81 | ISO/IEC 10918-1:1994, Information technology — Digital compression and coding of
continuous-tone still images — Part 1: Requirements and guidelines
Rec. ITU-T T.86 | ISO/IEC 10918-4, Information technology — Digital compression and coding of
continuous-tone still images — Part 4: Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour
spaces, APPn markers, SPIFF compression types and Registration Authorities (REGAUT)
Rec. ITU-T T.871 | ISO/IEC 10918-5, Information technology — Digital compression and coding of
continuous-tone still images — Part 5: JPEG File Interchange Format (JFIF)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
bitstream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.2
block
8×8 array of samples or an 8×8 array of DCT coefficient values of one component
3.3
byte
group of 8 bits
3.4
coder
embodiment of a coding process
3.5
coding
encoding or decoding
3.6
compression
reduction in the number of bits used to represent source image data
3.7
component
two-dimensional array of samples having the same designation in the output or display device
Note 1 to entry: An image typically consists of several components, e.g. red, green and blue.
3.8
continuous-tone image
image whose components have more than one bit per sample
3.9
discrete cosine transform
DCT
either the forward discrete cosine transform or the inverse discrete cosine transform
3.10
downsampling
procedure by which the spatial resolution of a component is reduced
3.11
entropy-coded data segment
independently decodable sequence of entropy encoded bytes of compressed image data
3.12
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.13
marker segment
marker and associated set of parameters
3.14
precision
number of bits allocated to a particular sample or DCT coefficient
3.15
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.16
sample
one element in the two-dimensional array which comprises a component
3.17
sample grid
common coordinate system for all samples of an image, in which the sample at the top-left edge of the image
has the coordinates (0, 0), the first coordinate increases towards the right and the second coordinate
increases towards the bottom
3.18
scan
single pass through the data for one or more of the components in an image
3.19
scan header
marker segment that contains a start-of-scan marker and associated scan parameters that are coded at
the beginning of a scan
3.20
upsampling
procedure by which the spatial resolution of a component is increased
3.21
vertical sampling factor
relative number of vertical data units of a particular component with respect to the number of vertical
data units in the other components in the frame
4 Symbols and abbreviated terms
4.1 Symbols
X           width of the sample grid in positions
Y           height of the sample grid in positions
Nf          number of components in an image
$s_{i,x}$   subsampling factor of component i in horizontal direction
$s_{i,y}$   subsampling factor of component i in vertical direction
$H_i$       horizontal subsampling indicator of component i in the frame header
$V_i$       vertical subsampling indicator of component i in the frame header
$v_{x,y}$   sample value at the sample grid position x, y
4.2 Abbreviated terms
ASCII American Standard Code for Information Interchange
DC lowpass
AC highpass
LSB least significant bit
MSB most significant bit
DCT discrete cosine transformation
JPEG Joint Photographic Experts Group
5 Conventions
5.1 Conformance language
The keyword "reserved" indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword "forbidden" indicates "reserved" and in addition indicates
that the provision will never be specified in the future.
5.2 Operators
NOTE Many of the operators used in this document are similar to those used in the C programming language.
5.2.1 Arithmetic operators
+ addition
− subtraction (as a binary operator) or negation (as a unary prefix operator)
× multiplication
/ division without truncation or rounding
5.2.2 Assignment operators
= assignment operator
5.2.3 Precedence order of operators
Operators are listed in descending order of precedence. If several operators appear in the same line,
they have equal precedence. When several operators of equal precedence appear at the same level in an
expression, evaluation proceeds according to the associativity of the operator either from right to left
or from left to right.
Operators        Type of operation              Associativity
(), [ ], .       expression                     left to right
−                unary negation                 right to left
×, /             multiplication and division    left to right
+, −             addition and subtraction       left to right
<, >, <=, >=     relational                     left to right
5.2.4 Mathematical functions
$\lceil x \rceil$    Ceiling of x. Returns the smallest integer that is greater than or equal to x.
$\lfloor x \rfloor$  Floor of x. Returns the largest integer that is less than or equal to x.
$|x|$                Absolute value: −x for x < 0, otherwise x.
sign(x)              Sign of x: zero if x is zero, +1 if x is positive, −1 if x is negative.
clamp(x, min, max)   Clamps x to the range [min, max]: returns min if x < min, max if x > max,
                     and x otherwise.
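Because most computations in this document operate on integers, these functions are usually implemented with integer arithmetic. The following minimal Python sketch shows one possible mapping; the function names are illustrative and are not defined by this document. Note that $\lfloor x \rfloor$ rounds towards minus infinity, which differs from C-style truncation for negative arguments; Python's `//` operator already performs floor division on integers.

```python
def ceil_div(a: int, b: int) -> int:
    """Exact ceiling of a/b for integers a and b > 0, without floating point."""
    return -(-a // b)


def floor_div(a: int, b: int) -> int:
    """Exact floor of a/b for integers a and b > 0 (Python's // floors)."""
    return a // b


def sign(x) -> int:
    """+1 if x is positive, -1 if x is negative, 0 if x is zero."""
    return (x > 0) - (x < 0)


def clamp(x, lo, hi):
    """lo if x < lo, hi if x > hi, otherwise x."""
    return lo if x < lo else hi if x > hi else x


# Floor differs from truncation towards zero for negative values:
print(floor_div(-7, 4))  # -2, whereas int(-7 / 4) gives -1
```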
6 General
6.1 General definitions
The purpose of this clause is to give an informative overview of the elements specified in this document.
Another purpose is to introduce many of the terms which are defined in Clause 3. These terms are
printed in italics upon first usage in this clause.
There are three elements specified in this document:
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source image
data and encoder specifications, and by means of a specified set of procedures generates as output a
codestream.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
that define the interpretation of the sample data, such as the spatial dimensions of the samples,
might be required.
6.2 Functional overview of the decoding process
The high-level algorithm for decoding is as follows: the samples are first reconstructed following the
decoder specifications defined in Rec. ITU-T T.81 | ISO/IEC 10918-1. If the resulting component arrays
are subsampled, they are upsampled onto a common sample grid following the specifications in Annex A.
Following that, the output data is processed by an inverse decorrelation transformation. If the data is
already in an RGB type colour space, e.g. RGB with ITU-R Rec. BT.601 primaries, this transformation
is the identity transformation. Otherwise, the ICT is used to transform the data into RGB.
The inverse decorrelation transformation is defined in Annex C, and the markers that are required to
select the transformation are defined in Annex B.
6.3 Encoder requirements
An encoding process converts source image data to compressed image data. This includes first
generating a low dynamic range image, and representing it by a coding process specified in Rec.
ITU-T T.81 | ISO/IEC 10918-1:1994, Annex F or Annex G, and then generating a residual image which is
encoded by one of the processes defined in this document.
In order to comply with this document, an encoder shall satisfy the following requirement: it shall,
with appropriate accuracy, convert source image data to compressed image data which comply with the
codestream format syntax specified in Annex B for the encoding process(es) embodied by the encoder.
A limited accuracy sufficient to match the error bounds specified in the compliance tests is acceptable.
There is no requirement in this document that any encoder which embodies one of the encoding
processes specified here shall be able to operate for all ranges of the parameters which are allowed for
that process. An encoder is only required to meet the compliance tests and to generate the compressed
data format according to Annex B for those parameter values which it does use.
6.4 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. To do so, it has
to follow the decoding operation specified in Rec. ITU-T T.81 | ISO/IEC 10918-1 with sufficient
accuracy, using the baseline, sequential or progressive process defined in Rec.
ITU-T T.81 | ISO/IEC 10918-1:1994, Annex F or Annex G. This process generates sample values on a
sample grid, which are then converted into a digital image by following the upsampling specifications
in Annex A and the multi-component decorrelation (ICT) process in Annex C.
Annex A
(normative)

Component subsampling and expansion of subsampling
NOTE In this annex, the flowcharts and tables are normative only in the sense that they are defining an
output that alternative implementations shall duplicate.
A.1 Component dimensions and subsampling factors
An image is defined to consist of Nf components, each of which is identified by a unique identifier $C_i$
defined in the frame header of the codestream format specified in Annex B. The number of components
Nf shall be either one or three. A component consists of a rectangular array of samples $x_i$ wide and
$y_i$ samples high. The component dimensions are derived from the image dimensions X and Y, also
parameters recorded in the frame header. These two parameters define a sample grid X grid points
wide and Y grid points high, where the top-left grid coordinate is (0, 0) and coordinates increase
from left to right and from top to bottom. However, the dimensions of the component do not need to
coincide with the dimensions of the image. For each component, two subsampling factors $s_{i,x}$ and
$s_{i,y}$ define the spacing between sample points of component i relative to the sample grid and the
size of the component array. If X and Y are the dimensions of the sample grid, the size of component i
with subsampling factors $s_{i,x}$ and $s_{i,y}$ is

$$\left\lceil X / s_{i,x} \right\rceil \quad \text{and} \quad \left\lceil Y / s_{i,y} \right\rceil$$
Upsampling by interpolation from surrounding samples, as specified in this annex, then generates sample
values on all grid points of the sample grid.
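The component size formula can be evaluated exactly with integer ceiling division, which matters for odd image dimensions. A minimal Python sketch, with a hypothetical function name:

```python
def component_size(X: int, Y: int, s_x: int, s_y: int) -> tuple:
    """Return (width, height) of a component, i.e. (ceil(X/s_x), ceil(Y/s_y)).

    X, Y: dimensions of the sample grid from the frame header.
    s_x, s_y: subsampling factors s_{i,x}, s_{i,y} of the component.
    """
    # -(-a // b) is exact integer ceiling division for b > 0
    return -(-X // s_x), -(-Y // s_y)


# An odd-sized 1921 x 1081 grid with a fully subsampled (2 x 2) component:
print(component_size(1921, 1081, 1, 1))  # (1921, 1081)
print(component_size(1921, 1081, 2, 2))  # (961, 541)
```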
The subsampling factors $s_{i,x}$ and $s_{i,y}$ are not directly represented in the binary codestream or any of its
markers, but shall be derived from the parameters $H_i$ and $V_i$ recorded in the frame header. If Nf equals
one, i.e. the image consists of a single component, $H_1$ and $V_1$ shall be one, and $s_{1,x}$ and $s_{1,y}$ are both one.
If Nf equals three, Table A.1 defines the relation between $H_i$, $V_i$ and $s_{i,x}$, $s_{i,y}$. No other combinations
of $H_i$ and $V_i$ than those listed in Table A.1 shall be used.
Table A.1 — Subsampling values
H1  V1  H2  V2  H3  V3  |  s1,x  s1,y  s2,x  s2,y  s3,x  s3,y
 1   1   1   1   1   1  |    1     1     1     1     1     1
 2   2   2   1   2   1  |    1     1     1     2     1     2
 2   2   1   2   1   2  |    1     1     2     1     2     1
 2   2   1   1   1   1  |    1     1     2     2     2     2
All other values reserved for ITU/ISO purposes.
NOTE Rec. ITU-T T.81 | ISO/IEC 10918-1 allowed other component arrangements and relations between
grid positions and sample positions that are not valid in this document. However, the definitions given here are
special cases of the more general relations provided in Rec. ITU-T T.81 | ISO/IEC 10918-1 and both definitions
agree whenever both are defined.
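Since only four combinations are permitted, a decoder can derive the subsampling factors by a direct table lookup. The following Python sketch transcribes Table A.1 verbatim and rejects any other combination; it also applies the single-component rule of A.1. The table and function names are hypothetical.

```python
# Table A.1: (H1, V1, H2, V2, H3, V3) -> (s1x, s1y, s2x, s2y, s3x, s3y)
TABLE_A1 = {
    (1, 1, 1, 1, 1, 1): (1, 1, 1, 1, 1, 1),
    (2, 2, 2, 1, 2, 1): (1, 1, 1, 2, 1, 2),
    (2, 2, 1, 2, 1, 2): (1, 1, 2, 1, 2, 1),
    (2, 2, 1, 1, 1, 1): (1, 1, 2, 2, 2, 2),
}


def subsampling_factors(hv):
    """Map the (H_i, V_i) pairs of the frame header to (s_ix, s_iy) pairs.

    Raises ValueError for component counts or combinations not allowed by A.1.
    """
    if len(hv) == 1:
        if tuple(hv[0]) != (1, 1):
            raise ValueError("single-component image requires H1 = V1 = 1")
        return [(1, 1)]
    if len(hv) == 3:
        key = tuple(value for pair in hv for value in pair)
        if key not in TABLE_A1:
            raise ValueError(f"H/V combination {key} is not listed in Table A.1")
        s = TABLE_A1[key]
        return [(s[0], s[1]), (s[2], s[3]), (s[4], s[5])]
    raise ValueError("Nf shall be either one or three")


print(subsampling_factors([(2, 2), (1, 1), (1, 1)]))  # [(1, 1), (2, 2), (2, 2)]
```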
A.2 Expansion of subsampled components
Whenever the subsampling factors $s_{i,x}$ and $s_{i,y}$ are not both 1, interpolation is used to populate all grid
positions of the image sample grid. The bilinear interpolation algorithm given in A.3 can be used to provide
sample values at all sampling grid positions. Readers should be aware that this algorithm also
changes the sample values at sampling grid positions whose values are represented in the
codestream. This can cause a cumulative loss of precision of the subsampled components
over multiple compression-decompression cycles.
A.3 Bilinear expansion of subsampled components
Upsampling is performed in two steps. First, upsampling in the vertical direction if $s_{i,y}$ is 2, generating
an intermediate image. Second, upsampling in the horizontal direction if $s_{i,x}$ is 2, generating the final
output image from the intermediate image.
In a first step, check for each component i whether $s_{i,y}$ is 2 or 1. If $s_{i,y}$ is 1, copy the reconstructed samples
to the intermediate image $v^{(\mathrm{up},y)}$ directly. Otherwise, compute the intermediate image $v^{(\mathrm{up},y)}$ from the
reconstructed samples $v$ by first setting $v_{x,-1}$ to $v_{x,0}$ and $v_{x,\lceil Y/2 \rceil}$ to $v_{x,\lceil Y/2 \rceil - 1}$, and then set for all x such
that $0 \le x < X$ and all y such that $0 \le y < \lceil Y/2 \rceil$:

$$v^{(\mathrm{up},y)}_{x,2y} = \left\lfloor \frac{v_{x,y-1} + 3 \times v_{x,y} + 1 + (x \bmod 2)}{4} \right\rfloor$$

$$v^{(\mathrm{up},y)}_{x,2y+1} = \left\lfloor \frac{v_{x,y+1} + 3 \times v_{x,y} + 2 - (x \bmod 2)}{4} \right\rfloor$$

The outputs $v^{(\mathrm{up},y)}_{x,2\lceil Y/2 \rceil - 1}$, which lie outside the sample grid, are discarded if the image height Y is odd.
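The vertical step can be sketched in Python as follows for one component with $s_{i,y} = 2$. This is a minimal illustration, not a reference implementation: it assumes the reconstructed samples are given as a list of rows `v[y][x]` holding $\lceil Y/2 \rceil$ rows, iterates over every column present in that array, and uses Python's integer floor division `//` for $\lfloor \cdot / 4 \rfloor$. The function name is hypothetical.

```python
def upsample_vertical(v, Y):
    """Vertical step of A.3 for a component with s_iy == 2.

    v: list of rows v[y][x] with ceil(Y/2) rows of reconstructed samples.
    Y: height of the sample grid.
    Returns the intermediate image as a list of Y rows.
    """
    rows = len(v)  # expected to equal ceil(Y/2)
    # Edge extension: v_{x,-1} = v_{x,0} and v_{x,rows} = v_{x,rows-1};
    # ext[y + 1] corresponds to row y of v.
    ext = [v[0]] + v + [v[-1]]
    out = []
    for y in range(rows):
        above, cur, below = ext[y], ext[y + 1], ext[y + 2]
        # Output row 2y: rounding offset alternates with the column parity
        out.append([(above[x] + 3 * cur[x] + 1 + (x % 2)) // 4
                    for x in range(len(cur))])
        # Output row 2y + 1
        out.append([(below[x] + 3 * cur[x] + 2 - (x % 2)) // 4
                    for x in range(len(cur))])
    return out[:Y]  # drops the extra row produced when Y is odd


print(upsample_vertical([[0, 0], [8, 8]], 4))  # [[0, 0], [2, 2], [6, 6], [8, 8]]
```

A constant component stays constant under this filter, since the two rounding offsets always sum to 3 and never push $4c + 1$ or $4c + 2$ past the next multiple of 4.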
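In a second step, check for each component i whether $s_{i,x}$ is 2 or 1. If $s_{i,x}$ is 1, copy the intermediate image
to the output image directly. Otherwise, compute the output image $v^{(\mathrm{up},x,y)}$ from the intermediate image
$v^{(\mathrm{up},y)}$ by first setting $v^{(\mathrm{up},y)}_{-1,y}$ to $v^{(\mathrm{up},y)}_{0,y}$ and $v^{(\mathrm{up},y)}_{\lceil X/2 \rceil,y}$ to $v^{(\mathrm{up},y)}_{\lceil X/2 \rceil - 1,y}$, and then set for all y such
that $0 \le y < Y$ and all x such that $0 \le x < \lceil X/2 \rceil$:

$$v^{(\mathrm{up},x,y)}_{2x,y} = \left\lfloor \frac{v^{(\mathrm{up},y)}_{x-1,y} + 3 \times v^{(\mathrm{up},y)}_{x,y} + 2}{4} \right\rfloor$$

$$v^{(\mathrm{up},x,y)}_{2x+1,y} = \left\lfloor \frac{v^{(\mathrm{up},y)}_{x+1,y} + 3 \times v^{(\mathrm{up},y)}_{x,y} + 1}{4} \right\rfloor$$

The outputs $v^{(\mathrm{up},x,y)}_{2\lceil X/2 \rceil - 1,y}$, which lie outside the sample grid, are discarded if the image width X is odd.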
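The horizontal step follows the same pattern along rows. The Python sketch below is a minimal illustration for a component with $s_{i,x} = 2$; it assumes the intermediate image is a list of rows, each holding $\lceil X/2 \rceil$ samples, and the function name is hypothetical. Unlike the vertical step, the rounding offsets here do not depend on the position parity.

```python
def upsample_horizontal(w, X):
    """Horizontal step of A.3 for a component with s_ix == 2.

    w: intermediate image as a list of rows, each with ceil(X/2) samples.
    X: width of the sample grid.
    Returns the output image as a list of rows with X samples each.
    """
    out = []
    for row in w:
        n = len(row)  # expected to equal ceil(X/2)
        # Edge extension: w_{-1,y} = w_{0,y} and w_{n,y} = w_{n-1,y};
        # ext[x + 1] corresponds to column x of the row.
        ext = [row[0]] + row + [row[-1]]
        line = []
        for x in range(n):
            left, cur, right = ext[x], ext[x + 1], ext[x + 2]
            line.append((left + 3 * cur + 2) // 4)   # output position 2x
            line.append((right + 3 * cur + 1) // 4)  # output position 2x + 1
        out.append(line[:X])  # drops the extra column produced when X is odd
    return out


print(upsample_horizontal([[0, 8]], 4))  # [[0, 2, 6, 8]]
```

Applying the vertical step first and this step second to its result reproduces the two-step order described at the start of A.3.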
A.4 Downsampling of components
This document does not define a normative procedure by which the resolution of components whose
$s_{i,x}$ and $s_{i,y}$ factors are not both one shall be reduced. Any procedure that generates components of
the size $\lceil X/s_{i,x} \rceil$ and $\lceil Y/s_{i,y} \rceil$ is acceptable as long as it is compatible with the upsampling procedure
defined above. A very simple downsampling filter is given in the next subclause.
A.5 Downsampling by a box filter
The box filter is the simplest possible downsampling filter and provides only poor quality. Even though
better alternatives exist, the box filter is nevertheless presented here as an example. The input of the box
filter is an X × Y component array of samples, where the sample value at position x, y is denoted by $v_{x,y}$.
For an output position $x_s, y_s$, let

$$x_{\min} := s_{i,x} \times x_s, \qquad x_{\max} := \min(s_{i,x} \times x_s + s_{i,x} - 1,\ X - 1),$$

$$y_{\min} := s_{i,y} \times y_s, \qquad y_{\max} := \min(s_{i,y} \times y_s + s_{i,y} - 1,\ Y - 1).$$

The output of the box filter at position $x_s, y_s$ is then defined as:

$$v^{s}_{x_s,y_s} := \frac{\sum_{x=x_{\min}}^{x_{\max}} \sum_{y=y_{\min}}^{y_{\max}} v_{x,y}}{(x_{\max} - x_{\min} + 1) \times (y_{\max} - y_{\min} + 1)}$$

i.e. the average over the box from $(x_{\min}, y_{\min})$ to $(x_{\max}, y_{\max})$. The array of downsampled sample values $v^{s}_{x_s,y_s}$ is
then subject to further processing, e.g. the DCT and entropy coding.
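A direct Python transcription of the box filter is sketched below. It is illustrative only: the function name is hypothetical, the component is assumed to be a list of rows `v[y][x]`, and the result is left unrounded because "/" denotes division without rounding in 5.2.1; how an encoder rounds the averages before the DCT is not specified here. Boxes at the right and bottom edges are clipped to the grid, and each average divides by the number of samples actually inside the box.

```python
def box_downsample(v, X, Y, s_x, s_y):
    """Box-filter downsampling of an X x Y component (list of rows v[y][x]).

    Returns ceil(Y/s_y) rows of ceil(X/s_x) unrounded box averages.
    """
    out_w = -(-X // s_x)  # ceil(X / s_x)
    out_h = -(-Y // s_y)  # ceil(Y / s_y)
    out = []
    for ys in range(out_h):
        y_min, y_max = s_y * ys, min(s_y * ys + s_y - 1, Y - 1)
        line = []
        for xs in range(out_w):
            x_min, x_max = s_x * xs, min(s_x * xs + s_x - 1, X - 1)
            total = sum(v[y][x]
                        for y in range(y_min, y_max + 1)
                        for x in range(x_min, x_max + 1))
            count = (x_max - x_min + 1) * (y_max - y_min + 1)
            line.append(total / count)  # average over the (clipped) box
        out.append(line)
    return out


# 3 x 2 input, 2 x 2 boxes: the right-hand box is clipped to one column.
print(box_downsample([[1, 3, 5], [3, 5, 7]], 3, 2, 2, 2))  # [[3.0, 6.0]]
```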
Annex B
(normative)

Codestream syntax
NOTE 1 This annex defines the compressed bitstream syntax which, structurally, consists of an ordered
collection of marker segments and entropy coded data segments. Marker segments specify parameters necessary
to reconstruct the sample values from the entropy coded data segments. Because all of these constituent parts
are represented with byte-aligned codes, each compressed data format consists of an ordered sequence of 8-bit
bytes. For each byte, a most significant bit (MSB) and a least significant bit (LSB) are defined.
NOTE 2 The codestream syntax defined here agrees mostly with the "interchange format" defined in Rec.
ITU-T T.81 | ISO/IEC 10918-1, with some additional constraints on the parameters in the marker segments and
some additional markers carrying information that is irrelevant for the older standard.
B.1 Parameters
Parameters are integers, with values specific to the encoding process, source image characteristics,
and other features selectable by the application. Parameters are assigned either 4-bit, 1-byte, 2-byte or
4-byte codes. Except for certain optional groups of parameters, parameters encode critical information
without which the decoding process cannot properly reconstruct the image. The code assignment for
a parameter shall be an unsigned integer of the specified length in bits with the particular value of the
parameter.
For parameters which are 2 bytes (16 bits) in length, the most significant byte shall come first in the
compressed data’s ordered sequence of bytes. The same holds for parameters that are 4 bytes (32 bits)
in length, where bits are ordered in the codestream in decreasing significance. Parameters which are
4 bits in length always come in pairs, and the pair shall always be encoded in a single byte. The first
4-bit parameter of the pair shall occupy the most significant 4 bits of the byte. Within any 32-, 16-, 8-, or
4-bit parameter, the MSB shall come first and LSB shall come last. This encoding is commonly known as
"big endian" representation of unsigned integers.
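As an illustration of this byte order, the following Python sketch reads 2-byte, 4-byte and paired 4-bit parameters from a byte string. The helper names are hypothetical. For example, Rec. ITU-T T.81 packs the sampling factors $H_i$ and $V_i$ of a component into a single byte as such a 4-bit pair, with $H_i$ in the most significant 4 bits.

```python
def read_u16(data: bytes, pos: int) -> int:
    """2-byte parameter: most significant byte first."""
    return (data[pos] << 8) | data[pos + 1]


def read_u32(data: bytes, pos: int) -> int:
    """4-byte parameter: bytes in decreasing significance."""
    return int.from_bytes(data[pos:pos + 4], "big")


def read_u4_pair(data: bytes, pos: int) -> tuple:
    """Pair of 4-bit parameters in one byte; the first occupies the high 4 bits."""
    b = data[pos]
    return b >> 4, b & 0x0F


print(read_u16(bytes([0x04, 0x38]), 0))  # 1080
print(read_u4_pair(bytes([0x21]), 0))    # (2, 1)
```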
B.2 Markers
Markers serve to identify the various structural parts of the compressed data formats. Most markers
start marker segments containing a related group of parameters; some markers stand alone. All
markers are assigned two-byte codes: an 0xff byte followed by a byte which is not equal to 0x00 or 0xff.
Any marker may optionally be preceded by any number of fill bytes, which are bytes of the value 0xff.
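A parser therefore has to skip any run of 0xFF bytes before it can read the marker code. The Python sketch below shows one way to do this; the function name is hypothetical. It treats a 0xFF byte followed by 0x00 as "not a marker", since the second byte of a marker is never 0x00 (see 3.12). In the usage line, 0xD8 is the start-of-image marker of Rec. ITU-T T.81.

```python
def read_marker(data: bytes, pos: int) -> tuple:
    """Read the marker starting at data[pos], skipping optional 0xFF fill bytes.

    Returns (code, next_pos): code is the second marker byte (0x01..0xFE),
    next_pos the offset just past the marker. Raises ValueError otherwise.
    """
    if pos >= len(data) or data[pos] != 0xFF:
        raise ValueError("marker must start with 0xFF")
    # Consume the fill bytes together with the marker's own 0xFF prefix.
    while pos < len(data) and data[pos] == 0xFF:
        pos += 1
    # The code byte must exist and must not be 0x00.
    if pos >= len(data) or data[pos] == 0x00:
        raise ValueError("0xFF is not followed by a valid marker code")
    return data[pos], pos + 1


print(read_marker(bytes([0xFF, 0xFF, 0xFF, 0xD8]), 0))  # (216, 4)
```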
NOTE Because of this special code-assignment structure, markers make it possible for a decoder to parse
the compressed data and loca
...
