Information technology - Scalable compression and coding of continuous-tone still images - Part 3: Box file format

ISO/IEC 18477-3:2015 specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content.

Technologies de l'information — Compression échelonnable et codage d'images plates en ton continu — Partie 3: Format de la liste de fichiers

General Information

Status: Withdrawn
Publication Date: 08-Dec-2015
Current Stage: 9599 - Withdrawal of International Standard
Start Date: 06-Dec-2023
Completion Date: 30-Oct-2025

Standard
ISO/IEC 18477-3:2015 - Information technology -- Scalable compression and coding of continuous-tone still images
English language
42 pages

Frequently Asked Questions

ISO/IEC 18477-3:2015 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Scalable compression and coding of continuous-tone still images - Part 3: Box file format". It specifies a coding format, referred to as JPEG XT, which is designed primarily for continuous-tone photographic content.

ISO/IEC 18477-3:2015 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.30 - Coding of graphical and photographical information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 18477-3:2015 is related to other standards as follows: it is linked to ISO/IEC 18477-3:2023. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

You can purchase ISO/IEC 18477-3:2015 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 18477-3
First edition
Information technology — Scalable
compression and coding of
continuous-tone still images —
Part 3:
Box file format
Technologies de l’information — Compression échelonnable et codage
d’images plates en ton continu —
Partie 3: Format de la liste de fichiers
PROOF/ÉPREUVE
© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
© ISO/IEC 2015 – All rights reserved

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions, abbreviated terms and symbols
3.1 Terms and definitions
3.2 Symbols
3.3 Abbreviated terms
4 Conventions
4.1 Conformance language
4.2 Operators
4.2.1 Arithmetic operators
4.2.2 Logical operators
4.2.3 Relational operators
4.2.4 Precedence order of operators
4.2.5 Mathematical functions
5 General
5.1 High level overview on JPEG XT ISO/IEC 18477-3
5.2 Encoder requirements
5.3 Decoder requirements
Annex A (normative) JPEG XT marker segment
Annex B (normative) Common box types
Annex C (normative) Point transformation
Annex D (normative) Checksum computation
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of
audio, picture, multimedia and hypermedia information.
ISO/IEC 18477 contains the following parts under the general title Information technology — Scalable
compression and coding of continuous-tone still images:
— Part 1: Scalable compression and coding of continuous-tone still images
— Part 2: Extensions for high dynamic range images
— Part 3: Box file format
— Part 6: IDR Integer Coding
— Part 7: HDR Floating-Point Coding
— Part 8: Lossless and Near-lossless Coding
— Part 9: Alpha Channel Coding
The following parts are under preparation:
— Part 4: Conformance testing
— Part 5: Reference software

Introduction
This part of ISO/IEC 18477 specifies an extensible file format, denoted as JPEG XT, which is built on top
of the existing Rec. ITU-T T.81 | ISO/IEC 10918-1 codestream definition. While typically file formats
encapsulate codestreams by means of additional syntax elements such as boxes, the file format
structure specified here rather embeds the syntax elements of the file format, called boxes, into the
codestream. The necessity for this unusual arrangement is the backwards compatibility to the legacy
standard and the application toolchain built around it; that is, legacy applications conforming to Rec.
ITU-T T.81 | ISO/IEC 10918-1 will be able to decode image information embedded in files conforming
to this part of ISO/IEC 18477, though will only be able to recover a three component, 8 bits per sample,
lower quality version of the image described by the full file.
For more demanding applications, it is not uncommon to use a bit depth of 16, providing 65 536
representable values to describe each channel within a pixel, resulting in over 2,8 × 10^14 representable
colour values. In some less common scenarios, even greater bit depths are used, and sometimes the
dynamic range of the image is so high that a floating point based encoding is desirable. In addition to
image information, some applications also require an additional opacity channel, a feature not available
from the legacy standard.
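The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming three colour channels per pixel; the function name `representable_colours` is hypothetical and not part of the standard.

```python
def representable_colours(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colour values for a given bit depth per channel."""
    levels_per_channel = 2 ** bits_per_channel  # e.g. 65 536 for 16 bits
    return levels_per_channel ** channels


if __name__ == "__main__":
    # 16 bits per channel, three channels: 65 536 ** 3 colour values
    print(representable_colours(16))
    # 8 bits per channel, three channels, as seen by a legacy decoder
    print(representable_colours(8))
```

With 16 bits per channel the result is 281 474 976 710 656, which is where the "over 2,8 × 10^14" figure comes from.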
Most common photo and image formats use an 8-bit or 16-bit unsigned integer value to represent some
function of the intensity of each colour channel. While it might be theoretically possible to agree on
one method for assigning specific numerical values to real world colours, doing so is not practical.
Since any specific device has its own limited range for colour reproduction, the device’s range may be a
small portion of the agreed-upon universal colour range. As a result, such an approach is an extremely
inefficient use of the available numerical values, especially when using only 8 bits (or 256 unique
values) per channel. To represent pixel values as efficiently as possible, devices use a numeric encoding
optimized for their own range of possible colours or gamut.
JPEG XT is designed to extend the legacy JPEG standard towards higher bitdepth, higher dynamic range,
wide colour gamut content while simultaneously allowing legacy applications to decode the image data
in the codestream to a standard low dynamic range image represented by only eight bits per channel.
The goal is to provide a backwards compatible coding specification that allows legacy applications and
existing toolchains to continue to operate on codestreams conforming to this part of ISO/IEC 18477.
JPEG XT has been designed to be backwards compatible to legacy applications while at the same time
having a small coding complexity; JPEG XT uses, whenever possible, functional blocks of Rec. ITU-T T.81
| ISO/IEC 10918-1 to extend the functionality of the legacy JPEG Coding System.
This part of ISO/IEC 18477 is an extension of ISO/IEC 18477-1, a compression system for continuous
tone digital still images which is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. That is,
legacy applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 will be able to reconstruct streams
generated by an encoder conforming to this part of ISO/IEC 18477, though will possibly not be able to
reconstruct such streams in full dynamic range, full quality or other features defined in this part of
ISO/IEC 18477.
The aim of this part of ISO/IEC 18477 is to provide a flexible and extensible framework to enrich
ISO/IEC 18477-1 compliant codestreams with side-channels and metadata. The syntax chosen in this
part of ISO/IEC 18477 defines a mechanism to embed syntax elements denoted as “Boxes” into Rec.
ITU-T T.81 | ISO/IEC 10918-1 compliant codestreams. The box syntax used here is identical to that
defined in the JPEG family of standards, for example JPEG 2000 (Rec. ITU-T T.800 | ISO/IEC 15444-1).
Boxes will then carry either additional image data, to enable encoding of images of higher bitdepth, high
dynamic range, include alpha channels etc., or will carry metadata that describes the decoding process
of the legacy Rec. ITU-T T.81 | ISO/IEC 10918-1 codestream and the side channels to an extended or
high dynamic range image.

INTERNATIONAL STANDARD ISO/IEC 18477-3:2015(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 3:
Box file format
1 Scope
This part of ISO/IEC 18477 specifies a coding format, referred to as JPEG XT, which is designed primarily
for continuous-tone photographic content.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
ISO/IEC 18477-1:2015, Information technology — Scalable compression and coding of continuous-tone still
images — Part 1: Scalable compression and coding of continuous-tone still images
Rec. ITU-T T.81 | ISO/IEC 10918-1, Information Technology — Digital Compression and Coding of
Continuous Tone Still Images – Requirements and Guidelines
Rec. ITU-T T.871 | ISO/IEC 10918-5, Information technology — Digital compression and coding of
continuous-tone still images: JPEG File Interchange Format
3 Terms, definitions, abbreviated terms and symbols
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1
ASCII encoding
encoding of text characters and text strings according to ISO/IEC 10646
3.1.2
base decoding path
process of decoding legacy codestream and refinement data to the base image, jointly with all further
steps until residual data is added to the values obtained from the residual codestream
3.1.3
base image
collection of sample values obtained by entropy decoding the DCT coefficients of the legacy codestream
and the refinement codestream, and inversely DCT transforming them jointly
3.1.4
binary decision
choice between two alternatives
3.1.5
bit stream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.1.6
block
8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component
3.1.7
box
structured collection of data describing the image or the image decoding process embedded into one or
multiple APP11 marker segments
Note 1 to entry: See Annex A for the definition of boxes
3.1.8
byte
group of 8 bits
3.1.9
coder
embodiment of a coding process
3.1.10
coding
encoding or decoding
3.1.11
coding model
procedure used to convert input data into symbols to be coded
3.1.12
(coding) process
general term for referring to an encoding process, a decoding process, or both
3.1.13
compression
reduction in the number of bits used to represent source image data
3.1.14
component
two-dimensional array of samples having the same designation in the output or display device. An
image typically consists of several components, e. g. red, green and blue
3.1.15
continuous tone image
image whose components have more than one bit per sample
3.1.16
decoder
embodiment of a decoding process
3.1.17
decoding process
process which takes as its input compressed image data and outputs a continuous-tone image
3.1.18
encoder
embodiment of an encoding process
3.1.19
encoding process
process which takes as its input a continuous-tone image and outputs compressed image data
3.1.20
entropy decoder
embodiment of an entropy decoding procedure
3.1.21
entropy decoding
lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the
entropy encoder
3.1.22
entropy encoder
embodiment of an entropy encoding procedure
3.1.23
entropy encoding
lossless procedure which converts a sequence of input symbols into a sequence of bits such that the
average number of bits per symbol approaches the entropy of the input symbols
3.1.24
high dynamic range
image or image data comprised of more than eight bits per sample
3.1.25
Intermediate dynamic range
image or image data comprised of more than eight bits per sample
3.1.26
Joint Photographic Experts Group
JPEG
informal name of the committee which created this part of ISO/IEC 18477
Note 1 to entry: The “joint” comes from the ITU and ISO/IEC collaboration.
3.1.27
legacy codestream
collection of markers and syntax elements defined by Rec. ITU-T T.81 | ISO/IEC 10918-1, bare of any syntax
elements defined by the ISO/IEC 18477 family of standards
Note 1 to entry: That is, the legacy codestream consists of the collection of all markers except those APP11
markers that describe JPEG XT boxes by the syntax defined in Annex A.
3.1.28
legacy decoding path
collection of operations to be performed on the entropy coded data as described by Rec. ITU-T T.81 |
ISO/IEC 10918-1 jointly with the Legacy Refinement scans before this data is merged with the residual
data to form the final output image
3.1.29
legacy decoder
embodiment of a decoding process conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1, confined to the
lossy DCT process and the baseline, sequential or progressive modes, decoding at most four components
to eight bits per component
3.1.30
legacy image
arrangement of sample values as described by applying the decoding process described by Rec.
ITU-T T.81 | ISO/IEC 10918-1 on the entropy coded data as defined by said standard
3.1.31
lossless
descriptive term for encoding and decoding processes and procedures in which the output of the
decoding procedure(s) is identical to the input to the encoding procedure(s)
3.1.32
lossless coding
mode of operation which refers to any one of the coding processes defined in this part of ISO/IEC 18477
in which all of the procedures are lossless
3.1.33
lossy
descriptive term for encoding and decoding processes which are not lossless
3.1.34
low dynamic range
image or image data comprised of data with no more than eight bits per sample
3.1.35
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.1.36
marker segment
marker together with its associated set of parameters
3.1.37
pixel
collection of sample values in the spatial image domain having all the same sample coordinates, e. g. a
pixel may consist of three samples describing its red, green and blue value
3.1.38
point transform
scaling of a sample or DCT coefficient by a factor
3.1.39
precision
number of bits allocated to a particular sample or DCT coefficient
3.1.40
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.1.41
residual decoding path
collection of operations applied to the entropy coded data contained in the residual data box and
residual refinement scan boxes up to the point where this data is merged with the base image to form
the final output image
3.1.42
residual image
extension image
sample values as reconstructed by inverse quantization and inverse DCT transformation applied to the
entropy-decoded coefficients described by the residual scan and residual refinement scans
3.1.43
residual scan
additional pass over the image data invisible to legacy decoders which provides additive and/or
multiplicative correction data of the base scans to allow reproduction of high dynamic range or wide
colour gamut data
3.1.44
refinement scan
additional pass over the image data invisible to legacy decoders which provides additional least
significant bits to extend the precision of the DCT transformed coefficients. Refinement scans can be
either applied in the base or residual decoding path
3.1.45
sample
one element in the two-dimensional image array which comprises a component
3.1.46
sample grid
common coordinate system for all samples of an image
Note 1 to entry: The samples at the top left edge of the image have the coordinates (0, 0), the first coordinate
increases towards the right, the second towards the bottom.
3.1.47
superbox
box that carries other boxes as payload data
3.1.48
zero byte
0x00 byte
3.1.49
zig-zag sequence
specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to
highest
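The marker definition in 3.1.35 translates directly into a predicate on two bytes. The following is a minimal sketch of that definition only; the function name `is_marker` is hypothetical.

```python
def is_marker(first: int, second: int) -> bool:
    """True if the byte pair forms a marker as defined in 3.1.35:
    first byte 0xFF, second byte between 1 and 0xFE inclusive."""
    return first == 0xFF and 0x01 <= second <= 0xFE


if __name__ == "__main__":
    print(is_marker(0xFF, 0xEB))  # a marker code
    print(is_marker(0xFF, 0x00))  # 0xFF followed by a zero byte is not a marker
```

Under this definition, 0xFF followed by a zero byte (3.1.48) or by another 0xFF is not a marker.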
3.2 Symbols
X        width of the sample grid in positions
Y        height of the sample grid in positions
Nf       number of components in an image
s_{i,x}  subsampling factor of component i in horizontal direction
s_{i,y}  subsampling factor of component i in vertical direction
H_i      subsampling indicator of component i in the frame header
V_i      subsampling indicator of component i in the frame header
v_{x,y}  sample value at the sample grid position x, y
R_h      additional number of DCT coefficient bits represented by refinement scans in the base decoding path. 8+R_h is the number of non-fractional bits (i.e. bits in front of the “binary dot”) of the output of the inverse DCT process in the base decoding path.
R_r      additional number of DCT coefficient bits represented by refinement scans in the residual decoding path. P+R_r is the number of non-fractional bits of the output of the inverse DCT process in the residual decoding path, where P is the frame-precision of the residual image as recorded in the frame header of the residual codestream.
R_b      additional bits in the HDR image. 8+R_b is the sample precision of the reconstructed HDR image.
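The three precision relations in the symbol list (8+R_h, P+R_r and 8+R_b) are simple sums. The sketch below only restates them in code; the function name, the dictionary keys and the sample values are illustrative assumptions, not taken from the standard.

```python
def precisions(r_h: int, r_r: int, r_b: int, p: int) -> dict:
    """Bit counts implied by the symbol definitions in 3.2."""
    return {
        # non-fractional bits after the inverse DCT, base decoding path
        "base_idct_integer_bits": 8 + r_h,
        # non-fractional bits after the inverse DCT, residual decoding path
        "residual_idct_integer_bits": p + r_r,
        # sample precision of the reconstructed HDR image
        "hdr_sample_precision": 8 + r_b,
    }


if __name__ == "__main__":
    # Illustrative values only: 4 refinement bits in the base path,
    # 2 in the residual path, 8 extra HDR bits, residual frame precision 8.
    print(precisions(r_h=4, r_r=2, r_b=8, p=8))
```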

3.3 Abbreviated terms
For the purposes of this part of ISO/IEC 18477, the following abbreviations apply.
ASCII American Standard Code for Information Interchange
LSB Least Significant Bit
MSB Most Significant Bit
HDR High Dynamic Range
IDR Intermediate Dynamic Range
LDR Low Dynamic Range
TMO Tone Mapping Operator
DCT Discrete Cosine Transformation
4 Conventions
4.1 Conformance language
This part of ISO/IEC 18477 consists of normative and informative text.
Normative text is that text which expresses mandatory requirements. The word “shall” is used to express
mandatory requirements strictly to be followed in order to conform to this part of ISO/IEC 18477 and
from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory
requirements.
Informative text is text that is potentially helpful to the user, but not indispensable and can be removed,
changed or added editorially without affecting interoperability. All text in this part of ISO/IEC 18477
is normative, with the following exceptions: the Introduction, any parts of the text that are explicitly
labelled as “informative”, and statements appearing with the preamble “NOTE” and behaviour described
using the word “should”. The word “should” is used to describe behaviour that is encouraged but is not
required for conformance to this part of ISO/IEC 18477.
The keywords “may” and “need not” indicate a course of action that is permissible in a conforming
implementation.
The keyword “reserved” indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword “forbidden” indicates “reserved” and in addition indicates
that the provision will never be specified in the future.
4.2 Operators
NOTE Many of the operators used in this part of ISO/IEC 18477 are similar to those used in the C
programming language.
4.2.1 Arithmetic operators
+ Addition
− Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Division without truncation or rounding.
umod x umod a is the unique value y between 0 and a-1 for which y + N*a = x with a suitable integer N.
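The umod operator always yields a non-negative result, which differs from the `%` operator of C when x is negative. A minimal sketch in Python, assuming integer operands and a positive modulus a (the function name `umod` simply mirrors the operator name):

```python
def umod(x: int, a: int) -> int:
    """Return the unique y in [0, a-1] with y + N*a == x for some integer N."""
    if a <= 0:
        raise ValueError("umod is sketched here for a positive modulus only")
    # Python's % takes the sign of the divisor, so the result is already
    # in the range 0..a-1 for positive a, even when x is negative.
    return x % a


if __name__ == "__main__":
    print(umod(10, 8))
    print(umod(-1, 8))
```

For example, `umod(-1, 8)` gives 7, because 7 + (-1)*8 = -1.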
4.2.2 Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
∈ x ∈ {A, B} is defined as (x == A || x == B)
∉ x ∉ {A, B} is defined as (x != A && x != B)
4.2.3 Relational operators
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
4.2.4 Precedence order of operators
Operators are listed below in descending order of precedence. If several operators appear in the same
line, they have equal precedence. When several operators of equal precedence appear at the same level
in an expression, evaluation proceeds according to the associativity of the operator either from right to
left or from left to right.
Operators        Type of operation           Associativity
(), [ ], .       Expression                  Left to Right
−                Unary negation
*, /             Multiplication              Left to Right
umod             Modulo (remainder)          Left to Right
+, −             Addition and Subtraction    Left to Right
<, >, <=, >=     Relational                  Left to Right
4.2.5 Mathematical functions
⌈x⌉ Ceil of x. Returns the smallest integer that is greater than or equal to x.
⌊x⌋ Floor of x. Returns the largest integer that is less than or equal to x.
|x| Absolute value, is -x for x < 0, otherwise x.
sign(x) Sign of x, 0 if x is 0, +1 if x is positive, -1 if x is negative.
clamp(x,min,max) Clamps x to the range [min, max]: returns min if x < min, max if x > max, or otherwise x.
x^a Raises the value of x to the power of a. x is a non-negative real number, a is a real number. x^a is equal to exp(a×log(x)) where exp is the exponential function and log() the natural logarithm. If x is 0 and a is positive, x^a is defined to be 0.
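The mathematical functions above map closely onto standard library calls. The following is a minimal sketch of the definitions; the helper names `sign`, `clamp` and `power` are chosen here for illustration, and raising an error for cases the text leaves undefined is an assumption of this sketch.

```python
import math


def sign(x: float) -> int:
    """0 if x is 0, +1 if x is positive, -1 if x is negative."""
    return (x > 0) - (x < 0)


def clamp(x: float, lo: float, hi: float) -> float:
    """Clamp x to the range [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def power(x: float, a: float) -> float:
    """x raised to a, defined as exp(a * log(x)) for non-negative x."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x == 0:
        if a > 0:
            return 0.0  # defined to be 0 when x is 0 and a is positive
        raise ValueError("the text defines x == 0 only for positive a")
    return math.exp(a * math.log(x))


if __name__ == "__main__":
    print(math.ceil(1.2), math.floor(-1.2))  # ceil and floor of x
    print(sign(-3), clamp(300, 0, 255), power(2.0, 3.0))
```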
5 General
The purpose of this Clause is to give an informative overview of the elements specified in this part of
ISO/IEC 18477. Another purpose is to introduce many of the terms which are defined in Clause 3. These
terms are printed in italics upon first usage in this Clause.
There are three elements specified in this part of ISO/IEC 18477:
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source
image data and encoder specifications, and by means of a specified set of procedures generates as
output a codestream.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
might be required that define the interpretation of the sample data, such as colour space or the
spatial dimensions of the samples.
5.1 High level overview on JPEG XT ISO/IEC 18477-3
The high-level syntax of an ISO/IEC 18477-3 compliant codestream is identical to that defined in
ISO/IEC 18477-1, which is a subset of the syntax defined in Rec. ITU-T T.81 | ISO/IEC 10918-1. Marker
definitions and the syntax of the markers defined in the above Recommendation remain in force and
unchanged. However, this part of ISO/IEC 18477 uses the APP11 marker, which the legacy
Recommendation | Standard reserves for application use, to encode additional syntax elements. Legacy
decoders will skip and ignore such marker segments, and hence will only be able to decode the image
encoded by the legacy syntax elements. This part of the codestream is denoted the legacy codestream in
the following.
This part of ISO/IEC 18477 extends the legacy standard by a syntax element called “Box”, using
the APP11 marker to hide the extended syntax elements from legacy applications. Boxes and their
encoding are specified in Annex A. A common set of boxes used by all subsequent parts of the
ISO/IEC 18477 family of standards is defined in Annex B. A box may either include additional metadata
required to decode the complete codestream to full precision, full dynamic range or without loss, or
may contain entropy coded image data itself.
How entropy coded data from the side-channels contained in the boxes and entropy coded data in the
legacy codestream are merged together is application dependent and defined in subsequent parts of the
ISO/IEC 18477 standards family. It is beyond the scope of this part of ISO/IEC 18477 to define this process.
5.2 Encoder requirements
An encoder is only required to meet the compliance tests and to generate the codestream according to
the syntax defined in this part of ISO/IEC 18477. How the codestream is algorithmically constructed and
how the boxes are laid out is implementation specific and not within scope of this part of ISO/IEC 18477.
Subsequent Recommendations | Standards of the ISO/IEC 18477 family may, however, define additional
restrictions and requirements, either within the standard itself, or within profiles that restrict the
freedom of the encoder further.
An encoder claiming to be compliant to one of these profiles then shall conform to the syntax constraints
defined in the corresponding profile of the corresponding part of ISO/IEC 18477.
5.3 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. A decoder shall
correctly interpret the syntax of the box structures, namely the packaging of boxes into APP11 marker
segments specified in Annex A. It is not required, though, that a conforming decoder is capable of
interpreting the semantics of all box types defined in this or subsequent members of the ISO/IEC 18477
family of standards. A decoder implementation should skip over boxes it is unable or unwilling to support
unless such a box is indicated as a mandatory box in the profile and part of ISO/IEC 18477 the decoder
claims to be compliant with.
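The skipping behaviour described above can be sketched as a dispatch loop over boxes that have already been reassembled. This is an illustration under stated assumptions, not the decoder the standard defines: the box types `b"AAAA"` and `b"BBBB"`, the handler registry, and the choice to raise an error for an unsupported mandatory box are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Box = Tuple[bytes, bytes]  # (four-byte box type, payload); assumed pre-parsed


def dispatch_boxes(
    boxes: Iterable[Box],
    handlers: Dict[bytes, Callable[[bytes], object]],
    mandatory: FrozenSet[bytes] = frozenset(),
) -> Tuple[List[object], List[bytes]]:
    """Run handlers for supported box types and skip the rest."""
    handled: List[object] = []
    skipped: List[bytes] = []
    for box_type, payload in boxes:
        handler = handlers.get(box_type)
        if handler is not None:
            handled.append(handler(payload))
        elif box_type in mandatory:
            # Assumption of this sketch: an unsupported mandatory box is an error.
            raise ValueError(f"unsupported mandatory box {box_type!r}")
        else:
            skipped.append(box_type)  # unknown or unwanted box: skip it
    return handled, skipped


if __name__ == "__main__":
    boxes = [(b"AAAA", b"xyz"), (b"BBBB", b"")]
    print(dispatch_boxes(boxes, {b"AAAA": len}))
```

Which boxes count as mandatory depends on the profile and part of ISO/IEC 18477 the decoder claims to support, so the sketch takes that set as a parameter.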

Annex A
(normative)
JPEG XT marker segment
A.1 General
This Annex extends the compressed bitstream syntax of ISO/IEC 18477-1:2015, Annex B by introducing
additional markers and marker segments carrying side channel and coding parameters that control the
decoding process. While the corresponding decoding processes are specified in subsequent parts of
the ISO/IEC 18477 family of standards, this Annex defines a generic mechanism by which such syntax
elements are embedded into ISO/IEC 18477-1 compliant files.
The syntax element and building block defined in this Annex is called a Box. This part of ISO/IEC 18477
defines several types of boxes; the definition of each specific box type defines the kind of information
that may be found within a box of that type. Some boxes will be defined to contain other boxes. Box types
are specified in Annex B, or in subsequent members of the ISO/IEC 18477 family of standards.
Unlike in other Recommendations | International Standards, boxes are not top-level syntax elements,
but are themselves wrapped in JPEG XT marker segments introduced in A.2. Since boxes may logically carry
more than 64K (65536) bytes of payload data, but marker segments can carry at most 64K of data, a single
logical box may need to be broken up into several marker segments. Syntax elements within the marker
segment then instruct the decoder how to put the contents of the marker segments back into a single box.
Additionally, a JPEG XT file may contain several boxes of the same box type, though with differing
content. The syntax of the marker segment provides a mechanism to distinguish between two logically
different boxes of the same box type.
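The need to break a large box into several marker segments can be sketched as a simple chunking step. The per-segment capacity is passed in as a parameter because the exact header overhead of a JPEG XT marker segment is not reproduced here; the function name `split_payload` and the value 60000 in the usage example are hypothetical.

```python
def split_payload(payload: bytes, max_chunk: int) -> list[bytes]:
    """Split a box payload into chunks of at most max_chunk bytes.

    An empty payload still yields one (empty) chunk, since every box is
    carried by at least one marker segment.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    chunks = [payload[i:i + max_chunk] for i in range(0, len(payload), max_chunk)]
    return chunks or [b""]


if __name__ == "__main__":
    chunks = split_payload(bytes(150_000), max_chunk=60_000)
    print([len(c) for c in chunks])
```

Concatenating the chunks in order restores the original payload, which is the property the decoder relies on when it merges marker segments back into one logical box.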
A.2 Marker assignments
The following additional marker is defined in this part of ISO/IEC 18477:
Table A.1 — Additional markers and marker segments
Code Assignment    Symbol    Description       Defined in
0xFFEB             APP11     JPEG XT Marker    This part of ISO/IEC 18477
Each box is encapsulated in at least one JPEG XT marker segment, and may extend over several marker
segments if the size of its payload data exceeds the capacity of the JPEG XT marker. See A.4 for how to
merge JPEG XT marker segments to logical boxes.
A.3 Codestream syntax
The high-level syntax of ISO/IEC 18477-3 codestreams shall follow the syntax specified in
ISO/IEC 18477-1, which is a subset of Rec. ITU-T T.81 | ISO/IEC 10918-1. Specifically, since JPEG
XT boxes are represented by APP11 marker segments, ISO/IEC 18477-1 conforming implementations
that do not implement this or any subsequent Recommendation | International Standard of the
ISO/IEC 18477 family will ignore them.
NOTE By the above paragraph, byte stuffing and padding as defined in Rec. ITU-T T.81 |
ISO/IEC 10918-1 also apply to entropy coded data contained in APP11 markers. Due to the
segmentation of entropy coded data into application markers, the last byte of an APP11
marker segment may be 0xFF, with the corresponding “stuffed” zero byte being part of a subsequent
application marker segment. This does not cause a problem for legacy decoders since they are required
to skip over unknown application marker segments in the first place, without interpreting their content.
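The situation described in the NOTE can be reproduced in a few lines. The sketch below applies the byte stuffing rule of Rec. ITU-T T.81 | ISO/IEC 10918-1 (a zero byte follows every 0xFF in entropy coded data) and then cuts the result at a position chosen purely for illustration; the function name and sample bytes are hypothetical.

```python
def stuff_bytes(data: bytes) -> bytes:
    """Insert a zero byte after every 0xFF byte of entropy coded data."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)  # the "stuffed" zero byte
    return bytes(out)


if __name__ == "__main__":
    stuffed = stuff_bytes(b"\x12\xff\x34")
    # Cut right after the 0xFF: the stuffed zero lands in the next chunk.
    first, second = stuffed[:2], stuffed[2:]
    print(first.hex(), second.hex())
```

Here the first chunk ends in 0xFF and the second begins with the stuffed zero byte, so the pair only makes sense once the chunks have been merged again.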
A.4 JPEG XT boxes
JPEG XT structures any additional data that remains invisible to legacy decoders in JPEG XT boxes. A
box is a generic data container that has both a type, and a body that carries its actual payload. The type
is a four-byte identifier that allows decoders to identify its purpose and the structure of its content. A
JPEG XT file may also carry several boxes of identical type. These boxes are logically distinct and differ
in the value of the Box Instance Number En of the JPEG Extensions marker segment, see Figure A.1.
Boxes are embedded into the codestream format by encapsulating them into one or several JPEG XT
marker segments. Since boxes can grow large in size, a single box may extend over multiple JPEG XT
marker segments, and decoders may have to merge multiple marker segments before they can attempt
to decode the box content. JPEG XT Marker segments that belong to the same logical box and require
merging prior to interpretation have identical Box Instance Number fields En, but differ in the
Packet Sequence Number Z.
The JPEG XT marker segment consists of the APP11 marker that is reserved for this part of ISO/IEC 18477,
the size of the marker segment in bytes (not including the marker), a common identifier identical for all
boxes and box types, the box instance number field, the packet sequence number field, the box length,
the box type and the actual box payload data. The box length field can be extended by a Box Length
Extension field that allows box sizes beyond 2^32 - 1 bytes. Figure A.1 depicts the high-level syntax of a
JPEG XT Marker segment.
Figure A.1 — Organization of the JPEG XT marker segment
The meaning of the fields of the JPEG XT Marker segment is as follows:
The Le field is the size of the marker segment, not including the marker. It measures the size from the Le
field up to the end of the marker segment.
NOTE Since boxes may extend over several marker segments, the Le field is typically not derived from the
Box Length field and care must be taken not to confuse the two. The Le field defines the amount of data carried
by a single marker segment; the Box Length is the logical size of the box. If a box extends over multiple JPEG XT
Extension marker segments, the Le field measures the total size of each individual marker segment and may
differ from segment to segment, whereas the Box Length field remains identical in all segments that contribute
to the same logical box.
The Common Identifier is a 16-bit field that allows decoders to identify an APP11 marker segment as a
JPEG XT marker segment. Its value shall be 0x4A50. It is identical for all boxes and all box types.
The Box Instance Number is a 16-bit field that disambiguates between JPEG XT marker segments
carrying boxes of identical type, but differing content. That is, data that belongs to logically distinct
boxes with the same box type differ in their Box Instance Number. Decoders shall concatenate the
payload data of those JPEG XT marker segments whose Box Instance Number and Type Identifier fields
are identical in the order of increasing Packet Sequence Numbers.
NOTE A codestream containing multiple boxes of the same box type uses the Box Instance Number field to
instruct the decoder which JPEG XT Extension marker segments to merge into one box. Refinement coding makes
use of this process: the entropy coded data of each refinement scan is placed into its individual box, using the
Box Instance Number field to disambiguate the scans.
The Packet Sequence Number is a 32-bit field that specifies the order in which payload data shall be
merged. Concatenation proceeds in the order of increasing Packet Sequence Numbers.
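The merging rule can be sketched informally in Python as follows. The sketch assumes that the marker segments have already been parsed into tuples of Box Instance Number, Packet Sequence Number, Box Type and payload; the function name `merge_segments` and the box type `TEST` are hypothetical.

```python
from collections import defaultdict


def merge_segments(segments):
    """Group payloads by (En, TBox) and concatenate each group
    in order of increasing Packet Sequence Number Z.

    segments: iterable of (en, z, tbox, payload) tuples in any order.
    Returns a dict mapping (en, tbox) to the merged payload bytes.
    """
    groups = defaultdict(list)
    for en, z, tbox, payload in segments:
        groups[(en, tbox)].append((z, payload))
    return {
        key: b"".join(p for _, p in sorted(parts, key=lambda part: part[0]))
        for key, parts in groups.items()
    }


# Two segments of instance 1 arrive out of order; instance 2 is a distinct box.
boxes = merge_segments([
    (1, 2, b"TEST", b"world"),
    (1, 1, b"TEST", b"hello "),
    (2, 1, b"TEST", b"other"),
])
```

Keying the groups on both the Box Instance Number and the Box Type keeps logically distinct boxes of the same type apart.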
The Box Length LBox is a four byte field that specifies the box length. It measures the size of the
payload data of all JPEG Extensions markers of the same box type and enumerator combined, plus the
size of a single copy of the Box Type, plus the size of a single copy of the Box Length, plus the length of a
single copy of the Box Length Extender if present. The box length does not include the size of the packet
sequence number, the box instance number, the common identifier, the marker length or the marker.
NOTE A box having a payload data of 32 bytes will, by this, have a box length of 32+4+4 = 40. If this box is split
evenly over two JPEG XT marker segments, each marker segment will have a Le value of 2+2+2+4+(4+4+16) = 34.
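The arithmetic of this example can be checked with a short Python sketch. It covers only boxes that do not need the XLBox field, and the helper names `box_length` and `segment_le` are hypothetical.

```python
def box_length(payload_size: int) -> int:
    """Box length without XLBox: payload plus LBox (4 bytes) plus TBox (4 bytes)."""
    return payload_size + 4 + 4


def segment_le(payload_in_segment: int) -> int:
    """Le of one marker segment without XLBox:
    Le (2) + CI (2) + En (2) + Z (4) + LBox (4) + TBox (4) + payload bytes."""
    return 2 + 2 + 2 + 4 + 4 + 4 + payload_in_segment


total = box_length(32)        # logical box length: 40
per_segment = segment_le(16)  # Le of each half when split evenly: 34
unsplit = segment_le(32)      # Le if the box fits into one segment: 50
```

The box length stays 40 in both segments, while Le depends only on the bytes carried by the individual segment.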
If the size of the box payload is less than 2^32 - 8 bytes, then all fields except the XLBox field, that is:
Le, CI, En, Z, LBox and TBox, shall be present in all JPEG XT marker segments representing this box,
regardless of whether the marker segment starts this box, or continues a box started by a former JPEG
XT Marker segment.
The Box Type TBox is a 32-bit field that specifies the type of the payload data, and thus its syntax. Box
types are specified in Annex B and in subsequent parts of the ISO/IEC 18477 family of standards. Since
ITU | ISO/IEC may add additional box types that define additional meta-information on the image later,
decoders shall disregard box types they do not understand.
If the box length is larger than 2^32 - 1 bytes, the LBox field is no longer sufficient to encode the box length
and the XLBox field is required additionally. In this case, the LBox field shall be one and the XLBox field
carries the box size instead. If the box length is larger than 2^32 - 1 bytes, the XLBox field shall be present in all
JPEG XT marker segments of the same box type and same Box Instance Number, and its value shall be
identical in all JPEG XT marker segments of the same Box Type and same Box Instance Number.
The payload data carries the contents of the box. Its syntax is specified along with the corresponding
box types in this Annex.
Profiles defined in subsequent parts of the ISO/IEC 18477 family of standards add additional constraints
in how payload data may be broken up into individual JPEG XT marker segments.
Table A.2 — JPEG XT marker parameters and sizes
| Parameter | Size (bits) | Value | Meaning |
|---|---|---|---|
| APP11 | 16 | 0xFFEB | Identifies all JPEG XT Marker Segments. |
| Le | 16 | 8..65535 | Length of the marker segment, including the size itself, all parameters, and the size of the payload data contained in this marker segment alone. Does not include the marker itself. |
| CI | 16 | 0x4A50 (ASCII encoding of “JP”) | The special value 0x4A50 (ASCII: ‘J’ ‘P’) allows readers to distinguish the JPEG Extensions marker segment from other uses of the APP11 marker. Readers shall ignore APP11 markers for the purpose of decoding JPEG extensions if this value does not match. |
| En | 16 | 1..65535 | The Box Instance Number disambiguates payload data of the same box type and defines which payload data is to be concatenated. Only payload data whose box type and enumerator is identical shall be concatenated. The value 0 is reserved for ITU \| ISO/IEC purposes. |
| Z | 32 | 1..2^32 - 1 | Packet Sequence number defining the order in which the payload data shall be concatenated. Concatenation shall proceed in order of increasing Z values. The value 0 is reserved for ITU \| ISO/IEC purposes. |
| LBox | 32 | 1 or 8..2^32 - 1 | Box length. This is the total length of the concatenated payload data, including a single copy of the LBox and TBox field, and a single copy of the XLBox field, if present. The values 0 and two to seven are reserved for ITU \| ISO/IEC purposes and shall not be used. |
| TBox | 32 | 0..2^32 - 1 | Box type. The box type defines the syntax of the concatenated payload data. Also, the box type and the box instance number specify which payload data to merge. |
| XLBox | 0 or 64 | 16..2^64 - 1 | If the LBox field is one, this field contains the size of the concatenated payload data plus the box overhead instead. Otherwise, this field is missing. The values 0 to 15 are reserved for ITU \| ISO/IEC purposes. |
| Payload Data | Varies | Varies | The syntax of the concatenated payload data is defined in Annex B and subsequent members of the ISO/IEC 18477 family of standards. |
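The field layout of Table A.2 can be illustrated by a minimal Python parser for a single marker segment. This is a sketch under stated assumptions: fields are taken to be big-endian as in Rec. ITU-T T.81 | ISO/IEC 10918-1, the field order Le, CI, En, Z, LBox, TBox, XLBox is taken from the order of the table rows, and the input is assumed to start at the Le field. The names `XTSegment` and `parse_segment` and the box type `TEST` are hypothetical.

```python
import struct
from typing import NamedTuple


class XTSegment(NamedTuple):
    en: int          # Box Instance Number
    z: int           # Packet Sequence Number
    box_length: int  # logical box length (from LBox, or XLBox if LBox == 1)
    tbox: bytes      # four-byte Box Type
    payload: bytes   # payload bytes carried by this segment alone


def parse_segment(seg: bytes) -> XTSegment:
    """Parse one JPEG XT marker segment; seg starts at Le, after the marker."""
    le, ci, en, z, lbox = struct.unpack_from(">HHHII", seg, 0)
    if ci != 0x4A50:
        raise ValueError("not a JPEG XT marker segment")
    tbox = seg[14:18]
    offset = 18
    box_length = lbox
    if lbox == 1:  # XLBox carries the box length instead
        (box_length,) = struct.unpack_from(">Q", seg, offset)
        offset += 8
    return XTSegment(en, z, box_length, tbox, seg[offset:le])


# First half of a 32-byte payload: Le = 34, box length = 40.
example = struct.pack(">HHHII", 34, 0x4A50, 1, 1, 40) + b"TEST" + bytes(16)
parsed = parse_segment(example)
```

The sketch does not reject reserved values such as En = 0 or Z = 0; a conforming reader would need additional checks.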
NOTE The size of the XLBox field itself also contributes to the box length, hence creating a corner case for
boxes larger than 4GB. If an encoder detects that the value of the LBox field, computed as the sum of the payload
data size and the box overhead, overruns the 4GB boundary LBox is able to express, it is not sufficient to create
an XLBox field and store the sum there. The box size needs to be enlarged by the size of the XLBox field as well,
namely by eight bytes.
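The corner case described in this note can be sketched in Python as follows. The function name `length_fields` is hypothetical; it returns the LBox value and, where required, the XLBox value for a given payload size.

```python
LBOX_MAX = 0xFFFFFFFF  # largest value a 32-bit LBox field can express


def length_fields(payload_size: int):
    """Return (lbox, xlbox); xlbox is None when the XLBox field is absent."""
    size = payload_size + 4 + 4  # payload plus LBox and TBox
    if size <= LBOX_MAX:
        return size, None
    # LBox == 1 signals XLBox; the eight bytes of XLBox count towards the length.
    return 1, size + 8


small = length_fields(32)
largest_without_xlbox = length_fields(LBOX_MAX - 8)
smallest_with_xlbox = length_fields(LBOX_MAX - 7)
```

For the last call the payload plus LBox and TBox is exactly 2^32 bytes, so the XLBox value becomes 2^32 + 8.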
A.5 Boxes and superboxes
Some boxes may carry other boxes as payload data. Such boxes are denoted as superboxes. The payload
size of a superbox is given by the sum of the box lengths of all the boxes it contains.
Boxes within superboxes are not encapsulated in JPEG XT marker segments: neither a marker, a marker size, a Common
Identifier, a Box Instance Number nor a Packet Sequence Number shall be present. They start
with the LBox field. The additional fields are not required since their composition from markers into
boxes is unambiguous.
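A reader could walk the boxes inside a superbox payload roughly as in the following Python sketch. It assumes big-endian length fields and that the merged superbox payload is available as one byte string; the function name `iter_child_boxes` and the box types `AAAA` and `BBBB` are hypothetical.

```python
import struct


def iter_child_boxes(payload: bytes):
    """Yield (tbox, body) for each box contained in a superbox payload."""
    pos = 0
    while pos + 8 <= len(payload):
        (lbox,) = struct.unpack_from(">I", payload, pos)
        tbox = payload[pos + 4 : pos + 8]
        header, length = 8, lbox
        if lbox == 1:  # XLBox follows TBox and carries the box length
            (length,) = struct.unpack_from(">Q", payload, pos + 8)
            header = 16
        if length < header or pos + length > len(payload):
            raise ValueError("malformed box length")
        yield tbox, payload[pos + header : pos + length]
        pos += length


# A superbox payload holding a 3-byte box followed by an empty box.
inner = struct.pack(">I", 8 + 3) + b"AAAA" + b"abc" + struct.pack(">I", 8) + b"BBBB"
children = list(iter_child_boxes(inner))
```

Because each contained box starts with its own LBox field, the walk needs no Box Instance Number or Packet Sequence Number.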
NOTE The length of a box within a superbox is derived in the same way from the size of the payload data
as for top-level boxes within JP
...


INTERNATIONAL STANDARD ISO/IEC 18477-3
First edition 2015-12-15
Information technology — Scalable
compression and coding of
continuous-tone still images —
Part 3:
Box file format
Technologies de l’information — Compression échelonnable et codage
d’images plates en ton continu —
Partie 3: Format de la liste de fichiers
© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions, abbreviated terms and symbols
3.1 Terms and definitions
3.2 Symbols
3.3 Abbreviated terms
4 Conventions
4.1 Conformance language
4.2 Operators
4.2.1 Arithmetic operators
4.2.2 Logical operators
4.2.3 Relational operators
4.2.4 Precedence order of operators
4.2.5 Mathematical functions
5 General
5.1 High level overview on JPEG XT ISO/IEC 18477-3
5.2 Encoder requirements
5.3 Decoder requirements
Annex A (normative) JPEG XT marker segment
Annex B (normative) Common box types
Annex C (normative) Point transformation
Annex D (normative) Checksum computation
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29, Coding of
audio, picture, multimedia and hypermedia information.
ISO/IEC 18477 contains the following parts under the general title Information technology — Scalable
compression and coding of continuous-tone still images:
— Part 1: Scalable compression and coding of continuous-tone still images
— Part 2: Extensions for high dynamic range images
— Part 3: Box file format
— Part 6: IDR Integer Coding
— Part 7: HDR Floating-Point Coding
— Part 8: Lossless and Near-lossless Coding
— Part 9: Alpha Channel Coding
The following parts are under preparation:
— Part 4: Conformance testing
— Part 5: Reference software

Introduction
This part of ISO/IEC 18477 specifies an extensible file format, denoted as JPEG XT, which is built on top
of the existing Rec. ITU-T T.81 | ISO/IEC 10918-1 codestream definition. While typically file formats
encapsulate codestreams by means of additional syntax elements such as boxes, the file format
structure specified here rather embeds the syntax elements of the file format, called boxes, into the
codestream. The necessity for this unusual arrangement is the backwards compatibility to the legacy
standard and the application toolchain built around it; that is, legacy applications conforming to Rec.
ITU-T T.81 | ISO/IEC 10918-1 will be able to decode image information embedded in files conforming to
the family of ISO/IEC 18477 standards, though will only be able to recover a three component, 8 bits per
sample, lower quality version of the image described by the full file.
For more demanding applications, it is not uncommon to use a bit depth of 16, providing 65 536
representable values to describe each channel within a pixel, resulting in over 2,8 × 10^14 representable
colour values. In some less common scenarios, even greater bit depths are used, and sometimes the
dynamic range of the image is so high that a floating point based encoding is desirable. In addition to
image information, some applications also require an additional opacity channel, a feature not available
from the legacy standard.
Most common photo and image formats use an 8-bit or 16-bit unsigned integer value to represent some
function of the intensity of each colour channel. While it might be theoretically possible to agree on
one method for assigning specific numerical values to real world colours, doing so is not practical.
Since any specific device has its own limited range for colour reproduction, the device’s range may be a
small portion of the agreed-upon universal colour range. As a result, such an approach is an extremely
inefficient use of the available numerical values, especially when using only 8 bits (or 256 unique
values) per channel. To represent pixel values as efficiently as possible, devices use a numeric encoding
optimized for their own range of possible colours or gamut.
JPEG XT is designed to extend the legacy JPEG standard towards higher bitdepth, higher dynamic range,
wide colour gamut content while simultaneously allowing legacy applications to decode the image data in
the codestream to a standard low dynamic range image represented by only eight bits per channel. The
goal is to provide a backwards compatible coding specification that allows legacy applications and existing
toolchains to continue to operate on codestreams conforming to the family of ISO/IEC 18477 standards.
JPEG XT has been designed to be backwards compatible to legacy applications while at the same time
having a small coding complexity; JPEG XT uses, whenever possible, functional blocks of Rec. ITU-T T.81
| ISO/IEC 10918-1 to extend the functionality of the legacy JPEG Coding System.
This part of ISO/IEC 18477 is an extension of ISO/IEC 18477-1, a compression system for continuous
tone digital still images which is backwards compatible with Rec. ITU-T T.81 | ISO/IEC 10918-1. That is,
legacy applications conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1 will be able to reconstruct streams
generated by an encoder conforming to this part of ISO/IEC 18477, though will possibly not be able to
reconstruct such streams in full dynamic range, full quality or other features defined in this part of
ISO/IEC 18477.
The aim of this part of ISO/IEC 18477 is to provide a flexible and extensible framework to enrich
ISO/IEC 18477-1 compliant codestreams with side-channels and metadata. The syntax chosen in this
part of ISO/IEC 18477 defines a mechanism to embed syntax elements denoted as “Boxes” into Rec.
ITU-T T.81 | ISO/IEC 10918-1 compliant codestreams. The box syntax used here is identical to that
defined in the JPEG family of standards, for example JPEG 2000 (Rec. ITU-T T.800 | ISO/IEC 15444-1).
Boxes will then carry either additional image data, to enable encoding of images of higher bitdepth, high
dynamic range, include alpha channels etc., or will carry metadata that describes the decoding process
of the legacy Rec. ITU-T T.81 | ISO/IEC 10918-1 codestream and the side channels to an extended or
high dynamic range image.

INTERNATIONAL STANDARD ISO/IEC 18477-3:2015(E)
Information technology — Scalable compression and
coding of continuous-tone still images —
Part 3:
Box file format
1 Scope
This part of ISO/IEC 18477 specifies a coding format, referred to as JPEG XT, which is designed primarily
for continuous-tone photographic content.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS)
ISO/IEC 18477-1:2015, Information technology — Scalable compression and coding of continuous-tone still
images — Part 1: Scalable compression and coding of continuous-tone still images
Rec. ITU-T T.81 | ISO/IEC 10918-1, Information Technology — Digital Compression and Coding of
Continuous Tone Still Images – Requirements and Guidelines
Rec. ITU-T T.871 | ISO/IEC 10918-5, Information technology — Digital compression and coding of
continuous-tone still images: JPEG File Interchange Format
3 Terms, definitions, abbreviated terms and symbols
3.1 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1.1
ASCII encoding
encoding of text characters and text strings according to ISO/IEC 10646
3.1.2
base decoding path
process of decoding legacy codestream and refinement data to the base image, jointly with all further
steps until residual data is added to the values obtained from the residual codestream
3.1.3
base image
collection of sample values obtained by entropy decoding the DCT coefficients of the legacy codestream
and the refinement codestream, and inversely DCT transforming them jointly
3.1.4
binary decision
choice between two alternatives
3.1.5
bit stream
partially encoded or decoded sequence of bits comprising an entropy-coded segment
3.1.6
block
8 × 8 array of samples or an 8 × 8 array of DCT coefficient values of one component
3.1.7
box
structured collection of data describing the image or the image decoding process embedded into one or
multiple APP marker segments
Note 1 to entry: See Annex A for the definition of boxes
3.1.8
byte
group of 8 bits
3.1.9
coder
embodiment of a coding process
3.1.10
coding
encoding or decoding
3.1.11
coding model
procedure used to convert input data into symbols to be coded
3.1.12
(coding) process
general term for referring to an encoding process, a decoding process, or both
3.1.13
compression
reduction in the number of bits used to represent source image data
3.1.14
component
two-dimensional array of samples having the same designation in the output or display device. An
image typically consists of several components, e. g. red, green and blue
3.1.15
continuous tone image
image whose components have more than one bit per sample
3.1.16
decoder
embodiment of a decoding process
3.1.17
decoding process
process which takes as its input compressed image data and outputs a continuous-tone image
3.1.18
encoder
embodiment of an encoding process
3.1.19
encoding process
process which takes as its input a continuous-tone image and outputs compressed image data
3.1.20
entropy decoder
embodiment of an entropy decoding procedure
3.1.21
entropy decoding
lossless procedure which recovers the sequence of symbols from the sequence of bits produced by the
entropy encoder
3.1.22
entropy encoder
embodiment of an entropy encoding procedure
3.1.23
entropy encoding
lossless procedure which converts a sequence of input symbols into a sequence of bits such that the
average number of bits per symbol approaches the entropy of the input symbols
3.1.24
high dynamic range
image or image data comprised of more than eight bits per sample
3.1.25
Intermediate dynamic range
image or image data comprised of more than eight bits per sample
3.1.26
Joint Photographic Experts Group
JPEG
informal name of the committee which created this part of ISO/IEC 18477
Note 1 to entry: The “joint” comes from the ITU and ISO/IEC collaboration.
3.1.27
legacy codestream
collection of markers and syntax elements defined by Rec. ITU-T T.81 | ISO/IEC 10918-1, excluding any syntax
elements defined by the ISO/IEC 18477 family of standards
Note 1 to entry: That is, the legacy codestream consists of the collection of all markers except those APP
markers that describe JPEG XT boxes by the syntax defined in Annex A.
3.1.28
legacy decoding path
collection of operations to be performed on the entropy coded data as described by Rec. ITU-T T.81 |
ISO/IEC 10918-1 jointly with the Legacy Refinement scans before this data is merged with the residual
data to form the final output image
3.1.29
legacy decoder
embodiment of a decoding process conforming to Rec. ITU-T T.81 | ISO/IEC 10918-1, confined to the
lossy DCT process and the baseline, sequential or progressive modes, decoding at most four components
to eight bits per component
3.1.30
legacy image
arrangement of sample values as described by applying the decoding process described by Rec.
ITU-T T.81 | ISO/IEC 10918-1 on the entropy coded data as defined by said standard
3.1.31
lossless
descriptive term for encoding and decoding processes and procedures in which the output of the
decoding procedure(s) is identical to the input to the encoding procedure(s)
3.1.32
lossy
descriptive term for encoding and decoding processes which are not lossless
3.1.33
low dynamic range
image or image data comprised of data with no more than eight bits per sample
3.1.34
marker
two-byte code in which the first byte is hexadecimal FF and the second byte is a value between 1 and
hexadecimal FE
3.1.35
marker segment
marker together with its associated set of parameters
3.1.36
pixel
collection of sample values in the spatial image domain having all the same sample coordinates, e. g. a
pixel may consist of three samples describing its red, green and blue value
3.1.37
point transform
scaling of a sample or DCT coefficient by a factor
3.1.38
precision
number of bits allocated to a particular sample or DCT coefficient
3.1.39
procedure
set of steps which accomplishes one of the tasks which comprise an encoding or decoding process
3.1.40
residual decoding path
collection of operations applied to the entropy coded data contained in the residual data box and
residual refinement scan boxes up to the point where this data is merged with the base image to form
the final output image
3.1.41
residual image
extension image
sample values as reconstructed by inverse quantization and inverse DCT transformation applied to the
entropy-decoded coefficients described by the residual scan and residual refinement scans
3.1.42
residual scan
additional pass over the image data invisible to legacy decoders which provides additive and/or
multiplicative correction data of the base scans to allow reproduction of high dynamic range or wide
colour gamut data
3.1.43
refinement scan
additional pass over the image data invisible to legacy decoders which provides additional least
significant bits to extend the precision of the DCT transformed coefficients. Refinement scans can be
either applied in the base or residual decoding path
3.1.44
sample
one element in the two-dimensional image array which comprises a component
3.1.45
sample grid
common coordinate system for all samples of an image
Note 1 to entry: The samples at the top left edge of the image have the coordinates (0, 0), the first coordinate
increases towards the right, the second towards the bottom.
3.1.46
superbox
box that carries other boxes as payload data
3.1.47
zero byte
0x00 byte
3.1.48
zig-zag sequence
specific sequential ordering of the DCT coefficients from (approximately) lowest spatial frequency to
highest
3.2 Symbols
X width of the sample grid in positions
Y height of the sample grid in positions
Nf number of components in an image
s_{i,x} subsampling factor of component i in horizontal direction
s_{i,y} subsampling factor of component i in vertical direction
H_i subsampling indicator of component i in the frame header
V_i subsampling indicator of component i in the frame header
v_{x,y} sample value at the sample grid position x, y
R_h additional number of DCT coefficient bits represented by refinement scans in the base decoding path, 8+R_h is the number of non-fractional bits (i.e. bits in front of the “binary dot”) of the output of the inverse DCT process in the base decoding path
R_r additional number of DCT coefficient bits represented by refinement scans in the residual decoding path. P+R_r is the number of non-fractional bits of the output of the inverse DCT process in the residual decoding path, where P is the frame-precision of the residual image as recorded in the frame header of the residual codestream
R_b additional bits in the HDR image. 8+R_b is the sample precision of the reconstructed HDR image
3.3 Abbreviated terms
For the purposes of this part of ISO/IEC 18477, the following abbreviations apply.
ASCII American Standard Code for Information Interchange
LSB Least Significant Bit
MSB Most Significant Bit
HDR High Dynamic Range
IDR Intermediate Dynamic Range
LDR Low Dynamic Range
TMO Tone Mapping Operator
DCT Discrete Cosine Transformation
4 Conventions
4.1 Conformance language
This part of ISO/IEC 18477 consists of normative and informative text.
Normative text is that text which expresses mandatory requirements. The word “shall” is used to express
mandatory requirements strictly to be followed in order to conform to this part of ISO/IEC 18477 and
from which no deviation is permitted. A conforming implementation is one that fulfils all mandatory
requirements.
Informative text is text that is potentially helpful to the user, but not indispensable and can be removed,
changed or added editorially without affecting interoperability. All text in this part of ISO/IEC 18477
is normative, with the following exceptions: the Introduction, any parts of the text that are explicitly
labelled as “informative”, and statements appearing with the preamble “NOTE” and behaviour described
using the word “should”. The word “should” is used to describe behaviour that is encouraged but is not
required for conformance to this part of ISO/IEC 18477.
The keywords “may” and “need not” indicate a course of action that is permissible in a conforming
implementation.
The keyword “reserved” indicates a provision that is not specified at this time, shall not be used, and
may be specified in the future. The keyword “forbidden” indicates “reserved” and in addition indicates
that the provision will never be specified in the future.
4.2 Operators
NOTE Many of the operators used in this part of ISO/IEC 18477 are similar to those used in the C
programming language.
4.2.1 Arithmetic operators
+ Addition
− Subtraction (as a binary operator) or negation (as a unary prefix operator)
* Multiplication
/ Division without truncation or rounding.
umod x umod a is the unique value y between 0 and a-1
for which y+Na = x with a suitable integer N.
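As an informal illustration of the umod operator, the following Python sketch relies on the fact that Python's `%` operator already returns a non-negative remainder for a positive modulus, whereas the C `%` operator may return a negative value for a negative left operand. The function name `umod` mirrors the operator and is otherwise hypothetical.

```python
def umod(x: int, a: int) -> int:
    """Return the unique y in [0, a-1] with y + N*a == x for some integer N."""
    if a <= 0:
        raise ValueError("modulus must be positive")
    return x % a


wrapped = umod(-1, 8)   # 7, since 7 + (-1)*8 == -1
plain = umod(10, 8)     # 2, since 2 + 1*8 == 10
```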
4.2.2 Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
∈ x ∈ {A, B} is defined as (x == A || x == B)
∉ x ∉ {A, B} is defined as (x != A && x != B)
4.2.3 Relational operators
> Greater than
>= Greater than or equal to
< Less than
<= Less than or equal to
== Equal to
!= Not equal to
4.2.4 Precedence order of operators
Operators are listed below in descending order of precedence. If several operators appear in the same
line, they have equal precedence. When several operators of equal precedence appear at the same level
in an expression, evaluation proceeds according to the associativity of the operator either from right to
left or from left to right.
Operators Type of operation Associativity
(), [ ], . Expression Left to Right
− Unary negation
*, / Multiplication Left to Right
umod Modulo (remainder) Left to Right
+, − Addition and Subtraction Left to Right
< , >, <=, >= Relational Left to Right
4.2.5 Mathematical functions
⌈x⌉ Ceil of x. Returns the smallest integer that is greater than or equal to x.
⌊x⌋ Floor of x. Returns the largest integer that is less than or equal to x.
|x| Absolute value, is –x for x < 0, otherwise x.
sign(x) Sign of x, 0 if x is 0, +1 if x is positive, -1 if x is negative.
clamp(x,min,max) Clamps x to the range [min, max]: returns min if x < min, max if x > max or
otherwise x.
x^a Raises the value of x to the power of a. x is a non-negative real number, a is a real number. x^a is equal to exp(a×log(x)) where exp is the exponential function and log() the natural logarithm. If x is 0 and a is positive, x^a is defined to be 0.
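The mathematical functions above can be sketched in Python as follows. Ceil and floor map to `math.ceil` and `math.floor`; the names `sign`, `clamp` and `power` are hypothetical helpers written to follow the definitions given here. The case of x equal to 0 with a non-positive exponent is not defined above and is left to raise an error.

```python
import math


def sign(x):
    """0 if x is 0, +1 if x is positive, -1 if x is negative."""
    return (x > 0) - (x < 0)


def clamp(x, lo, hi):
    """Return lo if x < lo, hi if x > hi, otherwise x."""
    return lo if x < lo else hi if x > hi else x


def power(x: float, a: float) -> float:
    """x raised to a for non-negative x, computed as exp(a * log(x))."""
    if x == 0 and a > 0:
        return 0.0
    return math.exp(a * math.log(x))


clamped = clamp(300, 0, 255)   # 255
rounded_up = math.ceil(2.1)    # 3
rounded_down = math.floor(-2.1)  # -3
```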
5 General
The purpose of this Clause is to give an informative overview of the elements specified in this part of
ISO/IEC 18477. Another purpose is to introduce many of the terms which are defined in Clause 3. These
terms are printed in italics upon first usage in this Clause.
There are three elements specified in this part of ISO/IEC 18477:
a) An encoder is an embodiment of an encoding process. An encoder takes as input digital source
image data and encoder specifications, and by means of a specified set of procedures generates as
output a codestream.
b) A decoder is an embodiment of a decoding process. A decoder takes as input a codestream, and by
means of a specified set of procedures generates as output digital reconstructed image data.
c) The codestream is a compressed image data representation which includes all necessary data to
allow a (full or approximate) reconstruction of the sample values of a digital image. Additional data
might be required that define the interpretation of the sample data, such as colour space or the
spatial dimensions of the samples.
5.1 High level overview on JPEG XT ISO/IEC 18477-3
The high-level syntax of an ISO/IEC 18477-3 compliant codestream is identical to that defined in
ISO/IEC 18477-1, which is a subset of the syntax defined in Rec. ITU-T T.81 | ISO/IEC 10918-1. Marker
definitions and the syntax of the markers defined in the above Recommendation remain in force and
unchanged. However, this part of ISO/IEC 18477 defines the APP11 marker, reserved in the legacy
Recommendation | Standard, for encoding additional syntax elements. Legacy decoders will skip and
ignore such marker segments, and hence will only be able to decode the image encoded by the legacy
syntax elements. This part of the codestream is denoted the legacy codestream in
the following.
This part of ISO/IEC 18477 extends the legacy standard by a syntax element called “Box”, using
the APP11 marker to hide the extended syntax elements from legacy applications. Boxes and their
encoding are specified in Annex A. A common set of boxes used by all subsequent parts of the family
of ISO/IEC 18477 standards is defined in Annex B. A box may either include additional metadata
required to decode the complete codestream to full precision, full dynamic range or without loss, or
may contain entropy coded image data itself.
How entropy coded data from the side-channels contained in the boxes and entropy coded data in the
legacy codestream are merged together is application dependent and defined in subsequent parts of the
ISO/IEC 18477 standards family. It is beyond the scope of this part of ISO/IEC 18477 to define this process.
5.2 Encoder requirements
An encoder is only required to meet the compliance tests and to generate the codestream according to
the syntax defined in this part of ISO/IEC 18477. How the codestream is algorithmically constructed and
how the boxes are laid out is implementation specific and not within scope of this part of ISO/IEC 18477.
Subsequent Recommendations | Standards of the ISO/IEC 18477 family may, however, define additional
restrictions and requirements, either within the standard itself, or within profiles that restrict the
freedom of the encoder further.
An encoder claiming to be compliant to one of these profiles then shall conform to the syntax constraints
defined in the corresponding profile of the corresponding part of ISO/IEC 18477.
5.3 Decoder requirements
A decoding process converts compressed image data to reconstructed image data. A decoder shall
correctly interpret the syntax of the box structures, namely the packaging of boxes into APP11 marker
segments specified in Annex A. It is not required, though, that a conforming decoder is capable of interpreting
the semantics of all box types defined in this or subsequent members of the ISO/IEC 18477 family of
standards. A decoder implementation should skip over boxes it is unable or not willing to support
unless such a box is indicated as a mandatory box in the profile and part of ISO/IEC 18477 the decoder
claims to be compliant with.
Annex A
(normative)
JPEG XT marker segment
A.1 General
This Annex extends the compressed bitstream syntax of ISO/IEC 18477-1:2015, Annex B by introducing
additional markers and marker segments carrying side channel and coding parameters that control the
decoding process. While the corresponding decoding processes are specified in subsequent parts of
the ISO/IEC 18477 family of standards, this Annex defines a generic mechanism by which such syntax
elements are embedded into ISO/IEC 18477-1 compliant files.
The syntax element and building block defined in this Annex is called a Box. This part of ISO/IEC 18477
defines several types of boxes; the definition of each specific box type defines the kind of information
that may be found within a box of that type. Some boxes will be defined to contain other boxes. Box types
are specified in Annex B, or in subsequent members of the ISO/IEC 18477 family of standards.
Boxes are, unlike in other Recommendations | International Standards, not top-level syntax elements,
but are themselves wrapped in JPEG XT marker segments introduced in A.2. Since boxes may logically carry
more than 64K (65536) bytes of payload data, but marker segments can at most carry 64K of data, a single
logical box may need to be broken up into several marker segments. Syntax elements within the marker
segment then instruct the decoder how to put the contents in the marker segment back into a single box.
Additionally, a JPEG XT file may contain several boxes of the same box type, though with differing
content. The syntax of the marker segment provides a mechanism to distinguish between two logically
different boxes of the same box type.
A.2 Marker assignments
The following additional marker is defined in this part of ISO/IEC 18477:
Table A.1 — Additional markers and marker segments
Code Assignment  Symbol  Description     Defined in
0xFFEB           APP11   JPEG XT Marker  This part of ISO/IEC 18477
Each box is encapsulated in at least one JPEG XT marker segment, and may extend over several marker
segments if the size of its payload data exceeds the capacity of the JPEG XT marker. See A.4 for how to
merge JPEG XT marker segments to logical boxes.
A.3 Codestream syntax
The high-level syntax of ISO/IEC 18477-3 codestreams shall follow the syntax specified in the
ISO/IEC 18477-1 standard, which is a subset of Rec. ITU-T T.81 | ISO/IEC 10918-1. Specifically, since JPEG
XT boxes are represented by APP11 marker segments, ISO/IEC 18477-1 conforming implementations
that do not implement this or any subsequent Recommendation | International Standard of the
ISO/IEC 18477 family will ignore them.
NOTE By the above paragraph, byte stuffing and padding as defined in Rec. ITU-T T.81 |
ISO/IEC 10918-1 also apply to entropy coded data contained in APP11 marker segments. Due to the
segmentation of entropy coded data into application marker segments, it may happen that the last byte of an APP11
marker segment is 0xff, and that the corresponding “stuffed” zero byte is part of a subsequent application marker
segment. This does not cause a problem for legacy decoders since they are required to skip over unknown
application marker segments in the first place, without interpreting their content.
A.4 JPEG XT boxes
JPEG XT structures any additional data that remains invisible to legacy decoders in JPEG XT boxes. A
box is a generic data container that has both a type, and a body that carries its actual payload. The type
is a four-byte identifier that allows decoders to identify its purpose and the structure of its content. A
JPEG XT file may also carry several boxes of identical type. These boxes are logically distinct and differ
in the value of the Box Instance Number En of the JPEG XT marker segment, see Figure A.1.
Boxes are embedded into the codestream format by encapsulating them into one or several JPEG XT
marker segments. Since boxes can grow large in size, a single box may extend over multiple JPEG XT
marker segments, and decoders may have to merge multiple marker segments before they can attempt
to decode the box content. JPEG XT Marker segments that belong to the same logical box and require
merging prior to interpretation have identical Box Instance Number fields En, but differ in the
Packet Sequence Number Z.
The JPEG XT marker segment consists of the APP11 marker that is reserved for this part of ISO/IEC 18477,
the size of the marker segment in bytes (not including the marker), a common identifier identical for all
boxes and box types, the box instance number field, the packet sequence number field, the box length,
the box type and the actual box payload data. The box length field can be extended by a Box Length
Extension field that allows box sizes beyond 2^32-1 bytes. Figure A.1 depicts the high-level syntax of a
JPEG XT marker segment.
Figure A.1 — Organization of the JPEG XT marker segment
The meaning of the fields of the JPEG XT Marker segment is as follows:
The Le field is the size of the marker segment, not including the marker. It measures the size from the Le
field up to the end of the marker segment.
NOTE Since boxes may extend over several marker segments, the Le field is typically not derived from the
Box Length field and care must be taken not to confuse the two. The Le field defines the amount of data carried
by a single marker segment; the Box Length is the logical size of the box. If a box extends over multiple JPEG XT
marker segments, the Le field measures the total size of each individual marker segment and may
differ from segment to segment, whereas the Box Length field remains identical in all segments that contribute
to the same logical box.
The Common Identifier is a 16-bit field that allows decoders to identify an APP11 marker segment as a
JPEG XT marker segment. Its value shall be 0x4A50. It is identical for all boxes and all box types.
The Box Instance Number is a 16-bit field that disambiguates between JPEG XT marker segments
carrying boxes of identical type, but differing content. That is, data that belongs to logically distinct
boxes with the same box type differ in their Box Instance Number. Encoders shall concatenate the
payload data of those JPEG XT marker segments whose Box Instance Number and Type Identifier fields
are identical in the order of increasing Packet Sequence Numbers.
NOTE A codestream containing multiple boxes of the same box type uses the box instance number field to
instruct the decoder which JPEG XT marker segments to merge into one box. Refinement coding makes
use of this process: the entropy coded data of each refinement scan is placed into its individual box, using the
box instance number field to disambiguate the scans.
The Packet Sequence Number is a 32-bit field that specifies the order in which payload data shall be
merged. Concatenation proceeds in the order of increasing Packet Sequence Numbers.
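The merging rule can be sketched as follows. This is a minimal illustration, not a normative procedure: it assumes the marker segments have already been parsed into hypothetical `(en, tbox, z, payload)` tuples, and the function name `merge_segments` is chosen for this sketch only.

```python
from collections import defaultdict

def merge_segments(segments):
    """Group segments by (Box Instance Number, Box Type) and join payloads by ascending Z.

    `segments` is an iterable of (en, tbox, z, payload) tuples.
    Returns a dict mapping (en, tbox) to the concatenated payload bytes.
    """
    groups = defaultdict(list)
    for en, tbox, z, payload in segments:
        groups[(en, tbox)].append((z, payload))
    return {
        key: b"".join(payload for _, payload in sorted(parts, key=lambda p: p[0]))
        for key, parts in groups.items()
    }
```

Segments may therefore appear in any order in the input of this sketch; only the Packet Sequence Number decides the order of concatenation.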
The Box Length LBox is a four-byte field that specifies the box length. It measures the size of the
payload data of all JPEG XT marker segments of the same box type and box instance number combined, plus the
size of a single copy of the Box Type, plus the size of a single copy of the Box Length, plus the length of a
single copy of the Box Length Extension if present. The box length does not include the size of the packet
sequence number, the box instance number, the common identifier, the marker length or the marker.
NOTE A box having a payload data of 32 bytes will, by this, have a box length of 32+4+4 = 40. If this box is split
evenly over two JPEG XT marker segments, each marker segment will have a Le value of 2+2+2+4+(4+4+16) = 34.
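The two quantities in this NOTE can be checked by summing the field sizes directly. The sketch below uses the field sizes given in this Annex; the function names `box_length` and `segment_le` are illustrative only.

```python
LE_SIZE, CI_SIZE, EN_SIZE, Z_SIZE = 2, 2, 2, 4   # per-segment fields, in bytes
LBOX_SIZE, TBOX_SIZE, XLBOX_SIZE = 4, 4, 8       # box header fields, in bytes

def box_length(payload_size, has_xlbox=False):
    """Logical box length: payload plus one copy of LBox, TBox and, if present, XLBox."""
    return payload_size + LBOX_SIZE + TBOX_SIZE + (XLBOX_SIZE if has_xlbox else 0)

def segment_le(chunk_size, has_xlbox=False):
    """Le of one marker segment carrying chunk_size payload bytes (marker excluded)."""
    header = LE_SIZE + CI_SIZE + EN_SIZE + Z_SIZE + LBOX_SIZE + TBOX_SIZE
    return header + (XLBOX_SIZE if has_xlbox else 0) + chunk_size

print(box_length(32))   # logical length of a box with 32 bytes of payload
print(segment_le(16))   # Le of each segment when that payload is split evenly in two
```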
If the size of the box payload is less than 2^32-8 bytes, then all fields except the XLBox field, that is:
Le, CI, En, Z, LBox and TBox, shall be present in all JPEG XT marker segments representing this box,
regardless of whether the marker segment starts this box, or continues a box started by a former JPEG
XT marker segment.
The Box Type TBox is a 32-bit field that specifies the type of the payload data, and thus its syntax. Box
types are specified in Annex B and in subsequent parts of the ISO/IEC 18477 family of standards. Since
ITU | ISO/IEC may add additional box types that define additional meta-information on the image later,
decoders shall disregard box types they do not understand.
If the box length is larger than 2^32-1 bytes, the LBox field is no longer sufficient to encode the box length
and the XLBox field is required additionally. In this case, the LBox field shall be one and the XLBox field
carries the box size instead. If the box length is larger than 2^32-1 bytes, the XLBox field shall be present in all
JPEG XT marker segments of the same Box Type and same Box Instance Number, and its value shall be
identical in all of these marker segments.
The payload data carries the contents of the box. Its syntax is specified along with the corresponding
box types in this Annex.
Profiles defined in subsequent parts of the ISO/IEC 18477 family of standards add additional constraints
in how payload data may be broken up into individual JPEG XT marker segments.
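The field layout described above can be sketched as a small parser for a single marker segment. This is an illustration under stated assumptions, not a reference implementation: all multi-byte fields are read as big-endian, which is the convention for marker segment lengths in Rec. ITU-T T.81 | ISO/IEC 10918-1 and is assumed here for the box fields as well; the names `Segment` and `parse_segment` are hypothetical.

```python
import struct
from typing import NamedTuple

APP11 = 0xFFEB   # marker code from Table A.1
CI_JP = 0x4A50   # Common Identifier, ASCII "JP"

class Segment(NamedTuple):
    en: int          # Box Instance Number
    z: int           # Packet Sequence Number
    box_length: int  # logical box length (LBox, or XLBox if LBox == 1)
    tbox: bytes      # four-byte box type
    payload: bytes   # payload bytes carried by this segment alone
    end: int         # offset of the first byte after this segment

def parse_segment(buf: bytes, pos: int = 0):
    """Parse one marker segment at pos; return None if it is not a JPEG XT segment."""
    marker, le = struct.unpack_from(">HH", buf, pos)
    if marker != APP11:
        return None
    end = pos + 2 + le                     # Le counts from the Le field itself
    ci, en, z, lbox = struct.unpack_from(">HHII", buf, pos + 4)
    if ci != CI_JP:
        return None                        # some other use of the same marker
    tbox = buf[pos + 16:pos + 20]
    body = pos + 20
    box_length = lbox
    if lbox == 1:                          # XLBox carries the length instead
        (box_length,) = struct.unpack_from(">Q", buf, body)
        body += 8
    return Segment(en, z, box_length, tbox, buf[body:end], end)
```

The sketch does not validate reserved values or truncated input; a production reader would need both.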
Table A.2 — JPEG XT marker parameters and sizes
Parameter     Size (bits)  Value            Meaning
APP11         16           0xFFEB           Identifies all JPEG XT marker segments.
Le            16           8..65535         Length of the marker segment, including the size
                                            itself, all parameters, and the size of the payload
                                            data contained in this marker segment alone. Does
                                            not include the marker itself.
CI            16           0x4A50           The special value 0x4A50 (ASCII encoding of “JP”)
                                            allows readers to distinguish the JPEG XT marker
                                            segment from other uses of the APP11 marker.
                                            Readers shall ignore APP11 markers for the purpose
                                            of decoding JPEG extensions if this value does not
                                            match.
En            16           1..65535         The Box Instance Number disambiguates payload data
                                            of the same box type and defines which payload data
                                            is to be concatenated. Only payload data whose box
                                            type and enumerator are identical shall be
                                            concatenated. The value 0 is reserved for
                                            ITU | ISO/IEC purposes.
Z             32           1..2^32-1        Packet Sequence Number defining the order in which
                                            the payload data shall be concatenated.
                                            Concatenation shall proceed in order of increasing
                                            Z values. The value 0 is reserved for
                                            ITU | ISO/IEC purposes.
LBox          32           1 or 8..2^32-1   Box length. This is the total length of the
                                            concatenated payload data, including a single copy
                                            of the LBox and TBox fields, and a single copy of
                                            the XLBox field, if present. The values 0 and two
                                            to seven are reserved for ITU | ISO/IEC purposes
                                            and shall not be used.
TBox          32           0..2^32-1        Box type. The box type defines the syntax of the
                                            concatenated payload data. Also, the box type and
                                            the box instance number specify which payload data
                                            to merge.
XLBox         0 or 64      16..2^64-1       If the LBox field is one, this field contains the
                                            size of the concatenated payload data plus the box
                                            overhead instead. Otherwise, this field is missing.
                                            The values 0 to 15 are reserved for ITU | ISO/IEC
                                            purposes.
Payload Data  Varies       Varies           The syntax of the concatenated payload data is
                                            defined in Annex B and subsequent members of the
                                            ISO/IEC 18477 family of standards.
NOTE The size of the XLBox field itself also contributes to the box length, hence creating a corner case for
boxes larger than 4GB. If an encoder detects that the value of the LBox field, computed as the sum of the payload
data size and the box overhead, overruns the 4GB boundary LBox is able to express, it is not sufficient to create
an XLBox field and store the sum there. The box size needs to be enlarged by the size of the XLBox field as well,
namely by eight bytes.
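The corner case in this NOTE can be sketched from the encoder's side as below. The function name `length_fields` and the convention of returning `None` for an absent XLBox field are assumptions of this sketch.

```python
LBOX_MAX = 2**32 - 1   # largest value a 32-bit LBox field can hold

def length_fields(payload_size):
    """Return (lbox, xlbox) for a box with payload_size bytes; xlbox is None when absent."""
    short = payload_size + 4 + 4          # payload + LBox + TBox
    if short <= LBOX_MAX:
        return short, None
    # LBox = 1 signals that XLBox is present; the box length must then also
    # count the eight bytes of the XLBox field itself.
    return 1, short + 8
```

For a payload of 2^32-8 bytes, for instance, the sum without XLBox would be 2^32, which no longer fits into LBox, so the sketch returns an LBox of 1 and an XLBox of 2^32+8.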
A.5 Boxes and superboxes
Some boxes may carry other boxes as payload data. Such boxes are denoted as superboxes. The payload
size of a superbox is given by the sum of the box lengths of all the boxes it contains.
Boxes within superboxes do not carry a JPEG XT marker; neither a marker size, a Common
Identifier, a Box Instance Number nor a Packet Sequence Number shall be present. They start
with the LBox field. The additional fields are not required since their composition from markers into
boxes is unambiguous.
NOTE The length of a box within a superbox is derived in the same way from the size of the payload data
as for top-level boxes within JPEG XT marker segments. Note that neither top-level boxes nor boxes within
superboxes count the Le, En and Z fields as part of their length. Note further that a box within a superbox may be
a superbox again and may contain further boxes. The layout of such boxes is given by Figure A.2, too.
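Walking the boxes stored inside a superbox payload can be sketched as follows. As before, big-endian byte order for the length fields is an assumption of this sketch, and the name `iter_boxes` is hypothetical. A nested superbox can be handled by calling the same function on the yielded payload.

```python
import struct

def iter_boxes(data: bytes):
    """Yield (tbox, payload) for each box stored back to back in a superbox payload."""
    pos = 0
    while pos < len(data):
        (lbox,) = struct.unpack_from(">I", data, pos)
        tbox = data[pos + 4:pos + 8]
        header = 8                          # LBox + TBox
        length = lbox
        if lbox == 1:                       # length is carried by XLBox
            (length,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16                     # LBox + TBox + XLBox
        if length < header or pos + length > len(data):
            raise ValueError("malformed box length")
        yield tbox, data[pos + header:pos + length]
        pos += length
```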
Figure A.2 — Organization of a box within a superbox
...
