Information technology — Coding of audio-visual objects — Part 29: Web video coding

ISO/IEC 14496-29:2015 specifies web video coding for coding of audio-visual objects.

Technologies de l'information — Codage des objets audiovisuels — Partie 29: Codage vidéo Web

General Information

Status: Published
Publication Date: 29-Mar-2015
Current Stage: 9093 - International Standard confirmed
Completion Date: 24-Jun-2021

Buy Standard

Standard
ISO/IEC 14496-29:2015 - Information technology -- Coding of audio-visual objects
English language
188 pages

Standards Content (Sample)

INTERNATIONAL STANDARD
ISO/IEC 14496-29
First edition
2015-04-01

Information technology — Coding of
audio-visual objects —
Part 29:
Web video coding
Technologies de l'information — Codage des objets audiovisuels —
Partie 29: Codage vidéo Web





Reference number
ISO/IEC 14496-29:2015(E)
© ISO/IEC 2015

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2015
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any
means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission.
Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2015 – All rights reserved

Contents Page
1 Scope ..... 1
2 Normative references ..... 1
3 Definitions ..... 1
4 Abbreviations ..... 7
5 Conventions ..... 8
5.1 Arithmetic operators ..... 8
5.2 Logical operators ..... 8
5.3 Relational operators ..... 8
5.4 Bit-wise operators ..... 9
5.5 Assignment operators ..... 9
5.6 Range notation ..... 9
5.7 Mathematical functions ..... 9
5.8 Order of operation precedence ..... 10
5.9 Variables, syntax elements, and tables ..... 11
5.10 Text description of logical operations ..... 12
5.11 Processes ..... 13
6 Source, coded, decoded and output data formats, scanning processes, and neighbouring relationships ..... 13
6.1 Bitstream formats ..... 13
6.2 Source, decoded, and output picture formats ..... 14
6.3 Spatial subdivision of pictures and slices ..... 15
6.4 Inverse scanning processes and derivation processes for neighbours ..... 16
7 Syntax and semantics ..... 26
7.1 Normative Syntax and Semantics ..... 26
7.2 Specification of syntax functions, categories, and descriptors ..... 28
7.3 Syntax in tabular form ..... 30
7.4 Semantics ..... 42
8 Decoding process ..... 70
8.1 NAL unit decoding process ..... 71
8.2 Slice decoding process ..... 72
8.3 Intra prediction process ..... 82
8.4 Inter prediction process ..... 95
8.5 Transform coefficient decoding process and picture construction process prior to deblocking filter process ..... 107
8.6 (void) ..... 118
8.7 Deblocking filter process ..... 118
9 Parsing process ..... 126
9.1 Parsing process for Exp-Golomb codes ..... 127
9.2 CAVLC parsing process for transform coefficient levels ..... 131
Annex A (normative) Profiles and levels ..... 142
A.1 Requirements on video decoder capability ..... 142
A.2 Profiles ..... 142
A.3 Levels ..... 143
Annex B (normative) Byte stream format ..... 155
B.1 Byte stream NAL unit syntax and semantics ..... 155
B.2 Byte stream NAL unit decoding process ..... 156
B.3 Decoder byte-alignment recovery (informative) ..... 156
Annex C (normative) Hypothetical reference decoder ..... 158
C.1 Operation of coded picture buffer (CPB) ..... 161
C.2 Operation of the decoded picture buffer (DPB) ..... 163
C.3 Bitstream conformance ..... 165
C.4 Decoder conformance ..... 166
Annex D (normative) Supplemental enhancement information ..... 170
Annex E (normative) Video usability information ..... 171
E.1 VUI syntax ..... 172
E.2 VUI semantics ..... 173
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other
international organizations, governmental and non-governmental, in liaison with ISO and IEC, also
take part in the work. In the field of information technology, ISO and IEC have established a joint
technical committee, ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does
not constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO's adherence to the WTO principles in the Technical
Barriers to Trade (TBT) see the following URL: Foreword - Supplementary information
The committee responsible for this document is ISO/IEC JTC 1, Information technology, SC 29,
Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of
audio-visual objects:
— Part 1: Systems
— Part 2: Visual
— Part 3: Audio
— Part 4: Conformance testing
— Part 5: Reference software
— Part 6: Delivery Multimedia Integration Framework (DMIF)
— Part 7: Optimized reference software for coding of audio-visual objects
— Part 8: Carriage of ISO/IEC 14496 contents over IP networks
— Part 9: Reference hardware description
— Part 10: Advanced Video Coding
— Part 11: Scene description and application engine
— Part 12: ISO base media file format
— Part 13: Intellectual Property Management and Protection (IPMP) extensions
— Part 14: MP4 file format
— Part 15: Advanced Video Coding (AVC) file format
— Part 16: Animation Framework eXtension (AFX)
— Part 17: Streaming text format
— Part 18: Font compression and streaming
— Part 19: Synthesized texture stream
— Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
— Part 21: MPEG-J Graphics Framework eXtensions (GFX)
— Part 22: Open Font Format
— Part 23: Symbolic Music Representation
— Part 24: Audio and systems interaction
— Part 25: 3D Graphics Compression Model
— Part 26: Audio conformance
— Part 27: 3D Graphics conformance
— Part 28: Composite font representation
— Part 29: Web video coding
Introduction
This International Standard specifies Web Video Coding, a technology that is compatible with the Constrained
Baseline Profile of ISO/IEC 14496-10. Only the subset that is specified in Annex A for the Constrained Baseline
Profile is a normative specification, while all remaining aspects are informative. This text is derived from ISO/IEC
14496-10, with which the section numbers in this specification are aligned, and that specification may additionally
be consulted, if desired, as an aid to understanding this specification.
INTERNATIONAL STANDARD
Information technology — Coding of audio-visual objects —
Part 29: Web video coding
1 Scope
This Part of ISO/IEC 14496 specifies Web Video Coding for coding of audio-visual objects.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated references, only the
edition cited applies. For undated references, the latest edition of the referenced document (including any amendments)
applies.
– ISO 11664-1, Colorimetry — Part 1: CIE standard colorimetric observers.
– ISO/IEC 14496-10, Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding.
3 Definitions
For the purposes of this document, the following definitions apply:
3.1 access unit: A set of NAL units that are consecutive in decoding order and contain exactly one primary coded
picture. In addition to the primary coded picture, an access unit may also contain one auxiliary coded picture, or
other NAL units not containing slices of a coded picture. The decoding of an access unit always results in a decoded
picture.
3.2 AC transform coefficient: Any transform coefficient for which the frequency index in one or both dimensions is
non-zero.
3.3 bitstream: A sequence of bits that forms the representation of coded pictures and associated data forming one or
more coded video sequences. Bitstream is a collective term used to refer either to a NAL unit stream or a byte
stream.
3.4 block: An MxN (M-column by N-row) array of samples, or an MxN array of transform coefficients.
3.5 [void]
3.6 broken link: A location in a bitstream at which it is indicated that some subsequent pictures in decoding order may
contain serious visual artefacts due to unspecified operations performed in the generation of the bitstream.
3.7 byte: A sequence of 8 bits, written and read with the most significant bit on the left and the least significant bit on
the right. When represented in a sequence of data bits, the most significant bit of a byte is first.
3.8 byte-aligned: A position in a bitstream is byte-aligned when the position is an integer multiple of 8 bits from the
position of the first bit in the bitstream. A bit or byte or syntax element is said to be byte-aligned when the position
at which it appears in a bitstream is byte-aligned.
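As an informal illustration of definitions 3.7 and 3.8 (not part of the normative text), the following Python sketch reads bits with the most significant bit of each byte first and reports whether the current position is byte-aligned. The class name BitReader and its method names are hypothetical and chosen only for this example.

```python
class BitReader:
    """Hypothetical helper: reads bits MSB-first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits, counted from the first bit of the bitstream

    def read_bits(self, n: int) -> int:
        """Return the next n bits as an unsigned integer, first bit most significant."""
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1  # the MSB of each byte comes first
            value = (value << 1) | bit
            self.pos += 1
        return value

    def byte_aligned(self) -> bool:
        """True when the position is an integer multiple of 8 bits from the start."""
        return self.pos % 8 == 0
```

Reading 3 bits from the byte 0b10110000 yields 0b101 and leaves the reader at a position that is not byte-aligned; reading the remaining 5 bits restores alignment.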
3.9 byte stream: An encapsulation of a NAL unit stream containing start code prefixes and NAL units as specified in
Annex B.
3.10 can: A term used to refer to behaviour that is allowed, but not necessarily required.
3.11 [void]
3.12 chroma: An adjective specifying that a sample array or single sample is representing one of the two colour
difference signals related to the primary colours. The symbols used for a chroma array or sample are Cb and Cr.
NOTE – The term chroma is used rather than the term chrominance in order to avoid the implication of the use of linear
light transfer characteristics that is often associated with the term chrominance.
3.13 coded frame: A coded representation of a frame.
3.14 coded picture: A coded representation of a picture.
3.15 coded picture buffer (CPB): A first-in first-out buffer containing access units in decoding order specified in the
hypothetical reference decoder in Annex C.
3.16 coded representation: A data element as represented in its coded form.
3.17 [void]
3.18 coded slice NAL unit: A NAL unit containing a slice that is not a slice of an auxiliary coded picture.
3.19 coded video sequence: A sequence of access units that consists, in decoding order, of an IDR access unit followed
by zero or more non-IDR access units including all subsequent access units up to but not including any subsequent
IDR access unit.
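The grouping described in definition 3.19 can be sketched informally as follows. The function name split_coded_video_sequences and the idr_flags input (one boolean per access unit, in decoding order) are assumptions made for this illustration; a real decoder would derive the IDR status from the bitstream itself.

```python
from typing import List


def split_coded_video_sequences(idr_flags: List[bool]) -> List[List[int]]:
    """Group access-unit indices (in decoding order) into coded video sequences.

    idr_flags[i] is True when access unit i is an IDR access unit
    (a hypothetical input format used only for this sketch).
    """
    sequences: List[List[int]] = []
    for index, is_idr in enumerate(idr_flags):
        if is_idr:
            sequences.append([index])    # each IDR access unit starts a new sequence
        elif sequences:
            sequences[-1].append(index)  # non-IDR units extend the current sequence
        else:
            raise ValueError("the first access unit must be an IDR access unit")
    return sequences
```

For example, the flags [True, False, False, True, True, False] give three coded video sequences containing access units 0 to 2, 3, and 4 to 5.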
3.20 component: An array or single sample from one of the three arrays (luma and two chroma) that make up a frame in
4:2:0 colour format.
3.21 DC transform coefficient: A transform coefficient for which the frequency index is zero in all dimensions.
3.22 decoded picture: A decoded picture is derived by decoding a coded picture. A decoded picture is a decoded frame.
3.23 decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output delay
specified for the hypothetical reference decoder in Annex C.
3.24 decoder: An embodiment of a decoding process.
3.25 decoder under test (DUT): A decoder that is tested for conformance to this International Standard by operating
the hypothetical stream scheduler to deliver a conforming bitstream to the decoder and to the hypothetical
reference decoder and comparing the values and timing of the output of the two decoders.
3.26 decoding order: The order in which syntax elements are processed by the decoding process.
3.27 decoding process: The process specified in this International Standard that reads a bitstream and derives decoded
pictures from it.
3.28 [void]
3.29 display process: A process not specified in this International Standard having, as its input, the cropped decoded
pictures that are the output of the decoding process.
3.30 emulation prevention byte: A byte equal to 0x03 that may be present within a NAL unit. The presence of
emulation prevention bytes ensures that no sequence of consecutive byte-aligned bytes in the NAL unit contains a
start code prefix.
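A minimal sketch of how a decoder might discard emulation prevention bytes is given below. It relies on the three-byte pattern 0x000003 from the NAL unit syntax of ISO/IEC 14496-10, which this excerpt does not reproduce, and the function name strip_emulation_prevention is hypothetical.

```python
def strip_emulation_prevention(nal_payload: bytes) -> bytes:
    """Drop each 0x03 byte that directly follows two consecutive 0x00 bytes.

    Assumes the input is the byte content of a single conforming NAL unit,
    without any start code prefix.
    """
    out = bytearray()
    zeros = 0  # number of consecutive 0x00 bytes copied immediately before this byte
    for b in nal_payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation prevention byte
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)
```

With this sketch the input bytes 00 00 03 01 become 00 00 01, while a 0x03 byte that does not follow two zero bytes is left untouched.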
3.31 encoder: An embodiment of an encoding process.
3.32 encoding process: A process, not specified in this International Standard, that produces a bitstream conforming to
this International Standard.
3.33 flag: A variable that can take one of the two possible values 0 and 1.
3.34 frame: A frame contains an array of luma samples and two corresponding arrays of chroma samples in 4:2:0
format.
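As an informal aid to definition 3.34, the sketch below computes the sizes of the three sample arrays of a frame. It assumes the usual meaning of 4:2:0 sampling, in which each chroma array has half the width and half the height of the luma array; that relationship is taken from ISO/IEC 14496-10 rather than from this excerpt, and the function name frame_array_sizes is hypothetical.

```python
def frame_array_sizes(width: int, height: int):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for a 4:2:0 frame.

    The same chroma size applies to both the Cb and the Cr array.
    """
    if width <= 0 or height <= 0 or width % 2 or height % 2:
        raise ValueError("this sketch assumes positive, even luma dimensions")
    return (width, height), (width // 2, height // 2)
```

For a 16x16 luma block this gives 8x8 chroma blocks, which matches the macroblock structure used throughout this International Standard.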
3.35 frame macroblock: A macroblock representing samples of a coded frame. All macroblocks of a coded frame are
frame macroblocks.
3.36 [void]
3.37 frequency index: A one-dimensional or two-dimensional index associated with a transform coefficient prior to an
inverse transform part of the decoding process.
3.38 hypothetical reference decoder (HRD): A hypothetical decoder model that specifies constraints on the variability
of conforming NAL unit streams or conforming byte streams that an encoding process may produce.
3.39 hypothetical stream scheduler (HSS): A hypothetical delivery mechanism for the timing and data flow of the
input of a bitstream into the hypothetical reference decoder. The HSS is used for checking the conformance of a
bitstream or a decoder.
3.40 I slice: A slice that is decoded using intra prediction only.
3.41 informative: A term used to refer to content provided in this International Standard that is not an integral part of
this International Standard. Informative content does not establish any mandatory requirements for conformance to
this International Standard.
3.42 instantaneous decoding refresh (IDR) access unit: An access unit in which the primary coded picture is an IDR
picture.
3.43 instantaneous decoding refresh (IDR) picture: A coded picture for which the variable IdrPicFlag is equal to 1.
An IDR picture causes the decoding process to mark all reference pictures as "unused for reference" immediately
after the decoding of the IDR picture. All coded pictures that follow an IDR picture in decoding order can be
decoded without inter prediction from any picture that precedes the IDR picture in decoding order. The first
picture of each coded video sequence in decoding order is an IDR picture.
3.44 inter coding: Coding of a block, macroblock, slice, or picture that uses inter prediction.
3.45 inter prediction: A prediction derived from decoded samples of reference pictures other than the current decoded
picture.
3.46 interpretation sample value: A possibly-altered value corresponding to a decoded sample value of an auxiliary
coded picture that may be generated for use in the display process. Interpretation sample values are not used in the
decoding process and have no normative effect on the decoding process.
3.47 intra coding: Coding of a block, macroblock, slice, or picture that uses intra prediction.
3.48 intra prediction: A prediction derived from the decoded samples of the same decoded slice.
3.49 intra slice: See I slice.
3.50 inverse transform: A part of the decoding process by which a set of transform coefficients are converted into
spatial-domain values, or by which a set of transform coefficients are converted into DC transform coefficients.
3.51 layer: One of a set of syntactical structures in a non-branching hierarchical relationship. Higher layers contain
lower layers. The coding layers are the coded video sequence, picture, slice, and macroblock layers.
3.52 level: A defined set of constraints on the values that may be taken by the syntax elements and variables of this
International Standard. The same set of levels is defined for all profiles, with most aspects of the definition of each
level being in common across different profiles. Individual implementations may, within specified constraints,
support a different level for each supported profile. In a different context, a level is the value of a transform
coefficient prior to scaling (see the definition of transform coefficient level).
3.53 list: A one-dimensional array of syntax elements or variables.
3.54 luma: An adjective specifying that a sample array or single sample is representing the monochrome signal related
to the primary colours. The symbol or subscript used for luma is Y or L.
NOTE – The term luma is used rather than the term luminance in order to avoid the implication of the use of linear light
transfer characteristics that is often associated with the term luminance. The symbol L is sometimes used instead of the
symbol Y to avoid confusion with the symbol y as used for vertical location.
3.55 macroblock: A 16x16 block of luma samples and two corresponding blocks of chroma samples of a picture that
has three sample arrays, or a 16x16 block of samples of a monochrome picture or a picture that is coded using three
separate colour planes. The division of a slice into macroblocks is a partitioning.
3.56 macroblock address: A macroblock address is the index of a macroblock in a macroblock raster scan of the picture
starting with zero for the top-left macroblock in a picture.
3.57 macroblock location: The two-dimensional coordinates of a macroblock in a picture denoted by ( x, y ). For the
top left macroblock of the picture ( x, y ) is equal to ( 0, 0 ). x is incremented by 1 for each macroblock column
from left to right. y is incremented by 1 for each macroblock row from top to bottom.
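Definitions 3.56 and 3.57 together imply a simple conversion between a macroblock address and a macroblock location, sketched informally below. The function names and the parameter pic_width_in_mbs (the picture width measured in macroblocks) are naming assumptions made for this example.

```python
def mb_location(mb_addr: int, pic_width_in_mbs: int):
    """Convert a raster-scan macroblock address into its (x, y) location."""
    if mb_addr < 0 or pic_width_in_mbs <= 0:
        raise ValueError("address must be non-negative and width must be positive")
    x = mb_addr % pic_width_in_mbs   # macroblock column, counted from the left
    y = mb_addr // pic_width_in_mbs  # macroblock row, counted from the top
    return x, y


def mb_address(x: int, y: int, pic_width_in_mbs: int) -> int:
    """Convert an (x, y) macroblock location back into a raster-scan address."""
    return y * pic_width_in_mbs + x
```

In a picture that is 11 macroblocks wide, address 0 maps to ( 0, 0 ), address 11 is the first macroblock of the second row at ( 0, 1 ), and address 25 maps to ( 3, 2 ).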
3.58 macroblock partition: A block of luma samples and two corresponding blocks of chroma samples resulting from a
partitioning of a macroblock for inter prediction for a picture that has three sample arrays or a block of luma
samples resulting from a partitioning of a macroblock for inter prediction for a monochrome picture or a picture
that is coded using three separate colour planes.
3.59 matrix: A two-dimensional array of syntax elements or variables.
3.60 may: A term used to refer to behaviour that is allowed, but not necessarily required. In some places where the
optional nature of the described behaviour is intended to be emphasized, the phrase "may or may not" is used to
provide emphasis.
3.61 memory management control operation: Seven operations that control reference picture marking.
3.62 motion vector: A two-dimensional vector used for inter prediction that provides an offset from the coordinates in
the decoded picture to the coordinates in a reference picture.
3.63 must: A term used in expressing an observation about a requirement or an implication of a requirement that is
specified elsewhere in this International Standard. This term is used exclusively in an informative context.
3.64 NAL unit: A syntax structure containing an indication of the type of data to follow and bytes containing that data
in the form of an RBSP interspersed as necessary with emulation prevention bytes.
3.65 NAL unit stream: A sequence of NAL units.
3.66 non-reference frame: A frame coded with nal_ref_idc equal to 0.
3.67 non-reference picture: A picture coded with nal_ref_idc equal to 0. A non-reference picture is not used for inter
prediction of any other pictures.
3.68 note: A term used to prefix informative remarks. This term is used exclusively in an informative context.
3.69 output order: The order in which the decoded pictures are output from the decoded picture buffer.
3.70 P slice: A slice that may be decoded using intra prediction or inter prediction using at most one motion vector and
reference index to predict the sample values of each block.
3.71 parameter: A syntax element of a sequence parameter set or a picture parameter set. Parameter is also used as part
of the defined term quantisation parameter.
3.72 partitioning: The division of a set into subsets such that each element of the set is in exactly one of the subsets.
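The property in definition 3.72 can be checked mechanically, as in the following informal sketch. The function name is_partitioning and the representation of the set and its subsets as Python sets are assumptions made for illustration.

```python
def is_partitioning(full_set, subsets) -> bool:
    """Return True when every element of full_set is in exactly one subset."""
    seen = set()
    for subset in subsets:
        for item in subset:
            if item in seen or item not in full_set:
                return False  # element repeated, or not part of the original set
            seen.add(item)
    return seen == set(full_set)  # no element may be left uncovered
```

For instance, dividing the macroblock addresses {0, 1, 2, 3} into {0, 1} and {2, 3} is a partitioning, whereas {0, 1} and {1, 2, 3} is not, because address 1 appears twice.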
3.73 picture: A collective term for a frame.
3.74 picture parameter set: A syntax structure containing syntax elements that apply to zero or more entire coded
pictures as determined by the pic_parameter_set_id syntax element found in each slice header.
3.75 picture order count: A variable that is associated with each coded picture and has a value that is non-decreasing
with increasing picture position in output order relative to the first output picture of the previous IDR picture in
decoding order or relative to the previous picture, in decoding order, that contains a memory management control
operation that marks all reference pictures as "unused for reference".
3.76 prediction: An embodiment of the prediction process.
3.77 prediction process: The use of a predictor to provide an estimate of the sample value or data element currently
being decoded.
3.78 predictive slice: See P slice.
3.79 predictor: A combination of specified values or previously decoded sample values or data elements used in the
decoding process of subsequent sample values or data elements.
3.80 primary coded picture: The coded representation of a picture to be used by the decoding process for a bitstream
conforming to this International Standard. The primary coded picture contains all macroblocks of the picture. The
only pictures that have a normative effect on the decoding process are primary coded pictures.
3.81 profile: A specified subset of the syntax of this International Standard.
3.82 quantisation parameter: A variable used by the decoding process for scaling of transform coefficient levels.
3.83 random access: The act of starting the decoding process for a bitstream at a point other than the beginning of the
stream.
3.84 raster scan: A mapping of a rectangular two-dimensional pattern to a one-dimensional pattern such that the first
entries in the one-dimensional pattern are from the first top row of the two-dimensional pattern scanned from left to
right, followed similarly by the second, third, etc., rows of the pattern (going down) each scanned from left to right.
3.85 raw byte sequence payload (RBSP): A syntax structure
...
