ISO/IEC 14496-10:2003
Information technology — Coding of audio-visual objects — Part 10: Advanced Video Coding
ISO/IEC 14496-10:2003 was developed jointly with the ITU-T in response to the growing need for higher compression of moving pictures for various applications such as digital storage media, television broadcasting, Internet streaming and real-time audiovisual communication. It is also designed to enable the use of the coded video representation in a flexible manner for a wide variety of network environments. It is designed to be generic in the sense that it serves a wide range of applications, bit rates, resolutions, qualities and services.
The use of ISO/IEC 14496-10:2003 allows motion video to be manipulated as a form of computer data and to be stored on various storage media, transmitted and received over existing and future networks and distributed on existing and future broadcasting channels. In the course of creating ISO/IEC 14496-10:2003, various requirements from typical applications have been considered, necessary algorithmic elements have been developed, and these have been integrated into a single syntax. Hence, ISO/IEC 14496-10:2003 will facilitate video data interchange among different applications.
The coded representation specified in the syntax is designed to enable a high compression capability for a desired image quality. The algorithm is not lossless, as the exact source sample values are typically not preserved through the encoding and decoding processes. A number of techniques are defined that may be used to achieve highly efficient compression.
The expected encoding algorithm (not specified in ISO/IEC 14496-10:2003) selects between inter and intra coding for block-shaped regions of each picture. Inter coding uses motion vectors for block-based inter prediction to exploit temporal statistical dependencies between different pictures. Intra coding uses various spatial prediction modes to exploit spatial statistical dependencies in the source signal for a single picture. Motion vectors and intra prediction modes may be specified for a variety of block sizes in the picture. The prediction residual is then further compressed using a transform to remove spatial correlation inside the transform block before it is quantised, producing an irreversible process that typically discards less important visual information while forming a close approximation to the source samples. Finally, the motion vectors or intra prediction modes are combined with the quantised transform coefficient information and encoded using either variable length codes or arithmetic coding.
Annexes A through E contain normative requirements and are an integral part of ISO/IEC 14496-10:2003. Annex A defines three profiles (Baseline, Main and Extended), each being tailored to certain application domains, and defines the levels of capability within each profile. Annex B specifies syntax and semantics of a byte stream format for delivery of coded video as an ordered stream of bytes or bits. Annex C specifies the Hypothetical Reference Decoder and its use to check bitstream and decoder conformance. Annex D specifies syntax and semantics for Supplemental Enhancement Information message payloads. Annex E specifies syntax and semantics of the Video Usability Information parameters of the sequence parameter sets of coded video sequences.
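The predict, transform and quantise chain described in the abstract can be illustrated with a small sketch. It is not the algorithm of ISO/IEC 14496-10:2003: the 4x4 Hadamard matrix, the uniform quantiser, the DC-style predictor and all function names (dc_predict, encode_block, decode_block, max_error) are assumptions chosen only to show why quantising a transformed prediction residual is irreversible, and the sample values are made up.

```python
# Illustrative sketch only. The 4x4 Hadamard matrix and the uniform quantiser
# below are stand-ins chosen for simplicity; they are not the transform or
# quantiser specified by the standard.

H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]  # symmetric, and H * H equals 4 * identity


def matmul(a, b):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def dc_predict(left, top):
    """Hypothetical DC-style intra prediction: rounded mean of the neighbours."""
    neighbours = left + top
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * 4 for _ in range(4)]


def encode_block(block, prediction, step):
    """Return quantised transform levels of the prediction residual."""
    residual = [[block[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]
    coeffs = matmul(matmul(H, residual), H)  # H is symmetric, so H^T == H
    # Quantisation is the irreversible step: information is discarded here.
    return [[round(c / step) for c in row] for row in coeffs]


def decode_block(levels, prediction, step):
    """Rescale the levels, invert the transform and add the prediction back."""
    coeffs = [[level * step for level in row] for row in levels]
    scaled = matmul(matmul(H, coeffs), H)  # 16 * residual when nothing was lost
    return [[min(255, max(0, prediction[i][j] + round(scaled[i][j] / 16)))
             for j in range(4)] for i in range(4)]


def max_error(a, b):
    """Largest absolute sample difference between two 4x4 blocks."""
    return max(abs(a[i][j] - b[i][j]) for i in range(4) for j in range(4))


# Made-up 8-bit samples for one 4x4 block and its already-decoded neighbours.
block = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
prediction = dc_predict(left=[60, 61, 62, 63], top=[50, 54, 60, 65])

for step in (1, 8, 32):
    levels = encode_block(block, prediction, step)
    decoded = decode_block(levels, prediction, step)
    print(f"step={step:2d}  max sample error={max_error(block, decoded)}")
```

With a step of 1 this toy chain reconstructs the block exactly. Larger steps shrink the levels that would be passed to the entropy coder but bound the reconstruction only to within half a step per sample, which is the trade-off between compression and fidelity that the abstract describes.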
Technologies de l'information — Codage des objets audiovisuels — Partie 10: Codage visuel avancé
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 14496-10
First edition
2003-12-01
Information technology — Coding of audio-visual objects — Part 10: Advanced video coding
Technologies de l'information — Codage des objets audiovisuels — Partie 10: Codage visuel avancé
Reference number: ISO/IEC 14496-10:2003(E)
© ISO/IEC 2003
© ISO/IEC 2003
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
CONTENTS
Foreword
0 Introduction
0.1 Prologue
0.2 Purpose
0.3 Applications
0.4 Profiles and levels
0.5 Overview of the design characteristics
0.6 How to read this specification
1 Scope
2 Normative references
3 Definitions
4 Abbreviations
5 Conventions
5.1 Arithmetic operators
5.2 Logical operators
5.3 Relational operators
5.4 Bit-wise operators
5.5 Assignment operators
5.6 Range notation
5.7 Mathematical functions
5.8 Variables, syntax elements, and tables
5.9 Text description of logical operations
5.10 Processes
6 Source, coded, decoded and output data formats, scanning processes, and neighbouring relationships
6.1 Bitstream formats
6.2 Source, decoded, and output picture formats
6.3 Spatial subdivision of pictures and slices
6.4 Inverse scanning processes and derivation processes for neighbours
7 Syntax and semantics
7.1 Method of describing syntax in tabular form
7.2 Specification of syntax functions, categories, and descriptors
7.3 Syntax in tabular form
7.4 Semantics
8 Decoding process
8.1 NAL unit decoding process
8.2 Slice decoding process
8.3 Intra prediction process
8.4 Inter prediction process
8.5 Transform coefficient decoding process and picture construction process prior to deblocking filter process
8.6 Decoding process for P macroblocks in SP slices or SI macroblocks
8.7 Deblocking filter process
9 Parsing process
9.1 Parsing process for Exp-Golomb codes
9.2 CAVLC parsing process for transform coefficient levels
9.3 CABAC parsing process for slice data
Annex A (normative) Profiles and levels
A.1 Requirements on video decoder capability
A.2 Profiles
A.3 Levels
Annex B (normative) Byte stream format
B.1 Byte stream NAL unit syntax and semantics
B.2 Byte stream NAL unit decoding process
B.3 Decoder byte-alignment recovery (informative)
Annex C (normative) Hypothetical reference decoder
C.1 Operation of coded picture buffer (CPB)
C.2 Operation of the decoded picture buffer (DPB)
C.3 Bitstream conformance
C.4 Decoder conformance
Annex D (normative) Supplemental enhancement information
D.1 SEI payload syntax
D.2 SEI payload semantics
Annex E (normative) Video usability information
E.1 VUI syntax
E.2 VUI semantics
Annex F (informative) Patent Rights
LIST OF FIGURES
Figure 6-1 – Nominal vertical and horizontal locations of 4:2:0 luma and chroma samples in a frame
Figure 6-2 – Nominal vertical and horizontal sampling locations of samples top and bottom fields
Figure 6-3 – A picture with 11 by 9 macroblocks that is partitioned into two slices
Figure 6-4 – Partitioning of the decoded frame into macroblock pairs
Figure 6-5 – Macroblock partitions, sub-macroblock partitions, macroblock partition scans, and sub-macroblock partition scans
Figure 6-6 – Scan for 4x4 luma blocks
Figure 6-7 – Neighbouring macroblocks for a given macroblock
Figure 6-8 – Neighbouring macroblocks for a given macroblock in MBAFF frames
Figure 6-9 – Determination of the neighbouring macroblock, blocks, and partitions (informative)
Figure 7-1 – The structure of an access unit not containing any NAL units with nal_unit_type equal to 0, 7, 8, or in the range of 12 to 31, inclusive
Figure 8-1 – Intra_4x4 prediction mode directions (informative)
Figure 8-2 – Example for temporal direct-mode motion vector inference (informative)
Figure 8-3 – Directional segmentation prediction (informative)
Figure 8-4 – Integer samples (shaded blocks with upper-case letters) and fractional sample positions (un-shaded blocks with lower-case letters) for quarter sample luma interpolation
Figure 8-5 – Fractional sample position dependent variables in chroma interpolation and surrounding integer position samples A, B, C, and D
Figure 8-6 – Assignment of the indices of dcY to luma4x4BlkIdx
Figure 8-7 – Assignment of the indices of dcC to chroma4x4BlkIdx
Figure 8-8 – a) Zig-zag scan. b) Field scan
Figure 8-9 – Boundaries in a macroblock to be filtered (luma boundaries shown with solid lines and chroma boundaries shown with dashed lines)
Figure 8-10 – Convention for describing samples across a 4x4 block horizontal or vertical boundary
Figure 9-1 – Illustration of CABAC parsing process for a syntax element SE (informative)
Figure 9-2 – Overview of the arithmetic decoding process for a single bin (informative)
Figure 9-3 – Flowchart for decoding a decision
Figure 9-4 – Flowchart of renormalization
Figure 9-5 – Flowchart of bypass decoding process
Figure 9-6 – Flowchart of decoding a decision before termination
Figure 9-7 – Flowchart for encoding a decision
Figure 9-8 – Flowchart of renormalization in the encoder
Figure 9-9 – Flowchart of PutBit(B)
Figure 9-10 – Flowchart of encoding bypass
Figure 9-11 – Flowchart of encoding a decision before termination
Figure 9-12 – Flowchart of flushing at termination
Figure C-1 – Structure of byte streams and NAL unit streams for HRD conformance checks
Figure C-2 – HRD buffer model
Figure E-1 – Location of chroma samples for top and bottom fields as a function of chroma_sample_loc_type_top_field and chroma_sample_loc_type_bottom_field
LIST OF TABLES
Table 6-1 – ChromaFormatFactor values
Table 6-2 – Specification of input and output assignments for subclauses 6.4.7.1 to 6.4.7.5
Table 6-3 – Specification of mbAddrN
Table 6-4 – Specification of mbAddrN and yM
Table 7-1 – NAL unit type codes
Table 7-2 – Meaning of primary_pic_type
Table 7-3 – Name association to slice_type
Table 7-4 – reordering_of_pic_nums_idc operations for reordering of reference picture lists
Table 7-5 – Interpretation of adaptive_ref_pic_marking_mode_flag
Table 7-6 – Memory management control operation (memory_management_control_operation) values
Table 7-7 – Allowed collective macroblock types for slice_type
Table 7-8 – Macroblock types for I slices
Table 7-9 – Macroblock type with value 0 for SI slices
Table 7-10 – Macroblock type values 0 to 4 for P and SP slices
Table 7-11 – Macroblock type values 0 to 22 for B slices
Table 7-12 – Specification of CodedBlockPatternChroma values
Table 7-13 – Relationship between intra_chroma_pred_mode and spatial prediction modes
Table 7-14 – Sub-macroblock types in P macroblocks
Table 7-15 – Sub-macroblock types in B macroblocks
Table 8-1 – Refined slice group map type
Table 8-2 – Specification of Intra4x4PredMode[ luma4x4BlkIdx ] and associated names
Table 8-3 – Specification of Intra16x16PredMode and associated names
Table 8-4 – Specification of Intra chroma prediction modes and associated names
Table 8-5 – Specification of the variable colPic
Table 8-6 – Specification of PicCodingStruct( X )
Table 8-7 – Specification of mbAddrCol, yM, and vertMvScale
Table 8-8 – Assignment of prediction utilization flags
Table 8-9 – Derivation of the vertical component of the chroma vector in field coding mode
Table 8-10 – Differential full-sample luma locations
Table 8-11 – Assignment of the luma prediction sample predPartLX_L[ x_L, y_L ]
Table 8-12 – Specification of mapping of idx to c_ij for zig-zag and field scan
Table 8-13 – Specification of QP_C as a function of qP_I
Table 8-14 – Derivation of indexA and indexB from offset dependent threshold variables α and β
Table 8-15 – Value of filter clipping variable t_C0 as a function of indexA and bS
Table 9-1 – Bit strings with “prefix” and “suffix” bits and assignment to codeNum ranges (informative)
Table 9-2 – Exp-Golomb bit strings and codeNum in explicit form and used as ue(v) (informative)
Table 9-3 – Assignment of syntax element to codeNum for signed Exp-Golomb coded syntax elements se(v)
Table 9-4 – Assignment of codeNum to values of coded_block_pattern for macroblock prediction modes
Table 9-5 – coeff_token mapping to TotalCoeff( coeff_token ) and TrailingOnes( coeff_token )
Table 9-6 – Codeword table for level_prefix
Table 9-7 – total_zeros tables for 4x4 blocks with TotalCoeff( coeff_token ) 1 to 7
Table 9-8 – total_zeros tables for 4x4 blocks with TotalCoeff( coeff_token ) 8 to 15
Table 9-9 – total_zeros tables for chroma DC 2x2 blocks
Table 9-10 – Tables for run_before
Table 9-11 – Association of ctxIdx and syntax elements for each slice type in the initialisation process
Table 9-12 – Values of variables m and n for ctxIdx from 0 to 10
Table 9-13 – Values of variables m and n for ctxIdx from 11 to 23
Table 9-14 – Values of variables m and n for ctxIdx from 24 to 39
Table 9-15 – Values of variables m and n for ctxIdx from 40 to 53
Table 9-16 – Values of variables m and n for ctxIdx from 54 to 59
Table 9-17 – Values of variables m and n for ctxIdx from 60 to 69
Table 9-18 – Values of variables m and n for ctxIdx from 70 to 104
Table 9-19 – Values of variables m and n for ctxIdx from 105 to 165
Table 9-20 – Values of variables m and n for ctxIdx from 166 to 226
Table 9-21 – Values of variables m and n for ctxIdx from 227 to 275
Table 9-22 – Values of variables m and n for ctxIdx from 277 to 337
Table 9-23 – Values of variables m and n for ctxIdx from 338 to 398
Table 9-24 – Syntax elements and associated types of binarization, maxBinIdxCtx, and ctxIdxOffset
Table 9-25 – Bin string of the unary binarization (informative)
Table 9-26 – Binarization for macroblock types in I slices
Table 9-27 – Binarization for macroblock types in P, SP, and B slices
Table 9-28 – Binarization for sub-macroblock types in P, SP, and B slices
Table 9-29 – Assignment of ctxIdxInc to binIdx for all ctxIdxOffset values except those related to the syntax elements coded_block_flag, significant_coeff_flag, last_significant_coeff_flag, and coeff_abs_level_minus1
Table 9-30 – Assignment of ctxIdxBlockCatOffset to ctxBlockCat for syntax elements coded_block_flag, significant_coeff_flag, last_significant_coeff_flag, and coeff_abs_level_minus1
Table 9-31 – Specification of ctxIdxInc for specific values of ctxIdxOffset and binIdx
Table 9-32 – Specification of ctxBlockCat for the different blocks
Table 9-33 – Specification of rangeTabLPS depending on pStateIdx and qCodIRangeIdx
Table 9-34 – State transition table
Table A-1 – Level limits
Table A-2 – Baseline profile level limits
Table A-3 – Main profile level limits
Table A-4 – Extended profile level limits
Table A-5 – Maximum frame rates (frames per second) for some example frame sizes
...