ISO/IEC 14496-15:2014
(Main)Information technology — Coding of audio-visual objects — Part 15: Carriage of network abstraction layer (NAL) unit structured video in ISO base media file format
Information technology — Coding of audio-visual objects — Part 15: Carriage of network abstraction layer (NAL) unit structured video in ISO base media file format
ISO/IEC 14496-15:2014 specifies the storage format for streams of video that is structured as Network Abstraction Layer (NAL) Units, such as Advanced Video Coding, AVC (ISO/IEC 14496-10) and High Efficiency Video Coding, HEVC (ISO/IEC 23008-2) video streams.
Technologies de l'information — Codage des objets audiovisuels — Partie 15: Transport de vidéo structuré en unités NAL au format ISO de base pour les fichiers médias
General Information
Relations
Standards Content (Sample)
INTERNATIONAL ISO/IEC
STANDARD 14496-15
Third edition
2014‐07‐01
Information technology — Coding of audio-
visual objects —
Part 15:
Carriage of network abstraction layer
(NAL) unit structured video in ISO base
media file format
Technologies de l'information — Codage des objets audiovisuels —
Partie 15: Transport de vidéo structuré en unités NAL au format ISO de base
pour les fichiers médias
Reference number
ISO/IEC 14496‐15:2014(E)
©
ISO/IEC 2014
---------------------- Page: 1 ----------------------
ISO/IEC 14496-15:2014(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2014
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form or by any
means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior written permission.
Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester.
ISO copyright office
Case postale 56 CH‐1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E‐mail copyright@iso.org
Web www.iso.org
Published in Switzerland
ii © ISO/IEC 2014 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/IEC 14496-15:2014(E)
Contents Page
Foreword . v
Introduction . vii
1 Scope . 1
2 Normative references . 1
3 Terms, definitions and abbreviated terms . 1
3.1 Terms and definitions . 1
3.2 Abbreviated terms . 5
4 General Definitions . 6
4.1 Introduction. 6
4.2 Elementary stream structure . 6
4.3 Sample and Configuration definition . 6
4.4 Video Track Structure . 8
4.5 Template fields used . 8
4.6 Visual width and height . 9
4.7 Decoding time (DTS) and composition time (CTS) . 9
4.8 Sync sample (IDR) . 9
4.9 Shadow sync . 10
4.10 Sample groups on random access recovery points and random access points . 10
4.11 Hinting . 10
5 AVC elementary streams and sample definitions . 11
5.1 Introduction. 11
5.2 Elementary stream structure . 11
5.3 Sample and Configuration definition . 14
5.4 Derivation from ISO Base Media File Format . 18
6 SVC elementary stream and sample definitions . 29
6.1 Introduction. 29
6.2 Elementary stream structure . 30
6.3 Use of the plain AVC file format . 31
6.4 Sample and configuration definition . 31
6.5 Derivation from the ISO base media file format . 33
7 MVC elementary stream and sample definitions . 39
7.1 Introduction. 39
© ISO/IEC 2014 – All rights reserved iii
---------------------- Page: 3 ----------------------
ISO/IEC 14496-15:2014(E)
7.2 Overview of MVC Storage . 40
7.3 MVC Track Structure . 41
7.4 Use of the plain AVC File Fo rmat . 42
7.5 Sample and configuration definition . 42
7.6 Derivation from the ISO base media file format . 45
7.7 MVC specific information boxes . 54
8 HEVC elementary streams and sample definitions . 63
8.1 Introduction . 63
8.2 Elementary Stream Structure . 64
8.3 Sample and configuration definition . 64
8.4 Derivation from ISO base media file format . 69
Annex A (normative) In-stream structures specific to SVC and MVC . 76
Annex B (normative) SVC and MVC sample group and sub-track definitions . 81
Annex C (normative) Temporal metadata support . 102
Annex D (normative) File format toolsets . 110
Annex E (normative) Sub-parameters for the MIME type ‘Codecs’ parameter . 112
Annex F (Informative) Patent Statements . 114
iv © ISO/IEC 2014 – All rights reserved
---------------------- Page: 4 ----------------------
ISO/IEC 14496-15:2014(E)
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non‐governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75 % of the national bodies
casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 14496‐15 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
This third edition cancels and replaces the second edition (ISO/IEC 14496‐15:2010), which has been
technically revised. It also incorporates the Amendment ISO/IEC 14496‐15:2010/Amd.1:2011 and the
Technical Corrigenda ISO/IEC 14496‐15:2010/Cor.1:2011 and ISO/IEC 14496‐15:2010/Cor.2:2012.
ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding
of audio-visual objects:
Part 1: Systems
Part 2: Visual
Part 3: Audio
Part 4: Conformance testing
Part 5: Reference software
Part 6: Delivery Multimedia Integration Framework (DMIF)
Part 7: Optimized reference software for coding of audio-visual objects [Technical Report]
Part 8: Carriage of ISO/IEC 14496 contents over IP networks
Part 9: Reference hardware description [Technical Report]
Part 10: Advanced Video Coding
© ISO/IEC 2014 – All rights reserved v
---------------------- Page: 5 ----------------------
ISO/IEC 14496-15:2014(E)
Part 11: Scene description and application engine
Part 12: ISO base media file format
Part 13: Intellectual Property Management and Protection (IPMP) extensions
Part 14: MP4 file format
Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file
format
Part 16: Animation Framework eXtension (AFX)
Part 17: Streaming text format
Part 18: Font compression and streaming
Part 19: Synthesized texture stream
Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format
(SAF)
Part 21: MPEG-J Graphics Framework eXtension (GFX)
Part 22: Open Font Format
Part 23: Symbolic Music Representation
Part 24: Audio and systems interaction
Part 25: 3D Graphics Compression Model
Part 26: Audio conformance
Part 27: 3D Graphics conformance
Part 28: Composite font representation
vi © ISO/IEC 2014 – All rights reserved
---------------------- Page: 6 ----------------------
ISO/IEC 14496-15:2014(E)
Introduction
This part of ISO/IEC 14496 defines a storage format based on, and compatible with, the ISO Base Media
File Format (ISO/IEC 14496‐12 and ISO/IEC 15444‐12), which is used by the MP4 file format
(ISO/IEC 14496‐14) and the Motion JPEG 2000 file format (ISO/IEC 15444‐3) among others. This part
of ISO/IEC 14496 enables video streams formatted as Network Adaptation Layer Units (NAL Units) to
be used in conjunction with other media streams, such as audio,
be used in an MPEG‐4 systems environment, if desired,
be formatted for delivery by a streaming server, using hint tracks, and
inherit all the use cases and features of the ISO Base Media File Format on which MP4 and MJ2 are
based.
This part of ISO/IEC 14496 may be used as a standalone specification; it specifies how NAL unit
structured video content shall be stored in an ISO Base Media File Format compliant format. However, it
is normally used in the context of a specification, such as the MP4 file format, derived from the ISO Base
Media File Format, that permits the use of NAL unit structured video such as AVC (ISO/IEC 14496‐10)
and video and High Efficiency Video Coding (HEVC, ISO/IEC 23008‐2) video.
The ISO Base Media File Format is becoming increasingly common as a general‐purpose media
container format for the exchange of digital media, and its use in this context should accelerate both
adoption and interoperability.
The International Organization for Standardization (ISO) and International Electrotechnical
Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may
involve the use of a patent.
The ISO and IEC take no position concerning the evidence, validity and scope of this patent right.
The holder of this patent right has assured the ISO and IEC that he is willing to negotiate licences under
reasonable and non‐discriminatory terms and conditions with applicants throughout the world. In this
respect, the statement of the holder of this patent right is registered with the ISO and IEC. Information
may be obtained from the companies listed in Annex F.
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights other than those identified in Annex F. ISO and IEC shall not be held responsible for
identifying any or all such patent rights.
© ISO/IEC 2014 – All rights reserved vii
---------------------- Page: 7 ----------------------
INTERNATIONAL STANDARD ISO/IEC 14496-15:2014(E)
Information technology — Coding of audio-visual objects —
Part 15:
Carriage of network abstraction layer (NAL) unit structured video
in the ISO base media file format
1 Scope
This part of ISO/IEC 14496 specifies the storage format for streams of video that is structured as NAL
Units, such as AVC (ISO/IEC 14496‐10) and HEVC (ISO/IEC 23008‐2) video streams.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14496‐10, Information technology — Coding of audio-visual objects — Part 10: Advanced Video
Coding
ISO/IEC 14496‐12, Information technology — Coding of audio-visual objects — Part 12: ISO base media
1)
file format
ISO/IEC 23008‐2, Information technology — High efficiency coding and media delivery in heterogeneous
environments — Part 2: High efficiency video coding
3 Terms, definitions and abbreviated terms
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 14496‐10 or
ISO/IEC 23008‐2, and the following apply.
3.1.1
aggregator
in‐stream structure using a NAL unit header
NOTE Aggregators are used to group NAL units belonging to the same sample.
1
) ISO/IEC 14496‐12 is technically identical to ISO/IEC 15444‐12.
© ISO/IEC 2014 – All rights reserved 1
---------------------- Page: 8 ----------------------
ISO/IEC 14496-15:2014(E)
3.1.2
AVC base layer
maximum subset of a bitstream that is AVC compatible (i.e. a bitstream not using any of the
functionality of ISO/IEC 14496‐10 Annex G or Annex H)
NOTE 1 The AVC base layer is represented by AVC VCL NAL units and associated non‐VCL NAL units.
NOTE 2 The AVC base layer itself can be a temporal scalable bitstream.
3.1.3
AVC NAL unit
AVC VCL NAL unit and its associated non‐VCL NAL units in a bitstream
3.1.4
AVC VCL NAL unit
NAL unit with type 1 to 5 (inclusive) as specified in ISO/IEC 14496‐10
3.1.5
extraction path
set of operations on the original bitstream, each yielding a subset bitstream, ordered such that the
complete bitstream is first in the set, and the base layer is last, and all the bitstreams are in decreasing
complexity (along one of the scalability axes, such as resolution), and where every bitstream is a valid
operating point
NOTE An extraction path may be represented by the values of priority_id in the NAL unit headers. Alternatively an
extraction path can be represented by the run of tiers or by a set of hierarchically dependent tracks.
3.1.6
extractor
in‐stream structure using a NAL unit header including a NAL unit header extension
NOTE Extractors contain instructions on how to extract data from other tracks. Logically an Extractor can be seen as a
‘link’. While accessing a track containing Extractors, the Extractor is replaced by the data it is referencing.
3.1.7
in-stream structure
structure residing within sample data
3.1.8
MVC VCL NAL unit
NAL unit with type 20, and NAL units with type 14, as specified in ISO/IEC 14496‐10, when the
immediately following NAL units are AVC VCL NAL units.
NOTE MVC VCL NAL units do not affect the decoding process of a legacy AVC decoder.
3.1.9
operating point
subset of a scalable bitstream, representing in SVC a particular spatial resolution, temporal resolution,
and quality, or in MVC a set of target output views
NOTE 1 Each operating point consists of all the data needed to decode this particular bitstream subset.
NOTE 2 In an SVC stream an operating point can be represented either by (i) specific values of DTQ (dependency_id,
temporal_id and quality_id) or (ii) specific values of P (priority_id) or (iii) combinations of them (e.g. PDTQ). Note that the
usage of priority_id is defined by the application. In an SVC file a track represents one or more operating points. Within a track
tiers may be used to define multiple operating points.
2 © ISO/IEC 2014 – All rights reserved
---------------------- Page: 9 ----------------------
ISO/IEC 14496-15:2014(E)
NOTE 3 The bitstream subset of an MVC operating point represents a particular set of target output views at a particular
temporal resolution, and consists of all the data needed to decode this particular bitstream subset.
NOTE 4 An operating point is referred to as an operation point in Annex H of ISO/IEC 14496‐10 or in ISO/IEC 23008‐2.
3.1.10
parameter set
video parameter set, sequence parameter set, or picture parameter set, as defined in the applicable
video standard (e.g. ISO/IEC 14496‐10 or ISO/IEC 23008‐2)
NOTE This term is used to refer to all types of parameter sets.
3.1.11
parameter set elementary stream
elementary stream containing samples made up of only sequence and picture parameter set NAL units
synchronized with the video elementary stream
3.1.12
prefix NAL unit
NAL units with type 14 as specified in ISO/IEC 14496‐10
NOTE Prefix NAL units provide scalability information about AVC VCL NAL units and filler data NAL units. Prefix NAL
units do not affect the decoding process of a legacy AVC decoder. The behaviour of a legacy AVC file reader as a response to
prefix NAL units is undefined.
3.1.13
scalable layer; layer
set of VCL NAL units with the same values of dependency_id, quality_id, and temporal_id, and the
associated non‐VCL NAL units as specified in ISO/IEC 14496‐10.
NOTE 1 A scalable layer with any of dependency_id, quality_id, and temporal_id not equal to 0 enhances the video by one
or more scalability levels in at least one direction (temporal, quality or spatial resolution)
NOTE 2 SVC uses a “layered” encoder design which results in a bitstream representing “coding layers”. In some
publications the ‘base layer’ is the first quality layer of a specific coding layer. In some publications the base layer is the
scalable layer with the lowest priority. The SVC file format uses “scalable layer” or “layer” in a general way for describing
nested bitstreams (using terms like AVC base layer or SVC enhancement layer).
3.1.14
scalable layer representation
bitstream subset that is required for decoding the scalable layer, consisting of the scalable layer itself
and all the scalable layers on which the scalable layer depends
NOTE A scalable layer representation is also referred to as the representation of the scalable layer.
3.1.15
sub-picture
proper subset of coded slices of a layer representation
3.1.16
sub-picture tier
tier that consists of sub‐pictures
NOTE Any coded slice that is not included in the tier representation of a sub‐picture tier is not to be referred to in inter
prediction or inter‐layer prediction for decoding of the sub‐picture tier.
© ISO/IEC 2014 – All rights reserved 3
---------------------- Page: 10 ----------------------
ISO/IEC 14496-15:2014(E)
3.1.17
SVC enhancement layer
layer that specifies a part of a scalable bitstream that enhances the video
NOTE 1 An SVC enhancement layer is represented by SVC VCL NAL units and the associated non‐VCL NAL units and SEI
messages.
NOTE 2 Usually an SVC enhancement layer represents a spatial or coarse‐grain scalability (CGS) coding layer (identified by
a specific value of dependency_id).
3.1.18
SVC NAL unit
SVC VCL NAL unit and its associated non‐VCL NAL units in an SVC stream
3.1.19
SVC stream
bitstream represented by the operating point for which dependency_id is equal to mDid, temporal_id is
the greatest temporal_id value among mOpSet, and quality_id is the greatest quality_id value among
mOpSet, where the greatest value of dependency_id of all the operating points represented by DTQ
(dependency_id, temporal_id and quality_id) combinations is equal to mDid, and the set of all the
operating points with dependency_id equal to mDid is mOpSet.
NOTE The term “SVC stream” is referenced by ‘decoding/accessing the entire stream’ in this document. There may be
NAL units which are not required for decoding this operating point.
3.1.20
SVC VCL NAL unit
NAL unit with type 20, and NAL units with type 14 when the immediately following NAL units are AVC
VCL NAL units
NOTE SVC VCL NAL units do not affect the decoding process of a legacy AVC decoder.
3.1.21
temporal layer representation
representation of a temporal layer
temporal layer and all lower temporal layers
3.1.22
tier
set of operating points within a track, providing information about the operating points and
instructions on how to access the corresponding bitstream portions (using maps and groups)
NOTE 1 A tier represents one or more scalable layers of an SVC bitstream.
NOTE 2 The term “tier” is used to avoid confusion with the frequently used term layer. A tier represents a subset of a track
and represents an operating point of an SVC bitstream. Tiers in a track subset the entire track, no matter whether the track
references another track by extractors.
NOTE 3 An MVC tier represents a particular set of temporal subsets of a particular set of views.
3.1.23
tier representation; representation of the tier
bitstream subset that is required for decoding the tier, consisting of the tier itself and all the tiers on
which the tier depends
4 © ISO/IEC 2014 – All rights reserved
---------------------- Page: 11 ----------------------
ISO/IEC 14496-15:2014(E)
3.1.24
video elementary stream
elementary stream containing access units made up of NAL units for coded picture data
3.1.25
virtual base view
AVC compatible representation of an independently coded non‐base view
NOTE The virtual base view of an independently coded non‐base view is created according to the process specified in
H.8.5.5 of ISO/IEC 14496‐10. Samples containing data units of an independently coded non‐base view and samples of the
virtual base view are aligned by decoding times.
3.2 Abbreviated terms
AVC Advanced Video Coding. Where contrasted with SVC or MVC in this International Standard,
this term refers to the main part of ISO/IEC 14496‐10, including neither Annex G (Scalable
Video Coding) nor Annex H (Multiview Video Coding)
BLA Broken Link Access
CRA Clean Random Access
CTU Coding Tree Unit
HEVC High Efficiency Video Coding
FF File Format
HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh
MVC MultiviewVideo Coding [refers to ISO/IEC 14496‐10 when the techniques in Annex H
(Multiview Video Coding) are in use]
NAL Network Abstraction Layer
PPS Picture Parameter Set
ROI Region‐Of‐Interest
SEI Supplementary Enhancement Information
SPS Sequence Parameter Set
STSA Step‐wise Temporal Sub‐layer Access
SVC Scalable Video Coding [refers to ISO/IEC 14496‐10 when the techniques in Annex G
(Scalable Video Coding) are in use]
TSA Temporal Sub‐layer Access
VCL Video Coding Layer
VPS Video Parameter Set
© ISO/IEC 2014 – All rights reserved 5
---------------------- Page: 12 ----------------------
ISO/IEC 14496-15:2014(E)
4 General Definitions
4.1 Introduction
The specifications in this clause apply to all coding systems identified by chapters in this specification,
unless specifically over‐ridden by definitions in the clause for a specific coding system.
The following table summarizes the correspondences between the sets of terminology used in video
specifications and the ISO Base Media File Format.
Table 1 – Correspondence of terms in video and ISO Base Media File Format
Video ISO Base Media File
Format
‐ Movie
Bitstream Track
Access Unit Sample
4.2 Elementary stream structure
This specification concerns video coding systems that specify a set of Network Abstraction Layer (NAL)
units, which contain different types of data. This subclause specifies the format of the elementary
streams for storing such content.
4.3 Sample and Configuration definition
4.3.1 Introduction
Sample: A sample is an access unit as defined in the appropriate specification.
Parameter set sample: A parameter set sample is a sample in a parameter set stream which shall consist
of those parameter set NAL units that are to be considered as if present in the video elementary stream
at the same instant in time.
4.3.2 Canonical order and restrictions
The elementary stream is stored in the ISO Base Media File Format in a canonical format. The canonical
format is as neutral as possible so that systems that need to customize the stream for delivery over
different transport protocols — MPEG‐2 Systems, RTP, and so on — should not have to remove
information from the stream while being free to add to the stream. Furthermore, a canonical format
allows such operations to be performed against a known initial state.
The canonical stream format is an elementary stream that satisfies the following conditions:
Video data NAL units: All video data NAL units for a single picture shall be contained with the
sample whose decoding time and composition time are those of the picture. Each sample shall
contain at least one video data NAL unit of the primary picture.
6 © ISO/IEC 2014 – All rights reserved
---------------------- Page: 13 ----------------------
ISO/IEC 14496-15:2014(E)
SEI NAL units: All SEI NAL units shall be contained in the parameter set arrays, or in the sample
whose decoding time is at the time, or immediately precedes the time (with no intervening
samples), when the SEI messages come into effect instantaneously. In general, SEI messages for a
picture shall be included in the sample containing that picture and that SEI messages pertaining
to a sequence of pictures shall be included in the sample containing the first picture of the
sequence to which the SEI message pertains. The order of SEI messages within a sample is as
defined in the applicable video coding standard.
The sequence of NAL units in an elementary stream and within a single sample must be in a valid
decoding order for those NAL units as specified in the applicable video coding standard.
All timing information is external to stream. Picture Timing SEI messages that define
presentation or composition timestamps may be included in the video elementary stream, as
these messages contain other information than timing, and may be required for conformance
checking. However, all timing information is provided by the information stored in the various
sample metadata tables, and this information over‐rides any timing provided in the video layer.
Timing provided within the video stream in this file format should be ignored as it may
contradict the timing provided by the file format and may not be correct or consistent within
itself.
NOTE This constraint is imposed due to the fact tha
...
Questions, Comments and Discussion
Ask us and Technical Secretary will try to provide an answer. You can facilitate discussion about the standard in here.