Information technology - Coding of audio-visual objects - Part 12: ISO base media file format - Amendment 1: File format extensions and guidelines

Technologies de l'information — Codage des objets audiovisuels — Partie 12: Format ISO de base pour les fichiers médias — Amendement 1: Extensions de format de fichier et lignes directrices

General Information

Status: Withdrawn
Current Stage: 5098 - Project deleted
Start Date: 06-Dec-2004
Completion Date: 30-Oct-2025

Standard
ISO/IEC 14496-12:2004/FDAM 1 - File format extensions and guidelines
English language
34 pages

Frequently Asked Questions

ISO/IEC 14496-12:2004/FDAM 1 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Coding of audio-visual objects - Part 12: ISO base media file format - Amendment 1: File format extensions and guidelines".

ISO/IEC 14496-12:2004/FDAM 1 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 14496-12:2004/FDAM 1 is related to other standards as follows: it has inter-standard links to ISO 27971:2015, ISO/IEC 14496-12:2004 and ISO/IEC 14496-12:2005, and it is listed with the relation "excused to" ISO/IEC 14496-12:2004. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


FINAL DRAFT AMENDMENT
ISO/IEC 14496-12:2004/FDAM 1

ISO/IEC JTC 1
Secretariat: ANSI
Voting begins on: 2004-09-30
Voting terminates on: 2004-11-30

Information technology - Coding of audio-visual objects -
Part 12: ISO base media file format
AMENDMENT 1: File format extensions and guidelines

Technologies de l'information - Codage des objets audiovisuels -
Partie 12: Format ISO de base pour les fichiers médias
AMENDEMENT 1: Extensions de format de fichier et lignes directrices

Please see the administrative notes on page iii

RECIPIENTS OF THIS DRAFT ARE INVITED TO SUBMIT, WITH THEIR COMMENTS, NOTIFICATION
OF ANY RELEVANT PATENT RIGHTS OF WHICH THEY ARE AWARE AND TO PROVIDE SUPPORTING
DOCUMENTATION.
IN ADDITION TO THEIR EVALUATION AS BEING ACCEPTABLE FOR INDUSTRIAL, TECHNOLOGICAL,
COMMERCIAL AND USER PURPOSES, DRAFT INTERNATIONAL STANDARDS MAY ON OCCASION HAVE
TO BE CONSIDERED IN THE LIGHT OF THEIR POTENTIAL TO BECOME STANDARDS TO WHICH
REFERENCE MAY BE MADE IN NATIONAL REGULATIONS.
© ISO/IEC 2004

Copyright notice
This ISO document is a Draft International Standard and is copyright-protected by ISO. Except as permitted
under the applicable laws of the user's country, neither this ISO draft nor any extract from it may be
reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic,
photocopying, recording or otherwise, without prior written permission being secured.
Requests for permission to reproduce should be addressed to either ISO at the address below or ISO's
member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Reproduction may be subject to royalty payments or a licensing agreement.
Violators may be prosecuted.
ii © ISO/IEC 2004 — All rights reserved

In accordance with the provisions of Council Resolution 21/1986, this document is circulated in the
English language only.

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 1 to ISO/IEC 14496-12:2004 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.

Information technology — Coding of audio-visual objects —
Part 12:
ISO base media file format
AMENDMENT 1: File format extensions and guidelines
In clause 2, add the following references:
ISO/IEC 14496-10, Information technology – Coding of audio-visual objects – Part 10: Advanced Video Coding
ISO/IEC 14496-14, Information technology – Coding of audio-visual objects – Part 14: MP4 file format
ISO/IEC 15444-3, Information technology – JPEG 2000 image coding system – Part 3: Motion JPEG 2000
IETF RFC 3711, "The Secure Real-time Transport Protocol", Baugher M. et al., March 2004
SMIL 1.0 “Synchronized Multimedia Integration Language (SMIL) 1.0 Specification”,

In 4.3.1, Definition of File Type Box, replace:
The type ‘isom’ (ISO Base Media file) is defined in this section of this specification, as identifying files that
conform to the ISO Base Media File Format.
with:
The type ‘isom’ (ISO Base Media file) is defined in this section of this specification, as identifying files that
conform to the first version of ISO Base Media File Format.
Add at the end of 4.3.1:
The brand ‘iso2’ shall be used to indicate compatibility with this amended version of the ISO Base Media File
Format; it may be used in addition to or instead of the ‘isom’ brand and the same usage rules apply. If used
without the brand ‘isom’ identifying the first version of this specification, it indicates that support for some or all
of the technology introduced by this amendment is required, such as the functionality in subclauses [8.40]
through [8.45], or the SRTP support in sub-clause [0].
The brand ‘avc1’ shall be used to indicate that the file is conformant with the ‘AVC Extensions’ in sub-clause
[8.40]. If used without other brands, this implies that support for those extensions is required. The use of
‘avc1’ as a major-brand may be permitted by specifications; in that case, that specification defines the file
extension and required behavior.
If a Meta-box with an MPEG-7 handler type is used at the file level, then the brand ‘mp71’ should be a
member of the compatible-brands list in the file-type box.
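
The brand rules above can be sketched in Python as follows. This is a minimal illustration, not a conformance check: it assumes the File Type Box layout of the base specification (32-bit size, the type ‘ftyp’, major_brand, minor_version, then a list of four-character compatible brands), it does not handle 64-bit box sizes, and the function names and the sample bytes are hypothetical.

```python
import struct

def parse_ftyp(box: bytes):
    """Return (major_brand, minor_version, compatible_brands) from a whole 'ftyp' box."""
    size, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"ftyp" or size != len(box) or size < 16:
        raise ValueError("not a complete 'ftyp' box")
    major, minor = struct.unpack(">4sI", box[8:16])
    # Everything after minor_version is a list of 4-character brands.
    brands = [box[i:i + 4].decode("latin-1") for i in range(16, size - 3, 4)]
    return major.decode("latin-1"), minor, brands

def requires_amendment_support(brands):
    """'iso2' without 'isom': some or all of the amended features are required."""
    return "iso2" in brands and "isom" not in brands

# Hypothetical file type box: major brand 'iso2', compatible brands 'iso2' and 'avc1'.
payload = b"iso2" + struct.pack(">I", 0) + b"iso2" + b"avc1"
example_box = struct.pack(">I", 8 + len(payload)) + b"ftyp" + payload
major_brand, minor_version, compatible = parse_ftyp(example_box)
```

A reader that only implements the first version of the format could use `requires_amendment_support(compatible)` to decide whether to refuse the file.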
In 4.3.3, delete the sentence:
“Only one brand is defined here…”

In 6.1.2, add “Meta-data, ” after “Movie Fragments, ”.

In 6.2.3, insert the following as point 8 in the recommendations:
8) It is recommended that the progressive download information box be placed as early as possible in files,
for maximum utility.
Replace Table 1 as follows (correctly cross-referenced):
(Nesting is shown by indentation; * marks a mandatory box.)
Box type        *  Clause    Description
ftyp            *  4.3       file type and compatibility
pdin               8.43      progressive download information
moov            *  8.1       container for all the metadata
  mvhd          *  8.3       movie header, overall declarations
  trak          *  8.4       container for an individual track or stream
    tkhd        *  8.5       track header, overall information about the track
    tref           8.6       track reference container
    edts           8.25      edit list container
      elst         8.26      an edit list
    mdia        *  8.7       container for the media information in a track
      mdhd      *  8.8       media header, overall information about the media
      hdlr      *  8.9       handler, declares the media (handler) type
      minf      *  8.10      media information container
        vmhd       8.11.2    video media header, overall information (video track only)
        smhd       8.11.3    sound media header, overall information (sound track only)
        hmhd       8.11.4    hint media header, overall information (hint track only)
        nmhd       8.11.5    Null media header, overall information (some tracks only)
        dinf    *  8.12      data information box, container
          dref  *  8.13      data reference box, declares source(s) of media data in track
        stbl    *  8.14      sample table box, container for the time/space map
          stsd  *  8.16      sample descriptions (codec types, initialization etc.)
          stts  *  8.15.2    (decoding) time-to-sample
          ctts     8.15.3    (composition) time to sample
          stsc  *  8.18      sample-to-chunk, partial data-offset information
          stsz     8.17.2    sample sizes (framing)
          stz2     8.17.3    compact sample sizes (framing)
          stco  *  8.19      chunk offset, partial data-offset information
          co64     8.19      64-bit chunk offset
          stss     8.20      sync sample table (random access points)
          stsh     8.21      shadow sync sample table
          padb     8.23      sample padding bits
          stdp     8.22      sample degradation priority
          sdtp     8.40.2    independent and disposable samples
          sbgp     8.40.3.2  sample-to-group
          sgpd     8.40.3.3  sample group description
          subs     8.42      sub-sample information
  mvex             8.29      movie extends box
    mehd           8.30      movie extends header box
    trex        *  8.31      track extends defaults
  ipmc             8.45.4    IPMP Control Box
moof               8.32      movie fragment
  mfhd          *  8.33      movie fragment header
  traf             8.34      track fragment
    tfhd        *  8.35      track fragment header
    trun           8.36      track fragment run
    sdtp           8.40.2    independent and disposable samples
    sbgp           8.40.3.2  sample-to-group
    subs           8.42      sub-sample information
mfra               8.37      movie fragment random access
  tfra             8.38      track fragment random access
  mfro          *  8.39      movie fragment random access offset
mdat               8.2       media data container
free               8.24      free space
skip               8.24      free space
  udta             8.27      user-data
    cprt           8.28      copyright etc.
meta               8.44.1    metadata
  hdlr          *  8.9       handler, declares the metadata (handler) type
  dinf             8.12      data information box, container
    dref           8.13      data reference box, declares source(s) of metadata items
  ipmc             8.45.4    IPMP Control Box
  iloc             8.44.3    item location
  ipro             8.44.5    item protection
    sinf           8.45.1    protection scheme information box
      frma         8.45.2    original format box
      imif         8.45.3    IPMP Information box
      schm         8.45.5    scheme type box
      schi         8.45.6    scheme information box
  iinf             8.44.6    item information
  xml              8.44.2    XML container
  bxml             8.44.2    binary XML container
  pitm             8.44.4    primary item reference

In 8.2.1, Media Data Box, add:
“, and the item location box, subclause 8.44.3” before the closing parenthesis at the end of the first paragraph.

In 8.6.3, Track Reference Box Semantics, after "The Track Reference Box contains track reference type
boxes" insert:
track_ID is an integer that provides a reference from the containing track to another track in the
presentation. track_IDs are never re-used and cannot be equal to zero.

In 8.9.1, Definition of Handler Reference Box, change:
Container: Media Box (‘mdia’)
to:
Container: Media Box (‘mdia’) or Meta Box (‘meta’)

Add at the end of 8.9.1:
This box, when present within a Meta Box, declares the structure or format of the 'meta' box contents.

In 8.9.3, change:
handler_type is an integer containing one of the following
to:
handler_type, when present in a media box, is an integer containing one of the following

and add before documentation of the ‘name’ field:
handler_type, when present in a meta box, contains an appropriate value to indicate the format of the
meta box contents
In 8.12.1, Definition of Data information box, change:
Box Type: ‘dinf’
Container: Media Information Box (‘minf’)
Mandatory: Yes
Quantity: Exactly one
to:
Box Type: ‘dinf’
Container: Media Information Box (‘minf’) or Meta Box (‘meta’)
Mandatory: Yes (required within ‘minf’ box) and No (optional within ‘meta’ box)
Quantity: Exactly one
In 8.24.1, Definition of Free Space Box, change:
Container: File
to:
Container: File or other box
In 8.26.2, Edit List Box Syntax, change:
for (i=1; i ≤ entry_count; i++) {

to:
for (i=1; i <= entry_count; i++) {

Change:
int(16) media_rate_fraction;
to:
int(16) media_rate_fraction = 0;

In 8.31.1, Definition of Track Extends Box, replace:
The sample flags field in sample fragments (default_sample_flags here and in a Track Fragment Header
Box, and sample_flags and first_sample_flags in a Track Fragment Run Box) is coded as a 32-bit
value. It has the following structure:
bit(12) reserved=0;
bit(3) sample_padding_value;
bit(1) sample_is_difference_sample;
// i.e. when 1 signals a non-key or non-sync sample
unsigned int(16) sample_degradation_priority;

with:
The sample flags field in sample fragments (default_sample_flags here and in a Track Fragment Header
Box, and sample_flags and first_sample_flags in a Track Fragment Run Box) is coded as a 32-bit
value. It has the following structure:
bit(6) reserved=0;
unsigned int(2) sample_depends_on;
unsigned int(2) sample_is_depended_on;
unsigned int(2) sample_has_redundancy;
bit(3) sample_padding_value;
bit(1) sample_is_difference_sample;
// i.e. when 1 signals a non-key or non-sync sample
unsigned int(16) sample_degradation_priority;

The sample_depends_on, sample_is_depended_on and sample_has_redundancy values are defined as
documented in the Independent and Disposable Samples Box.
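
As an illustration of the revised layout, the following Python sketch unpacks the 32-bit sample flags value. It assumes the fields are packed most significant bit first in the order listed above (6 reserved bits, three 2-bit fields, 3 padding bits, 1 difference bit, 16 priority bits); the function name and the example value are hypothetical.

```python
def unpack_sample_flags(flags: int) -> dict:
    """Split a 32-bit sample flags value into the fields of the amended layout."""
    if not 0 <= flags <= 0xFFFFFFFF:
        raise ValueError("sample flags must fit in 32 bits")
    return {
        "reserved": (flags >> 26) & 0x3F,                    # bits 31..26, expected 0
        "sample_depends_on": (flags >> 24) & 0x3,            # bits 25..24
        "sample_is_depended_on": (flags >> 22) & 0x3,        # bits 23..22
        "sample_has_redundancy": (flags >> 20) & 0x3,        # bits 21..20
        "sample_padding_value": (flags >> 17) & 0x7,         # bits 19..17
        "sample_is_difference_sample": (flags >> 16) & 0x1,  # bit 16
        "sample_degradation_priority": flags & 0xFFFF,       # bits 15..0
    }

# Hypothetical value: independent sample (2), depended on by others (1),
# no redundant coding (2), sync sample, degradation priority 5.
example_flags = (2 << 24) | (1 << 22) | (2 << 20) | 5
fields = unpack_sample_flags(example_flags)
```

Because the three new fields occupy bits that were reserved (zero) in the original 12-bit reserved field, a value written under the old layout decodes here with all three dependency fields equal to 0, that is, "unknown".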

Insert the following subclauses after 8.39 (i.e. starting with the following subclause numbered as 8.40).
8.40 AVC Extensions to the ISO Base Media File Format
8.40.1 Introduction
This section documents technical additions to the ISO Base Media File Format originally designed for AVC
support, but which are more generally applicable.
8.40.2 Independent and Disposable Samples Box
8.40.2.1 Definition
Box Types: ‘sdtp’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or one
This optional table answers three questions about sample dependency:
1) does this sample depend on others (is it an I-picture)?
2) do no other samples depend on this one?
3) does this sample contain multiple (redundant) encodings of the data at this time-instant (possibly with
different dependencies)?

In the absence of this table:
1) the sync sample table answers the first question; in most video codecs, I-pictures are also sync points;
2) the dependency of other samples on this one is unknown;
3) the existence of redundant coding is unknown.

When performing ‘trick’ modes, such as fast-forward, it is possible to use the first piece of information to locate
independently decodable samples. Similarly, when performing random access, it may be necessary to locate
the previous sync point or random access recovery point, and roll-forward from the sync point or the pre-roll
starting point of the random access recovery point to the desired point. While rolling forward, samples on
which no others depend need not be retrieved or decoded.
The value of ‘sample_is_depended_on’ is independent of the existence of redundant codings. However, a
redundant coding may have different dependencies from the primary coding; if redundant codings are
available, the value of ‘sample_depends_on’ documents only the primary coding.
The size of the table, sample_count, is taken from the sample_count in the Sample Size Box ('stsz') or
Compact Sample Size Box (‘stz2’).
8.40.2.2 Syntax
aligned(8) class SampleDependencyTypeBox
   extends FullBox(‘sdtp’, version = 0, 0) {
   for (i=0; i < sample_count; i++){
      unsigned int(2) reserved = 0;
      unsigned int(2) sample_depends_on;
      unsigned int(2) sample_is_depended_on;
      unsigned int(2) sample_has_redundancy;
   }
}
8.40.2.3 Semantics
sample_depends_on takes one of the following four values:
0: the dependency of this sample is unknown;
1: this sample does depend on others (not an I picture);
2: this sample does not depend on others (I picture);
3: reserved
sample_is_depended_on takes one of the following four values:
0: the dependency of other samples on this sample is unknown;
1: other samples depend on this one (not disposable);
2: no other sample depends on this one (disposable);
3: reserved
sample_has_redundancy takes one of the following four values:
0: it is unknown whether there is redundant coding in this sample;
1: there is redundant coding in this sample;
2: there is no redundant coding in this sample;
3: reserved
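
The table can be read with a few lines of Python. The sketch below is only an illustration: it assumes that `payload` holds the box body after the FullBox version and flags, that each sample occupies one byte packed most significant bit first, and that samples are numbered from 1; the type and function names are hypothetical.

```python
from typing import List, NamedTuple

class SampleDependency(NamedTuple):
    depends_on: int        # 0 unknown, 1 depends on others, 2 independent
    is_depended_on: int    # 0 unknown, 1 not disposable, 2 disposable
    has_redundancy: int    # 0 unknown, 1 redundant coding, 2 none

def parse_sdtp(payload: bytes, sample_count: int) -> List[SampleDependency]:
    """Decode one byte per sample; sample_count comes from 'stsz' or 'stz2'."""
    if len(payload) < sample_count:
        raise ValueError("'sdtp' payload shorter than sample_count")
    return [
        SampleDependency((b >> 4) & 0x3, (b >> 2) & 0x3, b & 0x3)
        for b in payload[:sample_count]
    ]

def independent_samples(table: List[SampleDependency]) -> List[int]:
    """Sample numbers usable for trick modes such as fast-forward."""
    return [n for n, entry in enumerate(table, start=1) if entry.depends_on == 2]

def disposable_samples(table: List[SampleDependency]) -> List[int]:
    """Sample numbers that may be skipped while rolling forward."""
    return [n for n, entry in enumerate(table, start=1) if entry.is_depended_on == 2]

# Hypothetical three-sample table: I picture, disposable picture, reference picture.
example_table = parse_sdtp(bytes([0x20, 0x18, 0x14]), sample_count=3)
```

Entries whose value is 0 (unknown) are deliberately left out of both lists, since the table makes no claim about them.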
8.40.3 Sample Groups
8.40.3.1 Introduction
This clause specifies a generic mechanism for representing a partition of the samples in a track. A sample
grouping is an assignment of each sample in a track to be a member of one sample group, based on a
grouping criterion. A sample group in a sample grouping is not limited to being contiguous samples and may
contain non-adjacent samples. As there may be more than one sample grouping for the samples in a track,
each sample grouping has a type field to indicate the type of grouping. For example, a file might contain two
sample groupings for the same track: one based on an assignment of samples to layers and another to sub-sequences.
Sample groupings are represented by two linked data structures: (1) a SampleToGroup box represents the
assignment of samples to sample groups; (2) a SampleGroupDescription box contains a sample group
entry for each sample group describing the properties of the group. There may be multiple instances of the
SampleToGroup and SampleGroupDescription boxes based on different grouping criteria. These are
distinguished by a type field used to indicate the type of grouping.
One example of using these tables is to represent the assignments of samples to layers. In this case each
sample group represents one layer, with an instance of the SampleToGroup box describing which layer a
sample belongs to.
8.40.3.2 SampleToGroup Box
8.40.3.2.1 Definition
Box Type: ‘sbgp’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or more.
This table can be used to find the group that a sample belongs to and the associated description of that
sample group. The table is compactly coded with each entry giving the index of the first sample of a run of
samples with the same sample group descriptor. The sample group description ID is an index that refers to a
SampleGroupDescription box, which contains entries describing the characteristics of each sample group.
There may be multiple instances of this box if there is more than one sample grouping for the samples in a
track. Each instance of the SampleToGroup box has a type code that distinguishes different sample
groupings. Within a track, there shall be at most one instance of this box with a particular grouping type. The
associated SampleGroupDescription shall indicate the same value for the grouping type.
8.40.3.2.2 Syntax
aligned(8) class SampleToGroupBox
   extends FullBox(‘sbgp’, version = 0, 0)
{
   unsigned int(32) grouping_type;
   unsigned int(32) entry_count;
   for (i=1; i <= entry_count; i++)
   {
      unsigned int(32) sample_count;
      unsigned int(32) group_description_index;
   }
}
8.40.3.2.3 Semantics
version is an integer that specifies the version of this box.
grouping_type is an integer that identifies the type (i.e. criterion used to form the sample groups) of the
sample grouping and links it to its sample group description table with the same value for grouping
type. At most one occurrence of this box with the same value for grouping_type shall exist for a
track.
entry_count is an integer that gives the number of entries in the following table.
sample_count is an integer that gives the number of consecutive samples with the same sample group
descriptor. If the sum of the sample count in this box is less than the total sample count, then the
reader should effectively extend it with an entry that associates the remaining samples with no group.
It is an error for the total in this box to be greater than the sample_count documented elsewhere,
and the reader behavior would then be undefined.
group_description_index is an integer that gives the index of the sample group entry which
describes the samples in this group. The index ranges from 1 to the number of sample group entries
in the SampleGroupDescription Box, or takes the value 0 to indicate that this sample is a member
of no group of this type.
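
The run-length coding and the two boundary rules (implicit extension with "no group", error on overflow) can be sketched in Python as below. This is an illustration under stated assumptions: `payload` is the box body after the FullBox version and flags, the grouping type is shown as a four-character code, and the function names are hypothetical.

```python
import struct
from typing import List, Tuple

def parse_sbgp(payload: bytes) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (grouping_type, [(sample_count, group_description_index), ...])."""
    grouping_type, entry_count = struct.unpack_from(">4sI", payload, 0)
    entries = [struct.unpack_from(">II", payload, 8 + 8 * i) for i in range(entry_count)]
    return grouping_type.decode("latin-1"), entries

def expand_groups(entries: List[Tuple[int, int]], total_samples: int) -> List[int]:
    """Give the group_description_index of every sample (0 means no group)."""
    per_sample: List[int] = []
    for sample_count, index in entries:
        per_sample.extend([index] * sample_count)
    if len(per_sample) > total_samples:
        # The text calls this an error with undefined reader behavior; here we refuse.
        raise ValueError("SampleToGroup box describes more samples than exist")
    # Remaining samples are associated with no group.
    per_sample.extend([0] * (total_samples - len(per_sample)))
    return per_sample

# Hypothetical 'roll' grouping: two samples in group 1, then one sample in no group.
example_payload = b"roll" + struct.pack(">I", 2) + struct.pack(">II", 2, 1) + struct.pack(">II", 1, 0)
grouping, runs = parse_sbgp(example_payload)
mapping = expand_groups(runs, total_samples=5)
```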
8.40.3.3 Sample Group Description Box
8.40.3.3.1 Definition
Box Types: ‘sgpd’
Container: Sample Table Box (‘stbl’)
Mandatory: No
Quantity: Zero or more, with one for each SampleToGroup Box.
This description table gives information about the characteristics of sample groups. The descriptive
information is any other information needed to define or characterize the sample group.
There may be multiple instances of this box if there is more than one sample grouping for the samples in a
track. Each instance of the SampleGroupDescription box has a type code that distinguishes different
sample groupings. Within a track, there shall be at most one instance of this box with a particular grouping
type. The associated SampleToGroup shall indicate the same value for the grouping type.
The information is stored in the sample group description box after the entry-count. An abstract entry type is
defined and sample groupings shall define derived types to represent the description of each sample group.
For video tracks, an abstract VisualSampleGroupEntry is used with similar types for audio and hint tracks.
Note: the base classes for sample group description entries are not boxes and therefore
no size is signaled. When defining derived classes, ensure either that they have a fixed
size, or that the size is explicitly indicated with a length field. An implied size (e.g.
achieved by parsing the data) is not recommended as this makes scanning the array
difficult.
8.40.3.3.2 Syntax
// Sequence Entry
abstract class SampleGroupDescriptionEntry (unsigned int(32) handler_type)
{
}
// Visual Sequence
abstract class VisualSampleGroupEntry (type) extends SampleGroupDescriptionEntry (type)
{
}
// Audio Sequences
abstract class AudioSampleGroupEntry (type) extends SampleGroupDescriptionEntry (type)
{
}
aligned(8) class SampleGroupDescriptionBox (unsigned int(32) handler_type)
   extends FullBox('sgpd', 0, 0){
   unsigned int(32) grouping_type;
   unsigned int(32) entry_count;
   int i;
   for (i = 1 ; i <= entry_count ; i++){
      switch (handler_type){
         case ‘vide’: // for video tracks
            VisualSampleGroupEntry ();
            break;
         case ‘soun’: // for audio tracks
            AudioSampleGroupEntry();
            break;
         case ‘hint’: // for hint tracks
            HintSampleGroupEntry();
            break;
      }
   }
}
8.40.3.3.3 Semantics
version is an integer that specifies the version of this box.
grouping_type is an integer that identifies the SampleToGroup box that is associated with this sample group
description.
entry_count is an integer that gives the number of entries in the following table.

8.40.3.4 Representation of group structures in Movie Fragments
Support for new SampleGroup structures within Movie fragments is provided by the use of the
SampleToGroup Box with the container for this Box being the Track Fragment Box (‘traf’). The definition,
syntax and semantics of this Box are as specified in subclause 8.40.3.2.
The SampleToGroup Box can be used to find the group that a sample in a track fragment belongs to and the
associated description of that sample group. The table is compactly coded with each entry giving the index of
the first sample of a run of samples with the same sample group descriptor. The sample group description ID
is an index that refers to a SampleGroupDescription Box, which contains entries describing the
characteristics of each sample group and is present in the SampleTableBox.
There may be multiple instances of the SampleToGroup Box if there is more than one sample grouping for the
samples in a track fragment. Each instance of the SampleToGroup Box has a type code that distinguishes
different sample groupings. The associated SampleGroupDescription shall indicate the same value for
the grouping type.
The total number of samples represented in any SampleToGroup Box in the track fragment must match the
total number of samples in all the track fragment runs. Each SampleToGroup Box documents a different
grouping of the same samples.
8.40.4 Random Access Recovery Points
8.40.4.1 Definition
In some coding systems it is possible to perform random access into a stream and achieve correct decoding after
having decoded a number of samples. This is known as gradual decoding refresh. For example, in video, the
encoder might encode intra-coded macroblocks in the stream, such that it knows that within a certain period
the entire picture consists of pixels that are only dependent on intra-coded macroblocks supplied during that
period.
Samples for which such gradual refresh is possible are marked by being a member of this group. The
definition of the group allows the marking to occur at either the beginning of the period or the end. However,
when used with a particular media type, the usage of this group may be restricted to marking only one end (i.e.
restricted to only positive or negative roll values). A roll-group is defined as that group of samples having the
same roll distance.
8.40.4.2 Syntax
class VisualRollRecoveryEntry() extends VisualSampleGroupEntry (‘roll’)
{
   signed int(16) roll_distance;
}
class AudioRollRecoveryEntry() extends AudioSampleGroupEntry (‘roll’)
{
   signed int(16) roll_distance;
}
8.40.4.3 Semantics
roll_distance is a signed integer that gives the number of samples that must be decoded in order for
a sample to be decoded correctly.  A positive value indicates the number of samples after the
sample that is a group member that must be decoded such that at the last of these recovery is
complete, i.e. the last sample is correct. A negative value indicates the number of samples before the
sample that is a group member that must be decoded in order for recovery to be complete at the
marked sample. The value zero must not be used; the sync sample table documents random access
points for which no recovery roll is needed.
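
One reading of these semantics is sketched below in Python: given the number of a sample that is a member of a roll group, the function returns the first sample that has to be decoded and the first sample that is then correct. The function name is hypothetical and the interpretation is an assumption drawn from the paragraph above, not text from the specification.

```python
from typing import Tuple

def roll_decode_range(marked_sample: int, roll_distance: int) -> Tuple[int, int]:
    """Return (first_sample_to_decode, first_correct_sample) for a roll-group member."""
    if roll_distance == 0:
        raise ValueError("roll_distance of zero must not be used")
    if not -32768 <= roll_distance <= 32767:
        raise ValueError("roll_distance must fit in a signed 16-bit integer")
    if roll_distance > 0:
        # Start at the marked sample; recovery completes roll_distance samples later.
        return marked_sample, marked_sample + roll_distance
    # Negative: start |roll_distance| samples earlier; the marked sample is correct.
    return marked_sample + roll_distance, marked_sample

forward = roll_decode_range(10, 3)
backward = roll_decode_range(10, -3)
```

With a positive distance the marked sample is where decoding starts; with a negative distance it is where decoding becomes correct, which is why a media type may restrict itself to one sign.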
8.41 Sample Scale Box
8.41.1 Definition
Box Type: ‘stsl’
Container: Sample Entry
Mandatory: No
Quantity: zero or one
This box may be present in any visual sample entry. This box indicates the scaling method that is applied
when the width and height of the visual material (as declared by the width and height values in any visual
sample entry) do not match the track width and height values (as indicated in the track header box).
Implementation of this box is optional; if this box is present and can be interpreted by the decoder, all samples
shall be displayed according to the scaling behavior that is specified in this box. Otherwise, all samples are
scaled to the size that is indicated by the width and height field in the Track Header Box.
If the size of the image is bigger than the size of the presentation region and ‘hidden’ scaling is applied in the
Sample Scale Box, it is not possible to display the whole image. In such a case, it is useful to provide the
information to determine the region that is to be displayed. The center values would then indicate the center of
the region of high priority in each visual sample. The decoder can display the region of high priority according
to these values. The center values imply a consistent crop for all the images in a sequence. The offset values
are positive when the desired visual center is below or to the right of the image center, and negative for offsets
above or to the left.
The semantics of the values for scale_method are as specified for the ‘fit’ attribute of regions in SMIL 1.0.

8.41.2 Syntax
aligned(8) class SampleScaleBox extends FullBox(‘stsl’, version = 0, 0) {
   bit(7) reserved = 0;
   bit(1) constraint_flag;
   unsigned int(8) scale_method;
   int(16) display_center_x;
   int(16) display_center_y;
}
8.41.3 Semantics
constraint_flag: if this flag is set, all samples described by this sample entry shall be scaled
according to the method specified by the field ‘scale_method’. Otherwise, it is recommended that all
the samples be scaled according to the method specified by the field ‘scale_method’, but they can be
displayed in an implementation-dependent way, which may include not scaling the image (i.e. neither
to the width and height specified in the track header box, nor by the method indicated here).
scale_method is an 8-bit unsigned integer that defines the scaling mode to be used. Of the 256 possible
values the values 0 through 127 are reserved for use by ISO and values 128 through 255 are user-
defined and are not specified in this International Standard; they may be used as determined by the
application. Of the reserved values the following modes are currently defined:
1 scaling is done by ‘fill’ mode.
2 scaling is done by ‘hidden’ mode.
3 scaling is done by ‘meet’ mode.
4 scaling is done by ‘slice’ mode in the x-coordinate.
5 scaling is done by ‘slice’ mode in the y-coordinate.
display_center_x is a horizontal offset in pixels of the center of the region that should be displayed
by priority relative to the center of the image. Default value is zero. Positive values indicate a display
center to the right of the image center.
display_center_y is a vertical offset in pixels of the center of the region that should be displayed by
priority relative to the center of the image. Default value is zero. Positive values indicate a display
center below the image center.
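
A reader could decode this box as in the following Python sketch. It assumes `payload` is the box body after the FullBox version and flags and that the fields are big-endian as in the rest of the file format; the function name, the dictionary keys and the short labels for the slice modes are hypothetical.

```python
import struct

# Modes currently defined among the ISO-reserved values 0..127.
SCALE_METHODS = {1: "fill", 2: "hidden", 3: "meet", 4: "slice (x)", 5: "slice (y)"}

def parse_stsl(payload: bytes) -> dict:
    """Decode the six-byte body of a Sample Scale Box."""
    first_byte, scale_method, center_x, center_y = struct.unpack_from(">BBhh", payload, 0)
    if scale_method in SCALE_METHODS:
        method_name = SCALE_METHODS[scale_method]
    elif scale_method >= 128:
        method_name = "user-defined"
    else:
        method_name = "reserved"
    return {
        "constraint_flag": bool(first_byte & 0x01),  # lowest bit after 7 reserved bits
        "scale_method": scale_method,
        "scale_method_name": method_name,
        "display_center_x": center_x,  # positive: right of the image center
        "display_center_y": center_y,  # positive: below the image center
    }

# Hypothetical box body: constraint set, 'hidden' scaling, center 16 pixels left and 8 down.
example_stsl = parse_stsl(bytes([0x01, 2]) + struct.pack(">hh", -16, 8))
```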

8.42 Sub-Sample Information Box
8.42.1 Definition
Box Type: ‘subs’
Container: Sample Table Box (‘stbl’) or Track Fragment Box (‘traf’)
Mandatory: No
Quantity: Zero or one
This box, named the Sub-Sample Information box, is designed to contain sub-sample information.
A sub-sample is a contiguous range of bytes of a sample. The specific definition of a sub-sample shall be
supplied for a given coding system (e.g. for ISO/IEC 14496-10, Advanced Video Coding). In the absence of
such a specific definition, this box shall not be applied to samples using that coding system.
If subsample_count is 0 for any entry, then those samples have no subsample information and no
array follows. The table is sparsely coded; the table identifies which samples have sub-sample
structure by recording the difference in sample-number between each entry. The first entry in the
table records the sample number of the first sample having sub-sample information.
Note: It is possible to combine subsample_priority and discardable such that when
subsample_priority is smaller than a certain value, discardable is set to 1. However,
since different systems may use different scales of priority values, keeping the two fields
separate is the safer choice and gives a clean solution for discardable sub-samples.
© ISO/IEC 2004 — All rights reserved 11

8.42.2 Syntax
aligned(8) class SubSampleInformationBox
   extends FullBox(‘subs’, version, 0) {
   unsigned int(32) entry_count;
   int i,j;
   for (i=0; i < entry_count; i++) {
      unsigned int(32) sample_delta;
      unsigned int(16) subsample_count;
      if (subsample_count > 0) {
         for (j=0; j < subsample_count; j++) {
            if(version == 1)
            {
               unsigned int(32) subsample_size;
            }
            else
            {
               unsigned int(16) subsample_size;
            }
            unsigned int(8) subsample_priority;
            unsigned int(8) discardable;
            unsigned int(32) reserved = 0;
         }
      }
   }
}
8.42.3 Semantics
version is an integer that specifies the version of this box (0 or 1 in this specification).
entry_count is an integer that gives the number of entries in the following table.
sample_delta is an integer that specifies the sample number of the sample having sub-sample
structure. It is coded as the difference between the desired sample number, and the sample number
indicated in the previous entry. If the current entry is the first entry, the value indicates the sample
number of the first sample having sub-sample information, that is, the value is the difference between
the sample number and zero (0).
subsample_count is an integer that specifies the number of sub-samples for the current sample.
subsample_size is an integer that specifies the size, in bytes, of the current sub-sample.
subsample_priority is an integer specifying the degradation priority for each sub-sample. Higher
values of subsample_priority indicate sub-samples which are important to, and have a greater
impact on, the decoded quality.
discardable equal to 0 means that the sub-sample is required to decode the current sample, while
equal to 1 means the sub-sample is not required to decode the current sample but may be used for
enhancements, e.g., the sub-sample consists of supplemental enhancement information (SEI)
messages.
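The sparse coding of sample_delta can be illustrated with a short reader. This is a minimal sketch, not a reference implementation: the function name parse_subs_payload and the returned structure are hypothetical, and it assumes the input is the box body starting at the FullBox version byte (one byte of version followed by three bytes of flags), with the size and type header already stripped.

```python
import struct

def parse_subs_payload(payload: bytes):
    """Parse the body of a 'subs' box into (sample_number, sub-samples) pairs.

    Assumption: payload begins at the FullBox version/flags fields.
    """
    version = payload[0]          # 1 byte version, then 3 bytes of flags
    pos = 4
    (entry_count,) = struct.unpack_from(">I", payload, pos)
    pos += 4
    entries = []
    sample_number = 0             # the first delta is relative to zero
    for _ in range(entry_count):
        sample_delta, subsample_count = struct.unpack_from(">IH", payload, pos)
        pos += 6
        sample_number += sample_delta   # deltas accumulate into sample numbers
        subsamples = []
        for _ in range(subsample_count):
            if version == 1:
                (size,) = struct.unpack_from(">I", payload, pos)
                pos += 4
            else:
                (size,) = struct.unpack_from(">H", payload, pos)
                pos += 2
            priority, discardable, _reserved = struct.unpack_from(">BBI", payload, pos)
            pos += 6
            subsamples.append(
                {"size": size, "priority": priority, "discardable": discardable}
            )
        entries.append((sample_number, subsamples))
    return entries
```

An entry with subsample_count equal to 0 yields an empty list for that sample, which matches the rule that no array follows in that case.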
8.43 Progressive Download Information Box
8.43.1 Definition
Box Type: ‘pdin’
Container: File
Mandatory: No
Quantity: Zero or one
The Progressive download information box aids the progressive download of an ISO file. The box contains
pairs of numbers (to the end of the box) specifying combinations of effective file download bitrate in units of
bytes/sec and a suggested initial playback delay in units of milliseconds.
A receiving party can estimate the download rate it is experiencing, and from that obtain an upper estimate for
a suitable initial delay by linear interpolation between pairs, or by extrapolation from the first or last entry.
It is recommended that the progressive download information box be placed as early as possible in files, for
maximum utility.
8.43.2 Syntax
aligned(8) class ProgressiveDownloadInfoBox
   extends FullBox(‘pdin’, version = 0, 0) {
   for (i=0; ; i++) { // to end of box
      unsigned int(32) rate;
      unsigned int(32) initial_delay;
   }
}
8.43.3 Semantics
rate is a download rate expressed in bytes/second.
initial_delay is the suggested delay to use when playing the file, such that if download continues at
the given rate, all data within the file will arrive in time for its use and playback should not need to stall.
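The interpolation described in 8.43.1 might be sketched as follows. The function names are hypothetical, the payload is assumed to start at the FullBox version/flags fields, the rates in the box are assumed to be distinct, and clamping a negative extrapolated delay to zero is a choice made for this sketch rather than something the text specifies.

```python
import struct

def parse_pdin_payload(payload: bytes):
    """Return the (rate, initial_delay) pairs stored to the end of the box.

    Assumption: payload begins at the FullBox version/flags fields.
    """
    pairs = []
    for pos in range(4, len(payload) - 7, 8):    # 8 bytes per pair
        pairs.append(struct.unpack_from(">II", payload, pos))
    return pairs

def suggested_initial_delay(pairs, measured_rate):
    """Estimate an initial delay in milliseconds for a measured rate in bytes/sec.

    Interpolates linearly between neighbouring pairs and extrapolates from the
    first or last segment outside the covered range.
    """
    pts = sorted(pairs)
    if not pts:
        raise ValueError("no rate/delay pairs available")
    if len(pts) == 1:
        return float(pts[0][1])                  # nothing to interpolate with
    if measured_rate <= pts[0][0]:
        (r0, d0), (r1, d1) = pts[0], pts[1]      # extrapolate below the range
    elif measured_rate >= pts[-1][0]:
        (r0, d0), (r1, d1) = pts[-2], pts[-1]    # extrapolate above the range
    else:
        for (r0, d0), (r1, d1) in zip(pts, pts[1:]):
            if r0 <= measured_rate <= r1:
                break                            # segment containing the rate
    delay = d0 + (d1 - d0) * (measured_rate - r0) / (r1 - r0)
    return max(0.0, delay)                       # sketch choice: never negative
```

A receiver would typically call suggested_initial_delay once it has a stable estimate of its download rate, and treat the result as an upper estimate as described above.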

8.44 Meta-data Support
A common base structure is used to contain general metadata, called the meta box.
8.44.1 The MetadataBox
8.44.1.1 Definition
Box Type: ‘meta’
Container: File, Movie Box (‘moov’), or Track Box (‘trak’)
Mandatory: No
Quantity: Zero or one
A meta box contains descriptive or annotative metadata. The ‘meta’ box is required to contain a ‘hdlr’ box
indicating the structure or format of the ‘meta’ box contents. That metadata is located either within a box within
this box (e.g. an XML box), or is located by the item identified by a primary item box.
All other contained boxes are specific to the format specified by the handler box.
The other boxes defined here may be defined as optional or mandatory for a given format. If they are used,
then they must take the form specified here. These optional boxes include a data-information box, which
documents other files in which metadata values (e.g. pictures) are placed, and an item location box, which
documents where in those files each item is located (e.g. in the common case of multiple pictures stored in the
same file). At most one meta box may occur at each of the file level, movie level, or track level.
If an ItemProtectionBox occurs, then some or all of the meta-data, including possibly the primary resource,
may have been protected and be unreadable unless the protection system is taken into account.
8.44.1.2 Syntax
aligned(8) class MetaBox (handler_type)
   extends FullBox(‘meta’, version = 0, 0) {
   HandlerBox(handler_type) theHandler;
   PrimaryItemBox primary_resource;     // optional
   DataInformationBox file_locations;   // optional
   ItemLocationBox item_locations;      // optional
   ItemProtectionBox protections;       // optional
   ItemInfoBox item_infos;              // optional
   IPMPControlBox IPMP_control;         // optional
   Box other_boxes[];                   // optional
}
8.44.1.3 Semantics
The structure or format of the metadata is declared by the handler.
8.44.2 XML Boxes
8.44.2.1 Definition
Box Type: ‘xml ’ or ‘bxml’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
When the primary data is in XML format and it is desired that the XML be stored directly in the meta-box, one
of these forms may be used. The BinaryXML Box may only be used when there is a single well-defined
binarization of the XML for that defined format as identified by the handler.
Within an XML box the data is in UTF-8 format unless the data starts with a byte-order-mark (BOM), which
indicates that the data is in UTF-16 format.
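The encoding rule above can be sketched as a small decoder. The function name is hypothetical, the payload is assumed to start at the FullBox version/flags fields of the ‘xml ’ box, and stripping a trailing null terminator is an assumption about how the string was written, not something this clause states.

```python
import codecs

def decode_xml_box_payload(payload: bytes) -> str:
    """Decode the XML text carried in an 'xml ' box body.

    Assumption: payload begins at the FullBox version/flags fields.
    """
    data = payload[4:]                    # skip version (1 byte) and flags (3 bytes)
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # The generic utf-16 codec consumes the BOM and picks the byte order.
        text = data.decode("utf-16")
    else:
        text = data.decode("utf-8")
    return text.rstrip("\x00")            # assumed: drop a trailing terminator
```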
8.44.2.2 Syntax
aligned(8) class XMLBox
   extends FullBox(‘xml ’, version = 0, 0) {
   string xml;
}
aligned(8) class BinaryXMLBox
   extends FullBox(‘bxml’, version = 0, 0) {
   unsigned int(8) data[]; // to end of box
}
8.44.3 The Item Location Box
8.44.3.1 Definition
Box Type: ‘iloc’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
The item location box provides a directory of resources in this or other files, by locating their containing file,
their offset within that file, and their length. Placing this in binary format enables common handling of this data,
even by systems which do not understand the particular metadata system (handler) used. For example, a
system might integrate all the externally referenced metadata resources into one file, re-adjusting file offsets
and file references accordingly.
The box starts with three values, specifying the size in bytes of the offset field, length field, and base_offset
field, respectively. These values must be from the set {0, 4, 8}.
Items may be stored fragmented into extents, e.g. to enable interleaving. An extent is a contiguous subset of
the bytes of the resource; the resource is formed by concatenating the extents. If only one extent is used
(extent_count = 1) then either or both of the offset and length may be implied:
• If the offset is not identified (the field has a length of zero), then the beginning of the file (offset 0) is
implied.
• If the length is not specified, or specified as zero, then the entire file length is implied. References into
the same file as this metadata, or items divided into more than one extent, should have an explicit
offset and length, or use a MIME type requiring a different interpretation of the file, to avoid infinite
recursion.
The size of the item is the sum of the extent_lengths.
Note: extents may be interleaved with the chunks defined by the sample
tables of tracks.
The data-reference index may take the value 0, indicating a reference into the same file as this metadata, or
an index into the data-reference table.
Some referenced data may itself use offset/length techniques to address resources within it (e.g. an MP4 file
might be ‘included’ in this way). Normally such offsets are relative to the beginning of the containing file. The
field base_offset provides an additional offset for offset calculations within that contained data. For example,
if an MP4 file is included within a file formatted to this specification, then normally data-offsets within that MP4
section are relative to the beginning of the file; base_offset adds to those offsets.
8.44.3.2 Syntax
aligned(8) class ItemLocationBox extends FullBox(‘iloc’, version = 0, 0) {
   unsigned int(4) offset_size;
   unsigned int(4) length_size;
   unsigned int(4) base_offset_size;
   unsigned int(4) reserved;
   unsigned int(16) item_count;
   for (i=0; i<item_count; i++) {
      unsigned int(16) item_ID;
      unsigned int(16) data_reference_index;
      unsigned int(base_offset_size*8) base_offset;
      unsigned int(16) extent_count;
      for (j=0; j<extent_count; j++) {
         unsigned int(offset_size*8) extent_offset;
         unsigned int(length_size*8) extent_length;
      }
   }
}
8.44.3.3 Semantics
offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the offset field.
length_size is taken from the set {0, 4, 8} and indicates the length in bytes of the length field.
base_offset_size is taken from the set {0, 4, 8} and indicates the length in bytes of the base_offset
field.
item_count counts the number of resources in the following array.
item_ID is an arbitrary integer ‘name’ for this resource which can be used to refer to it (e.g. in a URL).
data_reference_index is either zero (‘this file’) or a 1-based index into the data references in the data
information box.
base_offset provides a base value for offset calculations within the referenced data. If
base_offset_size is 0, base_offset takes the value 0, i.e. it is unused.
extent_count provides the count of the number of extents into which the resource is fragmented; it
must have the value 1 or greater.
extent_offset provides the absolute offset in bytes from the beginning of the containing file, of this
item. If offset_size is 0, extent_offset takes the value 0.
extent_length provides the absolute length in bytes of this metadata item. If length_size is 0,
extent_length takes the value 0. If the value is 0, then the length of the item is the length of the entire referenced
file.
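The variable-width fields of the item location box can be read as in the following sketch. The function and helper names are hypothetical, the returned dictionary layout is chosen only for illustration, and the payload is assumed to start at the FullBox version/flags fields.

```python
import struct

def _read_uint(buf: bytes, pos: int, size: int):
    """Read a big-endian unsigned integer of `size` bytes; size 0 yields 0."""
    return int.from_bytes(buf[pos:pos + size], "big"), pos + size

def parse_iloc_payload(payload: bytes):
    """Parse the body of an 'iloc' box into a list of item dictionaries.

    Assumption: payload begins at the FullBox version/flags fields.
    """
    pos = 4                                   # skip version and flags
    offset_size = payload[pos] >> 4           # high nibble of first byte
    length_size = payload[pos] & 0x0F         # low nibble of first byte
    base_offset_size = payload[pos + 1] >> 4  # low nibble is reserved
    pos += 2
    for size in (offset_size, length_size, base_offset_size):
        if size not in (0, 4, 8):
            raise ValueError("field sizes must be 0, 4 or 8")
    (item_count,) = struct.unpack_from(">H", payload, pos)
    pos += 2
    items = []
    for _ in range(item_count):
        item_id, data_ref_index = struct.unpack_from(">HH", payload, pos)
        pos += 4
        base_offset, pos = _read_uint(payload, pos, base_offset_size)
        (extent_count,) = struct.unpack_from(">H", payload, pos)
        pos += 2
        extents = []
        for _ in range(extent_count):
            offset, pos = _read_uint(payload, pos, offset_size)
            length, pos = _read_uint(payload, pos, length_size)
            extents.append((offset, length))
        items.append({
            "item_ID": item_id,
            "data_reference_index": data_ref_index,
            "base_offset": base_offset,
            "extents": extents,
        })
    return items
```

When every extent carries an explicit length, the item size is the sum of the second element of each extent tuple; a length of 0 has to be resolved against the referenced file as described above.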
8.44.4 Primary Item Box
8.44.4.1 Definition
Box Type: ‘pitm’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
For a given handler, the primary data may be one of the referenced items when it is desired that it be stored
elsewhere, or divided into extents. In general, either this box must occur, or there must be a box within the
meta-box (e.g. an XML box) containing the primary information in the format required by the identified handler.
8.44.4.2 Syntax
aligned(8) class PrimaryItemBox
   extends FullBox(‘pitm’, version = 0, 0) {
   unsigned int(16) item_ID;
}
8.44.4.3 Semantics
item_ID is the identifier of the primary item.
8.44.5 Item Protection Box
8.44.5.1 Definition
Box Type: ‘ipro’
Container: Meta Box (‘meta’)
Mandatory: No
Quantity: Zero or one
The item protection box provides an array of item protection information, for use by the Item Information Box.
8.44.5.2 Syntax
aligned(8) class ItemProtectionBox
extends FullBox(‘ipro’, version = 0, 0) {
unsigned int(16) protection_count;
for (i=1; i<=protection_c
...
