ISO/IEC TR 14496-24:2008
Information technology — Coding of audio-visual objects — Part 24: Audio and systems interaction
ISO/IEC TR 14496-24:2008 describes the desired joint behavior of MPEG-4 Systems (MPEG-4 File Format) and MPEG-4 Audio codecs. It is desired that MPEG-4 Audio encoders and decoders permit finite length signals to be encoded to a file (particularly MPEG-4 files) and decoded again to obtain the identical signal, subject to codec distortions. This will allow the use of audio in systems implementations (particularly MPEG-4 Systems), perhaps with other media such as video, in a deterministic fashion. Most importantly, the decoded signal will have nothing “extra” at the beginning or “missing” at the end.
Technologies de l'information — Codage d'objets audiovisuels — Partie 24: Codage audio et interaction de systèmes
TECHNICAL REPORT
ISO/IEC TR 14496-24
First edition
2008-01-15
Information technology — Coding of
audio-visual objects —
Part 24:
Audio and systems interaction
Technologies de l'information — Codage d'objets audiovisuels —
Partie 24: Codage audio et interaction de systèmes
Reference number
ISO/IEC TR 14496-24:2008(E)
© ISO/IEC 2008
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2008
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
1 Scope
2 Motivating audio composition time stamp handling
3 AAC Encoder/Decoder Behavior
3.1 Example 1: AAC
3.1.1 Overview
3.1.2 Pre-roll
3.1.3 Edit-list
3.1.4 Compressed Information and Decoder behavior
3.2 Example 2: HE-AAC
3.2.1 Overview
4 Streaming Considerations
Annex A (informative) Relevant ISO Base Media File Format Syntax
A.1 Pre-roll syntax
A.2 Edit-list syntax
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
In exceptional circumstances, the joint technical committee may propose the publication of a Technical Report
of one of the following types:
— type 1, when the required support cannot be obtained for the publication of an International Standard,
despite repeated efforts;
— type 2, when the subject is still under technical development or where for any other reason there is the
future but not immediate possibility of an agreement on an International Standard;
— type 3, when the joint technical committee has collected data of a different kind from that which is
normally published as an International Standard (“state of the art”, for example).
Technical Reports of types 1 and 2 are subject to review within three years of publication, to decide whether
they can be transformed into International Standards. Technical Reports of type 3 do not necessarily have to
be reviewed until the data they provide are considered to be no longer valid or useful.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC TR 14496-24, which is a Technical Report of type 3, was prepared by Joint Technical Committee
ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and
hypermedia information.
ISO/IEC TR 14496 consists of the following parts, under the general title Information technology — Coding of
audio-visual objects:
⎯ Part 1: Systems
⎯ Part 2: Visual
⎯ Part 3: Audio
⎯ Part 4: Conformance testing
⎯ Part 5: Reference software
⎯ Part 6: Delivery Multimedia Integration Framework (DMIF)
⎯ Part 7: Optimized reference software for coding of audio-visual objects
⎯ Part 8: Carriage of ISO/IEC 14496 contents over IP networks
⎯ Part 9: Reference hardware description
⎯ Part 10: Advanced Video Coding
⎯ Part 11: Scene description and application engine
⎯ Part 12: ISO base media file format
⎯ Part 13: Intellectual Property Management and Protection (IPMP) extensions
⎯ Part 14: MP4 file format
⎯ Part 15: Advanced Video Coding (AVC) file format
⎯ Part 16: Animation Framework eXtension (AFX)
⎯ Part 17: Streaming text format
⎯ Part 18: Font compression and streaming
⎯ Part 19: Synthesized texture stream
⎯ Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
⎯ Part 21: MPEG-J Graphics Framework eXtensions (GFX)
⎯ Part 22: Open Font Format
⎯ Part 23: Symbolic Music Representation
⎯ Part 24: Audio and systems interaction
TECHNICAL REPORT ISO/IEC TR 14496-24:2008(E)
Information technology — Coding of audio-visual objects —
Part 24:
Audio and systems interaction
1 Scope
This part of ISO/IEC TR 14496 describes the desired joint behavior of MPEG-4 Systems (MPEG-4 File
Format) and MPEG-4 Audio codecs. It is desired that MPEG-4 Audio encoders and decoders permit finite
length signals to be encoded to a file (particularly MPEG-4 files) and decoded again to obtain the identical
signal, subject to codec distortions. This will allow the use of audio in systems implementations (particularly
MPEG-4 Systems), perhaps with other media such as video, in a deterministic fashion. Most importantly, the
decoded signal will have nothing “extra” at the beginning or “missing” at the end.
This permits:
a) an exact ‘round trip’ from raw audio to encoded file back to raw audio (excepting encoding artifacts);
b) predictable synchronization between audio and other media such as video;
c) correct behavior when performing random access as well as when starting at the beginning of a
stream;
d) identical behavior when edits are applied in the raw domain and the encoded domain (again,
excepting encoding artifacts).
It is also required that there be predictable interoperability between encoders (as represented by files) and
decoders. There are two kinds of audio ‘offsets’ (or ‘delay’ in the context of transmission): those that result
from the encoding process, and those that result from the decoding process. This document is primarily
concerned with the latter.
These issues are resolved by the following:
• The handling of composition time stamps for audio composition units is specified. Special care is
taken in the case of compressed data, like HE-AAC coded audio, that can be decoded in a backward
compatible fashion as well as in an enhanced fashion.
• Examples are given that show how finite length signals can be encoded to an MPEG-4 file and
decoded again to obtain the identical signal, excepting codec distortions. Most importantly, the
decoded signal has nothing “extra” at the beginning or “missing” at the end.
2 Motivating audio composition time stamp handling
For compressed data, like HE-AAC coded audio, which can be decoded by different decoder configurations,
special attention is needed. In this case, decoding can be done in a backward-compatible fashion (AAC only)
as well as in an enhanced fashion (AAC+SBR). In order to ensure that timestamps are correct (so that audio
remains synchronized with other media), the following must be considered concerning MPEG-4 Systems and
Audio:
• If compressed data permits both backward-compatible and enhanced decoding, and if the decoder is
operating in a backward-compatible fashion, then the decoder does not have to take any action.
However, if the decoder is operating in an enhanced fashion, such that it is using a post-processor that
inserts some additional delay (e.g., the SBR post-processor in HE-AAC), then it must notify Systems
about the additional time delay incurred relative to the backward-compatible mode. With the delay
thus indicated, Systems can adjust the timestamps of the composition units as needed to
compensate for the additional delay.
• Specifically for HE-AAC (using any of the available signaling mechanisms, i.e., implicit signaling,
backward-compatible explicit signaling, or hierarchical explicit signaling), the original access unit
timestamps apply to backward-compatible AAC decoding, and timestamp adjustment for delay
compensation is needed in the case of AAC+SBR decoding.
Figure 1 shows the composition unit that is generated by an AAC decoder (upper half) and by an HE-AAC
decoder operating SBR in dual-rate mode (lower half) when each is fed with the same access unit of an HE-AAC
bitstream that employs backward-compatible signaling. Note that the composition time stamp associated with
that access unit applies to the n-th sample of the composition unit. For the AAC decoder case, n has the
value 1. For the HE-AAC decoder case, n has the value 962+1 = 963, reflecting the additional algorithmic delay of
962 samples of the SBR tool at the HE-AAC output sampling rate (which is twice the sampling rate of the
backward-compatible AAC output).
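[Figure 1: the AAC composition unit spans samples 1 to 1024, with n = 1; the HE-AAC composition unit spans samples 1 to 2048, with n = 963.]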
Figure 1 — Composition unit (audio waveform segment) generated by AAC decoder and HE-AAC
decoder fed with the same access unit (bitstream frame)
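The relationship between the composition time stamp and the n-th sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the report: the function names are hypothetical, the 1024-sample frame length is taken from Figure 1, dual-rate SBR is assumed (output rate twice the core rate), and timestamps are held as exact fractions of a second.

```python
from fractions import Fraction

AAC_FRAME_LEN = 1024      # samples per composition unit for AAC-only decoding (Figure 1)
SBR_DELAY_SAMPLES = 962   # additional SBR delay at the HE-AAC output rate (from the text)


def cts_sample_index(sbr_active: bool) -> int:
    """1-based index n of the composition-unit sample that the CTS applies to."""
    return SBR_DELAY_SAMPLES + 1 if sbr_active else 1


def composition_unit_length(sbr_active: bool) -> int:
    """Samples per composition unit; dual-rate SBR doubles the AAC frame length."""
    return 2 * AAC_FRAME_LEN if sbr_active else AAC_FRAME_LEN


def first_sample_time(cts: Fraction, core_rate: int, sbr_active: bool) -> Fraction:
    """Time (seconds) of sample 1 of the composition unit.

    Because the CTS belongs to sample n, sample 1 sits (n - 1) output
    samples earlier. Hypothetical helper, assuming dual-rate SBR.
    """
    out_rate = 2 * core_rate if sbr_active else core_rate
    return cts - Fraction(cts_sample_index(sbr_active) - 1, out_rate)
```

With an assumed core rate of 24 kHz (48 kHz HE-AAC output), the shift in the AAC+SBR case is 962/48000 s, roughly 20 ms; in the AAC-only case the first sample coincides with the composition time stamp.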
The timestamp handling depends on the technology used, and is independent of the profile signaled for either
bitstream or audio decoder. In particular, if the profile is changed between one that permits backward
compatible decoding and one that requires enhanced decoding, the timestamps and other structures (e.g. edit
lists and pre-roll) are not adjusted.
3 AAC Encoder/Decoder Behavior
3.1 Example 1: AAC
3.1.1 Overview
Figure 2 shows the AAC encoder and decoder behavior with respect to the association of encoder input
blocks, access units (AU), timestamps and decoder output blocks or composition units (CU). Note that the
input signal is only two and a fraction blocks long (as indicated by the oscillating waveform). The encoder
essentially extends the waveform at both ends to facilitate encoding of
...