Information technology - Coding of audio-visual objects - Part 3: Audio - Amendment 2: Parametric coding for high-quality audio

Technologies de l'information — Codage des objets audiovisuels — Partie 3: Codage audio — Amendement 2: Codage paramétrique pour le codage audio de haute qualité

General Information

Status: Withdrawn
Publication Date: 29-Jul-2004
Withdrawal Date: 29-Jul-2004
Current Stage: 9599 - Withdrawal of International Standard
Start Date: 14-Mar-2006
Completion Date: 30-Oct-2025

Standard
ISO/IEC 14496-3:2001/Amd 2:2004 - Parametric coding for high-quality audio
English language
116 pages

Frequently Asked Questions

ISO/IEC 14496-3:2001/Amd 2:2004 is an amendment published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology - Coding of audio-visual objects - Part 3: Audio - Amendment 2: Parametric coding for high-quality audio".

ISO/IEC 14496-3:2001/Amd 2:2004 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 14496-3:2001/Amd 2:2004 is related to the following standards: ISO/IEC 14496-3:2001/Amd 2:2004/Cor 1:2005 and ISO/IEC 14496-3:2005. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.


Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 14496-3
Second edition 2001-12-15
AMENDMENT 2 2004-08-01
Information technology — Coding of
audio-visual objects —
Part 3:
Audio
AMENDMENT 2: Parametric coding
for high-quality audio
Technologies de l'information — Codage des objets audiovisuels —
Partie 3: Codage audio
AMENDEMENT 2: Codage paramétrique pour le codage audio de
haute qualité
Reference number: ISO/IEC 14496-3:2001/Amd.2:2004(E)
© ISO/IEC 2004

©  ISO/IEC 2004
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 2 to ISO/IEC 14496-3:2001 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.
Introduction
This document specifies the second Amendment to ISO/IEC 14496-3:2001. The document specifies the
normative syntax of the 'Parametric Coding for High Quality Audio' tool SSC and the decoding process. An
informative encoder description is given as well.


Information technology — Coding of audio-visual objects —
Part 3:
Audio
AMENDMENT 2: Parametric coding for high-quality audio
In ISO/IEC 14496-3:2001, Introduction, add:
MPEG-4 SSC (SinuSoidal Coding) is a parametric coding tool that is capable of full-bandwidth, high-quality
audio coding. The coding tool dissects a monaural or stereo audio signal into a number of different objects,
each of which can be parameterized efficiently and encoded at a low bit-rate. These objects are transients,
representing dynamic changes in the temporal domain; sinusoids, representing deterministic components; and
noise, representing components that do not have a clear temporal or spectral localisation. A fourth object,
which is only relevant for stereo input signals, captures the stereo image. As the signal is represented in a
parametric domain, independent, high-quality pitch and tempo scaling are possible at low computational cost.

Amendment subpart 1
In Part 3: Audio, Subpart 1, in subclause 1.3 Terms and Definitions, add:
270. SSC: SinuSoidal Coding.
and increase the index-number of subsequent entries.

In Part 3: Audio, Subpart 1, in subclause 1.5.1.1 Audio object type definition, replace table 1.1 with the table
below:
Table 1.1 – Audio object definition

Audio Object Type  Tools/Modules  Remark  Object Type ID
Null                  0
AAC main X X X X X X XXX  X        2) 1
AAC LC X X X X X X XX  X         2
AAC SSR X X X  X X X X XX  X         3
AAC LTP X X X X X X X XX  X        2) 4
SBR                X 5
AAC Scalable X X X X X X  XXXXXX        6) 6
TwinVQ  X X X X X   X  X        7
CELP            X      8
HVXC             X     9
(Reserved)                  10
(Reserved)                  11
TTSI                X  12
Main synthetic              X X X   3) 13
Wavetable synthesis              X X   4) 14
General MIDI               X   15
Algorithmic Synthesis and Audio FX              X    16
ER AAC LC X X X X X  XX  X XXX      17
(Reserved)                  18
ER AAC LTP X X X X X X  XX  X XXX      5) 19
ER AAC scalable X X X X X  XXXXXX XXX      6) 20
ER TwinVQ X X X X   X  X XX      21
ER BSAC X X X X X  XX   X XX      22
ER AAC LD  X X X X X  XX  X XXX      23
ER CELP           XXXX     24
ER HVXC           XX X X     25
ER HILN           XX    X  26
ER Parametric           XX X X  X  27
SSC                 X 28
(Reserved)                  29
(Reserved)                  30
(Reserved)                  31

Tools/Modules columns of Table 1.1, from left to right: gain control; block switching; window shapes - standard; window shapes - AAC LD; filterbank - standard; filterbank - SSR; TNS; LTP; intensity; coupling; MPEG-2 prediction; PNS; MS; SIAQ; FSS; upsampling filter tool; quantisation&coding - AAC; quantisation&coding - TwinVQ; quantisation&coding - BSAC; AAC ER Tools; ER payload syntax; EP Tool 1); CELP; Silence Compression; HVXC; HVXC 4kbs VR; SA tools; SASBF; MIDI; HILN; TTSI; SBR; SSC. The last two columns are Remark and Object Type ID.
In Part 3: Audio, Subpart 1, replace Table 1.2 (Audio Profiles definition) with the following table:
Table 1.2 – Audio Profiles definition
Audio Object Type  Main Audio Profile  Scalable Audio Profile  Speech Audio Profile  Synthetic Audio Profile  High Quality Audio Profile  Low Delay Audio Profile  Natural Audio Profile  Mobile Audio Internetworking Profile  AAC Profile  High Efficiency AAC Profile  Object Type ID
Null      0
AAC main X   X  1
AAC LC X X  X X X X 2
AAC SSR X   X  3
AAC LTP X X  X X  4
SBR     X 5
AAC Scalable X X  X X  6
TwinVQ X X   X  7
CELP X X X X X X  8
HVXC X X X  X X  9
(reserved)      10
(reserved)      11
TTSI X X X X X X  12
Main synthetic X  X    13
Wavetable synthesis      14
General MIDI      15
Algorithmic Synthesis and Audio FX      16
ER AAC LC   X X X  17
(reserved)      18
ER AAC LTP   X X  19
ER AAC Scalable   X X X  20
ER TwinVQ    X X  21
ER BSAC    X X  22
ER AAC LD   X X X  23
ER CELP   X X X  24
ER HVXC   X X  25
ER HILN    X  26
ER Parametric    X  27
SSC      28
(reserved)      29
(reserved)      30
(reserved)      31
In Part 3: Audio, Subpart 1, in subclause 1.6.2.1 AudioSpecificConfig, replace table 1.8 with the table below:
Table 1.8 – Syntax of AudioSpecificConfig()
Syntax No. of bits Mnemonic
AudioSpecificConfig ()
{
    audioObjectType; 5 uimsbf
    samplingFrequencyIndex; 4 uimsbf
    if ( samplingFrequencyIndex==0xf )
        samplingFrequency; 24 uimsbf
    channelConfiguration; 4 uimsbf

    sbrPresentFlag = -1;
    if ( audioObjectType == 5 ) {
        extensionAudioObjectType = audioObjectType;
        sbrPresentFlag = 1;
        extensionSamplingFrequencyIndex; 4 uimsbf
        if ( extensionSamplingFrequencyIndex==0xf )
            extensionSamplingFrequency; 24 uimsbf
        audioObjectType; 5 uimsbf
    }
    else {
        extensionAudioObjectType = 0;
    }
    if ( audioObjectType == 1 || audioObjectType == 2 ||
         audioObjectType == 3 || audioObjectType == 4 ||
         audioObjectType == 6 || audioObjectType == 7 )
        GASpecificConfig();
    if ( audioObjectType == 8 )
        CelpSpecificConfig();
    if ( audioObjectType == 9 )
        HvxcSpecificConfig();
    if ( audioObjectType == 12 )
        TTSSpecificConfig();
    if ( audioObjectType == 13 || audioObjectType == 14 ||
         audioObjectType == 15 || audioObjectType==16)
        StructuredAudioSpecificConfig();

    if ( audioObjectType == 17 || audioObjectType == 19 ||
         audioObjectType == 20 || audioObjectType == 21 ||
         audioObjectType == 22 || audioObjectType == 23 )
        GASpecificConfig();
    if ( audioObjectType == 24)
        ErrorResilientCelpSpecificConfig();
    if ( audioObjectType == 25)
        ErrorResilientHvxcSpecificConfig();
    if ( audioObjectType == 26 || audioObjectType == 27)
        ParametricSpecificConfig();
    if ( audioObjectType == 17 || audioObjectType == 19 ||
         audioObjectType == 20 || audioObjectType == 21 ||
         audioObjectType == 22 || audioObjectType == 23 ||
         audioObjectType == 24 || audioObjectType == 25 ||
         audioObjectType == 26 || audioObjectType == 27 ) {
        epConfig; 2 uimsbf
        if ( epConfig == 2 || epConfig == 3 ) {
            ErrorProtectionSpecificConfig();
        }
        if ( epConfig == 3 ) {
            directMapping; 1 uimsbf
            if ( ! directMapping ) {
                /* tbd */
            }
        }
    }
    if ( audioObjectType == 28)
        SSCSpecificConfig();
    if ( extensionAudioObjectType != 5 &&
         bits_to_decode() >= 16 ) {
        syncExtensionType; 11 bslbf
        if (syncExtensionType == 0x2b7) {
            extensionAudioObjectType; 5 uimsbf
            if ( extensionAudioObjectType == 5 ) {
                sbrPresentFlag; 1 uimsbf
                if (sbrPresentFlag == 1) {
                    extensionSamplingFrequencyIndex; 4 uimsbf
                    if ( extensionSamplingFrequencyIndex == 0xf )
                        extensionSamplingFrequency; 24 uimsbf
                }
            }
        }
    }
}
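The leading fields of Table 1.8 can be read with a few lines of code. The following is a minimal Python sketch, not a conformant parser: the `BitReader` class and the `parse_asc_prefix` function are hypothetical helpers introduced here for illustration, and the sketch deliberately ignores the SBR branch (audioObjectType == 5) and everything after channelConfiguration.

```python
class BitReader:
    """Reads unsigned integers, most significant bit first (uimsbf)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position in the stream

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_asc_prefix(data: bytes) -> dict:
    """Parse only the first fields of AudioSpecificConfig() (Table 1.8)."""
    r = BitReader(data)
    cfg = {}
    cfg["audioObjectType"] = r.read(5)
    cfg["samplingFrequencyIndex"] = r.read(4)
    if cfg["samplingFrequencyIndex"] == 0xF:
        # Escape value: the sampling frequency follows explicitly.
        cfg["samplingFrequency"] = r.read(24)
    cfg["channelConfiguration"] = r.read(4)
    # Object type 28 means SSCSpecificConfig() comes next in the stream.
    cfg["is_ssc"] = cfg["audioObjectType"] == 28
    cfg["bits_consumed"] = r.pos
    return cfg


# 11100 (type 28), 0100 (index 4), 0010 (configuration 2), then padding.
example = parse_asc_prefix(bytes([0xE2, 0x10]))
```

In this example the first 13 bits are consumed, and `is_ssc` tells the caller that SSCSpecificConfig() should be parsed from the remaining bits.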
In Part 3: Audio, Subpart 1, in subclause 1.6.2.2.1 Overview, replace table 1.9 by the following table:
Table 1.9 – Audio Object Types
Audio Object Type  Object Type ID  Definition of elementary stream payloads and detailed syntax  Mapping of audio payloads to access units and elementary streams
AAC MAIN 1 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.2
AAC LC 2 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.2
AAC SSR 3 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.2
AAC LTP 4 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.2
SBR 5 ISO/IEC 14496-3 subpart 4
AAC scalable 6 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.3
TwinVQ 7 ISO/IEC 14496-3 subpart 4
CELP 8 ISO/IEC 14496-3 subpart 3
HVXC 9 ISO/IEC 14496-3 subpart 2
TTSI 12 ISO/IEC 14496-3 subpart 6
Main synthetic 13 ISO/IEC 14496-3 subpart 5
Wavetable synthesis 14 ISO/IEC 14496-3 subpart 5
General MIDI 15 ISO/IEC 14496-3 subpart 5
Algorithmic Synthesis and Audio FX 16 ISO/IEC 14496-3 subpart 5
ER AAC LC 17 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.4
ER AAC LTP 19 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.4
ER AAC scalable 20 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.4
ER Twin VQ 21 ISO/IEC 14496-3 subpart 4
ER BSAC 22 ISO/IEC 14496-3 subpart 4
ER AAC LD 23 ISO/IEC 14496-3 subpart 4 see subclause 1.6.2.2.2.1.4
ER CELP 24 ISO/IEC 14496-3 subpart 3
ER HVXC 25 ISO/IEC 14496-3 subpart 2
ER HILN 26 ISO/IEC 14496-3 subpart 7
ER Parametric 27 ISO/IEC 14496-3 subpart 2 and 7
SSC 28 ISO/IEC 14496-3 subpart 8

Create Part 3: Audio, Subpart 8:
Subpart 8: Technical description of parametric coding for high
quality audio
8.1 Scope
This part of ISO/IEC 14496 describes the MPEG-4 audio parametric coding scheme for compression of high
quality audio. The short name is SSC (SinuSoidal Coding). At bit-rates around 24 kbit/s stereo and at a
sampling rate of 44.1 kHz, the SSC coding scheme offers a quality that is interesting for a number of
applications.
SSC employs four different tools that together parameterize an audio signal. These tools consist of transient
modelling, sinusoidal modelling, noise modelling and stereo image modelling. One of the distinctive features
of SSC is that it provides decoder support for independent tempo and pitch scaling at hardly any additional
complexity.
Transient tool
The transient tool captures the highly dynamic events of the audio input signal. These events are efficiently
modelled by means of a limited number of sinusoids that are shaped by means of an envelope.
Sinusoidal tool
The sinusoidal tool captures the deterministic events of the audio input signal. The slowly varying nature of
sinusoidal components for typical audio signals is exploited by linking sinusoids over consecutive frames. By
means of differential coding, the frequency, amplitude and phase parameters can be efficiently represented.
Noise tool
The noise tool captures the stochastic or non-deterministic events of the audio input signal. In the decoder, a
white noise generator is used as excitation. A temporal and spectral envelope is applied to control the
temporal and spectral properties of the noise in the audio signal.
Parametric stereo coding tool
The parametric stereo coding tool is able to capture the stereo image of the audio input signal into a limited
number of parameters, requiring only a small overhead ranging from a few kbit/s for medium quality, up to
about 9 kbit/s for higher quality. Together with a monaural downmix of the stereo input signal generated by the
parametric stereo coding tool, the parametric stereo decoding tool is able to regenerate the stereo signal. It is
a generic tool that in principle can operate in combination with any monaural coder. In Annex A of this
document a normative description of the combination of HE-AAC with the parametric stereo coding tool is
provided. SSC can also operate in dual mono mode. In that case the parametric stereo coding tool is not
employed. The parametric stereo tool is intended for low bit-rates.
8.2 Terms and definitions
8.2.1
Frame
Basic unit that can be decoded on its own (file header information is required for general decoder settings).
8.2.2
Laguerre filter
Filter structure used in the noise analysis and synthesis.
8.2.3
Audio frame
Contains all data to decode an SSC-coded frame as a stand-alone unit (file header information is required for
general decoder settings). For audio frames with refresh_sinusoids==%1 and refresh_noise==%1 the
complete frame can always be reconstructed; otherwise it is possible in the case of random access that parts
of the signal cannot be reconstructed (e.g. sinusoidal continuations, noise).
8.2.4
Sub-frame
Fine granularity within a frame.
8.2.5
f_s
The sampling frequency in Hertz.
8.2.6
Segment
An interval of samples that can be synthesized on the basis of the parameters that correspond to a sub-frame.
The segment size is 2*S (see Table 8.11).
8.2.7
Window
A function that is used to weight synthesized samples within a segment such that a valid synthesis is obtained.
8.2.8
LSF
Line Spectral Frequency.
8.2.9
Overlap and add
An additive method of combining overlapping intervals during signal synthesis.
8.2.10
Linking process
A method to keep track of sinusoidal components over time.
8.2.11
Birth
The first component of a sinusoidal track.
8.2.12
Continuation
A sinusoidal track component that is not at the start or the end of a track.
8.2.13
Death
The last component of a sinusoidal track.
8.2.14
SMR
Signal-to-masking ratio.
8.2.15
Partial
Sinusoid of a limited duration.
8.2.16
IID
Inter-channel Intensity Differences.
8.2.17
IPD
Inter-channel Phase Differences.
8.2.18
OPD
Overall Phase Differences.
8.2.19
ICC
Inter-channel Coherence.
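The terms Segment, Window and Overlap and add fit together as follows: each sub-frame yields a segment of 2*S samples, the segment is weighted by a window, and consecutive segments are summed with a hop of S samples. The Python sketch below illustrates that arrangement only. The periodic Hann window is an assumption made for the example (this excerpt does not specify the window), and `hann_window` and `overlap_add` are hypothetical names.

```python
import math


def hann_window(length: int) -> list:
    """Periodic Hann window; two copies shifted by length/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def overlap_add(segments: list, S: int) -> list:
    """Combine windowed segments of 2*S samples with a hop of S samples."""
    L = 2 * S
    window = hann_window(L)
    out = [0.0] * (S * (len(segments) + 1))
    for k, segment in enumerate(segments):
        if len(segment) != L:
            raise ValueError("each segment must contain 2*S samples")
        for n in range(L):
            out[k * S + n] += window[n] * segment[n]
    return out


# Three constant segments: the fully overlapped region reconstructs to 1.0.
S = 4
signal = overlap_add([[1.0] * (2 * S)] * 3, S)
```

With this window choice, every output sample that is covered by two segments receives weights that add up to one, which is one way to obtain the "valid synthesis" the Window definition asks for.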
8.3 Symbols and abbreviations
8.3.1 Arithmetic operators
x Round x towards minus infinity
x Round x towards plus infinity.
 
x
mod Modulus operator: mod(x, y)= x− y . Defined only for positive values of x and y.
 
y
 

−t α−1
Γ(α) Gamma distribution function, defined as Γ(α )= e ⋅ t dt .
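As a quick check of the operators above, the following Python sketch evaluates them with the standard `math` module. The function name `ssc_mod` is hypothetical; it simply follows the definition of mod given in this subclause, including the restriction to positive arguments.

```python
import math


def ssc_mod(x: float, y: float) -> float:
    """mod(x, y) = x - y * floor(x / y), defined for positive x and y only."""
    if x <= 0 or y <= 0:
        raise ValueError("mod is defined only for positive x and y")
    return x - y * math.floor(x / y)


floor_example = math.floor(-1.5)   # rounds towards minus infinity
ceil_example = math.ceil(-1.5)     # rounds towards plus infinity
mod_example = ssc_mod(7, 4)
gamma_example = math.gamma(5.0)    # equals 4! for this integer argument
```

The syntax tables use the same operator in conditions such as mod(sf+1,4)==0, which holds for sf = 3, 7, 11 and so on.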

8.3.2 Relation operators
x?y:z If x is true then y else z.
8.3.3 Mnemonics
The following mnemonics are defined to describe the different data types used in the coded bit-stream.
uimsbf Unsigned integer, most significant bit first.
simsbf Signed integer, most significant bit first.
bslbf Bitstream left bit first.
8.3.4 Ranges
[0, 10] A number in the range of 0 up to and including 10.
[0, 10> A number in the range of 0 up to but excluding 10.
8.3.5 Number notation
%X Binary number representation (e.g. %01111100).
$X Hexadecimal number representation (e.g. $7C).
X Numbers with no prefix use decimal representation (e.g. 124).
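The three notations can be converted mechanically. The sketch below is an illustration only; `parse_ssc_number` is a hypothetical helper name, and it accepts the three forms listed above.

```python
def parse_ssc_number(text: str) -> int:
    """Convert %binary, $hexadecimal or plain decimal notation to an int."""
    if text.startswith("%"):
        return int(text[1:], 2)
    if text.startswith("$"):
        return int(text[1:], 16)
    return int(text, 10)


# The three examples from this subclause denote the same value.
values = [parse_ssc_number(s) for s in ("%01111100", "$7C", "124")]
```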
8.3.6 Definitions
S  Number of samples in a sub-frame (see Table 8.11).
L  Number of samples in a segment; L = 2*S.
numQMFSlots Number of QMF subband samples per ps_data() element. For SSC, this parameter is fixed
to 24.
8.4 Payloads for the audio object type SSC
8.4.1 Decoder configuration (SSCSpecificConfig)
Table 8.1 – Syntax of SSCSpecificConfig()
Syntax Num. bits Mnemonic
SSCSpecificConfig ( channelConfiguration )
{
    decoder_level 2 uimsbf
    update_rate 4 uimsbf
    synthesis_method 2 uimsbf
    if (channelConfiguration != 1)
    {
        mode_ext 2 uimsbf
        if ((channelConfiguration == 2) && (mode_ext == 1))
        {
            reserved 2 uimsbf
        }
    }
}
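Table 8.1 translates almost line by line into a parser. The following Python sketch assumes a hypothetical `BitReader` helper and a hypothetical function name `parse_ssc_specific_config`; the only sub-frame size it knows is the one non-reserved entry of Table 8.11 (update_rate %0100, S = 384). It performs no validation of reserved values.

```python
class BitReader:
    """Reads unsigned integers, most significant bit first (uimsbf)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position in the stream

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


UPDATE_RATE_TO_S = {0b0100: 384}  # Table 8.11; every other code is reserved


def parse_ssc_specific_config(data: bytes, channel_configuration: int) -> dict:
    """Follow the field order of SSCSpecificConfig() in Table 8.1."""
    r = BitReader(data)
    cfg = {}
    cfg["decoder_level"] = r.read(2)
    cfg["update_rate"] = r.read(4)
    cfg["synthesis_method"] = r.read(2)
    if channel_configuration != 1:
        cfg["mode_ext"] = r.read(2)
        if channel_configuration == 2 and cfg["mode_ext"] == 1:
            cfg["reserved"] = r.read(2)
    cfg["S"] = UPDATE_RATE_TO_S.get(cfg["update_rate"])  # None if reserved
    return cfg


# %01 %0100 %00 %01 %00: Medium level, S = 384, parametric stereo.
stereo_cfg = parse_ssc_specific_config(bytes([0x50, 0x40]), channel_configuration=2)
```

For a mono stream (channelConfiguration == 1) the same function stops after synthesis_method, so the configuration occupies a single byte.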
8.4.2 SSC Bitstream Payload
Table 8.2 – Syntax of ssc_audio_frame()
Syntax Num. bits Mnemonic
ssc_audio_frame ()
{
    ssc_audio_frame_header()
    ssc_audio_frame_data()
}
Table 8.3 – Syntax of ssc_audio_frame_header()
Syntax Num. bits Mnemonic
ssc_audio_frame_header ()
{
    refresh_sinusoids 1 uimsbf
    refresh_sinusoids_next_frame 1 uimsbf
    refresh_noise 1 uimsbf
    for (ch = 0; ch < nrof_channels; ch++)
    {
        s_nrof_continuations[0][ch] Note 1 uimsbf
    }
    n_nrof_den 5 uimsbf
    n_nrof_lsf Note 1 uimsbf
    freq_granularity 2 uimsbf
    amp_granularity 2 uimsbf
    phase_jitter_present 1 uimsbf
    if (phase_jitter_present == 1)
    {
        phase_jitter_percentage 2 uimsbf
        phase_jitter_band 2 uimsbf
    }
}
Note 1: See description of s_nrof_continuations and n_nrof_lsf in section 8.5.2.
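The two fields marked Note 1 have a width that depends on decoder_level. As a sketch of how a reader might handle this, the Python example below takes the widths as parameters and defaults them to the values that Table 8.10 lists for the Medium level (6 bits for s_nrof_continuations, 4 bits for n_nrof_lsf). `BitReader` and `parse_frame_header` are hypothetical names, and nrof_channels is supplied by the caller.

```python
class BitReader:
    """Reads unsigned integers, most significant bit first (uimsbf)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current bit position in the stream

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_frame_header(data: bytes, nrof_channels: int,
                       bits_cont: int = 6, bits_lsf: int = 4) -> dict:
    """Follow the field order of ssc_audio_frame_header() in Table 8.3."""
    r = BitReader(data)
    h = {}
    h["refresh_sinusoids"] = r.read(1)
    h["refresh_sinusoids_next_frame"] = r.read(1)
    h["refresh_noise"] = r.read(1)
    # One continuation count per channel, for sub-frame 0 only.
    h["s_nrof_continuations"] = [r.read(bits_cont) for _ in range(nrof_channels)]
    h["n_nrof_den"] = r.read(5)
    h["n_nrof_lsf"] = r.read(bits_lsf)
    h["freq_granularity"] = r.read(2)
    h["amp_granularity"] = r.read(2)
    h["phase_jitter_present"] = r.read(1)
    if h["phase_jitter_present"] == 1:
        h["phase_jitter_percentage"] = r.read(2)
        h["phase_jitter_band"] = r.read(2)
    h["bits_consumed"] = r.pos
    return h
```

Under these assumptions a mono header without phase jitter occupies 23 bits, and 27 bits when the phase jitter fields are present.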

Table 8.4 – Syntax of ssc_audio_frame_data()
Syntax Num. bits Mnemonic
ssc_audio_frame_data()
{
    for (sf = 0; sf < nrof_subframes; sf++)
    {
        for (ch = 0; ch < nrof_channels; ch++)
        {
            ssc_mono_subframe(sf,ch)
            if ((channelConfiguration == 2) && (mode_ext == 1)
                && (mod(sf+1,4)==0))
            {
                ps_data()
            }
        }
    }
}
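The loop structure of Table 8.4 implies that a ps_data() element follows every fourth sub-frame when parametric stereo is signalled. The Python sketch below lists the resulting element order. The function name `frame_element_order` is hypothetical, and the values chosen for nrof_subframes and nrof_channels in the example are illustrative assumptions, since this excerpt does not fix them.

```python
def frame_element_order(nrof_subframes: int, nrof_channels: int,
                        channel_configuration: int, mode_ext: int) -> list:
    """List the syntactic elements of ssc_audio_frame_data() in stream order."""
    parametric_stereo = channel_configuration == 2 and mode_ext == 1
    order = []
    for sf in range(nrof_subframes):
        for ch in range(nrof_channels):
            order.append(("ssc_mono_subframe", sf, ch))
            # mod(sf+1, 4) == 0 holds for sf = 3, 7, 11, ...
            if parametric_stereo and (sf + 1) % 4 == 0:
                order.append(("ps_data", sf, ch))
    return order


# Illustrative values only: 8 sub-frames and one coded channel.
elements = frame_element_order(8, 1, channel_configuration=2, mode_ext=1)
ps_positions = [sf for name, sf, ch in elements if name == "ps_data"]
```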
Table 8.5 – Syntax of ssc_mono_subframe()
Syntax Num. bits Mnemonic
ssc_mono_subframe (sf,ch)
{
    subframe_transients(sf, ch)
    subframe_sinusoids(sf, ch)
    subframe_noise(sf, ch)
}
Table 8.6 – Syntax of subframe_transients()
Syntax Num. bits Mnemonic
subframe_transients (sf, ch)
{
    t_transient_present[sf][ch] 1 uimsbf
    if (t_transient_present[sf][ch] == 1)
    {
        t_loc[sf][ch] Note 1 uimsbf
        if (t_type[sf][ch]==1) 2 uimsbf
        {
            t_b_par[sf][ch] 3 uimsbf
            t_chi_par[sf][ch] 3 uimsbf
            t_nrof_sin[sf][ch] 3 uimsbf
            t_nrof_sin[sf][ch]++
            for (i = 0; i < t_nrof_sin[sf][ch]; i++)
            {
                t_freq[sf][ch][i] 9 uimsbf
                t_amp[sf][ch][i] 5 uimsbf
                t_phi[sf][ch][i] 5 simsbf
            }
        }
    }
}
Note 1: See description of t_loc in section 8.5.2.

Table 8.7 – Syntax of subframe_sinusoids()
Syntax Num. bits Mnemonic
subframe_sinusoids(sf, ch)
{
    n = 0; p=0; q=0;
    /* Continuations */
    if (sf > 0)
    {
        noc = 0;
        while (tmp_cont[ch][noc] > 0) { noc++;}
        s_nrof_continuations[sf][ch] = noc;
    }
    if ((refresh_sinusoids == 1) && (sf == 0))
    {
        for (i = 0; i < s_nrof_continuations[sf][ch]; i++, n++)
        {
            s_cont[sf][ch][n] = ssc_huff_dec(huff_scont,bs_codeword); 2..5 bslbf
            s_freq_coarse[sf][ch][n] = ssc_huff_dec(huff_sfreqc,bs_codeword); 7..25 bslbf
            s_freq_fine[sf][ch][n] 0..3 simsbf
            s_amp_coarse[sf][ch][n] = ssc_huff_dec(huff_sampca,bs_codeword); 3..16 bslbf
            s_amp_fine[sf][ch][n] 0..3 simsbf
            s_phi[sf][ch][n] 5 simsbf
            if (s_cont[sf][ch][n] > 0)
            {
                s_adpcm_grid[sf][ch][n] = ssc_huff_dec(huff_sgrid,bs_codeword); 3..7 bslbf
                s_delta_cont_freq_pha[sf+1][ch][p] 2 uimsbf
                p++;
            }
            if (s_cont[sf][ch][n] > 1)
            {
                s_delta_cont_freq_pha[sf+2][ch][q] 2 uimsbf
                q++;
            }
        }
    }
    else {
        for (i = 0; i < s_nrof_continuations[sf][ch]; i++, n++)
        {
            if (sf == 0)
            {
                s_cont[sf][ch][n] = ssc_huff_dec(huff_scont,bs_codeword); 2..5 bslbf
            }
            else {
                s_cont[sf][ch][n] = tmp_cont[ch][n] - 1;
            }
            if (s_cont[sf][ch][n] > 0)
            {
                p++;
            }
            if (s_cont[sf][ch][n] > 1)
            {
                if ((refresh_sinusoids_next_frame == 0) || (nrof_subframes-sf > 2))
                {
                    s_delta_cont_freq_pha[sf+2][ch][q] 2 uimsbf
                }
                q++;
            }
            s_delta_cont_amp[sf][ch][n] = ssc_huff_dec(huff_sampcr[amp_granularity],bs_codeword); 1..15 bslbf
        }
    }
    /* Births */
    s_nrof_births[sf][ch] = ssc_huff_dec(huff_nrofbirths,bs_codeword); 3..15 bslbf
    if (s_nrof_births[sf][ch] > 0)
    {
        s_cont[sf][ch][n] = ssc_huff_dec(huff_scont,bs_codeword); 2..5 bslbf
        s_freq_coarse[sf][ch][n] = ssc_huff_dec(huff_sfreqba,bs_codeword); 7..21 bslbf
        s_freq_fine[sf][ch][n] 0..3 simsbf
        s_amp_coarse[sf][ch][n] = ssc_huff_dec(huff_sampba,bs_codeword); 3..15 bslbf
        s_amp_fine[sf][ch][n] 0..3 simsbf
        s_phi[sf][ch][n] 5 simsbf
        if (s_cont[sf][ch][n] > 0)
        {
            if ((refresh_sinusoids_next_frame == 0) || (nrof_subframes-sf > 1))
            {
                s_delta_cont_freq_pha[sf+1][ch][p] 2 uimsbf
            }
            p++;
        }
        if (s_cont[sf][ch][n] > 1)
        {
            if ((refresh_sinusoids_next_frame == 0) || (nrof_subframes-sf > 2))
            {
                s_delta_cont_freq_pha[sf+2][ch][q] 2 uimsbf
            }
            q++;
        }
        n++;
        for (i = 1; i < s_nrof_births[sf][ch]; i++, n++)
        {
            s_cont[sf][ch][n] = ssc_huff_dec(huff_scont,bs_codeword); 2..5 bslbf
            s_delta_birth_freq_coarse[sf][ch][n] = ssc_huff_dec(huff_sfreqbr,bs_codeword); 5..23 bslbf
            s_delta_birth_freq_fine[sf][ch][n] 0..3 simsbf
            s_delta_birth_amp_coarse[sf][ch][n] = ssc_huff_dec(huff_sampbr,bs_codeword); 2..21 bslbf
            s_delta_birth_amp_fine[sf][ch][n] 0..3 simsbf
            s_phi[sf][ch][n] 5 simsbf
            if (s_cont[sf][ch][n] > 0)
            {
                if ((refresh_sinusoids_next_frame == 0) || (nrof_subframes-sf > 1))
                {
                    s_delta_cont_freq_pha[sf+1][ch][p] 2 uimsbf
                }
                p++;
            }
            if (s_cont[sf][ch][n] > 1)
            {
                if ((refresh_sinusoids_next_frame == 0) || (nrof_subframes-sf > 2))
                {
                    s_delta_cont_freq_pha[sf+2][ch][q] 2 uimsbf
                }
                q++;
            }
        }
    }
    /* Keep track of sinusoids that continue in next sub-frame(s) */
    for (i = 0, k = 0; i < n; i++)
    {
        if (s_cont[sf][ch][i] > 0)
        {
            tmp_cont[ch][k] = s_cont[sf][ch][i];
            k++;
        }
    }
}
Note: The variables p, q are used as position indices for subframe+1 and subframe+2 respectively.
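The continuation bookkeeping in Table 8.7 (the noc count, the tmp_cont[ch][n] - 1 update and the final "keep track" loop) can be followed in isolation. The Python sketch below models only that bookkeeping for one channel; it replaces the zero-terminated tmp_cont array with a Python list and takes the s_cont values of births as given instead of Huffman decoding them. `step_subframe` is a hypothetical name.

```python
def step_subframe(tmp_cont: list, birth_s_cont: list) -> tuple:
    """Model the s_cont bookkeeping of one sub-frame for one channel.

    tmp_cont holds the positive s_cont values carried over from the
    previous sub-frame; birth_s_cont holds the s_cont values of the
    tracks born in this sub-frame (decoded from the stream in practice).
    """
    noc = len(tmp_cont)                    # s_nrof_continuations[sf][ch]
    s_cont = [c - 1 for c in tmp_cont]     # continuations: tmp_cont[ch][n] - 1
    s_cont.extend(birth_s_cont)            # births are appended after them
    # Only tracks with s_cont > 0 continue into the next sub-frame.
    next_tmp_cont = [c for c in s_cont if c > 0]
    return noc, s_cont, next_tmp_cont


# Three births, then one more birth, then none.
noc0, s0, tmp = step_subframe([], [2, 0, 1])
noc1, s1, tmp = step_subframe(tmp, [3])
noc2, s2, tmp = step_subframe(tmp, [])
```

In this trace the track born with s_cont equal to 0 dies immediately, while the others are carried forward with their counter reduced by one per sub-frame.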

Table 8.8 – Syntax of subframe_noise()
Syntax Num. bits Mnemonic
subframe_noise (sf, ch)
{
    if ((refresh_noise == 1) && (sf == 0))
    {
        n_laguerre[ch] 2 uimsbf
        n_laguerre_granularity[sf][ch] 1 uimsbf
        for (i = 0; i < n_nrof_den; i++)
        {
            n_lar_den_coarse[sf][ch][i] = ssc_huff_dec(huff_nlag,bs_codeword); 1..18 bslbf
            if (n_laguerre_granularity[sf][ch]==1)
            {
                n_lar_den_fine[sf][ch][i] 2 simsbf
            }
        }
        n_gain[sf][ch] 7 uimsbf
        n_lsf[sf][ch][0] = ssc_huff_dec(huff_nlsf,bs_codeword); 2..9 bslbf
        for (i = 1; i < n_nrof_lsf; i++)
        {
            n_delta_lsf[sf][ch][i] = ssc_huff_dec(huff_nlsf,bs_codeword); 2..9 bslbf
        }
    }
    else {
        if ( mod(sf,2) == 0 )
        {
            n_laguerre_granularity[sf][ch] 1 uimsbf
            for (i = 0; i < n_nrof_den; i++)
            {
                n_delta_lar_den_coarse[sf][ch][i] = ssc_huff_dec(huff_nlag,bs_codeword); 1..18 bslbf
                if (n_laguerre_granularity[sf][ch]==1)
                {
                    n_delta_lar_den_fine[sf][ch][i] 2 simsbf
                }
            }
        }
        if ( mod(sf,4) == 0 )
        {
            n_delta_gain[sf][ch] = ssc_huff_dec(huff_ngain,bs_codeword); 1..12 bslbf
            if (n_overlap_lsf == 1) 1 uimsbf
            {
                for ( i = n_nrof_overlap_lsf; i < n_nrof_lsf; i++)
                {
                    n_delta_lsf[sf][ch][i] = ssc_huff_dec(huff_nlsf,bs_codeword); 2..9 bslbf
                }
            }
            else {
                n_lsf[sf][ch][0] = ssc_huff_dec(huff_nlsf,bs_codeword); 2..9 bslbf
                for ( i = 1; i < n_nrof_lsf; i++)
                {
                    n_delta_lsf[sf][ch][i] = ssc_huff_dec(huff_nlsf,bs_codeword); 2..9 bslbf
                }
            }
        }
    }
}
Table 8.9 – Syntax of ps_data()
Syntax Num. bits Mnemonic
ps_data()
{
    if (enable_ps_header) { 1 uimsbf
        if (enable_iid) { 1 uimsbf
            iid_mode 3 uimsbf
            nr_iid_par = nr_iid_par_tab[iid_mode]
            nr_ipdopd_par = nr_ipdopd_par_tab[iid_mode]
        }
        if (enable_icc) { 1 uimsbf
            icc_mode 3 uimsbf
            nr_icc_par = nr_icc_par_tab[icc_mode]
        }
        enable_ext 1 uimsbf
    }
    frame_class 1 uimsbf
    num_env_idx 2 uimsbf
    num_env = num_env_tab[frame_class][num_env_idx]

    if (frame_class) {
        for (e = 0 ; e < num_env ; e++) {
            border_position[e] 5 uimsbf
        }
    }
    for (e = 0 ; e < num_env ; e++) {
        if (enable_iid) {
            iid_dt[e] 1 uimsbf
            iid_data()
        }
    }
    for (e = 0 ; e < num_env ; e++) {
        if (enable_icc) {
            icc_dt[e] 1 uimsbf
            icc_data()
        }
    }
    if (enable_ext) {
        cnt = ps_extension_size 4 uimsbf
        if (cnt == 15)
            cnt += esc_count 8 uimsbf
        num_bits_left = 8 * cnt
        while (num_bits_left > 7) {
            ps_extension_id 2 uimsbf
            num_bits_left -= 2
            ps_extension(ps_extension_id, num_bits_left)
        }
        fill_bits num_bits_left
    }
}
ps_extension(ps_extension_id, num_bits_left){
    if (ps_extension_id == 0) {
        if (enable_ipdopd) { 1 uimsbf
            for (e = 0 ; e < num_env ; e++) {
                ipd_dt[e] 1 uimsbf
                ipd_data()
                opd_dt[e] 1 uimsbf
                opd_data()
                num_bits_left -= ipd_bits + opd_bits + 2 Note 1
            }
        }
        reserved_ps 1 uimsbf
        num_bits_left -= 2
    }
}
iid_data() {
    if (iid_dt[e]) {
        for (b = 0 ; b < nr_iid_par ; b++) {
            iid_par_dt[e][b] = ssc_huff_dec(huff_iid_dt[iid_quant],bs_codeword); 1..20 Note 2
        }
    }
    else {
        for (b = 0 ; b < nr_iid_par ; b++) {
            iid_par_df[e][b] = ssc_huff_dec(huff_iid_df[iid_quant],bs_codeword); 1..18 Note 2
        }
    }
}
icc_data() {
    if (icc_dt[e]) {
        for (b = 0 ; b < nr_icc_par ; b++) {
            icc_par_dt[e][b] = ssc_huff_dec(huff_icc_dt,bs_codeword); 1..14 bslbf
        }
    }
    else {
        for (b = 0 ; b < nr_icc_par ; b++) {
            icc_par_df[e][b] = ssc_huff_dec(huff_icc_df,bs_codeword); 1..13 bslbf
        }
    }
}
ipd_data() {
    if (ipd_dt[e]) {
        for (b = 0 ; b < nr_ipdopd_par ; b++) {
            ipd_par_dt[e][b] = ssc_huff_dec(huff_ipd_dt,bs_codeword); 1..5 bslbf
        }
    }
    else {
        for (b = 0 ; b < nr_ipdopd_par ; b++) {
            ipd_par_df[e][b] = ssc_huff_dec(huff_ipd_df,bs_codeword); 1..4 bslbf
        }
    }
}
opd_data() {
    if (opd_dt[e]) {
        for (b = 0 ; b < nr_ipdopd_par ; b++) {
            opd_par_dt[e][b] = ssc_huff_dec(huff_opd_dt,bs_codeword); 1..5 bslbf
        }
    }
    else {
        for (b = 0 ; b < nr_ipdopd_par ; b++) {
            opd_par_df[e][b] = ssc_huff_dec(huff_opd_df,bs_codeword); 1..5 bslbf
        }
    }
}
Note 1: ipd_bits and opd_bits represent the number of bits read by ipd_data() and opd_data() respectively.
Note 2: the index iid_quant into huff_iid_df is obtained from Table 8.19.
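The size of the extension area in ps_data() follows from two fields: the 4-bit ps_extension_size and, when that field holds its escape value 15, the 8-bit esc_count. A minimal Python sketch of this arithmetic is given below; `ps_extension_bits` is a hypothetical helper name, and the range checks simply reflect the field widths in Table 8.9.

```python
def ps_extension_bits(ps_extension_size: int, esc_count: int = 0) -> int:
    """Return num_bits_left as initialised in ps_data() (Table 8.9)."""
    if not 0 <= ps_extension_size <= 15:
        raise ValueError("ps_extension_size is a 4-bit field")
    if not 0 <= esc_count <= 255:
        raise ValueError("esc_count is an 8-bit field")
    cnt = ps_extension_size
    if cnt == 15:
        # Escape value: esc_count is read from the stream and added.
        cnt += esc_count
    return 8 * cnt


small = ps_extension_bits(3)          # three bytes of extension data
escaped = ps_extension_bits(15, 10)   # 15 + 10 = 25 bytes
largest = ps_extension_bits(15, 255)  # upper bound of the extension area
```

The while loop in ps_data() then reads extensions as long as more than 7 bits remain, and the remainder is consumed as fill_bits.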
8.5 Semantics
8.5.1 SSCSpecificConfig
channelConfiguration – Channel configuration as defined in ISO/IEC 14496-3:2001, subpart 1,
subclause 1.6.3.4.
decoder_level – Complexity bounds for decoder settings. A decoder that supports a certain level of
complexity is not able to decode a bit-stream that is encoded according to a higher level of complexity. This
decoder is, however, able to decode a bit-stream that is encoded according to a lower level of complexity (see
Table 8.10).
Table 8.10 - Decoder level
decoder_level  Level of complexity  max_nrof_sinusoids  max_nrof_den  #bits for s_nrof_continuations  #bits for n_nrof_lsf
00 Reserved Na Na Na Na
01 Medium 60 22 6 4
10 Reserved Na Na Na Na
11 Reserved Na Na Na Na
max_nrof_sinusoids - Maximum number of sinusoids that is allowed (sinusoids under Meixner transients not
included).
max_nrof_den - Maximum value for n_nrof_den.
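Table 8.10 can be held as a small lookup table from which a decoder takes both its limits and the field widths needed for the frame header. The Python sketch below encodes the single non-reserved row; the dictionary keys and the function name `decoder_level_limits` are hypothetical.

```python
DECODER_LEVELS = {
    0b01: {
        "name": "Medium",
        "max_nrof_sinusoids": 60,
        "max_nrof_den": 22,
        "bits_s_nrof_continuations": 6,
        "bits_n_nrof_lsf": 4,
    },
}


def decoder_level_limits(decoder_level: int) -> dict:
    """Look up the Table 8.10 row for a 2-bit decoder_level code."""
    try:
        return DECODER_LEVELS[decoder_level]
    except KeyError:
        raise ValueError(f"decoder_level %{decoder_level:02b} is reserved") from None


medium = decoder_level_limits(0b01)
# A 6-bit field covers 0..63, enough for the limit of 60 sinusoids.
fits = medium["max_nrof_sinusoids"] < 2 ** medium["bits_s_nrof_continuations"]
```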
update_rate – Four bits indicating the sub-frame size S. Table 8.11 shows the relationship between
update_rate and the sub-frame size S in samples.
Table 8.11 - Update rate
update_rate S update_rate S
0000 Reserved 1000 Reserved
0001 Reserved 1001 Reserved
0010 Reserved 1010 Reserved
0011 Reserved 1011 Reserved
0100 384 1100 Reserved
0101 Reserved 1101 Reserved
0110 Reserved 1110 Reserved
0111 Reserved 1111 Reserved
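As a worked example of Table 8.11, the Python sketch below derives the segment size L = 2*S and the sub-frame duration for the only non-reserved code. The sampling rate of 44.1 kHz is the one mentioned in the Scope; the function name `subframe_timing` and the dictionary keys are hypothetical.

```python
UPDATE_RATE_TO_S = {0b0100: 384}  # Table 8.11; every other code is reserved


def subframe_timing(update_rate: int, fs: float) -> dict:
    """Derive sub-frame and segment sizes and their timing from update_rate."""
    if update_rate not in UPDATE_RATE_TO_S:
        raise ValueError("update_rate is reserved in Table 8.11")
    S = UPDATE_RATE_TO_S[update_rate]
    return {
        "S": S,                          # samples per sub-frame
        "L": 2 * S,                      # samples per segment
        "subframe_seconds": S / fs,      # duration of one sub-frame
        "subframes_per_second": fs / S,  # parameter update rate
    }


timing = subframe_timing(0b0100, 44100.0)
```

At 44.1 kHz this gives sub-frames of roughly 8.7 ms, so parameters are updated about 115 times per second.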
synthesis_method – Two bits providing information on the preferred synthesis for the specific encoded
program (see Table 8.12).
Table 8.12 - Synthesis method
Synthesis_method Optimal synthesis
00 Overlap and Add
01 Reserved
10 Reserved
11 Reserved
mode_ext – In combination with channelConfiguration the mode_ext bits provide the full channel
configuration. The number of bits is dependent on the channelConfiguration (see Table 8.13).
Table 8.13 - Channel configuration
channelConfiguration # bits for mode_ext nrof_channels
1 0 1
2 2 According to mode_ext
0, 3 … 15 Na Na
For channelConfiguration == 2, Table 8.14 applies:
Table 8.14 – Channel configuration in case channelConfiguration == 2
mode_ext Full channel configuration
00 Dual mono (ch0=left, ch1=right)
01 Parametric Stereo
10 Reserved
11 Reserved
reserved – Two reserved bits; should be set to %0.
8.5.2 Decoding of SSC Bitstream Payload
ssc_audio_frame() – syntactic element that contains a single SSC frame
ssc_audio_frame_header() – syntactic element that contains the header data for a single SSC frame
ssc_audio_frame_data() – syntactic element that contains the data for a single SSC frame
ssc_huff_dec() – Huffman decoding procedure. See Annex B.
refresh_sinusoids – One bit indicating how sinusoidal continuations of the first sub-frame in a frame are
encoded. If this bit equals %0, the continued track data is differentially coded with respect to the last sub-
frame of the previous frame. If this bit equals %1, the continued track data in the first sub-frame of a frame are
coded as absolute values.
refresh_sinusoids_next_frame – One bit providing an additional frame look ahead for the ADPCM decoding
of sinusoidal parameters. If this bit is set to %1, the next frame is a refresh frame. In that case the bit
refresh_sinusoids shall be set to %1 in the next frame.
refresh_noise – One bit indicating how noise parameters of the first sub-frame in a frame are encoded. If this
bit equals %0, the noise parameters are differentially coded with respect to the last sub-frame of the previous
frame. If this bit equals %1, the noise parameters in the first sub-frame of a frame are coded as absolute
values.
s_nrof_continuations[sf][ch] – For sub-frame sf and channel ch, this value represents the number of
continuations. In the case sf==0 the value of s_nrof_continuations is provided in the bit-stream. For the
remaining values of sf, the value of s_nrof_continuations is obtained implicitly as described in section 8.6.2.1.
The number of bits required for s_nrof_continuations[0][ch] depends on the maximum number of allowed
sinusoids, which in turn depends on the decoder complexity indicated by decoder_level. This relation is shown in
Table 8.10.
n_nrof_den – Number of denominator LAR coefficients of the FIR filter for noise generation.
n_nrof_lsf – Number of LSF coefficients used to generate the envelope for noise generation. The
number of bits required for n_nrof_lsf depends on the decoder complexity, indicated by decoder_level. This
relation is shown in Table 8.10.
freq_granularity – The granularity of the differentially or absolute coded frequency parameters used in
subframe_sinusoids(). This parameter determines the number of bits to be read for the fine part of the
frequency parameters.
amp_granularity – The granularity of the differentially or absolute coded amplitude parameters used in
subframe_sinusoids(). This parameter determines the Huffman table to be used or the number of bits to be
read for the fine part of the amplitude parameters.
phase_jitter_present – One bit to indicate presence of phase jitter parameters. If this bit equals %0, no
phase jitter is present. If this bit equals %1, phase jitter is present.
phase_jitter_percentage – This is a two bit unsigned integer identifying a distance percentage. The full
distance equals half a quantisation step. The maximum jitter applied to the frequency components is
$$\mathrm{max\_jitter} = \frac{\mathrm{phase\_jitter\_percentage} + 1}{4} \cdot 2^{\,\mathrm{freq\_granularity} - 1} .$$
phase_jitter_band – Two bits identifying the frequency representation level from which phase jitter must be
applied. Table 8.15 provides the relation between phase_jitter_band and $f_{jitter,min}$.
Table 8.15 - Phase jitter band expressed in representation levels
phase_jitter_band  frequency representation level $f_{jitter,min}$
00                 0
01                 800
10                 1600
11                 2400
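The two phase jitter fields can be turned into working values as sketched below. This is a hedged reading, not a normative implementation: the division by 4 assumes that the two-bit phase_jitter_percentage selects 25, 50, 75 or 100 percent of the full distance, and the function and constant names are hypothetical.

```python
# Table 8.15: phase_jitter_band -> minimum frequency representation level.
F_JITTER_MIN = {0b00: 0, 0b01: 800, 0b10: 1600, 0b11: 2400}


def max_jitter(phase_jitter_percentage: int, freq_granularity: int) -> float:
    """Maximum jitter in representation levels.

    ASSUMPTION: the fraction (phase_jitter_percentage + 1) / 4 scales the full
    distance, which is half a quantisation step, i.e. 2 ** (freq_granularity - 1).
    """
    if not 0 <= phase_jitter_percentage <= 3:
        raise ValueError("phase_jitter_percentage is a two-bit unsigned integer")
    return (phase_jitter_percentage + 1) / 4 * 2.0 ** (freq_granularity - 1)


def jitter_applies(phase_jitter_present: int, f_rl: int, phase_jitter_band: int) -> bool:
    """Jitter is considered only if the flag is set and f_rl exceeds f_jitter,min."""
    return phase_jitter_present == 1 and f_rl > F_JITTER_MIN[phase_jitter_band]
```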
nrof_subframes – the number of sub-frames in one frame. This value is fixed to 8.
ssc_mono_subframe() – syntactic element that contains the data for a single SSC sub frame.
ps_data() – syntactic element that contains the parametric stereo data.
subframe_transients() – syntactic element that contains the transient data for a single SSC sub frame.
subframe_sinusoids() – syntactic element that contains the sinusoid data for a single SSC sub frame.
subframe_noise() – syntactic element that contains the noise data for a single SSC sub frame.
t_transient_present[sf][ch] – One bit indicating if a transient is present in sub-frame sf, channel ch. If
t_transient_present[sf][ch]==%1, a transient is present. If t_transient_present[sf][ch]==%0, no transient is
present.
t_loc[sf][ch] – Location of the transient in sub-frame sf of channel ch, expressed as the number
of samples from the start of the sub-frame. The valid range for t_loc is [0,S>. The number of bits used
to represent t_loc is
$$\lceil \log_2(S) \rceil ,$$
where S represents the sub-frame size in samples.
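As a worked example, the bit width of t_loc for the only sub-frame size defined in Table 8.11 can be computed as below. The sketch assumes the bit count is the ceiling of the base-2 logarithm of S, which is the smallest width that covers the range [0,S>; the names are hypothetical.

```python
# Table 8.11: only update_rate %0100 is defined; all other codes are reserved.
UPDATE_RATE_TO_S = {0b0100: 384}


def subframe_size(update_rate: int) -> int:
    """Sub-frame size S in samples for a 4-bit update_rate; reserved codes raise."""
    if update_rate not in UPDATE_RATE_TO_S:
        raise ValueError(f"update_rate %{update_rate:04b} is reserved")
    return UPDATE_RATE_TO_S[update_rate]


def t_loc_bits(S: int) -> int:
    """ceil(log2(S)) using integer arithmetic (ASSUMPTION: ceiling is intended)."""
    if S < 1:
        raise ValueError("S must be a positive number of samples")
    return (S - 1).bit_length()
```

For S = 384 this yields 9 bits, since 2^8 = 256 is too small to address every sample position and 2^9 = 512 suffices.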
t_type[sf][ch] – Two bits to indicate the transient type of the transient in sub-frame sf of channel ch (see
Table 8.16).
Table 8.16 - Transient types
t_type Type
00 Step
01 Meixner
10 Reserved
11 Reserved
t_b_par[sf][ch] – For a transient of the Meixner type in sub-frame sf of channel ch, these 3 bits hold the value
for the attack of the transient envelope, denoted as the ‘b-parameter’. Allowed values for t_b_par are [0, 1, 2,
3]. The remaining values are reserved. The value b is calculated as
b = t_b_par + 2.
t_chi_par[sf][ch] – For a transient of the Meixner type in sub-frame sf of channel ch, these 3 bits hold the
value for the decay of the transient envelope, denoted as the ‘ξ-parameter’. Allowed values for t_chi_par are
[0, 1, 2, 3]. The remaining values are reserved. The value ξ is tabulated in Table 8.17.
Table 8.17 - Quantised values for b and ξ
ξ            t_b_par=0  t_b_par=1  t_b_par=2  t_b_par=3
t_chi_par=0  0.9688     0.9685     0.9683     0.9681
t_chi_par=1  0.9763     0.9756     0.9750     0.9744
t_chi_par=2  0.9839     0.9827     0.9817     0.9807
t_chi_par=3  0.9914     0.9898     0.9884     0.9870
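The mapping from the transmitted t_b_par and t_chi_par to the envelope parameters b and ξ can be sketched as a direct lookup of Table 8.17. The function name `meixner_params` is hypothetical; the values are copied from the table, with rows indexed by t_chi_par and columns by t_b_par.

```python
# Table 8.17: XI_TABLE[t_chi_par][t_b_par]
XI_TABLE = [
    [0.9688, 0.9685, 0.9683, 0.9681],
    [0.9763, 0.9756, 0.9750, 0.9744],
    [0.9839, 0.9827, 0.9817, 0.9807],
    [0.9914, 0.9898, 0.9884, 0.9870],
]


def meixner_params(t_b_par: int, t_chi_par: int) -> tuple:
    """Return (b, xi) for a Meixner transient; values 4..7 are reserved and raise."""
    if t_b_par not in range(4) or t_chi_par not in range(4):
        raise ValueError("t_b_par and t_chi_par must be in [0, 3]; other values are reserved")
    b = t_b_par + 2
    xi = XI_TABLE[t_chi_par][t_b_par]
    return b, xi
```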
t_nrof_sin[sf][ch] – For a transient of the Meixner type in sub-frame sf of channel ch, these 3 bits represent
the number of sinusoids that are present under the envelope. The number of sinusoids under the Meixner
envelope is equal to the value in the stream plus one.
t_freq[sf][ch][i] – For a transient of the Meixner type in sub-frame sf of channel ch, these bits represent the
frequency in radians of the i-th sinusoid under the transient envelope.
$$tf_q[i] = \frac{2\pi}{f_s} \cdot \frac{10^{\frac{\mathrm{t\_freq}[sf][ch][i]}{11.4 \cdot 21.4}} - 1}{0.00437} ,$$
where $tf_q$ represents the dequantized absolute frequency in radians.
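A minimal sketch of the t_freq dequantisation follows. It assumes that f_s denotes the sampling frequency in Hz, and the 44100 Hz used in the usage note is only an example value, not something stated in this section. The function name is hypothetical.

```python
import math


def dequantize_t_freq(t_freq: int, fs: float) -> float:
    """Dequantized absolute frequency tf_q in radians for a transient sinusoid.

    ASSUMPTION: fs is the sampling frequency in Hz.
    """
    # Exponent uses the constants 11.4 and 21.4 from the equation above.
    hz = (10.0 ** (t_freq / (11.4 * 21.4)) - 1.0) / 0.00437
    return 2.0 * math.pi * hz / fs
```

With `fs = 44100.0`, a representation level of 0 maps to 0 radians, and larger levels map to monotonically increasing frequencies on a warped (logarithmic-like) grid.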
t_amp[sf][ch][i] – For a transient of the Meixner type in sub-frame sf of channel ch, these bits represent the
amplitude of the i-th sinusoid under the transient envelope.
$$ta_q[i] = ta_b^{\,2 \cdot \mathrm{t\_amp}[sf][ch][i]} ,$$
where $ta_b$ represents the log quantisation base (maximum error = 1.5 dB), $ta_b = 1.1885$, and $ta_q$ represents the
dequantized absolute amplitude.
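The amplitude dequantisation for transient sinusoids can be sketched as below. The helper names are hypothetical; `to_db` is included only to show that one step of t_amp corresponds to roughly 3 dB, which is consistent with the stated maximum error of 1.5 dB.

```python
import math

TA_B = 1.1885  # log quantisation base for transient amplitudes


def dequantize_t_amp(t_amp: int) -> float:
    """Dequantized absolute amplitude ta_q = ta_b ** (2 * t_amp)."""
    return TA_B ** (2 * t_amp)


def to_db(amplitude: float) -> float:
    """Convert a linear amplitude to decibels (illustrative helper)."""
    return 20.0 * math.log10(amplitude)
```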
t_phi[sf][ch][i] – For a transient of the Meixner type in sub-frame sf of channel ch, these bits represent the
phase of the i-th sinusoid under the transient envelope. The decoded value is converted into a phase value in
radians in the range [-π, π> and is specified for the start of the transient.
$$tp_q[i] = 2 \cdot tp_e \cdot \mathrm{t\_phi}[sf][ch][i] ,$$
where $tp_e$ represents the absolute phase error ($tp_e = \pi/32$) and $tp_q$ represents the dequantized absolute phase
(in radians). The allowed range for t_phi is [-16, 15]; the representation level +16 is represented by –16
(because +π==-π).
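The phase dequantisation can be sketched as follows. The value π/32 for the absolute phase error is an assumption inferred from the allowed range: 32 representation levels spanning [-π, π> give a step of π/16, and the maximum error is half of that. The function name is hypothetical.

```python
import math

TP_E = math.pi / 32  # ASSUMPTION: inferred from the range [-16, 15] covering [-pi, pi>


def dequantize_t_phi(t_phi: int) -> float:
    """Dequantized absolute phase tp_q = 2 * tp_e * t_phi, in radians."""
    if not -16 <= t_phi <= 15:
        raise ValueError("t_phi must be in [-16, 15]")
    return 2.0 * TP_E * t_phi
```

Under this assumption the lowest level, -16, maps to -π and the highest level, 15, maps to one step below +π.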
noc – Local variable that counts the number of continuations in the previous sub-frame.
tmp_cont[ch][noc] – Local array that contains a copy of the s_cont-parameters of the previous sub-frame,
needed for parsing the stream correctly (extract number of continuations and keep track of how many sub-
frames the sinusoidal track must be continued in the current frame).
s_cont[sf][ch][n] – For sub-frame sf and channel ch, this value indicates how many sub-frames component n
will be continued in the current frame. If the component also continues in the next frame, one must be added
to the number of sub-frames it continues in the current frame. If the value is 0, component n
stops at sub-frame sf, which is called a death. The valid range for s_cont is [0, 9].
s_freq_coarse[sf][ch][n] – For sub-frame sf and channel ch, this value represents the coarse frequency
parameter of the n-th sinusoid.
s_freq_fine[sf][ch][n] – For sub-frame sf and channel ch, this signed integer adds a higher level of
detail to the coarse frequency parameter. The number of bits to be read amounts to (3 – freq_granularity). The
frequency representation level $f_{rl}$ is the sum of the coarse frequency and the fine frequency scaled to the granularity grid:
$$f_{rl}[n] = \mathrm{s\_freq\_coarse}[sf][ch][n] + \mathrm{s\_freq\_fine}[sf][ch][n] \cdot 2^{\,\mathrm{freq\_granularity}} .$$
Phase jitter is only applied in combination with tempo and pitch scaling. If phase_jitter_present == %1 and
$f_{rl} > f_{jitter,min}$, the phase jitter parameter is
$$f_{jitter} = \lfloor \mathrm{max\_jitter} \cdot (2x - 1) + 0.5 \rfloor ,$$
where x holds a random number, uniformly distributed between 0 and 1, generated for each frequency
parameter in the sub-frame that meets the requirement above. The decoded value is converted into a
dequantized absolute frequency value $f_q$ in radians, using the following equation:
$$f_q[n] = \frac{2\pi}{f_s} \cdot \frac{10^{\frac{f_{rl}[n]}{91.2 \cdot 21.4}} - 1}{0.00437} .$$
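The frequency path for continued sinusoids can be sketched in three small steps: combine coarse and fine parts, draw a jitter value, and dequantise. Several points are assumptions: f_s is taken to be the sampling frequency in Hz, the jitter expression is read as rounding to the nearest integer through a floor of the value plus 0.5, and all function names are hypothetical. How the jitter value is combined with the representation level is not shown here because this passage does not state it.

```python
import math


def freq_representation_level(s_freq_coarse: int, s_freq_fine: int, freq_granularity: int) -> int:
    """f_rl = coarse + fine * 2 ** freq_granularity."""
    return s_freq_coarse + s_freq_fine * 2 ** freq_granularity


def jitter_value(max_jitter: float, x: float) -> int:
    """f_jitter for a random x in [0, 1] (ASSUMPTION: floor(v + 0.5) rounding)."""
    return math.floor(max_jitter * (2.0 * x - 1.0) + 0.5)


def dequantize_freq(f_rl: float, fs: float) -> float:
    """Dequantized absolute frequency f_q in radians (ASSUMPTION: fs in Hz)."""
    hz = (10.0 ** (f_rl / (91.2 * 21.4)) - 1.0) / 0.00437
    return 2.0 * math.pi * hz / fs
```

In a decoder, x would typically come from a pseudo-random generator such as `random.random()`; it is passed in explicitly here so that the sketch stays deterministic.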
s_amp_coarse[sf][ch][n] – For sub-frame sf and channel ch, this value represents the coarse amplitude
parameter of the n-th sinusoid.
s_amp_fine[sf][ch][n] – For sub-frame sf and channel ch, this parameter adds a higher level of detail to
the coarse amplitude parameter. The number of bits to be read amounts to (3 – amp_granularity). The
amplitude representation level $sa_{rl}$ is the sum of the coarse amplitude and the fine amplitude scaled to the granularity grid:
$$sa_{rl}[n] = \mathrm{s\_amp\_coarse}[sf][ch][n] + \mathrm{s\_amp\_fine}[sf][ch][n] \cdot 2^{\,\mathrm{amp\_granularity}} .$$
The decoded value is converted into a dequantized linear amplitude value $sa_q$ in the range [1, 2 -1]
conforming to
$$sa_q[n] = sa_b^{\,2 \cdot sa_{rl}[n]} ,$$
where $sa_b = 1.0218$ is the log quantization base. Its value corresponds to a maximum error of 0.1875 dB.
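The amplitude path for continued sinusoids mirrors the frequency path and can be sketched as below. The function names are hypothetical, and `to_db` is an illustrative helper showing that one representation level corresponds to roughly 0.375 dB, twice the stated maximum error of 0.1875 dB.

```python
import math

SA_B = 1.0218  # log quantization base for sinusoid amplitudes


def amp_representation_level(s_amp_coarse: int, s_amp_fine: int, amp_granularity: int) -> int:
    """sa_rl = coarse + fine * 2 ** amp_granularity."""
    return s_amp_coarse + s_amp_fine * 2 ** amp_granularity


def dequantize_amp(sa_rl: int) -> float:
    """Dequantized linear amplitude sa_q = sa_b ** (2 * sa_rl)."""
    return SA_B ** (2 * sa_rl)


def to_db(amplitude: float) -> float:
    """Convert a linear amplitude to decibels (illustrative helper)."""
    return 20.0 * math.log10(amplitude)
```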
s_phi[sf][ch][n] – For sub-frame sf and channel ch, this represents the phase parameter of the n-th sinusoid.
This value is converted into a phase value in radians in the range [-π
...
