Information technology - Coding of audio-visual objects - Part 3: Audio - Amendment 3: Scalable Lossless Coding (SLS)

Technologies de l'information — Codage des objets audiovisuels — Partie 3: Codage audio — Amendement 3: Codage extensible sans perte (SLS)

General Information

Status
Withdrawn
Publication Date
08-Jun-2006
Withdrawal Date
08-Jun-2006
Current Stage
9599 - Withdrawal of International Standard
Start Date
26-Aug-2009
Completion Date
30-Oct-2025
Ref Project

Relations

Standard
ISO/IEC 14496-3:2005/Amd 3:2006 - Scalable Lossless Coding (SLS)
English language
73 pages

Frequently Asked Questions

ISO/IEC 14496-3:2005/Amd 3:2006 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - Coding of audio-visual objects - Part 3: Audio - Amendment 3: Scalable Lossless Coding (SLS)".

ISO/IEC 14496-3:2005/Amd 3:2006 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 14496-3:2005/Amd 3:2006 is linked to the following standards: ISO/IEC 14496-3:2005 and ISO/IEC 14496-3:2009. It is an amendment to ISO/IEC 14496-3:2005. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

You can purchase ISO/IEC 14496-3:2005/Amd 3:2006 directly from iTeh Standards. The document is available in PDF format and is delivered instantly after payment. Add the standard to your cart and complete the secure checkout process. iTeh Standards is an authorized distributor of ISO standards.

Standards Content (Sample)


INTERNATIONAL STANDARD ISO/IEC 14496-3
Third edition 2005-12-01
AMENDMENT 3 2006-06-01
Information technology — Coding of
audio-visual objects —
Part 3: Audio
AMENDMENT 3: Scalable Lossless Coding (SLS)
Technologies de l'information — Codage des objets audiovisuels —
Partie 3: Codage audio
AMENDEMENT 3: Codage extensible sans perte (SLS)

Reference number
ISO/IEC 14496-3:2005/Amd.3:2006(E)
© ISO/IEC 2006
PDF disclaimer
This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but
shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In
downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat
accepts no liability in this area.
Adobe is a trademark of Adobe Systems Incorporated.
Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation
parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In
the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.

©  ISO/IEC 2006
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 3 to ISO/IEC 14496-3:2005 was prepared by Joint Technical Committee
ISO/IEC JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and
hypermedia information.
This Amendment specifies Audio Scalable Lossless Coding (SLS).

Information technology — Coding of audio-visual objects —
Part 3:
Audio
AMENDMENT 3: Scalable Lossless Coding (SLS)
In ISO/IEC 14496-3, Introduction, add the following to the end of the subclause "MPEG-4 general audio
coding tools":
MPEG-4 SLS (Scalable Lossless Coding) is a tool used in combination with optional MPEG-4 General Audio
coding tools to provide fine-grain scalable to numerical lossless coding of digital audio waveform.

In Part 3: Audio, Subpart 1, in subclause 1.3 Terms and Definitions, add:
SLS: Audio Scalable to Lossless Coding
and increase the index-number of subsequent entries.

In Part 3: Audio, Subpart 1, in subclause 1.5.1.1 Audio object type definition, amend table 1.1 with the updates
in the table below:
Tools/Modules columns, in order: Error Mapping (*), Integer TNS (*), Integer M/S (*), IntMDCT (*), BPGC/CBAC/LEMC (*), Remark, Object Type ID
Audio Object Type    Tools/Modules    Object Type ID
...
(escape)             X                31
...
SLS                  X X X X X        37
SLS non-core         X X              38
...
Note: (*) marks new columns
In Part 3: Audio, Subpart 1, subclause 1.4 (Symbols and Abbreviations) add the following subclause:
1.4.9 Arithmetic data types
INT32 32 bit signed integer using two’s complement
INT64 64 bit signed integer using two’s complement

In Part 3: Audio, Subpart 1, subclause 1.5 add the following subclauses:
1.5.1.2.31 SLS object type
The SLS object is supported by the scalable to lossless tool which provides fine-grain scalable to lossless
enhancement of MPEG perceptual audio codecs, such as AAC, allowing multiple enhancement steps from the
audio quality of the core codec up to near-lossless and lossless signal representation. It also provides stand-
alone lossless audio coding when the core audio codec is omitted.

1.5.1.2.32 SLS Non-Core object type
The SLS non-core object is supported by the scalable to lossless tool. It is similar to the SLS object type but
the core audio codec is omitted.

In Part 3: Audio, Subpart 1, in subclause 1.6.2.1 AudioSpecificConfig, amend table 1.8 with the updates in the
table below:
Syntax                                  No. of bits  Mnemonic
AudioSpecificConfig ()
{

    switch (audioObjectType) {
    case 37:
    case 38:
        SLSSpecificConfig();
        break;

    }

}
In Part 3: Audio, Subpart 1, in subclause 1.6.2.1 add the following subclause:
1.6.2.1.13 SLSSpecificConfig
Defined in ISO/IEC 14496-3 subpart 12.

In Part 3: Audio, Subpart 1, in subclause 1.6.2.2.1 Overview, add the following to table 1.14:
Audio Object Type    Object Type ID    Definition of elementary stream payloads and detailed syntax    Mapping of audio payloads to access units and elementary streams

SLS                  37                ISO/IEC 14496-3 subpart 12
SLS non-core         38                ISO/IEC 14496-3 subpart 12
Create Part 3: Audio, Subpart 12:
Subpart 12: Technical description of scalable lossless coding
12.1 Scope
This subpart of ISO/IEC 14496-3 describes the MPEG-4 scalable lossless coding algorithm for audio signals.
This description partially relies on the specification as given in subpart 4.
12.2 Terms and definitions
12.2.1 Definitions
The following definitions are used in this subpart.
Core Layer The MPEG-4 GA T/F coder used as the first layer in SLS. The audio object
types AAC LC, AAC Scalable (without LTP), ER AAC LC, ER AAC Scalable
and ER BSAC are supported.
LLE Layer Lossless enhancement layer used in SLS to enhance the quality of the core
layer towards lossless coding.
Bit-Plane Position of a specific bit in a binary data word, starting with 0 as the position of
the least significant bit (LSB). For example, the binary bit-plane symbols from
bit-planes 0, 1, 2 and 3 of the binary data word 0011 1101 (0x3D) are 1, 0, 1 and 1
respectively.
BPGC Bit-Plane Golomb Code
CBAC Context Based Arithmetic Code
LEMC Low Energy Mode Code
Implicit Band A scale factor band for which the quantized spectral data presented in the
core layer bit-stream will be used in determining part of the necessary side
information for the LLE layer.
Explicit Band A scale factor band for which the quantized spectral data presented in the
core layer bit-stream will not be used in determining the necessary side
information for the LLE layer. All the side information will be coded explicitly
in the LLE payload.
Oversampling Factor (osf) Ratio between sampling rates of LLE Layer and Core Layer, possible values
are 1, 2 and 4.
Oversampling Range High frequency range covered only by the LLE Layer; comprises
(osf-1)*1024 (long windows) or (osf-1)*128 (short windows) frequency values per window.
Reserved All fields labelled Reserved are reserved for future standardization. All
Reserved fields must be set to zero.
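The Bit-Plane definition above can be checked with a minimal Python sketch. The function name `bit_plane_symbols` is an illustrative assumption and does not appear in the standard.

```python
def bit_plane_symbols(word, num_planes):
    """Return the bit-plane symbols of `word` for planes 0..num_planes-1.

    Plane 0 is the least significant bit (LSB), as in the definition above.
    """
    return [(word >> plane) & 1 for plane in range(num_planes)]


# Data word 0011 1101 (0x3D): bit-planes 0, 1, 2 and 3.
print(bit_plane_symbols(0x3D, 4))
```

For the data word 0x3D the first four planes come out as 1, 0, 1, 1, which agrees with the example in the definition.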
12.2.2 Notations
In order to make the description stringent, the following notations are used in this subpart:
• Vectors are indicated by bold lower-case names, e.g. vector.
• Matrices (and vectors of vectors) are indicated by bold upper-case single letter names, e.g. M.
• Variables are indicated by italics, e.g. variable.
• Functions are indicated as func(x)
12.2.3 Definitions
DIV(m,n) Integer division with truncation of the result of m/n to an integer value towards −∞.
⌊·⌋ The floor operation. Returns the largest integer that is less than or equal to the real-valued
argument.
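The difference between DIV and truncation towards zero only shows for negative operands. The following Python sketch illustrates it; the helper name `div_floor` is an assumption made here, and the sketch relies on the fact that Python's `//` operator already rounds towards −∞.

```python
def div_floor(m, n):
    """DIV(m, n): integer division rounded towards minus infinity."""
    return m // n


def div_trunc(m, n):
    """Division truncated towards zero, shown only for contrast."""
    return int(m / n)


for m in (7, -7):
    print(m, div_floor(m, 4), div_trunc(m, 4))
```

A decoder written in a language whose integer division truncates towards zero (C, for instance) would need an explicit adjustment for negative values to match DIV.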
12.3 Payloads for the audio object
Table 12.1 – Syntax of SLSSpecificConfig
Syntax                                      No. of bits  Mnemonics
SLSSpecificConfig(samplingFrequencyIndex,
                  channelConfiguration,
                  audioObjectType)
{
    pcmWordLength;                          3            uimsbf
    aac_core_present;                       1            uimsbf
    lle_main_stream;                        1            uimsbf
    reserved_bit;                           1            uimsbf
    frameLength;                            3            uimsbf
    if (!channelConfiguration) {
        program_config_element();
    }
}
Table 12.2 – Top layer payload for lle stream
Syntax                                      No. of bits  Mnemonics
lle_element()
{
    for (ch = 0; ch < ...; ) {
        if (is_channel_pair(ch)) {
            lle_channel_pair_element();
            ch += 2;
        } else {
            lle_single_channel_element();
            ch++;
        }
    }
}
Table 12.3 – Syntax of lle_single_channel_element
Syntax                                      No. of bits  Mnemonics
lle_single_channel_element()
{
    lle_individual_channel_stream(1);
}
Table 12.4 – Syntax of lle_channel_pair_element
Syntax                                      No. of bits  Mnemonics
lle_channel_pair_element()
{
    lle_individual_channel_stream(1);
    lle_individual_channel_stream(0);
}
Table 12.5 – Syntax of lle_individual_channel_stream
Syntax                                      No. of bits  Mnemonics
lle_individual_channel_stream(is_first_channel)
{
    lle_ics_length;                         16           uimsbf
    if (is_first_channel) {
        element_instance_tag;               4            uimsbf
    }
    lle_reserved_bit;                       1            uimsbf
    if (lle_main_stream) {
        lle_header(is_first_channel);
        lle_side_info();
    }
    lle_data();
    byte_align();
}
Table 12.6 – Syntax of lle_header()
Syntax                                      No. of bits  Mnemonics
lle_header(is_first_channel)
{
    if (lle_channel_pair_element && common_window &&
        is_first_channel) {
        use_stereo_intmdct;                 1            uimsbf
    }
    if (aac_core_present) {
        band_type_signaling;                2            uimsbf
        if (band_type_signaling==1) {
            for (g = 0; g < num_window_groups; g++) {
                for (sfb = 0; sfb < ...; sfb++) {
                    band_type[g][sfb];      1            uimsbf
                }
            }
        }
    } else {
        if (is_first_channel) {
            window_sequence;                2            uimsbf
        }
    }
}
Table 12.7 – Syntax of lle_side_info
Syntax                                      No. of bits  Mnemonics
lle_side_info()
{
    for (g = 0; g < num_window_groups; g++) {
        for (sfb = 0; sfb < ...; sfb++) {
            if (band_type[g][sfb]==Explicit_Band) {
                vcod_dpcm_max_bp[g][sfb];   1..17        bslbf
            }
            if (max_bp[g][sfb] != -1) {
                vcod_lazy_bp[g][sfb];       1..2         bslbf
            }
        }
    }
    cb_cbac;                                1            uimsbf
}
Table 12.8 – Syntax of lle_data
Syntax No. of bits Mnemonics
lle_data()
{
BPGC/CBAC data; varies bslbf
LEMC data; varies bslbf
}
12.4 Semantics
Data elements:
aac_core_present Indicates whether the lossless enhancement operates on top of an MPEG-4
GA T/F core (aac_core_present=1) or in non-core mode
(aac_core_present=0).
lle_main_stream Indicates whether the current stream is an LLE main stream,
which includes all the necessary side information, or an LLE extension stream that
extends the previous LLE stream.
pcmWordLength Quantization word length of the original PCM waveform.
Table 12.9 – Word length of original PCM waveform
pcmWordLength    Word length of original PCM waveform
0 8
1 16
2 20
3 24
4 – 7 Reserved
frameLength Length of the IntMDCT frame in the LLE layer.
Table 12.10 – Length of the IntMDCT frame
frameLength    Length of the IntMDCT frame    Oversampling factor of the IntMDCT filterbank (osf)
0 1024 1
1 2048 2
2 4096 4
3-7 Reserved Reserved
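The fixed part of SLSSpecificConfig (Table 12.1) together with Tables 12.9 and 12.10 can be read as in the following Python sketch. The `BitReader` class, the function name and the returned dictionary keys are illustrative assumptions; the optional program_config_element() that follows when channelConfiguration is 0 is not parsed here.

```python
PCM_WORD_LENGTH = {0: 8, 1: 16, 2: 20, 3: 24}               # Table 12.9
FRAME_LENGTH = {0: (1024, 1), 1: (2048, 2), 2: (4096, 4)}   # Table 12.10: (length, osf)


class BitReader:
    """Hypothetical helper: reads unsigned integers, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # position in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def parse_sls_specific_config(data):
    """Parse the first 9 bits of SLSSpecificConfig; reserved codes raise ValueError."""
    reader = BitReader(data)
    pcm_code = reader.read(3)
    aac_core_present = reader.read(1)
    lle_main_stream = reader.read(1)
    reader.read(1)  # reserved_bit
    frame_code = reader.read(3)
    if pcm_code not in PCM_WORD_LENGTH or frame_code not in FRAME_LENGTH:
        raise ValueError("reserved pcmWordLength or frameLength value")
    frame_length, osf = FRAME_LENGTH[frame_code]
    return {
        "pcm_word_length": PCM_WORD_LENGTH[pcm_code],
        "aac_core_present": aac_core_present,
        "lle_main_stream": lle_main_stream,
        "frame_length": frame_length,
        "osf": osf,
    }


# Bits 001 1 1 0 001: 16-bit PCM, core present, main stream, frameLength code 1.
print(parse_sls_specific_config(bytes([0x38, 0x80])))
```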
element_instance_tag Unique instance tag for syntactic elements. All syntactic elements containing
instance tags may occur more than once, but must have a unique
element_instance_tag in each audio frame. When the MPEG-4 GA T/F core
is present, syntactic elements of SLS and MPEG-4 GA T/F from the same
audio channel use the same element_instance_tag.
lle_ics_length Length of LLE individual channel stream (LLE_ICS) for the current frame; in
bytes.
band_type_signaling By default, the band type for a scale factor band is defined as follows: A scale
factor band that is in a section coded with the zero codebook (ZERO_HCB),
Intensity Stereo (IS) coded, or Perceptual Noise Substitution (PNS) coded is
an Explicit_Band. Otherwise it is an Implicit_Band.
Scale factor bands above max_sfb and in the oversampling range are always
Explicit_Band.
This default band type can be overwritten by band_type_signaling in the
following way:
Table 12.11 – Band type signaling
Value of band_type_signaling    Band type
00 Use default
01 Band type signaling for each sfb follows
10 All sfb are Explicit_Band
11 Reserved
band_type[g][sfb] Band type signaling for each scale factor band when
band_type_signaling==01. A scale factor band is set to Explicit_Band if
band_type[g][sfb] is 0.
Table 12.12 – Band type
Value Band type
0 Explicit_Band
1 Default
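The interaction between the default band type and band_type_signaling (Tables 12.11 and 12.12) can be summarised in a short Python sketch. The function names, the boolean arguments and the string constants are assumptions introduced for illustration only.

```python
EXPLICIT_BAND = "Explicit_Band"
IMPLICIT_BAND = "Implicit_Band"


def default_band_type(zero_hcb, intensity_stereo, pns, above_max_sfb, in_oversampling_range):
    """Default rule from the band_type_signaling semantics."""
    if zero_hcb or intensity_stereo or pns or above_max_sfb or in_oversampling_range:
        return EXPLICIT_BAND
    return IMPLICIT_BAND


def resolve_band_type(band_type_signaling, default, band_type_bit=None):
    """Apply Table 12.11; band_type_bit is band_type[g][sfb] when signaling is 01."""
    if band_type_signaling == 0b00:
        return default
    if band_type_signaling == 0b01:
        # Table 12.12: 0 forces Explicit_Band, 1 keeps the default.
        return EXPLICIT_BAND if band_type_bit == 0 else default
    if band_type_signaling == 0b10:
        return EXPLICIT_BAND
    raise ValueError("band_type_signaling value 11 is reserved")


default = default_band_type(False, False, False, False, False)
print(default, resolve_band_type(0b01, default, band_type_bit=0))
```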
vcod_dpcm_max_bp[g][sfb] The variable length coded maximum bit-plane for scale factor band sfb and
group g.
vcod_lazy_bp[g][sfb] The variable length coded lazy bit-plane for non-zero scale factor band sfb
and group g.
cb_cbac Indication of frequency table that will be used in the LLE decoding process.

Table 12.13 – cb_cbac table
cb_cbac Frequency table
0 BPGC
1 CBAC
bpgc/cbac_data The binary bit-stream of the bpgc/cbac coded residual spectrum data

low_energy_mode_data The binary bit-stream of the LEMC mode coded residual spectrum data

12.5 SLS decoder tool
12.5.1 Overview
The block diagram of the scalable lossless (SLS) decoder is given in Figure 12.1. The core layer MPEG-4 GA
stream is decoded by a deterministic Core Layer decoder. Its output, which is a deterministic spectrum in the
MDCT domain, is sent to the inverse error mapping process. Meanwhile, the residual IntMDCT spectrum,
carried in the LLE layer streams, is decoded and sent to the inverse error mapping process to reconstruct the
IntMDCT spectrum. An inverse integer Mid/Side (M/S) and an inverse integer TNS process are then invoked
and performed on the IntMDCT coefficients if necessary. Finally, its output is inversely transformed by using
the inverse IntMDCT process to produce the PCM audio samples. A detailed description of each process is
given in the subsequent sections.


[Figure 12.1 diagram: the SLS bitstream payload parser separates the MPEG-4 GA stream, decoded by the deterministic MPEG-4 GA decoder, from the LLE stream, decoded by the BPGC/CBAC decoder and the low energy mode decoder. Both feed the inverse error mapping, followed by inverse integer M/S, inverse integer TNS and inverse IntMDCT, which outputs the PCM samples.]
Figure 12.1 – SLS decoder block diagram
12.5.1.1 Non-core Mode
In the non-core mode SLS works as a stand-alone codec without AAC core. In case of the SLS audio object
type this is signalled by aac_core_present=0 for the non-core mode and aac_core_present=1 for the core-
based mode. In case of the SLS non-core audio object type it is always aac_core_present=0.
In the non-core mode the following default values are used:
• window_shape = 0 (sine window)
• if (lle_channel_pair_element) common_window = 1 (on)
• if (use_stereo_intmdct) all M/S flags are on, else all M/S flags are off
• if (window_sequence == EIGHT_SHORT_SEQUENCE) grouping = {2,2,2,2}

[Figure 12.2 diagram: the SLS bitstream payload parser passes the LLE stream to the BPGC/CBAC decoder and the low energy mode decoder; the decoded osf*1024 values go to the inverse IntMDCT, which outputs the PCM samples.]
Figure 12.2 – SLS non-core decoder block diagram

12.5.2 Oversampling technique
The core layer is allowed to operate at a lower sampling rate than the LLE layers. The following table shows
some possible sampling rate combinations.
Table 12.14 – Example combinations of sampling rates for Core and LLE layers
Core@ 48 kHz Core@ 96 kHz Core@ 192 kHz
LLE@ 48 kHz X (osf = 1)
LLE@ 96 kHz X (osf = 2) X (osf = 1)
LLE@ 192 kHz X (osf = 4) X (osf = 2) X (osf = 1)

This technique is referred to as “Oversampling” in the following.
The scalability of the codec using different sampling rates is achieved by changing the length of the inverse
IntMDCT in the decoder accordingly. While the AAC core processes 1024 values in each frame, the SLS
codec needs to process osf*1024 values per frame. This is achieved by extending the length of the inverse
IntMDCT in the decoder to osf*1024 spectral lines. The 1024 inverse quantized spectral values from the AAC
core are added to the 1024 low-frequency values of the SLS residual spectrum. This is illustrated in Figure
12.3.
[Figure 12.3 diagram: same structure as Figure 12.1 with an AAC + LLE bitstream as input; the stages of the LLE path through to the inverse IntMDCT operate on osf*1024 values.]
Figure 12.3 – Structure of SLS decoder with oversampling
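As a rough illustration of the oversampling technique, the following Python sketch derives osf from a pair of sampling rates and adds the 1024 core values to the low-frequency part of an osf*1024 residual spectrum. The function names are assumptions, and the plain addition stands in for the complete inverse error mapping, which is specified elsewhere in this subpart.

```python
def oversampling_factor(lle_rate_hz, core_rate_hz):
    """Ratio between the LLE and core sampling rates; only 1, 2 and 4 are allowed."""
    if lle_rate_hz % core_rate_hz != 0:
        raise ValueError("LLE rate must be an integer multiple of the core rate")
    osf = lle_rate_hz // core_rate_hz
    if osf not in (1, 2, 4):
        raise ValueError("unsupported oversampling factor")
    return osf


def add_core_to_residual(core_values, residual):
    """Add the core spectrum to the low-frequency bins of the longer residual spectrum."""
    if len(residual) % len(core_values) != 0:
        raise ValueError("residual length must be osf times the core length")
    combined = list(residual)
    for i, value in enumerate(core_values):
        combined[i] += value
    return combined


osf = oversampling_factor(96000, 48000)
spectrum = add_core_to_residual([1] * 1024, [2] * (osf * 1024))
print(osf, spectrum[0], spectrum[1024])
```

Bins at index 1024 and above belong to the oversampling range and carry only LLE data.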

12.5.3 SLS with Scalable AAC Core
If the core layer is AAC Scalable, the spectral data decoded from the SLS layers are added to the spectral
data decoded from the AAC Scalable streams with a deterministic inverse AAC quantizer. The resulting
spectral data is then processed with inverse integer M/S and inverse integer TNS process if necessary.
Finally, the output is transformed by the inverse IntMDCT to produce the PCM audio samples. The decoding
process is illustrated in Figure 12.4.

[Figure 12.4 diagram: the bitstream demultiplexer feeds the Scalable AAC layers (Left/Mid, Right/Side and Mono, one or more layers each), each passing through Deterministic Scalable Inverse AAC Quantization (SIAQ) and FSS, and the SLS layers (Left/Mid and Right/Side, one or more layers each). The combined spectra pass through integer inverse M/S, integer inverse TNS and integer inverse MDCT to produce the left and right output time signals.]
Figure 12.4 – Structure of SLS decoder with Scalable AAC core layer streams

12.5.4 Decoding of lle_single_channel_element (LLE_SCE) and lle_channel_pair_element
(LLE_CPE)
12.5.4.1 Definitions
lle_ics_length Length of LLE individual channel stream (LLE_ICS) in bytes.
vcod_dpcm_max_bp[g][sfb] The variable length coded maximum bit-plane for scale factor band sfb
and group g. This element is only present for insignificant scale factor
bands.
vcod_lazy_bp[g][sfb] The variable length coded lazy bit-plane for non-zero scale factor band
sfb and group g.
g Group index.
sfb Scale factor band within group.
win Window index.
bin Frequency bin index.
num_window_groups Number of groups of windows which share one set of scale factors.
num_sfb Number of scale factor bands per short window in case of
EIGHT_SHORT_SEQUENCE, number of scale factor bands for long
windows otherwise.
num_osf_sfb Number of scale factor bands per window in the oversampling range. The
oversampling range is covered by (osf-1)*16 bands with a width of 64 in
case of long windows resp. (osf-1)*4 bands with a width of 32 in case of
short windows.
max_bp[g][sfb] The maximum bit-plane for group g and scale factor band sfb.
lazy_bp[g][sfb] The lazy bit-plane for group g and scale factor band sfb.
read_bits(n) Read n consecutive bits from the input bit-stream in the order of bslbf.
quant[g][win][sfb][bin] AAC quantized spectral data.
interval[g][win][sfb][k] Quantization intervals in the core AAC encoder.
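The band counts given for num_osf_sfb can be cross-checked against the size of the oversampling range with a small Python sketch. The function names are assumptions made for this illustration.

```python
def oversampling_bands(osf, short_window):
    """Return (number of bands, band width) in the oversampling range per window."""
    if short_window:
        return (osf - 1) * 4, 32
    return (osf - 1) * 16, 64


def oversampling_range(osf, short_window):
    """Number of frequency values per window in the oversampling range."""
    return (osf - 1) * (128 if short_window else 1024)


for osf in (1, 2, 4):
    for short_window in (False, True):
        count, width = oversampling_bands(osf, short_window)
        print(osf, short_window, count, width, count * width)
```

In every case the bands cover exactly the (osf-1)*1024 or (osf-1)*128 values of the oversampling range.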

12.5.4.2 Decoding process
12.5.4.2.1 LLE_SCE and LLE_CPE
An LLE_SCE is composed of an lle_individual_channel_stream (LLE_ICS) while an LLE_CPE has two
lle_individual_channel_streams (LLE_ICS).
12.5.4.2.2 Decoding an LLE_ICS
In the LLE_ICS, the order of the decoding process is given in the following flowchart:

Get lle_ics_length -> Get LLE decoding side information -> Get BPGC/CBAC data -> Get LEMC data
Figure 12.5 – Process of decoding LLE_ICS

For an SLS bit-stream composed of an lle_main stream (lle_main_stream = 1) and one or more lle_extension
streams (lle_main_stream = 0), the lle_data() for each LLE_ICS is constructed by concatenating the lle_data()
elements from the lle_main stream and all the available lle_extension streams in sequence, as shown in the
following figure:

[Figure 12.6 diagram: the LLE decoding side information and lle_data() of the lle_main stream, followed by the lle_data() of lle_extension (layer 1) through lle_extension (layer N).]
Figure 12.6 – Construction of LLE_ICS from multiple LLE streams

If an intermediate lle_extension stream is missing, the data in lle_data() of the subsequent streams
cannot be used.
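The concatenation rule, including the stop at the first missing extension layer, might be sketched in Python as follows. The function name and the representation of each lle_data() payload as a bytes object (with None for a missing layer) are assumptions made for this illustration.

```python
def assemble_lle_data(main_data, extension_data):
    """Concatenate lle_data() of the main stream and of consecutive extension layers.

    extension_data[k] holds the payload of extension layer k+1, or None if that
    layer is missing. Layers after the first missing one are ignored.
    """
    assembled = bytearray(main_data)
    for payload in extension_data:
        if payload is None:
            break  # a gap makes all subsequent layers unusable
        assembled += payload
    return bytes(assembled)


print(assemble_lle_data(b"M", [b"1", None, b"3"]))
```

In this example layer 3 is discarded because layer 2 is missing, so only the main stream and layer 1 contribute.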
12.5.4.2.3 Recovering BPGC/CBAC side information
For each scale factor band of band type Explicit_Band, a maximum bit-plane (max_bp) is transmitted. In
addition, for each scale factor band, a lazy bit-plane (lazy_bp) is transmitted unless the residual spectral data
is all zero for this scale factor band (which is signalled by maximum bit-plane = -1). The max_bp is coded
using variable length coded DPCM relative to the previously transmitted maximum bit-plane. The first value in
each window group is coded using 5 bits PCM. The max_bp value is coded in unary representation. The
following table gives some examples of how the DPCM value of max_bp is coded.
Table 12.15 – Codeword for decoding the DPCM value of max_bp
DPCM max_bp codeword codeword length
0 1 1
(s)1 01(s) 3
(s)2 001(s) 4
… … …
(s)10 00000000001(s) 12
… … …
The difference between max_bp and lazy_bp, whose value is within the range {1, 2, 3} is decoded as follows:
Table 12.16 – Codeword for decoding the difference between max_bp and lazy_bp
max_bp - lazy_bp codeword codeword length
1 10 2
2 0 1
3 11 2
The following pseudo code illustrates the decoding process for max_bp and lazy_bp.
for (g = 0; g < num_window_groups; g++) {
    init = 0;
    for (sfb = 0; sfb < ...; sfb++) {
        if (band_type[g][sfb] == Explicit_Band) {
            if (!init) {
                /* first value in the window group: 5 bits PCM */
                max_bp[g][sfb] = read_bits(5) - 1;
                init++;
            }
            else {
                /* unary coded DPCM magnitude, followed by a sign bit if non-zero */
                m = 0;
                while (read_bits(1) == 0) m++;
                if (m) {
                    if (read_bits(1)) m = -m;
                }
                max_bp[g][sfb] = m0 - m;
            }
            m0 = max_bp[g][sfb];
        }
        if (max_bp[g][sfb] >= 0) {
            if (read_bits(1) == 0)
                lazy_bp[g][sfb] = max_bp[g][sfb] - 2;
            else {
                if (read_bits(1) == 0) lazy_bp[g][sfb] = max_bp[g][sfb] - 1;
                else lazy_bp[g][sfb] = max_bp[g][sfb] - 3;
            }
        }
    }
}
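A runnable Python transcription of the pseudo code above is sketched here, under stated assumptions: the `BitReader` class, the function name and the argument layout are hypothetical, the reader raises at the end of the input instead of returning zeros, and max_bp for implicit bands is passed in rather than derived from the core layer.

```python
class BitReader:
    """Hypothetical reader over a string of '0'/'1' characters."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            if self.pos >= len(self.bits):
                raise EOFError("no more bits")
            value = (value << 1) | int(self.bits[self.pos])
            self.pos += 1
        return value


def decode_side_info(reader, band_is_explicit, implicit_max_bp):
    """Decode max_bp and lazy_bp per group and scale factor band.

    band_is_explicit[g][sfb] is True for Explicit_Band; implicit_max_bp[g][sfb]
    supplies max_bp for implicit bands. lazy_bp is None where none is transmitted.
    """
    max_bp, lazy_bp = [], []
    m0 = 0
    for g, row in enumerate(band_is_explicit):
        init = False
        group_max, group_lazy = [], []
        for sfb, explicit in enumerate(row):
            if explicit:
                if not init:
                    bp = reader.read_bits(5) - 1      # 5 bits PCM
                    init = True
                else:
                    m = 0
                    while reader.read_bits(1) == 0:   # unary magnitude
                        m += 1
                    if m and reader.read_bits(1):     # sign bit
                        m = -m
                    bp = m0 - m
                m0 = bp
            else:
                bp = implicit_max_bp[g][sfb]
            lazy = None
            if bp >= 0:
                if reader.read_bits(1) == 0:
                    lazy = bp - 2                     # codeword 0
                elif reader.read_bits(1) == 0:
                    lazy = bp - 1                     # codeword 10
                else:
                    lazy = bp - 3                     # codeword 11
            group_max.append(bp)
            group_lazy.append(lazy)
        max_bp.append(group_max)
        lazy_bp.append(group_lazy)
    return max_bp, lazy_bp


bits = "00110" + "0" + "010" + "10" + "1" + "11"
print(decode_side_info(BitReader(bits), [[True, True, True]], [[None, None, None]]))
```

The example bit string encodes three explicit bands: a PCM value of 6 (max_bp 5), a DPCM step of 1, and a DPCM step of 0, each followed by a lazy bit-plane codeword from Table 12.16.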
For Implicit_Bands, max_bp[g][sfb] is calculated from the quantization thresholds of the core layer quantizer
as follows:
As the first step, the maximum bit-plane M for each residual spectral bin for significant scale factor bands can
be calculated from
$$M[g][win][sfb][bin] = \mathrm{INT}\left[\log_2\left\{\mathit{interval}[g][win][sfb][bin]\right\}\right]$$
where interval[g][win][sfb][bin] is the quantization interval that is given by:
$$\mathit{interval}[g][win][sfb][bin] = \mathrm{thr}\left(\mathit{quant}[g][win][sfb][bin] + 1\right) - \mathrm{thr}\left(\mathit{quant}[g][win][sfb][bin]\right) + 1.$$
Here thr(x) and inv_quant(x) are, respectively, the deterministic quantization threshold and the corresponding
deterministic inverse quantization for AAC quantizer. They are calculated as in the following pseudo code:
If (x==0)
thr(x)=0;
else
thr(x) = (thrMantissa(|x|-1, scale_res))<<(12+scale_int);

inv_quant(x) = (invQuantMantissa(|x|,scale_res))<<(12+scale_int);

where
scale_int = DIV(scale,4)
scale_res = scale - scale_int*4, and
scale=scale_factor(sfb)+core_scaling_factor+scale_osf-118.
The value of core_scaling_factor is given in Table 12.17.
Table 12.17 – Table for core_scaling_factor
sfb Type / Word Length             16    20    24
Long Window (2048), M/S 0 16 32
Long Window (2048), non M/S 2 18 34
Short Window (256), M/S 6 22 38
Short Window (256), non M/S 8 24 40

Table 12.18 – Table for scale_osf
osf 1 2 4
scale_osf 0 2 4
The functions thrMantissa() and invQuantMantissa() are defined in 12.5.11.
For scalefactor bands coded with IS or PNS the value of inv_quant(x) is set to 0.
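The computation of scale, scale_int and scale_res from Tables 12.17 and 12.18 can be sketched in Python as below. The function name, the argument names and the dictionary layout are assumptions; thrMantissa() and invQuantMantissa() are not reproduced because they are defined in 12.5.11.

```python
# Table 12.17, keyed by (window type, M/S flag), then by PCM word length.
CORE_SCALING_FACTOR = {
    ("long", True): {16: 0, 20: 16, 24: 32},
    ("long", False): {16: 2, 20: 18, 24: 34},
    ("short", True): {16: 6, 20: 22, 24: 38},
    ("short", False): {16: 8, 20: 24, 24: 40},
}
SCALE_OSF = {1: 0, 2: 2, 4: 4}  # Table 12.18


def scale_parts(scale_factor, word_length, window, ms, osf):
    """Return (scale, scale_int, scale_res) for one scale factor band."""
    core_scaling_factor = CORE_SCALING_FACTOR[(window, ms)][word_length]
    scale = scale_factor + core_scaling_factor + SCALE_OSF[osf] - 118
    scale_int = scale // 4            # DIV(scale, 4): rounds towards minus infinity
    scale_res = scale - scale_int * 4
    return scale, scale_int, scale_res


print(scale_parts(120, 16, "long", False, 1))
print(scale_parts(111, 16, "long", True, 1))
```

Because DIV rounds towards −∞, scale_res stays in the range 0 to 3 even when scale is negative, as the second call illustrates.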
The maximum bit-plane max_bp for each sfb is the maximum value of M for spectral data that belongs to that
sfb:
$$\mathit{max\_bp}[g][sfb] = \max\left(M[g][win][sfb][bin]\right)$$
12.5.5 Decoding of lle_data
12.5.5.1 Definitions
lle_data() Part of the bit-stream which contains the coded residual spectrum data.
window_group_len[g] Number of windows in each group.
is_lle_ics_eof() An auxiliary function to detect the end of LLE_ICS.
read_bits(n) Read n consecutive bits from the input bit-stream in the order of bslbf. If there
exists no bit to be fed in the bitstream, it returns ‘0’ by default.
cur_bp[g][sfb] The current decoded bit-plane.
res[g][win][sfb][k] The reconstructed residual integer spectral data vector.
amp[g][win][sfb][k] The amplitude of the reconstructed residual integer spectral data vector.
sign[g][win][sfb][k] The sign of the reconstructed residual integer spectral data vector.
determine_frequency() The function to determine the probability of the symbol '1' according to either
the CBAC or the BPGC frequency table.
ambiguity_check(f) The function to detect ambiguity for the arithmetic decoding. The argument f
indicates the probability of the symbol '1'.
terminate_decoding() The function to terminate decoding of the LLE data when ambiguity occurs.
smart_decoding_cbac_bpgc() The function to decode additional symbols in the absence of incoming bits in
the cbac/bpgc mode decoding. This decoding continues up to the point where
there exists no ambiguity. It includes ambiguity_check(f) and
terminate_decoding().
smart_decoding_low_energy() The function to decode additional symbols in the absence of incoming bits in
the low energy mode decoding. It also includes ambiguity_check(f) and
terminate_decoding().
12.5.5.2 Decoding process
12.5.5.2.1 Overview
The residual integer spectral data vector is decoded from the LLE data stream lle_data(). Firstly, all scale
factor bands with lazy_bp > 0 are BPGC/CBAC decoded, where the amplitude of the residual spectral data
res is bit-plane decoded starting from the maximum bit-plane max_bp and progressing to lower bit-planes until
bit-plane 0 for each scale factor band. Subsequently, the low energy mode decoding is invoked to decode the
remaining scale factor bands with lazy_bp <= 0.
The SLS decoder provides fine-grain scalability (FGS) by truncating the LLE bitstream.
Moreover, it can decode additional symbols beyond the point of truncation by exploiting the properties of
arithmetic coding.
12.5.5.2.2 BPGC/CBAC decoding process
The BPGC decoding or CBAC decoding process is performed on scale factor bands for which lazy_bp>0. The
BPGC/CBAC bit-plane decoding process is used to decode the bit-plane symbols for reconstructing the
residual integer spectral data res. The bit-plane decoding process is started from max_bp for each sfb, and
progressively proceeds to lower bit-planes. For the first NUM_BP bit-plane scans the bit-plane symbols are
arithmetic decoded as illustrated in the following pseudo code:
/* preparing the help element */
for (g = 0; g < num_window_groups; g++) {
    for (sfb = 0; sfb < ...; sfb++) {
        width = swb_offset[g][sfb+1] - swb_offset[g][sfb];
        for (win = 0; win < window_group_len[g]; win++) {
            for (bin = 0; bin < width; bin++) {
                is_sig[g][win][sfb][bin] =
                    (quant[g][win][sfb][bin]) && (band_type[g][sfb] == Implicit_Band) ? 1 : 0;
                /* sign will be determined implicitly if is_sig == 1 */
                res[g][win][sfb][bin] = 0;
            }
        }
        cur_bp[g][sfb] = max_bp[g][sfb];
    }
}
/* BPGC/CBAC decoding */
while ((max_bp[g][sfb] - cur_bp[g][sfb] < NUM_BP) && (cur_bp[g][sfb] >= 0)) {
    for (g = 0; g < num_window_groups; g++) {
        for (sfb = 0; sfb < ...; sfb++) {
            if ((cur_bp[g][sfb] >= 0) && (lazy_bp[g][sfb] > 0)) {
                width = swb_offset[g][sfb+1] - swb_offset[g][sfb];
                for (win = 0; win < window_group_len[g]; win++) {
                    for (bin = 0; bin < width; bin++) {
                        if (!is_lle_ics_eof()) {
                            if (interval[g][win][sfb][bin] >
                                res[g][win][sfb][bin] + (1 << cur_bp[g][sfb])) {
                                freq = determine_frequency();
                                res[g][win][sfb][bin] += decode(freq) << cur_bp[g][sfb];
                                /* decode bit-plane cur_bp */
                                if ((!is_sig[g][win][sfb][bin]) && (res[g][win][sfb][bin])) {
                                    /* decode sign bit of res if necessary */
                                    res[g][win][sfb][bin] *= (decode(freq_sign)) ? 1 : -1;
                                    is_sig[g][win][sfb][bin] = 1;
                                }
                            }
                        }
                        else {
                            smart_decoding_cbac_bpgc();
                        }
                    }
                }
                cur_bp[g][sfb]--; /* progress to next bit-plane */
            }
        }
    }
}
After that, BPGC/CBAC enters the lazy decoding mode after skipping the 2 bit terminating string, where the
bit-plane symbols are directly read from the input bit-stream:

/* BPGC/CBAC lazy decoding */
read_bits(1); read_bits(1); /* skip the 2 bit AC termination string before lazy coding */
while (cur_bp[g][sfb] >= 0) {
    for (g = 0; g < num_window_groups; g++) {
        for (sfb = 0; sfb < ...; sfb++) {
            if ((cur_bp[g][sfb] >= 0) && (lazy_bp[g][sfb] > 0)) {
                width = swb_offset[g][sfb+1] - swb_offset[g][sfb];
                for (win = 0; win < window_group_len[g]; win++) {
                    for (bin = 0; bin < width; bin++) {
                        if (!is_lle_ics_eof()) {
                            if (interval[g][win][sfb][bin] >
                                res[g][win][sfb][bin] + (1 << cur_bp[g][sfb])) {
                                res[g][win][sfb][bin] += read_bits(1) << cur_bp[g][sfb];
                                /* decode bit-plane cur_bp */
                                if ((!is_sig[g][win][sfb][bin]) && (res[g][win][sfb][bin])) {
                                    /* decode sign bit of res if necessary */
                                    res[g][win][sfb][bin] *= (read_bits(1)) ? 1 : -1;
                                    is_sig[g][win][sfb][bin] = 1;
                                }
                            }
                        }
                    }
                }
                cur_bp[g][sfb]--;
            }
        }
    }
}
The value of NUM_BP is given in the following table.
Table 12.19 – Value of NUM_BP
cb_cbac NUM_BP
0 (BPGC) 4
1 (CBAC) 6
The probability assignment freq in the above BPGC/CBAC decoding process is either the BPGC frequency
freq_bpgc or the CBAC frequency freq_cbac depending on whether the current LLE_ICS is decoded with the
BPGC or the CBAC frequency table. freq_bpgc is determined by the relationship of the cur_bp to the lazy_bp
parameter as given in the following table. The sign bits in the above decoding process are decoded with
frequency 8192, i.e., freq_sign = 8192.
Table 12.20 – freq_bpgc table
cur_bp        BPGC frequency
lazy_bp+3     64
lazy_bp+2     964
lazy_bp+1     3277
lazy_bp       5461
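As a reading aid, Table 12.20 can be written as a small lookup. The following is only a sketch: the function name freq_bpgc_lookup and the return value -1 for offsets outside the table are assumptions made for illustration and are not defined by this amendment. With PRE_SHT = 14 (see 12.5.5.2.4), a frequency f gives the symbol 1 a share of f/16384 of the coding interval.

```c
/* Hypothetical helper (not part of the standard): BPGC frequency for the
 * bit-plane cur_bp, given lazy_bp, following Table 12.20.
 * Returns -1 when cur_bp - lazy_bp is outside the tabulated range 0..3. */
int freq_bpgc_lookup(int cur_bp, int lazy_bp)
{
    switch (cur_bp - lazy_bp) {
    case 3:  return 64;
    case 2:  return 964;
    case 1:  return 3277;
    case 0:  return 5461;
    default: return -1;   /* assumption: not covered by Table 12.20 */
    }
}
```

For example, freq_bpgc_lookup(7, 4) selects the row lazy_bp+3 and freq_bpgc_lookup(4, 4) selects the row lazy_bp.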
The value freq_cbac is determined by the context of the bit-plane symbol that is currently being decoded.
CBAC uses the following three types of context.
• Context 1: frequency band (fb)
The fb context is determined by the index of the interleaved residual IntMDCT spectral data c[i],
i=0,…,1024*osf-1, and the sampling rate of the current LLE layer as shown in the following table.
Table 12.21 – Frequency band (fb) context [frequency bin]
Context no       Sampling rate
                 44100        48000        96000       192000      Other
0 (Low Band)     0 – 185      0 – 169      0 – 84      0 – 42      0 – 338
1 (Mid Band)     186 – 511    170 – 469    85 – 234    43 – 117    338 – 938
2 (High Band)    > 511        > 469        > 234       > 117       > 938

• Context 2: significant state (ss)
For interleaved residual IntMDCT spectral data c[i], i=0,…,1024*osf-1 that is insignificant (i.e., the bit-plane
symbols of c[i] decoded so far are all zeroes) the ss context is determined by the significance of its adjacent
spectral data:
\[
\text{sig\_cx}(i,bp) = \{\, \text{sig\_state}(i-2,bp),\ \text{sig\_state}(i-1,bp),\ \text{sig\_state}(i+1,bp),\ \text{sig\_state}(i+2,bp) \,\}
\]
where sig_state(i,bp) is defined as:
\[
\text{sig\_state}(i,bp) =
\begin{cases}
0, & c[i] \text{ is insignificant before bit-plane } bp \\
1, & c[i] \text{ is significant before bit-plane } bp
\end{cases}
\]
and sig_state(i,bp) is defined as 0 if i is smaller than 0 or larger than the IntMDCT length.
For c[i] that is already significant, the ss context is determined by the band type of the scalefactor band that it
is from:
\[
\text{sig\_core}(i) =
\begin{cases}
0, & c[i] \text{ is from an Explicit Band} \\
1, & c[i] \text{ is from an Implicit Band}
\end{cases}
\]

Furthermore, for the latter case, the ss context is further determined according to the value of
quant_interval(i,bp), defined as:
\[
\text{quant\_interval}(i,bp) =
\begin{cases}
0, & \text{rec\_spectrum}[i] + 2^{bp+1} \le \text{interval}[i] \\
1, & \text{rec\_spectrum}[i] + 2^{bp} \le \text{interval}[i] < \text{rec\_spectrum}[i] + 2^{bp+1}
\end{cases}
\]
The detailed context assignment of the ss context is summarized in the following table:
Table 12.22 – Significance state (SS) context
Context no   sig_state(i,cur_bp)   sig_cx(i,cur_bp)          sig_core(i)   quant_interval(i)
0            0                     {0,0,0,0}                 x             x
1            0                     {0,0,0,1}, {1,0,0,0}      x             x
2            0                     {0,0,1,0}, {0,1,0,0}      x             x
3            0                     {0,0,1,1}, {1,1,0,0}      x             x
4            0                     {0,1,0,1}, {1,0,1,0}      x             x
5            0                     {0,1,1,0}                 x             x
6            0                     {0,1,1,1}, {1,1,1,0}      x             x
7            0                     {1,0,0,1}                 x             x
8            0                     {1,0,1,1}, {1,1,0,1}      x             x
9            0                     {1,1,1,1}                 x             x
10           1                     x                         0             x
11           1                     x                         1             0
12           1                     x                         1             1
• Context 3: distance to lazy (d2l)
The d2l context is determined by the distance of cur_bp to the lazy_bp parameter of the currently decoded bit-
plane symbol. The detailed assignment is listed in the following table.
Table 12.23 – Distance to lazy (D2L) context
Context no   cur_bp - lazy_bp
0            < -2
1            -2
2            -1
3            0
4            1
5            2
6            3
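The three context definitions above can be sketched in C as follows. This is only an illustration under stated assumptions, not normative text: the function names fb_context, ss_context, d2l_context and sig_at are hypothetical; the array sig is assumed to hold sig_state(j,bp) for every index j as it was before the current bit-plane; "larger than the IntMDCT length" is taken to mean j >= len; index 338 in the "Other" column, which Table 12.21 lists in both the low and the mid band, is assigned to the low band here; and -1 is returned where Table 12.23 has no entry.

```c
/* Frequency band (fb) context of spectral index i, after Table 12.21. */
int fb_context(int i, long sampling_rate)
{
    int low_end, mid_end;
    switch (sampling_rate) {
    case 44100:  low_end = 185; mid_end = 511; break;
    case 48000:  low_end = 169; mid_end = 469; break;
    case 96000:  low_end = 84;  mid_end = 234; break;
    case 192000: low_end = 42;  mid_end = 117; break;
    default:     low_end = 338; mid_end = 938; break; /* "Other" column */
    }
    if (i <= low_end) return 0;   /* low band  */
    if (i <= mid_end) return 1;   /* mid band  */
    return 2;                     /* high band */
}

/* sig_state of index j; indices outside [0, len) count as insignificant. */
static int sig_at(const unsigned char *sig, int len, int j)
{
    return (j < 0 || j >= len) ? 0 : (sig[j] != 0);
}

/* Significance state (ss) context of spectral index i, after Table 12.22. */
int ss_context(const unsigned char *sig, int len, int i,
               int sig_core, int quant_interval)
{
    /* Indexed by the 4-bit pattern
     * {sig_state(i-2), sig_state(i-1), sig_state(i+1), sig_state(i+2)},
     * first element as the most significant bit. */
    static const int cx_map[16] = { 0, 1, 2, 3, 2, 4, 5, 6,
                                    1, 7, 4, 8, 3, 8, 6, 9 };
    int pattern;

    if (sig_at(sig, len, i))      /* c[i] already significant: contexts 10..12 */
        return (sig_core == 0) ? 10 : (quant_interval ? 12 : 11);

    pattern = (sig_at(sig, len, i - 2) << 3) | (sig_at(sig, len, i - 1) << 2)
            | (sig_at(sig, len, i + 1) << 1) |  sig_at(sig, len, i + 2);
    return cx_map[pattern];
}

/* Distance to lazy (d2l) context, after Table 12.23. */
int d2l_context(int cur_bp, int lazy_bp)
{
    int d = cur_bp - lazy_bp;
    if (d < -2) return 0;
    if (d > 3)  return -1;        /* assumption: not tabulated */
    return d + 3;
}
```

The 16-entry map is a direct transcription of the sig_cx column of Table 12.22, so symmetric neighbour patterns such as {0,0,0,1} and {1,0,0,0} share one context.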
The frequencies freq_cbac for each context are given in the following table.
Table 12.24 – freq_cbac table
(rows: fb*13+ss; columns: d2l)
fb*13+ss 0 1 2 3 4 5 6
0 8192 7823 7826 6506 4817 2186 1053
1 8192 8344 7983 6440 4202 1362 64
2 8192 8399 8382 7016 4202 1234 64
3 8192 8305 7960 6365 3963 1285 64
4 8192 8335 8146 6655 3746 825 64
5 8192 8473 8244 6726 3929 927 64
6 8192 8398 7919 6098 3581 875 64
7 8192 8359 8028 6382 3459 631 64
8 8192 8192 8192 5461 3277 964 64
9 8192 8333 7481 5288 3076 732 64
10 8192 7658 6898 5145 1424 1636 64
11 8192 5471 5732 6264 4890 1279 93
12 8192 8180 8136 7897 5715 1553 64
13 8192 7242 6876 6083 3604 1214 950
14 8192 7897 7570 6583 3733 1067 900
15 8192 8071 7928 7069 4294 1406 1200
16 8192 8197 7952 6906 4050 1457 1101
17 8192 8278 8039 7094 4160 1381 64
18 8192 8307 8139 7263 4407 1555 64
19 8192 8339 8124 7065 4074 1636 64
20 8192 8213 7918 6827 3787 1161 64
21 8192 8286 8067 6902 3855 1387 64
22 8192 8336 8072 6705 3731 1558 64
23 8192 7636 6962 5036 1985 1037 64
24 8192 5519 5270 5238 4778 1588 219
25 8192 7884 7528 6743 4848 1970 64
26 8192 6084 6323 5929 3321 900 385
27 8192 7862 7618 6728 4409 1431 1302
28 8192 8078 7871 7081 5119 2371 1670
29 8192 8294 8046 7239 5218 2032 967
30 8192 8378 8119 7351 5413 1947 64
31 8192 8378 8207 7491 5624 2444 64
32 8192 8484 8302 7626 5514 2021 64
33 8192 8302 8006 7192 4941 1561 64
34 8192 8464 8246 7510 5217 1780 64
35 8192 8544 8442 7742 4944 2010 64
36 8192 7556 6771 4859 2638 2155 64
37 8192 5916 4780 4713 4239 1240 182
38 8192 7658 7095 5986 3886 1394 64
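Table 12.24 is addressed by the row index fb*13+ss and the column index d2l. A minimal sketch of that addressing is given below; the function name freq_cbac_lookup, the macro names and the -1 error value are assumptions for illustration, and the caller is assumed to supply the 39 x 7 table contents.

```c
#define CBAC_NUM_SS    13   /* ss contexts 0..12 (Table 12.22) */
#define CBAC_NUM_ROWS  39   /* 3 fb contexts * 13 ss contexts  */
#define CBAC_NUM_D2L    7   /* d2l contexts 0..6 (Table 12.23) */

/* Hypothetical lookup into a caller-supplied copy of Table 12.24.
 * Returns -1 if any context number is out of range. */
int freq_cbac_lookup(const int table[CBAC_NUM_ROWS][CBAC_NUM_D2L],
                     int fb, int ss, int d2l)
{
    if (fb < 0 || fb > 2 || ss < 0 || ss >= CBAC_NUM_SS ||
        d2l < 0 || d2l >= CBAC_NUM_D2L)
        return -1;
    return table[fb * CBAC_NUM_SS + ss][d2l];
}
```

For example, fb = 2 and ss = 12 address row 38, so with d2l = 5 the lookup yields 1394.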

12.5.5.2.3 Low Energy Mode Code (LEMC) decoding
The following pseudo code illustrates the LEMC decoding process that is performed on scale factor bands for
which lazy_bp<=0.
/* low energy mode decoding */
for (g = 0; g < num_window_groups; g++){
  for (sfb = 0; sfb < ...; sfb++){
    if ((cur_bp[g][sfb] >= 0) && (lazy_bp[g][sfb] <= 0))
    {
      width = swb_offset[g][sfb+1] - swb_offset[g][sfb];
      for (win = 0; win < ...; win++){
        for (bin = 0; bin < width; bin++){
          res[g][win][sfb][bin] = 0;
          pos = 0;
          if (!is_lle_ics_eof()){
            /* decoding of binary string and reconstructing res */
            while (decode(freq_silence[pos]) == 1) {
              res[g][win][sfb][bin]++;
              pos++;
              if (pos > 2) pos = 2;
              if (res[g][win][sfb][bin] == (1 << (max_bp[g][sfb]+1)) - 1) break;
            }
            /* decoding of sign of res */
            if ((!is_sig[g][win][sfb][bin]) && res[g][win][sfb][bin]){
              res[g][win][sfb][bin] *= (decode(freq_sign)) ? -1 : 1;
              is_sig[g][win][sfb][bin] = 1;
            }
          }
          else smart_decoding_low_energy();
        }
      }
    }
  }
}
The probability assignments for the low energy mode decoding, freq_bpgc and freq_silence are given in the
following tables. The sign bits in the above decoding process are decoded with frequency 8192, i.e. freq_sign
= 8192.
Table 12.25 – freq_silence table
pos      lazy_bp
         0        -1       -2       -3
0        12603    9638     6554     3810
1        7447     3344     1820     X
>1       6302     745      552      X
The following table defines the mapping between the binary string decoded in case of the low energy mode
and the residual spectral data res. The sign bit of res is decoded after the first non-zero bit-plane symbol has
been decoded.
Table 12.26 – Binarization of res in low energy mode coding
Amplitude of res[g][win][sfb][bin]    Binary string
0                                     0
1                                     1 0
2                                     1 1 0
3                                     1 1 1 0
4                                     1 1 1 1 0
…                                     …
2^(max_bp[g][sfb]+1)-2                1 1 … … … 1 0
2^(max_bp[g][sfb]+1)-1                1 1 … … … 1 1
pos                                   0 1 2 3 …
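The binarization in Table 12.26 is a unary code that is cut short at the largest amplitude, 2^(max_bp[g][sfb]+1)-1, where no terminating 0 is sent. The sketch below shows only that mapping: the arithmetic decoder call decode(freq_silence[pos]) is replaced by a hypothetical array reader, and the names bit_src, next_symbol and lemc_amplitude are assumptions made for this example.

```c
/* Hypothetical stand-in for decode(freq_silence[pos]): symbols come from an
 * array of already decoded binary symbols instead of the arithmetic decoder. */
typedef struct {
    const int *bits;  /* binary symbols, each 0 or 1 */
    int n;            /* number of symbols available */
    int k;            /* next symbol to be returned  */
} bit_src;

static int next_symbol(bit_src *s, int pos)
{
    (void)pos;        /* the real decoder would select freq_silence[pos] */
    return (s->k < s->n) ? s->bits[s->k++] : 0;
}

/* Amplitude of one residual value in low energy mode, after Table 12.26. */
int lemc_amplitude(bit_src *s, int max_bp)
{
    const int limit = (1 << (max_bp + 1)) - 1;  /* largest amplitude */
    int amp = 0;
    int pos = 0;

    while (next_symbol(s, pos) == 1) {
        amp++;
        if (++pos > 2) pos = 2;    /* pos saturates at 2 */
        if (amp == limit) break;   /* largest amplitude: no terminating 0 */
    }
    return amp;
}
```

With max_bp = 2 the string 1 1 0 decodes to amplitude 2 and consumes three symbols; with max_bp = 1 the string 1 1 1 decodes to the largest amplitude 3 without reading a terminating 0.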
12.5.5.2.4 Arithmetic decoding
The following pseudo code illustrates the integer arithmetic decoding process used in the BPGC/CBAC and
the low energy mode decoding process.
Definitions:
#define CODE_WL 16
#define PRE_SHT 14
#define TOP_VALUE (((long)1<<CODE_WL)-1)
#define QTR_VALUE (TOP_VALUE/4+1)
#define HALF_VALUE (2*QTR_VALUE)
#define TRDQTR_VALUE (3*QTR_VALUE)

Initialization:
low = 0;
high = TOP_VALUE;
value = 0;
The decoding subroutine
int decode(int freq)
{
  range = (long)((high-low)+1);
  cumu = ((long)((value-low)+1) << PRE_SHT) - 1;
  if (cumu < range*freq) {
    sym = 1;
    high = low + (range*freq >> PRE_SHT) - 1;
  }
  else {
    sym = 0;
    low = low + (range*freq >> PRE_SHT);
  }
  for (;;) {
    if (high < HALF_VALUE) {
      /* nothing to subtract */
    } else if (low >= HALF_VALUE) {
      value -= HALF_VALUE;
      low -= HALF_VALUE;
      high -= HALF_VALUE;
    } else if (low >= QTR_VALUE && high < TRDQTR_VALUE) {
      value -= QTR_VALUE;
      low -= QTR_VALUE;
      high -= QTR_VALUE;
    } else
      break;
    low = 2*low;
    high = 2*high+1;
    value = 2*value + read_bits(1); /* input next bit from bit-stream */
  }
  return sym;
}
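The interval update in decode() can be checked with a small worked example. The sketch below reproduces only the two branches that narrow the interval, not the renormalization loop or the bit input; the type ac_interval and the function ac_narrow are hypothetical names, and the starting interval [0, 65535] assumes a 16-bit code word (CODE_WL = 16).

```c
#define PRE_SHT 14

typedef struct {
    long low;
    long high;
} ac_interval;

/* Narrow the coding interval for one binary symbol, following the
 * sym == 1 and sym == 0 branches of decode(). */
ac_interval ac_narrow(ac_interval iv, int freq, int sym)
{
    long range = (iv.high - iv.low) + 1;
    long split = (range * freq) >> PRE_SHT;  /* width assigned to symbol 1 */

    if (sym == 1)
        iv.high = iv.low + split - 1;        /* lower part of the interval */
    else
        iv.low = iv.low + split;             /* upper part of the interval */
    return iv;
}
```

Starting from [0, 65535] with freq = 8192 (the sign-bit frequency), the split is 65536*8192 >> 14 = 32768, so symbol 1 maps to [0, 32767] and symbol 0 to [32768, 65535]. With freq = 64 the symbol 1 receives only [0, 255].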
12.5.5.2.5 Smart arithmetic decoding of truncated SLS bitstreams
The smart arithmetic decoding provides an efficient way to decode an intermediate layer corresponding to a
given target bitrate. This algorithm exploits the fact that a decoding buffer still contains meaningful information
for arithmetic decoding even if there is no bit left to be fed into the decoding buffer. The decoding process
continues as long as there exists no ambiguity in determining a symbol.
The following pseudo code illustrates the algorithm used to detect ambiguity in the arithmetic decoding module.
The variable num_dummy_bits is the number of times read_bits(1) has been called by the
arithmetic decoding process after the truncation point.
int ambiguity_check(int freq)
{
  /* if there is no ambiguity, returns 1 */
  /* otherwise, returns 0 */
  upper = 1 << num_dummy_bits;
  decisionVal = ((high-low)*freq >> PRE_SHT) - value + low - 1;
  if (decisionVal > upper || decisionVal < 0) return 0;
  else return 1;
}
Either smart_decoding_cbac_bpgc() or smart_decoding_low_energy() is executed when num_dummy_bits is
greater than 0. In order to prevent sign bit errors, the spectral value of the current spectral line should be set
to zero when an ambiguity can occur while decoding a sign bit. Notice that all index variables in the smart
decoding process should be carried over from the previous arithmetic decoding process.

smart_decoding_cbac_bpgc()
{
  /* BPGC/CBAC normal decoding with ambiguity detection */
  while ((max_bp[g][sfb] - cur_bp[g][sfb] < NUM_BP) && (cur_bp[g][sfb] >= 0)){
    for (; g < num_window_groups; g++){
      for (; sfb < ...; sfb++){
        if ((cur_bp[g][sfb] >= 0) && (lazy_bp[g][sfb] > 0)){
          width = swb_offset[g][sfb+1] - swb_offset[g][sfb];
          for (; win < ...; win++){
            for (; bin < width; bin++){
              if (interval[g][win][sfb][bin] >
                  res[g][win][sfb][bin] + (1 << cur_bp[g][sfb]))
              {
                freq = determine_frequency();
                if (ambiguity_check(freq)) {
                  /* no ambiguity for arithmetic decoding */
                  /* decode bit-plane cur_bp */
                  res[g][win][sfb][bin] += decode(freq) << cur_bp[g][sfb];
                  if ((!is_sig[g][win][sfb][bin]) && (res[g][win][sfb][bin])) {
                    /* decode sign bit of res if necessary */
                    if (ambiguity_check(freq)) {
                      res[g][win][sfb][bin] *= (decode(freq_sign)) ? 1 : -1;
                      is_sig[g][win][sfb][bin] = 1;
                    }
                    else {
                      /* discard the decoded symbol prior to sign symbol */
                      res[g][win][sfb][bin] = 0;
                      terminate_decoding();
                    }
                  }
                }
                else terminate_decoding();
              }
            }
          }
          cur_bp[g][sfb]--; /* progress to next bit-plane */
        }
      }
    }
  }
}
smart_decoding_low_energy()
{
/* low ener
...
