ISO/IEC 23003-2:2010/Cor 1:2012
Information technology — MPEG audio technologies — Part 2: Spatial Audio Object Coding (SAOC) — Technical Corrigendum 1
Technologies de l'information — Technologies audio MPEG — Partie 2: Codage d'objet audio spatial (SAOC) — Rectificatif technique 1
INTERNATIONAL STANDARD ISO/IEC 23003-2:2010
TECHNICAL CORRIGENDUM 1
Published 2012-09-01
INTERNATIONAL ORGANIZATION FOR STANDARDIZATION МЕЖДУНАРОДНАЯ ОРГАНИЗАЦИЯ ПО СТАНДАРТИЗАЦИИ ORGANISATION INTERNATIONALE DE NORMALISATION
INTERNATIONAL ELECTROTECHNICAL COMMISSION МЕЖДУНАРОДНАЯ ЭЛЕКТРОТЕХНИЧЕСКАЯ КОМИССИЯ COMMISSION ÉLECTROTECHNIQUE INTERNATIONALE
Information technology — MPEG audio technologies —
Part 2:
Spatial Audio Object Coding (SAOC)
TECHNICAL CORRIGENDUM 1
Technologies de l'information — Technologies audio MPEG —
Partie 2: Codage d'objet audio spatial (SAOC)
RECTIFICATIF TECHNIQUE 1
Technical Corrigendum 1 to ISO/IEC 23003-2:2010 was prepared by Joint Technical Committee ISO/IEC
JTC 1, Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.
In Clause 2 “Normative references”, add:
ISO/IEC 23000-12, Information technology – Multimedia application format (MPEG-A) – Part 12: Interactive
music application format
In all tables, replace:
“reserved”
with:
“N/A”
ICS 35.040 Ref. No. ISO/IEC 23003-2:2010/Cor.1:2012(E)
© ISO/IEC 2012 – All rights reserved
Published in Switzerland
In 5.1 Introduction, replace:
The number of objects that can be handled is in priniciple not limited.
with:
The number of objects that can be handled is in principle not limited.
In 5.5.2 Baseline Profile, replace:
Note that ISO/IEC 23000-12 (Information technology — Multimedia application format (MPEG-A) — Part 12:
Interactive music spplication format) defines several brands that refer to the SAOC Baseline Profile.
with:
Note that ISO/IEC 23000-12 defines several brands that refer to the SAOC Baseline Profile.
In 6.1 Payloads for SAOC, replace:
Table 8 – Syntax of ResidualConfig()
Syntax                                      No. of bits   Mnemonic
ResidualConfig()
{
    bsResidualSamplingFrequencyIndex;       4             uimsbf
    bsResidualFramesPerSAOCFrame;           2             uimsbf
    bsNumGroupsFGO;                         2             uimsbf
    for ( i=0; i
        bsResidualPresent[i];               1             uimsbf
        if ( bsResidualPresent[i] ) {
with:
Table 8 – Syntax of ResidualConfig()
Syntax                                      No. of bits   Mnemonic
ResidualConfig()
{
    bsResidualSamplingFrequencyIndex;       4             uimsbf
    bsResidualFramesPerSAOCFrame;           2             uimsbf
    bsNumEAO;                               2             uimsbf
    for ( i=0; i
        bsResidualPresent[i];               1             uimsbf
        if ( bsResidualPresent[i] ) {
In 6.1 Payloads for SAOC, replace:
Table 15 – Syntax of PresetConfig()
Syntax                                      No. of bits   Mnemonic
PresetConfig()
{
    bsNumPresets;                           4             uimsbf
    for ( i=0; i
        bsNumBytePresetLabel[i];            8             uimsbf
        for ( j=0; j
            bsPresetLabel[i][j];            8             bslbf
        }
        bsPresetMatrix;                     1             uimsbf
        if (bsPresetMatrix) {
            PresetMatrixData();
        } else {
with:
Table 15 – Syntax of PresetConfig()
Syntax                                      No. of bits   Mnemonic
PresetConfig()
{
    bsNumPresets;                           4             uimsbf
    for ( i=0; i
        bsNumBytePresetLabel[i];            8             uimsbf
        for ( j=0; j
            bsPresetLabel[i][j];            8             bslbf
        }
        bsPresetMatrix[i];                  1             uimsbf
        if (bsPresetMatrix[i]) {
            PresetMatrixData();
        } else {
In 6.1 Payloads for SAOC, replace:
Table 21 – Syntax of SAOCFramingInfo()
Syntax                                      No. of bits      Mnemonic
SAOCFramingInfo()
{
    bsFramingType;                          1                uimsbf
    If ( bsLowDelayMode == 0 ) {
        bsNumParamSets;                     3                uimsbf
    } else {
        bsNumParamSets;                     1                uimsbf
    }
    if (bsFramingType) {
        for (ps=0; ps
            bsParamSlot[ps];                nBitsParamSlot   uimsbf
                                            Note 2
        }
    }
}
Note 1: numParamSets is defined by numParamSets = bsNumParamSets + 1.
Note 2: nBitsParamSlot is defined according to nBitsParamSlot = ceil(log2(numSlots)).
with:
Table 21 – Syntax of SAOCFramingInfo()
Syntax                                                                  No. of bits      Mnemonic
SAOCFramingInfo()
{
    bsFramingType;                                                      1                uimsbf
    If ( bsLowDelayMode == 0 ) {
        bsNumParamSets;                                                 3                uimsbf
    } else {
        bsNumParamSets;                                                 1                uimsbf
    }
    for (ps=0; ps
        if (bsFramingType) {                                                             Note 1
            bsParamSlot[ps];                                            nBitsParamSlot   uimsbf
                                                                        Note 2
        } else {
            bsParamSlot[ps] = ceil(numSlots*(ps+1)/numParamSets)-1;                      Note 1, 3
        }
    }
}
Note 1: numParamSets is defined by numParamSets = bsNumParamSets + 1.
Note 2: nBitsParamSlot is defined according to nBitsParamSlot = ceil(log2(numSlots)).
Note 3: numSlots is defined by numSlots = bsFrameLength + 1.
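The implicit slot placement used when bsFramingType is 0 in the corrected Table 21 can be illustrated with a short Python sketch. It is not normative decoder code: the function names are hypothetical, and bsFrameLength and bsNumParamSets are assumed to be already-parsed integers.

```python
def n_bits_param_slot(num_slots):
    """Note 2: nBitsParamSlot = ceil(log2(numSlots)), computed with integers."""
    # (num_slots - 1).bit_length() equals ceil(log2(num_slots)) for num_slots >= 1
    return (num_slots - 1).bit_length()


def default_param_slots(bs_frame_length, bs_num_param_sets):
    """Slot positions for bsFramingType == 0, following Notes 1 and 3."""
    num_slots = bs_frame_length + 1          # Note 3
    num_param_sets = bs_num_param_sets + 1   # Note 1
    slots = []
    for ps in range(num_param_sets):
        # ceil(a / b) in integer arithmetic to avoid float rounding
        ceil_val = -(-(num_slots * (ps + 1)) // num_param_sets)
        slots.append(ceil_val - 1)
    return slots
```

With 32 slots and two parameter sets this places the sets at slots 15 and 31, so the last parameter set always lands on the final slot of the frame.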
In 6.1 Payloads for SAOC, replace:
bsDcuParam Defines the parameter value for the DCU algorithm according to Table 41.
with:
bsDcuParam Defines the parameter value for the DCU algorithm according to Table 39.
In 6.1 Payloads for SAOC, replace:
Table 42 — numQuantSteps
XXX (dataType)    numQuantStepsXXXCoarse   numQuantStepsXXXFine
DCLD, DMG, PDG    15                       31
IOC               4                        8
OLD               8                        16
NRG               32                       64
with:
Table 42 – numQuantSteps
XXX (dataType)    numQuantStepsXXXCoarse   numQuantStepsXXXFine
DCLD, DMG, PDG    15                       31
IOC               4                        8
OLD               8                        16
NRG               32                       64
In 6.1 Payloads for SAOC, replace:
PresetUserDataContainer()
Syntactic element that contains preset rendering data in the user-defined preset
representation format and has a length of exactly bsPresetUserDataLen bytes.
with:
PresetUserDataContainer()
Syntactic element that contains preset rendering data in the user-defined preset
representation format and has a length of exactly bsPresetUserDataLen bytes.
All bitstream variables which are not explicitly described here are defined in ISO/IEC 23003-1:2007.
In 6.1 Payloads for SAOC, add:
bsResidualFramesPerSAOCFrame
Indicates the number of residual frames per SAOC frame, ranging from one to four
according to Table 56 defined in ISO/IEC 23003-1:2007.
In 6.1 Payloads for SAOC, add:
SAOCDiffHuffData() Syntactic element that contains one or two temporally subsequent parameter
subsets of a given parameter in the SAOC frame, where the quantized values are
coded using a combination of differential coding and Huffman coding.
In Clause 7 SAOC processing, omit the time/band indices for all signals and parameters.
In 7.1.2 Dequantization of the SAOC parameters, replace:
Table 47 – OLD parameter quantization table
idx        0            1           2           3           4           5           6           7
OLD[idx]   10^{-15.00}  10^{-4.50}  10^{-4.00}  10^{-3.50}  10^{-3.00}  10^{-2.50}  10^{-2.20}  10^{-1.90}
idx        8            9           10          11          12          13          14          15
OLD[idx]   10^{-1.60}   10^{-1.30}  10^{-1.00}  10^{-0.80}  10^{-0.60}  10^{-0.40}  10^{-0.20}  1
with:
Table 47 – OLD parameter quantization table
idx        0           1          2          3          4          5          6          7
OLD[idx]   10^{-15.0}  10^{-4.5}  10^{-4.0}  10^{-3.5}  10^{-3.0}  10^{-2.5}  10^{-2.2}  10^{-1.9}
idx        8           9          10         11         12         13         14         15
OLD[idx]   10^{-1.6}   10^{-1.3}  10^{-1.0}  10^{-0.8}  10^{-0.6}  10^{-0.4}  10^{-0.2}  1
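As an illustration of how Table 47 might be used, the following Python sketch maps a quantization index to its linear OLD value. The exponents are copied from the table above; the names OLD_EXPONENTS and dequantize_old are hypothetical and not taken from the standard.

```python
# Exponents from Table 47; index 15 corresponds to 10**0 == 1.
OLD_EXPONENTS = [-15.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.2, -1.9,
                 -1.6, -1.3, -1.0, -0.8, -0.6, -0.4, -0.2, 0.0]


def dequantize_old(idx):
    """Return the linear OLD value for a quantization index in 0..15."""
    if not 0 <= idx < len(OLD_EXPONENTS):
        raise ValueError("OLD index out of range")
    return 10.0 ** OLD_EXPONENTS[idx]
```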
In 7.1.2 Dequantization of the SAOC parameters, replace:
while (ps=0; ps
    switch (bsXXXdataMode[pi][ps]) {
    case 0: /* default */
        for (pb=0; pb
            switch (XXX) {
            case OLD, NRG, IOC, DCLD, DMG, PDG:
                idxXXX[pi][ps][pb] = 0;
                break;
            }
        }
        break;
with:
while (ps=0; ps
    switch (bsXXXdataMode[pi][ps]) {
    case 0: /* default */
        for (pb=0; pb
            switch (XXX) {
            case NRG, DCLD, DMG, PDG:
                idxXXX[pi][ps][pb] = 0;
                break;
            case OLD:
                idxXXX[pi][ps][pb] = 15;
                break;
            case IOC:
                idxXXX[pi][ps][pb] = 5;
                break;
            }
        }
        break;
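The corrected default case can be summarized as a lookup from parameter type to default index. The Python sketch below is only an illustration of that mapping; the dictionary and function names are hypothetical.

```python
# Default quantization indices for bsXXXdataMode == 0, per the corrected text:
# OLD defaults to 15, IOC to 5, all other parameter types to 0.
DEFAULT_INDEX = {"OLD": 15, "IOC": 5, "NRG": 0, "DCLD": 0, "DMG": 0, "PDG": 0}


def default_indices(data_type, num_bands):
    """Return the default index for every parameter band of one parameter set."""
    return [DEFAULT_INDEX[data_type]] * num_bands
```

According to Table 47, the OLD default index 15 dequantizes to 1, whereas the previous default of 0 dequantized to 10^{-15}.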
In 7.2.3 Unquantized interface for the MPS parameters, replace:
For an efficient practical implementation and to prevent a loss in precision, the parameter interface to the
MPS decoder may alternatively be established in a direct, unquantized way. Rather than writing an actual
MPS bitstream, the relevant parameters may be passed directly to the MPS decoder.
with:
For an efficient practical implementation and to prevent a loss in precision, the parameter interface to the
MPS decoder may alternatively be established in a direct, unquantized way. The required range of all relevant
parameters is determined by the minimal and maximal values of the corresponding dequantization scheme.
Rather than writing an actual MPS bitstream, the relevant parameters may be passed directly using binary32
(single) floating point format (IEEE 754-2008) to the MPS decoder.
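A minimal Python sketch of passing a parameter in binary32 format is given below. The byte order and the helper names are assumptions made for illustration; the text above only requires IEEE 754-2008 binary32 and does not prescribe a serialization.

```python
import struct


def to_binary32(value):
    """Pack a float into 4 bytes of IEEE 754 binary32 (big-endian assumed)."""
    return struct.pack(">f", value)


def from_binary32(data):
    """Unpack 4 bytes of big-endian binary32 back into a Python float."""
    return struct.unpack(">f", data)[0]
```

Values that are exactly representable in single precision, such as 0.5, survive the round trip unchanged, while a value such as 0.1 is rounded to the nearest binary32 number.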
In 7.4 Post(processing) downmix compensation, replace and move the corresponding text to “7.5 Signals and
parameters”:
If the post(processed) downmix $X^{n,k}_{post(processed)}$ is used, the following modification should be taken prior to SAOC decoding/transcoding:
\[ X^{n,k} = W^{n,k}_{PDG} X^{n,k}_{post(processed)}, \]
where $X^{n,k}$ represents the input signal to the SAOC decoder/transcoder.
The matrix $W^{n,k}_{PDG}$ is defined for every time-slot $n$ and every hybrid subband $k$. Its elements are obtained from the transmitted PDG parameters which are defined for a given parameter time-slot $l$ and a given processing band $m$. The mapping to the hybrid domain is done according to Table A.31, ISO/IEC 23003-1:2007. If post(processed) downmix compensation is applied (bsPdgFlag = 1), the matrix $W^{l,m}_{PDG}$ is defined as:
\[ W^{l,m}_{PDG} = PDG^{l,m}_{0}, \quad \text{for mono downmix,} \]
\[ W^{l,m}_{PDG} = \begin{pmatrix} PDG^{l,m}_{0} & 0 \\ 0 & PDG^{l,m}_{1} \end{pmatrix}, \quad \text{for stereo downmix,} \]
where $PDG^{l,m}_{j} = D_{PDG}(j, l, m)$.
with:
If the post(processed) downmix $X_{post(processed)}$ compensation is applied (bsPdgFlag = 1), the following modification should be taken prior to the SAOC decoding/transcoding
\[ X = W_{PDG} X_{post(processed)}. \]
The matrix $W_{PDG}$ is obtained from the transmitted PDG parameters as
\[ W_{PDG} = \begin{pmatrix} PDG_{0} & 0 \\ 0 & PDG_{1} \end{pmatrix}, \quad \text{for stereo downmix,} \]
\[ W_{PDG} = \begin{pmatrix} PDG_{0} & 0 \\ 0 & 0 \end{pmatrix}, \quad \text{for mono downmix.} \]
Here, the dequantized post(processed) downmix gains are obtained according to 7.1.2 as
\[ PDG_{j} = D_{PDG}(j, l, m). \]
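The compensation step can be sketched in Python as follows, for one time/frequency sample. The function names and the list-based matrix layout are illustrative assumptions; the gains are taken as already dequantized.

```python
def w_pdg(pdg, stereo):
    """Build the 2x2 compensation matrix from dequantized gains.

    pdg holds [PDG_0, PDG_1] for a stereo downmix and [PDG_0] for mono.
    """
    if stereo:
        return [[pdg[0], 0.0], [0.0, pdg[1]]]
    return [[pdg[0], 0.0], [0.0, 0.0]]


def compensate(w, x_post):
    """Apply X = W_PDG * X_post to a two-element sample [ch0, ch1]."""
    return [w[r][0] * x_post[0] + w[r][1] * x_post[1] for r in range(2)]
```

For a mono downmix the second row of the matrix is zero, so only the first channel of the sample carries signal after compensation.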
In 7.5.2 Signals and parameters, replace:
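\[ X^{n,k} = x = \begin{pmatrix} l_{0} \\ r_{0} \end{pmatrix}, \quad \text{for stereo downmix,} \]
\[ X^{n,k} = x = \begin{pmatrix} d_{0} \\ 0 \end{pmatrix}, \quad \text{for monoo downmix.} \]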
with:
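\[ X = \begin{pmatrix} l_{0} \\ r_{0} \end{pmatrix}, \quad \text{for stereo downmix,} \]
\[ X = \begin{pmatrix} d_{0} \\ 0 \end{pmatrix}, \quad \text{for mono downmix.} \]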
In 7.5 Signals and parameters, add:
Output covariance
The output covariance matrix $F$ with elements $f_{i,j}$ is given as
\[ F = A E A^{*}, \quad \text{for binaural rendering,} \]
\[ F = M_{ren} E M_{ren}^{*}, \quad \text{otherwise.} \]
In 7.5 Signals and parameters, add:
Input covariance
The input covariance $v$ is given as
\[ v = D E D^{*} + \varepsilon^{2}. \]
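Both covariance definitions share the form of a matrix product with a conjugate transpose, which the following pure-Python sketch illustrates. It rests on assumptions: matrices are nested lists, the downmix matrix D is taken as a single row (mono downmix), and the regularization constant that was lost from the printed formula is assumed to be a small positive eps whose value here (1e-9) is only illustrative. All function names are hypothetical.

```python
def conj_transpose(a):
    """Return the conjugate transpose A^* of a nested-list matrix."""
    return [[complex(a[r][c]).conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def sandwich(a, e):
    """Return A E A^*, the form used for the output covariance F."""
    return matmul(matmul(a, e), conj_transpose(a))


def input_covariance(d_row, e, eps=1e-9):
    """Return v = D E D^* + eps^2 for a single-row downmix matrix D."""
    return sandwich([d_row], e)[0][0].real + eps ** 2
```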
In 7.6 SAOC transcoding/decoding modes, use the following structure:
7.6 SAOC transcoding/decoding modes
7.6.1 Overview
7.6.2 Decorrelated signal
7.6.3 Transcoding modes
7.6.3.1 Introduction
7.6.3.2 Mono downmix (“x-1-5”) processing mode
7.6.3.2.1 Introduction
7.6.3.2.2 SAOC downmix preprocessor unit
7.6.3.2.3 SAOC parameter processing unit
7.6.3.3 Stereo downmix (“x-2-5”) processing mode
7.6.3.3.1 Introduction
7.6.3.3.2 SAOC downmix preprocessor unit
7.6.3.3.3 SAOC parameter processing unit
7.6.4 Decoding modes
7.6.4.1 Introduction
7.6.4.2 Mono to binaural "x-1-b" processing mode
7.6.4.3 Mono to stereo "x-1-2" processing mode
7.6.4.4 Mono to mono "x-1-1" processing mode
7.6.4.5 Stereo to binaural "x-2-b" processing mode
7.6.4.6 Stereo to stereo "x-2-2" processing mode
7.6.4.7 Stereo to mono "x-2-1" processing mode
In 7.6.2 Mono downmix (“x-1-5”) processing mode, replace:
Estimation of power and cross power terms
Incorporating index h denoting the OTT element, the power and cross power terms can be estimated by:
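\[ p^{2}_{h,0} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} w^{h}_{0,i} w^{h}_{0,j} e_{i,j}, \quad p^{2}_{h,1} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} w^{h}_{1,i} w^{h}_{1,j} e_{i,j}, \quad R_{h} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} w^{h}_{0,i} w^{h}_{1,j} e_{i,j}. \]
Derivation of the MPS parameters
Finally, the corresponding CLD and ICC parameters are derived as:
\[ CLD^{l,m}_{h} = 10 \log_{10} \left( \max \left( \frac{p^{2}_{h,0}}{p^{2}_{h,1}}, \varepsilon^{2} \right) \right), \]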
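\[ ICC^{l,m}_{h} = \frac{\max \left( R_{h}, \varepsilon^{2} \right)}{\max \left( p_{h,0}, \varepsilon^{2} \right) \max \left( p_{h,1}, \varepsilon^{2} \right)}, \]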
with:
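The index $h$ refers to the OTT element.
\[ CLD_{h} = 10 \log_{10} \left( \frac{r^{h}_{0,0}}{r^{h}_{1,1}} \right), \qquad ICC_{h} = \frac{r^{h}_{0,1}}{\sqrt{r^{h}_{0,0} \, r^{h}_{1,1}}}. \]
The terms $r^{h}_{i,j}$ can be estimated as
\[ r^{h}_{i,j} = \max \left( \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} w^{h}_{i,n} w^{h}_{j,m} e_{n,m}, \; \varepsilon^{2} \right). \]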
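The corrected derivation can be sketched in Python for a single OTT element. This is an illustration under assumptions: the sub-rendering matrix w (two rows) and the object covariance e are real-valued nested lists, the constant lost from the printed formula is taken to be a small positive eps (1e-9 here, an illustrative value), and the normalization of ICC by the square root of the two power terms is assumed. The function names are hypothetical.

```python
import math

EPS = 1e-9  # assumed regularization constant, not taken from the text


def r_term(w, e, i, j, eps=EPS):
    """r_{i,j} = max(sum_n sum_m w[i][n] * w[j][m] * e[n][m], eps^2)."""
    n_obj = len(e)
    total = sum(w[i][n] * w[j][m] * e[n][m]
                for n in range(n_obj) for m in range(n_obj))
    return max(total, eps ** 2)


def cld_icc(w, e, eps=EPS):
    """Return (CLD in dB, ICC) for one OTT element."""
    r00 = r_term(w, e, 0, 0, eps)
    r11 = r_term(w, e, 1, 1, eps)
    r01 = r_term(w, e, 0, 1, eps)
    cld = 10.0 * math.log10(r00 / r11)
    icc = r01 / math.sqrt(r00 * r11)
    return cld, icc
```

Because every term is floored at eps squared, the ratio and the square root stay well defined even when an OTT output receives no energy.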
In 7.6.2 Mono downmix (“x-1-5”) processing mode, replace:
\[ ADG^{l,m} = 10 \log_{10} \left( \max \left( \frac{f^{l,m}}{v^{l,m}}, \varepsilon^{2} \right) \right). \]
The scalar $f^{l,m}$ is computed as
\[ f^{l,m} = \sum_{i=0}^{N-1} f^{l,m}_{i,i}. \]
with:
\[ ADG = 10 \log_{10} \left( \max \left( \frac{\operatorname{trace}(F)}{v}, \varepsilon^{2} \right) \right). \]
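A minimal Python sketch of the corrected ADG computation follows. It assumes a real-valued output covariance F given as a nested list, a positive scalar input covariance v, and a small positive eps (1e-9, an illustrative value) for the constant that was lost from the printed formula. The function name is hypothetical.

```python
import math


def adg_db(f, v, eps=1e-9):
    """ADG = 10 * log10(max(trace(F) / v, eps^2)), in dB."""
    trace_f = sum(f[i][i] for i in range(len(f)))
    return 10.0 * math.log10(max(trace_f / v, eps ** 2))
```

When the rendered output power equals the input power the gain is 0 dB, and the floor at eps squared keeps the logarithm finite for a silent output.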
In 7.6.2 Mono downmix (“x-1-5”) processing mode, replace:
The following subclauses give a description of the SAOC transcoding mode for the mono downmix case. The
object parameters (OLD, IOC, DMG, DCLD) from the SAOC bitstream are transcoded into spatial parameters
(CLD, ICC, CPC, ADG) for the MPS bitstream according to the rendering information. The downmix is not
modified.
with:
The following subclauses describe the processing steps dedicated to the transformation of SAOC parameters
(OLD, IOC, DMG) into MPS data (CLD, ICC, ADG) according to the rendering information for the mono
downmix case, see Figure 13 (left). The downmix signal is not modified.
In 7.6.2 Mono downmix (“x-1-5”) processing mode, replace:
The respective contribution of each object to the two outputs of OTT element 0 is obtained by summation of the corresponding elements in $M^{l,m}_{ren}$. This summation gives a sub-rendering matrix $W^{l,m}_{0}$ of OTT element 0:
\[ W^{l,m}_{0} = \begin{pmatrix} w^{0}_{0,0} & \cdots & w^{0}_{0,N-1} \\ w^{0}_{1,0} & \cdots & w^{0}_{1,N-1} \end{pmatrix} = \begin{pmatrix} m^{l,m}_{0,Lf} + m^{l,m}_{0,Rf} + m^{l,m}_{0,C} + m^{l,m}_{0,Lfe} & \cdots & m^{l,m}_{N-1,Lf} + m^{l,m}_{N-1,Rf} + m^{l,m}_{N-1,C} + m^{l,m}_{N-1,Lfe} \\ m^{l,m}_{0,Ls} + m^{l,m}_{0,Rs} & \cdots & m^{l,m}_{N-1,Ls} + m^{l,m}_{N-1,Rs} \end{pmatrix}. \]
The CLDs and ICCs of the subsequent OTT boxes ($CLD^{l,m}_{h}$, $ICC^{l,m}_{h}$, $h = 0, \ldots, 4$) are calculated using the sub-rendering matrices defined as:
\[ W^{l,m}_{1} = \begin{pmatrix} w^{1}_{0,0} & \cdots & w^{1}_{0,N-1} \\ w^{1}_{1,0} & \cdots & w^{1}_{1,N-1} \end{pmatrix} = \begin{pmatrix} m^{l,m}_{0,Lf} + m^{l,m}_{0,Rf} & \cdots & m^{l,m}_{N-1,Lf} + m^{l,m}_{N-1,Rf} \\ m^{l,m}_{0,C} + m^{l,m}_{0,Lfe} & \cdots & m^{l,m}_{N-1,C} + m^{l,m}_{N-1,Lfe} \end{pmatrix}, \]
with:
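The respective contribution of each object to the two outputs of OTT element $h$ is obtained by summation of the corresponding elements in the rendering matrix $M_{ren}$. The subsequent sub-rendering matrices $W_{h}$ with elements $w^{h}_{i,j}$ are defined as
\[ W^{l,m}_{0} = \begin{pmatrix} w^{0}_{0,0} & \cdots & w^{0}_{0,N-1} \\ w^{0}_{1,0} & \cdots & w^{0}_{1,N-1} \end{pmatrix} = \begin{pmatrix} (m^{l,m}_{0,Lf})^{2} + (m^{l,m}_{0,Rf})^{2} + (m^{l,m}_{0,C})^{2} + (m^{l,m}_{0,Lfe})^{2} & \cdots & (m^{l,m}_{N-1,Lf})^{2} + (m^{l,m}_{N-1,Rf})^{2} + (m^{l,m}_{N-1,C})^{2} + (m^{l,m}_{N-1,Lfe})^{2} \\ (m^{l,m}_{0,Ls})^{2} + (m^{l,m}_{0,Rs})^{2} & \cdots & (m^{l,m}_{N-1,Ls})^{2} + (m^{l,m}_{N-1,Rs})^{2} \end{pmatrix}, \]
\[ W^{l,m}_{1} = \begin{pmatrix} w^{1}_{0,0} & \cdots & w^{1}_{0,N-1} \\ w^{1}_{1,0} & \cdots & w^{1}_{1,N-1} \end{pmatrix} = \begin{pmatrix} (m^{l,m}_{0,Lf})^{2} + (m^{l,m}_{0,Rf})^{2} & \cdots & (m^{l,m}_{N-1,Lf})^{2} + (m^{l,m}_{N-1,Rf})^{2} \\ (m^{l,m}_{0,C})^{2} + (m^{l,m}_{0,Lfe})^{2} & \cdots & (m^{l,m}_{N-1,C})^{2} + (m^{l,m}_{N-1,Lfe})^{2} \end{pmatrix}, \]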
In 7.6.2.2 Sub-rendering matrices for each OTT element, remove:
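Additional information is provided by the rendering matrix $M^{l,m}_{ren}$ with elements $m^{l,m}_{i,j}$, yielding the mapping of all audio input channels $i$ to the desired output channels $j$. The rendering matrix $M^{l,m}_{ren}$ for the 5.1 output configuration is given by:
\[ M^{l,m}_{ren} = \begin{pmatrix} m^{l,m}_{0,Lf} & \cdots & m^{l,m}_{N-1,Lf} \\ m^{l,m}_{0,Rf} & \cdots & m^{l,m}_{N-1,Rf} \\ m^{l,m}_{0,C} & \cdots & m^{l,m}_{N-1,C} \\ m^{l,m}_{0,Lfe} & \cdots & m^{l,m}_{N-1,Lfe} \\ m^{l,m}_{0,Ls} & \cdots & m^{l,m}_{N-1,Ls} \\ m^{l,m}_{0,Rs} & \cdots & m^{l,m}_{N-1,Rs} \end{pmatrix}. \]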
In 7.6.2.4 Derivation of the MPS parameters, remove:
Matrix $F^{l,m}$ of size $N_{MPS} \times N_{MPS}$ with elements $f^{l,m}_{i,j}$ is given as
\[ F^{l,m} = A^{l,m} E^{l,m} \left( A^{l,m} \right)^{*}. \]
In 7.6.2.4 Derivation of the MPS parameters, remove:
The scalar $v^{l,m}$ is computed as
\[ v^{l,m} = D^{l} E^{l,m} \left( D^{l} \right)^{*} + \varepsilon^{2}. \]
In 7.6.3 Stereo downmix (“x-2-5”) processing mode, replace:
c 1 b
1 i,3 T
3y with y γ .
c 1
2
2
ij,
j1,2
with:
c 1 b
1 i,3 T
3y with y γ .
c 1
2
2
2
max ,
ij,
j1,2
In 7.6.3 Stereo downmix (“x-2-5”) processing mode, replace:
00
,0r ,
1,2
P 00
2
vw diag,.otherwise
Rd
with:
00
2
,,r
1,2
00
P
2
vw diag,.otherwise
Rd
In 7.6.3 Stereo downmix (“x-2-5”) processing mode, replace:
ˆ
PG11
1
with:
11
ˆ
PG .
1
00
In 7.6.3 Stereo downmix (“x-2-5”) processing mode, replace:
ˆ
diaggG,0r ,
vec 1,2
G
ˆ
G,.otherwise
with:
2
ˆ
diaggG,,r
vec 1,2
G
ˆ
G,.otherwise
In 7.6.3 Stereo downmix (“x-2-5”) processing mode, replace:
Figure 14 – “5-2-5” tree structure for the MPS decoder (stereo downmix)
with:
Figure 14 – “5-2-5” tree structure for the MPS decoder (stereo downmix)
In 7.6.3.2.2 Rendering between front and surround channels, replace:
acording to below.
with:
according to below.
In 7.6.3.3 Stereo Preprocessing, remove:
The stereo downmix $X$ is processed into the modified downmix signal $\hat{X}$:
\[ \hat{X} = G X, \]
In 7.6.3.3 Stereo Preprocessing, remove:
The final stereo output from the SAOC transcoder $\hat{X}$ is produced by mixing $X$ with a decorrelated signal component according to:
\[ \hat{X} = G_{Mod} X + P_{2} X_{d}, \]
where the decorrelated signal $X_{d}$ is calculated according to 7.6.3.4, and the mix matrices $G_{Mod}$ and $P_{2}$ acording to below.
...