ISO/IEC 23003-3:2020
Information technology — MPEG audio technologies — Part 3: Unified speech and audio coding
This document specifies a unified speech and audio codec which is capable of coding signals having an arbitrary mix of speech and audio content. The codec has a performance comparable to, or better than, the best known coding technology that might be tailored specifically to coding of either speech or general audio content. The codec supports single and multi-channel coding at high bitrates and provides perceptually transparent quality. At the same time, it enables very efficient coding at very low bitrates while retaining the full audio bandwidth. This document incorporates several perceptually-based compression techniques developed in previous MPEG standards: perceptually shaped quantization noise, parametric coding of the upper spectrum region and parametric coding of the stereo sound stage. However, it combines these well-known perceptual techniques with a source coding technique: a model of sound production, specifically that of human speech.
Technologies de l'information — Technologies audio MPEG — Partie 3: Codage unifié parole et audio
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 23003-3
Second edition
2020-06
Information technology — MPEG
audio technologies —
Part 3:
Unified speech and audio coding
Technologies de l'information — Technologies audio MPEG —
Partie 3: Codage unifié parole et audio
Reference number
ISO/IEC 23003-3:2020(E)
© ISO/IEC 2020
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2020
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Fax: +41 22 749 09 47
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions, symbols and abbreviated terms
3.1 Terms and definitions
3.2 Symbols and abbreviated terms
4 Technical overview
4.1 Decoder block diagram
4.2 Overview of the decoder tools
4.3 Combination of USAC with MPEG Surround and SAOC
4.4 Interface between USAC and systems
4.4.1 Decoder behaviour
4.5 USAC profiles and levels
4.5.1 General
4.5.2 MPEG-4 HE AACv2 compatibility
4.5.3 Baseline USAC profile
4.5.4 Extended high efficiency AAC profile
4.6 Combination of USAC with MPEG-D DRC
5 Syntax
5.1 General
5.2 Decoder configuration (UsacConfig)
5.3 USAC bitstream payloads
5.3.1 Payloads for audio object type USAC
5.3.2 Subsidiary payloads
5.3.3 Payloads for enhanced SBR
5.3.4 Payloads for MPEG Surround
5.3.5 Payload of extension elements
6 Data structure
6.1 USAC configuration
6.1.1 Definition of elements
6.1.2 UsacConfig()
6.1.3 Usac Output Sampling Frequency
6.1.4 UsacChannelConfig()
6.1.5 UsacDecoderConfig()
6.1.6 UsacSingleChannelElementConfig()
6.1.7 UsacChannelPairElementConfig()
6.1.8 UsacLfeElementConfig()
6.1.9 UsacCoreConfig()
6.1.10 SbrConfig()
6.1.11 SbrDfltHeader()
6.1.12 Mps212Config()
6.1.13 UsacExtElementConfig()
6.1.14 UsacConfigExtension()
6.1.15 Unique stream identifier (Stream ID)
6.2 USAC payload
6.2.1 Definition of elements
6.2.2 UsacFrame()
6.2.3 UsacSingleChannelElement()
6.2.4 UsacExtElement()
6.2.5 UsacChannelPairElement()
6.2.6 Low frequency enhancement (LFE) channel element, UsacLfeElement()
6.2.7 UsacCoreCoderData()
6.2.8 StereoCoreToolInfo()
6.2.9 fd_channel_stream() and ics_info()
6.2.10 lpd_channel_stream()
6.2.11 Spectral noiseless coder
6.2.12 Enhanced SBR
6.2.13 Definition of MPEG Surround 2-1-2 payloads
6.2.14 Buffer requirements
7 Tool descriptions
7.1 Quantization
7.1.1 Tool description
7.1.2 Definition of elements
7.1.3 Decoding process
7.2 Noise filling
7.2.1 Tool description
7.2.2 Definition of elements
7.2.3 Decoding process
7.2.4 Generation of random signs for spectral noise filling
7.3 Scale factors
7.4 Spectral noiseless coding
7.4.1 Tool description
7.4.2 Definition of elements
7.4.3 Decoding process
7.5 enhanced SBR tool (eSBR)
7.5.1 Modifications to SBR tool
7.5.2 Additional pre-processing in the MPEG-4 SBR within USAC
7.5.3 DFT based harmonic transposer
7.5.4 QMF based harmonic transposer
7.5.5 4:1 Structure for SBR in USAC
7.5.6 Predictive vector coding (PVC) decoding process
7.6 Inter-subband-sample temporal envelope shaping (inter-TES)
7.6.1 Tool Description
7.6.2 Definition of elements
7.6.3 Inter-TES
7.7 Joint stereo coding
7.7.1 M/S stereo
7.7.2 Complex stereo prediction
7.8 TNS
7.8.1 General
7.8.2 Definition of elements
7.8.3 Decoding process
7.8.4 Maximum TNS bandwidth
7.9 Filterbank and block switching
7.9.1 Tool description
7.9.2 Definition of elements
7.9.3 Decoding process
7.10 Time-warped filterbank and blockswitching
7.10.1 Tools description
7.10.2 Definition of elements
7.10.3 Decoding process
7.11 MPEG Surround for mono to stereo upmixing
7.11.1 Tool description
7.11.2 Decoding process
7.12 AVQ decoding
7.13 LPC-filter
7.13.1 Tool description
7.13.2 Definition of elements
7.13.3 Number of LPC filters
7.13.4 General principle of the inverse quantizer
7.13.5 Decoding of the LPC quantization mode
7.13.6 First-stage approximation
7.13.7 AVQ refinement
7.13.8 Reordering of quantized LSFs
7.13.9 Conversion into LSP parameters
7.13.10 Interpolation of LSP parameters
7.13.11 LSP to LP conversion
7.13.12 LPC initialization at decoder start-up
7.14 ACELP
7.14.1 General
7.14.2 Definition of elements
7.14.3 ACELP initialization at USAC decoder start-up
7.14.4 Setting of the ACELP excitation buffer using the past FD synthesis and LPC0
7.14.5 Decoding of CELP excitation
7.14.6 Excitation postprocessing
7.14.7 Synthesis
7.14.8 Writing in the output buffer
7.15 MDCT based TCX
7.15.1 Tool description
7.15.2 Decoding process
7.16 Forward aliasing cancellation (FAC) tool
7.16.1 Tool description
7.16.2 Definition of elements
7.16.3 Decoding process
7.16.4 Writing in the output buffer
7.17 Post-processing of the synthesis signal
7.18 Audio pre-roll
7.18.1 General
7.18.2 Semantics
7.18.3 Decoding process
8 Conformance testing
8.1 General
8.2 USAC conformance testing
8.2.1 Profiles
8.2.2 Conformance tools and test procedure
8.3 USAC bitstreams
8.3.1 General
8.3.2 USAC configuration
8.3.3 Framework
8.3.4 Frequency domain coding (FD mode)
8.3.5 Linear predictive domain coding (LPD mode)
8.3.6 Common core coding tools
8.3.7 Enhanced spectral band replication (eSBR)
8.3.8 eSBR – Predictive vector coding (PVC)
8.3.9 eSBR – Inter temporal envelope shaping (inter-TES)
8.3.10 MPEG Surround 2-1-2
8.3.11 Configuration Extensions
8.3.12 AudioPreRoll
8.3.13 DRC
8.3.14 Restrictions depending on profiles and levels
8.4 USAC decoders
...