ISO/IEC 23003-4:2015
Information technology — MPEG audio technologies — Part 4: Dynamic Range Control
ISO/IEC 23003-4:2015 specifies technology for loudness and dynamic range control. ISO/IEC 23003-4:2015 is applicable to most MPEG audio technologies. It offers flexible solutions to efficiently support the widespread demand for technologies such as loudness normalization and dynamic range compression for various playback scenarios.
Technologies de l'information — Technologies audio MPEG — Partie 4: Contrôle de gamme dynamique
Standards Content (Sample)
INTERNATIONAL STANDARD
ISO/IEC 23003-4
First edition
2015-11-15
Information technology — MPEG
audio technologies —
Part 4:
Dynamic Range Control
Technologies de l’information — Technologies audio MPEG —
Partie 4: Contrôle de gamme dynamique
Reference number
ISO/IEC 23003-4:2015(E)
© ISO/IEC 2015
COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and mnemonics
3.1 Terms
3.2 Mnemonics
4 Symbols (and abbreviated terms)
5 Technical overview
6 DRC decoder
6.1 DRC decoder configuration
6.1.1 Overview
6.1.2 Description of logical blocks
6.1.3 Derivation of peak and loudness values
6.2 Dynamic DRC gain payload
6.3 DRC set selection
6.3.1 Overview
6.3.2 Pre-selection based on Signal Properties and Decoder Configuration
6.3.3 Selection based on requests
6.3.4 Final selection
6.3.5 Applying multiple DRC sets
6.3.6 Album mode
6.3.7 Ducking
6.3.8 Precedence
6.4 Time domain DRC application
6.4.1 Overview
6.4.2 Framing
6.4.3 Time resolution
6.4.4 Time alignment
6.4.5 Decoding
6.4.6 Gain modifications and interpolation
6.4.7 Spline interpolation
6.4.8 Look-ahead in decoder
6.4.9 Node reservoir
6.4.10 Applying the compression
6.4.11 Multi-band DRC filter bank
6.5 Sub-band domain DRC
6.6 Loudness normalization
6.6.1 Overview
6.6.2 Loudness normalization based on target loudness
6.7 DRC in streaming scenarios
6.7.1 DRC configuration
6.7.2 Error handling
6.8 DRC configuration changes during active processing
7 Syntax
7.1 Syntax of DRC payload
7.2 Syntax of DRC gain payload
7.3 Syntax of static DRC payload
7.4 Syntax of DRC gain sequence
Annex A (normative) Tables
Annex B (normative) External Interface to DRC tool
Annex C (informative) Audio codec specific information
Annex D (informative) DRC gain generation and encoding
Annex E (informative) DRC set selection and adjustment at decoder
Annex F (informative) Loudness normalization
Annex G (informative) Peak limiter
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT), see the following URL: Foreword — Supplementary information.
The committee responsible for this document is ISO/IEC JTC 1, Information Technology, Subcommittee
SC 29, Coding of audio, picture, multimedia, and hypermedia.
ISO/IEC 23003 consists of the following parts, under the general title Information technology — MPEG
audio technologies:
— Part 1: MPEG Surround
— Part 2: Spatial Audio Object Coding
— Part 3: Unified speech and audio coding
— Part 4: Dynamic Range Control
Introduction
Consumer audio systems and devices are used in a large variety of configurations and acoustical
environments. For many of these scenarios, the audio reproduction quality can be improved by
appropriate control of content dynamics and loudness.
This part of ISO/IEC 23003 provides a universal dynamic range control tool that supports loudness
normalization. The DRC tool offers a bitrate efficient representation of dynamically compressed
versions of an audio signal. This is achieved by adding a low-bitrate DRC metadata stream to the audio
signal. The DRC tool includes dedicated sections for clipping prevention, ducking, and for generating a
fade-in and fade-out to supplement the main dynamic range compression functionality. The DRC effects
available at the DRC decoder are generated at the DRC encoder side. At the DRC decoder side, the audio
signal may be played back without applying the DRC tool, or an appropriate DRC tool effect is selected
and applied based on the given playback scenario.
INTERNATIONAL STANDARD ISO/IEC 23003-4:2015(E)
Information technology — MPEG audio technologies —
Part 4:
Dynamic Range Control
1 Scope
This part of ISO/IEC 23003 specifies technology for loudness and dynamic range control. This
International Standard is applicable to most MPEG audio technologies. It offers flexible solutions
to efficiently support the widespread demand for technologies such as loudness normalization and
dynamic range compression for various playback scenarios.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12: ISO base
media file format
ISO/IEC 23001-8, Information technology — MPEG systems technologies — Part 8: Coding-independent
code points
3 Terms, definitions and mnemonics
For the purposes of this document, the terms and definitions given in ISO/IEC 14496-12 and the
following apply.
3.1 Terms
3.1.1
DRC sequence
series of DRC gain values that can be applied to one or more audio channels
3.1.2
DRC set
defined set of DRC sequences that produce a desired effect if applied to the audio signal
3.1.3
album
collection of audio recordings that are mastered in a consistent way. For example, a collection of songs
released on a Compact Disc traditionally belongs to this category
3.2 Mnemonics
bslbf bit string, left bit first, where “left” is the order in which bit strings are written in ISO/IEC 14496. Bit strings are written as a string of 1s and 0s within single quote marks, for example ‘1000 0001’. Blanks within a bit string are for ease of reading and have no significance
uimsbf unsigned integer, most significant bit first
vlclbf variable length code, left bit first, where “left” refers to the order in which the variable length codes are written
bit(n) a bit string with n bits in the same format as bslbf
unsigned int(n) an unsigned integer with n bits in the same format as uimsbf
signed int(n) a signed integer with n bits, most significant bit first
4 Symbols (and abbreviated terms)
a_i Filter coefficient
b Band index of DRC filter bank (starting at 0)
b_i Filter coefficient
deltaTmin Smallest permitted DRC gain sample interval in units of the audio sample interval.
f_c Cross-over frequency in Hz
f_{c,norm} Cross-over frequency expressed as fraction of the audio sample rate.
f_{c,norm,SB}(s) Cross-over frequency of audio decoder sub-band s expressed as fraction of the audio sample rate. The cross-over frequency is the upper band edge frequency of the sub-band.
f_s Audio sample rate in Hz. If an audio decoder is present, it is the sample rate of the decoded time-domain audio signal.
N_DRC Maximum permitted number of DRC samples per DRC frame. Identical to the number of intervals with a duration of deltaTmin per DRC frame.
N_Codec Codec frame size in units of the audio sample interval 1/f_s
M_DRC DRC frame size in units of the audio sample interval 1/f_s
π Ratio of a circle’s circumference to its diameter
s Audio decoder sub-band index (starting at 0)
TRUE/FALSE Values of Boolean data type, which correspond to numerical 1 and 0, respectively.
z Complex variable of the z-transform
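The relationships implied by the symbol definitions above can be illustrated with a short calculation. The following is a minimal sketch, not part of the standard: the function names and the numeric values (48 kHz sample rate, DRC frame of 1024 samples, deltaTmin of 32 samples) are assumptions chosen only for illustration, and the requirement that the DRC frame size be a whole multiple of deltaTmin is inferred from the definition of N_DRC.

```python
def drc_frame_params(f_s, m_drc, delta_t_min):
    """Derive N_DRC and timing values from f_s, M_DRC and deltaTmin.

    f_s          audio sample rate in Hz
    m_drc        DRC frame size in units of the audio sample interval 1/f_s
    delta_t_min  smallest permitted DRC gain sample interval, same units
    """
    # Assumption: a DRC frame holds a whole number of deltaTmin intervals.
    if m_drc % delta_t_min != 0:
        raise ValueError("M_DRC must be a whole multiple of deltaTmin")
    n_drc = m_drc // delta_t_min              # intervals per DRC frame
    interval_ms = 1000.0 * delta_t_min / f_s  # duration of one interval
    frame_ms = 1000.0 * m_drc / f_s           # duration of one DRC frame
    return n_drc, interval_ms, frame_ms


def normalized_crossover(f_c, f_s):
    """Express a cross-over frequency in Hz as a fraction of the sample rate."""
    return f_c / f_s


# Illustrative values only; none of them are taken from the standard.
n_drc, interval_ms, frame_ms = drc_frame_params(48000, 1024, 32)
f_c_norm = normalized_crossover(6000.0, 48000.0)
```

With these example values, one DRC frame contains 32 gain intervals of about 0.67 ms each.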
5 Technical overview
The technology described in this part of ISO/IEC 23003 is called DRC tool. It provides efficient control
of dynamic range, loudness, and clipping based on metadata generated at the encoder. The decoder can
choose to selectively apply the metadata to the audio signal to achieve a desired result. Metadata for
dynamic range compression consists of encoded time-varying gain values that can be applied to the audio
signal. Hence, the main blocks of the DRC tool include a DRC gain encoder, a DRC gain decoder, a DRC gain
modification block, and a DRC gain application block. These blocks are exercised on a frame-by-frame
basis during audio processing. Various DRC configurations can be conveyed in a separate bitstream
element, such as configurations for a downmix or combined DRCs. The DRC set selection block decides
based on the playback scenario and the applicable DRC configurations which DRC gains to apply to the
audio signal. Moreover, the DRC tool supports loudness normalization based on loudness metadata.
A typical system for loudness and dynamic range control in the time domain is shown in Figure 1. A
more complex system including downmixer and peak limiter is shown in Figure 2. The decoder part
of the DRC tool is driven by metadata that efficiently represents the DRC gain samples and parameters
for interpolation. The gain samples can be updated as fast as necessary to accurately represent gain
changes down to at least 1 ms update intervals. In the following the decoder part of the DRC tool is
referred to as “DRC decoder”, which includes everything except the audio decoder and associated
bitstream de-multiplexing.
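The idea of applying time-varying gain values to the audio signal can be sketched as follows. This is an illustration only, under stated assumptions: gains are given in dB at a few sample positions ("nodes"), and intermediate gains are obtained by linear interpolation in the dB domain. The interpolation actually specified by the standard (see 6.4.6 and 6.4.7) is not reproduced here, and the function names are hypothetical.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def interpolate_gain_db(nodes, index):
    """Gain in dB at a sample index.

    nodes is a list of (sample_index, gain_db) pairs with strictly
    increasing sample_index. The gain is held constant outside the
    range covered by the nodes (an assumption of this sketch).
    """
    if index <= nodes[0][0]:
        return nodes[0][1]
    for (t0, g0), (t1, g1) in zip(nodes, nodes[1:]):
        if t0 <= index <= t1:
            return g0 + (g1 - g0) * (index - t0) / (t1 - t0)
    return nodes[-1][1]


def apply_gain_nodes(samples, nodes):
    """Multiply each audio sample by its interpolated linear gain."""
    return [
        sample * db_to_linear(interpolate_gain_db(nodes, i))
        for i, sample in enumerate(samples)
    ]


# A ramp from 0 dB down to -20 dB across five samples.
out = apply_gain_nodes([1.0] * 5, [(0, 0.0), (4, -20.0)])
```

Because only the sparse nodes need to be transmitted, the gain metadata can stay small compared with the audio payload, which matches the low-bitrate intent described above.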
Figure 1 — Block diagram of a typical system with audio decoder and DRC tool modules to
achieve loudness normalization (LN) and dynamic range control
Figure 2 — Block diagram of a more complex system including downmixer and peak limiter
(TD = time-domain, SD = subband-domain)
6 DRC decoder
6.1 DRC decoder configuration
6.1.1 Overview
The DRC configuration information can be received in-stream using the static payloads uniDrcConfig()
and loudnessInfoSet() described below, or it can be delivered by a higher layer, such as ISO/IEC 14496-12
(see Table 1). The basic decoding process of the static information is virtually the same. The difference
consists mainly in a few syntax changes and reduced field sizes to increase the bit rate efficiency of the in-
stream configuration. The syntax of the in-stream static payload is given in 7.3. The associated metadata
encoding is given in A.6. The static DRC payload is evaluated once at the beginning of the decoding
process and it is monitored subsequently. For static DRC payload changes during playback see 6.8.
Table 1 — Overview of configuration (setup) and separate metadata track in ISO/IEC 14496-12
Track | Sample Entry Code | Setup (in sample entry) | Track reference | Sample format
Audio Track | As specified for the audio codec in use (unchanged) | DRCInstructions box using negative values for drcLocation | ‘adrc’ referring to the metadata tracks carrying gain values | As specified for the audio codec in use (unchanged)
Metadata Track | ‘unid’ | (none) | (none) | Each sample is a uniDrcGain() payload
The static payload is divided into five logical blocks:
— channelLayout();
— downmixInstructions();
— drcCoefficientsBasic(), drcCoefficientsUniDrc();
— drcInstructionsBasic(), drcInstructionsUniDrc();
— loudnessInfo().
Except for the channelLayout(), multiple instances of a logical block can appear. The DRC decoder
combines the information of the matching instances of up to five logical blocks for a given playback
scenario. Matching instances are found by matching several identifiers (labels) contained in the blocks.
From the static payload the decoder can also extract information about the effect of a particular DRC
and various associated loudness information, if present. If multiple DRCs are available, this information
can be used to select a particular DRC based on target criteria for dynamics and loudness (see 6.3).
uniDrcConfig() contains all blocks except for the loudnessInfo() blocks which are bundled in
loudnessInfoSet(). The last part of the uniDrcConfig() payload can include future extension payloads.
In the event that a uniDrcConfigExtType value is received that is not equal to UNIDRCCONFEXT_TERM,
the DRC tool parser must read and discard the bits (otherBit) of the extension payload. Similarly, the
last part of the loudnessInfoSet() payload can include future extension payloads. In the event that a
loudnessInfoSetExtType value is received that is not equal to UNIDRCLOUDEXT_TERM, the DRC tool
parser must read and discard the bits (otherBit) of the extension payload.
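The read-and-discard rule for unknown extension payloads can be sketched as a small parsing loop. This is a hedged illustration, not the normative syntax: the field widths (TYPE_BITS, SIZE_BITS), the numeric value of the terminator (EXT_TERM) and the helper names are assumptions made for this example. The actual bitstream syntax is defined in Clause 7 and is not reproduced here.

```python
class BitReader:
    """Reads bits from a byte string, most significant bit first."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (7 - (self.pos & 7))) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


# Assumed values for illustration only; see Clause 7 for the real syntax.
EXT_TERM = 0x0   # stands in for UNIDRCCONFEXT_TERM / UNIDRCLOUDEXT_TERM
TYPE_BITS = 4    # assumed width of the extension type field
SIZE_BITS = 8    # assumed width of the extension size field (in bits)


def skip_unknown_extensions(reader):
    """Consume extension payloads until the terminator type is read.

    Every extension is treated as unknown in this sketch: its bits
    (otherBit) are read and discarded so that parsing can continue
    with whatever follows the extension block.
    """
    skipped = []
    while True:
        ext_type = reader.read(TYPE_BITS)
        if ext_type == EXT_TERM:
            return skipped
        bit_size = reader.read(SIZE_BITS)
        for _ in range(bit_size):
            reader.read(1)  # otherBit: read and discard
        skipped.append((ext_type, bit_size))


# One unknown extension (type 3, four payload bits), then the terminator.
reader = BitReader(bytes([0x30, 0x4F, 0x00]))
skipped = skip_unknown_extensions(reader)
```

Discarding rather than rejecting unknown payloads lets a decoder built against this edition keep working when later editions add extension types.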
The top level fields of uniDrcConfig() include the audio sample rate, which is a fundamental parameter
for the decoding process (if not present, the audio sample rate is inherited from the employed audio
codec). Moreover, the top level fields of uniDrcConfig() include the number of instances of each of
the logical blocks, except for the channelLayout() block which appears only once. The top level fields
of loudnessInfoSet() only include the number of loudnessInfo() blocks. The five logical blocks are
described in the following.
6.1.2 Description of logical blocks
6.1.2.1 channelLayout()
The channelLayout() block includes the channel count of the audio signal in the base layout. It may
also include the base layout unless it is specified elsewhere. For use cases where the base audio signal
represents objects or other audio content, the channel count represents the total number of base
content channels.
6.1.2.2 downmixInstructions()
This block includes a unique non-zero downmix identifier (downmixId) that can be used externally to
refer to this downmix. The targetChannelCount specifies the number of channels after downmixing to
the target layout. It may also contain downmix coefficients, unless they are specified elsewhere. For
use cases where the base audio signal represents objects or other audio content, the downmixId can be
used to refer to a specific target channel configuration of a present rendering engine.
6.1.2.3 drcCoefficientsBasic(), drcCoefficientsUniDrc()
A drcCoefficients block describes all available DRC gain sequences in one location. The block can have
the basic format or the uniDrc format. The basic format, drcCoefficientsBasic(), contains a subset of
information included in drcCoefficientsUniDrc() that can be used to describe DRCs other than the ones
specified in this standard. drcCoefficientsUniDrc() contains for each sequence several indicators on
how it is encoded, the time resolution, time alignment, the number of DRC sub-bands and corresponding
crossover frequencies and DRC characteristics. The crossover frequencies must increase with
increasing band index. Alternatively, explicit indices in a decoder sub-band domain can be specified
for the assignment of DRC sub-bands. The sub-band indices must also increase with increasing band
index. If the DRC gains are applied in the time-domain by using the multi-band DRC filter bank specified
in 6.4.11, explicit index signalling is not allowed. The index of the DRC characteristic indicates which
compression characteristic was used to produce the gain sequence. The DRC location describes where
these gain sequences can be found in the bitstream. The DRC gain sequences in that location are
inherently enumerated according to their order of appearance starting with 1.
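Two of the constraints in this paragraph lend themselves to a simple check: band boundaries (crossover frequencies or sub-band indices) increase with the band index, and gain sequences are numbered in order of appearance starting with 1. The following is a minimal sketch with hypothetical function names, and it interprets "must increase" as strictly increasing.

```python
def is_strictly_increasing(values):
    """True if every value is greater than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


def enumerate_gain_sequences(sequences):
    """Map each gain sequence to its index, counted from 1 in order of appearance."""
    return {index: seq for index, seq in enumerate(sequences, start=1)}


# Illustrative normalized crossover frequencies for a three-band DRC.
crossovers_ok = is_strictly_increasing([0.01, 0.05, 0.2])
crossovers_bad = is_strictly_increasing([0.05, 0.05, 0.2])

# Placeholder sequence labels; real sequences would hold decoded gain data.
indexed = enumerate_gain_sequences(["seqA", "seqB", "seqC"])
```

The same ordering check applies to explicit sub-band indices when those are signalled instead of crossover frequencies.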
The DRC location field encoding depends on the audio codec. A codec specification may include this
specification, and use values 1 – 4 to refer to codec-specific locations as indicated in Table 2. For
example, for AAC (ISO/IEC 14496-3), the codec-specific values of the DRC location field are encoded as
shown in Table 3.
Table 2 — Encoding of drcLocation for in-stream payload
drcLocation n Payload
0 reserved
1 Location 1 (Codec-specific use)
2 Location 2 (Codec-specific use)
3 Location 3 (Codec-specific use)
4 Location 4 (Codec-specific use)
n > 4 reserved
Table 3 — Codec-specific encoding of drcLocation for MPEG-4 Audio
drcLocation n Payload
1 uniDrc() (defined in Clause 7)
2 dyn_rng_sgn[i] / dyn_rng_ctl[i] in dynamic_range_info() (defined in ISO/IEC 14496-3:2009 subpart 4)
3 compression_value in MPEG4_ancillary_data() (defined in ISO/IEC 14496-3:2009/AMD 4:2013)
4 reserved
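For the in-stream case, the generic encoding of Table 2 combined with the MPEG-4 Audio entries of Table 3 amounts to a small lookup. The sketch below is illustrative only; the dictionary and function names are hypothetical. Treating negative values as an error is an assumption of this sketch, since the in-stream tables do not define them.

```python
# Payload descriptions copied from Table 3 (MPEG-4 Audio).
MPEG4_AUDIO_DRC_LOCATIONS = {
    1: "uniDrc()",
    2: "dyn_rng_sgn[i] / dyn_rng_ctl[i] in dynamic_range_info()",
    3: "compression_value in MPEG4_ancillary_data()",
}


def describe_in_stream_location(n):
    """Describe an in-stream drcLocation value for MPEG-4 Audio.

    Values 1 to 3 are codec-specific payloads. Value 0, value 4 and
    values above 4 are reserved.
    """
    if n < 0:
        # Assumption: negative values only occur in the file-format case.
        raise ValueError("negative drcLocation is not defined for in-stream payloads")
    return MPEG4_AUDIO_DRC_LOCATIONS.get(n, "reserved")


location_1 = describe_in_stream_location(1)
location_4 = describe_in_stream_location(4)
```

A decoder for another codec would supply its own table for values 1 to 4 while keeping the same reserved handling.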
The DRC frame size can optionally be specified. It must be provided if the DRC frame size deviates from
the default size specified in 6.4.2. If not specified, the default frame size is used.
The in-stream drcCoefficient syntax is given in Table 42 and Table 44. The syntax for the corresponding
block for ISO/IEC 14496-12 (ISO base media file format) is shown in Table 43 and Table 45. The
corresponding blocks carry essentially the same information. Values that are identically included in
both blocks are coded the same way except for drcLocation.
In ISO base media file format (see ISO/IEC 14496-12), for each codec that can be carried in MP4 files
and that also carries DRC information, there is a specific definition of how the location is coded, using
the DRC_location field (see Table 4). A negative value of DRC_location indicates that a DRC payload is in
an associated meta-data track. That track is the n-th linked via a track reference of type ‘adrc’ (audio
DRC) from the audio track, where n = abs(DRC_location), and the sample-entry type in the meta-data
track indicates in which format the coefficients are stored. Table 3 defines the specific entries of the
drcLocation field for AAC. Some example use cases are discussed in C.10.
If the uniDrc() payload is stored in a separate track in the ISO base media file format (ISO/IEC 14496-
12), then the track is a metadata track with the sample entry identifier ‘unid’ (uniDrc), with no required
boxes added to the sample entry. The time synchronization with the linked audio track is the same as if
the payload was in-stream.
Table 4 — Encoding of drcLocation for ISO/IEC 14496-12
drcLocation n Payload
n < 0 DRC payload located in |n|-th linked meta-data track
0 reserved
1 Location 1 (Codec-specific use)
2 Location 2 (Codec-specific use)
3 Location 3 (Codec-specific use)
4 Location 4 (Codec-specific use)
n > 4 reserved
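The file-format case in Table 4 differs from the in-stream case mainly through negative values, which select the |n|-th metadata track linked by an ‘adrc’ track reference. A minimal sketch of that resolution follows. The function name, the return format and the representation of track references as a plain list of track IDs are assumptions for illustration, not an ISO base media file format API.

```python
def resolve_isobmff_location(n, adrc_track_refs):
    """Resolve a drcLocation value found in an ISO base media file.

    n                drcLocation value (may be negative)
    adrc_track_refs  track IDs referenced from the audio track by
                     'adrc' track references, in order of appearance
    """
    if n < 0:
        index = abs(n)  # the n-th linked track, counted from 1
        if index > len(adrc_track_refs):
            raise LookupError("no 'adrc' track reference number %d" % index)
        return ("metadata_track", adrc_track_refs[index - 1])
    if 1 <= n <= 4:
        return ("codec_specific", n)
    return ("reserved", n)  # 0 and values above 4


# Hypothetical audio track that references two DRC metadata tracks.
kind, track_id = resolve_isobmff_location(-2, [101, 102])
```

In this example the value -2 resolves to the second referenced track, whose sample-entry type (for example ‘unid’) then indicates the format of the stored payload.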
6.1.2.4 drcInstructionsBasic(),
...