ISO/IEC 23003-4:2015
Information technology - MPEG audio technologies - Part 4: Dynamic Range Control
ISO/IEC 23003-4:2015 specifies technology for loudness and dynamic range control. ISO/IEC 23003-4:2015 is applicable to most MPEG audio technologies. It offers flexible solutions to efficiently support the widespread demand for technologies such as loudness normalization and dynamic range compression for various playback scenarios.
Technologies de l'information — Technologies audio MPEG — Partie 4: Contrôle de gamme dynamique
INTERNATIONAL STANDARD ISO/IEC 23003-4
First edition 2015-11-15
Information technology — MPEG
audio technologies —
Part 4:
Dynamic Range Control
Technologies de l’information — Technologies audio MPEG —
Partie 4: Contrôle de gamme dynamique
Reference number: ISO/IEC 23003-4:2015(E)
© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms, definitions and mnemonics
3.1 Terms
3.2 Mnemonics
4 Symbols (and abbreviated terms)
5 Technical overview
6 DRC decoder
6.1 DRC decoder configuration
6.1.1 Overview
6.1.2 Description of logical blocks
6.1.3 Derivation of peak and loudness values
6.2 Dynamic DRC gain payload
6.3 DRC set selection
6.3.1 Overview
6.3.2 Pre-selection based on Signal Properties and Decoder Configuration
6.3.3 Selection based on requests
6.3.4 Final selection
6.3.5 Applying multiple DRC sets
6.3.6 Album mode
6.3.7 Ducking
6.3.8 Precedence
6.4 Time domain DRC application
6.4.1 Overview
6.4.2 Framing
6.4.3 Time resolution
6.4.4 Time alignment
6.4.5 Decoding
6.4.6 Gain modifications and interpolation
6.4.7 Spline interpolation
6.4.8 Look-ahead in decoder
6.4.9 Node reservoir
6.4.10 Applying the compression
6.4.11 Multi-band DRC filter bank
6.5 Sub-band domain DRC
6.6 Loudness normalization
6.6.1 Overview
6.6.2 Loudness normalization based on target loudness
6.7 DRC in streaming scenarios
6.7.1 DRC configuration
6.7.2 Error handling
6.8 DRC configuration changes during active processing
7 Syntax
7.1 Syntax of DRC payload
7.2 Syntax of DRC gain payload
7.3 Syntax of static DRC payload
7.4 Syntax of DRC gain sequence
Annex A (normative) Tables
Annex B (normative) External Interface to DRC tool
Annex C (informative) Audio codec specific information
Annex D (informative) DRC gain generation and encoding
Annex E (informative) DRC set selection and adjustment at decoder
Annex F (informative) Loudness normalization
Annex G (informative) Peak limiter
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the meaning of ISO specific terms and expressions related to conformity
assessment, as well as information about ISO’s adherence to the WTO principles in the Technical
Barriers to Trade (TBT), see the following URL: Foreword — Supplementary information.
The committee responsible for this document is ISO/IEC JTC 1, Information Technology, Subcommittee
SC 29, Coding of audio, picture, multimedia, and hypermedia.
ISO/IEC 23003 consists of the following parts, under the general title Information technology — MPEG
audio technologies:
— Part 1: MPEG Surround
— Part 2: Spatial Audio Object Coding
— Part 3: Unified speech and audio coding
— Part 4: Dynamic Range Control
Introduction
Consumer audio systems and devices are used in a large variety of configurations and acoustical
environments. For many of these scenarios, the audio reproduction quality can be improved by
appropriate control of content dynamics and loudness.
This part of ISO/IEC 23003 provides a universal dynamic range control tool that supports loudness
normalization. The DRC tool offers a bitrate efficient representation of dynamically compressed
versions of an audio signal. This is achieved by adding a low-bitrate DRC metadata stream to the audio
signal. The DRC tool includes dedicated sections for clipping prevention, ducking, and for generating a
fade-in and fade-out to supplement the main dynamic range compression functionality. The DRC effects
available at the DRC decoder are generated at the DRC encoder side. At the DRC decoder side, the audio
signal may be played back without applying the DRC tool, or an appropriate DRC tool effect is selected
and applied based on the given playback scenario.
INTERNATIONAL STANDARD ISO/IEC 23003-4:2015(E)
Information technology — MPEG audio technologies —
Part 4:
Dynamic Range Control
1 Scope
This part of ISO/IEC 23003 specifies technology for loudness and dynamic range control. This
International Standard is applicable to most MPEG audio technologies. It offers flexible solutions
to efficiently support the widespread demand for technologies such as loudness normalization and
dynamic range compression for various playback scenarios.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12: ISO base
media file format
ISO/IEC 23001-8, Information technology — MPEG systems technologies — Part 8: Coding-independent
code points
3 Terms, definitions and mnemonics
For the purposes of this document, the terms and definitions given in ISO/IEC 14496-12 and the
following apply.
3.1 Terms
3.1.1
DRC sequence
series of DRC gain values that can be applied to one or more audio channels
3.1.2
DRC set
defined set of DRC sequences that produce a desired effect if applied to the audio signal
3.1.3
album
collection of audio recordings that are mastered in a consistent way. For example, a collection of songs
released on a Compact Disc traditionally belongs to this category
3.2 Mnemonics
bslbf  bit string, left bit first, where “left” is the order in which bit strings are written in ISO/IEC 14496. Bit strings are written as a string of 1s and 0s within single quote marks, for example ‘1000 0001’. Blanks within a bit string are for ease of reading and have no significance
uimsbf  unsigned integer, most significant bit first
vlclbf  variable length code, left bit first, where “left” refers to the order in which the variable length codes are written
bit(n)  a bit string with n bits in the same format as bslbf
unsigned int(n)  an unsigned integer with n bits in the same format as uimsbf
signed int(n)  a signed integer with n bits, most significant bit first
4 Symbols (and abbreviated terms)
a_i  Filter coefficient
b  Band index of DRC filter bank (starting at 0)
b_i  Filter coefficient
deltaTmin  Smallest permitted DRC gain sample interval in units of the audio sample interval.
f_c  Cross-over frequency in Hz
f_{c,norm}  Cross-over frequency expressed as fraction of the audio sample rate.
f_{c,norm,SB}(s)  Cross-over frequency of audio decoder sub-band s expressed as fraction of the audio sample rate. The cross-over frequency is the upper band edge frequency of the sub-band.
f_s  Audio sample rate in Hz. If an audio decoder is present, it is the sample rate of the decoded time-domain audio signal.
N_{DRC}  Maximum permitted number of DRC samples per DRC frame. Identical to the number of intervals with a duration of deltaTmin per DRC frame.
N_{Codec}  Codec frame size in units of the audio sample interval 1/f_s
M_{DRC}  DRC frame size in units of the audio sample interval 1/f_s
π  Ratio of a circle’s circumference to its diameter
s  Audio decoder sub-band index (starting at 0)
TRUE/FALSE  Values of Boolean data type, which correspond to numerical 1 and 0, respectively.
z  Complex variable of the z-transform
5 Technical overview
The technology described in this part of ISO/IEC 23003 is called the DRC tool. It provides efficient control
of dynamic range, loudness, and clipping based on metadata generated at the encoder. The decoder can
choose to selectively apply the metadata to the audio signal to achieve a desired result. Metadata for
dynamic range compression consists of encoded time-varying gain values that can be applied to the audio
signal. Hence, the main blocks of the DRC tool include a DRC gain encoder, a DRC gain decoder, a DRC gain
modification block, and a DRC gain application block. These blocks are exercised on a frame-by-frame
basis during audio processing. Various DRC configurations can be conveyed in a separate bitstream
element, such as configurations for a downmix or combined DRCs. The DRC set selection block decides
based on the playback scenario and the applicable DRC configurations which DRC gains to apply to the
audio signal. Moreover, the DRC tool supports loudness normalization based on loudness metadata.
A typical system for loudness and dynamic range control in the time domain is shown in Figure 1. A
more complex system including downmixer and peak limiter is shown in Figure 2. The decoder part
of the DRC tool is driven by metadata that efficiently represents the DRC gain samples and parameters
for interpolation. The gain samples can be updated as fast as necessary to accurately represent gain
changes down to at least 1 ms update intervals. In the following, the decoder part of the DRC tool is
referred to as “DRC decoder”, which includes everything except the audio decoder and associated
bitstream de-multiplexing.
Figure 1 — Block diagram of a typical system with audio decoder and DRC tool modules to
achieve loudness normalization (LN) and dynamic range control
Figure 2 — Block diagram of a more complex system including downmixer and peak limiter
(TD = time-domain, SD = subband-domain)
6 DRC decoder
6.1 DRC decoder configuration
6.1.1 Overview
The DRC configuration information can be received in-stream using the static payloads uniDrcConfig()
and loudnessInfoSet() described below, or it can be delivered by a higher layer, such as ISO/IEC 14496-12
(see Table 1). The basic decoding process of the static information is virtually the same in both cases. The
difference consists mainly of a few syntax changes and reduced field sizes to increase the bit rate efficiency
of the in-stream configuration. The syntax of the in-stream static payload is given in 7.3. The associated
metadata encoding is given in A.6. The static DRC payload is evaluated once at the beginning of the decoding
process and it is monitored subsequently. For static DRC payload changes during playback see 6.8.
Table 1 — Overview of configuration (setup) and separate metadata track in ISO/IEC 14496-12
Track | Sample entry code | Setup (in sample entry) | Track reference | Sample format
Audio track | As specified for the audio codec in use (unchanged) | DRCInstructions box using negative values for drcLocation | ‘adrc’ referring to the metadata tracks carrying gain values | As specified for the audio codec in use (unchanged)
Metadata track | ‘unid’ | (none) | (none) | Each sample is a uniDrcGain() payload
The static payload is divided into five logical blocks:
— channelLayout();
— downmixInstructions();
— drcCoefficientsBasic(), drcCoefficientsUniDrc();
— drcInstructionsBasic(), drcInstructionsUniDrc();
— loudnessInfo().
Except for the channelLayout(), multiple instances of a logical block can appear. The DRC decoder
combines the information of the matching instances of up to five logical blocks for a given playback
scenario. Matching instances are found by matching several identifiers (labels) contained in the blocks.
From the static payload the decoder can also extract information about the effect of a particular DRC
and various associated loudness information, if present. If multiple DRCs are available, this information
can be used to select a particular DRC based on target criteria for dynamics and loudness (see 6.3).
uniDrcConfig() contains all blocks except for the loudnessInfo() blocks which are bundled in
loudnessInfoSet(). The last part of the uniDrcConfig() payload can include future extension payloads.
In the event that a uniDrcConfigExtType value is received that is not equal to UNIDRCCONFEXT_TERM,
the DRC tool parser must read and discard the bits (otherBit) of the extension payload. Similarly, the
last part of the loudnessInfoSet() payload can include future extension payloads. In the event that a
loudnessInfoSetExtType value is received that is not equal to UNIDRCLOUDEXT_TERM, the DRC tool
parser must read and discard the bits (otherBit) of the extension payload.
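The read-and-discard rule for unknown extension payloads can be illustrated with a minimal sketch in C. It is not the normative parser: the BitReader type, the function names, the terminator value EXT_TERM, and all field widths (a 4-bit type code, a 4-bit size-length field, and the +4 and +1 offsets) are assumptions chosen for illustration only. The normative syntax is given in 7.3.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; not part of the standard. */
typedef struct {
    const uint8_t *data;
    size_t sizeBits;   /* number of valid bits in data */
    size_t posBits;    /* current read position in bits */
} BitReader;

/* Read n (at most 32) bits, MSB first. Returns 0 on success, -1 on overrun. */
int read_bits(BitReader *br, unsigned n, uint32_t *value)
{
    uint32_t v = 0;
    if (n > 32 || br->posBits + n > br->sizeBits) return -1;
    for (unsigned i = 0; i < n; i++) {
        size_t p = br->posBits++;
        v = (v << 1) | ((br->data[p >> 3] >> (7 - (p & 7))) & 1u);
    }
    *value = v;
    return 0;
}

#define EXT_TERM 0u   /* assumed value of the terminating type code */

/*
 * Skip every extension payload until the terminator is read.
 * Assumed layout per extension: type (4 bits), bitSizeLen (4 bits),
 * bitSize (bitSizeLen + 4 bits), then bitSize + 1 payload bits (otherBit).
 */
int skip_extensions(BitReader *br)
{
    for (;;) {
        uint32_t extType, bitSizeLen, bitSize;
        if (read_bits(br, 4, &extType)) return -1;
        if (extType == EXT_TERM) return 0;
        if (read_bits(br, 4, &bitSizeLen)) return -1;
        if (read_bits(br, bitSizeLen + 4, &bitSize)) return -1;
        size_t extBitSize = (size_t)bitSize + 1;
        if (br->posBits + extBitSize > br->sizeBits) return -1;
        br->posBits += extBitSize;   /* discard the payload bits unread */
    }
}
```

Because the payload length is read before the payload itself, a parser built this way can step over extension types it does not know and still find the terminator, which is what allows future extensions to be added without breaking existing decoders.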
The top level fields of uniDrcConfig() include the audio sample rate, which is a fundamental parameter
for the decoding process (if not present, the audio sample rate is inherited from the employed audio
codec). Moreover, the top level fields of uniDrcConfig() include the number of instances of each of
the logical blocks, except for the channelLayout() block which appears only once. The top level fields
of loudnessInfoSet() only include the number of loudnessInfo() blocks. The five logical blocks are
described in the following.
6.1.2 Description of logical blocks
6.1.2.1 channelLayout()
The channelLayout() block includes the channel count of the audio signal in the base layout. It may
also include the base layout unless it is specified elsewhere. For use cases where the base audio signal
represents objects or other audio content, the channel count represents the total number of base
content channels.
6.1.2.2 downmixInstructions()
This block includes a unique non-zero downmix identifier (downmixId) that can be used externally to
refer to this downmix. The targetChannelCount specifies the number of channels after downmixing to
the target layout. It may also contain downmix coefficients, unless they are specified elsewhere. For
use cases where the base audio signal represents objects or other audio content, the downmixId can be
used to refer to a specific target channel configuration of a present rendering engine.
6.1.2.3 drcCoefficientsBasic(), drcCoefficientsUniDrc()
A drcCoefficients block describes all available DRC gain sequences in one location. The block can have
the basic format or the uniDrc format. The basic format, drcCoefficientsBasic(), contains a subset of
information included in drcCoefficientsUniDrc() that can be used to describe DRCs other than the ones
specified in this standard. drcCoefficientsUniDrc() contains for each sequence several indicators on
how it is encoded, the time resolution, time alignment, the number of DRC sub-bands and corresponding
crossover frequencies and DRC characteristics. The crossover frequencies must increase with
increasing band index. Alternatively, explicit indices in a decoder sub-band domain can be specified
for the assignment of DRC sub-bands. The sub-band indices must also increase with increasing band
index. If the DRC gains are applied in the time-domain by using the multi-band DRC filter bank specified
in 6.4.11, explicit index signalling is not allowed. The index of the DRC characteristic indicates which
compression characteristic was used to produce the gain sequence. The DRC location describes where
these gain sequences can be found in the bitstream. The DRC gain sequences in that location are
inherently enumerated according to their order of appearance starting with 1.
The DRC location field encoding depends on the audio codec. A codec specification may define this
encoding and use values 1 to 4 to refer to codec-specific locations as indicated in Table 2. For
example, for AAC (ISO/IEC 14496-3), the codec-specific values of the DRC location field are encoded as
shown in Table 3.
Table 2 — Encoding of drcLocation for in-stream payload
drcLocation n Payload
0 Reserved
1 Location 1 (Codec-specific use)
2 Location 2 (Codec-specific use)
3 Location 3 (Codec-specific use)
4 Location 4 (Codec-specific use)
n > 4 reserved
Table 3 — Codec-specific encoding of drcLocation for MPEG-4 Audio
drcLocation n Payload
1 uniDrc() (defined in Clause 7)
2 dyn_rng_sgn[i] / dyn_rng_ctl[i] in dynamic_range_info()
(defined in ISO/IEC 14496-3:2009 subpart 4)
3 compression_value in MPEG4_ancillary_data()
(defined in ISO/IEC 14496-3:2009/AMD 4:2013)
4 reserved
The DRC frame size can optionally be specified. It must be provided if the DRC frame size deviates from
the default size specified in 6.4.2. If not specified, the default frame size is used.
The in-stream drcCoefficients syntax is given in Table 42 and Table 44. The syntax for the corresponding
block for ISO/IEC 14496-12 (ISO base media file format) is shown in Table 43 and Table 45. The
corresponding blocks carry essentially the same information. Values that are identically included in
both blocks are coded the same way except for drcLocation.
In ISO base media file format (see ISO/IEC 14496-12), for each codec that can be carried in MP4 files
and that also carries DRC information, there is a specific definition of how the location is coded, using
the DRC_location field (see Table 4). A negative value of DRC_location indicates that a DRC payload is in
an associated meta-data track. That track is the n-th linked via a track reference of type ‘adrc’ (audio
DRC) from the audio track, where n = abs(DRC_location), and the sample-entry type in the meta-data
track indicates in which format the coefficients are stored. Table 3 defines the specific entries of the
drcLocation field for AAC. Some example use cases are discussed in C.10.
If the uniDrc() payload is stored in a separate track in the ISO base media file format (ISO/IEC 14496-12),
then the track is a metadata track with the sample entry identifier ‘unid’ (uniDrc), with no required
boxes added to the sample entry. The time synchronization with the linked audio track is the same as if
the payload was in-stream.
Table 4 — Encoding of drcLocation for ISO/IEC 14496-12
drcLocation n Payload
n < 0 DRC payload located in |n|-th linked meta-data track
0 reserved
1 Location 1 (Codec-specific use)
2 Location 2 (Codec-specific use)
3 Location 3 (Codec-specific use)
4 Location 4 (Codec-specific use)
n > 4 reserved
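The interpretation of drcLocation in Table 4 can be sketched in C as follows. This is an illustration only: the type and function names are hypothetical and do not appear in this part of ISO/IEC 23003.

```c
#include <stdlib.h>

/* Hypothetical classification of a drcLocation value per Table 4. */
typedef enum {
    DRC_LOC_METADATA_TRACK,  /* n < 0: payload in the |n|-th linked 'adrc' track */
    DRC_LOC_CODEC_SPECIFIC,  /* 1..4: codec-specific location */
    DRC_LOC_RESERVED         /* 0 or n > 4 */
} DrcLocationKind;

typedef struct {
    DrcLocationKind kind;
    int index;  /* track-reference index or codec-specific location; 0 if reserved */
} DrcLocationInfo;

DrcLocationInfo classify_drc_location(int drcLocation)
{
    DrcLocationInfo info = { DRC_LOC_RESERVED, 0 };
    if (drcLocation < 0) {
        info.kind = DRC_LOC_METADATA_TRACK;
        info.index = abs(drcLocation);      /* n = abs(DRC_location) */
    } else if (drcLocation >= 1 && drcLocation <= 4) {
        info.kind = DRC_LOC_CODEC_SPECIFIC;
        info.index = drcLocation;
    }
    return info;
}
```

For example, a value of -2 would select the second metadata track linked from the audio track by a track reference of type ‘adrc’.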
6.1.2.4 drcInstructionsBasic(), drcInstructionsUniDrc()
A drcInstructions block includes information about one specific DRC set that can be applied to
achieve a desired effect. This block can have the basic format or the uniDrc format. The basic format,
drcInstructionsBasic(), contains a subset of information included in drcInstructionsUniDrc() that can
be used to describe DRCs other than the ones specified in this standard. The information included in
drcInstructionsUniDrc() consists mainly of pre-defined description elements such as the DRC set effect
and the DRC gain sequences that are applied. The drcSetEffect field contains several effect bits as listed
in Table A.32. Multiple bits can be set unless otherwise noted. Note that if no effect bit is set at all,
the DRC set is ignored in the DRC set selection (see 6.3). Each drcInstructions block carries a unique
non-zero identifier drcSetId. A downmixId is included to indicate if this DRC set applies to a certain
downmix with this identifier. A downmixId of zero indicates that the DRC set is applied to the base
layout. A downmixId of 0x7F indicates that the DRC set can be applied before or after the downmix.
Since such a DRC can be applied to any downmix, it has only one channel group including all channels.
If a “Ducking” bit is set in the drcSetEffect field, the DRC set is applied before any downmix specified
by the downmix ID, i.e. the DRC set is always applied to the base layout and the downmix is generated
thereafter. The downmixId 0x7F is not permitted for a ducking DRC set. In all other cases, the DRC set is
applied to the channel configuration indicated by the downmix ID.
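The rules above for where a DRC set is applied relative to the downmix can be summarized in a short C sketch. The enumeration and function names are hypothetical, and the ducking property is passed in as a flag because the bit position within drcSetEffect is defined in Table A.32 and is not reproduced here.

```c
/* Hypothetical result type: where the DRC set is applied. */
typedef enum {
    APPLY_BASE_LAYOUT,              /* applied to the base layout */
    APPLY_BEFORE_OR_AFTER_DOWNMIX,  /* downmixId 0x7F: either position */
    APPLY_TO_DOWNMIX,               /* applied to the layout of the given downmix */
    APPLY_INVALID                   /* combination not permitted */
} DrcApplyPoint;

#define DOWNMIX_ID_BASE_LAYOUT 0x00
#define DOWNMIX_ID_ANY_DOWNMIX 0x7F

DrcApplyPoint drc_apply_point(int downmixId, int isDucking)
{
    if (isDucking) {
        /* Ducking is always applied to the base layout, before any downmix;
           0x7F is not permitted for a ducking DRC set. */
        return (downmixId == DOWNMIX_ID_ANY_DOWNMIX) ? APPLY_INVALID
                                                     : APPLY_BASE_LAYOUT;
    }
    if (downmixId == DOWNMIX_ID_BASE_LAYOUT) return APPLY_BASE_LAYOUT;
    if (downmixId == DOWNMIX_ID_ANY_DOWNMIX) return APPLY_BEFORE_OR_AFTER_DOWNMIX;
    return APPLY_TO_DOWNMIX;
}
```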
A second DRC set may be specified for certain configurations. These configurations include cases
where, e.g. one DRC set is used for dynamic range compression and the other for clipping prevention
(“Clipping” bit is set); or, e.g. one DRC set is applied before and the other after the downmix. In those
cases, the second DRC set contains a non-zero field dependsOnDrcSet that has the value of the drcSetId
of the first DRC set it depends on. The declared DRC set effects of the second DRC set do not take into
account the effects of the first DRC set. If the first DRC set is not designed to be used without combining
it with another DRC set, the noIndependentUse flag must be set to 1. In that case, the DRC set can only
be used in combination with another DRC set as indicated by the dependsOnDrcSet field of the other set
that is combined with it.
Usually, each audio channel is assigned to a DRC gain sequence. A collection of channels assigned to the
same DRC gain sequence is called “channel group”. The assignment of a DRC gain sequence to a channel
group is done in the order of first appearance of the sequence index when iterating through all channels
(see also Table 14). A DRC gain sequence index bsSequenceIndex == 0 indicates that the assigned channel
will be passed through by the DRC tool without processing unless otherwise noted. Note that
bsSequenceIndex is therefore effectively 1-based, whereas the corresponding indices (sequenceIndex) used for
processing are zero-based.
If subsequent channels are assigned the same sequence index, the field repeatSequenceCount indicates
how many channels will have the same sequence not including the first.
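The grouping of channels by gain sequence can be sketched in C as follows. This is a simplified illustration, assuming that the per-channel bsSequenceIndex values have already been expanded (including any repeatSequenceCount runs) into a plain array. The function name, the array layout, and MAX_CHANNELS are hypothetical.

```c
#define MAX_CHANNELS 32  /* illustrative limit, not taken from the standard */

/*
 * Derive channel groups from per-channel bsSequenceIndex values.
 * groupOfChannel[c] receives the 0-based group index of channel c,
 * or -1 if the channel is passed through (bsSequenceIndex == 0).
 * groupSequenceIndex[g] receives the zero-based sequenceIndex of group g.
 * Groups are numbered in the order in which their sequence first appears.
 * Returns the number of channel groups, or -1 on invalid input.
 */
int derive_channel_groups(const int *bsSequenceIndex, int channelCount,
                          int *groupOfChannel, int *groupSequenceIndex)
{
    int groupCount = 0;
    if (channelCount < 0 || channelCount > MAX_CHANNELS) return -1;
    for (int c = 0; c < channelCount; c++) {
        if (bsSequenceIndex[c] < 0) return -1;
        if (bsSequenceIndex[c] == 0) {       /* pass-through channel */
            groupOfChannel[c] = -1;
            continue;
        }
        int seq = bsSequenceIndex[c] - 1;    /* 1-based to zero-based */
        int g;
        for (g = 0; g < groupCount; g++) {
            if (groupSequenceIndex[g] == seq) break;
        }
        if (g == groupCount) {               /* first appearance: new group */
            groupSequenceIndex[groupCount++] = seq;
        }
        groupOfChannel[c] = g;
    }
    return groupCount;
}
```

With the per-channel indices {1, 1, 3, 0, 3, 2}, the sketch yields three channel groups: channels 0 and 1 share the first group, channels 2 and 4 the second, channel 5 forms the third, and channel 3 is passed through.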
The drcLocation field is used in the same way as the drcLocation field in the drcCoefficients (see 6.1.2.3).
Certain entries of the drcLocation field allow adding drcInstructions information to gain sequences
defined elsewhere. Some use cases are discussed in C.10.
The field limiterPeakTarget declares the peak target level used by the encoder-side DRC, if applicable.
For example, if a limiter is used to generate the DRC gain sequence, it is configured to control the audio
sample magnitude to not exceed this peak target level. limiterPeakTarget is represented in dBFS and
encoded according to Table A.27.
If limiterPeakTarget is present, and the only drcSetEffect is “clipping prevention”, the gain sequence is to
be shifted by the negative sum of loudnessNormalizationGainDb and limiterPeakTarget if the negative sum
is greater than 0. Afterwards, the gain sequence is saturated at the threshold of 0 dB so that only negative
gains (dB) occur. With this mechanism it is possible to send gains for clipping prevention in expectation
of a high loudnessNormalizationGainDb. If loudnessNormalizationGainDb is lower than expected, the gains
are applied only as far as needed, and the dynamic range can be kept as high as possible.
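The shift-and-saturate rule for clipping prevention gains can be illustrated with a minimal C sketch. The function name and the in-place array interface are assumptions. The sketch also assumes that the saturation at 0 dB is applied whether or not a shift took place.

```c
/*
 * Adjust clipping prevention gains (in dB) in place.
 * The gains are shifted up by -(loudnessNormalizationGainDb + limiterPeakTargetDb)
 * when that value is positive, then saturated at 0 dB.
 */
void adjust_clipping_gains_db(double *gainDb, int count,
                              double loudnessNormalizationGainDb,
                              double limiterPeakTargetDb)
{
    double shift = -(loudnessNormalizationGainDb + limiterPeakTargetDb);
    if (shift < 0.0) shift = 0.0;            /* only a positive shift is applied */
    for (int i = 0; i < count; i++) {
        double g = gainDb[i] + shift;
        gainDb[i] = (g > 0.0) ? 0.0 : g;     /* keep only non-positive gains */
    }
}
```

As a worked example, with limiterPeakTarget = -6 dBFS and loudnessNormalizationGainDb = 2 dB the shift is 4 dB, so a transmitted gain of -8 dB becomes -4 dB, while gains of -3 dB and 0 dB both saturate at 0 dB.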
If gainScalingPresent == 1, the gain scaling coefficients must be applied to the channel group. If
gainOffsetPresent == 1, the gain offset value must be applied to the channel group as shown in Table 16.
Similarly, if duckingScalingPresent == 1, the scaling factor must be applied to the associated ducking
gain sequence for that channel group.
The in-stream drcInstructions syntax is given in Table 46 and Table 48. The syntax for the
corresponding block for ISO/IEC 14496-12 is shown in Table 47 and Table 49. The corresponding blocks
carry essentially the same information. Values that are identically included in both blocks are coded the
same way except for drcLocation. Further information on the coding of drcLocation is defined in 6.1.2.3.
6.1.2.5 loudnessInfo()
A loudnessInfo() block includes loudness and peak information. A downmix identifier and DRC set
identifier indicate which configuration the information applies to. Hence, this block can be associated
with the audio signal without DRC and without downmix, or with any specific DRC and/or downmix
applied. If a DRC with a dependent DRC set is applied, the loudness information describes the output of
the combined DRCs. A loudnessInfo() block can either represent an individual content item or the entire
album. Typically, all content items of an album include identical album loudnessInfo() blocks.
If downmixId is zero, then loudnessInfo() applies to the base layout. If the drcSetId is zero, then
loudnessInfo() applies to the audio signal without DRC processing.
The fields samplePeakLevel and truePeakLevel represent the level of the maximum sample magnitude
in dBFS and the true peak in dBTP, respectively, of the associated audio content before or after audio
encoding as defined in Reference [4]. The measurementSystem field includes standardized systems
and others (see Table A.37). System 3 is defined as ITU-R BS.1770-3 with pre-processing. The pre-
processing is a high-pass filter that models the typical limited frequency response of portable device
loudspeakers. System 4 is defined as “User”. It means that the corresponding methodValue reflects a
(subjective) user preference. System 5 is defined as “Expert/Panel”. It means that the corresponding
methodValue represents a (subjective) expert or panel preference.
The methodDefinition field according to Table A.36 specifies how the methodValue is derived. The mixing
level is compatible with “mixlevel” in ATSC A/52 [1]. It indicates the absolute acoustic sound pressure
level of an individual channel during the final audio mixing session. The peak mixing level is the acoustic
level of a sine wave in a single channel whose peaks reach 100 percent in the PCM representation. The
absolute SPL value is typically measured by means of pink noise with an RMS value of −20 or −30 dB
with respect to the peak RMS sine wave level. The value of mixing level is not typically used within the
DRC tool, but may be used by other parts of the audio reproduction system.
The room type field is compatible with “roomtyp” in ATSC A/52 [1]. It indicates the type and calibration
of the mixing room used for the final audio mixing session. The value of roomtyp is not typically used by
the DRC tool, but may be used by other parts of the audio reproduction system.
The loudnessInfoSet() payload contains all loudnessInfo() blocks. The in-stream syntax of
loudnessInfoSet() is given in Table 37. For the ISO base media file format the slightly different syntax of
“LoudnessBox” is used as defined in ISO/IEC 14496-12.
6.1.3 Derivation of peak and loudness values
The loudnessInfo() blocks provide optional values that describe loudness and peak. Several DRC
decoder processes depend on these values. Hence, when the loudness information is partially or entirely
absent, fallback values are used as shown in Table 5. For peak values, a default value is used. Some
other values can be drawn from the loudnessInfo() block of the base layout.
Table 5 — Default and fallback values of loudnessInfo
Value | Default | 1st fallback: use value from loudnessInfo() of base layout with same drcSetId | 2nd fallback: use value from loudnessInfo() of the base layout without DRC
truePeakLevel | 0.0 | No | No
samplePeakLevel | 0.0 | No | No
programLoudness | Undefined | Yes | Yes
anchorLoudness | Undefined | Yes | Yes
loudnessRange | Undefined | No | No
Maximum loudness range | Undefined | No | No
Maximum momentary loudness | Undefined | No | No
Maximum short-term loudness | Undefined | No | No
Short-term loudness | Undefined | No | No
Mixing level | Undefined | Yes | Yes
Room type | Undefined | Yes | Yes
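The fallback order of Table 5 can be sketched in C for one of the values that permits both fallbacks, programLoudness. The LoudnessInfoEntry structure is a reduced, hypothetical view of a loudnessInfo() block, and the function names are not part of this part of ISO/IEC 23003.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a loudnessInfo() block. */
typedef struct {
    int drcSetId;
    int downmixId;
    bool programLoudnessPresent;
    double programLoudness;
} LoudnessInfoEntry;

/* Find an entry for the given identifiers that carries programLoudness. */
const LoudnessInfoEntry *find_program_loudness(const LoudnessInfoEntry *e, size_t n,
                                               int drcSetId, int downmixId)
{
    for (size_t i = 0; i < n; i++) {
        if (e[i].drcSetId == drcSetId && e[i].downmixId == downmixId &&
            e[i].programLoudnessPresent) {
            return &e[i];
        }
    }
    return NULL;
}

/* Returns true and writes *out if a value is found; false means "undefined". */
bool resolve_program_loudness(const LoudnessInfoEntry *e, size_t n,
                              int drcSetId, int downmixId, double *out)
{
    const LoudnessInfoEntry *hit = find_program_loudness(e, n, drcSetId, downmixId);
    if (!hit) hit = find_program_loudness(e, n, drcSetId, 0); /* 1st fallback: base layout, same DRC set */
    if (!hit) hit = find_program_loudness(e, n, 0, 0);        /* 2nd fallback: base layout, no DRC */
    if (!hit) return false;
    *out = hit->programLoudness;
    return true;
}
```

Values marked "No" in both fallback columns, such as loudnessRange, would use only the first lookup and otherwise remain undefined.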
The signalPeakLevel of a DRC set is determined as specified in Table 6, where peak related metadata
entries are selected dependent on their availability and dependent on the drcSetId, and the requested
downmixId. If no explicit peak information is available, signalPeakLevel is estimated from downmix
coefficients and others. The estimates based on downmix coefficients hold for passive downmixers and
might hold for specific active downmixers.
Table 6 — Determination of signalPeakLevel for a specific DRC set
getSignalPeakLevelForDrcSet(drcSetId, downmixIdRequested) {
    dmxId = downmixIdRequested;
    if (truePeakLevelIsPresent(drcSetId, dmxId)) {
        signalPeakLevel = getTruePeakLevel(drcSetId, dmxId);
    } else if (samplePeakLevelIsPresent(drcSetId, dmxId)) {
        signalPeakLevel = getSamplePeakLevel(drcSetId, dmxId);
    } else if (limiterPeakTargetIsPresent(drcSetId, dmxId)) {
        signalPeakLevel = getLimiterPeakTarget(drcSetId, dmxId);
    } else if (dmxId != 0) {
        /* no peak metadata for this downmix: estimate from the base layout */
        signalPeakLevelTmp = 0.0;
        downmixPeakLevelLinear = 0.0;
        if (downmixCoefficientsArePresent(dmxId)) {
            /* i: downmix output channel, j: base layout input channel;
               the loop bound names targetChannelCount and baseChannelCount
               are assumed */
            for (i=0; i < targetChannelCount; i++) {
                downmixPeakLevelLinearTmp = 0.0;
                for (j=0; j < baseChannelCount; j++) {
                    downmixPeakLevelLinearTmp +=
                        pow(10.0, getDownmixCoefficient(dmxId, i, j)/20.0);
                }
                if (downmixPeakLevelLinear < downmixPeakLevelLinearTmp) {
                    downmixPeakLevelLinear = downmixPeakLevelLinearTmp;
                }
            }
        }
        if (truePeakLevelIsPresent(drcSetId, 0)) {
            signalPeakLevelTmp = getTruePeakLevel(drcSetId, 0);
        } else if (samplePeakLevelIsPresent(drcSetId, 0)) {
            signalPeakLevelTmp = getSamplePeakLevel(drcSetId, 0);
        } else if (limiterPeakTargetIsPresent(drcSetId, 0)) {
            signalPeakLevelTmp = getLimiterPeakTarget(drcSetId, 0);
        }
        signalPeakLevel = signalPeakLevelTmp + 20.0*log10(downmixPeakLevelLinear);
    } else {
        signalPeakLevel = 0.0; /* worst case estimate */
    }
    return signalPeakLevel;
}
Table 6 uses functions that check the availability of peak-related information and retrieve it from
loudnessInfo() and from a drcInstructions block, which can have the basic or uniDrc format. Table 7 shows
pseudo code for the functions that handle truePeakLevel, limiterPeakTarget, and the downmix coefficients.
The functions for samplePeakLevel can be implemented by replacing truePeakLevel with samplePeakLevel.
Table 7 — Pseudo code for functions referenced in Table 6
truePeakLevelIsPresent(drcSetId, downmixId) {
    if (useAlbumMode == 1) count = loudnessInfoAlbumCount;
    else count = loudnessInfoCount;
    for (i=0; i<count; i++) {
        if ((loudnessInfo[i]->drcSetId == drcSetId) &&
            (loudnessInfo[i]->downmixId == downmixId)) {
            if (loudnessInfo[i]->truePeakLevelPresent) return TRUE;
        }
    }
    return FALSE;
}
getTruePeakLevel(drcSetId, downmixId) {
    if (useAlbumMode == 1) count = loudnessInfoAlbumCount;
    else count = loudnessInfoCount;
    for (i=0; i<count; i++) {
        if ((loudnessInfo[i]->drcSetId == drcSetId) &&
            (loudnessInfo[i]->downmixId == downmixId)) {
            if (loudnessInfo[i]->truePeakLevelPresent) {
                return (loudnessInfo[i]->truePeakLevel);
            }
        }
    }
    return error;
}
limiterPeakTargetIsPresent(drcSetId, downmixId) {
    for (i=0; i<…; i++) {    /* all drcInstructions blocks */
        if ((drcInstructions[i]->drcSetId == drcSetId) &&
            ((drcInstructions[i]->downmixId == downmixId) ||
             (drcInstructions[i]->downmixId == 0x7F))) {
            if (drcInstructions[i]->limiterPeakTargetPresent) return TRUE;
        }
    }
    return FALSE;
}
getLimiterPeakTarget(drcSetId, downmixId) {
    for (i=0; i<…; i++) {    /* all drcInstructions blocks */
        if ((drcInstructions[i]->drcSetId == drcSetId) &&
            ((drcInstructions[i]->downmixId == downmixId) ||
             (drcInstructions[i]->downmixId == 0x7F))) {
            if (drcInstructions[i]->limiterPeakTargetPresent) {
                return (drcInstructions[i]->limiterPeakTarget);
            }
        }
    }
    return error;
}
downmixCoefficientsArePresent(downmixId) {
    for (i=0; i<…; i++) {    /* all downmixInstructions blocks */
        if (downmixInstructions[i]->downmixId == downmixId) {
            if (downmixInstructions[i]->downmixCoefficientsPresent) return TRUE;
        }
    }
    return FALSE;
}
getDownmixCoefficient(downmixId, outChan, inChan) {
    for (i=0; i<…; i++) {    /* all downmixInstructions blocks */
        if (downmixInstructions[i]->downmixId == downmixId) {
            if (downmixInstructions[i]->downmixCoefficientsPresent) {
                return (downmixInstructions[i]->downmixCoefficient[outChan][inChan]);
            }
        }
    }
    return error;
}
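Each presence check in Table 7 scans the same list with the same condition as its getter, so an implementation might merge the pair into a single lookup that reports success through its return value. The C sketch below illustrates this for truePeakLevel. The struct layout, the function name `find_true_peak_level`, and the out-parameter convention are assumptions made for illustration and are not part of the standard.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of one loudnessInfo() entry,
 * limited to the fields that Table 7 reads. */
typedef struct {
    int    drcSetId;
    int    downmixId;
    bool   truePeakLevelPresent;
    double truePeakLevel;          /* dB */
} LoudnessInfo;

/* Looks up truePeakLevel for (drcSetId, downmixId).
 * Returns true and writes *level when a matching entry carries a true peak.
 * Returns false otherwise, which replaces the pseudo code's "return error". */
bool find_true_peak_level(const LoudnessInfo *info, size_t count,
                          int drcSetId, int downmixId, double *level)
{
    for (size_t i = 0; i < count; i++) {
        if (info[i].drcSetId == drcSetId &&
            info[i].downmixId == downmixId &&
            info[i].truePeakLevelPresent) {
            *level = info[i].truePeakLevel;
            return true;
        }
    }
    return false;
}
```

The caller passes either the regular or the album list together with its count, which corresponds to the useAlbumMode switch at the top of the Table 7 functions. A lookup for limiterPeakTarget would follow the same pattern, with the additional match on downmixId 0x7F shown in Table 7.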
6.2 Dynamic DRC gain payload
The dynamic gain sequences for all DRCs are received in-stream or via a metadata track using the
uniDrcGain() syntax given in Table 34. Each access unit contains gain sequences for the duration of
drcFrameSize samples that are decoded according to 6.4.5.
The last part of the uniDrcGain() can include future extension payloads. If a uniDrcGainExtType is
received that is different from UNIDRCGAINEXT_TERM, the extension payload (other
...
Frequently Asked Questions
ISO/IEC 23003-4:2015 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology - MPEG audio technologies - Part 4: Dynamic Range Control".
ISO/IEC 23003-4:2015 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 23003-4:2015 has the following relationships with other standards: it is linked to ISO/IEC 23003-4:2015/Amd 1:2017, ISO/IEC 23003-4:2015/Amd 2:2017, and ISO/IEC 23003-4:2020. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.