Information technology - MPEG systems technologies - Part 11: Energy-efficient media consumption (green metadata)

ISO/IEC 23001-11:2015 specifies metadata for energy-efficient decoding, encoding, presentation, and selection of media. The metadata for energy-efficient decoding specifies two sets of information: Complexity Metrics (CM) metadata and Decoding Operation Reduction Request (DOR-Req) metadata. A decoder uses CM metadata to vary operating frequency and thus reduce decoder power consumption. In a point-to-point video conferencing application, the remote encoder uses the DOR-Req metadata to modify the decoding complexity of the bitstream and thus reduce local decoder power consumption. The metadata for energy-efficient encoding specifies a quality metric that is used by a decoder to reduce the quality loss from low-power encoding. The metadata for energy-efficient presentation specifies RGB-component statistics and quality levels. A presentation subsystem uses this metadata to reduce power by adjusting display parameters, based on the statistics, to provide a desired quality level from those provided in the metadata. The metadata for energy-efficient media selection specifies Decoder Operation Reduction Ratios (DOR-Ratios), RGB-component statistics and quality levels. The client in an adaptive streaming session uses this metadata to determine decoder and display power-saving characteristics of available video Representations and to select the Representation with the optimal quality for a given power-saving.

Technologies de l'information — Technologies des systèmes MPEG — Partie 11: Consommation des supports éconergétiques (métadonnées vertes)

General Information

Status
Withdrawn
Publication Date
06-Jul-2015
Withdrawal Date
06-Jul-2015
Current Stage
9599 - Withdrawal of International Standard
Start Date
20-Mar-2019
Completion Date
30-Oct-2025
Ref Project

Relations

Standard
ISO/IEC 23001-11:2015 - Information technology -- MPEG systems technologies
English language
41 pages

Frequently Asked Questions

ISO/IEC 23001-11:2015 is a standard published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its full title is "Information technology - MPEG systems technologies - Part 11: Energy-efficient media consumption (green metadata)".

ISO/IEC 23001-11:2015 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.

ISO/IEC 23001-11:2015 is linked to the following standards: ISO/IEC 23001-11:2015/Amd 1:2016, ISO/IEC 23001-11:2015/Amd 2:2018, and ISO/IEC 23001-11:2019. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.


Standards Content (Sample)


INTERNATIONAL ISO/IEC
STANDARD 23001-11
First edition
2015-07-15
Information technology — MPEG
systems technologies —
Part 11:
Energy-efficient media consumption
(green metadata)
Technologies de l’information — Technologies des systèmes MPEG —
Partie 11: Consommation des supports éconergétiques (métadonnées
vertes)
© ISO/IEC 2015, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
ii © ISO/IEC 2015 – All rights reserved

Contents
Introduction
1 Scope
2 Normative references
3 Terms, definitions, symbols, abbreviated terms and conventions
3.1 Terms and definitions
3.2 Symbols and abbreviated terms
3.3 Conventions
3.3.1 Arithmetic operators
3.3.2 Mathematical functions
4 Functional architecture (Informative)
4.1 Description of the functional architecture
4.2 Definition of components in the functional architecture
5 Decoder power reduction
5.1 General
5.2 Complexity metrics for decoder-power reduction
5.2.1 General
5.2.2 Syntax
5.2.3 Signalling
5.2.4 Semantics
5.3 Interactive signalling for remote decoder-power reduction
5.3.1 General
5.3.2 Syntax
5.3.3 Signalling
5.3.4 Semantics
6 Display power reduction using display adaptation
6.1 General
6.2 Syntax
6.2.1 Systems without a signalling mechanism from the receiver to the transmitter
6.2.2 Systems with a signalling mechanism from the receiver to the transmitter
6.3 Signalling
6.3.1 Systems without a signalling mechanism from the receiver to the transmitter
6.3.2 Systems with a signalling mechanism from the receiver to the transmitter
6.4 Semantics
7 Energy-efficient media selection
7.1 General
7.2 Syntax
7.3 Signalling
7.4 Semantics
7.4.1 Decoder-power indication metadata semantics
7.4.2 Display-power indication metadata semantics
8 Metrics for quality recovery after low-power encoding
8.1 General
8.2 Syntax
8.3 Signalling
8.4 Semantics
Annex A (normative) Supplemental Enhancement Information (SEI) syntax
Annex B (normative) Implementation guidelines for the usage of Green Metadata

Introduction
This part of ISO/IEC 23001 specifies the metadata (Green Metadata) that facilitates reduction of energy
usage during media consumption as follows:
— the format of the metadata that enables reduced decoder power consumption;
— the format of the metadata that enables reduced display power consumption;
— the format of the metadata that enables media selection for joint decoder and display power
reduction;
— the format of the metadata that enables quality recovery after low-power encoding.
This metadata facilitates reduced energy usage during media consumption without any degradation in
the Quality of Experience (QoE). However, it is also possible to use this metadata to get larger energy
savings, but at the expense of some QoE degradation.

INTERNATIONAL STANDARD ISO/IEC 23001-11:2015(E)
Information technology — MPEG systems technologies —
Part 11:
Energy-efficient media consumption (green metadata)
1 Scope
This part of ISO/IEC 23001 specifies metadata for energy-efficient decoding, encoding, presentation,
and selection of media.
The metadata for energy-efficient decoding specifies two sets of information: Complexity Metrics
(CM) metadata and Decoding Operation Reduction Request (DOR-Req) metadata. A decoder uses CM
metadata to vary operating frequency and thus reduce decoder power consumption. In a point-to-point
video conferencing application, the remote encoder uses the DOR-Req metadata to modify the decoding
complexity of the bitstream and thus reduce local decoder power consumption.
The metadata for energy-efficient encoding specifies a quality metric that is used by a decoder to reduce
the quality loss from low-power encoding.
The metadata for energy-efficient presentation specifies RGB-component statistics and quality levels. A
presentation subsystem uses this metadata to reduce power by adjusting display parameters, based on
the statistics, to provide a desired quality level from those provided in the metadata.
The metadata for energy-efficient media selection specifies Decoder Operation Reduction Ratios
(DOR-Ratios), RGB-component statistics and quality levels. The client in an adaptive streaming session
uses this metadata to determine decoder and display power-saving characteristics of available video
Representations and to select the Representation with the optimal quality for a given power-saving.
2 Normative references
The following documents, in whole or in part, are normatively referenced in this document and are
indispensable for its application. For dated references, only the edition cited applies. For undated
references, the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 13818-1:2013, Information technology — Generic coding of moving pictures and associated audio
information — Part 1: Systems
ISO/IEC 14496-10, Information technology — Coding of audio-visual objects — Part 10: Advanced Video
Coding
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12: ISO base media file
format
ISO/IEC 23001-10, Information technology — MPEG systems technologies — Part 10: Carriage of Timed
Metadata Metrics of Media in ISO Base Media File
ISO/IEC 23009-1, Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1:
Media presentation description and segment formats
ISO/IEC 23009-1:2015/Amd 2:2015, Spatial relationship description, generalized URL parameters and
other extensions
ISO/IEC/TR 23009-3:2015, Information technology — Dynamic adaptive streaming over HTTP (DASH) —
Part 3: Implementation guidelines

3 Terms, definitions, symbols, abbreviated terms and conventions
For the purposes of this document, the following terms and definitions apply.
3.1 Terms and definitions
3.1.1
Adaptation Set
using the terms and definitions in ISO/IEC 23009-1, a set of interchangeable encoded versions of one or
several media content components
3.1.2
Alpha-Point Deblocking Instance
APDI
using the notation, terms, and definitions in ISO/IEC 14496-10, a single filtering operation that produces
either a single, filtered output $p'_0$ or a single, filtered output $q'_0$, where $p'_0$ and $q'_0$ are filtered
samples across a 4x4 block edge
3.1.3
bitstream
using the terms and definitions in ISO/IEC 14496-10, a sequence of bits that forms the representation of
coded pictures and associated data forming one or more coded video sequences
3.1.4
block
using the terms and definitions in ISO/IEC 14496-10, an MxN (M-column by N-row) array of samples or
an MxN array of transform coefficients
3.1.5
byte
using the terms and definitions in ISO/IEC 14496-10, a sequence of 8 bits, written and read with the
most significant bit on the left and the least significant bit on the right
3.1.6
chroma
using the terms and definitions in ISO/IEC 14496-10, an adjective specifying that a sample array or
single sample is representing one of the two colour difference signals relating to the primary colours
3.1.7
chroma_format_idc
using the notation, terms and definitions in ISO/IEC 14496-10, specifies the chroma sampling relative
to the luma sampling
3.1.8
decoded picture
using the terms and definitions in ISO/IEC 14496-10, a picture derived by decoding a coded picture
3.1.9
decoder
using the terms and definitions in ISO/IEC 14496-10, an embodiment of a decoding process
3.1.10
display process
using the terms and definitions in ISO/IEC 14496-10, a process that takes, as its input, the cropped
decoded pictures that are the output of the decoding process
3.1.11
encoder
using the terms and definitions in ISO/IEC 14496-10, an embodiment of an encoding process

3.1.12
frame
using the terms and definitions in ISO/IEC 14496-10, an array of luma samples in monochrome format
or an array of luma samples and two corresponding arrays of chroma samples in 4:2:0, 4:2:2, and 4:4:4
colour format
3.1.13
informative
term used to refer to content provided in this Recommendation | International Standard that is not an
integral part of this Recommendation | International Standard
3.1.14
intra coding
using the terms and definitions in ISO/IEC 14496-10, coding of a block, macroblock, slice or picture that
uses intra prediction
3.1.15
luma
using the terms and definitions in ISO/IEC 14496-10, an adjective specifying that a sample array or
single sample is representing the monochrome signal relating to the primary colours
3.1.16
macroblock
using the terms and definitions in ISO/IEC 14496-10, a 16x16 block of luma samples and two corresponding
blocks of chroma samples of a picture that has three sample arrays, or a 16x16 block of samples of a
monochrome picture or a picture that is coded using three separate colour planes
3.1.17
Media Presentation Description
MPD
using the terms and definitions in ISO/IEC 23009-1, a formalized description for a Media Presentation
for the purpose of providing a streaming service
3.1.18
No-Quality-Loss Operating Point
NQLOP
metadata-enabled operating point associated with the largest display-power reduction that can be
achieved without any quality loss (infinite PSNR)
3.1.19
non-zero macroblock
macroblock (3.1.16) containing at least one non-zero sample
3.1.20
note
term that is used to prefix informative (3.1.13) remarks (used exclusively in an informative context)
3.1.21
peak signal
maximum permissible RGB component (3.1.31) in a reconstructed frame
Note 1 to entry: For 8-bit video, peak signal is 255.
3.1.22
period
interval over which complexity-metrics metadata are applicable
3.1.23
Period
using the terms and definitions in ISO/IEC 23009-1, an interval of the Media Presentation, where a
contiguous sequence of all Periods constitutes the Media Presentation

3.1.24
PicSizeInMbs
using the notation, terms and definitions in ISO/IEC 14496-10, a variable that is derived as the product
of PicWidthInMbs and PicHeightInMbs
3.1.25
picture
using the terms and definitions in ISO/IEC 14496-10, a collective term for a field or a frame
3.1.26
pixel
smallest addressable element in an all-points addressable display device
3.1.27
prediction
using the terms and definitions in ISO/IEC 14496-10, an embodiment of the prediction process
3.1.28
reconstructed frames
frames obtained after applying RGB colour-space conversion and cropping to the specific decoded picture
(3.1.8) or pictures (3.1.25) for which display power-reduction metadata are applicable
3.1.29
Representation
using the terms and definitions in ISO/IEC 23009-1, a collection and encapsulation of one or more media
streams in a delivery format and associated with descriptive metadata
3.1.30
RGB colour space
colour space based on the red, green, and blue primaries
3.1.31
RGB component
single sample representing one of the three primary colours of the RGB colour space (3.1.30)
3.1.32
Segment
using the terms and definitions in ISO/IEC 23009-1, a unit of data associated with an HTTP-URL and
optionally a byte range that are specified by an MPD
3.1.33
separate_colour_plane_flag
using the notation, terms, and definitions in ISO/IEC 14496-10, a flag that, when set, specifies that the
three colour components of the 4:4:4 chroma format are coded separately
3.1.34
shall
term used to express mandatory requirements for conformance to this Recommendation | International
Standard
3.1.35
should
term used to refer to behaviour of an implementation that is encouraged to be followed under anticipated
ordinary circumstances, but is not a mandatory requirement for conformance to this Recommendation
| International Standard
3.1.36
Six-Tap Filtering
STF
indicates a single application of the 6-tap filter, defined in ISO/IEC 14496-10, to generate a single filtered
sample

3.1.37
source
using the terms and definitions in ISO/IEC 14496-10, a term used to describe some of the video material
or some of its attributes before encoding
3.2 Symbols and abbreviated terms
For the purpose of this document, the symbols and abbreviated terms given in the following apply:
APDI Alpha-Point Deblocking Instance
ASIC Application Specific Integrated Circuit
AVC Advanced Video Coding
CM Complexity Metric
CMOS Complementary Metal Oxide Semiconductor
CPU Central Processing Unit
DASH Dynamic Adaptive Streaming over HTTP
DOR-Ratio Decoding Operation Reduction Ratio
DOR-Req Decoding Operation Reduction Request
DVFS Dynamic Voltage Frequency Scaling
FS Fresh Start
GP Good Picture
MPD Media Presentation Description
MSD Mean Square Difference
MV Motion Vector
NQLOP No-Quality-Loss Operating Point
PSNR Peak Signal to Noise Ratio
QoE Quality of Experience
RBLL Remaining Battery Life Level
RGB Red, Green, Blue
SEI Supplemental Enhancement Information
SP Start Picture
STF Six-Tap Filtering
XSD Cross-Segment Decoding
3.3 Conventions
3.3.1 Arithmetic operators
+ Addition
- Subtraction (as a two-argument operator) or negation (as a unary prefix operator)
* Multiplication
$x^y$ Exponentiation
x/y Division where no truncation or rounding is intended
$\frac{x}{y}$ Division where no truncation or rounding is intended
$\sum_{i=x}^{y} f(i)$ Summation of f(i) with i taking all integer values from x up to and including y
3.3.2 Mathematical functions
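Mathematical functions in this Technical Specification are defined as follows:
$$\text{Abs}(x) = \begin{cases} x, & x \geq 0 \\ -x, & x < 0 \end{cases} \tag{3-1}$$
$$\text{Clip}(x) = \begin{cases} x, & x < 256 \\ 255, & \text{otherwise} \end{cases} \tag{3-2}$$
Floor(x) is the greatest integer less than or equal to x (3-3)
Log10(x) returns the base-10 logarithm of x (3-4)
Round(x) = Sign(x) * Floor(Abs(x) + 0.5) (3-5)
$$\text{Sign}(x) = \begin{cases} -1, & x < 0 \\ 1, & 0 \leq x \end{cases} \tag{3-6}$$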
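The functions above map directly onto a few lines of code. The following is a minimal Python sketch, not part of the standard; the names `sign`, `clip` and `spec_round` are illustrative. It shows one practical point: Round as defined in (3-5) rounds halves away from zero, which differs from Python's built-in `round`, which rounds halves to the nearest even integer.

```python
import math


def sign(x):
    """Sign(x) as in (3-6): -1 for x < 0, otherwise 1."""
    return -1 if x < 0 else 1


def clip(x):
    """Clip(x) as in (3-2): x when x < 256, otherwise 255."""
    return x if x < 256 else 255


def spec_round(x):
    """Round(x) as in (3-5): Sign(x) * Floor(Abs(x) + 0.5)."""
    return sign(x) * math.floor(abs(x) + 0.5)


if __name__ == "__main__":
    # Halves are rounded away from zero by (3-5); Python's round() gives 2 here.
    print(spec_round(2.5), spec_round(-2.5), round(2.5))  # prints: 3 -3 2
```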

4 Functional architecture (Informative)
This clause is informative and placed here to provide context.
4.1 Description of the functional architecture
Figure 1 shows the functional architecture utilizing Green Metadata in this Technical Specification. The
media pre-processor is applied to analyse and to filter the content source and a video encoder is used to
encode the content to a bitstream for delivery. The bitstream is delivered to the receiver and decoded
by a video decoder with the output rendered on a presentation subsystem that implements a display
process.

[Figure 1: block diagram. Transmitter: media framework (encoder) with media pre-processor and media encoder, Green Metadata generator, power optimization module with power control, Green feedback extractor. Receiver: media framework (decoder) with media decoder and presentation subsystem, Green Metadata extractor, power optimization module with power control, Green feedback generator. Encoded media and Green Metadata pass from transmitter to receiver; Green Feedback passes from receiver to transmitter. Green Metadata and Green Feedback are labelled Normative.]
Figure 1 — Functional architecture
The Green Metadata is extracted from either the media encoder or the media pre-processor. In both cases,
the Green Metadata is multiplexed or encapsulated in the conformant bitstream. Such Green Metadata
is used at the receiver to reduce the power consumption for video decoding and presentation. The
bitstream will be packetized and delivered to the receiver for decoding and presentation. At the receiver,
the metadata extractor processes the packets and sends the Green Metadata to a power optimization
module for efficient power control. For instance, the power optimization module interprets the Green
Metadata and then applies appropriate operations to reduce the video decoder’s power consumption
when decoding the video and also to reduce the presentation subsystem’s power consumption when
rendering the video. In addition, the power-optimization module could collect receiver information, such
as remaining battery capacity, and send it to the transmitter as green feedback to adapt the encoder
operations for power-consumption reduction.
The normative aspect of this document is limited to the Green Metadata and Green Feedback in
Figure 1.
4.2 Definition of components in the functional architecture
Green Metadata generator
— Generates metadata from either the video encoder or the content pre-processor.
Green Metadata extractor
— Interprets the bitstream syntax information and sends it to the power optimization module in the
receiver.
Green feedback generator
— Generates feedback information for the transmitter.
— Communicates with the transmitter through a feedback channel, if available, for energy-efficient
processing.

Green feedback extractor
— Receives the feedback from the receiver and sends it to the power optimization module in the
transmitter.
Power optimization module in the transmitter
— Collects platform statistics such as the remaining battery capacity of the device in which the
transmitter resides.
— Controls the operation of the Green Metadata generator, video encoder and content pre-processor.
— Processes green feedback.
Power optimization module in the receiver
— Processes the green-metadata information and applies appropriate operations for power-
consumption control.
— Collects platform statistics such as remaining battery capacity of the device in which the receiver
resides.
— Sends requests to Green feedback generator.
5 Decoder power reduction
5.1 General
Energy-efficient decoding is achieved with two types of metadata: Complexity Metrics (CMs) metadata
and Decoding Operation Reduction Request (DOR-Req) metadata. A decoder may use CMs metadata
to vary operating frequency and thus reduce decoder power consumption. In a point-to-point video
conferencing application, the remote encoder may use the DOR-Req metadata to modify the decoding
complexity of the bitstream and thus reduce local decoder power consumption.
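As an illustration of how a decoder might turn CMs into an operating frequency, the sketch below picks the lowest clock that covers an estimated workload. Everything platform-specific here is an assumption made for the example and does not come from this document: the linear workload model, the per-mille weights, the list of operating points and the worst-case clock are all hypothetical calibration values. Only the four metric names and their 0 to 255 range follow the CM syntax in 5.2.2.

```python
# Hypothetical calibration: per-mille share of each metric in the decoder workload.
WEIGHTS_PER_MILLE = {
    "percent_non_zero_macroblocks": 300,
    "percent_intra_coded_macroblocks": 200,
    "percent_six_tap_filterings": 300,
    "percent_alpha_point_deblocking_instances": 200,
}
# Hypothetical DVFS operating points in MHz, in ascending order.
OPERATING_POINTS_MHZ = (300, 600, 900, 1200)
# Hypothetical clock needed when every metric is at its maximum value (255).
WORST_CASE_MHZ = 1200


def required_mhz(cm):
    """Estimate the clock needed for the upcoming period (assumed linear model)."""
    weighted = sum(WEIGHTS_PER_MILLE[name] * cm[name] for name in WEIGHTS_PER_MILLE)
    full_scale = 1000 * 255  # all weights (summing to 1000) times the metric maximum
    # Ceiling division in integer arithmetic, so the estimate is never rounded down.
    return -((-weighted * WORST_CASE_MHZ) // full_scale)


def select_frequency(cm):
    """Return the lowest operating point that covers the estimated requirement."""
    need = required_mhz(cm)
    for mhz in OPERATING_POINTS_MHZ:
        if mhz >= need:
            return mhz
    return OPERATING_POINTS_MHZ[-1]


if __name__ == "__main__":
    light = {name: 63 for name in WEIGHTS_PER_MILLE}
    heavy = {name: 255 for name in WEIGHTS_PER_MILLE}
    print(select_frequency(light), select_frequency(heavy))
```

A real implementation would calibrate the model against measured decoding times on the target platform; the point of the sketch is only that the metadata arrives before the period it describes, so the frequency can be chosen ahead of decoding.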
5.2 Complexity metrics for decoder-power reduction
5.2.1 General
With respect to the functional architecture in Figure 1, the green-metadata generator provides CMs that
indicate the picture-decoding complexity of an AVC bitstream to the decoder.

5.2.2 Syntax
The syntax for the CMs is as follows:
                                              Size (bits)   Descriptor
period_type                                   8             unsigned integer
if (period_type == 2) {
    num_seconds                               16            unsigned integer
}
else if (period_type == 3) {
    num_pictures                              16            unsigned integer
}
percent_non_zero_macroblocks                  8             unsigned integer
percent_intra_coded_macroblocks               8             unsigned integer
percent_six_tap_filterings                    8             unsigned integer
percent_alpha_point_deblocking_instances      8             unsigned integer
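The field order and sizes in the syntax above can be read with a small parser. The Python sketch below is illustrative only. It assumes the fields are packed back to back on byte boundaries and that the 16-bit fields are stored most significant byte first; it also parses only the CM fields themselves, not the surrounding SEI payload specified in Annex A. The class and function names are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComplexityMetrics:
    period_type: int
    num_seconds: Optional[int]
    num_pictures: Optional[int]
    percent_non_zero_macroblocks: int
    percent_intra_coded_macroblocks: int
    percent_six_tap_filterings: int
    percent_alpha_point_deblocking_instances: int


def parse_complexity_metrics(buf: bytes) -> ComplexityMetrics:
    """Parse the CM fields in syntax order (assumed byte-aligned, big-endian)."""
    pos = 0
    period_type = buf[pos]
    pos += 1
    num_seconds = num_pictures = None
    if period_type == 2:
        (num_seconds,) = struct.unpack_from(">H", buf, pos)
        pos += 2
    elif period_type == 3:
        (num_pictures,) = struct.unpack_from(">H", buf, pos)
        pos += 2
    # The four 8-bit metrics always follow, whatever the period type.
    non_zero, intra, stf, apdi = struct.unpack_from(">4B", buf, pos)
    return ComplexityMetrics(period_type, num_seconds, num_pictures,
                             non_zero, intra, stf, apdi)


if __name__ == "__main__":
    # period_type 3 with num_pictures = 30, followed by the four metrics.
    print(parse_complexity_metrics(bytes([3, 0x00, 0x1E, 128, 25, 200, 90])))
```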
5.2.3 Signalling
SEI messages can be used to signal Green Metadata in an AVC stream. The Green Metadata SEI message
payload type is specified in ISO/IEC 14496-10:2014/Amd. 2. The complete syntax of the Green Metadata
SEI message payload is specified in Annex A.
The message containing the CMs is transmitted at the start of an upcoming period. The next message
containing CMs will be transmitted at the start of the next upcoming period. Therefore, when the
upcoming period is a picture or the interval up to the next I-slice, a message will be transmitted for each
picture or interval, respectively. However, when the upcoming period is a specified time interval or a
specified number of pictures, the associated message will be transmitted with the first picture in the
time interval or with the first picture in the specified number of pictures.
5.2.4 Semantics
The semantics of various terms are defined below.
period_type – specifies the type of upcoming period over which the four complexity metrics are
applicable and is defined in the following table.
Value       Description
0x00        complexity metrics are applicable to a single picture
0x01        complexity metrics are applicable to all pictures in decoding order, up to (but not including) the picture containing the next I slice
0x02        complexity metrics are applicable over a specified time interval in seconds
0x03        complexity metrics are applicable over a specified number of pictures counted in decoding order
0x04–0xFF   user-defined
num_seconds – when period_type is 2, num_seconds indicates the number of seconds over which the
complexity metrics are applicable.
num_pictures – when period_type is 3, num_pictures specifies the number of pictures, counted in
decoding order, over which the complexity metrics are applicable.

num_pics_in_period – specifies the number of pictures in the specified period. When period_type
is 0, then num_pics_in_period is 1. When period_type is 1, then num_pics_in_period is determined by
counting the pictures in decoding order up to (but not including) the one containing the next I slice.
When period_type is 2, then num_pics_in_period is determined from the frame rate. When period_type
is 3, then num_pics_in_period is equal to num_pictures.
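The four cases for num_pics_in_period can be written as a single function. This is a hedged Python sketch with hypothetical parameter names. The text says only that, for period_type 2, the count is "determined from the frame rate"; multiplying num_seconds by the frame rate and rounding to the nearest integer is an assumption made here, not a rule stated in this document. For period_type 1 the count depends on the bitstream, so the caller must supply it.

```python
def num_pics_in_period(period_type, *, num_seconds=None, num_pictures=None,
                       frame_rate=None, pics_before_next_i_slice=None):
    """Return the number of pictures covered by one set of complexity metrics."""
    if period_type == 0:
        return 1
    if period_type == 1:
        # Counted in decoding order, excluding the picture with the next I slice.
        if pics_before_next_i_slice is None:
            raise ValueError("period_type 1 needs a count taken from the bitstream")
        return pics_before_next_i_slice
    if period_type == 2:
        # Assumption: pictures = seconds * frames per second, rounded to nearest.
        if num_seconds is None or frame_rate is None:
            raise ValueError("period_type 2 needs num_seconds and frame_rate")
        return round(num_seconds * frame_rate)
    if period_type == 3:
        if num_pictures is None:
            raise ValueError("period_type 3 needs num_pictures")
        return num_pictures
    raise ValueError("period_type values 0x04 to 0xFF are user-defined")


if __name__ == "__main__":
    print(num_pics_in_period(2, num_seconds=2, frame_rate=25))  # prints: 50
```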
total_num_macroblocks_pic(i) – set to the value of the AVC variable PicSizeInMbs for the i-th picture
within the specified period, where 1 <= i <= num_pics_in_period.
total_num_macroblocks_in_period – indicates the total number of macroblocks that are coded in the
specified period. Determined by the following computation:
$$\text{total\_num\_macroblocks\_in\_period} = \sum_{i=1}^{\text{num\_pics\_in\_period}} \text{total\_num\_macroblocks\_pic}(i) \tag{5-1}$$
num_intra_coded_macroblocks – indicates the number of intra-coded macroblocks in the specified
period.
percent_intra_coded_macroblocks – indicates the percentage of intra-coded macroblocks in the
specified period and is defined as follows:
$$\text{percent\_intra\_coded\_macroblocks} = \text{Floor}\left(\frac{\text{num\_intra\_coded\_macroblocks}}{\text{total\_num\_macroblocks\_in\_period}} * 255\right) \tag{5-2}$$
num_non_zero_macroblocks – indicates the number of non-zero macroblocks in the specified period.
percent_non_zero_macroblocks – indicates the percentage of non-zero macroblocks in the specified
period and is defined as follows:
$$\text{percent\_non\_zero\_macroblocks} = \text{Floor}\left(\frac{\text{num\_non\_zero\_macroblocks}}{\text{total\_num\_macroblocks\_in\_period}} * 255\right) \tag{5-3}$$
num_six_tap_filterings – indicates the number of Six-Tap Filterings (STFs), as defined in
ISO/IEC 14496-10, within the specified period.
max_num_six_tap_filterings_pic(i) – indicates the maximum number of STFs that could occur
in the i-th picture within the specified period, where 1 <= i <= num_pics_in_period. Set to the value
(1664 * PicSizeInMbs), where PicSizeInMbs is the value of the corresponding AVC variable for the i-th
picture.
max_num_six_tap_filterings_in_period – indicates the maximum number of STFs that could occur
within the specified period. Determined by the following computation:
$$\text{max\_num\_six\_tap\_filterings\_in\_period} = \sum_{i=1}^{\text{num\_pics\_in\_period}} \text{max\_num\_six\_tap\_filterings\_pic}(i) \tag{5-4}$$
percent_six_tap_filterings – indicates the percentage of STFs in the specified period and is defined as
follows:
$$\text{percent\_six\_tap\_filterings} = \text{Floor}\left(\frac{\text{num\_six\_tap\_filterings}}{\text{max\_num\_six\_tap\_filterings\_in\_period}} * 255\right) \tag{5-5}$$
num_alpha_point_deblocking_instances – indicates the number of Alpha-Point Deblocking Instances
(APDIs) in the specified period. Using the notation in ISO/IEC 14496-10, this is equivalent to the total
number of filtering operations applied to produce filtered samples of the type $p'_0$ or $q'_0$ in the
specified period.
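Equations (5-2), (5-3) and (5-5) share one form: a count divided by its maximum, scaled to the range 0 to 255 and floored. The Python sketch below illustrates that shared form with hypothetical function names. It uses integer arithmetic, (count * 255) // maximum, which equals Floor(count / maximum * 255) for non-negative integers and avoids floating-point rounding. The example assumes a period of two 1920x1088 pictures, each with 120 * 68 = 8160 macroblocks.

```python
def scaled_ratio(count, maximum):
    """Floor(count / maximum * 255), computed exactly with integers."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    if not 0 <= count <= maximum:
        raise ValueError("count must lie between 0 and maximum")
    return (count * 255) // maximum


def max_num_six_tap_filterings_in_period(pic_sizes_in_mbs):
    """Equation (5-4): sum of 1664 * PicSizeInMbs over the pictures in the period."""
    return sum(1664 * size for size in pic_sizes_in_mbs)


if __name__ == "__main__":
    pic_sizes = [8160, 8160]           # assumed example period, two pictures
    total_mbs = sum(pic_sizes)         # equation (5-1): 16320 macroblocks
    # A quarter of the macroblocks intra coded: Floor(0.25 * 255) = 63.
    print(scaled_ratio(4080, total_mbs))
    # Half of the possible six-tap filterings performed: Floor(0.5 * 255) = 127.
    max_stf = max_num_six_tap_filterings_in_period(pic_sizes)
    print(scaled_ratio(max_stf // 2, max_stf))
```

Because the result is floored, a value of 255 is reached only when the count equals its maximum.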

max_num_alpha_point_deblocking_instances_pic(i) – indicates the maximum number of APDIs that
could occur in the i-th picture within the specified period, where 1 <= i <= num_pics_in_period. Set as
follows:
max_num_alpha_point_deblocking_instances_pic(i) = 128 * chroma_format_multiplier * PicSizeInMbs (5-6)
where chroma_format_multiplier depends on the AVC variables separate_colour_plane_flag and
chroma_format_idc as shown in the following table.
chroma_format_multiplier   separate_colour_plane_flag   chroma_format_idc   Comment
1                          1                            any value           separate colour plane
1                          0                            0                   monochrome
1.5                        0                            1                   4:2:0 sampling
2                          0                            2                   4:2:2 sampling
3                          0                            3                   4:4:4 sampling
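The table lookup and equation (5-6) combine into a short calculation. The Python sketch below is illustrative and its function names are hypothetical. It uses `fractions.Fraction` for the multiplier so that the 1.5 case stays exact; since 128 * 1.5 = 192, the result is always an integer.

```python
from fractions import Fraction

# chroma_format_idc -> multiplier, used when separate_colour_plane_flag is 0.
_MULTIPLIER_BY_CHROMA_FORMAT = {
    0: Fraction(1),      # monochrome
    1: Fraction(3, 2),   # 4:2:0 sampling
    2: Fraction(2),      # 4:2:2 sampling
    3: Fraction(3),      # 4:4:4 sampling
}


def chroma_format_multiplier(separate_colour_plane_flag, chroma_format_idc):
    """Look up the multiplier from the table that follows equation (5-6)."""
    if separate_colour_plane_flag == 1:
        return Fraction(1)  # separate colour planes, any chroma_format_idc
    return _MULTIPLIER_BY_CHROMA_FORMAT[chroma_format_idc]


def max_apdi_pic(pic_size_in_mbs, separate_colour_plane_flag, chroma_format_idc):
    """Equation (5-6): 128 * chroma_format_multiplier * PicSizeInMbs."""
    multiplier = chroma_format_multiplier(separate_colour_plane_flag,
                                          chroma_format_idc)
    return int(128 * multiplier * pic_size_in_mbs)


if __name__ == "__main__":
    # One 4:2:0 picture of 8160 macroblocks: 192 * 8160 = 1566720.
    print(max_apdi_pic(8160, 0, 1))
```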
max_num_alpha_point_deblocking_instances_in_period – indicates the maximum number of APDIs
that could occur within the specified period. Determined by the following computation:
$$\text{max\_num\_alpha\_point\_deblocking\_instances\_in\_period} = \sum_{i=1}^{\text{num\_pics\_in\_period}} \text{max\_num\_alpha\_point\_deblocking\_instances\_pic}(i) \tag{5-7}$$
percent_alpha_point_deblocking_instances – indicates the percentage of APDIs in the specified
period and is defined as follows:
$$\text{percent\_alpha\_point\_deblocking\_instances} = \text{Floor}\left(\frac{\text{num\_alpha\_point\_deblocking\_instances}}{\text{max\_num\_alpha\_point\_deblocking\_instances\_in\_period}} * 255\right) \tag{5-8}$$
5.3 Interactive signalling for remote decoder-power reduction
5.3.1 General
For point-to-point video conferencing, each device contains a transmitter and a receiver. A local device
sends metadata that instructs the remote device to modify the decoding complexity of the bitstream
and thus reduce local decoder-power consumption.
5.3.2 Syntax
The syntax is as follows:
Size (bits) Descriptor
dec_ops_reduction_req 8 signed integer
5.3.3 Signalling
The transmitter in each device sends a dec_ops_reduction_req (DOR-Req) message to the attention of
the remote encoder. This message requests the remote encoder to adjust its encoding parameters so
that ideally, when the local decoder decodes the bitstream, the power saving of the local decoder will
match the power saving implied by the DOR-Req message.

5.3.4 Semantics
dec_ops_reduction_req – the requested percentage reduction of local decoding operations relative
to the local decoding operations since the last dec_ops_reduction_req was sent to the transmitter, or
since the start of the video session, if no earlier dec_ops_reduction_req was sent. The percentage will
be expressed as a signed integer. A negative percentage means an increase of decoding operations.
dec_ops_reduction_req is an integer in the interval [-100, 100].
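A minimal sketch of how a device might range-check and serialise the 8-bit dec_ops_reduction_req field is shown below. The helper names are hypothetical, and the two's-complement byte representation is an assumption, since this clause only states that the field is an 8-bit signed integer.

```python
import struct


def encode_dor_req(percent: int) -> bytes:
    """Pack dec_ops_reduction_req as one signed byte after range-checking."""
    if not -100 <= percent <= 100:
        raise ValueError("dec_ops_reduction_req must lie in [-100, 100]")
    # "b" is a signed 8-bit integer (two's complement, assumed here).
    return struct.pack("b", percent)


def decode_dor_req(payload: bytes) -> int:
    """Unpack one signed byte and reject values outside [-100, 100]."""
    (percent,) = struct.unpack("b", payload[:1])
    if not -100 <= percent <= 100:
        raise ValueError("received value outside [-100, 100]")
    return percent
```

With these helpers, a request for a 30 % reduction is encoded as the byte 0x1E, and a request for a 25 % increase (value -25) as 0xE7.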
6 Display power reduction using display adaptation
6.1 General
With respect to the functional architecture, Display Adaptation (DA) provides Green Metadata comprising
RGB-component statistics and quality indicators. The statistics are used to set display controls in the
presentation subsystem so that desired quality levels and corresponding display power reductions are
attained.
6.2 Syntax
6.2.1 Systems without a signalling mechanism from the receiver to the transmitter
The following message format is used to send metadata from the transmitter to the receiver:
                                                    Size (bits)   Descriptor
num_constant_backlight_voltage_time_intervals       2             unsigned integer
num_max_variations                                  2             unsigned integer
num_quality_levels                                  4             unsigned integer
for (j = 0; j < num_max_variations; j++) {
    max_variation[j]                                8             unsigned integer
}
for (k = 0; k < num_constant_backlight_voltage_time_intervals; k++) {
    constant_backlight_voltage_time_interval[k]     16            unsigned integer
    for (j = 0; j < num_max_variations; j++) {
        lower_bound[k][j]                           8             unsigned integer
        if (lower_bound[k][j] > 0) {
            upper_bound[k][j]                       8             unsigned integer
        }
        rgb_component_for_infinite_psnr[k][j]       8             unsigned integer
        for (i = 1; i <= num_quality_levels; i++) {
            max_rgb_component[k][j][i]              8             unsigned integer
            scaled_psnr_rgb[k][j][i]                8             unsigned integer
        }
    }
}
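The loop structure of this message can be illustrated with a small parser. The sketch below is not a normative decoder: the function name and the returned dictionary layout are hypothetical, and it assumes that the three leading fields are packed most-significant-bit first into one byte and that 16-bit fields are big-endian, neither of which is stated in this clause.

```python
import struct


def parse_da_message(buf: bytes) -> dict:
    """Parse the 6.2.1 message layout (assumed MSB-first, big-endian)."""
    head = buf[0]
    num_intervals = head >> 6               # 2 bits
    num_max_variations = (head >> 4) & 0x3  # 2 bits
    num_quality_levels = head & 0xF         # 4 bits
    pos = 1
    max_variation = list(buf[pos:pos + num_max_variations])
    pos += num_max_variations
    intervals = []
    for _k in range(num_intervals):
        (interval_ms,) = struct.unpack_from(">H", buf, pos)
        pos += 2
        variations = []
        for _j in range(num_max_variations):
            lower_bound = buf[pos]
            pos += 1
            upper_bound = None
            if lower_bound > 0:  # upper_bound is only present in this case
                upper_bound = buf[pos]
                pos += 1
            rgb_inf = buf[pos]
            pos += 1
            levels = []
            for _i in range(num_quality_levels):
                levels.append({"max_rgb_component": buf[pos],
                               "scaled_psnr_rgb": buf[pos + 1]})
                pos += 2
            variations.append({"lower_bound": lower_bound,
                               "upper_bound": upper_bound,
                               "rgb_component_for_infinite_psnr": rgb_inf,
                               "levels": levels})
        intervals.append({"constant_backlight_voltage_time_interval": interval_ms,
                          "variations": variations})
    return {"max_variation": max_variation,
            "num_quality_levels": num_quality_levels,
            "intervals": intervals}
```

Because upper_bound is conditional on lower_bound, the message length cannot be derived from the three counters alone; a parser has to walk the fields in order, as above.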

6.2.2 Systems with a signalling mechanism from the receiver to the transmitter
The receiver first uses the following message format to signal information to the transmitter:
                                               Size (bits)   Descriptor
constant_backlight_voltage_time_interval       16            unsigned integer
max_variation                                  8             unsigned integer
The transmitter then uses the message format shown below to signal metadata to the receiver:
                                               Size (bits)   Descriptor
num_quality_levels                             4             unsigned integer
rgb_component_for_infinite_psnr                8             unsigned integer
for (i = 1; i <= num_quality_levels; i++) {
    max_rgb_component[i]                       8             unsigned integer
    scaled_psnr_rgb[i]                         8             unsigned integer
}
6.3 Signalling
6.3.1 Systems without a signalling mechanism from the receiver to the transmitter
Green Metadata can be carried as specified in ISO/IEC 13818-1:2013/Amd 3 or it can be carried in metadata
tracks within the ISO Base Media File Format (ISO/IEC 14496-12), as specified in ISO/IEC 23001-10.
Using the format in 6.2.1, the transmitter sends a message to the receiver. The DA metadata is applicable
to the presentation subsystem until the next message containing DA metadata arrives.
6.3.2 Systems with a signalling mechanism from the receiver to the transmitter
Using the first message format described in 6.2.2, the receiver first signals
constant_backlight_voltage_time_interval and max_variation to the transmitter. The transmitter then uses
the second message format in 6.2.2 to send a message to the receiver. The DA metadata is applicable to the
presentation subsystem until the next message containing DA metadata arrives.
6.4 Semantics
num_constant_backlight_voltage_time_intervals – the number of constant backlight/voltage time
intervals for which metadata is provided in the bitstream.
constant_backlight_voltage_time_interval[k] – the minimum time interval, in milliseconds, that
must elapse before the backlight can be updated after the last backlight update. This is the k-th minimum
time interval for which metadata is provided in the bitstream, where
0 <= k < num_constant_backlight_voltage_time_intervals.
num_max_variations – the number of maximum variations for which metadata is provided in the
bitstream.
max_variation[j] – the maximal change between backlight values of two successive frames relative to
the backlight value of the earlier frame. The backlight value for a frame is the value of
backlight_scaling_factor[k][j][i] for that frame. max_variation is in the range [0,001, 0,1] and is
normalized to one byte by rounding after multiplying by 2 048. This is the j-th maximal backlight change
for which metadata is provided in the bitstream, where 0 <= j < num_max_variations.
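The one-byte normalisation of max_variation can be sketched as follows. The function names are hypothetical, and round-half-up is an assumed rounding mode, since the text only says "rounding". The range limits 0,001 and 0,1 are written with decimal points in the code.

```python
import math


def normalize_max_variation(value: float) -> int:
    """Map max_variation in [0.001, 0.1] to one byte: round(value * 2048)."""
    if not 0.001 <= value <= 0.1:
        raise ValueError("max_variation must lie in [0.001, 0.1]")
    # Round half up (assumed); the result always fits in 8 bits.
    return math.floor(value * 2048 + 0.5)


def denormalize_max_variation(code: int) -> float:
    """Approximate inverse of the normalisation above."""
    return code / 2048
```

Under these assumptions the end points of the range map to 2 and 205 respectively, so every valid value fits in the 8-bit max_variation field.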
num_quality_levels – the number of quality levels that are enabled by the metadata, excluding the
NQLOP.

max_rgb_component[k][j][i] – for the k-th constant_backlight_voltage_time_interval, j-th max_variation
and i-th quality level, the maximum RGB component (as defined in 3.1) that will be retained in the frames,
where 1 <= i <= num_quality_levels. Note that
max_rgb_component[k][j][0] = rgb_component_for_infinite_psnr[k][j].
scaled_frames[k][j][i] – for the k-th constant_backlight_voltage_time_interval, j-th max_variation
and i-th quality level, the frames obtained from the reconstructed frames by saturating to
max_rgb_component[k][j][i] all RGB components that are greater than max_rgb_component[k][j][i], where
0 <= i <= num_quality_levels.
rgb_component_for_infinite_psnr[k][j] – for the k-th constant_backlight_voltage_time_interval and j-th
max_variation, the largest RGB component (as defined in 3.1) in the reconstructed frames. Therefore,
scaled_frames[k][j][0] are identical to the reconstructed frames. The
rgb_component_for_infinite_psnr[k][j] defines a No-Quality-Loss Operating Point (NQLOP) and consequently
scaled_frames[k][j][0] will have a PSNR of infinity relative to the reconstructed frames.
scaled_psnr_rgb[k][j][i] – the PSNR of scaled_frames[k][j][i] relative to the reconstructed frames. This
PSNR is defined as follows:
$$
\text{scaled\_psnr\_rgb}[k][j][i] = \mathrm{Clip}\left(\mathrm{Round}\left(10 \log_{10}\left(\frac{\text{peaksignal} \cdot \text{width} \cdot \text{height} \cdot N_{\text{colour}} \cdot N_{\text{frames}}}{\sum_{n=1}^{N_{\text{frames}}} \sum_{c=1}^{N_{\text{colour}}} \sum_{l=X_s+1}^{\text{peaksignal}} N_{c,n}(l) \cdot (l - X_s)^2}\right)\right)\right) \tag{6-1}
$$
for 0 < i <= num_quality_levels,
where
width is the width of a video frame;
height is the height of a video frame;
$N_{\text{colour}}$ is the number of colour channels. For RGB colourspace, $N_{\text{colour}} = 3$;
$N_{\text{frames}}$ is the number of frames in the reconstructed frames;
$N_{c,n}(l)$ is the number of RGB components that are set to $l$ in the n-th frame of colour-channel c in
reconstructed frames;
$X_s$ is max_rgb_component[k][j][i].
Note that scaled_psnr_rgb[k][j][0] is associated with the NQLOP. It is not transmitted, but understood to
be mathematically infinite.
backlight_scaling_factor[k][j][i] – max_rgb_component[k][j][i]/peak signal, for the k-th
constant_backlight_voltage_time_interval, j-th max_variation and i-th quality level.
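The relationship between max_rgb_component, scaled_frames and backlight_scaling_factor can be sketched in a few lines. The frame representation (rows of (R, G, B) tuples), the function names and the default peak signal of 255 for 8-bit video are all assumptions made for this example.

```python
def scale_frame(frame, max_rgb_component):
    """Saturate every RGB component above max_rgb_component.

    `frame` is assumed to be a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[tuple(min(c, max_rgb_component) for c in pixel) for pixel in row]
            for row in frame]


def backlight_scaling_factor(max_rgb_component, peak_signal=255):
    """max_rgb_component / peak signal; 255 assumes 8-bit components."""
    return max_rgb_component / peak_signal
```

For instance, with max_rgb_component = 204 and a peak signal of 255, the scaling factor is 0.8, and any component above 204 in the reconstructed frame is clipped to 204.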
lower_bound[k][j] – if lower_bound[k][j] is greater than zero, then metadata for contrast enhancement
is available at the lowest quality level, for the k-th constant_backlight_voltage_time_interval and j-th
max_variation. If lower_bound[k][j] = 0, then contrast-enhancement metadata is unavailable.
upper_bound[k][j] – for the k-th constant_backlight_voltage_time_interval and j-th max_variation, if
lower_bound[k][j] is greater than zero, then contrast enhancement is performed as follows: All RGB
components of reconstructed frames that are less than or equal to lower_bound[k][j] are set to zero and
all RGB components that are greater than or equal to upper_bound[k][j] are saturated to peak signal.
The RGB components in the range (lower_bound[k][j], upper_bound[k][j]) are mapped linearly onto the
range (0, peak signal).
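The contrast-enhancement mapping described for lower_bound and upper_bound can be illustrated per component as follows. The function name, the default peak signal of 255 and the rounding of the linearly mapped value to the nearest integer are assumptions; the text only specifies a linear mapping.

```python
def contrast_enhance(component: int, lower_bound: int, upper_bound: int,
                     peak_signal: int = 255) -> int:
    """Apply the lower_bound/upper_bound mapping to one RGB component."""
    if component <= lower_bound:
        return 0
    if component >= upper_bound:
        return peak_signal
    # Here lower_bound < component < upper_bound, so the divisor is positive.
    # Linear map of (lower_bound, upper_bound) onto (0, peak_signal).
    scaled = (component - lower_bound) * peak_signal / (upper_bound - lower_bound)
    return round(scaled)  # rounding to an integer is an assumption
```

With lower_bound = 20 and upper_bound = 220, a component of 70 maps to round(50 * 255 / 200) = 64, while components at or below 20 become 0 and components at or above 220 become 255.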

7 Energy-efficient media selection
7.1 General
The Green Metadata specified in this clause can enable a client in an adaptive streaming session, such as
DASH, to determine decoder and display power-saving characteristics of available video Representations
and to select the Representation with the optimal quality for a given power-saving.
Two types of Green Metadata are defined as follows:
— decoder-power indication metadata gives the potential decoder power saving of each available
Representation of a video Segment;
— display-power indication metadata gives the maximum potential display power saving of a video
Segment for a specified number of quality levels. This metadata is computed without any constraint
on the maximal backlight change between two successive frames and with no practical restriction
on the minimum time interval between backlight updates. Therefore, using the semantics of 6.4,
the metadata is produced with the assumptions that max_variation is mathematically infinite and
that constant_backlight_voltage_time_interval is less than or equal to the interval between two
successive frames.
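The selection principle stated in this clause (optimal quality for a given power saving) might be sketched as below. This is only an illustration: the dictionary keys "id", "quality" and "power_saving" are hypothetical placeholders for values that a client would derive from the Green Metadata, and the policy of returning None when no Representation qualifies is an assumption.

```python
def select_representation(representations, required_saving):
    """Pick the highest-quality Representation that meets the power-saving target.

    `representations` is assumed to be a list of dicts with hypothetical keys
    "id", "quality" and "power_saving" (higher values mean more of each).
    """
    candidates = [r for r in representations
                  if r["power_saving"] >= required_saving]
    if not candidates:
        return None  # no Representation achieves the requested saving
    return max(candidates, key=lambda r: r["quality"])
```

A client could run such a selection for each Segment, using the decoder-power and display-power indication metadata to populate the per-Representation values.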
7.2 Syntax
The decoder-power indication metadata is a pair of decoder operations reduction ratios:
                                        Size (bits)   Descriptor
dec_ops_reduction_ratio_from_max        8             unsigned integer
dec_ops_reduction_ratio_from_prev       16            signed integer
The display-power indication metadata contains a list of ms_num_quality_levels pairs, as shown below:
                                             Size (bits)   Descriptor
ms_num_quality_levels                        4             unsigned integer
ms_rgb_component_for_infinite_psnr           8             unsigned integer
for (i = 1; i <= ms_num_quality_levels; i++) {
    ms_max_rgb_component[i]                  8             unsigned integer
    ms_scaled_psnr_rgb[i]                    8             unsigned integer
}
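Because the display-power indication metadata starts with a 4-bit field, its 8-bit fields are not byte-aligned. The following sketch reads the structure with a small bit reader. The class and function names are hypothetical, and most-significant-bit-first packing with trailing zero padding is an assumption that this clause does not confirm.

```python
class BitReader:
    """Reads unsigned integers MSB-first from a byte string (assumed bit order)."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_display_power_indication(data: bytes) -> dict:
    """Parse the display-power indication metadata of 7.2."""
    reader = BitReader(data)
    num_levels = reader.read(4)
    rgb_inf = reader.read(8)
    levels = []
    for _ in range(num_levels):
        max_rgb = reader.read(8)
        psnr = reader.read(8)
        levels.append((max_rgb, psnr))
    return {"ms_num_quality_levels": num_levels,
            "ms_rgb_component_for_infinite_psnr": rgb_inf,
            "levels": levels}
```

Each entry of "levels" pairs ms_max_rgb_component[i] with ms_scaled_psnr_rgb[i] for i = 1 to ms_num_quality_levels.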
7.3 Signalling
Green Metadata can be carried in metadata tracks within the ISO Base Media File Format
(ISO/IEC 14496-12). Such carriage is specified in ISO/IEC 23001-10.
In the context of DASH delivery, a specific Adaptation Set within the MPD can define the available Green
Metadata Representations and their association to the available media Representations, using the
signalling mechanisms specified in ISO/IEC 23009-1:2014/Amd 2 and ISO/IEC 23009-3:2014/Amd 1 and
illustrated in Annex B.
7.4 Semantics
7.4.1 Decoder-power indication metadata semantics
num_dec_ops(i) – the estimated number of decoding operations required for the i-th Representa
...
