Information technology - Coded representation of immersive media - Part 13: Video decoding interface for immersive media

This document specifies the interfaces of a video decoding engine as well as the operations related to elementary streams and metadata that can be performed by this video decoding engine. To support those operations, this document also specifies SEI messages when necessary for certain video codecs.

Technologies de l'information — Représentation codée de média immersifs — Partie 13: Interface de décodage vidéo pour les média immersifs

General Information

Status: Published
Publication date: 18-Jan-2024
Current stage: 90.92 (International Standard to be revised)
Start date: 08-Feb-2024
Completion date: 30-Oct-2025

Overview

ISO/IEC 23090-13:2024 - Information technology - Coded representation of immersive media - Part 13: Video decoding interface for immersive media - specifies standardized interfaces and operations for a video decoding engine (VDE) used with immersive media. The standard defines how media and elementary streams, decoder instances and related metadata are fed, managed and exposed through an input video decoding interface (IVDI) and an output video decoding interface (OVDI). Where needed for codec support, the document also specifies SEI (Supplemental Enhancement Information) messages.

Key topics and technical requirements

  • Video decoding engine (VDE) architecture: roles for input formatting, decoder instances, composition memory and compositor; management of decoded sequences and synchronization.
  • Input/Output interfaces: definitions of the IVDI and OVDI for passing media streams (aggregated or partial elementary streams) into the VDE and delivering decoded output to the rendering pipeline.
  • Elementary streams & media streams: semantics for media streams, elementary streams (ES), access units (AU), video objects and video object identifiers used for filtering, insertion, appending and stacking operations.
  • Control interface: an IDL-based control interface (Annex A) to control VDE functions and query capabilities.
  • Codec and instantiation constraints: slice- and layer-based instantiations for codecs referenced in the document (e.g., HEVC - ISO/IEC 23008-2, VVC - ISO/IEC 23090-3, EVC - ISO/IEC 23094-1) and related media/elementary stream constraints.
  • SEI message syntax: normative SEI syntax and semantics when required for specific codecs (Annex C).
  • Platform mappings and examples: informative mappings to existing integration layers and APIs such as OpenMAX IL and Vulkan Video, plus example input formatting operations and MSE mapping (Annexes B, D, E, F).

Applications and who uses it

  • Silicon and SoC vendors implementing hardware VDEs to expose standardized decoding capabilities to software stacks.
  • Media player and renderer developers (desktop, mobile, VR/AR, immersive media players) for consistent integration of hardware decoders.
  • Streaming and OTT platforms delivering immersive video formats that require interoperable decoder interfaces and metadata handling.
  • Middleware/API designers and browser engine teams implementing standardized decoder bindings (e.g., mappings shown for OpenMAX IL, Vulkan Video, MSE).
  • Standards and testing organizations validating compliance, interoperability and patent-aware implementations.

Related standards

  • ISO/IEC 23008-2 (HEVC)
  • ISO/IEC 23090-3 (VVC)
  • ISO/IEC 23094-1 (EVC)

Keywords: ISO/IEC 23090-13:2024, video decoding interface, immersive media, VDE, IVDI, OVDI, elementary streams, SEI messages, HEVC, VVC, EVC, OpenMAX, Vulkan Video.

Standard
ISO/IEC 23090-13:2024 — Information technology — Coded representation of immersive media — Part 13: Video decoding interface for immersive media. Released: 19.01.2024
English language
43 pages

Standards Content (Sample)


International Standard
ISO/IEC 23090-13
First edition
2024-01
Information technology — Coded representation of immersive media —
Part 13: Video decoding interface for immersive media
Technologies de l'information — Représentation codée de média immersifs —
Partie 13: Interface de décodage vidéo pour les média immersifs
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
© ISO/IEC 2024 – All rights reserved
ii
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Video decoding engine
5.1 General
5.2 Input video decoding interface
5.3 Output video decoding interface
5.4 Control interface to the Video Decoding Interface
5.4.1 Functions
5.5 Examples of video decoding engine instantiations
5.5.1 Mapping on OpenMAX™ integration layer (OpenMAX IL)
5.5.2 Mapping on Vulkan® Video
5.5.3 Informative mapping
6 VDI systems decoder model
6.1 Introduction
6.2 Concepts of the VDI systems decoder model
6.2.1 General
6.2.2 Media stream
6.2.3 Media stream interface
6.2.4 Input formatter
6.2.5 Access Units (AU)
6.2.6 Decoding Buffer (DB)
6.2.7 Elementary Streams (ES)
6.2.8 Elementary Stream Interface (ESI)
6.2.9 Decoder
6.2.10 Composition Units (CU)
6.2.11 Composition Memory (CM)
6.2.12 Compositor
7 Video decoder interface
7.1 General
7.2 Operations on input media streams
7.2.1 General
7.2.2 Concepts
7.2.3 Filtering by video object identifier
7.2.4 Inserting video objects
7.2.5 Appending two video objects
7.2.6 Stacking two video objects
7.3 Slice-based instantiation for ISO/IEC 23008-2 high efficiency video coding (HEVC)
7.3.1 General
7.3.2 Media and elementary stream constraints
7.4 Layer-based instantiation for ISO/IEC 23090-3 versatile video coding (VVC)
7.4.1 General
7.4.2 Media and elementary stream constraints
7.5 Slice-based instantiation for ISO/IEC 23094-1 essential video coding (EVC)
7.5.1 General
7.5.2 Media and elementary stream constraints
Annex A (normative) Control interface IDL definition
Annex B (informative) OpenMAX IL VDI extension header
Annex C (normative) Supplemental enhancement information (SEI) syntax and semantics
Annex D (informative) Example implementations of input formatting operations
Annex E (informative) Brief description of OpenMAX IL functions
Annex F (informative) Mapping on media source extensions (MSE)
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO 23090 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.

Introduction
The interfaces and operations specified in this document come as extensions of existing video decoding
engine specifications exposing hardware video decoding capabilities.

International Standard ISO/IEC 23090-13:2024(en)
Information technology — Coded representation of
immersive media —
Part 13:
Video decoding interface for immersive media
1 Scope
This document specifies the interfaces of a video decoding engine as well as the operations related to
elementary streams and metadata that can be performed by this video decoding engine. To support those
operations, this document also specifies SEI messages when necessary for certain video codecs.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 23008-2, Information technology — High efficiency coding and media delivery in heterogeneous
environments — Part 2: High efficiency video coding
ISO/IEC 23090-3, Information technology — Coded representation of immersive media — Part 3: Versatile video
coding
ISO/IEC 23094-1, Information technology — General video coding — Part 1: Essential video coding
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
media stream
part of an elementary stream (3.2) or one or more aggregated elementary streams (3.2)
Note 1 to entry: Every elementary stream is a media stream, but the inverse is not true.
Note 2 to entry: A media stream may contain metadata such as non-VCL NAL units.
3.2
subframe
independently decodable unit smaller than a frame to which post-decoding processing by the decoder, if any,
has been applied
3.3
video object
independently decodable substream of a video elementary stream (3.2)

3.4
video object identifier
integer identifying a video object (3.3)
4 Abbreviated terms
API application programming interface
ES elementary stream
ID video object identifier
IDL interface definition language
IVDI input video decoding interface
MDS media stream
NAL network abstraction layer
OLS output layer set
OVDI output video decoding interface
PPS picture parameter set
SEI supplemental enhancement information
SPS sequence parameter set
VCL video coding layer
VDE video decoding engine
5 Video decoding engine
5.1 General
The video decoding engine (VDE) enables the decoding, the synchronization and the formatting of media
streams which are one or more aggregated elementary streams or a part thereof. The media streams are
fed through the input video decoding interface (IVDI) of the VDE and provided to the subsequent elements
of the rendering pipeline via the output video decoding interface (OVDI) in their decoded form. Between
the input and the output, the VDE extracts and merges independently decodable regions from a set of
input media streams via the input formatting function and generates a set of elementary streams fed to
the video decoder instances which run inside the engine. The VDE can execute a merging operation or an
extraction operation on the input media streams such that the number of running video decoder instances
is different from the number of input media streams required by the application. For example, a VDE may be incapable of decoding a single 4K input media stream with one decoder instance, but may still decode some of the independently decodable regions, at a lower resolution, present in that input media stream. To this end, the VDE should first verify that sufficient resources are available to run those video decoder instances in parallel.
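To make the merging operation concrete, the following sketch (function and variable names are assumptions, not defined by this document) models an application that has more input media streams than available decoder instances, so the input formatting function merges streams and each running instance decodes one aggregated elementary stream:

```python
def plan_instances(n_media_streams, max_instances):
    """Assign each input media stream to one of the available decoder instances."""
    n = min(n_media_streams, max_instances)
    groups = [[] for _ in range(n)]
    for s in range(n_media_streams):
        groups[s % n].append(s)  # round-robin merge into the available instances
    return groups

# Four input media streams, but resources for only two decoder instances:
plan = plan_instances(4, 2)
```

The round-robin policy is purely illustrative; a real VDE would decide the merge based on the capabilities it reports and on which regions are independently decodable.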
Figure 1 represents the architecture for the VDE and the associated IVDI and OVDI interfaces.

Key
MDS media stream
ES elementary stream
MTS metadata stream
DS decoded sequence
m number of input metadata streams
n number of media streams
j number of video decoder instances
p number of output metadata streams
q number of decoded sequences
Figure 1 — Video decoding engine and interfaces
NOTE 1 Multiple elementary streams that are output of the input formatting function can be fed to a single video
decoder instance.
NOTE 2 The concept of metadata stream is not defined in this document.
Figure 2 depicts an architecture for handling multiple video decoder instances on a single hardware platform. In this scenario, one or more video decoder instances running on the same video decoder hardware engine are exposed to the application layer as several decoder instances, each with their own interface.

Figure 2 — Example relationship between video decoder instances and video decoder hardware
engine
5.2 Input video decoding interface
The video decoding engine accepts media streams and metadata streams. There is at least one media stream
as input but there is no constraint on the number of metadata streams with respect to the number of media
streams being concurrently consumed by the VDE.
The input of the VDE thus comprises:
— n media streams
— m metadata streams
5.3 Output video decoding interface
The video decoding engine outputs decoded video sequences and metadata streams. There is at least one
decoded video sequence as output but there is no constraint on the number of metadata streams with
respect to the number of decoded video sequences being concurrently output by the VDE.
These two output stream types may be provided in the form of multiplexed output buffers, including both the decoded media data and its associated metadata.
The output of the VDE thus comprises:
— q decoded sequences
— p metadata streams
5.4 Control interface to the Video Decoding Interface
5.4.1 Functions
In order to support immersive media applications, subclause 5.4 defines an abstract video decoding
interface. A video decoding platform that complies with this document shall implement this video decoding
interface whose IDL can be found in Annex A.
The video decoding interface consists of the abstract functions defined in the following subclauses. These functions are defined using the IDL syntax specified in ISO/IEC 19516.
Figure 3 depicts an example instantiation of decoder instances using some of the functionalities of the video
decoding interface. The video decoder instances with identifiers 1 to 3 belong to the group with the identifier
4. By this grouping mechanism, the three instances write the decoded sequences into a single aggregate
buffer and the decoding operations across those instances are performed in a coordinated manner such that
no instance runs ahead or behind the others.
Figure 3 — Example instantiation using VDI
5.4.1.1 queryCurrentAggregateCapabilities()
5.4.1.1.1 Declaration
The IDL declarations of the queryCurrentAggregateCapabilities() function along with the
AggregateCapabilities and PerformancePoint structures and the capabilities flags are defined as follows:
const unsigned long CAP_INSTANCES_FLAG = 0x1;
const unsigned long CAP_BUFFER_MEMORY_FLAG = 0x2;
const unsigned long CAP_BITRATE_FLAG = 0x4;
const unsigned long CAP_MAX_SAMPLES_SECOND_FLAG = 0x8;
const unsigned long CAP_MAX_PERFORMANCE_POINT_FLAG = 0x10;

struct PerformancePoint {
    float picture_rate;
    unsigned long width;
    unsigned long height;
    unsigned long bit_depth;
};

struct AggregateCapabilities {
    unsigned long flags;
    unsigned long max_instances;
    unsigned long buffer_memory;
    unsigned long bitrate;
    unsigned long max_samples_second;
    PerformancePoint max_performance_point;
};

AggregateCapabilities queryCurrentAggregateCapabilities (
    in string component_name,
    in unsigned long flags
);
5.4.1.1.2 Definition
The queryCurrentAggregateCapabilities() function can be used by the application to query the
instantaneous aggregate capabilities of a decoder platform for a specific codec component.
The capability flags below can be set separately or combined in a single function call to query one or more parameters.
The component_name provides the name of the component of the decoding platform for which the query
applies. The name All may be used to indicate that the query is not for a particular component but is rather
for all the components of the decoding platform. Components are hardware or software functionalities
exposed by the Video Decoding Engine such as decoders.
CAP_INSTANCES_FLAG queries the max_instances parameter which indicates the maximum number of decoder
instances that can be instantiated at this moment for the provided decoder component.
CAP_BUFFER_MEMORY_FLAG queries the buffer_memory parameter which indicates the instantaneous global
maximum available buffer size in bytes that can be allocated independently of any components at this
moment on the decoder platform for buffer exchange. The allocation of the memory can be done by the
application or the VDE itself depending on the VDE instantiation.
CAP_BITRATE_FLAG queries the bitrate parameter which indicates the instantaneous maximum coded
bitrate in bits per second that the queried component can process.
CAP_MAX_SAMPLES_SECOND_FLAG queries the max_samples_second parameter which indicates the
instantaneous maximum number of luma and chroma samples combined per second that the queried
component is able to process.
CAP_MAX_PERFORMANCE_POINT_FLAG queries the max_performance_point parameter which indicates the
maximum performance point of a bitstream that can be decoded by the indicated component in a new
instance of that decoder component.
A PerformancePoint contains the following parameters:
— picture_rate indicating the instantaneous picture rate of the maximum performance point in pictures
per second.
— height indicating the height in luma samples of the maximum performance point.
— width indicating the width in luma samples of the maximum performance point.
— bit_depth indicating the bit depth of the luma samples of the maximum performance point.
NOTE Each parameter of the max performance point does not necessarily represent the maximum in that
dimension. It is the combination of all dimensions that constitutes the maximum performance point.
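A minimal sketch of these flag semantics follows; the flag values mirror the IDL constants (written here as non-overlapping powers of two so they can be combined independently), while the returned numbers are made-up stub values, since the real answer comes from the decoding platform:

```python
CAP_INSTANCES_FLAG = 0x1
CAP_BUFFER_MEMORY_FLAG = 0x2
CAP_BITRATE_FLAG = 0x4
CAP_MAX_SAMPLES_SECOND_FLAG = 0x8
CAP_MAX_PERFORMANCE_POINT_FLAG = 0x10

def query_current_aggregate_capabilities(component_name, flags):
    """Fill only the fields selected by `flags`, mirroring the IDL semantics."""
    caps = {"flags": flags}
    if flags & CAP_INSTANCES_FLAG:
        caps["max_instances"] = 4                 # stub value
    if flags & CAP_BUFFER_MEMORY_FLAG:
        caps["buffer_memory"] = 64 * 1024 * 1024  # stub value, in bytes
    if flags & CAP_MAX_PERFORMANCE_POINT_FLAG:
        caps["max_performance_point"] = {         # stub 4K @ 60, 10-bit point
            "picture_rate": 60.0, "width": 3840, "height": 2160, "bit_depth": 10}
    return caps

# Two parameters queried in a single call by combining their flags:
caps = query_current_aggregate_capabilities(
    "All", CAP_INSTANCES_FLAG | CAP_MAX_PERFORMANCE_POINT_FLAG)
```

Fields whose flags were not set (here, buffer_memory) are simply absent from the result, matching the idea that each flag selects one parameter to query.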

5.4.1.2 getInstance()
5.4.1.2.1 Declaration
The IDL declarations of the getInstance() function and the associated ErrorAllocation exception are
defined as follows:
exception ErrorAllocation {
    string reason;
};

unsigned long getInstance(
    in string component_name,
    in unsigned long group_id // optional, default value = -1
) raises(ErrorAllocation);
5.4.1.2.2 Definition
A successful call to the getInstance() function shall provide the identifier of the instance
and the group_id that is assigned or created for this new instance, if one was requested. The default behavior
is that the decoder instance does not belong to any already established group but is assigned to a newly
created group.
Several decoder instances belonging to the same group means that the VDE treats those instances collectively, such that the decoding states of those instances progress in synchrony and not in competition against each other. As a result, the VDE will also ensure a synchronized output writing operation, possibly into an aggregate buffer. There are no restrictions on which two video decoder instances may belong to the same group.
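The grouping semantics can be sketched as follows (class and method names are illustrative assumptions; only the default-group behavior comes from the text above):

```python
class VDE:
    """Toy model of instance creation: a new instance joins an existing group
    if a valid group_id is given, otherwise a fresh group is created for it."""

    def __init__(self):
        self._next_id = 1
        self._groups = {}  # group_id -> list of instance ids

    def get_instance(self, component_name, group_id=-1):
        instance_id = self._next_id
        self._next_id += 1
        if group_id == -1 or group_id not in self._groups:
            group_id = self._next_id  # default: assign a newly created group
            self._next_id += 1
            self._groups[group_id] = []
        self._groups[group_id].append(instance_id)
        return instance_id, group_id

vde = VDE()
i1, g1 = vde.get_instance("hevc.decoder")               # fresh group created
i2, g2 = vde.get_instance("hevc.decoder", group_id=g1)  # joins g1: synchronized output
i3, g3 = vde.get_instance("hevc.decoder")               # another fresh group
```

Instances i1 and i2 share a group, so a real VDE would keep their decoding states in synchrony and could write both into one aggregate buffer; i3 proceeds independently.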
5.4.1.3 setConfig()
5.4.1.3.1 Declaration
The IDL declarations of the setConfig() function, the associated ErrorConfig exception, the
ConfigDataParameters structure and the ConfigParameters enumeration are defined as follows:
enum ConfigParameters {
    CONFIG_OUTPUT_BUFFER
};

struct ConfigDataParameters {
    SampleFormat sample_format;
    SampleType sample_type;
    unsigned long sample_stride;
    unsigned long line_stride;
    unsigned long buffer_offset;
};

exception ErrorConfig {
    string reason;
};

boolean setConfig (
    in unsigned long instance_id,
    in ConfigParameters config_parameters,
    in ConfigDataParameters config_data_parameters
) raises(ErrorConfig);
5.4.1.3.2 Definition
The setConfig() function may be called with the parameter CONFIG_OUTPUT_BUFFER, in which case it provides
the format of the output buffer.

The format of the buffer shall contain the following parameters:
— sample_format indicating the format of each sample, which can be a scalar, a 2D vector, a 3D vector, or a
4D vector.
— sample_type indicating the type of each component of the sample.
— sample_stride indicating the number of bytes between two consecutive samples of this output.
— line_stride indicating the number of bytes between the first byte of one line and the first byte of the
following line of this output.
— buffer_offset indicating the offset into the output buffer, starting from which the output frame should
be written.
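As an illustration of how these buffer parameters locate a sample in memory, the following helper (the function name and example values are assumptions, not part of the standard) computes the first byte of the sample at coordinates (x, y):

```python
def sample_address(buffer_offset, line_stride, sample_stride, x, y):
    """First byte of the sample at (x, y), given the CONFIG_OUTPUT_BUFFER layout:
    buffer_offset to the start of the frame, line_stride between line starts,
    sample_stride between consecutive samples on a line."""
    return buffer_offset + y * line_stride + x * sample_stride

# Example: 1920-wide output with 2-byte samples (e.g. 10-bit stored in 16 bits),
# frame written 4096 bytes into an aggregate buffer.
addr = sample_address(buffer_offset=4096, line_stride=1920 * 2,
                      sample_stride=2, x=10, y=3)
```

The buffer_offset parameter is what lets several grouped decoder instances write their frames into disjoint regions of a single aggregate buffer.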
5.4.1.4 getParameter() and setParameter()
5.4.1.4.1 Declaration
The IDL declarations of the getParameter() and setParameter() functions as well as the associated ErrorParameter exception and the ExtParameters enumeration are defined as follows:
enum ExtParameters {
    PARAM_PARTIAL_OUTPUT,
    PARAM_SUBFRAME_OUTPUT,
    PARAM_METADATA_CALLBACK,
    PARAM_OUTPUT_CROP,
    PARAM_OUTPUT_CROP_WINDOW,
    PARAM_MAX_OFFTIME_JITTER
};

struct CropWindow {
    unsigned long x;
    unsigned long y;
    unsigned long width;
    unsigned long height;
};

exception ErrorParameter {
    string reason;
};

any getParameter (
    in unsigned long instance_id,
    in ExtParameters ext_parameters,
    out any parameter
);

boolean setParameter (
    in unsigned long instance_id,
    in ExtParameters ext_parameters,
    in any parameter
) raises(ErrorParameter);
5.4.1.4.2 Definition
The getParameter() and setParameter() functions can receive the extended parameters described below.
PARAM_PARTIAL_OUTPUT indicates whether the output of subframes is required, desired, or not allowed. If it
is not allowed, only complete decoded frames will be passed to the buffer.
PARAM_SUBFRAME_OUTPUT indicates the one or more subframes to be output by the decoder.
PARAM_METADATA_CALLBACK sets a callback function for a specific metadata type. The list of supported
metadata types is codec dependent and shall be defined for each codec independently.

PARAM_OUTPUT_CROP indicates that only part of the decoded frame is desired at the output. The decoder
instance may use this information to intelligently reduce its decoding processing by discarding units that do
not fall in the cropped output region whenever possible.
PARAM_OUTPUT_CROP_WINDOW indicates the part of the decoded frame to be cropped and output.
PARAM_MAX_OFFTIME_JITTER indicates the maximum amount of time in microseconds between consecutive
executions of the decoder instance. This parameter is relevant whenever the underlying hardware
component is shared among multiple decoder instances, which requires context switching between the
different decoder instances.
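The PARAM_OUTPUT_CROP behavior, where a decoder may discard units that fall outside the cropped region, can be sketched with a simple intersection test (the helper name and unit geometry are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class CropWindow:
    x: int
    y: int
    width: int
    height: int

def must_decode(crop, unit_x, unit_y, unit_w, unit_h):
    """True if a coded unit intersects the crop window; units that are fully
    outside the window may be discarded by the decoder instance."""
    return not (unit_x + unit_w <= crop.x or crop.x + crop.width <= unit_x or
                unit_y + unit_h <= crop.y or crop.y + crop.height <= unit_y)

crop = CropWindow(x=0, y=0, width=960, height=540)
edge_unit = must_decode(crop, 896, 0, 128, 128)     # straddles the crop edge
outside_unit = must_decode(crop, 1024, 0, 128, 128) # fully outside the window
```

A unit straddling the window edge still has to be decoded; only units entirely outside the PARAM_OUTPUT_CROP_WINDOW region are candidates for discarding.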
5.5 Examples of video decoding engine instantiations
5.5.1 Mapping on OpenMAX™ integration layer (OpenMAX IL)
5.5.1.1 Overview
For more information on OpenMAX IL, Annex E provides a brief description of the main functions of this API.
5.5.1.2 Mapping of VDI functions
The functions defined in 5.4 are mapped on the OpenMAX IL interface by using the extension mechanism
defined by the specification. This MPEG VDI extension for OpenMAX IL is formatted as a C header file and
registered with the vendor name “MPEG”.
Annex B defines the MPEG VDI extension for OpenMAX IL and provides information to access the electronic version of this extension.
5.5.2 Mapping on Vulkan® Video
5.5.2.1 Overview
Vulkan® Video (VK) is an extension of the Vulkan API which defines functions exposed by Graphics Processing Units (GPUs). This extension provides interfaces for an application to leverage hardware decoding and encoding capabilities present on GPUs.
A VK Video Session consists of a single decoding session on a single layer. As a result, a single VK Video
Session corresponds to a single video decoder instance depicted in Figure 1.
The mapping of VDI functions on VK is summarised in Table 1.
1) OpenMAX™ is an example of a suitable product available commercially. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO or IEC of this product.
2) Vulkan® is an example of a suitable product available commercially. This information is given for the convenience of users of this document and does not constitute an endorsement by ISO or IEC of this product.

Table 1 — Summary of VDI function mapping on Vulkan Video
VDI function                       VK mapping
queryCurrentAggregateCapabilities  New vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG() function
getInstance (grouping)             Extending VkVideoSessionCreateInfoKHR with a group identifier passed in the new VkVideoSessionCreateInfoGroupingMPEG structure. Call of the existing vkCreateVideoSessionKHR().
setConfig (buffer configuration)   Mapping on the existing VkVideoSessionCreateInfoKHR and VkVideoPictureResourceKHR structures.
getParameter and setParameter      New VkVideoSessionOutputParameterMPEG structure
5.5.2.2 The vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG() function
5.5.2.2.1 Definition
The VK Video API provides a function for querying the capabilities of a single VK Video Profile, called vkGetPhysicalDeviceVideoCapabilitiesKHR(). Similar to this function, the VDI VK mapping defines the vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG() function. In contrast to vkGetPhysicalDeviceVideoCapabilitiesKHR(), the vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG() function allows the application to query the aggregate capabilities of the physical device. When it is called with a certain profile, the aggregated capabilities pertain to this given profile.
5.5.2.2.2 Declaration
VkResult vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG(
VkPhysicalDevice       physicalDevice,
VkVideoProfileKHR*      pVideoProfile,
VkCurrentVideoCapabilitiesMPEG*  pCapabilities);

5.5.2.2.3 Semantics
physicalDevice is the physical device whose video decode or encode capabilities are to be queried.
pVideoProfile is a pointer to a VkVideoProfileKHR structure with a chained codec-operation specific video
profile structure.
pCapabilities is a pointer to a VkCurrentVideoCapabilitiesMPEG structure in which the capabilities are
returned.
5.5.2.3 The VkCurrentVideoCapabilitiesMPEG structure
5.5.2.3.1 Definition
The VkCurrentVideoCapabilitiesMPEG structure holds the information returned by a call to the vkGetPhysicalDeviceCurrentVideoCapabilitiesMPEG() function defined in subclause 5.5.2.2.
5.5.2.3.2 Declaration
typedef struct VkCurrentVideoCapabilitiesMPEG {
    VkStructureType           sType;
    void*                     pNext;
    uint32_t                  maxInstances;
    uint32_t                  bufferMemory;
    uint32_t                  bitrate;
    uint32_t                  maxSamplesSecond;
    VkPerformancePointMPEG*   maxPerformancePoint;
} VkCurrentVideoCapabilitiesMPEG;

5.5.2.3.3 Semantics
sType is the type of this structure.
pNext is NULL or a pointer to a structure extending this structure.
maxInstances see semantic in subclause 5.4.1.1.2.
bufferMemory see semantic in subclause 5.4.1.1.2.
bitrate see semantic in subclause 5.4.1.1.2.
maxSamplesSecond see semantic in subclause 5.4.1.1.2.
maxPerformancePoint is a pointer to a VkPerformancePointMPEG structure in which the properties of the
maximum performance are returned.
5.5.2.4 The VkPerformancePointMPEG structure
5.5.2.4.1 Definition
The VkPerformancePointMPEG structure contains properties describing a performance point for a video processing entity.
5.5.2.4.2 Declaration
typedef struct VkPerformancePointMPEG {
    VkStructureType sType;
    void*           pNext;
    uint32_t        pictureRate;
    uint32_t        height;
    uint32_t        width;
    uint32_t        bitDepth;
} VkPerformancePointMPEG;
5.5.2.4.3 Semantics
sType is the type of this structure.
pNext is NULL or a pointer to a structure extending this structure.
pictureRate see semantic in subclause 5.4.1.1.2.
height see semantic in subclause 5.4.1.1.2.
width see semantic in subclause 5.4.1.1.2.
bitDepth see semantic in subclause 5.4.1.1.2.
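Since, per the NOTE in 5.4.1.1.2, it is the combination of dimensions that constitutes the maximum performance point, a simple way to compare a candidate bitstream against it is by luma-sample throughput. The following check is an illustrative assumption, not a rule defined by this document:

```python
def samples_per_second(picture_rate, width, height):
    """Luma-sample throughput implied by a performance point."""
    return picture_rate * width * height

def fits(candidate, maximum):
    """Each argument is a (picture_rate, width, height) tuple; a candidate fits
    when its throughput does not exceed that of the maximum performance point."""
    return samples_per_second(*candidate) <= samples_per_second(*maximum)

max_point = (60.0, 3840, 2160)  # e.g. a returned maxPerformancePoint: 4K @ 60
# A higher picture rate can still fit if the picture is small enough:
ok = fits((120.0, 1920, 1080), max_point)
```

This is why a 1080p stream at 120 pictures per second can fit under a 4K @ 60 maximum point even though its rate alone exceeds 60: no single dimension is itself a maximum.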
5.5.2.5 The VkVideoSessionCreateInfoGroupingMPEG structure
5.5.2.5.1 Definition
The VkVideoSessionCreateInfoGroupingMPEG structure allows the application to attach a group identifier to a video decoding instance created via the VK Video API. This structure extends the VkVideoSessionCreateInfoKHR structure defined in the VK Video API.

5.5.2.5.2 Declaration
typedef struct VkVideoSessionCreateInfoGroupingMPEG {
    VkStructureType sType;
    const void*     pNext;
    uint32_t        groupId;
} VkVideoSessionCreateInfoGroupingMPEG;

5.5.2.5.3 Semantics
sType is the type of this structure.
pNext is NULL or a pointer to a structure extending this structure.
groupId see semantic in subclause 5.4.1.2.2.
5.5.2.6 The VkVideoSessionOutputParameterMPEG structure
5.5.2.6.1 Definition
The VkVideoSessionOutputParameterMPEG structure contains parameters that configure the properties of
the output of the VK Video Session.
5.5.2.6.2 Declaration
typedef struct VkVideoSessionOutputParameterMPEG {
    VkStructureType sType;
    const void*     pNext;
    VkFlag          partialOutput;
    uint32_t*       subframeCount;
    uint32_t*       pSubframeOutput;
    VkFlag          outputCrop;
    VkExtent2D*     pOutputCropWindow;
    uint32_t        maxOfftimeJitter;
    void*           pMetadataCallback;
} VkVideoSessionOutputParameterMPEG;

5.5.2.6.3 Semantics
sType is the type of this structure.
pNext is NULL or a pointer to a structure extending this structure.
partialOutput has the semantics specified in subclause 5.4.1.4.2.
subframeCount and pSubframeOutput have the semantics specified in subclause 5.4.1.4.2.
outputCrop has the semantics specified in subclause 5.4.1.4.2.
pOutputCropWindow has the semantics specified in subclause 5.4.1.4.2.
maxOfftimeJitter has the semantics specified in subclause 5.4.1.4.2.
pMetadataCallback has the semantics specified in subclause 5.4.1.4.2.
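To illustrate how an application might populate this structure, the sketch below configures an output with cropping enabled and partial (subframe) output disabled. It is a mock-up under stated assumptions: the stand-in type definitions, the enumerant value, and the `make_cropped_output` helper are illustrative only; the normative semantics of each field are those of subclause 5.4.1.4.2.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in types so the sketch compiles without the Vulkan SDK; the real
   definitions come from the Vulkan headers and this document. */
typedef int32_t  VkStructureType;
typedef uint32_t VkFlag;
typedef struct { uint32_t width; uint32_t height; } VkExtent2D;
/* Hypothetical enumerant value, for illustration only. */
#define VK_STRUCTURE_TYPE_VIDEO_SESSION_OUTPUT_PARAMETER_MPEG ((VkStructureType)0x3000)

typedef struct VkVideoSessionOutputParameterMPEG {
    VkStructureType sType;
    const void*     pNext;
    VkFlag          partialOutput;
    uint32_t*       subframeCount;
    uint32_t*       pSubframeOutput;
    VkFlag          outputCrop;
    VkExtent2D*     pOutputCropWindow;
    uint32_t        maxOfftimeJitter;
    void*           pMetadataCallback;
} VkVideoSessionOutputParameterMPEG;

/* Build output parameters that crop the decoded picture to `crop`; all
   other features are left disabled (zero-initialized). */
static VkVideoSessionOutputParameterMPEG
make_cropped_output(VkExtent2D* crop) {
    VkVideoSessionOutputParameterMPEG p = {0};
    p.sType = VK_STRUCTURE_TYPE_VIDEO_SESSION_OUTPUT_PARAMETER_MPEG;
    p.outputCrop = 1;            /* enable output cropping */
    p.pOutputCropWindow = crop;  /* crop window dimensions, see 5.4.1.4.2 */
    return p;
}
```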
5.5.3 Informative mapping
This specification also provides informative mapping on other APIs such as in Annex F on the media source
extension (MSE).
6 VDI systems decoder model
6.1 Introduction
The VDI systems decoder model extends the systems decoder model (SDM) defined in ISO/IEC 14496-1.
Compared to the SDM, the VDI SDM introduces a new interface, called the media stream interface, in
addition to the elementary stream interface. This interface is the input of the input formatter, also called
the input formatting function, which takes the so-called media streams as input. The output of the input
formatter is one or more elementary streams that can be further passed on to the decoders.
These elements are depicted in Figure 4.
Figure 4 — VDI systems decoder model
6.2 Concepts of the VDI systems decoder model
6.2.1 General
The concepts necessary for the specification are the formatting, the timing, and the buffering model. The
sequence of definitions corresponds to a walk from the left to the right side of the VDI SDM illustration in
Figure 4.
6.2.2 Media stream
6.2.3 Media stream interface
The media stream interface is a concept that models the exchange of media stream data between the delivery
interface and the input formatting function.
6.2.4 Input formatter
The input formatter takes one or more media streams as input and generates one or more elementary
streams as output. A single input formatter may be attached to several decoding buffers when it produces
individual elementary streams or multi-layer elementary streams.

6.2.5 Access Units (AU)
See 7.1.2.2 in ISO/IEC 14496-1.
6.2.6 Decoding Buffer (DB)
See 7.1.2.4 in ISO/IEC 14496-1.
6.2.7 Elementary Streams (ES)
See 7.1.2.5 in ISO/IEC 14496-1.
6.2.8 Elementary Stream Interface (ESI)
See 7.1.2.6 in ISO/IEC 14496-1.
6.2.9 Decoder
See 7.1.2.7 in ISO/IEC 14496-1.
6.2.10 Composition Units (CU)
See 7.1.2.8 in ISO/IEC 14496-1.
6.2.11 Composition Memory (CM)
See 7.1.2.9 in ISO/IEC 14496-1.
6.2.12 Compositor
See 7.1.2.10 in ISO/IEC 14496-1.
7 Video decoder interface
7.1 General
As shown in Figure 1, the hardware video decoding engine may spawn one or more video decoder
instances. The number of running instances is an optimization choice for the platform, considering
available resources such as computational load, energy consumption and memory. However, the number
of input media streams fed through the IVDI depends on what the application needs in order to properly
render the media experience. Therefore, one or more input media streams may be fed to the same video
decoding instance by means of the block called "Input formatting" in Figure 1.
This clause defines the binding for several video codecs to realize the operations on input video streams.
7.2 Operations on input media streams
7.2.1 General
The input formatting function in Figure 1 provides several operations on media streams and video objects.
The input formatting function results in one or more elementary streams conforming to the profile, tier, level
and any other performance constraints of the video decoder instance expected to consume them, including
the buffer fullness of the hypothetical reference decoder model. These operations are defined in an atomic
way such that more complex operations can be achieved by combining them, as long as the final output
consists of valid elementary streams. The actual implementation of those combined operations is out of the
scope of this document and can be subject to optimization by implementers. Examples of possible
implementations are provided in Annex D.
A media stream contains one or more video objects, and a video object is contained in one elementary
stream. Each video object in an elementary stream provides information for enabling the defined operations,
such as a means to determine the location and dimensions of the video object in the picture, the number of
luma and chroma samples in the video object, the bit depth of the coded pictures of the video object, and so on.
7.2.2 Concepts
MediaStream: a type of media stream
ElementaryStream: a type of elementary stream
AccessUnit: a type of access unit
VideoObjectIdentifier: a type of video object identifier
VideoObjectSample: a type of video object sample
7.2.3 Filtering by video object identifier
7.2.3.1 Definition
Function: Filtering
f: MS × ID → ES
Definition:
Input: one media stream with at least one video object
the identifier of the selected video object to be extracted
Output: one elementary stream with one video object which corresponds to the selected one
Signature: ElementaryStream output_stream filtering(MediaStream input_stream,
VideoObjectIdentifier id)
For each i-th access unit in the input media stream, the function makes a copy of the access unit. Then, the
function lists the video object samples present in this copied access unit. If a video object sample does not
correspond to the video object identifier passed as input, the video object sample is removed from the copied
access unit. Lastly, the copied access unit is appended to the output elementary stream as a new access unit.
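The per-access-unit procedure above can be sketched in C. The data structures below (fixed-capacity arrays for samples, access units and streams) are toy in-memory models chosen for illustration; the actual representation of media streams, access units and video object samples is implementation-defined.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy models of the 7.2.2 concepts; capacities are arbitrary. */
#define MAX_SAMPLES 8
#define MAX_AUS     16

typedef uint32_t VideoObjectIdentifier;

typedef struct { VideoObjectIdentifier objectId; /* payload omitted */ } VideoObjectSample;

typedef struct {
    VideoObjectSample samples[MAX_SAMPLES];
    size_t            sampleCount;
} AccessUnit;

typedef struct {
    AccessUnit aus[MAX_AUS];
    size_t     auCount;
} Stream;  /* stands in for both MediaStream and ElementaryStream here */

/* Filtering per 7.2.3.1: for each input access unit, copy it, drop the
   video object samples whose identifier differs from the selected one,
   and append the result to the output elementary stream. */
static Stream filtering(const Stream* in, VideoObjectIdentifier id) {
    Stream out = { .auCount = 0 };
    for (size_t i = 0; i < in->auCount; i++) {
        AccessUnit copy = { .sampleCount = 0 };
        for (size_t j = 0; j < in->aus[i].sampleCount; j++)
            if (in->aus[i].samples[j].objectId == id)
                copy.samples[copy.sampleCount++] = in->aus[i].samples[j];
        out.aus[out.auCount++] = copy;  /* one output AU per input AU */
    }
    return out;
}
```

Note that the sketch preserves one output access unit per input access unit, even when no sample matches, mirroring the copy-then-remove description of the function.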
NOTE The function implements a filtering process based on the selected object identifier: the original
access units are first copied, and the unwanted objects are then removed from the copies. This way, the
operation does not need to create and initialize an
...
