ETSI TS 126 118 V15.1.0 (2019-04)
5G; 3GPP Virtual reality profiles for streaming applications (3GPP TS 26.118 version 15.1.0 Release 15)
TECHNICAL SPECIFICATION
5G;
3GPP Virtual reality profiles for streaming applications
(3GPP TS 26.118 version 15.1.0 Release 15)
3GPP TS 26.118 version 15.1.0 Release 15 1 ETSI TS 126 118 V15.1.0 (2019-04)
Reference
RTS/TSGS-0426118vf10
Keywords
5G
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88
Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.
© ETSI 2019.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and
of the oneM2M Partners.
GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
ETSI
Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
Foreword
This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).
The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.
The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under
http://webapp.etsi.org/key/queryform.asp.
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Contents
Intellectual Property Rights . 2
Foreword . 2
Modal verbs terminology . 2
Foreword . 7
Introduction . 7
1 Scope . 8
2 References . 8
3 Definitions, symbols and abbreviations . 9
3.1 Definitions . 9
3.2 Symbols . 9
3.3 Abbreviations . 9
4 Architectures and Interfaces for Virtual Reality . 10
4.1 Definitions and Reference Systems . 10
4.1.1 Overview . 10
4.1.2 3GPP 3DOF Coordinate System . 11
4.1.3 Video Signal Representation. 13
4.1.4 Audio Signal Representation . 14
4.2 End-to-end Architecture . 15
4.3 Client Reference Architecture . 16
4.4 Rendering Schemes, Operation Points and Media Profiles . 18
4.5 Audio Rendering . 20
4.5.1 Audio Renderer Definitions . 20
4.5.1.1 Reference Renderer . 20
4.5.1.2 Common Informative Binaural Renderer (CIBR) . 20
4.5.1.3 External Renderer . 21
4.5.1.4 Common Renderer API . 21
4.5.1.5 External Renderer API . 21
4.5.1.6 Rendering Test . 22
5 Video . 22
5.1 Video Operation Points . 22
5.1.1 Definition of Operation Point . 22
5.1.2 Parameters of Visual Operation Point . 22
5.1.3 Operation Point Summary . 23
5.1.4 Basic H.264/AVC . 23
5.1.4.1 General . 23
5.1.4.2 Profile and level . 24
5.1.4.3 Aspect Ratios and Spatial resolutions . 24
5.1.4.4 Colour information . 24
5.1.4.5 Frame rates . 24
5.1.4.6 Random access point . 25
5.1.4.7 Sequence parameter set . 25
5.1.4.8 Video usability information . 25
5.1.4.9 Omni-directional Projection Format . 25
5.1.4.10 Restricted Coverage . 26
5.1.4.11 Other VR Metadata . 26
5.1.4.12 Receiver Compatibility . 26
5.1.5 Main H.265/HEVC . 26
5.1.5.1 General . 26
5.1.5.2 Profile and level . 26
5.1.5.3 Bit depth . 27
5.1.5.4 Spatial Resolutions . 27
5.1.5.5 Colour information and Transfer Characteristics . 27
5.1.5.6 Frame rates . 28
5.1.5.7 Random access point . 28
5.1.5.8 Video and Sequence Parameter Sets . 28
5.1.5.9 Video usability information . 29
5.1.5.10 Omni-directional Projection Formats . 29
5.1.5.11 Restricted Coverage . 29
5.1.5.12 Viewport-Optimized Content . 29
5.1.5.13 Frame packing arrangement . 29
5.1.5.14 Other VR Metadata . 30
5.1.5.15 Receiver Compatibility . 30
5.1.6 Flexible H.265/HEVC . 30
5.1.6.1 General . 30
5.1.6.2 Profile and level . 30
5.1.6.3 Bit depth . 31
5.1.6.4 Spatial Resolutions . 31
5.1.6.5 Colour information and Transfer Characteristics . 31
5.1.6.6 Frame rates . 32
5.1.6.7 Random access point . 32
5.1.6.8 Video and Sequence Parameter Sets . 32
5.1.6.9 Video usability information . 33
5.1.6.10 Omni-directional Projection Formats . 33
5.1.6.11 Restricted Coverage . 33
5.1.6.12 Viewport-Optimized Content . 33
5.1.6.13 Frame packing arrangement . 34
5.1.6.14 Other VR Metadata . 34
5.1.6.15 Receiver Compatibility . 34
5.2 Video Media Profiles. 34
5.2.1 Introduction and Overview . 34
5.2.2 Basic Video Media Profile . 35
5.2.2.1 Overview . 35
5.2.2.2 File Format Signaling and Encapsulation . 35
5.2.2.3 DASH Integration . 36
5.2.2.3.1 Definition. 36
5.2.2.3.2 Additional Restrictions for DASH Representations . 36
5.2.2.3.3 DASH Adaptation Set Constraints . 37
5.2.3 Main Video Media Profile . 38
5.2.3.1 Overview . 38
5.2.3.2 File Format Signaling and Encapsulation . 38
5.2.3.3 DASH Integration . 40
5.2.3.3.1 Definition. 40
5.2.3.3.2 Additional Restrictions for DASH Representations . 40
5.2.3.3.3 DASH Adaptation Set Constraints . 40
5.2.3.3.4 Adaptation Set Ensembles for Viewport-Optimized offering. 42
5.2.4 Advanced Video Media Profile . 43
5.2.4.1 Overview . 43
5.2.4.2 File Format Signaling and Encapsulation . 43
5.2.4.3 DASH Integration . 44
5.2.4.3.1 Definition. 44
5.2.4.3.2 Additional Restrictions for DASH Representations . 45
5.2.4.3.3 DASH Adaptation Set Constraints . 46
5.2.4.3.4 Adaptation Set Constraints for Viewport Selection . 48
6 Audio . 48
6.1 Audio Operation Points . 48
6.1.1 Definition of Operation Point . 48
6.1.2 Parameters of Audio Operation Point . 49
6.1.3 Summary of Audio Operation Points . 49
6.1.4 3GPP MPEG-H Audio Operation Point. 49
6.1.4.1 Overview . 49
6.1.4.2 Bitstream requirements . 50
6.1.4.3 Receiver requirements . 50
6.1.4.3.1 General . 50
6.1.4.3.2 Decoding process. 50
6.1.4.3.3 Random Access . 51
6.1.4.3.4 Configuration change . 51
6.1.4.3.5 MPEG-H Multi-stream Audio . 51
6.1.4.3.6 Rendering requirements . 51
6.2 Audio Media Profiles . 53
6.2.1 Introduction and Overview . 53
6.2.2 OMAF 3D Audio Baseline Media Profile . 53
6.2.2.1 Overview . 53
6.2.2.2 File Format Signaling and Encapsulation . 54
6.2.2.2.1 General . 54
6.2.2.2.2 Configuration change constraints . 54
6.2.2.3 Multi-stream constraints . 54
6.2.2.3a Additional Restrictions for DASH Representations . 54
6.2.2.4 DASH Adaptation Set Constraints . 54
6.2.2.4.1 General . 54
6.2.2.4.2 DASH Adaptive Bitrate Switching . 55
7 Metadata . 55
7.1 Presentation without Pose Information to 2D Screens . 55
8 VR Presentation . 55
8.1 Definition . 55
8.2 3GPP VR File . 55
8.3 3GPP VR DASH Media Presentation . 55
Annex A (informative): Content Generation Guidelines . 57
A.1 Introduction . 57
A.2 Video . 57
A.2.1 Overview . 57
A.2.2 Decoded Texture Signal Constraints . 57
A.2.2.1 General . 57
A.2.2.2 Constraints for Main and Flexible H.265/HEVC Operation Point . 57
A.2.3 Conversion of ERP Signals to CMP . 58
A.2.3.1 General . 58
A.2.3.2 Equirectangular Projection (ERP) . 59
A.2.3.3 Cubemap Projection (CMP) . 59
A.2.3.4 Conversion between two projection formats . 61
Annex B (informative): Example External Binaural Renderer . 62
B.1 General . 62
B.2 Interfaces . 62
B.2.1 Interface for Audio Data and Metadata . 62
B.2.2 Head Tracking Interface . 63
B.2.3 Interface for Head-Related Impulse Responses . 63
B.3 Preprocessing . 63
B.3.1 Channel Content . 63
B.3.2 Object Content . 63
B.3.3 HOA Content . 63
B.3.4 Non-diegetic Content . 63
B.4 Scene Displacement Processing . 64
B.4.1 General . 64
B.4.2 Applying Scene Displacement Information . 64
B.5 Headphone Output Signal Computation . 64
B.5.1 General . 64
B.5.2 HRIR Selection . 64
B.5.3 Initialization . 64
B.5.4 Convolution and Crossfade . 65
B.5.5 Binaural Downmix . 65
B.5.6 Complexity . 65
B.5.7 Motion Latency . 66
Annex C (informative): Registration Information . 67
C.1 3GPP Registered URIs . 67
Annex D (informative): Change history . 68
History . 69
Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).
The contents of the present document are subject to continuing work within the TSG and may change following formal
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an
identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections,
updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.
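The version numbering rule above can be expressed as a small parser. This is a minimal sketch, not part of the specification. The names `SpecVersion`, `parse_version`, `status` and `bump` are hypothetical. Resetting z to 0 when y is incremented is an assumption based on common 3GPP practice and is not stated above.

```python
from typing import NamedTuple


class SpecVersion(NamedTuple):
    """A version number x.y.z (hypothetical helper type)."""
    x: int  # first digit: approval status
    y: int  # second digit: changes of substance
    z: int  # third digit: editorial-only changes


def parse_version(text: str) -> SpecVersion:
    """Parse a string such as '15.1.0' into its three digits."""
    parts = text.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected x.y.z, got {text!r}")
    x, y, z = (int(p) for p in parts)
    return SpecVersion(x, y, z)


def status(v: SpecVersion) -> str:
    """Interpret the first digit as described in the Foreword."""
    if v.x == 1:
        return "presented to TSG for information"
    if v.x == 2:
        return "presented to TSG for approval"
    if v.x >= 3:
        return "TSG approved document under change control"
    raise ValueError(f"invalid first digit: {v.x}")


def bump(v: SpecVersion, substantive: bool) -> SpecVersion:
    """Return the next version after a change.

    Assumption: z is reset to 0 whenever y is incremented.
    """
    if substantive:
        return SpecVersion(v.x, v.y + 1, 0)
    return SpecVersion(v.x, v.y, v.z + 1)
```

For the present document, `status(parse_version("15.1.0"))` returns "TSG approved document under change control".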
Introduction
The present document provides technologies for interoperable Virtual Reality services with a focus on streaming and
consumption.
Virtual Reality (VR) is the ability to be virtually present in a space created by the rendering of natural and/or synthetic
images and sounds, correlated with the movements of the immersed user and allowing the user to interact with that world.
Suitable media formats for providing immersive experiences are specified to enable Virtual Reality Services in the
context of 3GPP bearer and user services.
1 Scope
The present document defines interoperable formats for Virtual Reality for streaming services. Specifically, the present
document defines operation points, media profiles and presentation profiles for Virtual Reality. The present document
builds on the findings and conclusions in TR 26.918 [2].
2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present
document.
- References are either specific (identified by date of publication, edition number, version number, etc.) or
non-specific.
- For a specific reference, subsequent revisions do not apply.
- For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same
Release as the present document.
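The resolution rules for specific and non-specific references can be sketched as follows. This is illustrative only. The `Reference` type and `resolve` function are hypothetical, and versions are modelled as integer tuples. The mapping of a 3GPP version's first digit to its Release is an assumption; the present document, version 15.1.0, belongs to Release 15.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Version = Tuple[int, ...]


@dataclass(frozen=True)
class Reference:
    doc_id: str
    version: Optional[Version] = None  # set only for a specific reference
    is_3gpp: bool = False              # 3GPP (including GSM) document


def resolve(ref: Reference, published: Sequence[Version], release: int) -> Version:
    """Return the version of ref.doc_id that applies to a document of `release`."""
    if ref.version is not None:
        # Specific reference: subsequent revisions do not apply.
        return ref.version
    candidates = list(published)
    if ref.is_3gpp:
        # Assumption: the first version digit equals the 3GPP Release.
        candidates = [v for v in candidates if v[0] == release]
    if not candidates:
        raise LookupError(f"no applicable version of {ref.doc_id}")
    # Non-specific reference: the latest applicable version applies.
    return max(candidates)
```

For a non-specific 3GPP reference cited from a Release 15 document, the versions (14, 3, 0), (15, 0, 0), (15, 2, 0) and (16, 1, 0) resolve to (15, 2, 0). The Release 16 version is excluded even though it is newer.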
[1] 3GPP TR 21.905: "Vocabulary for 3GPP Specifications".
[2] 3GPP TR 26.918: "Virtual Reality (VR) media services over 3GPP".
[3] Recommendation ITU-R BT.709-6 (06/2015): "Parameter values for the HDTV standards for
production and international programme exchange".
[4] Recommendation ITU-R BT.2020-2 (10/2015): "Parameter values for ultra-high definition
television systems for production and international programme exchange".
[5] Recommendation ITU-T H.264 (04/2017): "Advanced video coding for generic audiovisual
services" | ISO/IEC 14496-10:2014: "Information technology – Coding of audio-visual objects –
Part 10: Advanced Video Coding".
[6] Recommendation ITU-T H.265 (02/2018): "High efficiency video coding" | ISO/IEC 23008-2:2018:
"High Efficiency Coding and Media Delivery in Heterogeneous Environments – Part 2:
High Efficiency Video Coding".
[7] void
[8] 3GPP TS 26.247: "Transparent end-to-end Packet-switched Streaming Service (PSS); Progressive
Download and Dynamic Adaptive Streaming over HTTP (3GP-DASH)".
[9] ISO/IEC 14496-15: "Information technology - Coding of audio-visual objects - Part 15: Carriage
of network abstraction layer (NAL) unit structured video in ISO base media file format".
[10] ISO/IEC 23001-8: "Information technology -- MPEG systems technologies -- Part 8: Coding-
independent code points".
[11] Recommendation ITU-R BT.2100-1: "Image parameter values for high dynamic range television
for use in production and international programme exchange".
[12] 3GPP TS 26.116: "Television (TV) over 3GPP services; Video profiles".
[13] ISO/IEC 23090-2: "Coded representation of immersive media -- Part 2: Omnidirectional media
format".
[14] ISO/IEC DIS 23091-2: "Information technology -- Coding-independent code points -- Part 2:
Video".
[15] 3GPP TS 26.260: "Objective test methodologies for the evaluation of immersive audio systems".
[16] 3GPP TS 26.259: "Subjective test methodologies for the evaluation of immersive audio systems".
[17] ISO/IEC 14496-12: "Information technology -- Coding of audio-visual objects -- Part 12: ISO base
media file format".
[18] ISO/IEC 23009-1: "Information technology -- Dynamic adaptive streaming over HTTP (DASH) --
Part 1: Media presentation description and segment formats".
[19] ISO/IEC 23008-3:2015: "Information technology -- High efficiency coding and media delivery in
heterogeneous environments - Part 3: 3D audio", ISO/IEC 23008-3:2015/Amd 2:2016: "MPEG-H
3D Audio File Format Support", ISO/IEC 23008-3:2015/Amd 3:2017: "MPEG-H 3D Audio
Phase 2", ISO/IEC 23008-3:2015/Amd 5: "Audio metadata enhancements".
[20] IETF RFC 6381: "The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types", R. Gellens,
D. Singer, P. Frojdh, August 2011.
[21] AES69-2015: "AES standard for file exchange - Spatial acoustic data file format", 2015.
3 Definitions, symbols and abbreviations
3.1 Definitions
For the purposes of the present document, the terms and definitions given in 3GPP TR 21.905 [1] and the following
apply. A term defined in the present document takes precedence over the definition of the same term, if any, in 3GPP
TR 21.905 [1].
bitstream: a sequence of bits that conforms to a video encoding format and to a certain Operation Point
field of view: the extent of the visible area, expressed as vertical and horizontal angles in degrees in the 3GPP 3DOF
reference system
operation point: a collection of discrete combinations of different content formats including spatial and temporal
resolutions, colour mapping, transfer functions, rendering metadata and the encoding format.
pose: position derived by the head tracking sensor, expressed as (azimuth; elevation; tilt angle).
receiver: a device that can decode and render any bitstream conforming to a certain Operation Point.
viewport: the part of the 3DOF content to render based on the pose and the field of view.
3.2 Symbols
For the purposes of the present document, the following symbols apply:
α yaw of the 3GPP 3DOF coordinate system
β pitch of the 3GPP 3DOF coordinate system
γ roll of the 3GPP 3DOF coordinate system
ϕ azimuth of the 3GPP 3DOF coordinate system
θ elevation of the 3GPP 3DOF coordinate system
3.3 Abbreviations
For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [1] and the following apply. An
abbreviation defined in the present document takes precedence over the definition of the same abbreviation, if any, in
3GPP TR 21.905 [1].
3DOF 3 Degrees of freedom
ACN Ambisonics Channel Number
API Application Programming Interface
AVC Advanced Video Coding
BMFF Base Media File Format
BRIR Binaural Room Impulse Response
CMP Cube-Map Projection
CIBR Common Informative Binaural Renderer
DASH Dynamic Adaptive Streaming over HTTP
DRC Dynamic Range Control
EOTF Electro-Optical Transfer Function
ERP EquiRectangular Projection
ESD Equivalent Spatial Domain
FFT Fast Fourier Transform
FIR Finite Impulse Response
FOA First Order Ambisonics
FOV Field Of View
GPU Graphics Processing Unit
HDR High Dynamic Range
HDTV High Definition TeleVision
HEVC High Efficiency Video Coding
HMD Head Mounted Display
HOA High Order Ambisonics
HRD Hypothetical Reference Decoder
HRIR Head-Related Impulse Response
HRTF Head-Related Transfer Function
HTTP HyperText Transfer Protocol
IFFT Inverse FFT
IRFFT Inverse RFFT
MAE MPEG-H Audio Metadata information
MHAS MPEG-H Audio Stream
MIME Multipurpose Internet Mail Extensions
MPD Media Presentation Description
MPEG Moving Picture Experts Group
NAL Network Abstraction Layer
OMAF Omnidirectional MediA Format
PCM Pulse Code Modulation
RAP Random Access Point
RFFT Real FFT
RWP Region-Wise Packing
SDR Standard Dynamic Range
SEI Supplemental Enhancement Information
SN3D Schmidt semi-normalisation
SOFA Spatially Oriented Format for Acoustics
SPS Sequence Parameter Set
SRQR Spherical Region-wise Quality Ranking
VCL Video Coding Layer
VST Virtual Studio Technology
VUI Video Usability Information
VR Virtual Reality
4 Architectures and Interfaces for Virtual Reality
4.1 Definitions and Reference Systems
4.1.1 Overview
Virtual reality is a rendered version of a delivered visual and audio scene. The rendering is designed to mimic the visual
and audio sensory stimuli of the real world as naturally as possible to an observer or user as they move within the limits
defined by the application.
Virtual reality usually, but not necessarily, assumes that the user wears a head mounted display (HMD), which completely
replaces the user's field of view with a simulated visual component, and headphones, which provide the accompanying
audio, as shown in Figure 4.1-1.
Figure 4.1-1: Reference System (X, Y and Z axes; roll γ about X, pitch β about Y, yaw α about Z)
Some form of head and motion tracking of the user in VR is usually also necessary to allow the simulated visual and
audio components to be updated in order to ensure that, from the user's perspective, items and sound sources remain
consistent with the user's movements. Sensors typically are able to track the user's pose in the reference system.
Additional means to interact with the virtual reality simulation may be provided but are not strictly necessary.
VR users are expected to be able to look around from a single observation point in 3D space defined by either a
producer or the position of one or multiple capturing devices. When VR media including video and audio is consumed
with a head-mounted display or a smartphone, only the area of the spherical video that corresponds to the user's
viewport is rendered, as if the user were in the spot where the video and audio were captured.
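The viewport selection described above can be approximated by a simple angular test. The sketch below is a simplification, not a normative procedure. It treats the viewport as an azimuth/elevation window of hfov by vfov degrees centred on the pose. It ignores roll (tilt) and the narrowing of azimuth spans near the poles. `in_viewport`, `wrap_azimuth` and their parameters are hypothetical names.

```python
def wrap_azimuth(delta_deg: float) -> float:
    """Map an azimuth difference in degrees into the range [-180, 180)."""
    return (delta_deg + 180.0) % 360.0 - 180.0


def in_viewport(point_az: float, point_el: float,
                pose_az: float, pose_el: float,
                hfov: float, vfov: float) -> bool:
    """Approximate test whether a sphere point (degrees) lies in the viewport.

    Simplification: roll is ignored and the viewport is treated as an
    azimuth/elevation window centred on the pose direction.
    """
    d_az = wrap_azimuth(point_az - pose_az)  # shortest way around the sphere
    d_el = point_el - pose_el
    return abs(d_az) <= hfov / 2.0 and abs(d_el) <= vfov / 2.0
```

Consider a pose at azimuth -170° with a 90° horizontal field of view. A point at azimuth 170° is only 20° away across the ±180° seam, so it is reported as visible. Wrapping the difference is what prevents the seam from splitting the viewport.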
This ability to look around and listen from a centre point in 3D space is defined as 3 degrees of freedom (3DOF).
According to Figure 4.1-1:
- tilting side to side on the X-axis is referred to as Rolling, also expressed as γ
- tilting forward and backward on the Y-axis is referred to as Pitching, also expressed as β
- turning left and right on the Z-axis is referred to as Yawing, also expressed as α
It is worth noting that this centre point is not necessarily static; it may be moving. Users or producers may also select
from a few different observation points, but each observation point in 3D space only permits the user 3 degrees of
freedom. For a full 3DOF VR experience, such video content may be combined with simultaneously captured audio,
binaurally rendered with an appropriate Binaural Room Impulse Response (BRIR). The third relevant aspect is
interactivity: the user perceives a fully immersive experience only if the content is presented in such a way that the
user's movements are instantaneously reflected in the rendering. For details on immersive rendering latencies,
refer to TR 26.918 [2].
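The yaw, pitch and roll rotations of Figure 4.1-1 can be illustrated with rotation matrices. This sketch rests on assumptions that are not stated in this clause:

- rotations are right-handed, about Z (yaw α), Y (pitch β) and X (roll γ);
- they are composed as R = Rz(α)·Ry(β)·Rx(γ), with yaw applied first as an intrinsic rotation;
- azimuth φ and elevation θ map to the unit vector x = cos θ cos φ, y = cos θ sin φ, z = sin θ.

The normative conventions are those of the 3GPP 3DOF coordinate system, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def rot_x(gamma: float) -> Matrix:
    """Right-handed rotation about the X-axis (roll), angle in radians."""
    c, s = math.cos(gamma), math.sin(gamma)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(beta: float) -> Matrix:
    """Right-handed rotation about the Y-axis (pitch), angle in radians."""
    c, s = math.cos(beta), math.sin(beta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(alpha: float) -> Matrix:
    """Right-handed rotation about the Z-axis (yaw), angle in radians."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotate(m: Matrix, v: Vector) -> Vector:
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rotation(yaw_deg: float, pitch_deg: float, roll_deg: float) -> Matrix:
    """R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed composition order)."""
    a, b, g = (math.radians(d) for d in (yaw_deg, pitch_deg, roll_deg))
    return matmul(rot_z(a), matmul(rot_y(b), rot_x(g)))


def sphere_to_vector(azimuth_deg: float, elevation_deg: float) -> Vector:
    """Unit vector for azimuth phi and elevation theta (assumed mapping)."""
    phi, theta = math.radians(azimuth_deg), math.radians(elevation_deg)
    return [math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta)]
```

Under these assumptions, rotating the forward direction (azimuth 0°, elevation 0°) by a yaw of 30° yields the direction of azimuth 30°, elevation 0°. Roll leaves the forward X direction unchanged.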
4.1.2 3GPP 3DOF Coordinate System
The coordinate system is specified for defining the sphere coordinates azimuth (φ) and elevation (θ) for identifying a
location of a point on the unit sphere, as well as the rotation angles yaw (α), pitch (β), and roll (γ). The origin of
the coordinate system is usually the same as the centre point of a device or rig used for audio or video acquisition as
well as the position of the user's head in the 3D space in w
...







