ISO/IEC 23090-7:2022/Amd 1:2024
International Standard
ISO/IEC 23090-7
First edition 2022-11
AMENDMENT 1 2024-12
Information technology — Coded representation of immersive media —
Part 7: Immersive media metadata
AMENDMENT 1: Common metadata for immersive media
Reference number: ISO/IEC 23090-7:2022/Amd. 1:2024(en)
© ISO/IEC 2024
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the ISO/
IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 23090 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Information technology — Coded representation of
immersive media —
Part 7:
Immersive media metadata
AMENDMENT 1: Common metadata for immersive media
Normative references
Add the following reference to the end of the reference list:
IEEE 754-2019, IEEE Standard for Floating-Point Arithmetic
Introduction
Replace:
— Clause 5 describes common metadata applicable to immersive media. This includes reference co-
ordinate system related metadata and other common metadata syntax and semantics.
— Clause 6 describes metadata that applies to video and images. This includes projection formats and
region-wise packing metadata metadata which applies to video and images.
with:
— Clause 5 describes common metadata applicable to immersive media, with omnidirectional media
in particular. This includes reference co-ordinate system related metadata and other common
metadata syntax and semantics.
— Clause 6 describes metadata that applies to video and images, with omnidirectional media in
particular. This includes projection formats and region-wise packing metadata which applies to
video and images.
— Clause 7 describes common metadata applicable to immersive media, with Visual Volumetric Video-
based Coding (V3C) and Video-based Point Cloud Compression (V-PCC) in particular. This includes
extrinsic camera information, intrinsic camera information, and other 3D common metadata syntax
and semantics.
— Appendices A and B describe annotations of non-timed visual volumetric data and G-PCC data.
Add the following paragraphs after the document organization list:
This document follows these guiding principles:
1) Common metadata and their data structures shall be defined for both 3DoF and 6DoF immersive
content, separately as well as jointly, so that they can be used by applications that are specific to
either 3DoF or 6DoF immersive content as well as by applications that handle mixed 3DoF and 6DoF
immersive content.
2) Basic and common data structures are defined for simple metadata, and extended and enhanced data
structures are defined as extensions of the basic and common metadata (e.g. a viewport is an extension
of a viewpoint).
3) Metadata structures shall be defined in a way that allows their encapsulation in ISOBMFF:
— Static: extension of containing boxes
— Dynamic: timed metadata tracks
Clause 7
Add the following new Clause 7 after Clause 6.
7 Common Metadata for Immersive Media
7.1 Vector3
Dimensions, positions, and sizes for 3D immersive media can be defined using the following 3D vector
data structure.
7.1.1 Syntax
aligned(8) class Vector3(unsigned char precision_bytes_minus1) {
    signed int((precision_bytes_minus1+1)*8) x;
    signed int((precision_bytes_minus1+1)*8) y;
    signed int((precision_bytes_minus1+1)*8) z;
}
7.1.2 Semantics
precision_bytes_minus1: Plus 1 specifies the precision of the Vector3 components in bytes. Valid values
are in the range [0, 3].
x, y and z: specify the x, y, and z coordinate values, respectively, of a 3D point in the Cartesian coordinate
system.
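As a non-normative illustration, the following Python sketch shows how a Vector3 could be parsed from a
byte buffer; the helper name parse_vector3 and the assumption of big-endian byte order (the usual
convention for ISOBMFF structures) are illustrative only.

import struct

def parse_vector3(buf: bytes, offset: int, precision_bytes_minus1: int):
    # Parse a Vector3: three signed integers of (precision_bytes_minus1 + 1) bytes each.
    size = precision_bytes_minus1 + 1              # 1 to 4 bytes per component
    comps = []
    for _ in range(3):                             # x, y, z in coded order
        raw = buf[offset:offset + size]
        comps.append(int.from_bytes(raw, "big", signed=True))
        offset += size
    return tuple(comps), offset

# Example: three 2-byte components (precision_bytes_minus1 = 1)
data = struct.pack(">hhh", 100, -20, 3)
vec, _ = parse_vector3(data, 0, 1)
print(vec)                                         # (100, -20, 3)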
7.2 Scaling
Scaling in three dimensions is defined using the following data structure:
7.2.1 Syntax
aligned(8) class 3DScaling (unsigned char precision_bytes_minus1) {
    Vector3 scale(precision_bytes_minus1);
}
7.2.2 Semantics
precision_bytes_minus1: Plus 1 specifies the precision of the scale components in bytes. Valid values are
in the range [0, 3].
scale.x, scale.y, and scale.z indicate the scaling extension in the Cartesian coordinates along the x, y, and
z axes, respectively, relative to the origin (0,0,0).
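A minimal sketch of how such a scaling could be applied to a point about the origin (0,0,0); the
component-wise multiplication shown here is an illustrative assumption, not a normative procedure.

def apply_scaling(point, scale):
    # Component-wise scaling of a 3D point relative to the origin (0, 0, 0).
    return tuple(p * s for p, s in zip(point, scale))

print(apply_scaling((2, 4, 6), (1, 2, 3)))   # (2, 8, 18)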
7.3 Extrinsic Camera Information
Extrinsic camera information is defined using the following data structure.
7.3.1 Syntax
class CameraExtrinsics(unsigned char abs_flag, unsigned char mode, unsigned char pos_bytes_minus1,
                       unsigned char pos_unit, unsigned char quat_bytes_minus1,
                       unsigned char quat_den_bits_minus1) {
    if (mode & 0x1) {
        signed int((pos_bytes_minus1+1)*8) pos_x;
    }
    if (mode & 0x2) {
        signed int((pos_bytes_minus1+1)*8) pos_y;
    }
    if (mode & 0x4) {
        signed int((pos_bytes_minus1+1)*8) pos_z;
    }
    if (mode & 0x8) {
        Vector3 quat(quat_bytes_minus1);
    }
}
7.3.2 Semantics
abs_flag: If 1, an absolute position and orientation are specified. If 0, the specified values are added to
the previously coded position and orientation (i.e. they are relative values).
mode: Signalling mode. Valid values are:
[1, 7]: only the position is signalled.
8: only the orientation is signalled.
[9, 15]: both orientation and position are signalled.
pos_bytes_minus1: Plus 1 indicates the number of bytes to be read for pos_x, pos_y and pos_z. Valid values
are in the range [0, 3].
pos_unit: Unit of pos_x, pos_y and pos_z. Valid values are in the range [0, 2], where
0: µm
1: mm
2: m
quat_bytes_minus1: Plus 1 indicates the number of bytes to be read for quat.x, quat.y and quat.z. Valid
values are in the range [0, 1].
quat_den_bits_minus1: Specifies the denominator of quat.x, quat.y and quat.z. Valid values for
quat_den_bits_minus1 are in the range [0, 13]. The denominator is computed as follows:
denominator = 2^( quat_den_bits_minus1 + 1 )
pos_x: Specifies the x-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
pos_y: Specifies the y-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
pos_z: Specifies the z-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
quat.x: Specifies the x component, qX, for the rotation of the camera using the quaternion representation.
The value of quat.x shall be in the range of -2^( quat_den_bits_minus1 + 1 ) to 2^( quat_den_bits_minus1 + 1 ), inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
quat.y: Specifies the y component, qY, for the rotation of the camera using the quaternion representation.
The value of quat.y shall be in the range of -2^( quat_den_bits_minus1 + 1 ) to 2^( quat_den_bits_minus1 + 1 ), inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
quat.z: Specifies the z component, qZ, for the rotation of the camera using the quaternion representation.
The value of quat.z shall be in the range of -2^( quat_den_bits_minus1 + 1 ) to 2^( quat_den_bits_minus1 + 1 ), inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
The values of the quaternion representation are computed as follows:
qX = quat.x / denominator
qY = quat.y / denominator
qZ = quat.z / denominator
It is a requirement of bitstream conformance that:
qX^2 + qY^2 + qZ^2 <= 1
The fourth component of the quaternion representation, qW, is computed as follows:
qW = Sqrt( 1 - ( qX^2 + qY^2 + qZ^2 ) )
The point (w, x, y, z) represents a rotation around the axis directed by the vector (x, y, z) by an angle
2*cos^(-1)(w) = 2*sin^(-1)( sqrt( x^2 + y^2 + z^2 ) ).
NOTE As aligned with ISO/IEC 23090-5, qW is always positive. If a negative qW is desired, the three
syntax elements quat.x, quat.y and quat.z can be signalled with the opposite sign, which is equivalent.
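As a non-normative illustration of the semantics above, the following Python sketch shows which
CameraExtrinsics fields are present for a given mode value and how the quaternion (qX, qY, qZ, qW) can be
reconstructed from the coded components; the function names and the use of plain floats are assumptions
made for the example.

import math

def fields_present(mode: int) -> dict:
    # Which CameraExtrinsics fields are signalled for a given mode bitmask.
    return {
        "pos_x": bool(mode & 0x1),
        "pos_y": bool(mode & 0x2),
        "pos_z": bool(mode & 0x4),
        "quat":  bool(mode & 0x8),
    }

def decode_quaternion(quat_x: int, quat_y: int, quat_z: int, quat_den_bits_minus1: int):
    # Mirrors the formulas in 7.3.2, assuming qX^2 + qY^2 + qZ^2 <= 1 holds.
    denominator = 2 ** (quat_den_bits_minus1 + 1)
    qx = quat_x / denominator
    qy = quat_y / denominator
    qz = quat_z / denominator
    qw = math.sqrt(1.0 - (qx * qx + qy * qy + qz * qz))   # qW is always positive
    return qx, qy, qz, qw

# Example: mode 9 signals pos_x and the orientation quaternion;
# with quat_den_bits_minus1 = 13 the denominator is 2^14 = 16384.
print(fields_present(9))
print(decode_quaternion(4096, 0, 0, 13))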
7.4 Intrinsic Camera Information
Intrinsic camera information is defined using the following data structure.
7.4.1 Syntax
aligned(8) class IntCameraInfo (unsigned char precision_bytes_minus1) {
    unsigned int(10) camera_id;
    bit(3) reserved = 0;
    unsigned int(3) camera_type;
    if (camera_type == 0) {
        signed int((precision_bytes_minus1+1)*8) erp_horizontal_fov;
        signed int((precision_bytes_minus1+1)*8) erp_vertical_fov;
    }
    if (camera_type == 1) {
        signed int((precision_bytes_minus1+1)*8) perspective_horizontal_fov;
        unsigned int(8)[4] perspective_aspect_ratio;
    }
    if (camera_type == 2) {
        unsigned int(8)[4] ortho_aspect_ratio;
        unsigned int(8)[4] ortho_horizontal_size;
    }
    unsigned int(8)[4] clipping_near_plane;
    unsigned int(8)[4] clipping_far_plane;
}
7.4.2 Semantics
camera_id is an identifier number that is used to identify a given set of viewport camera parameters.
camera_type indicates the projection method of the viewport camera. The value 0 specifies an ERP projection.
The value 1 specifies a perspective projection. The value 2 specifies an orthographic projection. Values
in the range 3 to 7 are reserved for future use by ISO/IEC.
precision_bytes_minus1: Plus 1 indicates the number of bytes to be read for erp_horizontal_fov,
erp_vertical_fov and perspective_horizontal_fov. Valid values are in the range [0, 3].
erp_horizontal_fov specifies the longitude range for an ERP projection corresponding to the horizontal
size of the viewport region, in units of radians. The value shall be in the range 0 to 2π.
erp_vertical_fov specifies the latitude range for an ERP projection corresponding to the vertical size of the
viewport region, in units of radians. The value shall be in the range 0 to π.
perspective_horizontal_fov specifies the horizontal field of view for perspective projection, in radians.
The value shall be in the range of 0 to π.
perspective_aspect_ratio specifies the relative aspect ratio of the viewport for perspective projection
(horizontal/vertical). The value shall be expressed in 32-bit binary floating-point format with the 4
bytes in big-endian order and with the parsing process as specified in IEEE 754.
ortho_aspect_ratio specifies the relative aspect ratio of the viewport for orthogonal projection (horizontal/
vertical). The value shall be expressed in 32-bit binary floating-point format with the 4 bytes in big-
endian order and with the parsing process as specified in IEEE 754.
ortho_horizontal_size specifies the horizontal size of the orthogonal viewport, in metres. The value shall
be expressed in 32-bit binary floating-point format with the 4 bytes in big-endian order and with the
parsing process as specified in IEEE 754.
clipping_near_plane and clipping_far_plane indicate the near and far depths (or distances) based on the
near and far clipping planes of the viewport in metres. The values shall be expressed in 32-bit binary
floating-point format with the 4 bytes in big-endian order and with the parsing process as specified in
IEEE 754.
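The 32-bit floating-point fields above (perspective_aspect_ratio, ortho_aspect_ratio,
ortho_horizontal_size, clipping_near_plane and clipping_far_plane) can be read as sketched below in
Python; the helper name read_float32_be is an assumption, and the ">f" format corresponds to the
big-endian IEEE 754 binary32 parsing described here.

import struct

def read_float32_be(buf: bytes, offset: int = 0) -> float:
    # Read a 32-bit IEEE 754 binary32 value stored as 4 bytes in big-endian order.
    return struct.unpack_from(">f", buf, offset)[0]

# Example: 1.5 is encoded as the 4 bytes 0x3F 0xC0 0x00 0x00
print(read_float32_be(bytes([0x3F, 0xC0, 0x00, 0x00])))   # 1.5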
7.5 Viewing Spaces
A cuboid viewi
...