ISO/IEC 23090-7:2022/Amd 1:2024
(Amendment)Information technology — Coded representation of immersive media — Part 7: Immersive media metadata — Amendment 1: Common metadata for immersive media
Technologies de l'information — Représentation codée de média immersifs — Partie 7: Métadonnées de media immersifs — Amendement 1: Métadonnées communes pour médias immersifs
ISO/IEC 23090-7
First edition
Information technology — Coded
representation of immersive media —
Part 7:
Immersive media metadata
AMENDMENT 1: Common metadata
for immersive media
Reference number
ISO/IEC 23090-7:2022/Amd. 1:2024(en) © ISO/IEC 2024
Information technology — Coded representation of
immersive media —
Part 7:
Immersive media metadata
AMENDMENT 1: Common metadata for immersive media
Normative references
Add the following reference to the end of the reference list:
IEEE 754-2019. IEEE Standard for Floating-Point Arithmetic
— Clause 5 describes common metadata applicable to immersive media. This includes reference co-
ordinate system related metadata and other common metadata syntax and semantics.
— Clause 6 describes metadata that applies to video and images. This includes projection formats and
region-wise packing metadata.
— Clause 5 describes common metadata applicable to immersive media, with omnidirectional media
in particular. This includes reference co-ordinate system related metadata and other common
metadata syntax and semantics.
— Clause 6 describes metadata that applies to video and images, with omnidirectional media in
particular. This includes projection formats and region-wise packing metadata.
— Clause 7 describes common metadata applicable to immersive media, with Visual Volumetric Video-
based Coding (V3C) and Video-based Point Cloud Compression (V-PCC) in particular. This includes
extrinsic camera information, intrinsic camera information, and other 3D common metadata syntax
and semantics.
— Appendices A and B on Annotations of non-timed visual volumetric data and G-PCC data.
Add the following paragraphs after the document organization list:
This document follows the following guiding principles:
1) Common metadata and their data structures shall be defined for both 3DoF and 6DoF immersive
content, separately as well as jointly, in order to be used for applications that are either specific to
separate 3DoF and 6DoF immersive content or general to mixed 3DoF and 6DoF immersive content.
2) Basic and common data structures are defined for simple metadata, and extended and enhanced data
structures are defined as extensions of basic and common metadata (e.g., viewport is an extension
of viewpoint).
3) Metadata structures shall be defined in a way to allow their encapsulation in ISOBMFF:
— Static: extension of containing boxes
— Dynamic: timed metadata tracks
Clause 7
Add the following new Clause 7 after Clause 6.
7 Common Metadata for Immersive Media
7.1 Vector3
Dimensions, positions and sizes for 3D immersive media can be defined using the following 3D vector data
structure:
7.1.1 Syntax
aligned(8) class Vector3(unsigned char precision_bytes_minus1) {
    signed int((precision_bytes_minus1+1)*8) x;
    signed int((precision_bytes_minus1+1)*8) y;
    signed int((precision_bytes_minus1+1)*8) z;
}
7.1.2 Semantics
precision_bytes_minus1: Plus 1 specifies the precision of the Vector3 components in bytes. Valid values are in
the range [0, 3].
x, y and z: Specify the x, y and z coordinate values, respectively, of a 3D point in the Cartesian coordinate system.
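As an illustration of the layout above, the following Python sketch reads one Vector3 from a byte buffer. It assumes big-endian byte order (as is customary for ISOBMFF fields) and two's-complement signed integers. The function name parse_vector3 and its return type are hypothetical and not defined by this document.

```python
from typing import Tuple


def parse_vector3(buf: bytes, precision_bytes_minus1: int, offset: int = 0) -> Tuple[int, int, int]:
    """Read x, y, z as signed integers of (precision_bytes_minus1 + 1) bytes each.

    Assumption: big-endian byte order and two's-complement encoding.
    """
    if not 0 <= precision_bytes_minus1 <= 3:
        raise ValueError("precision_bytes_minus1 must be in the range [0, 3]")
    n = precision_bytes_minus1 + 1  # bytes per component
    if len(buf) < offset + 3 * n:
        raise ValueError("buffer too short for a Vector3")
    x, y, z = (
        int.from_bytes(buf[offset + i * n: offset + (i + 1) * n], "big", signed=True)
        for i in range(3)
    )
    return x, y, z
```

For example, with precision_bytes_minus1 equal to 1, the six bytes 00 01 FF FF 80 00 would decode under these assumptions to x = 1, y = -1 and z = -32768.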
7.2 Scaling
Scaling in three dimensions is defined using the following data structure:
7.2.1 Syntax
aligned(8) class 3DScaling (unsigned char precision_bytes_minus1) {
    Vector3 scale(precision_bytes_minus1);
}
7.2.2 Semantics
precision_bytes_minus1: Plus 1 specifies the precision of the scale components in bytes. Valid values are in
the range [0, 3].
scale.x, scale.y, and scale.z indicate the scaling extension in the Cartesian coordinates along the x, y, and
z axes, respectively, relative to the origin (0,0,0).
7.3 Extrinsic Camera Information
Extrinsic camera information is defined using the following data structure.
7.3.1 Syntax
class CameraExtrinsics(unsigned char abs_flag, unsigned char mode,
        unsigned char pos_bytes_minus1, unsigned char pos_unit,
        unsigned char quat_bytes_minus1, unsigned char quat_den_bits_minus1) {
    if(mode & 0x1) {
        signed int((pos_bytes_minus1+1)*8) pos_x;
    }
    if(mode & 0x2) {
        signed int((pos_bytes_minus1+1)*8) pos_y;
    }
    if(mode & 0x4) {
        signed int((pos_bytes_minus1+1)*8) pos_z;
    }
    if(mode & 0x8) {
        Vector3 quat(quat_bytes_minus1);
    }
}
7.3.2 Semantics
abs_flag: If 1, absolute position and orientation is specified. If 0, the specified values are added relative to
the previously coded position and orientation.
mode: Signalling mode. Valid values are:
[1, 7]: Only the position is signalled.
8: Only the orientation is signalled.
[9, 15]: Both orientation and position are signalled.
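The relationship between the mode value and the syntax elements that are present can be sketched in Python as follows. The bit assignments are taken from the syntax in 7.3.1. The function name decode_mode and the dictionary layout are hypothetical, and rejecting values outside [1, 15] is an assumption based on the list of valid values above.

```python
def decode_mode(mode: int) -> dict:
    """Map the mode value to the set of syntax elements that are present.

    Bit 0x1 -> pos_x, bit 0x2 -> pos_y, bit 0x4 -> pos_z, bit 0x8 -> quat.
    Assumption: values outside [1, 15] are treated as invalid.
    """
    if not 1 <= mode <= 15:
        raise ValueError("mode must be in the range [1, 15]")
    return {
        "pos_x": bool(mode & 0x1),
        "pos_y": bool(mode & 0x2),
        "pos_z": bool(mode & 0x4),
        "quat": bool(mode & 0x8),
    }
```

A mode of 5, for instance, signals pos_x and pos_z only, while a mode of 8 signals the orientation only.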
pos_bytes_minus1: Plus 1 indicates the number of bytes to be read for pos_x, pos_y and pos_z. Valid values
are in the range from [0, 3].
pos_unit: Unit of pos_x, pos_y and pos_z. Valid values are in the range from [0, 2], where
0: µm
1: mm
2: m
quat_bytes_minus1: Plus 1 indicates the number of bytes to be read for quat.x, quat.y, quat.z. Valid
values are in the range from [0, 1].
quat_den_bits_minus1: Specifies the denominator of quat.x, quat.y and quat.z. Valid values for
quat_den_bits_minus1 are in the range from [0, 13]. The denominator is computed as follows:
denominator = 2^{quat_den_bits_minus1 + 1}
pos_x: Specifies the x-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
pos_y: Specifies the y-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
pos_z: Specifies the z-coordinate of the location of the camera in units specified by pos_unit. When not
present, its value shall be inferred to be 0 if abs_flag is 1.
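A minimal Python sketch of how the position semantics might be applied is given below. It converts the coded integers to metres using pos_unit and applies the inference rule for absent components when abs_flag is 1. The function name position_in_metres and the use of None for an absent component are hypothetical. Treating an absent component as a zero offset when abs_flag is 0 is an assumption, since this document only specifies the inferred value for abs_flag equal to 1.

```python
# Divisor that converts a coded value to metres for each pos_unit value.
_UNITS_PER_METRE = {0: 1_000_000, 1: 1_000, 2: 1}  # 0: µm, 1: mm, 2: m


def position_in_metres(pos, pos_unit, abs_flag, previous=(0.0, 0.0, 0.0)):
    """Return (x, y, z) in metres.

    pos: (pos_x, pos_y, pos_z) as coded integers, or None where not present.
    previous: previously decoded position in metres, used when abs_flag is 0.
    Assumption: with abs_flag == 0, an absent component means "no change".
    """
    if pos_unit not in _UNITS_PER_METRE:
        raise ValueError("pos_unit must be in the range [0, 2]")
    divisor = _UNITS_PER_METRE[pos_unit]
    result = []
    for value, prev in zip(pos, previous):
        metres = 0.0 if value is None else value / divisor
        result.append(metres if abs_flag else prev + metres)
    return tuple(result)
```

For example, the coded values (1500, absent, -250) with pos_unit equal to 1 (mm) and abs_flag equal to 1 would correspond to (1.5, 0.0, -0.25) metres under these assumptions.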
quat.x: Specifies the x component, qX, for the rotation of the camera using the quaternion representation.
The value of quat.x shall be in the range of -2^{quat_den_bits_minus1+1} to 2^{quat_den_bits_minus1+1}, inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
quat.y: Specifies the y component, qY, for the rotation of the camera using the quaternion representation.
The value of quat.y shall be in the range of -2^{quat_den_bits_minus1+1} to 2^{quat_den_bits_minus1+1}, inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
quat.z: Specifies the z component, qZ, for the rotation of the camera using the quaternion representation.
The value of quat.z shall be in the range of -2^{quat_den_bits_minus1+1} to 2^{quat_den_bits_minus1+1}, inclusive.
When not present, its value shall be inferred to be 0 if abs_flag is set to 1.
The values of the quaternion representation are computed as follows:
qX = quat.x / denominator
qY = quat.y / denominator
qZ = quat.z / denominator
It is a requirement of bitstream conformance that:
qX^{2} + qY^{2} + qZ^{2} <= 1
The fourth component of the quaternion representation, qW, is computed as follows:
qW = Sqrt( 1 - ( qX^{2} + qY^{2} + qZ^{2} ) )
The point (w, x, y, z) represents a rotation around the axis directed by the vector (x, y, z) by an angle
2*cos^{-1}(w) = 2*sin^{-1}(sqrt(x^{2}+y^{2}+z^{2})).
NOTE As aligned with ISO/IEC FDIS 23090-5, qW is always positive. If a negative qW is desired, one can signal all three
syntax elements, cam_quat_x, cam_quat_y, and cam_quat_z, with an opposite sign, which is equivalent.
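The derivation of the full quaternion from the three coded components can be sketched in Python as follows. The formulas are those given above. The function name quaternion_from_coded and the returned tuple layout are hypothetical, and raising an exception on a conformance violation is only one possible way of handling it.

```python
import math


def quaternion_from_coded(quat_x: int, quat_y: int, quat_z: int, quat_den_bits_minus1: int):
    """Return ((qW, qX, qY, qZ), rotation_angle_in_radians)."""
    if not 0 <= quat_den_bits_minus1 <= 13:
        raise ValueError("quat_den_bits_minus1 must be in the range [0, 13]")
    denominator = 1 << (quat_den_bits_minus1 + 1)  # 2^(quat_den_bits_minus1 + 1)
    qx, qy, qz = (v / denominator for v in (quat_x, quat_y, quat_z))
    norm_sq = qx * qx + qy * qy + qz * qz
    if norm_sq > 1.0:
        # Bitstream conformance requires qX^2 + qY^2 + qZ^2 <= 1.
        raise ValueError("qX^2 + qY^2 + qZ^2 exceeds 1")
    qw = math.sqrt(1.0 - norm_sq)   # qW is never negative
    angle = 2.0 * math.acos(qw)     # rotation angle about the axis (qX, qY, qZ)
    return (qw, qx, qy, qz), angle
```

With quat_den_bits_minus1 equal to 13, the denominator is 16384. The coded triple (0, 0, 16384) then yields qZ = 1 and qW = 0, which is a rotation by π around the z axis, and the triple (0, 0, 0) yields the identity rotation.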
7.4 Intrinsic Camera Information
Intrinsic camera information is defined using the following data structure.
7.4.1 Syntax
aligned(8) class IntCameraInfo (unsigned char precision_bytes_minus1) {
    unsigned int(10) camera_id;
    bit(3) reserved = 0;
    unsigned int(3) camera_type;
    if (camera_type == 0) {
        signed int((precision_bytes_minus1+1)*8) erp_horizontal_fov;
        signed int((precision_bytes_minus1+1)*8) erp_vertical_fov;
    }
    if (camera_type == 1) {
        signed int((precision_bytes_minus1+1)*8) perspective_horizontal_fov;
        unsigned int(8)[4] perspective_aspect_ratio;
    }
    if (camera_type == 2) {
        unsigned int(8)[4] ortho_aspect_ratio;
        unsigned int(8)[4] ortho_horizontal_size;
    }
    unsigned int(8)[4] clipping_near_plane;
    unsigned int(8)[4] clipping_far_plane;
}
7.4.2 Semantics
camera_id is an identifier that is used to identify a given set of viewport camera parameters.
camera_type indicates the projection method of the viewport camera. The value 0 specifies ERP projection.
The value 1 specifies a perspective projection. The value 2 specifies an orthographic projection. Values
in the range 3 to 255 are reserved for future use by ISO/IEC.
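The first two bytes of IntCameraInfo pack camera_id, the reserved bits and camera_type into 16 bits. The following Python sketch unpacks them, assuming big-endian byte order with the most significant bit first, as is customary for ISOBMFF syntax. The function name parse_int_camera_header is hypothetical.

```python
def parse_int_camera_header(buf: bytes, offset: int = 0):
    """Return (camera_id, reserved, camera_type) from the first 16 bits.

    Layout (most significant bit first): 10 bits camera_id, 3 bits reserved,
    3 bits camera_type. Assumption: big-endian byte order.
    """
    if len(buf) < offset + 2:
        raise ValueError("buffer too short for the IntCameraInfo header")
    word = int.from_bytes(buf[offset:offset + 2], "big")
    camera_id = word >> 6           # upper 10 bits
    reserved = (word >> 3) & 0x7    # middle 3 bits, expected to be 0
    camera_type = word & 0x7        # lower 3 bits
    return camera_id, reserved, camera_type
```

For example, the two bytes 01 41 would decode under these assumptions to camera_id = 5, reserved = 0 and camera_type = 1 (perspective projection).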
precision_bytes_minus1: Plus 1 indicates the number of bytes to be read for erp_horizontal_fov,
erp_vertical_fov and perspective_horizontal_fov. Valid values are in the range from [0, 3].
erp_horizontal_fov specifies the longitude range for an ERP projection corresponding to the horizontal
size of the viewport region, in units of radians. The value shall be in the range 0 to 2π.
erp_vertical_fov specifies the latitude range for an ERP projection corresponding to the vertical size of the
viewport region, in units of radians. The value shall be in the range 0 to π.
perspective_horizontal_fov specifies the horizontal field of view for perspective projection in radians.
The value shall be in the range 0 to π.
perspective_aspect_ratio specifies the relative aspect ratio of viewport for perspective projection
(horizontal/vertical). The value shall be expressed in 32-bit binary floating-point format with the 4
bytes in big-endian order and with the parsing process as specified in IEEE 754.
ortho_aspect_ratio specifies the relative aspect ratio of viewport for orthographic projection
(horizontal/vertical). The value shall be expressed in 32-bit binary floating-point format with the 4
bytes in big-endian order and with the parsing process as specified in IEEE 754.
ortho_horizontal_size specifies the horizontal size of the orthographic projection in metres. The value shall be
expressed in 32-bit binary floating-point format with the 4 bytes in big-endian order and with the
parsing process as specified in IEEE 754.
clipping_near_plane and clipping_far_plane indicate the near and far depths (or distances) based on the
near and far clipping planes of the viewport in metres. The values shall be expressed in 32-bit binary
floating-point format with the 4 bytes in big-endian order and with the parsing process as specified in
IEEE 754.
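The four-byte fields above (perspective_aspect_ratio, ortho_aspect_ratio, ortho_horizontal_size, clipping_near_plane and clipping_far_plane) carry an IEEE 754 binary32 value in big-endian order. A minimal Python sketch of reading such a field with the standard struct module is shown below. The function name parse_float32_be is hypothetical.

```python
import struct


def parse_float32_be(field: bytes) -> float:
    """Interpret 4 bytes as an IEEE 754 binary32 value in big-endian order."""
    if len(field) != 4:
        raise ValueError("a binary32 field is exactly 4 bytes long")
    # ">f" selects big-endian byte order and the 4-byte IEEE 754 float format.
    return struct.unpack(">f", field)[0]
```

For example, the bytes 3F C0 00 00 decode to 1.5 and the bytes 42 C8 00 00 decode to 100.0. The inverse operation for writing a field is struct.pack(">f", value).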
7.5 Viewing Spaces
A cuboid viewi
