Information technology — Multimedia application format (MPEG-A) — Part 11: Stereoscopic video application format

ISO/IEC 23000-11:2009 specifies a file format, based on the ISO base media file format, for the storage, interchange, management, editing, and presentation of stereoscopic video contents. It specifies the following: the file type brand for stereoscopic video contents; four stereoscopic video composition types that are widely used and suitable for 3D mobile displays; the file structure, which stores stereoscopic contents in either a single video track or two video tracks; the stereoscopic video media information, which includes the visual type information signaling the stereoscopic composition type and the stereo_mono_change information identifying each fragment in stereo-monoscopic mixed contents; the stereoscopic camera and display information, which provides camera, display and visual safety information for stereoscopic contents; the track reference type, which indicates which track carries the primary and which the secondary view sequence when two video tracks are used; and the item location, which gives the location of stereoscopic fragments in stereoscopic video contents. ISO/IEC 23000-11:2009 provides the overall structure for storing both pure stereoscopic video contents and stereo-monoscopic mixed contents, together with the stereoscopic-related information, in mobile environments.

Technologies de l'information — Format pour application multimédia (MPEG-A) — Partie 11: Format pour application vidéo stéréoscopique

General Information

Status
Published
Publication Date
11-Nov-2009
Current Stage
9093 - International Standard confirmed
Completion Date
23-Jun-2021

Standard
ISO/IEC 23000-11:2009 - Information technology -- Multimedia application format (MPEG-A)
English language
23 pages

Standards Content (Sample)

INTERNATIONAL STANDARD
ISO/IEC 23000-11
First edition
2009-11-15


Information technology — Multimedia
application format (MPEG-A) —
Part 11:
Stereoscopic video application format
Technologies de l'information — Format pour application multimédia
(MPEG-A) —
Partie 11: Format pour application vidéo stéréoscopique




Reference number
ISO/IEC 23000-11:2009(E)
© ISO/IEC 2009



COPYRIGHT PROTECTED DOCUMENT


©  ISO/IEC 2009
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Overview
5.1 Overall procedure of stereoscopic contents
5.2 Acquisition of the stereoscopic contents
5.3 Stereoscopic contents composition type
5.3.1 Side-by-side type
5.3.2 Vertical line interleaved type
5.3.3 Frame sequential type
5.3.4 Left/Right view sequence type
6 Components of Stereoscopic Video AF
6.1 Supported components
6.1.1 ISO base media file format
6.1.2 LASeR
6.1.3 AMR
6.1.4 EVRC
7 File structures
7.1 Table for boxes
7.2 File structures of Stereoscopic Video AF
7.2.1 File structure for stereoscopic contents
7.2.2 File structure for stereo-monoscopic mixed contents
8 Syntax and Semantics of the Boxes
8.1 File Type Box
8.1.1 Definition
8.2 Track Reference Box
8.2.1 Definition
8.2.2 Syntax
8.2.3 Semantics
8.3 Sync Sample Box
8.3.1 Definition
8.4 Stereoscopic Video Media Information Box
8.4.1 Definition
8.4.2 Syntax
8.4.3 Semantics
8.5 Stereoscopic Camera and Display Information Box
8.5.1 Definition
8.5.2 Syntax
8.5.3 Semantics
8.6 Item Location Box
8.6.1 Definition
8.6.2 Semantics
8.7 Registration of voice codecs
8.7.1 AMRSampleEntry box
8.7.2 EVRCSampleEntry box
Annex A (informative) Use cases of the file structure of stereo-monoscopic mixed contents
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 23000-11 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 23000 consists of the following parts, under the general title Information technology — Multimedia
application format (MPEG-A):
⎯ Part 1: Purpose for multimedia application formats [Technical Report]
⎯ Part 2: MPEG music player application format
⎯ Part 3: MPEG photo player application format
⎯ Part 4: Musical slide show application format
⎯ Part 5: Media streaming application format
⎯ Part 6: Professional archival application format
⎯ Part 7: Open access application format
⎯ Part 8: Portable video application format
⎯ Part 9: Digital Multimedia Broadcasting application format
⎯ Part 10: Video surveillance application format
⎯ Part 11: Stereoscopic video application format
⎯ Part 12: Interactive music application format

Introduction
In today’s technological arena, there is an abundance of digital content for digital image machinery such as
laptops, cell-phones, digital cameras, and mobile devices. Stereoscopic video contents provide users with an
experience of natural three-dimensional scenes, which are displayed using acquisition and generation
techniques. The market for applying stereoscopic video contents on such devices is taking shape and
maturing. Stereoscopic laptops, mobile phones, digital TVs, and multimedia devices are already on the
market; however, what seems to be required for an immersive 3D market is a standard file format which is
capable of storage, interchange, management, editing, and presentation of stereoscopic video contents.
The Stereoscopic Video application format (AF) defines a file format for stereoscopic video services in mobile
environments. It specifies the core structures of the Stereoscopic Video AF, which are organized by combining
the information related to stereoscopic video applications.
Applicable areas of the Stereoscopic Video AF are quite broad, including the internet, telecommunications,
and storage devices. The user can download the Stereoscopic Video AF files from the internet or via the
telecommunication networks to his/her personal multimedia devices (e.g. Portable Multimedia Player or cell-
phone) for local playback.

INTERNATIONAL STANDARD ISO/IEC 23000-11:2009(E)

Information technology — Multimedia application format
(MPEG-A) —
Part 11:
Stereoscopic video application format
1 Scope
This part of ISO/IEC 23000 specifies a file format which is capable of storage, interchange, management,
editing, and presentation of stereoscopic video contents based on the ISO base media file format. The file
format provides the overall structure for storing stereoscopic video contents with the related stereoscopic
information in mobile environments.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO/IEC 10918-1:1994, Information technology — Digital compression and coding of continuous-tone still
images: Requirements and guidelines
ISO/IEC 14496-2, Information technology — Coding of audio-visual objects — Part 2: Visual
ISO/IEC 14496-3, Information technology — Coding of audio-visual objects — Part 3: Audio
ISO/IEC 14496-10, Information technology — Coding of audio-visual objects — Part 10: Advanced Video
Coding
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12: ISO base media file
format
ISO/IEC 14496-20, Information technology — Coding of audio-visual objects — Part 20: Lightweight
Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
ISO/IEC 15948:2004, Information technology — Computer graphics and image processing — Portable
Network Graphics (PNG): Functional specification
3GPP TS 26.071, Mandatory speech CODEC speech processing functions; AMR speech Codec; General
description
TIA/EIA/IS-127, Enhanced Variable Rate Codec (EVRC)

3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1
baseline
line between origins of the respective cameras
3.2
convergence distance
distance between a convergence point and a midpoint of baseline
3.3
convergence point
point at which two optical axes of left and right cameras intersect
3.4
disparity
horizontal difference between corresponding points in stereoscopic view
3.5
focal length
distance from a surface of a lens (optical center) or mirror to its focal point (image plane)
3.6
frame
one of the many still images which compose the complete moving picture
NOTE A frame contains an array of luma samples and two corresponding arrays of chroma samples. A frame
consists of two fields: a top field and a bottom field.
3.7
lenticular
array of magnifying lenses designed so that, when viewed from slightly different angles, different images are
magnified
NOTE A lenticular sheet is placed on a normal display panel to show two or more different views simply by changing
the angle of light direction. It can make left and right views display on left and right eyes, respectively, creating a sense of
depth.
3.8
max of disparity
maximum disparity value within a stereoscopic fragment
3.9
monoscopic fragment
set of successive samples which represents only monoscopic sequence
3.10
min of disparity
minimum disparity value within the stereoscopic fragment
3.11
parallax barrier
device to allow a liquid crystal display to show a three dimensional image without the need for the viewer to
wear glasses
NOTE Placed in front of the normal display panel, a parallax barrier consists of a layer of material with a series of
precision slits, allowing each eye to see a different set of pixels, so creating a sense of depth.
3.12
primary view sequence
sequence that has a priority of presentation between sequences of Left/Right view sequence type
3.13
rotation
relative angular variation from the primary-view camera to the secondary-view camera
3.14
secondary view sequence
sequence that has a lower priority of presentation than the primary view sequence between sequences of
Left/Right view sequence type
3.15
sequence
series of one or more frames
3.16
stereoscopic camera information
information for stereoscopic camera parameters such as baseline, focal_length, convergence_distance,
camera_arrangement, and rotation
3.17
stereoscopic display information
information for the stereoscopic display and visual safety, such as the display size and the viewing distance
3.18
stereoscopic fragment
set of successive samples which represents the stereoscopic sequence satisfying the stereoscopic
composition type specified in this part of ISO/IEC 23000
3.19
stereoscopic left fragment
set of successive samples which represents the left view of stereoscopic sequences satisfying the
stereoscopic composition type specified in this part of ISO/IEC 23000
3.20
stereoscopic left view sequence
left view sequence of the stereoscopic sequence
3.21
stereoscopic right fragment
set of successive samples which represents the right view of the stereoscopic sequences satisfying the
stereoscopic composition type specified in this part of ISO/IEC 23000
3.22
stereoscopic right view sequence
right view sequence of the stereoscopic sequence
4 Abbreviated terms
3D Three Dimensional
AAC Advanced Audio Coding
AF Application Format
AMR Adaptive Multirate
AVC Advanced Video Coding
CDMA Code Division Multiple Access
EVRC Enhanced Variable Rate Codec
GSM Global Systems for Mobile communications
HE-AAC High Efficiency AAC
JPEG Joint Photographic Experts Group
LASeR Lightweight Application Scene Representation
PNG Portable Network Graphics
PMP Portable Multimedia Player
UMTS Universal Mobile Telecommunications System
5 Overview
5.1 Overall procedure of stereoscopic contents
The overall procedure for stereoscopic contents is as follows. Left and right view sequences are acquired
from a stereoscopic camera and composited into one or two video sequences according to the composition
types specified in 5.3. The composited video sequence is then encoded and stored in an AF file.
A file generator for the Stereoscopic Video AF accepts stereoscopic contents with video, audio and
LASeR streams. A file conforming to the Stereoscopic Video AF is parsed, decoded and then rendered for a
stereoscopic display device.
5.2 Acquisition of the stereoscopic contents
Stereoscopic video sequences are acquired from two cameras, one for the left view and one for the right view.
As shown in Figure 1, camera parameters are needed to specify the spatial relationship between the two
cameras. With these camera parameters, the stereoscopic contents can be rendered more precisely on the
display device. The camera parameters shall be described in the ‘scdi’ box, which is specified in 8.5.

[Figure 1 (a): diagram of the X, Y and Z axes showing Cam1 and Cam2, the baseline, the translation t = (t_x, t_y, t_z), the rotation R with angle θ, the convergence distance and the convergence point]
(a) Camera coordinates


(b) Camera arrangements
Figure 1 — Example of camera coordinate and camera arrangement used in Stereoscopic Video AF
Figure 1 (a) shows one example of the camera coordinates used in the Stereoscopic Video AF. Cam1 and Cam2
indicate the right and left views, respectively. This AF simplifies the stereoscopic camera coordinates because
it considers only stereoscopic contents suitable for binocular display systems. To reduce the number of
camera parameters, the coordinates of Cam1 are assumed to be identical to the world coordinates, and Cam1
and Cam2 are assumed to share the X axis. Under these assumptions, the rotation information indicates the
relative angle (θ) from Cam1 to Cam2 about the Y axis. The baseline distance is the relative translation
between the origins of Cam1 and Cam2. In addition, the focal lengths of the two cameras are assumed to be
identical, because stereoscopic contents with different focal lengths can produce severe eye strain on a
binocular display. Figure 1 (b) shows the camera arrangements: parallel arrangement and cross arrangement.
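
The simplified geometry above can be illustrated with a short Python sketch. It is not taken from the standard: it assumes that Cam1 sits at the origin looking along the Z axis, that Cam2 sits at distance `baseline` along the X axis, and that θ is the angle by which Cam2 is turned inward about the Y axis (the sign convention is an assumption). The function names are hypothetical. Under those assumptions the two optical axes meet on the Z axis, and the convergence distance of 3.2 is measured from that point to the midpoint of the baseline.

```python
import math


def convergence_point_z(baseline, theta_rad):
    """Z coordinate where Cam2's optical axis crosses Cam1's optical axis.

    Assumes Cam1 at the origin looking along +Z and Cam2 at (baseline, 0, 0),
    turned inward by theta_rad about the Y axis (assumed sign convention).
    """
    if not 0.0 < theta_rad < math.pi / 2:
        # theta = 0 is the parallel arrangement: the axes never intersect.
        raise ValueError("theta must be in (0, pi/2) for a cross arrangement")
    return baseline / math.tan(theta_rad)


def convergence_distance(baseline, theta_rad):
    """Distance from the convergence point to the midpoint of the baseline."""
    z = convergence_point_z(baseline, theta_rad)
    return math.hypot(baseline / 2.0, z)
```

For a parallel arrangement (θ = 0) there is no convergence point, which is why the sketch rejects that input rather than returning a value.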
5.3 Stereoscopic contents composition type
In the current market, there are several stereoscopic composition types, such as ‘side-by-side type’, ‘top and
bottom type’, ‘pixel-by-pixel type’, ‘vertical line interleaved type’, ‘frame sequential type’ and ‘Left/Right
view sequence type’.
Taking into account wide usage and suitability for mobile displays, this specification covers the composition
types described in 5.3.1 to 5.3.4.
5.3.1 Side-by-side type
Side-by-side type is one of the most widely used stereoscopic composition types. The left view and right view
images are each reduced to half their horizontal resolution and placed together in one composition image, as
shown in Figure 2, which gives an example in which the left (right) view is located in the left (right) half of
the composition image. The composition image can be compressed at conventional bitrates, although there is a
quality loss due to the halved resolution. In addition, it can be rendered in a legacy player and implemented
without modification of the system.

(a) Side-by-side type stereoscopic sequence

(b) Side-by-side type contents for a real image
Figure 2 — Example of the side-by-side type
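
A minimal Python sketch of the side-by-side composition follows. It is an illustration only: images are modelled as lists of pixel rows, and the horizontal halving is done by dropping every second column, which is an assumption (this excerpt does not say how the resolution is halved). The function names are hypothetical.

```python
def halve_width(image):
    """Keep every second column (plain decimation, no filtering)."""
    return [row[::2] for row in image]


def compose_side_by_side(left, right):
    """Put the half-width left view in the left half and the right view in the right half."""
    same_shape = len(left) == len(right) and all(
        len(l_row) == len(r_row) for l_row, r_row in zip(left, right)
    )
    if not same_shape:
        raise ValueError("left and right views must have identical dimensions")
    return [l_row + r_row for l_row, r_row in zip(halve_width(left), halve_width(right))]


# Two 4x2 views, marked "L" and "R" so the layout is visible.
left = [["L"] * 4 for _ in range(2)]
right = [["R"] * 4 for _ in range(2)]
frame = compose_side_by_side(left, right)
```

The composed frame has the same width as each input view, which is why it can pass through a conventional encoder and a legacy player unchanged.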

5.3.2 Vertical line interleaved type
A composition image of this type is made of alternating vertical lines of the left view and right view images.
In Figure 3, a vertical line of the left view appears first, followed by a vertical line of the right view.
Because of the discontinuity between adjacent vertical lines, the compression efficiency is poorer than that of
the other types. This type is supported by the parallax barrier display, which is the type most used for
stereoscopic mobile displays. The contents can be displayed directly on a parallax barrier display without
conversion.

Figure 3 — Example of vertical line interleaved type
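
The column interleaving can be sketched in Python as follows. The sketch assumes that the composition image has the same width as each view and that column x is copied from column x of the left view when x is even and of the right view when x is odd. The left-first order follows Figure 3, but the choice of source columns is an assumption, and the function name is hypothetical.

```python
def compose_vertical_interleaved(left, right):
    """Alternate vertical lines: even columns from the left view, odd columns from the right."""
    same_shape = len(left) == len(right) and all(
        len(l_row) == len(r_row) for l_row, r_row in zip(left, right)
    )
    if not same_shape:
        raise ValueError("left and right views must have identical dimensions")
    return [
        [l_px if x % 2 == 0 else r_px for x, (l_px, r_px) in enumerate(zip(l_row, r_row))]
        for l_row, r_row in zip(left, right)
    ]


# One-row example: left pixels are 10..13, right pixels are 20..23.
interleaved = compose_vertical_interleaved([[10, 11, 12, 13]], [[20, 21, 22, 23]])
```

Neighbouring columns come from different views, which is the discontinuity that lowers compression efficiency.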

5.3.3 Frame sequential type
The frame sequential type is composed of alternating left view and right view images, as shown in
Figure 4. Some stereoscopic devices display the left and right images sequentially, while other devices, such
as parallax barrier displays, show the left and right images at the same time on the same screen. If the
contents have double the frame rate, this type provides full resolution at the normal frame rate. In the
example in Figure 4, a left view precedes a right view.

Figure 4 — Example of frame sequential type
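
A small Python sketch of the frame sequential ordering is given below. Frames are represented by arbitrary objects (strings here), the left-first order follows Figure 4, and the function name is hypothetical.

```python
def compose_frame_sequential(left_frames, right_frames):
    """Interleave full-resolution frames in time: L0, R0, L1, R1, ..."""
    if len(left_frames) != len(right_frames):
        raise ValueError("both views need the same number of frames")
    sequence = []
    for left_frame, right_frame in zip(left_frames, right_frames):
        sequence.append(left_frame)   # the left view comes first, as in Figure 4
        sequence.append(right_frame)
    return sequence


sequence = compose_frame_sequential(["L0", "L1"], ["R0", "R1"])
```

The output holds twice as many frames as either view, so playing it at double the frame rate shows each view at its normal rate and full resolution.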

5.3.4 Left/Right view sequence type
This type is composed of independent elementary streams. For example, one stream represents the left
view images and the other represents the right view images, as shown in Figure 5. In this type, the
corresponding left view and right view images shall be synchronized.

Figure 5 — Example of Left/Right view sequence type
6 Components of Stereoscopic Video AF
6.1 Supported components
Table 1 gives a brief summary of the supported components of the Stereoscopic Video AF, which consist of
ISO/IEC Standards and non-ISO/IEC Standards.
The Stereoscopic Video AF includes ISO/IEC 14496-2 Simple Profile at Level 3 and ISO/IEC 14496-10
Baseline Profile at Level 1.3 for visual, ISO/IEC 14496-3 AAC and HE-AAC Profile for audio, 3GPP TS 26.071
AMR and TIA/EIA/IS-127 EVRC for voice, ISO/IEC 14496-20 LASeR for scene description, and various kinds
of images such as ISO/IEC 10918-1 JPEG and ISO/IEC 15948 PNG. In this specification, the ISO/IEC 14496-12
ISO base media file format is used as the base file format structure.
Table 1 — Supported components of Stereoscopic Video AF
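Type | Component Name | Specification | Standard
File format | ISO base media file format | ISO/IEC 14496-12 | ISO/IEC Standards
Visual | MPEG-4 Video | ISO/IEC 14496-2 Simple Profile Level 3 | ISO/IEC Standards
Visual | MPEG-4 AVC | ISO/IEC 14496-10 Baseline Profile Level 1.3 | ISO/IEC Standards
Audio | MPEG-4 Audio AAC | ISO/IEC 14496-3 | ISO/IEC Standards
Audio | MPEG-4 Audio HE-AAC | ISO/IEC 14496-3 | ISO/IEC Standards
Data | MPEG-4 LASeR | ISO/IEC 14496-20 | ISO/IEC Standards
Data | JPEG Image | ISO/IEC 10918-1 | ISO/IEC Standards
Data | PNG Image | ISO/IEC 15948 | ISO/IEC Standards
Voice | AMR | 3GPP TS 26.071 | Non-ISO/IEC Standards
Voice | EVRC | TIA/EIA/IS-127 | Non-ISO/IEC Standards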

6.1.1 ISO base media file format
ISO/IEC 14496-12 ISO base media file format is a flexible, extensible format that contains timed media
information and facilitates interchange, management, editing, and presentation of the media, as shown in
Figure 6. The ISO base media file format is the base format for the Stereoscopic Video AF.

[Figure 6: an ISO file containing Movie data (moov), which holds trak (video), trak (audio) and other boxes, and Media data (mdat), which holds interleaved, time-ordered video and audio frames]

Figure 6 — Example of a simple ISO base media file format
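
The box layout in Figure 6 can be explored with a short Python sketch that walks the boxes of an ISO base media file. It relies on the general box header of ISO/IEC 14496-12: a 32-bit big-endian size, a four-character type, an optional 64-bit size when the 32-bit size is 1, and "to the end of the container" when it is 0. The helper names `iter_boxes` and `make_box` and the sample bytes are hypothetical and exist only for illustration.

```python
import struct


def iter_boxes(data, offset=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if offset + 16 > end:
                raise ValueError("truncated largesize at offset %d" % offset)
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box runs to the end of the enclosing container.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii", "replace"), offset + header, offset + size
        offset += size


def make_box(box_type, payload=b""):
    """Build a box with a 32-bit size field (test helper, not part of any API)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


# A toy file: ftyp, then moov containing one empty trak, then mdat.
sample = (
    make_box(b"ftyp", b"\x00" * 8)
    + make_box(b"moov", make_box(b"trak"))
    + make_box(b"mdat", b"\x01\x02")
)
top_level = [box_type for box_type, _, _ in iter_boxes(sample)]
```

Container boxes such as moov are read by calling `iter_boxes` again on the payload range that the first pass returned.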
6.1.2 LASeR
ISO/IEC 14496-20 LASeR is a scene description format that specifies various aspects of 2D scene
representation and updates of scenes as a part of rich media content. A scene description is composed of
graphics, animation, text, and spatial and temporal layout. LASeR is designed to be suitable for
lightweight embedded devices such as mobile phones and PMPs. LASeR is used to support rich
interactive services and stereoscopic contents services combined with 2D contents.
6.1.3 AMR
3GPP TS 26.071 AMR is an audio data compression scheme optimized for speech coding. AMR was
adopted as the standard speech codec by 3GPP in October 1998 and is now widely used in GSM and UMTS.
It uses link adaptation to select one of eight different bit rates based on link conditions.
6.1.4 EVRC
TIA/EIA/IS-127 EVRC is a speech codec used in CDMA networks. It was developed in 1995 to replace the
QCELP vocoder, which used more bandwidth on the carrier's network. The primary goal of EVRC was therefore
to offer mobile carriers more capacity on their networks without increasing the amount of bandwidth or
wireless spectrum needed.
7 File structures
7.1 Table for boxes
The Stereoscopic Video AF contains various boxes based on the ISO base media file format. It provides new
boxes such as the Stereoscopic Video Media Information (‘svmi’) and the Stereoscopic Camera and
Display Information (‘scdi’).
The normative file structure consists of ‘ftyp’, ‘moov’ and ‘mdat’ boxes. Mandatory boxes are marked
with an asterisk (*).
The ‘ftyp’ box indicates the type of the file format, which complies with the structure defined for the
Stereoscopic Video AF. Thus, an application should be able to play Stereoscopic Video AF files when it
supports the brands given in the ‘ftyp’ box. A detailed description of the brands of the Stereoscopic Video AF
is provided in 8.1.
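
A brand check of this kind can be sketched in Python as follows. The payload layout (major brand, minor version, then a list of four-character compatible brands) is the general ‘ftyp’ layout of ISO/IEC 14496-12. The brand values of the Stereoscopic Video AF are defined in 8.1 and are not reproduced in this excerpt, so `EXAMPLE_BRAND` below is a placeholder and not a real brand. The function names are hypothetical.

```python
import struct


def parse_ftyp_payload(payload):
    """Split an 'ftyp' payload into (major_brand, minor_version, compatible_brands)."""
    if len(payload) < 8 or len(payload) % 4 != 0:
        raise ValueError("ftyp payload must be at least 8 bytes and a multiple of 4")
    major_brand, minor_version = struct.unpack_from(">4sI", payload, 0)
    compatible = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    return major_brand, minor_version, compatible


def player_supports(payload, supported_brands):
    """True when the major brand or any compatible brand is one the player supports."""
    major_brand, _, compatible = parse_ftyp_payload(payload)
    return major_brand in supported_brands or any(b in supported_brands for b in compatible)


EXAMPLE_BRAND = b"xxxx"  # placeholder only; the real brands are specified in 8.1
payload = EXAMPLE_BRAND + struct.pack(">I", 0) + b"isom" + EXAMPLE_BRAND
```

A player would typically run this check before touching ‘moov’, so that unsupported files are rejected early.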
The ‘moov’ box contains one or more tracks for stereoscopic video sequences, a track for LASeR streams,
and also contains tracks for audio, images, text and metadata.
The ‘trak’ boxes contain temporal and spatial information of the media data (e.g. stereoscopic video
sequences, stereo-monoscopic mixed video sequences, LASeR streams, JPEG images). For the Stereoscopic
Video AF, each track contains its associated ‘mdia’ box, a ‘tref’ box and a track level ‘meta’ box.
The ‘mdia’ box contains a ‘svmi’ box for the stereoscopic visual type and fragment information of the
stereoscopic contents in the track.
The ‘tr
...
