Information technology — Coding of audio-visual objects — Part 11: Scene description and application engine

ISO/IEC 14496-11:2015 specifies: 1. the coded representation of the spatio-temporal positioning of audio-visual objects as well as their behavior in response to interaction (scene description); 2. the Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML); and 3. a system level description of an application engine (format, delivery, lifecycle, and behavior of downloadable Java byte code applications).

Technologies de l'information — Codage des objets audiovisuels — Partie 11: Description de scène et moteur d'application

General Information

Status: Published
Publication Date: 01-Nov-2015
Current Stage: 9093 - International Standard confirmed
Start Date: 23-Jun-2021
Completion Date: 23-Jun-2021

Standard
ISO/IEC 14496-11:2015 - Information technology -- Coding of audio-visual objects
English language
547 pages

Standards Content (sample)

INTERNATIONAL STANDARD
ISO/IEC 14496-11
Second Edition
2015-11-01
Information technology — Coding of audio-visual objects —
Part 11:
Scene description and application engine
Technologies de l'information — Codage des objets audiovisuels —
Partie 11: Description de scène et moteur d'application
Reference number
ISO/IEC 14496-11:2015(E)
© ISO/IEC 2015

All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester.
ISO copyright office
Case postale 56  CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents

Foreword
0 Introduction
0.1 Scene Description
0.2 Extensible MPEG-4 Textual Format
0.3 MPEG-J
1 Scope
2 Normative references
3 Additional reference
4 Terms and definitions
5 Abbreviations and Symbols
6 Conventions
7 MPEG-4 Systems Node Semantics
7.1 Scene Description
7.2 Node Semantics
7.3 Informative: Differences Between MPEG-4 Scripts and ECMA Scripts
7.4 Informative: FlexTime behavior
7.5 Informative: Implementation of MaterialKey node
7.6 Informative: Example implementation of spatial audio processing (perceptual approach)
7.7 Informative: MPEG-4 Audio TTS application with Facial Animation
7.8 Informative: 3D Mesh Coding in BIFS scenes
7.9 Profiles
7.10 Metric information for resident fonts
7.11 Font metrics for SANS SERIF font (Albany)
7.12 Font metrics for SERIF font (Thorndale)
7.13 Font metrics for TYPEWRITER font (Cumberland)
8 BIFS
8.1 Introduction
8.2 Decoding tables, data structures and associated functions
8.3 Quantization
8.4 Compensation process
8.5 BIFS Configuration
8.6 BIFS Command Syntax
8.7 BIFS Scene
8.8 BIFS-Anim
8.9 Interpolator compression
8.10 Definition of bodySceneGraph nodes
8.11 Adaptive Arithmetic Decoder for BIFS-Anim
8.12 Informative: Adaptive Arithmetic Encoder for BIFS-Anim
8.13 View Dependent Object Scalability
9 The Extensible MPEG-4 Textual Format
9.1 Introduction
9.2 XMT-A Format
9.3 XMT-Ω Format
9.4 XMT-C Modules
9.5 XMT Schemas
9.6 Informative: XMT/X3D Compatibility
9.7 Informative: The usage of XMT-A BitWrapper element in authoring side
10 MPEG-J
10.1 Architecture
10.2 MPEG-J Session
10.3 Delivery of MPEG-J Data
10.4 MPEG-J API List
10.5 Informative: Starting the Java Virtual Machine
10.6 Informative: Examples of MPEG-J API usage
Annex A (normative) Curve-based animators
Annex B (normative) Procedural textures algorithms
Annex C (informative) Text Processing in BIFS
Annex D (informative) Patent statements
Bibliography

Foreword

ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission) form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC participate in the development of International Standards through technical committees established by the respective organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of the joint technical committee is to prepare International Standards. Draft International Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International Standard requires approval by at least 75 % of the national bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

ISO/IEC 14496-11 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information Technology, Subcommittee SC 29, Coding of Audio, Picture, Multimedia and Hypermedia Information.

This second edition cancels and replaces the first edition, which has been technically revised.

ISO/IEC 14496 consists of the following parts, under the general title Information technology — Coding of audio-visual objects:
— Part 1: Systems
— Part 2: Visual
— Part 3: Audio
— Part 4: Conformance testing
— Part 5: Reference software
— Part 6: Delivery Multimedia Integration Framework (DMIF)
— Part 7: Optimized reference software for coding of audio-visual objects [Technical Report]
— Part 8: Carriage of ISO/IEC 14496 contents over IP networks
— Part 9: Reference hardware description [Technical Report]
— Part 10: Advanced Video Coding
— Part 11: Scene description and application engine
— Part 12: ISO base media file format
— Part 13: Intellectual Property Management and Protection (IPMP) extensions
— Part 14: MP4 file format
— Part 15: Advanced Video Coding (AVC) file format
— Part 16: Animation Framework eXtension (AFX)
— Part 17: Streaming text format
— Part 18: Font compression and streaming
— Part 19: Synthesized texture stream
— Part 20: Lightweight Application Scene Representation (LASeR) and Simple Aggregation Format (SAF)
— Part 21: MPEG-J GFX
Introduction
1.1 Scene Description
1.1.1 Overview

ISO/IEC 14496 addresses the coding of audio-visual objects of various types: natural video and audio objects as well as textures, text, 2- and 3-dimensional graphics, and also synthetic music and sound effects. To reconstruct a multimedia scene at the terminal, it is hence not sufficient to transmit the raw audio-visual data to a receiving terminal. Additional information is needed in order to combine this audio-visual data at the terminal and construct and present to the end-user a meaningful multimedia scene. This information, called scene description, determines the placement of audio-visual objects in space and time and is transmitted together with the coded objects as illustrated in Figure 1. Note that the scene description only describes the structure of the scene. The action of assembling these objects in the same representation space is called composition. The action of transforming these audio-visual objects from a common representation space to a specific presentation device (i.e. speakers and a viewing window) is called rendering.

[Figure 1 not reproduced. Labels: audiovisual presentation; voice; sprite; 2D background; 3D objects; scene coordinate system; multiplexed downstream control / data; multiplexed upstream control / data; user events; video compositor; audio compositor; projection plane; hypothetical viewer; display; speaker; user input]
Figure 1 — An example of an object-based multimedia scene

Independent coding of different objects may achieve higher compression, and also brings the ability to manipulate content at the terminal. The behaviors of objects and their response to user inputs can thus also be represented in the scene description.

The scene description framework used in this part of ISO/IEC 14496 is based largely on ISO/IEC 14772-1:1997 (Virtual Reality Modeling Language – VRML).
1.1.2 Composition and Rendering

ISO/IEC 14496-11 defines the syntax and semantics of bitstreams that describe the spatio-temporal relationships of audio-visual objects. For visual data, particular composition algorithms are not mandated since they are implementation-dependent; for audio data, subclause 7.1.1.2.13 and the semantics of the AudioBIFS nodes normatively define the composition process. The manner in which the composed scene is presented to the user is not specified for audio or visual data. The scene description representation is termed “BInary Format for Scenes” (BIFS).

1.1.3 Scene Description

In order to facilitate the development of authoring, editing and interaction tools, scene descriptions are coded independently from the audio-visual media that form part of the scene. This permits modification of the scene without having to decode or process in any way the audio-visual media. The following clauses detail the scene description capabilities that are provided by ISO/IEC 14496-11.
1.1.3.1 Grouping of audio-visual objects

A scene description follows a hierarchical structure that can be represented as a graph. Nodes of the graph form audio-visual objects, as illustrated in Figure 2. The structure is not necessarily static; nodes may be added, deleted or modified.
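[Figure 2 not reproduced. Tree with root "scene" and child nodes "person" (with children "voice" and "sprite"), "2D background", "furniture" (with children "globe" and "desk") and "audiovisual presentation"]
Figure 2 — Logical structure of example scene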
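
As an informal illustration of the hierarchy in Figure 2, the following Python sketch builds the example tree and then modifies, deletes and adds nodes. The `Node` class, its methods, the `volume` field and the `lamp` node are hypothetical names chosen for this sketch; they are not BIFS node types or part of ISO/IEC 14496-11.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical scene-graph node (illustration only, not a BIFS node)."""
    name: str
    fields: Dict[str, object] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Append a child node and return it for convenient chaining."""
        self.children.append(child)
        return child

    def remove(self, name: str) -> bool:
        """Delete the first descendant called `name`; True if one was found."""
        for i, child in enumerate(self.children):
            if child.name == name:
                del self.children[i]
                return True
            if child.remove(name):
                return True
        return False

    def find(self, name: str) -> Optional["Node"]:
        """Depth-first search for a node by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def outline(self, depth: int = 0) -> List[str]:
        """Return the subtree as indented text lines."""
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


# Build the example hierarchy of Figure 2.
scene = Node("scene")
person = scene.add(Node("person"))
person.add(Node("voice"))
person.add(Node("sprite"))
scene.add(Node("2D background"))
furniture = scene.add(Node("furniture"))
furniture.add(Node("globe"))
furniture.add(Node("desk"))
scene.add(Node("audiovisual presentation"))

# The structure is not static: modify a node, delete one, add another.
scene.find("voice").fields["volume"] = 0.5   # hypothetical field name
scene.remove("globe")
furniture.add(Node("lamp"))                  # hypothetical new node

print("\n".join(scene.outline()))
```

The sketch keeps media data out of the tree entirely, which mirrors the point made above that the scene description can be changed without decoding the audio-visual media.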
1.1.3.2 Spatio-Temporal positioning of objects

Audio-visual objects have both a spatial and a temporal extent. Complex audio-visual objects are constructed by combining appropriate scene description nodes to build up the scene graph. Audio-visual objects may be located in 2D or 3D space. Each audio-visual object has a local co-ordinate system. A local co-ordinate system is one in which the audio-visual object has a pre-defined (but possibly varying) spatio-temporal location and scale (size and orientation). Audio-visual objects are positioned in a scene by specifying a co-ordinate transformation from the object’s local co-ordinate system into another co-ordinate system defined by a parent node in the scene graph.
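
The chaining of local-to-parent transformations can be sketched with 2D homogeneous matrices. This is a minimal illustration under assumptions: the helper functions, the sprite/person/scene chain and all numeric values are hypothetical, and the standard does not prescribe this representation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # 3x3 homogeneous matrix for 2D points


def translate(tx: float, ty: float) -> Matrix:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scale(sx: float, sy: float) -> Matrix:
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def rotate(angle: float) -> Matrix:
    """Counter-clockwise rotation by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a * b; b is applied to a point first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Map a 2D point through the homogeneous matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical chain: sprite (local) -> person (parent) -> scene (root).
sprite_to_person = matmul(translate(2.0, 1.0), scale(0.5, 0.5))
person_to_scene = translate(10.0, 0.0)
sprite_to_scene = matmul(person_to_scene, sprite_to_person)

# A point given in the sprite's local co-ordinates, expressed in the scene.
corner_in_scene = apply(sprite_to_scene, (4.0, 2.0))
print(corner_in_scene)
```

Because each node only stores the transformation into its parent's system, moving the hypothetical `person` node moves everything below it without touching the child transformations.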

1.1.3.3 Attributes of audio-visual objects

Scene description nodes expose a set of parameters through which aspects of their appearance and behavior can be controlled.

EXAMPLE  the volume of a sound; the color of a synthetic visual object; the source of a streaming video.

1.1.3.4 Behavior of audio-visual objects

ISO/IEC 14496-11 provides tools for enabling dynamic scene behavior and user interaction with the presented content. User interaction can be separated into two major categories: client-side and server-side. Client-side interaction is an integral part of the scene description described herein. Server-side interaction is not dealt with.

Client-side interaction involves content manipulation that is handled locally at the end-user’s terminal. It consists of the modification of attributes of scene objects according to specified user actions.

EXAMPLE  A user can click on a scene to start an animation or video sequence. The facilities for describing such interactive behavior are part of the scene description, thus ensuring the same behavior in all terminals conforming to ISO/IEC 14496-11.
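
The idea that a user action modifies an attribute of a scene object locally at the terminal can be sketched as follows. The `Terminal` and `SceneObject` classes, the event name `click:scene` and the `playing` attribute are hypothetical names for this sketch only; the actual interaction mechanisms are those defined by the node semantics of ISO/IEC 14496-11.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SceneObject:
    """Hypothetical scene object exposing controllable attributes."""
    name: str
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Terminal:
    """Hypothetical terminal that handles interaction rules locally."""
    handlers: Dict[str, List[Callable[[], None]]] = field(default_factory=dict)

    def on(self, event: str, action: Callable[[], None]) -> None:
        """Register an action to run when `event` occurs."""
        self.handlers.setdefault(event, []).append(action)

    def dispatch(self, event: str) -> int:
        """Run every action registered for `event`; return how many ran."""
        actions = self.handlers.get(event, [])
        for action in actions:
            action()
        return len(actions)


video = SceneObject("video", {"playing": False})
terminal = Terminal()


def start_video() -> None:
    # The user action changes an attribute of a scene object.
    video.attributes["playing"] = True


# The rule is part of the (hypothetical) scene, so no server is involved.
terminal.on("click:scene", start_video)
handled = terminal.dispatch("click:scene")
print(handled, video.attributes["playing"])
```

Since the rule travels with the scene rather than with the player, any terminal that interprets it the same way produces the same behavior.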
1.2 Extensible MPEG-4 Textual Format
1.2.1 Overview

The Extensible MPEG-4 Textual format (XMT) is a framework (illustrated in Figure 3) for representing MPEG-4 scene description using a textual syntax. XMT allows content authors to exchange their content with other authors, tools or service providers, and facilitates interoperability with both Extensible 3D (X3D), developed by the Web3D Consortium, and the Synchronized Multimedia Integration Language (SMIL) from the W3C.
[Figure 3 not reproduced. Labels: SMIL Player; Parse; SMIL; Compile; SVG; VRML Browser; XMT; MPEG-7; MPEG-4 Representation (e.g. mp4); MPEG-4 Player; X3D]
Figure 3 — Overview of the XMT Framework
1.2.2 Interoperability of XMT

The XMT format can be interchanged among SMIL players, VRML players, and MPEG-4 players. The format can be parsed and played directly by a W3C SMIL player, preprocessed to Web3D X3D and played back by a VRML player, or compiled to an MPEG-4 representation such as MP4, which can then be played by an MPEG-4 player. Figure 3 gives a graphical description of the interoperability of XMT.
1.2.3 Two-tier Architecture: XMT-A and XMT-Ω Formats

The XMT framework consists of two levels of textual syntax and semantics: the XMT-A format and the XMT-Ω format, abbreviated here as A and Ω, respectively, where this causes no confusion.

XMT-A is an XML-based version of MPEG-4 content that contains a subset of X3D. XMT-A also contains an MPEG-4 extension to X3D to represent MPEG-4 specific features. XMT-A provides a straightforward, one-to-one mapping between the textual and binary formats.

XMT-Ω is a high-level abstraction of MPEG-4 features based on W3C SMIL. Because there is no deterministic mapping between the two, XMT provides a default mapping from Ω to A, and it also provides content authors with an escape mechanism from Ω to A.

In addition, an XMT-C (Common) section contains the definition of elements and attributes that may be used within either XMT-A or XMT-Ω.
1.3 MPEG-J
1.3.1 Overview

MPEG-J is a flexible programmatic control system that represents an audio-visual session in a manner that allows the session to adapt to the operating characteristics when presented at the terminal. Two important characteristics are supported: first, the capability to allow graceful degradation under limited or time varying resources, and second, the ability to respond to user interaction and provide enhanced multimedia functionality.
More specifically, Clause 10 normatively defines:
The format and delivery of Java byte code by specifying the MPEG-J stream format and the delivery mechanism of such a stream (Java byte code and associated data);
The MPEG-J Session and the MPEG-J application lifecycle; and
The interactions and behavior of byte code through the specification of Java APIs.

1.3.2 Organization of the MPEG-J specification

10.1 gives an overall architecture of the MPEG-J system. MPEG-J Session start-up is walked through in 10.2. The delivery of MPEG-J data to the terminal is specified in 10.3. 10.4 specifies the different categories of APIs that a program in the form of Java bytecode would use. 10.5 is an informative annex on starting the Java Virtual Machine. The electronic annex attached to this document lists the normative MPEG-J APIs in HTML format. 10.6 illustrates the usage of MPEG-J APIs through a few examples.
----------------------

The International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC) draw attention to the fact that it is claimed that compliance with this document may involve the use of patents.

The ISO and IEC take no position concerning the evidence, validity and scope of these patent rights.

The holders of these patent rights have assured the ISO and IEC that they are willing to negotiate licences under reasonable and non-discriminatory terms and conditions with applicants throughout the world. In this respect, the statements of the holders of these patent rights are registered with the ISO and IEC. Information may be obtained from the companies listed in Annex D.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights other than those identified in Annex D. ISO and IEC shall not be held responsible for identifying any or all such patent rights.

INTERNATIONAL STANDARD ISO/IEC 14496-11:2015(E)
Information technology — Coding of audio-visual objects —
Part 11:
Scene description and application engine
1. Scope
This part of ISO/IEC 14496 specifies:

1. the coded representation of the spatio-temporal positioning of audio-visual objects as well as their behavior in response to interaction (scene description);
2. the Extensible MPEG-4 Textual (XMT) format, a textual representation of the multimedia content described in ISO/IEC 14496 using the Extensible Markup Language (XML); and
3. a system level description of an application engine (format, delivery, lifecycle, and behavior of downloadable Java byte code applications).
2. Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 639-2:1998, Codes for the representation of names of languages — Part 2: Alpha-3 code
ISO 3166-1:1997, Codes for the representation of names of countries and their subdivisions — Part 1: Country codes
ISO 9613-1:1993, Acoustics — Attenuation of sound during propagation outdoors — Part 1: Calculation of the absorption of sound by the atmosphere
ISO/IEC 11172-2:1993, Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 2: Video
ISO/IEC 11172-3:1993, Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 3: Audio
ISO/IEC 13818-3:1998, Information technology — Generic coding of moving pictures and associated audio information — Part 3: Audio
ISO/IEC 13818-7:2004, Information technology — Generic coding of moving pictures and associated audio information — Part 7: Advanced Audio Coding (AAC)
ISO/IEC 14496-2:2004, Information technology — Coding of audio-visual objects — Part 2: Visual
ISO/IEC 14772-1:1997, Information technology — Computer graphics and image processing — The Virtual Reality Modeling Language — Part 1: Functional specification and UTF-8 encoding
ISO/IEC 14772-1:1997/Amd.1:2003, Information technology — Computer graphics and image processing — The Virtual Reality Modeling Language — Part 1: Functional specification and UTF-8 encoding — Amendment 1: Enhanced interoperability
ISO/IEC 16262:2002, Information technology — ECMAScript language specification
ISO/IEC 13818-2:2000, Information technology — Generic coding of moving pictures and associated audio information — Part 2: Video
ISO/IEC 10918-1:1994, Information technology — Digital compression and coding of continuous-tone still images: Requirements and guidelines
IEEE Std 754-1985, Standard for Binary Floating-Point Arithmetic
Addison-Wesley: September 1996, The Java Language Specification, by James Gosling, Bill Joy and Guy Steele, ISBN 0-201-63451-1
Addison-Wesley: September 1996, The Java Virtual Machine Specification, by T. Lindholm and F. Yellin, ISBN 0-201-63452-X
Addison-Wesley: July 1998, The Java Class Libraries, Second Edition, Volume 1, by Patrick Chan, Rosanna Lee and Douglas Kramer, ISBN 0-201-31002-3
Addison-Wesley: July 1998, The Java Class Libraries, Second Edition, Volume 2, by Patrick Chan and Rosanna Lee, ISBN 0-201-31003-1
Addison-Wesley: May 1996, The Java Application Programming Interface, Volume 1: Core Packages, by J. Gosling, F. Yellin and the Java Team, ISBN 0-201-63453-8
DAVIC 1.4.1 specification Part 9: Information Representation
ANSI/SMPTE 291M-1996, Television — Ancillary Data Packet and Space Formatting
SMPTE 315M-1999, Television — Camera Positioning Information Conveyed by Ancillary Data Packets

3. Additional reference

ISO/IEC 13522-6:1998, Information technology — Coding of multimedia and hypermedia information — Part 6: Support for enhanced interactive applications. This reference contains the full normative references to Java APIs and the Java Virtual Machine as described in the normative references above.
4. Terms and definitions
For the purposes of this document, the following terms and definitions apply.
4.1
Access Unit (AU)
individually accessible portion of data within an elementary stream

NOTE An access unit is the smallest data entity to which timing information can be attributed.

4.2
Alpha map
representation of the transparency parameters associated with a texture map
4.3
audio-visual object
representation of a natural or synthetic object that has an audio and/or visual manifestation

NOTE The representation corresponds to a node or a group of nodes in the BIFS scene description. Each audio-visual object is associated with zero or more elementary streams using one or more object descriptors.

4.4
audio-visual scene (AV scene)
set of audio-visual objects together with scene description information that defines their spatial and temporal attributes including behaviors resulting from object and user interactions
4.5
Binary Format for Scenes (BIFS)
coded representation of a parametric scene description format
4.6
buffer model
model that defines how a terminal complying with ISO/IEC 14496 manages the buffer resources that are needed to decode a presentation
4.7
byte aligned
position in a coded bit stream with a distance of a multiple of 8 bits from the first bit in the stream
4.8
clock reference
special time stamp that conveys a reading of a time base
4.9
composition
process of applying scene description information in order to identify the spatio-temporal attributes and hierarchies of audio-visual objects
4.10
Composition Memory (CM)
random access memory that contains composition units
...
