ISO/IEC 15938-3:2002
Information technology — Multimedia content description interface — Part 3: Visual
The structure of this document is as follows. Clauses 2-4 specify the terms, abbreviations, symbols and conventions used throughout the document. Clauses 5-11 contain definitions of the description tools standardized by 15938-3 grouped by the visual features they are associated with, starting with basic structures and containers in Clause 5, through color, texture, shape, motion, localization in Clause 10. Clause 11 contains the remaining, unclassified items. Each description tool is described by the following subclauses: • Syntax: Normative DDL specification of the Ds or DSs. • Binary Syntax: Normative binary representation of the Ds or DSs. • Semantic: Normative definition of the semantics of all the components of the corresponding D or DS.
Technologies de l'information — Interface de description du contenu multimédia — Partie 3: Visuel
General Information
- Status: Published
- Publication Date: 19-Jun-2002
- Current Stage: 9060 - Close of review
- Completion Date: 04-Mar-2029
Overview
ISO/IEC 15938-3:2002 is Part 3 of the ISO/IEC 15938 series (commonly known as MPEG-7) and defines the visual component of the Multimedia Content Description Interface. It standardizes a set of descriptors, description schemes (DS) and datatypes for describing visual features of multimedia content (still images, video and 3D models). Each tool in the document is specified by a DDL (Description Definition Language) syntax, a normative binary syntax, and precise semantic definitions to ensure interoperable metadata exchange.
Key topics
- Visual description tools organized by feature groups: basic structures and containers, color, texture, shape, motion, and localization.
- Descriptor structure: every descriptor/DS includes normative Syntax, Binary Syntax, and Semantic subclauses.
- Basic primitives: grid layout, temporal interpolation, spatial coordinates, multiple-view and time-series containers.
- Color tools: color space, quantization, dominant/scalable color, color layout, color structure, GoF/GoP color descriptors.
- Texture tools: homogeneous texture, texture browsing, edge histogram descriptors.
- Shape tools: region shape, contour shape, 3D shape descriptors.
- Motion tools: camera motion, motion trajectory, parametric motion, motion activity.
- Localization tools: region locator and spatio-temporal locator for precise content positioning.
- Annexes: normative basis functions for face recognition and binary representation of media-time tools; informative patent statements.
Applications
ISO/IEC 15938-3 provides standardized metadata building blocks for:
- Content-based image and video retrieval (visual search and indexing).
- Multimedia asset management and cataloging for archives and broadcasters.
- Automated analysis: feature extraction for indexing, summarization, and scene understanding.
- Interoperable exchange of visual metadata between tools, players, and delivery systems.
- Research and development in computer vision, multimedia retrieval and metadata tooling.
Who benefits: metadata engineers, multimedia software developers, video search platform designers, broadcasters, digital-asset managers, and standards implementers who need a formal, interoperable way to describe visual content.
Related standards
- ISO/IEC 15938 (MPEG-7) family:
- Part 1: Systems
- Part 2: Description Definition Language (DDL)
- Part 4: Audio
- Part 5: Multimedia description schemes
- Part 6–8: Reference software, conformance testing, extraction/use guidelines
Keywords: ISO/IEC 15938-3, MPEG-7 Visual, multimedia metadata, visual descriptors, DDL, binary syntax, color descriptors, texture descriptors, motion descriptors, localization.
Frequently Asked Questions
ISO/IEC 15938-3:2002 is a standard published by the International Organization for Standardization (ISO). Its full title is "Information technology — Multimedia content description interface — Part 3: Visual". This standard covers: The structure of this document is as follows. Clauses 2-4 specify the terms, abbreviations, symbols and conventions used throughout the document. Clauses 5-11 contain definitions of the description tools standardized by 15938-3 grouped by the visual features they are associated with, starting with basic structures and containers in Clause 5, through color, texture, shape, motion, localization in Clause 10. Clause 11 contains the remaining, unclassified items. Each description tool is described by the following subclauses: • Syntax: Normative DDL specification of the Ds or DSs. • Binary Syntax: Normative binary representation of the Ds or DSs. • Semantic: Normative definition of the semantics of all the components of the corresponding D or DS.
ISO/IEC 15938-3:2002 is classified under the following ICS (International Classification for Standards) categories: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information. The ICS classification helps identify the subject area and facilitates finding related standards.
ISO/IEC 15938-3:2002 has the following relationships with other standards: it is amended by ISO/IEC 15938-3:2002/Amd 1:2004, ISO/IEC 15938-3:2002/Amd 2:2006, ISO/IEC 15938-3:2002/Amd 3:2009 and ISO/IEC 15938-3:2002/Amd 4:2010. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
INTERNATIONAL STANDARD ISO/IEC 15938-3
First edition 2002-05-15
Information technology — Multimedia content description interface — Part 3: Visual
Technologies de l'information — Interface de description du contenu multimédia — Partie 3: Visuel
© ISO/IEC 2002
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or
mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the
country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.ch
Web www.iso.ch
Printed in Switzerland
Contents
Foreword
Introduction
1 Scope
1.1 Organization of the document
1.2 Overview of Visual Description Tools
2 Terms and Definitions
2.1 Default reference axis
2.2 DCT coefficients
2.3 Data element
3 Abbreviations and Symbols
3.1 General
3.2 Abbreviations
3.3 Arithmetic operators
3.4 Logical operators
3.5 Relational operators
3.6 Bitwise operators
3.7 Conditional operator
3.8 Assignment
3.9 Mnemonics
3.10 Constants
3.11 Functions
4 Conventions
4.1 Method of describing the DDL representation syntax
4.2 Method of describing the binary representation syntax
4.3 Method of describing the descriptor semantics
5 Basic structures
5.1 Introduction
5.2 Grid layout
5.3 Time series
5.4 Multiple view
5.5 Spatial 2D coordinates
5.6 Temporal interpolation
6 Color
6.1 Introduction
6.2 Color space
6.3 Color quantization
6.4 Dominant color
6.5 Scalable color
6.6 Color layout
6.7 Color structure
6.8 GoF/GoP Color
7 Texture
7.1 Introduction
7.2 Homogeneous texture
7.3 Texture browsing
7.4 Edge histogram
8 Shape
8.1 Introduction
8.2 Region shape
8.3 Contour shape
8.4 Shape 3D
9 Motion
9.1 Introduction
9.2 Camera motion
9.3 Motion trajectory
9.4 Parametric motion
9.5 Motion activity
10 Localization
10.1 Introduction
10.2 Region locator
10.3 Spatio-temporal locator
11 Others
11.1 Introduction
11.2 Face recognition
Annex A (normative) Basis functions for FaceRecognition
A.1 Basis matrix
A.2 Mean face
Annex B (normative) Binary representation of media time tools
B.1 Introduction
B.2 Binary representation syntax
B.3 Descriptor components semantics
Annex C (informative) Patent statements
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical Commission)
form the specialized system for worldwide standardization. National bodies that are members of ISO or IEC
participate in the development of International Standards through technical committees established by the respective
organization to deal with particular fields of technical activity. ISO and IEC technical committees collaborate in fields
of mutual interest. Other international organizations, governmental and non-governmental, in liaison with ISO and
IEC, also take part in the work. In the field of information technology, ISO and IEC have established a joint technical
committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
The main task of the joint technical committee is to prepare International Standards. Draft International Standards
adopted by the joint technical committee are circulated to national bodies for voting. Publication as an International
Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this part of ISO/IEC 15938 may be the subject of
patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
ISO/IEC 15938-3 was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
ISO/IEC 15938 consists of the following parts, under the general title Information technology — Multimedia content
description interface:
— Part 1: Systems
— Part 2: Description definition language
— Part 3: Visual
— Part 4: Audio
— Part 5: Multimedia description schemes
— Part 6: Reference software
— Part 7: Conformance testing
— Part 8: Extraction and use of MPEG-7 descriptions
Annexes A and B form a normative part of this part of ISO/IEC 15938. Annex C is for information only.
Introduction
This standard, also known as "Multimedia Content Description Interface," provides a standardized set of technologies for
describing multimedia content. The standard addresses a broad spectrum of multimedia applications and requirements by
providing a metadata system for describing the features of multimedia content.
The following are specified in this standard:
• Description Schemes (DS) describe entities or relationships pertaining to multimedia content. Description Schemes specify
the structure and semantics of their components, which may be Description Schemes, Descriptors, or datatypes.
• Descriptors (D) describe features, attributes, or groups of attributes of multimedia content.
• Datatypes are the basic reusable datatypes employed by Description Schemes and Descriptors.
• Description Definition Language (DDL) defines Description Schemes, Descriptors, and Datatypes by specifying their
syntax, and allows their extension.
• Systems tools support delivery of descriptions, multiplexing of descriptions with multimedia content, synchronization, file
format, and so forth.
This standard is subdivided into eight parts:
Part 1 – Systems: specifies the tools for preparing descriptions for efficient transport and storage, compressing descriptions, and
allowing synchronization between content and descriptions.
Part 2 – Description definition language: specifies the language for defining the standard set of description tools (DSs, Ds, and
datatypes) and for defining new description tools.
Part 3 – Visual: specifies the description tools pertaining to visual content.
Part 4 – Audio: specifies the description tools pertaining to audio content.
Part 5 – Multimedia description schemes: specifies the generic description tools pertaining to multimedia including audio and
visual content.
Part 6 – Reference software: provides a software implementation of the standard.
Part 7 – Conformance testing: specifies the guidelines and procedures for testing conformance of implementations of the
standard.
Part 8 – Extraction and use of MPEG-7 descriptions: provides guidelines and examples of the extraction and use of
descriptions.
This document contains the visual elements (Descriptors and Description Schemes) that are considered for being part of the
standard. All these Descriptive Structures are classified according to the types of visual features they describe. For each
Descriptive Structure, there is one corresponding section in this document. The section specifies textual and binary syntax and
semantics of the structures.
INTERNATIONAL STANDARD ISO/IEC 15938-3:2002(E)
Information technology — Multimedia content description
interface —
Part 3:
Visual
1 Scope
1.1 Organization of the document
The structure of this document is as follows. Clauses 2-4 specify the terms, abbreviations, symbols and conventions used
throughout the document. Clauses 5-11 contain definitions of the description tools standardized by 15938-3 grouped by the visual
features they are associated with, starting with basic structures and containers in Clause 5, through color, texture, shape, motion,
localization in Clause 10. Clause 11 contains the remaining, unclassified items.
Each description tool is described by the following subclauses:
• Syntax: Normative DDL specification of the Ds or DSs.
• Binary Syntax: Normative binary representation of the Ds or DSs.
• Semantic: Normative definition of the semantics of all the components of the corresponding D or DS.
1.2 Overview of Visual Description Tools
This part of ISO/IEC 15938 specifies tools for description of visual content, including still images, video and 3D models. These
tools are defined by their syntax in DDL and binary representations and semantics associated with the syntactic elements. They
enable description of the visual features of the visual material, such as color, texture, shape and motion, as well as localization of
the described objects in the image or video sequence. An overview of the visual description tools is shown in Figure 1.
The basic structure description tools include five supporting tools of visual descriptions defined in clauses 6−11. They are
categorized into two groups, descriptor containers and basic supporting tools. The former consists of three datatypes, GridLayout
providing efficient representations of visual features on grids, TimeSeries representing temporal arrays of several descriptions,
and MultipleView describing a 3D object using several pictures captured from different view angles. The latter contains two tools,
Spatial2DCoordinateSystem used to specify the 2D coordinate system and TemporalInterpolation indicating the interpolation
method between two samples on a time axis.
The remaining description tools, except for the FaceRecognition descriptor, are associated with visual features and are grouped
into five feature categories: Color, Texture, Shape, Motion and Localization.
The color description tools include four color descriptors to represent different aspects of color features: representative colors
(DominantColor), color distribution (ScalableColor), and spatial distribution of colors (ColorLayout and ColorStructure). They also include
two supporting tools, ColorSpace and ColorQuantization, used in DominantColor, and an extension of ScalableColor to a group of
frames or pictures (GoFGoPColor). All the color descriptors can be extracted from arbitrarily shaped regions.
The texture description tools facilitate browsing (TextureBrowsing) and similarity retrieval (HomogeneousTexture and
EdgeHistogram) using the texture of a still or moving image region. All the texture descriptors can be extracted from arbitrarily
shaped regions.
The shape description tools include two descriptors that characterize different shape features of a 2D object or region. The
RegionShape descriptor captures the distribution of all pixels within a region and the ContourShape descriptor characterizes the
shape properties of the contour of an object. In addition, the Shape3D descriptor provides an intrinsic shape characterization of 3D mesh
models.
The motion description tools include four descriptors that characterize various aspects of motion. The CameraMotion descriptor
specifies a set of basic camera operations such as panning and tilting. The motion of a key point (pixel) from a
moving object or region can be characterized by the MotionTrajectory descriptor. The ParametricMotion descriptor characterizes
an evolution of an arbitrarily shaped region over time in terms of a 2D geometric transformation. Finally, the MotionActivity
descriptor captures the pace of the motion in the sequence, as perceived by the viewer. All motion descriptors except for
CameraMotion can be extracted from arbitrarily shaped regions.
The localization description tools can be used to indicate regions of interest in the spatial (RegionLocator) and spatio-temporal
(SpatioTemporalLocator) domains.
The FaceRecognition descriptor is not associated with any particular visual feature and can be used to describe a human face for
applications requiring the matching and retrieval of face images.
Basic Structures
  Descriptor Containers: GridLayout, TimeSeries, MultipleView
  Basic Supporting Tools: TemporalInterpolation, Spatial2DCoordinateSystem
Visual Features
  Color
    Color Feature Descriptors: DominantColor, ScalableColor, ColorLayout, ColorStructure, GoFGoPColor
    Color Supporting Tools: ColorSpace, ColorQuantization
  Texture: HomogeneousTexture, TextureBrowsing, EdgeHistogram
  Shape: RegionShape, ContourShape, Shape3D
  Motion: CameraMotion, MotionTrajectory, ParametricMotion, MotionActivity
  Localization: RegionLocator, SpatioTemporalLocator
Other
  FaceRecognition
Figure 1 Overview of Visual Description Tools
2 Terms and Definitions
2.1 Default reference axis
The default reference axis for angle calculation is the positive x (horizontal) axis. Positive angle is calculated anti-clockwise.
2.2 DCT coefficients
DCT coefficient
The signed amplitude of a specific cosine basis function.
AC coefficient
Any DCT coefficient for which the frequency in one or both dimensions is non-zero.
DC coefficient
The DCT coefficient for which the frequency in both dimensions is zero.
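The distinction between DC and AC coefficients can be illustrated with a minimal Python sketch. The function name `dct2` and the orthonormal scaling are assumptions made for illustration only; the definitions above do not prescribe a particular normalization. For a constant block, all energy ends up in the DC coefficient and every AC coefficient vanishes.

```python
import math


def dct2(block):
    """2D DCT-II of a square block (orthonormal scaling is an assumption)."""
    n = len(block)

    def alpha(k):
        # Scaling factor that makes the transform orthonormal.
        return math.sqrt((1.0 if k == 0 else 2.0) / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency index
        for v in range(n):      # horizontal frequency index
            s = 0.0
            for i in range(n):
                for j in range(n):
                    s += (block[i][j]
                          * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * j + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


# A flat 4x4 block: only the zero-frequency (DC) term should be non-zero.
flat = [[10] * 4 for _ in range(4)]
coeffs = dct2(flat)
dc = coeffs[0][0]
ac = [coeffs[u][v] for u in range(4) for v in range(4) if (u, v) != (0, 0)]
```

Here `dc` is the coefficient with zero frequency in both dimensions, and `ac` collects every coefficient with a non-zero frequency in at least one dimension.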
2.3 Data element
An item of data as represented before encoding and after decoding.
3 Abbreviations and Symbols
3.1 General
The mathematical symbols used to describe ISO/IEC 15938-3 are similar to those used in the C programming language. However,
integer divisions with truncation and rounding are specifically defined. Numbering and counting loops generally begin with zero.
3.2 Abbreviations
ART Angular-Radial Transform
CSS Curvature Scale Space
DDL Description Definition Language
DS Description Scheme
D Descriptor
DCT Discrete Cosine Transform
FOC Focus of Contraction
FOE Focus of Expansion
GoF Group of Frames
GoP Group of Pictures
HMMD Hue-Min-Max-Difference
HSV Hue-Saturation-Value
RGB Red-Green-Blue
3.3 Arithmetic operators
+ Addition
- Subtraction (as a binary operator) or negation (as a unary operator)
++ Increment, i.e. x++ is equivalent to x=x+1
-- Decrement, i.e. x-- is equivalent to x=x-1
* Multiplication
x Multiplication
^ Power
/ Integer division with truncation of the result towards zero. For example, 7/4 and -7/-4 are truncated to 1,
and -7/4 and 7/-4 are truncated to -1.
// Integer division with rounding to the nearest integer. Half-integer values are rounded away from zero
unless otherwise specified. For example, 3//2 is rounded to 2, and -3//2 is rounded to -2.
÷ Used to indicate division in mathematical equations where no rounding is intended
% Modulus operator, defined only for positive numbers
ld Logarithm base 2
ceil Smallest integer greater than or equal to the given floating point number
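The two integer divisions defined above differ from the native operators of many languages, so a small sketch may help. The Python helpers below (`trunc_div`, `round_div` and `ld` are hypothetical names chosen for this example) reproduce the examples given in the operator list. Note that Python's own `//` rounds towards negative infinity, which is not what either operator above specifies.

```python
import math


def trunc_div(a: int, b: int) -> int:
    """The '/' operator: integer division truncated towards zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def round_div(a: int, b: int) -> int:
    """The '//' operator: nearest integer, halves rounded away from zero."""
    n, d = abs(a), abs(b)
    q = (2 * n + d) // (2 * d)   # adds one half before truncating
    return q if (a < 0) == (b < 0) else -q


def ld(x: float) -> float:
    """Logarithm base 2."""
    return math.log2(x)


examples = {
    "7/4": trunc_div(7, 4),
    "-7/4": trunc_div(-7, 4),
    "3//2": round_div(3, 2),
    "-3//2": round_div(-3, 2),
}
```

Working on absolute values and restoring the sign afterwards keeps both helpers symmetric around zero, which is the property the definitions rely on.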
3.4 Logical operators
|| Logical OR
&& Logical AND
! Logical NOT
3.5 Relational operators
> Greater than
>= Greater than or equal to
≥ Greater than or equal to
< Less than
<= Less than or equal to
≤ Less than or equal to
== Equal to
!= Not equal to
3.6 Bitwise operators
| OR
& AND
>> Shift right with sign extension
<< Shift left with zero fill
3.7 Conditional operator
?: condition ? a : b evaluates to a if condition is true, and to b otherwise
3.8 Assignment
= Assignment operator
3.9 Mnemonics
The following mnemonics are defined to describe the different data types used in the coded bitstream.
bslbf Bit string, left bit first, where “left” is the order in which bits are written in ISO/IEC 15938-3. Bit strings are generally
written as a string of 1s and 0s within single quote marks, e.g. ‘1000 0001’. Blanks within a bit string are for ease of
reading and have no significance. For convenience, large strings are occasionally written in hexadecimal, in which
case conversion to a binary in the conventional manner will yield the value of the bit string. Thus, the left-most
hexadecimal digit is first and in each hexadecimal digit the most significant of the four digits is first.
vluimsbf5 Variable length unsigned integer most significant bit first representation consisting of two parts. The first part defines
the number n of 4-bit bit fields used for the value representation, encoded by a sequence of n-1 “1” bits, followed by a
“0” bit signaling its end. The second part contains the value of the integer encoded using the number of bit fields
specified in the first part.
uimsbf Unsigned integer, most significant bit first.
simsbf Signed integer, in two’s complement format, most significant bit (sign) first.
vlclbf Variable length code, left bit first, where “left” refers to the order in which the VLC codes are written in ISO/IEC
15938-3. The byte order of multibyte words is most significant byte first.
fsbf Float (32 bit), sign bit first. The semantics of the bits within a float are specified in the IEEE Standard for Binary Floating
Point Arithmetic (ANSI/IEEE Std 754-1985).
UTF-8 Binary string encoding defined in ISO 10646/IETF RFC 2279.
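Of the mnemonics above, vluimsbf5 is the only one with a two-part layout, so a short sketch of one possible reading may be useful. The Python functions below are hypothetical helpers, not part of the standard; they represent bit strings as Python strings of '0' and '1', and the encoder assumes that the smallest sufficient number of 4-bit fields is used, which the definition above does not explicitly require.

```python
def encode_vluimsbf5(value: int) -> str:
    """Encode a non-negative integer as a vluimsbf5 bit string."""
    if value < 0:
        raise ValueError("vluimsbf5 is unsigned")
    # Number n of 4-bit fields needed (assumption: minimal n, at least 1).
    n = max(1, (value.bit_length() + 3) // 4)
    prefix = "1" * (n - 1) + "0"          # n-1 one bits, then a terminating zero
    return prefix + format(value, f"0{4 * n}b")


def decode_vluimsbf5(bits: str, pos: int = 0):
    """Decode one vluimsbf5 value; returns (value, next_position)."""
    n = 1
    while bits[pos] == "1":               # count the leading one bits
        n += 1
        pos += 1
    pos += 1                              # skip the terminating zero bit
    value = int(bits[pos:pos + 4 * n], 2)
    return value, pos + 4 * n


encoded = encode_vluimsbf5(16)            # needs two 4-bit fields
decoded, consumed = decode_vluimsbf5(encoded)
```

Under these assumptions, values 0 to 15 occupy five bits in total (one prefix bit plus one 4-bit field), which is presumably where the trailing 5 in the mnemonic comes from.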
3.10 Constants
π 3.141 592 653 58…
e 2.718 281 828 45…
3.11 Functions
max() Maximum value in argument list
min() Minimum value in argument list
Sign() $\mathrm{Sign}(x) = \begin{cases} 1 & x \ge 0 \\ -1 & x < 0 \end{cases}$
Abs() $\mathrm{Abs}(x) = \begin{cases} x & x \ge 0 \\ -x & x < 0 \end{cases}$
$\sum_{i=a}^{i<b} f(i)$ Summation of f(i) with i taking integer values from a up to, but not including b.
L1 norm $L1(x,y) = \sum_i |x_i - y_i|$
L2 norm $L2(x,y) = \sum_i (x_i - y_i)^2$
Euclidean distance $D(x,y) = \sqrt{\sum_i (x_i - y_i)^2}$
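The functions of this subclause can be sketched in Python as follows. The sketch reads the L1 norm as a sum of absolute differences and the L2 norm as a sum of squared differences; that reading is an assumption about the printed formulas, and all function names are illustrative rather than taken from the standard.

```python
import math


def sign(x):
    """Sign(x): 1 for x >= 0, -1 otherwise (note that Sign(0) is 1)."""
    return 1 if x >= 0 else -1


def summation(f, a, b):
    """Sum of f(i) for integer i from a up to, but not including, b."""
    return sum(f(i) for i in range(a, b))


def l1(x, y):
    """L1 norm, read here as the sum of absolute differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


def l2(x, y):
    """L2 norm, read here as the sum of squared differences (assumption)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(l2(x, y))


d1 = l1([1, 2, 3], [2, 0, 3])
d2 = l2([1, 2, 3], [2, 0, 3])
de = euclidean([0, 0], [3, 4])
```

The half-open range in `summation` mirrors the "up to, but not including b" wording and matches the convention that counting loops begin with zero.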
4 Conventions
4.1 Method of describing the DDL representation syntax
The method of describing the DDL representation syntax is defined in ISO/IEC 15938-2 (MPEG-7 Description Definition
Language).
4.2 Method of describing the binary representation syntax
4.2.1 Introduction
The video description elements can be encoded using the generic encoding mechanism defined in ISO/IEC 15938-1 or the
video-specific binary representation syntax defined in the “Binary representation syntax” subclauses below.
4.2.2 Generic binary representation
The use of the video-specific syntax is signaled using the codec configuration mechanism defined in ISO/IEC 15938-1 and the
following classification scheme is defined for this purpose.
MPEG7CameraMotion: ISO/IEC 15938-3 Binary Camera Motion Codec
MPEG7ColorLayout: ISO/IEC 15938-3 Binary Color Layout Codec
MPEG7ColorQuantization: ISO/IEC 15938-3 Binary Color Quantization Codec
MPEG7ColorSpace: ISO/IEC 15938-3 Binary Color Space Codec
MPEG7ColorStructure: ISO/IEC 15938-3 Binary Color Structure Codec
MPEG7ContourShape: ISO/IEC 15938-3 Binary Contour Shape Codec
MPEG7DominantColor: ISO/IEC 15938-3 Binary Dominant Color Codec
MPEG7EdgeHistogram: ISO/IEC 15938-3 Binary Edge Histogram Codec
MPEG7FaceRecognition: ISO/IEC 15938-3 Binary Face Recognition Codec
MPEG7FoFGoPColor: ISO/IEC 15938-3 Binary GoFGoP Color Codec
MPEG7GridLayout: ISO/IEC 15938-3 Binary Grid Layout Codec
MPEG7HomogeneousTexture: ISO/IEC 15938-3 Binary Homogeneous Texture Codec
MPEG7IrregularVisualTimeSeries: ISO/IEC 15938-3 Binary Irregular Time Series Codec
MPEG7MotionActivity: ISO/IEC 15938-3 Binary Motion Activity Codec
MPEG7MotionTrajectory: ISO/IEC 15938-3 Binary Motion Trajectory Codec
MPEG7MultipleView: ISO/IEC 15938-3 Binary Multiple View Codec
MPEG7ParametricMotion: ISO/IEC 15938-3 Binary Parametric Motion Codec
MPEG7RegionLocator: ISO/IEC 15938-3 Binary Region Locator Codec
MPEG7RegionShape: ISO/IEC 15938-3 Binary Region Shape Codec
MPEG7RegularVisualTimeSeries: ISO/IEC 15938-3 Binary Regular Time Series Codec
MPEG7ScalableColor: ISO/IEC 15938-3 Binary Scalable Color Codec
MPEG7Shape3D: ISO/IEC 15938-3 Binary Shape 3D Codec
MPEG7Spatial2DCoordinateSystem: ISO/IEC 15938-3 Binary Spatial 2D Coordinate System Codec
MPEG7SpatioTemporalLocator: ISO/IEC 15938-3 Binary SpatioTemporal Locator Codec
MPEG7TemporalInterpolation: ISO/IEC 15938-3 Binary Temporal Interpolation Codec
MPEG7TextureBrowsing: ISO/IEC 15938-3 Binary Texture Browsing Codec
4.2.3 Video binary representation
The video-specific bitstream is described in the subclauses entitled “Binary representation syntax” in clauses 5-11. Each
data item in the bitstream is in bold type. It is described with its name, its length in bits, and a mnemonic for its type and order of
transmission.
The action caused by a decoded data element in a bitstream depends on the value of that data element and on data elements
previously decoded. The following constructs are used to express the conditions when data elements are present and are in
normal type.
while ( condition ) {
    data_element
    …
}
If the condition is true, then the group of data elements occurs next in the data stream. This repeats until the condition is not true.
do {
    data_element
    …
} while ( condition )
The data element always occurs at least once. The data element is repeated until the condition is not true.
if ( condition ) {
    data_element
    …
} else {
    data_element
    …
}
If the condition is true, then the first group of data elements occurs next in the data stream. If the condition is not true, then the second group of data elements occurs next in the data stream.
for ( i = m; i < n; i++ ) {
    data_element
    …
}
The group of data elements occurs (n-m) times. Conditional constructs within the group of data elements may depend on the value of the loop control variable i, which is set to m for the first occurrence, incremented by one for the second occurrence, and so forth.
/* comments */
Explanatory comments that may be deleted entirely without in any way altering the syntax.
The syntax uses a ‘C-code’ convention that a variable or expression evaluating to a non-zero value is equivalent to a condition that
is true and a variable or expression evaluating to zero is equivalent to a condition that is false. In many cases a literal string is used
in a condition. In such cases a literal string is used to describe the value of a bitstream element.
As noted, a group of data elements may contain nested conditional constructs. For compactness, the brackets {} are omitted when
only one data element follows. The elements of a multidimensional table are represented as follows.
data_element[n]: the (n+1)st element in an array of data
data_element[m][n]: the (m+1)st, (n+1)st element in a two-dimensional array of data
data_element[l][m][n]: the (l+1)st, (m+1)st, (n+1)st element in a three-dimensional array
The elements of a multidimensional array are transmitted in the bitstream starting with data_element[0][0] and with the rightmost index
incremented first, i.e. data_element[0][1] is sent second, data_element[0][2] third, etc.
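The transmission order described here, in which data_element[0][1] follows data_element[0][0], corresponds to what is commonly called row-major order. A minimal Python sketch, with `transmission_order` as a hypothetical helper name, enumerates the index tuples in that order for an array of given dimensions.

```python
from itertools import product


def transmission_order(dims):
    """Index tuples in the order the elements would appear in the bitstream.

    itertools.product varies the last index fastest, which matches
    data_element[0][0], data_element[0][1], data_element[0][2], ...
    """
    return list(product(*(range(d) for d in dims)))


# A 2 x 3 array: the second index runs through 0..2 before the first changes.
order = transmission_order((2, 3))
```

The same helper works for three or more dimensions, since the last index always changes fastest.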
4.3 Method of describing the descriptor semantics
The general semantics of the descriptors are defined in the introductory sections of respective subclauses. The semantics of the
syntax components is defined in sections “Descriptor components semantics”. The ordering in the semantics sections normally
follows the order in which the items appear in the binary representation syntax, which is typically equivalent to the order of items in
the DDL instantiation (not schema specification).
5 Basic structures
5.1 Introduction
This clause introduces five supporting tools for the visual descriptions defined in clauses 6−11. They are categorized into two
groups, descriptor containers and basic supporting tools. The former consists of three datatypes, GridLayout providing efficient
representations of visual features on grids, VisualTimeSeries representing temporal arrays of several descriptions, and
MultipleView describing a 3D object using several pictures captured from different view angles. The latter contains two tools,
Spatial2DCoordinateSystem used to specify the 2D coordinate system and TemporalInterpolation indicating the interpolation
method between two samples on a time axis.
5.2 Grid layout
5.2.1 Introduction
This datatype specifies a structure that allows an image to be split into a set of rectangular regions, so that each region can be
described separately. Each region of the grid can be described in terms of other descriptors such as color or texture.
8 © ISO/IEC 2002 – All rights reserved
5.2.2 DDL representation syntax
minOccurs=”1” maxOccurs=”65025”/>
5.2.3 Binary representation syntax
GridLayout {                                  Number of bits       Mnemonic
  DescriptorID                                8                    uimsbf
  numOfPartX                                  8                    uimsbf
  numOfPartY                                  8                    uimsbf
  DescriptorMaskPresent                       1                    bslbf
  if(DescriptorMaskPresent) {
    descriptorMask                            partNumX*partNumY    bslbf
  }
  for(k=0;k<numOfPartX*numOfPartY;k++) {
    if(DescriptorMaskPresent) {
      if(descriptorMask[k]) {
        Descriptor[k]                         Descriptor instance specified by descriptorID
      }
    } else {
      Descriptor[k]                           Descriptor instance specified by descriptorID
    }
  }
}
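As an informal illustration of the fixed-length fields in this syntax, the following Python sketch reads the GridLayout header from a byte string. It is not a conforming decoder. The names BitReader and parse_grid_layout_header are hypothetical, the mask length is assumed to equal numOfPartX*numOfPartY (reading partNumX*partNumY as the same quantities), and the descriptor instances that follow the header are not parsed because their layout depends on the descriptor type.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read_uint(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte_index, bit_index = divmod(self.pos, 8)
            if byte_index >= len(self.data):
                raise EOFError("bitstream ended early")
            bit = (self.data[byte_index] >> (7 - bit_index)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def parse_grid_layout_header(data: bytes) -> dict:
    """Parse the header fields only; descriptor instances are left unread."""
    reader = BitReader(data)
    descriptor_id = reader.read_uint(8)   # DescriptorID, uimsbf
    num_x = reader.read_uint(8)           # numOfPartX, uimsbf
    num_y = reader.read_uint(8)           # numOfPartY, uimsbf
    mask_present = reader.read_uint(1)    # DescriptorMaskPresent
    cells = num_x * num_y                 # assumed mask length
    if mask_present:
        mask = [reader.read_uint(1) for _ in range(cells)]
    else:
        mask = [1] * cells                # no mask: every partition has a descriptor
    return {
        "descriptor_id": descriptor_id,
        "num_of_part_x": num_x,
        "num_of_part_y": num_y,
        "mask_present": bool(mask_present),
        "mask": mask,
        "bits_consumed": reader.pos,
    }


# Hypothetical stream: DescriptorID 7, 2x2 grid, mask present, mask bits 0110.
header = parse_grid_layout_header(bytes([0x07, 0x02, 0x02, 0xB0]))
print(header)
```

The last byte 0xB0 is 1011 0000 in binary: the leading 1 is DescriptorMaskPresent, the next four bits are the mask, and the remaining bits would belong to the first descriptor instance in a real stream.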
5.2.4 Descriptor components semantics
DescriptorID
This field, which is only present in the binary representation, specifies a descriptor identifier. The descriptor identifier indicates the
descriptor type accommodated in the grid layout.
The assignment of IDs to the descriptors is specified in Table 1.
Table 1 Assignment of IDs to descriptors
ID Descriptor
0 Forbidden
1 CameraMotion
2 ColorLayout
3 ColorSpace
4 ColorStructure
5 ColorQuantization
6 ContourShape
7 DominantColor
8 EdgeHistogram
9 FaceRecognition
10 GoFGoPColor
11 GridLayout
12 HomogeneousTexture
13 IrregularVisualTimeSeries
14 MotionActivity
15 MotionTrajectory
16 MultipleView
17 ParametricMotion
18 RegionLocator
19 RegionShape
20 RegularVisualTimeSeries
21 ScalableColor
22 Shape3D
23 Spatial2DCoordinateSystem
24 SpatioTemporalLocator
25 TemporalInterpolation
26 TextureBrowsing
27-255 Reserved
numOfPartX
This attribute specifies the number of horizontal partitions in the grid over the image.
numOfPartY
This attribute specifies the number of vertical partitions in the grid over the image.
DescriptorMaskPresent
This field, which is only present in the binary syntax, indicates whether all partitions of the image contain a descriptor. If
DescriptorMaskPresent is set to 0 then all partitions contain the descriptor. If DescriptorMaskPresent is set to 1 then the
descriptorMask attribute indicates which partitions contain descriptors.
descriptorMask
This attribute specifies a bit-field that indicates whether a descriptor is assigned to the corresponding partition. The partitioned
image is indexed from left to right and top to bottom. For example, if a descriptorMask of “0110” is given for a 2x2 partitioned
image then the upper right and lower left quarter of the image contain a descriptor.
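The indexing rule can be checked with a small Python sketch that maps each set bit of a mask to its (row, column) position, scanning left to right and top to bottom. The function name masked_cells and the use of a string of "0"/"1" characters for the mask are illustrative choices, not part of the standard.

```python
def masked_cells(descriptor_mask: str, num_of_part_x: int, num_of_part_y: int):
    """Return (row, col) of every partition whose mask bit is 1.

    Row 0 is the top row and column 0 is the leftmost column.
    """
    if len(descriptor_mask) != num_of_part_x * num_of_part_y:
        raise ValueError("mask length must equal numOfPartX * numOfPartY")
    cells = []
    for k, bit in enumerate(descriptor_mask):
        if bit == "1":
            row, col = divmod(k, num_of_part_x)
            cells.append((row, col))
    return cells


print(masked_cells("0110", 2, 2))  # [(0, 1), (1, 0)]
```

Here (0, 1) is the upper right quarter and (1, 0) is the lower left quarter, matching the example in the text.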
Descriptor
This element specifies the visual descriptors that have been assigned to the cells. When a visual descriptor is assigned to a cell
within grid layout, it defines the properties of the particular cell according to the semantic definition of the descriptor used. In other
words, each cell is treated as an individual image and the descriptor values are computed accordingly. For example, a dominant
color descriptor assigned to a cell specifies the dominant colors of the pixels within that cell. All the restrictions on the image size,
etc. are now applicable to the size of the cell within the grid. All the descriptor instances in a single GridLayout instance must be of
the same type.
The following visual description tools cannot appear in the GridLayout: VisualTimeSeries, MultipleView,
Spatial2DCoordinateSystem, TemporalInterpolation, ColorSpace, ColorQuantization, Shape3D, CameraMotion,
MotionTrajectory, ParametricMotion, SpatioTemporalLocator.
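A sketch of how an application might check this restriction against the IDs of Table 1 is given below in Python. The names DESCRIPTOR_NAMES, EXCLUDED_FROM_GRID and allowed_in_grid_layout are hypothetical. It is assumed here that the exclusion of VisualTimeSeries covers both RegularVisualTimeSeries and IrregularVisualTimeSeries, and that forbidden and reserved IDs are rejected.

```python
# Table 1 names keyed by ID (0 is forbidden, 27-255 are reserved).
DESCRIPTOR_NAMES = {
    1: "CameraMotion", 2: "ColorLayout", 3: "ColorSpace", 4: "ColorStructure",
    5: "ColorQuantization", 6: "ContourShape", 7: "DominantColor",
    8: "EdgeHistogram", 9: "FaceRecognition", 10: "GoFGoPColor",
    11: "GridLayout", 12: "HomogeneousTexture", 13: "IrregularVisualTimeSeries",
    14: "MotionActivity", 15: "MotionTrajectory", 16: "MultipleView",
    17: "ParametricMotion", 18: "RegionLocator", 19: "RegionShape",
    20: "RegularVisualTimeSeries", 21: "ScalableColor", 22: "Shape3D",
    23: "Spatial2DCoordinateSystem", 24: "SpatioTemporalLocator",
    25: "TemporalInterpolation", 26: "TextureBrowsing",
}

# Tools listed in the text as not allowed inside a GridLayout.
# Assumption: "VisualTimeSeries" stands for both concrete time series types.
EXCLUDED_FROM_GRID = {
    "RegularVisualTimeSeries", "IrregularVisualTimeSeries", "MultipleView",
    "Spatial2DCoordinateSystem", "TemporalInterpolation", "ColorSpace",
    "ColorQuantization", "Shape3D", "CameraMotion", "MotionTrajectory",
    "ParametricMotion", "SpatioTemporalLocator",
}


def allowed_in_grid_layout(descriptor_id: int) -> bool:
    """True if the ID names a Table 1 descriptor that is not on the exclusion list."""
    name = DESCRIPTOR_NAMES.get(descriptor_id)
    if name is None:  # forbidden (0), reserved (27-255), or out of range
        return False
    return name not in EXCLUDED_FROM_GRID


allowed = sorted(n for i, n in DESCRIPTOR_NAMES.items() if allowed_in_grid_layout(i))
print(allowed)
```

The check follows the exclusion list literally, so GridLayout itself (ID 11) is reported as allowed because the text does not list it.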
Grid layout can be applied to a video segment, in which case the geometry of the grid layout is constant over time. Each frame in
the segment is divided into the same number of cells and the cells at corresponding locations form sequences that can be viewed
as a group of frames. The GoFGoPColor and MotionActivity descriptors can then be applied to each sequence defined by the grid
layout.
5.3 Time series
5.3.1 Introduction
This datatype specifies a temporal series of descriptors in a video segment. Two types of time series datatypes are defined:
RegularVisualTimeSeries and IrregularVisualTimeSeries. In the RegularVisualTimeSeries, descriptors are located regularly (with
constant intervals) within a given time span. Alternatively, descriptors are located irregularly in the IrregularVisualTimeSeries. Both
structures consist of a series of descriptors and temporal intervals between them as illustrated in Figure 2.
Figure 2 Overview of the VisualTimeSeries
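The difference between the two arrangements can be sketched informally in Python. The function names, the integer time units and the representation of intervals as a plain list are assumptions made for illustration; the normative fields of both datatypes are defined in their own subclauses.

```python
from itertools import accumulate


def regular_sample_times(start: int, interval: int, count: int) -> list:
    """Times of `count` descriptors spaced by one constant interval."""
    return [start + i * interval for i in range(count)]


def irregular_sample_times(start: int, intervals: list) -> list:
    """Times of descriptors where intervals[i] separates descriptor i from i+1."""
    return [start] + [start + t for t in accumulate(intervals)]


# Hypothetical values in arbitrary time units.
print(regular_sample_times(0, 5, 4))
print(irregular_sample_times(0, [2, 7, 1]))
```

In the regular case a single interval describes the whole series, while the irregular case needs one interval per gap.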
5.3.2 VisualTimeSeries
5.3.2.1 Introduction
The VisualTimeSeries serves as the base type for RegularVisualTimeSeries and IrregularVisualTimeSeries. As it is an abstract
type, only the DDL representation is defined.
5.3.2.2 DDL representation syntax
...



