Information technology — Multimedia content description interface — Part 3: Visual — Amendment 4: Video signature tools

Technologies de l'information — Interface de description du contenu multimédia — Partie 3: Visuel — Amendement 4: Outils de vidéosignature

General Information

Status: Published
Publication Date: 05-Oct-2010
ICS: 35.040 - Information coding; 35.040.40 - Coding of audio, video, multimedia and hypermedia information
Technical Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information
Drafting Committee: ISO/IEC JTC 1/SC 29 - Coding of audio, picture, multimedia and hypermedia information
Current Stage: 6060 - International Standard published
Due Date: 13-Jul-2012
Completion Date: 06-Oct-2010

Relations

Amends

ISO/IEC 15938-3:2002 - Information technology — Multimedia content description interface — Part 3: Visual

Effective Date

26-Jun-2021

Standard: ISO/IEC 15938-3:2002/Amd 4:2010 - Video signature tools (English language, 42 pages)

Standards Content (Sample)

INTERNATIONAL STANDARD ISO/IEC 15938-3
First edition
2002-05-15
AMENDMENT 4
2010-10-15

Information technology — Multimedia
content description interface —
Part 3:
Visual
AMENDMENT 4: Video signature tools
Technologies de l'information — Interface de description du contenu
multimédia —
Partie 3: Visuel
AMENDEMENT 4: Outils de vidéosignature

Reference number
ISO/IEC 15938-3:2002/Amd.4:2010(E)
© ISO/IEC 2010

ISO/IEC 15938-3:2002/Amd.4:2010(E)

COPYRIGHT PROTECTED DOCUMENT

© ISO/IEC 2010
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland

ii © ISO/IEC 2010 – All rights reserved

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are members of
ISO or IEC participate in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of technical activity. ISO and IEC
technical committees collaborate in fields of mutual interest. Other international organizations, governmental
and non-governmental, in liaison with ISO and IEC, also take part in the work. In the field of information
technology, ISO and IEC have established a joint technical committee, ISO/IEC JTC 1.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of the joint technical committee is to prepare International Standards. Draft International
Standards adopted by the joint technical committee are circulated to national bodies for voting. Publication as
an International Standard requires approval by at least 75 % of the national bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO and IEC shall not be held responsible for identifying any or all such patent rights.
Amendment 4 to ISO/IEC 15938-3:2002 was prepared by Joint Technical Committee ISO/IEC JTC 1,
Information technology, Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia
information.

Information technology — Multimedia content description
interface —
Part 3:
Visual
AMENDMENT 4: Video signature tools
Replace 1.2 with:
1.2 Overview of Visual Description Tools
This part of ISO/IEC 15938 specifies tools for description of visual content, including still images, video and
3D models. These tools are defined by their syntax in DDL and binary representations and semantics
associated with the syntactic elements. They enable description of the visual features of the visual material,
such as color, texture, shape, motion, localization of the described objects in the image or video sequence
and also unique and robust identification of visual material. An overview of the visual description tools is
shown in Figure 1.
The basic structure description tools include six supporting tools for visual descriptions, defined in
Clauses 6-11. They are categorized into two groups: descriptor containers and basic supporting tools. The
former consists of four datatypes: GridLayout, providing efficient representations of visual features on grids;
TimeSeries, representing temporal arrays of several descriptions; GofGopFeature, describing representative
descriptions over a video segment; and MultipleView, describing a 3D object using several pictures captured
from different view angles. The latter contains two tools: Spatial2DcoordinateSystem, used to specify the 2D
coordinate system, and TemporalInterpolation, indicating the interpolation method between two samples on a
time axis.
The remaining description tools, except for the face recognition descriptors (FaceRecognition and
AdvancedFaceRecognition) and the signature descriptors (ImageSignature and VideoSignature), are
associated with visual features and are grouped into five feature categories: Color, Texture, Shape, Motion
and Localization.
The color description tools include five color descriptors that represent different aspects of color features:
representative colors (DominantColor), color distribution (ScalableColor), spatial distribution of colors
(ColorLayout and ColorStructure) and perceptual feeling of illumination color (ColorTemperature). The group
also contains three supporting tools: ColorSpace and ColorQuantization, which are used in DominantColor, and
IlluminationInvariantColor, which extends four color descriptors (DominantColor, ScalableColor, ColorLayout and
ColorStructure) to support illumination-invariant similarity matching. An extension of ScalableColor to a group
of frames or pictures (GoFGoPColor) is also included in this group. All the color descriptors can be extracted
from arbitrarily shaped regions.
The texture description tools facilitate browsing (TextureBrowsing) and similarity retrieval
(HomogeneousTexture and EdgeHistogram) using the texture of a still or moving image region. All the texture
descriptors can be extracted from arbitrarily shaped regions.
The shape description tools include two descriptors that characterize different shape features of a 2D object or
region. The RegionShape descriptor captures the distribution of all pixels within a region, and the ContourShape
descriptor characterizes the shape properties of the contour of an object. An extension of RegionShape,
ShapeVariation, describes the temporal variation of shape over a video segment.
The Shape3D and Perceptual 3D Shape descriptors provide 3-dimensional shape information; the former
represents an intrinsic shape characterization of 3D mesh models, and the latter provides a part-based
representation of a 3D object.
The motion description tools include four descriptors that characterize various aspects of motion. The
CameraMotion descriptor specifies a set of basic camera operations, such as panning and tilting.
The motion of a key point (pixel) from a moving object or region can be characterized by the MotionTrajectory
descriptor. The ParametricMotion descriptor characterizes the evolution of an arbitrarily shaped region over
time in terms of a 2D geometric transformation. Finally, the MotionActivity descriptor captures the pace of the
motion in the sequence, as perceived by the viewer. All motion descriptors except CameraMotion can be
extracted from arbitrarily shaped regions.
The localization description tools can be used to indicate regions of interest in the spatial (RegionLocator) and
spatio-temporal (SpatioTemporalLocator) domains.
The FaceRecognition and AdvancedFaceRecognition descriptors are not associated with any
particular visual feature and can be used to describe a human face for applications requiring the matching and
retrieval of face images.
The signature descriptors provide a "fingerprint" that uniquely identifies image and video content. The
signatures are robust (unchanging) across a wide range of common editing operations, yet sufficiently
different for every item of "original" content to allow unique and reliable identification, just like human
fingerprints. There are two visual signatures: the ImageSignature and VideoSignature descriptors, for
images and videos respectively. The signatures have no direct association with specific visual features such
as color, shape or texture.

Basic Structures
  Descriptor Containers: GridLayout, TimeSeries, GofGopFeature, MultipleView
  Basic Supporting Tools: TemporalInterpolation, Spatial2DcoordinateSystem
Visual Features
  Color
    Color Feature Descriptors: DominantColor, ScalableColor, ColorLayout, ColorStructure, GofGopColor, ColorTemperature
    Color Supporting Tools: ColorSpace, ColorQuantization, IlluminationInvariantColor
  Texture: HomogeneousTexture, TextureBrowsing, EdgeHistogram
  Shape: RegionShape, ContourShape, ShapeVariation, Shape3D, Perceptual 3D Shape
  Motion: CameraMotion, MotionTrajectory, ParametricMotion, MotionActivity
  Localization: RegionLocator, SpatioTemporalLocator
Other
  FaceRecognition, AdvancedFaceRecognition
  Signatures: ImageSignature, VideoSignature

Figure 1 — Overview of Visual Description Tools
In 3.3, extend the definitions:
floor: the largest integer less than or equal to the given floating-point number
Replace 4.2.2 with:
4.2.2 Generic binary representation
The use of the video-specific syntax is signalled using the codec configuration mechanism defined in
ISO/IEC 15938-1. The following classification scheme is defined for this purpose.

Term ID                           Name
MPEG7CameraMotion                 ISO/IEC 15938-3 Binary Camera Motion Codec
MPEG7ColorLayout                  ISO/IEC 15938-3 Binary Color Layout Codec
MPEG7ColorQuantization            ISO/IEC 15938-3 Binary Color Quantization Codec
MPEG7ColorSpace                   ISO/IEC 15938-3 Binary Color Space Codec
MPEG7ColorStructure               ISO/IEC 15938-3 Binary Color Structure Codec
MPEG7ContourShape                 ISO/IEC 15938-3 Binary Contour Shape Codec
MPEG7DominantColor                ISO/IEC 15938-3 Binary Dominant Color Codec
MPEG7EdgeHistogram                ISO/IEC 15938-3 Binary Edge Histogram Codec
MPEG7FaceRecognition              ISO/IEC 15938-3 Binary Face Recognition Codec
MPEG7FoFGoPColor                  ISO/IEC 15938-3 Binary GoFGoP Color Codec
MPEG7GridLayout                   ISO/IEC 15938-3 Binary Grid Layout Codec
MPEG7HomogeneousTexture           ISO/IEC 15938-3 Binary Homogeneous Texture Codec
MPEG7IrregularVisualTimeSeries    ISO/IEC 15938-3 Binary Irregular Time Series Codec
MPEG7MotionActivity               ISO/IEC 15938-3 Binary Motion Activity Codec
MPEG7MotionTrajectory             ISO/IEC 15938-3 Binary Motion Trajectory Codec
MPEG7MultipleView                 ISO/IEC 15938-3 Binary Multiple View Codec
MPEG7ParametricMotion             ISO/IEC 15938-3 Binary Parametric Motion Codec
MPEG7RegionLocator                ISO/IEC 15938-3 Binary Region Locator Codec
MPEG7RegionShape                  ISO/IEC 15938-3 Binary Region Shape Codec
MPEG7RegularVisualTimeSeries      ISO/IEC 15938-3 Binary Regular Time Series Codec
MPEG7ScalableColor                ISO/IEC 15938-3 Binary Scalable Color Codec
MPEG7Shape3D                      ISO/IEC 15938-3 Binary Shape 3D Codec
MPEG7Spatial2DCoordinateSystem    ISO/IEC 15938-3 Binary Spatial 2D Coordinate System Codec
MPEG7SpatioTemporalLocator        ISO/IEC 15938-3 Binary SpatioTemporal Locator Codec
MPEG7TemporalInterpolation        ISO/IEC 15938-3 Binary Temporal Interpolation Codec
MPEG7TextureBrowsing              ISO/IEC 15938-3 Binary Texture Browsing Codec
MPEG7GofGopFeature                ISO/IEC 15938-3 Binary Gof Gop Feature Codec
MPEG7ColorTemperature             ISO/IEC 15938-3 Binary Color Temperature Codec
MPEG7ShapeVariation               ISO/IEC 15938-3 Binary Shape Variation Codec
MPEG7IlluminationInvariantColor   ISO/IEC 15938-3 Binary Illumination Invariant Color Codec
MPEG7AdvancedFaceRecognition      ISO/IEC 15938-3 Binary Advanced Face Recognition Codec
MPEG7Perceptual3DShape            ISO/IEC 15938-3 Binary Perceptual 3D Shape Codec
MPEG7ImageSignature               ISO/IEC 15938-3 Binary Image Signature Codec
MPEG7VideoSignature               ISO/IEC 15938-3 Binary Video Signature Codec

In 5.2.4, replace Table 1 with:
Table 1 — Assignment of IDs to descriptors
ID Descriptor
0 Forbidden
1 CameraMotion
2 ColorLayout
3 ColorSpace
4 ColorStructure
5 ColorQuantization
6 ContourShape
7 DominantColor
8 EdgeHistogram
9 FaceRecognition
10 GoFGoPColor
11 GridLayout
12 HomogeneousTexture
13 IrregularVisualTimeSeries
14 MotionActivity
15 MotionTrajectory
16 MultipleView
17 ParametricMotion
18 RegionLocator
19 RegionShape
20 RegularVisualTimeSeries
21 ScalableColor
22 Shape3D
23 Spatial2DCoordinateSystem
24 SpatioTemporalLocator
25 TemporalInterpolation
26 TextureBrowsing
27 GofGopFeature
28 ColorTemperature
29 ShapeVariation
30 IlluminationInvariantColor
31 AdvancedFaceRecognition
32 Perceptual3DShape
33 ImageSignature
34 VideoSignature
35-255 Reserved
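As a sketch of how a decoder might consult Table 1, the Python lookup below maps a descriptor ID to its name. It assumes only that IDs lie in the 0-255 range covered by the table; the helper name `descriptor_name` is hypothetical and not part of this amendment.

```python
from typing import List

# Descriptor names indexed by ID, transcribed from Table 1 (IDs 0-34).
_DESCRIPTOR_NAMES: List[str] = [
    "Forbidden", "CameraMotion", "ColorLayout", "ColorSpace", "ColorStructure",
    "ColorQuantization", "ContourShape", "DominantColor", "EdgeHistogram",
    "FaceRecognition", "GoFGoPColor", "GridLayout", "HomogeneousTexture",
    "IrregularVisualTimeSeries", "MotionActivity", "MotionTrajectory",
    "MultipleView", "ParametricMotion", "RegionLocator", "RegionShape",
    "RegularVisualTimeSeries", "ScalableColor", "Shape3D",
    "Spatial2DCoordinateSystem", "SpatioTemporalLocator",
    "TemporalInterpolation", "TextureBrowsing", "GofGopFeature",
    "ColorTemperature", "ShapeVariation", "IlluminationInvariantColor",
    "AdvancedFaceRecognition", "Perceptual3DShape", "ImageSignature",
    "VideoSignature",
]


def descriptor_name(descriptor_id: int) -> str:
    """Return the Table 1 name for a descriptor ID (hypothetical helper)."""
    if not 0 <= descriptor_id <= 255:
        raise ValueError("descriptor ID outside the 0-255 range of Table 1")
    if descriptor_id == 0:
        raise ValueError("descriptor ID 0 is forbidden")
    if descriptor_id < len(_DESCRIPTOR_NAMES):
        return _DESCRIPTOR_NAMES[descriptor_id]
    return "Reserved"  # IDs 35-255
```

For example, `descriptor_name(34)` yields `"VideoSignature"`, while any ID from 35 to 255 is reported as reserved.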

After 11.3, add the following:
11.4 Video Signature
11.4.1 Introduction
The visual content descriptors in Clauses 6-9 are useful for finding videos with similar content. Because they
are intended to be general, they were found to be unsuitable for the task of finding duplicate content. The video
signature descriptor is designed to identify duplicate video content. This descriptor is robust (unchanging) to a
wide range of common video editing operations, yet sufficiently different for every item of "original" content to
identify it uniquely and reliably, just like human fingerprints.
The video signature is composed of three main elements:
• a frame signature;
• a set of compact summaries of the frame signature, referred to as words;
• a group-of-frames representation for a temporal segment, referred to as a bag-of-words.
A video is assumed to consist of a set of frames (or pictures), each representing a single temporal sample.
A frame consists of a set of pixels, each representing a single spatial sample. The frame signature is extracted
from each frame of a video. It is a 380-dimensional vector of ternary (base-3) values that describe the
intensities of pixel regions in the frame and the intensity inter-relations between them. Each dimension can be
characterized as a mean, first-order or second-order operator.
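The binary syntax in 11.4.3 stores each frame signature in a 608-bit FrameSignature field. Since 3^5 = 243 fits in one byte, and 380 = 76 × 5 while 608 = 76 × 8, one consistent reading is that five ternary values are packed per byte. The Python sketch below illustrates that arithmetic together with a simple threshold quantizer. The threshold, the mapping of operator outputs to the symbols 0, 1 and 2, and the packing order (first value of each group most significant) are assumptions for illustration only; this excerpt does not specify them.

```python
from typing import List, Sequence

FRAME_DIMS = 380      # ternary elements per frame signature (11.4.1)
TRITS_PER_BYTE = 5    # assumed packing: 3**5 = 243 values fit in one byte
PACKED_BYTES = FRAME_DIMS // TRITS_PER_BYTE  # 76 bytes = 608 bits


def ternarize(value: float, threshold: float) -> int:
    """Map an operator output to a ternary symbol (hypothetical mapping and threshold).

    0: value < -threshold, 1: |value| <= threshold, 2: value > threshold.
    """
    if value < -threshold:
        return 0
    if value > threshold:
        return 2
    return 1


def pack_frame_signature(trits: Sequence[int]) -> bytes:
    """Pack 380 ternary symbols into 76 bytes, first symbol of each group most significant."""
    if len(trits) != FRAME_DIMS:
        raise ValueError(f"expected {FRAME_DIMS} symbols, got {len(trits)}")
    out = bytearray()
    for start in range(0, FRAME_DIMS, TRITS_PER_BYTE):
        byte = 0
        for t in trits[start:start + TRITS_PER_BYTE]:
            if t not in (0, 1, 2):
                raise ValueError("symbols must be 0, 1 or 2")
            byte = byte * 3 + t
        out.append(byte)  # never exceeds 242
    return bytes(out)


def unpack_frame_signature(data: bytes) -> List[int]:
    """Inverse of pack_frame_signature."""
    if len(data) != PACKED_BYTES:
        raise ValueError(f"expected {PACKED_BYTES} bytes (608 bits)")
    trits: List[int] = []
    for byte in data:
        if byte > 242:
            raise ValueError("byte does not encode five ternary symbols")
        group = []
        for _ in range(TRITS_PER_BYTE):
            group.append(byte % 3)  # least significant symbol comes out first
            byte //= 3
        trits.extend(reversed(group))
    return trits
```

Under these assumptions, packing and unpacking round-trip exactly, and the packed form always occupies exactly the 608 bits reserved by the syntax.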
Words are compact, 1-byte representations of the frame signature. The set of all possible values of a word is
referred to as the vocabulary. The words provide a summary representation of the frame.
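The binary syntax carries WordsPerFrame = 5 one-byte Word values per frame and a 243-bit BagOfWords entry per word. That is consistent with each word being a base-3 number formed from five ternary frame-signature elements (3^5 = 243 possible words, so the vocabulary has 243 entries). The sketch below shows that construction. The element indices in `WORD_DIMENSIONS` are hypothetical placeholders; the dimensions actually selected by this amendment are not given in this excerpt.

```python
from typing import List, Sequence

FRAME_DIMS = 380
WORDS_PER_FRAME = 5                       # WordsPerFrame in the binary syntax
ELEMENTS_PER_WORD = 5                     # 3**5 = 243 possible words
VOCABULARY_SIZE = 3 ** ELEMENTS_PER_WORD  # 243

# Hypothetical selection: word w samples elements w, w+76, w+152, w+228, w+304.
WORD_DIMENSIONS: List[List[int]] = [
    [w + 76 * k for k in range(ELEMENTS_PER_WORD)] for w in range(WORDS_PER_FRAME)
]


def frame_words(signature: Sequence[int]) -> List[int]:
    """Compute the words of one frame as base-3 numbers in [0, 242]."""
    if len(signature) != FRAME_DIMS:
        raise ValueError(f"expected {FRAME_DIMS} ternary elements")
    words: List[int] = []
    for dims in WORD_DIMENSIONS:
        word = 0
        for d in dims:
            word = word * 3 + signature[d]  # first selected element is most significant
        words.append(word)
    return words
```

Every word fits in one byte, since its largest possible value is 242.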
A bag-of-words representation is often used in text searching to compare the similarity between two
documents. It ignores the ordering of the text and therefore provides some robustness to editing. For the video
signature, a bag-of-words records the occurrence of words within a temporal segment of frames. The
bag-of-words therefore provides a coarse descriptor for the temporal segment.
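A minimal sketch of the bag-of-words for one temporal segment: for each of the five word positions it records, as a 243-entry flag vector, which vocabulary words occur in the segment. This matches the size of the BagOfWords[j] fields in 11.4.3 (5 × 243 = 1215 bits per segment). The Jaccard-style `bag_similarity` is only an illustration of how such bags might be compared; it is not the matching procedure of this amendment, which is not part of this excerpt.

```python
from typing import List, Sequence

WORDS_PER_FRAME = 5
VOCABULARY_SIZE = 243  # 3**5


def bag_of_words(segment_words: Sequence[Sequence[int]]) -> List[List[bool]]:
    """Occurrence flags per word position over all frames of one segment."""
    bags = [[False] * VOCABULARY_SIZE for _ in range(WORDS_PER_FRAME)]
    for frame in segment_words:
        if len(frame) != WORDS_PER_FRAME:
            raise ValueError(f"each frame needs {WORDS_PER_FRAME} words")
        for j, word in enumerate(frame):
            if not 0 <= word < VOCABULARY_SIZE:
                raise ValueError("word outside the 243-entry vocabulary")
            bags[j][word] = True
    return bags


def bag_similarity(a: List[bool], b: List[bool]) -> float:
    """Illustrative Jaccard index of two occurrence vectors (not taken from the standard)."""
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return 1.0 if union == 0 else inter / union
```

Because only occurrence is recorded, reordering or repeating frames within a segment leaves its bag unchanged, which is the robustness property described above.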
The video signature descriptor syntax provides support for description of single or multiple static spatial
regions within the frame. Each spatial region is a rectangular region having arbitrary position and size, with
edges parallel to the edges of the frame. Each spatial region may have its own start and end media times.
This feature is useful when describing content such as videos with picture-in-picture, where the entire frame
region can be described as the first spatial region and the picture-in-picture region can be described as the
second spatial region.
The extraction procedure shall be applied to each spatial region independently. Specifically, only pixels within
the spatial region are processed to extract the video signature.
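Because extraction processes only the pixels inside each spatial region, an implementation would first restrict each frame to that region. The sketch below assumes that the two PixelX/PixelY pairs in the binary syntax give the top-left and bottom-right corners, both inclusive; the values 0 0 and 719 479 in the descriptor example appear consistent with a full 720 × 480 frame under that reading. The corner interpretation and the helper name `crop_region` are assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[y][x], e.g. luminance samples


def crop_region(frame: Frame, top_left: Tuple[int, int],
                bottom_right: Tuple[int, int]) -> Frame:
    """Return only the pixels inside a rectangular spatial region (corners inclusive, assumed)."""
    height = len(frame)
    width = len(frame[0]) if height else 0
    (x0, y0), (x1, y1) = top_left, bottom_right
    if not (0 <= x0 <= x1 < width and 0 <= y0 <= y1 < height):
        raise ValueError("region lies outside the frame or has inverted corners")
    # Edges are parallel to the frame edges, so slicing rows and columns suffices.
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
```

For a picture-in-picture video, the same frame would be cropped twice: once to the full frame and once to the inset, and each crop would be processed independently.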
11.4.2 DDL representation syntax
Descriptor example:


   0 0
   719 479

  0
  1000

   0
   5038


   0
   89

    0
    2969

   1 1 1 1 0 1 0 0 .
    .

    .



   0
   100
   1 2 3 4 5
   1 2 1 0 1 0 2 1 0 2 .
    .



11.4.3 Binary Representation Syntax
VideoSignature {                                    Number of bits  Mnemonics
  NumOfSpatialRegions                               32              uimsbf
  for( r=0; r<NumOfSpatialRegions; r++ ) {
    SpatialLocationFlag                             1               bslbf
    if( SpatialLocationFlag == 1 ) {
      for( i=0; i<2; i++ ) {
        PixelX                                      16              uimsbf
        PixelY                                      16              uimsbf
      }
    }
    StartFrameOfSpatialRegion                       32              uimsbf
    NumOfFrames                                     32              uimsbf
    MediaTimeUnit                                   16              uimsbf
    MediaTimeFlagOfSpatialRegion                    1               bslbf
    if( MediaTimeFlagOfSpatialRegion == 1 ) {
      StartMediaTimeOfSpatialRegion                 32              uimsbf
      EndMediaTimeOfSpatialRegion                   32              uimsbf
    }
    NumOfSegments                                   32              uimsbf
    for( i=0; i<NumOfSegments; i++ ) {
      StartFrameOfSegment                           32              uimsbf
      EndFrameOfSegment                             32              uimsbf
      MediaTimeFlagOfSegment                        1               bslbf
      if( MediaTimeFlagOfSegment == 1 ) {
        StartMediaTimeOfSegment                     32              uimsbf
        EndMediaTimeOfSegment                       32              uimsbf
      }
      for( j=0; j<WordsPerFrame; j++ ) {
        BagOfWords[j]                               243             bslbf
      }
    }
    CompressionFlag                                 1               bslbf
    if( CompressionFlag == 0 ) {
      for( i=0; i<NumOfFrames; i++ ) {
        MediaTimeFlagOfFrame                        1               bslbf
        if( MediaTimeFlagOfFrame == 1 ) {
          MediaTimeOfFrame                          32              uimsbf
        }
        FrameConfidence                             8               uimsbf
        for( j=0; j<WordsPerFrame; j++ ) {
          Word[j]                                   8               uimsbf
        }
        FrameSignature                              608             bslbf
      }
    } else {
      for( i=0; i<NumOfFrames; i++ ) {
        MediaTimeFlagOfFrame                        1               bslbf
        if( MediaTimeFlagOfFrame == 1 ) {
          MediaTimeOfFrame                          32              uimsbf
        }
        FrameConfidence                             8               uimsbf
        for( j=0; j<WordsPerFrame; j++ ) {
          Word[j]                                   8               uimsbf
        }
      }
      CompressedSegmentLength = 45
      n = 0
      for( i=0; i<NumOfSegments; i++ ) {
        if( i == NumOfSegments - 1 ) {
          CompressedSegmentLength = NumOfFrames - n
        }
        CompressedSegment( )                                        bslbf
        n += CompressedSegmentLength
      }
    }
  }
}
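In the compressed branch, every compressed segment except the last covers CompressedSegmentLength = 45 frames, and the last one takes the remaining NumOfFrames - n frames. The Python function below mirrors that loop so the resulting lengths can be checked; the function name and the error raised for a non-positive final length are additions for illustration, not part of the syntax.

```python
from typing import List

NOMINAL_SEGMENT_LENGTH = 45  # CompressedSegmentLength before the last segment


def compressed_segment_lengths(num_frames: int, num_segments: int) -> List[int]:
    """Mirror the syntax loop: 45 frames per segment, the remainder in the last one."""
    lengths: List[int] = []
    n = 0
    for i in range(num_segments):
        length = NOMINAL_SEGMENT_LENGTH
        if i == num_segments - 1:
            length = num_frames - n  # last segment absorbs whatever is left
        if length <= 0:
            raise ValueError("NumOfFrames too small for NumOfSegments")
        lengths.append(length)
        n += length
    return lengths
```

Note that, as written in the syntax, the last segment can also be longer than 45 frames: 100 frames in 2 segments gives lengths 45 and 55.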

CompressedSegment {                                 Number of bits           Mnemonics
  num_frames = 0
  while( num_frames < CompressedSegmentLength ) {
    FrameSignature                                  608                      bslbf
    GOPLengthm1                                     ceil(ld(SegmentLength))  uimsbf
    PredictedPictures( )                                                     bslbf
    num_frames = num_frames + GOPLengthm1 + 1
  }
}
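Here ld denotes the base-2 logarithm, so GOPLengthm1 occupies ceil(log2(SegmentLength)) bits; for a 45-frame segment that is 6 bits. Each loop iteration covers one full 608-bit FrameSignature plus GOPLengthm1 predicted pictures. The sketch below computes the field width with exact integer arithmetic and walks a list of already decoded GOPLengthm1 values. Treating SegmentLength as the current CompressedSegmentLength, and rejecting GOPs that overrun the segment, are assumptions; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def gop_length_field_bits(segment_length: int) -> int:
    """Bits of GOPLengthm1: ceil(ld(SegmentLength)), computed exactly on integers."""
    if segment_length < 1:
        raise ValueError("segment length must be positive")
    # For n >= 1, (n - 1).bit_length() == ceil(log2(n)), with no floating-point rounding.
    return (segment_length - 1).bit_length()


def gop_layout(segment_length: int,
               gop_lengths_m1: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (first_frame, frame_count) per GOP, following the CompressedSegment loop."""
    layout: List[Tuple[int, int]] = []
    num_frames = 0
    for m1 in gop_lengths_m1:
        if num_frames >= segment_length:
            raise ValueError("more GOPs than the segment needs")
        layout.append((num_frames, m1 + 1))  # one full signature + m1 predicted frames
        num_frames += m1 + 1
    if num_frames != segment_length:
        raise ValueError("GOPs do not cover the segment exactly")
    return layout
```

A 6-bit field can hold values up to 63, which covers every GOPLengthm1 value (at most 44) that a 45-frame segment can need.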

PredictedPictures {                                 Number of bits                                  Mnemonics
  decoded_el = 0
  num_el = GOPLengthm1 × 380
  while( decoded_el < num_el ) {
    ZeroRL                                          variable (see Exp-Golomb coding in 11.4.9.1)    bslbf
    decoded_el = decoded_el + ZeroRL
    if( decoded_el == num_el ) then break
    NonZeroSymbol                                   1
    decoded_el = decoded_el + 1
  }
}
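The PredictedPictures loop is a zero run-length scheme: each ZeroRL value counts zero-valued elements, and a run that does not reach the end is followed by a one-bit NonZeroSymbol for the next element. The sketch below decodes that structure from an MSB-first bitstream. It assumes ZeroRL uses order-0 Exp-Golomb codes (the exact definition is in 11.4.9.1, which is not reproduced here) and returns only the positions of non-zero elements with their symbol bits; how those symbols update the preceding FrameSignature is not covered by this excerpt. `BitReader` and the function names are hypothetical.

```python
from typing import List, Tuple

ELEMENTS_PER_FRAME = 380


class BitReader:
    """Reads bits most significant bit first, as for uimsbf/bslbf fields."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current bit position

    @classmethod
    def from_bits(cls, bits: str) -> "BitReader":
        """Build a reader from a '0'/'1' string, zero-padded to whole bytes."""
        padded = bits + "0" * (-len(bits) % 8)
        return cls(bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8)))

    def read_bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            if self._pos >= 8 * len(self._data):
                raise EOFError("bitstream exhausted")
            bit = (self._data[self._pos // 8] >> (7 - self._pos % 8)) & 1
            value = (value << 1) | bit
            self._pos += 1
        return value


def read_exp_golomb(reader: BitReader) -> int:
    """Order-0 Exp-Golomb (assumed): k leading zeros, a 1, then k info bits."""
    leading_zeros = 0
    while reader.read_bits(1) == 0:
        leading_zeros += 1
    return (1 << leading_zeros) - 1 + reader.read_bits(leading_zeros)


def decode_predicted_pictures(reader: BitReader,
                              gop_length_m1: int) -> List[Tuple[int, int]]:
    """Follow the PredictedPictures loop; return (element_index, NonZeroSymbol) pairs."""
    num_el = gop_length_m1 * ELEMENTS_PER_FRAME
    decoded_el = 0
    nonzero: List[Tuple[int, int]] = []
    while decoded_el < num_el:
        decoded_el += read_exp_golomb(reader)  # ZeroRL
        if decoded_el == num_el:
            break
        if decoded_el > num_el:
            raise ValueError("zero run overruns the GOP")
        nonzero.append((decoded_el, reader.read_bits(1)))  # NonZeroSymbol
        decoded_el += 1
    return nonzero
```

An element index `k` returned by the decoder refers to element `k % 380` of predicted picture `k // 380` within the GOP.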

WordsPerFrame = 5
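As a worked example of the syntax above, the sketch below adds up the field sizes of one spatial region coded with CompressionFlag = 0. Without optional media times, each frame costs 1 + 8 + 5 × 8 + 608 = 657 bits and each segment 32 + 32 + 1 + 5 × 243 = 1280 bits. The function counts only the fields listed in the syntax table, ignores any byte alignment or container overhead (which this excerpt does not address), and simplifies by applying each media-time flag uniformly; its name and parameters are hypothetical.

```python
from typing import Iterable

WORDS_PER_FRAME = 5


def uncompressed_region_bits(num_frames: int, num_segments: int,
                             has_location: bool = False,
                             region_media_time: bool = False,
                             segment_media_time: bool = False,
                             frame_media_time: bool = False) -> int:
    """Bits for one spatial region with CompressionFlag == 0, per the 11.4.3 table."""
    bits = 1                              # SpatialLocationFlag
    if has_location:
        bits += 2 * (16 + 16)             # two (PixelX, PixelY) pairs
    bits += 32 + 32 + 16 + 1              # StartFrame, NumOfFrames, MediaTimeUnit, flag
    if region_media_time:
        bits += 32 + 32                   # Start/EndMediaTimeOfSpatialRegion
    bits += 32                            # NumOfSegments
    segment = 32 + 32 + 1 + WORDS_PER_FRAME * 243  # frames, flag, BagOfWords
    if segment_media_time:
        segment += 32 + 32
    bits += num_segments * segment
    bits += 1                             # CompressionFlag
    frame = 1 + 8 + WORDS_PER_FRAME * 8 + 608      # flag, confidence, words, signature
    if frame_media_time:
        frame += 32
    bits += num_frames * frame
    return bits


def descriptor_bits(region_bits: Iterable[int]) -> int:
    """Add the 32-bit NumOfSpatialRegions field to the per-region totals."""
    return 32 + sum(region_bits)
```

For instance, a region with a spatial location, 2 segments and 90 frames comes to 179 + 2 × 1280 + 90 × 657 = 61869 bits under these assumptions.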
11.4.4 Descriptor Component Semantics
NumOfSpatialRegions
This field, which is only present in the binary syntax, specifies the number of spatial regions in the video.
SpatialLocationFlag
This field, which is only present in the binary syntax, indicates the presence of the PixelX, Pix
...
