Information technology — Use of biometrics in video surveillance systems — Part 4: Ground truth and video annotation procedure

This document establishes requirements for the annotation of humans, human faces and other body parts, and arbitrary objects appearing in imagery. It specifies metadata to be inserted in a video stream; the encoding of full and partial spatial and temporal ground truth information for objects present in, and objects absent from, a video; and separate annotation procedures for known and unknown subjects. It does not specify the encoding of video data.


General Information

Status: Published
Publication date: 21-Jun-2021
Current stage: 6060 - International Standard published
Start date: 22-Jun-2021
Due date: 04-Jan-2021
Completion date: 22-Jun-2021

ISO/IEC 30137-4:2021 - Information technology -- Use of biometrics in video surveillance systems (standard, English, 18 pages)
ISO/IEC PRF 30137-4 (draft version of 08 May 2021) - Information technology -- Use of biometrics in video surveillance systems (English, 18 pages)

Standards Content (Sample)

INTERNATIONAL ISO/IEC
STANDARD 30137-4
First edition
2021-06
Information technology — Use of
biometrics in video surveillance
systems —
Part 4:
Ground truth and video annotation
procedure
Reference number
ISO/IEC 30137-4:2021(E)
© ISO/IEC 2021

---------------------- Page: 1 ----------------------
ISO/IEC 30137-4:2021(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii © ISO/IEC 2021 – All rights reserved


Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Conformance
6 Encoding of information supporting annotations
6.1 Overview
6.2 Region annotation
6.2.1 Content
6.2.2 Encoding of a bounding box
6.2.3 Encoding of a polygonal region
6.3 Encoding of object class information
6.4 Encoding of object information
6.4.1 Generic object information
6.4.2 Encoding of human subject metadata
6.5 Encoding of an annotation
6.6 Encoding of frame timestamps
6.7 Encoding of frames and intervals
6.8 Encoding of a track
6.9 Encoding of imaging system information
7 Annotation of one video sequence
7.1 Overview
7.2 Annotation of tracks in video sequence
7.3 Annotation of absence in video sequence
7.4 Annotation of counting information
Annex A (normative) ISO/IEC 30137-4 XSD Schema
Bibliography

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents) or the IEC
list of patent declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www.iso.org/iso/foreword.html. In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 37, Biometrics.
A list of all parts in the ISO/IEC 30137 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html and www.iec.ch/national-committees.

Introduction
Considerable improvements in the performance of automated face recognition (AFR) have resulted
in applications such as automated border controls, where facial images encoded in ePassports
are compared with the face presented by a traveller at a control point. The success of these first
generation AFR systems has encouraged suppliers to consider other applications, where the subject
is not necessarily aware of the use of biometric comparison and where the environment for collection
of images can be far from optimal. The lower performance in such less-controlled identification
applications can necessitate greater involvement by trained personnel.
The ISO/IEC 30137 series provides guidance on the use of biometric technologies (primarily automated
face recognition) in video surveillance systems (VSS) for several scenarios, including real-time
operation against watchlists and post-event analysis of video data. The ISO/IEC 30137 series includes
guidance on the selection and placement of cameras through to system specification, testing and
maintenance. The ISO/IEC 30137 series uses the term VSS in place of the older, but still commonly used,
term closed circuit television (CCTV).
The ISO/IEC 30137 series addresses the annotation of human beings. It is not intended to provide for
annotation of non-human objects such as cars, animals, or luggage.
Records conformant to this document can be produced from video in either of the following ways:
— automatically, in which software analyses video and estimates quantities defined in this document,
or
— manually, in which human reviewers annotate video with the goal of producing ground truth video
annotation, which can be used by a receiving system (i.e. any service or device that decodes,
interprets and uses standardized data).
This supports several applications, including:
— People counting:
— stating of the number of people present in a location,
— stating of the number of people traversing a given point or volume,
— stating of population density (e.g. in crowds),
— measurement of crowd densities,
— performance of crowd behavioural analyses.
— Automated detection and tracking:
— automated enrolment (addition) of subjects to a watchlist, exhaustively or after behavioural
analysis,
— detection of subjects, and parts of subjects (e.g. faces),
— tracking of subjects through time, e.g. following motion in a single video,
— tracking of subjects appearing through camera networks, including cases where a subject is
viewed simultaneously by different cameras, and cases where the subject appears sequentially
before several cameras,
— re-identification, the process of connecting an identity of a subject across two or more video
sequences.
— Automated identification:
— in law enforcement, looking for subjects of interest present on watchlists (negative identification,
blacklists),
— in law enforcement, applications in review of post-event VSS video from one or multiple cameras
against watchlists,
— in private commercial settings, looking for individuals to be given preferential service,
— identification of cooperative enrolled subjects (positive access control, whitelists).
This document includes annotation of the following information:
— Imaging type: single camera, sequential cameras, stereo cameras, combination, camera capture
spectrum.
— When the subject appears in the video (start time) and when they leave (end time).
— Brief description of the subject (what can be seen in the video?).
— Where and when the face of the subject appears.
— Brief description of the face (pose, orientation, expression, occlusion).
— Intermediate tracking points between the start and end times, for subject and face.
— Absolute description of the subject:
— estimated age, sex,
— hair and eye colour,
— estimated height and corpulence,
— clothing and clothing colour,
— glasses/hat,
— best subject image or best subject face image.
— Subject interactions with other subjects and groups.
— Subject interactions with other video elements (bag, car, etc.).
— Known identity of the subject.
— The presence of other subjects who are not annotated.
— Regions of interest, outside of which an algorithm or receiving system would not operate.
— Absence: Where items of interest, including subjects, are known to be absent.
Standardized annotation supports evaluation, research and development, and operational deployment.
INTERNATIONAL STANDARD ISO/IEC 30137-4:2021(E)
Information technology — Use of biometrics in video
surveillance systems —
Part 4:
Ground truth and video annotation procedure
1 Scope
This document establishes requirements for the annotation of humans, human faces and other body
parts, and arbitrary objects appearing in imagery. It specifies the following:
— metadata to be inserted in a video stream;
— encoding of full and partial spatial and temporal ground truth information for:
— objects present in a video, and
— objects absent from a video;
— separate annotation procedures for known and unknown subjects.
This document does not specify:
— encoding of video data.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at http://www.electropedia.org/
3.1
annotation
process of generating annotation data from imagery
3.2
annotation data
metadata associated with a subject traversing the field of view of a specific VSS camera
Note 1 to entry: An annotator preparing instances in accordance with this document should document the
criteria under which a subject annotation was made. For example, it can be policy to not annotate faces for which
interocular distance is below 12 pixels.
Note 2 to entry: If annotations are made by following a strict, tightly constrained or narrow set of criteria, then
detection, tracking or recognition algorithms are expected to be more accurate than if more permissive or general
criteria had been used.

Note 3 to entry: An evaluation of a tracking algorithm, for example, might exclude subjects that traverse the scene
in a non-conformant way. This could include factors such as the subject’s direction of travel, obscuration by other people
or objects, operational functionalities of the camera (such as correct focus) or environmental conditions (e.g.
operation during night or day).
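The face-size policy given as an example in Note 1 to entry 3.2 can be applied mechanically when eye positions are known. The following Python sketch assumes eye centres are available as (x, y) pixel coordinates and uses the 12-pixel figure from that example only as an illustrative default; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Illustrative default taken from the example policy in Note 1 to entry 3.2.
MIN_INTEROCULAR_PX = 12.0


def interocular_distance(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance, in pixels, between the two eye centres."""
    return math.dist(left_eye, right_eye)


def should_annotate_face(left_eye: Point, right_eye: Point,
                         min_distance: float = MIN_INTEROCULAR_PX) -> bool:
    """Return False for faces whose interocular distance is below the policy threshold."""
    return interocular_distance(left_eye, right_eye) >= min_distance


print(should_annotate_face((100, 50), (109, 50)))  # False: eyes are 9 px apart
print(should_annotate_face((200, 80), (209, 92)))  # True: eyes are 15 px apart
```

Whatever threshold is chosen, it is the criterion that Note 1 asks the annotator to document alongside the annotation data.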
3.3
bounding box
rectangular region enclosing annotated object
Note 1 to entry: The major and minor axes of the rectangle are parallel to the edges of the image. For rotated
boxes, a bounding polygon is to be used instead.
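A rotated box, which Note 1 to entry 3.3 directs to polygon annotation, can be converted into the four vertices of a bounding polygon. The sketch below assumes the box is described by its centre, width, height and a rotation angle in degrees; that parameterization and the function name are illustrative assumptions, not part of this document. In image coordinates (y pointing down), a positive angle turns the box clockwise on screen.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotated_box_to_polygon(cx: float, cy: float, width: float, height: float,
                           angle_deg: float) -> List[Point]:
    """Corners of a width x height rectangle centred on (cx, cy), rotated by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    half_w, half_h = width / 2.0, height / 2.0
    # Corners relative to the centre, listed in order around the rectangle.
    offsets = [(-half_w, -half_h), (half_w, -half_h), (half_w, half_h), (-half_w, half_h)]
    # Rotate each offset with the standard 2-D rotation matrix, then translate.
    return [(cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t)
            for dx, dy in offsets]


corners = rotated_box_to_polygon(50, 50, 40, 20, 90)
print([(round(x, 6), round(y, 6)) for x, y in corners])
# [(60.0, 30.0), (60.0, 70.0), (40.0, 70.0), (40.0, 30.0)]
```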
3.4
bounding polygon
arbitrary region enclosing annotated object
3.5
video surveillance system
system consisting of camera equipment, monitoring and associated equipment for transmission and
controlling purposes, which can be necessary for the surveillance of a protected area
3.6
random access
ability to access arbitrary parts of a media item
3.7
recognition
process of assigning a biometric identifier to a subject
3.8
identification
process of determining a subject’s identity by comparing imagery of a biometric mode against a
database formed from imagery of individuals
Note 1 to entry: This generally does not include assigning an identifier when the target subject is not found in the
database.
4 Abbreviated terms
AFR automated face recognition
ROI region of interest
VSS video surveillance system
5 Conformance
A biometric data record conforms to this document if it satisfies all normative requirements related to:
— its semantic requirements,
— its encoding requirements for structure, data values, and the relationships between its data
elements, as specified throughout Clauses 6 and 7 and Annex A for the biometric record format of
this document, and
— the relationship between its data values and the input biometric data from which the biometric data
record was generated.

6 Encoding of information supporting annotations
6.1 Overview
The following subclauses define encodings used in the full annotation of video clips, as detailed in
Clause 7.
6.2 Region annotation
6.2.1 Content
An annotation of a body or body part shall enclose the region it occupies. An exception applies to a human face,
which may be annotated using anthropometric landmarks instead of, or in addition to, a bounding
region.
A region annotation should be as precise as possible without adding an arbitrary margin around the
object.
NOTE An object recognition algorithm might need to adjust the amount of spatial margin in the annotated
region, depending on how it was trained and on its translational invariance.
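Because annotated regions carry no arbitrary margin, any margin an algorithm needs is added by the consumer. A minimal Python sketch, assuming a box given by its top-left corner, width and height in pixels and a hypothetical margin ratio applied to each side, clipped to the image bounds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels; (x, y) is the top-left corner (assumed convention)."""
    x: float
    y: float
    width: float
    height: float


def add_margin(box: Box, ratio: float, image_width: int, image_height: int) -> Box:
    """Grow a tight box by `ratio` of its size on each side, clipped to the image."""
    dx, dy = box.width * ratio, box.height * ratio
    left = max(0.0, box.x - dx)
    top = max(0.0, box.y - dy)
    right = min(float(image_width), box.x + box.width + dx)
    bottom = min(float(image_height), box.y + box.height + dy)
    return Box(left, top, right - left, bottom - top)


tight = Box(x=100, y=40, width=50, height=60)
print(add_margin(tight, 0.25, image_width=1920, image_height=1080))
# Box(x=87.5, y=25.0, width=75.0, height=90.0)
```

Keeping the stored annotation tight and applying the margin at use time lets differently trained algorithms share the same ground truth.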
If an object appears as two or more separated parts due to occlusion, two or more polygonal regions
may be used. In this case, the polygonal regions shall be linked together by using a common identifier.
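The linking requirement for the parts of an occluded object can be illustrated as follows. The sketch assumes each polygonal region carries a string identifier (the field name object_id and the identifier values are hypothetical) and shows how a receiving system might reassemble the separated parts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class PolygonRegion:
    object_id: str         # hypothetical field: the common identifier shared by all parts
    vertices: List[Point]  # polygon vertices in pixel coordinates


def group_parts(regions: List[PolygonRegion]) -> Dict[str, List[PolygonRegion]]:
    """Collect the separated parts of each object under its shared identifier."""
    grouped: Dict[str, List[PolygonRegion]] = defaultdict(list)
    for region in regions:
        grouped[region.object_id].append(region)
    return dict(grouped)


# A person split in two by a lamp post, plus an unoccluded face.
regions = [
    PolygonRegion("person-7", [(10, 10), (30, 10), (30, 80), (10, 80)]),
    PolygonRegion("face-2", [(200, 40), (230, 40), (230, 75), (200, 75)]),
    PolygonRegion("person-7", [(40, 10), (60, 10), (60, 80), (40, 80)]),
]
parts = group_parts(regions)
print({key: len(value) for key, value in parts.items()})  # {'person-7': 2, 'face-2': 1}
```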
6.2.2 Encoding of a bounding box
Bounding boxes are the simplest mechanism for spatial annotation. They are rectangles whose major
and minor axes are parallel to the image axes. They shall be encoded according to Table 1.
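The general idea of serializing a box can be sketched with Python's standard xml.etree.ElementTree module. The element names boundingBox, x, y, width and height used below are hypothetical placeholders chosen for illustration; the normative names and structure are those given in Table 1 and Annex A.

```python
import xml.etree.ElementTree as ET


def encode_bounding_box(x: int, y: int, width: int, height: int) -> str:
    """Serialize an axis-aligned box as an XML fragment.

    Element names are hypothetical placeholders; the normative structure
    is the one given in Table 1 and Annex A.
    """
    if width <= 0 or height <= 0:
        raise ValueError("a bounding box needs a positive width and height")
    box = ET.Element("boundingBox")
    for name, value in (("x", x), ("y", y), ("width", width), ("height", height)):
        ET.SubElement(box, name).text = str(value)
    return ET.tostring(box, encoding="unicode")


print(encode_bounding_box(100, 40, 50, 60))
# <boundingBox><x>100</x><y>40</y><width>50</width><height>60</height></boundingBox>
```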
Table 1 — XSD schema for encoding of bounding box information
(XSD listing not preserved. Surviving fragments: an import with schemaLocation="iso-iec-39794-5-ed-1-v1.xsd"; the root element annotation "This is the root element of the 30137-4 data structure."; and an element declared with maxOccurs="unbounded".)
6.2.3 Encoding of a polygonal region
Polygonal regions are the secondary mechanism for spatial annotation. They are available for
annotation of objects that cannot be adequately localized, contained or demarcated by a bounding box.
Bounding polygons shall be encoded according to Table 2.
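One way to judge whether a bounding box would "adequately" contain an object is to compare the polygon's area with that of its enclosing box. The sketch computes the polygon area with the shoelace formula and the smallest axis-aligned box containing the polygon; it assumes vertices are listed in order around a simple (non-self-intersecting) polygon. The fill-ratio heuristic is illustrative only and is not defined by this document.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula; vertices in order (either winding) around a simple polygon."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def enclosing_box(vertices: List[Point]) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x, y, width, height) containing the polygon."""
    xs = [p[0] for p in vertices]
    ys = [p[1] for p in vertices]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# A diamond, e.g. a tilted object, fills only half of its enclosing box.
diamond = [(50, 0), (100, 50), (50, 100), (0, 50)]
_, _, w, h = enclosing_box(diamond)
print(polygon_area(diamond) / (w * h))  # 0.5
```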
Table 2 — XSD schema for encoding of polygon information
(XSD listing not preserved. Surviving fragment: an element of type="CartesianCoordinateListType".)
6.3 Encoding of object class information
This subclause specifies the encoding of the kind of object annotated, which is referred to as its class. For biometric
modalities, the class indicates the body part, such as a face, an ear, or a whole body. For other, generally
non-human, objects, the class indicates a noun such as car or suitcase. The encoded data shall identify
which body part or object is annotated according to Table 3. In cases where multiple modalities appear
in one annotated region (e.g. face and ear), the encoded data shall represent at least one object; the
encoding supports annotation data for multiple objects.
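A receiving system can validate class values against the enumeration in Table 3 and enforce the "at least one object" rule. The Python Enum below lists the values that appear in that enumeration; the Python member names, the function name and the use of ValueError for unknown or missing values are assumptions made for illustration.

```python
from enum import Enum
from typing import Iterable, List


class ObjectClass(Enum):
    """Object class values appearing in the Table 3 enumeration."""
    FACE = "face"
    MOUTH = "mouth"
    SINGLE_IRIS = "singleIris"
    BOTH_IRIDES = "bothIrides"
    EAR = "ear"
    TORSO = "torso"
    FINGERPRINT_SINGLE = "fingerPrintSingle"
    FINGERPRINT_TWO = "fingerPrintTwo"
    FINGERPRINT_FOUR = "fingerPrintFour"
    FINGERPRINT_FOUR_AND_THUMB = "fingerPrintFourAndThumb"
    PALM = "palm"
    BACK_OF_HAND = "backOfHand"
    LOWER_ARM = "lowerArm"
    LEGS = "legs"
    FULL_BODY = "fullBody"
    PERSON = "person"


def parse_object_classes(values: Iterable[str]) -> List[ObjectClass]:
    """Validate the classes attached to one region; at least one is required."""
    classes = [ObjectClass(v) for v in values]  # an unknown string such as "car" raises ValueError
    if not classes:
        raise ValueError("an annotated region must name at least one object class")
    return classes


print([c.name for c in parse_object_classes(["face", "ear"])])  # ['FACE', 'EAR']
```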
Table 3 — XSD schema for encoding of object class information
(XSD listing not preserved. Surviving fragments: an element declared with minOccurs="0", and the following enumeration of object class values: face, mouth, singleIris, bothIrides, ear, torso, fingerPrintSingle, fingerPrintTwo, fingerPrintFour, fingerPrintFourAndThumb, palm, backOfHand, lowerArm, legs, fullBody, person.)
6.4 Encoding of object information
6.4.1 Generic object information
Object information shall be encoded according to Table 4.
Table 4 — XSD schema for encoding of object information
(XSD listing not preserved.)
EXAMPLE Ambient conditions such as illumination spectrum can be encoded in the data element field of
user-defined subject information.
6.4.2 Encoding of human subject metadata
Subject-specific information shall be encoded according to Table 5. Additional non-standard data may
be included using the encoding of Table 6.
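Non-standard data, such as the illumination spectrum mentioned in the EXAMPLE in 6.4.1, is most useful when a receiving system can pick out the entries it understands and leave the rest untouched. The sketch below models this with hypothetical Python classes; the owner field, the key names and the estimated_age stand-in are illustrative assumptions, not the structure defined in Table 5 or Table 6.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExtensionEntry:
    owner: str  # hypothetical: organization that defined the entry
    key: str
    value: str


@dataclass
class SubjectMetadata:
    estimated_age: Optional[int] = None  # hypothetical stand-in for a Table 5 field
    extensions: List[ExtensionEntry] = field(default_factory=list)


def extensions_for(meta: SubjectMetadata, owner: str) -> Dict[str, str]:
    """Return the extension entries defined by `owner`, ignoring everyone else's."""
    return {e.key: e.value for e in meta.extensions if e.owner == owner}


meta = SubjectMetadata(
    estimated_age=35,
    extensions=[
        ExtensionEntry("lab-a", "illuminationSpectrum", "near-infrared"),
        ExtensionEntry("vendor-b", "trackerConfidence", "0.82"),
    ],
)
print(extensions_for(meta, "lab-a"))  # {'illuminationSpectrum': 'near-infrared'}
```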
Table 5 — XSD schema for encoding of human subject-specific information
(XSD listing not preserved.)
Table 6 — XSD schema for encoding of extended/proprietary subject metadata
(XSD listing not preserved.)
6.5 Encoding of an annotation
Regions around objects appearing in two-dimensional video frames, or still images, shall be encoded
according to Table 7.
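Ground-truth annotations are commonly compared with algorithm output during evaluation. One widely used measure, the intersection over union (IoU) of two boxes, is sketched below; this document does not prescribe IoU, and the Annotation fields (frame index, class string and box) are a hypothetical flattening of the structure in Table 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    frame: int          # hypothetical: zero-based frame index
    object_class: str   # e.g. "face", one of the Table 3 values
    x: float            # top-left corner, pixels (assumed convention)
    y: float
    width: float
    height: float


def iou(a: Annotation, b: Annotation) -> float:
    """Intersection over union of two boxes; boxes in different frames never overlap."""
    if a.frame != b.frame:
        return 0.0
    ix = max(0.0, min(a.x + a.width, b.x + b.width) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.height, b.y + b.height) - max(a.y, b.y))
    inter = ix * iy
    union = a.width * a.height + b.width * b.height - inter
    return inter / union if union > 0 else 0.0


truth = Annotation(12, "face", 100, 40, 50, 60)
detection = Annotation(12, "face", 110, 40, 50, 60)
print(round(iou(truth, detection), 3))  # 0.667
```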
Table 7 — XSD schema for encoding of an annotation
(XSD listing not preserved.)
...

INTERNATIONAL ISO/IEC
STANDARD 30137-4
First edition
Information technology — Use of
biometrics in video surveillance
systems —
Part 4:
Ground truth and video annotation
procedure
PROOF/ÉPREUVE
Reference number
ISO/IEC 30137-4:2021(E)
©
ISO/IEC 2021

---------------------- Page: 1 ----------------------
ISO/IEC 30137-4:2021(E)

COPYRIGHT PROTECTED DOCUMENT
© ISO/IEC 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii PROOF/ÉPREUVE © ISO/IEC 2021 – All rights reserved

---------------------- Page: 2 ----------------------
ISO/IEC 30137-4:2021(E)

Contents Page
Foreword .iv
Introduction .v
1 Scope . 1
2 Normative references . 1
3 Terms and definitions . 1
4 Abbreviated terms . 2
5 Conformance . 2
6 Encoding of information supporting annotations . 3
6.1 Overview . 3
6.2 Region annotation . 3
6.2.1 Content . 3
6.2.2 Encoding of a bounding box . 3
6.2.3 Encoding of a polygonal region . 4
6.3 Encoding of object class information . 4
6.4 Encoding of object information . 6
6.4.1 Generic object information . 6
6.4.2 Encoding of human subject metadata . 7
6.5 Encoding of an annotation . 7
6.6 Encoding of frame timestamps . 8
6.7 Encoding of frames and intervals . 8
6.8 Encoding of a track . 8
6.9 Encoding of imaging system information . 9
7 Annotation of one video sequence .10
7.1 Overview .10
7.2 Annotation of tracks in video sequence .10
7.3 Annotation of absence in video sequence .10
7.4 Annotation of counting information .11
Annex A (normative) ISO/IEC 30137-4 XSD Schema .12
Bibliography .18
© ISO/IEC 2021 – All rights reserved PROOF/ÉPREUVE iii

---------------------- Page: 3 ----------------------
ISO/IEC 30137-4:2021(E)

Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives or www .iec .ch/ members
_experts/ refdocs).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www .iso .org/ patents) or the IEC
list of patent declarations received (see patents.iec.ch).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www .iso .org/
iso/ foreword .html. In the IEC, see www .iec .ch/ understanding -standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 37, Biometrics.
A list of all parts in the ISO/IEC 30137 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/ members .html and www .iec .ch/ national
-committees.
iv PROOF/ÉPREUVE © ISO/IEC 2021 – All rights reserved

---------------------- Page: 4 ----------------------
ISO/IEC 30137-4:2021(E)

Introduction
Considerable improvements in the performance of automated face recognition (AFR) have resulted in
applications such as automated border control, where facial images encoded in ePassports are compared
with the face presented by a traveller at a control point. The success of these first generation AFR
systems has encouraged suppliers to consider other applications, where the subject is not necessarily
aware of the use of biometric comparison and where the environment for collection of images can
be far from optimal. The inferior performance in such less-controlled identification applications can
necessitate a greater involvement by trained personnel.
The ISO/IEC 30137 series provides guidance on the use of biometric technologies (primarily automated
face recognition) in video surveillance systems (VSS) for several scenarios, including real-time operation
against watchlists and post-event analysis of video data. The ISO/IEC 30137 series includes guidance on
the selection and placement of cameras through to system specification, testing and maintenance. The
ISO/IEC 30137 series uses the term VSS to replace the older, but commonly used, term, closed circuit
television (CCTV).
The ISO/IEC 30137 series addresses the annotation of human beings. It is not intended to provide for
annotation of non-human objects such as cars, animals, or luggage.
Records conformant to this document can be produced from video in either of the following ways:
— automatically, in which software analyses video and estimates quantities defined in this document,
or
— manually, in which human reviewers annotate video with a goal of producing ground truth video
annotation, which can be used by a receiving system (i.e. any service or device that decodes,
interprets and uses standardized data).
This supports several applications, including:
— People counting:
— stating of the number of people present in a location,
— stating of the number of people traversing a given point or volume,
— stating of population density (e.g. in crowds),
— measurement of crowd densities,
— performance of crowd behavioural analyses.
— Automated detection and tracking:
— automated enrolment (addition) of subjects to a watchlist, exhaustively or after behavioural
analysis,
— detection of subjects, and parts of subjects (e.g. faces),
— tracking of subjects through time, e.g. following motion in a single video,
— tracking of subjects appearing through camera networks, including cases where a subject is
viewed simultaneously by different cameras, and those where the subject appears sequentially
before several cameras,
© ISO/IEC 2021 – All rights reserved PROOF/ÉPREUVE v

---------------------- Page: 5 ----------------------
ISO/IEC 30137-4:2021(E)

— re-identification, the process of connecting an identity of a subject across two or more video
sequences.
— Automated identification:
— in law enforcement, looking for subjects of interest present on watchlists (negative identification,
blacklists),
— in law enforcement, applications in review of post-event VSS video from one or multiple cameras
against watchlists,
— in private commercial settings, looking for individuals to be given preferential service,
— identification of cooperative enrolled subjects (positive access control, whitelists).
The standard includes annotation of the following information:
— Imaging type: single camera, sequential cameras, stereo cameras, combination, camera capture
spectrum.
— When the subject appears in the video (start time) and when they leave (end time).
— Brief description of the subject (what can be seen in the video?).
— Where and when the face of the subject appears.
— Brief description of the face (pose, orientation, expression, occlusion).
— Intermediate tracking points between the start and end times, for subject and face.
— Absolute description of the subject:
— estimated age, sex,
— hair and eye colour,
— estimated height and corpulence,
— clothing and clothing colour,
— glasses/hat,
— best subject image or best subject face image.
— Subject interactions with other subjects and groups.
— Subject interactions with other video elements (bag, car, etc.).
— Known identity of the subject.
— The presence of other subjects who are not annotated.
— Regions of interest, outside of which an algorithm or receiving system would not operate.
— Absence: Where items of interest, including subjects, are known to be absent.
Standardized annotation supports: evaluation, research and development, and operational deployment.
vi PROOF/ÉPREUVE © ISO/IEC 2021 – All rights reserved

---------------------- Page: 6 ----------------------
INTERNATIONAL STANDARD ISO/IEC 30137-4:2021(E)
Information technology — Use of biometrics in video
surveillance systems —
Part 4:
Ground truth and video annotation procedure
1 Scope
This document establishes requirements for the annotation of humans, human faces and other body
parts, and arbitrary objects appearing in imagery. It specifies the following:
— metadata to be inserted in a video stream;
— encoding of full and partial spatial and temporal ground truth information for:
— objects present in a video, and
— objects absent in a video;
— procedures for different annotation of known and unknown subjects.
This document does not specify:
— encoding of video data.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https:// www .iso .org/ obp
— IEC Electropedia: available at http:// www .electropedia .org/
3.1
annotation
process of generating annotation data from imagery
3.2
annotation data
metadata associated with a subject traversing the field of view of a specific VSS camera
Note 1 to entry: An annotator preparing instances in accordance with this document should document the
criteria under which a subject annotation was made. For example, it can be policy to not annotate faces for which
interocular distance is below 12 pixels.
Note 2 to entry: If annotations are made by following a strict, tightly constrained or narrow set of criteria, then
detection, tracking, recognition or algorithm is expected to be more accurate than if more permissive or general
criteria has been used.
© ISO/IEC 2021 – All rights reserved PROOF/ÉPREUVE 1

---------------------- Page: 7 ----------------------
ISO/IEC 30137-4:2021(E)

Note 3 to entry: An evaluation of, a tracking algorithm, for example, might exclude subjects that traverse in a non-
conformant way. This could include factors such as the subject’s direction of travel, obscuration by other people
or objects, operational functionalities of the camera (such as correct focus) or environmental conditions (e.g.
operation during night or day).
3.3
bounding box
rectangular region enclosing annotated object
Note 1 to entry: The major and minor axes of the rectangle are parallel to the edges of the images. For rotated
boxes, the polygon annotation is to be used.
3.4
bounding polygon
arbitrary region enclosing annotated object
3.5
video surveillance system
system consisting of camera equipment, monitoring and associated equipment for transmission and
controlling purposes, which can be necessary for the surveillance of a protected area
3.6
random access
ability to access arbitrary parts of a media item
3.7
recognition
process of assigning a biometric identifier to a subject
3.8
identification
process of determining a subject’s identity by comparing imagery of a biometric mode against a
database formed from imagery of individuals
Note 1 to entry: This generally does not include assigning an identifier when the target subject is not found in the
database.
4 Abbreviated terms
AFR automated facial recognition
ROI region of interest
VSS video surveillance system
5 Conformance
A biometric data record conforms to this document if it satisfies all normative requirements related to:
— its semantic requirements,
— its encoding requirements for structure, data values, and the relationships between its data
elements, as specified throughout Clauses 6 and 7 and Annex A for the biometric record format of
this document, and
— the relationship between its data values and the input biometric data from which the biometric data
record was generated.

6 Encoding of information supporting annotations
6.1 Overview
The following subclauses define encodings used in the full annotation of video clips, as detailed in
Clause 7.
6.2 Region annotation
6.2.1 Content
An annotation of a body or body part shall enclose the region it occupies. An exception applies to a human face,
which may be annotated using anthropometric landmarks instead of, or in addition to, a bounding
region.
A region annotation should be as precise as possible without adding an arbitrary margin around the
object.
NOTE An object recognition algorithm might need to adjust the spatial margin around the annotated
region, depending on how it was trained and on its translational invariance.
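The distinction between a tight stored annotation and a margin added later by the consuming algorithm can be sketched as follows. The box layout (x_min, y_min, x_max, y_max) and both function names are assumptions for illustration only.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def tight_bounding_box(vertices: Iterable[Point]) -> Box:
    """Smallest axis-parallel box enclosing all vertices, with no added margin."""
    points = list(vertices)
    if not points:
        raise ValueError("at least one vertex is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def add_margin(box: Box, fraction: float, image_width: float, image_height: float) -> Box:
    """Grow a box by `fraction` of its width/height on each side, clipped to the image.

    This is an adjustment an algorithm might apply at use time; the stored
    annotation itself stays tight.
    """
    x_min, y_min, x_max, y_max = box
    dx = (x_max - x_min) * fraction
    dy = (y_max - y_min) * fraction
    return (max(0.0, x_min - dx), max(0.0, y_min - dy),
            min(image_width, x_max + dx), min(image_height, y_max + dy))
```

Keeping the margin out of the stored region lets each algorithm apply the padding it was trained with.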
If an object appears as two or more separated parts due to occlusion, two or more polygonal regions
may be used. In this case, the polygonal regions shall be linked together by using a common identifier.
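A minimal sketch of linking the parts of an occluded object, assuming each polygonal region carries a hypothetical object_id field that serves as the common identifier:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class PolygonRegion:
    object_id: str           # hypothetical common identifier shared by all parts of one object
    vertices: List[Point]


def group_by_object(regions: List[PolygonRegion]) -> Dict[str, List[PolygonRegion]]:
    """Collect the polygonal parts that belong to the same object."""
    groups: Dict[str, List[PolygonRegion]] = defaultdict(list)
    for region in regions:
        groups[region.object_id].append(region)
    return dict(groups)


def object_extent(parts: List[PolygonRegion]) -> Tuple[float, float, float, float]:
    """Axis-parallel extent (x_min, y_min, x_max, y_max) spanning all parts of one object."""
    xs = [x for part in parts for x, _ in part.vertices]
    ys = [y for part in parts for _, y in part.vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, a person partly hidden behind a pillar might be annotated as two polygons sharing the identifier "p1"; grouping them recovers one object whose overall extent spans both visible parts.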
6.2.2 Encoding of a bounding box
Bounding boxes are the simplest mechanism for spatial annotation. They are rectangles whose major
and minor axes are parallel to the image axes. They shall be encoded according to Table 1.
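As an illustration only, the sketch below serializes a bounding box with Python's standard xml.etree.ElementTree module. The element names (boundingBox, x, y, width, height) and the top-left-corner-plus-size layout are hypothetical placeholders; they are not the names or layout defined in Table 1.

```python
import xml.etree.ElementTree as ET


def bounding_box_element(x: int, y: int, width: int, height: int) -> ET.Element:
    """Build an XML element for an axis-parallel bounding box.

    Element names are hypothetical; a conforming encoder uses the schema of Table 1.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    box = ET.Element("boundingBox")
    for name, value in (("x", x), ("y", y), ("width", width), ("height", height)):
        ET.SubElement(box, name).text = str(value)
    return box


print(ET.tostring(bounding_box_element(120, 45, 64, 80), encoding="unicode"))
```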
Table 1 — XSD schema for encoding of bounding box information
[Schema body lost in extraction. Surviving fragments: an import with schemaLocation="iso-iec-39794-5-ed-1-v1.xsd"; a documentation annotation reading "This is the root element of the 30137-4 data structure."; and an element declared with maxOccurs="unbounded".]
6.2.3 Encoding of a polygonal region
Polygonal regions are the secondary mechanism for spatial annotation. They are available for
annotation of objects that cannot be adequately localized, contained or demarcated by a bounding box.
Bounding polygons shall be encoded according to Table 2.
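Before a polygonal region is encoded, simple sanity checks can catch degenerate annotations. This sketch assumes the polygon is an ordered list of (x, y) vertices in image coordinates. The checks themselves (at least three vertices, all vertices inside the image, non-zero area computed with the shoelace formula) are illustrative assumptions, not requirements of this document.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Unsigned area of a simple polygon using the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def is_valid_region_polygon(vertices: List[Point], image_width: float,
                            image_height: float) -> bool:
    """Hypothetical checks: >= 3 vertices, all inside the image, non-zero area."""
    if len(vertices) < 3:
        return False
    if any(not (0 <= x <= image_width and 0 <= y <= image_height) for x, y in vertices):
        return False
    return polygon_area(vertices) > 0.0
```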
Table 2 — XSD schema for encoding of polygon information
[Schema body lost in extraction. Surviving fragment: an element declared with type="CartesianCoordinateListType".]
6.3 Encoding of object class information
This subclause specifies how the kind of an annotated object, referred to as its class, is encoded. For biometric
modalities, the class indicates the body part, such as a face, an ear or a whole body. For other (generally
non-human) objects, the class is a noun such as car or suitcase. The encoded data shall identify
which body part or object is annotated according to Table 3. Where multiple modalities appear
in one annotated region (e.g. face and ear), the encoded data shall represent at least one of them. The encoding
supports annotation data for multiple objects.
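The body-part class values enumerated in Table 3 map naturally onto an enumeration. The sketch below assumes classes are encoded as exactly these strings; the class and function names are hypothetical. Non-human noun classes such as car are deliberately not covered, so this sketch rejects them.

```python
from enum import Enum
from typing import Iterable, List


class BodyPartClass(Enum):
    """Body-part class values as listed in the Table 3 enumeration."""
    FACE = "face"
    MOUTH = "mouth"
    SINGLE_IRIS = "singleIris"
    BOTH_IRIDES = "bothIrides"
    EAR = "ear"
    TORSO = "torso"
    FINGERPRINT_SINGLE = "fingerPrintSingle"
    FINGERPRINT_TWO = "fingerPrintTwo"
    FINGERPRINT_FOUR = "fingerPrintFour"
    FINGERPRINT_FOUR_AND_THUMB = "fingerPrintFourAndThumb"
    PALM = "palm"
    BACK_OF_HAND = "backOfHand"
    LOWER_ARM = "lowerArm"
    LEGS = "legs"
    FULL_BODY = "fullBody"
    PERSON = "person"


def parse_region_classes(values: Iterable[str]) -> List[BodyPartClass]:
    """Map encoded class strings to enum members (ValueError on unknown values).

    A region containing several modalities (e.g. face and ear) may list several
    classes, but at least one is required.
    """
    classes = [BodyPartClass(v) for v in values]
    if not classes:
        raise ValueError("an annotated region shall represent at least one object")
    return classes
```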
Table 3 — XSD schema for encoding of object class information
[Schema body lost in extraction. Surviving fragments: an element declared with minOccurs="0", and an enumeration of body-part class values: face, mouth, singleIris, bothIrides, ear, torso, fingerPrintSingle, fingerPrintTwo, fingerPrintFour, fingerPrintFourAndThumb, palm, backOfHand, lowerArm, legs, fullBody, person.]
6.4 Encoding of object information
6.4.1 Generic object information
Object information shall be encoded according to Table 4.
Table 4 — XSD schema for encoding of object information
EXAMPLE Ambient conditions such as illumination spectrum can be encoded in the data element field of
user-defined subject information.
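As an illustration of the EXAMPLE, ambient conditions could be carried as name/value pairs inside a user-defined data element. The container and element names below (userDefinedData, dataElement, name) are hypothetical and do not reproduce the schema of this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def user_defined_element(fields: Dict[str, str]) -> ET.Element:
    """Build a hypothetical container for user-defined (non-standard) name/value data."""
    container = ET.Element("userDefinedData")
    for key in sorted(fields):  # sorted for deterministic output
        item = ET.SubElement(container, "dataElement", name=key)
        item.text = fields[key]
    return container


element = user_defined_element({"illumination": "street lighting", "weather": "rain"})
print(ET.tostring(element, encoding="unicode"))
```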
6.4.2 Encoding of human subject metadata
Subject-specific information shall be encoded according to Table 5. Additional non-standard data may
be included using the encoding of Table 6.
Table 5 — XSD schema for encoding of human subject-specific information
Table 6 — XSD schema for encoding of extended/proprietary subject metadata
...