ISO/IEC 19795-10:2024
Information technology — Biometric performance testing and reporting — Part 10: Quantifying biometric system performance variation across demographic groups
This document establishes requirements for estimating and reporting on performance variations observed when cohorts belonging to different demographic groups engage with biometric enrolment and recognition systems. In this context, performance refers to failure-to-enrol rate, failure-to-acquire rate, shifts in comparison score, recognition error rates, and aspects of response and processing time (throughput). This document is applicable to the following: — demographic group membership; — using phenotypic measures; — reporting on tests; — stating statistical uncertainty estimates; — operational thresholds settings; — equitability; — procurement agency activities. This document also provides terms and definitions to be used when reporting performance variation across demographic groups. This document is applicable to: — technology evaluations of algorithms, subsystems and systems; — scenario evaluations of systems; — operational evaluations of fielded systems. Application of this document does not require detailed knowledge of a system’s algorithms but it does require specific knowledge of the demographic characteristics for the population of interest.
Technologies de l'information — Essais et rapports de performance biométriques — Partie 10: Quantification de la variation des performances du système biométrique selon les groupes démographiques
International Standard
ISO/IEC 19795-10
First edition
2024-10
Information technology — Biometric performance testing and reporting —
Part 10: Quantifying biometric system performance variation across demographic groups
Technologies de l'information — Essais et rapports de performance biométriques —
Partie 10: Quantification de la variation des performances du système biométrique selon les groupes démographiques
Reference number
© ISO/IEC 2024
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Conformance
5 Planning the evaluation
5.1 Identifying the scope of the evaluation
5.2 Demographic variables
5.2.1 Ground truth requirements
5.2.2 Categorical demographic variables
5.2.3 Continuous demographic variables
5.2.4 Other demographic variables
6 Executing the evaluation
6.1 Generation of mated comparison and identification trials
6.2 Generation of non-mated comparison and identification trials
6.2.1 General
6.2.2 Verification (1:1)
6.2.3 Identification (1:N)
6.3 Selection of a threshold
6.4 Calculating differential performance based on categorical variables for two specific demographic groups
6.4.1 General
6.4.2 Differential performance between two groups based on mathematical difference
6.4.3 Differential performance between two groups based on mathematical ratio
6.5 Calculating differential performance based on categorical variables for more than two groups
6.5.1 General
6.5.2 Differential performance for more than two groups based on the largest error rate relative to the geometric mean
6.5.3 Differential performance for more than two groups based on the Gini coefficient
6.6 Calculating differential performance in identification trials
6.7 Calculating demographic differentials for failure-to-enrol rate, failure-to-acquire rate and transaction duration
6.8 Calculating demographic differentials for continuous variables
6.9 Comparison score differential measures
6.10 Calculating uncertainty
6.10.1 Uncertainty in demographic differentials
6.10.2 Sampling the target population
6.10.3 Sample size requirements
7 Reporting the evaluation results
7.1 Reporting the experimental design
7.2 Reporting the target application
7.3 Reporting the test population
7.4 Reporting differential performance
7.4.1 Reporting differential performance on previously collected datasets
7.4.2 Reporting differential performance for two or more groups
7.4.3 Reporting differential performance against a benchmark
7.4.4 Reporting error trade-off metrics
7.4.5 Reporting threshold management policy
7.5 Reporting comparison score differential measures
7.6 Reporting exception handling
Annex A (informative) Example of estimating sample size for differential performance
Annex B (informative) Calculating aggregate equitability measures
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical activity.
ISO and IEC technical committees collaborate in fields of mutual interest. Other international organizations,
governmental and non-governmental, in liaison with ISO and IEC, also take part in the work.
The procedures used to develop this document and those intended for its further maintenance are described
in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the different types
of document should be noted. This document was drafted in accordance with the editorial rules of the
ISO/IEC Directives, Part 2 (see www.iso.org/directives or www.iec.ch/members_experts/refdocs).
ISO and IEC draw attention to the possibility that the implementation of this document may involve the
use of (a) patent(s). ISO and IEC take no position concerning the evidence, validity or applicability of any
claimed patent rights in respect thereof. As of the date of publication of this document, ISO and IEC had not
received notice of (a) patent(s) which may be required to implement this document. However, implementers
are cautioned that this may not represent the latest information, which may be obtained from the patent
database available at www.iso.org/patents and https://patents.iec.ch. ISO and IEC shall not be held
responsible for identifying any or all such patent rights.
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and expressions
related to conformity assessment, as well as information about ISO's adherence to the World Trade
Organization (WTO) principles in the Technical Barriers to Trade (TBT) see www.iso.org/iso/foreword.html.
In the IEC, see www.iec.ch/understanding-standards.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 37, Biometrics.
A list of all parts in the ISO/IEC 19795 series can be found on the ISO and IEC websites.
Any feedback or questions on this document should be directed to the user’s national standards
body. A complete listing of these bodies can be found at www.iso.org/members.html and
www.iec.ch/national-committees.
Introduction
As the use of biometric technology increases, so too does public interest in establishing whether the
technology performs similarly for all individuals. Stakeholders are asking government and industry
organizations that use biometric technology to establish whether these technologies vary in performance
for different demographic groups. The intention of this document is to provide guidance on how to measure
and report performance variation across demographic groups [2].
This document is intended to help organizations evaluate demographic performance in biometric systems
and report their results. Specifically, this document outlines how to measure and report biometric
performance variations across demographic groups. It provides a set of metrics and best practices to
facilitate such testing. However, this document does not provide guidance on how to establish specific
causes for the observed variations. The following demographic variables are explicitly discussed in this
document [7][10][12]:
— biological characteristics, such as:
— sex, age, weight, height and skin lightness;
— social constructs, such as:
— ethnicity, gender and language.
Many other variables can cause systematic changes in biometric characteristics or in how individuals
interact with biometric systems. The following demographic variables are relevant although not explicitly
discussed in this document:
— performance variations based on temporary states, such as:
— self-styling (e.g. makeup, eyewear, mask-wearing, clothing, hairstyles),
— behavioural or emotional states (e.g. intoxication),
— behaviours (e.g. smiling, closing eyes, varying pose);
— performance variation caused by diseases or injuries, such as:
— eye surgery, cataracts, vision correction,
— stroke, cleft lip, Apert’s syndrome,
— missing digits;
— performance variation caused by disabilities.
Demographic performance variation for applications other than biometric recognition, such as emotion,
gender or age estimation, is not considered in this document.
International Standard ISO/IEC 19795-10:2024(en)
Information technology — Biometric performance testing
and reporting —
Part 10:
Quantifying biometric system performance variation across
demographic groups
1 Scope
This document establishes requirements for estimating and reporting on performance variations observed
when cohorts belonging to different demographic groups engage with biometric enrolment and recognition
systems. In this context, performance refers to failure-to-enrol rate, failure-to-acquire rate, shifts in
comparison score, recognition error rates, and aspects of response and processing time (throughput).
This document is applicable to the following:
— demographic group membership;
— using phenotypic measures;
— reporting on tests;
— stating statistical uncertainty estimates;
— operational thresholds settings;
— equitability;
— procurement agency activities.
This document also provides terms and definitions to be used when reporting performance variation across
demographic groups.
This document is applicable to:
— technology evaluations of algorithms, subsystems and systems;
— scenario evaluations of systems;
— operational evaluations of fielded systems.
Application of this document does not require detailed knowledge of a system’s algorithms but it does
require specific knowledge of the demographic characteristics for the population of interest.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content constitutes
requirements of this document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
ISO/IEC 2382-37, Information technology — Vocabulary — Part 37: Biometrics
3 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO/IEC 2382-37 and the following apply.
ISO and IEC maintain terminology databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1
differential performance measure
DPM
difference in biometric system measures across different demographic groups
EXAMPLE Differences in error rates [e.g. False Match Rate (FMR), False Non-Match Rate (FNMR)] between
different demographic groups.
Note 1 to entry: ISO/IEC 2382-37:2022 term 37.09.28 defines “demographic differential” as the difference in “outcome
of a biometric system”. This definition is equivalent to this document’s “differential performance measure”. This
document also recognizes other kinds of demographic differentials, such as differential treatment (3.7) and comparison
score differential measure (3.4).
3.2
false negative differential performance
FND
difference in false negative error rates calculated across multiple demographic groups
EXAMPLE If Group A’s false non-match rate is 10 %, and Group B’s false non-match rate is 20 %, the false negative
differential is 10 percentage points if viewed as a mathematical difference or a factor of 2 if viewed as a mathematical
ratio (see 6.4).
3.3
false positive differential performance
FPD
difference in false positive error rates calculated across multiple demographic groups
EXAMPLE If Group A’s false match rate is 1 %, and Group B’s false match rate is 3 %, the false positive differential
is 2 percentage points if viewed as a mathematical difference or a factor of 3 if viewed as a mathematical ratio (see 6.4).
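The two views used in the EXAMPLEs of 3.2 and 3.3 can be sketched in a few lines. This is an illustrative sketch only, not the normative calculation of 6.4: the function names are hypothetical, and taking the absolute difference and dividing the larger rate by the smaller are assumptions made so that the result does not depend on group order.

```python
def differential_by_difference(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two error rates given as fractions (0.10 = 10 %)."""
    return abs(rate_b - rate_a)


def differential_by_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of the larger error rate to the smaller one (assumed orientation)."""
    low, high = sorted((rate_a, rate_b))
    if low == 0:
        raise ValueError("ratio is undefined when the smaller error rate is zero")
    return high / low


if __name__ == "__main__":
    # FNMR values from the EXAMPLE in 3.2: Group A 10 %, Group B 20 %.
    fnd_points = 100 * differential_by_difference(0.10, 0.20)
    fnd_factor = differential_by_ratio(0.10, 0.20)
    print(f"FND: {fnd_points:.1f} percentage points, factor {fnd_factor:.1f}")

    # FMR values from the EXAMPLE in 3.3: Group A 1 %, Group B 3 %.
    fpd_points = 100 * differential_by_difference(0.01, 0.03)
    fpd_factor = differential_by_ratio(0.01, 0.03)
    print(f"FPD: {fpd_points:.1f} percentage points, factor {fpd_factor:.1f}")
```

The ratio view is undefined when one group has no observed errors, which is why the sketch raises an error in that case instead of returning a value.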
3.4
comparison score differential measure
difference in system measures across different demographic groups represented through comparison score
analysis
EXAMPLE Differences in mean comparison scores for different demographic groups (see 6.9).
3.5
mated comparison score differential measure
difference in the statistics of mated score distributions observed for different demographic groups
EXAMPLE If the mean mated comparison score for subjects in Group A is 10 and the mean mated comparison score
for subjects in Group B is 5, then the mated comparison score differential measure is a mean difference of 5 (see 6.9).
3.6
non-mated comparison score differential measure
difference in the statistics of non-mated score distributions observed for different demographic groups
EXAMPLE If the mean non-mated comparison score for subjects in Group A is 10 and the mean non-mated
comparison score for subjects in Group B is 5, then the non-mated comparison score differential measure is a mean
difference of 5 (see 6.9).
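The mean-difference EXAMPLEs in 3.5 and 3.6 can be reproduced with a short sketch. The score lists below are invented for illustration and chosen so that the group means are 10 and 5; the function name is hypothetical, and the mean is only one of the distribution statistics that could be compared.

```python
from statistics import mean
from typing import Sequence


def mean_score_differential(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Difference between the mean comparison scores of two demographic groups."""
    if not scores_a or not scores_b:
        raise ValueError("each group needs at least one comparison score")
    return mean(scores_a) - mean(scores_b)


if __name__ == "__main__":
    # Hypothetical mated comparison scores; the means are 10 and 5 as in the EXAMPLE.
    group_a_scores = [9.0, 10.0, 11.0]
    group_b_scores = [4.0, 5.0, 6.0]
    print(mean_score_differential(group_a_scores, group_b_scores))
```

The same function applies to non-mated scores; only the set of trials that produced the scores changes.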
3.7
differential treatment
different set of actions for a biometric enrolee or biometric capture subject based on their demographic group
EXAMPLE Implementing a system in which one machine learning model recognizes male faces and a different
machine learning model recognizes female faces.
3.8
categorical demographic variable
demographic variable of an individual that is nominally or ordinally described
EXAMPLE A data subject’s gender or ethnicity.
3.9
continuous demographic variable
demographic variable of an individual that is observable, measurable and not necessarily constrained to
discrete categories
EXAMPLE An individual’s age or the measurement of a phenotypic trait, such as an individual’s skin lightness.
3.10
intersectional demographic variable
demographic group that is the combination of multiple categorical demographic variables
EXAMPLE A data subject’s gender-ethnicity.
3.11
demographic group
value of a continuous, categorical or intersectional demographic variable associated with a data subject
EXAMPLE A data subject that has self-reported their gender as female has a demographic group of female for the
categorical demographic variable of gender.
3.12
demographic reference database
database comprising biometric references annotated with demographic variables and groups
3.13
aggregate equitability measure
AEM
performance measure that combines multiple measures of differential performance into an aggregate
measure of overall differential performance
3.14
confidence interval
interval estimator $(T_0, T_1)$ for the parameter $\theta$ with the statistics $T_0$ and $T_1$ as interval limits and for which it
holds that $P[T_0 < \theta < T_1] \geq 1 - \alpha$
Note 1 to entry: Unless otherwise stated, the threshold for statistical significance, α, is 0.05, which equates to a 95 %
probability that the parameter is within the interval limit.
[SOURCE: ISO 3534-1:2006, 1.28, modified — original Notes to entry have been removed and replaced by a
new Note 1 to entry.]
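As an illustration of the definition in 3.14, the sketch below computes interval limits for a single error rate at α = 0.05. The definition does not prescribe how the limits are obtained; the Wilson score interval used here is an assumption chosen for illustration, as is the treatment of trials as independent, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def wilson_interval(errors: int, trials: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Wilson score interval for an error rate, assuming independent trials."""
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha is 0.05
    p = errors / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical counts: 10 false non-matches observed in 1 000 mated trials.
    lower, upper = wilson_interval(10, 1000)
    print(f"FNMR estimate 1.00 %, 95 % interval {100 * lower:.2f} % to {100 * upper:.2f} %")
```

Samples from the same data subject are usually correlated, so an interval computed this way can be too narrow for real test data.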
3.15
effect magnitude
statistical measure of the size of an observed differential
EXAMPLE 1 A mathematical difference of 20 percentage points in false non-match rates between two demographic
groups (e.g. 5 % vs. 25 %).
EXAMPLE 2 A mathematical ratio of 5 between false non-match rates between two demographic groups (e.g. 5 %
vs. 25 %).
4 Conformance
To conform to this document, a biometric evaluation assessing performance variation across demographic
groups shall be planned, executed and reported in accordance with the requirements contained in Clauses 5 to 7.
5 Planning the evaluation
5.1 Identifying the scope of the evaluation
This subclause establishes the experimental methods for designing evaluations to measure demographic
differences in the performance of biometric systems. In general, experimental design includes setting the
objectives of an evaluation and determining the statistical properties and design of the evaluation to match
the objectives. This document applies specifically to evaluations in which one of the objectives is to calculate
differential performance measures (DPMs) or calculate comparison score differential measures in biometric
systems across different demographic groups. Prior to executing the evaluation, the tester shall prepare a
test plan describing the evaluation.
— Test plans shall describe the objectives as well as any models or hypotheses of the evaluation, including:
— demographic variables and groups of interest;
— biometric performance measures of interest;
— demographic differential performance measure(s) and/or comparison score differential measure(s)
of interest;
— effect magnitude(s) of interest.
— Test plans shall describe the data that will be gathered to test these models or hypotheses, including:
— how demographic variables are to be measured or otherwise collected;
— manipulated, fixed or blocked factors, including counterbalancing factors where appropriate;
— controls for non-tested factors;
— target sample size requirements, including the rationale for cohort selection when generating mated
and non-mated trials and the target type I and type II error.
— Test plans should describe what analyses will be performed on these data and what inductions will be
attempted, including:
— the expected uncertainty around differential performance measures;
— statistical tests to be performed.
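The test plan items listed above can be held in a simple record so that missing entries are caught before the evaluation starts. The sketch below is one possible arrangement, not a format defined by this document: the class name, the field names and the rule that an empty value counts as missing are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemographicTestPlan:
    # Objectives, models and hypotheses ("shall" items).
    demographic_variables: List[str] = field(default_factory=list)
    performance_measures: List[str] = field(default_factory=list)
    differential_measures: List[str] = field(default_factory=list)
    effect_magnitudes: List[str] = field(default_factory=list)
    # Data to be gathered ("shall" items).
    demographic_collection_method: str = ""
    controlled_factors: List[str] = field(default_factory=list)
    non_tested_factor_controls: List[str] = field(default_factory=list)
    sample_size_rationale: str = ""
    target_type_i_error: Optional[float] = None
    target_type_ii_error: Optional[float] = None
    # Planned analyses ("should" items, not checked below).
    expected_uncertainty: str = ""
    statistical_tests: List[str] = field(default_factory=list)


# Field names treated as mandatory in this sketch.
REQUIRED_ITEMS = (
    "demographic_variables", "performance_measures", "differential_measures",
    "effect_magnitudes", "demographic_collection_method", "controlled_factors",
    "non_tested_factor_controls", "sample_size_rationale",
    "target_type_i_error", "target_type_ii_error",
)


def missing_required_items(plan: DemographicTestPlan) -> List[str]:
    """Return the names of mandatory items that are still empty."""
    return [name for name in REQUIRED_ITEMS if getattr(plan, name) in (None, "", [])]


if __name__ == "__main__":
    draft = DemographicTestPlan(demographic_variables=["gender"], target_type_i_error=0.05)
    print(missing_required_items(draft))
```

Keeping the "should" items as separate fields leaves room to record them without making the completeness check fail when they are absent.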
The balance between the internal and external validity of the evaluation should be considered and explained.
Evaluations with high internal validity, such as technology tests, focus on specific components of biometric
systems and are well controlled: many factors are considered, documented and manipulated in a controlled
fashion or fixed at pre-determined levels. Evaluations with high external validity, such as operational tests,
are not necessarily able to attribute the observed differentials uniquely to the factors of interest due to
uncontrolled variation in the test environment. These evaluations therefore have lower internal validity.
Any deviations between the test design and the envisioned operational conditions for the system shall be
noted and reported as these efforts to control variation may change the effect magnitude observed relative
to the target environment within which the biometric system operates (see 7.2).
This document does not specify what constitutes an acceptable amount of differential performance. To
inform the design of the evaluation, regulators or procurement guidelines can specify allowable differential
performance, where appropriate, calculated according to at least one of the methods described in 6.4 to 6.9.
When specifying that a level of differential performance is not acceptable, regulators or procurement
guidelines can utilize benchmarks as described in 7.4.3.
5.2 Demographic variables
5.2.1 Ground truth requirements
Evaluations of biometric performance have strict requirements for establishing the ground truth identity of
data capture subjects. This is to ensure the validity of any metrics derived from these classifications, such
as false match and false non-match rates. Evaluations of biometric performance across demographic groups
have three additional constraints:
— Demographic evaluations shall specify the demographic variables of interest.
— Each demographic variable shall be comprised of defined demographic groups which shall be associated
to individual data capture subjects.
— Demographic group membership for demographic variables of interest and other metadata should be
collected at the same time as the corpus samples to avoid errors in inference.
Evaluations to measure performance variation across demographic groups should involve focused data
collection where demographic groups are recorded and where ground-truth identity information is
established.
Demographic group membership should not be inferred directly from biometric samples. An example of
this is assigning the value of ethnicity or the value of gender from a face sample. Demographic groups are
properties of a data subject or a data subject’s biometric characteristic. They are not properties of a biometric
sample. Estimating demographic groups from biometric samples can introduce spurious correlations
between biometric performance and demographic variables. For example, if the width of a face or eyelid
palpebral aperture used to estimate the demographic group is measured from the same sample used for
biometric comparison, any lens distortion can affect both the biometric and the demographic outcomes. If
it is not possible to establish demographic group membership independent of the biometric sample, other
techniques should be applied (see 5.2.2 and 5.2.3). In this case, the tester should carefully consider and shall
document any correlations and impacts between demographic variables and the biometric sample collection
technique.
Many demographic variables are categorical. Categorical demographic variables are those that take a
distinct, limited number of possible values, such as gender and ethnicity. Other demographic variables are
continuous and have an infinite number of possible values. These can be combined into demographic groups
for the purpose of analysis. In some practical applications, continuous demographic variables such as age
and height are bound by natural limits and should be reported in appropriate granularity.
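Combining a continuous demographic variable into demographic groups can be sketched as a binning step. The age boundaries and labels below are hypothetical and are not taken from this document; the granularity actually used is a choice for the tester to make and report.

```python
from bisect import bisect_right

# Hypothetical lower bounds (in years) of each age group after the first.
AGE_BIN_EDGES = [18, 30, 45, 65]
AGE_BIN_LABELS = ["under 18", "18-29", "30-44", "45-64", "65 and over"]


def age_group(age_years: float) -> str:
    """Map a continuous age in years to one of the hypothetical age groups."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many edges are <= age, which is the bin index.
    return AGE_BIN_LABELS[bisect_right(AGE_BIN_EDGES, age_years)]


if __name__ == "__main__":
    for age in (17.5, 18, 44.9, 70):
        print(age, "->", age_group(age))
```

Because each boundary value falls into the higher group, every age maps to exactly one group and no value is left unassigned.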
5.2.2 Categorical demographic variables
5.2.2.1 Sex
Sex is defined as the state of being male or female as it relates to biological factors such as DNA, anatomy
and physiology. Sex typically consists of two categories, “male” and “female”. Female individuals generally
possess two copies of the X chromosome. Male individuals generally possess one copy each of an X and a
Y chromosome. Important exceptions do occur and complicate binary classification. The tester should
establish appropriate categories for sex. If necessary, the tester can extend the general binary classification
model of male/female.
When sex is included in the evaluation, it shall be determined through the collection and analysis of DNA or
by self-report. In evaluations that include sex, the tester shall prepare a statement that documents how sex
was determined (see 7.3).
NOTE If sex was determined by self-report, gender can also be recorded.
5.2.2.2 Gender
Gender is defined as the classification of individuals as male, female or additional categories based on social,
cultural or behavioural qualities. An individual’s gender identity can consist of multiple, distinct categories.
An individual’s gender can also change over time. When gender is included in the evaluation, gender should
be determined through self-reporting. Gender self-reporting options presented to the capture subject shall
be documented. Gender should not be assigned by the tester conducting the evaluation.
In some evaluations that include gender, it is not always possible to obtain self-reported gender information.
In this case, the tester shall prepare a statement describing why self-reporting was not possible and potential
inferential errors this can cause (see 7.3).
5.2.2.3 Ethnicity
In the context of biometric evaluations, ethnicities are classifications of individuals within a society based
on shared qualities that are generally considered distinct within that society. Categories can reflect common
physical characteristics, ancestry, language, community, religious affiliation, cultural heritage or other
common qualities.
When ethnicity is included in the evaluation, the tester shall prepare a statement that documents the
method for determining ethnicity. If utilizing self-reporting to establish ethnicity, the tester shall prepare
a statement that documents the ethnicity self-reporting options presented to the data subject (see 7.3). In
technology and scenario evaluations, ethnicity shall be recorded and associated with collected samples
through voluntary self-reporting. In operational tests ethnicity shall be established through voluntary self-
reporting or from available ID data. Whenever possible, data subjects should be given the opportunity to
select multiple ethnicities to designate multi-ethnic identities and the evaluation plan should enumerate
the presented choices. Ethnicity shall not be assigned by the tester, for example by inspecting samples (see
Reference [5] for further details).
In case of any deviations from these requirements, the tester shall prepare a statement that documents
the specific reasons for use of alternate means of ethnicity determination and potential inferential errors
this can cause (see 7.3). Finally, establishing ground truth for ethnicity can be inherently complex. The
population of individuals that identify with specific ethnic groups can change across different societies and
within a society over time. Because of this complexity, testers shall not use other demographic variables as
proxies for ethnicity.
5.2.2.4 Birthplace
Birthplace refers to the geographic location (e.g. a region or country) where an individual was born. When
birthplace is included in the evaluation, birthplace shall be established through voluntary self-reporting
or from available ID data or documents. In evaluations that include birthplace, the tester shall prepare a
statement that documents the method for determining birthplace. If utilizing self-reporting to establish
birthplace, the tester shall prepare a statement that documents birthplace self-reporting options presented
to the data subject. If birthplace is recorded more finely than nation state (e.g. by a region within a country),
the tester shall prepare a statement that documents how this granularity was established (see 7.3).
Birthplace is a distinct demographic variable from ethnicity and shall not be used as a proxy for ethnicity.
EXAMPLE Birthplace can be retrieved from an identity credential in an operational evaluation.
NOTE ID data or documents are preferred to voluntary self-report due to their reliability. However, self-report
has advantages in terms of data minimization.
5.2.2.5 Place of residence
Place of residence refers to the geographic location (e.g. a region or country) where an individual lives or
resides. When place of residence is included in the evaluation, place of residence shall be established through
voluntary self-reporting, from available ID data, or documents. In evaluations that include place of residence,
the tester shall prepare a statement that documents how place of residence was determined and categorized
(see 7.3). Place of residence is a distinct demographic variable from ethnicity, since it can change quickly and
arbitrarily. Place of residence shall therefore not be used as a proxy for ethnicity or birthplace.
5.2.2.6 Native language
Native language refers to language that an individual acquires fully through extensive exposure during the
critical period of development and presently understands. When native language is included in an evaluation,
the tester shall prepare a statement that documents the method for determining native language. If utilizing
self-reporting to establish native language, the tester shall prepare a statement that documents native
language self-reporting options presented to the data subject (see 7.3).
NOTE Native language can be a sensitive, complex and multi-dimensional topic, such as in cases where multiple
languages are spoken at home.
5.2.3 Continuous demographic variables
5.2.3.1 Age
The age of an individual is the quantity of time that has elapsed since the moment of the individual’s
birth. Age is commonly expressed in months or years. When age is included in the evaluation, age shall be
established through self-reporting. Age can be subsequently verified via identity documents (e.g. a driver’s
licence, passport, birth certificate, etc.).
5.2.3.2 Weight
In the context of a biometric performance test, weight refers to the mass of an individual relative to their
gravitational conditions. Weight is expressed in newtons but, for evaluations conducted under Earth’s
gravitational field, it is commonly expressed in kilograms. When weight is included in the evaluation, weight
shall be established through either self-reporting or measurement using a weight scale.
In some evaluations, it is not possible to obtain measured or self-reported weight information. In this case,
the ground-truth weight shall not be assigned by the tester.
5.2.3.3 Height
In the context of a biometric performance test, height refers to the measurement from the base (i.e. the feet)
to the top (i.e. the top of the head) of an individual. Height is commonly expressed in centimetres. When
height is included in the evaluation, height shall be established through either self-reporting or measurement
using a height scale. Height can be further verified via identity documents (e.g. a driver’s licence or passport).
In some evaluations, it is not possible to obtain measured or self-reported height information. In this case,
the ground-truth height shall not be assigned by the tester.
5.2.3.4 Skin lightness
Skin lightness is the perceptual lightness or darkness value of an individual’s skin. Skin lightness or darkness
is primarily determined by the amount of melanin in an individual’s skin cells. Skin lightness or darkness,
or the amount of melanin in skin cells, can be impacted by ethnicity as well as external factors, such as
exposure to ultraviolet radiation or levels of vitamin A in the body. When skin lightness is included in the
evaluation, the ground truth of skin lightness should be established by measuring the capture subject’s skin
for the L* component of the CIE L*a*b* colour space [6][11][15]. Skin lightness should be measured in a controlled
manner using a calibrated instrument such as a colorimeter or a spectrophotometer. Skin lightness shall
not be measured or estimated photographically unless the camera is colour-calibrated within the capture
environment (see Reference [11] for further details).
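For readers unfamiliar with the L* component, the sketch below shows the standard CIE 1976 relation between relative luminance Y and L*. It is included only to illustrate what the quantity represents: a calibrated colorimeter or spectrophotometer normally reports L* directly, and the function name and the default white-point luminance of 1.0 are assumptions.

```python
def cie_lightness(y: float, y_white: float = 1.0) -> float:
    """CIE 1976 lightness L* (0 to 100) from luminance y relative to a reference white."""
    if y < 0 or y_white <= 0:
        raise ValueError("luminance must be non-negative and the white point positive")
    t = y / y_white
    delta = 6 / 29
    if t > delta ** 3:
        f = t ** (1 / 3)                     # cube-root segment
    else:
        f = t / (3 * delta ** 2) + 4 / 29    # linear segment near black
    return 116 * f - 16


if __name__ == "__main__":
    # Hypothetical relative luminance readings.
    for luminance in (0.05, 0.18, 0.45):
        print(f"Y = {luminance:.2f} -> L* = {cie_lightness(luminance):.1f}")
```

L* is perceptually scaled, so equal steps in L* correspond more closely to equal perceived lightness steps than equal steps in luminance do.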
In some evaluations, it is not possible to obtain measured skin lightness via the L* component. In this case,
a statement shall be included regarding the specific reasons for use of alternate means of quantification,
such as a relative colour analysis, and conclusions should include a statement regarding potential inferential
errors this can cause.
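As a worked illustration of the L* component referred to above, the following minimal sketch applies the CIE 1976 lightness definition to a measured luminance value. Colorimeters and spectrophotometers typically report L* directly, so this conversion is only an assumption about a workflow in which the instrument reports the tristimulus value Y. The function name `cie_lightness` and the default reference white of 100 are hypothetical choices made for the example.

```python
def cie_lightness(y: float, y_n: float = 100.0) -> float:
    """Return CIE 1976 L* for luminance y relative to reference white y_n.

    Assumption: y and y_n are CIE Y tristimulus values on the same scale.
    """
    t = y / y_n
    delta = 6.0 / 29.0
    if t > delta ** 3:
        f = t ** (1.0 / 3.0)          # cube-root branch for ordinary luminances
    else:
        f = t / (3.0 * delta ** 2) + 4.0 / 29.0  # linear branch near black
    return 116.0 * f - 16.0


# Example: a surface reflecting the full reference white has L* = 100.
print(cie_lightness(100.0))
```

L* ranges from 0 (black) to 100 (reference white), which is why it serves as a single-number measure of lightness.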
5.2.4 Other demographic variables
There are many physical and social characteristics not identified in this document that can be described
as demographic variables. Some, but not all, can reasonably be expected to impact biometric performance.
For variables not defined in 5.2.2 and 5.2.3, the tester shall prepare a statement that documents how
ground truth was established (e.g. self-reported or measured, see 7.3). This also applies to intersectional
demographic variables, such as specific intersections of age and gender (e.g. females over 65 years of age).
6 Executing the evaluation
6.1 Generation of mated comparison and identification trials
This clause provides requirements and guidance on establishing ground truth and conducting verification
or identification trials where biometric samples from the same data subject and the same biometric
characteristics are compared. A mated verification (1:1) trial involves samples from just one data subject
and therefore a single demographic group, $d_i$. In identification applications, the demographic group of the
probe sample, $d_i$, can differ from the demographic group of samples in the demographic reference database.
However, in mated identification trials, $d_i$ is always represented in the demographic reference database.
The comparison scores from mated verification and identification trials are used in the calculation of false
negative differential performance (see 6.4 to 6.6, 6.8) and false negative comparison score differential
measures (see 6.9). Demographic differentials based on mated trials can be computed solely based on probe
demographics, $d_i$. In identification trials specifically, the composition of the demographic reference database
shall be considered separately from the demographic composition of the probe set and shall also be reported
(see 7.3).
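The grouping of mated trials by probe demographic described above can be sketched as follows. This is an illustrative sketch only, not a procedure defined by this document: it assumes similarity scores (higher means more similar), a single fixed threshold, and a hypothetical input format of `(probe_group, score)` pairs. The function name `fnmr_by_group` is likewise hypothetical.

```python
from collections import defaultdict


def fnmr_by_group(mated_trials, threshold):
    """Compute a false non-match rate per probe demographic group.

    mated_trials: iterable of (probe_group, similarity_score) pairs.
    Assumption: a mated comparison is a false non-match when score < threshold.
    """
    totals = defaultdict(int)
    misses = defaultdict(int)
    for group, score in mated_trials:
        totals[group] += 1
        if score < threshold:
            misses[group] += 1
    return {group: misses[group] / totals[group] for group in totals}


# Hypothetical scores for two demographic groups, "A" and "B".
trials = [("A", 0.9), ("A", 0.4), ("B", 0.8), ("B", 0.7)]
print(fnmr_by_group(trials, threshold=0.5))
```

Because every mated trial involves one data subject, the probe group label alone is enough to partition the trials.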
6.2 Generation of non-mated comparison and identification trials
6.2.1 General
This subclause provides requirements and guidance on establishing ground truth and conducting
verification or identification trials where biometric samples from two different data subjects are compared.
A non-mated verification trial can have two different demographic groups: the demographic group of the
probe sample, $d_i$, and the demographic group of the reference, $d_j$. Biometric performance metrics based on
non-mated samples [e.g. false match rate (FMR)] are therefore related to the demographics of both data
subjects. Demographic differentials based on non-mated trials (i.e. false positive differential performance
and non-mated comparison score differential measures) are therefore also related to the demographics of
the multiple data subjects involved.
6.2.2 Verification (1:1)
In 1:1 trials that investigate false positive differential performance, one simplifying approach is to constrain
analyses to cohorts where the demographics of non-mated probe and reference samples are matched (e.g.
both female or both male). The tester then compares FMR for males and FMR for females. FMR measures can
then be constrained to a single demographic group, the demographics of the probe (i.e. $\mathrm{FMR}_{d_i}$ instead of
$\mathrm{FMR}_{d_i, d_j}$ given $d_i = d_j$). When using this simplifying approach, the tester shall compare samples within
each group, $d_i$, as is done in formulae of false positive differential performance (see 6.4 and 6.5).
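The simplifying approach above, in which only non-mated comparisons with matched probe and reference demographics are retained, can be sketched as follows. The input format of `(probe_group, reference_group, score)` triples, the similarity-score convention (a false match when the score is at or above the threshold) and the function name `fmr_within_group` are assumptions made for this example.

```python
def fmr_within_group(non_mated_trials, threshold):
    """Compute a false match rate per group from same-group non-mated pairs.

    non_mated_trials: iterable of (probe_group, reference_group, score).
    Assumption: a non-mated comparison is a false match when score >= threshold.
    """
    totals = {}
    false_matches = {}
    for d_i, d_j, score in non_mated_trials:
        if d_i != d_j:
            continue  # keep only pairs whose demographics are matched
        totals[d_i] = totals.get(d_i, 0) + 1
        if score >= threshold:
            false_matches[d_i] = false_matches.get(d_i, 0) + 1
    return {g: false_matches.get(g, 0) / n for g, n in totals.items()}


# Hypothetical trials; the cross-group ("F", "M") pair is excluded.
trials = [("F", "F", 0.6), ("F", "F", 0.2), ("M", "M", 0.1),
          ("M", "M", 0.3), ("F", "M", 0.9)]
print(fmr_within_group(trials, threshold=0.5))
```

Cross-group comparisons are dropped rather than assigned to either group, so each resulting rate depends on one demographic label only.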
6.2.3 Identification (1:N)
Measuring demographic differentials for evaluations of identification systems is more complex than
measuring demographic differentials for evaluations of verification systems. Unlike mated identification
trials, non-mated identification trials are not guaranteed to have samples in the demographic reference
database with demographics that match the demographic group of the probe sample, $d_i$. Consequently, the
tester shall specify the demographic composition of the demographic reference database, $G$, used for
identification trials. The demographic reference database can be composed of samples from one or more
demographic groups or intersections. To specify $G$, the tester shall enumerate the total number of references
enrolled, $N$. The tester shall also enumerate the number of enrolled references belonging to each
demographic group or intersection ($N_{d_1}, N_{d_2}, \ldots$). Because $G$ is specified and fixed, false positive
identification rate (FPIR) can be parameterized and computed solely as a function of the probe demographic
(i.e. $\mathrm{FPIR}_{d_i}$ instead of $\mathrm{FPIR}_{d_i, G \subseteq d_1, d_2, \ldots}$). This simplifying approach is utilized in formulae of false positive
differential performance in identification trials (see 6.6).
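The enumeration of the demographic reference database composition described above can be sketched as follows. This is a minimal illustration that assumes each enrolled reference carries one demographic group or intersection label. The function name `describe_reference_database` and the output keys `"N"` and `"N_d"` are hypothetical.

```python
from collections import Counter


def describe_reference_database(reference_groups):
    """Summarise the composition of a demographic reference database.

    reference_groups: iterable with one group label per enrolled reference.
    Returns the total number enrolled (N) and the count per group (N_d).
    """
    counts = Counter(reference_groups)
    return {"N": sum(counts.values()), "N_d": dict(counts)}


# Hypothetical database with three enrolled references.
print(describe_reference_database(["F", "F", "M"]))
```

A summary of this kind is the information that would accompany any FPIR figure computed against the database.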
There are two options for the tester to configure the demographic reference database used for identification
trials. First, the tester can specify a demographic reference database to be representative of a particular use
case. Second, a tester can specify a demographic reference database with a constant number of samples per
group or intersection (i.e. $N_{d_1} = N_{d_2} = \ldots = N_{d_n}$). The tester can then compute $\mathrm{FPIR}_{d_i}$ against this database
for each probe demographic group, $d_i$. $\mathrm{FPIR}_{d_i}$ can then be used to compute false positive differentials in
identification trials (see 6.6). Using this approach, measures of false positive demographic differentials are
valid only for the specified demographic reference database.
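Computing FPIR per probe demographic group against one fixed demographic reference database can be sketched as follows. The sketch assumes that each non-mated search yields a list of similarity scores against the enrolled references, and that a search is a false positive when at least one score is at or above the threshold. The input format and the function name `fpir_by_probe_group` are hypothetical.

```python
def fpir_by_probe_group(non_mated_searches, threshold):
    """Compute a false positive identification rate per probe group.

    non_mated_searches: iterable of (probe_group, candidate_scores), where
    candidate_scores are similarity scores against one fixed database.
    Assumption: a search is a false positive if any score >= threshold.
    """
    totals = {}
    false_positives = {}
    for d_i, scores in non_mated_searches:
        totals[d_i] = totals.get(d_i, 0) + 1
        if any(score >= threshold for score in scores):
            false_positives[d_i] = false_positives.get(d_i, 0) + 1
    return {g: false_positives.get(g, 0) / n for g, n in totals.items()}


# Hypothetical searches for two probe groups against the same database.
searches = [("A", [0.1, 0.7]), ("A", [0.2, 0.3]), ("B", [0.1]), ("B", [])]
print(fpir_by_probe_group(searches, threshold=0.5))
```

The resulting rates describe only the database used for the searches; a database with a different composition calls for a fresh computation.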
6.3 Selection of a threshold
Measurements of demographic differential performance shall reflect the differences in error rates, not the
differences in success rates. Error rates are threshold-dependent, so measures of differential performance
are also threshold-dependent. This is appropriate as most operational biometric systems operate at a fixed
threshold. Testers shall select a threshold in an evaluation of demographic differential
...







