ISO/TR 27877:2021
Statistical analysis for evaluating the precision of binary measurement methods and their results
This document introduces five statistical methods for evaluating the precision of binary measurement methods and their results. The five methods can be divided into two types. Both types are based on measured values provided by each laboratory participating in a collaborative study. In the first type, each laboratory repeatedly measures a single sample. The samples measured by the laboratories are nominally identical. The second type is an extension of the first type, where there are several levels of samples. For each statistical method, this document briefly summarizes its theory and explains how to estimate the proposed precision measures. Some real cases are illustrated to help the readers understand the evaluation procedures involved. For the first and second types of methods, five and three cases are presented, respectively. Finally, this document compares the five statistical methods.
Analyse statistique pour l'évaluation de la fidélité des méthodes de mesure binaire et de leurs résultats
TECHNICAL REPORT ISO/TR 27877
First edition, 2021-10
Statistical analysis for evaluating the
precision of binary measurement
methods and their results
Analyse statistique pour l'évaluation de la fidélité des méthodes de
mesure binaire et de leurs résultats
Reference number
ISO/TR 27877:2021(E)
© ISO 2021
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, and symbols
3.1 Terms and definitions
3.2 Symbols
4 Overview
5 Examples used in this document
5.1 Case 1: Listeria monocytogenes
5.2 Case 2: Human cell line activation test (h-CLAT)-1
5.3 Case 3: Intratracheal administration testing
5.4 Case 4: Histopathological classification of lung carcinoma
5.5 Case 5: Human cell line activation test (h-CLAT)-2
5.6 Case 6: Statistical model for predicting chemical toxicity
6 Statistical analysis for evaluating the precision of binary measurement methods and their results
6.1 ISO 5725-based method
6.1.1 Overview
6.1.2 Case 1
6.1.3 Case 2(a)
6.1.4 Case 2(b)
6.1.5 Case 3(a)
6.1.6 Case 3(b)
6.2 Accordance and concordance
6.2.1 Overview
6.2.2 Case 1
6.2.3 Case 2(a)
6.2.4 Case 2(b)
6.2.5 Case 3(a)
6.2.6 Case 3(b)
6.3 ORDANOVA
6.3.1 Overview
6.3.2 Case 1
6.3.3 Case 2(a)
6.3.4 Case 2(b)
6.3.5 Case 3(a)
6.3.6 Case 3(b)
6.4 CM-accuracy, sensitivity and specificity
6.4.1 Overview
6.4.2 Case 4
6.4.3 Case 5
6.4.4 Case 6
6.5 Kappa coefficient
6.5.1 Overview
6.5.2 Case 4
6.5.3 Case 5
6.5.4 Case 6
7 Remarks on the methods introduced in this document
7.1 Comparison between the mathematical expressions of the precision estimates
7.2 Comparison between the numerical examples of the precision estimates
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 69, Application of statistical methods,
Subcommittee SC 6, Measurement methods and results.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
The documents in the ISO 5725 series define the precision of quantitative measurement methods
and their results, and assume in their basic models that the errors follow normal distributions. They
also describe how to run experiments to evaluate precision measures such as repeatability and
reproducibility. There is now also a demand for dealing with qualitative measurement methods
and their results, which output binary data, categorical data, etc. However, the ISO 5725 series is not
mathematically suitable for analyzing such data.
Several existing studies propose statistical methods for dealing with binary and/or categorical data,
but no guidance documents are available so far. Hence, this document summarizes various methods to
evaluate the precision of binary measurement methods and their results, which are the most essential
and frequently used methods for qualitative data.
1 Scope
This document introduces five statistical methods for evaluating the precision of binary measurement
methods and their results. The five methods can be divided into two types. Both types are based on
measured values provided by each laboratory participating in a collaborative study. In the first type,
each laboratory repeatedly measures a single sample. The samples measured by the laboratories are
nominally identical. The second type is an extension of the first type, where there are several levels of
samples.
For each statistical method, this document briefly summarizes its theory and explains how to estimate
the proposed precision measures. Some real cases are illustrated to help the readers understand the
evaluation procedures involved. For the first and second types of methods, five and three cases are
presented, respectively.
Finally, this document compares the five statistical methods.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 3534-1, Statistics — Vocabulary and symbols — Part 1: General statistical terms and terms used in
probability
ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results — Part 1: General
principles and definitions
3 Terms and definitions, and symbols
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 3534-1 and ISO 5725-1 and
the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1.1
accordance
probability that two binary measured values be identical when they are taken from the same laboratory
Note 1 to entry: The concept corresponds to the definition of “repeatability” in ISO 5725 and was originally
proposed by Reference [3].
3.1.2
concordance
probability that two binary measured values be identical when they are taken from different
laboratories
Note 1 to entry: The concept corresponds to the definition of “reproducibility” in ISO 5725 and was originally
proposed by Reference [3].
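The two definitions above can be read as proportions of agreeing pairs of results. The following is a minimal sketch of that reading, not the estimators given later in this document: it assumes a balanced study with n repetitions per laboratory, counts pairs without replacement, and uses hypothetical counts. The function names `accordance` and `concordance` and the data are illustrative assumptions.

```python
from math import comb


def accordance(counts, n):
    """Mean over laboratories of the share of within-laboratory pairs that agree.

    counts[i] is the number of positive results (1s) among the n results
    of laboratory i. Requires n >= 2.
    """
    per_lab = [(comb(x, 2) + comb(n - x, 2)) / comb(n, 2) for x in counts]
    return sum(per_lab) / len(per_lab)


def concordance(counts, n):
    """Share of pairs of results from different laboratories that agree."""
    num_labs = len(counts)
    total_pos = sum(counts)
    total_neg = num_labs * n - total_pos
    # Agreeing pairs among all results, then remove the within-laboratory ones.
    agree_all = comb(total_pos, 2) + comb(total_neg, 2)
    agree_within = sum(comb(x, 2) + comb(n - x, 2) for x in counts)
    pairs_between = comb(num_labs * n, 2) - num_labs * comb(n, 2)
    return (agree_all - agree_within) / pairs_between


# Hypothetical study: 4 laboratories, 4 repetitions each.
counts = [4, 4, 2, 4]
n = 4
print(f"accordance  = {accordance(counts, n):.4f}")
print(f"concordance = {concordance(counts, n):.4f}")
```

Counting pairs without replacement is one possible convention; an estimator based on plug-in probabilities would give slightly different values for small n.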
3.1.3
ORDANOVA
statistical method for evaluating the precision of ordinal-scale measurement methods and their results
based on an ordinal dispersion measure
Note 1 to entry: The concept was originally proposed by Reference [4].
3.1.4
true positive
TP
correct measured value in positive results, that is, the case where both the measured and the correct
results are positive
3.1.5
true negative
TN
correct measured value in negative results, that is, the case where both the measured and the correct
results are negative
3.1.6
false positive
FP
incorrect measured value in positive results, that is, the case where the measured value is positive but
the correct one is negative
3.1.7
false negative
FN
incorrect measured value in negative results, that is, the case where the measured value is negative but
the correct one is positive
3.1.8
confusion matrix
2×2 matrix showing the numbers of true positives, true negatives, false positives and false negatives
3.1.9
CM-accuracy
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values
Note 1 to entry: CM-accuracy has the same meaning as the term accuracy in the machine learning field; the name
CM-accuracy itself is not in general use. This document uses it so that accuracy in the machine learning sense is
not confused with the term accuracy in ISO 5725. The prefix CM stands for confusion matrix, because
CM-accuracy can be calculated from a confusion matrix.
3.1.10
sensitivity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
positive measured values
3.1.11
specificity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
negative measured values
3.1.12
CM-precision
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values in positive measured values
Note 1 to entry: CM-precision has the same meaning as the term precision in the machine learning field; the name
CM-precision itself is not in general use. This document uses it so that precision in the machine learning sense is
not confused with the term precision in ISO 5725. The prefix CM stands for confusion matrix, because
CM-precision can be calculated from a confusion matrix.
3.1.13
F-measure
statistic for indicating the capability of two-class classifications, defined by the harmonic mean
between sensitivity and CM-precision
3.1.14
kappa coefficient
statistic for indicating the capability of two-class classifications, defined by the ratio of CM-accuracy
minus the probability of a correct result occurring by chance to one minus that probability
Note 1 to entry: The kappa coefficient extends CM-accuracy by taking into account the probability of a correct
result occurring by chance. It was originally introduced by Reference [15].
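The confusion-matrix statistics defined in 3.1.9 to 3.1.13 can be sketched in a few lines. This is an illustration under stated assumptions, not this document's procedure: it assumes the layout of Table 3 (rows are actual values, columns are measured values, so c11 = TP, c12 = FN, c21 = FP, c22 = TN), reads sensitivity as TP/(TP + FN) and specificity as TN/(TN + FP), and uses hypothetical counts. The function name `cm_metrics` is also an assumption.

```python
def cm_metrics(c11, c12, c21, c22):
    """Statistics from a 2x2 confusion matrix laid out as in Table 3.

    c11 = true positives, c12 = false negatives,
    c21 = false positives, c22 = true negatives.
    No guard against empty rows or columns (division by zero).
    """
    total = c11 + c12 + c21 + c22
    sensitivity = c11 / (c11 + c12)   # correct among actual positives
    specificity = c22 / (c21 + c22)   # correct among actual negatives
    cm_precision = c11 / (c11 + c21)  # correct among measured positives
    return {
        "cm_accuracy": (c11 + c22) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "cm_precision": cm_precision,
        # Harmonic mean of sensitivity and CM-precision.
        "f_measure": 2 * sensitivity * cm_precision / (sensitivity + cm_precision),
    }


# Hypothetical counts, for illustration only.
metrics = cm_metrics(c11=40, c12=10, c21=5, c22=45)
for name, value in metrics.items():
    print(f"{name:13s} {value:.4f}")
```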
3.2 Symbols
$L$    number of laboratories participating in a collaborative study
$n$    number of repetitions in each laboratory participating in a collaborative study
$i$    suffix describing a laboratory, $i \in \{1, \ldots, L\}$
$j$    suffix describing a repetition, $j \in \{1, \ldots, n\}$
$y_{ij}$    measured value of repetition $j$ of laboratory $i$
$x_i$    sum of the measured values $y_{ij}$ of laboratory $i$ for the case where $y_{ij} \in \{0, 1\}$, that is, $x_i = \sum_{j=1}^{n} y_{ij}$ $(y_{ij} \in \{0, 1\})$
$\bar{y}_i$    arithmetic mean of $y_{ij}$ of laboratory $i$, that is, $\bar{y}_i = (1/n) \sum_{j=1}^{n} y_{ij}$ $(i \in \{1, \ldots, L\})$
$\bar{y}$    overall arithmetic mean of $y_{ij}$, that is, $\bar{y} = (1/(nL)) \sum_{i=1}^{L} \sum_{j=1}^{n} y_{ij}$
$n_i^{p}$    number of positive measured values of laboratory $i$
$n^{p}$    sum of $n_i^{p}$ with respect to $i \in \{1, \ldots, L\}$, that is, $n^{p} = \sum_{i=1}^{L} n_i^{p}$
$c_{ij}$    $(i, j)$-element of a confusion matrix
$m$    general mean (expectation) in the basic model of ISO 5725-2
$B_i$    laboratory component of bias under repeatability conditions of laboratory $i$ in the basic model of ISO 5725-2
$e_{ij}$    random error occurring in every measurement under repeatability conditions in the basic model of ISO 5725-2
$\sigma_{ri}^2$    within-laboratory variance of laboratory $i$
$\sigma_r^2$    repeatability variance or within-laboratory variance
$\sigma_L^2$    between-laboratory variance
$\sigma_R^2$    reproducibility variance, that is, $\sigma_R^2 = \sigma_r^2 + \sigma_L^2$
$\sigma_{ri:W}^2$    within-laboratory variance of laboratory $i$ in the ISO 5725-based method
$\sigma_{r:W}^2$    repeatability variance in the ISO 5725-based method
$\sigma_{L:W}^2$    between-laboratory variance in the ISO 5725-based method
$\sigma_{R:W}^2$    reproducibility variance in the ISO 5725-based method, that is, $\sigma_{R:W}^2 = \sigma_{r:W}^2 + \sigma_{L:W}^2$
$\sigma^2$    ordinal dispersion measure proposed by Reference [14] for the case of binary data assumed to follow a binomial distribution
$\sigma_{ri:O}^2$    within-laboratory variance of laboratory $i$ in ORDANOVA
$\sigma_{r:O}^2$    repeatability variance in ORDANOVA
$\sigma_{L:O}^2$    between-laboratory variance in ORDANOVA
$\sigma_{R:O}^2$    reproducibility variance in ORDANOVA, that is, $\sigma_{R:O}^2 = \sigma_{r:O}^2 + \sigma_{L:O}^2$
$p_i$    probability of obtaining a measured value $y_{ij} = 1$ of laboratory $i$
$\bar{p}$    arithmetic mean of $p_i$, that is, $\bar{p} = (1/L) \sum_{i=1}^{L} p_i$
$H_0$    null hypothesis of a statistical test
$\chi_0^2$    test statistic of a chi-squared test
$A$    accordance of Reference [3] method
$C$    concordance of Reference [3] method
$\hat{X}$    estimate of $X$
4 Overview
This document deals with the following five methods.
a) ISO 5725-based method (proposed by Reference [13]);
b) accordance and concordance (proposed by Reference [3]);
c) ORDANOVA (proposed by Reference [4]);
d) CM-accuracy, sensitivity and specificity;
e) Kappa coefficient (proposed by Reference [15]).
The assumed data structure differs between methods. In methods a), b) and c), each laboratory
measures one identical sample multiple times, and the data can be summarized as in Table 1 and/or
Table 2. Methods d) and e) are based on many levels of samples, and the data can be summarized
as in Table 3.
NOTE Nowadays, the probability of detection (POD) approach is used to analyze binary measured values
instead of methods d) and e), but this document deals only with classical methods. Some ISO
documents introduce the POD approach; see References [1] and [2].
Table 1 — Data format for methods a), b) and c)
Laboratory    Rep 1       Rep j       Rep n
Lab 1         $y_{11}$    $y_{1j}$    $y_{1n}$
Lab i         $y_{i1}$    $y_{ij}$    $y_{in}$
Lab L         $y_{L1}$    $y_{Lj}$    $y_{Ln}$
NOTE $y_{ij}$ is either 0 or 1, which means negative and positive measured values, respectively.
Table 2 — Another expression of Table 1
Laboratory    Number of 1    Number of 0
Lab 1         $x_1$          $n - x_1$
Lab i         $x_i$          $n - x_i$
Lab L         $x_L$          $n - x_L$
NOTE $n$ is the number of repetitions in each laboratory, and $x_i \in \{0, 1, \ldots, n\}$.
Table 3 — Data format for methods d) and e)
                 Measured values
Actual values    1           0
1                $c_{11}$    $c_{12}$
0                $c_{21}$    $c_{22}$
NOTE $c_{11}$, $c_{12}$, $c_{21}$, $c_{22}$ are non-negative integers.
5 Examples used in this document
5.1 Case 1: Listeria monocytogenes
This subclause deals with the results of a collaborative study on Listeria monocytogenes, which was
quoted and analyzed in Reference [13]. The study involved ten laboratories, each of which repeated
the measurement five times, that is, $L = 10$ and $n = 5$. The results are shown in Table 4. In this table,
the numbers 1 and 0 mean that Listeria monocytogenes was and was not detected, respectively, and the
column Total gives the number of detections of Listeria monocytogenes in each laboratory.
Table 4 — Case 1 — The results of a collaborative study on Listeria monocytogenes
Laboratory    Measured value in each repetition    Total
Lab 1 1 1 1 1 1 5
Lab 2 1 1 1 1 1 5
Lab 3 1 1 1 1 1 5
Lab 4 1 1 1 1 1 5
Lab 5 0 0 1 1 1 3
Lab 6 1 1 1 1 1 5
Lab 7 0 0 1 1 1 3
Lab 8 1 1 1 1 1 5
Lab 9 1 1 1 1 1 5
Lab 10 1 1 1 1 1 5
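As an illustration of how the decomposition $\sigma_R^2 = \sigma_r^2 + \sigma_L^2$ can be estimated from such counts, the sketch below applies the usual one-way analysis of variance estimators of ISO 5725-2 directly to the 0/1 values of Table 4. This is an assumption about how a 5725-style analysis would treat binary data, and the formulas given later in this document for the ISO 5725-based method may differ in detail. The function name `anova_variances_binary` is hypothetical.

```python
def anova_variances_binary(counts, n):
    """One-way ANOVA variance components for 0/1 data.

    counts[i] is the number of 1s among the n results of laboratory i.
    Returns (s_r2, s_L2, s_R2): repeatability, between-laboratory and
    reproducibility variance estimates.
    """
    num_labs = len(counts)
    p_hat = [x / n for x in counts]
    p_bar = sum(p_hat) / num_labs
    # Sample variance of n binary values with mean p: n/(n-1) * p * (1 - p).
    s_ri2 = [n / (n - 1) * p * (1 - p) for p in p_hat]
    s_r2 = sum(s_ri2) / num_labs
    # Variance of the laboratory means, corrected for the within-laboratory part.
    s_means2 = sum((p - p_bar) ** 2 for p in p_hat) / (num_labs - 1)
    s_L2 = max(s_means2 - s_r2 / n, 0.0)  # negative estimates truncated to zero
    return s_r2, s_L2, s_r2 + s_L2


# Column Total of Table 4 (Case 1): L = 10 laboratories, n = 5 repetitions.
counts = [5, 5, 5, 5, 3, 5, 3, 5, 5, 5]
s_r2, s_L2, s_R2 = anova_variances_binary(counts, n=5)
print(f"s_r^2 = {s_r2:.4f}, s_L^2 = {s_L2:.4f}, s_R^2 = {s_R2:.4f}")
```

Truncating a negative between-laboratory estimate to zero is a common convention and is assumed here rather than taken from the text.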
5.2 Case 2: Human cell line activation test (h-CLAT)-1
This subclause deals with the human cell line activation test (h-CLAT) [5][6]. The h-CLAT is an alternative
to an animal experiment for evaluating the skin sensitization potential.
Reference [7] conducted a collaborative study and reported the results. The study involved five
laboratories, each of which repeated the measurement three times. Each laboratory measured
21 chemicals, but this document deals with only two of them, denoted chemical A and
chemical B, whose results show different patterns. Case 2(a) reports the results for chemical A, shown in
Table 5, and Case 2(b) reports the results for chemical B, shown in Table 6.
Table 5 — Case 2(a) — Number of detections of the skin sensitization potential of chemical A
by h-CLAT in three repeated measurements
Laboratory    Number of detections in three repetitions
Lab 1 3
Lab 2 3
Lab 3 1
Lab 4 3
Lab 5 3
Table 6 — Case 2(b) — Number of detections of the skin sensitization potential of chemical B
by h-CLAT in three repeated measurements
Laboratory    Number of detections in three repetitions
Lab 1 0
Lab 2 2
Lab 3 0
Lab 4 1
Lab 5 0
5.3 Case 3: Intratracheal administration testing
This subclause deals with the intratracheal administration test [8]. The test is a relatively new in vivo
screening method for evaluating the pulmonary toxicity of nanomaterials.
The National Institute of Advanced Industrial Science and Technology (AIST) and five test laboratories
conducted a collaborative study and reported the results [9]. The study involved five laboratories,
each of which reported effects as positive (1) or negative (0) for 19 pathological findings
at three doses using five rats. This document deals with two pathological findings at one dose, which
show different patterns. Case 3(a) reports the results for the appearance of alveolar macrophages following
the administration of 0,13 mg/kg weight of a multi-wall carbon nanotube (MWCNT), shown in Table 7.
Case 3(b) reports the results for hyperplasia of type II pneumocytes following the administration of
0,13 mg/kg weight of a MWCNT, shown in Table 8.
NOTE 1 See Reference [9] for the report, in Japanese, on the collaborative study and its original raw data.
NOTE 2 Each laboratory originally reported the strength of effects as a five-level score: −, ±, +, ++ or +++. In
this document, these five-level scores are collapsed into two-level scores, 1 and 0. An original score of − is
treated as negative (0); any other score is treated as positive (1).
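The rule in NOTE 2 amounts to a simple mapping from the five-level score to a binary value. A minimal sketch follows; the function name `to_binary`, the acceptance of an ASCII hyphen as a stand-in for the minus sign, and the sample scores are assumptions made for illustration.

```python
# The five score levels named in NOTE 2; "-" (ASCII hyphen) is accepted
# as an alternative spelling of the minus sign.
NEGATIVE_SCORES = {"−", "-"}
POSITIVE_SCORES = {"±", "+", "++", "+++"}


def to_binary(score):
    """Collapse a five-level score into 0 (negative) or 1 (positive)."""
    if score in NEGATIVE_SCORES:
        return 0
    if score in POSITIVE_SCORES:
        return 1
    raise ValueError(f"unknown score: {score!r}")


# Hypothetical scores for the five rats of one laboratory.
scores = ["−", "±", "+", "++", "−"]
binary = [to_binary(s) for s in scores]
print(binary, "number of positives:", sum(binary))
```

Rejecting unknown scores instead of treating them as positive is a defensive choice in this sketch; under the rule as worded, anything other than − would count as positive.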
Table 7 — Case 3(a) — Results of a collaborative study of intratracheal
administration testing: appearance of alveolar macrophages
Laboratory    Number of rats reported in each category
              0    1
Lab A 0 5
Lab B 0 5
Lab C 0 5
Lab D 0 5
Lab E 0 5
Table 8 — Case 3(b) — Results of a collaborative study of intratracheal
administration testing: hyperplasia of type II pneumocytes
Laboratory    Number of rats reported in each category
              0    1
Lab A 0 5
Lab B 3 2
Lab C 3 2
Lab D 1 4
Lab E 3 2
5.4 Case 4: Histopathological classification of lung carcinoma
Reference [10] compared the diagnoses rendered by three pathologists for 75 cases of adenosquamous
carcinoma, a type of lung carcinoma. The comparison between two of the three pathologists is
shown in Table 9 as a confusion matrix. In this table, indices 0 and 1 mean grade II and
grade III, respectively, and the non-negative integer in each cell is the number of cases.
Table 9 — Case 4 — Comparison results between two pathologists
Pathologist 2
Pathologist 1 1 0
1 27 4
0 3 41
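The counts in Table 9 can be condensed into a kappa coefficient as defined in 3.1.14. The sketch below assumes that the chance term is the agreement expected from the row and column totals, as in the commonly used form of the coefficient; the exact expression used later in this document is not reproduced here, and the function name `kappa` is an assumption.

```python
def kappa(c11, c12, c21, c22):
    """Kappa coefficient of a 2x2 table (rows: rater 1, columns: rater 2)."""
    total = c11 + c12 + c21 + c22
    observed = (c11 + c22) / total  # CM-accuracy, i.e. observed agreement
    # Agreement expected by chance from the marginal totals.
    chance = ((c11 + c12) * (c11 + c21) + (c21 + c22) * (c12 + c22)) / total**2
    return (observed - chance) / (1 - chance)


# Counts from Table 9 (Case 4): pathologist 1 in rows, pathologist 2 in columns.
k = kappa(c11=27, c12=4, c21=3, c22=41)
print(f"kappa = {k:.4f}")
```

Kappa is symmetric in the two raters, which suits Case 4, where neither pathologist is treated as the reference.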
5.5 Case 5: Human cell line activation test (h-CLAT)-2
Reference [11] compared the detection results of the h-CLAT and another alternative for evaluating the
skin sensitization potential, the local lymph node assay (LLNA), using 117 chemicals. The comparison
results are shown in Table 10 as a confusion matrix. In this table, indices 1 and 0 mean that a chemical
was evaluated to have and not to have, respectively, the skin sensitization potential by each assay. The
non-negative integer in each cell stands for the number of chemicals.
Table 10 — Case 5 — Comparison results between LLNA and h-CLAT
h-CLAT
LLNA 1 0
1 7
...
TECHNICAL ISO/TR
REPORT 27877
First edition
Statistical analysis for evaluating the
precision of binary measurement
methods and their results
PROOF/ÉPREUVE
Reference number
ISO/TR 27877:2021(E)
©
ISO 2021
---------------------- Page: 1 ----------------------
ISO/TR 27877:2021(E)
COPYRIGHT PROTECTED DOCUMENT
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting
on the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address
below or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
ii PROOF/ÉPREUVE © ISO 2021 – All rights reserved
---------------------- Page: 2 ----------------------
ISO/TR 27877:2021(E)
Contents Page
Foreword .v
Introduction .vi
1 Scope . 1
2 Normative references . 1
3 Terms and definitions, and symbols . 1
3.1 Terms and definitions . 1
3.2 Symbols . 3
4 Overview . 4
5 Examples used in this document . 6
5.1 Case 1: Listeria monocytogenes . 6
5.2 Case 2: Human cell line activation test (h-CLAT)-1 . 6
5.3 Case 3: Intraracheal administration testing . 7
5.4 Case 4: Histopathological classification of lung carcinoma . 8
5.5 Case 5: Human cell line activation test (h-CLAT)-2 . 8
5.6 Case 6: Statistical model for predicting chemical toxicity . 8
6 Statistical analysis for evaluating the precision of binary measurement methods
and their results . 9
6.1 ISO 5725-based method. 9
6.1.1 Overview . 9
6.1.2 Case 1 .12
6.1.3 Case 2(a) .12
6.1.4 Case 2(b) .13
6.1.5 Case 3(a) .13
6.1.6 Case 3(b) .13
6.2 Accordance and concordance .13
6.2.1 Overview .13
6.2.2 Case 1 .15
6.2.3 Case 2(a) .15
6.2.4 Case 2(b) .16
6.2.5 Case 3(a) .16
6.2.6 Case 3(b) .16
6.3 ORDANOVA .16
6.3.1 Overview .16
6.3.2 Case 1 .18
6.3.3 Case 2(a) .18
6.3.4 Case 2(b) .19
6.3.5 Case 3(a) .19
6.3.6 Case 3(b) .19
6.4 CM-accuracy, sensitivity and specificity .19
6.4.1 Overview .19
6.4.2 Case 4 .20
6.4.3 Case 5 .20
6.4.4 Case 6 .20
6.5 Kappa coefficient .21
6.5.1 Overview .21
6.5.2 Case 4 .21
6.5.3 Case 5 .21
6.5.4 Case 6 .22
7 Remarks on the methods introduced in this document .22
7.1 Comparison between the mathematical expressions of the precision estimates .22
7.2 Comparison between the numerical examples of the precision estimates .23
© ISO 2021 – All rights reserved PROOF/ÉPREUVE iii
---------------------- Page: 3 ----------------------
ISO/TR 27877:2021(E)
Bibliography .25
iv PROOF/ÉPREUVE © ISO 2021 – All rights reserved
---------------------- Page: 4 ----------------------
ISO/TR 27877:2021(E)
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www .iso .org/ directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www .iso .org/ patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see www .iso .org/
iso/ foreword .html.
This document was prepared by Technical Committee ISO/TC 69, Application of statistical methods,
Subcommittee SC 6, Measurement methods and results.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www .iso .org/ members .html.
© ISO 2021 – All rights reserved PROOF/ÉPREUVE v
---------------------- Page: 5 ----------------------
ISO/TR 27877:2021(E)
Introduction
The documents in the ISO 5725 series define the precision of quantitative measurement methods
and their results, and assume that the errors follow normal distributions in their basic models. Also,
they provide how to run experiments to evaluate precision measures, such as repeatability and
reproducibility. Nowadays, there is also a demand for dealing with qualitative measurement methods
and their results, which output binary data, categorical data, etc. However, the ISO 5725 series is not
suitable mathematically for analyzing such data.
Several existing studies propose statistical methods for dealing with binary and/or categorical data,
but no guidance documents are available so far. Hence, this document summaries various methods to
evaluate the precision of binary measurement methods and their results, which are the most essential
and frequently used methods for qualitative data.
vi PROOF/ÉPREUVE © ISO 2021 – All rights reserved
---------------------- Page: 6 ----------------------
TECHNICAL REPORT ISO/TR 27877:2021(E)
Statistical analysis for evaluating the precision of binary
measurement methods and their results
1 Scope
This document introduces five statistical methods for evaluating the precision of binary measurement
methods and their results. The five methods can be divided into two types. Both types are based on
measured values provided by each laboratory participating in a collaborative study. In the first type,
each laboratory repeatedly measures a single sample. The samples measured by the laboratories are
nominally identical. The second type is an extension of the first type, where there are several levels of
samples.
For each statistical method, this document briefly summarizes its theory and explains how to estimate
the proposed precision measures. Some real cases are illustrated to help the readers understand the
evaluation procedures involved. For the first and second types of methods, five and three cases are
presented, respectively.
Finally, this document compares the five statistical methods.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 3534-1, Statistics — Vocabulary and symbols — Part 1: General statistical terms and terms used in
probability
ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results — Part 1: General
principles and definitions
3 Terms and definitions, and symbols
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 3534-1 and ISO 5725-1 and
the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https:// www .iso .org/ obp
— IEC Electropedia: available at https:// www .electropedia .org/
3.1.1
accordance
probability that two binary measured values be identical when they are taken from the same laboratory
Note 1 to entry: The concept corresponds to the definition of “repeatability” in ISO 5725 and was originally
proposed by Reference [3].
© ISO 2021 – All rights reserved PROOF/ÉPREUVE 1
---------------------- Page: 7 ----------------------
ISO/TR 27877:2021(E)
3.1.2
concordance
probability that two binary measured values be identical when they are taken from different
laboratories
Note 1 to entry: The concept corresponds to the definition of “reproducibility” in ISO 5725 and was originally
proposed by Reference [3].
3.1.3
ORDANOVA
statistical method for evaluating the precision of ordinal-scale measurement methods and their results
based on an ordinal dispersion measure
Note 1 to entry: The concept was originally proposed by Reference [4].
3.1.4
true positive
TP
correct measured value in positive results, that is, the case where both the measured and the correct
results are positive
3.1.5
true negative
TN
correct measured value in negative results, that is, the case where both the measured and the correct
results are negative
3.1.6
false positive
FP
incorrect measured value in positive results, that is, the case where the measured value is positive but
the correct one is negative
3.1.7
false negative
FN
incorrect measured value in negative results, that is, the case where the measured value is negative but
the correct one is positive
3.1.8
confusion matrix
2×2 matrix showing the numbers of true positives, true negatives, false positives and false negatives
3.1.9
CM-accuracy
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values
Note 1 to entry: CM-accuracy is identical to what the machine learning field calls accuracy; the term
CM-accuracy itself is not in general use. This document uses it to distinguish the machine learning sense of
accuracy from the term accuracy in ISO 5725. The prefix CM stands for confusion matrix, because CM-accuracy
can be calculated from a confusion matrix.
3.1.10
sensitivity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
positive measured values
3.1.11
specificity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
negative measured values
3.1.12
CM-precision
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values in positive measured values
Note 1 to entry: CM-precision is identical to what the machine learning field calls precision; the term
CM-precision itself is not in general use. This document uses it to distinguish the machine learning sense of
precision from the term precision in ISO 5725. The prefix CM stands for confusion matrix, because CM-precision
can be calculated from a confusion matrix.
3.1.13
F-measure
statistic for indicating the capability of two-class classifications, defined by the harmonic mean
of sensitivity and CM-precision
3.1.14
kappa coefficient
statistic for indicating the capability of two-class classifications, defined by the ratio of CM-accuracy
minus the probability of correct values occurring by chance to one minus the probability of correct
values occurring by chance
Note 1 to entry: The kappa coefficient is an extension of CM-accuracy, originally introduced by
Reference [15], that takes into account the possibility of correct values occurring by chance.
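As a hedged summary, the statistics defined in 3.1.9 to 3.1.14 can be written in terms of the counts TP, TN, FP and FN. The symbols $N$ (total count) and $p_e$ (chance term) are notation assumed here, not taken from this document, and $p_e$ is written in the usual marginal-product form for the kappa coefficient. The expressions give proportions; multiply by 100 to obtain percentages. Where the estimation clauses of this document differ, they take precedence.

```latex
\begin{align*}
N &= \mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN} \\
\text{CM-accuracy} &= \frac{\mathrm{TP} + \mathrm{TN}}{N} \\
\text{sensitivity} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \\
\text{specificity} &= \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \\
\text{CM-precision} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \\
\text{F-measure} &= \frac{2 \cdot \text{sensitivity} \cdot \text{CM-precision}}
                         {\text{sensitivity} + \text{CM-precision}} \\
p_e &= \frac{(\mathrm{TP} + \mathrm{FN})(\mathrm{TP} + \mathrm{FP})
           + (\mathrm{FP} + \mathrm{TN})(\mathrm{FN} + \mathrm{TN})}{N^2} \\
\kappa &= \frac{\text{CM-accuracy} - p_e}{1 - p_e}
\end{align*}
```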
3.2 Symbols
$L$                  number of laboratories participating in a collaborative study
$n$                  number of repetitions in each laboratory participating in a collaborative study
$i$                  suffix describing a laboratory, and $i \in \{1, \ldots, L\}$
$j$                  suffix describing a repetition, and $j \in \{1, \ldots, n\}$
$y_{ij}$             measured value of repetition $j$ of laboratory $i$
$x_i$                sum of the measured values $y_{ij}$ of laboratory $i$ for the case where $y_{ij} \in \{0, 1\}$, that is,
                     $x_i = \sum_{j=1}^{n} y_{ij} \quad (y_{ij} \in \{0, 1\})$
$\bar{y}_i$          arithmetic mean of $y_{ij}$ of laboratory $i$, that is,
                     $\bar{y}_i = (1/n) \sum_{j=1}^{n} y_{ij} \quad (i \in \{1, \ldots, L\})$
$\bar{y}$            overall arithmetic mean of $y_{ij}$, that is,
                     $\bar{y} = (1/(nL)) \sum_{i=1}^{L} \sum_{j=1}^{n} y_{ij}$
$n_i^{p}$            number of positive measured values of laboratory $i$
$n^{p}$              sum of $n_i^{p}$ with respect to $i \in \{1, \ldots, L\}$, that is, $n^{p} = \sum_{i=1}^{L} n_i^{p}$
$c_{ij}$             $(i, j)$-element of a confusion matrix
$m$                  general mean (expectation) in the basic model of ISO 5725-2
$B_i$                laboratory component of bias under repeatability conditions of laboratory $i$ in the basic model
                     of ISO 5725-2
$e_{ij}$             random error occurring in every measurement under repeatability conditions in the basic
                     model of ISO 5725-2
$\sigma_{ri}^2$      within-laboratory variance of laboratory $i$
$\sigma_r^2$         repeatability variance or within-laboratory variance
$\sigma_L^2$         between-laboratory variance
$\sigma_R^2$         reproducibility variance, that is, $\sigma_R^2 = \sigma_r^2 + \sigma_L^2$
$\sigma_{ri:W}^2$    within-laboratory variance of laboratory $i$ in the ISO 5725-based method
$\sigma_{r:W}^2$     repeatability variance in the ISO 5725-based method
$\sigma_{L:W}^2$     between-laboratory variance in the ISO 5725-based method
$\sigma_{R:W}^2$     reproducibility variance in the ISO 5725-based method, that is,
                     $\sigma_{R:W}^2 = \sigma_{r:W}^2 + \sigma_{L:W}^2$
$\sigma^2$           ordinal dispersion measure proposed by Reference [14] for the case of binary data assumed to
                     follow a binomial distribution
$\sigma_{ri:O}^2$    within-laboratory variance of laboratory $i$ in ORDANOVA
$\sigma_{r:O}^2$     repeatability variance in ORDANOVA
$\sigma_{L:O}^2$     between-laboratory variance in ORDANOVA
$\sigma_{R:O}^2$     reproducibility variance in ORDANOVA, that is, $\sigma_{R:O}^2 = \sigma_{r:O}^2 + \sigma_{L:O}^2$
$p_i$                probability of obtaining a measurement value $y_{ij} = 1$ of laboratory $i$
$\bar{p}$            arithmetic mean of $p_i$, that is, $\bar{p} = (1/L) \sum_{i=1}^{L} p_i$
$H_0$                null hypothesis of a statistical test
$\chi_0^2$           test statistic of a chi-squared test
$A$                  accordance of Reference [3] method
$C$                  concordance of Reference [3] method
$\hat{X}$            estimate of $X$, where $X$ stands for any statistic
4 Overview
This document deals with the following five methods.
a) ISO 5725-based method (proposed by Reference [13]);
b) accordance and concordance (proposed by Reference [3]);
c) ORDANOVA (proposed by Reference [4]);
d) CM-accuracy, sensitivity and specificity;
e) kappa coefficient (proposed by Reference [15]).
The assumed data structure depends on the method. In methods a), b) and c), each laboratory
measures one nominally identical sample multiple times, and the data can be summarized as in Table 1
or, equivalently, as in Table 2. Methods d) and e) are based on samples at many levels, and the data can
be summarized as in Table 3.
NOTE 1 Nowadays, the probability of detection (POD) approach is used to analyze binary measured
values instead of methods d) and e), but this document deals only with classical methods. Some ISO
documents introduce the POD approach; see References [13] and [15].
Table 1 — Data format for methods a), b) and c)
Laboratory   Rep 1       …   Rep j       …   Rep n
Lab 1        $y_{11}$    …   $y_{1j}$    …   $y_{1n}$
…
Lab i        $y_{i1}$    …   $y_{ij}$    …   $y_{in}$
…
Lab L        $y_{L1}$    …   $y_{Lj}$    …   $y_{Ln}$
NOTE $y_{ij}$ is either 0 or 1, which means a negative or a positive measured value, respectively.
Table 2 — Another expression of Table 1
Laboratory   Number of 1   Number of 0
Lab 1        $x_1$         $n - x_1$
…
Lab i        $x_i$         $n - x_i$
…
Lab L        $x_L$         $n - x_L$
NOTE $n$ is the number of repetitions in each laboratory, and $x_i \in \{0, 1, \ldots, n\}$.
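The relation between Table 1 and Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the report: the function name `summarize_counts` and the input values are hypothetical, chosen only to show that each row of Table 2 is (x_i, n − x_i) with x_i the row sum of Table 1.

```python
from typing import List, Tuple


def summarize_counts(y: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (number of 1s, number of 0s) per laboratory, i.e. (x_i, n - x_i)."""
    n = len(y[0])  # number of repetitions, assumed equal for all laboratories
    summary = []
    for row in y:
        if len(row) != n:
            raise ValueError("every laboratory must report n repetitions")
        if any(v not in (0, 1) for v in row):
            raise ValueError("measured values must be 0 or 1")
        x_i = sum(row)  # x_i is the sum of the binary values y_ij over j
        summary.append((x_i, n - x_i))
    return summary


# Illustrative values only (L = 3 laboratories, n = 5 repetitions).
example = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]
table2 = summarize_counts(example)
print(table2)  # [(5, 0), (3, 2), (1, 4)]
```

Because the repetitions within a laboratory are exchangeable for the methods considered here, the row sums are the only information Table 2 retains from Table 1.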
Table 3 — Data format for methods d) and e)
                 Measured values
Actual values    1           0
1                $c_{11}$    $c_{12}$
0                $c_{21}$    $c_{22}$
NOTE $c_{11}$, $c_{12}$, $c_{21}$, $c_{22}$ are non-negative integers.
5 Examples used in this document
5.1 Case 1: Listeria monocytogenes
This subclause deals with the results of a collaborative study on Listeria monocytogenes, which was
quoted and analyzed in Reference [13]. The study involved ten laboratories, each of which performed
five repeated measurements, that is, $L = 10$ and $n = 5$. The results are shown in Table 4. In this table,
the numbers 1 and 0 mean that Listeria monocytogenes was and was not detected, respectively, and the
column Total gives the number of detections of Listeria monocytogenes in each laboratory.
Table 4 — Case 1 — The results of a collaborative study on Listeria monocytogenes
Laboratory   Measured value (repetitions 1 to 5)   Total
Lab 1 1 1 1 1 1 5
Lab 2 1 1 1 1 1 5
Lab 3 1 1 1 1 1 5
Lab 4 1 1 1 1 1 5
Lab 5 0 0 1 1 1 3
Lab 6 1 1 1 1 1 5
Lab 7 0 0 1 1 1 3
Lab 8 1 1 1 1 1 5
Lab 9 1 1 1 1 1 5
Lab 10 1 1 1 1 1 5
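As a hedged illustration of definitions 3.1.1 and 3.1.2, the following Python sketch counts identical pairs in the Table 4 data directly. The function name `accordance_concordance` is hypothetical, and pair counting is only one natural reading of the two definitions; the estimators this document actually uses are given in its later clauses and take precedence if they differ. The sketch assumes at least two laboratories and at least two repetitions per laboratory.

```python
from itertools import combinations
from typing import List, Tuple


def accordance_concordance(y: List[List[int]]) -> Tuple[float, float]:
    """Pair-counting estimates of accordance and concordance.

    y[i][j] is the 0/1 measured value of repetition j in laboratory i.
    """
    same_within = pairs_within = 0
    same_between = pairs_between = 0
    # Label every value with its laboratory index, then visit each unordered pair once.
    labelled = [(i, v) for i, row in enumerate(y) for v in row]
    for (lab_a, v_a), (lab_b, v_b) in combinations(labelled, 2):
        if lab_a == lab_b:
            pairs_within += 1
            same_within += int(v_a == v_b)
        else:
            pairs_between += 1
            same_between += int(v_a == v_b)
    return same_within / pairs_within, same_between / pairs_between


# Table 4 (Case 1): ten laboratories, five repetitions each.
case1 = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]
totals = [sum(row) for row in case1]  # reproduces the Total column
accordance, concordance = accordance_concordance(case1)
print(totals)
print(round(accordance, 4), round(concordance, 4))  # 0.88 0.8471
```

With equal numbers of repetitions per laboratory, pooling the within-laboratory pairs as done here gives the same value as averaging the per-laboratory proportions.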
5.2 Case 2: Human cell line activation test (h-CLAT)-1
This subclause deals with the human cell line activation test (h-CLAT) [5][6]. The h-CLAT is an
alternative to animal experiments for evaluating skin sensitization potential.
Reference [7] conducted a collaborative study and reported the results. The study involved five
laboratories, each of which performed three repeated measurements. Each laboratory measured
21 chemicals, but this document deals with two of the 21, denoted by chemical A and chemical B,
which show different result patterns. Case 2(a) reports the results of chemical A, shown in
Table 5, and Case 2(b) reports the results of chemical B, shown in Table 6.
Table 5 — Case 2(a) — Number of detections of the skin sensitization potential of chemical A
by h-CLAT in three repeated measurements
Laboratory   Number of detections in three repetitions
Lab 1 3
Lab 2 3
Lab 3 1
Lab 4 3
Lab 5 3
Table 6 — Case 2(b) — Number of detections of the skin sensitization potential of chemical B
by h-CLAT in three repeated measurements
Laboratory   Number of detections in three repetitions
Lab 1 0
Lab 2 2
Lab 3 0
Lab 4 1
Lab 5 0
5.3 Case 3: Intratracheal administration testing
This subclause deals with the intratracheal administration test [8]. The test is a relatively new in vivo
screening method for evaluating the pulmonary toxicity of nanomaterials.
The National Institute of Advanced Industrial Science and Technology (AIST) and five test laboratories
conducted a collaborative study and reported the results [9]. The study consisted of five laboratories,
where each laboratory reported each effect as positive (1) or negative (0) for 19 pathological findings
at three doses using five rats. This document deals with two pathological findings at one dose, which
show different patterns. Case 3(a) reports the results for the appearance of alveolar macrophages following
the administration of 0,13 mg/kg weight of a multi-wall carbon nanotube (MWCNT), shown in Table 7.
Case 3(b) reports the results for the appearance of hyperplasia of type II pneumocytes following the
administration of 0,13 mg/kg weight of a MWCNT, shown in Table 8.
NOTE 1 See Reference [9] for the report, in Japanese, on the collaborative study and its original raw data.
NOTE 2 Each laboratory originally reported the strength of effects on a five-level scale: −, ±, +, ++ and +++.
In this document, the five-level scores are collapsed into two levels, 1 and 0. An original score of − is
treated as negative (0); any other score is treated as positive (1).
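The binarization rule of NOTE 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `to_binary` and the example scores are hypothetical, and the sketch accepts both the ASCII hyphen and the minus sign as the negative score because the character used in the raw data is not specified here.

```python
from typing import Iterable, List

# Assumption: the negative score may be typed as an ASCII hyphen or as a minus sign.
NEGATIVE_SCORES = {"-", "\u2212"}
VALID_SCORES = NEGATIVE_SCORES | {"±", "+", "++", "+++"}


def to_binary(scores: Iterable[str]) -> List[int]:
    """Map five-level scores to 0 (score is negative) or 1 (any other score)."""
    binary = []
    for score in scores:
        score = score.strip()
        if score not in VALID_SCORES:
            raise ValueError(f"unknown score: {score!r}")
        binary.append(0 if score in NEGATIVE_SCORES else 1)
    return binary


# Hypothetical scores for five rats in one laboratory.
print(to_binary(["-", "±", "+", "++", "+++"]))  # [0, 1, 1, 1, 1]
```

Note that the borderline score ± is counted as positive under this rule, so the binarized data cannot distinguish weak from strong effects.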
Table 7 — Case 3(a) — Results of a collaborative study of intratracheal
administration testing: appearance of alveolar macrophages
Laboratory   Number of rats reported in each category
             0   1
Lab A 0 5
Lab B 0 5
Lab C 0 5
Lab D 0 5
Lab E 0 5
Table 8 — Case 3(b) — Results of a collaborative study of intratracheal
administration testing: appearance of hyperplasia of type II pneumocytes
Laboratory   Number of rats reported in each category
             0   1
Lab A 0 5
Lab B 3 2
Lab C 3 2
Lab D 1 4
Lab E 3 2
5.4 Case 4: Histopathological classification of lung carcinoma
Reference [10] compared the diagnoses of 75 cases of adenosquamous carcinoma, a type of lung
carcinoma, rendered by three pathologists. The comparison between two of the three pathologists
is shown in Table 9 as a confusion matrix. In this table, indices 0 and 1 mean grade II and
grade III, respectively, and the non-negative integer in each cell is the number of cases.
Table 9 — Case 4 — Comparison results between two pathologists
                 Pathologist 2
Pathologist 1    1     0
1                27    4
0                3     41
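The statistics defined in 3.1.9 to 3.1.14 can be computed from the Table 9 counts as in the sketch below. It is illustrative only: the function name `confusion_statistics` is hypothetical, the cells follow the Table 3 layout (rows first, columns second), pathologist 1 is treated as the reference rating purely for the sake of the sensitivity, specificity and CM-precision values, and the chance term uses the usual marginal-product form of the kappa coefficient. Results are returned as proportions rather than percentages.

```python
from typing import Dict


def confusion_statistics(c11: int, c12: int, c21: int, c22: int) -> Dict[str, float]:
    """Statistics of a 2x2 confusion matrix laid out as in Table 3.

    c11: row 1 / column 1, c12: row 1 / column 0,
    c21: row 0 / column 1, c22: row 0 / column 0.
    """
    total = c11 + c12 + c21 + c22
    accuracy = (c11 + c22) / total
    sensitivity = c11 / (c11 + c12)
    specificity = c22 / (c21 + c22)
    precision = c11 / (c11 + c21)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)
    # Chance agreement from the products of the row and column marginal totals.
    chance = ((c11 + c12) * (c11 + c21) + (c21 + c22) * (c12 + c22)) / total**2
    kappa = (accuracy - chance) / (1 - chance)
    return {
        "cm_accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "cm_precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Table 9 (Case 4): rows are pathologist 1, columns are pathologist 2.
stats = confusion_statistics(27, 4, 3, 41)
print(round(stats["cm_accuracy"], 4), round(stats["kappa"], 4))  # 0.9067 0.8066
```

CM-accuracy and the kappa coefficient are symmetric in the two raters, so they do not depend on which pathologist is taken as the reference; the other four statistics do.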
5.5 Case 5: Human cell line activation test (h-CLAT)-2
Reference [11] compared the detection results of the h-CLAT and another alternative for evaluating the
skin sensitization potential, the local lymph node assay (LLNA), using 117 chemicals. The comparison
results are shown in Table 10 as a confusion matrix. In this table, indices 1 and 0 mean that a chemical
was evaluated to have and not to have, respectively, the skin sensitization potential by each assay. The
non-negative integer in each cell stands for the number of chemi
...