Statistical analysis for evaluating the precision of binary measurement methods and their results

This document introduces five statistical methods for evaluating the precision of binary measurement methods and their results. The five methods can be divided into two types. Both types are based on measured values provided by each laboratory participating in a collaborative study. In the first type, each laboratory repeatedly measures a single sample. The samples measured by the laboratories are nominally identical. The second type is an extension of the first type, where there are several levels of samples. For each statistical method, this document briefly summarizes its theory and explains how to estimate the proposed precision measures. Some real cases are illustrated to help the readers understand the evaluation procedures involved. For the first and second types of methods, five and three cases are presented, respectively. Finally, this document compares the five statistical methods.

Analyse statistique pour l'évaluation de la fidélité des méthodes de mesure binaire et de leurs résultats

General Information

Status
Published
Publication Date
06-Oct-2021
Current Stage
6060 - International Standard published
Start Date
07-Oct-2021
Due Date
26-Jun-2022
Completion Date
07-Oct-2021
Ref Project

Relations

Technical report
ISO/TR 27877:2021 - Statistical analysis for evaluating the precision of binary measurement methods and their results Released:10/7/2021
English language
26 pages

Standards Content (Sample)


TECHNICAL REPORT
ISO/TR 27877
First edition
2021-10
Statistical analysis for evaluating the precision of binary measurement methods and their results
Analyse statistique pour l'évaluation de la fidélité des méthodes de mesure binaire et de leurs résultats
Reference number
© ISO 2021
All rights reserved. Unless otherwise specified, or required in the context of its implementation, no part of this publication may
be reproduced or utilized otherwise in any form or by any means, electronic or mechanical, including photocopying, or posting on
the internet or an intranet, without prior written permission. Permission can be requested from either ISO at the address below
or ISO’s member body in the country of the requester.
ISO copyright office
CP 401 • Ch. de Blandonnet 8
CH-1214 Vernier, Geneva
Phone: +41 22 749 01 11
Email: copyright@iso.org
Website: www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions, and symbols
3.1 Terms and definitions
3.2 Symbols
4 Overview
5 Examples used in this document
5.1 Case 1: Listeria monocytogenes
5.2 Case 2: Human cell line activation test (h-CLAT)-1
5.3 Case 3: Intratracheal administration testing
5.4 Case 4: Histopathological classification of lung carcinoma
5.5 Case 5: Human cell line activation test (h-CLAT)-2
5.6 Case 6: Statistical model for predicting chemical toxicity
6 Statistical analysis for evaluating the precision of binary measurement methods and their results
6.1 ISO 5725-based method
6.1.1 Overview
6.1.2 Case 1
6.1.3 Case 2(a)
6.1.4 Case 2(b)
6.1.5 Case 3(a)
6.1.6 Case 3(b)
6.2 Accordance and concordance
6.2.1 Overview
6.2.2 Case 1
6.2.3 Case 2(a)
6.2.4 Case 2(b)
6.2.5 Case 3(a)
6.2.6 Case 3(b)
6.3 ORDANOVA
6.3.1 Overview
6.3.2 Case 1
6.3.3 Case 2(a)
6.3.4 Case 2(b)
6.3.5 Case 3(a)
6.3.6 Case 3(b)
6.4 CM-accuracy, sensitivity and specificity
6.4.1 Overview
6.4.2 Case 4
6.4.3 Case 5
6.4.4 Case 6
6.5 Kappa coefficient
6.5.1 Overview
6.5.2 Case 4
6.5.3 Case 5
6.5.4 Case 6
7 Remarks on the methods introduced in this document
7.1 Comparison between the mathematical expressions of the precision estimates
7.2 Comparison between the numerical examples of the precision estimates
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards
bodies (ISO member bodies). The work of preparing International Standards is normally carried out
through ISO technical committees. Each member body interested in a subject for which a technical
committee has been established has the right to be represented on that committee. International
organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of
electrotechnical standardization.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular, the different approval criteria needed for the
different types of ISO documents should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject of
patent rights. ISO shall not be held responsible for identifying any or all such patent rights. Details of
any patent rights identified during the development of the document will be in the Introduction and/or
on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation of the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to
the World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT), see
www.iso.org/iso/foreword.html.
This document was prepared by Technical Committee ISO/TC 69, Application of statistical methods,
Subcommittee SC 6, Measurement methods and results.
Any feedback or questions on this document should be directed to the user’s national standards body. A
complete listing of these bodies can be found at www.iso.org/members.html.
Introduction
The documents in the ISO 5725 series define the precision of quantitative measurement methods
and their results, and assume in their basic models that the errors follow normal distributions. They
also describe how to run experiments to evaluate precision measures such as repeatability and
reproducibility. Nowadays, there is also a demand for dealing with qualitative measurement methods
and their results, which output binary data, categorical data, etc. However, the ISO 5725 series is not
mathematically suitable for analyzing such data.
Several existing studies propose statistical methods for dealing with binary and/or categorical data,
but no guidance documents are available so far. Hence, this document summarizes various methods to
evaluate the precision of binary measurement methods and their results, which are the most essential
and frequently used methods for qualitative data.
TECHNICAL REPORT ISO/TR 27877:2021(E)
Statistical analysis for evaluating the precision of binary
measurement methods and their results
1 Scope
This document introduces five statistical methods for evaluating the precision of binary measurement
methods and their results. The five methods can be divided into two types. Both types are based on
measured values provided by each laboratory participating in a collaborative study. In the first type,
each laboratory repeatedly measures a single sample. The samples measured by the laboratories are
nominally identical. The second type is an extension of the first type, where there are several levels of
samples.
For each statistical method, this document briefly summarizes its theory and explains how to estimate
the proposed precision measures. Some real cases are illustrated to help the readers understand the
evaluation procedures involved. For the first and second types of methods, five and three cases are
presented, respectively.
Finally, this document compares the five statistical methods.
2 Normative references
The following documents are referred to in the text in such a way that some or all of their content
constitutes requirements of this document. For dated references, only the edition cited applies. For
undated references, the latest edition of the referenced document (including any amendments) applies.
ISO 3534-1, Statistics — Vocabulary and symbols — Part 1: General statistical terms and terms used in
probability
ISO 5725-1, Accuracy (trueness and precision) of measurement methods and results — Part 1: General
principles and definitions
3 Terms and definitions, and symbols
3.1 Terms and definitions
For the purposes of this document, the terms and definitions given in ISO 3534-1 and ISO 5725-1 and
the following apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— ISO Online browsing platform: available at https://www.iso.org/obp
— IEC Electropedia: available at https://www.electropedia.org/
3.1.1
accordance
probability that two binary measured values be identical when they are taken from the same laboratory
Note 1 to entry: The concept corresponds to the definition of “repeatability” in ISO 5725 and was originally
proposed by Reference [3].
3.1.2
concordance
probability that two binary measured values be identical when they are taken from different
laboratories
Note 1 to entry: The concept corresponds to the definition of “reproducibility” in ISO 5725 and was originally
proposed by Reference [3].
3.1.3
ORDANOVA
statistical method for evaluating the precision of ordinal-scale measurement methods and their results
based on an ordinal dispersion measure
Note 1 to entry: The concept was originally proposed by Reference [4].
3.1.4
true positive
TP
correct measured value in positive results, that is, the case where both the measured and the correct
results are positive
3.1.5
true negative
TN
correct measured value in negative results, that is, the case where both the measured and the correct
results are negative
3.1.6
false positive
FP
incorrect measured value in positive results, that is, the case where the measured value is positive but
the correct one is negative
3.1.7
false negative
FN
incorrect measured value in negative results, that is, the case where the measured value is negative but
the correct one is positive
3.1.8
confusion matrix
2×2 matrix showing the numbers of true positives, true negatives, false positives and false negatives
3.1.9
CM-accuracy
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values
Note 1 to entry: The term CM-accuracy is identical to the term accuracy in the machine learning field, and is not
generally used. However, this document uses CM-accuracy instead of accuracy in the machine learning field to
distinguish between the term accuracy in ISO 5725 and the term in the machine learning field. In this document,
the prefix CM stands for confusion matrixes because CM-accuracy can be calculated based on those.
3.1.10
sensitivity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
positive measured values
3.1.11
specificity
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
negative measured values
3.1.12
CM-precision
statistic for indicating the capability of two-class classifications, defined by the percentage of correct
measured values in positive measured values
Note 1 to entry: The term CM-precision is identical to the term precision in the machine learning field, and is not
generally used. However, this document uses CM-precision instead of precision in the machine learning field to
distinguish between the term precision in ISO 5725 and the term in the machine learning field. In this document,
the prefix CM stands for confusion matrixes because CM-precision can be calculated based on those.
3.1.13
F-measure
statistic for indicating the capability of two-class classifications, defined by the harmonic mean
between sensitivity and CM-precision
3.1.14
kappa coefficient
statistic for indicating the capability of two-class classifications, defined by the ratio of CM-accuracy
minus the possibility of the correctness occurring by chance to one minus the possibility of the
correctness occurring by chance
Note 1 to entry: The kappa coefficient is an extended statistic of CM-accuracy, originally
introduced by Reference [15], which takes into account the possibility of the correctness occurring by chance.
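The definitions in 3.1.4 to 3.1.14 can be sketched in code as follows. This is a minimal illustration, not part of the technical report: the class name `ConfusionMatrix`, its field names and the counts used are hypothetical. Two readings are assumed here: sensitivity and specificity are taken in the usual sense TP/(TP+FN) and TN/(TN+FP), and the chance term of the kappa coefficient is computed from the products of the row and column totals, which the definition in 3.1.14 does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 confusion matrix (hypothetical helper, not defined in the report)."""
    tp: int  # measured positive, correct result positive
    fn: int  # measured negative, correct result positive
    fp: int  # measured positive, correct result negative
    tn: int  # measured negative, correct result negative

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def cm_accuracy(self) -> float:
        # Share of correct measured values among all measured values.
        return (self.tp + self.tn) / self.total

    def sensitivity(self) -> float:
        # Assumed reading: correct positives among all truly positive items.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Assumed reading: correct negatives among all truly negative items.
        return self.tn / (self.tn + self.fp)

    def cm_precision(self) -> float:
        # Correct measured values among the positive measured values.
        return self.tp / (self.tp + self.fp)

    def f_measure(self) -> float:
        # Harmonic mean of sensitivity and CM-precision.
        s, p = self.sensitivity(), self.cm_precision()
        return 2 * s * p / (s + p)

    def kappa(self) -> float:
        # Assumed chance term: agreement expected from the marginal totals.
        n = self.total
        chance = ((self.tp + self.fn) * (self.tp + self.fp)
                  + (self.tn + self.fp) * (self.tn + self.fn)) / n ** 2
        return (self.cm_accuracy() - chance) / (1 - chance)


# Hypothetical counts, chosen only to exercise the formulas.
cm = ConfusionMatrix(tp=40, fn=10, fp=5, tn=45)
print(cm.cm_accuracy(), cm.sensitivity(), cm.specificity())
print(cm.cm_precision(), cm.f_measure(), cm.kappa())
```

The sketch does not guard against empty rows or columns; with no positive or no negative items some of the ratios are undefined and the corresponding method raises `ZeroDivisionError`.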
3.2 Symbols
$L$  number of laboratories participating in a collaborative study
$n$  number of repetitions in each laboratory participating in a collaborative study
$i$  suffix describing a laboratory, and $i \in \{1, \ldots, L\}$
$j$  suffix describing a repetition, and $j \in \{1, \ldots, n\}$
$y_{ij}$  measured value of repetition $j$ of laboratory $i$
$x_i$  sum of the measured values $y_{ij}$ of laboratory $i$ for the case where $y_{ij} \in \{0, 1\}$, that is, $x_i = \sum_{j=1}^{n} y_{ij}$ $(y_{ij} \in \{0, 1\})$
$\bar{y}_i$  arithmetic mean of $y_{ij}$ of laboratory $i$, that is, $\bar{y}_i = (1/n) \sum_{j=1}^{n} y_{ij}$ $(i \in \{1, \ldots, L\})$
$\bar{\bar{y}}$  overall arithmetic mean of $y_{ij}$, that is, $\bar{\bar{y}} = (1/(nL)) \sum_{i=1}^{L} \sum_{j=1}^{n} y_{ij}$
$n_i^p$  number of positive measured values of laboratory $i$
$n^p$  sum of $n_i^p$ with respect to $i \in \{1, \ldots, L\}$, that is, $n^p = \sum_{i=1}^{L} n_i^p$
$c_{ij}$  $(i, j)$-element of a confusion matrix
$m$  general mean (expectation) in the basic model of ISO 5725-2
$B_i$  laboratory component of bias under repeatability conditions of laboratory $i$ in the basic model of ISO 5725-2
$e_{ij}$  random error occurring in every measurement under repeatability conditions in the basic model of ISO 5725-2
$\sigma_{ri}^2$  within-laboratory variance of laboratory $i$
$\sigma_r^2$  repeatability variance or within-laboratory variance
$\sigma_L^2$  between-laboratory variance
$\sigma_R^2$  reproducibility variance, that is, $\sigma_R^2 = \sigma_r^2 + \sigma_L^2$
$\sigma_{ri:W}^2$  within-laboratory variance of laboratory $i$ in the ISO 5725-based method
$\sigma_{r:W}^2$  repeatability variance in the ISO 5725-based method
$\sigma_{L:W}^2$  between-laboratory variance in the ISO 5725-based method
$\sigma_{R:W}^2$  reproducibility variance in the ISO 5725-based method, that is, $\sigma_{R:W}^2 = \sigma_{r:W}^2 + \sigma_{L:W}^2$
$\sigma$  ordinal dispersion measure proposed by Reference [14] for the case of binary data assumed to follow a binomial distribution
$\sigma_{ri:O}^2$  within-laboratory variance of laboratory $i$ in ORDANOVA
$\sigma_{r:O}^2$  repeatability variance in ORDANOVA
$\sigma_{L:O}^2$  between-laboratory variance in ORDANOVA
$\sigma_{R:O}^2$  reproducibility variance in ORDANOVA, that is, $\sigma_{R:O}^2 = \sigma_{r:O}^2 + \sigma_{L:O}^2$
$p_i$  probability of obtaining a measured value $y_{ij} = 1$ of laboratory $i$
$p$  arithmetic mean of $p_i$, that is, $p = (1/L) \sum_{i=1}^{L} p_i$
$H_0$  null hypothesis of a statistical test
$\chi^2$  test statistic of a chi-squared test
$A$  accordance of Reference [3] method
$C$  concordance of Reference [3] method
$\hat{X}$  estimate of $X$
4 Overview
This document deals with the following five methods.
a) ISO 5725-based method (proposed by Reference [13]);
b) accordance and concordance (proposed by Reference [3]);
c) ORDANOVA (proposed by Reference [4]);
d) CM-accuracy, sensitivity and specificity;
e) Kappa coefficient (proposed by Reference [15]).
The assumed data structure depends on the method. In methods a), b) and c), each laboratory
measures one identical sample multiple times, and the data can be summarized as in Table 1 and/or
Table 2. Methods d) and e) are based on many levels of samples, and the data can be summarized
as in Table 3.
NOTE Nowadays, the probability for detection (POD) approach is used to analyze binary measured values
instead of methods d) and e), but this document deals only with classical methods. Some ISO
documents introduce the POD approach; see References [1] and [2].
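Table 1 — Data format for methods a), b) and c)
| Laboratory | Rep 1 | … | Rep $j$ | … | Rep $n$ |
|---|---|---|---|---|---|
| Lab 1 | $y_{11}$ | … | $y_{1j}$ | … | $y_{1n}$ |
| ⋮ | ⋮ | | ⋮ | | ⋮ |
| Lab $i$ | $y_{i1}$ | … | $y_{ij}$ | … | $y_{in}$ |
| ⋮ | ⋮ | | ⋮ | | ⋮ |
| Lab $L$ | $y_{L1}$ | … | $y_{Lj}$ | … | $y_{Ln}$ |
NOTE $y_{ij}$ is either 0 or 1, which means negative and positive measured values, respectively.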
Table 2 — Another expression of Table 1
| Laboratory | Number of 1 | Number of 0 |
|---|---|---|
| Lab 1 | $x_1$ | $n - x_1$ |
| ⋮ | ⋮ | ⋮ |
| Lab $i$ | $x_i$ | $n - x_i$ |
| ⋮ | ⋮ | ⋮ |
| Lab $L$ | $x_L$ | $n - x_L$ |
NOTE $n$ is the number of repetitions in each laboratory, and $x_i \in \{0, 1, \ldots, n\}$.
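Going from the layout of Table 1 to that of Table 2 is a row-wise count. The following sketch shows one way to do it; the function name `summarize_counts` and the example values are hypothetical and are not taken from the report.

```python
from typing import List, Sequence, Tuple


def summarize_counts(y: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Return (number of 1, number of 0) for each laboratory.

    `y` holds one row per laboratory and one 0/1 entry per repetition,
    i.e. the layout of Table 1.
    """
    if not y:
        raise ValueError("at least one laboratory is required")
    n = len(y[0])
    counts = []
    for row in y:
        if len(row) != n:
            raise ValueError("every laboratory needs the same number of repetitions")
        if any(v not in (0, 1) for v in row):
            raise ValueError("measured values must be 0 or 1")
        x_i = sum(row)               # number of positive measured values
        counts.append((x_i, n - x_i))
    return counts


# Hypothetical study with L = 3 laboratories and n = 4 repetitions.
example = [
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
table2 = summarize_counts(example)
print(table2)
```

The counts lose the order of the repetitions, which is acceptable for methods a), b) and c) as described here because they use only the per-laboratory totals.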
Table 3 — Data format for methods d) and e)
| Actual values | Measured value 1 | Measured value 0 |
|---|---|---|
| 1 | $c_{11}$ | $c_{12}$ |
| 0 | $c_{21}$ | $c_{22}$ |
NOTE $c_{11}$, $c_{12}$, $c_{21}$, $c_{22}$ are non-negative integers.
5 Examples used in this document
5.1 Case 1: Listeria monocytogenes
This subclause deals with the results of a collaborative study on Listeria monocytogenes, which was
quoted and analyzed in Reference [13]. The study consisted of ten laboratories, each of which
performed five repeated measurements, that is, $L = 10$ and $n = 5$. The results are shown in Table 4. In this table,
the numbers 1 and 0 mean that Listeria monocytogenes was and was not detected, respectively, and the
column Total indicates the total number of detections of Listeria monocytogenes.
Table 4 — Case 1 — The results of a collaborative study on Listeria monocytogenes
| Laboratory | Measured value (repetitions) | Total |
|---|---|---|
| Lab 1 | 1 1 1 1 1 | 5 |
| Lab 2 | 1 1 1 1 1 | 5 |
| Lab 3 | 1 1 1 1 1 | 5 |
| Lab 4 | 1 1 1 1 1 | 5 |
| Lab 5 | 0 0 1 1 1 | 3 |
| Lab 6 | 1 1 1 1 1 | 5 |
| Lab 7 | 0 0 1 1 1 | 3 |
| Lab 8 | 1 1 1 1 1 | 5 |
| Lab 9 | 1 1 1 1 1 | 5 |
| Lab 10 | 1 1 1 1 1 | 5 |
5.2 Case 2: Human cell line activation test (h-CLAT)-1
This subclause deals with the human cell line activation test (h-CLAT)[5][6]. The h-CLAT is an alternative
to an animal experiment for evaluating the skin sensitization potential.
Reference [7] conducted a collaborative study and reported the results. The study consisted of five
laboratories, each of which performed three repeated measurements. Each laboratory measured
21 chemicals, but this document deals with two of the 21, denoted chemical A and
chemical B, whose results show different patterns. Case 2(a) reports the results of chemical A, shown in
Table 5, and Case 2(b) reports the results of chemical B, shown in Table 6.
Table 5 — Case 2(a) — Number of detections of the skin sensitization potential of chemical A by h-CLAT in three-time measurements
| Laboratory | Number of detections in three repetitions |
|---|---|
| Lab 1 | 3 |
| Lab 2 | 3 |
| Lab 3 | 1 |
| Lab 4 | 3 |
| Lab 5 | 3 |
Table 6 — Case 2(b) — Number of detections of the skin sensitization potential of chemical B by h-CLAT in three-time measurements
| Laboratory | Number of detections in three repetitions |
|---|---|
| Lab 1 | 0 |
| Lab 2 | 2 |
| Lab 3 | 0 |
| Lab 4 | 1 |
| Lab 5 | 0 |
5.3 Case 3: Intratracheal administration testing
This subclause deals with the intratracheal administration test[8]. The test is a relatively new in vivo
screening method for evaluating the pulmonary toxicity of nanomaterials.
The National Institute of Advanced Industrial Science and Technology (AIST) and five test laboratories
conducted a collaborative study and reported the results[9]. The study consisted of five laboratories,
where each laboratory reported positives (1) or negatives (0) of effects for 19 pathological findings
at three doses using five rats. This document deals with two pathological findings at one dose, which
have different patterns. Case 3(a) reports the results of appearance of alveolar macrophages following
the administration of 0,13 mg/kg weight of a multi-wall carbon nanotube (MWCNT), shown in Table 7.
Case 3(b) reports the results of hyperplasia of type II pneumocystis following the administration of
0,13 mg/kg weight of a MWCNT, shown in Table 8.
NOTE 1 See Reference [9] for the report, in Japanese, on the collaborative study and its original raw data.
NOTE 2 Each laboratory originally reported the strength of effects with five-level scores, −, ±, +, ++ and +++. In
this document, these five-level scores are gathered to two-level scores, 1 and 0. When the original score was −, it
is treated as negative (0); otherwise, the score is treated as positive (1).
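The reduction described in NOTE 2 is a simple lookup. The sketch below is illustrative only: the function name `to_binary` and the example scores are hypothetical, and accepting both the typographic minus sign and the ASCII hyphen for the negative score is an assumption made for convenience.

```python
# Five-level scores as listed in NOTE 2; "−" (U+2212) and "-" are both
# accepted for the negative score (an assumption, for convenience only).
NEGATIVE_SCORES = {"−", "-"}
POSITIVE_SCORES = {"±", "+", "++", "+++"}


def to_binary(score: str) -> int:
    """Map a five-level score to 0 (negative) or 1 (positive)."""
    if score in NEGATIVE_SCORES:
        return 0
    if score in POSITIVE_SCORES:
        return 1
    raise ValueError(f"unknown score: {score!r}")


# Hypothetical scores for the five rats of one laboratory.
scores = ["−", "±", "+", "−", "+++"]
binary = [to_binary(s) for s in scores]
positives = sum(binary)  # value entered in column 1 of Table 7 or Table 8
print(binary, positives)
```

Note that the borderline score ± counts as positive under this rule, so the binary results are sensitive to how laboratories use that score.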
Table 7 — Case 3(a) — Results of a collaborative study of an intratracheal administration testing with appearance of alveolar macrophages
| Laboratory | Number of rats reported in category 0 | Number of rats reported in category 1 |
|---|---|---|
| Lab A | 0 | 5 |
| Lab B | 0 | 5 |
| Lab C | 0 | 5 |
| Lab D | 0 | 5 |
| Lab E | 0 | 5 |
Table 8 — Case 3(b) — Results of a collaborative study of an intratracheal administration testing with hyperplasia of type II pneumocystis
| Laboratory | Number of rats reported in category 0 | Number of rats reported in category 1 |
|---|---|---|
| Lab A | 0 | 5 |
| Lab B | 3 | 2 |
| Lab C | 3 | 2 |
| Lab D | 1 | 4 |
| Lab E | 3 | 2 |
5.4 Case 4: Histopathological classification of lung carcinoma
Reference [10] compared 75-case diagnosis results of adenosquamous carcinoma, a type of lung
carcinoma, rendered by three pathologists. The comparison results between two pathologists out of
the three are shown in Table 9 as a confusion matrix. In this table, indices 0 and 1 mean grade II and
grade III, respectively, and the non-negative integer in each cell stands for the number of cases.
Table 9 — Case 4 — Comparison results between two pathologists
Pathologist 2
Pathologist 1 1 0
1 27 4
0 3 41
5.5 Case 5: Human cell line activation test (h-CLAT)-2
Reference [11] compared the detection results of the h-CLAT and another alternative for evaluating the
skin sensitization potential, the local lymph node assay (LLNA), using 117 chemicals. The comparison
results are shown in Table 10 as a confusion matrix. In this table, indices 1 and 0 mean that a chemical
was evaluated to have and not to have, respectively, the skin sensitization potential by each assay. The
non-negative integer in each cell stands for the number of chemicals.
Table 10 — Case 5 — Comparison results between LLNA and h-CLAT
h-CLAT
LLNA 1 0
1 75 10
0 8 24
5.6 Case 6: Statistical model for predicting chemical toxicity
Some methods can also be used for quantifying the prediction accuracy of statistical and machine learning
models. Reference [12] developed a statistical model for predicting increased serum ALT levels in
rats, which is one of the widely used hepatotoxicity markers. The comparison results between the
observed toxicity in rats and the toxicity predicted by the model are shown in Table 11 as a confusion
matrix. In this table, indices 1 and 0 mean that a chemical was evaluated to have and not to have
hepatotoxicity, respectively. The non-negative integer in each cell stands for the number of chemicals.
Table 11 — Case 6 — Comparison results between observed and predicted results
on increased serum ALT levels in rats
Predicted values
Observed values 1 0
1 18 5
0 39 114
6 Statistical analysis for evaluating the precision of binary measurement
methods and their results
6.1 ISO 5725-based method
6.1.1 Overview
This method was originally introduced by Reference [13]. The study treated positive and negative
measured values as integers one and zero, respectively, and directly applied the ISO 5725-2 approach to
binary measured values.
The basic model of ISO 5725-2 is
$$y_{ij} = m + B_i + e_{ij}, \tag{1}$$
where $y_{ij}$, $m$ and $B_i$ are, respectively, the measured value of repetition $j \in \{1, \ldots, n\}$ in laboratory
$i \in \{1, \ldots, L\}$, the general mean, and the laboratory component of variation in laboratory $i$. For any
$i \in \{1, \ldots, L\}$, it assumes that the expectation of $B_i$ is zero, $E(B_i) = 0$, with variance $V(B_i) = \sigma_L^2$, and that the
variances of the random error $e_{ij}$ are identical among all laboratories, $V(e_{ij}) = \sigma_{ri}^2 = \sigma_r^2$. To estimate
statistically the repeatability, between-laboratory and reproducibility variances, a one-way analysis of
variance (one-way ANOVA) (random effects model) is performed. From Table 12, these variances are estimated as follows:
$$\hat{\sigma}_r^2 = s_I^2, \tag{2}$$
$$\hat{\sigma}_L^2 = \frac{s_{II}^2 - s_I^2}{n}, \tag{3}$$
and
$$\hat{\sigma}_R^2 = \hat{\sigma}_r^2 + \hat{\sigma}_L^2. \tag{4}$$
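Table 12 — ANOVA table for ISO 5725-2
| Source | Sum of squares (SQ) | Degree of freedom (df) | Mean square (MS) (=SQ/df) | Expected MS, E(MS) |
|---|---|---|---|---|
| Between lab. | $n \sum_{i=1}^{L} (\bar{y}_i - \bar{\bar{y}})^2$ | $L - 1$ | $s_{II}^2$ | $n\sigma_L^2 + \sigma_r^2$ |
| Within lab. | $\sum_{i=1}^{L} \sum_{j=1}^{n} (y_{ij} - \bar{y}_i)^2$ | $L(n - 1)$ | $s_I^2$ | $\sigma_r^2$ |
| Total | $\sum_{i=1}^{L} \sum_{j=1}^{n} (y_{ij} - \bar{\bar{y}})^2$ | $Ln - 1$ | | |
$\bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij}$ $(i \in \{1, \ldots, L\})$, that is, the arithmetic mean of $y_{ij}$ for laboratory $i$;
$\bar{\bar{y}} = \frac{1}{nL} \sum_{i=1}^{L} \sum_{j=1}^{n} y_{ij} = \frac{1}{L} \sum_{i=1}^{L} \bar{y}_i$, that is, the overall arithmetic mean of $y_{ij}$.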
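Formulae (2) to (4) and Table 12 can be sketched as a short routine for a balanced design. This is an illustration under stated assumptions rather than a prescribed implementation: the function name `anova_precision` and the dictionary keys are hypothetical, and a negative between-laboratory estimate is returned as computed instead of being adjusted. The raw 0/1 values of Table 4 (Case 1) are used as input, treated as ordinary numbers.

```python
from typing import Dict, Sequence


def anova_precision(y: Sequence[Sequence[float]]) -> Dict[str, float]:
    """One-way random-effects ANOVA estimates for a balanced L x n layout."""
    L, n = len(y), len(y[0])
    if L < 2 or n < 2 or any(len(row) != n for row in y):
        raise ValueError("need a balanced design with L >= 2 and n >= 2")
    lab_means = [sum(row) / n for row in y]
    grand_mean = sum(lab_means) / L
    # Sums of squares as in Table 12.
    sq_between = n * sum((m - grand_mean) ** 2 for m in lab_means)
    sq_within = sum((v - m) ** 2 for row, m in zip(y, lab_means) for v in row)
    s2_between = sq_between / (L - 1)        # s_II^2
    s2_within = sq_within / (L * (n - 1))    # s_I^2
    var_r = s2_within                        # Formula (2)
    var_L = (s2_between - s2_within) / n     # Formula (3); may be negative
    return {
        "s2_between": s2_between,
        "s2_within": s2_within,
        "var_r": var_r,
        "var_L": var_L,
        "var_R": var_r + var_L,              # Formula (4)
    }


# Raw values of Table 4 (Case 1), one row per laboratory.
full, partial = [1, 1, 1, 1, 1], [0, 0, 1, 1, 1]
table4 = [full, full, full, full, partial, full, partial, full, full, full]
estimates = anova_precision(table4)
print({k: round(v, 4) for k, v in estimates.items()})
```

Because the routine works on the raw values, it applies equally to quantitative data; for binary data the same mean squares can be written directly in terms of the detection proportions of each laboratory.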
The basic model of Reference [13] is
$$y_{ij} = p + (p_i - p) + e_{ij}, \tag{5}$$
where $y_{ij}$ is the measured value, described as 0 (negative) or 1 (positive), for repetition $j \in \{1, \ldots, n\}$ in
laboratory $i \in \{1, \ldots, L\}$, and $p_i$ and $p$ are the probability of obtaining a measured value $y_{ij} = 1$ in
laboratory $i$ and its expectation, respectively.
Under the independence assumption of ISO 5725-2, $y_{ij}$ $(j = 1, \ldots, n)$ in laboratory $i$ follows a Bernoulli
distribution with parameter $p_i$; thus, the within-laboratory variance of laboratory $i$ is
$$\sigma_{ri:W}^2 = p_i (1 - p_i), \tag{6}$$
and the (whole) repeatability variance is defined as the average of $\sigma_{ri:W}^2$,
$$\sigma_{r:W}^2 = \frac{1}{L} \sum_{i=1}^{L} \sigma_{ri:W}^2 = \frac{1}{L} \sum_{i=1}^{L} p_i (1 - p_i) = p - \frac{1}{L} \sum_{i=1}^{L} p_i^2. \tag{7}$$
The between-laboratory variance in the population of $L$ laboratories is defined as a classical variance
of $p_i$, that is,
$$\sigma_{L:W}^2 = \frac{1}{L - 1} \sum_{i=1}^{L} (p_i - p)^2, \tag{8}$$
and then the reproducibility variance is defined in the same way as in ISO 5725. In other words,
$$\sigma_{R:W}^2 = \sigma_{r:W}^2 + \sigma_{L:W}^2. \tag{9}$$
NOTE 1 Let $x_i = \sum_{j=1}^{n} y_{ij}$ be the number of positive measured values, then $x_i$ follows a binomial distribution
with parameters $n$ and $p_i$.
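From the definition of $p_i$ and $p$, their estimates are calculated as
$$\hat{p}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij} \quad \text{and} \quad \hat{p} = \frac{1}{L} \sum_{i=1}^{L} \hat{p}_i = \frac{1}{nL} \sum_{i=1}^{L} \sum_{j=1}^{n} y_{ij}, \tag{10}$$
respectively. In a similar way to ISO 5725-2, a modified one-way ANOVA is performed to estimate
statistically the repeatability, between-laboratory and reproducibility variances. From Table 13, these
variances are estimated as follows:
$$\hat{\sigma}_{r:W}^2 = s_I^2 = \frac{n}{L(n - 1)} \sum_{i=1}^{L} \hat{p}_i (1 - \hat{p}_i), \tag{11}$$
$$\hat{\sigma}_{L:W}^2 = \frac{s_{II}^2 - s_I^2}{n} = \frac{1}{L - 1} \sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2 - \frac{1}{L(n - 1)} \sum_{i=1}^{L} \hat{p}_i (1 - \hat{p}_i) = \frac{1}{L - 1} \sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2 - \frac{1}{n} \hat{\sigma}_{r:W}^2, \tag{12}$$
and
$$\hat{\sigma}_{R:W}^2 = \hat{\sigma}_{r:W}^2 + \hat{\sigma}_{L:W}^2. \tag{13}$$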
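Table 13 — ANOVA table for the Reference [13] method
| Source | Sum of squares (SQ) | Degree of freedom (df) | Mean square (MS) (=SQ/df) | Expected MS, E(MS) |
|---|---|---|---|---|
| Between labs. | $n \sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2$ | $L - 1$ | $s_{II}^2$ | $n\sigma_{L:W}^2 + \sigma_{r:W}^2$ |
| Within labs. | $n \sum_{i=1}^{L} \hat{p}_i (1 - \hat{p}_i)$ | $L(n - 1)$ | $s_I^2$ | $\sigma_{r:W}^2$ |
| Total | $Ln\hat{p}(1 - \hat{p})$ | $nL - 1$ | | |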
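Formulae (10) to (13) need only the number of positive measured values per laboratory. The sketch below applies them to the counts of Tables 4 to 8; the function name `binary_precision` and the dictionary `CASES` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def binary_precision(x: Sequence[int], n: int) -> Tuple[float, float, float, float]:
    """Return (p_hat, var_r, var_L, var_R) from positive counts x_i and n repetitions."""
    L = len(x)
    if L < 2 or n < 2:
        raise ValueError("need L >= 2 laboratories and n >= 2 repetitions")
    p_i = [x_i / n for x_i in x]                     # Formula (10), per laboratory
    p_hat = sum(p_i) / L                             # Formula (10), overall
    var_r = n / (L * (n - 1)) * sum(p * (1 - p) for p in p_i)         # Formula (11)
    var_L = sum((p - p_hat) ** 2 for p in p_i) / (L - 1) - var_r / n  # Formula (12)
    return p_hat, var_r, var_L, var_r + var_L        # Formula (13)


# Positive counts per laboratory and number of repetitions, from Tables 4 to 8.
CASES: Dict[str, Tuple[List[int], int]] = {
    "Case 1": ([5, 5, 5, 5, 3, 5, 3, 5, 5, 5], 5),
    "Case 2(a)": ([3, 3, 1, 3, 3], 3),
    "Case 2(b)": ([0, 2, 0, 1, 0], 3),
    "Case 3(a)": ([5, 5, 5, 5, 5], 5),
    "Case 3(b)": ([5, 2, 2, 4, 2], 5),
}

results = {name: binary_precision(x, n) for name, (x, n) in CASES.items()}
for name, (p_hat, var_r, var_L, var_R) in results.items():
    print(f"{name}: p={p_hat:.3f} r={var_r:.3f} L={var_L:.3f} R={var_R:.3f}")
```

After rounding, the output agrees with the estimates reported in 6.1.2 to 6.1.6. The between-laboratory estimate is a difference of two terms and can be negative for other data; the sketch returns it unchanged.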
For conducting statistical tests to check whether the results of a collaborative study indicate different
sensitivities $p_i$, Reference [13] proposed to apply the chi-squared test for independence in the
contingency table shown in Table 14.
The null hypothesis is
$$H_0: p_1 = p_2 = \cdots = p_L, \tag{14}$$
and the alternative hypothesis is that not all of the $p_i$ are equal. Under the null hypothesis $H_0$ and the
condition that both $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ are satisfied, the following test statistic $\chi^2$ is approximately
chi-squared distributed with $L - 1$ degrees of freedom.
$$\chi^2 = \sum_{i=1}^{L} \frac{(x_i - n\hat{p})^2}{n\hat{p}} + \sum_{i=1}^{L} \frac{\left((n - x_i) - n(1 - \hat{p})\right)^2}{n(1 - \hat{p})} = \sum_{i=1}^{L} \frac{n(\hat{p}_i - \hat{p})^2}{\hat{p}} + \sum_{i=1}^{L} \frac{n(\hat{p}_i - \hat{p})^2}{1 - \hat{p}} = \frac{n}{\hat{p}(1 - \hat{p})} \sum_{i=1}^{L} (\hat{p}_i - \hat{p})^2. \tag{15}$$
To check whether the null hypothesis $H_0$ is statistically rejected or not, the chi-squared test can be
used. When the condition $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$ is not satisfied, Fisher's exact test can be used
instead.
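Table 14 — Contingency table for detecting a between-laboratory variance in the ISO 5725-based method
| Attribute | Laboratory 1 | Laboratory 2 | … | Laboratory $L$ | Total |
|---|---|---|---|---|---|
| Number of positive measured values | $x_1$ | $x_2$ | … | $x_L$ | $\sum_{i=1}^{L} x_i$ |
| Number of negative measured values | $n - x_1$ | $n - x_2$ | … | $n - x_L$ | $nL - \sum_{i=1}^{L} x_i$ |
| Total | $n$ | $n$ | … | $n$ | $nL$ |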
NOTE 2 The number of repetitions $n \ge 10$ is necessary to satisfy the condition $n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$,
because the two quantities sum to $n$; therefore, if $n < 10$ then the condition is never satisfied.
NOTE 3 Both Fisher's exact test and the chi-squared test can be conducted using widely used statistical
software such as R. For example, in R, the former test can be conducted with the built-in function
fisher.test(), and the latter with the built-in function chisq.test().
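The test statistic of Formula (15) and the applicability condition of NOTE 2 can be sketched as follows. The function name `chi_squared_statistic` and the counts in `x_example` are hypothetical. Only the statistic, its degrees of freedom and the condition check are computed; obtaining a P-value or running Fisher's exact test is left to statistical software, as NOTE 3 suggests.

```python
from typing import Sequence, Tuple


def chi_squared_statistic(x: Sequence[int], n: int) -> Tuple[float, int, bool]:
    """Return (chi-squared statistic, degrees of freedom, condition satisfied)."""
    L = len(x)
    total = sum(x)
    if total == 0 or total == n * L:
        raise ValueError("statistic is undefined when all values are 0 or all are 1")
    p_hat = total / (n * L)
    p_i = [x_i / n for x_i in x]
    # Last expression of Formula (15).
    stat = n / (p_hat * (1 - p_hat)) * sum((p - p_hat) ** 2 for p in p_i)
    # Condition for the chi-squared approximation (needs n >= 10, see NOTE 2).
    condition_ok = n * p_hat >= 5 and n * (1 - p_hat) >= 5
    return stat, L - 1, condition_ok


# Hypothetical study: L = 4 laboratories, n = 20 repetitions each.
x_example = [14, 10, 16, 12]
stat, df, ok = chi_squared_statistic(x_example, 20)
print(round(stat, 3), df, ok)

# Case 1 (Table 4) has n = 5, so the condition cannot hold.
_, _, ok_case1 = chi_squared_statistic([5, 5, 5, 5, 3, 5, 3, 5, 5, 5], 5)
print(ok_case1)
```

When the condition flag is false, as for Case 1, the statistic is still computable but the chi-squared approximation is not considered reliable, which is why the worked cases rely on Fisher's exact test.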
6.1.2 Case 1
The estimated detection probability $\hat{p}_i$ of each laboratory is listed in Table 15; therefore, $\hat{p} = 0{,}92$.
Since $L = 10$ and $n = 5$, the estimates of the repeatability, between-laboratory and reproducibility
variances are, respectively, calculated as follows:
$$\hat{\sigma}_{r:W}^2 = \frac{5}{10 \cdot (5 - 1)} \left[ \begin{aligned} &1{,}0 \cdot (1 - 1{,}0) + 1{,}0 \cdot (1 - 1{,}0) + 1{,}0 \cdot (1 - 1{,}0) + 1{,}0 \cdot (1 - 1{,}0) \\ &+ 0{,}6 \cdot (1 - 0{,}6) + 1{,}0 \cdot (1 - 1{,}0) + 0{,}6 \cdot (1 - 0{,}6) + 1{,}0 \cdot (1 - 1{,}0) \\ &+ 1{,}0 \cdot (1 - 1{,}0) + 1{,}0 \cdot (1 - 1{,}0) \end{aligned} \right] = 0{,}060, \tag{16}$$
$$\hat{\sigma}_{L:W}^2 = \frac{1}{10 - 1} \left[ \begin{aligned} &(1{,}0 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 \\ &+ (0{,}6 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 + (0{,}6 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 \\ &+ (1{,}0 - 0{,}92)^2 + (1{,}0 - 0{,}92)^2 \end{aligned} \right] - \frac{1}{5} \cdot 0{,}060 \approx 0{,}016, \tag{17}$$
and
$$\hat{\sigma}_{R:W}^2 = 0{,}060 + 0{,}016\,4 \approx 0{,}076. \tag{18}$$
Because the P-value of Fisher's exact test is 0,04, $H_0: p_1 = p_2 = \cdots = p_L$ is rejected at the
5 % significance level; thus, a between-laboratory variance is present from the viewpoint of statistics.
Table 15 — List of the estimated detection probability of each laboratory
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 | Lab 6 | Lab 7 | Lab 8 | Lab 9 | Lab 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Estimated detection probability, $\hat{p}_i$ | 1,0 | 1,0 | 1,0 | 1,0 | 0,60 | 1,0 | 0,60 | 1,0 | 1,0 | 1,0 |
6.1.3 Case 2(a)
The estimated detection probability $\hat{p}_i$ of each laboratory is listed in Table 16; then, $\hat{p} = 0{,}866 \approx 0{,}87$.
Since $L = 5$ and $n = 3$, the estimates of the repeatability, between-laboratory and reproducibility
variances are, respectively, calculated as $\hat{\sigma}_{r:W}^2 \approx 0{,}067$, $\hat{\sigma}_{L:W}^2 \approx 0{,}067$ and $\hat{\sigma}_{R:W}^2 \approx 0{,}13$.
Because the P-value of Fisher's exact test is 0,14, $H_0: p_1 = p_2 = \cdots = p_L$ is not rejected at the
5 % significance level.
Table 16 — List of the estimated detection probability of each laboratory
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 |
|---|---|---|---|---|---|
| Estimated detection probability, $\hat{p}_i$ | 1,0 | 1,0 | 0,33 | 1,0 | 1,0 |
6.1.4 Case 2(b)
The estimated detection probability $\hat{p}_i$ of each laboratory is listed in Table 17; therefore, $\hat{p} = 0{,}20$.
Since $L = 5$ and $n = 3$, the estimates of the repeatability, between-laboratory and reproducibility
variances are, respectively, calculated as $\hat{\sigma}_{r:W}^2 \approx 0{,}13$, $\hat{\sigma}_{L:W}^2 \approx 0{,}044$ and $\hat{\sigma}_{R:W}^2 \approx 0{,}18$.
Because the P-value of Fisher's exact test is 0,41, $H_0: p_1 = p_2 = \cdots = p_L$ is not rejected at the
5 % significance level.
Table 17 — List of the estimated detection probability of each laboratory
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 |
|---|---|---|---|---|---|
| Estimated detection probability, $\hat{p}_i$ | 0,00 | 0,67 | 0,00 | 0,33 | 0,00 |
6.1.5 Case 3(a)
The estimated probability $\hat{p}_i$ of each laboratory is listed in Table 18; therefore, $\hat{p} = 1{,}0$. Since $L = 5$
and $n = 5$, the estimates of the repeatability, between-laboratory and reproducibility variances are,
respectively, calculated as $\hat{\sigma}_{r:W}^2 = 0{,}00$, $\hat{\sigma}_{L:W}^2 = 0{,}00$ and $\hat{\sigma}_{R:W}^2 = 0{,}00$.
Because the P-value of Fisher's exact test is 1,0, $H_0: p_1 = p_2 = \cdots = p_L$ is not rejected at the
5 % significance level.
Table 18 — List of the estimated detection probability of each laboratory
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 |
|---|---|---|---|---|---|
| Estimated detection probability, $\hat{p}_i$ | 1,0 | 1,0 | 1,0 | 1,0 | 1,0 |
6.1.6 Case 3(b)
The estimated detection probability $\hat{p}_i$ of each laboratory is listed in Table 19; therefore, $\hat{p} = 0{,}60$.
Since $L = 5$ and $n = 5$, the estimates of the repeatability, between-laboratory and reproducibility
variances are, respectively, calculated as $\hat{\sigma}_{r:W}^2 = 0{,}22$, $\hat{\sigma}_{L:W}^2 = 0{,}036$ and $\hat{\sigma}_{R:W}^2 \approx 0{,}26$.
Because the P-value of Fisher's exact test is 0,19, $H_0: p_1 = p_2 = \cdots = p_L$ is not rejected at the
5 % significance level.
Table 19 — List of the estimated detection probability of each laboratory
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 |
|---|---|---|---|---|---|
| Estimated detection probability, $\hat{p}_i$ | 1,0 | 0,40 | 0,40 | 0,80 | 0,40 |
6.2 Accordance and concordance
6.2.1 Overview
Accordance and concordance were originally introduced by Reference [3]. They correspond to
repeatability and reproducibility, respectively, in ISO 5725. These concepts are based on the
probability that two measured values are identical.
The definition of accordance is the probability that pairs in each laboratory be identical; while that of
concordance is the probability that pairs between different laboratories be identical. The estimates of
accordance and concordance are, respectively, as follows:
$$\hat{A}_i \ (\hat{A} \text{ of laboratory } i) = \frac{n_i^p (n_i^p - 1) + (n - n_i^p)(n - n_i^p - 1)}{n(n - 1)}, \tag{19}$$
$$\hat{A} = \text{the arithmetic mean value of } \hat{A}_i, \tag{20}$$
and
$$\hat{C} = \frac{2 n^p (n^p - nL) + nL(nL - 1) - \hat{A} \cdot nL(n - 1)}{n^2 L(L - 1)}, \tag{21}$$
where $n$, $L$ and $n_i^p$ are the number of repetitions in each laboratory, that of laboratories, and that of
positive measured values in laboratory $i$, respectively; and $n^p = \sum_{i=1}^{L} n_i^p$.
NOTE 1 The estimate of accordance of laboratory $i$ is from the expression
$$\left[ \binom{n_i^p}{2} + \binom{n - n_i^p}{2} \right] \Big/ \binom{n}{2}; \tag{22}$$
and that of concordance is from the expression
$$\frac{\binom{n^p}{2} + \binom{nL - n^p}{2} - \hat{A} \cdot nL(n - 1)/2}{n^2 \binom{L}{2}}. \tag{23}$$
Each term in Formula (23) is based on the number of pairs satisfying some conditions when arbitrary
pairs of each laboratory's measured values or of all laboratories' measured values are considered. The details
are as follows:
$\binom{n^p}{2}$ is the number of pairs from all positive measured values in all laboratories, $n^p$;
$\binom{nL - n^p}{2}$ is the number of pairs from all negative measured values in all laboratories, $nL - n^p$;
$\hat{A} \cdot nL(n - 1)/2$ is the sum of the number of pairs from positive measured values in each laboratory
and pairs from negative measured values in each laboratory;
$n^2 \binom{L}{2}$ is the number of pairs of measured values taken from different laboratories.
When concordance is less than accordance, a between-laboratory variance seems to be present;
however, it is difficult to quantify the size of the between-laboratory variance. To demonstrate whether
a between-laboratory variance was present or not, Reference [3] proposed to consider concordance
odds ratios (CORs), which were defined as follows:
$$\mathrm{COR} = \frac{100\hat{A} \cdot (100 - 100\hat{C})}{100\hat{C} \cdot (100 - 100\hat{A})}. \tag{24}$$
The COR is the odds ratio when the contingency table shown in Table 20 is considered; therefore, if
COR = 1, then accordance = concordance, while if COR > 1, then accordance > concordance. To check
whether COR = 1 or COR > 1, Fisher's exact test or the chi-squared test can be used.
Table 20 — Contingency table for detecting a between-laboratory variance in Langton's method
| Attribute | Number of pairs of the same elements | Number of pairs of different elements | Total |
|---|---|---|---|
| Within laboratory | $100\hat{A}$ | $100 - 100\hat{A}$ | 100 |
| Between laboratory | $100\hat{C}$ | $100 - 100\hat{C}$ | 100 |
| Total | $100(\hat{A} + \hat{C})$ | $200 - 100(\hat{A} + \hat{C})$ | 200 |
NOTE 2 Since $\hat{A}$ and $\hat{C}$ strongly depend on sensitivity, the difference between $\hat{C}$ and $\hat{A}$ is not suitable for an
expression of the size of the between-laboratory variance.
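Formulae (19) to (21) and (24) can be sketched as follows, using the positive counts of Table 4 (Case 1). The function name `accordance_concordance` is hypothetical. Accordance and concordance are kept as proportions rather than percentages, which leaves the COR unchanged because the factors of 100 cancel. The COR is reported as `None` when it is undefined, and the significance test on Table 20 is not included.

```python
from typing import List, Optional, Sequence, Tuple


def accordance_concordance(
    n_pos: Sequence[int], n: int
) -> Tuple[List[float], float, float, Optional[float]]:
    """Return (A_i per laboratory, accordance A, concordance C, COR)."""
    L = len(n_pos)
    if L < 2 or n < 2:
        raise ValueError("need L >= 2 laboratories and n >= 2 repetitions")
    # Formula (19): share of identical pairs within each laboratory.
    a_i = [(k * (k - 1) + (n - k) * (n - k - 1)) / (n * (n - 1)) for k in n_pos]
    acc = sum(a_i) / L                                   # Formula (20)
    total_pos = sum(n_pos)                               # n^p
    conc = (2 * total_pos * (total_pos - n * L)
            + n * L * (n * L - 1)
            - acc * n * L * (n - 1)) / (n ** 2 * L * (L - 1))   # Formula (21)
    # Formula (24) with proportions; undefined if acc == 1 or conc == 0.
    cor = None if acc == 1 or conc == 0 else acc * (1 - conc) / (conc * (1 - acc))
    return a_i, acc, conc, cor


# Positive counts of Table 4 (Case 1): L = 10 laboratories, n = 5 repetitions.
case1 = [5, 5, 5, 5, 3, 5, 3, 5, 5, 5]
a_i, acc, conc, cor = accordance_concordance(case1, 5)
print(round(acc, 2), round(conc, 2), round(cor, 1))
```

For Case 1 this gives an accordance of 0,88, a concordance of about 0,85 and a COR of about 1,3. When every laboratory reports identical values throughout, accordance equals 1 and the COR cannot be formed.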
6.2.2 Case 1
The estimate of accordance of each laboratory is shown in Table 21; therefore,
$$\hat{A} = \frac{1{,}0 + 1{,}0 + 1{,}0 + 1{,}0 + 0{,}4 + 1{,}0 + 0{,}4 + 1{,}0 + 1{,}0 + 1{,}0}{10} = 0{,}88. \tag{25}$$
Since the number of positive measured values in all laboratories $n^p$ is 46, the estimate of concordance
is calculated as follows:
$$\hat{C} = \frac{2 \cdot 46 \cdot (46 - 5 \cdot 10) + 5 \cdot 10 \cdot (5 \cdot 10 - 1) - 0{,}88 \cdot 5 \cdot 10 \cdot (5 - 1)}{5^2 \cdot 10 \cdot (10 - 1)} \approx 0{,}85. \tag{26}$$
Then,
$$\mathrm{COR} \approx 1{,}3. \tag{27}$$
Because the P-value of Fisher's exact test is 0,34, $H_0$: COR = 1 is not rejected at the 5 % significance
level.
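Table 21 — List of the estimate of accordance of each laboratory in Case 1
| Laboratory | Lab 1 | Lab 2 | Lab 3 | Lab 4 | Lab 5 | Lab 6 | Lab 7 | Lab 8 | Lab 9 | Lab 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{A}_i$ | 1,0 | 1,0 | 1,0 | 1,0 | 0,40 | 1,0 | 0,40 | 1,0 | 1,0 | 1,0 |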
6.2.3 Case 2(a)
The estimate of accordance of each laboratory is shown in Table 22; therefore, $\hat{A} \approx 0{,}87$. Since $n^p = 13$
(the sum of the detections in Table 5), $\hat{C} \approx 0{,}73$ and COR ≈ 2,4.
Because the P-value of Fisher's exact test is 0,01, $H_0$: COR = 1 is rejected at the 5 % significance
level; thus, a between-laboratory variance is present from the viewpoint of statistics.
ISO/TR 27877
...
