ASTM E2935-14
Standard Practice for Conducting Equivalence Testing in Laboratory Applications
SIGNIFICANCE AND USE
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating systematic differences in a method transfer from a developing laboratory to a receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
Note 1: The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST...
SCOPE
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means are similar within predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.
1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2935 − 14 An American National Standard
Standard Practice for
Conducting Equivalence Testing in Laboratory Applications
This standard is issued under the fixed designation E2935; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means are similar within predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.
1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards:
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics
This test method is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method Evaluation and Quality Control.
Current edition approved Oct. 1, 2014. Published August 2013. Originally approved in 2013. Last previous edition approved in 2013 as E2935 – 13. DOI: 10.1520/E2935-14.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions—See Terminology E456 for a more extensive listing of statistical terms.
3.1.1 accepted reference value, n—a value that serves as an agreed-upon reference for comparison, and which is derived as: (1) a theoretical or established value, based on scientific principles, (2) an assigned or certified value, based on experimental work of some national or international organization, or (3) a consensus or certified value, based on collaborative experimental work under the auspices of a scientific or engineering group. E177
3.1.2 bias, n—the difference between the expectation of the test results and an accepted reference value. E177
3.1.3 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 – α, where Pr(L ≤ θ ≤ U) ≥ 1 – α. E2586
3.1.3.1 Discussion—The confidence level, 1 – α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.
3.1.4 confidence level, n—the value, 1 – α, of the probability associated with a confidence interval, often expressed as a percentage. E2586
3.1.4.1 Discussion—α is generally a small number. Confidence level is often 95 % or 99 %.
3.1.5 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval. E2586
3.1.6 degrees of freedom, n—the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586
3.1.7 equivalence, n—similarity between two population parameters within predetermined limits.
3.1.8 intermediate precision conditions, n—conditions under which test results are obtained with the same test method using test units or test specimens taken at random from a single
quantity of material that is as nearly homogeneous as possible, and with changing conditions such as operator, measuring equipment, location within the laboratory, and time. E177
3.1.9 mean, n—of a population, µ, average or expected value of a characteristic in a population – of a sample, $\bar{X}$, sum of the observed values in the sample divided by the sample size. E2586
3.1.10 population, n—the totality of items or units of material under consideration. E2586
3.1.11 population parameter, n—summary measure of the values of some characteristic of a population. E2586
3.1.12 precision, n—the closeness of agreement between independent test results obtained under stipulated conditions. E177
3.1.13 repeatability, n—precision under repeatability conditions. E177
3.1.14 repeatability conditions, n—conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time. E177
3.1.15 repeatability standard deviation ($s_r$), n—the standard deviation of test results obtained under repeatability conditions. E177
3.1.16 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection. E2586
3.1.17 sample size, n, n—number of observed values in the sample. E2586
3.1.18 sample statistic, n—summary measure of the observed values of a sample. E2586
3.1.19 test result, n—the value of a characteristic obtained by carrying out a specified test method. E2282
3.1.20 test unit, n—the total quantity of material (containing one or more test specimens) needed to obtain a test result as specified in the test method. See test result. E2282
3.2 Definitions of Terms Specific to This Standard:
3.2.1 bias equivalence, n—equivalence of a population mean with an accepted reference value.
3.2.2 equivalence limit, E, n—in equivalence testing, a limit on the difference between two population parameters.
3.2.2.1 Discussion—In certain applications, this may be termed practical limit or practical difference.
3.2.3 equivalence test, n—a statistical test conducted within predetermined risks to confirm equivalence of two population parameters.
3.2.4 means equivalence, n—equivalence of two population means.
3.2.5 paired samples design, n—in means equivalence testing, single samples are taken from the two populations at a number of sampling points.
3.2.5.1 Discussion—This design is termed a randomized block design for a general number of populations sampled, and each group of data within a sampling point is termed a block.
3.2.6 power, n—in equivalence testing, the probability of accepting equivalence, given the true difference between two population means.
3.2.6.1 Discussion—In the case of testing for bias equivalence the power is the probability of accepting equivalence, given the true difference between a population mean and an accepted reference value.
3.2.7 two independent samples design, n—in means equivalence testing, replicate test results are determined independently from two populations at a single sampling time for each population.
3.2.7.1 Discussion—This design is termed a completely randomized design for a general number of populations sampled.
3.3 Symbols:
$B$ = bias (7.1.1)
$d_j$ = difference between a pair of test results at sampling point $j$ (7.1.1)
$\bar{d}$ = average difference (7.1.1)
$D$ = difference in sample means (6.1.2) (X1.1.2)
$E$ = equivalence limit (5.2.1)
$E_1$ = lower equivalence limit (5.2.1.1)
$E_2$ = upper equivalence limit (5.2.1.1)
$H_0$: = null hypothesis (X1.1.1)
$H_A$: = alternate hypothesis (X1.1.1)
$f$ = degrees of freedom for $s$ (8.1.1) (X1.1.2)
$f_i$ = degrees of freedom for $s_i$ (6.1.1)
$f_p$ = degrees of freedom for $s_p$ (6.1.2)
$n$ = sample size (number of test results) from a population (5.3) (6.1.3) (7.1.1) (8.1.1)
$n_i$ = sample size from $i$th population (6.1.1)
$n_1$ = sample size from population 1 (6.1.2)
$n_2$ = sample size from population 2 (6.1.2)
$s$ = sample standard deviation (8.1.1)
$s_B$ = sample standard deviation for bias (8.1.2)
$s_d$ = standard deviation of the difference between two test results (7.1.1)
$s_D$ = sample standard deviation for mean difference (6.1.3) (X1.1.2)
$s_i$ = sample standard deviation for $i$th population (6.1.1)
$s_i^2$ = sample variance for $i$th population (6.1.1)
$s_1^2$ = sample variance for population 1 (6.1.2)
$s_2^2$ = sample variance for population 2 (6.1.2)
$s_p$ = pooled sample standard deviation (6.1.2)
$s_r$ = repeatability sample standard deviation (6.2)
$t$ = Student’s t statistic (6.1.4) (7.1.3) (8.1.3)
$t_{1-\alpha,f}$ = $(1-\alpha)$th percentile of the Student’s t distribution with $f$ degrees of freedom (X1.1.2)
$X_{ij}$ = $j$th test result from the $i$th population (6.1)
$\bar{X}$ = test result average (8.1.1)
$\bar{X}_i$ = test result average for the $i$th population (6.1.1)
$\bar{X}_1$ = test result average for population 1 (6.1.3)
$\bar{X}_2$ = test result average for population 2 (6.1.3)
$Z_{1-\alpha}$ = $(1-\alpha)$th percentile of the standard normal distribution (X1.5.1)
$\alpha$ = consumer’s risk (5.2.2) (6.2) (7.2)
$\beta$ = producer’s risk (5.3)
$\Delta$ = true mean difference between populations (5.3)
$\mu$ = population mean (X1.4.1)
$\mu_i$ = $i$th population mean (X1.1.1)
$\nu$ = approximate degrees of freedom for $s_D$ (X1.1.4)
$\sigma$ = standard deviation of the test method (5.2.3)
$\sigma_d$ = standard deviation of the true difference between two populations (7.2)
$\Phi(\cdot)$ = standard normal cumulative distribution function (X1.5.1)
3.4 Acronyms:
3.4.1 ARV, n—accepted reference value (5.1.2) (8.1) (X1.4)
3.4.2 CRM, n—certified reference material (5.1.2) (8.1)
3.4.3 ILS, n—interlaboratory study (6.2)
3.4.4 LCL, n—lower confidence limit (6.2.5) (7.2.3)
3.4.5 TOST, n—two one-sided t test (4.3) (Section 6) (Section 7) (Section 8) (Appendix X1)
3.4.6 UCL, n—upper confidence limit (6.2.5) (7.2.3)
4. Significance and Use
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating systematic differences in a method transfer from a developing laboratory to a receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
NOTE 1—The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST procedure will be adapted to the type of objective and experiment design selected.
4.3.1 Historically, this procedure originated in the pharmaceutical industry for use in bioequivalence trials (1, 2), denoted as the Two One-Sided Test, and has since been adopted for other applications, particularly in testing and measurement applications (3, 4).
4.3.2 The conventional Student’s t test used for detecting differences is not recommended for equivalence testing as it does not properly control the consumer’s and producer’s risks for this application (see X1.3).
4.4 Risk Management—Guidance is provided for determining the amount of data required to control the risks of making the wrong decision in accepting or rejecting equivalence (see X1.2).
4.4.1 The consumer’s risk is the probability of accepting equivalence when the actual bias or difference in means is equal to the equivalence limit. This probability is controlled to a low level so that accepting equivalence gives a high degree of assurance that differences in question are less than the equivalence limit.
4.4.2 The producer’s risk is the risk of falsely rejecting equivalence. If improvements are rejected this can lead to opportunity losses to the company and its laboratories (the producers) or cause additional unnecessary effort in improving the testing process.
5. Planning the Equivalence Study
5.1 Objectives and Design Selection—This practice supports two equivalence study objectives: (1) determining the means equivalence of test results from two testing processes or (2) determining the bias equivalence of a test method. In both objectives, two population means are compared for equivalence.
5.1.1 Means Equivalence—This study compares the average test result from the current testing process with the innovated process. A single material is selected, subdivided into test samples, and distributed for testing by each process. The material should be reasonably homogeneous, because inhomogeneity in the material will decrease the test precision.
5.1.1.1 Design Types—This practice provides recommendations for the design of a means equivalence experiment, and two basic designs are discussed. Section 6 discusses the two independent samples design, in which each population is sampled independently. Section 7 discusses the paired samples design in which pairs of single samples from each population are taken under different conditions of a second variable, such as time.
5.1.2 Bias Equivalence—This study requires a suitable quantity of a certified reference material (CRM) having an
...
This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E2935 − 14 An American National Standard
Standard Practice for
Conducting Equivalence Testing in Laboratory Applications
This standard is issued under the fixed designation E2935; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to
determine if their true means are similar within predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two
test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories
in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for
determining the amount of data required for an equivalence trial.
1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks
associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory
limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards:
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics
3. Terminology
3.1 Definitions—See Terminology E456 for a more extensive listing of statistical terms.
3.1.1 accepted reference value, n—a value that serves as an agreed-upon reference for comparison, and which is derived as: (1)
a theoretical or established value, based on scientific principles, (2) an assigned or certified value, based on experimental work of
some national or international organization, or (3) a consensus or certified value, based on collaborative experimental work under
the auspices of a scientific or engineering group. E177
3.1.2 bias, n—the difference between the expectation of the test results and an accepted reference value. E177
3.1.3 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with
confidence level 1 – α, where Pr(L ≤ θ ≤ U) ≥ 1– α. E2586
This test method is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method
Evaluation and Quality Control.
Current edition approved Oct. 1, 2014. Published August 2013. Originally approved in 2013. Last previous edition approved in 2013 as E2935 – 13. DOI:
10.1520/E2935-14.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
3.1.3.1 Discussion—
The confidence level, 1 – α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true
parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting
confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to
the long run proportion of cases when repeating the procedure many times.
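Illustration (not part of the standard): the long-run reading of "confidence" in 3.1.3.1 can be checked by simulation. The sketch below is a minimal example that assumes normally distributed test results with a known standard deviation, so a z-based interval is used instead of a Student's t interval. The function names, the true mean, sigma, the sample size, and the seed are all hypothetical choices made for this illustration.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, alpha=0.05):
    """Two-sided (1 - alpha) interval [L, U] for a mean, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean=10.0, sigma=2.0, n=8, alpha=0.05, trials=4000, seed=1):
    """Fraction of repeated random samples whose interval contains the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, alpha)
        hits += low <= true_mean <= high  # each single interval either covers or not
    return hits / trials


if __name__ == "__main__":
    print(f"long-run coverage: {coverage():.3f}")
```

Each individual interval either contains the true mean or does not; only the fraction over many repetitions is expected to be close to 1 – α.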
3.1.4 confidence level, n—the value, 1 – α, of the probability associated with a confidence interval, often expressed as a
percentage. E2586
3.1.4.1 Discussion—
α is generally a small number. Confidence level is often 95 % or 99 %.
3.1.5 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.
E2586
3.1.6 degrees of freedom, n—the number of independent data points minus the number of parameters that have to be estimated
before calculating the variance. E2586
3.1.7 equivalence, n—similarity between two population parameters within predetermined limits.
3.1.8 intermediate precision conditions, n—conditions under which test results are obtained with the same test method using test
units or test specimens taken at random from a single quantity of material that is as nearly homogeneous as possible, and with
changing conditions such as operator, measuring equipment, location within the laboratory, and time. E177
3.1.9 mean, n—of a population, μ, average or expected value of a characteristic in a population – of a sample, $\bar{X}$, sum of the
observed values in the sample divided by the sample size. E2586
3.1.10 population, n—the totality of items or units of material under consideration. E2586
3.1.11 population parameter, n—summary measure of the values of some characteristic of a population. E2586
3.1.12 precision, n—the closeness of agreement between independent test results obtained under stipulated conditions. E177
3.1.13 repeatability, n—precision under repeatability conditions. E177
3.1.14 repeatability conditions, n—conditions where independent test results are obtained with the same method on identical test
items in the same laboratory by the same operator using the same equipment within short intervals of time. E177
3.1.15 repeatability standard deviation ($s_r$), n—the standard deviation of test results obtained under repeatability conditions.
E177
3.1.16 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which
serves to provide information that may be used as a basis for making a decision concerning the larger collection. E2586
3.1.17 sample size, n, n—number of observed values in the sample. E2586
3.1.18 sample statistic, n—summary measure of the observed values of a sample. E2586
3.1.19 test result, n—the value of a characteristic obtained by carrying out a specified test method. E2282
3.1.20 test unit, n—the total quantity of material (containing one or more test specimens) needed to obtain a test result as
specified in the test method. See test result. E2282
3.2 Definitions of Terms Specific to This Standard:
3.2.1 bias equivalence, n—equivalence of a population mean with an accepted reference value.
3.2.2 equivalence limit, E, n—in equivalence testing, a limit on the difference between two population parameters.
3.2.2.1 Discussion—
In certain applications, this may be termed practical limit or practical difference.
3.2.3 equivalence test, n—a statistical test conducted within predetermined risks to confirm equivalence of two population
parameters.
3.2.4 means equivalence, n—equivalence of two population means.
3.2.5 paired samples design, n—in means equivalence testing, single samples are taken from the two populations at a number
of sampling points.
3.2.5.1 Discussion—
This design is termed a randomized block design for a general number of populations sampled, and each group of data within a
sampling point is termed a block.
3.2.6 power, n—in equivalence testing, the probability of accepting equivalence, given the true difference between two
population means.
3.2.6.1 Discussion—
In the case of testing for bias equivalence the power is the probability of accepting equivalence, given the true difference between
a population mean and an accepted reference value.
3.2.7 two independent samples design, n—in means equivalence testing, replicate test results are determined independently from
two populations at a single sampling time for each population.
3.2.7.1 Discussion—
This design is termed a completely randomized design for a general number of populations sampled.
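Illustration (not part of the standard): the two designs defined in 3.2.5 and 3.2.7 lead to different summary statistics. The sketch below uses the textbook pooled-variance and paired-difference formulas, which are assumed here because Sections 6 and 7 are not reproduced in this excerpt. The function names and the sign convention (population 1 minus population 2) are hypothetical.

```python
from statistics import mean, stdev, variance


def independent_summary(x1, x2):
    """Two independent samples design: difference in means D, its standard
    deviation s_D (from the pooled standard deviation), and degrees of freedom f_p."""
    n1, n2 = len(x1), len(x2)
    D = mean(x1) - mean(x2)
    f_p = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / f_p
    s_D = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    return D, s_D, f_p


def paired_summary(x1, x2):
    """Paired samples design: one result per population at each sampling point j.
    Returns the average difference, s_d, the standard error of the average
    difference, and the degrees of freedom."""
    d = [a - b for a, b in zip(x1, x2)]  # d_j for each sampling point
    n = len(d)
    s_d = stdev(d)
    return mean(d), s_d, s_d / n ** 0.5, n - 1


if __name__ == "__main__":
    current = [10.0, 12.0, 14.0]
    modified = [9.0, 12.0, 12.0]
    print(independent_summary(current, modified))
    print(paired_summary(current, modified))
```

Pairing removes the variation between sampling points from the comparison, which is why the same data can give a smaller standard error under the paired analysis.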
3.3 Symbols:
$B$ = bias (7.1.1)
$d_j$ = difference between a pair of test results at sampling point $j$ (7.1.1)
$\bar{d}$ = average difference (7.1.1)
$D$ = difference in sample means (6.1.2) (X1.1.2)
$E$ = equivalence limit (5.2.1)
$E_1$ = lower equivalence limit (5.2.1.1)
$E_2$ = upper equivalence limit (5.2.1.1)
$H_0$: = null hypothesis (X1.1.1)
$H_A$: = alternate hypothesis (X1.1.1)
$f$ = degrees of freedom for $s$ (8.1.1) (X1.1.2)
$f_i$ = degrees of freedom for $s_i$ (6.1.1)
$f_p$ = degrees of freedom for $s_p$ (6.1.2)
$n$ = sample size (number of test results) from a population (5.3) (6.1.3) (7.1.1) (8.1.1)
$n_i$ = sample size from $i$th population (6.1.1)
$n_1$ = sample size from population 1 (6.1.2)
$n_2$ = sample size from population 2 (6.1.2)
$s$ = sample standard deviation (8.1.1)
$s_B$ = sample standard deviation for bias (8.1.2)
$s_d$ = standard deviation of the difference between two test results (7.1.1)
$s_D$ = sample standard deviation for mean difference (6.1.3) (X1.1.2)
$s_i$ = sample standard deviation for $i$th population (6.1.1)
$s_i^2$ = sample variance for $i$th population (6.1.1)
$s_1^2$ = sample variance for population 1 (6.1.2)
$s_2^2$ = sample variance for population 2 (6.1.2)
$s_p$ = pooled sample standard deviation (6.1.2)
$s_r$ = repeatability sample standard deviation (6.2)
$t$ = Student’s t statistic (6.1.4) (7.1.3) (8.1.3)
$t_{1-\alpha,f}$ = $(1-\alpha)$th percentile of the Student’s t distribution with $f$ degrees of freedom (X1.1.2)
$X_{ij}$ = $j$th test result from the $i$th population (6.1)
$\bar{X}$ = test result average (8.1.1)
$\bar{X}_i$ = test result average for the $i$th population (6.1.1)
$\bar{X}_1$ = test result average for population 1 (6.1.3)
$\bar{X}_2$ = test result average for population 2 (6.1.3)
$Z_{1-\alpha}$ = $(1-\alpha)$th percentile of the standard normal distribution (X1.5.1)
$\alpha$ = consumer’s risk (5.2.2) (6.2) (7.2)
$\beta$ = producer’s risk (5.3)
$\Delta$ = true mean difference between populations (5.3)
$\mu$ = population mean (X1.4.1)
$\mu_i$ = $i$th population mean (X1.1.1)
$\nu$ = approximate degrees of freedom for $s_D$ (X1.1.4)
$\sigma$ = standard deviation of the test method (5.2.3)
$\sigma_d$ = standard deviation of the true difference between two populations (7.2)
$\Phi(\cdot)$ = standard normal cumulative distribution function (X1.5.1)
3.4 Acronyms:
3.4.1 ARV, n—accepted reference value (5.1.2) (8.1) (X1.4)
3.4.2 CRM, n—certified reference material (5.1.2) (8.1)
3.4.3 ILS, n—interlaboratory study (6.2)
3.4.4 LCL, n—lower confidence limit (6.2.5) (7.2.3)
3.4.5 TOST, n—two one-sided t test (4.3) (Section 6) (Section 7) (Section 8) (Appendix X1)
3.4.6 UCL, n—upper confidence limit (6.2.5) (7.2.3)
4. Significance and Use
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving
the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must
be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard
provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference
standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating systematic differences in a method transfer from a developing laboratory to a
receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a
hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
NOTE 1—The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations,
or relative standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two
one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST procedure will be
adapted to the type of objective and experiment design selected.
4.3.1 Historically, this procedure originated in the pharmaceutical industry for use in bioequivalence trials (1, 2), denoted as
the Two One-Sided Test, and has since been adopted for other applications, particularly in testing and measurement applications
(3, 4).
4.3.2 The conventional Student’s t test used for detecting differences is not recommended for equivalence testing as it does not
properly control the consumer’s and producer’s risks for this application (see X1.3).
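Illustration (not part of the standard): a minimal sketch of a TOST decision for the two independent samples design, assuming symmetric equivalence limits of –E and +E and a pooled standard deviation. It uses the common confidence-interval form of the procedure, in which equivalence is accepted when both the lower and upper confidence limits for the difference fall inside the limits. The function name is hypothetical, and the Student's t percentile `t_crit` is supplied by the caller from a table because the Python standard library has no t distribution.

```python
from statistics import mean, variance


def tost_two_sample(x1, x2, E, t_crit):
    """TOST for means equivalence within -E..+E.

    t_crit is the (1 - alpha) percentile of Student's t with n1 + n2 - 2 degrees
    of freedom, for example 1.860 for alpha = 0.05 and f = 8.
    """
    n1, n2 = len(x1), len(x2)
    D = mean(x1) - mean(x2)                      # difference in sample means
    f = n1 + n2 - 2                              # pooled degrees of freedom
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / f
    s_D = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    lcl = D - t_crit * s_D                       # lower confidence limit
    ucl = D + t_crit * s_D                       # upper confidence limit
    # Both one-sided tests reject exactly when the limits lie inside -E..+E.
    equivalent = (-E < lcl) and (ucl < E)
    return {"D": D, "s_D": s_D, "f": f, "LCL": lcl, "UCL": ucl, "equivalent": equivalent}


if __name__ == "__main__":
    current = [10.1, 9.9, 10.0, 10.2, 9.8]
    modified = [10.0, 10.1, 9.9, 10.3, 9.7]
    print(tost_two_sample(current, modified, E=0.5, t_crit=1.860))
    print(tost_two_sample(current, modified, E=0.2, t_crit=1.860))
```

In the second call the sample means agree, so a conventional t test would find no significant difference, yet equivalence within ±0.2 is not demonstrated because the confidence limits extend beyond the equivalence limits. This is the distinction drawn in 4.3.2.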
4.4 Risk Management—Guidance is provided for determining the amount of data required to control the risks of making the wrong
decision in accepting or rejecting equivalence (see X1.2).
4.4.1 The consumer’s risk is the probability of accepting equivalence when the actual bias or difference in means is equal to
the equivalence limit. This probability is controlled to a low level so that accepting equivalence gives a high degree of assurance
that differences in question are less than the equivalence limit.
4.4.2 The producer’s risk is the risk of falsely rejecting equivalence. If improvements are rejected this can lead to opportunity
losses to the company and its laboratories (the producers) or cause additional unnecessary effort in improving the testing process.
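Illustration (not part of the standard): a rough sketch of how the two risks in 4.4.1 and 4.4.2 drive the amount of data. It assumes a two independent samples design with equal sample size n per population, symmetric limits –E and +E, and a known test method standard deviation sigma, so the standard normal distribution stands in for Student's t. The function names and the default producer's risk of 0.10 at a true difference of zero are hypothetical; a t-based calculation generally calls for somewhat more data.

```python
from statistics import NormalDist


def accept_probability(delta, E, sigma, n, alpha=0.05):
    """Approximate probability of accepting equivalence when the true difference
    is delta (normal approximation, sigma assumed known, n results per population)."""
    nd = NormalDist()
    sigma_D = sigma * (2 / n) ** 0.5               # std. dev. of the mean difference
    margin = E - nd.inv_cdf(1 - alpha) * sigma_D   # largest |D| that still passes
    if margin <= 0:
        return 0.0                                 # too little data to ever accept
    return nd.cdf((margin - delta) / sigma_D) - nd.cdf((-margin - delta) / sigma_D)


def smallest_n(E, sigma, alpha=0.05, beta=0.10, delta=0.0, n_max=1000):
    """Smallest n per population whose producer's risk at delta is at most beta."""
    for n in range(2, n_max + 1):
        if 1 - accept_probability(delta, E, sigma, n, alpha) <= beta:
            return n
    return None                                    # not reachable within n_max


if __name__ == "__main__":
    n = smallest_n(E=1.0, sigma=1.0)
    print("n per population:", n)
    print("consumer's risk at the limit:", accept_probability(1.0, 1.0, 1.0, n))
```

Tightening the equivalence limit relative to sigma, or lowering either risk, increases the required n. The consumer's risk evaluated at a true difference equal to E stays at or below alpha for any n.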
5. Planning the Equivalence Study
5.1 Objectives and Design Selection—Thi
...