ASTM E2935-13
Standard Practice for Conducting Equivalence Testing in Laboratory Applications
SIGNIFICANCE AND USE
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following: (1) Evaluating the bias of a test method with respect to a certified reference material, (2) Evaluating bias due to a minor change in a test method procedure, (3) Qualifying new instruments, apparatus, or operators in a laboratory, and (4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating bias in a method transfer from a developing laboratory to a receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
Note 1—The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST procedure will be adapted to the...
SCOPE
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means are similar within predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.
1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
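As a rough illustration of the TOST idea named in 1.4, the sketch below checks bias equivalence against an accepted reference value using two one-sided t statistics. It is not taken from the standard: the function name `tost_bias`, the test results, the equivalence limit of 0.2, and the tabulated one-sided critical value t(0.95, 9) = 1.833 (for α = 0.05 and n = 10) are all assumptions chosen for the example.

```python
from math import sqrt
from statistics import mean, stdev


def tost_bias(results, arv, e_limit, t_crit):
    """Two one-sided t-tests for bias equivalence (illustrative sketch).

    results : test results measured on the reference material
    arv     : accepted reference value, treated as known
    e_limit : symmetric equivalence limit E (limits are -E and +E)
    t_crit  : one-sided critical value t(1 - alpha, n - 1) from a t-table
    """
    n = len(results)
    bias = mean(results) - arv          # estimated bias
    se = stdev(results) / sqrt(n)       # standard error of the mean
    t_lower = (bias + e_limit) / se     # tests H0: bias <= -E
    t_upper = (e_limit - bias) / se     # tests H0: bias >= +E
    # Equivalence is accepted only if both one-sided tests reject.
    equivalent = t_lower > t_crit and t_upper > t_crit
    return bias, se, t_lower, t_upper, equivalent


# Hypothetical data: ten results on a reference material with ARV = 10.0
results = [10.1, 9.9, 10.2, 10.0, 9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
bias, se, t_lo, t_hi, ok = tost_bias(results, arv=10.0, e_limit=0.2, t_crit=1.833)
print(f"bias={bias:.3f} se={se:.4f} t_lower={t_lo:.2f} t_upper={t_hi:.2f} equivalent={ok}")
```

Passing the critical value in as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.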
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2935 − 13
An American National Standard
Standard Practice for Conducting Equivalence Testing in Laboratory Applications
This standard is issued under the fixed designation E2935; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice provides statistical methodology for conducting equivalence testing on numerical data from two sources to determine if their true means are similar within predetermined limits.
1.2 Applications include (1) equivalence testing for bias against an accepted reference value, (2) determining equivalence of two test methods, test apparatus, instruments, reagent sources, or operators within a laboratory, and (3) equivalence of two laboratories in a method transfer.
1.3 The current guidance in this standard applies only to experiments conducted on a single material. Guidance is given for determining the amount of data required for an equivalence trial.
1.4 The statistical methodology for determining equivalence used is the “Two one-sided t-test” (TOST). The control of risks associated with the equivalence decision is discussed.
1.5 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
1.6 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards:
E177 Practice for Use of the Terms Precision and Bias in ASTM Test Methods
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
E2586 Practice for Calculating and Using Basic Statistics
Footnote: This test method is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.20 on Test Method Evaluation and Quality Control. Current edition approved Aug. 1, 2013. Published August 2013. DOI: 10.1520/E2935-13.
Footnote: For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions—See Terminology E456 for a more extensive listing of statistical terms.
3.1.1 accepted reference value, n—a value that serves as an agreed-upon reference for comparison, and which is derived as: (1) a theoretical or established value, based on scientific principles, (2) an assigned or certified value, based on experimental work of some national or international organization, or (3) a consensus or certified value, based on collaborative experimental work under the auspices of a scientific or engineering group. E177
3.1.2 bias, n—the difference between the expectation of the test results and an accepted reference value. E177
3.1.3 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 – α, where Pr(L ≤ θ ≤ U) ≥ 1 – α. E2586
3.1.3.1 Discussion—The confidence level, 1 – α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense “confidence” applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.
3.1.4 confidence level, n—the value, 1 – α, of the probability associated with a confidence interval, often expressed as a percentage. E2586
3.1.4.1 Discussion—α is generally a small number. Confidence level is often 95 % or 99 %.
3.1.5 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval. E2586
3.1.6 degrees of freedom, n—the number of independent data points minus the number of parameters that have to be estimated before calculating the variance. E2586
3.1.7 equivalence, n—similarity between two population parameters within predetermined limits.
3.1.8 intermediate precision conditions, n—conditions under which test results are obtained with the same test method using test units or test specimens taken at random from a single quantity of material that is as nearly homogeneous as possible, and with changing conditions such as operator, measuring equipment, location within the laboratory, and time. E177
3.1.9 mean, n—of a population, µ, average or expected value of a characteristic in a population – of a sample, X̄, sum of the observed values in the sample divided by the sample size. E2586
3.1.10 population, n—the totality of items or units of material under consideration. E2586
3.1.11 population parameter, n—summary measure of the values of some characteristic of a population. E2586
3.1.12 precision, n—the closeness of agreement between independent test results obtained under stipulated conditions. E177
3.1.13 repeatability, n—precision under repeatability conditions. E177
3.1.14 repeatability conditions, n—conditions where independent test results are obtained with the same method on identical test items in the same laboratory by the same operator using the same equipment within short intervals of time. E177
3.1.15 repeatability standard deviation (sᵣ), n—the standard deviation of test results obtained under repeatability conditions. E177
3.1.16 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection. E2586
3.1.17 sample size, n, n—number of observed values in the sample. E2586
3.1.18 sample statistic, n—summary measure of the observed values of a sample. E2586
3.1.19 test result, n—the value of a characteristic obtained by carrying out a specified test method. E2282
3.1.20 test unit, n—the total quantity of material (containing one or more test specimens) needed to obtain a test result as specified in the test method. See test result. E2282
3.2 Definitions of Terms Specific to This Standard:
3.2.1 bias equivalence, n—equivalence of a population mean with an accepted reference value.
3.2.2 equivalence limit, E, n—in equivalence testing, a limit on the difference between two population parameters.
3.2.2.1 Discussion—In certain applications, this may be termed practical limit or practical difference.
3.2.3 equivalence test, n—a statistical test conducted within predetermined risks to confirm equivalence of two population parameters.
3.2.4 means equivalence, n—equivalence of two population means.
3.2.5 power, n—in equivalence testing, the probability of accepting equivalence, given the true difference between two population means.
3.2.5.1 Discussion—In the case of testing for bias equivalence the power is the probability of accepting equivalence, given the true difference between a population mean and an accepted reference value.
4. Significance and Use
4.1 Laboratories conducting routine testing have a continuing need to evaluate test result bias, to evaluate changes for improving the test process performance, or to validate the transfer of a test method to a new location or apparatus. In all situations it must be demonstrated that any bias or innovation will have negligible effect on test results for a characteristic of a material. This standard provides statistical methods to confirm that the mean test results from a testing process are equivalent to those from a reference standard or another testing process, where equivalence is defined as agreement within prescribed limits, termed equivalence limits.
4.1.1 The intra-laboratory applications in this practice include, but are not limited to, the following:
(1) Evaluating the bias of a test method with respect to a certified reference material,
(2) Evaluating bias due to a minor change in a test method procedure,
(3) Qualifying new instruments, apparatus, or operators in a laboratory, and
(4) Qualifying new sources of reagents or other materials used in the test procedure.
4.1.2 This practice also supports evaluating bias in a method transfer from a developing laboratory to a receiving laboratory.
4.2 This practice currently deals only with the equivalence of population means. In this standard, a population refers to a hypothetical set of test results arising from a stable testing process that measures a characteristic of a single material.
NOTE 1—The equivalence concept can also apply to population parameters other than means, such as precision, stated as variances, standard deviations, or relative standard deviations (coefficients of variation), linearity, sensitivity, specificity, etc.
4.3 The data analysis for equivalence testing of population means in this practice uses a statistical methodology termed the “Two one-sided t-test” (TOST) procedure which shall be described in detail in this standard (see X1.1). The TOST procedure will be adapted to the type of objective and experiment design selected.
4.3.1 Historically, this procedure originated in the pharmaceutical industry for use in bioequivalence trials (1, 2), denoted as the Two One-Sided Test, and has since been adopted for other applications, particularly in testing and measurement applications (3, 4).
Footnote: The boldface numbers in parentheses refer to the list of references at the end of this standard.
4.3.2 The conventional Student’s t test used for detecting differences is not recommended for equivalence testing as it does not properly control the consumer’s and producer’s risks for this application (see X1.3).
4.4 This practice provides recommendations for the design of an equivalence experiment, and two basic designs are discussed. Guidance is provided for determining the amount of data required to control the risks of making the wrong decision in accepting or rejecting equivalence (see X1.2).
4.4.1 The consumer’s risk is the probability of accepting equivalence when the actual bias or difference in means is equal to the equivalence limit. This probability is controlled to a low level so that accepting equivalence gives a high degree of assurance that differences in question are less than the equivalence limit.
4.4.2 The producer’s risk is the risk of falsely rejecting equivalence. If improvements are rejected this can lead to opportunity losses to the company and its laboratories (the producers) or cause additional unnecessary effort in improving the testing process.
5. Planning the Equivalence Study
5.1 Objectives and Design Selection—This practice supports two equivalence study objectives: (1) determining the bias equivalence of a test method or (2) determining the means equivalence of test results from two testing processes. In both objectives two population means are compared for equivalence.
5.1.1 Bias Equivalence—This study requires a suitable quantity of a certified reference material having an accepted reference value (ARV) for the material characteristic of interest. The ARV is considered as a known population mean with zero variability for the purpose of the equivalence study. The average of the test results conducted on the reference material is the population mean estimate to be compared with the ARV (see X1.4).
5.1.2 Means Equivalence—This study compares the average test result from the current testing process with the innovated process. A single material is selected, subdivided into test samples, and distributed for testing by each process. The material should be reasonably homogeneous, because inhomogeneity in the material will decrease the test precision.
5.2 Design Requirements—Inputs for carrying out the statistical test of equivalence are the equivalence limits and the consumer’s risk. Additional inputs for designing the equivalence study are an estimate of the test method precision and the producer’s risk profile over selected differences in the means.
5.2.1 The equivalence limits to be used in the TOST procedure are selected as the worst-case differences between the two population means and are determined by the subject matter expert or by industry consensus. These limits are usually symmetrical around zero and then are denoted as –E and E.
5.2.1.1 In certain cases the limits may be asymmetrical and are then denoted by E₁ and E₂, where E₁ is usually a negative value. The producer’s risk profile for this situation will not be treated in this practice.
5.2.2 The consumer’s risk α is the probability of falsely declaring equivalence and is usually set at a value of 0.05, representing a 5% risk. Other risk levels may be selected, depending on circumstances.
5.2.3 The test method precision, σ, is stated as the standard deviation of the test method, or methods, used in the equivalence study. An estimate may be available from a method validation, an interlaboratory study, or other sources.
5.3 Sample Size Determination—The number of test results, n, from each population controls the producer’s risk β of falsely rejecting equivalence at a given true mean difference, ∆. The producer’s risk may be alternatively stated in terms of the power, or probability 1 – β of properly accepting equivalence at a given value of ∆.
5.3.1 For symmetric equivalence limits, the power profile plots the probability of properly declaring equivalence versus the absolute value of ∆, due to the symmetry of the equivalence limits. This calculation can be performed using a spreadsheet computer package (see X1.5).
5.3.2 An example of a set of power profiles is shown in Fig. 1. The probability scale for power on the vertical axis varies from 0 to 1. The power profile, a reversed S-shaped curve, should be close to a probability of 1 at zero absolute difference and will decline to the consumer risk probability at an absolute difference of E. Power for absolute differences greater than E are less than the consumer risk and decline asymptotically to zero as the absolute difference increases.
FIG. 1 Multiple Power Curves for Lab Transfer Example
5.3.2.1 In Fig. 1 power profiles ar
...
(0 ± E), equivalently if LCL > –E and UCL < E, then accept equivalence. Otherwise, reject equivalence.
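Section 5.3.1 notes that the power profile can be computed with a spreadsheet. The same kind of calculation can be sketched in Python under simplifying assumptions that are not from the standard: two populations with n results each, symmetric limits –E and E, and σ treated as known, so that a normal approximation replaces the t-based calculation referenced in X1.5. The acceptance rule used is the one quoted above (LCL > –E and UCL < E). The function names `tost_power` and `smallest_n` and all numeric inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def tost_power(delta, e_limit, sigma, n, alpha=0.05):
    """Approximate probability of accepting equivalence at true difference delta.

    Normal approximation with known sigma and n results per population;
    an illustrative simplification, not the standard's own procedure.
    """
    se = sigma * sqrt(2.0 / n)            # standard error of the mean difference
    z_crit = _Z.inv_cdf(1.0 - alpha)      # one-sided critical value
    # Accept when LCL > -E and UCL < E, i.e. -E + z*se < d < E - z*se,
    # where the observed difference d is distributed N(delta, se^2).
    power = (_Z.cdf((e_limit - delta) / se - z_crit)
             + _Z.cdf((e_limit + delta) / se - z_crit) - 1.0)
    return max(0.0, power)                # empty acceptance region gives zero


def smallest_n(delta, e_limit, sigma, target_power, alpha=0.05, n_max=1000):
    """Smallest n per population reaching target_power at delta, or None."""
    for n in range(2, n_max + 1):
        if tost_power(delta, e_limit, sigma, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: E = 2, sigma = 1, n = 10 results per population
    for delta in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"|delta|={delta:.1f}  power={tost_power(delta, 2.0, 1.0, 10):.4f}")
    print("n for 90 % power at delta = 1:", smallest_n(1.0, 2.0, 1.0, 0.90))
```

Under this approximation the power at an absolute difference of E comes out close to α, which matches the behaviour of the power profile described in 5.3.2.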