ASTM D4855-97
Standard Practice for Comparing Test Methods
SCOPE
1.1 This practice provides a procedure for evaluating and comparing test methods under controlled conditions using the same materials tested during the same time span. The practice describes how to obtain and compare estimates on precision, sensitivity, and bias.
1.2 This practice covers the following topics:
Topic Title                                   Section Number
Scope                                         1
Referenced Documents                          2
Terminology                                   3
Significance and Use                          4
Requirements for Materials                    5
Evaluating Test Methods                       6
Sensitivity Criterion                         7
Basic Statistical Design                      8
Experimental Procedure                        9
Procedure for Comparing Precision             10
Evaluating the Bias Between Test Methods      11
Procedure for Comparing Sensitivities         12
Report                                        13
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or discontinued.
Contact ASTM International (www.astm.org) for the latest information.
Designation: D 4855 – 97
Standard Practice for
Comparing Test Methods
This standard is issued under the fixed designation D 4855; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice provides a procedure for evaluating and comparing test methods under controlled conditions using the same materials tested during the same time span. The practice describes how to obtain and compare estimates on precision, sensitivity, and bias.
1.2 This practice covers the following topics:
Topic Title                                   Section Number
Scope                                         1
Referenced Documents                          2
Terminology                                   3
Significance and Use                          4
Requirements for Materials                    5
Evaluating Test Methods                       6
Sensitivity Criterion                         7
Basic Statistical Design                      8
Experimental Procedure                        9
Procedure for Comparing Precision             10
Evaluating the Bias Between Test Methods      11
Procedure for Comparing Sensitivities         12
Report                                        13
2. Referenced Documents
2.1 ASTM Standards:
D 123 Terminology Relating to Textiles
D 2905 Practice for Statements on Number of Specimens for Textiles
D 2906 Practice for Statements on Precision and Bias for Textiles
E 456 Terminology Relating to Quality and Statistics
2.2 ASTM Adjuncts:
TEX-PAC
NOTE 1—Tex-Pac is a group of PC programs on floppy disks, available through ASTM Headquarters, 100 Barr Harbor Drive, Conshohocken, PA 19428, USA. The calculations for comparing the precision, sensitivity, and bias of two test methods can be done using one of these programs, and statements on the relative merits of the two test methods are part of the output.
This practice is under the jurisdiction of ASTM Committee D-13 on Textiles and is the direct responsibility of Subcommittee D13.93 on Statistics.
Current edition approved September 10, 1997. Published August 1998. Originally published as D 4855 – 88. Last previous edition D 4855 – 91.
Annual Book of ASTM Standards, Vol 07.01.
Annual Book of ASTM Standards, Vol 14.02.
PC programs on floppy disks are available through ASTM. For a 3 1/2 inch disk request PCN:12-429040-18, for a 5 1/4 inch disk request PCN:12-429041-18.
Copyright © ASTM, 100 Barr Harbor Drive, West Conshohocken, PA 19428-2959, United States.
3. Terminology
3.1 Definitions:
3.1.1 accuracy, n—of a test method, the degree of agreement between the true value of the property being tested (or an accepted standard value) and the average of many observations made according to the test method, preferably by many observers. (See also bias and precision.)
3.1.1.1 Discussion—Increased accuracy is associated with decreased bias relative to the true value; two methods with equal bias relative to the true value have equal accuracy even if one method is more precise than the other. The true value is the exact value of the property being tested for the statistical universe being sampled. When the true value is not known or cannot be determined, and an acceptable standard value is not available, accuracy cannot be established. No valid inferences on the accuracy of a method can be drawn from an individual observation.
3.1.2 bias, n—in statistics, a constant or systematic error in test results.
3.1.2.1 Discussion—Bias can exist between the accepted reference value and a test result obtained from one method, between test results obtained from two methods, or between two test results obtained from a single method, for example, between operators or between laboratories.
3.1.3 confidence interval, n—the interval estimate of a population parameter computed so that the statement “the population parameter lies in this interval” will be true, on the average, in a stated proportion of the times such statements are made.
3.1.4 confidence level, n—the stated proportion of times the confidence interval is expected to include the population parameter.
3.1.4.1 Discussion—Statisticians generally accept that, in the absence of special consideration, 0.95 or 95 % is a realistic confidence level. If the consequences of incorrectly estimating the confidence interval would be grave, then a higher confidence level might be considered. If the consequences of incorrectly estimating the confidence interval are of less than usual concern, then a lower confidence level might be considered.
3.1.5 confidence limits, n—the two statistics that define the ends of a confidence interval.
3.1.6 degrees of freedom, n—for a set, the number of values that can be assigned arbitrarily and still get the same value for
each of one or more statistics calculated from the set of data.
3.1.6.1 Discussion—For example, if only an average is specified for a set of five observations, there are four degrees of freedom since the same average can be obtained with any values substituted for four of the five observations as long as the fifth value is set to give the correct total. If both the average and the standard deviation have been specified, there are only three degrees of freedom left.
3.1.7 error of the first kind, α, n—in a statistical test, the rejection of a statistical hypothesis when it is true. (Syn. Type I error.)
3.1.8 error of the second kind, β, n—in a statistical test, the acceptance of a statistical hypothesis when it is false. (Syn. Type II error.)
3.1.9 F-test, n—a test of statistical significance based on the use of George W. Snedecor’s F-distribution and used to compare two sample variances or a sample variance and a hypothetical value.
3.1.10 interference, n—in testing, an effect due to the presence of a constituent or characteristic that influences the measurement of another constituent or characteristic.
3.1.11 least difference of practical importance, δ, n—the smallest difference based on engineering judgment deemed to be of practical importance when considering whether a significant difference exists between two statistics or between a statistic and a hypothetical value.
3.1.12 parameter, n—in statistics, a variable that describes a characteristic of a population or mathematical model.
3.1.13 precision, n—the degree of agreement within a set of observations or test results obtained as directed in a method.
3.1.13.1 Discussion—The term “precision,” delimited in various ways, is used to describe different aspects of precision. This usage was chosen in preference to the use of “repeatability” and “reproducibility,” which have been assigned conflicting meanings by various authors and standardizing bodies.
3.1.14 ruggedness test, n—an experiment in which environmental or test conditions are deliberately varied in order to evaluate the effects of such variations.
3.1.15 sensitivity, n—for a single test method, the result of dividing (1) the derivative of measurements at different levels of a property of interest to known values of the property by (2) the standard deviation of such measurements. (Syn. absolute sensitivity.)
3.1.15.1 Discussion—The sensitivity of a single test method may be determined only with materials for which the values of the property of interest are known.
3.1.16 sensitivity ratio, SR, n—in comparing two test methods, the ratio of the sensitivities of the test methods with the larger sensitivity in the numerator. (Syn. relative sensitivity.)
3.1.16.1 Discussion—When the same materials are used for each test method, the sensitivity ratio may be determined using materials for which the value of the property of interest is not known.
3.1.17 statistic, n—a quantity that is calculated from observations on a sample and that estimates a parameter of a population.
3.1.18 t-test, n—a test of statistical significance based on the use of Student’s t-distribution and used to compare two sample averages or a sample average and a hypothetical value.
3.1.19 Type I error—See error of the first kind.
3.1.20 Type II error—See error of the second kind.
3.1.21 For definitions of textile terms used in this standard, refer to Terminology D 123. For definitions of other statistical terms used in this standard, refer to Terminology D 4392 or Terminology E 456.
4. Significance and Use
4.1 Task groups developing a test method frequently find themselves with two or more alternative procedures that must be compared. Three common situations are:
4.1.1 Two or more new test methods may have been proposed to measure a property for which there is no existing method.
4.1.2 A new test method may have been suggested to replace an existing test method.
4.1.3 Two or more existing test methods may overlap in their scopes so that one should be chosen over the other.
4.2 The selection of one test method in preference to another is not simply a statistical choice. There are many other aspects of two test methods that should be considered, which may have an influence (on the engineering judgment of the task group) equal to or greater than the statistical evidence. Some of these characteristics are discussed in Section 6.
5. Requirements for Materials
5.1 The number and type of materials to be included in a comparison study will depend on the following:
5.1.1 The range of the values of the property being measured on a given material and how the precision varies over that range,
5.1.2 The number of different materials to which the test method is applied,
5.1.3 The difficulty and expense involved in obtaining, processing, and distributing samples,
5.1.4 The difficulty of, length of time required for, and expense of performing the tests, and
5.1.5 The uncertainty of prior information on any of these points. For example, if it is already known that the precision is relatively constant or proportional to the average level over the range of values of interest, a smaller number of materials will be needed than if it is known that the precision changes erratically at different levels. A preliminary pilot or screening program may help to settle some of these questions, and may often result in the saving of considerable time and expense in the full comparison study.
5.2 In general, a minimum of three materials should be considered acceptable, and for development of broadly applicable precision statements, six or more materials should be included in the study.
5.3 Whenever feasible, the material representing any given level in a comparison study should be made as homogeneous as possible prior to its subdivision into portions or specimens that are allocated to the different methods.
5.4 For each level of material, an adequate quantity (sample) of reasonably homogeneous material should be available for subdivision for each test method. This supply of
material should include a reserve of 50 % beyond the requirements of the protocol for the comparison study for possible later use in checking results or retesting the test methods in one or more laboratories.
6. Evaluating Test Methods
6.1 Each Proposed New Test Method—When evaluating one or more test methods, take into account the following features that are desirable in a proposed test method:
6.1.1 The relationship between the test results and the property of interest is clearly understood.
6.1.2 There is a small or non-existent bias over a wide range of test results.
6.1.3 The test method is precise enough to satisfy the requirements of the application.
6.1.4 The test method has acceptable ruggedness and sensitivity.
6.1.5 Any potential interferences are known and small enough to tolerate.
6.1.6 There is a low cost for making an observation, with short times for learning to run the test, getting ready to run the test, and cleaning up after running the test.
6.1.7 The test method may have other special attributes that encourage its selection as a preferred method.
6.1.8 Data are available from the advocates of the test method to support the above claims.
6.2 Two or More New Test Methods—When two or more new test methods are being evaluated, the task group should also consider the possibility that:
6.2.1 One test method may be more suitable for one range of values and another for a second range of values.
6.2.2 One method may be better suited as a referee method while the other is better for routine testing.
6.3 New Versus Existing Test Method—When looking for a new test method the task group wants improved precision,
...
7. Sensitivity Criterion
7.1 ... standpoint of detecting changes in the level in the property of interest. The sensitivity criterion is a quantitative measure of the relative merit of two test methods which:
7.1.1 Combines the precision of each method with the ability of the test method to measure differences in the property of interest.
7.1.2 Permits the comparison of test methods for which test results are reported in different units of measure. For this reason, comparisons of the sensitivity of two methods may be more meaningful than comparisons of their precisions.
7.2 When comparing test methods on the basis of data collected, it is important that the task group has formulated and evaluated a plan for analysis of the data so as to arrive at a correct decision, before conducting any tests. Statistical tests of significance are recommended as a means of helping make the decisions for these reasons: they are objective, they require a clear statement of the problem, they make more efficient use of the observed data than subjective techniques, and they allow control of the probability of concluding two test methods are different when they are really alike, as well as the probability of concluding two test methods are alike when they are really different.
8. Basic Statistical Design
8.1 Decide whether the precision, the sensitivity, the accuracy, or the bias of the two test methods is to be compared.
8.2 Specify the values of probability of Type I error, α, probability of Type II error, β, and the least difference of practical importance, δ, to be used in determining the number of observations required for each level and method (see Fig. 1).
8.3 It is common practice to arbitrarily set α = 0.05 and β = 0.10. The use of an α error of 0.05 is a compromise between the increased cost of experimenting when α is smaller and the greater risk of falsely stating that two equivalent methods are different that exists when α is larger. The β error of 0.10 takes into account the fact that the risk
...
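The role of the three design quantities in 8.2 (Type I error probability α, Type II error probability β, and least difference of practical importance δ) can be illustrated with a rough calculation. The sketch below is not the procedure of Fig. 1, which is not reproduced in this sample. It assumes a two-sided comparison of two averages, a common standard deviation that is already known, and the normal approximation. The function name and the parameter sigma are hypothetical and do not come from the standard.

```python
import math
from statistics import NormalDist


def observations_per_method(sigma, delta, alpha=0.05, beta=0.10):
    """Approximate number of observations needed for each method.

    Assumptions (not taken from the standard): two-sided comparison of
    two averages, a common known standard deviation `sigma`, and the
    normal approximation n = 2 * ((z_alpha + z_beta) * sigma / delta)**2.
    """
    if sigma <= 0 or delta <= 0:
        raise ValueError("sigma and delta must be positive")
    if not (0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    # Standard normal quantiles for the two error probabilities.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    # Round up, since a fraction of an observation cannot be made.
    return math.ceil(n)


# Default alpha = 0.05 and beta = 0.10, as in 8.3.
print(observations_per_method(sigma=1.0, delta=1.0))
print(observations_per_method(sigma=1.0, delta=0.5))
```

Under these assumptions, halving δ roughly quadruples the number of observations, and lowering α or β also increases it. This is the cost trade-off that 8.3 describes.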
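The definitions of sensitivity (3.1.15) and sensitivity ratio (3.1.16) can be made concrete with a small sketch. It assumes that the slope of measurements against known property values and the standard deviation of the measurements have already been estimated for each method. The function names and the numbers are hypothetical. Taking the absolute value of the slope is also an assumption, made so that a method whose readings fall as the property rises is not penalized.

```python
def sensitivity(slope, std_dev):
    """Sensitivity of one method: slope divided by standard deviation.

    `slope` is the change in measurement per unit change in the known
    property value. The absolute value is an assumption of this sketch.
    """
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return abs(slope) / std_dev


def sensitivity_ratio(sens_a, sens_b):
    """Ratio of two sensitivities with the larger in the numerator."""
    if sens_a <= 0 or sens_b <= 0:
        raise ValueError("sensitivities must be positive")
    return max(sens_a, sens_b) / min(sens_a, sens_b)


# Hypothetical methods that report in different units of measure.
sens_a = sensitivity(slope=2.0, std_dev=0.5)    # 4.0
sens_b = sensitivity(slope=10.0, std_dev=4.0)   # 2.5
print(sensitivity_ratio(sens_a, sens_b))
```

Here the second method has the steeper slope but also the larger standard deviation, so the first method is the more sensitive of the two. Because each slope is divided by a standard deviation in the same measurement units, those units cancel, which is why 7.1.2 notes that methods reporting in different units can still be compared.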