ASTM D4855-97(2002)
Standard Practice for Comparing Test Methods (Withdrawn 2008)
SIGNIFICANCE AND USE
4.1 Task groups developing a test method frequently find themselves with two or more alternative procedures that must be compared. Three common situations are:
4.1.1 Two or more new test methods may have been proposed to measure a property for which there is no existing method.
4.1.2 A new test method may have been suggested to replace an existing test method.
4.1.3 Two or more existing test methods may overlap in their scopes so that one should be chosen over the other.
4.2 The selection of one test method in preference to another is not simply a statistical choice. There are many other aspects of two test methods that should be considered, which may have an influence (on the engineering judgment of the task group) equal to or greater than the statistical evidence. Some of these characteristics are discussed in Section 6.
SCOPE
1.1 This practice provides a procedure for evaluating and comparing test methods under controlled conditions using the same materials tested during the same time span. The practice describes how to obtain and compare estimates on precision, sensitivity, and bias.
1.2 This practice covers the following topics:
Topic Title                                   Section number
Scope                                         1
Referenced Documents                          2
Terminology                                   3
Significance and Use                          4
Requirements for Materials                    5
Evaluating Test Methods                       6
Sensitivity Criterion                         7
Basic Statistical Design                      8
Experimental Procedure                        9
Procedure for Comparing Precision             10
Evaluating the Bias Between Test Methods      11
Procedure for Comparing Sensitivities         12
Report                                        13
WITHDRAWN RATIONALE
This practice is being withdrawn with no replacement because Committee D13 no longer has the expertise to maintain it, and statistical standards are now maintained by Committee E11.
Formerly under the jurisdiction of Committee D13 on Textiles, this practice was withdrawn in July 2008.
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: D4855–97 (Reapproved 2002)
Standard Practice for Comparing Test Methods
This standard is issued under the fixed designation D4855; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice provides a procedure for evaluating and comparing test methods under controlled conditions using the same materials tested during the same time span. The practice describes how to obtain and compare estimates on precision, sensitivity, and bias.
1.2 This practice covers the following topics:
Topic Title                                   Section number
Scope                                         1
Referenced Documents                          2
Terminology                                   3
Significance and Use                          4
Requirements for Materials                    5
Evaluating Test Methods                       6
Sensitivity Criterion                         7
Basic Statistical Design                      8
Experimental Procedure                        9
Procedure for Comparing Precision             10
Evaluating the Bias Between Test Methods      11
Procedure for Comparing Sensitivities         12
Report                                        13
2. Referenced Documents
2.1 ASTM Standards:
D123 Terminology Relating to Textiles
D2905 Practice for Statements on Number of Specimens for Textiles
D2906 Practice for Statements on Precision and Bias for Textiles
E456 Terminology Relating to Quality and Statistics
2.2 ASTM Adjuncts:
TEX-PAC
NOTE 1—Tex-Pac is a group of PC programs on floppy disks, available through ASTM Headquarters, 100 Barr Harbor Drive, Conshohocken, PA 19428, USA. The calculations for comparing the precision, sensitivity and bias of two test methods can be done using one of these programs and statements on the relative merits of the two test methods are part of the output.
This practice is under the jurisdiction of ASTM Committee D13 on Textiles and is the direct responsibility of Subcommittee D13.93 on Statistics.
Current edition approved September 10, 1997. Published August 1998. Originally published as D4855–88. Last previous edition D4855–91.
Annual Book of ASTM Standards, Vol 07.01.
Annual Book of ASTM Standards, Vol 14.02.
PC programs on floppy disks are available through ASTM. For a 3 1/2 inch disk request PCN:12-429040-18, for a 5 1/4 inch disk request PCN:12-429041-18.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
3. Terminology
3.1 Definitions:
3.1.1 accuracy, n—of a test method, the degree of agreement between the true value of the property being tested (or an accepted standard value) and the average of many observations made according to the test method, preferably by many observers. (See also bias and precision.)
3.1.1.1 Discussion—Increased accuracy is associated with decreased bias relative to the true value; two methods with equal bias relative to the true value have equal accuracy even if one method is more precise than the other. The true value is the exact value of the property being tested for the statistical universe being sampled. When the true value is not known or cannot be determined, and an acceptable standard value is not available, accuracy cannot be established. No valid inferences on the accuracy of a method can be drawn from an individual observation.
3.1.2 bias, n—in statistics, a constant or systematic error in test results.
3.1.2.1 Discussion—Bias can exist between the accepted reference value and a test result obtained from one method, between test results obtained from two methods, or between two test results obtained from a single method, for example, between operators or between laboratories.
3.1.3 confidence interval, n—the interval estimate of a population parameter computed so that the statement “the population parameter lies in this interval” will be true, on the average, in a stated proportion of the times such statements are made.
3.1.4 confidence level, n—the stated proportion of times the confidence interval is expected to include the population parameter.
3.1.4.1 Discussion—Statisticians generally accept that, in the absence of special consideration, 0.95 or 95% is a realistic confidence level. If the consequences of incorrectly estimating the confidence interval would be grave, then a higher confidence level might be considered. If the consequences of incorrectly estimating the confidence interval are of less than usual concern, then a lower confidence interval might be considered.
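The definitions of confidence interval (3.1.3) and confidence level (3.1.4) can be illustrated with a short simulation. The sketch below is not part of the practice and is not taken from Tex-Pac. It assumes normally distributed observations with a known standard deviation so that only the Python standard library is needed, and the names z_interval and coverage are hypothetical helpers chosen for this illustration.

```python
import math
import random
from statistics import NormalDist, fmean


def z_interval(sample, sigma, level=0.95):
    """Two-sided interval for a mean, assuming sigma is known (an assumption)."""
    # Normal quantile that leaves (1 - level) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sigma / math.sqrt(len(sample))
    centre = fmean(sample)
    return centre - half_width, centre + half_width


def coverage(true_mean, sigma, n, level=0.95, trials=2000, seed=1):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, level)
        if low <= true_mean <= high:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # The printed fraction should sit close to the stated level of 0.95.
    print(coverage(true_mean=50.0, sigma=2.0, n=10))
```

The fraction returned by coverage is the "stated proportion of the times such statements are made" from 3.1.3, estimated empirically. When the standard deviation must be estimated from the sample, Student's t-distribution would normally replace the normal quantile used here.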
3.1.5 confidence limits, n—the two statistics that define the ends of a confidence interval.
3.1.6 degrees of freedom, n—for a set, the number of values that can be assigned arbitrarily and still get the same value for each of one or more statistics calculated from the set of data.
3.1.6.1 Discussion—For example, if only an average is specified for a set of five observations, there are four degrees of freedom since the same average can be obtained with any values substituted for four of the five observations as long as the fifth value is set to give the correct total. If both the average and the standard deviation have been specified, there are only three degrees of freedom left.
3.1.7 error of the first kind, α, n—in a statistical test, the rejection of a statistical hypothesis when it is true. (Syn. Type I error.)
3.1.8 error of the second kind, β, n—in a statistical test, the acceptance of a statistical hypothesis when it is false. (Syn. Type II error.)
3.1.9 F-test, n—a test of statistical significance based on the use of George W. Snedecor’s F-distribution and used to compare two sample variances or a sample variance and a hypothetical value.
3.1.10 interference, n—in testing, an effect due to the presence of a constituent or characteristic that influences the measurement of another constituent or characteristic.
3.1.11 least difference of practical importance, δ, n—the smallest difference based on engineering judgment deemed to be of practical importance when considering whether a significant difference exists between two statistics or between a statistic and a hypothetical value.
3.1.12 parameter, n—in statistics, a variable that describes a characteristic of a population or mathematical model.
3.1.13 precision, n—the degree of agreement within a set of observations or test results obtained as directed in a method.
3.1.13.1 Discussion—The term “precision,” delimited in various ways, is used to describe different aspects of precision. This usage was chosen in preference to the use of “repeatability” and “reproducibility,” which have been assigned conflicting meanings by various authors and standardizing bodies.
3.1.14 ruggedness test, n—an experiment in which environmental or test conditions are deliberately varied in order to evaluate the effects of such variations.
3.1.15 sensitivity, n—for a single test method, the result of dividing (1) the derivative of measurements at different levels of a property of interest to known values of the property by (2) the standard deviation of such measurements. (Syn. absolute sensitivity.)
3.1.15.1 Discussion—The sensitivity of a single test method may be determined only with materials for which the value of the property of interest is known.
3.1.16 sensitivity ratio, SR, n—in comparing two test methods, the ratio of the sensitivities of the test methods with the larger sensitivity in the numerator. (Syn. relative sensitivity.)
3.1.16.1 Discussion—When the same materials are used for each test method, the sensitivity ratio may be determined using materials for which the value of the property of interest is not known.
3.1.17 statistic, n—a quantity that is calculated from observations on a sample and that estimates a parameter of a population.
3.1.18 t-test, n—a test of statistical significance based on the use of Student’s t-distribution and used to compare two sample averages or a sample average and a hypothetical value.
3.1.19 Type I error—See error of the first kind.
3.1.20 Type II error—See error of the second kind.
3.1.21 For definitions of textile terms used in this standard, refer to Terminology D123. For definitions of other statistical terms used in this standard, refer to Terminology D4392 or Terminology E456.
4. Significance and Use
4.1 Task groups developing a test method frequently find themselves with two or more alternative procedures that must be compared. Three common situations are:
4.1.1 Two or more new test methods may have been proposed to measure a property for which there is no existing method.
4.1.2 A new test method may have been suggested to replace an existing test method.
4.1.3 Two or more existing test methods may overlap in their scopes so that one should be chosen over the other.
4.2 The selection of one test method in preference to another is not simply a statistical choice. There are many other aspects of two test methods that should be considered, which may have an influence (on the engineering judgment of the task group) equal to or greater than the statistical evidence. Some of these characteristics are discussed in Section 6.
5. Requirements for Materials
5.1 The number and type of materials to be included in a comparison study will depend on the following:
5.1.1 The range of the values of the property being measured on a given material and how the precision varies over that range,
5.1.2 The number of different materials to which the test method is applied,
5.1.3 The difficulty and expense involved in obtaining, processing, and distributing samples,
5.1.4 The difficulty of, length of time required for, and expense of performing the tests, and
5.1.5 The uncertainty of prior information on any of these points. For example, if it is already known that the precision is relatively constant or proportional to the average level over the range of values of interest, a smaller number of materials will be needed than if it is known that the precision changes erratically at different levels. A preliminary pilot or screening program may help to settle some of these questions, and may often result in the saving of considerable time and expense in the full comparison study.
5.2 In general, a minimum of three materials should be considered acceptable, and for development of broadly applicable precision statements, six or more materials should be included in the study.
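One question a pilot or screening program of the kind mentioned in 5.1.5 might address is whether precision is roughly constant, roughly proportional to the average level, or erratic across materials. The sketch below is only an illustration of that idea and is not a procedure from this practice. The function names, the use of the standard deviation as the measure of precision, and the tolerance of 1.5 on the largest-to-smallest ratio are all assumptions made here.

```python
from statistics import fmean, stdev


def precision_profile(pilot):
    """Return (material, average, SD, SD / average) for each material.

    `pilot` maps a material label to its list of pilot observations.
    Assumes positive averages and at least two non-identical observations.
    """
    rows = []
    for material, values in pilot.items():
        avg = fmean(values)
        sd = stdev(values)
        rows.append((material, avg, sd, sd / avg))
    return rows


def spread(values):
    """Ratio of largest to smallest value; 1.0 means perfectly constant."""
    return max(values) / min(values)


def describe_precision(pilot, tolerance=1.5):
    """Classify how the SD behaves across levels (tolerance is arbitrary)."""
    rows = precision_profile(pilot)
    sd_spread = spread([row[2] for row in rows])
    cv_spread = spread([row[3] for row in rows])
    if sd_spread <= tolerance:
        return "roughly constant"
    if cv_spread <= tolerance:
        return "roughly proportional to level"
    return "erratic"


if __name__ == "__main__":
    pilot = {"A": [9.0, 10.0, 11.0], "B": [90.0, 100.0, 110.0]}
    print(describe_precision(pilot))
```

A result of "erratic" would suggest, in the spirit of 5.1.5, that more materials are needed in the full study. A formal comparison of variances would be a more rigorous alternative to the simple ratio used here.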
5.3 Whenever feasible, the material representing any given level in a comparison study should be made as homogeneous as possible prior to its subdivision into portions or specimens that are allocated to the different methods.
5.4 For each level of material, an adequate quantity (sample) of reasonably homogeneous material should be available for subdivision for each test method. This supply of material should include a reserve of 50% beyond the requirements of the protocol for the comparison study for possible later use in checking results or retesting the test methods in one or more laboratories.
6. Evaluating Test Methods
6.1 Each Proposed New Test Method—When evaluating one or more test methods, take into account the following features that are desirable in a proposed test method:
6.1.1 The relationship between the test results and the property of interest is clearly understood.
6.1.2 There is a small or non-existent bias over a wide range of test results.
6.1.3 The test method is precise enough to satisfy the requirements of the application.
6.1.4 The test method has acceptable ruggedness and sensitivity.
6.1.5 Any potential interferences are known and small enough to tolerate.
6.1.6 There is a low cost for making an observation with short times for learning to run the test, getting ready to run the test and cleaning up after running the test.
6.1.7 The test method may have other special attributes that encourage its selection as a preferred method.
6.1.8 Data are available from the advocates of the test method to support the above claims.
6.2 Two or More New Test Methods—When two or more new test methods are being evaluated, the task group should also consider the possibility that:
6.2.1 One test method may be more suitable for one range of values and another for a second range of values.
6.2.2 One method may be better suited as a referee method while the other is better for routine testing.
6.3 New Versus Existing Test Method—When looking for a new test method the task group wants improved precision, improved sensitivity, a shorter elapsed time to get test results, or a reduced cost without unduly disturbing any other characteristics of the test method.
7. Sensitivity Criterion
7.1 Sometimes a test method that is more precise than another test method has less discriminating power from the standpoint of detecting changes in the level in the property of interest. The sensitivity criterion is a quantitative measure of the relative merit of two test methods which:
7.1.1 Combines the precision of each method with the ability of the test method to measure differences in the property of interest.
7.1.2 Permits the comparison of test methods for which test results are reported in different units of measure. For this reason, comparisons of the sensitivity of two methods may be more meaningful than comparisons of their precisions.
7.2 When comparing test methods on the basis of data collected, it is important that the task group has formulated and evaluated a plan for analysis of the data so as to arrive at a correct decision, before conducting any tests. Statistical tests of significance are recommended as a means of helping make the decisions for these reasons: they are objective, they require a clear statement of the problem, they make more efficient use of the observed data than subjective techniques, and they allow control of the probability of concluding two test methods are different when they are really alike, as well as the probability of concluding two test methods are alike when they are really different.
8. Basic Statistical Design
8.1 Decide whether the precision, the sensitivity, the accuracy, or the bias of the two test methods is to be compared.
8.2 Specify the values of probability of Type I error, α, probability of Type II error, β, and the least difference of practical importance, δ, to be used in determining the number of observations required for each level and method (see Fig. 1).
FIG. 1 Schematic of Decision Procedure
8.3 It is common practice to arbitrarily set α = 0.05 and β = 0.10. The use of an α error of 0.05 is a compromise between the increased cost of experimenting when α is smaller and the greater risk of falsely stating that two equivalent methods are different that exists when α is larger. The β error of 0.10 takes into account the fact that the risk of failing to detect a true difference between two methods ...
8.7 Analyze the data and calculate the test statistic in 8.4.
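To show how α, β, and δ from 8.2 and 8.3 interact in fixing the number of observations, the following sketch uses the common textbook normal approximation for a two-sided comparison of two averages. It is not the decision procedure of Fig. 1 or the statistic of 8.4, neither of which is reproduced in this sample. The function name observations_per_method and the parameter sigma (an assumed, known standard deviation common to both methods) are introduced only for this illustration.

```python
import math
from statistics import NormalDist


def observations_per_method(alpha, beta, delta, sigma):
    """Approximate observations needed per method to detect a difference delta.

    Uses n = 2 * ((z_(1 - alpha/2) + z_(1 - beta)) * sigma / delta) ** 2,
    the usual normal approximation for comparing two averages (two-sided).
    sigma is an assumed common standard deviation, not a value from the practice.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # guards against Type I error
    z_beta = NormalDist().inv_cdf(1.0 - beta)          # guards against Type II error
    n = 2.0 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole number of observations


if __name__ == "__main__":
    # Customary values from 8.3, with delta equal to one standard deviation.
    print(observations_per_method(alpha=0.05, beta=0.10, delta=1.0, sigma=1.0))
```

With α = 0.05, β = 0.10, and δ equal to one standard deviation, the approximation gives 22 observations per method. Halving δ relative to sigma roughly quadruples that number, which is the cost trade-off that 8.3 describes.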
...