Standard Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material

SIGNIFICANCE AND USE
5.1 This practice can be used to determine if a constant, proportional, or linear bias correction can improve the degree of agreement between two methods that purport to measure the same property of a material.  
5.2 The bias correction developed in this practice can be applied to a single result (X) obtained from one test method (method X) to obtain a predicted result ($\hat{Y}$) for the other test method (method Y).
Note 5: Users are cautioned to ensure that $\hat{Y}$ is within the scope of method Y before its use.
5.3 The between methods reproducibility established by this practice can be used to construct an interval around $\hat{Y}$ that would contain the result of test method Y, if it were conducted, with approximately 95 % probability.
5.4 This practice can be used to guide commercial agreements and product disposition decisions involving test methods that have been evaluated relative to each other in accordance with this practice.  
5.5 The magnitude of a statistically detectable bias is directly related to the uncertainties of the statistics from the experimental study. These uncertainties are related to both the size of the data set and the precision of the processes being studied. A large data set, or, highly precise test method(s), or both, can reduce the uncertainties of experimental statistics to the point where the “statistically detectable” bias can become “trivially small,” or be considered of no practical consequence in the intended use of the test method under study. Therefore, users of this practice are advised to determine in advance as to the magnitude of bias correction below which they would consider it to be unnecessary, or, of no practical concern for the intended application prior to execution of this practice.
Note 6: It should be noted that the determination of this minimum bias of no practical concern is not a statistical decision, but rather, a subjective decision that is directly dependent on the application requirements of the users.
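As a minimal sketch of 5.2, 5.3, and 5.5, assume that an assessment has already produced a linear correction (predicted Y = a + bX) and a single between methods reproducibility value. The Python below predicts a method Y result from one method X result, builds the approximate 95 % interval as the predicted result plus or minus the between methods reproducibility, and flags whether the correction exceeds a minimum bias of practical concern chosen in advance. The function names, coefficient values, and threshold are hypothetical and are not taken from the practice.

```python
def predict_y(x, a, b):
    """Apply a linear bias correction Y_hat = a + b*X to one method X result."""
    return a + b * x


def interval_95(y_hat, r_xy):
    """Approximate 95 % interval for the method Y result: Y_hat +/- R_XY."""
    return (y_hat - r_xy, y_hat + r_xy)


def correction_matters(x, y_hat, min_bias):
    """True if the size of the correction at x exceeds the agreed minimum bias."""
    return abs(y_hat - x) > min_bias


# Hypothetical values for illustration only (not from the standard).
a, b = 0.5, 1.25      # assumed correction parameters
r_xy = 2.0            # assumed between methods reproducibility
min_bias = 1.0        # assumed minimum bias of practical concern

x = 8.0
y_hat = predict_y(x, a, b)
low, high = interval_95(y_hat, r_xy)
print(y_hat, (low, high), correction_matters(x, y_hat, min_bias))
```

The threshold comparison reflects Note 6: the value of `min_bias` is an application decision made by the users, not an output of the statistics.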
SCOPE
1.1 This practice covers statistical methodology for assessing the expected agreement between two different standard test methods that purport to measure the same property of a material, and for the purpose of deciding if a simple linear bias correction can further improve the expected agreement. It is intended for use with results obtained from interlaboratory studies meeting the requirement of Practice D6300 or equivalent (for example, ISO 4259). The interlaboratory studies shall be conducted on at least ten materials in common that among them span the intersecting scopes of the test methods, and results shall be obtained from at least six laboratories using each method. Requirements in this practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication includes claim to have been carried out in compliance with this practice. Any such publication shall include mandatory information regarding certain details of the assessment outcome as specified in the Report section of this practice.  
1.2 The statistical methodology is based on the premise that a bias correction will not be needed. In the absence of strong statistical evidence that a bias correction would result in better agreement between the two methods, a bias correction is not made. If a bias correction is required, then the parsimony principle is followed whereby a simple correction is to be favored over a more complex one.
Note 1: Failure to adhere to the parsimony principle generally results in models that are over-fitted and do not perform well in practice.  
1.3 The bias corrections of this practice are limited to a constant correction, proportional correction, or a linear (proportional + constant) correction.  
1.4 The bias-correction methods of this practice are method symmetric, in the sense that equivalent corrections are obtained regardless of which method is bias-corrected to match the othe...
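The correction classes named in 1.3 can be written as special cases of the linear form, predicted Y = a + bX. The sketch below is illustrative only: the class name `Correction` and the numeric values are hypothetical, and it shows how each class is applied to a result, not how the practice estimates or selects the parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """Linear bias correction Y_hat = a + b*X; a=0, b=1 means no correction."""
    a: float = 0.0
    b: float = 1.0

    def apply(self, x: float) -> float:
        return self.a + self.b * x

    def kind(self) -> str:
        if self.a == 0.0 and self.b == 1.0:
            return "none"
        if self.b == 1.0:
            return "constant"
        if self.a == 0.0:
            return "proportional"
        return "linear"


# Hypothetical parameter values, listed with the fewest parameters first.
candidates = [
    Correction(),                 # no correction
    Correction(a=0.5),            # constant: Y_hat = X + 0.5
    Correction(b=1.5),            # proportional: Y_hat = 1.5 * X
    Correction(a=0.5, b=1.5),     # linear: Y_hat = 0.5 + 1.5 * X
]

for c in candidates:
    print(c.kind(), c.apply(10.0))
```

Under the parsimony principle in 1.2, a class further down this list would be adopted only on strong statistical evidence; the tests that supply that evidence are outside this sketch.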

General Information

Status
Published
Publication Date
29-Feb-2024

Relations

Effective Date
01-Mar-2024

Overview

ASTM D6708-24, "Standard Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material," provides a structured statistical methodology for evaluating, and when justified, improving the agreement between two analytical laboratory test methods that claim to measure the same property of a material. Developed under ASTM Committee D02, this practice is widely used in industries such as petroleum, fuels, and lubricants, where method equivalency and data comparability are critical for standardization, commercial agreements, and product verification.

This practice helps laboratories, manufacturers, and quality assurance professionals determine if a constant, proportional, or linear bias correction can statistically enhance the agreement between two standard methods, guiding informed decisions about method substitution, product disposition, and regulatory compliance.

Key Topics

  • Statistical Assessment of Method Agreement: Establishes a framework for comparing the outcomes of two test methods using data from interlaboratory studies.
  • Bias Correction: Outlines procedures for determining if, and how, a bias correction (constant, proportional, or linear) should be applied to better align the results from different methods.
  • Between-Methods Reproducibility: Defines how to quantify and interpret the reproducibility between methods, reflecting the expected difference between results when different operators, labs, and apparatus are used.
  • Appropriate Data Requirements: Specifies that interlaboratory studies must include at least ten diverse samples and data from at least six laboratories, ensuring robust and relevant assessment.
  • Parsimony Principle: Recommends the simplest effective bias correction (favoring less complex models) to avoid overfitting and ensure reliable application in practice.
  • Evaluation Criteria: Details the statistical checks and reporting requirements necessary to validate the comparability assessment for publication or regulatory use.
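The minimum counts in the data requirements above can be checked before any statistics are computed. The following sketch assumes a hypothetical layout in which each method's results are stored as a mapping from material name to a mapping of laboratory name to result; the function name and the layout are illustrative and are not part of the standard. Counting distinct laboratories per method is one reading of "at least six laboratories using each method", and the sketch does not check whether the materials span the intersecting scopes of the methods.

```python
def check_study_design(results_x, results_y, min_materials=10, min_labs=6):
    """Return a list of problems; an empty list means the minimum counts are met.

    results_x / results_y: {material: {lab: result}} for methods X and Y.
    """
    problems = []
    common = set(results_x) & set(results_y)
    if len(common) < min_materials:
        problems.append(
            f"only {len(common)} materials in common, need {min_materials}"
        )
    for name, results in (("X", results_x), ("Y", results_y)):
        labs = set()
        for material in common:
            labs.update(results[material])
        if len(labs) < min_labs:
            problems.append(
                f"method {name}: only {len(labs)} laboratories, need {min_labs}"
            )
    return problems


# Hypothetical study: 10 common materials, 6 labs for X but only 5 for Y.
materials = [f"M{i}" for i in range(1, 11)]
x_data = {m: {f"lab{j}": 1.0 for j in range(6)} for m in materials}
y_data = {m: {f"lab{j}": 1.0 for j in range(5)} for m in materials}
print(check_study_design(x_data, y_data))
```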

Applications

  • Method Substitution: Enables laboratories and organizations to assess whether one test method can reliably replace another for the same property, supporting operational flexibility and cost-effectiveness.
  • Quality Assurance: Equips quality managers with a scientifically sound approach to verify method alignment, crucial for consistent product quality and meeting industry regulations.
  • Commercial Agreements: Supports data-driven decisions in contractual scenarios, such as supply chain agreements or product certification, by providing objective evidence of method agreement or necessary corrections.
  • Regulatory Compliance: Provides documentation and validation for authorities when demonstrating that alternative methods meet specified requirements for materials testing.
  • Interlaboratory Comparisons: Guides the planning and interpretation of multi-lab studies aimed at establishing or verifying the statistical equivalence of test procedures.
  • Bias Detection and Correction: Helps identify and correct systematic differences between methods, essential for accurate reporting and decision-making.

Related Standards

  • ASTM D6300 - Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products, Liquid Fuels, and Lubricants.
  • ASTM D6299 - Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance.
  • ASTM D7372 - Guide for Analysis and Interpretation of Proficiency Test Program Results.
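  • ASTM D5580 / D5769 - Test methods for determining benzene, toluene, and total aromatics in finished gasoline by gas chromatography (D5580) and by gas chromatography/mass spectrometry (D5769); both are listed as referenced documents in ASTM D6708-24.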
  • ISO 4259 - Petroleum Products-Determination and Application of Precision Data in Relation to Methods of Test.

Keywords: ASTM D6708-24, statistical assessment, test method agreement, bias correction, interlaboratory study, method equivalency, between-methods reproducibility, petroleum testing, quality assurance, analytical method comparison, standard test methods.


Frequently Asked Questions

ASTM D6708-24 is a standard published by ASTM International. Its full title is "Standard Practice for Statistical Assessment and Improvement of Expected Agreement Between Two Test Methods that Purport to Measure the Same Property of a Material".

ASTM D6708-24 is classified under the following ICS (International Classification for Standards) categories: 75.080 - Petroleum products in general. The ICS classification helps identify the subject area and facilitates finding related standards.

ASTM D6708-24 is linked to the following standards: ASTM D6708-21, ASTM D6300-24, ASTM D6300-23a, ASTM D7795-15(2022)e1, ASTM D7798-20, ASTM D8183-22, ASTM D8210-22, ASTM D5481-21, ASTM D2887-23, ASTM D8473-22, ASTM D7778-15(2022)e1, ASTM D7945-23, ASTM D3764-23, ASTM D7344-17a, ASTM D7157-23. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.


Standards Content (Sample)


This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: D6708 − 24 An American National Standard
Standard Practice for
Statistical Assessment and Improvement of Expected
Agreement Between Two Test Methods that Purport to
Measure the Same Property of a Material
This standard is issued under the fixed designation D6708; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics. Current edition approved March 1, 2024. Published March 2024. Originally approved in 2001. Last previous edition approved in 2021 as D6708 – 21. DOI: 10.1520/D6708-24.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope*
1.1 This practice covers statistical methodology for assessing the expected agreement between two different standard test methods that purport to measure the same property of a material, and for the purpose of deciding if a simple linear bias correction can further improve the expected agreement. It is intended for use with results obtained from interlaboratory studies meeting the requirement of Practice D6300 or equivalent (for example, ISO 4259). The interlaboratory studies shall be conducted on at least ten materials in common that among them span the intersecting scopes of the test methods, and results shall be obtained from at least six laboratories using each method. Requirements in this practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication includes claim to have been carried out in compliance with this practice. Any such publication shall include mandatory information regarding certain details of the assessment outcome as specified in the Report section of this practice.
1.2 The statistical methodology is based on the premise that a bias correction will not be needed. In the absence of strong statistical evidence that a bias correction would result in better agreement between the two methods, a bias correction is not made. If a bias correction is required, then the parsimony principle is followed whereby a simple correction is to be favored over a more complex one.
NOTE 1: Failure to adhere to the parsimony principle generally results in models that are over-fitted and do not perform well in practice.
1.3 The bias corrections of this practice are limited to a constant correction, proportional correction, or a linear (proportional + constant) correction.
1.4 The bias-correction methods of this practice are method symmetric, in the sense that equivalent corrections are obtained regardless of which method is bias-corrected to match the other.
1.5 A methodology is presented for establishing the numerical limit (designated by this practice as the between methods reproducibility) that would be exceeded about 5 % of the time (one case in 20 in the long run) for the difference between two results where each result is obtained by a different operator in a different laboratory using different apparatus and each applying one of the two methods X and Y on identical material, where one of the methods has been appropriately bias-corrected in accordance with this practice, in the normal and correct operation of both test methods.
NOTE 2: In earlier versions of this standard practice, the term “cross-method reproducibility” was used in place of the term “between methods reproducibility.” The change was made because the “between methods reproducibility” term is more intuitive and less confusing. It is important to note that these two terms are synonymous and interchangeable with one another, especially in cases where the “cross-method reproducibility” term was subsequently referenced by name in methods where a D6708 assessment was performed, before the change in terminology in this standard practice was adopted.
NOTE 3: Users are cautioned against applying the between methods reproducibility as calculated from this practice to materials that are significantly different in composition from those actually studied, as the ability of this practice to detect and address sample-specific biases (see 6.7) is dependent on the materials selected for the interlaboratory study. When sample-specific biases are present, the types and ranges of samples may need to be expanded significantly from the minimum of ten as specified in this practice in order to obtain a more comprehensive and reliable between methods reproducibility that adequately cover the range of sample-specific biases for different types of materials.
1.6 This practice is intended for test methods which measure quantitative (numerical) properties of petroleum or petroleum products.
1.7 The statistical calculations of this practice are also applicable for assessing the expected agreement between two different test methods that purport to measure the same property of a material using results that are not as described in 1.1, provided the results and associated statistics from each test method are obtained from a specifically designed multi-lab study or from a proficiency testing program (e.g., ILCP) where for each sample a single result is provided by each lab for each test method. The comparison sample set shall comprise at least ten different materials that span the intersecting scopes of the test methods with no material exceeding the leverage requirement in Practice D6300. Results and statistics shall meet requirements in 1.7.1. Requirements in this practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication includes claim to have been carried out in compliance with this practice. Any such publication shall include mandatory information regarding certain details of the assessment as specified in the Report section of this practice. $R_{XY}$ shall be based on the published reproducibility of the methods.
1.7.1 For each test method and sample, results and statistics used to perform the assessment in 1.7 shall meet the following requirements:
(1) No. of results ($N$) ≥ 10,
(2) Anderson Darling statistic ≤ 1.12 (based on Normal Distribution),
(3) Standard Error ($se_{\mathrm{sample}}$) is calculated using published reproducibility evaluated at the sample mean, $N$, and the factor 2.8 as follows:
$se_{\mathrm{sample}} = R_{\mathrm{pub}} / (2.8 \sqrt{N})$    (1)
(4) $se_{\mathrm{sample}}$ is numerically less than $R_{\mathrm{pub}} / (2.8 \sqrt{10})$, and
(5) Sample standard deviation ($s_{\mathrm{sample}}$) per root-mean-square technique is not statistically greater than $R_{\mathrm{pub}} / 2.8$ for at least 80 % of the samples in the comparison data set based on an F-test using 30 as the assumed degrees of freedom for $R_{\mathrm{pub}}$, and ($N$ − 1) for $s_{\mathrm{sample}}$ at the 0.05 significance level.
1.8 The methodology in this practice can also be used to perform linear regression analysis between two variables (X, Y) where there is known uncertainty in both variables that may or may not be constant over the regression range. The common acronym used to describe this type of linear regression is ReXY (Regression with errors in X and Y). The ReXY technique for assessing the correlation between two variables as described in this practice can be used for investigative applications where the strict data input requirement may not be met, but the outcome can still be useful for the intended application. Use of this practice for ReXY should be conducted under the tutelage of subject matter experts familiar with the statistical theory and techniques described in this practice, the methodologies associated with the production and collection of the results to be used for the regression analysis, and interpretation of assessment outcome relative to the intended application.
1.9 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
*A Summary of Changes section appears at the end of this standard.

2. Referenced Documents
2.1 ASTM Standards: [2]
D5580 Test Method for Determination of Benzene, Toluene, Ethylbenzene, p/m-Xylene, o-Xylene, C9 and Heavier Aromatics, and Total Aromatics in Finished Gasoline by Gas Chromatography
D5769 Test Method for Determination of Benzene, Toluene, and Total Aromatics in Finished Gasolines by Gas Chromatography/Mass Spectrometry
D6299 Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measurement System Performance
D6300 Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products, Liquid Fuels, and Lubricants
D7372 Guide for Analysis and Interpretation of Proficiency Test Program Results
2.2 ISO Standard: [3]
ISO 4259 Petroleum Products - Determination and Application of Precision Data in Relation to Methods of Test
[2] For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
[3] Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036.

3. Terminology
3.1 Definitions:
3.1.1 between ILCP method-averages reproducibility ($R_{ILCP\_\tilde{X},\,ILCP\_Y}$), n: a quantitative expression of the random error associated with the difference between the bias-corrected ILCP average of method X versus the ILCP average of method Y from a Proficiency Testing program, when the method X has been assessed versus method Y, and an appropriate bias-correction has been applied to all method X results in accordance with this practice; it is defined as the numerical limit for the difference between two such averages that would be exceeded about 5 % of the time (one case in 20 in the long run).
3.1.2 between-method bias, n: a quantitative expression for the mathematical correction that can statistically improve the degree of agreement between the expected values of two test methods which purport to measure the same property.
3.1.3 between methods reproducibility ($R_{XY}$), n: a quantitative expression of the random error associated with the difference between two results obtained by different operators in different laboratories using different apparatus and applying the two methods X and Y, respectively, each obtaining a single result on an identical test sample, when the methods have been assessed and an appropriate bias-correction has been applied in accordance with this practice; it is defined as the numerical limit for the difference between two such single and independent results that would be exceeded about 5 % of the time (one case in 20 in the long run) in the normal and correct operation of both test methods.
3.1.3.1 Discussion: A statement of between methods reproducibility shall include a description of any bias correction used in accordance with this practice.
3.1.3.2 Discussion: Between methods reproducibility is a meaningful concept only if there are no statistically observable sample-specific relative biases between the two methods, or if such biases vary from one sample to another in such a way that they may be considered random effects. (See 6.7.)
3.1.4 centered sum of squares (CSS), n: a statistic used to quantify the degree of agreement between the results from two test methods after bias-correction using the methodology of this practice.
3.1.5 Interlaboratory Crosscheck Program (ILCP), n: ASTM International Proficiency Test Program sponsored by Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants; see ASTM website for current details. D7372
3.1.6 total sum of squares (TSS), n: a statistic used to quantify the information content from the inter-laboratory study in terms of total variation of sample means relative to the standard error of each sample mean.
3.2 Symbols:
$X, Y$ = single X-method and Y-method results, respectively
$X_{ijk}, Y_{ijk}$ = single results from the X-method and Y-method round robins, respectively
$X_i, Y_i$ = means of results on the $i$th round robin sample
$S$ = the number of samples in the round robin
$L_{Xi}, L_{Yi}$ = the numbers of laboratories that returned results on the $i$th round robin sample
$R_X, R_Y$ = the reproducibilities of the X- and Y-methods, respectively
$R_{Xi}, R_{Yi}$ = the reproducibility of method X and Y, evaluated at the method X and Y means of the $i$th round robin sample, respectively
$R_{ILCP\_\tilde{X},\,ILCP\_Y}$ = estimate of between ILCP method-averages reproducibility
$w_i$ = weight associated with the difference between mean results (or corrected mean results) from the $i$th round robin sample
CSS = centered sum of squares, weighted sum of squared differences between (possibly corrected) mean results from the round robin
$a, b$ = parameters of a linear correction: $\hat{Y} = a + bX$
$t_1, t_2$ = ratios for assessing reductions in sums of squares
$R_{XY}$ = estimate of between methods reproducibility
$\hat{Y}$ = predicted Y-method value for a sample by applying the bias correction established from this practice to an actual X-method result for the same sample
$\hat{Y}_i$ = predicted $i$th round robin sample Y-method mean, by applying the bias correction established from this practice to its corresponding X-method mean
$\varepsilon_i$ = standardized difference between $Y_i$ and $\hat{Y}_i$
$L_X, L_Y$ = harmonic mean numbers of laboratories submitting results on round robin samples, by X- and Y-methods, respectively

4. Summary of Practice
4.1 Precisions of the two methods are quantified using inter-laboratory studies meeting the requirements of Practice D6300 or equivalent, using at least ten samples in common that span the intersecting scopes of the methods. The arithmetic means of the results for each common sample obtained by each method are calculated. Estimates of the standard errors of these means are computed.
NOTE 4: For established standard test methods, new precision studies generally will be required in order to meet the common sample requirement.
4.2 Weighted sums of squares are computed for the total variation of the mean results across all common samples for each method. These sums of squares are assessed against the
standard errors of the mean results for each method to ensure
s , s = the reproducibility standard deviations,
RXi RYi
that the samples are sufficiently varied before continuing with
evaluated at the method X and Y means
th
the practice.
of the i round robin sample
s , s = the repeatability standard deviations,
rXi rYi 4.3 The closeness of agreement of the mean results by each
evaluated at the method X and Y means
method is evaluated using appropriate weighted sums of
th
of the i round robin sample
squared differences. Such sums of squares are computed from
th
s , s = standard errors of the means i round
Xi Yi
the data first with no bias correction, then with a constant bias
robin sample
correction, then, when appropriate, with a proportional
¯ ¯
X, Y = the weighted means of round robins
correction, and finally with a linear (proportional + constant)
(across samples)
correction.
th
x , y = deviations of the means of the i round
i i
4.4 The weighted sums of squared differences for the linear
¯ ¯
robin sample results from X and Y, re-
correction is assessed against the total variation in the mean
spectively.
results for both methods to ensure that there is sufficient
¯ ¯
TSS , TSS = total sums of squares, around X and Y
X Y
correlation between the two methods.
F = a ratio for comparing variances; not
unique—more than one use
4.5 The most parsimonious bias correction is selected.
v , v = the degrees of freedom for reproducibility
X Y
4.6 The weighted sum of squares of differences, after
variances from the round robins
applying the selected bias correction, is assessed to determine
D6708 − 24
whether additional unexplained sources of variation remain in 6.1 Calculate sample means and standard errors from Prac-
the residual (that is, the individual Y minus bias-corrected X ) tice D6300 results.
i i
data. Any remaining, unexplained variation is attributed to
6.1.1 The process of applying Practice D6300 to the data
sample-specific biases (also known as method-material
may involve elimination of some results as outliers, and it may
interactions, or matrix effects). In the absence of sample-
also involve applying a transformation to the data. For this
specific biases, the between methods reproducibility is esti-
practice, compute the mean results from data that have not
mated.
been transformed, but with outliers removed in accordance
4.7 If sample-specific biases are present, the residuals (that
with Practice D6300. The precision estimates from Practice
is, the individual Y minus bias-corrected X ) are tested for
i i
D6300 are used to estimate the standard errors of these means.
randomness. If they are found to be consistent with a random-
6.1.2 Compute the means as follows:
effects model, then their contribution to the between methods
th th
6.1.2.1 Let X represent the k result on the i common
ijk
reproducibility is estimated, and accumulated into an all-
th
material by the j lab in the round robin for method X.
encompassing between methods reproducibility estimate.
th
Similarly for Y . (The i material is the same for both round
ijk
4.8 Refer to Fig. 1 for a simplified flow diagram of the
th
robins, but the j lab in one round robin is not necessarily the
process described in this practice.
th
same lab as the j lab in the other round robin.) Let n be the
Xij
th th
number of results on the i material from the j X-method lab,
5. Significance and Use
after removing outliers, that is, the number of results in cell
5.1 This practice can be used to determine if a constant,
(i,j). Let L be the number of laboratories in the X-method
Xi
proportional, or linear bias correction can improve the degree
th
round robin that have at least one result on the i material
of agreement between two methods that purport to measure the
remaining in the data set, after removal of outliers. Let S be the
same property of a material.
total number of materials common to both round robins.
5.2 The bias correction developed in this practice can be
th
6.1.2.2 The mean X-method result for the i material is:
applied to a single result (X) obtained from one test method
ˆ
(method X) to obtain a predicted result (Y) for the other test X
ijk
(
k
method (method Y). X 5 (2)
i (
L n
j
xi Xij
th
ˆ
NOTE 5—Users are cautioned to ensure that Y is within the scope of where, X is the average of the cell averages on the i mate-
i
method Y before its use. rial by method X.
5.3 The between methods reproducibility established by this
th
6.1.2.3 Similarly, the mean Y-method result for the i
ˆ
practice can be used to construct an interval around Y that
material is:
would contain the result of test method Y, if it were conducted,
with approximately 95 % probability.
Y
( ijk
k
Y 5 (3)
5.4 This practice can be used to guide commercial agree- i (
L n
j
Yi Yij
ments and product disposition decisions involving test methods
6.1.3 The standard errors (standard deviations of the means
that have been evaluated relative to each other in accordance
of the results) are computed as follows:
with this practice.
6.1.3.1 If s is the estimated reproducibility standard
5.5 The magnitude of a statistically detectable bias is RXi
deviation from the X-method round robin, and s is the
directly related to the uncertainties of the statistics from the
rXi
estimated repeatibility standard deviation, then an estimate of
experimental study. These uncertainties are related to both the
size of the data set and the precision of the processes being the standard error for X is given by:
i
studied. A large data set, or, highly precise test method(s), or
1 1 1
both, can reduce the uncertainties of experimental statistics to 2 2
s 5Œ s 2 s 1 2 (4)
F S DG
Xi RXi rXi (
L L n
j
Xi Xi Xij
the point where the “statistically detectable” bias can become
NOTE 8—Since repeatability and reproducibility may vary with X, even
“trivially small,” or be considered of no practical consequence
if the L were the same for all materials and the n were the same for all
Xi Xij
in the intended use of the test method under study. Therefore,
laboratories and all materials, the {s } might still differ from one material
Xi
users of this practice are advised to determine in advance as to
to the next.
the magnitude of bias correction below which they would
6.1.3.2 s , the estimated standard error for Y , is given by an
consider it to be unnecessary, or, of no practical concern for the
Yi i
analogous formula.
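As a rough illustration of 6.1.2 and 6.1.3, the sketch below computes the mean of cell averages (Eq 2 or Eq 3) and the standard error of that mean (Eq 4) for one material. It is only a sketch under stated assumptions: the function names and the list-of-lists layout (one inner list of retained results per laboratory) are invented for this example, and the precision estimates `s_R` and `s_r` are assumed to come from the Practice D6300 analysis.

```python
import math

def cell_average_mean(results_by_lab):
    """Eq 2 / Eq 3: average of the per-laboratory cell averages for one material.

    results_by_lab is a hypothetical layout: one list of retained results per lab.
    """
    cells = [cell for cell in results_by_lab if cell]  # labs with >= 1 retained result
    return sum(sum(cell) / len(cell) for cell in cells) / len(cells)

def standard_error(results_by_lab, s_R, s_r):
    """Eq 4: standard error of the material mean.

    s_R and s_r are the reproducibility and repeatability standard deviations
    evaluated at this material's level (assumed to be supplied by the caller).
    """
    cells = [cell for cell in results_by_lab if cell]
    n_labs = len(cells)                                  # L_Xi
    mean_inv_n = sum(1.0 / len(cell) for cell in cells) / n_labs
    variance = (s_R**2 - s_r**2 * (1.0 - mean_inv_n)) / n_labs
    return math.sqrt(variance)
```

When every laboratory reports a single result, the repeatability term drops out and the standard error reduces to `s_R / sqrt(L)`, which is a quick sanity check on any implementation.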
intended application prior to execution of this practice.
NOTE 6—It should be noted that the determination of this minimum bias 6.2 Calculate the total variation sum of squares for each
of no practical concern is not a statistical decision, but rather, a subjective
method, and determine whether the samples can be distin-
decision that is directly dependent on the application requirements of the
guished from each other by both methods.
users.
6.2.1 The total sums of squares (TSS) are given by:
6. Procedure
2 2
¯ ¯
X 2 X Y 2 Y
i i
NOTE 7—For an in-depth statistical discussion of the methodology used
TSS 5 S D and TSS 5 S D (5)
x ( y (
s s
in this section, see Appendix X1. For a worked example, see Appendix i Xi i Yi
X2. where:
D6708 − 24
FIG. 1 Simplified Flow Diagram for this Practice
th
X Y 6.2.2 Compare F = TSS /(S-1) to the 95 percentile of
i i X
S D S D
2 2
( (
s s
Fisher’s F distribution with (S-1) and v degrees of freedom for
i Xi i Yi
x
¯ ¯
X 5 and Y 5 (6)
1 1
the numerator and denominator, respectively, where v is the
X
S D S D
( 2 ( 2
s s
i i degrees of freedom for the reproducibility variance (Practice
Xi Yi
are weighted averages of all X ’s and Y ’s respectively.
i i
D6708 − 24
D6300, paragraph 8.3.3.3) for the X-method round robin. If F
w ~Y 2 X ! w Y w X
i i i ( i i ( i i
th a 5 5 2 (11)
(
does not exceed the 95 percentile, then the X-method is not
i w w w
( i ( i ( i
i
sufficiently precise to distinguish among the S samples. Do not
proceed with this practice, as meaningful results cannot be
6.4.2.2 Compute CSS:
produced.
CSS 5 w Y 2 X 1a (12)
~ ~ !!
1a ( i i i
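The Class 0 and Class 1a calculations of 6.4.1 and 6.4.2 (Eq 9 through Eq 12) are simple enough to sketch directly. The code below is an illustrative sketch only; the function names and the use of plain Python lists for the sample means and standard errors are assumptions made for this example.

```python
def class0_weights(s_x, s_y):
    """Eq 9: w_i = 1 / (s_Yi^2 + s_Xi^2)."""
    return [1.0 / (sy**2 + sx**2) for sx, sy in zip(s_x, s_y)]

def css_class0(x, y, w):
    """Eq 10: weighted sum of squared differences with no correction."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))

def constant_bias(x, y, w):
    """Eq 11: weighted mean difference, used as the constant correction a."""
    return sum(wi * (yi - xi) for wi, xi, yi in zip(w, x, y)) / sum(w)

def css_class1a(x, y, w, a):
    """Eq 12: weighted sum of squares after adding a to every X mean."""
    return sum(wi * (yi - (xi + a)) ** 2 for wi, xi, yi in zip(w, x, y))
```

For example, if every Y mean exceeds its X mean by the same amount, `constant_bias` returns that amount and `css_class1a` drops to zero, while `css_class0` stays positive.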
6.2.3 In a similar manner, compare F = TSS /(S-1) to the
Y i
th
95 percentile of Fisher’s F distribution, using the degrees of
6.4.3 Class 1b—Proportional bias correction.
freedom of the reproducibility variance of the Y-method, v , in
Y
6.4.3.1 The computations of this subsection (6.4.3) are
place of v . Similarly, do not proceed with this practice if F
X
appropriate only if both of the following conditions apply: (1)
th
does not exceed the 95 percentile.
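The sample-variation check of 6.2 can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are assumptions, and the 95th percentile of the F distribution with (S-1) and v degrees of freedom is expected to be looked up separately (from tables or statistical software) and compared with the returned ratio.

```python
def weighted_mean(means, std_errs):
    """Eq 6: average of the sample means weighted by 1 / s_i^2."""
    weights = [1.0 / s**2 for s in std_errs]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)

def total_sum_of_squares(means, std_errs):
    """Eq 5: sum of squared standardized deviations from the weighted mean."""
    center = weighted_mean(means, std_errs)
    return sum(((m - center) / s) ** 2 for m, s in zip(means, std_errs))

def variation_f_ratio(means, std_errs):
    """F = TSS / (S - 1), the ratio compared with the F percentile in 6.2.2 and 6.2.3."""
    return total_sum_of_squares(means, std_errs) / (len(means) - 1)
```

The same three functions serve both methods: call them once with the X-method means and standard errors and once with the Y-method values.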
the measured property assumes only non-negative values, and
NOTE 9—If one or both of the conditions of 6.2.2 and 6.2.3 are satisfied (2) a property value of zero has a physical significance (for
only marginally, it is unlikely that this practice will produce a meaningful
example, concentrations of specific constituents). In addition, it
outcome. The test in the next subsection will almost certainly fail.
is not mandatory but highly recommended that max(Y )≥2
i
6.3 Test whether the methods are sufficiently correlated. min(Y ).
i
6.4.3.2 The computations involve iterative calculation of the
6.3.1 Using the weights {w } as computed in 6.4.1.1, Eq 6,
i
weights {w } and the proportional correction b.
calculate the weighted correlation coeffıcient r:
i
6.4.3.3 Set b = 1.
¯ ¯
w ~X 2 X!~Y 2 Y!
( i i i
6.4.3.4 Compute the weight w for each sample i:
r 5 (7) i
2 2
¯ ¯
=
w ~X 2 X! w ~Y 2 Y!
( i i ( i i 1
¯ ¯ w 5 (13)
where X and Y are w X w and w Y w , respectively. i 2 2 2
( ( ( (
i i i i i i
/ / s 1b s
Yi Xi
6.3.2 Use r to calculate the F-statistic:
6.4.3.5 Calculate the following three sums:
~S 2 2!r
2 2
A 5 w X Y s (14)
F 5 (8)
( i i i Xi
1 2 r
2 2 2 2 2
B 5 w X s 2 Y s (15)
~ !
( i i Yi i Xi
th
6.3.3 Compare F to the 99 percentile of Fisher’s F
2 2
C 5 2 w X Y s (16)
( i i i Yi
distribution with 1 and S-2 degrees of freedom in the numerator
and denominator, respectively.
6.4.3.6 Calculate b :
th
6.3.3.1 If F is less than the 99 percentile value, then this
=
2B1 B 2 4AC
practice concludes that the methods are too discordant to
b 5 (17)
2A
permit use of the results from one method to predict those of
the other.
6.4.3.7 If |b − b | > .001 b, replace b with b and go back to
0 0
6.3.3.2 If F is greater than the tabled value, proceed to 6.5.
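A small sketch of the correlation test in 6.3 follows. It assumes the weights have already been computed as in 6.4.1.1; the function names are illustrative, and the 99th percentile of the F distribution with 1 and S-2 degrees of freedom is again left to the caller.

```python
import math

def weighted_correlation(x, y, w):
    """Eq 7: weighted correlation coefficient of the sample means."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_yy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation_f_statistic(r, n_samples):
    """Eq 8: F = (S - 2) r^2 / (1 - r^2); undefined when |r| equals 1."""
    return (n_samples - 2) * r**2 / (1.0 - r**2)
```

Note that a perfectly correlated data set makes the denominator of Eq 8 zero, so a practical implementation would need to guard against that case.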
6.4.3.4. Otherwise, the iteration can be stopped, as further
iteration will not produce meaningful improvement. Replace b
6.4 Calculate the centered sum of squares (CSS) statistic for
with b and go on to 6.4.3.8.
each of the following classes of bias-correction methodology.
6.4.3.8 Calculate the final weights {w } as in 6.4.3.4.
i
NOTE 10—The revised algorithms presented in this version of D6708
6.4.3.9 Calculate CSS :
1b
were developed in order to correct very rare cases in which the algorithms
CSS 5 w ~Y 2 bX ! (18)
of previous versions do not converge to the optimal linear models. The
1b ( i i i
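The iteration for the proportional correction (6.4.3.3 through 6.4.3.9) can be sketched as a short loop. This is a hedged illustration: the function name, the `max_iter` safeguard, and the list-based inputs are assumptions for the example, while the update itself follows Eq 13 through Eq 18 and the 0.001 relative stopping rule of 6.4.3.7.

```python
import math

def fit_proportional(x, y, s_x, s_y, rel_tol=1e-3, max_iter=100):
    """Return (b, css_1b) for the Class 1b correction Y-hat = b * X."""
    def weights(b):
        # Eq 13: w_i = 1 / (s_Yi^2 + b^2 s_Xi^2)
        return [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(s_x, s_y)]

    b = 1.0                                               # 6.4.3.3
    for _ in range(max_iter):                             # max_iter is a safeguard, not in the practice
        w = weights(b)
        A = sum(wi**2 * xi * yi * sx**2 for wi, xi, yi, sx in zip(w, x, y, s_x))
        B = sum(wi**2 * (xi**2 * sy**2 - yi**2 * sx**2)
                for wi, xi, yi, sx, sy in zip(w, x, y, s_x, s_y))
        C = -sum(wi**2 * xi * yi * sy**2 for wi, xi, yi, sy in zip(w, x, y, s_y))
        b_new = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)   # Eq 17
        converged = abs(b - b_new) <= rel_tol * abs(b)               # 6.4.3.7
        b = b_new
        if converged:
            break
    w = weights(b)                                        # 6.4.3.8: final weights
    css_1b = sum(wi * (yi - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))  # Eq 18
    return b, css_1b
```

On data where Y is exactly twice X, the quadratic has the root b = 2 on the first pass, so the loop stops after the second pass confirms it.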
rare cases generally involved data sets with poor correlations between the
6.4.4 Class 2—Linear (proportional + constant) bias correc-
two methods. In the vast majority of data sets, including worked example
of this practice, the old and the new algorithms converge to exactly the tion.
same optimal models. Continuing to use the old algorithms is a reasonable
6.4.4.1 This involves iterative calculation of the weights
option provided the user verifies that the computed value of CSS1b is
{w }, the weighted means of X ’s and Y ’s, and the proportional
i i i
never larger than CSS0, and that the computed value of CSS2 is never
term b.
larger than either CSS1a or CSS1b. If the aforementioned situation is
6.4.4.2 Set b = 1.
detected using the old algorithms, then the outcome from this version is
deemed to be the correct outcome.
6.4.4.3 Compute the weight w for each sample i:
i
6.4.1 Class 0—No bias correction. 1
w 5 (19)
i 2 2 2
s 1b s
6.4.1.1 Compute the weights (w ) for each sample i:
yi xi
i
6.4.4.4 Calculate the weighted means of {X } and {Y }
i i
w 5 (9)
2 2
i
s 1s
respectively:
Yi Xi
6.4.1.2 Compute CSS: w Y
( i i
¯
Y 5 (20)
w
( i
CSS 5 w ~X 2 Y ! (10)
0 ( i i i
i
w X
( i i
¯
6.4.2 Class 1a—Constant bias correction. X 5
w
( i
6.4.2.1 Using the weights (w ) from 6.4.1.1, compute the
i
constant bias correction (a): 6.4.4.5 Calculate the deviations from the weighted means:
D6708 − 24
¯
x 5 X 2 X (21) CSS 2 CSS
i i
0 1
t 5Œ (29)
CSS /~S 2 2!
¯
y 5 Y 2 Y
i i
CSS 2 CSS
1 2
6.4.4.6 Calculate the three sums: t 5Œ
CSS /~S 2 2!
2 2 where, CSS is the lesser of CSS or CSS , provided the
1 1a 1b
A 5 w x y s (22)
( i i i Xi
latter is appropriate and has been calculated.
2 2 2 2 2
B 5 w ~x s 2 y s ! (23)
( i i Yi i Xi
th
6.5.3.1 Compare t to the upper 97.5 percentile of the t
2 2
C 5 2 w x y s (24)
distribution with S-2 degrees of freedom.
( i i i Yi
6.5.3.2 If t is larger, conclude that a bias correction of Class
6.4.4.7 Calculate b :
2 (proportional + constant correction) can improve the ex-
pected agreement over that of a single term (constant or
=
2B1 B 2 4AC
b 5 (25)
proportional) correction alone (Class 1). Proceed to 6.6.
2A
6.5.3.3 If t is smaller than the t-percentile, compare t to the
2 1
th
6.4.4.8 If |b − b | > .001 b, replace b with b and go back to
0 0 same upper 97.5 percentile of the t distribution with (S-2)
¯ ¯
6.4.4.3, computing new values for the weights {w }, X, Y, {x },
i i degrees of freedom.
{y }, and b . Otherwise, the iteration can be stopped, as further
i 0 6.5.3.4 If t is larger, conclude that a single term bias
iteration will not produce meaningful improvement. Replace b
correction of Class 1 is preferred to a bias correction of Class
with b and go to 6.4.4.9.
0 2. Use the constant correction unless CSS is appropriate and
1b
is smaller than CSS . Proceed to 6.6.
6.4.4.9 Calculate the final weights {w } as in 6.4.4.3.
1a
i
6.5.3.5 If t is smaller, then neither t nor t is statistically
1 1 2
6.4.4.10 Calculate CSS and a:
significant. A bias correction of Class 2 is preferred over
CSS 5 w y 2 bx (26)
~ !
2 i i i
( single-term (constant or proportional) correction of Class 1.
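The decision sequence of 6.5.2 and 6.5.3 can be written as a small function. This is a sketch under assumptions: the function name and the string labels for the classes are invented here, the critical values `f_crit` (95th percentile of F with 2 and S-2 degrees of freedom) and `t_crit` (97.5th percentile of t with S-2 degrees of freedom) must be supplied by the caller, and the `max(..., 0.0)` guards against small negative differences are an addition not stated in the practice.

```python
import math

def select_bias_class(css0, css1a, css2, n_samples, f_crit, t_crit, css1b=None):
    """Return "0", "1a", "1b", or "2" following 6.5.2 and 6.5.3.

    css1b should be passed only when Class 1b is appropriate and was calculated.
    """
    residual = css2 / (n_samples - 2)
    f_ratio = ((css0 - css2) / 2.0) / residual            # Eq 28
    if f_ratio <= f_crit:
        return "0"                                        # 6.5.2.2: no correction
    # CSS_1 is the lesser of CSS_1a and CSS_1b (when the latter is available)
    css1, single_term = css1a, "1a"
    if css1b is not None and css1b < css1a:
        css1, single_term = css1b, "1b"
    t2 = math.sqrt(max(css1 - css2, 0.0) / residual)      # Eq 29
    if t2 > t_crit:
        return "2"                                        # 6.5.3.2
    t1 = math.sqrt(max(css0 - css1, 0.0) / residual)      # Eq 29
    if t1 > t_crit:
        return single_term                                # 6.5.3.4
    return "2"                                            # 6.5.3.5
```

With S = 12 samples, the tabulated critical values are approximately 4.10 for F (2 and 10 degrees of freedom, 95th percentile) and 2.228 for t (10 degrees of freedom, 97.5th percentile).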
¯ ¯
a 5 Y 2 b X (27)
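The Class 2 iteration of 6.4.4 differs from the proportional case only in that the data are centered on their weighted means at every pass. The sketch below is illustrative: the function name, the `max_iter` safeguard, and the choice to refresh the weighted means and deviations with the final weights before computing CSS and the intercept are assumptions made for this example.

```python
import math

def fit_linear(x, y, s_x, s_y, rel_tol=1e-3, max_iter=100):
    """Return (a, b, css_2) for the Class 2 correction Y-hat = a + b * X."""
    def centered(b):
        # Eq 19 to Eq 21: weights, weighted means, deviations from the means
        w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(s_x, s_y)]
        total_w = sum(w)
        x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
        y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
        dx = [xi - x_bar for xi in x]
        dy = [yi - y_bar for yi in y]
        return w, x_bar, y_bar, dx, dy

    b = 1.0                                               # 6.4.4.2
    for _ in range(max_iter):                             # safeguard, not in the practice
        w, x_bar, y_bar, dx, dy = centered(b)
        A = sum(wi**2 * u * v * sx**2 for wi, u, v, sx in zip(w, dx, dy, s_x))
        B = sum(wi**2 * (u**2 * sy**2 - v**2 * sx**2)
                for wi, u, v, sx, sy in zip(w, dx, dy, s_x, s_y))
        C = -sum(wi**2 * u * v * sy**2 for wi, u, v, sy in zip(w, dx, dy, s_y))
        b_new = (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)   # Eq 25
        converged = abs(b - b_new) <= rel_tol * abs(b)               # 6.4.4.8
        b = b_new
        if converged:
            break
    # Assumption: means and deviations are recomputed with the final weights
    w, x_bar, y_bar, dx, dy = centered(b)
    css_2 = sum(wi * (v - b * u) ** 2 for wi, u, v in zip(w, dx, dy))  # Eq 26
    a = y_bar - b * x_bar                                              # Eq 27
    return a, b, css_2
```

A data set generated exactly from Y = 1 + 2X should return a close to 1, b close to 2, and a CSS close to zero.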
6.6 Test for existence of sample-specific biases.
6.6.1 Compare the CSS of the bias-correction class selected
6.5 Conduct tests to select the most parsimonious bias
th
in 6.5 to the 95 percentile value of a chi-square distribution
correction class needed.
with v degrees of freedom.
6.5.1 The centered sum of squares for differences from each
where:
class of bias correction are used to select the most parsimoni-
ous bias correction class that can improve the expected degree v = S for Class 0 (no bias) correction,
ˆ
v = S − 1 for Class 1a or Class 1b (constant or proportional)
of agreement between the Y (the predicted Y-method result
correction, and
using X-method result) and the actual Y-method result on the
v = S − 2 for Class 2 (linear) correction.
same material. The classes of bias correction and the associated
CSS as calculated earlier are repeated in the following table.
6.6.2 If the CSS is smaller than the chi-square percentile, it
Bias Correction Class CSS is reasonable to conclude that there are no sample-specific
Class 0–no correction CSS biases, that is, that there are no other sources of variation that
Class 1a–constant bias correction CSS
1a
are statistically observable above the measurement error. Per-
Class 1b–proportional bias correction (when appropriate) CSS
1b
form the Anderson-Darling (A-D) assessment on the residuals
Class 2–linear (proportional + constant bias correction) CSS
as per 6.7.2.2 and 6.7.2.3. If the outcome is not significant at
6.5.2 To determine whether any bias correction (Class 1a,
the 5 % level, calculate the between methods reproducibility
1b, or 2 above) can significantly improve the expected agree-
(R ) as per Eq 30 below. If the A-D assessment is significant,
XY
ment between the two methods, calculate the following ratio:
application of the practice is considered terminated with failure
at this point, as the statistical evidence suggests that a single
~CSS 2 CSS !/2
0 2
F 5 (28)
between-method reproducibility (R ) cannot be found that is
CSS / S 2 2
~ ! XY
applicable to all materials covered by the intersecting scope of
th
6.5.2.1 Compare F to the upper 95 percentile of the F
both test methods. It is reasonable to conclude that, at least for
distribution with 2 and S-2 degrees of freedom for the
some materials, the test methods are not measuring the same
numerator and denominator, respectively.
property.
6.5.2.2 If the calculated F is smaller, conclude that a bias
2 2 2
R 1b R
Y X
correction of Class 1a, 1b, or 2 does not sufficiently improve
R 5Œ (30)
XY
the expected agreement between the two methods, relative to
where:
Class 0 (no bias correction). Proceed to 6.6.
b = the coefficient of the appropriate bias correction. (For
6.5.2.3 If the calculated F is larger, conclude that a correc-
Class 0 and Class 1a bias corrections, b=1.)
tion can improve the expected agreement between the two
methods, and continue in 6.5.3.
6.6.3 If the CSS is larger than the chi-square percentile (see
th
6.5.3 If the F-value calculated in 6.5.2 is larger than the 95
6.6.1), there is strong evidence that biases between the methods
percentile of F, compute the following t-ratios: have not been adequately corrected by the bias-corrections of
D6708 − 24
6.4. In other words, the relative biases are not consistent across conclude that, at least for some materials, the test methods are
the S common samples of the round robins. The user may wish not measuring the same property. Do NOT proceeed to 6.7.3.
to investigate whether the biases can be attributed to other
NOTE 11—It is possible that, by restricting the comparison to a narrower
observable properties of the samples. Or he or she may wish to
class of materials, a between methods reproducibility can be obtained (for
restrict attention to a smaller class of materials for the purpose
that narrower class) that does not have sample-specific biases, or, has
sample-specific biases that can be treated as a random effect. However,
of establishing a between methods reproducibility. Such inves-
individual outlier materials should not be excluded from this study,
tigations are beyond the scope of this practice, as the issues
after-the-fact, based on the statistics only, without other evidence that they
typically are not statistical in nature. This practice does
clearly belong to a separate and identifiable class.
recommend investigating whether it is reasonable to treat the
6.7.3 Calculate the between methods reproducibility (R )
XY
sample-specific biases as random effects, as described in 6.7.
as follows:
6.7 Treatment of Sample-Specific Relative Bias as a Vari-
2 2 2 2
b R R 2~1.96! ~CSS 2 S1k!S
ance Component: X Y
R 5 1 11 (32)
S D
XY 2 2 2
th
2 2 b R 1R
Xi Yi
6.7.1 If the CSS exceeds the 95 percentile value of the
S D
~S 2 k!
!
( 2 2 2
b s 1s
Xi Yi
appropriate chi-square distribution (see 6.6.1), there is strong
where b and CSS are appropriate to the selected bias-
evidence that sources other than measurement error are con-
correction, and k is 0 if the bias-correction is Class 0; k is 1
tributing towards the variation of the expected agreement
if the bias correction is Class 1a or Class 1b; or k is 2 if the
between the two methods. In this practice, these sources are
bias-correction is Class 2.
attributed to sample-specific effects (also known as matrix
NOTE 12—Eq 32 provides an estimate of the magnitude below which
about 95 % of the differences are expected to fall, when one party uses the
effects or method-material interactions). In some cases these
bias-corrected X-method while another party uses the Y-method, on
sample-specific effects can be treated as random effects, and
materials similar to the round robin samples. Application of the methods
hence can be incorporated as an additional source of variation
to materials which are substantially different from these round robin
into a between methods reproducibility as described in this
materials may affect both the average bias and the variance of the random
section. Note that, even when it is appropriate to treat these
component. Laboratories which engage in routine substitution of one
method for another are advised to periodically monitor the deviations
sample-specific effects as random, the additional variation may
between methods, as a regular part of their quality assurance program.
cause the between methods reproducibility to be far larger than
6.8 Construction of an interval using a single bias-corrected
the root mean square of the reproducibilities of the methods
result from method X, and R that may contain, about 95 % of
(Eq 30).
XY
the time, a single result from method Y, if the latter is
6.7.2 Examine residuals to assess reasonableness of random
conducted on the same sample.
effect assumption.
ˆ
6.8.1 Let Y be a single bias-corrected X-method result. An
6.7.2.1 Assess the reasonableness of the assumption that the
ˆ
interval bounded by Y 6 R can be expected to contain a
sample-specific biases can be treated as random effects by XY
single corresponding Y-method result, obtained on the identical
examination of the distribution of the residuals. While there are
material about 95 % of the time. Here R is computed from Eq
numerous statistical tools available to perform this assessment, XY
ˆ
30 or Eq 32, as appropriate, with R evaluated at Y = Y.
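For the case with no sample-specific bias, Eq 30 and the interval of 6.8.1 can be sketched in a few lines. The function names are assumptions for this example, and the caller is expected to supply reproducibilities evaluated at the appropriate level (with the Y-method reproducibility evaluated at the predicted value, as 6.8.1 requires). Eq 32 is deliberately not coded here.

```python
import math

def r_xy_no_sample_bias(r_x, r_y, b=1.0):
    """Eq 30: between methods reproducibility; b = 1 for Class 0 and Class 1a."""
    return math.sqrt((r_y**2 + b**2 * r_x**2) / 2.0)

def predicted_interval(x_result, a, b, r_xy):
    """6.8.1: bounds Y-hat - R_XY and Y-hat + R_XY for a single X-method result."""
    y_hat = a + b * x_result
    return y_hat - r_xy, y_hat + r_xy
```

For instance, with a constant correction of 0.5, an X-method result of 10.0, and a between methods reproducibility of 2.0, the interval runs from 8.5 to 12.5.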
Y
this practice recommends use of the Anderson-Darling normal-
ity test, based on its simplicity and ease of use. It is not the
7. Report
intent of this practice to exclude other tools for this purpose.
ˆ
7.1 Upon completion of the calculations, it is recommended
6.7.2.2 Let {Y } be the Y-method values predicted from the
i
that the assessment findings be reported in the Precision and
corresponding X-method mean values {X }, using the bias-
i
Bias section of the appropriate test method(s). In order for the
correction selected in 6.5. The (standardized) residuals {ε } are
i
assessment to be claimed to be compliant with this practice, the
given by:
outcome, whether it is a success or fail, shall be reported. For
ˆ
ε 5 =w ~Y 2 Y ! (31)
i i i i
successful outcome, it is mandatory to report the bias correc-
tion equation, applicable test result ranges for the equation, and
where:
between-method reproducibility (R ). In the event that one of
XY
{w } = the appropriate weights from 6.4.1 – 6.4.4.
i
the test methods assessed is cited as a referee test method, with
6.7.2.3 Calculate the Anderson Darling (AD) statistic on the
the other test method being an alternative, this practice
residuals {ε }. (Refer to Practice D6299 for guidance on
recommends the following naming convention, indicating the
i
calculation and interpretation of this statistic.)
publication year for method D YYYY by the addition of suffix
“-yy”, and the publication year for method XXXX by the
6.7.2.4 If the AD statistic is not significant at the 5 %
significance level, conclude that the sample-specific relative addition of the suffix “-xx”:
bias may be treated as a variance component. Proceed to 6.7.3.
Referee Test Method designation: Test Method D YYYY-yy
Alternative Test Method designation: Test Method D XXXX-xx
6.7.2.5 If the AD statistic is significant, there is strong
evidence that the sample-specific effects cannot be treated as 7.2 The reporting format and information in this section
random effects. Application of this practice is considered (7.2) can be followed at the discretion of the user. The phrase
terminated at this point, as the statistical evidence suggests that “List sample types and property ranges” in this section refers to
a single between methods reproducibility (R ) cannot be an overview summary of sample types used to conduct study.
XY
found that is applicable to all materials covered by the Due to the random nature of sample-specific biases, users are
intersecting scope of both test methods. It is reasonable to not required (nor is it always possible) to explain these biases
D6708 − 24
A
TABLE 1 Summary of Findings
A B C D1 D2 D3 Assessment
Outcome
Is there adequate variation Is there adequate Will a scaling/bias correction Are there sample- If yes to (D1), If no to (D1),
in the property level of correlation significantly improve the specific biases? can these biases are the residuals
the sample set relative to between the test results agreement between the results be treated as a randomly
Test Method XXXX and from Test Method XXXX from Test Method XXXX random effect? scattered?
Test Method YYYY and Test Method YYYY? and Test Method YYYY
reproducibilities? over and above their combined
reproducibilities?
Yes Yes No No N/A Yes Pass (A1)
Yes Yes No No N/A No Fail (B4)
Yes Yes No Yes Yes N/A Pass (A2)
Yes Yes No Yes No N/A Fail (B3)
Yes Yes Yes No N/A Yes Pass (A3)
Yes Yes Yes No N/A No Fail (B4)
Yes Yes Yes Yes Yes N/A Pass (A4)
Yes Yes Yes Yes No N/A Fail (B3)
Yes No N/A N/A N/A N/A Fail (B2)
No N/A N/A N/A N/A N/A Fail (B1)
A
Boldfaced type indicates reason for failure.
by listing detailed characterizations of each of the samples.
Differences between results from Test Method D XXXX and Test
Report assessment findings in the Precision and Bias section of
Method D YYYY-yy, for the sample types and property ranges
the appropriate test method, under a subsection titled
studied, are expected to exceed the following between methods re-
“Between-Method Bias,” as follows:
producibility (R ), as defined in Practice D6708, about 5 % of the
XY
time. (Report the between methods reproducibility here.)
Degree of Agreement between results by Test Method D XXXX and
Test Method D YYYY-yy—Results on the same materials produced
7.2.1.3 If the finding is A2, report the following:
by Test Method D XXXX and Test Method D YYYY-yy have been
assessed in accordance with procedures outlined in Practice D6708. No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
The findings are: (report the findings here).
Method D YYYY-yy for the material types and property range listed
7.2.1 To choose the appropriate findings, see Table 1. (A)
below (reference Research Report ZZZZ). Sample-specific bias, as
represents passing, and (B) represents failure. Choose one of defined in Practice D6708, was observed for some samples. (List
sample types and property ranges for above findings here.)
the following findings (A1, A2, A3, A4, B1, B2, B3, or B4).
7.2.1.1 If the finding is A1, and R , estimated with at least
X Differences between results from Test Method D XXXX and Test
30 degrees of freedom, is less than or equal to 1.2 published R , Method D YYYY-yy, for the sample types and property ranges studied,
Y
are expected to exceed the following between methods reproducibility
report the following for property range where R satisfies the
X
(R ), as defined in Practice D6708, about 5 % of the time. (Report the
XY
aforementioned requirement.
between methods reproducibility here.)
No bias-correction considered in Practice D6708 can further im-
As a consequence of sample-specific biases, R may exceed the
XY
prove the agreement between results from Test Method D XXXX
reproducibility for Test Method D XXXX (R ), or reproducibility for
X
and Test Method D YYYY-yy for the materials studied (reference
Test Method D YYYY-yy (R ), or both. Users intending to use Test
Y
Research Report ZZZZ). For applications where Test Method X is
Method D XXXX as a predictor of Test Method D YYYY-yy, or vice
used as an alternative to Test Method Y, results from Test Method
versa, are advised to assess the required degree of prediction
D XXXX and Test Method D YYYY-yy may be considered to be sta-
agreement relative to the estimated R to determine the fitness-for-
XY
tistically indistinguishable, for sample types and property ranges
use of the prediction.
listed below. No sample-specific bias, as defined in Practice D6708,
was observed for the materials studied.
7.2.1.4 If the finding is A3, and R estimated with at least 30
X
Sample types and property range where results from method degrees of freedom, is less than or equal to 1.2 published R ,
Y
D XXXX and DYYYY-yy may be considered to be statistically indis-
report the following for property range where R satisfies the
X
tinguishable are: (list applicable sample types and property ranges
aforementioned requirement:
here).
7.2.1.2 If the finding is A1, for property range where R
X
does not meet the requirement listed above, report the follow-
ing:
No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
Method D YYYY-yy for the materials studied (reference Research
Report ZZZZ). No sample-specific bias, as defined in Practice
D6708, was observed for the materials and property range listed
below. (List sample types and property ranges for above findings
here.)
D6708 − 24
b, a = parameter estimates for a linear correction as defined
The degree of agreement between results from Test Method
in this practice.
D XXXX and Test Method D YYYY-yy can be further improved by
applying correction equation C1 as listed below (reference Research
Differences between bias-corrected results from Test Method
Report ZZZZ). For applications where Test Method X is used as an
D XXXX and Test Method D YYYY-yy, for the sample types and
alternative to Test Method Y, bias-corrected results from Test Method
property ranges studied, are expected to exceed the following
D XXXX (as per correction equation C1) and results from Test
between methods reproducibi
...


This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: D6708 − 21 D6708 − 24 An American National Standard
Standard Practice for
Statistical Assessment and Improvement of Expected
Agreement Between Two Test Methods that Purport to
Measure the Same Property of a Material
This standard is issued under the fixed designation D6708; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope*
1.1 This practice covers statistical methodology for assessing the expected agreement between two different standard test methods
that purport to measure the same property of a material, and for the purpose of deciding if a simple linear bias correction can further
improve the expected agreement. It is intended for use with results obtained from interlaboratory studies meeting the requirement
of Practice D6300 or equivalent (for example, ISO 4259). The interlaboratory studies shall be conducted on at least ten materials
in common that among them span the intersecting scopes of the test methods, and results shall be obtained from at least six
laboratories using each method. Requirements in this practice shall be met in order for the assessment to be considered suitable
for publication in either method, if such publication includes claim to have been carried out in compliance with this practice. Any
such publication shall include mandatory information regarding certain details of the assessment outcome as specified in the Report
section of this practice.
1.2 The statistical methodology is based on the premise that a bias correction will not be needed. In the absence of strong statistical
evidence that a bias correction would result in better agreement between the two methods, a bias correction is not made. If a bias
correction is required, then the parsimony principle is followed whereby a simple correction is to be favored over a more complex
one.
NOTE 1—Failure to adhere to the parsimony principle generally results in models that are over-fitted and do not perform well in practice.
1.3 The bias corrections of this practice are limited to a constant correction, proportional correction, or a linear (proportional +
constant) correction.
1.4 The bias-correction methods of this practice are method symmetric, in the sense that equivalent corrections are obtained
regardless of which method is bias-corrected to match the other.
1.5 A methodology is presented for establishing the numerical limit (designated by this practice as the between methods
reproducibility) that would be exceeded about 5 % of the time (one case in 20 in the long run) for the difference between two results
where each result is obtained by a different operator in a different laboratory using different apparatus and each applying one of
the two methods X and Y on identical material, where one of the methods has been appropriately bias-corrected in accordance with
this practice, in the normal and correct operation of both test methods.
This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee
D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved March 1, 2024. Published March 2024. Originally approved in 2001. Last previous edition approved in 2021 as
D6708 – 21. DOI: 10.1520/D6708-24.
*A Summary of Changes section appears at the end of this standard
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
NOTE 2—In earlier versions of this standard practice, the term “cross-method reproducibility” was used in place of the term “between methods
reproducibility.” The change was made because the “between methods reproducibility” term is more intuitive and less confusing. It is important to note
that these two terms are synonymous and interchangeable with one another, especially in cases where the “cross-method reproducibility” term was
subsequently referenced by name in methods where a D6708 assessment was performed, before the change in terminology in this standard practice was
adopted.
NOTE 3—Users are cautioned against applying the between methods reproducibility as calculated from this practice to materials that are significantly
different in composition from those actually studied, as the ability of this practice to detect and address sample-specific biases (see 6.7) is dependent on
the materials selected for the interlaboratory study. When sample-specific biases are present, the types and ranges of samples may need to be expanded
significantly from the minimum of ten as specified in this practice in order to obtain a more comprehensive and reliable between methods reproducibility
that adequately covers the range of sample-specific biases for different types of materials.
1.6 This practice is intended for test methods which measure quantitative (numerical) properties of petroleum or petroleum
products.
1.7 The statistical calculations of this practice are also applicable for assessing the expected agreement between two different test
methods that purport to measure the same property of a material using results that are not as described in 1.1, provided the results
and associated statistics from each test method are obtained from a specifically designed multi-lab study or from a proficiency
testing program (for example, ILCP) where for each sample a single result is provided by each lab for each test method. The comparison
sample set shall comprise at least ten different materials that span the intersecting scopes of the test methods with no material
exceeding the leverage requirement in Practice D6300. Results and statistics shall meet requirements in 1.7.1. Requirements in this
practice shall be met in order for the assessment to be considered suitable for publication in either method, if such publication
includes claim to have been carried out in compliance with this practice. Any such publication shall include mandatory information
regarding certain details of the assessment as specified in the Report section of this practice. $R_{XY}$ shall be based on the published
reproducibility of the methods.
1.7.1 For each test method and sample, results and statistics used to perform the assessment in 1.7 shall meet the following
requirements:
(1) No. of results (N) ≥ 10,
(2) Anderson Darling statistic ≤ 1.12 (based on Normal Distribution),
(3) Standard Error ($se_{sample}$) is calculated using published reproducibility evaluated at the sample mean, N, and the factor 2.8
as follows:
$$se_{sample} = \frac{R_{pub}}{2.8\sqrt{N}} \tag{1}$$
(4) $se_{sample}$ is numerically less than [$R_{pub}$ / (2.8 √10)], and
(5) Sample standard deviation ($s_{sample}$) per root-mean-square technique is not statistically greater than $R_{pub}$ / 2.8 for at least
80 % of the samples in the comparison data set based on an F-test using 30 as the assumed degrees of freedom for $R_{pub}$, and (N − 1)
for $s_{sample}$ at the 0.05 significance level.
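The quantities in items (3) and (5) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the critical value for the F-test in item (5) is not computed here and would be taken from tables for (N − 1) and 30 degrees of freedom.

```python
import math


def standard_error(r_pub: float, n: int) -> float:
    """Eq 1: se_sample = R_pub / (2.8 * sqrt(N))."""
    return r_pub / (2.8 * math.sqrt(n))


def variance_ratio(s_sample: float, r_pub: float) -> float:
    """F ratio for item (5): sample variance over (R_pub / 2.8) squared.

    Compare the returned value with a tabled F critical value at the
    0.05 level with (N - 1) and 30 degrees of freedom.
    """
    return s_sample**2 / (r_pub / 2.8) ** 2
```

For example, a published reproducibility of 2.8 and N = 16 results give a standard error of 0.25.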
1.8 The methodology in this practice can also be used to perform linear regression analysis between two variables (X, Y) where
there is known uncertainty in both variables that may or may not be constant over the regression range. The common acronym used
to describe this type of linear regression is ReXY (Regression with errors in X and Y). The ReXY technique for assessing the
correlation between two variables as described in this practice can be used for investigative applications where the strict data input
requirement may not be met, but the outcome can still be useful for the intended application. Use of this practice for ReXY should
be conducted under the tutelage of subject matter experts familiar with the statistical theory and techniques described in this
practice, the methodologies associated with the production and collection of the results to be used for the regression analysis, and
interpretation of assessment outcome relative to the intended application.
1.9 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
D5580 Test Method for Determination of Benzene, Toluene, Ethylbenzene, p/m-Xylene, o-Xylene, C9 and Heavier Aromatics,
and Total Aromatics in Finished Gasoline by Gas Chromatography
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
D5769 Test Method for Determination of Benzene, Toluene, and Total Aromatics in Finished Gasolines by Gas
Chromatography/Mass Spectrometry
D6299 Practice for Applying Statistical Quality Assurance and Control Charting Techniques to Evaluate Analytical Measure-
ment System Performance
D6300 Practice for Determination of Precision and Bias Data for Use in Test Methods for Petroleum Products, Liquid Fuels, and
Lubricants
D7372 Guide for Analysis and Interpretation of Proficiency Test Program Results
2.2 ISO Standard:
ISO 4259 Petroleum Products—Determination and Application of Precision Data in Relation to Methods of Test
3. Terminology
3.1 Definitions:
3.1.1 between ILCP method-averages reproducibility ($R_{ILCP\_\tilde{X},\,ILCP\_Y}$), n—a quantitative expression of the random error
associated with the difference between the bias-corrected ILCP average of method X versus the ILCP average of method Y from
a Proficiency Testing program, when the method X has been assessed versus method Y, and an appropriate bias-correction has been
applied to all method X results in accordance with this practice; it is defined as the numerical limit for the difference between two
such averages that would be exceeded about 5 % of the time (one case in 20 in the long run).
3.1.2 between-method bias, n—a quantitative expression for the mathematical correction that can statistically improve the degree
of agreement between the expected values of two test methods which purport to measure the same property.
3.1.3 between methods reproducibility ($R_{XY}$), n—a quantitative expression of the random error associated with the difference
between two results obtained by different operators in different laboratories using different apparatus and applying the two methods
X and Y, respectively, each obtaining a single result on an identical test sample, when the methods have been assessed and an
appropriate bias-correction has been applied in accordance with this practice; it is defined as the numerical limit for the difference
between two such single and independent results that would be exceeded about 5 % of the time (one case in 20 in the long run)
in the normal and correct operation of both test methods.
3.1.3.1 Discussion—
A statement of between methods reproducibility shall include a description of any bias correction used in accordance with this
practice.
3.1.3.2 Discussion—
Between methods reproducibility is a meaningful concept only if there are no statistically observable sample-specific relative
biases between the two methods, or if such biases vary from one sample to another in such a way that they may be considered
random effects. (See 6.7.)
3.1.4 centered sum of squares (CSS), n—a statistic used to quantify the degree of agreement between the results from two test
methods after bias-correction using the methodology of this practice.
3.1.5 Interlaboratory Crosscheck Program (ILCP), n—ASTM International Proficiency Test Program sponsored by Committee
D02 on Petroleum Products, Liquid Fuels, and Lubricants; see ASTM website for current details. D7372
3.1.6 total sum of squares (TSS), n—a statistic used to quantify the information content from the inter-laboratory study in terms
of total variation of sample means relative to the standard error of each sample mean.
3.2 Symbols:
$X, Y$ = single X-method and Y-method results, respectively
$X_{ijk}, Y_{ijk}$ = single results from the X-method and Y-method round robins, respectively
$X_i, Y_i$ = means of results on the $i^{th}$ round robin sample
$S$ = the number of samples in the round robin
$L_{Xi}, L_{Yi}$ = the numbers of laboratories that returned results on the $i^{th}$ round robin sample
$R_X, R_Y$ = the reproducibilities of the X- and Y-methods, respectively
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036.
$R_{Xi}, R_{Yi}$ = the reproducibility of method X and Y, evaluated at the method X and Y means of the $i^{th}$ round robin sample, respectively
$R_{ILCP\_\tilde{X},\,ILCP\_Y}$ = estimate of between ILCP method-averages reproducibility
$s_{RXi}, s_{RYi}$ = the reproducibility standard deviations, evaluated at the method X and Y means of the $i^{th}$ round robin sample
$s_{rXi}, s_{rYi}$ = the repeatability standard deviations, evaluated at the method X and Y means of the $i^{th}$ round robin sample
$s_{Xi}, s_{Yi}$ = standard errors of the means of the $i^{th}$ round robin sample
$\bar{X}, \bar{Y}$ = the weighted means of round robins (across samples)
$x_i, y_i$ = deviations of the means of the $i^{th}$ round robin sample results from $\bar{X}$ and $\bar{Y}$, respectively
$TSS_X, TSS_Y$ = total sums of squares, around $\bar{X}$ and $\bar{Y}$
$F$ = a ratio for comparing variances; not unique—more than one use
$v_X, v_Y$ = the degrees of freedom for reproducibility variances from the round robins
$w_i$ = weight associated with the difference between mean results (or corrected mean results) from the $i^{th}$ round robin sample
$CSS$ = centered sum of squares, weighted sum of squared differences between (possibly corrected) mean results from the round robin
$a, b$ = parameters of a linear correction: $\hat{Y} = a + bX$
$t_1, t_2$ = ratios for assessing reductions in sums of squares
$R_{XY}$ = estimate of between methods reproducibility
$\hat{Y}$ = predicted Y-method value for a sample by applying the bias correction established from this practice to an actual X-method result for the same sample
$\hat{Y}_i$ = predicted $i^{th}$ round robin sample Y-method mean, by applying the bias correction established from this practice to its corresponding X-method mean
$\varepsilon_i$ = standardized difference between $Y_i$ and $\hat{Y}_i$
$L_X, L_Y$ = harmonic mean numbers of laboratories submitting results on round robin samples, by X- and Y-methods, respectively
$R_{X\hat{Y}}$ = estimate of between methods reproducibility, computed from an X-method result only
4. Summary of Practice
4.1 Precisions of the two methods are quantified using inter-laboratory studies meeting the requirements of Practice D6300 or
equivalent, using at least ten samples in common that span the intersecting scopes of the methods. The arithmetic means of the
results for each common sample obtained by each method are calculated. Estimates of the standard errors of these means are
computed.
NOTE 4—For established standard test methods, new precision studies generally will be required in order to meet the common sample requirement.
NOTE 5—The two test methods do not need to be run by the same laboratory. If they are, care should be taken to ensure the independent test result requirement
of Practice D6300 is met (for example, by double-blind testing of samples in random order).
4.2 Weighted sums of squares are computed for the total variation of the mean results across all common samples for each method.
These sums of squares are assessed against the standard errors of the mean results for each method to ensure that the samples are
sufficiently varied before continuing with the practice.
4.3 The closeness of agreement of the mean results by each method is evaluated using appropriate weighted sums of squared
differences. Such sums of squares are computed from the data first with no bias correction, then with a constant bias correction,
then, when appropriate, with a proportional correction, and finally with a linear (proportional + constant) correction.
4.4 The weighted sum of squared differences for the linear correction is assessed against the total variation in the mean results
for both methods to ensure that there is sufficient correlation between the two methods.
4.5 The most parsimonious bias correction is selected.
4.6 The weighted sum of squares of differences, after applying the selected bias correction, is assessed to determine whether
additional unexplained sources of variation remain in the residual (that is, the individual $Y_i$ minus bias-corrected $X_i$) data. Any
remaining, unexplained variation is attributed to sample-specific biases (also known as method-material interactions, or matrix
effects). In the absence of sample-specific biases, the between methods reproducibility is estimated.
4.7 If sample-specific biases are present, the residuals (that is, the individual $Y_i$ minus bias-corrected $X_i$) are tested for randomness.
If they are found to be consistent with a random-effects model, then their contribution to the between methods reproducibility is
estimated, and accumulated into an all-encompassing between methods reproducibility estimate.
4.8 Refer to Fig. 1 for a simplified flow diagram of the process described in this practice.
FIG. 1 Simplified Flow Diagram for this Practice
5. Significance and Use
5.1 This practice can be used to determine if a constant, proportional, or linear bias correction can improve the degree of
agreement between two methods that purport to measure the same property of a material.
5.2 The bias correction developed in this practice can be applied to a single result (X) obtained from one test method (method X)
to obtain a predicted result (Yˆ) for the other test method (method Y).
NOTE 5—Users are cautioned to ensure that Yˆ is within the scope of method Y before its use.
5.3 The between methods reproducibility established by this practice can be used to construct an interval around Yˆ that would
contain the result of test method Y, if it were conducted, with approximately 95 % probability.
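One way to read 5.2 and 5.3 together is sketched below: the bias correction gives the predicted result, and the between methods reproducibility is used as the half-width of the interval around it. The function names are hypothetical, and the symmetric form of the interval is an interpretation of 5.3 rather than a formula quoted from this practice.

```python
def predict_y(x: float, a: float = 0.0, b: float = 1.0) -> float:
    """Predicted Y-method result from a single X-method result: a + b*X."""
    return a + b * x


def y_interval(x: float, r_xy: float, a: float = 0.0, b: float = 1.0):
    """Interval expected to contain the Y-method result about 95 % of the time.

    r_xy is the between methods reproducibility established by the assessment.
    """
    y_hat = predict_y(x, a, b)
    return y_hat - r_xy, y_hat + r_xy
```

With a constant correction a = 0.5, an X-method result of 10.0 and a between methods reproducibility of 1.5, the interval runs from 9.0 to 12.0.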
5.4 This practice can be used to guide commercial agreements and product disposition decisions involving test methods that have
been evaluated relative to each other in accordance with this practice.
5.5 The magnitude of a statistically detectable bias is directly related to the uncertainties of the statistics from the experimental
study. These uncertainties are related to both the size of the data set and the precision of the processes being studied. A large data
set, or, highly precise test method(s), or both, can reduce the uncertainties of experimental statistics to the point where the
“statistically detectable” bias can become “trivially small,” or be considered of no practical consequence in the intended use of the
test method under study. Therefore, before executing this practice, users are advised to determine the magnitude of bias correction
below which they would consider a correction unnecessary or of no practical concern for the intended application.
NOTE 6—It should be noted that the determination of this minimum bias of no practical concern is not a statistical decision, but rather, a subjective decision
that is directly dependent on the application requirements of the users.
6. Procedure
NOTE 7—For an in-depth statistical discussion of the methodology used in this section, see Appendix X1. For a worked example, see Appendix X2.
6.1 Calculate sample means and standard errors from Practice D6300 results.
6.1.1 The process of applying Practice D6300 to the data may involve elimination of some results as outliers, and it may also
involve applying a transformation to the data. For this practice, compute the mean results from data that have not been transformed,
but with outliers removed in accordance with Practice D6300. The precision estimates from Practice D6300 are used to estimate
the standard errors of these means.
6.1.2 Compute the means as follows:
6.1.2.1 Let $X_{ijk}$ represent the $k^{th}$ result on the $i^{th}$ common material by the $j^{th}$ lab in the round robin for method X. Similarly
for $Y_{ijk}$. (The $i^{th}$ material is the same for both round robins, but the $j^{th}$ lab in one round robin is not necessarily the same lab as the $j^{th}$
lab in the other round robin.) Let $n_{Xij}$ be the number of results on the $i^{th}$ material from the $j^{th}$ X-method lab, after removing outliers,
that is, the number of results in cell (i,j). Let $L_{Xi}$ be the number of laboratories in the X-method round robin that have at least one
result on the $i^{th}$ material remaining in the data set, after removal of outliers. Let S be the total number of materials common to both
round robins.
6.1.2.2 The mean X-method result for the $i^{th}$ material is:
$$X_i = \frac{1}{L_{Xi}} \sum_j \frac{\sum_k X_{ijk}}{n_{Xij}} \tag{2}$$
where $X_i$ is the average of the cell averages on the $i^{th}$ material by method X.
6.1.2.3 Similarly, the mean Y-method result for the $i^{th}$ material is:
$$Y_i = \frac{1}{L_{Yi}} \sum_j \frac{\sum_k Y_{ijk}}{n_{Yij}} \tag{3}$$
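The mean of cell averages in Eq 2 and Eq 3 differs from a pooled mean whenever laboratories report different numbers of results. A minimal Python sketch, assuming the results for one material are held as one list per laboratory with outliers already removed (the function name and data layout are illustrative):

```python
def mean_of_cell_averages(cells):
    """Average of per-laboratory (cell) averages for one material.

    cells: list of lists, one inner list of results per laboratory.
    Laboratories with no remaining results are skipped, so the divisor
    is the number of laboratories with at least one result.
    """
    cell_means = [sum(c) / len(c) for c in cells if c]
    return sum(cell_means) / len(cell_means)
```

For two laboratories reporting [1.0, 3.0] and [4.0], the cell averages are 2.0 and 4.0, so the material mean is 3.0, whereas pooling all three results would give about 2.67.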
6.1.3 The standard errors (standard deviations of the means of the results) are computed as follows:
6.1.3.1 If $s_{RXi}$ is the estimated reproducibility standard deviation from the X-method round robin, and $s_{rXi}$ is the estimated
repeatability standard deviation, then an estimate of the standard error for $X_i$ is given by:
$$s_{Xi} = \sqrt{\frac{1}{L_{Xi}}\left[s_{RXi}^2 - s_{rXi}^2\left(1 - \frac{1}{L_{Xi}}\sum_j \frac{1}{n_{Xij}}\right)\right]} \tag{4}$$
NOTE 8—Since repeatability and reproducibility may vary with X, even if the $L_{Xi}$ were the same for all materials and the $n_{Xij}$ were the same for all
laboratories and all materials, the {$s_{Xi}$} might still differ from one material to the next.
6.1.3.2 $s_{Yi}$, the estimated standard error for $Y_i$, is given by an analogous formula.
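The standard error calculation can be sketched as follows, reading Eq 4 as the square root of (1/L) times [s_R² − s_r² (1 − mean of 1/n over laboratories)]. The function name and argument layout are assumptions made for illustration.

```python
import math


def standard_error_of_mean(s_R: float, s_r: float, n_per_lab) -> float:
    """Standard error of a material mean per Eq 4.

    s_R: reproducibility standard deviation at the material mean.
    s_r: repeatability standard deviation at the material mean.
    n_per_lab: number of results from each laboratory (all >= 1).
    """
    L = len(n_per_lab)
    mean_inverse_n = sum(1.0 / n for n in n_per_lab) / L
    variance = (s_R**2 - s_r**2 * (1.0 - mean_inverse_n)) / L
    return math.sqrt(variance)
```

When every laboratory reports a single result, the repeatability term vanishes and the standard error reduces to s_R divided by the square root of L.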
6.2 Calculate the total variation sum of squares for each method, and determine whether the samples can be distinguished from
each other by both methods.
6.2.1 The total sums of squares (TSS) are given by:
$$TSS_X = \sum_i \left(\frac{X_i - \bar{X}}{s_{Xi}}\right)^2 \quad \text{and} \quad TSS_Y = \sum_i \left(\frac{Y_i - \bar{Y}}{s_{Yi}}\right)^2 \tag{5}$$
where:
$$\bar{X} = \frac{\sum_i X_i / s_{Xi}^2}{\sum_i 1 / s_{Xi}^2} \quad \text{and} \quad \bar{Y} = \frac{\sum_i Y_i / s_{Yi}^2}{\sum_i 1 / s_{Yi}^2} \tag{6}$$
are weighted averages of all $X_i$'s and $Y_i$'s respectively.
6.2.2 Compare $F = TSS_X/(S-1)$ to the 95th percentile of Fisher’s F distribution with (S-1) and $v_X$ degrees of freedom for the
numerator and denominator, respectively, where $v_X$ is the degrees of freedom for the reproducibility variance (Practice D6300,
paragraph 8.3.3.3) for the X-method round robin. If F does not exceed the 95th percentile, then the X-method is not sufficiently
precise to distinguish among the S samples. Do not proceed with this practice, as meaningful results cannot be produced.
6.2.3 In a similar manner, compare $F = TSS_Y/(S-1)$ to the 95th percentile of Fisher’s F distribution, using the degrees of freedom
of the reproducibility variance of the Y-method, $v_Y$, in place of $v_X$. Similarly, do not proceed with this practice if F does not exceed
the 95th percentile.
NOTE 9—If one or both of the conditions of 6.2.2 and 6.2.3 are satisfied only marginally, it is unlikely that this practice will produce a meaningful outcome.
The test in the next subsection will almost certainly fail.
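The checks in 6.2 can be sketched in Python as shown below. The function names are illustrative, and the 95th percentile of the F distribution is passed in as `f_critical` (an assumed parameter, to be taken from tables or statistical software) rather than computed.

```python
def weighted_mean(values, std_errs):
    """Eq 6: mean weighted by the inverse squared standard errors."""
    weights = [1.0 / s**2 for s in std_errs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def total_sum_of_squares(values, std_errs):
    """Eq 5: sum of squared standardized deviations from the weighted mean."""
    center = weighted_mean(values, std_errs)
    return sum(((v - center) / s) ** 2 for v, s in zip(values, std_errs))


def distinguishes_samples(values, std_errs, f_critical):
    """6.2.2 and 6.2.3: True if F = TSS/(S-1) exceeds the tabled percentile."""
    S = len(values)
    f_ratio = total_sum_of_squares(values, std_errs) / (S - 1)
    return f_ratio > f_critical
```

The same functions apply to either method: pass the X-method means and standard errors for 6.2.2, and the Y-method values for 6.2.3, each with its own critical value.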
6.3 Test whether the methods are sufficiently correlated.
6.3.1 Using the weights {$w_i$} as computed in 6.4.1.1, Eq 9, calculate the weighted correlation coefficient r:
$$r = \frac{\sum w_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum w_i (X_i - \bar{X})^2 \sum w_i (Y_i - \bar{Y})^2}} \tag{7}$$
where $\bar{X}$ and $\bar{Y}$ are $\sum w_i X_i / \sum w_i$ and $\sum w_i Y_i / \sum w_i$, respectively.
6.3.2 Use r to calculate the F-statistic:
$$F = \frac{(S-2)\,r^2}{1 - r^2} \tag{8}$$
6.3.3 Compare F to the 99th percentile of Fisher’s F distribution with 1 and S-2 degrees of freedom in the numerator and
denominator, respectively.
6.3.3.1 If F is less than the 99th percentile value, then this practice concludes that the methods are too discordant to permit use
of the results from one method to predict those of the other.
6.3.3.2 If F is greater than the tabled value, proceed to 6.5.
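A short Python sketch of the correlation test follows, reading Eq 7 as the usual weighted correlation coefficient and Eq 8 as F = (S − 2) r² / (1 − r²). The function names are hypothetical, and the comparison against the 99th percentile is left to the caller.

```python
import math


def weighted_correlation(x, y, w):
    """Eq 7: correlation of x and y using weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


def correlation_f_statistic(r, S):
    """Eq 8: F statistic with 1 and S-2 degrees of freedom (undefined at |r| = 1)."""
    return (S - 2) * r**2 / (1.0 - r**2)
```

For instance, r = 0.5 with S = 12 samples gives F of about 3.33, which would then be compared with the tabled 99th percentile for 1 and 10 degrees of freedom.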
6.4 Calculate the centered sum of squares (CSS) statistic for each of the following classes of bias-correction methodology.
NOTE 10—The revised algorithms presented in this version of D6708 were developed in order to correct very rare cases in which the algorithms of
previous versions do not converge to the optimal linear models. The rare cases generally involved data sets with poor correlations between the two
methods. In the vast majority of data sets, including the worked example of this practice, the old and the new algorithms converge to exactly the same optimal
models. Continuing to use the old algorithms is a reasonable option provided the user verifies that the computed value of CSS1b is never larger than CSS0,
and that the computed value of CSS2 is never larger than either CSS1a or CSS1b. If the aforementioned situation is detected using the old algorithms,
then the outcome from this version is deemed to be the correct outcome.
6.4.1 Class 0—No bias correction.
6.4.1.1 Compute the weights ($w_i$) for each sample i:
$$w_i = \frac{1}{s_{Yi}^2 + s_{Xi}^2} \tag{9}$$
6.4.1.2 Compute CSS:
$$CSS_0 = \sum_i w_i (X_i - Y_i)^2 \tag{10}$$
6.4.2 Class 1a—Constant bias correction.
6.4.2.1 Using the weights ($w_i$) from 6.4.1.1, compute the constant bias correction (a):
$$a = \frac{\sum_i w_i (Y_i - X_i)}{\sum_i w_i} = \frac{\sum_i w_i Y_i}{\sum_i w_i} - \frac{\sum_i w_i X_i}{\sum_i w_i} \tag{11}$$
6.4.2.2 Compute CSS:
$$CSS_{1a} = \sum_i w_i \left(Y_i - (X_i + a)\right)^2 \tag{12}$$
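The Class 0 and Class 1a calculations are direct weighted sums. A minimal sketch, with function names chosen for illustration and the sample means and standard errors supplied as plain Python lists:

```python
def class0_weights(s_x, s_y):
    """Eq 9: w_i = 1 / (s_Yi^2 + s_Xi^2)."""
    return [1.0 / (sy**2 + sx**2) for sx, sy in zip(s_x, s_y)]


def css_no_correction(x, y, w):
    """Eq 10: weighted sum of squared differences without correction."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y))


def constant_correction(x, y, w):
    """Eq 11 and Eq 12: return (a, CSS_1a) for the constant correction."""
    a = sum(wi * (yi - xi) for wi, xi, yi in zip(w, x, y)) / sum(w)
    css_1a = sum(wi * (yi - (xi + a)) ** 2 for wi, xi, yi in zip(w, x, y))
    return a, css_1a
```

If every Y mean exceeds its X mean by exactly 1.0, the constant correction is a = 1.0 and the corrected sum of squares drops to zero.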
6.4.3 Class 1b—Proportional bias correction.
6.4.3.1 The computations of this subsection (6.4.3) are appropriate only if both of the following conditions apply: (1) the measured
property assumes only non-negative values, and (2) a property value of zero has a physical significance (for example,
concentrations of specific constituents). In addition, it is not mandatory but highly recommended that max($Y_i$) ≥ 2 min($Y_i$).
6.4.3.2 The computations involve iterative calculation of the weights {$w_i$} and the proportional correction b.
6.4.3.3 Set b = 1.
6.4.3.4 Compute the weight $w_i$ for each sample i:
$$w_i = \frac{1}{s_{Yi}^2 + b^2 s_{Xi}^2} \tag{13}$$
6.4.3.5 Calculate the following three sums:
$$A = \sum w_i^2 X_i Y_i s_{Xi}^2 \tag{14}$$
$$B = \sum w_i^2 \left(X_i^2 s_{Yi}^2 - Y_i^2 s_{Xi}^2\right) \tag{15}$$
$$C = -\sum w_i^2 X_i Y_i s_{Yi}^2 \tag{16}$$
6.4.3.6 Calculate $b_0$:
$$b_0 = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{17}$$
6.4.3.7 If $|b - b_0| > .001\,b$, replace b with $b_0$ and go back to 6.4.3.4. Otherwise, the iteration can be stopped, as further iteration
will not produce meaningful improvement. Replace b with $b_0$ and go on to 6.4.3.8.
6.4.3.8 Calculate the final weights {$w_i$} as in 6.4.3.4.
6.4.3.9 Calculate $CSS_{1b}$:
$$CSS_{1b} = \sum w_i (Y_i - bX_i)^2 \tag{18}$$
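The iteration in 6.4.3.3 through 6.4.3.9 can be sketched as a short loop. The function name is hypothetical, and `max_iter` is a safety cap added for this sketch that is not part of the practice; the stopping rule uses the absolute value of b, which matches the stated rule whenever b is positive.

```python
import math


def proportional_correction(x, y, s_x, s_y, rel_tol=0.001, max_iter=100):
    """Return (b, CSS_1b) for the proportional correction Y-hat = b*X."""
    b = 1.0  # 6.4.3.3
    for _ in range(max_iter):
        # Eq 13: weights for the current value of b
        w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(s_x, s_y)]
        # Eq 14 to Eq 16: coefficients of the quadratic in b
        A = sum(wi**2 * xi * yi * sx**2 for wi, xi, yi, sx in zip(w, x, y, s_x))
        B = sum(wi**2 * (xi**2 * sy**2 - yi**2 * sx**2)
                for wi, xi, yi, sx, sy in zip(w, x, y, s_x, s_y))
        C = -sum(wi**2 * xi * yi * sy**2 for wi, xi, yi, sy in zip(w, x, y, s_y))
        b0 = (-B + math.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)  # Eq 17
        converged = abs(b - b0) <= rel_tol * abs(b)  # 6.4.3.7
        b = b0
        if converged:
            break
    # 6.4.3.8: final weights, then Eq 18
    w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(s_x, s_y)]
    css_1b = sum(wi * (yi - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    return b, css_1b
```

On synthetic data where each Y mean is exactly twice the X mean, the loop settles on b = 2 with a sum of squares of essentially zero.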
6.4.4 Class 2—Linear (proportional + constant) bias correction.
6.4.4.1 This involves iterative calculation of the weights {$w_i$}, the weighted means of $X_i$’s and $Y_i$’s, and the proportional term b.
6.4.4.2 Set b = 1.
6.4.4.3 Compute the weight $w_i$ for each sample i:
$$w_i = \frac{1}{s_{Yi}^2 + b^2 s_{Xi}^2} \tag{19}$$
6.4.4.4 Calculate the weighted means of {$X_i$} and {$Y_i$} respectively:
$$\bar{Y} = \frac{\sum w_i Y_i}{\sum w_i}, \qquad \bar{X} = \frac{\sum w_i X_i}{\sum w_i} \tag{20}$$
6.4.4.5 Calculate the deviations from the weighted means:
$$x_i = X_i - \bar{X}, \qquad y_i = Y_i - \bar{Y} \tag{21}$$
6.4.4.6 Calculate the three sums:
$$A = \sum w_i^2 x_i y_i s_{Xi}^2 \tag{22}$$
$$B = \sum w_i^2 \left(x_i^2 s_{Yi}^2 - y_i^2 s_{Xi}^2\right) \tag{23}$$
$$C = -\sum w_i^2 x_i y_i s_{Yi}^2 \tag{24}$$
6.4.4.7 Calculate $b_0$:
$$b_0 = \frac{-B + \sqrt{B^2 - 4AC}}{2A} \tag{25}$$
6.4.4.8 If $|b - b_0| > .001\,b$, replace b with $b_0$ and go back to 6.4.4.3, computing new values for the weights {$w_i$}, $\bar{X}$, $\bar{Y}$, {$x_i$}, {$y_i$},
and $b_0$. Otherwise, the iteration can be stopped, as further iteration will not produce meaningful improvement. Replace b with $b_0$
and go to 6.4.4.9.
6.4.4.9 Calculate the final weights {$w_i$} as in 6.4.4.3.
6.4.4.10 Calculate $CSS_2$ and a:
$$CSS_2 = \sum w_i (y_i - bx_i)^2 \tag{26}$$
$$a = \bar{Y} - b\bar{X} \tag{27}$$
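The Class 2 iteration follows the same pattern as Class 1b, applied to deviations from the weighted means. In the sketch below the function names are hypothetical and `max_iter` is a safety cap that is not part of the practice. Recomputing the weighted means and deviations from the final weights before evaluating CSS and a is an interpretation made here, since 6.4.4.9 names only the weights.

```python
import math


def _centered(x, y, s_x, s_y, b):
    """Weights (Eq 19), weighted means (Eq 20) and deviations (Eq 21) for a given b."""
    w = [1.0 / (sy**2 + b**2 * sx**2) for sx, sy in zip(s_x, s_y)]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    dx = [xi - x_bar for xi in x]
    dy = [yi - y_bar for yi in y]
    return w, x_bar, y_bar, dx, dy


def linear_correction(x, y, s_x, s_y, rel_tol=0.001, max_iter=100):
    """Return (a, b, CSS_2) for the linear correction Y-hat = a + b*X."""
    b = 1.0  # 6.4.4.2
    for _ in range(max_iter):
        w, x_bar, y_bar, dx, dy = _centered(x, y, s_x, s_y, b)
        # Eq 22 to Eq 24
        A = sum(wi**2 * u * v * sx**2 for wi, u, v, sx in zip(w, dx, dy, s_x))
        B = sum(wi**2 * (u**2 * sy**2 - v**2 * sx**2)
                for wi, u, v, sx, sy in zip(w, dx, dy, s_x, s_y))
        C = -sum(wi**2 * u * v * sy**2 for wi, u, v, sy in zip(w, dx, dy, s_y))
        b0 = (-B + math.sqrt(B**2 - 4.0 * A * C)) / (2.0 * A)  # Eq 25
        converged = abs(b - b0) <= rel_tol * abs(b)  # 6.4.4.8
        b = b0
        if converged:
            break
    # 6.4.4.9 and 6.4.4.10, using quantities recomputed for the final b
    w, x_bar, y_bar, dx, dy = _centered(x, y, s_x, s_y, b)
    css_2 = sum(wi * (v - b * u) ** 2 for wi, u, v in zip(w, dx, dy))  # Eq 26
    a = y_bar - b * x_bar  # Eq 27
    return a, b, css_2
```

On synthetic means generated from Y = 1 + 2X with equal standard errors, the sketch returns a = 1 and b = 2 with a sum of squares of essentially zero.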
6.5 Conduct tests to select the most parsimonious bias correction class needed.
6.5.1 The centered sums of squares for differences from each class of bias correction are used to select the most parsimonious bias
correction class that can improve the expected degree of agreement between the Yˆ (the predicted Y-method result using X-method
result) and the actual Y-method result on the same material. The classes of bias correction and the associated CSS as calculated
earlier are repeated in the following table.
Bias Correction Class                                        CSS
Class 0–no correction                                        $CSS_0$
Class 1a–constant bias correction                            $CSS_{1a}$
Class 1b–proportional bias correction (when appropriate)     $CSS_{1b}$
Class 2–linear (proportional + constant bias correction)     $CSS_2$
6.5.2 To determine whether any bias correction (Class 1a, 1b, or 2 above) can significantly improve the expected agreement
between the two methods, calculate the following ratio:
$$F = \frac{(CSS_0 - CSS_2)/2}{CSS_2/(S-2)} \tag{28}$$
6.5.2.1 Compare F to the upper 95th percentile of the F distribution with 2 and S-2 degrees of freedom for the numerator and
denominator, respectively.
6.5.2.2 If the calculated F is smaller, conclude that a bias correction of Class 1a, 1b, or 2 does not sufficiently improve the
expected agreement between the two methods, relative to Class 0 (no bias correction). Proceed to 6.6.
6.5.2.3 If the calculated F is larger, conclude that a correction can improve the expected agreement between the two methods, and
continue in 6.5.3.
6.5.3 If the F-value calculated in 6.5.2 is larger than the 95th percentile of F, compute the following t-ratios:
$$t_1 = \sqrt{\frac{CSS_0 - CSS_1}{CSS_2/(S-2)}} \tag{29}$$
$$t_2 = \sqrt{\frac{CSS_1 - CSS_2}{CSS_2/(S-2)}}$$
where $CSS_1$ is the lesser of $CSS_{1a}$ or $CSS_{1b}$, provided the latter is appropriate and has been calculated.
6.5.3.1 Compare $t_2$ to the upper 97.5th percentile of the t distribution with S-2 degrees of freedom.
6.5.3.2 If $t_2$ is larger, conclude that a bias correction of Class 2 (proportional + constant correction) can improve the expected
agreement over that of a single term (constant or proportional) correction alone (Class 1). Proceed to 6.6.
6.5.3.3 If $t_2$ is smaller than the t-percentile, compare $t_1$ to the same upper 97.5th percentile of the t distribution with (S-2) degrees
of freedom.
6.5.3.4 If $t_1$ is larger, conclude that a single term bias correction of Class 1 is preferred to a bias correction of Class 2. Use the
constant correction unless $CSS_{1b}$ is appropriate and is smaller than $CSS_{1a}$. Proceed to 6.6.
6.5.3.5 If $t_1$ is smaller, then neither $t_1$ nor $t_2$ is statistically significant. A bias correction of Class 2 is preferred over single-term
(constant or proportional) correction of Class 1.
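The decision sequence of 6.5.2 and 6.5.3 can be written as a small function. This is a sketch under stated assumptions: the function name and return labels are hypothetical, the tabled percentiles are supplied by the caller as `f_crit` and `t_crit`, ties are treated as "smaller", and the clamp to zero inside the square roots is an addition to guard against rounding.

```python
import math


def select_bias_class(css_0, css_1a, css_2, S, f_crit, t_crit, css_1b=None):
    """Return '0', '1a', '1b' or '2' following 6.5.2 and 6.5.3.

    f_crit: upper 95th percentile of F with 2 and S-2 degrees of freedom.
    t_crit: upper 97.5th percentile of t with S-2 degrees of freedom.
    css_1b: None when the proportional correction is not appropriate.
    """
    mse = css_2 / (S - 2)
    f_ratio = ((css_0 - css_2) / 2.0) / mse  # Eq 28
    if f_ratio <= f_crit:
        return "0"  # 6.5.2.2: no correction is warranted
    use_1b = css_1b is not None and css_1b < css_1a
    css_1 = css_1b if use_1b else css_1a
    t_1 = math.sqrt(max(css_0 - css_1, 0.0) / mse)  # Eq 29
    t_2 = math.sqrt(max(css_1 - css_2, 0.0) / mse)
    if t_2 > t_crit:
        return "2"  # 6.5.3.2: the second term adds a significant improvement
    if t_1 > t_crit:
        return "1b" if use_1b else "1a"  # 6.5.3.4: single-term correction
    return "2"  # 6.5.3.5: neither ratio significant alone
```

As a usage illustration with S = 12, the tabled values are approximately 4.10 for F (2 and 10 degrees of freedom) and 2.228 for t (10 degrees of freedom); with CSS values of 100, 20 and 18 for Classes 0, 1a and 2, the function selects the constant correction.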
6.6 Test for existence of sample-specific biases.
6.6.1 Compare the CSS of the bias-correction class selected in 6.5 to the 95th percentile value of a chi-square distribution with v
degrees of freedom.
where:
v = S for Class 0 (no bias) correction,
v = S − 1 for Class 1a or Class 1b (constant or proportional) correction, and
v = S − 2 for Class 2 (linear) correction.
6.6.2 If the CSS is smaller than the chi-square percentile, it is reasonable to conclude that there are no sample-specific biases, that
is, that there are no other sources of variation that are statistically observable above the measurement error. Perform the
Anderson-Darling (A-D) assessment on the residuals as per 6.7.2.2 and 6.7.2.3. If the outcome is not significant at the 5 % level,
calculate the between methods reproducibility ($R_{XY}$) as per Eq 30 below. If the A-D assessment is significant, application of the
practice is considered terminated with failure at this point, as the statistical evidence suggests that a single between methods
reproducibility ($R_{XY}$) cannot be found that is applicable to all materials covered by the intersecting scope of both test methods.
It is reasonable to conclude that, at least for some materials, the test methods are not measuring the same property.
$$R_{XY} = \sqrt{\frac{R_Y^2 + b^2 R_X^2}{2}} \tag{30}$$
where:
b = the coefficient of the appropriate bias correction. (For Class 0 and Class 1a bias corrections, b=1.)
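The degrees of freedom in 6.6.1 and the reproducibility estimate can be sketched as below. Eq 30 is taken here in its root-mean-square form, the square root of (R_Y² + b² R_X²)/2, which is how 6.7.1 describes it; the function names and class labels are illustrative.

```python
import math


def chi_square_dof(bias_class: str, S: int) -> int:
    """Degrees of freedom for the chi-square comparison in 6.6.1."""
    if bias_class == "0":
        return S
    if bias_class in ("1a", "1b"):
        return S - 1
    if bias_class == "2":
        return S - 2
    raise ValueError("bias_class must be '0', '1a', '1b' or '2'")


def between_methods_reproducibility(r_x: float, r_y: float, b: float = 1.0) -> float:
    """Eq 30 (root-mean-square form): sqrt((R_Y^2 + b^2 * R_X^2) / 2)."""
    return math.sqrt((r_y**2 + b**2 * r_x**2) / 2.0)
```

When both methods have the same reproducibility and b = 1, the between methods reproducibility equals that common value.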
6.6.3 If the CSS is larger than the chi-square percentile (see 6.6.1), there is strong evidence that biases between the methods have
not been adequately corrected by the bias-corrections of 6.4. In other words, the relative biases are not consistent across the S
common samples of the round robins. The user may wish to investigate whether the biases can be attributed to other observable
properties of the samples, or may wish to restrict attention to a smaller class of materials for the purpose of establishing
a between methods reproducibility. Such investigations are beyond the scope of this practice, as the issues typically are not
statistical in nature. This practice does recommend investigating whether it is reasonable to treat the sample-specific biases as
random effects, as described in 6.7.
6.7 Treatment of Sample-Specific Relative Bias as a Variance Component:
6.7.1 If the CSS exceeds the 95th percentile value of the appropriate chi-square distribution (see 6.6.1), there is strong evidence
that sources other than measurement error are contributing towards the variation of the expected agreement between the two
methods. In this practice, these sources are attributed to sample-specific effects (also known as matrix effects or method-material
interactions). In some cases these sample-specific effects can be treated as random effects, and hence can be incorporated as an
additional source of variation into a between methods reproducibility as described in this section. Note that, even when it is
appropriate to treat these sample-specific effects as random, the additional variation may cause the between methods
reproducibility to be far larger than the root mean square of the reproducibilities of the methods (Eq 30).
6.7.2 Examine residuals to assess reasonableness of random effect assumption.
6.7.2.1 Assess the reasonableness of the assumption that the sample-specific biases can be treated as random effects by
examination of the distribution of the residuals. While there are numerous statistical tools available to perform this assessment, this
practice recommends use of the Anderson-Darling normality test, based on its simplicity and ease of use. It is not the intent of this
practice to exclude other tools for this purpose.
6.7.2.2 Let {Ŷ_i} be the Y-method values predicted from the corresponding X-method mean values {X_i}, using the bias-correction
selected in 6.5. The (standardized) residuals {ε_i} are given by:
$$\varepsilon_i = \sqrt{w_i}\,\left(Y_i - \hat{Y}_i\right) \tag{31}$$
where:
{w_i} = the appropriate weights from 6.4.1 – 6.4.4.
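The residual calculation of 6.7.2.2 can be sketched in Python as shown below. The sketch assumes Eq 31 is read as ε_i = sqrt(w_i)·(Y_i − Ŷ_i). The function name and the example numbers are hypothetical, and the weights are taken as already computed per 6.4.

```python
import math


def standardized_residuals(y_means, y_predicted, weights):
    """Return eps_i = sqrt(w_i) * (Y_i - Yhat_i) for each common sample."""
    if not (len(y_means) == len(y_predicted) == len(weights)):
        raise ValueError("all inputs must have one entry per sample")
    return [
        math.sqrt(w) * (y - y_hat)
        for y, y_hat, w in zip(y_means, y_predicted, weights)
    ]


# Hypothetical values for two samples.
print(standardized_residuals([10.0, 12.0], [9.0, 12.5], [4.0, 16.0]))
```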
6.7.2.3 Calculate the Anderson-Darling (AD) statistic on the residuals {ε_i}. (Refer to Practice D6299 for guidance on calculation
and interpretation of this statistic.)
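For orientation only, the following Python sketch computes a generic Anderson-Darling normality statistic with the mean and standard deviation estimated from the data, together with the commonly used small-sample adjustment factor (1 + 0.75/n + 2.25/n²). It is an assumption that this textbook form is suitable here. Practice D6299 remains the reference for the calculation and interpretation required by this practice, and the function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(values):
    """Return (A2, A2_adjusted) for a test of normality.

    Generic textbook form, not necessarily identical to Practice D6299.
    """
    n = len(values)
    m, s = mean(values), stdev(values)
    z = sorted((v - m) / s for v in values)
    phi = NormalDist().cdf
    total = 0.0
    for i in range(1, n + 1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (
            math.log(phi(z[i - 1])) + math.log(1.0 - phi(z[n - i]))
        )
    a2 = -n - total / n
    a2_adjusted = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)
    return a2, a2_adjusted
```

With this generic form, the adjusted statistic is often compared with a tabulated 5 % critical value of about 0.752. Users should confirm the applicable critical value against Practice D6299 before drawing conclusions under 6.7.2.4 or 6.7.2.5.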
6.7.2.4 If the AD statistic is not significant at the 5 % significance level, conclude that the sample-specific relative bias may be
treated as a variance component. Proceed to 6.7.3.
6.7.2.5 If the AD statistic is significant, there is strong evidence that the sample-specific effects cannot be treated as random effects.
Application of this practice is considered terminated at this point, as the statistical evidence suggests that a single between methods
reproducibility (R_XY) cannot be found that is applicable to all materials covered by the intersecting scope of both test methods. It
is reasonable to conclude that, at least for some materials, the test methods are not measuring the same property. Do NOT proceed
to 6.7.3.
NOTE 11—It is possible that, by restricting the comparison to a narrower class of materials, a between methods reproducibility can be obtained (for that
narrower class) that does not have sample-specific biases, or, has sample-specific biases that can be treated as a random effect. However, individual outlier
materials should not be excluded from this study, after-the-fact, based on the statistics only, without other evidence that they clearly belong to a separate
and identifiable class.
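6.7.3 Calculate the between methods reproducibility (R_XY) as follows:
$$R_{XY} = \sqrt{\left(\frac{b^2 R_X^2}{2} + \frac{R_Y^2}{2}\right)\left(1 + \frac{2\,(1.96)^2\,(CSS - S + k)\,S}{(S - k)\,\displaystyle\sum_{i}\frac{b^2 R_{Xi}^2 + R_{Yi}^2}{b^2 s_{Xi}^2 + s_{Yi}^2}}\right)} \tag{32}$$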
where b and CSS are appropriate to the selected bias-correction, and k is 0 if the bias-correction is Class 0; k is 1 if the bias
correction is Class 1a or Class 1b; or k is 2 if the bias-correction is Class 2.
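The calculation of 6.7.3 can be sketched in Python as follows. The sketch assumes that Eq 32 has the form R_XY = sqrt[ (b²R_X²/2 + R_Y²/2) · (1 + 2(1.96)²(CSS − S + k)S / ((S − k) Σ_i (b²R_Xi² + R_Yi²)/(b²s_Xi² + s_Yi²))) ], which is a reading of the printed equation and should be verified against the official text before use. Under this reading the expression reduces to Eq 30 when CSS equals S − k. The function name, argument names, and example numbers are hypothetical.

```python
import math


def r_xy_eq32(r_x, r_y, b, css, k, r_xi, r_yi, s_xi, s_yi):
    """Between methods reproducibility with sample-specific bias as a variance component.

    r_x, r_y   : reproducibilities at the level of interest
    r_xi, r_yi : per-sample reproducibilities (one entry per common sample)
    s_xi, s_yi : per-sample standard errors used for the weights of 6.4
    k          : 0 for Class 0, 1 for Class 1a or 1b, 2 for Class 2
    """
    s = len(r_xi)
    ratio_sum = sum(
        (b ** 2 * rx ** 2 + ry ** 2) / (b ** 2 * sx ** 2 + sy ** 2)
        for rx, ry, sx, sy in zip(r_xi, r_yi, s_xi, s_yi)
    )
    inflation = 1.0 + 2.0 * 1.96 ** 2 * (css - s + k) * s / ((s - k) * ratio_sum)
    return math.sqrt((b ** 2 * r_x ** 2 / 2.0 + r_y ** 2 / 2.0) * inflation)


# Hypothetical inputs: four samples, Class 0 correction (b = 1, k = 0).
print(r_xy_eq32(3.0, 4.0, 1.0, css=6.0, k=0,
                r_xi=[3.0] * 4, r_yi=[4.0] * 4,
                s_xi=[1.0] * 4, s_yi=[1.0] * 4))
```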
NOTE 12—Eq 32 provides an estimate of the magnitude below which about 95 % of the differences are expected to fall, when one party uses the
bias-corrected X-method while another party uses the Y-method, on materials similar to the round robin samples. Application of the methods to materials
which are substantially different from these round robin materials may affect both the average bias and the variance of the random component.
Laboratories which engage in routine substitution of one method for another are advised to periodically monitor the deviations between methods, as a
regular part of their quality assurance program.
6.8 Construction of an interval, using a single bias-corrected result from method X and R_XY, that may contain, about 95 % of the
time, a single result from method Y, if the latter is conducted on the same sample.
6.8.1 Let Ŷ be a single bias-corrected X-method result. An interval bounded by Ŷ ± R_XY can be expected to contain a single
corresponding Y-method result, obtained on the identical material, about 95 % of the time. Here R_XY is computed from Eq 30
or Eq 32, as appropriate, with R_Y evaluated at Y = Ŷ.
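The interval of 6.8.1 can be illustrated with a short Python sketch. The function names and numbers are hypothetical. The value of R_XY is assumed to have been computed beforehand from Eq 30 or Eq 32, with R_Y evaluated at Y = Ŷ.

```python
def interval_for_y(y_hat, r_xy):
    """Return the (lower, upper) bounds of the interval Y-hat +/- R_XY."""
    return (y_hat - r_xy, y_hat + r_xy)


def y_result_is_consistent(y_result, y_hat, r_xy):
    """True when a single Y-method result falls inside Y-hat +/- R_XY."""
    lower, upper = interval_for_y(y_hat, r_xy)
    return lower <= y_result <= upper


# Hypothetical bias-corrected X-method result and between methods reproducibility.
print(interval_for_y(50.0, 2.5))
print(y_result_is_consistent(53.1, 50.0, 2.5))
```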
7. Report
7.1 Upon completion of the calculations, it is recommended that the assessment findings be reported in the Precision and Bias
section of the appropriate test method(s). In order for the assessment to be claimed to be compliant with this practice, the outcome,
whether it is a success or a failure, shall be reported. For a successful outcome, it is mandatory to report the bias correction equation,
applicable test result ranges for the equation, and between-method reproducibility (R_XY). In the event that one of the test methods
assessed is cited as a referee test method, with the other test method being an alternative, this practice recommends the following
naming convention, indicating the publication year for method D YYYY by the addition of the suffix “-yy”, and the publication year
for method D XXXX by the addition of the suffix “-xx”:
Referee Test Method designation: Test Method D YYYY-yy
Alternative Test Method designation: Test Method D XXXX-xx
7.2 The reporting format and information in this section (7.2) can be followed at the discretion of the user. The phrase “List sample
types and property ranges” in this section refers to an overview summary of sample types used to conduct the study. Due to the random
nature of sample-specific biases, users are not required (nor is it always possible) to explain these biases by listing detailed
characterizations of each of the samples. Report assessment findings in the Precision and Bias section of the appropriate test
method, under a subsection titled “Between-Method Bias,” as follows:
TABLE 1 Summary of Findings (see Footnote A)

Column questions:
A: Is there adequate variation in the property level of the sample set relative to Test Method XXXX and Test Method YYYY reproducibilities?
B: Is there adequate correlation between the test results from Test Method XXXX and Test Method YYYY?
C: Will a scaling/bias correction significantly improve the agreement between the results from Test Method XXXX and Test Method YYYY over and above their combined reproducibilities?
D1: Are there sample-specific biases?
D2: If yes to (D1), can these biases be treated as a random effect?
D3: If no to (D1), are the residuals randomly scattered?

| A   | B   | C   | D1  | D2  | D3  | Assessment Outcome |
|-----|-----|-----|-----|-----|-----|--------------------|
| Yes | Yes | No  | No  | N/A | Yes | Pass (A1)          |
| Yes | Yes | No  | No  | N/A | No  | Fail (B4)          |
| Yes | Yes | No  | Yes | Yes | N/A | Pass (A2)          |
| Yes | Yes | No  | Yes | No  | N/A | Fail (B3)          |
| Yes | Yes | Yes | No  | N/A | Yes | Pass (A3)          |
| Yes | Yes | Yes | No  | N/A | No  | Fail (B4)          |
| Yes | Yes | Yes | Yes | Yes | N/A | Pass (A4)          |
| Yes | Yes | Yes | Yes | No  | N/A | Fail (B3)          |
| Yes | No  | N/A | N/A | N/A | N/A | Fail (B2)          |
| No  | N/A | N/A | N/A | N/A | N/A | Fail (B1)          |

Footnote A: Boldfaced type indicates reason for failure.
Degree of Agreement between results by Test Method D XXXX and
Test Method D YYYY-yy—Results on the same materials produced
by Test Method D XXXX and Test Method D YYYY-yy have been
assessed in accordance with procedures outlined in Practice D6708.
The findings are: (report the findings here).
7.2.1 To choose the appropriate findings, see Table 1. (A) represents passing, and (B) represents failure. Choose one of the
following findings (A1, A2, A3, A4, B1, B2, B3, or B4).
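The selection logic of Table 1 can be expressed as a small lookup, sketched below in Python. The function and argument names are hypothetical. The arguments correspond to the answers to questions A, B, C, D1, D2, and D3 of Table 1, with answers that are not applicable left as None.

```python
def table1_finding(a, b, c, d1=None, d2=None, d3=None):
    """Map Yes/No answers (True/False) for Table 1 columns to a finding code."""
    if not a:
        return "B1"  # inadequate variation in property level
    if not b:
        return "B2"  # inadequate correlation between methods
    if d1:
        # Sample-specific biases exist: can they be treated as random effects?
        if d2:
            return "A4" if c else "A2"
        return "B3"
    # No sample-specific biases: are the residuals randomly scattered?
    if d3:
        return "A3" if c else "A1"
    return "B4"


# Example: adequate variation and correlation, a correction helps,
# sample-specific biases exist and can be treated as random.
print(table1_finding(True, True, True, d1=True, d2=True))
```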
7.2.1.1 If the finding is A1, and R_X, estimated with at least 30 degrees of freedom, is less than or equal to 1.2 times the published R_Y, report
the following for the property range where R_X satisfies the aforementioned requirement.
No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX
and Test Method D YYYY-yy for the materials studied (reference
Research Report ZZZZ). For applications where Test Method X is
used as an alternative to Test Method Y, results from Test Method
D XXXX and Test Method D YYYY-yy may be considered to be
statistically indistinguishable, for sample types and property ranges
listed below. No sample-specific bias, as defined in Practice D6708,
was observed for the materials studied.
Sample types and property range where results from method
D XXXX and D YYYY-yy may be considered to be statistically
indistinguishable are: (list applicable sample types and property ranges
here).
7.2.1.2 If the finding is A1, for property range where R_X does not meet the requirement listed above, report the following:
No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
Method D YYYY-yy for the materials studied (reference Research
Report ZZZZ). No sample-specific bias, as defined in Practice
D6708, was observed for the materials and property range listed
below. (List sample types and property ranges for above findings
here.)
Differences between results from Test Method D XXXX and Test
Method D YYYY-yy, for the sample types and property ranges
studied, are expected to exceed the following between methods
reproducibility (R_XY), as defined in Practice D6708, about 5 % of the
time. (Report the between methods reproducibility here.)
7.2.1.3 If the finding is A2, report the following:
No bias-correction considered in Practice D6708 can further improve
the agreement between results from Test Method D XXXX and Test
Method D YYYY-yy for the material types and property range listed
below (reference Research Report ZZZZ). Sample-specific bias, as
defined in Practice D6708, was observed for some samples. (List
sample types and property ranges for above findings here.)
Differences between results from Test Method D XXXX and Test
Method D YYYY-yy, for the sample types and property ranges studied,
are expected to exceed the following between methods reproducibility
(R_XY), as defined in Practice D6708, about 5 % of the time. (Report the
between methods reproducibility here.)
As a consequence of sample-specific biases, R_XY may exceed the
reproducibility for Test Method D XXXX (R_X), or reproducibility for
Test Method D YYYY-yy (R_Y), or both. Users intending to use Test
Method D XXXX as a predictor of Test Method D YYYY-yy, or vice
versa, are advised to assess the required degree of prediction
agreement relative to the estimated R_XY to determine the
fitness-for-use of the prediction.
7.2.1.4 If the finding is A3, and R_X, estimated with at least 30 degrees of freedom, is less than or equal to 1.2 times the published R_Y, report
the following for the property range where R_X satisfies the afor
...
