ASTM D7915-22
Standard Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set
SIGNIFICANCE AND USE
3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.
3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.
3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.
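The masking effect described in 3.2 can be illustrated numerically. The sketch below is not the GESD procedure itself; it only shows, on hypothetical readings, how a second extreme value inflates the mean and standard deviation so that the largest studentized deviate in the data set looks less extreme. The function name, the data, and the use of the sample standard deviation (n - 1 divisor) are assumptions made for illustration.

```python
from statistics import mean, stdev


def max_studentized_deviate(data):
    """Return the largest |x - mean| / s over the data set."""
    centre = mean(data)
    spread = stdev(data)  # assumption: sample standard deviation (n - 1 divisor)
    return max(abs(x - centre) / spread for x in data)


# Hypothetical readings: eight consistent values plus suspect high values.
base = [9.8, 9.9, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2]
one_suspect = base + [15.0]
two_suspects = base + [15.0, 15.2]

t_one = max_studentized_deviate(one_suspect)
t_two = max_studentized_deviate(two_suspects)

print(f"one suspect value:  max T = {t_one:.3f}")
print(f"two suspect values: max T = {t_two:.3f}")
```

With two suspect values the largest deviate is smaller than with one, which is why a procedure that tests only the single most extreme observation can fail to flag either of them.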
SCOPE
1.1 This practice provides a step-by-step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)
1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.
1.3 This practice is applicable to a data set comprising a minimum of six observations.
1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.
1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.
1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.8 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Standards Content (Sample)
Designation: D7915 − 22
An American National Standard
Standard Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹
This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
2. Terminology
2.1 Definitions of Terms Specific to This Standard:
2.1.1 outlier, n—an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.
4. Procedure
4.1 Specify the maximum number of outliers (r) in a data set to be identified. This is the number of cycles required to be executed (see 4.2) for the identification of up to r outliers.
4.1.1 The recommended maximum number of outliers (r) by this practice is two (2) for data sets with six to twelve observations.
4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten (10) or 20 %.
4.1.3 The recommended values for r in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.
4.2 Set the current cycle number c to 1 (c = 1).
4.2.1 Assign the original data set to be assessed (in 4.1) as the data set for the current cycle 1 and label it as $DTS_1$.
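The recommendations for r in 4.1.1 and 4.1.2 can be sketched as a small helper. This is an illustrative reading only: the function name is hypothetical, "20 %" is assumed to mean 20 % of the number of observations, and rounding down to a whole number is an assumption that the text does not state. Per 4.1.3 the result is a default, not a requirement.

```python
def recommended_max_outliers(n_observations: int) -> int:
    """Suggest a maximum number of outliers (r) for a data set of a given size.

    Assumptions (not stated in the practice): "20 %" refers to 20 % of the
    number of observations, and fractional results are rounded down.
    """
    if n_observations < 6:
        # 1.3: the practice applies to a minimum of six observations.
        raise ValueError("at least six observations are required")
    if n_observations <= 12:
        # 4.1.1: six to twelve observations.
        return 2
    # 4.1.2: more than twelve observations, lesser of ten or 20 %.
    return min(10, n_observations // 5)


for n in (6, 12, 13, 30, 100):
    print(n, recommended_max_outliers(n))
```

Integer division by five is used instead of multiplying by 0.2 so that the result does not depend on floating-point rounding.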
4.3 Compute test statistic T for each observation in the data set assigned to the current cycle ($DTS_c$) as follows:
$$T = \frac{|x - \bar{x}|}{s} \qquad (1)$$
where:
$x$ = an observation in the data set,
$\bar{x}$ = average calculated using all observations in the data set,
($DTS_c$) and all observations associated with maximum T from DTS to DTS a
...
¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved May 1, 2022. Published May 2022. Originally approved in 1988. Last previous edition approved in 2018 as D7915–18. DOI: 10.1520/D7915-22.
*A Summary of Changes section appears at the end of this standard.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
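The computation in 4.3, Eq 1, for a single cycle can be sketched as follows. The definition of s is not included in this excerpt; the sketch assumes it is the sample standard deviation (n - 1 divisor) of all observations in the current data set. The function name and the data are hypothetical, and the steps that follow 4.3, including the decision criteria, are not shown because they are not part of the excerpt.

```python
from statistics import mean, stdev


def compute_t(dts_c):
    """Return T = |x - x_bar| / s for every observation in the data set."""
    x_bar = mean(dts_c)
    s = stdev(dts_c)  # assumption: sample standard deviation (n - 1 divisor)
    # Note: s is zero when all observations are identical, so such a
    # data set cannot be studentized.
    return [abs(x - x_bar) / s for x in dts_c]


# Hypothetical data set for cycle 1 (six observations, the minimum in 1.3).
dts_1 = [10.1, 9.9, 10.0, 10.2, 9.8, 12.5]

t_values = compute_t(dts_1)
t_max = max(t_values)
# Observations associated with the maximum T in this cycle.
suspects = [x for x, t in zip(dts_1, t_values) if t == t_max]

for x, t in zip(dts_1, t_values):
    print(f"x = {x:5.1f}  T = {t:.3f}")
print("maximum T belongs to:", suspects)
```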