Standard Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set

SIGNIFICANCE AND USE
3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.  
3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.  
3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.
SCOPE
1.1 This practice provides a step by step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)  
1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.  
1.3 This practice is applicable to a data set comprising a minimum of six observations.  
1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.  
1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.  
1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.  
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

General Information

Status
Historical
Publication Date
30-Apr-2014

Standard
ASTM D7915-14 - Standard Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set
English language
5 pages

Standards Content (Sample)

NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: D7915 − 14
An American National Standard

Standard Practice for Application of Generalized Extreme Studentized Deviate (GESD) Technique to Simultaneously Identify Multiple Outliers in a Data Set¹

This standard is issued under the fixed designation D7915; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
¹ This practice is under the jurisdiction of ASTM Committee D02 on Petroleum Products, Liquid Fuels, and Lubricants and is the direct responsibility of Subcommittee D02.94 on Coordinating Subcommittee on Quality Assurance and Statistics.
Current edition approved May 1, 2014. Published June 2014. DOI: 10.1520/D7915-14.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States

1. Scope
1.1 This practice provides a step by step procedure for the application of the Generalized Extreme Studentized Deviate (GESD) Many-Outlier Procedure to simultaneously identify multiple outliers in a data set. (See Bibliography.)
1.2 This practice is applicable to a data set comprising observations that are represented on a continuous numerical scale.
1.3 This practice is applicable to a data set comprising a minimum of six observations.
1.4 This practice is applicable to a data set where the normal (Gaussian) model is reasonably adequate for the distributional representation of the observations in the data set.
1.5 The probability of false identification of outliers associated with the decision criteria set by this practice is 0.01.
1.6 It is recommended that the execution of this practice be conducted under the guidance of personnel familiar with the statistical principles and assumptions associated with the GESD technique.
1.7 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Terminology
2.1 Definitions of Terms Specific to This Standard:
2.1.1 outlier, n: an observation (or a subset of observations) which appears to be inconsistent with the remainder of the data set.

3. Significance and Use
3.1 The GESD procedure can be used to simultaneously identify up to a pre-determined number of outliers (r) in a data set, without having to pre-examine the data set and make a priori decisions as to the location and number of potential outliers.
3.2 The GESD procedure is robust to masking. Masking describes the phenomenon where the existence of multiple outliers can prevent an outlier identification procedure from declaring any of the observations in a data set to be outliers.
3.3 The GESD procedure is automation-friendly, and hence can easily be programmed as automated computer algorithms.

4. Procedure
4.1 Specify the maximum number of outliers (r) in a data set to be identified.
4.1.1 The recommended maximum number of outliers (r) by this practice is two (2) for data sets with six to twelve observations.
4.1.2 For data sets with more than twelve observations, the recommended maximum number of outliers (r) is the lesser of ten or 20 %.
4.1.3 The recommended values for r in 4.1.1 and 4.1.2 are not intended to be mandatory. Users can specify other values based on their specific needs.
4.2 Compute test statistic T for each observation in the initial starting data set (DTS_0) as follows:

T = |x − x̄| / s    (1)

where:
x = an observation in the data set,
x̄ = average calculated using all observations in the data set, and
s = sample standard deviation calculated using all observations in the data set.

4.3 Remove the observation in the data set with the largest absolute magnitude of the test statistic T and form a reduced data set (DTS_i), where i = number of observations removed from the initial data set.
4.4 Re-calculate T for all observations in the reduced data set from 4.3.
4.5 Repeat steps 4.3 to 4.4 until r number of observations have been removed from the initial data set. That is, until calculation of all T's for all observations in the reduced data set DTS_r has been completed.
4.6 Compare the maximum T computed in each data set (DTS_0 to DTS_r) to a critical value λ_critical associated with the data set DTS_i, where λ is chosen based on a false identification ...

5. Worked Example
5.1 Listed below is a data set comprising 30 observations:

35.0 36.6 34.7 36.2 37.0 25.3 37.2 41.3 26.0 24.6
33.5 35.5 35.4 39.9 39.2 36.6 37.2 33.2 34.0 35.7
39.2 42.1 35.7 40.2 36.6 41.1 41.1 39.1 40.6 41.3

5.1.1 The total number of
...
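
Steps 4.1 to 4.5 of the procedure can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation of the practice: the function names `recommended_r` and `gesd_max_t` are hypothetical, "20 %" in 4.1.2 is assumed to mean 20 % of the number of observations rounded down, and the comparison against critical values in 4.6 is left to the caller because the sample text is cut off before the decision rule is stated.

```python
from statistics import mean, stdev


def recommended_r(n):
    """Recommended maximum number of outliers per 4.1.1 and 4.1.2."""
    if n < 6:
        raise ValueError("the practice requires at least six observations (1.3)")
    if n <= 12:
        return 2
    # Assumption: "20 %" is taken as 20 % of n, rounded down.
    return min(10, n * 20 // 100)


def gesd_max_t(data, r):
    """Return [(i, x_extreme, T_max)] for DTS_0 through DTS_r (steps 4.2 to 4.5).

    i         = number of observations removed so far
    x_extreme = observation with the largest T in DTS_i
    T_max     = that largest T, computed with Eq. (1)
    """
    dts = list(data)  # work on a copy; DTS_0 is the initial data set
    if not 0 <= r <= len(dts) - 2:
        raise ValueError("r must leave at least two observations in DTS_r")
    results = []
    for i in range(r + 1):
        x_bar = mean(dts)
        s = stdev(dts)  # sample standard deviation (n - 1 denominator)
        if s == 0:
            raise ValueError("all remaining observations are identical")
        t_values = [abs(x - x_bar) / s for x in dts]  # Eq. (1)
        t_max = max(t_values)
        idx = t_values.index(t_max)
        results.append((i, dts[idx], t_max))
        del dts[idx]  # step 4.3: form the next reduced data set
    return results


# The 30 observations listed in 5.1.
worked_example = [
    35.0, 36.6, 34.7, 36.2, 37.0, 25.3, 37.2, 41.3, 26.0, 24.6,
    33.5, 35.5, 35.4, 39.9, 39.2, 36.6, 37.2, 33.2, 34.0, 35.7,
    39.2, 42.1, 35.7, 40.2, 36.6, 41.1, 41.1, 39.1, 40.6, 41.3,
]

r = recommended_r(len(worked_example))
for i, x, t in gesd_max_t(worked_example, r):
    print(f"DTS_{i}: most extreme observation = {x}, max T = {t:.3f}")
```

Each maximum T returned here would then be compared with the critical value for the corresponding DTS_i, as described in 4.6. Recomputing the mean and standard deviation after every removal is what lets the procedure resist masking: once the most extreme observation is gone, the next one is judged against statistics it no longer inflates as much.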
