ASTM G16-95(2010)
Standard Guide for Applying Statistics to Analysis of Corrosion Data
SIGNIFICANCE AND USE
Corrosion test results often show more scatter than many other types of tests because of a variety of factors, including the fact that minor impurities often play a decisive role in controlling corrosion rates. Statistical analysis can be very helpful in allowing investigators to interpret such results, especially in determining when test results differ from one another significantly. This can be a difficult task when a variety of materials are under test, but statistical methods provide a rational approach to this problem.
Modern data reduction programs in combination with computers have allowed sophisticated statistical analyses on data sets with relative ease. This capability permits investigators to determine if associations exist between many variables and, if so, to develop quantitative expressions relating the variables.
Statistical evaluation is a necessary step in the analysis of results from any procedure which provides quantitative information. This analysis allows confidence intervals to be estimated from the measured results.
SCOPE
1.1 This guide covers and presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.
1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.
1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.
1.4 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: G16 − 95(Reapproved 2010)
Standard Guide for Applying Statistics to Analysis of Corrosion Data
This standard is issued under the fixed designation G16; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This guide covers and presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.
1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.
1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.
1.4 The values stated in SI units are to be regarded as standard. No other units of measurement are included in this standard.
2. Referenced Documents
2.1 ASTM Standards:
E178 Practice for Dealing With Outlying Observations
E691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
G46 Guide for Examination and Evaluation of Pitting Corrosion
IEEE/ASTM SI 10 American National Standard for Use of the International System of Units (SI): The Modern Metric System
3. Significance and Use
3.1 Corrosion test results often show more scatter than many other types of tests because of a variety of factors, including the fact that minor impurities often play a decisive role in controlling corrosion rates. Statistical analysis can be very helpful in allowing investigators to interpret such results, especially in determining when test results differ from one another significantly. This can be a difficult task when a variety of materials are under test, but statistical methods provide a rational approach to this problem.
3.2 Modern data reduction programs in combination with computers have allowed sophisticated statistical analyses on data sets with relative ease. This capability permits investigators to determine if associations exist between many variables and, if so, to develop quantitative expressions relating the variables.
3.3 Statistical evaluation is a necessary step in the analysis of results from any procedure which provides quantitative information. This analysis allows confidence intervals to be estimated from the measured results.
4. Errors
4.1 Distributions: In the measurement of values associated with the corrosion of metals, a variety of factors act to produce measured values that deviate from expected values for the conditions that are present. Usually the factors which contribute to the error of measured values act in a more or less random way so that the average of several values approximates the expected value better than a single measurement. The pattern in which data are scattered is called its distribution, and a variety of distributions are seen in corrosion work.
4.2 Histograms: A bar graph called a histogram may be used to display the scatter of the data. A histogram is constructed by dividing the range of data values into equal intervals on the abscissa axis and then placing a bar over each interval of a height equal to the number of data points within that interval. The number of intervals should be few enough so that almost all intervals contain at least three points; however, there should be a sufficient number of intervals to facilitate visualization of the shape and symmetry of the bar heights. Twenty intervals are usually recommended for a histogram. Because so many points are required to construct a histogram, it is unusual to find data sets in corrosion work that lend themselves to this type of analysis.
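As an illustration of the histogram construction described in 4.2, the following Python sketch counts data points in equal-width intervals and reports the fraction of intervals that hold at least three points. The function names, the choice to assign the maximum value to the last interval, and the sample values are assumptions made for this example; none of them comes from the guide.

```python
def histogram_counts(values, n_intervals):
    """Count points in n_intervals equal-width intervals spanning the data range.

    Returns (lowest value, interval width, list of counts).
    """
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical, so there is no range to divide")
    width = (hi - lo) / n_intervals
    counts = [0] * n_intervals
    for v in values:
        # Assumption: the maximum value is assigned to the last interval.
        index = min(int((v - lo) / width), n_intervals - 1)
        counts[index] += 1
    return lo, width, counts


def fraction_with_at_least(counts, minimum=3):
    """Fraction of intervals whose count is at least `minimum`."""
    return sum(1 for c in counts if c >= minimum) / len(counts)


# Hypothetical replicate mass-loss values (illustrative only).
mass_loss = [4.1, 4.4, 4.6, 4.7, 4.9, 5.0, 5.0, 5.2, 5.3, 5.5, 5.8, 6.2]
for n_intervals in (3, 4, 6):
    lo, width, counts = histogram_counts(mass_loss, n_intervals)
    print(n_intervals, counts, round(fraction_with_at_least(counts), 2))
```

Twelve points can place three or more points in at most four intervals, which is consistent with the remark in 4.2 that corrosion data sets are seldom large enough for a histogram with many intervals.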
This guide is under the jurisdiction of ASTM Committee G01 on Corrosion of Metals and is the direct responsibility of Subcommittee G01.05 on Laboratory Corrosion Tests. Current edition approved Feb. 1, 2010. Published March 2010. Originally approved in 1971. Last previous edition approved in 2004 as G16–95(2004). DOI: 10.1520/G0016-95R10.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
4.3 Normal Distribution: Many statistical techniques are based on the normal distribution. This distribution is bell-shaped and symmetrical. Use of analysis techniques developed for the normal distribution on data distributed in another manner can lead to grossly erroneous conclusions. Thus, before attempting data analysis, the data should either be verified as being scattered like a normal distribution, or a transformation should be used to obtain a data set which is approximately normally distributed. Transformed data may be analyzed statistically and the results transformed back to give the desired results, although the process of transforming the data back can create problems in terms of not having symmetrical confidence intervals.
4.4 Normal Probability Paper: If the histogram is not confirmatory in terms of the shape of the distribution, the data may be examined further to see if it is normally distributed by constructing a normal probability plot as described as follows (1).
4.4.1 It is easiest to construct a normal probability plot if normal probability paper is available. This paper has one linear axis, and one axis which is arranged to reflect the shape of the cumulative area under the normal distribution. In practice, the “probability” axis has 0.5 or 50% at the center, a number approaching 0 percent at one end, and a number approaching 1.0 or 100% at the other end. The marks are spaced far apart in the center and close together at the ends. A normal probability plot may be constructed as follows with normal probability paper.
NOTE 1: Data that plot approximately on a straight line on the probability plot may be considered to be normally distributed. Deviations from a normal distribution may be recognized by the presence of deviations from a straight line, usually most noticeable at the extreme ends of the data.
4.4.1.1 Number the data points starting at the largest negative value and proceeding to the largest positive value. The numbers of the data points thus obtained are called the ranks of the points.
4.4.1.2 Plot each point on the normal probability paper such that when the data are arranged in order: y(1), y(2), y(3), ..., these values are called the order statistics; the linear axis reflects the value of the data, while the probability axis location is calculated by subtracting 0.5 from the number (rank) of that point and dividing by the total number of points in the data set.
NOTE 2: Occasionally two or more identical values are obtained in a set of results. In this case, each point may be plotted, or a composite point may be located at the average of the plotting positions for all the identical values.
4.4.2 If normal probability paper is not available, the location of each point on the probability plot may be determined as follows:
4.4.2.1 Mark the probability axis using linear graduations from 0.0 to 1.0.
4.4.2.2 For each point, subtract 0.5 from the rank and divide the result by the total number of points in the data set. This is the area to the left of that value under the standardized normal distribution. The cumulative distribution function is the number, always between 0 and 1, that is plotted on the probability axis.
4.4.2.3 The value of the data point defines its location on the other axis of the graph.
4.5 Other Probability Paper: If the histogram is not symmetrical and bell-shaped, or if the probability plot shows nonlinearity, a transformation may be used to obtain a new, transformed data set that may be normally distributed. Although it is sometimes possible to guess at the type of distribution by looking at the histogram, and thus determine the exact transformation to be used, it is usually just as easy to use a computer to calculate a number of different transformations and to check each for the normality of the transformed data. Some transformations based on known non-normal distributions, or that have been found to work in some situations, are listed as follows:
$y = \log x$
$y = \exp x$
$y = \sqrt{x}$
$y = x^{2}$
$y = 1/x$
$y = \sin^{-1}\sqrt{x/n}$
where:
y = transformed datum,
x = original datum, and
n = number of data points.
Time to failure in stress corrosion cracking usually is best fitted with a log x transformation (2, 3).
Once a set of transformed data is found that yields an approximately straight line on a probability plot, the statistical procedures of interest can be carried out on the transformed data. Results, such as predicted data values or confidence intervals, must be transformed back using the reverse transformation.
4.6 Unknown Distribution: If there are insufficient data points, or if for any other reason, the distribution type of the data cannot be determined, then two possibilities exist for analysis:
4.6.1 A distribution type may be hypothesized based on the behavior of similar types of data. If this distribution is not normal, a transformation may be sought which will normalize that particular distribution. See 4.5 above for suggestions. Analysis may then be conducted on the transformed data.
4.6.2 Statistical analysis procedures that do not require any specific data distribution type, known as non-parametric methods, may be used to analyze the data. Non-parametric tests do not use the data as efficiently.
4.7 Extreme Value Analysis: In the case of determining the probability of perforation by a pitting or cracking mechanism, the usual descriptive statistics for the normal distribution are not the most useful. In this case, Guide G46 should be consulted for the procedure (4).
4.8 Significant Digits: IEEE/ASTM SI 10 should be followed to determine the proper number of significant digits when reporting numerical results.
4.9 Propagation of Variance: If a calculated value is a function of several independent variables and those variables have errors associated with them, the error of the calculated value can be estimated by a propagation of variance technique. See Refs (5) and (6) for details.
4.10 Mistakes: Mistakes either in carrying out an experiment or in calculations are not a characteristic of the population and can preclude statistical treatment of data or lead to erroneous conclusions if included in the analysis. Sometimes mistakes can be identified by statistical methods by recognizing that the probability of obtaining a particular result is very low.
4.11 Outlying Observations: See Practice E178 for procedures for dealing with outlying observations.
5. Central Measures
5.1 It is accepted practice to employ several independent (replicate) measurements of any experimental quantity to improve the estimate of precision and to reduce the variance of the average value. If it is assumed that the processes operating to create error in the measurement are random in nature and are as likely to overestimate the true unknown value as to underestimate it, then the average value is the best estimate of the unknown value in question. The average value is usually indicated by placing a bar over the symbol representing the measured variable.
NOTE 3: In this standard, the term “mean” is reserved to describe a central measure of a population, while average refers to a sample.
5.2 If processes operate to exaggerate the magnitude of the error either in overestimating or underestimating the correct measurement, then the median value is usually a better estimate.
5.3 If the processes operating to create error affect both the probability and magnitude of the error, then other approaches must be employed to find the best estimation procedure. A qualified statistician should be consulted in this case.
5.4 In corrosion testing, it is generally observed that average values are useful in characterizing corrosion rates. In cases of penetration from pitting and cracking, failure is often defined as the first through penetration and in these cases, average penetration rates or times are of little value. Extreme value analysis has been used in these cases, see Guide G46.
5.5 When the average value is calculated and reported as the only result in experiments when several replicate runs were made, information on the scatter of data is lost.
6. Variability Measures
6.1 Several measures of distribution variability are available which can be useful in estimating confidence intervals and making predictions from the observed data. In the case of norma
...
d = the difference between the average and the measured value,
n−1 = the degrees of freedom available.
Variance is a useful measure because it is additive in systems that can be described by a normal distribution; however, the dimensions of variance are square of units. A procedure known as analysis of variance (ANOVA) has been developed for data sets involving several factors at different levels in order to estimate the effects of these factors. (See Section 9.)
6.3 Standard Deviation: Standard deviation, σ, is defined as the square root of the variance. It has the property of having the same dimensions as the average value and the original measurements from which it was calculated and is generally used to describe the scatter of the observations.
6.3.1 Standard Deviation of the Average: The standard deviation of an average, $S_{\bar{x}}$, is different from the standard deviation of a single measured value, but the two standard deviations are related as in (Eq 2):
$S_{\bar{x}} = S / \sqrt{n}$ (2)
where:
n = the total number of measurements which were used to calculate the average value.
When reporting standard deviation calculations, it is important to note clearly whether the value reported is the standard deviation of the average or of a single value. In either case, the number of measurements should also be reported. The sample estimate of the standard deviation is S.
6.4 Coefficient of Variation: The population coefficient of variation is defined as the standard deviation divided by the mean. The sample coefficient of variation may be calculated as $S/\bar{x}$ and is usually reported in percent. This measure of variability is particularly useful in cases where the size of the errors is proportional to the magnitude of the measured value so that the coefficient of variation is approximately constant over a wide range of values.
6.5 Range: The range is defined as the difference between the maximum and minimum values in a set of replicate data values. The range is non-parametric in nature, that is, its calculation makes no assumption about the distribution of error. In cases when small numbers of replicate values are ...
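The variability measures described in 6.3 through 6.5 can be sketched in Python as follows. This is a minimal illustration, not the guide's procedure: the function name and dictionary keys are hypothetical, and the variance formula (sum of squared differences d divided by n−1) is inferred from the symbol definitions above, because Eq 1 itself is not included in this excerpt.

```python
import math


def variability_summary(values):
    """Summarize the scatter of a set of replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicate values are needed")
    average = sum(values) / n
    # Assumed form of the sample variance: squared differences d between
    # the average and each measured value, divided by n - 1 degrees of freedom.
    variance = sum((average - v) ** 2 for v in values) / (n - 1)
    std_dev = math.sqrt(variance)
    return {
        "n": n,
        "average": average,
        "variance": variance,
        "std_dev": std_dev,
        # Eq 2: standard deviation of the average = S / sqrt(n).
        "std_dev_average": std_dev / math.sqrt(n),
        # Sample coefficient of variation, S / average, in percent
        # (left undefined here when the average is zero).
        "cv_percent": 100.0 * std_dev / average if average != 0 else None,
        # Range: maximum minus minimum of the replicate values.
        "range": max(values) - min(values),
    }


# Hypothetical replicate corrosion rates (illustrative only).
summary = variability_summary([0.21, 0.25, 0.19, 0.23, 0.22])
for name, value in summary.items():
    print(name, value)
```

Returning the number of measurements together with both standard deviations follows the reporting advice in 6.3.1, which asks that the two not be confused and that n be stated.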
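The plotting-position rule of 4.4.2.2 and the transformation check suggested in 4.5 can be sketched in Python as follows. The guide judges straightness of the probability plot by eye; using a correlation coefficient between the ordered data and the corresponding standard normal quantiles as a numeric stand-in is an assumption of this example, as are the function names and the sample data. Tied values are plotted individually, which is one of the two options in Note 2.

```python
import math
from statistics import NormalDist


def plotting_positions(values):
    """Pair each ordered value with its cumulative probability (rank - 0.5) / n."""
    ordered = sorted(values)
    n = len(ordered)
    return [(y, (rank - 0.5) / n) for rank, y in enumerate(ordered, start=1)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when all values are identical")
    return sxy / math.sqrt(sxx * syy)


def straightness(values):
    """Correlation between ordered data and standard normal quantiles.

    Values near 1 suggest the points would fall close to a straight line
    on a normal probability plot (an assumed numeric proxy, see above).
    """
    pairs = plotting_positions(values)
    data = [y for y, _ in pairs]
    quantiles = [NormalDist().inv_cdf(p) for _, p in pairs]
    return pearson(data, quantiles)


# Hypothetical times to failure in hours (illustrative only).
times = [12.0, 15.0, 21.0, 30.0, 48.0, 95.0, 210.0]
candidates = {
    "x": times,
    "log x": [math.log(t) for t in times],
    "sqrt x": [math.sqrt(t) for t in times],
    "1/x": [1.0 / t for t in times],
}
for name, transformed in candidates.items():
    print(name, round(straightness(transformed), 3))
```

Any statistics computed on the best-behaved transformed set would still need to be transformed back, as 4.5 states.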