ASTM G16-95(1999)e1
(Guide) Standard Guide for Applying Statistics to Analysis of Corrosion Data
SCOPE
1.1 This guide presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.
1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.
1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.
Standards Content (Sample)
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: G 16 – 95 (Reapproved 1999) ε1
Standard Guide for Applying Statistics to Analysis of Corrosion Data
This standard is issued under the fixed designation G 16; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
ε1 NOTE: Section 10 was added editorially in October 1999.
This guide is under the jurisdiction of ASTM Committee G-1 on Corrosion of Metals and is the direct responsibility of Subcommittee G01.05 on Laboratory Corrosion Tests.
Current edition approved Jan. 15, 1995. Published March 1995. Originally published as G 16 – 71. Last previous edition G 16 – 94.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
1. Scope
1.1 This guide presents briefly some generally accepted methods of statistical analyses which are useful in the interpretation of corrosion test results.
1.2 This guide does not cover detailed calculations and methods, but rather covers a range of approaches which have found application in corrosion testing.
1.3 Only those statistical methods that have found wide acceptance in corrosion testing have been considered in this guide.
2. Referenced Documents
2.1 ASTM Standards:
E 178 Practice for Dealing with Outlying Observations
E 380 Practice for Use of the International System of Units (SI) (the Modernized Metric System)
E 691 Practice for Conducting an Interlaboratory Study to Determine the Precision of a Test Method
G 46 Guide for Examination and Evaluation of Pitting Corrosion
Annual Book of ASTM Standards, Vol 14.02.
Annual Book of ASTM Standards, Vol 03.02.
3. Significance and Use
3.1 Corrosion test results often show more scatter than many other types of tests because of a variety of factors, including the fact that minor impurities often play a decisive role in controlling corrosion rates. Statistical analysis can be very helpful in allowing investigators to interpret such results, especially in determining when test results differ from one another significantly. This can be a difficult task when a variety of materials are under test, but statistical methods provide a rational approach to this problem.
3.2 Modern data reduction programs in combination with computers have allowed sophisticated statistical analyses on data sets with relative ease. This capability permits investigators to determine if associations exist between many variables and, if so, to develop quantitative expressions relating the variables.
3.3 Statistical evaluation is a necessary step in the analysis of results from any procedure which provides quantitative information. This analysis allows confidence intervals to be estimated from the measured results.
4. Errors
4.1 Distributions: In the measurement of values associated with the corrosion of metals, a variety of factors act to produce measured values that deviate from expected values for the conditions that are present. Usually the factors which contribute to the error of measured values act in a more or less random way so that the average of several values approximates the expected value better than a single measurement. The pattern in which data are scattered is called its distribution, and a variety of distributions are seen in corrosion work.
4.2 Histograms: A bar graph called a histogram may be used to display the scatter of the data. A histogram is constructed by dividing the range of data values into equal intervals on the abscissa axis and then placing a bar over each interval of a height equal to the number of data points within that interval. The number of intervals should be few enough so that almost all intervals contain at least three points; however, there should be a sufficient number of intervals to facilitate visualization of the shape and symmetry of the bar heights. Twenty intervals are usually recommended for a histogram. Because so many points are required to construct a histogram, it is unusual to find data sets in corrosion work that lend themselves to this type of analysis.
4.3 Normal Distribution: Many statistical techniques are based on the normal distribution. This distribution is bell-shaped and symmetrical. Use of analysis techniques developed for the normal distribution on data distributed in another manner can lead to grossly erroneous conclusions. Thus, before attempting data analysis, the data should either be verified as being scattered like a normal distribution, or a transformation should be used to obtain a data set which is approximately normally distributed. Transformed data may be analyzed statistically and the results transformed back to give the desired results, although the process of transforming the data back can create problems in terms of not having symmetrical confidence intervals.
4.4 Normal Probability Paper: If the histogram is not confirmatory in terms of the shape of the distribution, the data may be examined further to see if it is normally distributed by constructing a normal probability plot as described as follows (1).
The boldface numbers in parentheses refer to the list of references at the end of this guide.
4.4.1 It is easiest to construct a normal probability plot if normal probability paper is available. This paper has one linear axis, and one axis which is arranged to reflect the shape of the cumulative area under the normal distribution. In practice, the “probability” axis has 0.5 or 50 % at the center, a number approaching 0 percent at one end, and a number approaching 1.0 or 100 % at the other end. The marks are spaced far apart in the center and close together at the ends. A normal probability plot may be constructed as follows with normal probability paper.
NOTE 1: Data that plot approximately on a straight line on the probability plot may be considered to be normally distributed. Deviations from a normal distribution may be recognized by the presence of deviations from a straight line, usually most noticeable at the extreme ends of the data.
4.4.1.1 Number the data points starting at the largest negative value and proceeding to the largest positive value. The numbers of the data points thus obtained are called the ranks of the points.
4.4.1.2 Plot each point on the normal probability paper such that when the data are arranged in order: y(1), y(2), y(3), ..., these values are called the order statistics; the linear axis reflects the value of the data, while the probability axis location is calculated by subtracting 0.5 from the number (rank) of that point and dividing by the total number of points in the data set.
NOTE 2: Occasionally two or more identical values are obtained in a set of results. In this case, each point may be plotted, or a composite point may be located at the average of the plotting positions for all the identical values.
4.4.2 If normal probability paper is not available, the location of each point on the probability plot may be determined as follows:
4.4.2.1 Mark the probability axis using linear graduations from 0.0 to 1.0.
4.4.2.2 For each point, subtract 0.5 from the rank and divide the result by the total number of points in the data set. This is the area to the left of that value under the standardized normal distribution. The cumulative distribution function is the number, always between 0 and 1, that is plotted on the probability axis.
4.4.2.3 The value of the data point defines its location on the other axis of the graph.
4.5 Other Probability Paper: If the histogram is not symmetrical and bell-shaped, or if the probability plot shows nonlinearity, a transformation may be used to obtain a new, transformed data set that may be normally distributed. Although it is sometimes possible to guess at the type of distribution by looking at the histogram, and thus determine the exact transformation to be used, it is usually just as easy to use a computer to calculate a number of different transformations and to check each for the normality of the transformed data. Some transformations based on known non-normal distributions, or that have been found to work in some situations, are listed as follows:
$y = \log x$          $y = \exp x$
$y = \sqrt{x}$        $y = x^2$
$y = 1/x$             $y = \sin^{-1}\sqrt{x/n}$
where:
y = transformed datum,
x = original datum, and
n = number of data points.
Time to failure in stress corrosion cracking usually is best fitted with a log x transformation (2, 3).
Once a set of transformed data is found that yields an approximately straight line on a probability plot, the statistical procedures of interest can be carried out on the transformed data. Results, such as predicted data values or confidence intervals, must be transformed back using the reverse transformation.
4.6 Unknown Distribution: If there are insufficient data points, or if for any other reason, the distribution type of the data cannot be determined, then two possibilities exist for analysis:
4.6.1 A distribution type may be hypothesized based on the behavior of similar types of data. If this distribution is not normal, a transformation may be sought which will normalize that particular distribution. See 4.5 above for suggestions. Analysis may then be conducted on the transformed data.
4.6.2 Statistical analysis procedures that do not require any specific data distribution type, known as non-parametric methods, may be used to analyze the data. Non-parametric tests do not use the data as efficiently.
4.7 Extreme Value Analysis: In the case of determining the probability of perforation by a pitting or cracking mechanism, the usual descriptive statistics for the normal distribution are not the most useful. In this case, Guide G 46 should be consulted for the procedure (4).
4.8 Significant Digits: Practice E 380 should be followed to determine the proper number of significant digits when reporting numerical results.
4.9 Propagation of Variance: If a calculated value is a function of several independent variables and those variables have errors associated with them, the error of the calculated value can be estimated by a propagation of variance technique. See Refs. (5) and (6) for details.
4.10 Mistakes: Mistakes either in carrying out an experiment or in calculations are not a characteristic of the population and can preclude statistical treatment of data or lead to erroneous conclusions if included in the analysis. Sometimes mistakes can be identified by statistical methods by recognizing that the probability of obtaining a particular result is very low.
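As an illustration of the plotting positions in 4.4.2 and the transformation check in 4.5, the following is a minimal Python sketch and not part of the guide. The function names, the sample times to failure, and the use of a correlation coefficient as a numerical stand-in for judging straightness by eye are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def plotting_positions(data):
    """Pair each ordered value with its probability-axis position, (rank - 0.5) / n."""
    ordered = sorted(data)  # rank 1 is the smallest value
    n = len(ordered)
    return [(y, (rank - 0.5) / n) for rank, y in enumerate(ordered, start=1)]


def straightness(data):
    """Correlation between ordered values and the normal quantiles of their positions.

    Assumption for this sketch: a value near 1 stands in for "plots approximately
    on a straight line"; the guide itself relies on visual inspection.
    """
    pairs = plotting_positions(data)
    ys = [y for y, _ in pairs]
    zs = [NormalDist().inv_cdf(p) for _, p in pairs]
    y_bar = sum(ys) / len(ys)
    z_bar = sum(zs) / len(zs)
    s_yz = sum((y - y_bar) * (z - z_bar) for y, z in zip(ys, zs))
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    return s_yz / math.sqrt(s_yy * s_zz)


# Hypothetical times to failure (h); not taken from the guide.
times = [12, 15, 19, 25, 33, 48, 70, 110]
log_times = [math.log(t) for t in times]  # the y = log x transformation

# Value and probability-axis location for each point.
for value, position in plotting_positions(times):
    print(f"{value:6.1f}  {position:.4f}")

print("straightness, raw data:", round(straightness(times), 3))
print("straightness, log x   :", round(straightness(log_times), 3))
```

For these made-up values the log-transformed data give the higher correlation, which is in line with the remark in 4.5 about time-to-failure data. Any result computed on the transformed values would still need the reverse transformation (here, exp) before it is reported.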
4.11 Outlying Observations: See Practice E 178 for procedures for dealing with outlying observations.
5. Central Measures
5.1 It is accepted practice to employ several independent (replicate) measurements of any experimental quantity to improve the estimate of precision and to reduce the variance of the average value. If it is assumed that the processes operating to create error in the measurement are random in nature and are as likely to overestimate the true unknown value as to underestimate it, then the average value is the best estimate of the unknown value in question. The average value is usually indicated by placing a bar over the symbol representing the measured variable.
NOTE 3: In this standard, the term “mean” is reserved to describe a central measure of a population, while average refers to a sample.
5.2 If processes operate to exaggerate the magnitude of the error either in overestimating or underestimating the correct measurement, then the median value is usually a better estimate.
5.3 If the processes operating to create error affect both the probability and magnitude of the error, then other approaches must be employed to find the best estimation procedure. A qualified statistician should be consulted in this case.
5.4 In corrosion testing, it is generally observed that average values are useful in characterizing corrosion rates. In cases of penetration from pitting and cracking, failure is often defined as the first through penetration and in these cases, average penetration rates or times are of little value. Extreme value analysis has been used in these cases, see Guide G 46.
5.5 When the average value is calculated and reported as the only result in experiments when several replicate runs were made, information on the scatter of data is lost.
6. Variability Measures
6.1 Several measures of distribution variability are available which can be useful in estimating confidence intervals and making predictions from the observed data. In the case of normal distribution, a number of procedures are available and can be handled with computer programs. These measures
...
dimensions of variance are square of units. A procedure known as analysis of variance (ANOVA) has been developed for data sets involving several factors at different levels in order to estimate the effects of these factors. (See Section 9.)
6.3 Standard Deviation: Standard deviation, σ, is defined as the square root of the variance. It has the property of having the same dimensions as the average value and the original measurements from which it was calculated and is generally used to describe the scatter of the observations.
6.3.1 Standard Deviation of the Average: The standard deviation of an average, $S_{\bar{x}}$, is different from the standard deviation of a single measured value, but the two standard deviations are related as in (Eq 2):
$S_{\bar{x}} = \dfrac{S}{\sqrt{n}}$          (2)
where:
n = the total number of measurements which were used to calculate the average value.
When reporting standard deviation calculations, it is important to note clearly whether the value reported is the standard deviation of the average or of a single value. In either case, the number of measurements should also be reported. The sample estimate of the standard deviation is S.
6.4 Coefficient of Variation: The population coefficient of variation is defined as the standard deviation divided by the mean. The sample coefficient of variation may be calculated as $S/\bar{x}$ and is usually reported in percent. This measure of variability is particularly useful in cases where the size of the errors is proportional to the magnitude of the measured value so that the coefficient of variation is approximately constant over a wide range of values.
6.5 Range: The range is defined as the difference between the maximum and minimum values in a set of replicate data values. The range is non-parametric in nature, that is, its calculation makes no assumption about the distribution of error. In cases when small numbers of replicate values are involved and the data are normally distr
...
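The variability measures in 6.3 to 6.5 can be illustrated with a short Python sketch, offered as an example and not as part of the guide. The function name, the dictionary keys, and the replicate values are hypothetical, and the sample standard deviation is assumed to use the usual n - 1 divisor, since the variance definition falls outside this excerpt.

```python
import math
from statistics import mean, stdev


def summarize(values):
    """Return the average and the variability measures for one set of replicates."""
    n = len(values)
    average = mean(values)
    s = stdev(values)  # sample standard deviation of a single value (n - 1 divisor, assumed)
    return {
        "n": n,                           # report n alongside any standard deviation
        "average": average,
        "s": s,
        "s_average": s / math.sqrt(n),    # Eq 2: standard deviation of the average
        "cv_percent": 100 * s / average,  # sample coefficient of variation, in percent
        "range": max(values) - min(values),
    }


# Hypothetical replicate corrosion rates (mm/y); not taken from the guide.
rates = [0.21, 0.25, 0.19, 0.23, 0.27]

for name, value in summarize(rates).items():
    print(f"{name:>10}: {value:.4g}")
```

Keeping "s" and "s_average" as separate entries, together with "n", follows the advice in 6.3.1 to state clearly which standard deviation is being reported.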