Standard Practice for Calculating and Using Basic Statistics

SIGNIFICANCE AND USE
4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.  
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.  
4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.  
4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval ...
SCOPE
1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics using a set of sample data containing a single variable or two variables. This practice includes simple descriptive statistics for variable data, tabular and graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use is also included.  
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.  
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

General Information

Status: Historical
Publication Date: 31-May-2014

ASTM E2586-14 - Standard Practice for Calculating and Using Basic Statistics (English language, 21 pages)
REDLINE ASTM E2586-14 - Standard Practice for Calculating and Using Basic Statistics (English language, 21 pages)

Standards Content (Sample)


NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E2586 − 14 An American National Standard
Standard Practice for
Calculating and Using Basic Statistics
This standard is issued under the fixed designation E2586; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics using a set of sample data containing a single variable or two variables. This practice includes simple descriptive statistics for variable data, tabular and graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use is also included.
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations of calculation methods. The examples are not binding on products or test methods treated.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards:
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
2.2 ISO Standards:
ISO 3534-1 Statistics—Vocabulary and Symbols, part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics—Vocabulary and Symbols, part 2: Applied Statistics
This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics.
Current edition approved June 1, 2014. Published January 2015. Originally approved in 2007. Last previous edition approved in 2014 as E2586 – 13. DOI: 10.1520/E2586-14.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions:
3.1.1 Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.
3.1.2 characteristic, n—a property of items in a sample or population which, when measured, counted, or otherwise observed, helps to distinguish among the items. E2282
3.1.3 coefficient of determination, n—square of the correlation coefficient, r.
3.1.4 coefficient of variation, CV, n—for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.
3.1.4.1 Discussion—The coefficient of variation is often expressed as a percentage.
3.1.4.2 Discussion—This statistic is also known as the relative standard deviation, RSD.
3.1.5 confidence bound, n—see confidence limit.
3.1.6 confidence coefficient, n—see confidence level.
3.1.7 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with confidence level 1 – α, where Pr(L ≤ θ ≤ U) ≥ 1 – α.
3.1.7.1 Discussion—The confidence level, 1 – α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting confidence interval either does or does not contain it. In this sense "confidence" applies not to the particular interval but only to the long run proportion of cases when repeating the procedure many times.
3.1.8 confidence level, n—the value, 1 – α, of the probability associated with a confidence interval, often expressed as a percentage.
3.1.8.1 Discussion—α is generally a small number. Confidence level is often 95 % or 99 %.
3.1.9 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.
3.1.10 correlation coefficient, n—for a population, ρ, a dimensionless measure of association between two variables X and Y, equal to the covariance divided by the product of $\sigma_X$ and $\sigma_Y$.
3.1.11 correlation coefficient, n—for a sample, r, the quantity:
$$\frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \tag{1}$$
3.1.12 covariance, n—of a population, cov (X, Y), for two variables, X and Y, the expected value of $(X - \mu_X)(Y - \mu_Y)$.
3.1.13 covariance, n—of a sample, the quantity:
$$\frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \tag{2}$$
3.1.14 dependent variable, n—a variable to be predicted using an equation.
3.1.15 degrees of freedom, n—the number of independent data points minus the number of parameters that have to be estimated before calculating the variance.
3.1.16 estimate, n—sample statistic used to approximate a population parameter.
3.1.17 histogram, n—graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles with area proportional to the frequency. ISO 3534-1
3.1.17.1 Discussion—While not required, equal bar or class widths are recommended for histograms.
3.1.18 independent variable, n—a variable used to predict another using an equation.
3.1.19 interquartile range, IQR, n—the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.
3.1.20 kurtosis, $\gamma_2$, $g_2$, n—for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).
3.1.21 mean, n—of a population, μ, average or expected value of a characteristic in a population; of a sample, $\bar{x}$, sum of the observed values in the sample divided by the sample size.
3.1.22 median, $\tilde{X}$, n—the 50th percentile in a population or sample.
3.1.22.1 Discussion—The sample median is the [(n + 1)/2] order statistic if the sample size n is odd and is the average of the [n/2] and [n/2 + 1] order statistics if n is even.
3.1.23 midrange, n—average of the minimum and maximum values in a sample.
3.1.24 order statistic, $x_{(k)}$, n—value of the kth observed value in a sample after sorting by order of magnitude.
3.1.24.1 Discussion—For a sample of size n, the first order statistic $x_{(1)}$ is the minimum value, and $x_{(n)}$ is the maximum value.
3.1.25 parameter, n—see population parameter.
3.1.26 percentile, n—quantile of a sample or a population, for which the fraction less than or equal to the value is expressed as a percentage.
3.1.27 population, n—the totality of items or units of material under consideration.
3.1.28 population parameter, n—summary measure of the values of some characteristic of a population. ISO 3534-2
3.1.29 prediction interval, n—an interval for a future value or set of values, constructed from a current set of data, in a way that has a specified probability for the inclusion of the future value.
3.1.30 regression, n—the process of estimating parameter(s) of an equation using a set of data.
3.1.31 residual, n—observed value minus fitted value, when a model is used.
3.1.32 statistic, n—see sample statistic.
3.1.33 quantile, n—value such that a fraction f of the sample or population is less than or equal to that value.
3.1.34 range, R, n—maximum value minus the minimum value in a sample.
3.1.35 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which serves to provide information that may be used as a basis for making a decision concerning the larger collection.
3.1.36 sample size, n, n—number of observed values in the sample.
3.1.37 sample statistic, n—summary measure of the observed values of a sample.
3.1.38 skewness, $\gamma_1$, $g_1$, n—for population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, and theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.
3.1.39 standard error—standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate of it.
3.1.39.1 Discussion—If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.
3.1.40 standard deviation—of a population, σ, the square root of the average or expected value of the squared deviation of a variable from its mean; —of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample divided by the sample size minus 1.
3.1.41 variance, $\sigma^2$, $s^2$, n—square of the standard deviation of the population or sample.
3.1.41.1 Discussion—For a finite population, $\sigma^2$ is calculated as the sum of squared deviations of values from the mean, divided by the number of items in the population. For a continuous population, $\sigma^2$ is calculated by integrating $(x - \mu)^2 f(x)$ over all x, where f(x) is the density function. For a sample, $s^2$ is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.
3.1.42 Z-score, n—observed value minus the sample mean divided by the sample standard deviation.
4. Significance and Use
4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations. Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable amounts of empirical data.
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations, there may be several variables defined for study.
4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic, continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample originates. It is the population that is of primary interest in any particular study.
4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes, the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an interval.
4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize the data set and estimate a corresponding population characteristic or parameter.
4.4 Descriptive statistics consider numerical, tabular, and ...
4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained by using them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product, all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger of mixing two or more distinctly different sets of data.
4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing material produced under different conditions (that is, different populations), it should be considered as two or more separate subgroups of observations, each to be treated independently in a data analysis program. Merging of such subgroups, representing significantly different conditions, may lead to a presentation that will be of little practical value. Briefly, any sample of observations to which these methods are applied should be homogeneous or, in the case of a process, have originated from a process in a state of statistical control.
4.9 The methods developed in Sections 6, 7, and 8 apply to the sample data. For example, the term “mean” refers to the sample mean, not the population mean, unless indicated otherwise. It is understood that there is a data set containing n observations. The data set may be denoted as:
$$x_1, x_2, x_3, \ldots, x_n \tag{3}$$
4.9.1 There is no order of magnitude implied by the subscript notation unless subscripts are contained in parentheses (see 6.7).
5. Characteristics of Populations
5.1 A population is the totality of a set of items under consideration. Populations may be finite or unlimited in size and may be existing or continuing to emerge as, for example, in a process. For continuous variables, X, representing an essentially unlimited population or a process, the population is mathematically characterized by a probability density function, f(x). The density function
...


This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E2586 − 13 E2586 − 14 An American National Standard
Standard Practice for
Calculating and Using Basic Statistics
This standard is issued under the fixed designation E2586; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (´) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This practice covers methods and equations for computing and presenting basic descriptive statistics using a set of sample
data containing a single variable or two variables. This practice includes simple descriptive statistics for variable data, tabular and
graphical methods for variable data, and methods for summarizing simple attribute data. Some interpretation and guidance for use
is also included.
1.2 The system of units for this practice is not specified. Dimensional quantities in the practice are presented only as illustrations
of calculation methods. The examples are not binding on products or test methods treated.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory
limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards:
E178 Practice for Dealing With Outlying Observations
E456 Terminology Relating to Quality and Statistics
E2282 Guide for Defining the Test Result of a Test Method
2.2 ISO Standards:
ISO 3534-1 Statistics—Vocabulary and Symbols, part 1: Probability and General Statistical Terms
ISO 3534-2 Statistics—Vocabulary and Symbols, part 2: Applied Statistics
3. Terminology
3.1 Definitions:
3.1.1 Unless otherwise noted, terms relating to quality and statistics are as defined in Terminology E456.
3.1.2 characteristic, n—a property of items in a sample or population which, when measured, counted, or otherwise observed,
helps to distinguish among the items. E2282
3.1.3 coefficient of determination, n—square of the correlation coefficient, r.
3.1.4 coefficient of variation, CV, n—for a nonnegative characteristic, the ratio of the standard deviation to the mean for a population or sample.
This practice is under the jurisdiction of ASTM Committee E11 on Quality and Statistics and is the direct responsibility of Subcommittee E11.10 on Sampling / Statistics.
Current edition approved Oct. 1, 2013 June 1, 2014. Published October 2013 January 2015. Originally approved in 2007. Last previous edition approved in 2012 2014 as E2586 – 12b. E2586 – 13. DOI: 10.1520/E2586-13. 10.1520/E2586-14.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from American National Standards Institute (ANSI), 25 W. 43rd St., 4th Floor, New York, NY 10036, http://www.ansi.org.
3.1.4.1 Discussion—
The coefficient of variation is often expressed as a percentage.
3.1.4.2 Discussion—
This statistic is also known as the relative standard deviation, RSD.
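As an illustration of 3.1.4, the minimal Python sketch below computes the sample coefficient of variation as s divided by the sample mean and reports it as a percentage (the RSD). It assumes a nonnegative characteristic with a positive mean; the helper name `coefficient_of_variation` and the data values are hypothetical, not taken from this practice.

```python
import statistics

def coefficient_of_variation(data, as_percent=True):
    """Sample CV = s / mean (hypothetical helper for illustration)."""
    mean = statistics.mean(data)
    if mean <= 0:
        # This sketch only handles a nonnegative characteristic with a positive mean
        raise ValueError("coefficient of variation requires a positive mean")
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    cv = s / mean
    return 100.0 * cv if as_percent else cv

# Hypothetical measurements, for illustration only
data = [10.0, 12.0, 14.0]
print(coefficient_of_variation(data))          # RSD in percent
print(coefficient_of_variation(data, False))   # plain ratio
```

Here the mean is 12 and s is 2, so the CV is 1/6, or about 16.7 %.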
3.1.5 confidence bound, n—see confidence limit.
3.1.6 confidence coefficient, n—see confidence level.
3.1.7 confidence interval, n—an interval estimate [L, U] with the statistics L and U as limits for the parameter θ and with
confidence level 1 – α, where Pr(L ≤ θ ≤ U) ≥ 1 – α.
3.1.7.1 Discussion—
The confidence level, 1 – α, reflects the proportion of cases that the confidence interval [L, U] would contain or cover the true
parameter value in a series of repeated random samples under identical conditions. Once L and U are given values, the resulting
confidence interval either does or does not contain it. In this sense "confidence" applies not to the particular interval but only to
the long run proportion of cases when repeating the procedure many times.
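The long-run interpretation in 3.1.7.1 can be checked by simulation. The sketch below is an illustration under stated assumptions, not a procedure from this practice: it draws repeated samples from a normal population with known μ and σ, builds the textbook z-interval (sample mean ± 1.96σ/√n) for each sample, and counts how often [L, U] covers μ. The function name, population values, and number of trials are hypothetical.

```python
import math
import random
import statistics

def coverage_of_z_interval(mu, sigma, n, trials, z=1.96, seed=1):
    """Fraction of simulated z-intervals [L, U] that contain mu.

    Assumes a normal population with known sigma (hypothetical setup).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        lower, upper = xbar - half_width, xbar + half_width
        # Once L and U are computed, this particular interval either covers mu or not
        if lower <= mu <= upper:
            hits += 1
    return hits / trials

print(coverage_of_z_interval(mu=50.0, sigma=2.0, n=10, trials=4000))
```

Each individual interval either contains μ or does not; only the proportion over many repetitions settles near the nominal 95 %.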
3.1.8 confidence level, n—the value, 1 – α, of the probability associated with a confidence interval, often expressed as a
percentage.
3.1.8.1 Discussion—
α is generally a small number. Confidence level is often 95 % or 99 %.
3.1.9 confidence limit, n—each of the limits, L and U, of a confidence interval, or the limit of a one-sided confidence interval.
3.1.10 correlation coefficient, n—for a population, ρ, a dimensionless measure of association between two variables X and Y, equal to the covariance divided by the product of $\sigma_X$ and $\sigma_Y$.
3.1.11 correlation coefficient, n—for a sample, r, the quantity:
$$\frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} \tag{1}$$
3.1.12 covariance, n—of a population, cov (X, Y), for two variables, X and Y, the expected value of $(X - \mu_X)(Y - \mu_Y)$.
3.1.13 covariance, n—of a sample, the quantity:
$$\frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \tag{2}$$
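Equations (1) and (2) translate directly into code. In this minimal sketch (helper names and data are hypothetical), the sample covariance divides the sum of cross-products of deviations by n − 1, and r divides that covariance by the product of the sample standard deviations, which also use the n − 1 divisor; squaring r gives the coefficient of determination of 3.1.3.

```python
import statistics

def sample_covariance(x, y):
    """Equation (2): sum of cross-products of deviations divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    """Equation (1): covariance divided by s_x * s_y (both with divisor n - 1)."""
    return sample_covariance(x, y) / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = sample_correlation(x, y)
print(sample_covariance(x, y), r, r ** 2)  # r ** 2 is the coefficient of determination
```

Because the same n − 1 divisor appears in the covariance and in both standard deviations, it cancels, which is why r is dimensionless and lies between −1 and 1.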
3.1.14 dependent variable, n—a variable to be predicted using an equation.
3.1.15 degrees of freedom, n—the number of independent data points minus the number of parameters that have to be estimated
before calculating the variance.
3.1.16 estimate, n—sample statistic used to approximate a population parameter.
3.1.17 histogram, n—graphical representation of the frequency distribution of a characteristic consisting of a set of rectangles
with area proportional to the frequency. ISO 3534-1
3.1.17.1 Discussion—
While not required, equal bar or class widths are recommended for histograms.
3.1.18 independent variable, n—a variable used to predict another using an equation.
3.1.19 interquartile range, IQR, n—the 75th percentile (0.75 quantile) minus the 25th percentile (0.25 quantile), for a data set.
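The definitions of quantile and IQR leave the interpolation rule open. The sketch below assumes the common (n + 1)p position rule with linear interpolation between adjacent order statistics, which is the rule Python's `statistics.quantiles(..., method="exclusive")` applies; other conventions (for example, `method="inclusive"`) give slightly different values for small samples. The helper names and data are hypothetical.

```python
import math
import statistics

def quantile_np1(data, p):
    """Empirical p-quantile using the (n + 1)p position rule (hypothetical helper).

    Positions below 1 or above n are clamped to the minimum or maximum.
    """
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p          # 1-based position among the order statistics
    if pos <= 1:
        return xs[0]
    if pos >= n:
        return xs[-1]
    k = math.floor(pos)        # x_(k) is the order statistic just below pos
    frac = pos - k
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

def interquartile_range(data):
    return quantile_np1(data, 0.75) - quantile_np1(data, 0.25)

data = [7, 1, 3, 9, 5, 11, 13]   # hypothetical sample
print(quantile_np1(data, 0.25), quantile_np1(data, 0.75), interquartile_range(data))
print(statistics.quantiles(data, n=4, method="exclusive"))  # standard-library cross-check
```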
3.1.20 kurtosis, $\gamma_2$, $g_2$, n—for a population or a sample, a measure of the weight of the tails of a distribution relative to the center, calculated as the ratio of the fourth central moment (empirical if a sample, theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the fourth power, minus 3 (also referred to as excess kurtosis).
3.1.21 mean, n—of a population, μ, average or expected value of a characteristic in a population; of a sample, $\bar{x}$, sum of the observed values in the sample divided by the sample size.
3.1.22 median, $\tilde{X}$, n—the 50th percentile in a population or sample.
3.1.22.1 Discussion—
The sample median is the [(n + 1) ⁄2] order statistic if the sample size n is odd and is the average of the [n/2] and [n/2 + 1] order
statistics if n is even.
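The median rule in 3.1.22.1 can be written directly in terms of order statistics. In this minimal sketch (the helper name is hypothetical), `xs[k - 1]` holds the order statistic of rank k after sorting; Python's `statistics.median` follows the same odd/even rule and serves as a cross-check.

```python
import statistics

def sample_median(data):
    """Median via order statistics: x_((n+1)/2) for odd n,
    average of x_(n/2) and x_(n/2+1) for even n."""
    xs = sorted(data)            # xs[k - 1] is the order statistic x_(k)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(sample_median([9, 2, 7]), sample_median([9, 2, 7, 4]))
print(statistics.median([9, 2, 7, 4]))  # standard-library cross-check
```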
3.1.23 midrange, n—average of the minimum and maximum values in a sample.
3.1.24 order statistic, $x_{(k)}$, n—value of the kth observed value in a sample after sorting by order of magnitude.
3.1.24.1 Discussion—
For a sample of size n, the first order statistic $x_{(1)}$ is the minimum value, and $x_{(n)}$ is the maximum value.
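Several of the simpler statistics follow from sorting alone. The sketch below (the helper name and data are hypothetical) returns the order statistics together with the minimum, maximum, range (3.1.34), and midrange (3.1.23), each computed from the first and last order statistics.

```python
def order_statistics_summary(data):
    """Order statistics and the summaries built from them (hypothetical helper)."""
    xs = sorted(data)   # xs[0] = x_(1) (minimum), xs[-1] = x_(n) (maximum)
    if not xs:
        raise ValueError("empty sample")
    minimum, maximum = xs[0], xs[-1]
    return {
        "order_statistics": xs,
        "min": minimum,
        "max": maximum,
        "range": maximum - minimum,            # R = x_(n) - x_(1)
        "midrange": (minimum + maximum) / 2,   # average of minimum and maximum
    }

summary = order_statistics_summary([4.2, 3.9, 5.1, 4.4, 4.0])  # hypothetical data
print(summary)
```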
3.1.25 parameter, n—see population parameter.
3.1.26 percentile, n—quantile of a sample or a population, for which the fraction less than or equal to the value is expressed
as a percentage.
3.1.27 population, n—the totality of items or units of material under consideration.
3.1.28 population parameter, n—summary measure of the values of some characteristic of a population. ISO 3534-2
3.1.29 prediction interval, n—an interval for a future value or set of values, constructed from a current set of data, in a way that
has a specified probability for the inclusion of the future value.
3.1.30 regression, n—the process of estimating parameter(s) of an equation using a set of data.
3.1.31 residual, n—observed value minus fitted value, when a model is used.
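Definitions 3.1.14, 3.1.18, 3.1.30, and 3.1.31 do not prescribe a fitting method. As one common choice, not one stated in this practice, the sketch below fits a straight line y = b0 + b1·x by ordinary least squares, with x as the independent variable and y as the dependent variable, and returns the residuals (observed minus fitted). The helper name and data are hypothetical.

```python
import statistics

def fit_line_and_residuals(x, y):
    """Ordinary least-squares straight line (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observed minus fitted
    return intercept, slope, residuals

b0, b1, res = fit_line_and_residuals([1, 2, 3, 4], [2.0, 4.1, 5.9, 8.0])
print(b0, b1, res)
```

For a least-squares line that includes an intercept, the residuals always sum to zero, which is a quick sanity check on any implementation.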
3.1.32 statistic, n—see sample statistic.
3.1.33 quantile, n—value such that a fraction f of the sample or population is less than or equal to that value.
3.1.34 range, R, n—maximum value minus the minimum value in a sample.
3.1.35 sample, n—a group of observations or test results, taken from a larger collection of observations or test results, which
serves to provide information that may be used as a basis for making a decision concerning the larger collection.
3.1.36 sample size, n, n—number of observed values in the sample.
3.1.37 sample statistic, n—summary measure of the observed values of a sample.
3.1.38 skewness, $\gamma_1$, $g_1$, n—for population or sample, a measure of symmetry of a distribution, calculated as the ratio of the third central moment (empirical if a sample, and theoretical if a population applies) to the standard deviation (sample, s, or population, σ) raised to the third power.
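A literal reading of 3.1.20 and 3.1.38 for a sample can be sketched as follows, assuming the empirical kth central moment is the sum of (x − mean)^k divided by n and that s uses the n − 1 divisor. Statistical packages often use other divisors or small-sample bias adjustments, so their reported skewness and kurtosis can differ from this sketch. The helper names and data are hypothetical.

```python
import statistics

def central_moment(data, k):
    """Empirical k-th central moment: sum((x - mean) ** k) / n."""
    xbar = statistics.fmean(data)
    return sum((x - xbar) ** k for x in data) / len(data)

def skewness_g1(data):
    """Third central moment divided by s ** 3 (s with divisor n - 1)."""
    return central_moment(data, 3) / statistics.stdev(data) ** 3

def excess_kurtosis_g2(data):
    """Fourth central moment divided by s ** 4, minus 3."""
    return central_moment(data, 4) / statistics.stdev(data) ** 4 - 3.0

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical, symmetric about 3
print(skewness_g1(data), excess_kurtosis_g2(data))
```

For these symmetric data the third central moment is exactly zero, while the flat shape gives a negative excess kurtosis.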
3.1.39 standard error—standard deviation of the population of values of a sample statistic in repeated sampling, or an estimate
of it.
3.1.39.1 Discussion—
If the standard error of a statistic is estimated, it will itself be a statistic with some variance that depends on the sample size.
3.1.40 standard deviation—of a population, σ, the square root of the average or expected value of the squared deviation of a
variable from its mean; —of a sample, s, the square root of the sum of the squared deviations of the observed values in the sample
divided by the sample size minus 1.
3.1.41 variance, $\sigma^2$, $s^2$, n—square of the standard deviation of the population or sample.
3.1.41.1 Discussion—
For a finite population, $\sigma^2$ is calculated as the sum of squared deviations of values from the mean, divided by the number of items in the population. For a continuous population, $\sigma^2$ is calculated by integrating $(x - \mu)^2 f(x)$ over all x, where f(x) is the density function. For a sample, $s^2$ is calculated as the sum of the squared deviations of observed values from their average divided by one less than the sample size.
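The two divisors in 3.1.41.1 can be compared side by side. In this minimal sketch (helper name and data hypothetical), the same sum of squared deviations is divided by the number of values when the list is treated as a whole finite population, and by n − 1 when it is treated as a sample; the n − 1 reflects the one parameter (the mean) estimated first, in the sense of 3.1.15. The standard deviations are the square roots.

```python
import math
import statistics

def variance(data, population=False):
    """Sum of squared deviations divided by n (finite population) or n - 1 (sample)."""
    n = len(data)
    if n < (1 if population else 2):
        raise ValueError("not enough values")
    mean = statistics.fmean(data)
    ss = sum((x - mean) ** 2 for x in data)   # sum of squared deviations
    return ss / n if population else ss / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical values
s2 = variance(data)                       # sample variance, divisor n - 1
sigma2 = variance(data, population=True)  # finite-population variance, divisor n
print(s2, math.sqrt(s2), sigma2, math.sqrt(sigma2))
```

Python's `statistics.variance` and `statistics.pvariance` implement the same two divisors and can be used as a cross-check.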
3.1.42 Z-score, n—observed value minus the sample mean divided by the sample standard deviation.
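Following 3.1.42, each observation is centered on the sample mean and scaled by the sample standard deviation. In this sketch (helper name and data hypothetical) the resulting Z-scores always have mean 0 and sample standard deviation 1.

```python
import statistics

def z_scores(data):
    """Z-score: (x - sample mean) / sample standard deviation (hypothetical helper)."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)   # divisor n - 1
    if s == 0:
        raise ValueError("Z-scores are undefined when all values are equal")
    return [(x - mean) / s for x in data]

z = z_scores([10.0, 12.0, 14.0])
print(z)
```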
4. Significance and Use
4.1 This practice provides approaches for characterizing a sample of n observations that arrive in the form of a data set. Large
data sets from organizations, businesses, and governmental agencies exist in the form of records and other empirical observations.
Research institutions and laboratories at universities, government agencies, and the private sector also generate considerable
amounts of empirical data.
4.1.1 A data set containing a single variable usually consists of a column of numbers. Each row is a separate observation or
instance of measurement of the variable. The numbers themselves are the result of applying the measurement process to the
variable being studied or observed. We may refer to each observation of a variable as an item in the data set. In many situations,
there may be several variables defined for study.
4.1.2 The sample is selected from a larger set called the population. The population can be a finite set of items, a very large
or essentially unlimited set of items, or a process. In a process, the items originate over time and the population is dynamic,
continuing to emerge and possibly change over time. Sample data serve as representatives of the population from which the sample
originates. It is the population that is of primary interest in any particular study.
4.2 The data (measurements and observations) may be of the variable type or the simple attribute type. In the case of attributes,
the data may be either binary trials or a count of a defined event over some interval (time, space, volume, weight, or area). Binary
trials consist of a sequence of 0s and 1s in which a “1” indicates that the inspected item exhibited the attribute being studied and
a “0” indicates the item did not exhibit the attribute. Each inspection item is assigned either a “0” or a “1.” Such data are often
governed by the binomial distribution. For a count of events over some interval, the number of times the event is observed on the
inspection interval is recorded for each of n inspection intervals. The Poisson distribution often governs counting events over an
interval.
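For the two kinds of attribute data in 4.2, the most basic summaries are the fraction of items coded “1” and the average count per inspection interval. These are the usual estimates of the binomial probability p and the Poisson rate λ, respectively. The helper names and data below are hypothetical.

```python
def summarize_binary_trials(trials):
    """Fraction of inspected items exhibiting the attribute (coded 1)."""
    if not trials or any(t not in (0, 1) for t in trials):
        raise ValueError("binary trials must be a nonempty list of 0s and 1s")
    return sum(trials) / len(trials)

def summarize_counts(counts):
    """Average number of events per inspection interval."""
    if not counts or any(c < 0 for c in counts):
        raise ValueError("counts must be a nonempty list of nonnegative values")
    return sum(counts) / len(counts)

trials = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1]   # hypothetical: 3 of 10 items show the attribute
counts = [2, 0, 3, 1, 4]                  # hypothetical events in 5 intervals
print(summarize_binary_trials(trials), summarize_counts(counts))
```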
4.3 For sample data to be used to draw conclusions about the population, the process of sampling and data collection must be
considered, at least potentially, repeatable. Descriptive statistics are calculated using real sample data that will vary in repeating
the sampling process. As such, a statistic is a random variable subject to variation in its own right. The sample statistic usually
has a corresponding parameter in the population that is unknown (see Section 5). The point of using a statistic is to summarize
the data set and estimate a corresponding population characteristic or parameter.
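The point of 4.3, that a statistic is itself a random variable, can be seen by repeating the sampling process. This sketch assumes a hypothetical normal population with μ = 100 and σ = 5, draws many independent samples of size 25, and records each sample mean. The spread of those means is an empirical standard error (3.1.39), which for the mean of independent observations should be close to σ/√n = 1.

```python
import math
import random
import statistics

def sample_means(population_mu, population_sigma, n, repeats, seed=7):
    """Means of repeated independent samples from a normal population (hypothetical setup)."""
    rng = random.Random(seed)   # fixed seed for a reproducible run
    return [
        statistics.fmean([rng.gauss(population_mu, population_sigma) for _ in range(n)])
        for _ in range(repeats)
    ]

means = sample_means(population_mu=100.0, population_sigma=5.0, n=25, repeats=2000)
print(min(means), max(means))    # the statistic varies from sample to sample
print(statistics.stdev(means))   # empirical standard error of the mean
print(5.0 / math.sqrt(25))       # sigma / sqrt(n)
```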
4.4 Descriptive statistics consider numerical, tabular, and graphical methods for summarizing a set of data. The methods
considered in this practice are used for summarizing the observations from a single variable.
4.5 The descriptive statistics described in this practice are:
4.5.1 Mean, median, min, max, range, midrange, order statistic, quartile, empirical percentile, quantile, interquartile range, variance, standard deviation, Z-score, coefficient of variation, skewness and kurtosis, and standard error.
4.6 Tabular methods described in this practice are:
4.6.1 Frequency distribution, relative frequency distribution, cumulative frequency distribution, and cumulative relative
frequency distribution.
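The four tabular summaries in 4.6.1 can be produced in a single pass over sorted classes. This sketch assumes equal-width, half-open classes [lower, lower + width), in line with the recommendation for equal class widths in 3.1.17.1; the helper name, class origin, width, and data are hypothetical.

```python
from collections import Counter

def frequency_tables(data, bin_width, start):
    """Frequency, relative, cumulative, and cumulative relative frequency per class.

    Hypothetical helper: classes are [start + k*bin_width, start + (k + 1)*bin_width).
    """
    if not data:
        raise ValueError("empty data set")
    if bin_width <= 0 or min(data) < start:
        raise ValueError("need bin_width > 0 and start <= smallest value")
    counts = Counter(int((x - start) // bin_width) for x in data)
    n = len(data)
    rows, cumulative = [], 0
    for k in range(max(counts) + 1):           # include empty classes, too
        f = counts.get(k, 0)
        cumulative += f
        lower = start + k * bin_width
        rows.append((lower, lower + bin_width, f, f / n, cumulative, cumulative / n))
    return rows

data = [4.1, 4.7, 5.2, 5.5, 5.9, 6.3, 6.4, 7.8]  # hypothetical observations
for row in frequency_tables(data, bin_width=1.0, start=4.0):
    print("[%.1f, %.1f)  f=%d  rel=%.3f  cum=%d  cum_rel=%.3f" % row)
```

The last cumulative relative frequency is always 1, and plotting the cumulative column against the class upper limits gives the ogive listed in 4.7.1.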
4.7 Graphical methods described in this practice are:
4.7.1 Histogram, ogive, boxplot, dotplot, normal probability plot, and q-q plot.
4.8 While the methods described in this practice may be used to summarize any set of observations, the results obtained by using
them may be of little value from the standpoint of interpretation unless the data quality is acceptable and satisfies certain
requirements. To be useful for inductive generalization, any sample of observations that is treated as a single group for presentation
purposes must represent a series of measurements, all made under essentially the same test conditions, on a material or product,
all of which have been produced under essentially the same conditions. When these criteria are met, we are minimizing the danger
of mixing two or more distinctly different sets of data.
4.8.1 If a given collection of data consists of two or more samples collected under different test conditions or representing
material produced under different conditions (that is, different populations), it should be considered as two or more separate
subgroups of
...
