ASTM D6582-00(2005)
Standard Guide for Ranked Set Sampling: Efficient Estimation of a Mean Concentration in Environmental Sampling
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information.
Designation: D6582 – 00 (Reapproved 2005)
This standard is issued under the fixed designation D 6582; the number immediately following the designation indicates the year of original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope

1.1 This guide describes ranked set sampling, discusses its relative advantages over simple random sampling, and provides examples of potential applications in environmental sampling.

1.2 Ranked set sampling is useful and cost-effective when there is an auxiliary variable that can be measured inexpensively relative to the primary variable and that is correlated with the primary variable. The resulting estimate of the mean concentration is unbiased, more precise than that from simple random sampling, and more representative of the population under a wide variety of conditions.

1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.

2. Referenced Documents

2.1 ASTM Standards:
D 5792 Practice for Generation of Environmental Data Related to Waste Management Activities: Development of Data Quality Objectives
D 6044 Guide for Representative Sampling for Management of Waste and Contaminated Media

3. Terminology

3.1 Definitions:

3.1.1 auxiliary variable, n—the secondary characteristic or measurement of interest.

3.1.1.1 Discussion—In ranked set sampling, information contained in an auxiliary variable is useful for ranking the samples. This ranking may mimic the rankings of the samples with respect to the values of the primary variable when there is correlation between the auxiliary variable and the primary variable. Auxiliary information may include visual inspection, inexpensive quick measurement, knowledge of operational history, previous site data, or any other similar information.

3.1.2 data quality objectives (DQO) process, n—a quality management tool based on the scientific method and developed by the U.S. Environmental Protection Agency (EPA) to facilitate the planning of environmental data collection activities. (D 5792)

3.1.3 equal allocation, n—this occurs when the number of sets in ranked set sampling is an integer multiple of the size of the set.

3.1.4 primary variable, n—the primary characteristic or measurement of interest.

3.1.5 ranked set sampling, n—a sampling method in which samples are ranked by the use of auxiliary information on the samples and only a subset of the samples is selected for the measurement of the primary variable.

3.1.6 representative sample, n—a sample collected in such a manner that it reflects one or more characteristics of interest (as defined by the project objectives) of a population from which it is collected. (D 6044)

3.1.6.1 Discussion—A representative sample can be a single sample, a collection of samples, or one or more composite samples. A single sample can be representative only when the population is highly homogeneous. (D 6044)

4. Significance and Use

4.1 Ranked set sampling is cost-effective, unbiased, more precise, and more representative of the population than simple random sampling under a variety of conditions (1).

4.2 Ranked set sampling (RSS) can be used when:

4.2.1 Contaminant concentrations in the population are likely to be stratified.

4.2.2 There is an auxiliary variable.

4.2.3 The auxiliary variable has strong correlation with the primary variable.

This guide is under the jurisdiction of ASTM Committee D34 on Waste Management and is the direct responsibility of Subcommittee D34.01.01 on Planning for Sampling.
Current edition approved July 1, 2005. Published August 2005. Originally approved in 2000. Last previous edition approved in 2000 as D 6582-00.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard's Document Summary page on the ASTM website.
The boldface numbers in parentheses refer to the list of references at the end of this standard.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
4.2.4 The auxiliary variable is either quick or inexpensive to measure, relative to the primary variable.

4.3 This guide provides a ranked set sampling method only under the rule of equal allocation. This guide is intended for those who manage, design, and implement sampling and analysis plans for management of wastes and contaminated media. This guide can be used in conjunction with the DQO process (see Practice D 5792).

5. Ranked Set Sampling (RSS)

5.1 Environmental sampling typically requires the identification of the locations where the samples are to be collected. Subsequent analyses of these samples to quantify the characteristics of interest allow inference on the population mean concentration from the sample data.

5.2 A simple random sampling (SRS) approach is one sampling design that can be used. In this case, a set of random samples is identified and collected from a population, and all of these samples are analyzed for the primary variable.

5.3 Ranked set sampling (RSS) is similar to SRS in the identification and collection of the samples, but only a subset of the samples is selected for analysis. The selection is done by ranking the samples using auxiliary information on the samples and selecting a subset based on those rankings.

5.4 As can be seen from the steps described below, RSS is in fact a "stratified random sampling at the sample level," meaning that stratification of the population is induced after sampling and no strata need to be constructed before sampling. The increased precision of stratified random sampling relative to SRS in estimating the population mean is well known, especially when the population is stratified by concentration.

5.5 Increased precision of RSS relative to SRS means that the same precision can be achieved under RSS with fewer samples analyzed for the primary variable. RSS is therefore more cost-effective than SRS. When the objective is to minimize sampling and analytical costs, the number of samples can be determined so that RSS has precision equal to that of SRS at a lower cost.

5.6 The actual steps to conduct RSS are given below.

5.7 Steps in Ranked Set Sampling (RSS):

5.7.1 Determine the total number of sample analyses (n) agreed to by the stakeholders. A planning process, such as the data quality objectives (DQO) process (Practice D 5792), may be used to determine this number.

5.7.2 Determine the primary variable and the auxiliary variable of interest.

5.7.3 Determine the size of the set, m. Study the auxiliary measurement and determine its capability in ranking the samples. For example, if the auxiliary measurement is visual inspection, its capability in ranking the samples may be somewhat limited: it may be capable of ranking 3–4 samples but may have difficulty ranking more than 5 or 6 samples. Thus, when ranking is based on visual inspection, the preferred set size (m) in ranked set sampling is about 3 or 4. On the other hand, an instrument-based quick test may be capable of a larger m (see 5.14 for ranking criteria).

5.7.4 Calculate the needed number of replicates, r (the number of times the ranked sets are to be repeated). Divide n by m and round up to a whole number to obtain the needed r; that is, r = n/m, rounded up to the next whole number.

5.7.5 Randomly select a total of m²r samples from the population, for example, by a simple random sampling design, and randomly divide them into r replicates, with m² (m times m) samples in each replicate.

NOTE 1—In practice, the m²r samples may not be taken all at once. More often, m random samples may be taken from a geographical sub-area of the population and then ranked according to the auxiliary variable. This is repeated m times to obtain the first replicate of m² (m times m) samples. This entire process is repeated r times to obtain the needed r replicates.

5.7.6 Start with the first replicate of m² (m times m) samples. Arrange these samples into m sets of size m (an m by m matrix).

5.7.7 For each of the m sets in this replicate, rank the samples within each set by using the auxiliary measurement on the samples. When the observations on the auxiliary variable cannot be distinguished from each other, these observations are called "ties." Ties can be broken arbitrarily (namely, by arbitrarily assigning one rank to one sample and the succeeding rank to the other).

5.7.8 Select samples for the measurement on the primary variable as follows. In set i, select and measure the sample with rank i, i = 1, 2, ..., m. Completion of this step leads to a total of m samples to be analyzed for the primary variable, out of a total of m² samples collected.

5.7.9 Repeat steps 5.7.6 through 5.7.8 r times to obtain a total of m × r = n samples to be analyzed and measured for the primary variable.

5.8 Since the number of sets (m) in step 5.7.6 equals the size of the set (m), this is called equal allocation. RSS under unequal allocation tends to have additional gains in precision relative to equal allocation; however, this gain is, in general, not large compared to the gain over SRS, and unequal allocation is not covered in this guide.

5.9 The value of n can be the total number of samples that the budget can afford to analyze.

5.10 The rounding up in step 5.7.4 may cause the total number of analyses for the primary variable to exceed n. When this is the case, there are two options:

5.10.1 Obtain buy-in from the stakeholders to accept the slightly higher total number of sample analyses, or

5.10.2 Try different values of m and r to get the total number of analyses as close to n as possible.

5.11 Estimation of Mean and Standard Error of the Mean:

5.11.1 In 5.7, if n = 12, m = 3, and r = 4, the data on the primary variable obtained from the steps in that section may be summarized as in Table 1. The true mean concentration of the characteristic of interest is estimated by the arithmetic mean of the measured samples. For the hypothetical example in Table 1 (and assuming a normal distribution of the data), the mean (M) is estimated as follows:

$$M = (X_{11} + X_{12} + X_{13} + X_{14} + X_{21} + X_{22} + \cdots + X_{34})/12 \qquad (1)$$

The standard error of the mean ($S_M$) is estimated as follows:
$$S_M = \left\{ \left[ (X_{11}-\bar{X}_{1\cdot})^2 + (X_{12}-\bar{X}_{1\cdot})^2 + (X_{13}-\bar{X}_{1\cdot})^2 + (X_{14}-\bar{X}_{1\cdot})^2 + (X_{21}-\bar{X}_{2\cdot})^2 + (X_{22}-\bar{X}_{2\cdot})^2 + (X_{23}-\bar{X}_{2\cdot})^2 + \cdots + (X_{34}-\bar{X}_{3\cdot})^2 \right] \Big/ \left[ m^2 r (r-1) \right] \right\}^{1/2} \qquad (2)$$

where:
$X_{ij}$ = the value of the primary variable from the $i$th set and the $j$th replicate, and
$\bar{X}_{i\cdot}$ = the average of set $i$.

NOTE 2—The numerator of Eq 2 is the sum of the squared differences between each value and its set average.

TABLE 1 Sample Values on Primary Variable
Set   Replicate   Value
1     1           $X_{11}$
      2           $X_{12}$
      3           $X_{13}$
      4           $X_{14}$
2     1           $X_{21}$
      2           $X_{22}$
      3           $X_{23}$
      4           $X_{24}$
3     1           $X_{31}$
      2           $X_{32}$
      3           $X_{33}$
      4           $X_{34}$

5.11.2 Given these estimates, inference about the population mean concentration can be made from the sample data (with some typical assumptions about the underlying statistical distribution of the data). This includes the use of confidence limits to estimate the population mean.

5.12 An Illustration of RSS:

5.12.1 An illustration of the steps in 5.7 and 5.11 is given in 5.12.2.1 through 5.12.2.9. The objective of this example is to estimate the mean total petroleum hydrocarbon (TPH) concentration in the soil of a 1-acre site, down to a depth of one inch from the surface. Assume that the stakeholders agree that, due to cost and other considerations, a total of 12 analyses is the limit. Further assume that coloration on the surface of the soil is positively correlated with TPH concentration, with darker color indicating higher concentration.

5.12.2 The steps to carry out ranked set sampling are as follows:

5.12.2.1 n = 12, the desired total number of analyses for the primary variable.

5.12.2.2 The primary variable = TPH concentration and the auxiliary variable = soil coloration, where coloration is observed in situ.

5.12.2.3 Professional judgment may indicate that visual inspection of soil coloration is capable of ranking three samples; thus, m = 3.

5.12.2.4 r = n/m = 12/3 = 4, the number of replicates.

5.12.2.5 R ...

... ranking of the soil coloration, giving a rank of 1 to the lightest coloration and a rank of 3 to the darkest coloration.

5.12.2.8 In the first set, select the sample with rank 1 for the measurement of the primary variable. In set 2, select the sample with rank 2 for the measurement of the primary variable, and so forth for the third set. After this step, a total of m = 3 samples have been chosen for the measurement of the primary variable.

5.12.2.9 Repeat steps 5.12.2.6 through 5.12.2.8 four times to obtain a total of m × r = 3 × 4 = 12 samples.

5.12.3 After steps 5.12.2.1 through 5.12.2.5, the 36 samples to be taken from the population may appear graphically as in Fig. 1. The samples in Fig. 1 are arbitrarily numbered from 1 through 36.

5.12.4 After steps 5.12.2.1 through 5.12.2.9, the rankings on the auxiliary variable and the measured values on the primary variable may appear as in Table 2. Each row of three samples can be called a cluster, and they are so designated in Table 2. These clusters can be marked in Fig. 1.

5.12.5 Table 3 is a summary of the data on the primary variable in Table 2. Note that the sample values of the primary variable and the bold-faced rank data in Table 2 happen to have the same ordering in all the replicates except replicate 4, implying good correlation between the auxiliary variable and the primary variable.

5.12.6 When the data of the primary variable in Table 3 follow a normal distribution, the sample mean (M) and standard error of the mean ($S_M$) can be calculated as follows:

$$M = (9 + 10 + 12 + 15 + 15 + 16 + 20 + 14 + 17 + 18 + 23 + 20)/12 = 15.75 \qquad (3)$$

$$S_M = \left\{ \left[ (9-11.5)^2 + (10-11.5)^2 + (12-11.5)^2 + (15-11.5)^2 + (15-16.25)^2 + (16-16.25)^2 + (20-16.25)^2 + (14-16.25)^2 + (17-19.5)^2 + (18-19.5)^2 + (23-19.5)^2 + (20-19.5)^2 \right] \Big/ \left[ (3^2)(4)(4-1) \right] \right\}^{1/2} = (62.75/108)^{1/2} = 0.76 \qquad (4)$$

5.12.7 One- or two-sided confidence limits (CL) can be calculated from the sample mean and sample standard error of the mean (under the assumption that the data are normally distributed). For example, two-sided 95 % confidence limits are as follows:

$$CL = M \pm t_{0.95,11} S_M = 15.75 \pm (2.201)(0.76) \qquad (5)$$

where $t_{0.95,11}$ = 2.201 is the Student's t value for two-sided 95 % confidence with n − 1 = 11 degrees of freedom.
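The selection rule of 5.7 and the estimators of 5.11 can be checked against the worked example with a short script. The following Python sketch is illustrative only and is not part of this guide: the function names `replicates_needed`, `rss_select`, and `rss_mean_se` are hypothetical, the population of (coloration score, TPH) pairs is simulated, and the t value of 2.201 is copied from Eq 5 rather than computed. Ties in the auxiliary variable are broken by keeping the random draw order, which is one arbitrary tie-break of the kind allowed in 5.7.7.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def replicates_needed(n: int, m: int) -> int:
    """Step 5.7.4: r = n/m, rounded up to a whole number."""
    return math.ceil(n / m)


def rss_select(population: Sequence[T], m: int, r: int,
               aux: Callable[[T], float], rng: random.Random) -> List[List[T]]:
    """Steps 5.7.5 to 5.7.9 under equal allocation (hypothetical helper).

    Returns picks[j][i], the unit of rank i + 1 taken from set i of
    replicate j; only these m * r units are analyzed for the primary variable.
    """
    # 5.7.5: m*m*r units by simple random sampling; rng.sample returns them
    # in random order, so consecutive slices form random replicates and sets
    pool = rng.sample(list(population), m * m * r)
    picks: List[List[T]] = []
    for j in range(r):
        replicate = pool[j * m * m:(j + 1) * m * m]   # m^2 units
        chosen: List[T] = []
        for i in range(m):
            unit_set = replicate[i * m:(i + 1) * m]    # 5.7.6: set i, size m
            # 5.7.7: rank on the auxiliary variable; sorted() is stable, so
            # ties keep their random draw order (an arbitrary tie-break)
            ranked = sorted(unit_set, key=aux)
            chosen.append(ranked[i])                   # 5.7.8: rank i + 1 in set i
        picks.append(chosen)
    return picks


def rss_mean_se(x: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Eq 1 and Eq 2; x[i][j] is the primary value for set i, replicate j."""
    m, r = len(x), len(x[0])
    mean = sum(v for row in x for v in row) / (m * r)
    # Numerator of Eq 2: squared deviations of each value from its set average
    ss = sum((v - sum(row) / r) ** 2 for row in x for v in row)
    return mean, math.sqrt(ss / (m ** 2 * r * (r - 1)))


# Worked example of 5.12.6 (Table 3): one row per set, one column per replicate
table3 = [[9, 10, 12, 15], [15, 16, 20, 14], [17, 18, 23, 20]]
M, S_M = rss_mean_se(table3)
t = 2.201  # value used in Eq 5, copied from the text rather than computed
print(f"M = {M:.2f}, S_M = {S_M:.2f}, CL = {M - t * S_M:.2f} to {M + t * S_M:.2f}")
# prints: M = 15.75, S_M = 0.76, CL = 14.07 to 17.43

# Selection on a simulated population of (coloration score, TPH) pairs;
# the data are hypothetical and serve only to exercise rss_select
rng = random.Random(1)
population = []
for _ in range(500):
    tph = rng.uniform(5.0, 30.0)
    population.append((tph + rng.gauss(0.0, 3.0), tph))
m = 3
r = replicates_needed(12, m)  # 4, as in 5.12.2.4
picks = rss_select(population, m, r, aux=lambda unit: unit[0], rng=rng)
x = [[picks[j][i][1] for j in range(r)] for i in range(m)]  # Table 1 layout
print(rss_mean_se(x))
```

With the Table 3 values, the sketch reproduces M = 15.75 (Eq 3) and $S_M$ = 0.76 (Eq 4). Because it carries the unrounded $S_M$ (about 0.762) into Eq 5, its limits differ in the second decimal from those obtained with the rounded value 0.76.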
...