ASTM D6582-00
Standard Guide for Ranked Set Sampling: Efficient Estimation of a Mean Concentration in Environmental Sampling
SCOPE
1.1 This guide describes ranked set sampling, discusses its relative advantages over simple random sampling, and provides examples of potential applications in environmental sampling.
1.2 Ranked set sampling is useful and cost-effective when there is an auxiliary variable, which can be inexpensively measured relative to the primary variable, and when the auxiliary variable has correlation with the primary variable. The resultant estimation of the mean concentration is unbiased, more precise than simple random sampling, and more representative of the population under a wide variety of conditions.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: D 6582 – 00
This standard is issued under the fixed designation D 6582; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (e) indicates an editorial change since the last revision or reapproval.
This guide is under the jurisdiction of ASTM Committee D34 on Waste Management and is the direct responsibility of Subcommittee D34.01.01 on Planning for Sampling.
Current edition approved Aug. 10, 2000. Published October 2000.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959, United States.
1. Scope
1.1 This guide describes ranked set sampling, discusses its relative advantages over simple random sampling, and provides examples of potential applications in environmental sampling.
1.2 Ranked set sampling is useful and cost-effective when there is an auxiliary variable, which can be inexpensively measured relative to the primary variable, and when the auxiliary variable has correlation with the primary variable. The resultant estimation of the mean concentration is unbiased, more precise than simple random sampling, and more representative of the population under a wide variety of conditions.
1.3 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety and health practices and determine the applicability of regulatory limitations prior to use.
2. Referenced Documents
2.1 ASTM Standards (Annual Book of ASTM Standards, Vol 11.04):
D 5792 Practice for Generation of Environmental Data Related to Waste Management Activities: Development of Data Quality Objectives
D 6044 Guide for Representative Sampling for Management of Waste and Contaminated Media
3. Terminology
3.1 Definitions:
3.1.1 auxiliary variable, n: the secondary characteristic or measurement of interest.
3.1.1.1 Discussion: In ranked set sampling, information contained in an auxiliary variable is useful for ranking the samples. This ranking may mimic the rankings of the samples with respect to the values of the primary variable when there is correlation between the auxiliary variable and the primary variable. Auxiliary information may include visual inspection, inexpensive quick measurement, knowledge of operational history, previous site data, or any other similar information.
3.1.2 data quality objectives (DQO) process, n: a quality management tool based on the scientific method and developed by the U.S. Environmental Protection Agency (EPA) to facilitate the planning of environmental data collection activities. (D 5792)
3.1.3 equal allocation, n: this occurs when the number of sets in ranked set sampling is an integer multiple of the size of the set.
3.1.4 primary variable, n: the primary characteristic or measurement of interest.
3.1.5 ranked set sampling, n: a sampling method in which samples are ranked by the use of auxiliary information on the samples and only a subset of the samples are selected for the measurement of the primary variable.
3.1.6 representative sample, n: a sample collected in such a manner that it reflects one or more characteristics of interest (as defined by the project objectives) of a population from which it is collected. (D 6044)
3.1.6.1 Discussion: A representative sample can be a single sample, a collection of samples, or one or more composite samples. A single sample can be representative only when the population is highly homogeneous. (D 6044)
4. Significance and Use
4.1 Ranked set sampling is cost-effective, unbiased, more precise and more representative of the population than simple random sampling under a variety of conditions (1). The boldface numbers in parentheses refer to the list of references at the end of this standard.
4.2 Ranked set sampling (RSS) can be used when:
4.2.1 The population is likely to have stratification in concentrations of contaminant.
4.2.2 There is an auxiliary variable.
4.2.3 The auxiliary variable has strong correlation with the primary variable.
4.2.4 The auxiliary variable is either quick or inexpensive to measure, relative to the primary variable.
4.3 This guide provides a ranked set sampling method only under the rule of equal allocation. This guide is intended for
those who manage, design, and implement sampling and analysis plans for management of wastes and contaminated media. This guide can be used in conjunction with the DQO process (see Practice D 5792).
5. Ranked Set Sampling (RSS)
5.1 Environmental sampling typically requires the identification of the locations where the samples are to be collected. Subsequent analyses of these samples to quantify the characteristics of interest allow inference on the population mean concentration from the sample data.
5.2 A simple random sampling (SRS) approach is one sampling design that can be used. In this case, a set of random samples is identified and collected from a population and all of these samples are analyzed (for the primary variable).
5.3 Ranked set sampling (RSS) is similar to SRS in the identification and collection of the samples, but only a subset of the samples are selected for analysis. The selection is done by ranking the samples using auxiliary information on the samples and selecting a subset based on the rankings of the samples.
5.4 As can be seen from the steps described below, RSS is in fact a “stratified random sampling at the sample level,” meaning that stratification of the population is induced after sampling and no construction of the strata is needed before sampling. Increased precision of stratified random sampling in the estimation of the population mean, relative to SRS, is well known, especially when the population is stratified by concentrations.
5.5 Increased precision of RSS relative to SRS means that the same precision can be achieved with fewer samples analyzed for the primary variable under RSS. RSS is therefore more cost-effective than SRS. When the objective is to minimize sampling and analytical costs, the number of samples can be determined so that RSS has precision equal to that of SRS at a lower cost.
5.6 The actual steps to conduct RSS are given below.
5.7 Steps in Ranked Set Sampling (RSS):
5.7.1 Determine the total number of sample analyses (n) agreed to by the stakeholders. A planning process, such as a data quality objectives (DQO) process (Practice D 5792), may be used to determine this number.
5.7.2 Determine the primary variable and the auxiliary variable of interest.
5.7.3 Determine the size of the set, m. Study the auxiliary measurement and determine its capability in ranking the samples. For example, if the auxiliary measurement is visual inspection, its capability in ranking the samples may be somewhat limited. Namely, it may be capable of ranking 3 to 4 samples, but may have difficulty in ranking greater than 5 or 6 samples based on visual inspection; thus, the preferred size of the set (m) in ranked set sampling is about 3 or 4. On the other hand, an instrument-based quick-test may be capable of a larger m (see 5.14 for ranking criteria).
5.7.4 Calculate the needed number of replicates, r (the number of times the ranked sets are to be repeated). Divide n by m and round it up to a whole number to obtain the needed r. Namely, r = n/m, rounded up to a whole number.
5.7.5 Randomly select a total of $m^2 r$ samples from the population, for example, by simple random sampling design, and randomly divide them into r replicates, with $m^2$ (m times m) samples in each replicate.
NOTE 1: In practice, the $m^2 r$ samples may not be taken all at once. More often, m random samples may be taken from a geographical sub-area of the population and are then ranked according to the auxiliary variable. This is repeated m times to obtain the first replicate of $m^2$ (m times m) samples. This entire process is repeated r times to obtain the needed r replicates.
5.7.6 Start with the first replicate of $m^2$ (m times m) samples. Arrange these samples into m sets of size m (an m by m matrix).
5.7.7 For each of the m sets in this replicate, rank the samples within each set by using the auxiliary measurement on the samples. When the observations on the auxiliary variable cannot be distinguished from each other, these observations are called “ties.” Ties can be broken arbitrarily (namely, arbitrarily assigning one rank to one sample and a succeeding rank to the other).
5.7.8 Select samples for the measurement on the primary variable as follows. In set i, select and measure the sample with rank i, i = 1, 2, ..., m. Completion of this step leads to a total of m samples to be analyzed for the primary variable, out of a total of $m^2$ samples collected.
5.7.9 Repeat steps 5.7.6 through 5.7.8 r times to obtain a total of m × r = n samples to be analyzed and measured for the primary variable.
5.8 Since the number of sets (m) in step 5.7.6 equals the size of the set (m), this is called equal allocation. RSS under unequal allocation tends to have additional gains in precision, relative to equal allocation; but this gain is, in general, not large compared to the gain against SRS, and is not covered in this guide.
5.9 The value of n can be the total number of samples that the budget can afford to analyze.
5.10 The rounding up in step 5.7.4 may cause the total number of analyses for the primary variable to exceed n. When this is the case, there are two options:
5.10.1 Obtain buy-in from the stakeholders to accept the slightly higher total number of sample analyses, or
5.10.2 Try different values of m and r to get the total number of analyses as close to n as possible.
5.11 Estimation of Mean and Standard Error of the Mean:
5.11.1 In 5.7, if n = 12, m = 3, and r = 4, the data on the primary variable obtained from the steps in that section may be summarized as in Table 1. The true mean concentration of the characteristic of interest is estimated by the arithmetic sample mean of the measured samples. For the hypothetical example in Table 1 (and assuming normal distribution of the data), the mean (M) is estimated as follows:
$$
M = (X_{11} + X_{12} + X_{13} + X_{14} + X_{21} + X_{22} + \cdots + X_{34})/12 \tag{1}
$$
The standard error of the mean ($S_M$) is estimated as follows:
$$
\begin{aligned}
S_M = \Big\{ \big[ & (X_{11} - \bar{X}_{1.})^2 + (X_{12} - \bar{X}_{1.})^2 + (X_{13} - \bar{X}_{1.})^2 + (X_{14} - \bar{X}_{1.})^2 \\
& + (X_{21} - \bar{X}_{2.})^2 + (X_{22} - \bar{X}_{2.})^2 + (X_{23} - \bar{X}_{2.})^2 + \cdots \\
& + (X_{34} - \bar{X}_{3.})^2 \big] / \big[ m^2 r (r - 1) \big] \Big\}^{1/2}
\end{aligned} \tag{2}
$$
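As a minimal sketch of Eq 1 and Eq 2 under equal allocation, the following Python function computes M and the standard error of the mean from an m-by-r layout in which row i holds the r replicate values for set i. The function name `rss_mean_and_se` and the list-of-lists layout are assumptions made for illustration and are not part of this guide. The example values are the twelve measurements used in the illustration in 5.12.6.

```python
import math

def rss_mean_and_se(values):
    """values[i][j]: primary-variable value for set i, replicate j (assumed layout)."""
    m = len(values)          # size of the set (number of ranks)
    r = len(values[0])       # number of replicates
    if r < 2:
        raise ValueError("at least two replicates are needed for Eq 2")
    if any(len(row) != r for row in values):
        raise ValueError("every set needs the same number of replicates")

    # Eq 1: arithmetic mean of all m*r measured values.
    mean = sum(sum(row) for row in values) / (m * r)

    # Eq 2: squared deviations of each value from its own set average,
    # divided by m^2 * r * (r - 1), then square-rooted.
    ss = 0.0
    for row in values:
        set_mean = sum(row) / r
        ss += sum((x - set_mean) ** 2 for x in row)
    se = math.sqrt(ss / (m ** 2 * r * (r - 1)))
    return mean, se

example = [[9, 10, 12, 15], [15, 16, 20, 14], [17, 18, 23, 20]]
mean, se = rss_mean_and_se(example)
print(f"M = {mean:.2f}, S_M = {se:.2f}")  # M = 15.75, S_M = 0.76
```

Deviations are taken from each set average rather than from the overall mean, so variation between ranks does not inflate the standard error.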
TABLE 1 Sample Values on Primary Variable
Set    Replicate    Value
1      1            $X_{11}$
1      2            $X_{12}$
1      3            $X_{13}$
1      4            $X_{14}$
2      1            $X_{21}$
2      2            $X_{22}$
2      3            $X_{23}$
2      4            $X_{24}$
3      1            $X_{31}$
3      2            $X_{32}$
3      3            $X_{33}$
3      4            $X_{34}$
In Eq 1 and Eq 2:
$X_{ij}$ = the value of the primary variable from the i-th set and the j-th replicate, and
$\bar{X}_{i.}$ = the average of set i.
NOTE 2: The numerator of Eq 2 represents the squared differences between a value and its set average.
5.11.2 Given these estimates, inference about the population mean concentration can be made from the sample data (with some typical assumptions about the underlying statistical distribution of the data). This includes the use of confidence limits to estimate the population mean.
5.12 An Illustration of RSS:
5.12.1 An illustration of the steps in 5.7 and 5.11 is given in 5.12.2.1 through 5.12.2.9. The objective of this example is to estimate the mean of total petroleum hydrocarbon (TPH) concentration in the soil of a 1-acre site, down to the depth of one inch from the surface. Assume that the stakeholders agree that, due to cost and other considerations, a total of 12 analyses is the limit. Further assume that coloration on the surface of the soil is positively correlated with TPH concentration, with darker color indicating higher concentration.
5.12.2 The steps to carry out ranked set sampling are as follows:
5.12.2.1 n = 12, the desired total number of analyses for the primary variable.
5.12.2.2 The primary variable = TPH concentration and the auxiliary variable = soil coloration, where coloration is observed in-situ.
5.12.2.3 Professional judgment may indicate that visual inspection of soil coloration is capable of ranking three samples; thus, m = 3.
5.12.2.4 r = n/m = 12/3 = 4, the number of replicates.
5.12.2.5 Randomly select a total of $m^2 r$ samples = (3)(3)(4) = 36 samples. Divide these samples into 4 replicates with 9 samples in each replicate.
5.12.2.6 Take the first replicate of 9 samples and arrange them into a 3 by 3 matrix; each row is called a set with set size of 3 (three samples in that set).
5.12.2.7 Rank the three samples within each of the three sets (namely, assigning ranks to the samples) according to the ranking ...
5.12.2.8 In the first set, select the sample with rank 1 for the measurement of the primary variable. In set 2, select the sample with rank 2 for the measurement of the primary variable. And so forth for the third set. After this step, a total of m = 3 samples have been chosen for the measurement of the primary variable.
5.12.2.9 Repeat steps 5.12.2.6 through 5.12.2.8 four times to obtain a total of m × r = 3 × 4 = 12 samples.
5.12.3 After steps 5.12.2.1 through 5.12.2.5, the 36 samples to be taken from the population may appear graphically as in Fig. 1. The samples in Fig. 1 are arbitrarily numbered from 1 through 36.
5.12.4 After steps 5.12.2.1 through 5.12.2.9, the rankings on the auxiliary variable and the measured values on the primary variable may appear as in Table 2. Each row of three samples can be called a cluster, and they are so designated in Table 2. These clusters can be marked in Fig. 1.
5.12.5 Table 3 is a summary of the data on the primary variable in Table 2. Note that the sample values of the primary variable and the bold-faced rank data in Table 2 happen to have the same ordering in all the replicates, except replicate 4, implying good correlation between the auxiliary variable and the primary variable.
5.12.6 When the data of the primary variable in Table 3 follow a normal distribution, the sample mean (M) and standard error of the mean ($S_M$) can be calculated as follows:
$$
M = (9 + 10 + 12 + 15 + 15 + 16 + 20 + 14 + 17 + 18 + 23 + 20)/12 = 15.75 \tag{3}
$$
$$
\begin{aligned}
S_M &= \Big\{ \big[ (9 - 11.5)^2 + (10 - 11.5)^2 + (12 - 11.5)^2 + (15 - 11.5)^2 \\
&\qquad + (15 - 16.25)^2 + (16 - 16.25)^2 + (20 - 16.25)^2 + (14 - 16.25)^2 \\
&\qquad + (17 - 19.5)^2 + (18 - 19.5)^2 + (23 - 19.5)^2 + (20 - 19.5)^2 \big] / \big[ (3^2)(4)(4 - 1) \big] \Big\}^{1/2} \\
&= (62.75/108)^{1/2} \\
&= 0.76
\end{aligned} \tag{4}
$$
5.12.7 One- or two-sided confidence limits (CL) can be calculated from the sample mean and sample standard error of the mean (under the assumption that the data are normally distributed). For example, two-sided 95 % confidence limits are as follows:
$$
\begin{aligned}
CL &= M \pm t_{0.95,11} \, S_M \\
&= 15.75 \pm (2.201)(0.76) \\
&= 14.08 \text{ to } 17.42,
\end{aligned} \tag{5}
$$
...
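The selection steps in 5.12.2.5 through 5.12.2.9 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `ranked_set_sample`, the `aux` and `measure` callables, and the synthetic population (a made-up darkness score standing in for soil coloration, paired with a made-up TPH value) are all hypothetical and do not come from this guide. Ties on the auxiliary variable are left in whatever order the sort returns them, which amounts to breaking them arbitrarily as 5.7.7 allows.

```python
import random

def ranked_set_sample(population, m, r, aux, measure, rng):
    """Return an m-by-r list: row i holds the measured values for rank i + 1.

    aux(unit)     -> cheap auxiliary score used only for ranking (assumed)
    measure(unit) -> costly primary-variable measurement (assumed)
    """
    # Draw m^2 * r units by simple random sampling without replacement.
    units = rng.sample(population, m * m * r)
    measured = [[None] * r for _ in range(m)]
    for j in range(r):                                 # one replicate at a time
        replicate = units[j * m * m:(j + 1) * m * m]   # m^2 units
        for i in range(m):                             # set i of this replicate
            one_set = replicate[i * m:(i + 1) * m]     # m units
            ranked = sorted(one_set, key=aux)          # rank by auxiliary variable
            measured[i][j] = measure(ranked[i])        # in set i, measure rank i only
    return measured

# Hypothetical population: each unit is (darkness, tph); tph loosely follows darkness.
rng = random.Random(0)
population = []
for _ in range(500):
    darkness = rng.uniform(0.0, 1.0)
    tph = 10.0 + 15.0 * darkness + rng.gauss(0.0, 2.0)
    population.append((darkness, tph))

sample = ranked_set_sample(population, m=3, r=4,
                           aux=lambda u: u[0], measure=lambda u: u[1], rng=rng)
n_measured = sum(len(row) for row in sample)
print(n_measured)  # 12 values measured out of 36 units collected
```

Only the 12 selected units are passed to `measure`; the other 24 are ranked on the auxiliary variable and then set aside, which is where the cost saving relative to analyzing every collected sample comes from.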