Standard Test Method for Sensory Analysis-Tetrad Test

SIGNIFICANCE AND USE
5.1 The test method is effective for the following test objectives:  
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change is made in ingredients, processing, packaging, handling, or storage; or  
5.1.2 To select, train, and monitor assessors.  
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change. If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are assumed to be indistinguishable (for example, HO: δ or Pd = 0) and the data are examined to determine if the assumption can be rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example, HO: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).  
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue, carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
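
Both objectives described in 5.2 come down to a one-sided binomial test on the number of correct groupings. The following is a minimal Python sketch of that idea, not the tabulated procedure of the standard. It assumes the guessing model, in which an assessor who cannot discriminate is correct with probability 1/3, so that Pc = Pd + (1 − Pd)/3. The function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb

GUESS = 1.0 / 3.0  # chance of a correct grouping when the products are indistinguishable


def pc_from_pd(pd):
    """Guessing model (assumed): discriminators are always correct, the rest guess."""
    return pd + (1.0 - pd) * GUESS


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x, n + 1))


def lower_tail(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(0, x + 1))


def p_value_difference(correct, n):
    """Testing for a difference: HO is Pd = 0, so many correct answers count against HO."""
    return upper_tail(correct, n, GUESS)


def p_value_similarity(correct, n, pd_limit):
    """Testing for similarity: HO is a difference of at least pd_limit,
    so few correct answers count against HO (evaluated at the boundary)."""
    return lower_tail(correct, n, pc_from_pd(pd_limit))


if __name__ == "__main__":
    # Hypothetical data: 30 assessors, 16 correct groupings.
    print(p_value_difference(16, 30))
    # Hypothetical data: 60 assessors, 22 correct, meaningful difference set at Pd = 0.3.
    print(p_value_similarity(22, 60, 0.3))
```

In either case the assumption HO would be rejected when the computed probability is at or below the α chosen before the test. Only the hypothesised probability and the direction of the tail differ between the two objectives, which mirrors the statement in 5.2 that the method itself does not change.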
SCOPE
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two products or to estimate the magnitude of the perceptible difference.  
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.  
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible for the difference are not identified.  
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method E2610).  
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.  
1.6 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
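
The magnitude of difference mentioned in 1.1 is expressed in this standard as either δ or Pd. As an illustration of how the two relate, the sketch below converts δ to Pc and Pd in Python. It is an assumption-laden stand-in for the sensR `rescale` call quoted in 8.2.3.4 of the sample text, not a reproduction of that package. The integral used for Pc is the psychometric function for the unspecified tetrad as commonly given in the Thurstonian literature; it is not stated in this document. The conversion from Pc to Pd assumes the guessing model with a chance probability of 1/3. All function names are hypothetical.

```python
from math import erf, exp, pi, sqrt


def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def tetrad_pc(delta, lo=-10.0, hi=10.0, steps=4000):
    """Assumed tetrad psychometric function:
    Pc = 1 - 2 * integral of phi(x) * (2*Phi(x)*Phi(x - delta) - Phi(x - delta)**2) dx,
    approximated with composite Simpson's rule (steps must be even)."""
    h = (hi - lo) / steps

    def f(x):
        shifted = Phi(x - delta)
        return phi(x) * (2.0 * Phi(x) * shifted - shifted * shifted)

    total = f(lo) + f(hi)
    for i in range(1, steps):
        # Simpson weights: 4 on odd interior points, 2 on even interior points.
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return 1.0 - 2.0 * total * h / 3.0


def pd_from_pc(pc):
    """Guessing model (assumed): Pd = (Pc - 1/3) / (2/3), floored at zero."""
    return max(0.0, (pc - 1.0 / 3.0) / (2.0 / 3.0))


if __name__ == "__main__":
    for delta in (0.5, 0.75, 1.00, 1.25, 1.50):
        pc = tetrad_pc(delta)
        print(delta, round(pc, 7), round(pd_from_pc(pc), 8))
```

Under these assumptions the printed values should agree closely with the pc and pd columns of the `rescale` output quoted in 8.2.3.4, and δ = 0 gives Pc = 1/3, the chance level stated in 3.2.4.1.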

General Information

Status
Published
Publication Date
14-Feb-2024

Relations

Standard
ASTM E3009-24 - Standard Test Method for Sensory Analysis—Tetrad Test
English language
14 pages
Standard
REDLINE ASTM E3009-24 - Standard Test Method for Sensory Analysis—Tetrad Test
English language
14 pages

Frequently Asked Questions

ASTM E3009-24 is a standard published by ASTM International. Its full title is "Standard Test Method for Sensory Analysis-Tetrad Test". This standard covers a procedure for determining whether a perceptible sensory difference exists between samples of two products, or for estimating the magnitude of that difference; its Significance and Use and Scope sections are given in full above.


ASTM E3009-24 is classified under the following ICS (International Classification for Standards) categories: 67.240 - Sensory analysis. The ICS classification helps identify the subject area and facilitates finding related standards.

ASTM E3009-24 is linked to the following standards: ASTM E3009-23a, ASTM E456-13a(2022)e1, ASTM E456-13a(2022), ASTM E3093-20, ASTM E3261-21. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.

Standards Content (Sample)


This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E3009 − 24
Standard Test Method for
Sensory Analysis—Tetrad Test
This standard is issued under the fixed designation E3009; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two products or to estimate the magnitude of the perceptible difference.
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible for the difference are not identified.
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method E2610).
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.6 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
E1885 Test Method for Sensory Analysis—Triangle Test
E2262 Practice for Estimating Thurstonian Discriminal Distances
E2610 Test Method for Sensory Analysis—Duo-Trio Test
2.2 ISO Standards:
ISO 4120 Sensory Analysis – Methodology – Triangle Test
ISO 10399 Sensory Analysis – Methodology – Duo-Trio Test
3. Terminology
3.1 Definitions—For definition of terms relating to sensory analysis, see Terminology E253, and for terms relating to statistics, see Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk—probability of concluding that a perceptible difference exists when, in reality, one does not.
3.2.1.1 Discussion—Also known as Type I Error or significance level.
3.2.2 β (beta) risk—probability of concluding that no perceptible difference exists when, in reality, one does.
3.2.2.1 Discussion—Also known as Type II Error.
3.2.3 δ—Thurstonian measure of sensory difference (effect size) relative to perceptual noise (standard deviation) (see Practice E2262).
3.2.4 Pc—the probability of obtaining a correct answer from an assessor in the test.
3.2.4.1 Discussion—If the products are indistinguishable sensorially, Pc = 1/3 in the tetrad test; while if the products are perceptibly different, Pc > 1/3.
3.2.5 Pd—proportion of assessors who can discriminate the two products in the test.
3.2.5.1 Discussion—Pd is the measure of sensory difference used in the guessing model.
3.2.6 product—material to be evaluated.
3.2.7 sample—unit of product prepared, presented, and evaluated in the test.
3.2.8 sensitivity—general term used to summarize the performance characteristics of the test.
This test method is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Test Methods.
Current edition approved Feb. 15, 2024. Published March 2024. Originally approved in 2015. Last previous edition approved in 2023 as E3009 – 23a. DOI: 10.1520/E3009-24.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from International Organization for Standardization (ISO), 1, ch. de la Voie-Creuse, CP 56, CH-1211 Geneva 20, Switzerland, http://www.iso.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.2.8.1 Discussion—The sensitivity of the test is rigorously defined, in statistical terms, by the values selected for α, β, δ, or Pd.
4. Summary of Test Method
4.1 Clearly define the test objective in writing.
4.2 Choose the number of assessors based on the objective of the test (that is, testing for a difference or testing for similarity) and the level of sensitivity desired for the test. The sensitivity of the test is, in part, a function of two competing risks, α and β, and the maximum acceptable difference between the samples, as measured by δ or Pd. The value chosen by the researcher for δ or Pd prior to the test represents the threshold of a meaningful difference, where a value larger than δ or Pd represents a meaningfully large perceptible difference. When testing for a difference, α is the risk of declaring the samples different when they are not, and β is the risk of not declaring the samples different when they are (and the difference is, in fact, equal to δ or Pd). When testing for similarity, α is the risk of declaring the samples similar when they are not, and β is the risk of not declaring the samples similar when they are. Acceptable values of α and β vary depending on the test objective and should be determined before the test (see Appendix X1 and Appendix X2).
4.3 Each assessor receives four coded samples where two samples are of one product and the other two samples are of the other product being tested. The assessors are instructed to group the four samples into two groups of two based on similarity.
4.4 Results are tallied and significance determined by reference to a statistical table or software that calculates binomial probabilities.
5. Significance and Use
5.1 The test method is effective for the following test objectives:
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change is made in ingredients, processing, packaging, handling, or storage; or
5.1.2 To select, train, and monitor assessors.
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change. If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are assumed to be indistinguishable (for example, HO: δ or Pd = 0) and the data are examined to determine if the assumption can be rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example, HO: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue, carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
6. Apparatus
6.1 Carry out the test under conditions that prevent contact between assessors until the evaluations have been completed, for example, using booths that comply with STP 913 (1).
6.2 Sample preparation and serving sizes should comply with Practice E1871. See Refs (2) or (3).
7. Assessors
7.1 All assessors must be familiar with the mechanics of the tetrad test (the format, the task, and the procedure of evaluation). Experience and familiarity with the product and test method may increase the sensitivity of an assessor and may therefore increase the likelihood of finding a significant difference. Monitoring the performance of assessors over time may be useful.
7.2 Choose assessors in accordance with test objectives. For example, if the project results are to represent the general consumer population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with demonstrated acuity should be selected.
7.3 The decision whether or not to train assessors on the samples before testing should be addressed prior to testing. Training may include a preliminary presentation on the nature of the samples and the problem concerned. For example, if the test concerns the detection of a particular taint, consider the inclusion of samples during training that demonstrate its presence and absence. Such demonstration will increase the panel’s acuity for the taint but may detract from other differences. See STP 758 for details (4). Allow adequate time between the exposure to the training samples and the actual tetrad test to avoid carryover.
7.4 During the test sessions, do not give any information about product identity, expected treatment effects, or individual performance until all testing is complete.
7.5 Avoid replicate evaluations by the same assessor whenever possible. However, if replications are needed to produce a sufficient number of total evaluations, every effort should be made to have each assessor perform the same number of replicate evaluations.
8. Number of Assessors
8.1 Choose the number of assessors to yield the level of sensitivity called for by the test objectives. The sensitivity of the test is a function of three values: α, β, and the maximum allowable sensory difference, expressed as either δ or Pd.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
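
To illustrate how α, β, and Pd in 8.1 jointly determine the number of assessors, the sketch below searches for the smallest panel size that meets a chosen α and β when testing for a difference. It is a hedged Python illustration based on exact binomial probabilities and the guessing model (chance probability 1/3); it is not the procedure behind Tables A1.1 and A1.2, nor the sensR functions named in 8.3. The function names and example inputs are hypothetical. Because exact binomial power is not strictly increasing in the number of assessors, the first qualifying size found here may differ from a published table value.

```python
from math import comb

GUESS = 1.0 / 3.0  # probability of a correct grouping by chance


def upper_tail(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x, n + 1))


def critical_value(n, alpha):
    """Smallest count c with P(X >= c | guessing) <= alpha, or None if no count qualifies."""
    for c in range(0, n + 1):
        if upper_tail(c, n, GUESS) <= alpha:
            return c
    return None


def min_assessors(pd, alpha, beta, n_max=500):
    """First n (with its critical value) whose difference test has power >= 1 - beta
    against the alternative Pc = Pd + (1 - Pd)/3. Returns None if n_max is exceeded."""
    pc = pd + (1.0 - pd) * GUESS
    for n in range(1, n_max + 1):
        c = critical_value(n, alpha)
        if c is not None and upper_tail(c, n, pc) >= 1.0 - beta:
            return n, c
    return None


if __name__ == "__main__":
    # Hypothetical targets: Pd = 0.3, alpha = 0.05, beta = 0.20.
    print(min_assessors(0.3, 0.05, 0.20))
```

The returned pair is the panel size and the minimum number of correct groupings needed to declare a difference at the chosen α. Smaller values of Pd, α, or β all push the required number of assessors upward.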
8.2 Prior to conducting the test, select values for α, β, and δ or Pd. The following can be considered as general guidelines.
8.2.1 For α-risk, when testing for a difference: a statistically significant result at:
8.2.1.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that a difference was apparent;
8.2.1.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that a difference was apparent;
8.2.1.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that a difference was apparent; and
8.2.1.4 Below 0.1 % (<0.001) indicates “very strong” evidence that a difference was apparent.
8.2.2 For α-risk, when testing for similarity: a statistically significant result at:
8.2.2.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that no meaningful difference was apparent;
8.2.2.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that no meaningful difference was apparent;
8.2.2.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that no meaningful difference was apparent; and
8.2.2.4 Below 0.1 % (<0.001) indicates “very strong” evidence that no meaningful difference was apparent.
8.2.3 For δ and Pd, the value that defines a meaningful sensory difference is affected by several factors, such as the importance of the product in the company’s portfolio, the stage in the development process at which testing is being done, etc. Presently, there is no consensus on the values of δ or Pd that represent small, medium, and large sensory differences. However, based on input from researchers who use δ in discrimination testing, the following ranges are presented as general guidance:
8.2.3.1 A more risk-averse business unit might consider
δ ≤ 0.5 to be small values,
0.5 < δ ≤ 1.0 to be medium values, and
δ > 1.0 to be large values.
8.2.3.2 A more risk-tolerant business unit might consider
δ ≤ 1.0 to be small values,
1.0 < δ ≤ 1.5 to be medium values, and
δ > 1.5 to be large values.
8.2.3.3 Smaller values of δ are usually chosen in late-stage testing or for very important products in the company’s portfolio, whereas larger values are chosen, for example, in early-stage testing or for less important products.
8.2.3.4 The value of Pd that corresponds to a value chosen for δ can be obtained, for example, by using software such as the rescale function in the R package sensR (5).
For example,
> library(sensR)
> rescale(d.prime=c(0.5,0.75,1.00,1.25,1.50), method="tetrad")
will yield the following output:
Estimates for the tetrad protocol:
         pc         pd d.prime
1 0.3777187 0.06657806    0.50
2 0.4290391 0.14355862    0.75
3 0.4938084 0.24071264    1.00
4 0.5663366 0.34950487    1.25
5 0.6409366 0.46140484    1.50
8.3 Having defined the required level of sensitivity for the test using 8.2, use Table A1.1 (when testing for a difference) or Table A1.2 (when testing for similarity) to determine the number of assessors necessary. Enter the appropriate table in the section corresponding to the selected value of δ or Pd and the column corresponding to the selected value of β. The minimum required number of assessors is found in the row corresponding to the selected value of α. Alternatively, Tables A1.1 and A1.2 can be used to develop a set of values for δ or Pd, α, and β that provide acceptable sensitivity while maintaining the number of assessors within practical limits. The approach is presented in detail in Ref (6). Software that performs the same calculations may also be used, for example by using the discrimSS or d.primeSS function in the R package sensR.
8.4 Often in practice, the number of assessors is determined by material conditions (for example, duration of the experiment, number of available assessors, quantity of product). Increasing the number of assessors increases the likelihood of detecting small values of δ or Pd.
9. Procedure
9.1 Prepare worksheets and scoresheets, either manually or using software designed for this purpose (see Appendix X3), in advance of the test so as to utilize an equal number of the six possible sequences of two products, A and B:
AABB BBAA
ABAB BABA
ABBA BAAB
Distribute these at random among the assessors so that serving order is balanced.
9.2 Present each set of four samples simultaneously if possible, following the same spatial arrangement for each assessor. Within the set of four samples, assessors are typically allowed to make repeated evaluations, for example, retasting, of each sample as desired. If the conditions of the test require the prevention of repeat evaluations, for example, if samples are too large to serve simultaneously or leave an aftertaste, present the samples sequentially and do not allow repeated evaluations. In addition, if the samples change over time, for example, cereal with milk, samples should be tested sequentially (or consider using an alternative testing method).
9.3 Instruct the assessors to evaluate the four test samples in the order presented. The assessor should then group the four samples into two groups of two based on similarity. It is critical that the instructions to the assessors say, “Group the four samples into two groups of two based on similarity,” and not, “Identify the two samples that are most similar to each other.” The latter wording does not correctly represent the tetrad task the assessor is to perform. It should be confirmed that the assessors understand the instructions and the tetrad task in general, for example, when they are being familiarized with the mechanics of the test.
9.4 Each scoresheet should provide for a single group of samples. If a different set of products is to be evaluated by an assessor in a single session, the completed scoresheet and any remaining product from the evaluation just completed should be returned to the test administrator prior to receiving the
subsequent set of test samples. The assessor cannot go back to any of the previous samples or change the verdict on any previous test.
9.5 Do not ask questions about preference, acceptance, or degree of difference after the initial grouping of samples into pairs. The selection the assessor has just made may bias the reply to any additional questions. Responses to such questions may be obtained through separate tests for preference, acceptance, degree of difference, etc. (see Manual 26) (7). A comment section asking why the choice was made may be included for the assessor’s remarks.
9.6 The tetrad test is a forced-choice procedure; assessors are not allowed the option of reporting “no difference.” An assessor who detects no difference between the samples and requests to report “no difference,” should be instructed to group the test samples into two pairs randomly. In such situations the assessor can indicate that the selection was only a guess in the comments section of the scoresheet.
10. Analysis and Interpretation of Results
10.1 Prior to conducting the test decide whether the objective of the test is to determine that the products are perceptibly different or that the products are sufficiently similar to be used interchangeably.
10.1.1 If the objective is to determine that the products are perceptibly different (that is, testing for a difference), then the null and alternative hypotheses for the test are:
HO: δ or Pd = 0 (that is, Pc = 1/3) versus HA: δ or Pd > 0 (that is, Pc > 1/3).
10.1.2 If the objective is to determine that the products are sufficiently similar to use interchangeably (that is, testing for similarity), then the null and alternative hypotheses for the test are (ass
...
similarity, use Table A1.4 to analyze the data from a tetrad test. If the number of correct answers is less than or equal to the number given in Table A1.4, conclude that the samples are sufficiently similar to be used interchangeably. Again, the conclusions are based on the risks accepted when the level of sensitivity (that is, α, β, and δ or Pd) was selected.
11. Report
11.1 Report the test objective, the results, and the conclusions. The following additional information is recommended:
11.1.1 The purpose of the test and the nature of the treatment studied;
11.1.2 Full Identification of the Samples—Origin, age, lot number, packaging, where obtained, method of preparation, quantity, shape, storage prior to testing, serving size, temperature. (Sample information should communicate that all storage, handling, and preparation was done in such a way as to yield samples that differ only due to the variable of interest, if at all.);
11.1.3 The number of assessors, the number of correct selections, and the result of the statistical evaluation;
11.1.4 Assessors—Age, gender, experience in sensory testing, experience with the product category, experience with the samples in the test;
11.1.5 Any information and any specific instructions given to the assessor in connection with the test;
11.1.6 The test environment: use of booths, simultaneous or sequential presentation, light conditions, whether the identity of the samples was disclosed after the test, and the manner in which it was done; and
11.1.7 The location and date of the test and the name of the test administrator.
12. Precision and Bias
...


This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E3009 − 24 (formerly E3009 − 23a)
Standard Test Method for
Sensory Analysis—Tetrad Test
This standard is issued under the fixed designation E3009; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two
products or to estimate the magnitude of the perceptible difference.
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible
for the difference are not identified.
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method
E2610).
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of
regulatory limitations prior to use.
1.6 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
E1885 Test Method for Sensory Analysis—Triangle Test
E2262 Practice for Estimating Thurstonian Discriminal Distances
E2610 Test Method for Sensory Analysis—Duo-Trio Test
2.2 ISO Standards:
ISO 4120 Sensory Analysis – Methodology – Triangle Test
ISO 10399 Sensory Analysis – Methodology – Duo-Trio Test
This test method is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Test Methods.
Current edition approved Feb. 15, 2024. Published March 2024. Originally approved in 2015. Last previous edition approved in 2023 as
E3009 – 23a. DOI: 10.1520/E3009-24.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from International Organization for Standardization (ISO), 1, ch. de la Voie-Creuse, CP 56, CH-1211 Geneva 20, Switzerland, http://www.iso.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions—For definition of terms relating to sensory analysis, see Terminology E253, and for terms relating to statistics, see
Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk—probability of concluding that a perceptible difference exists when, in reality, one does not.
3.2.1.1 Discussion—
Also known as Type I Error or significance level.
3.2.2 β (beta) risk—probability of concluding that no perceptible difference exists when, in reality, one does.
3.2.2.1 Discussion—
Also known as Type II Error.
3.2.3 δ—Thurstonian measure of sensory difference (effect size) relative to perceptual noise (standard deviation) (see Practice
E2262).
3.2.4 Pc—the probability of obtaining a correct answer from an assessor in the test.
3.2.4.1 Discussion—
If the products are indistinguishable sensorially, Pc = 1/3 in the tetrad test; while if the products are perceptibly different, Pc > 1/3.
3.2.5 Pd—proportion of assessors who can discriminate the two products in the test.
3.2.5.1 Discussion—
Pd is the measure of sensory difference used in the guessing model.
3.2.6 product—material to be evaluated.
3.2.7 sample—unit of product prepared, presented, and evaluated in the test.
3.2.8 sensitivity—general term used to summarize the performance characteristics of the test.
3.2.8.1 Discussion—
The sensitivity of the test is rigorously defined, in statistical terms, by the values selected for α, β, δ, or Pd.
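The definitions of Pc and Pd in 3.2.4 and 3.2.5 can be illustrated with a short sketch. It is not part of the standard. It assumes the usual guessing model, in which a discriminator always groups the samples correctly and a non-discriminator picks one of the three possible groupings at random; the function names are hypothetical.

```python
def possible_groupings(samples):
    """Return every way to split four samples into two unordered pairs."""
    first = samples[0]
    groupings = []
    for partner in samples[1:]:
        pair_a = (first, partner)
        pair_b = tuple(s for s in samples if s not in pair_a)
        groupings.append((pair_a, pair_b))
    return groupings


def pc_from_pd(pd):
    """Guessing model (assumed): discriminators are correct, others guess with chance 1/3."""
    return pd + (1.0 - pd) / 3.0


def pd_from_pc(pc):
    """Inverse of pc_from_pd; proportions below chance are truncated to 0."""
    return max(0.0, (pc - 1.0 / 3.0) / (2.0 / 3.0))


# Two samples of product A and two of product B.
samples = ["A1", "A2", "B1", "B2"]
groupings = possible_groupings(samples)
correct = [g for g in groupings if g == (("A1", "A2"), ("B1", "B2"))]
chance = len(correct) / len(groupings)  # one correct grouping out of three
print(len(groupings), chance, pc_from_pd(0.0), pc_from_pd(1.0))
```

Only one of the three groupings pairs the like samples, which is why Pc = 1/3 when the products are indistinguishable.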
4. Summary of Test Method
4.1 Clearly define the test objective in writing.
4.2 Choose the number of assessors based on the objective of the test (that is, testing for a difference or testing for similarity) and
the level of sensitivity desired for the test. The sensitivity of the test is, in part, a function of two competing risks, α and β, and
the maximum acceptable difference between the samples, as measured by δ or Pd. The value chosen by the researcher for δ or Pd
prior to the test represents the threshold of a meaningful difference, where a value larger than δ or Pd represents a meaningfully
large perceptible difference. When testing for a difference, α is the risk of declaring the samples different when they are not, and
β is the risk of not declaring the samples different when they are (and the difference is, in fact, equal to δ or Pd). When testing
for similarity, α is the risk of declaring the samples similar when they are not, and β is the risk of not declaring the samples
similar when they are. Acceptable values of α and β vary depending on the test objective and should be determined before the test
(see Appendix X1 and Appendix X2).
4.3 Each assessor receives four coded samples where two samples are of one product and the other two samples are of the other
product being tested. The assessors are instructed to group the four samples into two groups of two based on similarity.
4.4 Results are tallied and significance determined by reference to a statistical table or software that calculates binomial
probabilities.
5. Significance and Use
5.1 The test method is effective for the following test objectives:
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change
is made in ingredients, processing, packaging, handling, or storage; or
5.1.2 To select, train, and monitor assessors.
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different
versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change.
If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are
assumed to be indistinguishable (for example, H0: δ or Pd = 0) and the data are examined to determine if the assumption can be
rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are
sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example,
H0: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption
can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue,
carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
6. Apparatus
6.1 Carry out the test under conditions that prevent contact between assessors until the evaluations have been completed, for
example, using booths that comply with STP 913 (1).
6.2 Sample preparation and serving sizes should comply with Practice E1871. See Refs (2) or (3).
7. Assessors
7.1 All assessors must be familiar with the mechanics of the tetrad test (the format, the task, and the procedure of evaluation).
Experience and familiarity with the product and test method may increase the sensitivity of an assessor and may therefore increase
the likelihood of finding a significant difference. Monitoring the performance of assessors over time may be useful.
7.2 Choose assessors in accordance with test objectives. For example, if the project results are to represent the general consumer
population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with
demonstrated acuity should be selected.
7.3 The decision whether or not to train assessors on the samples before testing should be addressed prior to testing. Training may
include a preliminary presentation on the nature of the samples and the problem concerned. For example, if the test concerns the
detection of a particular taint, consider the inclusion of samples during training that demonstrate its presence and absence. Such
demonstration will increase the panel’s acuity for the taint but may detract from other differences. See STP 758 for details (4).
Allow adequate time between the exposure to the training samples and the actual tetrad test to avoid carryover.
7.4 During the test sessions, do not give any information about product identity, expected treatment effects, or individual
performance until all testing is complete.
7.5 Avoid replicate evaluations by the same assessor whenever possible. However, if replications are needed to produce a sufficient
number of total evaluations, every effort should be made to have each assessor perform the same number of replicate evaluations.
8. Number of Assessors
8.1 Choose the number of assessors to yield the level of sensitivity called for by the test objectives. The sensitivity of the test is
a function of three values: α, β, and the maximum allowable sensory difference, expressed as either δ or Pd.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
8.2 Prior to conducting the test, select values for α, β, and δ or Pd. The following can be considered as general guidelines.
8.2.1 For α-risk, when testing for a difference: a statistically significant result at:
8.2.1.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that a difference was apparent;
8.2.1.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that a difference was apparent;
8.2.1.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that a difference was apparent; and
8.2.1.4 Below 0.1 % (<0.001) indicates “very strong” evidence that a difference was apparent.
8.2.2 For α-risk, when testing for similarity: a statistically significant result at:
8.2.2.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that no meaningful difference was apparent;
8.2.2.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that no meaningful difference was apparent;
8.2.2.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that no meaningful difference was apparent; and
8.2.2.4 Below 0.1 % (<0.001) indicates “very strong” evidence that no meaningful difference was apparent.
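The evidence categories in 8.2.1 and 8.2.2 can be written as a simple lookup. The sketch below is illustrative only; the function name is hypothetical, and assigning each boundary value (0.10, 0.05, 0.01) to the stronger category is an assumption, since the ranges in the guideline share their endpoints.

```python
def evidence_label(significance):
    """Map an attained significance level to the descriptive categories of 8.2.1 and 8.2.2."""
    if not 0.0 <= significance <= 1.0:
        raise ValueError("significance must lie between 0 and 1")
    if significance < 0.001:
        return "very strong"
    if significance <= 0.01:  # boundary handling is an assumption
        return "strong"
    if significance <= 0.05:
        return "moderate"
    if significance <= 0.10:
        return "slight"
    return "not significant at 10 %"


for level in (0.08, 0.03, 0.004, 0.0002, 0.25):
    print(level, evidence_label(level))
```

The same labels apply to both objectives; only the statement being supported changes (a difference was apparent, or no meaningful difference was apparent).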
8.2.3 For δ and Pd, the value that defines a meaningful sensory difference is affected by several factors, such as the importance
of the product in the company’s portfolio, the stage in the development process at which testing is being done, etc.
Presently, there is no consensus on the values of δ or Pd that represent small, medium, and large sensory differences. However,
based on input from researchers who use δ in discrimination testing, the following ranges are presented as general guidance:
8.2.3.1 A more risk-averse business unit might consider
δ ≤ 0.5 to be small values,
0.5 < δ ≤ 1.0 to be medium values, and
δ > 1.0 to be large values.
8.2.3.2 A more risk-tolerant business unit might consider
δ ≤ 1.0 to be small values,
1.0 < δ ≤ 1.5 to be medium values, and
δ > 1.5 to be large values.
8.2.3.3 Smaller values of δ are usually chosen in late-stage testing or for very important products in the company’s portfolio,
whereas larger values are chosen, for example, in early-stage testing or for less important products.
8.2.3.4 The value of Pd that corresponds to a value chosen for δ can be obtained, for example, by using software such as the
rescale function in the R package sensR (5).
For example,
library(sensR)
rescale(d.prime = c(0.5, 0.75, 1.00, 1.25, 1.50), method = "tetrad")
will yield the following output:
Estimates for the tetrad protocol:
         pc         pd d.prime
1 0.3777187 0.06657806    0.50
2 0.4290391 0.14355862    0.75
3 0.4938084 0.24071264    1.00
4 0.5663366 0.34950487    1.25
5 0.6409366 0.46140484    1.50
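For readers without R, the conversion from δ to Pc and Pd can be approximated with the standard library alone. The sketch below is not the sensR implementation. It assumes an equal-variance normal (Thurstonian) model in which the assessor groups the two weakest and the two strongest percepts, so the response is correct when the percepts of the two products do not overlap; the integral is evaluated with composite Simpson's rule, and all function names and integration limits are hypothetical choices.

```python
import math


def _phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tetrad_pc(delta, lower=-10.0, upper=10.0, steps=4000):
    """Probability of a correct grouping for a given delta (steps must be even)."""
    def integrand(z):
        # Larger A percept equals z and both B percepts lie above it.
        both_a_below = 2.0 * _phi(z) * _Phi(z) * (1.0 - _Phi(z - delta)) ** 2
        # Larger B percept equals z and both A percepts lie above it.
        both_b_below = 2.0 * _phi(z - delta) * _Phi(z - delta) * (1.0 - _Phi(z)) ** 2
        return both_a_below + both_b_below

    top = upper + max(delta, 0.0)
    h = (top - lower) / steps
    total = integrand(lower) + integrand(top)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lower + i * h)
    return total * h / 3.0


def pd_from_delta(delta):
    """Proportion of discriminators implied by delta under the guessing model."""
    return (tetrad_pc(delta) - 1.0 / 3.0) / (2.0 / 3.0)


for d in (0.5, 0.75, 1.0, 1.25, 1.5):
    print(d, round(tetrad_pc(d), 4), round(pd_from_delta(d), 4))
```

The printed values can be compared against the sensR output shown in 8.2.3.4 as a check on the sketch.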
8.3 Having defined the required level of sensitivity for the test using 8.2, use Table A1.1 (when testing for a difference) or Table
A1.2 (when testing for similarity) to determine the number of assessors necessary. Enter the appropriate table in the section
corresponding to the selected value of δ or Pd and the column corresponding to the selected value of β. The minimum required
number of assessors is found in the row corresponding to the selected value of α. Alternatively, Tables A1.1 and A1.2 can be used
to develop a set of values for δ or Pd, α, and β that provide acceptable sensitivity while maintaining the number of assessors within
practical limits. The approach is presented in detail in Ref (6). Software that performs the same calculations may also be used,
for example, the discrimSS or d.primeSS function in the R package sensR.
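The logic behind such a sample-size calculation can be sketched with exact binomial probabilities. This is an illustration for the difference-testing case only, under the guessing model, and it is not a substitute for Tables A1.1 and A1.2 or for sensR; because exact binomial power does not increase smoothly with the number of assessors, the first qualifying number found here may differ from a tabulated value. The function names are hypothetical.

```python
from math import comb


def upper_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(x, n + 1))


def critical_value(n, alpha, p0=1.0 / 3.0):
    """Smallest count of correct responses whose upper-tail probability under p0 is <= alpha."""
    for x in range(n + 1):
        if upper_tail(n, x, p0) <= alpha:
            return x
    return None  # no count is significant at this alpha with n assessors


def assessors_for_difference(alpha, beta, pd, max_n=500):
    """First n at which the exact difference test reaches power >= 1 - beta."""
    pc_alt = pd + (1.0 - pd) / 3.0  # guessing model (assumed)
    for n in range(1, max_n + 1):
        x_crit = critical_value(n, alpha)
        if x_crit is not None and upper_tail(n, x_crit, pc_alt) >= 1.0 - beta:
            return n, x_crit
    return None


result = assessors_for_difference(alpha=0.05, beta=0.20, pd=0.30)
print(result)  # (number of assessors, minimum correct responses)
```

Lowering α, β, or Pd in the call shows how quickly the required number of assessors grows.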
8.4 Often in practice, the number of assessors is determined by material conditions (for example, duration of the experiment,
number of available assessors, quantity of product). Increasing the number of assessors increases the likelihood of detecting small
values of δ or Pd.
9. Procedure
9.1 Prepare worksheets and scoresheets, either manually or using software designed for this purpose (see Appendix X3), in
advance of the test so as to utilize an equal number of the six possible sequences of two products, A and B:
AABB BBAA
ABAB BABA
ABBA BAAB
Distribute these at random among the assessors so that serving order is balanced.
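A balanced, randomized serving plan of this kind can be generated in a few lines. The sketch below is one possible approach, not the software referred to in Appendix X3; the function names are hypothetical, and when the number of assessors is not a multiple of six the leftover assessors receive distinct sequences chosen at random.

```python
import random

SEQUENCES = ["AABB", "ABAB", "ABBA", "BBAA", "BABA", "BAAB"]


def serving_plan(n_assessors, seed=None):
    """Assign the six sequences as equally as possible, in random order."""
    rng = random.Random(seed)
    full_blocks, remainder = divmod(n_assessors, len(SEQUENCES))
    plan = SEQUENCES * full_blocks + rng.sample(SEQUENCES, remainder)
    rng.shuffle(plan)
    return plan


def correct_grouping(sequence):
    """Serving positions (1-based) of the two A samples and the two B samples."""
    a = tuple(i + 1 for i, s in enumerate(sequence) if s == "A")
    b = tuple(i + 1 for i, s in enumerate(sequence) if s == "B")
    return a, b


plan = serving_plan(24, seed=1)
for assessor, sequence in enumerate(plan[:3], start=1):
    print(assessor, sequence, correct_grouping(sequence))
```

Recording the correct grouping alongside each assigned sequence simplifies tallying the scoresheets afterward.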
9.2 Present each set of four samples simultaneously if possible, following the same spatial arrangement for each assessor. Within
the set of four samples, assessors are typically allowed to make repeated evaluations, for example, retasting, of each sample as
desired. If the conditions of the test require the prevention of repeat evaluations, for example, if samples are too large to serve
simultaneously or leave an aftertaste, present the samples sequentially and do not allow repeated evaluations. In addition, if the
samples change over time, for example, cereal with milk, samples should be tested sequentially (or consider using an alternative
testing method).
9.3 Instruct the assessors to evaluate the four test samples in the order presented. The assessor should then group the four samples
into two groups of two based on similarity. It is critical that the instructions to the assessors say, “Group the four samples into two
groups of two based on similarity,” and not, “Identify the two samples that are most similar to each other.” The latter wording does
not correctly represent the tetrad task the assessor is to perform. It should be confirmed that the assessors understand the
instructions and the tetrad task in general, for example, when they are being familiarized with the mechanics of the test.
9.4 Each scoresheet should provide for a single group of samples. If a different set of products is to be evaluated by an assessor
in a single session, the completed scoresheet and any remaining product from the evaluation just completed should be returned to
the test administrator prior to receiving the subsequent set of test samples. The assessor cannot go back to any of the previous
samples or change the verdict on any previous test.
9.5 Do not ask questions about preference, acceptance, or degree of difference after the initial grouping of samples into pairs. The
selection the assessor has just made may bias the reply to any additional questions. Responses to such questions may be obtained
through separate tests for preference, acceptance, degree of difference, etc. (see Manual 26) (7). A comment section asking why
the choice was made may be included for the assessor’s remarks.
9.6 The tetrad test is a forced-choice procedure; assessors are not allowed the option of reporting “no difference.” An assessor who
detects no difference between the samples and requests to report “no difference” should be instructed to group the test samples
into two pairs randomly. In such situations the assessor can indicate that the selection was only a guess in the comments section
of the scoresheet.
10. Analysis and Interpretation of Results
10.1 Prior to conducting the test, decide whether the objective of the test is to determine that the products are perceptibly different
or that the products are sufficiently similar to be used interchangeably.
10.1.1 If the objective is to determine that the products are perceptibly different (that is, testing for a difference), then the null and
alternative hypotheses for the test are:
H0: δ or Pd = 0 (that is, Pc = 1/3) versus HA: δ or Pd > 0 (that is, Pc > 1/3).
10.1.2 If the objective is to determine that the products are sufficiently similar to use interchangeably (that is, testing for
similarity), then the null and alternative hypotheses for the test are (assuming that the value chosen to represent a meaningful
difference has been specified as d):
10.1.2.1 H0: δ or Pd > d versus HA: δ or Pd ≤ d.
10.2 Use Table A1.3 to analyze the data obtained from a tetrad test when testing for a difference. If the number of correct responses
is greater than or equal to the number given in Table A1.3, conclude that a perceptible difference exists between the samples.
Failure to obtain a significant difference using Table A1.3 does not allow the researcher to conclude that the samples are
significantly similar. Instead, when testing for similarity, use Table A1.4 to analyze the data from a tetrad test. If the number of
correct answers is less than or equal to the number given in Table A1.4, conclude that the samples are sufficiently similar to be
used interchangeably. Again, the conclusions are based on the risks accepted when the level of sensitivity (that is, α, β, and δ or
Pd) was selected.
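Where software is used in place of Tables A1.3 and A1.4, the two decisions can be sketched with exact binomial probabilities. The code below is an illustration, not a reproduction of the tables: it assumes one-sided exact binomial tests and, for similarity, the guessing model with the meaningful difference expressed as a limit on Pd. The function names and the example counts are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); returns 0 when x is negative."""
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(0, x + 1))


def difference_test(correct, n, alpha):
    """Test H0: Pc = 1/3 against HA: Pc > 1/3; large counts of correct responses reject H0."""
    p_value = 1.0 - binom_cdf(correct - 1, n, 1.0 / 3.0)
    return p_value, p_value <= alpha


def similarity_test(correct, n, alpha, pd_limit):
    """Test H0: Pd > pd_limit against HA: Pd <= pd_limit; small counts reject H0."""
    pc_limit = pd_limit + (1.0 - pd_limit) / 3.0  # guessing model (assumed)
    p_value = binom_cdf(correct, n, pc_limit)
    return p_value, p_value <= alpha


# Hypothetical results from 60 assessors.
print(difference_test(correct=30, n=60, alpha=0.05))
print(similarity_test(correct=20, n=60, alpha=0.05, pd_limit=0.30))
```

A non-significant difference test does not establish similarity, which is why the two functions use different null hypotheses and opposite tails.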
11. Report
11.1 Report the test objective, the results, and the conclusions. The following additional information is recommended:
11.1.1 The purpose of the test and the nature of the treatment studied;
11.1.2 Full Identification of the Samples—Origin, age, lot number, packaging, where obtained, method of preparation, quantity,
shape, storage prio
...
