ASTM E3009-23a
Standard Test Method for Sensory Analysis-Tetrad Test
SIGNIFICANCE AND USE
5.1 The test method is effective for the following test objectives:
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change is made in ingredients, processing, packaging, handling, or storage; or
5.1.2 To select, train, and monitor assessors.
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change. If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are assumed to be indistinguishable (for example, H0: δ or Pd = 0) and the data are examined to determine if the assumption can be rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example, H0: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue, carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
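The two hypothesis set-ups described in 5.2 can be sketched with exact binomial tail probabilities. This is a minimal illustration and not part of the standard: the function names are hypothetical, the chance level of 1/3 is the tetrad guessing probability, and the similarity test assumes the usual guessing model in which Pc = Pd + (1 - Pd)/3.

```python
from math import comb

CHANCE = 1 / 3  # probability of a correct grouping by guessing in the tetrad test


def binom_pmf(k, n, p):
    """Probability of exactly k correct responses out of n."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_value_difference(correct, n):
    """Testing for a difference: P(X >= correct) when Pd = 0, that is, Pc = 1/3."""
    return sum(binom_pmf(k, n, CHANCE) for k in range(correct, n + 1))


def p_value_similarity(correct, n, pd_limit):
    """Testing for similarity: P(X <= correct) when Pd equals the chosen limit.

    pd_limit is the value chosen to represent a meaningful difference
    (assumed here to be expressed as Pd, not as a Thurstonian delta).
    """
    pc_limit = pd_limit + (1 - pd_limit) * CHANCE
    return sum(binom_pmf(k, n, pc_limit) for k in range(0, correct + 1))
```

In either case the resulting probability is compared with the α chosen before the test, for example `p_value_difference(18, 36) <= 0.05`.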
SCOPE
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two products or to estimate the magnitude of the perceptible difference.
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible for the difference are not identified.
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method E2610).
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.6 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
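The efficiency statement in 1.4 can be illustrated with a small simulation. The sketch below is not taken from the standard. It assumes a one-dimensional, equal-variance Thurstonian model with unit perceptual noise, a triangle assessor who treats the two closest percepts as the matching pair, and a tetrad assessor who groups the two weakest and the two strongest percepts. The function name and defaults are hypothetical.

```python
import random


def simulate_pc(delta, trials=100_000, seed=1):
    """Estimate the proportion correct in the triangle and tetrad tests.

    delta is the distance between the two product means in units of
    perceptual standard deviation.
    """
    rng = random.Random(seed)
    triangle_correct = 0
    tetrad_correct = 0
    for _ in range(trials):
        # Triangle: two samples of product A (mean 0), one of product B.
        a1, a2, b = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(delta, 1)
        if abs(a1 - a2) < min(abs(a1 - b), abs(a2 - b)):
            triangle_correct += 1
        # Tetrad: two samples of each product.
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1, y2 = rng.gauss(delta, 1), rng.gauss(delta, 1)
        if max(x1, x2) < min(y1, y2) or max(y1, y2) < min(x1, x2):
            tetrad_correct += 1
    return triangle_correct / trials, tetrad_correct / trials
```

Both tests share a guessing probability of 1/3, so under these assumptions a higher proportion correct at the same delta means fewer assessors are needed for the same α and β. Calling `simulate_pc(0.0)` should return two values near 1/3, and calling it with a positive delta should return a larger value for the tetrad test than for the triangle test.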
Frequently Asked Questions
ASTM E3009-23a is a standard published by ASTM International. Its full title is "Standard Test Method for Sensory Analysis-Tetrad Test".
ASTM E3009-23a is classified under the following ICS (International Classification for Standards) categories: 67.240 - Sensory analysis. The ICS classification helps identify the subject area and facilitates finding related standards.
ASTM E3009-23a has the following relationships with other standards: it is linked to ASTM E3009-23, ASTM E3009-24, ASTM E3261-21, and ASTM E3093-20. Understanding these relationships helps ensure you are using the most current and applicable version of the standard.
Standards Content (Sample)
This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the
Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
Designation: E3009 − 23a
Standard Test Method for
Sensory Analysis—Tetrad Test
This standard is issued under the fixed designation E3009; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two products or to estimate the magnitude of the perceptible difference.
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible for the difference are not identified.
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method E2610).
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of regulatory limitations prior to use.
1.6 This international standard was developed in accordance with internationally recognized principles on standardization established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
E1885 Test Method for Sensory Analysis—Triangle Test
E2262 Practice for Estimating Thurstonian Discriminal Distances
E2610 Test Method for Sensory Analysis—Duo-Trio Test
2.2 ISO Standards:
ISO 4120 Sensory Analysis – Methodology – Triangle Test
ISO 10399 Sensory Analysis – Methodology – Duo-Trio Test
3. Terminology
3.1 Definitions—For definition of terms relating to sensory analysis, see Terminology E253, and for terms relating to statistics, see Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk—probability of concluding that a perceptible difference exists when, in reality, one does not.
3.2.1.1 Discussion—Also known as Type I Error or significance level.
3.2.2 β (beta) risk—probability of concluding that no perceptible difference exists when, in reality, one does.
3.2.2.1 Discussion—Also known as Type II Error.
3.2.3 δ—Thurstonian measure of sensory difference (effect size) relative to perceptual noise (standard deviation) (see Practice E2262).
3.2.4 Pc—the probability of obtaining a correct answer from an assessor in the test.
3.2.4.1 Discussion—If the products are indistinguishable sensorially, Pc = 1/3 in the tetrad test; while if the products are perceptibly different, Pc > 1/3.
3.2.5 Pd—proportion of assessors who can discriminate the two products in the test.
3.2.5.1 Discussion—Pd is the measure of sensory difference used in the guessing model.
3.2.6 product—material to be evaluated.
3.2.7 sample—unit of product prepared, presented, and evaluated in the test.
3.2.8 sensitivity—general term used to summarize the performance characteristics of the test.
This test method is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Test Methods.
Current edition approved Dec. 15, 2023. Published December 2023. Originally approved in 2015. Last previous edition approved in 2023 as E3009 – 23. DOI: 10.1520/E3009-23A.
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from International Organization for Standardization (ISO), 1, ch. de la Voie-Creuse, CP 56, CH-1211 Geneva 20, Switzerland, http://www.iso.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3.2.8.1 Discussion—The sensitivity of the test is rigorously defined, in statistical terms, by the values selected for α, β, δ, or Pd.
4. Summary of Test Method
4.1 Clearly define the test objective in writing.
4.2 Choose the number of assessors based on the objective of the test (that is, testing for a difference or testing for similarity) and the level of sensitivity desired for the test. The sensitivity of the test is, in part, a function of two competing risks, α and β, and the maximum acceptable difference between the samples, δ or Pd (that is, a meaningful difference). When testing for a difference, α is the risk of declaring the samples different when they are not, and β is the risk of not declaring the samples different when they are. When testing for similarity, the meanings of α and β are reversed. When testing for similarity, α is the risk of declaring the samples similar when they are not, and β is the risk of not declaring the samples similar when they are. Acceptable values of α and β vary depending on the test objective and should be determined before the test (see Appendix X1 and Appendix X2).
4.3 Each assessor receives four coded samples where two samples are of one product and the other two samples are of the other product being tested. The assessors are instructed to group the four samples into two groups of two based on similarity.
4.4 Results are tallied and significance determined by reference to a statistical table or software that calculates binomial probabilities.
5. Significance and Use
5.1 The test method is effective for the following test objectives:
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change is made in ingredients, processing, packaging, handling, or storage; or
5.1.2 To select, train, and monitor assessors.
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change. If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are assumed to be indistinguishable (for example, H0: δ or Pd = 0) and the data are examined to determine if the assumption can be rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example, H0: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue, carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
6. Apparatus
6.1 Carry out the test under conditions that prevent contact between assessors until the evaluations have been completed, for example, using booths that comply with STP 913 (1).
6.2 Sample preparation and serving sizes should comply with Practice E1871. See Refs (2) or (3).
7. Assessors
7.1 All assessors must be familiar with the mechanics of the tetrad test (the format, the task, and the procedure of evaluation). Experience and familiarity with the product and test method may increase the sensitivity of an assessor and may therefore increase the likelihood of finding a significant difference. Monitoring the performance of assessors over time may be useful.
7.2 Choose assessors in accordance with test objectives. For example, if the project results are to represent the general consumer population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with demonstrated acuity should be selected.
7.3 The decision whether or not to train assessors on the samples before testing should be addressed prior to testing. Training may include a preliminary presentation on the nature of the samples and the problem concerned. For example, if the test concerns the detection of a particular taint, consider the inclusion of samples during training that demonstrate its presence and absence. Such demonstration will increase the panel’s acuity for the taint but may detract from other differences. See STP 758 for details (4). Allow adequate time between the exposure to the training samples and the actual tetrad test to avoid carryover.
7.4 During the test sessions, do not give any information about product identity, expected treatment effects, or individual performance until all testing is complete.
7.5 Avoid replicate evaluations by the same assessor whenever possible. However, if replications are needed to produce a sufficient number of total evaluations, every effort should be made to have each assessor perform the same number of replicate evaluations.
8. Number of Assessors
8.1 Choose the number of assessors to yield the level of sensitivity called for by the test objectives. The sensitivity of the test is a function of three values: α, β, and the maximum allowable sensory difference, expressed as either δ or Pd.
8.2 Prior to conducting the test, select values for α, β, and δ or Pd. The following can be considered as general guidelines.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
8.2.1 For α-risk, when testing for a difference: a statistically significant result at:
8.2.1.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that a difference was apparent;
8.2.1.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that a difference was apparent;
8.2.1.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that a difference was apparent; and
8.2.1.4 Below 0.1 % (<0.001) indicates “very strong” evidence that a difference was apparent.
8.2.2 For α-risk, when testing for similarity: a statistically significant result at:
8.2.2.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that no meaningful difference was apparent;
8.2.2.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that no meaningful difference was apparent;
8.2.2.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that no meaningful difference was apparent; and
8.2.2.4 Below 0.1 % (<0.001) indicates “very strong” evidence that no meaningful difference was apparent.
8.2.3 For δ and Pd, the value that defines a meaningful sensory difference is affected by several factors, such as the importance of the product in the company’s portfolio, the stage in the development process at which testing is being done, etc. As a general guide, meaningful differences fall into three ranges:
8.2.3.1 δ < 0.5 or Pd < 20 % represent small values;
8.2.3.2 0.5 < δ < 1.0 or 20 % < Pd < 30 % represent medium sized values; and
8.2.3.3 δ > 1.0 or Pd > 30 % represent large values.
8.3 Having defined the required level of sensitivity for the test using 8.2, use Table A1.1 (when testing for a difference) or Table A1.2 (when testing for similarity) to determine the number of assessors necessary. Enter the appropriate table in the section corresponding to the selected value of δ or Pd and the column corresponding to the selected value of β. The minimum required number of assessors is found in the row corresponding to the selected value of α. Alternatively, Tables A1.1 and A1.2 can be used to develop a set of values for δ or Pd, α, and β that provide acceptable sensitivity while maintaining the number of assessors within practical limits. The approach is presented in detail in Ref (5). Software that performs the same calculations may also be used, for example by using the discrimSS or d.primeSS function in the R package sensR.
8.4 Often in practice, the number of assessors is determined by material conditions (for example, duration of the experiment, number of available assessors, quantity of product). Increasing the number of assessors increases the likelihood of detecting small values of δ or Pd.
9. Procedure
9.1 Prepare worksheets and scoresheets, either manually or using software designed for this purpose (see Appendix X3), in advance of the test so as to utilize an equal number of the six possible sequences of two products, A and B:
AABB BBAA
ABAB BABA
ABBA BAAB
Distribute these at random among the assessors so that serving order is balanced.
9.2 Present each set of four samples simultaneously if possible, following the same spatial arrangement for each assessor. Within the set of four samples, assessors are typically allowed to make repeated evaluations, for example, retasting, of each sample as desired. If the conditions of the test require the prevention of repeat evaluations, for example, if samples are too large to serve simultaneously or leave an aftertaste, present the samples sequentially and do not allow repeated evaluations. In addition, if the samples change over time, for example, cereal with milk, samples should be tested sequentially (or consider using an alternative testing method).
9.3 Instruct the assessors to evaluate the four test samples in the order presented. The assessor should then group the four samples into two groups of two based on similarity. It is critical that the instructions to the assessors say, “Group the four samples into two groups of two based on similarity,” and not, “Identify the two samples that are most similar to each other.” The latter wording does not correctly represent the tetrad task the assessor is to perform. It should be confirmed that the assessors understand the instructions and the tetrad task in general, for example, when they are being familiarized with the mechanics of the test.
9.4 Each scoresheet should provide for a single group of samples. If a different set of products is to be evaluated by an assessor in a single session, the completed scoresheet and any remaining product from the evaluation just completed should be returned to the test administrator prior to receiving the subsequent set of test samples. The assessor cannot go back to any of the previous samples or change the verdict on any previous test.
9.5 Do not ask questions about preference, acceptance, or degree of difference after the initial grouping of samples into pairs. The selection the assessor has just made may bias the reply to any additional questions. Responses to such questions may be obtained through separate tests for preference, acceptance, degree of difference, etc. (see Manual 26) (6). A comment section asking why the choice was made may be included for the assessor’s remarks.
9.6 The tetrad test is a forced-choice procedure; assessors are not allowed the option of reporting “no difference.” An assessor who detects no difference between the samples and requests to report “no difference,” should be instructed to group the test samples into two pairs randomly. In such situations the assessor can indicate that the selection was only a guess in the comments section of the scoresheet.
10. Analysis and Interpretation of Results
10.1 Prior to conducting the test decide whether the objective of the test is to determine that the products are perceptibly different or that the products are sufficiently similar to be used interchangeably.
10.1.1 If the objective is to determine that the products are perceptibly different (that is, testing for a difference), then the null and alternative hypotheses for the test are:
H0: δ or Pd = 0 (that is, Pc = 1/3) versus HA: δ or Pd > 0 (that is, Pc > 1/3).
10.1.2 If the objective is to determine that the products are sufficiently similar to use interchangeably (that is, testing for similarity), then the null and alternative hypotheses for the test are (assuming that the value chosen to represent a meaningful difference has been specified as d):
10.1.2.1 H0: δ or Pd > d versus HA: δ or Pd ≤ d.
10.2 Use Table A1.3 to analyze the data obtained from a tetrad test when testing for a difference. If the number of correct responses is greater than or equal to the number given in Table A1.3, conclude that a perceptible difference exists between the samples. Failure to obtain a significant difference using Table A1.3 does not allow the researcher to conclude that the samples are significantly similar. Instead, when testing for similarity, use Table A1.4 to analyze the data from a tetrad test. If the number of correct answers is less than or equal to the number given in Table A1.4, conclude that the samples are sufficiently similar to be used interchangeably. Again, the conclusions are based on the risks accepted when the level of
...
11.1.2 Full Identification of the Samples—Origin, age, lot number, packaging, where obtained, method of preparation, quantity, shape, storage prior to testing, serving size, temperature. (Sample information should communicate that all storage, handling, and preparation was done in such a way as to yield samples that differ only due to the variable of interest, if at all.);
11.1.3 The number of assessors, the number of correct selections, and the result of the statistical evaluation;
11.1.4 Assessors—Age, gender, experience in sensory testing, experience with the product category, experience with the samples in the test;
11.1.5 Any information and any specific instructions given to the assessor in connection with the test;
11.1.6 The test environment: use of booths, simultaneous or sequential presentation, light conditions, whether the identity of the samples was disclosed after the test, and the manner in which it was done; and
11.1.7 The location and date of the test and the name of the test administrator.
12. Precision and Bias
12.1 Because results of sensory difference tests are functions of individual sensitivities, a general statement regarding the precision of results that is applicable to all populations of sen
...
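Sections 8.3 and 10.2 refer to tables of required assessor numbers and critical counts, and to software that performs the same calculations. The sketch below shows one way such values could be computed for a difference test using exact binomial probabilities. It is an illustration only and not the procedure behind Tables A1.1 to A1.4: the function names are hypothetical, Pd is converted to Pc with the usual guessing model, and the search returns the first n that reaches the target power, which may differ from tabulated values because exact binomial power does not rise smoothly with n.

```python
from math import comb


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, alpha, p0=1 / 3):
    """Smallest number of correct responses that is significant at level alpha."""
    for k in range(n + 1):
        if upper_tail(k, n, p0) <= alpha:
            return k
    return None  # no count reaches significance with this few assessors


def min_assessors(pd, alpha, beta, n_max=500):
    """First n whose exact difference test has power >= 1 - beta at the given Pd."""
    pc = pd + (1 - pd) / 3  # guessing model: discriminators plus lucky guessers
    for n in range(1, n_max + 1):
        k = critical_value(n, alpha)
        if k is not None and upper_tail(k, n, pc) >= 1 - beta:
            return n
    return None  # target not reached within n_max
```

A call such as `min_assessors(0.30, alpha=0.05, beta=0.20)` returns a candidate panel size, and `critical_value(n, 0.05)` gives the matching minimum number of correct groupings for that panel.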
This document is not an ASTM standard and is intended only to provide the user of an ASTM standard an indication of what changes have been made to the previous version. Because
it may not be technically possible to adequately depict all changes accurately, ASTM recommends that users consult prior editions as appropriate. In all cases only the current version
of the standard as published by ASTM is to be considered the official document.
Designation: E3009 − 23a (previous edition: E3009 − 23)
Standard Test Method for
Sensory Analysis—Tetrad Test
This standard is issued under the fixed designation E3009; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
1. Scope
1.1 This test method covers a procedure for determining whether a perceptible sensory difference exists between samples of two
products or to estimate the magnitude of the perceptible difference.
1.2 This test method applies whether a difference may exist in a single sensory attribute or in several.
1.3 This test method is applicable when the nature of the difference between the samples is unknown. The attribute(s) responsible
for the difference are not identified.
1.4 The tetrad test is more efficient statistically than the triangle test (Test Method E1885) or the duo-trio test (Test Method
E2610).
1.5 This standard does not purport to address all of the safety concerns, if any, associated with its use. It is the responsibility
of the user of this standard to establish appropriate safety, health, and environmental practices and determine the applicability of
regulatory limitations prior to use.
1.6 This international standard was developed in accordance with internationally recognized principles on standardization
established in the Decision on Principles for the Development of International Standards, Guides and Recommendations issued
by the World Trade Organization Technical Barriers to Trade (TBT) Committee.
2. Referenced Documents
2.1 ASTM Standards:
E253 Terminology Relating to Sensory Evaluation of Materials and Products
E456 Terminology Relating to Quality and Statistics
E1871 Guide for Serving Protocol for Sensory Evaluation of Foods and Beverages
E1885 Test Method for Sensory Analysis—Triangle Test
E2262 Practice for Estimating Thurstonian Discriminal Distances
E2610 Test Method for Sensory Analysis—Duo-Trio Test
2.2 ISO Standards:
ISO 4120 Sensory Analysis – Methodology – Triangle Test
ISO 10399 Sensory Analysis – Methodology – Duo-Trio Test
This test method is under the jurisdiction of ASTM Committee E18 on Sensory Evaluation and is the direct responsibility of Subcommittee E18.04 on Test Methods (previously: Fundamentals of Sensory).
Current edition approved Dec. 15, 2023 (previously April 1, 2023). Published December 2023 (previously May 2023). Originally approved in 2015. Last previous edition approved in 2023 as E3009 – 23 (previously: in 2015 as E3009 – 15ε1). DOI: 10.1520/E3009-23A (previously 10.1520/E3009-23).
For referenced ASTM standards, visit the ASTM website, www.astm.org, or contact ASTM Customer Service at service@astm.org. For Annual Book of ASTM Standards
volume information, refer to the standard’s Document Summary page on the ASTM website.
Available from International Organization for Standardization (ISO), 1, ch. de la Voie-Creuse, CP 56, CH-1211 Geneva 20, Switzerland, http://www.iso.org.
Copyright © ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA 19428-2959. United States
3. Terminology
3.1 Definitions—For definition of terms relating to sensory analysis, see Terminology E253, and for terms relating to statistics, see
Terminology E456.
3.2 Definitions of Terms Specific to This Standard:
3.2.1 α (alpha) risk—probability of concluding that a perceptible difference exists when, in reality, one does not.
3.2.1.1 Discussion—
Also known as Type I Error or significance level.
3.2.2 β (beta) risk—probability of concluding that no perceptible difference exists when, in reality, one does.
3.2.2.1 Discussion—
Also known as Type II Error.
3.2.3 δ—Thurstonian measure of sensory difference (effect size) relative to perceptual noise (standard deviation) (see Practice
E2262).
3.2.4 Pc—the probability of obtaining a correct answer from an assessor in the test.
3.2.4.1 Discussion—
If the products are indistinguishable sensorially, Pc = 1/3 in the tetrad test; while if the products are perceptibly different, Pc > 1/3.
3.2.5 Pd—proportion of assessors who can discriminate the two products in the test.
3.2.5.1 Discussion—
Pd is the measure of sensory difference used in the guessing model.
3.2.6 product—material to be evaluated.
3.2.7 sample—unit of product prepared, presented, and evaluated in the test.
3.2.8 sensitivity—general term used to summarize the performance characteristics of the test.
3.2.8.1 Discussion—
The sensitivity of the test is rigorously defined, in statistical terms, by the values selected for α, β, δ, or Pd.
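The chance value Pc = 1/3 and the link between Pc and Pd can be checked with a short sketch. It assumes the usual guessing-model relation Pc = Pd + (1 − Pd)/3, in which discriminators always answer correctly and all other assessors guess. The function names are illustrative and are not part of this standard.

```python
def pairings(items):
    """Return every way to split four distinct items into two unordered pairs."""
    first = items[0]
    splits = []
    for partner in items[1:]:
        rest = [x for x in items if x not in (first, partner)]
        splits.append(((first, partner), tuple(rest)))
    return splits


def chance_of_correct_guess(products="AABB"):
    """Count the correct groupings among all possible groupings of one tetrad."""
    positions = list(range(len(products)))
    splits = pairings(positions)
    correct = 0
    for left, right in splits:
        # A grouping is correct when each pair holds a single product.
        if products[left[0]] == products[left[1]] and products[right[0]] == products[right[1]]:
            correct += 1
    return correct, len(splits)


def pc_from_pd(pd):
    """Guessing model (assumed form): discriminators are right, the rest guess."""
    return pd + (1.0 - pd) / 3.0


def pd_from_pc(pc):
    """Inverse of pc_from_pd, truncated at zero for Pc below chance."""
    return max(0.0, (pc - 1.0 / 3.0) / (2.0 / 3.0))


if __name__ == "__main__":
    print(chance_of_correct_guess())
    print(pc_from_pd(0.30))
```

Only one of the three possible groupings is correct, which is the source of the 1/3 guessing probability.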
4. Summary of Test Method
4.1 Clearly define the test objective in writing.
4.2 Choose the number of assessors based on the objective of the test (that is, testing for a difference or testing for similarity) and
the level of sensitivity desired for the test. The sensitivity of the test is, in part, a function of two competing risks, α and β and
the maximum acceptable difference between the samples, δ or Pd (that is, a meaningful difference). When testing for a difference,
α is the risk of declaring the samples different when they are not, and β is the risk of not declaring the samples different when they
are. When testing for similarity, the meanings of α and β are reversed. When testing for similarity, α is the risk of declaring the
samples similar when they are not, and β is the risk of not declaring the samples similar when they are. Acceptable values of α
and β vary depending on the test objective and should be determined before the test (see Appendix X1 and Appendix X2).
4.3 Each assessor receives four coded samples where two samples are of one product and the other two samples are of the other
product being tested. The assessors are instructed to group the four samples into two groups of two based on similarity.
4.4 Results are tallied and significance determined by reference to a statistical table or software that calculates binomial
probabilities.
5. Significance and Use
5.1 The test method is effective for the following test objectives:
5.1.1 To determine whether a perceptible difference results or a perceptible difference does not result, for example, when a change
is made in ingredients, processing, packaging, handling, or storage; or
5.1.2 To select, train, and monitor assessors.
5.2 The test method itself does not change whether the purpose of the test is to determine that the products are perceptibly different
versus that the products are sufficiently similar to be used interchangeably. Only the selected values of α, β, and δ or Pd change.
If the objective of the test is to determine if there is a perceptible difference between two products, then initially the products are
assumed to be indistinguishable (for example, HO: δ or Pd = 0) and the data are examined to determine if the assumption can be
rejected (that is, conclude that the products are perceptibly different). If the objective is to determine if the two products are
sufficiently similar to be used interchangeably, then initially the products are assumed to be meaningfully different (for example,
HO: δ or Pd > the value chosen to represent a meaningful difference) and the data are examined to determine if the assumption
can be rejected (that is, conclude that the samples are sufficiently similar to be used interchangeably).
5.3 The tetrad method involves the evaluation of four samples. When the products being tested cause excessive sensory fatigue,
carryover, or adaptation, methods that involve the evaluation of fewer samples (same-different, triangle test, etc.) may be preferred.
6. Apparatus
6.1 Carry out the test under conditions that prevent contact between assessors until the evaluations have been completed, for
example, using booths that comply with STP 913 (1).
6.2 Sample preparation and serving sizes should comply with Guide E1871. See Refs (2) or (3).
7. Assessors
7.1 All assessors must be familiar with the mechanics of the tetrad test (the format, the task, and the procedure of evaluation).
Experience and familiarity with the product and test method may increase the sensitivity of an assessor and may therefore increase
the likelihood of finding a significant difference. Monitoring the performance of assessors over time may be useful.
7.2 Choose assessors in accordance with test objectives. For example, if the project results are to represent the general consumer
population, assessors with unknown sensitivity might be selected. To increase protection of product quality, assessors with
demonstrated acuity should be selected.
7.3 The decision whether or not to train assessors on the samples before testing should be addressed prior to testing. Training may
include a preliminary presentation on the nature of the samples and the problem concerned. For example, if the test concerns the
detection of a particular taint, consider the inclusion of samples during training that demonstrate its presence and absence. Such
demonstration will increase the panel’s acuity for the taint but may detract from other differences. See STP 758 for details (4).
Allow adequate time between the exposure to the training samples and the actual tetrad test to avoid carryover.
7.4 During the test sessions, do not give any information about product identity, expected treatment effects, or individual
performance until all testing is complete.
7.5 Avoid replicate evaluations by the same assessor whenever possible. However, if replications are needed to produce a sufficient
number of total evaluations, every effort should be made to have each assessor perform the same number of replicate evaluations.
8. Number of Assessors
8.1 Choose the number of assessors to yield the level of sensitivity called for by the test objectives. The sensitivity of the test is
a function of three values: α, β, and the maximum allowable sensory difference, expressed as either δ or Pd.
8.2 Prior to conducting the test, select values for α, β, and δ or Pd. The following can be considered as general guidelines.
The boldface numbers in parentheses refer to a list of references at the end of this standard.
8.2.1 For α-risk, when testing for a difference: a statistically significant result at:
8.2.1.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that a difference was apparent;
8.2.1.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that a difference was apparent;
8.2.1.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that a difference was apparent; and
8.2.1.4 Below 0.1 % (<0.001) indicates “very strong” evidence that a difference was apparent.
8.2.2 For α-risk, when testing for similarity: a statistically significant result at:
8.2.2.1 10 % to 5 % (0.10 to 0.05) indicates “slight” evidence that no meaningful difference was apparent;
8.2.2.2 5 % to 1 % (0.05 to 0.01) indicates “moderate” evidence that no meaningful difference was apparent;
8.2.2.3 1 % to 0.1 % (0.01 to 0.001) indicates “strong” evidence that no meaningful difference was apparent; and
8.2.2.4 Below 0.1 % (<0.001) indicates “very strong” evidence that no meaningful difference was apparent.
8.2.3 For δ and Pd, the value that defines a meaningful sensory difference is affected by several factors, such as the importance
of the product in the company’s portfolio, the stage in the development process at which testing is being done, etc. As a general
guide, meaningful differences fall into three ranges:
8.2.3.1 δ < 0.5 or Pd < 20 % represent small values;
8.2.3.2 0.5 < δ < 1.0 or 20 % < Pd < 30 % represent medium sized values; and
8.2.3.3 δ > 1.0 or Pd > 30 % represent large values.
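To relate a chosen δ to the probability of a correct response, the following sketch integrates numerically under stated assumptions that are not spelled out in this standard: an equal-variance normal Thurstonian model, and a correct response whenever both perceptions of one product fall below both perceptions of the other. Practice E2262 remains the reference for δ. The function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tetrad_pc(delta, steps=4000):
    """Probability of a correct tetrad grouping for delta >= 0 (Simpson's rule).

    Product A perceptions are N(0, 1) and product B perceptions are N(delta, 1).
    P(correct) = P(max A < min B) + P(max B < min A).
    """
    lo, hi = -10.0, 10.0 + delta
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def integrand(x):
        # Density of max(A) at x times P(both B samples exceed x).
        a_below = 2.0 * normal_pdf(x) * normal_cdf(x) * (1.0 - normal_cdf(x - delta)) ** 2
        # Density of max(B) at x times P(both A samples exceed x).
        b_below = 2.0 * normal_pdf(x - delta) * normal_cdf(x - delta) * (1.0 - normal_cdf(x)) ** 2
        return a_below + b_below

    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(lo + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0, 1.5):
        print(d, round(tetrad_pc(d), 4))
```

At δ = 0 the result reduces to the guessing probability of 1/3, which gives a quick check on the integration.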
8.3 Having defined the required level of sensitivity for the test using 8.2, use Table A1.1 (when testing for a difference) or Table
A1.2 (when testing for similarity) to determine the number of assessors necessary. Enter the appropriate table in the section
corresponding to the selected value of δ or Pd and the column corresponding to the selected value of β. The minimum required
number of assessors is found in the row corresponding to the selected value of α. Alternatively, Tables A1.1 and A1.2 can be used
to develop a set of values for δ or Pd, α, and β that provide acceptable sensitivity while maintaining the number of assessors within
practical limits. The approach is presented in detail in Ref (5). Software that performs the same calculations may also be used, for
example by using the discrimSS or d.primeSS function in the R package sensR.
8.4 Often in practice, the number of assessors is determined by material conditions (for example, duration of the experiment,
number of available assessors, quantity of product). Increasing the number of assessors increases the likelihood of detecting small
values of δ or Pd.
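The trade-off among α, β, Pd, and the number of assessors described in this section can be explored with a minimal sketch. It is not the procedure used to build Table A1.1. It assumes a difference test, the guessing-model relation Pc = Pd + (1 − Pd)/3, and an exact binomial search that stops at the first number of assessors reaching the target power. Exact binomial power does not increase smoothly with the number of assessors, so tabulated values may differ. All names are illustrative.

```python
import math


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def upper_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(n, k, p) for k in range(x, n + 1))


def critical_count(n, alpha, p0=1.0 / 3.0):
    """Smallest number of correct responses whose upper-tail probability
    under guessing does not exceed alpha; returns n + 1 if none exists."""
    tail = 0.0
    x_crit = n + 1
    for x in range(n, -1, -1):
        tail += binom_pmf(n, x, p0)
        if tail > alpha:
            break
        x_crit = x
    return x_crit


def assessors_needed(pd, alpha=0.05, beta=0.20, max_n=1000):
    """First n (with its critical count) giving power >= 1 - beta at the chosen Pd."""
    pc = pd + (1.0 - pd) / 3.0  # guessing model (assumed)
    for n in range(1, max_n + 1):
        x_crit = critical_count(n, alpha)
        if x_crit <= n and upper_tail(n, x_crit, pc) >= 1.0 - beta:
            return n, x_crit
    return None


if __name__ == "__main__":
    print(assessors_needed(0.30, alpha=0.05, beta=0.20))
```

Relaxing α or β, or accepting a larger Pd, lowers the number returned, which matches the use of the tables to keep the panel size within practical limits.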
9. Procedure
9.1 Prepare worksheets and scoresheets, either manually or using software designed for this purpose (see Appendix X3), in
advance of the test so as to utilize an equal number of the six possible sequences of two products, A and B:
AABB BBAA
ABAB BABA
ABBA BAAB
Distribute these at random among the assessors so that serving order is balanced.
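The balanced, randomized allocation of the six sequences can be sketched as follows. This is an illustration only, not the software referred to in Appendix X3. The function name and the seed parameter are hypothetical. When the number of assessors is not a multiple of six, the sketch draws the extra sequences at random without repetition, which is one reasonable choice among several.

```python
import random
from itertools import permutations


def serving_plan(n_assessors, seed=None):
    """Return one serving sequence per assessor, as balanced as possible."""
    rng = random.Random(seed)
    # The six distinct orders of two A samples and two B samples.
    sequences = sorted({"".join(p) for p in permutations("AABB")})
    full_blocks, remainder = divmod(n_assessors, len(sequences))
    plan = sequences * full_blocks + rng.sample(sequences, remainder)
    rng.shuffle(plan)  # random assignment of sequences to assessors
    return plan


if __name__ == "__main__":
    for assessor, sequence in enumerate(serving_plan(12, seed=1), start=1):
        print(assessor, sequence)
```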
9.2 Present each set of four samples simultaneously if possible, following the same spatial arrangement for each assessor. Within
the set of four samples, assessors are typically allowed to make repeated evaluations, for example, retasting, of each sample as
desired. If the conditions of the test require the prevention of repeat evaluations, for example, if samples are too large to serve
simultaneously or leave an aftertaste, present the samples sequentially and do not allow repeated evaluations. In addition, if the
samples change over time, for example, cereal with milk, samples should be tested sequentially (or consider using an alternative
testing method).
9.3 Instruct the assessors to evaluate the four test samples in the order presented. The assessor should then group the four samples
into two groups of two based on similarity. It is critical that the instructions to the assessors say, “Group the four samples into two
groups of two based on similarity,” and not, “Identify the two samples that are most similar to each other.” The latter wording does
not correctly represent the tetrad task the assessor is to perform. It should be confirmed that the assessors understand the
instructions and the tetrad task in general, for example, when they are being familiarized with the mechanics of the test.
9.4 Each scoresheet should provide for a single group of samples. If a different set of products is to be evaluated by an assessor
in a single session, the completed scoresheet and any remaining product from the evaluation just completed should be returned to
the test administrator prior to receiving the subsequent set of test samples. The assessor cannot go back to any of the previous
samples or change the verdict on any previous test.
9.5 Do not ask questions about preference, acceptance, or degree of difference after the initial grouping of samples into pairs. The
selection the assessor has just made may bias the reply to any additional questions. Responses to such questions may be obtained
through separate tests for preference, acceptance, degree of difference, etc. (see Manual 26) (6). A comment section asking why
the choice was made may be included for the assessor’s remarks.
9.6 The tetrad test is a forced-choice procedure; assessors are not allowed the option of reporting “no difference.” An assessor who
detects no difference between the samples and requests to report “no difference,” should be instructed to group the test samples
into two pairs randomly. In such situations the assessor can indicate that the selection was only a guess in the comments section
of the scoresheet.
10. Analysis and Interpretation of Results
10.1 Prior to conducting the test, decide whether the objective of the test is to determine that the products are perceptibly different
or that the products are sufficiently similar to be used interchangeably.
10.1.1 If the objective is to determine that the products are perceptibly different (that is, testing for a difference), then the null and
alternative hypotheses for the test are:
HO: δ or Pd = 0 (that is, Pc = 1/3) versus HA: δ or Pd > 0 (that is, Pc > 1/3).
10.1.2 If the objective is to determine that the products are sufficiently similar to use interchangeably (that is, testing for
similarity), then the null and alternative hypotheses for the test are (assuming that the value chosen to represent a meaningful
difference has been specified as d):
10.1.2.1 HO: δ or Pd > d versus HA: δ or Pd ≤ d.
10.2 Use Table A1.3 to analyze the data obtained from a tetrad test when testing for a difference. If the number of correct responses
is greater than or equal to the number given in Table A1.3, conclude that a perceptible difference exists between the samples.
Failure to obtain a significant difference using Table A1.3 does not allow the researcher to conclude that the samples are
significantly similar. Instead, when testing for similarity, use Table A1.4 to analyze the data from a tetrad test. If the number of
correct answers is less than or equal to the number given in Table A1.4, conclude that the samples are sufficiently similar to be
used interchangeably. Again, the conclusions are based on the risks accepted when the level of sensitivity (that is, α, β, and δ or
Pd) was selected.
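Where software is used in place of the tables, the two decisions can be sketched with exact binomial probabilities. The difference test compares the upper-tail probability under guessing (Pc = 1/3) with α. The similarity test shown here is an assumption: it uses the guessing model to convert the chosen Pd limit to Pc and compares the lower-tail probability at that limit with α. Table A1.4 may be constructed differently, so the tables remain authoritative. Function names are illustrative.

```python
import math


def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def upper_tail(n, x, p):
    """P(X >= x)."""
    return sum(binom_pmf(n, k, p) for k in range(x, n + 1))


def lower_tail(n, x, p):
    """P(X <= x)."""
    return sum(binom_pmf(n, k, p) for k in range(0, x + 1))


def difference_test(n, correct, alpha=0.05):
    """Return (p_value, significant) for HO: Pc = 1/3 versus HA: Pc > 1/3."""
    p_value = upper_tail(n, correct, 1.0 / 3.0)
    return p_value, p_value <= alpha


def similarity_test(n, correct, pd_limit, alpha=0.05):
    """Return (p_value, similar) for HO: Pd > pd_limit versus HA: Pd <= pd_limit.

    Assumes the guessing model Pc = Pd + (1 - Pd) / 3 at the boundary.
    """
    pc_limit = pd_limit + (1.0 - pd_limit) / 3.0
    p_value = lower_tail(n, correct, pc_limit)
    return p_value, p_value <= alpha


if __name__ == "__main__":
    print(difference_test(n=30, correct=16, alpha=0.05))
    print(similarity_test(n=60, correct=22, pd_limit=0.30, alpha=0.05))
```

A non-significant difference test is not evidence of similarity; the similarity decision is made only from the second function.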
11. Report
11.1 Report the test objective, the results, and the conclusions. The following additional information is recommended:
11.1.1 The purpose of the test and the nature of the treatment studied;
11.1.2 Full Identification of the Samples—Origin, age, lot number, packaging, where obtained, method of preparation, quantity,
shape, storage prior to testing, serving size, temperature. (Sample information should communicate that all storage, handling, and
preparation was done in such a way as to yield samples that differ only due to the variable of interest, if at all.);
11.1.3 The number of assessors, the number of correct selections, and the result of the statistical evaluation;
11.1.4 Assessors—Age, gender, experience in sensory testing, experience with the product category, experience with the samples
in the test;
11.1.5 Any information and any specific instructions given to the assessor in connection with the test;
11.1.6 The test environment: use of booths, simultaneous or sequential presentation, light conditions, whether the identity of the
samples was disclosed after the test, and the manner in wh
...
NOTICE: This standard has either been superseded and replaced by a new version or withdrawn.
Contact ASTM International (www.astm.org) for the latest information
Designation: E3009 − 23a
Standard Test Method for
Sensory Analysis—Tetrad Test
This standard is issued under the fixed designation E3009; the number immediately following the designation indicates the year of
original adoption or, in the case of revision, the year of last revision. A number in parentheses indicates the year of last reapproval. A
superscript epsilon (ε) indicates an editorial change since the last revision or reapproval.
E3009 − 23a
10.1.1 If the objective is to determine that the products are perceptibly different (that is, testing for a difference), then the null and alternative hypotheses for the test are:
HO: δ or Pd = 0 (that is, Pc = 1/3) versus HA: δ or Pd > 0 (that is, Pc > 1/3).
10.1.2 If the objective is to determine that the products are sufficiently similar to use interchangeably (that is, testing for similarity), then the null and alternative hypotheses for the test are (assuming that the value chosen to represent a meaningful difference has been specified as d):
10.1.2.1 HO: δ or Pd > d versus HA: δ or Pd ≤ d.
10.2 Use Table A1.3 to analyze the data obtained from a tetrad test when testing for a difference. If the number of correct responses is greater than or equal to the number given in Table A1.3, conclude that a perceptible difference exists between the samples. Failure to obtain a significant difference using Table A1.3 does not allow the researcher to conclude that the samples are significantly similar. Instead, when testing for similarity, use Table A1.4 to analyze the data from a tetrad test. If the number of correct answers is less than or equal to the number given in Table A1.4, conclude that the samples are sufficiently similar to be used interchangeably. Again, the conclusions are based on the risks accepted when the level of sensitivity (that is, α, β, and δ or Pd) was selected.
11.1.2 Full Identification of the Samples: Origin, age, lot number, packaging, where obtained, method of preparation, quantity, shape, storage prior to testing, serving size, temperature. (Sample information should communicate that all storage, handling, and preparation was done in such a way as to yield samples that differ only due to the variable of interest, if at all.);
11.1.3 The number of assessors, the number of correct selections, and the result of the statistical evaluation;
11.1.4 Assessors: Age, gender, experience in sensory testing, experience with the product category, experience with the samples in the test;
11.1.5 Any information and any specific instructions given to the assessor in connection with the test;
11.1.6 The test environment: use of booths, simultaneous or sequential presentation, light conditions, whether the identity of the samples was disclosed after the test, and the manner in which it was done; and
11.1.7 The location and date of the test and the name of the test administrator.
12. Precision and Bias
12.1 Because results of sensory difference tests are functions of individual sensitivities, a general statement regarding the precision of results that is applicable to all populations of assessors cannot be ma
...
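The difference-test decision rule in 10.2 and the effect of panel size noted in 8.4 can be illustrated with a short exact-binomial sketch. This is not a reproduction of Tables A1.1 to A1.4, which remain the reference, and it is not a port of the sensR functions named in 8.3. It assumes that the critical counts follow from a one-sided exact binomial test with Pc = 1/3 under HO, and that Pd maps to Pc through the usual guessing model Pc = Pd + (1 − Pd)/3. The function names and the example values of n, α, and Pd are hypothetical, and δ is not handled here.

```python
from math import comb

P_GUESS = 1 / 3  # Pc under HO in 10.1.1: chance of a correct grouping by guessing


def upper_tail(n, x, p):
    """Return P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def min_correct_for_difference(n, alpha):
    """Smallest count x with P(X >= x | Pc = 1/3) <= alpha, or None if none exists."""
    for x in range(n + 1):
        if upper_tail(n, x, P_GUESS) <= alpha:
            return x
    return None


def power(n, alpha, pd):
    """Chance of declaring a difference when a proportion pd of assessors discriminates."""
    pc = pd + (1 - pd) * P_GUESS  # assumed guessing model: non-discriminators guess
    x = min_correct_for_difference(n, alpha)
    return 0.0 if x is None else upper_tail(n, x, pc)


if __name__ == "__main__":
    n, correct, alpha = 30, 16, 0.05  # hypothetical panel size, result, and alpha
    needed = min_correct_for_difference(n, alpha)
    print("minimum correct responses:", needed)
    print("difference declared:", needed is not None and correct >= needed)
    for assessors in (20, 40, 60):  # illustrates 8.4: power for a fixed Pd
        print(assessors, round(power(assessors, alpha, pd=0.3), 3))
```

Because the exact binomial distribution is discrete, power does not rise perfectly smoothly with each added assessor, so it is safer to compare panel sizes that differ by more than one or two. A similarity analysis per 10.1.2 would need the lower tail evaluated at the Pc implied by the chosen limit d, which this sketch does not attempt.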











