SIST ISO 20462-1:2011
Photography - Psychophysical experimental methods for estimating image quality - Part 1: Overview of psychophysical elements
This part of ISO 20462 is part of a multiple-part standard pertaining to the subjective evaluation of pictorial still image quality. This part of ISO 20462 a) defines the units by which image quality is quantified (just noticeable differences, or JNDs); b) describes the influence of stimulus properties, observer characteristics, and task instructions on results obtained from rating experiments; c) provides a flow chart for choosing the preferred psychophysical method for determining image quality from among those defined in subsequent parts of ISO 20462.
INTERNATIONAL STANDARD ISO 20462-1
First edition, 2005-11-01
Reference number: ISO 20462-1:2005(E)
© ISO 2005
© ISO 2005
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means,
electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or
ISO's member body in the country of the requester.
ISO copyright office
Case postale 56 • CH-1211 Geneva 20
Tel. + 41 22 749 01 11
Fax + 41 22 749 09 47
E-mail copyright@iso.org
Web www.iso.org
Published in Switzerland
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Specification of the experimental conditions and results
4.1 Observer characteristics
4.2 Stimulus properties
4.3 Instructions to the observer
4.4 Viewing conditions
4.5 Experimental duration
4.6 Results
4.7 Summary of reported quantities
Annex A (informative) Selection of an appropriate psychophysical method
Annex B (informative) Stimulus differences, paired comparison proportions, and JNDs
Annex C (informative) Example of a report of a psychophysical experiment
Annex D (informative) Comparison of selected psychometric methods
Bibliography
Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.
The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.
ISO 20462-1 was prepared by Technical Committee ISO/TC 42, Photography.
ISO 20462 consists of the following parts, under the general title Photography — Psychophysical experimental
methods for estimating image quality:
- Part 1: Overview of psychophysical elements
- Part 2: Triplet comparison method
- Part 3: Quality ruler method
Introduction
There are many circumstances under which it is desirable to quantify image quality in a standardized fashion
that facilitates interpretation of results within a given experiment and/or comparison of results between
different experiments. Such information can be of value in assessing the performance of different capture or
display devices, image processing algorithms, etc. under various conditions. There are a number of
psychometric methods described in the literature, such as paired comparison, rank ordering, categorical sort,
and magnitude estimation, which might be considered as candidates for experimentally measuring image
quality. Several textbooks [1] [3] [4] [5] [9] [12] have reviewed these and other methods and have discussed
associated data reduction techniques, which usually are based upon the approach of Thurstone [11] or
analogous reasoning. However, the choice of the best method for a particular application may be difficult to
make, and interpretation of the rating scales produced by the numerical analyses is frequently ambiguous.
Furthermore, none of the commonly used techniques provides an efficient mechanism for calibration of the
results against a standardized numerical scale or associated physical references, which is desirable when
results of different experiments are to be compared or integrated. The value of new calibrated psychometric
methods in developing comprehensive models of imaging system quality has been demonstrated in a recent
work [6] that contains more detailed discussions of many of the informative topics superficially considered
herein.
The three parts of ISO 20462 address the need for documented means of determining image quality in a
calibrated fashion. Part 1 provides an overview of practical psychophysics; specific experimental methods and
associated data reduction techniques are described in Part 2 (triplet comparison method [8] [10]) and Part 3
(quality ruler method [6]). Informative Annex A aids in identifying the better choice between the two alternative
methods of Parts 2 and 3, which are complementary and together are sufficient to span a wide range of
applications. It is the intent of these methods to produce results that are not merely directional in nature, but
are expressed in terms of relative or fixed scales that are calibrated in just noticeable differences (JNDs), so
that the significance of experimentally measured stimulus differences is readily ascertained.
1 Scope
This part of ISO 20462 is part of a multiple-part standard pertaining to the subjective evaluation of pictorial still
image quality. This part of ISO 20462
a) defines the units by which image quality is quantified (just noticeable differences, or JNDs);
b) describes the influence of stimulus properties, observer characteristics, and task instructions on results
obtained from rating experiments;
c) provides a flow chart for choosing the preferred psychophysical method for determining image quality
from among those defined in subsequent parts of ISO 20462.
2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.
ISO 3664, Viewing conditions — Graphic technology and photography
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply:
3.1
artefactual attribute
attribute of image quality that, when evident in an image, nearly always leads to a loss of overall image quality
EXAMPLE Examples of artefactual attributes include noise and aliasing.
NOTE The commonly used terms defect and impairment are similar in meaning.
3.2
attribute
aspect, dimension, or component of overall image quality
cf. artefactual attribute (3.1) and preferential attribute (3.12)
EXAMPLE Examples of image quality attributes include image structure properties such as sharpness and noise;
colour and tone reproduction properties such as contrast, colour balance, and relative colourfulness; and digital artefacts
such as aliasing, contouring, and compression defects.
3.3
attribute just noticeable difference
attribute JND
measure of the detectability of appearance variations, corresponding to a stimulus difference that leads to a
75:25 proportion of responses in a paired comparison task in which univariate stimuli pairs are assessed in
terms of a single attribute identified in the instructions
cf. quality JND (3.14)
NOTE 1 As an example, a paired comparison identifying the sharper of two stimuli that differ only in their generating
system modulation transfer function (MTF) would yield results in terms of sharpness attribute JNDs. If the MTF curves
differed monotonically and did not cross, the outcome of the paired comparison would depend primarily upon the
observers’ ability to detect changes in the appearance of the stimuli as a function of MTF variations, with little or no value
judgement required of the observers. The relationship between paired comparison proportions and stimulus differences is
discussed in greater detail in Annex B.
NOTE 2 If observers are instead asked to choose which of a pair of stimuli is higher in overall image quality, and if the
stimuli in aggregate are multivariate, such that the observer should make value judgements of the importance of a number
of attributes, rather than focussing on one aspect of image appearance, it is observed experimentally that larger objective
stimulus differences (for example, MTF changes) are required to obtain a 75:25 proportion of responses, which in this
case corresponds to a quality JND.
NOTE 3 A JND is a statistical quantity, derived from a number of observations. An observer assessing a single pair of
images differing by one attribute JND is unlikely to be confident that he or she has detected the sample difference. A
stimulus difference of approximately three JNDs is usually needed for an observer of average sensitivity to feel reasonably
certain of his or her assessment.
3.4
categorical sort method
psychophysical method involving the classification of a stimulus into one of several ordered categories, at
least some of which are identified by adjectives or phrases that describe different levels of image quality or
attributes thereof
NOTE The application of adjectival descriptors is strongly affected by the range of stimuli presented, so that it is
difficult to compare the results of one categorical sort experiment to another. Range effects and the coarse quantization of
categorical sort experiments also hinder conversion of the responses to JND units. Given these limitations, it is not
possible to unambiguously map adjectival descriptors to JND units, but it is worth noting that in some experiments where a
broad range of stimuli have been presented, the categories excellent, very good, good, fair, poor, and not worth keeping
have been found to provide very roughly comparable intervals that average about six quality JNDs in width.
3.5
image quality
impression of the overall merit or excellence of an image, as perceived by an observer neither associated with
the act of photography, nor closely involved with the subject matter depicted
NOTE The purpose of defining image quality in terms of third-party (uninvolved) observers is to eliminate sources of
variability that arise from more idiosyncratic aspects of image perception and pertain to attributes outside the control of
imaging system designers.
3.6
instructions
set of directions given to the observer for performing the psychophysical evaluation task
3.7
just noticeable difference
JND
stimulus difference that leads to a 75:25 proportion of responses in a paired comparison task
cf. attribute JND (3.3) and quality JND (3.14)
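As a rough illustration of the 75:25 criterion, the following Python sketch converts a paired comparison proportion into JNDs and back. It assumes that responses follow a cumulative normal (Thurstone-style) model; this is an assumption made here for simplicity, and the relationship discussed in Annex B may use a different function. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (assumed response model)
# Standard normal deviate at which the proportion of responses is 75:25.
_Z_PER_JND = _STD_NORMAL.inv_cdf(0.75)


def proportion_to_jnds(p):
    """Return the stimulus difference in JNDs implied by proportion p.

    p is the fraction of responses favouring one stimulus of a pair.
    By construction, p = 0.75 maps to one JND and p = 0.5 to zero.
    """
    if not 0.0 < p < 1.0:
        # Unanimous responses (0 or 1) are saturated: no finite estimate exists.
        raise ValueError("proportion must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(p) / _Z_PER_JND


def jnds_to_proportion(jnds):
    """Return the expected proportion of responses for a difference in JNDs."""
    return _STD_NORMAL.cdf(jnds * _Z_PER_JND)
```

Under this assumed model, jnds_to_proportion(3) gives a proportion of about 0.98, which illustrates how quickly responses approach unanimity as the stimulus difference grows.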
3.8
magnitude estimation method
psychophysical method involving the assignment of a numerical value to each test stimulus that is proportional
to image quality; typically, a reference stimulus with an assigned numerical value is present to anchor the
rating scale
NOTE The numerical scale resulting from a magnitude estimation experiment is usually assumed to constitute a ratio
scale, which, ideally, is a scale in which a constant percentage change in value corresponds with one JND. In practice,
modest deviations from this behaviour occur, complicating the transformation of the rating scale into units of JNDs without
inclusion of unidentified reference stimuli (having known quality) among the test stimuli.
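For an ideal ratio scale as described in the note, the number of JNDs separating two ratings follows from the logarithm of their ratio. The sketch below is illustrative only: the function name and the fraction_per_jnd parameter (the constant fractional change assumed to equal one JND) are hypothetical, and real data deviate from this ideal behaviour, as the note states.

```python
import math


def ratio_scale_jnds(rating_a, rating_b, fraction_per_jnd):
    """JNDs by which rating_b exceeds rating_a on an ideal ratio scale.

    fraction_per_jnd is the assumed constant fractional change per JND,
    e.g. 0.10 if a 10 % increase in rating corresponded to one JND.
    """
    if rating_a <= 0 or rating_b <= 0:
        raise ValueError("ratio-scale ratings must be positive")
    if fraction_per_jnd <= 0:
        raise ValueError("fraction_per_jnd must be positive")
    # n JNDs multiply the rating by (1 + fraction_per_jnd) ** n.
    return math.log(rating_b / rating_a) / math.log(1.0 + fraction_per_jnd)
```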
3.9
multivariate
describing a series of test or reference stimuli that vary in multiple attributes of image quality
3.10
observer
individual performing the subjective evaluation task in a psychophysical method
3.11
paired comparison method
psychophysical method involving the choice of which of two simultaneously presented stimuli exhibits greater
or lesser image quality or an attribute thereof, in accordance with a set of instructions given to the observer
NOTE Two limitations of the paired comparison method are as follows.
a) If all possible stimulus comparisons are done, as is usually the case, a large number of assessments are required for
even modest numbers of experimental stimulus levels [if N levels are to be studied, N(N − 1)/2 paired comparisons
are needed].
b) If a stimulus difference exceeds approximately 1,5 JNDs, the magnitude of the stimulus difference cannot be directly
estimated reliably because the response saturates as the proportions approach unanimity.
However, if a series of stimuli having no large gaps are assessed, the differences between more widely separated stimuli
may be deduced indirectly by summing smaller, reliably determined (unsaturated) stimulus differences. The standard
methods for transformation of paired comparison data to an interval scale (a scale linearly related to JNDs) perform
statistically optimized procedures for inferring the stimulus differences, but they may yield unreliable results when
saturated responses are included in the analysis.
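The combinatorial growth in item a) and the indirect summation of small differences can be sketched as follows. The function names are hypothetical, and the adjacent differences are assumed to have been estimated already, in JNDs, from unsaturated comparisons.

```python
from itertools import accumulate, combinations


def num_paired_comparisons(n_levels):
    """Number of distinct pairs among n_levels stimuli: N(N - 1)/2."""
    return n_levels * (n_levels - 1) // 2


def all_pairs(stimuli):
    """List every unordered pair of stimuli for a full paired comparison."""
    return list(combinations(stimuli, 2))


def scale_from_adjacent_differences(adjacent_jnds):
    """Position of each stimulus in JNDs relative to the first one.

    adjacent_jnds[i] is the difference between stimulus i + 1 and stimulus i.
    Widely separated stimuli are placed by summing the small differences.
    """
    return [0.0] + list(accumulate(adjacent_jnds))
```

For example, 10 stimulus levels already require 45 comparisons per observer and scene, which is why a full design becomes costly quickly.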
3.12
preferential attribute
attribute of image quality that is invariably evident in an image, and for which the preferred degree is a matter
of opinion, depending upon both the observer and the image content
EXAMPLE Examples of preferential image quality attributes include colour and tone reproduction properties such as
contrast and relative colourfulness. Because the perceived quality associated with a preferential attribute is dependent
upon both the observer and image content, in studies involving variations of preferential attributes, particular care is
needed in the selection of representative sets of stimuli and groups of observers.
NOTE The term noticeable in just noticeable difference is not linguistically strictly correct when applied to a
preferential attribute, but is nonetheless retained in this part of ISO 20462 for convenience. For example, the higher
contrast stimulus of a pair differing only in contrast might be readily identified by all observers, whereas there might be a
lack of consensus regarding which of the two images was higher in overall image quality. Nonetheless, if the responses
from the paired comparison for quality were in the proportion of 75:25, the image chosen more frequently would be said to
be one JND higher in quality. The JND is best regarded as a measurement unit tied to the predicted or measured outcome
of a paired comparison.
3.13
psychophysical method
experimental technique for subjective evaluation of image quality or attributes thereof, from which stimulus
differences in units of JNDs may be estimated
cf. categorical sort (3.4), magnitude estimation (3.8), paired comparison (3.11), quality ruler (3.15), rank
ordering (3.16) and triplet comparison methods (3.24)
3.14
quality just noticeable difference
quality JND
measure of the significance or importance of quality variations, corresponding to a stimulus difference that
leads to a 75:25 proportion of responses in a paired comparison task in which multivariate stimuli pairs are
assessed in terms of overall image quality
NOTE 1 See Notes for attribute JND (3.3).
NOTE 2 The attribute JND is a measure of detectability of appearance changes, whereas the quality JND is a measure
of significance or importance of stimulus differences in terms of their impact on quality. An attribute JND is a useful unit for
predicting how observers would react to an advertisement showing images carefully matched in all respects but one, and
drawing the attention of the observer to the attribute varying. In contrast, a quality JND is useful for predicting how
observers would perceive overall quality as a function of one or more stimulus variations, and so is a more useful quantity
in optimizing imaging system design, where different attributes should be balanced against one another. The overall
quality of an image may be predicted from a knowledge of the impact of each attribute in isolation, expressed in terms of
quality JNDs, whereas the same is not true of attribute JNDs. Therefore, it is often highly desirable to obtain results
expressed in quality JNDs, even if the stimuli being assessed are univariate in nature. This can be accomplished if test
stimuli are rated against a series of appropriately calibrated reference stimuli, as in the quality ruler method.
3.15
quality ruler method
psychophysical method that involves quality or attribute assessment of a test stimulus against a series of
ordered, univariate reference stimuli that differ by known numbers of JNDs
NOTE The quality ruler method is described in more detail in ISO 20462-3.
3.16
rank ordering method
psychophysical method involving the arrangement by an observer of a series of stimuli in order of increasing
or decreasing image quality or an attribute thereof, in accordance with the set of instructions provided
3.17
reference stimulus
image provided to the observer for the purpose of anchoring or calibrating the perceptual assessments of test
stimuli in such a manner that the given ratings may be converted to JND units
NOTE The plural is reference stimuli.
3.18
scene
content or subject matter of an image, or a starting image from which multiple stimuli may be produced
through different experimental treatments
NOTE Typically, stimuli depicting the same scene are compared in a psychophysical experiment, because it is the
effect of the treatment that is of interest, and differences in image content could cause spurious effects. In cases where
scene content is not matched, a number of scenes should be used so that scene effects may be expected to average out.
3.19
standard quality scale
SQS
fixed numerical scale of quality having the following properties:
a) the numerical scale is anchored against physical standards;
b) a one unit increase in scale value corresponds to an improvement of one JND of quality; and
c) a value of zero corresponds to an image having so little information content that the nature of the subject
of the image is difficult to identify
NOTE The standard quality scale is described in more detail in ISO 20462-3.
3.20
stimulus
image presented or provided to the observer either for the purpose of anchoring a perceptual assessment (a
reference stimulus) or for the purpose of subjective evaluation (a test stimulus)
NOTE The plural is stimuli.
3.21
suppression
perceptual effect in which one attribute is present in a degree that seriously degrades image quality and
thereby reduces the impact that other attributes have on overall quality, compared to the impact they would
have had in the absence of the dominant attribute
NOTE To generate reference stimuli that are separated by a specified number of JNDs based on variations in one
attribute, it will be necessary to ensure that other attributes do not significantly suppress the impact of the attribute varied.
3.22
test stimulus
image presented to the observer for subjective evaluation
NOTE The plural is test stimuli.
3.23
treatment
controlled or characterized source of the variations between test stimuli (excluding scene content) that are to
be investigated in a psychophysical experiment
EXAMPLE Examples of treatments include different image processing algorithms, variations in capture or display
device properties, changes in image capture conditions (e.g. camera exposure), etc.
NOTE Different treatments may be achieved through hardware or software changes, or may be numerical
simulations of such effects. Typically, a series of treatments is applied to multiple scenes, each generating a series of test
stimuli. The effect of the treatment may then be determined by averaging the results over scene and observer to improve
signal to noise and reduce the likelihood of systematic bias.
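A minimal sketch of the averaging described in the note, assuming each result is a rating already expressed in JNDs and keyed by treatment, scene, and observer. The data layout and function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_by_treatment(ratings):
    """Average ratings over scenes and observers for each treatment.

    ratings maps (treatment, scene, observer) tuples to a numeric rating.
    Returns a dict mapping each treatment to its mean rating.
    """
    grouped = defaultdict(list)
    for (treatment, _scene, _observer), value in ratings.items():
        grouped[treatment].append(value)
    return {treatment: mean(values) for treatment, values in grouped.items()}
```

A plain mean weights every scene and observer equally only when the design is balanced, that is, when each treatment has been rated for the same scenes by the same observers.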
3.24
triplet comparison
psychophysical method that involves the simultaneous scaling of three test stimuli with respect to image
quality or an attribute thereof, in accordance with a set of instructions given to the observer
NOTE The triplet comparison method is described in more detail in ISO 20462-2.
3.25
univariate
describing a series of test or reference stimuli that vary only in a single attribute of image quality
4 Specification of the experimental conditions and results
4.1 Observer characteristics
Observers shall be free of any personal involvement with the design of the psychophysical experiment or the
generation of, or subject matter depicted by, the test stimuli.
Observers shall be checked for normal vision characteristics insofar as they affect their ability to carry out the
assessment task. In most cases, observers should be confirmed to have normal colour vision and should be
tested for visual acuity at approximately the viewing distance employed in the psychophysical experiment.
The number of observers participating in an experiment shall be reported. If the data of any observers are
omitted from the analysis because of indications of difficulties with the task, the number omitted, and the
criteria upon which exclusion was based, shall be reported. The percentage of observers excluded should not
exceed 15 %. At least 10 observers shall (and preferably 20 should) contribute data to the analysis. Criteria
for selection of observers, and notable characteristics of the observer group as a whole, should also be
reported.
NOTE Examples of information regarding the observer group that might be reported include demographic data, level
of experience in image evaluation, technical training if pertinent to a particular imaging application, etc.
4.2 Stimulus properties
The number of distinct scenes represented in the test stimuli shall be reported and shall equal or exceed 3
scenes (and preferably should equal or exceed 6 scenes). If fewer than 6 scenes are used, each shall
preferably be depicted or, alternatively, briefly described, particularly with regard to properties that might influence
the importance or obviousness of the stimulus differences.
The nature of the variation (other than scene content) among the test stimuli shall be described in both
subjective terms (image quality attributes) and objective terms (stimulus treatment or generation). Other
properties of the stimuli or their generation that might be expected to influence the results obtained even if
present at a constant level in the test stimuli should also be reported. If reference stimuli are provided to the
observer, as in a quality ruler experiment, their pedigree shall be specified in accordance with ISO 20462-3.
NOTE Examples of stimulus properties or aspects of their generation that might affect the outcome of an experiment
even if they were invariant include presence of serious artefacts that might cause suppression, and application of image
processing steps that could amplify certain types of signal or noise.
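The numerical limits stated in 4.1 and 4.2 can be collected into a simple planning check, sketched below. The function name and return layout are hypothetical, and the exclusion percentage is assumed to be taken relative to all participating observers, which the text does not state explicitly.

```python
def check_design(n_participating, n_excluded, n_scenes):
    """Return (violations, recommendations) for the limits in 4.1 and 4.2.

    violations lists unmet "shall" requirements; recommendations lists
    unmet "should" provisions.
    """
    if n_participating <= 0 or not 0 <= n_excluded <= n_participating:
        raise ValueError("invalid observer counts")
    n_contributing = n_participating - n_excluded
    violations, recommendations = [], []
    if n_contributing < 10:
        violations.append("fewer than 10 observers contribute data")
    elif n_contributing < 20:
        recommendations.append("fewer than 20 observers contribute data")
    # Assumption: percentage is relative to all participating observers.
    if 100.0 * n_excluded / n_participating > 15.0:
        recommendations.append("more than 15 % of observers excluded")
    if n_scenes < 3:
        violations.append("fewer than 3 scenes")
    elif n_scenes < 6:
        recommendations.append("fewer than 6 scenes")
    return violations, recommendations
```

Such a check covers only the countable requirements; the reporting obligations (exclusion criteria, scene descriptions, stimulus treatment) still have to be documented separately.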
4.3 Instructions to the observer
The instructions shall state what is to be evaluated by the observer and shall describe the mechanics of the
experimental procedure. If the test stimuli vary only in the degree of a single artefactual attribute, and there
are no calibrated reference stimuli presented to the observer, then the instructions shall direct the observer to
evaluate the attribute varied, rather than to evaluate overall quality. A small set of preview images showing the
range of stimulus variations should be shown to observers before they begin their evaluations, and the
differences between the preview images should be explained.
The task assigned to the observer shall be reported, making clear whether evaluation of overall quality or an
attribute thereof was requested, and specifying which psychophysical method was used. The extent to which
the nature of the variation of the stimuli was demonstrated and explained to the observers shall also be
reported.
NOTE 1 There are various viewpoints regarding the extent to which the instructions should identify the variations in
stimuli to be presented to the observer. One danger in not identifying the attributes being varied is that an observer may
fail to recognize the nature of the stimulus differences until a particularly obvious example of an attribute is encountered,
causing a transition from a state of insensitivity to one of sensitivity. Because the goal of most investigations is to
determine responses at a steady state or equilibrium condition, rather than to characterize transient behaviour, dramatic
changes in observer perception in the middle of an experiment are normally undesirable.
NOTE 2 If stimuli are univariate or vary only in a small number of attributes, merely asking the observer to assess the
overall quality of the stimuli does not guarantee that the observer will make the desired value judgement rather than
evaluating the attribute appearances in an artificial and analytical manner, potentially leading to results in units that are
intermediate between attribute and quality JNDs. A helpful tactic in such cases is to ask the observer to imagine that the
image represents a personally treasured moment, and to compare images (whether test and reference stimuli or multiple
test stimuli) based on which they would prefer to own, if they could have only one. This approach may help to place the
observer in the proper frame of mind to assess overall quality.
4.4
...
SLOVENIAN STANDARD SIST ISO 20462-1:2011
01 July 2011
Language: en
This Slovenian standard is identical to: ISO 20462-1:2005
ICS: 37.040.01 Photography in general
Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.
Reference numberISO 20462-1:2005(E)© ISO 2005
INTERNATIONAL STANDARD ISO20462-1First edition2005-11-01Photography — Psychophysical experimental methods for estimating image quality — Part 1: Overview of psychophysical elements Photographie — Méthodes psychophysiques expérimentales pour estimer la qualité d'image — Partie 1: Aperçu général des éléments psychophysiques
SIST ISO 20462-1:2011
ISO 20462-1:2005(E) PDF disclaimer This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area. Adobe is a trademark of Adobe Systems Incorporated. Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below.
©
ISO 2005 All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester. ISO copyright office Case postale 56 • CH-1211 Geneva 20 Tel.
+ 41 22 749 01 11 Fax
+ 41 22 749 09 47 E-mail
copyright@iso.org Web
www.iso.org Published in Switzerland
ii
© ISO 2005 – All rights reserved
SIST ISO 20462-1:2011
ISO 20462-1:2005(E) © ISO 2005 – All rights reserved
Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Specification of the experimental conditions and results
4.1 Observer characteristics
4.2 Stimulus properties
4.3 Instructions to the observer
4.4 Viewing conditions
4.5 Experimental duration
4.6 Results
4.7 Summary of reported quantities
Annex A (informative) Selection of an appropriate psychophysical method
Annex B (informative) Stimulus differences, paired comparison proportions, and JNDs
Annex C (informative) Example of a report of a psychophysical experiment
Annex D (informative) Comparison of selected psychometric methods
Bibliography
Foreword

ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 20462-1 was prepared by Technical Committee ISO/TC 42, Photography.

ISO 20462 consists of the following parts, under the general title Photography - Psychophysical experimental methods for estimating image quality:
- Part 1: Overview of psychophysical elements
- Part 2: Triplet comparison method
- Part 3: Quality ruler method
Introduction

There are many circumstances under which it is desirable to quantify image quality in a standardized fashion that facilitates interpretation of results within a given experiment and/or comparison of results between different experiments. Such information can be of value in assessing the performance of different capture or display devices, image processing algorithms, etc. under various conditions.

There are a number of psychometric methods described in the literature, such as paired comparison, rank ordering, categorical sort, and magnitude estimation, which might be considered as candidates for experimentally measuring image quality. Several textbooks[1] [3] [4] [5] [9] [12] have reviewed these and other methods and have discussed associated data reduction techniques, which usually are based upon the approach of Thurstone[11] or analogous reasoning. However, the choice of the best method for a particular application may be difficult to make, and interpretation of the rating scales produced by the numerical analyses is frequently ambiguous. Furthermore, none of the commonly used techniques provides an efficient mechanism for calibration of the results against a standardised numerical scale or associated physical references, which is desirable when results of different experiments are to be compared or integrated. The value of new calibrated psychometric methods in developing comprehensive models of imaging system quality has been demonstrated in a recent work[6] that contains more detailed discussions of many of the informative topics superficially considered herein.

The three parts of ISO 20462 address the need for documented means of determining image quality in a calibrated fashion. Part 1 provides an overview of practical psychophysics; specific experimental methods and associated data reduction techniques are described in Part 2 (triplet comparison method[8] [10]) and Part 3 (quality ruler method[6]). Informative Annex A aids in identifying the better choice between the two alternative methods of Parts 2 and 3, which are complementary and together are sufficient to span a wide range of applications. It is the intent of these methods to produce results that are not merely directional in nature, but are expressed in terms of relative or fixed scales that are calibrated in just noticeable differences (JNDs), so that the significance of experimentally measured stimulus differences is readily ascertained.
1 Scope

This part of ISO 20462 is part of a multiple-part standard pertaining to the subjective evaluation of pictorial still image quality. This part of ISO 20462
a) defines the units by which image quality is quantified (just noticeable differences, or JNDs);
b) describes the influence of stimulus properties, observer characteristics, and task instructions on results obtained from rating experiments;
c) provides a flow chart for choosing the preferred psychophysical method for determining image quality from among those defined in subsequent parts of ISO 20462.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

ISO 3664, Viewing conditions - Graphic technology and photography

3 Terms and definitions

For the purpose of this document, the following terms and definitions apply:

3.1
artefactual attribute
attribute of image quality that, when evident in an image, nearly always leads to a loss of overall image quality

EXAMPLE Examples of artefactual attributes include noise and aliasing.

NOTE The commonly used terms defect and impairment are similar in meaning.

3.2
attribute
aspect, dimension, or component of overall image quality

cf. artefactual attribute (3.1) and preferential attribute (3.12)

EXAMPLE Examples of image quality attributes include image structure properties such as sharpness and noise; colour and tone reproduction properties such as contrast, colour balance, and relative colourfulness; and digital artefacts such as aliasing, contouring, and compression defects.

3.3
attribute just noticeable difference
attribute JND
measure of the detectability of appearance variations, corresponding to a stimulus difference that leads to a 75:25 proportion of responses in a paired comparison task in which univariate stimuli pairs are assessed in terms of a single attribute identified in the instructions

cf. quality JND (3.14)

NOTE 1 As an example, a paired comparison identifying the sharper of two stimuli that differ only in their generating system modulation transfer function (MTF), would yield results in terms of sharpness attribute JNDs. If the MTF curves differed monotonically and did not cross, the outcome of the paired comparison would depend primarily upon the observers’ ability to detect changes in the appearance of the stimuli as a function of MTF variations, with little or no value judgement required of the observers. The relationship between paired comparison proportions and stimulus differences is discussed in greater detail in Annex B.

NOTE 2 If observers are instead asked to choose which of a pair of stimuli is higher in overall image quality, and if the stimuli in aggregate are multivariate, such that the observer should make value judgements of the importance of a number of attributes, rather than focussing on one aspect of image appearance, it is observed experimentally that larger objective stimulus differences (for example, MTF changes) are required to obtain a 75:25 proportion of responses, which in this case corresponds to a quality JND.

NOTE 3 A JND is a statistical quantity, derived from a number of observations. An observer assessing a single pair of images differing by one attribute JND is unlikely to be confident that he or she has detected the sample difference. A stimulus difference of approximately three JNDs is usually needed for an observer of average sensitivity to feel reasonably certain of his or her assessment.

3.4
categorical sort method
psychophysical method involving the classification of a stimulus into one of several ordered categories, at least some of which are identified by adjectives or phrases that describe different levels of image quality or attributes thereof

NOTE The application of adjectival descriptors is strongly affected by the range of stimuli presented, so that it is difficult to compare the results of one categorical sort experiment to another. Range effects and the coarse quantization of categorical sort experiments also hinder conversion of the responses to JND units. Given these limitations, it is not possible to unambiguously map adjectival descriptors to JND units, but it is worth noting that in some experiments where a broad range of stimuli have been presented, the categories excellent, very good, good, fair, poor, and not worth keeping have been found to provide very roughly comparable intervals that average about six quality JNDs in width.

3.5
image quality
impression of the overall merit or excellence of an image, as perceived by an observer neither associated with the act of photography, nor closely involved with the subject matter depicted

NOTE The purpose of defining image quality in terms of third-party (uninvolved) observers is to eliminate sources of variability that arise from more idiosyncratic aspects of image perception and pertain to attributes outside the control of imaging system designers.

3.6
instructions
set of directions given to the observer for performing the psychophysical evaluation task

3.7
just noticeable difference
JND
stimulus difference that leads to a 75:25 proportion of responses in a paired comparison task

cf. attribute JND (3.3) and quality JND (3.14)
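The definition of the JND in 3.7 ties the unit to a 75:25 response proportion. The following is a minimal sketch of how a measured proportion might be converted to JNDs. It assumes a normal (Thurstone-style) response model, which is an illustrative assumption and not necessarily the transformation given in Annex B. The function names are hypothetical.

```python
from statistics import NormalDist

# Assumption: responses follow a normal (Thurstone-style) model, so the
# z-score of the choice proportion is rescaled so that 75:25 gives 1 JND.
# This is an illustrative model, not necessarily the one used in Annex B.
_STD_NORMAL = NormalDist()
_Z75 = _STD_NORMAL.inv_cdf(0.75)  # z-score of the 75 % point, about 0.6745


def proportion_to_jnds(p: float) -> float:
    """Convert a paired comparison proportion to a JND difference."""
    if not 0.0 < p < 1.0:
        # Unanimous responses (0 or 1) are saturated and cannot be converted.
        raise ValueError("proportion must lie strictly between 0 and 1")
    return _STD_NORMAL.inv_cdf(p) / _Z75


def jnds_to_proportion(jnds: float) -> float:
    """Inverse mapping: expected proportion for a given JND difference."""
    return _STD_NORMAL.cdf(jnds * _Z75)
```

By construction, a proportion of 0.75 maps to one JND and 0.50 maps to zero. Under this assumed model a difference of three JNDs corresponds to a proportion of roughly 98:2, which is in line with the remark in 3.3 NOTE 3 that about three JNDs are needed for an observer to feel reasonably certain.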
3.8
magnitude estimation method
psychophysical method involving the assignment of a numerical value to each test stimulus that is proportional to image quality; typically, a reference stimulus with an assigned numerical value is present to anchor the rating scale

NOTE The numerical scale resulting from a magnitude estimation experiment is usually assumed to constitute a ratio scale, which, ideally, is a scale in which a constant percentage change in value corresponds with one JND. In practice, modest deviations from this behaviour occur, complicating the transformation of the rating scale into units of JNDs without inclusion of unidentified reference stimuli (having known quality) among the test stimuli.

3.9
multivariate
describing a series of test or reference stimuli that vary in multiple attributes of image quality

3.10
observer
individual performing the subjective evaluation task in a psychophysical method

3.11
paired comparison method
psychophysical method involving the choice of which of two simultaneously presented stimuli exhibits greater or lesser image quality or an attribute thereof, in accordance with a set of instructions given to the observer

NOTE Two limitations of the paired comparison method are as follows.
a) If all possible stimulus comparisons are done, as is usually the case, a large number of assessments are required for even modest numbers of experimental stimulus levels [if N levels are to be studied, N(N − 1)/2 paired comparisons are needed].
b) If a stimulus difference exceeds approximately 1,5 JNDs, the magnitude of the stimulus difference cannot be directly estimated reliably because the response saturates as the proportions approach unanimity. However, if a series of stimuli having no large gaps are assessed, the differences between more widely separated stimuli may be deduced indirectly by summing smaller, reliably determined (unsaturated) stimulus differences. The standard methods for transformation of paired comparison data to an interval scale (a scale linearly related to JNDs) perform statistically optimized procedures for inferring the stimulus differences, but they may yield unreliable results when saturated responses are included in the analysis.

3.12
preferential attribute
attribute of image quality that is invariably evident in an image, and for which the preferred degree is a matter of opinion, depending upon both the observer and the image content

EXAMPLE Examples of preferential image quality attributes include colour and tone reproduction properties such as contrast and relative colourfulness. Because the perceived quality associated with a preferential attribute is dependent upon both the observer and image content, in studies involving variations of preferential attributes, particular care is needed in the selection of representative sets of stimuli and groups of observers.

NOTE The term noticeable in just noticeable difference is not linguistically strictly correct when applied to a preferential attribute, but is nonetheless retained in this part of ISO 20462 for convenience. For example, the higher contrast stimulus of a pair differing only in contrast might be readily identified by all observers, whereas there might be a lack of consensus regarding which of the two images was higher in overall image quality. Nonetheless, if the responses from the paired comparison for quality were in the proportion of 75:25, the image chosen more frequently would be said to be one JND higher in quality. The JND is best regarded as a measurement unit tied to the predicted or measured outcome of a paired comparison.

3.13
psychophysical method
experimental technique for subjective evaluation of image quality or attributes thereof, from which stimulus differences in units of JNDs may be estimated

cf. categorical sort (3.4), magnitude estimation (3.8), paired comparison (3.11), quality ruler (3.15), rank ordering (3.16) and triplet comparison methods (3.24)

3.14
quality just noticeable difference
quality JND
measure of the significance or importance of quality variations, corresponding to a stimulus difference that leads to a 75:25 proportion of responses in a paired comparison task in which multivariate stimuli pairs are assessed in terms of overall image quality

NOTE 1 See Notes for attribute JND (3.3).

NOTE 2 The attribute JND is a measure of detectability of appearance changes, whereas the quality JND is a measure of significance or importance of stimulus differences in terms of their impact on quality. An attribute JND is a useful unit for predicting how observers would react to an advertisement showing images carefully matched in all respects but one, and drawing the attention of the observer to the attribute varying. In contrast, a quality JND is useful for predicting how observers would perceive overall quality as a function of one or more stimulus variations, and so is a more useful quantity in optimizing imaging system design, where different attributes should be balanced against one another. The overall quality of an image may be predicted from a knowledge of the impact of each attribute in isolation, expressed in terms of quality JNDs, whereas the same is not true of attribute JNDs. Therefore, it is often highly desirable to obtain results expressed in quality JNDs, even if the stimuli being assessed are univariate in nature. This can be accomplished if test stimuli are rated against a series of appropriately calibrated reference stimuli, as in the quality ruler method.
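The two limitations noted for the paired comparison method (3.11) lend themselves to a short illustration: the N(N − 1)/2 growth in the number of comparisons, and the recovery of large differences by summing small, unsaturated ones. The sketch below uses hypothetical stimulus labels and made-up adjacent differences. The simple cumulative sum stands in for the statistically optimized procedures mentioned in the note and is not one of them.

```python
from itertools import combinations


def count_pairs(n_levels: int) -> int:
    """Number of comparisons when every pair of N levels is assessed."""
    return n_levels * (n_levels - 1) // 2


def chain_differences(adjacent_jnds: list[float]) -> list[float]:
    """Cumulative scale positions from adjacent-step differences (in JNDs)."""
    positions = [0.0]
    for step in adjacent_jnds:
        positions.append(positions[-1] + step)
    return positions


levels = ["A", "B", "C", "D", "E"]  # hypothetical stimulus labels
all_pairs = list(combinations(levels, 2))
assert len(all_pairs) == count_pairs(len(levels))  # 10 pairs for 5 levels

# Hypothetical adjacent differences, each below the saturation limit of
# approximately 1.5 JNDs quoted in 3.11.
scale = chain_differences([0.8, 1.1, 0.9, 1.2])
# The A-to-E difference (about 4 JNDs) is too large to measure from a
# single pair, but follows from summing the four unsaturated steps.
```

Doubling the number of levels from 5 to 10 raises the count from 10 to 45 comparisons, which is the practical motivation for the more efficient methods described in the later parts of ISO 20462.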
3.15
quality ruler method
psychophysical method that involves quality or attribute assessment of a test stimulus against a series of ordered, univariate reference stimuli that differ by known numbers of JNDs

NOTE The quality ruler method is described in more detail in ISO 20462-3.

3.16
rank ordering method
psychophysical method involving the arrangement by an observer of a series of stimuli in order of increasing or decreasing image quality or an attribute thereof, in accordance with the set of instructions provided

3.17
reference stimulus
image provided to the observer for the purpose of anchoring or calibrating the perceptual assessments of test stimuli in such a manner that the given ratings may be converted to JND units

NOTE The plural is reference stimuli.

3.18
scene
content or subject matter of an image, or a starting image from which multiple stimuli may be produced through different experimental treatments

NOTE Typically, stimuli depicting the same scene are compared in a psychophysical experiment, because it is the effect of the treatment that is of interest, and differences in image content could cause spurious effects. In cases where scene content is not matched, a number of scenes should be used so that scene effects may be expected to average out.

3.19
standard quality scale
SQS
fixed numerical scale of quality having the following properties:
a) the numerical scale is anchored against physical standards;
b) a one unit increase in scale value corresponds to an improvement of one JND of quality; and
c) a value of zero corresponds to an image having so little information content that the nature of the subject of the image is difficult to identify

NOTE The standard quality scale is described in more detail in ISO 20462-3.

3.20
stimulus
image presented or provided to the observer either for the purpose of anchoring a perceptual assessment (a reference stimulus) or for the purpose of subjective evaluation (a test stimulus)

NOTE The plural is stimuli.

3.21
suppression
perceptual effect in which one attribute is present in a degree that seriously degrades image quality and thereby reduces the impact that other attributes have on overall quality, compared to the impact they would have had in the absence of the dominant attribute

NOTE To generate reference stimuli that are separated by a specified number of JNDs based on variations in one attribute, it will be necessary to ensure that other attributes do not significantly suppress the impact of the attribute varied.

3.22
test stimulus
image presented to the observer for subjective evaluation

NOTE The plural is test stimuli.

3.23
treatment
controlled or characterized source of the variations between test stimuli (excluding scene content) that are to be investigated in a psychophysical experiment

EXAMPLE Examples of treatments include different image processing algorithms, variations in capture or display device properties, changes in image capture conditions (e.g. camera exposure), etc.

NOTE Different treatments may be achieved through hardware or software changes, or may be numerical simulations of such effects. Typically, a series of treatments is applied to multiple scenes, each generating a series of test stimuli. The effect of the treatment may then be determined by averaging the results over scene and observer to improve signal to noise and reduce the likelihood of systematic bias.
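The averaging over scene and observer described in the NOTE to 3.23 can be sketched as follows. The treatment names, scene names, observer labels, and rating values are all hypothetical, and the plain mean assumes a balanced design in which every observer rates every treatment on every scene.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings in JNDs, keyed by (treatment, scene, observer).
ratings = {
    ("algorithm_a", "portrait", "obs1"): 1.0,
    ("algorithm_a", "portrait", "obs2"): 2.0,
    ("algorithm_a", "landscape", "obs1"): 3.0,
    ("algorithm_a", "landscape", "obs2"): 2.0,
    ("algorithm_b", "portrait", "obs1"): 4.0,
    ("algorithm_b", "portrait", "obs2"): 5.0,
    ("algorithm_b", "landscape", "obs1"): 3.0,
    ("algorithm_b", "landscape", "obs2"): 4.0,
}


def treatment_means(data: dict) -> dict:
    """Average each treatment's ratings over all scenes and observers."""
    by_treatment = defaultdict(list)
    for (treatment, _scene, _observer), value in data.items():
        by_treatment[treatment].append(value)
    # A plain mean is only appropriate for a balanced design (assumption).
    return {name: mean(values) for name, values in by_treatment.items()}


effects = treatment_means(ratings)
difference = effects["algorithm_b"] - effects["algorithm_a"]
```

With these made-up numbers, the individual ratings for the two treatments overlap (both include a value of 3.0), yet the averaged effects separate cleanly, which is the signal-to-noise benefit the note refers to.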
3.24
triplet comparison
psychophysical method that involves the simultaneous scaling of three test stimuli with respect to image quality or an attribute thereof, in accordance with a set of instructions given to the observer

NOTE The triplet comparison method is described in more detail in ISO 20462-2.

3.25
univariate
describing a series of test or reference stimuli that vary only in a single attribute of image quality

4 Specification of the experimental conditions and results

4.1 Observer characteristics

Observers shall be free of any personal involvement with the design of the psychophysical experiment or the generation of, or subject matter depicted by, the test stimuli.

Observers shall be checked for normal vision characteristics insofar as they affect their ability to carry out the assessment task. In most cases, observers should be confirmed to have normal colour vision and should be tested for visual acuity at approximately the viewing distance employed in the psychophysical experiment.

The number of observers participating in an experiment shall be reported. If the data of any observers are omitted from the analysis because of indications of difficulties with the task, the number omitted, and the criteria upon which exclusion was based, shall be reported. The percentage of observers excluded should not exceed 15 %. At least 10 observers shall (and preferably 20 should) contribute data to the analysis.

Criteria for selection of observers, and notable characteristics of the observer group as a whole, should also be reported.

NOTE Examples of information regarding the observer group that might be reported include demographic data, level of experience in image evaluation, technical training if pertinent to a particular imaging application, etc.

4.2 Stimulus properties

The number of distinct scenes represented in the test stimuli shall be reported and shall equal or exceed 3 scenes (and preferably should equal or exceed 6 scenes). If fewer than 6 scenes are used, each shall be preferably depicted or alternatively briefly described, particularly with regard to properties that might influence the importance or obviousness of the stimulus differences.

The nature of the variation (other than scene content) among the test stimuli shall be described in both subjective terms (image quality attributes) and objective terms (stimulus treatment or generation). Other properties of the stimuli or their generation that might be expected to influence the results obtained even if present at a constant level in the test stimuli should also be reported. If reference stimuli are provided to the observer, as in a quality ruler experiment, their pedigree shall be specified in accordance with ISO 20462-3.

NOTE Examples of stimulus properties or aspects of their generation that might affect the outcome of an experiment even if they were invariant include presence of serious artefacts that might cause suppression, and application of image processing steps that could amplify certain types of signal or noise.

4.3 Instructions to the observer

The instructions shall state what is to be evaluated by the observer and shall describe the mechanics of the experimental procedure. If the test stimuli vary only in the degree of a single artefactual attribute, and there are no calibrated reference stimuli presented to the observer, then the instructions shall direct the observer to evaluate the attribute varied, rather than to evaluate overall quality.

A small set of preview images showing the range of stimulus variations should be shown to observers before they begin their evaluations, and the differences between the preview images should be explained.

The task assigned to the observer shall be reported, making clear whether evaluation of overall quality or an attribute thereof was requested, and specifying which psychophysical method was used. The extent to which the nature of the variation of the stimuli was demonstrated and explained to the observers shall also be reported.

NOTE 1 There are various viewpoints regarding the extent to which the instructions should identify the variations in stimuli to be presented to the observer. One danger in not identifying the attributes being varied is that an observer may fail to recognize the nature of the stimulus differences until a particularly obvious example of an attribute is encountered, causing a transition from a state of insensitivity to one of sensitivity. Because the goal of most investigations is to determine responses at a steady state or equilibrium condition, rather than to characterize transient behaviour, dramatic changes in observer perception in the middle of an experiment are normally undesirable.

NOTE 2 If stimuli are univariate or vary only in a small number of attributes, merely asking the observer to assess the overall quality of the stimuli does not guarantee that the observer will make the desired value judgement rather than evaluating the attribute appearances in an artificial and analytical manner, potentially leading to results in units that are intermediate between attribute and quality JNDs. A helpful tactic in such cases is to ask the observer to imagine that the image represents a personally treasured moment, and to compare images (whether test and reference stimuli or multiple test stimuli) based on which they would prefer to own, if they could have only one. This approach may help to place the observer in the proper frame of mind to assess overall quality.
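The numerical limits stated in 4.1 and 4.2 (observer counts, exclusion percentage, scene counts) can be checked mechanically when preparing a report. The sketch below is one possible arrangement, not a prescribed procedure. The class and function names are hypothetical, and it assumes the exclusion percentage is taken relative to all participating observers (analysed plus excluded), a denominator the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class ExperimentSummary:
    """Hypothetical record of the counts that 4.1 and 4.2 require reporting."""
    observers_analysed: int
    observers_excluded: int
    scenes: int


def check_requirements(summary: ExperimentSummary) -> list[str]:
    """Return findings against the limits in 4.1 and 4.2 (empty if none)."""
    findings = []
    if summary.observers_analysed < 10:
        findings.append("requirement: at least 10 observers shall contribute data")
    elif summary.observers_analysed < 20:
        findings.append("recommendation: preferably 20 observers should contribute data")

    # Assumption: percentage is relative to analysed plus excluded observers.
    total = summary.observers_analysed + summary.observers_excluded
    if total > 0 and 100.0 * summary.observers_excluded / total > 15.0:
        findings.append("recommendation: excluded observers should not exceed 15 %")

    if summary.scenes < 3:
        findings.append("requirement: at least 3 scenes shall be represented")
    elif summary.scenes < 6:
        findings.append("recommendation: preferably 6 scenes; depict or describe each scene used")
    return findings
```

Findings are labelled "requirement" or "recommendation" to mirror the distinction between "shall" and "should" in the text, so a report can fail on the former while merely noting the latter.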
4.4 Viewing conditions

Viewing conditions shall be consistent with ISO 3664 except for the following relaxed criteria.
a) For print viewing, the illuminance level shall
...