ISO/IEC TR 29170-1:2017
Information technology - Advanced image coding and evaluation - Part 1: Guidelines for image coding system evaluation
ISO/IEC TR 29170-1:2017 recommends best practices for coding system evaluation of images and image sequences. ISO/IEC TR 29170-1:2017 defines a common vocabulary of terms for coding system evaluation and divides evaluation methods into three broad categories: a) subjective assessment; b) objective assessment; c) computational assessment. In addition to these broad assessment categories, this document discusses special care that is given for coding unusual imagery, e.g. high dynamic range or high colour depth. A fourth assessment category, hardware complexity, is often important for real-time or computationally complex applications; however, it is outside the scope of this document.
Technologies de l'information — Codage d'image avancé et évaluation — Partie 1: Lignes directrices pour l'évaluation des systèmes de codage d'image
General Information
Overview
ISO/IEC TR 29170-1:2017 provides guidelines and best practices for evaluating image coding systems (codecs) for images and image sequences. The technical report defines a common vocabulary, classifies evaluation methods, and recommends procedures for testing image quality and codec behavior. It addresses ordinary and unusual imagery (for example, high dynamic range (HDR) and high colour depth) while explicitly excluding hardware complexity assessment.
Key topics
- Evaluation categories:
- Subjective assessment (human observers, mean opinion score - MOS)
- Objective assessment (computational image-quality metrics)
- Computational assessment (algorithmic/complexity measures)
- Note: hardware complexity (real-time performance, implementation cost) is outside the scope.
- Terminology & definitions: common terms such as codec, channel, component bit depth, pixel, quality loss, generational quality loss, drift, constant/variable bit rate.
- Test image selection & characteristics: guidance on selecting representative images and describing image properties (channels, sample precision, resolution).
- Compression performance metrics: bits per pixel (bpp, defined as 8·L / (w·h)), compression ratio (CR), and considerations for variable vs. constant bit-rate systems.
- Subjective testing best practices: observer selection (expert vs non-expert), number of observers, visual acuity, instructions, evaluation scales, viewing conditions (references to ISO 3664 and ISO 9241), and statistical analysis.
- Objective & computational metrics: reference to common measures and algorithms (examples found in annexes: PSNR, MSE, SSIM/MS-SSIM, CIEDE2000, HDR-VDP and other visual difference predictors).
- Special cases: handling HDR, high colour depth imagery, error resilience, recursive compression effects and verification of codec characteristics (see informative annexes A–D).
Applications and who uses it
ISO/IEC TR 29170-1 is useful for:
- Codec developers and researchers - to design and validate compression algorithms and measure visual quality.
- Image processing and multimedia QA teams - to establish reproducible test procedures and pass/fail criteria.
- Broadcast, streaming and imaging device manufacturers - to evaluate perceptual quality under different bit rates and content types including HDR.
- Standards bodies and test-lab operators - to harmonize vocabulary, test image selection, and assessment approaches.
- Tool vendors - to implement objective and computational metrics consistent with standard guidance.
Related standards
- ISO 3664 (viewing conditions for image evaluation)
- ISO 9241 (ergonomics of human-system interaction / display requirements)
- Referenced metrics and codec definitions tie into broader JTC 1 / SC 29 image coding work.
Standards Content (Sample)
TECHNICAL REPORT ISO/IEC TR 29170-1
First edition
2017-10
Information technology — Advanced image coding and evaluation —
Part 1: Guidelines for image coding system evaluation
© ISO/IEC 2017, Published in Switzerland
All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized otherwise in any form
or by any means, electronic or mechanical, including photocopying, or posting on the internet or an intranet, without prior
written permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of
the requester.
ISO copyright office
Ch. de Blandonnet 8 • CP 401
CH-1214 Vernier, Geneva, Switzerland
Tel. +41 22 749 01 11
Fax +41 22 749 09 47
copyright@iso.org
www.iso.org
Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviated terms
5 Selection and characteristics of test images
5.1 Common image characteristics
5.2 Bits per pixel
5.3 Compression ratio
5.4 Variation in bit rates
5.4.1 Constant bit rate systems
5.4.2 Variable bit rate systems
5.5 Error resilience
5.6 Recursive compression assessment
5.7 Image selection
6 Best practices of subjective image quality assessments
6.1 Goals of subjective assessment
6.2 Subjective assessment evaluation procedures
6.2.1 Observer selection
6.2.2 Visual acuity
6.2.3 Number of observers
6.2.4 Instructions to observers
6.2.5 Evaluation scales
6.2.6 Statistical analysis
6.3 Viewing conditions for electronic displays
6.3.1 Purpose
6.3.2 ISO 3664
6.3.3 ISO 9241
6.4 Goals for evaluation of visually lossless and nearly lossless coding
7 Best practices of objective image quality assessment methodology
Annex A (informative) Subjective metrics
Annex B (informative) Objective metrics
Annex C (informative) Computational metrics
Annex D (informative) Verification of codec characteristics
Bibliography
Foreword
ISO (the International Organization for Standardization) and IEC (the International Electrotechnical
Commission) form the specialized system for worldwide standardization. National bodies that are
members of ISO or IEC participate in the development of International Standards through technical
committees established by the respective organization to deal with particular fields of technical
activity. ISO and IEC technical committees collaborate in fields of mutual interest. Other international
organizations, governmental and non-governmental, in liaison with ISO and IEC, also take part in the
work. In the field of information technology, ISO and IEC have established a joint technical committee,
ISO/IEC JTC 1.
The procedures used to develop this document and those intended for its further maintenance are
described in the ISO/IEC Directives, Part 1. In particular the different approval criteria needed for
the different types of document should be noted. This document was drafted in accordance with the
editorial rules of the ISO/IEC Directives, Part 2 (see www.iso.org/directives).
Attention is drawn to the possibility that some of the elements of this document may be the subject
of patent rights. ISO and IEC shall not be held responsible for identifying any or all such patent
rights. Details of any patent rights identified during the development of the document will be in the
Introduction and/or on the ISO list of patent declarations received (see www.iso.org/patents).
Any trade name used in this document is information given for the convenience of users and does not
constitute an endorsement.
For an explanation on the voluntary nature of standards, the meaning of ISO specific terms and
expressions related to conformity assessment, as well as information about ISO's adherence to the
World Trade Organization (WTO) principles in the Technical Barriers to Trade (TBT) see the following
URL: www.iso.org/iso/foreword.html.
This document was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology,
Subcommittee SC 29, Coding of audio, picture, multimedia and hypermedia information.
A list of all parts in the ISO/IEC 29170 series can be found on the ISO website.
Introduction
This document provides a framework and best practices to evaluate image compression algorithms.
This document provides a selection of evaluation tools that allow testing multiple features, including objective image quality metrics, subjective image quality metrics and codec algorithmic complexity. Which features of codecs should be tested, and the pass-fail criteria, are beyond the scope of this document.
TECHNICAL REPORT ISO/IEC TR 29170-1:2017(E)
Information technology — Advanced image coding and
evaluation —
Part 1:
Guidelines for image coding system evaluation
1 Scope
This document recommends best practices for coding system evaluation of images and image
sequences. This document defines a common vocabulary of terms for coding system evaluation and
divides evaluation methods into three broad categories:
a) subjective assessment;
b) objective assessment;
c) computational assessment.
In addition to these broad assessment categories, this document discusses special care that is given for
coding unusual imagery, e.g. high dynamic range or high colour depth.
A fourth assessment category, hardware complexity, is often important for real-time or computationally
complex applications; however, it is outside the scope of this document.
2 Normative references
There are no normative references in this document.
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
ISO and IEC maintain terminological databases for use in standardization at the following addresses:
— IEC Electropedia: available at http://www.electropedia.org/
— ISO Online browsing platform: available at http://www.iso.org/obp
3.1
channel
one logical component of an image
Note 1 to entry: A channel may be a direct representation of one component from the bitstream, or may be
generated by the application of a palette to a component from the bitstream.
[SOURCE: ISO/IEC 15444-1:2016, 3.17 – modified to move part of definition into a Note to entry]
3.2
codec
coding system
system comprising a compressor (3.6), a decompressor (3.8) and the compressor's bitstream output is
compatible with the decompressor's bitstream input
3.3
component
two-dimensional array of samples
Note 1 to entry: An image typically consists of several components, for instance, representing red, green, and blue.
[SOURCE: ISO/IEC 15444-1:2016, 3.26 – modified to move part of definition into a Note to entry]
3.4
component bit depth
number of bits of precision of colour channels (or components) of an unencoded image
3.5
component number
number of colour channels (or components) encoded in an image
3.6
compressor
portion of a coding system that has a pixel stream and may have control metadata as its input and a
coded bitstream as its output
3.7
constant bit rate
mode where the number of encoded bits from a portion of an image represented by a fixed number of
pixels (3.16) does not vary compared to the number of encoded bits in any other equally sized portion
of the same image
3.8
decompressor
portion of a codec (coding system) (3.2) that has a coded bitstream as its input and a pixel (3.16) stream
as its output
3.9
drift
net generational loss of image quality if the output of a lossy image compression/reconstruction cycle is
recompressed again under the same conditions by the same codec (3.2)
3.10
expert observer
observer that has expertise in image artefacts that may be introduced by the system under test or who
has designed or participated in the selection of test content for the system under test
3.11
generational quality loss
measure of quality loss (3.17) between a reference image and a reconstruction of the same image after
repetitive generations of encoding and decoding
3.12
horizontal pixel resolution
horizontal extent of the image in image pixels (3.16) where the horizontal extent may depend on the
channel
3.13
idempotent
codec (3.2) that operates losslessly on its own decompression output
3.14
non-expert observer
naïve observer
observer that has no expertise in the image artefacts that may be introduced by the system under test
3.15
objective assessment
computational algorithmic process leading to a numerical score for all or a portion of an image under test
3.16
pixel
smallest element that is capable of generating the full intended functionality, e.g. colour and grey scale,
of the display
Note 1 to entry: In a multicolour display, the smallest addressable element capable of producing the full colour
range or the smallest element that is capable of generating the full functionality of the display.
3.17
quality loss
measure of the difference between a reference image and an encoded and reconstructed representation
of the same image
3.18
sample
one unit of a grey scale or colour where an unencoded image comprises a plurality of these units
3.19
sample precision
bit depth of a given data type encoding the image
3.20
sample type
type of numeric value that contains sample (3.18) values to a resolution specified by sample precision (3.19)
where types can include unsigned integers, signed integers and floating point or fixed point samples
3.21
sub-sample
sample (3.18) where the number of samples in either the horizontal dimension or the vertical dimension
is not equal to the horizontal or vertical image dimension, respectively
3.22
subjective assessment
algorithmic process where recorded observations from human subjects (observers) lead to a numerical
score for all or a portion of an image under test
3.23
variable bit rate
mode where the number of encoded bits in a portion of an image represented by a fixed number of
pixels (3.16) can be different from the number of encoded bits in any other equally sized portion of the
same image
3.24
vertical pixel resolution
vertical extent of the image in pixels (3.16) and the vertical extent may depend on the channel for
subsampled images
4 Abbreviated terms
bpp bits per pixel
CIE International Commission on Illumination
CIEDE2000 CIE colour difference formula
CIELAB CIE – Lab colour space
CIE-XYZ CIE – XYZ colour space
CR compression ratio
CSF contrast sensitivity function
CW-SSIM complex wavelet structural similarity index
DDP degree of data parallelism
HDR high dynamic range
HDR-VDP high dynamic range visual difference predictor
HVS human visual system
JND just noticeable difference
LDR low dynamic range, synonymous with SDR
MOS mean opinion score
MSE mean squared error
MSSIM mean structural similarity index
MS-SSIM multi scale structural similarity index
PSNR peak signal-to-noise ratio
RDP ratio of pixels to data parallelism
S-CIELAB spatial extension to CIELAB
SDR standard dynamic range, synonymous with LDR
SIMD single instruction, multiple data
SSIM structural similarity index
VDM visual discrimination model
VDP visual differences predictor
5 Selection and characteristics of test images
5.1 Common image characteristics
Image selection relies on a common vocabulary for describing image characteristics. This clause defines
this vocabulary and the applicability to testing both standard and high dynamic range images.
For example, integer samples in the range [0, 1023] are here described as ten bit data, regardless of whether the samples are stored in 16 bit values or packed into ten bits each. Integer values in the range [−128, 127] are here classified as 8 bit signed data because the data representation consists of one sign bit and seven magnitude bits.
The image dimension data consists of the full set of data defined above, that is, the number of channels,
the width and height of each image channel, the sample type of each channel and the sample precision
of each channel.
5.2 Bits per pixel
Bits per pixel (bpp) describes the compression performance of image compression codecs independent
of the original image's sample size.
bpp, given in Formula (1), is defined independently of the image sample precision as the size of the
compressed image stream L and the image dimensions, w and h:
bpp = (8 · L) / (w · h)    (1)

where

L is the size of the compressed image stream, in bytes;

w is the image width, in pixels;

h is the image height, in pixels.
5.3 Compression ratio
Compression ratio (CR), given in Formula (2), describes the compression performance of an image coding system dependent on the original image's sample size [6]:

CR = [ Σ(c=0..d−1) b(c) · w(c) · h(c) ] / (8 · L)    (2)
where
d is the number of channels of the image;
w(c) is the horizontal extent of channel c;
h(c) is the vertical extent of channel c;
b(c) is the number of bits of sample precision in the samples of channel c.
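Formulas (1) and (2) can be checked with a short calculation. The helpers below assume nothing beyond the definitions in 5.2 and 5.3: a compressed stream size L in bytes and, for CR, per-channel bit depths and dimensions; the function names are illustrative, not taken from the standard.

```python
def bits_per_pixel(stream_bytes: int, width: int, height: int) -> float:
    """Formula (1): bpp = 8*L / (w*h), independent of sample precision."""
    return 8 * stream_bytes / (width * height)

def compression_ratio(channels, stream_bytes: int) -> float:
    """Formula (2): CR = sum_c b(c)*w(c)*h(c) / (8*L).

    `channels` is a sequence of (bit_depth, width, height) tuples,
    one per channel c = 0 .. d-1.
    """
    original_bits = sum(b * w * h for b, w, h in channels)
    return original_bits / (8 * stream_bytes)

# Example: a 1920x1080 image with three 8-bit channels,
# compressed to 777 600 bytes.
bpp = bits_per_pixel(777_600, 1920, 1080)               # 3.0 bpp
cr = compression_ratio([(8, 1920, 1080)] * 3, 777_600)  # 8:1
```

Note that bpp divides by the pixel count only, so a 10-bit and an 8-bit source with the same dimensions and compressed size yield the same bpp, while CR accounts for the original sample precision.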
5.4 Variation in bit rates
5.4.1 Constant bit rate systems
Constant bit rate systems accept pixels at a constant rate per unit of time and produce output at a matching constant rate, without variation within an image. A test can verify whether any bit rate variation is present. This restriction need not apply between two or more images.
5.4.2 Variable bit rate systems
For some applications, it is important that a coding system is able to generate a continuous stream of
symbols, ensuring that some output is generated at least in every given time span, i.e. that the output bit
rate does not vary too much over time. For example, carry-over resolution in arithmetic coding might
cause arbitrary long delays in the output until the carry can be resolved.
For the purpose of this test, the output bit rate is defined as the number of output symbols generated for each input symbol, measured as a function of the percentage of the input stream fed into the codec.
A measurement procedure to measure bit rate variations appears in Annex D.
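As a rough illustration of such a measurement, the sketch below feeds growing prefixes of an input stream to an encoder and records the cumulative output size at each step; `encode` is a hypothetical stand-in for the codec under test, and the normative procedure remains the one in Annex D.

```python
def bitrate_trace(encode, data: bytes, steps: int = 10):
    """Record cumulative output size as successive fractions of the
    input are fed to an encoder `encode`, which maps an input prefix
    to its coded bitstream. Large jumps between adjacent entries
    indicate bursty output (e.g. deferred carry resolution)."""
    trace = []
    for k in range(1, steps + 1):
        prefix = data[: len(data) * k // steps]
        trace.append((k / steps, len(encode(prefix))))
    return trace

# With an identity "codec", the output grows linearly with the input:
trace = bitrate_trace(lambda b: b, bytes(100), steps=4)
# trace == [(0.25, 25), (0.5, 50), (0.75, 75), (1.0, 100)]
```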
5.5 Error resilience
In modern systems, error resiliency can be assisted by error markers in the bitstream, or it can be part of the transport layer capabilities. A coding system evaluation needs to take into consideration whether error resiliency is in a bitstream and, if so, whether it is optional or intertwined and inseparable. The best practices at the time of this document separate error resiliency from coding efficiency by computing the efficiency of the algorithm to code images while assuming a perfect transmission medium. The ability to recover from errors can be added through resiliency markers, forward error correction or merely parity checking to identify but not correct errors.
If separable, the topic is outside the scope of this document and codec testing should assume no error
introduction in the bitstream.
If error markers and error handling markers are not separable from the coded bitstream, the coding
system efficiency will include such markers.
5.6 Recursive compression assessment
Generation loss is a loss in image quality if the output of a lossy image compression/decompression
cycle is recompressed again under the same conditions by the same compression/decompression.
If this recompression is repeated over several cycles, this can result in severe degradation of image quality [26].
Generation loss limits the number of repeated compressions/decompressions in an image processing chain if repeated recompression generates severely more distortion than a single compression/decompression cycle. This subclause distinguishes between drift and quality loss. While the former is due to a systematic DC error, often caused by mis-calibration in the colour transformation or quantization, the latter covers all other error sources, for example, limited precision in the image transformation implementation.
A measurement procedure to measure generational quality loss appears in Annex D.
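The recompression cycle described above can be sketched as a simple loop. Here `encode`, `decode` and `metric` are hypothetical stand-ins for the codec under test and a quality measure; the normative measurement procedure is the one in Annex D.

```python
def generational_quality(encode, decode, image, metric, generations: int = 10):
    """Re-encode the reconstruction under identical conditions (see 5.6)
    and score each generation against the original reference image."""
    scores = []
    current = image
    for _ in range(generations):
        current = decode(encode(current))       # one compression cycle
        scores.append(metric(image, current))   # compare to generation 0
    return scores

# Toy quality measure: mean squared error over flat sample lists.
mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# A lossless (idempotent) codec shows no generational loss:
scores = generational_quality(lambda x: x, lambda x: x, [1, 2, 3], mse)
# scores == [0.0] * 10
```

A lossy codec would instead show `scores` growing (or drifting) with the generation index, which is exactly the effect this subclause asks evaluators to bound.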
5.7 Image selection
Colour content and categories of images to consider when testing a codec include continuous tone
images, black and white or half tones. Test material should reflect the potential applications in which a
coding system will be used. The following examples represent common image categories for evaluation:
a) natural scenes;
b) portraits with differing skin tones;
c) compound (multi-layer);
d) photo-realistic synthetic;
e) graphics and animations;
f) text and web pages;
g) engineered test patterns.
If the coding system is intended for specific image types or applications, such as medical imaging, a set
of images appropriate to the application should be the test set.
Image size used during testing should be appropriate for the application, not very much smaller or
larger than targeted in typical usage.
6 Best practices of subjective image quality assessments
6.1 Goals of subjective assessment
Some subjective image assessment methods reflect the human notion of quality by anticipating the reactions of those who might view the tested systems, while other subjective image assessment methods can determine whether some artefacts are visually discernible and likely to adversely affect image quality. These methods are the best quality assessment methods available. However, they are very time demanding and can become very expensive, because of the cost of the observers and of implementing the system under test.
Test evaluations can be application specific, for example, according to Rec. ITU-R BT.500 [7]:
“Subjective assessment methods are used to establish the performance of television systems using
measurements that more directly anticipate the reactions of those who might view the systems tested.
In this regard, it is understood that it may not be possible to fully characterize system performance by
objective means; consequently, it is necessary to supplement objective measurements with subjective
measurements.”
This document suggests that best practice should separate applications from the image quality
evaluation to the best extent possible. Subjective assessment methodology recommended herein
follows this guideline.
Best practices in this document draw from the psychophysical experimental method standardized in ISO 20462-2 [3] for photography and extend the methods to electronic displays.
Some applications will have specific goals differing from general practice, such as radiological images [8].
6.2 Subjective assessment evaluation procedures
6.2.1 Observer selection
Evaluators should prefer naïve observers for most general viewing or entertainment applications. In the case of specialized imaging, such as medical or structural engineering, an expert observer who can discern defects from artefacts is needed.
6.2.2 Visual acuity
Common to all subjective evaluation procedures, observers will need to demonstrate that they meet a well-defined visual acuity requirement. Colour vision is sometimes not tested.
The following recommendations usually apply.
a) Test for visual acuity with or without corrective lenses, either glasses or contacts, that do not have multiple focal lengths, e.g. progressive, bifocal or trifocal lenses.
b) Verify normal visual acuity by using a Snellen or Landolt test chart, where the observer reads at 20/20 from 50 cm.
c) If screening for normal colour vision, verify by testing with Ishihara plates or equivalent.
Evaluators may refer to ISO/IEC 29170-2 [5] for examples of tools that help assess an observer's visual acuity.
6.2.3 Number of observers
The number of observers is dependent on the evaluation system. For example, according to Rec.
ITU-R BT.500:
“At least 15 observers should be used. The number of assessors needed depends upon the sensitivity and
reliability of the test procedure adopted and upon the anticipated size of the effect sought. For studies
with limited scope, e.g. of exploratory nature, fewer than 15 observers may be used.”
The example from ISO/IEC 29170-2 places more importance on repetitions per observer and less on the number of observers. These guidelines for the observer population apply:
“This procedure recommends evaluators recruit a suitable number of observers sufficient to include no
less than 10 observers who pass visual acuity (see 5.3.2) and test reporting (see D.1.2) requirements.”
In some cases, an evaluation procedure may set an absolute age limit due to visual acuity degradation
with age. For example, ISO/IEC 29170-2:2015 limits an observer's age to "40 years or less."
6.2.4 Instructions to observers
Each procedure should contain directions for observer instruction. In general, observers should understand the procedure, when to take breaks, and how to use any applicable user interface or software tools. In the event of grading, explain the relative scale and illustrate it with examples of good and impaired images of various types.
6.2.5 Evaluation scales
Subjective testing usually employs one of the following scales: Likert scale (see Rec. ITU-R BT.500 and Rec. ITU-T P.910 [9]), quality ruler (ISO 20462-3 [4]), and forced choice and ternary choice procedures (see ISO/IEC 29170-2 and Rec. ITU-R BT.500).
Refer to Rec. ITU-R BT.500 for an explanation of assessment problems and methods used in television.
Rec. ITU-T P.910 was used successfully for teleconferencing systems quality analysis.
Rec. ITU-T P.910 also cites usage of an explicit reference, depending on the objective of the testing.
“An important issue in choosing a test method is the fundamental difference between methods that use
explicit references (e.g. DCR), and methods that do not use any explicit reference (e.g. ACR, ACR-HR, and
PC). This second class of method does not test transparency or fidelity.”
6.2.6 Statistical analysis
This subclause recommends several methods for statistical analysis, each of which represents a separate topic.
For information about mean opinion score calculation and data treatment, refer to Annex A.
“Because they vary with range, it is inappropriate to interpret judgements from most of the assessment
methods in absolute terms (e.g. the quality of an image or image sequence).”
“For each test parameter, the mean and 95% confidence interval of the statistical distribution of the
assessment grades must be given. If the assessment was of the change in impairment with a changing
parameter value, curve-fitting techniques should be used. Logistic curve-fitting and logarithmic axis will
allow a straight line representation, which is the preferred form of presentation.” (Rec. ITU-R BT.500)
This report also refers readers to ISO/IEC 29170-2:2015, Annex D for statistical treatment of binary and
ternary forced choice data reports.
6.3 Viewing conditions for electronic displays
6.3.1 Purpose
Various International Standards and guidelines from trade organizations exist that are relevant to
compression investigators. This subclause describes some of the viewing conditions arranged for
viewing in standards defined by ISO and ISO/IEC and other references related closely with the end
application, such as, home television viewing or an office work environment.
6.3.2 ISO 3664
Originally designed for photographs, ISO 3664 defines viewing conditions for laboratory testing environments [1]. This is useful for native compression evaluation without distractions and other influences from surrounding light.
influences from surrounding light.
However, any viewing conditions procedure is debatable in a fixed environment. The ideal conditions for evaluation of an entire display system may be the environment where it will be used or where the photograph will be viewed (cf. ISO 3664). White points will vary; recommendations include:
a) colour electronic displays: D65;
b) television: D65;
c) photographs: D50.
For evaluation of compression systems on colour monitors, this guideline recommends adherence to the
methods in ISO 3664 in all practical aspects. Deviations can either be defined in an applicable standard
or noted in a test report. In all cases for subjective evaluations, test reports should take care to report
sufficient detail for an evaluator skilled in the art to recreate applicable testing and viewing conditions.
6.3.3 ISO 9241
The ISO 9241 family of standards defines viewing conditions and ergonomic conditions for office viewing monitors, taking into consideration many factors including ambient lighting, viewing distance, viewer's age and so forth [2]. ISO 9241 represents a large body of work that can serve as a useful reference for the compression expert when evaluating or designing suitable coding systems for the office environment.
6.4 Goals for evaluation of visually lossless and nearly lossless coding
ISO/IEC 29170-2 is useful for evaluating lightly compressed coding systems. For instance, display stream compression, where a source compresses image data sent to a display, may be evaluated as visible or invisible to a viewer. Examples of display streams include, but are not limited to, a wired link between a set-top box and a television, or between a mobile host graphics processor and a display panel module in a mobile appliance.
A coding system will be considered visually lossless if the test results obtained when using this
procedure meet a pre-defined acceptable quality level. The interpretation of data obtained by this
subjective test procedure that may lead to a pass-fail threshold is outside the scope of this document.
The procedure compares individual images with various binary or ternary forced choice protocols. The
procedure also relies only on subjective evaluation methods designed to discern image imperfections
on electronic colour displays of any technology or size.
7 Best practices of objective image quality assessment methodology
In the recent literature, many papers have proposed objective quality metrics dedicated to several
image and video applications. This document recommends a few well-known metrics and a set of best
practices.
Objective evaluation metrics can be categorized into three groups.
a) Full-reference
Full reference metrics need full information about the original images and demand ideal images as references, which can hardly be achieved in practice. The traditional methods (such as peak signal-to-noise ratio, PSNR) are based on pixel-wise error and have not always been in agreement with perceived quality measurement. Recently, some full reference metrics modelled by simulating the human visual system have been proposed. For instance, Reference [11] introduced an alternative complementary framework for quality assessment based on the degradation of structural information; its authors developed a structural similarity index (SSIM) and demonstrated its promise through a set of intuitive examples.
b) No-reference
No reference metrics aim to evaluate distorted images without any cue from the originals. "No reference" coding evaluation tends not to be favoured by this document because most proposed no reference quality metrics are designed for one or more predefined specific distortion types and are unlikely to generalize to images degraded with other types of distortions.
c) Reduced-reference
Reduced reference metrics make use of a part of the information from the original images in order
to evaluate the visual perception quality of the distorted ones. As the transformed and stored
data are reduced, reduced reference metrics have a great potential in some specific applications of
quality assessment.
Best practice recommendations appear in Annex B, which contains only full reference algorithms and objective metrics that this technical working group has found useful when comparing codecs designed within ISO/IEC and those from other organizations. As such, this collection represents an understanding of best and common practice.
Annex A
(informative)
Subjective metrics
A.1 Mean opinion score
The mean opinion score (MOS) provides a numerical indication of the perceived quality of an image or an image sequence after a process such as compression, quantization or transmission. The MOS is expressed as a single number in the range 1 to 5 in the case of a discrete scale (respectively 1 to 100 in the case of a continuous scale), where 1 is the lowest perceived quality and 5 (respectively 100) is the highest. Its computation allows the general behaviour of the observers with regard to a given impairment to be studied.
A.1.1 MOS calculation
The interpretation of the obtained judgments is completely dependent on the nature of the constructed
test. The MOS \(m_{jkr}\) is computed for each presentation as given in Formula (A.1):

\[ m_{jkr} = \frac{1}{N} \sum_{i=1}^{N} m_{ijkr} \tag{A.1} \]

where

\(m_{ijkr}\) is the score of the observer \(i\) for the degradation \(j\) of the image \(k\) and the \(r\)-th iteration;

\(N\) is the number of observers.
In a similar way, the global average scores \(m_{j}\) and \(m_{k}\) can be calculated, respectively, for each test condition (impairment) and each test image.
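As an illustration, Formula (A.1) can be evaluated directly. The scores below are hypothetical ratings on the discrete 1 to 5 scale for one presentation (one fixed degradation j, image k and iteration r):

```python
from statistics import mean

# Hypothetical scores m_ijkr given by N = 8 observers for one
# presentation (fixed j, k, r), on the discrete 1-5 scale.
scores = [4, 5, 3, 4, 4, 5, 3, 4]

# Formula (A.1): the MOS is the average of the observers' scores.
mos = mean(scores)
```

Averaging the same scores grouped by test condition or by test image yields the global averages \(m_{j}\) and \(m_{k}\).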
A.1.2 Calculation of confidence interval
In order to evaluate the reliability of the results as well as possible, a confidence interval is associated with the MOS. The 95 % confidence interval is commonly considered sufficient. This interval is defined as Formula (A.2):

\[ \left[ m_{jkr} - \delta_{jkr} ,\ m_{jkr} + \delta_{jkr} \right] \tag{A.2} \]

where

\[ \delta_{jkr} = 1{,}96 \, \frac{s_{jkr}}{\sqrt{N}} \tag{A.3} \]

and \(s_{jkr}\) represents the standard deviation defined as Formula (A.4):

\[ s_{jkr} = \sqrt{ \sum_{i=1}^{N} \frac{\left( m_{jkr} - m_{ijkr} \right)^{2}}{N-1} } \tag{A.4} \]
The value 1,96 in Formula (A.3) comes from the cumulative distribution function (CDF) of the normal distribution. In the case of a smaller number of observers (fewer than 30), it is advisable to use the Student's t distribution instead.
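A minimal sketch of Formulae (A.2) to (A.4), using the same hypothetical scores as above. The factor 1,96 assumes the normal approximation; for fewer than 30 observers a Student's t quantile would replace it:

```python
import math
from statistics import mean, stdev

scores = [4, 5, 3, 4, 4, 5, 3, 4]   # hypothetical observer scores
N = len(scores)

m = mean(scores)                    # MOS, Formula (A.1)
s = stdev(scores)                   # sample standard deviation, Formula (A.4)
delta = 1.96 * s / math.sqrt(N)     # half-width, Formula (A.3)
ci = (m - delta, m + delta)         # 95 % confidence interval, Formula (A.2)
```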
A.1.3 Outliers rejection
One of the objectives of the analysis of results is to be able to eliminate either a particular score or an observer from the final calculation. This rejection corrects influences induced by the observer's behaviour or by a bad choice of test images. The most obstructive effect is incoherence of the answers provided by an observer, which characterizes the non-reproducibility of a measurement. Rec. ITU-R BT.500-10 contains a way to reject incoherent results.
To that aim, it is necessary to calculate the MOS and the standard deviations associated with each presentation. These average values are functions of two variables: the presentations and the observers. Then, check whether this distribution is normal by using the \(\beta_{2}\) test, i.e. the kurtosis coefficient (the ratio between the fourth-order moment and the square of the second-order moment). The \(\beta_{2jkr}\) to be tested is given by Formula (A.5):

\[ \beta_{2jkr} = \frac{ \dfrac{1}{N} \sum_{i=1}^{N} \left( m_{jkr} - m_{ijkr} \right)^{4} }{ \left[ \dfrac{1}{N} \sum_{i=1}^{N} \left( m_{jkr} - m_{ijkr} \right)^{2} \right]^{2} } \tag{A.5} \]
If \(\beta_{2jkr}\) is between 2 and 4, the distribution can be considered normal. In order to compute the \(P_i\) and \(Q_i\) values that allow the final decision regarding the outliers to be taken, the observations \(m_{ijkr}\) for each observer \(i\), each degradation \(j\), each image \(k\) and each iteration \(r\) are compared with a combination of the MOS and the associated standard deviation. The different steps of the calculation are summarized in the following algorithm.
Algorithm 1: Steps for outliers rejection

for each observer i, each degradation j, each image k and each iteration r:
    if 2 ≤ β_2jkr ≤ 4 then /* normal distribution */
        if m_ijkr ≥ m_jkr + 2 s_jkr then P_i = P_i + 1; endif
        if m_ijkr ≤ m_jkr − 2 s_jkr then Q_i = Q_i + 1; endif
    else
        if m_ijkr ≥ m_jkr + √20 s_jkr then P_i = P_i + 1; endif
        if m_ijkr ≤ m_jkr − √20 s_jkr then Q_i = Q_i + 1; endif
    endif
endfor
/* Finally, the following rejection test can be carried out: */
if (P_i + Q_i)/(J·K·R) > 0,05 and |P_i − Q_i|/(P_i + Q_i) < 0,3 then
    eliminate scores of observer i;
endif
/* where J is the total number of degradations, K is the total number of images and R is the total number of iterations */
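The screening above can be sketched as follows. This is an illustrative implementation, not part of the TR: it assumes that all presentations share the same observer panel and that the scores are arranged as a matrix, and `screen_observers` is a hypothetical helper name. The thresholds 0,05 and 0,3 follow the algorithm:

```python
import math

def screen_observers(scores, p_thresh=0.05, d_thresh=0.3):
    """Sketch of the Rec. ITU-R BT.500-style observer screening above.

    scores[i][p] is the score of observer i for presentation p, where one
    presentation is one (degradation j, image k, iteration r) triple, so
    the number of presentations equals J*K*R.  Returns the set of observer
    indices whose scores would be eliminated.
    """
    n_obs, n_pres = len(scores), len(scores[0])
    stats = []
    for p in range(n_pres):
        col = [scores[i][p] for i in range(n_obs)]
        mean = sum(col) / n_obs
        m2 = sum((v - mean) ** 2 for v in col) / n_obs
        m4 = sum((v - mean) ** 4 for v in col) / n_obs
        beta2 = m4 / (m2 ** 2) if m2 > 0 else 3.0       # kurtosis, Formula (A.5)
        s = math.sqrt(sum((v - mean) ** 2 for v in col) / (n_obs - 1))
        k = 2.0 if 2 <= beta2 <= 4 else math.sqrt(20.0)  # threshold factor
        stats.append((mean, s, k))
    rejected = set()
    for i in range(n_obs):
        P = sum(1 for p in range(n_pres)
                if scores[i][p] >= stats[p][0] + stats[p][2] * stats[p][1])
        Q = sum(1 for p in range(n_pres)
                if scores[i][p] <= stats[p][0] - stats[p][2] * stats[p][1])
        if (P + Q) / n_pres > p_thresh and (P + Q) > 0 \
                and abs(P - Q) / (P + Q) < d_thresh:
            rejected.add(i)
    return rejected
```

Note that, as in the algorithm, the per-presentation mean and standard deviation include the suspect observer's own scores.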
A.2 Binary forced choice image comparison for nearly lossless imagery
Refer to ISO/IEC 29170-2:2015, Annex D for statistical treatment of binary and ternary forced choice
experimental design.
Annex B
(informative)
Objective metrics
B.1 Mean squared error
Mean square error (MSE) and peak signal-to-noise ratio (PSNR) approximate image quality in a full
reference quality assessment framework.
Record the mean square error between the original and the reconstructed image. Denote the sample
value of the reference image at position x,y in channel c by p(x,y,c) and the sample value of the
reconstructed image in channel c, at position x,y by q(x,y,c). Denote by d the number of image channels,
the width of channel c by w(c) and its height by h(c). Then, the MSE between the reference and the
reconstructed image is defined as Formula (B.1):
\[ \mathrm{MSE} = \frac{1}{d} \sum_{c=0}^{d-1} \frac{1}{w(c) \cdot h(c)} \sum_{x=0}^{w(c)-1} \sum_{y=0}^{h(c)-1} \left( p(x,y,c) - q(x,y,c) \right)^{2} \tag{B.1} \]
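Formula (B.1) translates directly into code. The sketch below assumes each channel is stored as a list of rows (`image[c][y][x]`), a layout chosen here purely for illustration:

```python
def mse(ref, rec):
    """Mean squared error per Formula (B.1).

    ref[c][y][x] and rec[c][y][x] are the samples of channel c;
    channels may differ in width and height.
    """
    d = len(ref)                      # number of channels
    total = 0.0
    for c in range(d):
        h, w = len(ref[c]), len(ref[c][0])
        err = sum((ref[c][y][x] - rec[c][y][x]) ** 2
                  for y in range(h) for x in range(w))
        total += err / (w * h)        # per-channel MSE
    return total / d                  # average over channels
```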
B.2 Peak signal to noise ratio
PSNR is a quantity related to the MSE and defined as follows: let \(c\) denote the image channel, \(t(c)\) the sample type of this channel and \(b(c)\) the sample precision of this channel (see B.1). Then, define the quantity \(m(c)\) as follows:

\(t(c)\) = signed or unsigned integer: \(m(c) = 2^{b(c)} - 1\)
\(t(c)\) = floating point or fixed point: \(m(c) = 1\)
The PSNR is then given by Formula (B.2):

\[ \mathrm{PSNR} = -10 \log_{10} \left( \frac{1}{d} \sum_{c=0}^{d-1} \frac{ \sum_{x=0}^{w(c)-1} \sum_{y=0}^{h(c)-1} \left( p(x,y,c) - q(x,y,c) \right)^{2} }{ w(c) \cdot h(c) \cdot m(c)^{2} } \right) \tag{B.2} \]
NOTE The purpose of this measurement is not to define an image quality; a separate benchmark exists for that. It is rather designed to identify pathological cases where incorrect or unreasonable compressed streams are generated.
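Continuing the sketch from B.1 (same assumed `image[c][y][x]` layout), Formula (B.2) can be written as follows for integer channels, where \(m(c) = 2^{b(c)} - 1\):

```python
import math

def psnr(ref, rec, bit_depths):
    """PSNR per Formula (B.2), assuming integer sample types so that
    m(c) = 2**b(c) - 1; bit_depths[c] is the precision b(c) of channel c."""
    d = len(ref)
    acc = 0.0
    for c in range(d):
        h, w = len(ref[c]), len(ref[c][0])
        m = 2 ** bit_depths[c] - 1
        err = sum((ref[c][y][x] - rec[c][y][x]) ** 2
                  for y in range(h) for x in range(w))
        acc += err / (w * h * m * m)  # normalized per-channel error
    return -10.0 * math.log10(acc / d)
```

A reconstruction at the opposite end of the 8-bit range gives 0 dB; identical images make the argument of the logarithm zero, so that case needs separate handling.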
B.3 Structural similarity index
B.3.1 SSIM
The structural similarity index (SSIM) proposed by Reference [10] quantifies the visible difference between a distorted image and a reference image. This index is based on the universal image quality index (UIQ) [11]. The algorithm identifies the structural information in an image as those attributes that represent the structure of the objects in the scene, independent of the average luminance and contrast. The index is based on a combination of luminance, contrast and structure comparisons. The comparisons are done for local windows in the image; the overall image quality is the mean over all these local windows.
The resulting measure, known as the structural similarity index between local windows x and y, is defined as Formula (B.3):

\[ \mathrm{SSIM}(x,y) = l(x,y)^{\alpha} \, c(x,y)^{\beta} \, s(x,y)^{\gamma} \tag{B.3} \]

where the constants \(\alpha, \beta, \gamma > 0\) are parameters used to set the importance of the respective comparison measures. The individual similarities for lightness, contrast and structure are defined as Formula (B.4):

\[ l(x,y) = \frac{2 \mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad c(x,y) = \frac{2 \sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \tag{B.4} \]
where

\[ \mu_x = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \mu_x \right)^{2}, \qquad \sigma_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_i - \mu_x \right) \left( y_i - \mu_y \right) \tag{B.5} \]

and \(N\) is the number of pixels in the local window. SSIM is usually used in a simplified form where \(\alpha = \beta = \gamma = 1\) and \(C_3 = C_2/2\), resulting in Formula (B.6):

\[ \mathrm{SSIM}(x,y) = \frac{ \left( 2 \mu_x \mu_y + C_1 \right) \left( 2 \sigma_{xy} + C_2 \right) }{ \left( \mu_x^2 + \mu_y^2 + C_1 \right) \left( \sigma_x^2 + \sigma_y^2 + C_2 \right) } \tag{B.6} \]

where the constants \(C_1\), \(C_2\) are included to avoid instability.
The local SSIM index can be calculated to yield a map describing the spatially variant structural similarity between the images. In image quality assessment tasks, a single aggregate measure is desired; in the simplified case, it can be defined as what is widely known as the mean structural similarity index (MSSIM), as given in Formula (B.7):

\[ \mathrm{MSSIM}(X,Y) = \frac{1}{M} \sum_{j=1}^{M} \mathrm{SSIM}\left( x_j, y_j \right) \tag{B.7} \]

where

X, Y are the reference and distorted images;
\(x_j\), \(y_j\) are the image contents in local window \(j\);
M is the total number of local windows.
Figure B.1 shows the SSIM flowchart, where signal x or signal y has perfect quality and the other is the
distorted image.
[Figure: block diagram — luminance and contrast measurements of signals x and y feed the luminance, contrast and structure comparisons, which are combined into the similarity measure.]

Figure B.1 — Flowchart of the SSIM metric
Several values applied in tests of the original SSIM metric, for example block size, block overlap and potential colour weights, were left undefined, which may make cross-correlation of results unreliable. Use and documentation of SSIM should clarify undefined terms and parameters.
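As an illustration of Formula (B.6), a minimal sketch for a single pair of local windows given as flat sample lists. The constants C1 = (0,01 · 255)² and C2 = (0,03 · 255)² are the commonly used choices for 8-bit data and are an assumption here, in line with the caveat above that such parameters are left undefined:

```python
def ssim_window(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Simplified SSIM (Formula (B.6)) between two local windows.

    x, y are flat lists of N samples from the same window position in the
    reference and distorted images.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)   # sigma_x^2, (B.5)
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)   # sigma_y^2
    cov = sum((a - mu_x) * (b - mu_y)
              for a, b in zip(x, y)) / (n - 1)          # sigma_xy, (B.5)
    return ((2 * mu_x * mu_y + C1) * (2 * cov + C2) /
            ((mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)))
```

Averaging `ssim_window` over all local windows of an image pair gives the MSSIM of Formula (B.7).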
B.3.2 Multiscale SSIM
A multiscale version of SSIM was proposed by Reference [12]. The original and the reproduction are run through SSIM, where contrast and structure are computed for each subsampled level. The images are low-pass filtered and down-sampled by 2. The lightness (l) is computed only in the final step; contrast (c) and structure (s) are computed for each step. The overall value is obtained by multiplying the lightness value with the product of the contrast and structure values over all subsampled levels. Weighting parameters for l, c and s are suggested based on experimental results. The multiscale SSIM
...