Speech and multimedia Transmission Quality (STQ) - Transmission requirements for wideband VoIP loudspeaking and handsfree terminals from a QoS perspective as perceived by the user

The present document provides speech transmission performance requirements for 8 kHz wideband VoIP loudspeaking
and hands-free terminals; it addresses all types of IP based terminals, including wireless, softphones and group audio
terminals. DECT terminals are covered in ETSI EN 300 175-8 [i.6] and ETSI EN 300 176-2 [i.7].
In contrast to other standards, which define minimum performance requirements, the present document is intended to
specify terminal equipment requirements that allow manufacturers and service providers to deliver good-quality
end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
NOTE: The present document does not concern headset terminals.

General Information

Status
Published
Publication Date
29-Jun-2022
Current Stage
6060 - National Implementation/Publication (Adopted Project)
Start Date
23-Jun-2022
Due Date
28-Aug-2022
Completion Date
30-Jun-2022
Standard
ETSI ES 202 740 V1.8.2 (2022-03) - Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband VoIP loudspeaking and handsfree terminals from a QoS perspective as perceived by the user
English language
50 pages
Standard
ETSI ES 202 740 V1.8.2 (2022-05) - Speech and multimedia Transmission Quality (STQ); Transmission requirements for wideband VoIP loudspeaking and handsfree terminals from a QoS perspective as perceived by the user
English language
50 pages
Standardization document
SIST ES 202 740 V1.8.2:2022
English language
50 pages

Standards Content (Sample)


Final draft ETSI ES 202 740 V1.8.2 (2022-03)

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for wideband
VoIP loudspeaking and handsfree terminals
from a QoS perspective as perceived by the user

Reference
RES/STQ-303
Keywords
handsfree, loudspeaking, quality, speech, terminal,
VoIP, wideband
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2022.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 General considerations
4.1 Coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
6.2.2.2 Measurements in loudspeaking mode
6.2.3 Test signal levels
6.2.3.1 Send
6.2.3.2 Receive
6.2.4 Setup of background noise simulation
6.2.5 Setup for variable echo path
6.3 Coding independent parameters
6.3.1 Send sensitivity/frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Send distortion
6.3.5 Out-of-band signals in send direction
6.3.6 Send noise
6.3.7 Terminal Coupling Loss (TCL)
6.3.8 Stability loss
6.3.9 Receive frequency response
6.3.10 Receive Loudness Rating (RLR)
6.3.11 Receive distortion
6.3.12 Out-of-band signals in receive direction
6.3.13 Receive noise
6.3.14 Double talk performance
6.3.14.1 General
6.3.14.2 Attenuation range in send direction during double talk A_H,S,dt
6.3.14.3 Attenuation range in receive direction during double talk A_H,R,dt
6.3.14.4 Detection of echo components during double talk
6.3.14.5 Minimum activation level and sensitivity of double talk detection
6.3.15 Switching characteristics
6.3.15.1 Note
6.3.15.2 Activation in send direction
6.3.15.3 Silence suppression and comfort noise generation
6.3.16 Background noise performance
6.3.16.1 Performance in send direction in the presence of background noise
6.3.16.2 Speech quality in the presence of background noise
6.3.16.3 Quality of background noise transmission (with far end speech)
6.3.17 Quality of echo cancellation
6.3.17.1 Temporal echo effects
6.3.17.2 Spectral echo attenuation
6.3.17.3 Occurrence of artefacts
6.3.17.4 Variable echo path
6.3.18 Variant impairments
6.3.18.1 Clock accuracy send
6.3.18.2 Clock accuracy receive
6.3.18.3 Send packet delay variation
6.3.19 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening speech quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Bibliography
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This final draft ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia
Transmission Quality (STQ), and is now submitted for the ETSI standards Membership Approval Procedure.
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, analogue and digital telephones were interfacing switched-circuit 64 kbit/s PCM networks. With the fast
growth of IP networks, wideband terminals providing higher audio-bandwidth and directly interfacing packet-switched
networks (VoIP) are being rapidly introduced. Such IP network edge devices may include gateways, specifically
designed IP phones, soft phones or other devices connected to the IP based networks and providing telephony service.
Due to the unique characteristics of the IP networks including packet loss, delay, etc., new performance specification, as
well as appropriate measurement methods, will have to be developed. Terminals are getting increasingly complex.
The advanced signal processing of terminals is targeted to speech signals. Therefore, wherever possible speech signals
are used for testing in order to achieve mostly realistic test conditions and meaningful results.
The present document provides speech transmission performance requirements for wideband VoIP loudspeaking and
hands-free terminals.
NOTE: Requirement limits are given in tables, the associated curve when provided is given for illustration.
1 Scope
The present document provides speech transmission performance requirements for 8 kHz wideband VoIP loudspeaking
and hands-free terminals; it addresses all types of IP based terminals, including wireless, softphones and group audio
terminals. DECT terminals are covered in ETSI EN 300 175-8 [i.6] and ETSI EN 300 176-2 [i.7].
In contrast to other standards, which define minimum performance requirements, the present document is intended to
specify terminal equipment requirements that allow manufacturers and service providers to deliver good-quality
end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
NOTE: The present document does not concern headset terminals.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI I-ETS 300 245-6: "Integrated Services Digital Network (ISDN); Technical characteristics of
telephony terminals; Part 6: Wideband (7 kHz), loudspeaking and hands free telephony".
[2] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[3] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[4] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[5] Recommendation ITU-T G.722.1: "Low-complexity coding at 24 and 32 kbit/s for hands-free
operation in systems with low frame loss".
[6] Recommendation ITU-T G.722.2: "Wideband coding of speech at around 16 kbit/s using Adaptive
Multi-Rate Wideband (AMR-WB)".
[7] Recommendation ITU-T G.729.1: "G.729 based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[8] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[9] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[10] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[11] Recommendation ITU-T P.310: "Transmission characteristics for narrow-band digital handset and
headset telephones".
[12] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[13] Recommendation ITU-T P.341: "Transmission characteristics for wideband digital loudspeaking
and hands-free telephony terminals".
[14] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[15] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[16] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[17] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[18] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[19] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[20] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[21] Recommendation ITU-T P.863: "Perceptual objective listening quality prediction".
[22] ETSI ES 202 737: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[23] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[24] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
[25] TIA-920.130: "Telecommunications Communications Products Transmission Requirements for
Digital Interface Communications Devices with Headsets".
[26] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[27] Void.
[28] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[29] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[30] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.2] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".

[i.3] Netem™.
NOTE: Information available at https://wiki.linuxfoundation.org/networking/netem.
[i.4] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.5] IETF RFC 4737: "Packet Reordering Metrics".
[i.6] ETSI EN 300 175-8: "Digital Enhanced Cordless Telecommunications (DECT); Common
Interface (CI); Part 8: Speech and audio coding and transmission".
[i.7] ETSI EN 300 176-2: "Digital Enhanced Cordless Telecommunications (DECT); Test
specification; Part 2: Audio and speech".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield equalization: artificial head is equalized in such a way that for frontal sound incidence in anechoic conditions
the frequency response of the artificial head is flat
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In case of a head and torso simulator (HATS) in the centre of the artificial head with no artificial head
present.
group-audio terminal: handsfree terminal primarily designed for use by several users which will not be equipped with
a handset
handsfree telephony terminal: telephony terminal using a loudspeaker associated with an amplifier as a telephone
receiver and which can be used without a handset
HATS Hands-Free Reference Point (HATS HFRP): reference point "n" from Recommendation ITU-T P.58 [9], where "n" is
one of the points numbered from 11 to 17 and defined in table 6a of Recommendation ITU-T P.58 [9] (coordinates of
far field front point)
NOTE: The HATS HFRP depends on the location(s) of the microphones of the terminal under test: the
appropriate axis lip-ring/HATS HFRP is to be as close as possible to the axis lip-ring/HFT microphone
under test.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
loudspeaking function: function of a handset telephone using a loudspeaker associated with an amplifier as a
telephone receiver
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: setting which is closest to the nominal RLR
reordering: packet order changes during transfer over the network [i.5], packets arrive out of order at the receiver (i.e.
RTP packets)
softphone: speech communication system based upon a computer
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation - Frequency Modulation
AMR-WB Adaptive Multi-Rate - Wideband
CS Composite Source
CSS Composite Source Signal
DRP ear-Drum Reference Point
DUT Device Under Test
EC Echo Canceller
EL Echo Loss
ERP Ear Reference Point
FFT Fast Fourier Transform
G-MOS-LQOw Overall transmission quality wideband
GSM Global System for Mobile communications
HATS Head And Torso Simulator
HFRP Hands Free Reference Point
HFT Hands Free Terminal
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
LE Earphone coupling Loss
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective
NOTE: y being N for narrowband, M for mixed, S for super-wideband and F for fullband. See Recommendation
ITU-T P.800.1 [18].
MRP Mouth Reference Point
N-MOS-LQOw Transmission quality of the background noise wideband
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
PDA Personal Digital Assistant
PMRP Sound Pressure at the Mouth Reference Point
PN Pseudo Noise
POI Point Of Interconnection
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-Time Transport Protocol
S-MOS-LQOw Transmission quality of the speech wideband
SLR Send Loudness Rating
TCL Terminal Coupling Loss
TDM Time Division Multiplex
TELR Talker Echo Loudness Rating
TOSQA Telecommunications Objective Speech Quality Assessment
VAD Voice Activity Detection
VoIP Voice over Internet Protocol
4 General considerations
4.1 Coding algorithm
The coding algorithm assumed in the present document is according to Recommendation ITU-T G.722 [4]. VoIP
terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment, e.g. as defined in Recommendation ITU-T G.722 [4], appendixes 3
and 4, should be used.
4.2 End-to-end considerations
In order to achieve a desired end-to-end speech transmission performance (mouth-to-ear) it is recommended that
general rules of transmission planning tasks are carried out with the E-model taking into account that E-model does not
directly address handsfree or loudspeaking terminals; this includes the a-priori determination of the desired category of
speech transmission quality as defined in Recommendation ITU-T G.109 [3].
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [2], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular the decision as to which delay measured according to the present document should be acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 [2] with its amendments provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum loudness rating (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
• Some more basic (ETSI I-ETS 300 245-6 [1]) parameters are applicable, if Recommendation ITU-T G.722 [4]
is used.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.1].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) ambient temperature: 15 °C to 35 °C;
b) relative humidity: 5 % to 85 %;
c) air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
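As a minimal sketch, the limits in items a) to c) can be checked before a test run. The function and parameter names below are illustrative assumptions and are not defined by the present document; only the numeric limits are taken from the list above.

```python
def mbar_to_kpa(pressure_mbar):
    """Convert millibar to kilopascal (1 kPa = 10 mbar)."""
    return pressure_mbar / 10.0


def environment_ok(temp_c, rel_humidity_pct, pressure_kpa):
    """Return True if all readings lie within the limits of items a) to c).

    Hypothetical helper: the limits are inclusive at both ends.
    """
    return (
        15.0 <= temp_c <= 35.0               # a) ambient temperature in deg C
        and 5.0 <= rel_humidity_pct <= 85.0  # b) relative humidity in %
        and 86.0 <= pressure_kpa <= 106.0    # c) air pressure in kPa
    )
```

A barometer reading in millibar can be passed as environment_ok(22.0, 40.0, mbar_to_kpa(1013.0)).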
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy
Item                          Accuracy
Electrical signal level       ±0,2 dB for levels ≥ -50 dBV
                              ±0,4 dB for levels < -50 dBV
Sound pressure                ±0,7 dB
Frequency                     ±0,2 %
Time                          ±0,2 %
Application force             ±2 N
Measured maximum frequency    20 kHz

NOTE: The measured maximum frequency is due to Recommendation ITU-T P.58 [9] limitations.
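The level-dependent accuracy for electrical signal levels in Table 1 can be expressed as a small lookup. The sketch below is one possible way a laboratory might compare an analyser reading against a calibrated reference level; the function names and the comparison procedure are assumptions, and only the two tolerance values and the -50 dBV threshold come from Table 1.

```python
def electrical_level_tolerance_db(level_dbv):
    """Return the Table 1 accuracy (in dB) that applies at the given level."""
    return 0.2 if level_dbv >= -50.0 else 0.4


def reading_within_accuracy(measured_dbv, reference_dbv):
    """Hypothetical check: does the reading deviate from the calibrated
    reference by no more than the tolerance that applies at that level?"""
    tolerance = electrical_level_tolerance_db(reference_dbv)
    return abs(measured_dbv - reference_dbv) <= tolerance
```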
Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                        Accuracy
Sound pressure level at         ±3 dB for frequencies from 100 Hz to 200 Hz
Mouth Reference Point (MRP)     ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                ±3 dB for frequencies from 4 000 Hz to 8 000 Hz
Electrical excitation levels    ±0,4 dB across the whole frequency range
Frequency generation            ±2 % (see note)
Time                            ±0,2 %
Specified component values      ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
alternating current, the test shall be conducted within ±4 % of the rated frequency.
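The mains tolerances stated above (±5 % of rated voltage and, for an alternating current supply, ±4 % of rated frequency) can be checked with a short helper. This is a sketch only: the function name and signature are assumptions, and the 230 V / 50 Hz figures in the usage line are example values rather than values from the present document.

```python
def supply_within_limits(voltage_v, rated_voltage_v,
                         frequency_hz=None, rated_frequency_hz=None):
    """Return True if a mains supply is inside the stated tolerances.

    The frequency check is applied only when both frequency arguments are
    given, i.e. for an alternating current supply.
    """
    voltage_ok = abs(voltage_v - rated_voltage_v) <= 0.05 * rated_voltage_v
    if frequency_hz is None or rated_frequency_hz is None:
        return voltage_ok
    frequency_ok = (abs(frequency_hz - rated_frequency_hz)
                    <= 0.04 * rated_frequency_hz)
    return voltage_ok and frequency_ok
```

For example, supply_within_limits(235.0, 230.0, 50.5, 50.0) evaluates a 235 V, 50,5 Hz reading against a supply rated at 230 V, 50 Hz.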
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated loss of performance of the packet network.
An appropriate network simulator has to be used, for example Netem™ [i.3]. The key points of Netem™ can be
summarized as follows:
• Netem™ is part of the networking function of Linux™. With Netem™, loss, duplication, delay, jitter and
reordering can be generated (and the distribution of jitter can be chosen during runtime). Netem™ can be run on
a Linux™ PC running as a bridge or a router.
• It is not advised to define specific distortion patterns for testing in standards, because it will be easy to adapt
devices to these patterns (as it is already done for test signals). But if a pattern is unknown to a manufacturer,
the same pattern can be used by a test lab for different devices and gives comparable results.
NOTE: Netem™ and Linux™ are examples of suitable products available commercially. This information is given
for the convenience of users of the present document and does not constitute an endorsement by ETSI of
these product(s).
Requirements for the network impairment simulation can be found in annex D of ETSI ES 202 737 [22].
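As an illustration of the impairments listed above, the sketch below assembles a netem command line for the Linux tc utility. The helper name, the interface name and the impairment values are assumptions chosen for the example and are not requirements of the present document. The function only builds the argument list; running it against a real interface requires administrator rights on the bridging PC.

```python
def build_netem_command(interface, delay_ms=None, jitter_ms=None,
                        loss_pct=None, duplicate_pct=None, reorder_pct=None):
    """Build a 'tc qdisc add ... netem' argument list (hypothetical helper)."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # netem takes the jitter as a second value after the base delay
            cmd.append(f"{jitter_ms}ms")
    elif jitter_ms is not None or reorder_pct is not None:
        # netem needs a base delay before it can apply jitter or reordering
        raise ValueError("jitter and reordering require a base delay")
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    if duplicate_pct is not None:
        cmd += ["duplicate", f"{duplicate_pct}%"]
    if reorder_pct is not None:
        cmd += ["reorder", f"{reorder_pct}%"]
    return cmd
```

Joining the result with spaces, e.g. " ".join(build_netem_command("eth0", delay_ms=100, jitter_ms=10, loss_pct=1)), gives a command line that can be reviewed before it is executed.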
5.5 Acoustic environment
Unless stated otherwise measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear a quiet office room may be sufficient e.g. for handsets where artificial
mouth and artificial ear are located close to the acoustical transducers. But this is not applicable for handsfree and
loudspeaking terminals.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers shall be tested, a realistic room, which represents the
typical user environment for the terminal shall be used.
Where an anechoic room is not available, the test room has to be an acoustically treated room with few
reflections and a low noise level.
Considering this, a test laboratory whose test room does not conform to the anechoic conditions given in
Recommendation ITU-T P.341 [13] has to present the difference in measurement results attributable to its test room.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where exact position of the analysis
window is required. It shall be checked that the test is performed on the test signal and not on any other signal.
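One possible way to position the analysis window is to estimate the terminal delay by cross-correlating the recorded signal with the known test signal and then shifting the window by that amount. The present document does not prescribe this method; the sketch below is an assumption-based illustration in plain Python, with hypothetical function names, operating on lists of samples.

```python
def estimate_delay_samples(reference, recorded, max_delay):
    """Return the lag (in samples, 0..max_delay) that maximises the
    cross-correlation between the test signal and the recording."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        overlap = min(len(reference), len(recorded) - lag)
        if overlap <= 0:
            break  # the recording is too short for this lag
        score = sum(reference[i] * recorded[i + lag] for i in range(overlap))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def analysis_window(start, length, delay_samples):
    """Shift an analysis window by the estimated delay; returns (start, end)."""
    shifted_start = start + delay_samples
    return shifted_start, shifted_start + length
```

A clear correlation peak also gives some confidence that the window contains the test signal rather than another signal; a flat or ambiguous correlation suggests the measurement should be inspected manually.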
6 Requirements and associated measurement
methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to time variant nature of IP connection, delay variation may impair the measurement. In such case,
the measurement has to be repeated until a valid measurement can be achieved.
6.2 Test setup
6.2.1 General
In order to use a compatible test system for all types of speech terminals, a HATS (Head And Torso Simulator) will be
used instead of a freefield microphone (for receive measurement) and an artificial mouth (for send measurement). HATS is
described in Recommendation ITU-T P.58 [9].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically (using a reference codec or the direct signal processing
approach) or acoustically (using ITU-T specified devices).
When a coder with variable bit rate is used for testing the terminal parameters, the bit rate(s) recognized as giving the best
characteristics and/or the ones commonly used should be selected, e.g.:
• Recommendation ITU-T G.722 [4]: 64 kbit/s.
• Recommendation ITU-T G.722.2 [6]: 12,65 kbit/s and/or 23,85 kbit/s.
• Recommendation ITU-T G.729.1 [7]: 32 kbit/s.
[Figure 1 (block diagram, not reproduced). Labelled elements: Measurement System; Electrical Reference Point (POI);
VoIP Gateway Simulation; IP-Half-Channel Measurement Adapter (VoIP Reference Point); Network simulator (delay,
jitter, packet loss); path through IP network; Terminal under test.]
Figure 1: Half channel terminal measurement
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
The ear used for measurement will be indicated in the test report.
Desktop operated hands-free terminal
For HATS test equipment, definition of hands-free terminal and setups for hands-free terminal can be found in
Recommendation ITU-T P.581 [16].

Figure 2: Position for test of desktop hands free terminal side view
60 cm
Figure 3: Position for test of desktop hands free terminal top sight
Handheld hands-free terminal
The terminal should be placed according to Figure 4. The HATS should be positioned so that the HATS Reference Point is
at a distance $d_{HF}$ from the centre point of the visual display of the Mobile Station. The distance $d_{HF}$ is
specified by the manufacturer. A vertical angle $\theta_{HF}$ may be specified by the manufacturer.

Figure 4: Configuration of Hand-Held loudspeaker relative to the HATS side view
$d_{HFR} = d_{HF}$, $d_{HFS} = d_{HF} - d_{EM}$, where $d_{HFR}$ is the distance for receive measurement, $d_{HFS}$ is
the distance for send measurement, and $d_{EM}$ is the distance from ERP to MRP.
When no operating distance is specified by the manufacturer, the value for $d_{HFS}$ will be 30 cm. A calculation of
$d_{EM}$ for HATS gives 12 cm.
A value of 42 cm will be taken for $d_{HF}$.
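The relationship between the three distances can be checked numerically. The following is a minimal sketch in Python; the function name `handsfree_distances` and its argument names are illustrative assumptions and are not defined by the present document. Only the two relations (receive distance equal to the overall distance, send distance equal to the overall distance minus the ERP-to-MRP distance) and the default values of 42 cm and 12 cm are taken from the text above.

```python
def handsfree_distances(d_hf_cm=42.0, d_em_cm=12.0):
    """Return (d_HFR, d_HFS) in cm for a handheld hands-free setup.

    d_hf_cm: distance from the HATS reference point to the centre of the
             display (42 cm is the value used when the manufacturer
             specifies no operating distance).
    d_em_cm: distance from ERP to MRP (12 cm is the value given for HATS).
    Both parameter names are hypothetical, chosen for this sketch.
    """
    if d_hf_cm <= d_em_cm:
        raise ValueError("d_HF has to exceed d_EM for a positive send distance")
    d_hfr_cm = d_hf_cm            # receive distance: d_HFR = d_HF
    d_hfs_cm = d_hf_cm - d_em_cm  # send distance:    d_HFS = d_HF - d_EM
    return d_hfr_cm, d_hfs_cm


if __name__ == "__main__":
    print(handsfree_distances())  # default values from the text
```

With the default values the send distance evaluates to 30 cm, which agrees with the figure quoted for the case where no operating distance is specified.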
Softphone (computer-based terminals)
When the manufacturer gives conditions of use, they will apply for the test.
If no other requirement is given by the manufacturer, the softphone will be positioned according to the following conditions:
Softphone including speakers and microphone
Two types of softphones are to be considered:
• Type 1 is to be used as a desktop type (e.g. notebook).
• Type 2 is to be used as a handheld type (e.g. PDA).

Figure 5: Configuration of softphone relative to the HATS side view

Figure 6: Configuration of softphone relative to the HATS top sight
Softphone with separate speakers
When separate loudspeakers are used, the system will be positioned as in Figure 7.

Figure 7: Configuration of softphone using external speakers relative to the HATS top sight
When an external microphone and external speakers are used, the system will be positioned as in Figure 8.

Figure 8: Configuration of softphone using external speakers and microphone relative to the HATS top sight
Group audio terminal
When the manufacturer gives conditions of use, they will apply for the test.
When no requirement is given by the manufacturer, the following conditions will be used by the test laboratory.
The measurement will be conducted using HATS test equipment.
The following test position will be used.

Figure 9: Configuration of group audio terminal relative to the HATS side view

Figure 10: Configuration of group audio terminal relative to the HATS top sight
NOTE: In the case of a special casing for which those conditions are not realistic, the test laboratory can use a
different position that is more representative of real use. The conditions of test will be given in the test report.
6.2.2.2 Measurements in loudspeaking mode
For these measurements a HATS will be used.
It will be positioned as defined in clause 6.2.2. The measurement will be performed on one ear and the handset will be
placed on the other ear. The ear used for measurement will be specified in the test report. For the handset, an
application force of 8 N shall be used.
NOTE: Only desktop terminals are concerned by loudspeaking measurement.
6.2.3 Test signal levels
6.2.3.1 Send
Unless specified otherwise, the test signal level shall be -4,7 dBPa at the MRP.
The various steps for calibration of the artificial mouth of the HATS are described in Recommendation ITU-T
P.581 [16].
The level at MRP (measured in third octave bands) adjusted at the first step (with total level of -4,7 dBPa) is used as the
reference for send characteristics.
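For readers who work in pascals or in dB SPL, the level quoted in dBPa can be converted as sketched below. This is an illustrative helper and not part of the present document: the function names are hypothetical, and the conversion relies only on the usual definitions (dBPa is referenced to 1 Pa, dB SPL to 20 µPa).

```python
import math

P_REF_SPL = 20e-6  # reference pressure for dB SPL, in pascals


def dbpa_to_pascal(level_dbpa):
    """Convert a sound pressure level in dBPa (re 1 Pa) to RMS pascals."""
    return 10.0 ** (level_dbpa / 20.0)


def dbpa_to_db_spl(level_dbpa):
    """Convert a level in dBPa (re 1 Pa) to dB SPL (re 20 µPa)."""
    return level_dbpa + 20.0 * math.log10(1.0 / P_REF_SPL)


if __name__ == "__main__":
    level = -4.7  # nominal send test level at the MRP, in dBPa
    print(f"{dbpa_to_pascal(level):.3f} Pa, {dbpa_to_db_spl(level):.1f} dB SPL")
```

The offset between the two scales is a constant of roughly 94 dB, so the conversion does not depend on the signal itself.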
The test setup shall be in conformance with Figure 11 but, depending on the type of terminal, the appropriate distance
and level will be used. When using this calibration method, send sensitivity shall be calculated as follows:
$$S_{mJ} = 20 \log V_s - 20 \log P_{MRP} + \mathrm{Corr} - \mathrm{Dcorr} \qquad (1)$$
where:
• V is the measured voltage across the a
...


ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for wideband
VoIP loudspeaking and handsfree terminals
from a QoS perspective as perceived by the user

2 ETSI ES 202 740 V1.8.2 (2022-05)

Reference
RES/STQ-303
Keywords
handsfree, loudspeaking, quality, speech, terminal,
VoIP, wideband
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
If you find a security vulnerability in the present document, please report it through our
Coordinated Vulnerability Disclosure Program:
https://www.etsi.org/standards/coordinated-vulnerability-disclosure
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2022.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 General considerations
4.1 Coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
6.2.2.2 Measurements in loudspeaking mode
6.2.3 Test signal levels
6.2.3.1 Send
6.2.3.2 Receive
6.2.4 Setup of background noise simulation
6.2.5 Setup for variable echo path
6.3 Coding independent parameters
6.3.1 Send sensitivity/frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Send distortion
6.3.5 Out-of-band signals in send direction
6.3.6 Send noise
6.3.7 Terminal Coupling Loss (TCL)
6.3.8 Stability loss
6.3.9 Receive frequency response
6.3.10 Receive Loudness Rating (RLR)
6.3.11 Receive distortion
6.3.12 Out-of-band signals in receive direction
6.3.13 Receive noise
6.3.14 Double talk performance
6.3.14.1 General
6.3.14.2 Attenuation range in send direction during double talk $A_{H,S,dt}$
6.3.14.3 Attenuation range in receive direction during double talk $A_{H,R,dt}$
6.3.14.4 Detection of echo components during double talk
6.3.14.5 Minimum activation level and sensitivity of double talk detection
6.3.15 Switching characteristics
6.3.15.1 Note
6.3.15.2 Activation in send direction
6.3.15.3 Silence suppression and comfort noise generation
6.3.16 Background noise performance
6.3.16.1 Performance in send direction in the presence of background noise
6.3.16.2 Speech quality in the presence of background noise
6.3.16.3 Quality of background noise transmission (with far end speech)
6.3.17 Quality of echo cancellation
6.3.17.1 Temporal echo effects
6.3.17.2 Spectral echo attenuation
6.3.17.3 Occurrence of artefacts
6.3.17.4 Variable echo path
6.3.18 Variant impairments
6.3.18.1 Clock accuracy send
6.3.18.2 Clock accuracy receive
6.3.18.3 Send packet delay variation
6.3.19 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening speech quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Bibliography
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, analogue and digital telephones were interfacing switched-circuit 64 kbit/s PCM networks. With the fast
growth of IP networks, wideband terminals providing higher audio-bandwidth and directly interfacing packet-switched
networks (VoIP) are being rapidly introduced. Such IP network edge devices may include gateways, specifically
designed IP phones, soft phones or other devices connected to the IP based networks and providing telephony service.
Due to the unique characteristics of the IP networks including packet loss, delay, etc., new performance specification, as
well as appropriate measurement methods, will have to be developed. Terminals are getting increasingly complex.
The advanced signal processing of terminals is targeted to speech signals. Therefore, wherever possible speech signals
are used for testing in order to achieve mostly realistic test conditions and meaningful results.
The present document provides speech transmission performance requirements for wideband VoIP loudspeaking and
hands-free terminals.
NOTE: Requirement limits are given in tables; the associated curve, when provided, is given for illustration.
1 Scope
The present document provides speech transmission performance requirements for 8 kHz wideband VoIP loudspeaking
and hands-free terminals; it addresses all types of IP based terminals, including wireless, softphones and group audio
terminals. DECT terminals are covered in ETSI EN 300 175-8 [i.6] and ETSI EN 300 176-2 [i.7].
In contrast to other standards which define minimum performance requirements, the intention of the present
document is to specify terminal equipment requirements which allow manufacturers and service providers to deliver
good-quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
NOTE: The present document does not concern headset terminals.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI I-ETS 300 245-6: "Integrated Services Digital Network (ISDN); Technical characteristics of
telephony terminals; Part 6: Wideband (7 kHz), loudspeaking and hands free telephony".
[2] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[3] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[4] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[5] Recommendation ITU-T G.722.1: "Low-complexity coding at 24 and 32 kbit/s for hands-free
operation in systems with low frame loss".
[6] Recommendation ITU-T G.722.2: "Wideband coding of speech at around 16 kbit/s using Adaptive
Multi-Rate Wideband (AMR-WB)".
[7] Recommendation ITU-T G.729.1: "G.729 based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[8] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[9] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[10] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[11] Recommendation ITU-T P.310: "Transmission characteristics for narrow-band digital handset and
headset telephones".
[12] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[13] Recommendation ITU-T P.341: "Transmission characteristics for wideband digital loudspeaking
and hands-free telephony terminals".
[14] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[15] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[16] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[17] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[18] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[19] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[20] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[21] Recommendation ITU-T P.863: "Perceptual objective listening quality prediction".
[22] ETSI ES 202 737: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[23] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[24] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
[25] TIA-920.130: "Telecommunications Communications Products Transmission Requirements for
Digital Interface Communications Devices with Headsets".
[26] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[27] Void.
[28] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[29] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[30] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.2] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".

[i.3] Netem™.
NOTE: Information available at https://wiki.linuxfoundation.org/networking/netem.
[i.4] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.5] IETF RFC 4737: "Packet Reordering Metrics".
[i.6] ETSI EN 300 175-8: "Digital Enhanced Cordless Telecommunications (DECT); Common
Interface (CI); Part 8: Speech and audio coding and transmission".
[i.7] ETSI EN 300 176-2: "Digital Enhanced Cordless Telecommunications (DECT); Test
specification; Part 2: Audio and speech".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield equalization: artificial head is equalized in such a way that for frontal sound incidence in anechoic conditions
the frequency response of the artificial head is flat
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In the case of a head and torso simulator (HATS), the point is at the centre of the artificial head, with no
artificial head present.
group-audio terminal: handsfree terminal primarily designed for use by several users which will not be equipped with
a handset
handsfree telephony terminal: telephony terminal using a loudspeaker associated with an amplifier as a telephone
receiver and which can be used without a handset
HATS Hands-Free Reference Point (HATS HFRP): reference point "n" from Recommendation ITU-T P.58 [9], where "n" is
one of the points numbered from 11 to 17 and defined in table 6a of Recommendation ITU-T P.58 [9] (coordinates of
far field front point)
NOTE: The HATS HFRP depends on the location(s) of the microphones of the terminal under test: the
appropriate axis lip-ring/HATS HFRP is to be as close as possible to the axis lip-ring/HFT microphone
under test.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
loudspeaking function: function of a handset telephone using a loudspeaker associated with an amplifier as a
telephone receiver
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: setting which is closest to the nominal RLR
reordering: change of packet order during transfer over the network [i.5], so that packets (i.e. RTP packets) arrive
out of order at the receiver
softphone: speech communication system based upon a computer
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation - Frequency Modulation
AMR-WB Adaptive Multi-Rate Wideband
CS Composite Source
CSS Composite Source Signal
DRP ear-Drum Reference Point
DUT Device Under Test
EC Echo Canceller
EL Echo Loss
ERP Ear Reference Point
FFT Fast Fourier Transform
G-MOS-LQOw Overall transmission quality wideband
GSM Global System for Mobile communications
HATS Head And Torso Simulator
HFRP Hands Free Reference Point
HFT Hands Free Terminal
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
LE Earphone coupling Loss
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective
NOTE: y being N for narrowband, M for mixed, S for super-wideband and F for fullband. See Recommendation
ITU-T P.800.1 [18].
MRP Mouth Reference Point
N-MOS-LQOw Transmission quality of the background noise wideband
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
PDA Personal Digital Assistant
PMRP Sound Pressure at the Mouth Reference Point
PN Pseudo Noise
POI Point Of Interconnection
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-Time Transport Protocol
S-MOS-LQOw Transmission quality of the speech wideband
SLR Send Loudness Rating
TCL Terminal Coupling Loss
TDM Time Division Multiplex
TELR Talker Echo Loudness Rating
TOSQA Telecommunications Objective Speech Quality Assessment
VAD Voice Activity Detection
VoIP Voice over Internet Protocol
4 General considerations
4.1 Coding algorithm
The coding algorithm assumed in the present document is according to Recommendation ITU-T G.722 [4]. VoIP
terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment, e.g. as defined in Recommendation ITU-T G.722 [4], appendixes 3
and 4, should be used.
4.2 End-to-end considerations
In order to achieve a desired end-to-end speech transmission performance (mouth-to-ear) it is recommended that
general rules of transmission planning tasks are carried out with the E-model taking into account that E-model does not
directly address handsfree or loudspeaking terminals; this includes the a-priori determination of the desired category of
speech transmission quality as defined in Recommendation ITU-T G.109 [3].
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [2], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular the decision as to which delay measured according to the present document should be acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 [2] with its amendments provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum loudness rating (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
• Some more basic (ETSI I-ETS 300 245-6 [1]) parameters are applicable, if Recommendation ITU-T G.722 [4]
is used.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.1].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) ambient temperature: 15 °C to 35 °C;
b) relative humidity: 5 % to 85 %;
c) air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
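A test laboratory might log the ambient readings and check them against these limits before a measurement run. The sketch below is one possible way to do so; the dictionary keys and the function name `out_of_range` are illustrative assumptions, and only the numeric limits are taken from the list above.

```python
# Limits copied from the list above; the key names are hypothetical.
ENV_LIMITS = {
    "temperature_c": (15.0, 35.0),
    "relative_humidity_pct": (5.0, 85.0),
    "air_pressure_kpa": (86.0, 106.0),
}


def out_of_range(readings):
    """Return the names of the readings that fall outside the limits."""
    failures = []
    for name, (low, high) in ENV_LIMITS.items():
        value = readings[name]
        if not low <= value <= high:
            failures.append(name)
    return failures


if __name__ == "__main__":
    lab = {"temperature_c": 23.0, "relative_humidity_pct": 40.0,
           "air_pressure_kpa": 101.3}
    print(out_of_range(lab))  # an empty list means all conditions are met
```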
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy
Item                          Accuracy
----------------------------  ------------------------------
Electrical signal level       ±0,2 dB for levels ≥ -50 dBV
                              ±0,4 dB for levels < -50 dBV
Sound pressure                ±0,7 dB
Frequency                     ±0,2 %
Time                          ±0,2 %
Application force             ±2 N
Measured maximum frequency    20 kHz

NOTE: The measured maximum frequency is due to Recommendation ITU-T P.58 [9] limitations.
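The level-dependent accuracy for electrical signal levels in Table 1 can be expressed as a small lookup. The following is a sketch under stated assumptions: the function names are hypothetical, and the comparison of a reading against a calibrated reference value is only one possible way of applying the table.

```python
def electrical_level_tolerance_db(level_dbv):
    """Return the Table 1 accuracy (± dB) for an electrical level in dBV."""
    return 0.2 if level_dbv >= -50.0 else 0.4


def is_within_accuracy(reading_dbv, reference_dbv):
    """Check a reading against a calibrated reference level.

    Assumption of this sketch: the tolerance band is selected from the
    reference level, not from the reading.
    """
    tolerance = electrical_level_tolerance_db(reference_dbv)
    return abs(reading_dbv - reference_dbv) <= tolerance


if __name__ == "__main__":
    print(is_within_accuracy(-20.1, -20.0))
    print(is_within_accuracy(-60.5, -60.0))
```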
Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                        Accuracy
------------------------------  ------------------------------------------------
Sound pressure level at         ±3 dB for frequencies from 100 Hz to 200 Hz
Mouth Reference Point (MRP)     ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                ±3 dB for frequencies from 4 000 Hz to 8 000 Hz
Electrical excitation levels    ±0,4 dB across the whole frequency range
Frequency generation            ±2 % (see note)
Time                            ±0,2 %
Specified component values      ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.
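The frequency-dependent tolerance for the sound pressure level at the MRP in Table 2 can be written as a piecewise function. This is a minimal sketch; the function name is hypothetical, and the treatment of the shared band edges is an assumption, since Table 2 lists 200 Hz and 4 000 Hz in two bands each without saying which value applies there.

```python
def mrp_level_tolerance_db(freq_hz):
    """Return the Table 2 MRP sound pressure tolerance (± dB), or None.

    None is returned outside 100 Hz to 8 000 Hz, where Table 2 gives no value.
    Assumption: at the shared band edges (200 Hz and 4 000 Hz) the tighter
    ±1 dB value is applied; Table 2 itself does not settle this.
    """
    if 200.0 <= freq_hz <= 4000.0:
        return 1.0
    if 100.0 <= freq_hz < 200.0 or 4000.0 < freq_hz <= 8000.0:
        return 3.0
    return None


if __name__ == "__main__":
    for f in (100, 200, 1000, 4000, 6300, 10000):
        print(f, mrp_level_tolerance_db(f))
```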

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
alternating current, the test shall be conducted within ±4 % of the rated frequency.
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated loss of performance of the packet network.
An appropriate network simulator has to be used, for example Netem™ [i.3]. The key points of Netem™ can be
summarized as follows:
• Netem™ is part of the networking function of Linux™. With Netem™, loss, duplication, delay, jitter and
reordering can be generated (and the distribution of jitter can be chosen during runtime). Netem™ can be run on
a Linux™-PC running as a bridge or a router.
• It is not advised to define specific distortion patterns for testing in standards, because it will be easy to adapt
devices to these patterns (as it is already done for test signals). But if a pattern is unknown to a manufacturer,
the same pattern can be used by a test lab for different devices and gives comparable results.
NOTE: Netem™ and Linux™ are examples of suitable products available commercially. This information is given
for the convenience of users of the present document and does not constitute an endorsement by ETSI of
these product(s).
Requirements for the network impairment simulation can be found in annex D of ETSI ES 202 737 [22].
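As an illustration of how such a simulator might be driven, the sketch below assembles a `tc qdisc ... netem` command line that adds delay, jitter with a chosen distribution, and packet loss on one interface. It is an example under stated assumptions and not a test condition of the present document: the function name `netem_command`, the interface name `eth0` and all numeric values are hypothetical. The function only builds the argument list; running it would require root privileges on a Linux host with the `tc` utility installed.

```python
def netem_command(dev, delay_ms, jitter_ms, loss_pct, distribution="normal"):
    """Build the argument list for a `tc ... netem` call (not executed here).

    dev:          network interface name, e.g. "eth0" (hypothetical)
    delay_ms:     mean one-way delay added by netem, in milliseconds
    jitter_ms:    delay variation around the mean, in milliseconds
    loss_pct:     random packet loss, in percent
    distribution: jitter distribution name understood by netem
    """
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms",
        "distribution", distribution,
        "loss", f"{loss_pct}%",
    ]


if __name__ == "__main__":
    # Illustrative values only; they are not requirements of any standard.
    print(" ".join(netem_command("eth0", 50, 10, 1)))
```

Keeping the command as a list makes it straightforward to pass to a process launcher later without shell quoting issues, while the impairment profile itself stays under the control of the test laboratory.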
5.5 Acoustic environment
Unless stated otherwise, measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear, a quiet office room may be sufficient, e.g. for handsets where the
artificial mouth and artificial ear are located close to the acoustical transducers. This is, however, not applicable to
handsfree and loudspeaking terminals.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers is to be tested, a realistic room which represents the
typical user environment for the terminal shall be used.
Where an anechoic room is not available, the test room has to be an acoustically treated room with few
reflections and a low noise level.
Consequently, a test laboratory whose test room does not conform to the anechoic conditions given in
Recommendation ITU-T P.341 [13] has to present the difference in measurement results due to its test room.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where the exact position of the analysis
window is required. It shall be checked that the test is performed on the test signal and not on any other signal.
6 Requirements and associated measurement methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist, they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to the time-variant nature of IP connections, delay variation may impair the measurement. In such a case,
the measurement has to be repeated until a valid measurement can be achieved.
6.2 Test setup
6.2.1 General
In order to use a compatible test system for all types of speech terminals, a HATS (Head And Torso Simulator) will be
used instead of a freefield microphone (for receive measurement) and an artificial mouth (for send measurement). HATS is
described in Recommendation ITU-T P.58 [9].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically (using a reference codec or the direct signal processing
approach) or acoustically (using ITU-T specified devices).
When a coder with variable bit rate is used for testing the terminal parameters, the bit rate(s) recognized as giving the best
characteristics and/or those commonly used should be selected, e.g.:
• Recommendation ITU-T G.722 [4]: 64 kbit/s.
• Recommendation ITU-T G.722.2 [6]: 12,65 kbit/s and/or 23,85 kbit/s.
• Recommendation ITU-T G.729.1 [7]: 32 kbit/s.
Figure 1: Half channel terminal measurement (block diagram: Measurement System, Electrical Reference Point (POI), IP-Half-Channel Measurement Adapter / Gateway Simulation (VoIP Reference Point), Network simulator (delay, jitter, packet loss) with paths through the IP network, VoIP Terminal under test)
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
The ear used for measurement will be indicated in the test report.
Desktop operated hands-free terminal
For HATS test equipment, definition of hands-free terminal and setups for hands-free terminal can be found in
Recommendation ITU-T P.581 [16].
Figure 2: Position for test of desktop hands-free terminal, side view
(dimension label recovered from the figures: 60 cm)
Figure 3: Position for test of desktop hands-free terminal, top view
Handheld hands-free terminal
It should be placed according to Figure 4. The HATS should be positioned so that the HATS Reference Point is at a
distance $d_{HF}$ from the centre point of the visual display of the Mobile Station. The distance $d_{HF}$ is specified by the
manufacturer. A vertical angle $\theta_{HF}$ may be specified by the manufacturer.
Figure 4: Configuration of Hand-Held loudspeaker relative to the HATS side view
$d_{HFR} = d_{HF}$, $d_{HFS} = d_{HF} - d_{EM}$, where $d_{HFR}$ is the distance for receive measurement, $d_{HFS}$ is the distance for send
measurement, and $d_{EM}$ is the distance from ERP to MRP.
When no operating distance is specified by the manufacturer, the value for $d_{HFS}$ will be 30 cm. A calculation of $d_{EM}$ for HATS
gives 12 cm.
A value of 42 cm will be taken for $d_{HF}$.
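EXAMPLE: The relation between the three distances can be written out as a short calculation. The Python sketch below is illustrative only; the function and constant names are assumptions, while the 12 cm and 30 cm values are taken from the text above.

```python
D_EM_CM = 12.0           # ERP-to-MRP distance calculated for HATS (from the text)
DEFAULT_D_HFS_CM = 30.0  # send distance used when the manufacturer specifies none


def handheld_distances(d_hf_cm: float = None) -> dict:
    """Return d_HF, d_HFR and d_HFS in cm for a handheld hands-free terminal."""
    if d_hf_cm is None:
        # No manufacturer value: d_HF follows from the default send distance.
        d_hf_cm = DEFAULT_D_HFS_CM + D_EM_CM
    return {
        "d_HF": d_hf_cm,
        "d_HFR": d_hf_cm,            # receive distance equals d_HF
        "d_HFS": d_hf_cm - D_EM_CM,  # send distance is shorter by d_EM
    }
```

Called without an argument, the sketch reproduces the default values of the text: 42 cm for the receive measurement and 30 cm for the send measurement.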
Softphone (computer-based terminals)
When the manufacturer gives conditions of use, they will apply for the test.
If no other requirement is given by the manufacturer, the softphone will be positioned according to the following conditions:
Softphone including speakers and microphone
Two types of softphones are to be considered:
• Type 1 is to be used as a desktop type (e.g. notebook).
• Type 2 is to be used as a handheld type (e.g. PDA).

Figure 5: Configuration of softphone relative to the HATS side view
Figure 6: Configuration of softphone relative to the HATS, top view
Softphone with separate speakers
When separate loudspeakers are used, the system will be positioned as in Figure 7.

Figure 7: Configuration of softphone using external speakers relative to the HATS, top view
When an external microphone and speakers are used, the system will be positioned as in Figure 8.

Figure 8: Configuration of softphone using external speakers and microphone relative to the HATS, top view
Group audio terminal
When the manufacturer gives conditions of use, they will apply for the test.
When no requirement is given by the manufacturer, the following conditions will be used by the test laboratory.
The measurement will be conducted using HATS test equipment.
The following test position will be used.

Figure 9: Configuration of group audio terminal relative to the HATS side view
Figure 10: Configuration of group audio terminal relative to the HATS, top view
NOTE: In the case of a special casing where those conditions are not realistic, the test laboratory can use a different
position more representative of real use. The conditions of test will be given in the test report.
6.2.2.2 Measurements in loudspeaking mode
For those measurements a HATS will be used.
It will be positioned as defined in clause 6.2.2. The measurement will be performed on one ear and the handset will be
placed on the other ear. The ear used for measurement will be specified in the test report. For the handset, an
application force of 8 N shall be used.
NOTE: Only desktop terminals are concerned by loudspeaking measurement.
6.2.3 Test signal levels
6.2.3.1 Send
Unless specified otherwise, the test signal level shall be -4,7 dBPa at the MRP.
The various steps for calibration of the artificial mouth of the HATS are described in Recommendation ITU-T
P.581 [16].
The level at MRP (measured in third octave bands) adjusted at the first step (with total level of -4,7 dBPa) is used as the
reference for send characteristics.
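EXAMPLE: For orientation, the level of -4,7 dBPa can be converted into a sound pressure in pascal and into a sound pressure level re 20 µPa. The following Python sketch is illustrative and not part of the calibration procedure of Recommendation ITU-T P.581 [16]; the function names are assumptions.

```python
import math

REF_PRESSURE_PA = 20e-6  # reference pressure for dB SPL (20 micropascal)


def dbpa_to_pa(level_dbpa: float) -> float:
    """Convert a level in dBPa (dB re 1 Pa) to an RMS sound pressure in Pa."""
    return 10.0 ** (level_dbpa / 20.0)


def dbpa_to_db_spl(level_dbpa: float) -> float:
    """Convert a level in dBPa to dB SPL (dB re 20 micropascal)."""
    return level_dbpa + 20.0 * math.log10(1.0 / REF_PRESSURE_PA)
```

With these definitions, -4,7 dBPa corresponds to roughly 0,58 Pa, i.e. about 89,3 dB SPL.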
The test setup shall be in conformance with Figure 11 but, depending on the type of terminal, the appropriate distance
and level will be used. When using this calibration method, send sensitivity shall be calculated as follows:
$$S_{mJ} = 20 \log V_s - 20 \log P_{MRP} + Corr - Dcorr \qquad (1)$$
where:
• $V_s$ is the measured voltage across the appropriate termination (unless stated otherwise, a 600 Ω termination).
• P
...


SLOVENIAN STANDARD
1 September 2022
Speech and multimedia Transmission Quality (STQ) - Transmission requirements for
wideband VoIP loudspeaking and handsfree terminals from a QoS perspective as
perceived by the user
This Slovenian standard is identical to: ETSI ES 202 740 V1.8.2 (2022-05)
ICS:
33.050.01 Telecommunication terminal equipment in general
2003-01. Slovenski inštitut za standardizacijo. Reproduction of this standard in whole or in part is not permitted.

ETSI STANDARD
Speech and multimedia Transmission Quality (STQ);
Transmission requirements for wideband
VoIP loudspeaking and handsfree terminals
from a QoS perspective as perceived by the user

ETSI ES 202 740 V1.8.2 (2022-05)

Reference
RES/STQ-303
Keywords
handsfree, loudspeaking, quality, speech, terminal,
VoIP, wideband
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE

Tel.: +33 4 92 94 42 00  Fax: +33 4 93 65 47 16

Siret N° 348 623 562 00017 - APE 7112B
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° w061004871

Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
If you find a security vulnerability in the present document, please report it through our
Coordinated Vulnerability Disclosure Program:
https://www.etsi.org/standards/coordinated-vulnerability-disclosure
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.

Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.

© ETSI 2022.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 General considerations
4.1 Coding algorithm
4.2 End-to-end considerations
5 Test equipment
5.1 IP half channel measurement adaptor
5.2 Environmental conditions for tests
5.3 Accuracy of measurements and test signal generation
5.4 Network impairment simulation
5.5 Acoustic environment
5.6 Influence of terminal delay on measurements
6 Requirements and associated measurement methodologies
6.1 Notes
6.2 Test setup
6.2.1 General
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
6.2.2.2 Measurements in loudspeaking mode
6.2.3 Test signal levels
6.2.3.1 Send
6.2.3.2 Receive
6.2.4 Setup of background noise simulation
6.2.5 Setup for variable echo path
6.3 Coding independent parameters
6.3.1 Send sensitivity/frequency response
6.3.2 Send Loudness Rating (SLR)
6.3.3 Mic mute
6.3.4 Send distortion
6.3.5 Out-of-band signals in send direction
6.3.6 Send noise
6.3.7 Terminal Coupling Loss (TCL)
6.3.8 Stability loss
6.3.9 Receive frequency response
6.3.10 Receive Loudness Rating (RLR)
6.3.11 Receive distortion
6.3.12 Out-of-band signals in receive direction
6.3.13 Receive noise
6.3.14 Double talk performance
6.3.14.1 General
6.3.14.2 Attenuation range in send direction during double talk $A_{H,S,dt}$
6.3.14.3 Attenuation range in receive direction during double talk $A_{H,R,dt}$
6.3.14.4 Detection of echo components during double talk
6.3.14.5 Minimum activation level and sensitivity of double talk detection
6.3.15 Switching characteristics
6.3.15.1 Note
6.3.15.2 Activation in send direction
6.3.15.3 Silence suppression and comfort noise generation
6.3.16 Background noise performance
6.3.16.1 Performance in send direction in the presence of background noise
6.3.16.2 Speech quality in the presence of background noise
6.3.16.3 Quality of background noise transmission (with far end speech)
6.3.17 Quality of echo cancellation
6.3.17.1 Temporal echo effects
6.3.17.2 Spectral echo attenuation
6.3.17.3 Occurrence of artefacts
6.3.17.4 Variable echo path
6.3.18 Variant impairments
6.3.18.1 Clock accuracy send
6.3.18.2 Clock accuracy receive
6.3.18.3 Send packet delay variation
6.3.19 Send and receive delay - round trip delay
6.4 Codec specific requirements
6.4.1 Objective listening speech quality MOS-LQO in send direction
6.4.2 Objective listening speech quality MOS-LQO in receive direction
6.4.3 Quality of jitter buffer adjustment
Annex A (informative): Processing delays in VoIP terminals
Annex B (informative): Bibliography
History

Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This ETSI Standard (ES) has been produced by ETSI Technical Committee Speech and multimedia Transmission
Quality (STQ).
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Traditionally, analogue and digital telephones interfaced with circuit-switched 64 kbit/s PCM networks. With the fast
growth of IP networks, wideband terminals providing higher audio bandwidth and directly interfacing packet-switched
networks (VoIP) are being rapidly introduced. Such IP network edge devices may include gateways, specifically
designed IP phones, soft phones or other devices connected to IP based networks and providing telephony service.
Due to the unique characteristics of IP networks, including packet loss, delay, etc., new performance specifications, as
well as appropriate measurement methods, will have to be developed. Terminals are getting increasingly complex.
The advanced signal processing of terminals is targeted at speech signals. Therefore, wherever possible, speech signals
are used for testing in order to achieve the most realistic test conditions and meaningful results.
The present document provides speech transmission performance requirements for wideband VoIP loudspeaking and
hands-free terminals.
NOTE: Requirement limits are given in tables; the associated curve, when provided, is given for illustration only.
1 Scope
The present document provides speech transmission performance requirements for 8 kHz wideband VoIP loudspeaking
and hands-free terminals; it addresses all types of IP based terminals, including wireless, softphones and group audio
terminals. DECT terminals are covered in ETSI EN 300 175-8 [i.6] and ETSI EN 300 176-2 [i.7].
In contrast to other standards which define minimum performance requirements, it is the intention of the present
document to specify terminal equipment requirements which enable manufacturers and service providers to achieve
good quality end-to-end speech performance as perceived by the user.
In addition to basic testing procedures, the present document describes advanced testing procedures taking into account
further quality parameters as perceived by the user.
NOTE: The present document does not concern headset terminals.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] ETSI I-ETS 300 245-6: "Integrated Services Digital Network (ISDN); Technical characteristics of
telephony terminals; Part 6: Wideband (7 kHz), loudspeaking and hands free telephony".
[2] Recommendation ITU-T G.108: "Application of the E-model: A planning guide".
[3] Recommendation ITU-T G.109: "Definition of categories of speech transmission quality".
[4] Recommendation ITU-T G.722: "7 kHz audio-coding within 64 kbit/s".
[5] Recommendation ITU-T G.722.1: "Low-complexity coding at 24 and 32 kbit/s for hands-free
operation in systems with low frame loss".
[6] Recommendation ITU-T G.722.2: "Wideband coding of speech at around 16 kbit/s using Adaptive
Multi-Rate Wideband (AMR-WB)".
[7] Recommendation ITU-T G.729.1: "G.729 based embedded variable bit-rate coder: An 8-32 kbit/s
scalable wideband coder bitstream interoperable with G.729".
[8] Recommendation ITU-T P.56: "Objective measurement of active speech level".
[9] Recommendation ITU-T P.58: "Head and torso simulator for telephonometry".
[10] Recommendation ITU-T P.79: "Calculation of loudness ratings for telephone sets".
[11] Recommendation ITU-T P.310: "Transmission characteristics for narrow-band digital handset and
headset telephones".
[12] Recommendation ITU-T P.340: "Transmission characteristics and speech quality parameters of
hands-free terminals".
[13] Recommendation ITU-T P.341: "Transmission characteristics for wideband digital loudspeaking
and hands-free telephony terminals".
[14] Recommendation ITU-T P.501: "Test signals for use in telephonometry".
[15] Recommendation ITU-T P.502: "Objective test methods for speech communication systems using
complex test signals".
[16] Recommendation ITU-T P.581: "Use of head and torso simulator for hands-free and handset
terminal testing".
[17] IEC 61260-1: "Electroacoustics - Octave-band and fractional-octave-band filters - Part 1:
Specifications".
[18] Recommendation ITU-T P.800.1: "Mean Opinion Score (MOS) terminology".
[19] ETSI TS 103 224: "Speech and multimedia Transmission Quality (STQ); A sound field
reproduction method for terminal testing including a background noise database".
[20] Recommendation ITU-T P.863.1: "Application guide for Recommendation ITU-T P.863".
[21] Recommendation ITU-T P.863: "Perceptual objective listening quality prediction".
[22] ETSI ES 202 737: "Speech and multimedia Transmission Quality (STQ); Transmission
requirements for narrowband VoIP terminals (handset and headset) from a QoS perspective as
perceived by the user".
[23] Recommendation ITU-T P.1010: "Fundamental voice transmission objectives for VoIP terminals
and gateways".
[24] IETF RFC 3550: "RTP: A Transport Protocol for Real-Time Applications".
[25] TIA-920.130: "Telecommunications Communications Products Transmission Requirements for
Digital Interface Communications Devices with Headsets".
[26] Recommendation ITU-T G.122: "Influence of national systems on stability and talker echo in
international connections".
[27] Void.
[28] Recommendation ITU-T G.711: "Pulse code modulation (PCM) of voice frequencies".
[29] Recommendation ITU-T G.729: "Coding of speech at 8 kbit/s using conjugate-structure algebraic-
code-excited linear prediction (CS-ACELP)".
[30] Recommendation ITU-T G.723.1: "Dual rate speech coder for multimedia communications
transmitting at 5.3 and 6.3 kbit/s".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] ETSI EG 202 425: "Speech Processing, Transmission and Quality Aspects (STQ); Definition and
implementation of VoIP reference point".
[i.2] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".

[i.3] Netem™.
NOTE: Information available at https://wiki.linuxfoundation.org/networking/netem.
[i.4] ETSI EG 201 377-1: "Speech and multimedia Transmission Quality (STQ); Specification and
measurement of speech transmission quality; Part 1: Introduction to objective comparison
measurement methods for one-way speech quality across networks".
[i.5] IETF RFC 4737: "Packet Reordering Metrics".
[i.6] ETSI EN 300 175-8: "Digital Enhanced Cordless Telecommunications (DECT); Common
Interface (CI); Part 8: Speech and audio coding and transmission".
[i.7] ETSI EN 300 176-2: "Digital Enhanced Cordless Telecommunications (DECT); Test
specification; Part 2: Audio and speech".
3 Definition of terms, symbols and abbreviations
3.1 Terms
For the purposes of the present document, the following terms apply:
artificial ear: device for the calibration of earphones incorporating an acoustic coupler and a calibrated microphone for
the measurement of the sound pressure and having an overall acoustic impedance similar to that of the median adult
human ear over a given frequency band
codec: combination of an analogue-to-digital encoder and a digital-to-analogue decoder operating in opposite directions
of transmission in the same equipment
ear-Drum Reference Point (DRP): point located at the end of the ear canal, corresponding to the ear-drum position
freefield equalization: artificial head is equalized in such a way that for frontal sound incidence in anechoic conditions
the frequency response of the artificial head is flat
freefield reference point: point located in the free sound field, at a distance of at least 1,5 m from a sound source
radiating in free air
NOTE: In the case of a Head And Torso Simulator (HATS), the point is at the centre of the artificial head, with no
artificial head present.
group-audio terminal: handsfree terminal primarily designed for use by several users which will not be equipped with
a handset
handsfree telephony terminal: telephony terminal using a loudspeaker associated with an amplifier as a telephone
receiver and which can be used without a handset
HATS Hands-Free Reference Point (HATS HFRP): reference point "n" from Recommendation ITU-T P.58 [9], where "n" is
one of the points numbered from 11 to 17 and defined in table 6a of Recommendation ITU-T P.58 [9] (coordinates of
far field front point)
NOTE: The HATS HFRP depends on the location(s) of the microphones of the terminal under test: the
appropriate axis lip-ring/HATS HFRP is to be as close as possible to the axis lip-ring/HFT microphone
under test.
Head And Torso Simulator (HATS) for telephonometry: manikin extending downward from the top of the head to
the waist, designed to simulate the sound pick-up characteristics and the acoustic diffraction produced by a median
human adult and to reproduce the acoustic field generated by the human mouth
loudspeaking function: function of a handset telephone using a loudspeaker associated with an amplifier as a
telephone receiver
Mouth Reference Point (MRP): point located on axis and 25 mm in front of the lip plane of a mouth simulator
nominal setting of the volume control: setting which is closest to the nominal RLR
reordering: change of packet order during transfer over the network [i.5], so that packets (e.g. RTP packets) arrive out
of order at the receiver
softphone: speech communication system based upon a computer
3.2 Symbols
Void.
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
AM-FM Amplitude Modulation - Frequency Modulation
AMR-WB Adaptive Multi-Rate - Wideband
CS Composite Source
CSS Composite Source Signal
DRP ear-Drum Reference Point
DUT Device Under Test
EC Echo Canceller
EL Echo Loss
ERP Ear Reference Point
FFT Fast Fourier Transform
G-MOS-LQOw Overall transmission quality wideband
GSM Global System for Mobile communications
HATS Head And Torso Simulator
HFRP Hands Free Reference Point
HFT Hands Free Terminal
IEC International Electrotechnical Commission
IP Internet Protocol
IPDV IP Packet Delay Variation
ITU-T International Telecommunication Union - Telecommunication standardization sector
LE Earphone coupling Loss
MOS Mean Opinion Score
MOS-LQOy Mean Opinion Score - Listening Quality Objective
NOTE: y being N for narrowband, M for mixed, S for super-wideband and F for fullband. See Recommendation
ITU-T P.800.1 [18].
MRP Mouth Reference Point
N-MOS-LQOw Transmission quality of the background noise wideband
NLP Non Linear Processor
PBX Private Branch eXchange
PC Personal Computer
PCM Pulse Code Modulation
PDA Personal Digital Assistant
PMRP Sound Pressure at the Mouth Reference Point
PN Pseudo Noise
POI Point Of Interconnection
QoS Quality of Service
RLR Receive Loudness Rating
RMS Root Mean Square
RTP Real-Time Transport Protocol
S-MOS-LQOw Transmission quality of the speech wideband
SLR Send Loudness Rating
TCL Terminal Coupling Loss
TDM Time Division Multiplex
TELR Talker Echo Loudness Rating
TOSQA Telecommunications Objective Speech Quality Assessment
VAD Voice Activity Detection
VoIP Voice over Internet Protocol
4 General considerations
4.1 Coding algorithm
The coding algorithm assumed in the present document is according to Recommendation ITU-T G.722 [4]. VoIP
terminals may support other coding algorithms.
NOTE: Associated Packet Loss Concealment, e.g. as defined in Recommendation ITU-T G.722 [4], appendixes 3
and 4, should be used.
4.2 End-to-end considerations
In order to achieve a desired end-to-end (mouth-to-ear) speech transmission performance, it is recommended that the
general rules of transmission planning are applied using the E-model, taking into account that the E-model does not
directly address handsfree or loudspeaking terminals; this includes the a-priori determination of the desired category of
speech transmission quality as defined in Recommendation ITU-T G.109 [3].
While, in general, the transmission characteristics of single circuit-oriented network elements, such as switches or
terminals can be assumed to have a single input value for the planning tasks of Recommendation ITU-T G.108 [2], this
approach is not applicable in packet based systems and thus there is a need for the transmission planner's specific
attention.
In particular the decision as to which delay measured according to the present document should be acceptable or
representative for the specific configuration is the responsibility of the individual transmission planner.
Recommendation ITU-T G.108 [2] with its amendments provides further guidance on this important issue.
The following optimum terminal parameters from a user's perspective need to be considered:
• Minimized delay in send and receive direction.
• Optimum loudness rating (RLR, SLR).
• Compensation for network delay variation.
• Packet loss recovery performance.
• Maximized terminal coupling loss.
• Some more basic (ETSI I-ETS 300 245-6 [1]) parameters are applicable, if Recommendation ITU-T G.722 [4]
is used.
5 Test equipment
5.1 IP half channel measurement adaptor
The IP half channel measurement adaptor is described in ETSI EG 202 425 [i.1].
5.2 Environmental conditions for tests
The following conditions shall apply for the testing environment:
a) ambient temperature: 15 °C to 35 °C;
b) relative humidity: 5 % to 85 %;
c) air pressure: 86 kPa to 106 kPa (860 mbar to 1 060 mbar).
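EXAMPLE: A test laboratory could log and check the ambient conditions before each measurement run. The Python sketch below is a minimal illustration; the names are assumptions, and treating the limits as inclusive is also an assumption, since the text only gives the ranges.

```python
# Permitted ranges from items a) to c) above: (lower limit, upper limit).
LIMITS = {
    "temperature_c": (15.0, 35.0),
    "relative_humidity_percent": (5.0, 85.0),
    "air_pressure_kpa": (86.0, 106.0),
}


def environment_violations(temperature_c: float,
                           relative_humidity_percent: float,
                           air_pressure_kpa: float) -> list:
    """Return the names of all quantities that lie outside their permitted range."""
    readings = {
        "temperature_c": temperature_c,
        "relative_humidity_percent": relative_humidity_percent,
        "air_pressure_kpa": air_pressure_kpa,
    }
    return [name for name, (low, high) in LIMITS.items()
            if not low <= readings[name] <= high]
```

An empty list means that all three conditions are met; otherwise the list names the quantities to be corrected before testing.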
5.3 Accuracy of measurements and test signal generation
Unless specified otherwise, the accuracy of measurements made by test equipment shall be equal to or better than:
Table 1: Measurement accuracy

Item                            Accuracy
Electrical signal level         ±0,2 dB for levels ≥ -50 dBV
                                ±0,4 dB for levels < -50 dBV
Sound pressure                  ±0,7 dB
Frequency                       ±0,2 %
Time                            ±0,2 %
Application force               ±2 N
Measured maximum frequency      20 kHz

NOTE: The measured maximum frequency is due to Recommendation ITU-T P.58 [9] limitations.
Unless specified otherwise, the accuracy of the signals generated by the test equipment shall be better than:
Table 2: Accuracy of test signal generation
Quantity                                               Accuracy
Sound pressure level at Mouth Reference Point (MRP)    ±3 dB for frequencies from 100 Hz to 200 Hz
                                                       ±1 dB for frequencies from 200 Hz to 4 000 Hz
                                                       ±3 dB for frequencies from 4 000 Hz to 8 000 Hz
Electrical excitation levels                           ±0,4 dB across the whole frequency range
Frequency generation                                   ±2 % (see note)
Time                                                   ±0,2 %
Specified component values                             ±1 %
NOTE: This tolerance may be used to avoid measurements at critical frequencies, e.g. those
due to sampling operations within the terminal under test.
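
The frequency-dependent tolerance for the sound pressure level at the MRP in Table 2 can be read as a simple lookup. The Python sketch below is one possible reading, not a normative one: the function name is hypothetical, and because Table 2 lists the band edges 200 Hz and 4 000 Hz in two bands each, the sketch assumes that the tighter ±1 dB value applies at those edges.

```python
def mrp_level_tolerance_db(frequency_hz):
    """Tolerance (plus/minus, in dB) on the sound pressure level at the MRP.

    Values are taken from Table 2. Assumption: at the shared band edges
    (200 Hz and 4 000 Hz) the tighter 1 dB tolerance is applied.
    """
    if not 100.0 <= frequency_hz <= 8000.0:
        raise ValueError("Table 2 gives no tolerance outside 100 Hz to 8 000 Hz")
    if 200.0 <= frequency_hz <= 4000.0:
        return 1.0
    return 3.0
```

For example, a 1 000 Hz test tone would have to be generated within ±1 dB, whereas a 6 000 Hz tone is allowed ±3 dB.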

For terminal equipment which is directly powered from the mains supply, all tests shall be carried out within ±5 % of
the rated voltage of that supply. If the equipment is powered by other means and those means are not supplied as part of
the apparatus, all tests shall be carried out within the power supply limit declared by the supplier. If the power supply is
alternating current, the test shall be conducted within ±4 % of the rated frequency.
5.4 Network impairment simulation
At least one set of requirements is based on the assumption of an error free packet network, and at least one other set of
requirements is based on a defined simulated loss of performance of the packet network.
An appropriate network simulator has to be used, for example Netem™ [i.3]. The key points of Netem™ can be
summarized as follows:
• Netem™ is part of the networking function of Linux™. With Netem™, loss, duplication, delay, jitter and
reordering can be generated (and the distribution of jitter can be chosen during runtime). Netem™ can be run on
a Linux™ PC running as a bridge or a router.
• It is not advised to define specific distortion patterns for testing in standards, because it will be easy to adapt
devices to these patterns (as it is already done for test signals). But if a pattern is unknown to a manufacturer,
the same pattern can be used by a test lab for different devices and gives comparable results.
NOTE: Netem™ and Linux™ are examples of suitable products available commercially. This information is given
for the convenience of users of the present document and does not constitute an endorsement by ETSI of
these product(s).
Requirements for the network impairment simulation can be found in annex D of ETSI ES 202 737 [22].
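As a hedged illustration of how such an impairment might be configured, the Python sketch below builds (but does not execute) a tc/netem command line. The interface name and the delay, jitter and loss values are placeholders chosen for the example and are not taken from the present document or from ETSI ES 202 737 [22]; the function name is hypothetical.

```python
def netem_command(interface, delay_ms, jitter_ms, loss_pct):
    """Build a tc/netem command as an argument list, without running it.

    All argument values are supplied by the caller; none of them is a
    requirement of the present document.
    """
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms", f"{jitter_ms}ms", "distribution", "normal",
        "loss", f"{loss_pct}%",
    ]


# Example with placeholder values: 50 ms delay, 10 ms jitter, 1 % packet loss.
example = " ".join(netem_command("eth0", 50, 10, 1))
```

Here the example string is "tc qdisc add dev eth0 root netem delay 50ms 10ms distribution normal loss 1%". Applying such a command normally requires administrator rights on the Linux PC acting as bridge or router.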
5.5 Acoustic environment
Unless stated otherwise, measurements shall be conducted under quiet and "anechoic" conditions. Depending on the
distance of the transducers from mouth and ear, a quiet office room may be sufficient, e.g. for handsets where the
artificial mouth and artificial ear are located close to the acoustical transducers. This is, however, not applicable
for hands-free and loudspeaking terminals.
In cases where real or simulated background noise is used as part of the testing environment, the original background
noise shall not be noticeably influenced by the acoustical properties of the room.
In all cases where the performance of acoustic echo cancellers shall be tested, a realistic room, which represents the
typical user environment for the terminal shall be used.
Where an anechoic room is not available, the test room has to be an acoustically treated room with few
reflections and a low noise level.
Considering this, a test laboratory whose test room does not conform to the anechoic conditions given in
Recommendation ITU-T P.341 [13] has to present the difference in measurement results that is due to its test room.
5.6 Influence of terminal delay on measurements
As delay is introduced by the terminal, care shall be taken for all measurements where the exact position of the analysis
window is required. It shall be checked that the test is performed on the test signal and not on any other signal.
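One possible way to take the terminal delay into account is sketched below in Python. The present document does not prescribe a method; the sketch assumes that the delay is estimated by cross-correlating the recorded signal with the known test signal and that the analysis window is then shifted by that number of samples. Both function names are hypothetical.

```python
def estimate_delay_samples(reference, recorded, max_lag):
    """Return the lag (in samples) that maximises the cross-correlation.

    Assumption: a plain cross-correlation search over 0..max_lag; this is
    an illustrative technique, not one required by the present document.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            ref * recorded[i + lag]
            for i, ref in enumerate(reference)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def shifted_window(start, length, delay_samples):
    """Return (first, last + 1) sample indices of the delay-compensated window."""
    first = start + delay_samples
    return first, first + length
```

A check that the shifted window actually contains the test signal, for instance a high correlation value at the estimated lag, helps to confirm that the analysis is not performed on some other signal.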
6 Requirements and associated measurement methodologies
6.1 Notes
NOTE 1: In general the test methods as described in the present document apply. If alternative methods exist they
may be used if they have been proven to give the same result as the method described in the present
document. This will be indicated in the test report.
NOTE 2: Due to the time-variant nature of IP connections, delay variation may impair the measurement. In such a case,
the measurement has to be repeated until a valid measurement can be achieved.
6.2 Test setup
6.2.1 General
In order to use a compatible test system for all types of speech terminals, a HATS (Head And Torso Simulator) will be
used instead of a free-field microphone (for receive measurements) and an artificial mouth (for send measurements).
HATS is described in Recommendation ITU-T P.58 [9].
The preferred way of testing a terminal is to connect it to a network simulator with exactly defined settings and access
points. The test sequences are fed in either electrically, using a reference codec or the direct signal processing
approach, or acoustically, using ITU-T specified devices.
When a coder with variable bit rate is used for testing the terminal parameters, the bit rate(s) recognized as giving the
best characteristics and/or those commonly used should be selected, e.g.:
• Recommendation ITU-T G.722 [4]: 64 kbit/s.
• Recommendation ITU-T G.722.2 [6]: 12,65 kbit/s and/or 23,85 kbit/s.
• Recommendation ITU-T G.729.1 [7]: 32 kbit/s.
[Figure 1, block diagram. Labels recovered from the figure: VoIP terminal under test; network simulator (delay, jitter, packet loss); path through IP network; IP half-channel measurement adapter (VoIP reference point, gateway simulation); POI (electrical reference point); measurement system.]
Figure 1: Half channel terminal measurement
6.2.2 Setup for terminals
6.2.2.1 Hands-free measurements
The ear used for measurement will be indicated in the test report.
Desktop operated hands-free terminal
For HATS test equipment, the definition of and the setups for hands-free terminals can be found in
Recommendation ITU-T P.581 [16].

Figure 2: Position for test of desktop hands-free terminal, side view
[Figure dimension label: 60 cm]
Figure 3: Position for test of desktop hands-free terminal, top view
Handheld hands-free terminal
The terminal should be placed according to Figure 4. The HATS should be positioned so that the HATS Reference Point
is at a distance $d_{HF}$ from the centre point of the visual display of the Mobile Station. The distance $d_{HF}$ is
specified by the manufacturer. A vertical angle $\theta_{HF}$ may be specified by the manufacturer.

Figure 4: Configuration of Hand-Held loudspeaker relative to the HATS side view
$d_{HFR} = d_{HF}$, $d_{HFS} = d_{HF} - d_{EM}$, where $d_{HFR}$ is the distance for receive measurement, $d_{HFS}$ is
the distance for send measurement, and $d_{EM}$ is the distance from ERP to MRP.
When no operating distance is specified by the manufacturer, the value for $d_{HFS}$ will be 30 cm. A calculation of
$d_{EM}$ for HATS gives 12 cm.
A value of 42 cm will be taken for $d_{HF}$.
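The relationship between the three distances can be checked with a short calculation. The Python sketch below is illustrative only: the function and constant names are hypothetical, while the 12 cm and 30 cm values are the ones stated above.

```python
D_EM_CM = 12.0           # ERP-to-MRP distance calculated for the HATS (from the text)
DEFAULT_D_HFS_CM = 30.0  # send distance used when the manufacturer specifies none


def handheld_distances(d_hf_cm=None):
    """Return (d_HF, d_HFR, d_HFS) in cm for a handheld hands-free terminal.

    If no operating distance d_HF is given, the default is derived from the
    30 cm send distance: d_HF = d_HFS + d_EM.
    """
    if d_hf_cm is None:
        d_hf_cm = DEFAULT_D_HFS_CM + D_EM_CM
    d_hfr_cm = d_hf_cm            # d_HFR = d_HF
    d_hfs_cm = d_hf_cm - D_EM_CM  # d_HFS = d_HF - d_EM
    return d_hf_cm, d_hfr_cm, d_hfs_cm
```

With no manufacturer value, the function returns 42 cm for the operating and receive distances and 30 cm for the send distance, matching the defaults given in the text.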
Softphone (computer-based terminals)
When the manufacturer gives conditions of use, they will apply for the test.
If no other requirement is given by the manufacturer, the softphone will be positioned according to the following conditions:
Softphone including speakers and microphone
Two types of softphones are to be considered:
• Type 1 is to be used as a desktop type (e.g. notebook).
• Type 2 is to be used as a handheld type (e.g. PDA).

Figure 5: Configuration of softphone relative to the HATS side view

Figure 6: Configuration of softphone relative to the HATS top view
Softphone with separate speakers
When separate loudspeakers are used, the system will be positioned as in Figure 7.

Figure 7: Configuration of softphone using external speakers relative to the HATS top view
When an external microphone and speakers are used, the system will be positioned as in Figure 8.

Figure 8: Configuration of softphone using external speakers and microphone relative to the HATS top view
Group audio terminal
When the manufacturer gives conditions of use, they will apply for the test.
When no requirement is given by the manufacturer, the following conditions will be used by the test laboratory.
Measurements will be conducted using HATS test equipment.
The following test position will be used.

Figure 9: Configuration of group audio terminal relative to the HATS side view

Figure 10: Configuration of group audio terminal relat
...
