ETSI TS 103 224 V1.6.1 (2022-03)
Speech and multimedia Transmission Quality (STQ); A sound field reproduction method for terminal testing including a background noise database
RTS/STQ-300
TECHNICAL SPECIFICATION
Reference
RTS/STQ-300
Keywords
noise, quality, speech, terminal
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - APE 7112B
Non-profit association registered at the
Sous-Préfecture de Grasse (06) No. w061004871
Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the prevailing version of an ETSI
deliverable is the one made publicly available in PDF format at www.etsi.org/deliver.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Notice of disclaimer & limitation of liability
The information provided in the present deliverable is directed solely to professionals who have the appropriate degree of
experience to understand and interpret its content in accordance with generally accepted engineering or
other professional standard and applicable regulations.
No recommendation as to products and services or vendors is made or should be implied.
No representation or warranty is made that this deliverable is technically accurate or sufficient or conforms to any law
and/or governmental rule and/or regulation and further, no representation or warranty is made of merchantability or fitness
for any particular purpose or against infringement of intellectual property rights.
In no event shall ETSI be held liable for loss of profits or any other incidental or consequential damages.
Any software contained in this deliverable is provided "AS IS" with no warranties, express or implied, including but not
limited to, the warranties of merchantability, fitness for a particular purpose and non-infringement of intellectual property
rights and ETSI shall not be held liable in any event for any damages whatsoever (including, without limitation, damages
for loss of profits, business interruption, loss of information, or any other pecuniary loss) arising out of or related to the use
of or inability to use the software.
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and
microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.
© ETSI 2022.
All rights reserved.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Introduction
1 Scope
2 References
2.1 Normative references
2.2 Informative references
3 Definition of terms, symbols and abbreviations
3.1 Terms
3.2 Symbols
3.3 Abbreviations
4 Methods for realistic sound reproduction
5 Recording arrangement
5.0 General
5.1 Microphone array setup
5.1.1 Principle limitations
5.1.2 Microphone calibration
5.2 Microphone array setup for handset-type and headset terminals
5.3 Microphone array setup for hands-free terminals
5.4 Microphone array setup for binaural applications
6 Loudspeaker setup for background noise simulation
6.0 General setup
6.1 Test room requirements
6.2 Equalization and calibration
6.2.0 Overview of the equalization procedure
6.2.1 Separate level adjustment for each loudspeaker
6.2.2 System identification
6.2.3 Pre-processing of the impulse responses
6.2.4 Calculation of the inversion filters
6.2.4.0 Overview
6.2.4.1 Inversion procedure
6.2.4.2 Different microphones for different frequency bands
6.2.4.3 Search for the optimum regularization factor
6.2.4.3.0 Introduction
6.2.4.3.1 Basic methodology to find the optimum regularization factor
6.2.4.3.2 Extended methodology to find the optimum regularization factor for frequencies above 2 kHz
6.2.5 First test of equalization and filter adjustment for inversion error compensation
6.2.6 Accuracy of the equalization
6.3 Accuracy of the reproduction arrangement
6.3.0 Introduction
6.3.1 Comparison between original sound field and simulated sound field
6.3.2 Impact of handset positioner and phone on the simulated sound field
6.3.3 Comparison of terminal performance in the original sound field and the simulated sound field
6.3.3.1 Introduction
6.3.3.2 Background noise transmission
6.3.3.2.0 Validation Procedure
6.3.3.2.1 Handset
6.3.3.2.2 Handheld Hands-free
6.3.3.2.3 Desktop Hands-Free
6.3.3.3 S-/N-/G-MOS Analysis according to ETSI TS 103 106
6.3.3.3.1 Handset
6.3.3.3.2 Hands-free
7 Generalization of the method for a more flexible loudspeaker and microphone arrangement
7.0 Introduction
7.1 Loudspeaker configuration
7.2 Microphone setup
7.3 Background noise recordings and reference noise
7.4 Equalization and calibration
7.5 Accuracy of the equalization
7.6 Example use case: equalization inside a vehicle
7.6.0 Introduction
7.6.1 Loudspeaker configuration
7.6.2 Microphone setup
7.6.3 Equalization
7.7 Example use case: binaural equalization
7.7.1 Introduction
7.7.2 Loudspeaker configuration
7.7.3 Microphone setup
7.7.4 Equalization
8 Background noise database
8.0 Introduction
8.1 Reference noise recording
8.1.1 Default frequency range
8.1.2 Low frequency extension
8.2 Background noise signals for terminal testing
8.3 Background noise signals for binaural applications
8.4 Background noise signals in a home-like test environment
Annex A (informative): Home-like test environment
History
Intellectual Property Rights
Essential patents
IPRs essential or potentially essential to normative deliverables may have been declared to ETSI. The declarations
pertaining to these essential IPRs, if any, are publicly available for ETSI members and non-members, and can be
found in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to
ETSI in respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the
ETSI Web server (https://ipr.etsi.org/).
Pursuant to the ETSI Directives including the ETSI IPR Policy, no investigation regarding the essentiality of IPRs,
including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not
referenced in ETSI SR 000 314 (or the updates on the ETSI Web server) which are, or may be, or may become,
essential to the present document.
Trademarks
The present document may include trademarks and/or tradenames which are asserted and/or registered by their owners.
ETSI claims no ownership of these except for any which are indicated as being the property of ETSI, and conveys no
right to use or reproduce any trademark and/or tradename. Mention of those trademarks in the present document does
not constitute an endorsement by ETSI of products, services or organizations associated with those trademarks.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are trademarks of ETSI registered for the benefit of its
Members. 3GPP™ and LTE™ are trademarks of ETSI registered for the benefit of its Members and of the 3GPP
Organizational Partners. oneM2M™ logo is a trademark of ETSI registered for the benefit of its Members and of the
oneM2M Partners. GSM® and the GSM logo are trademarks registered and owned by the GSM Association.
Foreword
This Technical Specification (TS) has been produced by ETSI Technical Committee Speech and multimedia
Transmission Quality (STQ).
The present document describes a sound field recording and reproduction technique which can be applied for all types
of terminals but is especially suitable for modern multi-microphone terminals including array techniques. The present
document provides an additional simulation technique which can be used instead of the part 1 of ETSI multi-part
deliverable ES/EG 202 396 "Speech quality performance in the presence of background noise", as identified below:
• ETSI ES 202 396-1: "Background noise simulation technique and background noise database" [i.7];
• ETSI EG 202 396-2: "Background noise transmission - Network simulation - Subjective test database and
results" [i.8];
• ETSI EG 202 396-3: "Background noise transmission - Objective test methods" [i.9].
The background noise simulation can be used in conjunction with the objective test methods as described in ETSI
EG 202 396-3 [i.9], ETSI TS 103 106 [i.10] and ETSI TS 103 281 [i.12].
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Introduction
Background noise is present in most conversations today and may significantly affect the speech
communication performance of terminal and network equipment. Such equipment therefore needs to be tested
and optimized using realistic background noises. The tests also require reproducible conditions, which can
be guaranteed only under lab-type conditions. Since modern terminals incorporate more advanced
noise cancellation techniques, such as multi-microphone based noise cancellation, microphone-array
recording techniques and more realistic noise field simulations (compared to the method described in ETSI
ES 202 396-1 [i.7]) are required.
The present document addresses this topic by specifying a methodology for recording and playback of realistic
background noise fields under conditions that are well-defined and able to be calibrated in a lab type environment.
Furthermore a database with real background noises is included.
1 Scope
The quality of background noise transmission is an important factor, which significantly contributes to the perceived
overall quality of speech. Terminals, networks, and system configurations including wideband, super-wideband, and
fullband speech services can be greatly improved with a proper design of terminals and systems in the presence of
background noise. The present document:
• describes a sound field simulation technique that allows the real environment to be simulated in the
laboratory using realistic background noise scenarios;
• contains a database including relevant background noise samples for subjective and objective evaluation.
The present document describes the recording technique used for the sound field simulation, the loudspeaker setup, and
the loudspeaker calibration and equalization procedures. Furthermore the present document specifies the test room
requirements for laboratory conditions.
The simulation environment specified can be used for the evaluation and optimization of terminals and of complex
configurations including terminals, networks and others. The main application areas are: outdoor, office, home and car
environment.
The setup and database as described in the present document are applicable for:
• Objective performance evaluation of terminals in different (simulated) background noise environments.
• Speech processing evaluation by using the pre-processed speech signals in the presence of background noise,
recorded by a terminal.
• Subjective evaluation of terminals by performing conversational tests, specific double talk tests, or talking and
listening tests in the presence of background noise.
• Subjective evaluation in third party listening tests by recording the speech samples of terminals in the presence
of background noise.
2 References
2.1 Normative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
Referenced documents which are not found to be publicly available in the expected location might be found at
https://docbox.etsi.org/Reference.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are necessary for the application of the present document.
[1] Recommendation ITU-T P.58: "Head and Torso Simulator for Telephonometry".
[2] Recommendation ITU-T P.57: "Artificial ears".
2.2 Informative references
References are either specific (identified by date of publication and/or edition number or version number) or
non-specific. For specific references, only the cited version applies. For non-specific references, the latest version of the
referenced document (including any amendments) applies.
NOTE: While any hyperlinks included in this clause were valid at the time of publication, ETSI cannot guarantee
their long term validity.
The following referenced documents are not necessary for the application of the present document but they assist the
user with regard to a particular subject area.
[i.1] Berkhout A. J., de Vries D., Vogel P.: "Acoustic control by wave field synthesis", J. Acoust.
Soc. Am., p. 2764-2778, May 1993.
[i.2] Gerzon M. A.: "Periphony: With-Height Sound Reproduction", Journal of the Audio Engineering
Society 21, 1973.
[i.3] Ward D. B., Abhayapala T. D.: "Reproduction of a Plane-Wave Sound Field Using an Array of
Loudspeakers", IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 6, p. 697-707,
September 2001.
[i.4] Kirkeby O., Nelson P. A., Orduna-Bustamante F., Hamada H.: "Local sound field reproduction
using digital signal processing", J. Acoust. Soc. Am. 100(3), p. 1584-1593, September 1996.
[i.5] Kirkeby O., Nelson P. A., Hamada H., Orduna-Bustamante F.: "Fast Deconvolution of
Multichannel Systems Using Regularization", IEEE Transactions on Speech and Audio Processing,
Vol. 6, No. 2, p. 189-195, March 1998.
[i.6] Void.
[i.7] ETSI ES 202 396-1: "Speech and multimedia Transmission Quality (STQ); Speech quality
performance in the presence of background noise; Part 1: Background noise simulation technique
and background noise database".
[i.8] ETSI EG 202 396-2: "Speech Processing, Transmission and Quality Aspects (STQ); Speech
quality performance in the presence of background noise; Part 2: Background noise transmission -
Network simulation - Subjective test database and results".
[i.9] ETSI EG 202 396-3: "Speech and multimedia Transmission Quality (STQ); Speech Quality
performance in the presence of background noise; Part 3: Background noise transmission -
Objective test methods".
[i.10] ETSI TS 103 106: "Speech and multimedia Transmission Quality (STQ); Speech quality
performance in the presence of background noise: Background noise transmission for mobile
terminals-objective test methods".
[i.11] ISO 3382-1: "Measurement of room acoustic parameters -- Part 1: Performance spaces".
[i.12] ETSI TS 103 281: "Speech and multimedia Transmission Quality (STQ); Speech quality in the
presence of background noise: Objective test methods for super-wideband and fullband terminals".
3 Definition of terms, symbols and abbreviations
3.1 Terms
Void.
3.2 Symbols
For the purposes of the present document, the following symbols apply:
c Sound velocity
C Matrix of FFT coefficients of Compensation Filters
H Matrix of FFT coefficients of Impulse Responses
3.3 Abbreviations
For the purposes of the present document, the following abbreviations apply:
DRP Drum Reference Point
DUT Device Under Test
EEP Ear canal Entrance Point
FFT Fast Fourier Transform
HATS Head And Torso Simulator
IR Impulse Response
LFE Low-Frequency Extension
MLS Maximum Length Sequence
MOS Mean Opinion Score
SNR Signal to Noise Ratio
SPL Sound Pressure Level
4 Methods for realistic sound reproduction
A variety of methods exists for reproducing real-world sound fields, among them wave field
synthesis [i.1] and Ambisonics [i.2]. Both methods, however, require a large number of microphones and loudspeakers
to achieve a sound field reproduction that is sufficiently good for testing purposes. A wave field synthesis setup is
so complex and expensive that it can be ruled out for laboratory purposes. Ambisonics, for example, requires
43 microphones and 43 loudspeakers to reach a good sound field reproduction up to 2 kHz in a sweet
spot with a radius of 15 cm (using the rule of thumb in [i.3]). Furthermore, it cannot account for individual room
characteristics or imperfections, but is designed only for rooms offering pure free-field conditions. If, e.g. for testing
purposes, a HATS is positioned in the artificial noise field, the reproduction quality is reduced by an unknown amount.
In summary, the Ambisonics approach is, by design, not feasible for the intended testing scenario.
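The figure of 43 channels quoted above can be checked with a short calculation. The present clause only cites [i.3] for the rule of thumb and does not state its formula, so the sketch below assumes the commonly quoted form in which a spherical-harmonic expansion up to order N ≈ kr (wavenumber k times sweet-spot radius r) needs (N + 1)^2 channels. The function name, its parameters and the sound velocity of 343 m/s are illustrative assumptions, not taken from the present document.

```python
import math

def channels_rule_of_thumb(f_max_hz, radius_m, c=343.0, integer_order=False):
    """Estimate the channel count for a 3D sweet spot of the given radius.

    Assumed rule of thumb: order N ~ k*r, channels = (N + 1)^2.
    c is the sound velocity in m/s (assumed value, not from the document).
    """
    k = 2.0 * math.pi * f_max_hz / c      # wavenumber at the upper frequency
    order = k * radius_m                  # required expansion order (unrounded)
    if integer_order:
        order = math.ceil(order)          # round the order up before squaring
    return math.ceil((order + 1.0) ** 2)

# 2 kHz upper frequency, 15 cm sweet-spot radius
print(channels_rule_of_thumb(2000.0, 0.15))                      # -> 43
print(channels_rule_of_thumb(2000.0, 0.15, integer_order=True))  # -> 49
```

With the unrounded order the estimate matches the 43 microphones and loudspeakers stated in the text; rounding the order up to an integer first gives a somewhat larger count, which does not change the conclusion that the channel count is impractical for a laboratory setup.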
The present document introduces an alternative least mean squares method [i.4], which requires eight recording
channels and eight loudspeakers in order to achieve reasonably good reproduction results. The method is based on eight
sweet spots at important testing positions, e.g. near the HATS, mainly at the microphone positions of modern phones.
Reproducing the recorded sound field reasonably well at the corresponding eight points in the reproduction setup
also yields good reproduction accuracy in between these points. This well-known property of sound fields holds only
up to an upper cut-off frequency, which depends on the distances between the recording microphones (see clause 5.1.1).
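As an illustration of the least mean squares idea, the following minimal sketch computes regularized inversion filters for a single frequency bin in the style of [i.5], i.e. C = (H^H H + beta I)^-1 H^H, with H and C as listed under Symbols. It is not the equalization procedure of clause 6.2.4, which is not reproduced here. The helper names, the regularization values and the random 8 x 8 matrix standing in for measured transfer functions are all hypothetical.

```python
import random

def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[r][c].conjugate() for r in range(len(a))]
            for c in range(len(a[0]))]

def matmul(a, b):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def inversion_filters(h, beta):
    """Regularized least-squares inverse C = (H^H H + beta*I)^-1 H^H for one bin.

    h[m][l] is the transfer function from loudspeaker l to microphone m.
    """
    hh = hermitian(h)
    gram = matmul(hh, h)
    for i in range(len(gram)):
        gram[i][i] += beta                # regularization on the diagonal
    return solve(gram, hh)

def frobenius_sq(m):
    """Squared Frobenius norm, used here as a measure of filter effort."""
    return sum(abs(v) ** 2 for row in m for v in row)

def reproduction_error(h, c):
    """Squared Frobenius norm of H C - I (error summed over all microphones)."""
    hc = matmul(h, c)
    return sum(abs(hc[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(len(hc)) for j in range(len(hc[0])))

# Hypothetical placeholder for measured data: 8 microphones x 8 loudspeakers.
random.seed(0)
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]
     for _ in range(8)]

for beta in (1e-3, 1e-1):
    C = inversion_filters(H, beta)
    print(beta, reproduction_error(H, C), frobenius_sq(C))
```

A larger regularization factor lowers the filter effort at the cost of a larger reproduction error at the microphone positions, which is the trade-off behind any search for a suitable regularization factor.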
In clause 5, the recording technique required for this new method is described, while the setup allowing the
reproduction in laboratories and the different steps of the equalization procedure are introduced in clause 6.
A generic variant for flexible microphone and loudspeaker arrangements is described in clause 7.
5 Recording arrangement
5.0 General
The sound field recording technique (Multi-point sound field recording technique) is based on optimizing the sound
field reproduction at different points in space. The optimization criterion is the minimization of the reproduction
error at each microphone position. Based on this principle, the microphone locations, and consequently the points in
space at which the sound field reproduction is most accurate, can be chosen within a wide range. The advantage of the
method is that these locations can be adapted to the type of device to be tested. If the Device Under Test (DUT)
incorporates a microphone array, the Multi-point sound field recording microphones can be positioned in the area of
the microphones of the DUT. If a hands-free device is to be tested the Multi-point sound fi
...