SIST ETS 300 580-6 E4:2003
Digital cellular telecommunications system (Phase 2) (GSM); Full rate speech; Part 6: Voice Activity Detection (VAD) for full rate speech traffic channels (GSM 06.32 version 4.3.1)
This ETS specifies the voice activity detector (VAD) to be used in the Discontinuous Transmission (DTX) as described in GSM 06.31. It also specifies the test methods to be used to verify that a VAD complies with the technical specification. The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations or Base Station Systems.
Slovenian title (translated): Digital cellular telecommunications system (Phase 2) - Full rate speech - Part 6: Voice activity detection (VAD) in full rate speech traffic channels (GSM 06.32, version 4.3.1)
SLOVENIAN STANDARD
01-December-2003
Digital cellular telecommunications system (Phase 2) (GSM); Full rate speech; Part 6: Voice Activity Detection (VAD) for full rate speech traffic channels (GSM 06.32 version 4.3.1)
This Slovenian standard is identical to: ETS 300 580-6 Edition 4
ICS:
33.070.50 Global System for Mobile Communication (GSM)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard in whole or in part is not permitted.
EUROPEAN TELECOMMUNICATION STANDARD
ETS 300 580-6
April 1998
Fourth Edition
Source: SMG
Reference: RE/SMG-110632PR3
ICS: 33.020
Key words: Digital cellular telecommunications system, Global System for Mobile communications (GSM)
GLOBAL SYSTEM FOR
MOBILE COMMUNICATIONS
Digital cellular telecommunications system (Phase 2)
Full rate speech;
Part 6: Voice Activity Detection (VAD) for full rate speech traffic
channels
(GSM 06.32 version 4.3.1)
ETSI
European Telecommunications Standards Institute
ETSI Secretariat
Postal address: F-06921 Sophia Antipolis CEDEX - FRANCE
Office address: 650 Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
Internet: secretariat@etsi.fr - http://www.etsi.fr - http://www.etsi.org
Tel.: +33 4 92 94 42 00 - Fax: +33 4 93 65 47 16
Copyright Notification: No part may be reproduced except as authorized by written permission. The copyright and the
foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1998. All rights reserved.
ETS 300 580-6 (GSM 06.32 version 4.3.1): April 1998
Whilst every care has been taken in the preparation and publication of this document, errors in content,
typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to
"ETSI Editing and Committee Support Dept." at the address shown on the title page.
Contents
Foreword
0.1 Scope
0.2 Normative references
0.3 Abbreviations
1 General
2 Functional description
2.1 Overview and principles of operation
2.2 Algorithm description
2.2.1 Adaptive filtering and energy computation
2.2.2 ACF averaging
2.2.3 Predictor values computation
2.2.4 Spectral comparison
2.2.5 Periodicity detection
2.2.6 Information tone detection
2.2.7 Threshold adaptation
2.2.8 VAD decision
2.2.9 VAD hangover addition
3 Computational details
3.1 Adaptive filtering and energy computation
3.2 ACF averaging
3.3 Predictor values computation
3.3.1 Schur recursion to compute reflection coefficients
3.3.2 Step-up procedure to obtain the aav1[0..8]
3.3.3 Computation of the rav1[0..8]
3.4 Spectral comparison
3.5 Periodicity detection
3.6 Threshold adaptation
3.7 VAD decision
3.8 VAD hangover addition
3.9 Periodicity updating
3.10 Tone detection
3.10.1 Windowing
3.10.2 Auto-correlation
3.10.3 Computation of the reflection coefficients
3.10.4 Filter coefficient calculation
3.10.5 Pole Frequency Test
3.10.6 Prediction gain test
4 Digital test sequences
4.1 Test configuration
4.2 Test sequences
Annex A (informative)
Annex 1 (informative): Simplified block filtering operation
Annex 2 (informative): Description of digital test sequences
A.2.1 Test sequences
A.2.2 File format description
Annex 3 (informative): VAD performance
Annex 4 (informative): Pole frequency calculation
Annex B (normative): Test sequences diskette
History
Foreword
This fourth edition European Telecommunication Standard (ETS) has been produced by the Special
Mobile Group (SMG) of the European Telecommunications Standards Institute (ETSI).
This ETS specifies the Voice Activity Detector (VAD) to be used in the Discontinuous Transmission (DTX)
as described in GSM 06.31 for the digital cellular telecommunications system (Phase 2).
This ETS corresponds to GSM technical specification GSM 06.32 version 4.3.1.
A 3,5 inch diskette (annex B) is attached to the back cover of this ETS. The diskette contains the test
sequences described in clause A.2.2:
Diskette 1 ETS 300 580-6, annex A.2: Test sequences for the GSM Full Rate speech
codec; test sequence files *.inp, *.cod, *.vad.
The specification from which this ETS has been derived was originally based on CEPT documentation,
hence the presentation of this ETS may not be entirely in accordance with the ETSI/PNE Rules.
Transposition dates
Date of adoption of this ETS: 3 April 1998
Date of latest announcement of this ETS (doa): 30 June 1998
Date of latest publication of new National Standard or endorsement of this ETS (dop/e): 31 December 1998
Date of withdrawal of any conflicting National Standard (dow): 31 December 1998
0.1 Scope
This European Telecommunication Standard (ETS) specifies the Voice Activity Detector (VAD) to be used
in the Discontinuous Transmission (DTX) as described in GSM 06.31. It also specifies the test methods to
be used to verify that a VAD complies with the technical specification.
The requirements are mandatory on any VAD to be used either in the GSM Mobile Stations or Base
Station Systems.
0.2 Normative references
This ETS incorporates by dated and undated reference, provisions from other publications. These
normative references are cited at the appropriate places in the text and the publications are listed
hereafter. For dated references, subsequent amendments to or revisions of any of these publications
apply to this ETS only when incorporated in it by amendment or revision. For undated references, the
latest edition of the publication referred to applies.
[1] GSM 01.04 (ETR 100): "Digital cellular telecommunications system (Phase 2);
Abbreviations and acronyms".
[2] GSM 06.10 (ETS 300 580-2): "Digital cellular telecommunications system
(Phase 2); Full rate speech transcoding".
[3] GSM 06.12 (ETS 300 580-4): "Digital cellular telecommunications system
(Phase 2); Comfort noise aspect for full rate speech traffic channels".
[4] GSM 06.31 (ETS 300 580-5): "Digital cellular telecommunications system
(Phase 2); Discontinuous Transmission (DTX) for full rate speech traffic
channels".
0.3 Abbreviations
Abbreviations used in this ETS are listed in GSM 01.04 [1].
1 General
The function of the VAD is to indicate whether each 20 ms frame produced by the speech encoder
contains speech or not. The output is a binary flag which is used by the TX DTX handler defined in
GSM 06.31.
The technical specification is organized as follows:
Clause 2 describes the principles of operation of the VAD.
In clause 3, the computational details necessary for the fixed point implementation of the VAD algorithm
are given. This clause uses the same notation as used for computational details in GSM 06.10.
The verification of the VAD is based on the use of digital test sequences. Clause 4 defines the input and
output signals and the test configuration, whereas the detailed description of the test sequences is
contained in annex 2.
The performance of the VAD algorithm is characterized by the amount of audible speech clipping it
introduces and the percentage activity it indicates. These characteristics for the VAD defined in this
technical specification have been established by extensive testing under a wide range of operating
conditions. The results are summarized in annex 3.
2 Functional description
The purpose of this clause is to give the reader an understanding of the principles of operation of the
VAD, whereas the detailed description is given in clause 3. In case of discrepancy between the two
descriptions, the detailed description of clause 3 shall prevail.
In the following subclauses of clause 2, a Pascal programming type of notation has been used to describe
the algorithm.
2.1 Overview and principles of operation
The function of the VAD is to distinguish between noise with speech present and noise without speech
present. The biggest difficulty for detecting speech in a mobile environment is the very low speech/noise
ratios which are often encountered. The accuracy of the VAD is improved by using filtering to increase the
speech/noise ratio before the decision is made.
For a mobile environment, the worst speech/noise ratios are encountered in moving vehicles. It has been
found that the noise is relatively stationary for quite long periods in a mobile environment. It is therefore
possible to use an adaptive filter with coefficients obtained during noise, to remove much of the vehicle
noise.
The VAD is basically an energy detector. The energy of the filtered signal is compared with a threshold;
speech is indicated whenever the threshold is exceeded.
The noise encountered in mobile environments may be constantly changing in level. The spectrum of the
noise can also change, and varies greatly over different vehicles. Because of these changes the VAD
threshold and adaptive filter coefficients must be constantly adapted. To give reliable detection the
threshold must be sufficiently above the noise level to avoid noise being identified as speech but not so far
above it that low level parts of speech are identified as noise. The threshold and the adaptive filter
coefficients are only updated when speech is not present. It is, of course, potentially dangerous for a VAD
to update these values on the basis of its own decision. This adaptation therefore only occurs when the
signal seems stationary in the frequency domain but does not have the pitch component inherent in voiced
speech. A tone detector is also used to prevent adaptation during information tones.
A further mechanism is used to ensure that low level noise (which is often not stationary over long
periods) is not detected as speech. Here, an additional fixed threshold is used.
A VAD hangover period is used to eliminate mid-burst clipping of low level speech. Hangover is only
added to speech-bursts which exceed a certain duration to avoid extending noise spikes.
2.2 Algorithm description
The block diagram of the VAD algorithm is shown in figure 2-1. The individual blocks are described in the
following clauses. ACF, N and sof are calculated in the speech encoder.
[Block diagram not reproduced. Blocks: adaptive filtering and energy computation; VAD decision; VAD hangover addition; periodicity detection; threshold adaptation; tone detection; predictor values computation; spectral comparison; ACF averaging. Signals: ACF, N, sof, pvad, thvad, vvad, vad, rvad, ptch, stat, tone, rav1, av1, av0.]
Figure 2-1: Functional block diagram of the VAD
The global variables shown in the block diagram are described as follows:
- ACF are auto-correlation coefficients which are calculated in the speech encoder defined in
GSM 06.10 (subclause 3.1.4, see also annex 1). The inputs to the speech encoder are 16 bit 2's
complement numbers, as described in GSM 06.10, subclause 4.2.0.
- av0 and av1 are averaged ACF vectors.
- rav1 are autocorrelated predictor values obtained from av1.
- rvad are the autocorrelated predictor values of the adaptive filter.
- N is the long term predictor lag value which is obtained every sub-segment in the speech coder
defined in GSM 06.10.
- ptch indicates whether the signal has a steady periodic component.
- sof is the offset compensated signal frame obtained in the speech coder defined in GSM 06.10.
- pvad is the energy in the current frame of the input signal after filtering.
- thvad is an adaptive threshold.
- stat indicates spectral stationarity.
- vvad indicates the VAD decision before hangover is added.
- vad is the final VAD decision with hangover included.
2.2.1 Adaptive filtering and energy computation
Pvad is computed as follows:
\[ pvad = rvad_0 \, acf_0 + 2 \sum_{i=1}^{8} rvad_i \, acf_i \]
This corresponds to performing an 8th order block filtering on the input samples to the speech encoder,
after zero offset compensation and pre-emphasis. This is explained in annex 1.
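The energy formula above can be illustrated in floating point. This is only a sketch: the function name `filtered_energy` is hypothetical, and the normative fixed point computation is the one given in clause 3.1.

```python
def filtered_energy(rvad, acf):
    """Floating point form of pvad = rvad[0]*acf[0] + 2*sum(rvad[i]*acf[i], i=1..8).

    rvad: autocorrelated predictor values of the adaptive filter (9 values)
    acf:  auto-correlation coefficients of the current frame (9 values)
    """
    return rvad[0] * acf[0] + 2.0 * sum(rvad[i] * acf[i] for i in range(1, 9))


# Example input: the initial rvad values of table 2-5 and an arbitrary ACF vector.
rvad_init = [6.0, -4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
acf_example = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
pvad_example = filtered_energy(rvad_init, acf_example)
```

The factor 2 on the terms i = 1..8 accounts for the symmetry of the auto-correlation sequences, so only non-negative lags need to be stored.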
2.2.2 ACF averaging
Spectral characteristics of the input signal have to be obtained using blocks that are larger than one 20 ms
frame. This is done by averaging the auto-correlation values for several consecutive frames. This
averaging is given by the following equations:
\[ av0_i\{n\} = \sum_{j=0}^{frames-1} acf_i\{n-j\} ; \quad i = 0..8 \]
\[ av1_i\{n\} = av0_i\{n-frames\} ; \quad i = 0..8 \]
Where n represents the current frame, n-1 represents the previous frame etc. The values of constants are
given in table 2-1.
Table 2-1: Constants and variables for ACF averaging
Constant   Value   Variable                    Initial value
frames     4       previous ACF's, av0 & av1   All set to 0
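The averaging equations amount to a sliding sum over the last `frames` ACF vectors, plus a delay of `frames` frames to obtain av1. A minimal floating point sketch follows; the class name `AcfAverager` and its method are hypothetical, and the fixed point delay lines of clause 3.2 remain the reference.

```python
from collections import deque

FRAMES = 4  # constant "frames" of table 2-1
ORDER = 8   # ACF lags 0..8


class AcfAverager:
    """Keeps the ACF history needed to produce av0 and av1 for each frame."""

    def __init__(self):
        zero = [0.0] * (ORDER + 1)
        # Previous ACF vectors and previous av0 vectors, all initialised to 0.
        self._acf_hist = deque([zero[:] for _ in range(FRAMES)], maxlen=FRAMES)
        self._av0_hist = deque([zero[:] for _ in range(FRAMES)], maxlen=FRAMES)

    def update(self, acf):
        """Feed the ACF of the current frame n; return (av0{n}, av1{n})."""
        self._acf_hist.append(list(acf))  # oldest frame drops out automatically
        av0 = [sum(frame[i] for frame in self._acf_hist) for i in range(ORDER + 1)]
        av1 = self._av0_hist[0]           # av0 computed FRAMES frames ago
        self._av0_hist.append(av0)
        return av0, av1
```

With a constant input, av0 reaches its steady value after 4 frames and av1 follows 4 frames later, which is why av1 is less likely than av0 to contain the onset of speech.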
2.2.3 Predictor values computation
The filter predictor values aav1 are obtained from the auto-correlation values av1 according to the
equation:
\[ a = R^{-1} p \]
where:
- -
R = | av1[0], av1[1], av1[2], av1[3], av1[4], av1[5], av1[6], av1[7] |
| av1[1], av1[0], av1[1], av1[2], av1[3], av1[4], av1[5], av1[6] |
| av1[2], av1[1], av1[0], av1[1], av1[2], av1[3], av1[4], av1[5] |
| av1[3], av1[2], av1[1], av1[0], av1[1], av1[2], av1[3], av1[4] |
| av1[4], av1[3], av1[2], av1[1], av1[0], av1[1], av1[2], av1[3] |
| av1[5], av1[4], av1[3], av1[2], av1[1], av1[0], av1[1], av1[2] |
| av1[6], av1[5], av1[4], av1[3], av1[2], av1[1], av1[0], av1[1] |
| av1[7], av1[6], av1[5], av1[4], av1[3], av1[2], av1[1], av1[0] |
- -
and:
- - - -
p = |av1[1]| a = |aav1[1]|
|av1[2]| |aav1[2]|
|av1[3]| |aav1[3]|
|av1[4]| |aav1[4]|
|av1[5]| |aav1[5]|
|av1[6]| |aav1[6]|
|av1[7]| |aav1[7]|
|av1[8]| |aav1[8]|
- - - -
aav1[0] = -1
av1 is used in preference to av0 as av0 may contain speech.
The autocorrelated predictor values rav1 are then obtained:
\[ rav1_i = \sum_{k=0}^{8-i} aav1_k \, aav1_{k+i} ; \quad i = 0..8 \]
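As an illustration of this subclause, the sketch below solves R a = p by plain Gaussian elimination and then forms the autocorrelated predictor values. The function names are hypothetical and the elimination is a stand-in chosen for brevity; the specification itself uses the Schur recursion and step-up procedure of clause 3.3.

```python
def solve_predictor(av1, order=8):
    """Solve R a = p for the Toeplitz matrix R built from av1[0..order-1].

    Returns aav1[0..order] with aav1[0] = -1. Assumes R is non-singular;
    a singular R raises ZeroDivisionError in this sketch.
    """
    n = order
    # Augmented matrix [R | p], with R[i][j] = av1[|i-j|] and p[i] = av1[i+1].
    m = [[av1[abs(i - j)] for j in range(n)] + [av1[i + 1]] for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical robustness.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * a[j] for j in range(i + 1, n))
        a[i] = s / m[i][i]
    return [-1.0] + a


def autocorrelate(aav1):
    """rav1[i] = sum over k = 0..8-i of aav1[k] * aav1[k+i], for i = 0..8."""
    n = len(aav1)
    return [sum(aav1[k] * aav1[k + i] for k in range(n - i)) for i in range(n)]
```

For example, an input of the form av1[k] = 0.5^k (a first order process) gives aav1[1] close to 0.5 and all higher coefficients close to zero.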
2.2.4 Spectral comparison
The spectra represented by the autocorrelated predictor values rav1 and the averaged auto-correlation
values av0 are compared using the distortion measure dm defined below. This measure is used to
produce a boolean value stat every 20 ms, as given by these equations:
\[ dm = \frac{rav1_0 \, av0_0 + 2 \sum_{i=1}^{8} rav1_i \, av0_i}{av0_0} \]
difference = |dm - lastdm|
lastdm = dm
stat = difference < thresh
The values of constants and initial values are given in table 2-2.
Table 2-2: Constants and variables for spectral comparison
Constant Value Variable Initial value
thresh 0.05 lastdm 0
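A floating point sketch of the spectral comparison is given below. The class name `SpectralComparator` is hypothetical, and raising an error when av0[0] is zero is a choice of this sketch only; clause 3.4 handles that case separately in fixed point.

```python
THRESH = 0.05  # constant "thresh" of table 2-2


class SpectralComparator:
    """Computes the distortion measure dm and the boolean stat once per frame."""

    def __init__(self):
        self.lastdm = 0.0  # initial value from table 2-2

    def update(self, rav1, av0):
        if av0[0] == 0:
            # Not covered by the equations of this subclause (sketch assumption).
            raise ValueError("av0[0] must be non-zero")
        dm = (rav1[0] * av0[0]
              + 2.0 * sum(rav1[i] * av0[i] for i in range(1, 9))) / av0[0]
        difference = abs(dm - self.lastdm)
        self.lastdm = dm
        return difference < THRESH  # stat
```

Because stat depends only on the change in dm between consecutive frames, a signal whose spectrum is steady yields stat = true regardless of its level.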
2.2.5 Periodicity detection
The frequency spectrum of mobile noise is relatively stationary over quite long periods. The Inverse Filter
Autocorrelated Predictor coefficients of the adaptive filter rvad are only updated when this stationarity is
detected. Vowel sounds however, also have this stationarity, but can be excluded by detecting the
periodicity of these sounds using the long term predictor lag values (Nj) which are obtained every
sub-segment from the speech codec defined in GSM 06.10. Consecutive lag values are compared. Cases
in which one lag value is a factor of the other are catered for, however cases in which both lag values
have a common factor, are not. This case is not important for speech input but this method of periodicity
detection may fail for some sine waves. The boolean variable ptch is updated every 20 ms and is true
when periodicity is detected. It is calculated according to the following equation:
ptch = oldlagcount + veryoldlagcount >= nthresh
The following operations are done after the VAD decision, once the current LTP lag values (N0 .. N3)
are available; this reduces the delay of the VAD decision. (N{-1} = N3 of the previous segment.)
lagcount = 0
for j = 0 to 3 do
begin
smallag = maximum(Nj,N{j-1}) mod minimum(Nj,N{j-1})
if minimum(smallag,minimum(Nj,N{j-1})-smallag) < lthresh
then increment(lagcount)
end
veryoldlagcount = oldlagcount
oldlagcount = lagcount
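The periodicity logic above can be transcribed almost line for line. In this sketch the class name `PeriodicityDetector` and the attribute `oldlag` (holding N3 of the previous frame, initial value 40) are naming assumptions.

```python
LTHRESH = 2  # constant "lthresh" of table 2-3
NTHRESH = 4  # constant "nthresh" of table 2-3


class PeriodicityDetector:
    def __init__(self):
        self.oldlagcount = 0
        self.veryoldlagcount = 0
        self.oldlag = 40  # N3 of the previous frame, initial value 40

    def ptch(self):
        """True when periodicity was detected over the two previous frames."""
        return self.oldlagcount + self.veryoldlagcount >= NTHRESH

    def update(self, lags):
        """Process the four LTP lags N0..N3 of the current frame."""
        lagcount = 0
        prev = self.oldlag
        for n in lags:
            big, small = max(n, prev), min(n, prev)
            smallag = big % small
            # Counts lags that are (nearly) equal or (nearly) integer multiples.
            if min(smallag, small - smallag) < LTHRESH:
                lagcount += 1
            prev = n
        self.oldlag = prev
        self.veryoldlagcount = self.oldlagcount
        self.oldlagcount = lagcount
```

Calling `ptch()` before `update()` for a frame reproduces the ordering described above: the flag used for the current decision is based on lag values from earlier frames.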
The values of constants and initial values are given in table 2-3.
Table 2-3: Constants and variables for periodicity detection
Constant   Value   Variable          Initial value
lthresh    2       oldlagcount       0
nthresh    4       veryoldlagcount   0
                   N3                40
2.2.6 Information tone detection
The tone flag is only evaluated in the downlink VAD. In the uplink VAD, tone detection is not performed
and tone = false.
Computation of the tone flag is complex. It is therefore evaluated after the processing of the current
speech encoder frame. In this way transmission of the speech or SID frame is not delayed.
Information tones and environmental noise can be classified by inspecting the short term prediction gain,
information tones resulting in higher prediction gains than environmental noise. Tones can therefore be
detected by comparing the prediction gain to a fixed threshold. By limiting the prediction gain calculation to
a fourth order analysis, information signals consisting of one or two tones can be detected whilst
minimizing the prediction gain for environmental noise.
The prediction gain decision is implemented by comparing the normalized prediction error with a
threshold. This measure is used to evaluate the boolean variable tone every 20 ms. The signal is
classified as a tone if the prediction error is smaller than the threshold predth. This is equivalent to a
prediction gain threshold of 13,5 dB.
Mobile noise can contain very strong resonances at low frequencies, resulting in a high prediction gain. A
further test is therefore made to determine the pole frequency of a second order analysis of the signal
frame. The signal is classified as noise if the frequency of the pole is less than 385 Hz. The pole
frequency calculation is described in annex 4.
The algorithm for detecting information tones is as follows:
tone = false
den = a[1]*a[1]
num = 4*a[2] - a[1]*a[1]
if ( num <= 0 )
return
if (( a[1] < 0 ) AND ( num / den < freqth ))
return
prederr = MULT (1 - RC[i]*RC[i]) for i = 1 to 4
if (prederr < predth)
tone = true
return
The values of the constants are given in table 2-4. The coefficients a[0..2] are transversal filter coefficients
calculated from RC[1..2]. The calculation of the reflection coefficients RC[1..4] is described below.
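The decision logic of the tone detector can be sketched in floating point as follows. The function name `is_tone` and the convention that index 0 of `rc` is unused are assumptions of this sketch; the computation of `a` and `rc` from the signal frame is not shown here.

```python
FREQTH = 0.0973  # constant "freqth" of table 2-4
PREDTH = 0.0158  # constant "predth" of table 2-4


def is_tone(a, rc):
    """Return the tone flag.

    a:  second order filter coefficients a[0..2]
    rc: reflection coefficients, rc[1..4] used, rc[0] ignored
    """
    den = a[1] * a[1]
    num = 4.0 * a[2] - a[1] * a[1]
    if num <= 0:
        return False  # poles are real: no resonance to test
    if a[1] < 0 and num / den < FREQTH:
        return False  # pole frequency too low: treated as noise
    prederr = 1.0
    for i in range(1, 5):
        prederr *= 1.0 - rc[i] * rc[i]  # normalized prediction error
    return prederr < PREDTH
```

A small prediction error corresponds to a high prediction gain, so the final comparison is the 13,5 dB prediction gain test expressed on the error rather than on the gain.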
The offset compensated signal frame sof[0..159] is multiplied by the Hanning window to give the
windowed frame sofh[0..159]:
\[ sofh_i = sof_i \cdot hann_i ; \quad i = 0..159 \]
where
\[ hann_i = 0.5 \left( 1 - \cos\left( \frac{2 \pi i}{159} \right) \right) ; \quad i = 0..159 \]
The auto-correlation acfh[0..4] of the windowed signal frame is then calculated:
\[ acfh_k = \sum_{i=k}^{159} sofh_i \, sofh_{i-k} ; \quad k = 0..4 \]
RC[1..4] are then calculated from acfh[0..4] using the Schur recursion described in the RPE-LTP codec.
Table 2-4: Constants for information tone detection.
Constant Value
freqth 0,0973
predth 0,0158
NOTE: Reflection coefficients are available in the RPE-LTP codec. However, they are
calculated after pre-emphasis using a rectangular window and do not give good tone
detection results.
2.2.7 Threshold adaptation
A check is made every 20 ms to determine whether the VAD decision threshold (thvad) should be
changed. This adaptation is carried out according to the flowchart shown in figure 2-2. The constants used
are given in table 2-5.
Adaptation takes place in two different situations: firstly whenever ACF[0] is very low and secondly
whenever there is a very high probability that speech and information tones are not present.
In the first case, the threshold is adapted if the energy of the input signal is less than pth. The threshold is
set to plev without carrying out any further tests because at these very low levels the effect of the signal
quantization makes it impossible to obtain reliable results from these tests.
In the second case, the decision threshold (thvad) and the adaptive filter coefficients (rvad) are only
updated with the rav1 values when there is a very high probability that speech and information tones are
not present. Adaptation occurs if the following conditions are met over a number (adp) of signal frames:
- Stationarity is detected in the frequency domain.
- The signal does not contain a periodic component.
- Information tones are not present.
The step-size by which the threshold is adapted is not constant but a proportion of the current value
(determined by constants dec and inc). The adaptation begins by experimentally multiplying the threshold
by a factor of (1-1/dec). If the new threshold is now higher than or equal to Pvad times fac then the
threshold needed to be decreased and it is left at this new lower level. If, on the other hand, the new
threshold level is less than Pvad times fac then the threshold either needed to be increased or kept
constant. In this case it is set to Pvad times fac unless this would mean multiplying it by more than a factor
of (1+1/inc) (in which case it is multiplied by a factor of (1+1/inc)). The threshold is never allowed to be
greater than Pvad+margin.
Table 2-5: Constants and variables for threshold adaptation
Constant   Value        Variable             Initial value
pth        300 000      adaptcount           0
plev       800 000      thvad                1 000 000
fac        3.0          rvad[0]              6
adp        8            rvad[1]              -4
inc        16           rvad[2]              1
dec        32           rvad[3] to rvad[8]   All 0
margin     80 000 000
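The adaptation rule described in this subclause and in figure 2-2 can be sketched in floating point as below. The class name `ThresholdAdapter` and the argument names are hypothetical; the pseudo-floating point procedure of clause 3.6 is the normative form.

```python
PTH = 300000.0
PLEV = 800000.0
FAC = 3.0
ADP = 8
INC = 16
DEC = 32
MARGIN = 80000000.0


class ThresholdAdapter:
    def __init__(self):
        # Initial values from table 2-5.
        self.thvad = 1000000.0
        self.adaptcount = 0
        self.rvad = [6.0, -4.0, 1.0] + [0.0] * 6

    def update(self, acf0, pvad, stat, ptch, tone, rav1):
        """Run one 20 ms adaptation check."""
        if acf0 < PTH:
            self.thvad = PLEV  # very low input energy: fixed threshold
            return
        if stat and not ptch and not tone:
            self.adaptcount += 1
        else:
            self.adaptcount = 0
            return
        if self.adaptcount <= ADP:
            return  # conditions not yet met over enough frames
        # Tentatively decrease, then raise again if the result is too low.
        self.thvad -= self.thvad / DEC
        if self.thvad < pvad * FAC:
            self.thvad = min(self.thvad + self.thvad / INC, pvad * FAC)
        if self.thvad > pvad + MARGIN:
            self.thvad = pvad + MARGIN
        self.rvad = list(rav1)
        self.adaptcount = ADP + 1  # keeps the counter from growing without bound
```

With stationary, non-periodic, tone-free input the first eight calls only advance the counter; the threshold and rvad change from the ninth call onwards.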
[Flow diagram not reproduced; its decision sequence is:]
if ACF[0] < pth then
begin
   thvad = plev
   exit
end
if stat and not ptch and not tone
   then increment(adaptcount)
   else
   begin
      adaptcount = 0
      exit
   end
if adaptcount <= adp then exit
thvad = thvad - thvad / dec
if thvad < pvad * fac
   then thvad = minimum(thvad + thvad / inc, pvad * fac)
if thvad > pvad + margin
   then thvad = pvad + margin
rvad = rav1
adaptcount = adp + 1
Figure 2-2: Flow diagram for threshold adaptation
2.2.8 VAD decision
Prior to hangover the VAD decision condition is:
vvad = pvad > thvad
2.2.9 VAD hangover addition
VAD hangover is only added to bursts of speech greater than or equal to burstconst blocks. The boolean
variable vad indicates the decision of the VAD with hangover included. The values of the constants are
given in table 2-6. The hangover algorithm is as follows:
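if vvad then increment(burstcount) else burstcount = 0
if burstcount >= burstconst then
begin
   hangcount = hangconst
   burstcount = burstconst
end
vad = vvad or (hangcount >= 0)
if hangcount >= 0 then decrement(hangcount)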
Table 2-6: Constants and variables for VAD hangover addition
Constant Value Variable Initial value
burstconst 3 burstcount 0
hangconst 5 hangcount -1
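A sketch of the hangover mechanism is given below. It assumes the usual burst counter and hangover counter logic implied by the constants and initial values of table 2-6; the class name `Hangover` is hypothetical, and clause 3.8 is the reference for the exact procedure.

```python
BURSTCONST = 3  # minimum burst length (in frames) that earns hangover
HANGCONST = 5   # number of hangover frames added after such a burst


class Hangover:
    def __init__(self):
        self.burstcount = 0
        self.hangcount = -1  # -1 means no hangover pending

    def update(self, vvad):
        """Take the decision before hangover (bool); return the final vad flag."""
        if vvad:
            self.burstcount += 1
        else:
            self.burstcount = 0
        if self.burstcount >= BURSTCONST:
            self.hangcount = HANGCONST   # (re)arm the hangover
            self.burstcount = BURSTCONST # cap the counter
        vad = vvad or self.hangcount >= 0
        if self.hangcount >= 0:
            self.hangcount -= 1
        return vad
```

Under these assumptions a burst of three speech frames is followed by five frames with vad = true, whereas a burst of two frames ends immediately.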
3 Computational details
In the next paragraphs, the detailed description of the VAD algorithm follows the preceding high level
description. This detailed description is divided in ten clauses related to the blocks of figure 2-1 (except
periodicity updating) in the high level description of the VAD algorithm.
Those clauses are:
1) Adaptive filtering and energy computation;
2) ACF averaging;
3) Predictor values computation;
4) Spectral comparison;
5) Periodicity detection;
6) Threshold adaptation;
7) VAD decision;
8) VAD hangover addition;
9) Periodicity updating;
10) Information tone detection.
The VAD algorithm takes as input the following variables of the RPE-LTP encoder (see the detailed
description of the RPE-LTP encoder GSM 06.10):
- L_ACF[0..8], auto-correlation function (GSM 06.10/4.2.4);
- scalauto, scaling factor to compute the L_ACF[0..8] (GSM 06.10/4.2.4);
- Nc, LTP lag (one for each sub-segment, GSM 06.10/4.2.11);
- sof, offset compensated signal frame (GSM 06.10/4.2.2).
So four Nc values are needed for the VAD algorithm.
The VAD computation can start as soon as the L_ACF[0..8] and scalauto variables are known. This
means that the VAD computation can take place after part 4.2.4 of GSM 06.10 (Auto-correlation) of the
LPC analysis clause of the RPE-LTP encoder. This scheme will reduce the delay to yield the VAD
information. The periodicity updating (included in subclause 2.2.5) and information tone detection, are
done after the processing of the current speech encoder frame.
All the arithmetic operations and names of the variables follow the RPE-LTP detailed description. To
increase the precision within the fixed point implementation, a pseudo-floating point representation of
some variables is used. This stands for the following variables (and related constants) of the
VAD algorithm:
pvad: Energy of filtered signal;
thvad: Threshold of the VAD decision;
acf0: Energy of input signal.
For the representation of these variables, two integers (16 bits) are needed:
- one for the exponent (e_pvad, e_thvad, e_acf0);
- one for the mantissa (m_pvad, m_thvad, m_acf0).
The value e_pvad represents the lowest power of 2 that is greater than or equal to the actual value of pvad, and
the m_pvad value represents an integer which is always greater than or equal to 16384 (normalized mantissa). This
means that the pvad value is equal to:
\[ pvad = 2^{e\_pvad} \cdot (m\_pvad / 32768) \]
This scheme guarantees a large dynamic range for the pvad value and always keeps a precision of
16 bits. All the comparisons are easy to make by comparing the exponents of two variables and the VAD
algorithm needs only one pseudo-floating point addition. All the computations related to the
pseudo-floating point variables require very simple 16 or 32 bits arithmetic operations defined in the
detailed description of the RPE-LTP encoder. This pseudo-floating point arithmetic is only used in clauses
3.1 and 3.6.
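The pseudo-floating point representation can be illustrated with a short sketch. The function names are hypothetical, and the pair (-32768, 0) used for zero follows the convention that clause 3.1 applies when L_ACF[0] is zero.

```python
def to_pseudo_float(value):
    """Convert a non-negative integer to (exponent, mantissa).

    The result satisfies value ~= 2**exponent * mantissa / 32768 with
    16384 <= mantissa < 32768 for any value greater than zero.
    """
    if value <= 0:
        return -32768, 0
    exponent = value.bit_length()      # 2**(exponent-1) <= value < 2**exponent
    mantissa = (value << 15) >> exponent
    return exponent, mantissa


def from_pseudo_float(exponent, mantissa):
    """Convert (exponent, mantissa) back to a float."""
    if mantissa == 0:
        return 0.0
    return 2.0 ** exponent * mantissa / 32768.0


def is_greater(a, b):
    """Compare two normalized pairs: exponents first, then mantissas."""
    return a > b  # tuple comparison is lexicographic
```

For instance, the initial threshold of 1 000 000 maps to exponent 20 and mantissa 31 250, which agrees with the initial values of e_thvad and m_thvad in table 3-1.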
Table 3-1 gives a list of all the variables of the VAD algorithm that must be initialized in the reset
procedure and kept in memory for processing the subsequent frame of the RPE-LTP encoder. The types
(16 or 32 bits) and initial values of all these variables are clearly indicated and their related subclause is
also mentioned. The bit exact implementation uses other temporary variables that are introduced in the
detailed description whenever it is needed.
Table 3-1: Initial values for variables to be stored in memory
Names of variables         Type (# of bits)   Initialization   Subclause
Adaptive filter coefficients:
  rvad[0]                  16                 24 576           3.1, 3.6
  rvad[1]                  16                 -16 384          3.1, 3.6
  rvad[2]                  16                 4 096            3.1, 3.6
  rvad[3..8]               16                 0                3.1, 3.6
Scaling factor of rvad[0..8]:
  normrvad                 16                 7                3.1, 3.6
Delay line of the auto-correlation coefficients:
  L_sacf[0..26]            32                 0                3.2
  L_sav0[0..35]            32                 0                3.2
Pointers on the delay lines:
  pt_sacf                  16                 0                3.2
  pt_sav0                  16                 0                3.2
Distance measure:
  L_lastdm                 32                 0                3.4
Periodicity counters:
  oldlagcount              16                 0                3.5, 3.9
  veryoldlagcount          16                 0                3.5, 3.9
Adaptive threshold:
  e_thvad (exponent)       16                 20               3.6
  m_thvad (mantissa)       16                 31 250           3.6
Counter for adaptation:
  adaptcount               16                 0                3.6
Hangover flags:
  burstcount               16                 0                3.8
  hangcount                16                 -1               3.8
LTP lag memory:
  oldlag                   16                 40               3.9
Tone detection:
  tone                     16                 0                3.10
3.1 Adaptive filtering and energy computation
This clause computes the e_pvad and m_pvad variables which represent the pvad value. It needs the
L_ACF[0..8] and scalauto variables of the RPE-LTP algorithm and the rvad[0..8] and normrvad variables
produced by clause 3.6 of the VAD algorithm. It also computes a floating point representation of L_ACF[0]
(e_acf0 and m_acf0) used in clause 3.6.
Test if L_ACF[0] is equal to 0:
IF ( scalauto < 0 ) THEN scalvad = 0;
ELSE scalvad = scalauto; / keep scalvad for use in clause 3.2 /
IF ( L_ACF[0] == 0 ) THEN
| e_pvad = -32768;
| m_pvad = 0;
| e_acf0 = -32768;
| m_acf0 = 0;
| EXIT /continue with clause 3.2/
Re-normalization of the L_ACF[0..8]:
normacf = norm( L_ACF[0] );
| FOR i = 0 to 8:
| sacf[i] = ( L_ACF[i] << normacf ) >> 19;
| NEXT i:
Computation of e_acf0 and m_acf0:
e_acf0 = add( 32, (scalvad << 1 ) );
e_acf0 = sub( e_acf0, normacf);
m_acf0 = sacf[0] << 3;
Computation of e_pvad and m_pvad:
e_pvad = add( e_acf0, 14 );
e_pvad = sub( e_pvad, normrvad );
L_temp = 0;
| FOR i = 1 to 8:
| L_temp = L_add( L_temp, L_mult( sacf[i], rvad[i] ) );
| NEXT i:
L_temp = L_add( L_temp, L_mult( sacf[0], rvad[0] ) >> 1 );
IF ( L_temp <= 0 ) THEN L_temp = 1;
normprod = norm( L_temp );
e_pvad = sub( e_pvad, normprod );
m_pvad = ( L_temp << normprod ) >> 16;
3.2 ACF averaging
This clause uses the L_ACF[0..8] and the scalvad variables to compute the arrays L_av0[0..8] and
L_av1[0..8] used in clauses 3.3 and 3.4.
Computation of the scaling factor:
scal = sub( 10, (scalvad << 1) );
Computation of the arrays L_av0[0..8] and L_av1[0..8]:
| FOR i = 0 to 8:
| L_temp = L_ACF[i] >> scal;
| L_av0[i] = L_add( L_sacf[i], L_temp );
| L_av0[i] = L_add( L_sacf[i+9], L_av0[i] );
| L_av0[i] = L_add( L_sacf[i+18], L_av0[i] );
| L_sacf[ pt_sacf + i ] = L_temp;
| L_av1[i] = L_sav0[ pt_sav0 + i ];
| L_sav0[ pt_sav0 + i] = L_av0[i];
| NEXT i:
Update of the array pointers:
IF ( pt_sacf == 18 ) THEN pt_sacf = 0;
ELSE pt_sacf = add( pt_sacf, 9);
IF ( pt_sav0 == 27 ) THEN pt_sav0 = 0;
ELSE pt_sav0 = add( pt_sav0, 9);
3.3 Predictor values computation
This clause computes the array rav1[0..8] needed for the spectral comparison and the threshold
adaptation. It uses the L_av1[0..8] computed in clause 3.2, and is divided into the following three
sub-clauses:
- Schur recursion to compute reflection coefficients.
- Step up procedure to obtain the aav1[0..8].
- Computation of the rav1[0..8].
3.3.1 Schur recursion to compute reflection coefficients
This sub-clause is identical to the one used in the RPE-LTP algorithm. The array vpar[1..8] is computed
with the array L_av1[0..8] as an input.
Schur recursion with 16 bits arithmetic:
IF( L_av1[0] == 0 ) THEN
|== FOR i = 1 to 8:
| vpar[i] = 0;
|== NEXT i:
| EXIT; /continue with subclause 3.3.2/
temp = norm( L_av1[0] );
|== FOR k=0 to 8:
| sacf[k] = ( L_av1[k] << temp ) >> 16;
|== NEXT k:
Initialize array P[.] and K[.] for the recursion:
|== FOR i=1 to 7:
| K[9-i] = sacf[i];
|== NEXT i:
|== FOR i=0 to 8:
| P[i] = sacf[i];
|== NEXT i:
Compute reflection coefficients:
|== FOR n=1 to 8:
| IF( P[0] < abs( P[1] ) ) THEN
| |== FOR i = n to 8:
| | vpar[i] = 0;
| |== NEXT i:
| | EXIT; /continue with subclause 3.3.2/
| vpar[n] = div( abs( P[1] ), P[0] );
| IF ( P[1] > 0 ) THEN vpar[n] = sub( 0, vpar[n] );
| IF ( n == 8 ) THEN EXIT; /continue with subclause 3.3.2/
|
| Schur recursion:
|
| P[0] = add( P[0], mult_r( P[1], vpar[n] ) );
|==== FOR m=1 to 8-n:
| P[m] = add( P[m+1], mult_r( K[9-m], vpar[n] ) );
| K[9-m] = add( K[9-m], mult_r( P[m+1], vpar[n] ) );
|==== NEXT m:
|
|== NEXT n:
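To make the structure of the recursion easier to follow, here is a floating point transcription. It is only a sketch: the function name `schur_reflection` is hypothetical, the 16 bit normalization and rounding (`norm`, `mult_r`, `div`) are replaced by ordinary floating point operations, and the extra guard against a non-positive P[0] is an addition needed only because floating point division by zero would otherwise raise an error.

```python
def schur_reflection(acf, order=8):
    """Return vpar[0..order] (vpar[0] unused) from auto-correlation values acf[0..order]."""
    vpar = [0.0] * (order + 1)
    if acf[0] == 0:
        return vpar
    P = list(acf[:order + 1])
    K = [0.0] * (order + 2)
    for i in range(1, order):          # K[9-i] = sacf[i] for i = 1..7 when order = 8
        K[order + 1 - i] = acf[i]
    for n in range(1, order + 1):
        if P[0] <= 0 or P[0] < abs(P[1]):
            break                      # remaining coefficients stay at 0
        vpar[n] = abs(P[1]) / P[0]
        if P[1] > 0:
            vpar[n] = -vpar[n]
        if n == order:
            break
        # Schur recursion: update P and K for the next order.
        P[0] += P[1] * vpar[n]
        for m in range(1, order - n + 1):
            P[m] = P[m + 1] + K[order + 1 - m] * vpar[n]
            K[order + 1 - m] += P[m + 1] * vpar[n]
    return vpar
```

For an input of the form acf[k] = 0.5^k, only the first reflection coefficient is non-zero, which is the expected behaviour for a first order process.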
3.3.2 Step-up procedure to obtain the aav1[0..8]
Initialization of the step-up recursion:
L_coef[0] = 16384 << 15;
L_coef[1] = vpar[1] << 14;
Loop on the LPC analysis order:
|= FOR m = 2 to 8:
|== FOR i = 1 to m-1:
|== temp = L_coef[m-i] >> 16; / takes the msb /
|== L_work[i] = L_add( L_coef[i], L_mult( vpar[m], temp ) );
|== NEXT i
|=
|== FOR i = 1 to m-1:
|== L_coef[i] = L_work[i];
|== NEXT i
|=
|= L_coef[m] = vpar[m] << 14;
|= NEXT m:
Keep the aav1[0..8] on 13 bits for next clause:
| FOR i = 0 to 8:
| aav1[i] = L_coef[i] >> 19;
| NEXT i:
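In real-number terms, the step-up procedure converts the reflection coefficients into direct-form filter coefficients. L_coef[0] = 16384 << 15 represents 1.0 with 29 fractional bits, so after the final shift by 19 a value of 1.0 is stored as 1024 in aav1. The sketch below is a floating-point model under that reading; it is not bit-exact, and the name step_up_float is an assumption introduced here.

```c
#include <assert.h>

/* Floating-point model of the step-up procedure (NOT bit-exact).
 * vpar[1..8]: reflection coefficients, |vpar[m]| <= 1 (vpar[0] unused).
 * a[0..8]   : resulting coefficients, with a[0] = 1. */
void step_up_float(const double vpar[9], double a[9])
{
    double work[9];
    int i, m;

    a[0] = 1.0;                      /* L_coef[0] = 16384 << 15 */
    a[1] = vpar[1];                  /* L_coef[1] = vpar[1] << 14 */

    for (m = 2; m <= 8; m++) {
        assert(vpar[m] >= -1.0 && vpar[m] <= 1.0);
        /* Compute all updated values first, then copy them back,
         * as done with L_work[] in the pseudocode. */
        for (i = 1; i <= m - 1; i++)
            work[i] = a[i] + vpar[m] * a[m - i];
        for (i = 1; i <= m - 1; i++)
            a[i] = work[i];
        a[m] = vpar[m];
    }
}
```

With vpar[1] = 0.5, vpar[2] = 0.25 and all other coefficients zero, the model gives a[1] = 0.625 and a[2] = 0.25.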
3.3.3 Computation of the rav1[0..8]
|= FOR i= 0 to 8:
|= L_work[i] = 0;
|== FOR k = 0 to 8-i:
|== L_work[i] = L_add( L_work[i], L_mult( aav1[k], aav1[k+i] ) );
|== NEXT k:
|= NEXT i:
IF ( L_work[0] == 0 ) THEN normrav1 =0;
ELSE normrav1 = norm( L_work[0] );
|= FOR i= 0 to 8:
|= rav1[i] = ( L_work[i] << normrav1 ) >> 16;
|= NEXT i:
Keep the normrav1 for use in clauses 3.4 and 3.6.
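The computation above is the autocorrelation of the coefficient sequence aav1[0..8], followed by a normalization to 16 bits. A minimal C sketch follows. The helpers L_add, L_mult and norm are simplified stand-ins written for this example and are assumptions about the basic operators, which are defined elsewhere in the specification; shl_then_hi16 and compute_rav1 are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating 32-bit addition (stand-in for L_add). */
static int32_t L_add(int32_t a, int32_t b)
{
    int64_t s = (int64_t)a + b;
    if (s > INT32_MAX) return INT32_MAX;
    if (s < INT32_MIN) return INT32_MIN;
    return (int32_t)s;
}

/* Fractional multiplication (stand-in for L_mult): (a * b) << 1. */
static int32_t L_mult(int16_t a, int16_t b)
{
    assert(!(a == INT16_MIN && b == INT16_MIN));   /* would overflow */
    return (int32_t)a * b * 2;
}

/* Number of left shifts that normalize a positive 32-bit value. */
static int norm(int32_t x)
{
    int n = 0;
    assert(x > 0);
    while (x < 0x40000000) { x *= 2; n++; }
    return n;
}

/* (x << n) >> 16, written without shifting negative values. */
static int16_t shl_then_hi16(int32_t x, int n)
{
    int64_t v = (int64_t)x * ((int64_t)1 << n);
    return (int16_t)(v >= 0 ? v >> 16 : -((-v + 65535) >> 16));
}

/* Fills rav1[0..8] from aav1[0..8] and returns normrav1. */
int compute_rav1(const int16_t aav1[9], int16_t rav1[9])
{
    int32_t L_work[9];
    int i, k, normrav1;

    for (i = 0; i <= 8; i++) {
        L_work[i] = 0;
        for (k = 0; k <= 8 - i; k++)
            L_work[i] = L_add(L_work[i], L_mult(aav1[k], aav1[k + i]));
    }
    normrav1 = (L_work[0] == 0) ? 0 : norm(L_work[0]);
    for (i = 0; i <= 8; i++)
        rav1[i] = shl_then_hi16(L_work[i], normrav1);
    return normrav1;
}
```

For aav1 = {1024, -512, 0, ...}, that is coefficients 1.0 and -0.5, this gives normrav1 = 9, rav1[0] = 20480 and rav1[1] = -8192.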
3.4 Spectral comparison
This clause computes the variable stat needed for the threshold adaptation. It uses the array L_av0[0..8]
computed in clause 3.2 and the array rav1[0..8] computed in subclause 3.3.3.
Re-normalize L_av0[0..8]:
IF ( L_av0[0] == 0 ) THEN
| FOR i = 0 to 8:
| sav0[i] = 4095;
| NEXT i:
ELSE
| shift = norm( L_av0[0] );
|= FOR i = 0 to 8:
|= sav0[i] = ( L_av0[i] << shift-3 ) >> 16;
|= NEXT i:
Compute partial sum of dm:
L_sump = 0;
|= FOR i = 1 to 8:
|= L_sump = L_add( L_sump, L_mult( rav1[i], sav0[i] ) );
|= NEXT i:
Compute the division of the partial sum by sav0[0]:
IF ( L_sump < 0 ) THEN L_temp = L_sub( 0, L_sump );
ELSE L_temp = L_sump;
IF ( L_temp == 0 ) THEN
| L_dm = 0;
| shift = 0;
ELSE
| sav0[0] = sav0[0] << 3;
| shift = norm( L_temp );
| temp = ( L_temp << shift ) >> 16;
| IF ( sav0[0] >= temp ) THEN
| | divshift = 0;
| | temp = div( temp, sav0[0] );
| ELSE
| | divshift = 1;
| | temp = sub( temp, sav0[0] );
| | temp = div( temp, sav0[0] );
|
| IF( divshift == 1 ) THEN L_dm = 32768;
| ELSE L_dm = 0;
|
| L_dm = L_add( L_dm, temp) << 1;
| IF( L_sump < 0 ) THEN L_dm = L_sub( 0, L_dm);
Re-normalization and final computation of L_dm:
L_dm = ( L_dm << 14 );
L_dm = L_dm >> shift;
L_dm = L_add( L_dm, ( rav1[0] << 11 ) );
L_dm = L_dm >> normrav1;
Compute the difference and save L_dm:
L_temp = L_sub( L_dm, L_lastdm );
L_lastdm = L_dm;
IF ( L_temp < 0 ) THEN L_temp = L_sub( 0, L_temp );
L_temp = L_sub( L_temp, 3277 );
Evaluation of the stat flag:
IF ( L_temp < 0 ) THEN stat = 1;
ELSE stat = 0;
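Tracing the scale factors through the pseudocode above suggests that L_dm holds, with 16 fractional bits, the quantity rav1[0] + 2 x (sum over i = 1..8 of rav1[i] x av0[i]) / av0[0], and that the constant 3277 corresponds to 3277/65536, about 0.05. The following floating-point sketch is written under that reading. It is not bit-exact, it does not model the branch taken when L_av0[0] is zero, and the names spectral_dm and stat_flag are assumptions introduced here.

```c
#include <assert.h>

/* Floating-point reading of the distortion measure (NOT bit-exact).
 * rav1[0..8]: autocorrelated predictor values, scaled so that a
 *             coefficient of 1.0 contributes 1.0 to rav1[0].
 * av0[0..8] : averaged autocorrelation values (role of L_av0[0..8]). */
double spectral_dm(const double rav1[9], const double av0[9])
{
    double sum = 0.0;
    int i;

    assert(av0[0] > 0.0);   /* zero-energy branch is not modelled */
    for (i = 1; i <= 8; i++)
        sum += rav1[i] * av0[i];
    return rav1[0] + 2.0 * sum / av0[0];
}

/* Returns stat: 1 when dm changed by less than 3277/65536 since the
 * previous call, else 0. Updates *lastdm in both cases. */
int stat_flag(double dm, double *lastdm)
{
    double diff = dm - *lastdm;

    if (diff < 0.0) diff = -diff;
    *lastdm = dm;
    return diff < 3277.0 / 65536.0;
}
```

The previous value is stored before the comparison result is used, as in the pseudocode, so a frame that sets stat to 0 still updates the reference for the next frame.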
3.5 Periodicity detection
This clause just sets the ptch flag needed for the threshold adaptation.
temp = add( oldlagcount, veryoldlagcount );
IF ( temp >= 4 ) THEN ptch = 1;
ELSE ptch = 0;
3.6 Threshold adaptation
This clause uses the variables e_pvad, m_pvad, e_acf0 and m_acf0 computed in clause 3.1. It also uses
the flags stat (see clause 3.4) and ptch (see clause 3.5). It follows the flowchart shown in figure 2.2.
Some constants, represented in a floating point format, are needed, and a symbolic name (in capital
letters) is used for the exponent and the mantissa of each; table 3-2 lists all these constants with
their symbolic names and their numerical values.
Table 3-2: List of constants

Constant   Exponent         Mantissa
pth        E_PTH = 19       M_PTH = 18 750
margin     E_MARGIN = 27    M_MARGIN = 19 531
plev       E_PLEV = 20      M_PLEV = 25 000
NOTE: Floating point representation of constants used in clause 3.6:
pth = 2^(E_PTH) x (M_PTH/32768).
margin = 2^(E_MARGIN) x (M_MARGIN/32768).
plev = 2^(E_PLEV) x (M_PLEV/32768).
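As a worked check of this representation, the short C sketch below evaluates 2^e x (m/32768) for the three constants of table 3-2. The function name pf_value is an assumption introduced for this example.

```c
#include <assert.h>

/* Value of a constant stored as exponent e and mantissa m:
 * 2^e * (m / 32768). */
double pf_value(int e, int m)
{
    assert(e >= 0 && e < 63);
    return (double)((unsigned long long)1 << e) * (double)m / 32768.0;
}
```

This gives pth = 300 000 and plev = 800 000 exactly, and margin = 79 998 976, which is the closest value to 80 000 000 that the 15-bit mantissa allows at that exponent.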
Test if acf0 < pth; if yes set thvad to plev:
comp = 0;
IF ( e_acf0 < E_PTH ) THEN comp = 1;
IF ( e_acf0 == E_PTH ) THEN IF ( m_acf0 < M_PTH ) THEN comp =1;
IF ( comp == 1 ) THEN
| e_thvad = E_PLEV;
| m_thvad = M_PLEV;
| EXIT; /continue with clause 3.7/
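The two-step comparison above (exponents first, mantissas only when the exponents are equal) is used again later in this clause. It can be sketched as a single C helper. The name pf_less is an assumption, and the sketch is only valid if both mantissas are normalized, so that a larger exponent always means a larger value.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 when (e_a, m_a) is strictly less than (e_b, m_b), else 0.
 * Assumes normalized, non-negative mantissas. */
int pf_less(int16_t e_a, int16_t m_a, int16_t e_b, int16_t m_b)
{
    assert(m_a >= 0 && m_b >= 0);
    if (e_a < e_b) return 1;
    if (e_a == e_b && m_a < m_b) return 1;
    return 0;
}
```

With this helper, the test "acf0 < pth" reads pf_less( e_acf0, m_acf0, 19, 18750 ).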
Test if an adaptation is needed:
comp = 0;
IF ( ptch == 1 ) THEN comp = 1;
IF ( stat == 0 ) THEN comp = 1;
IF ( tone == 1 ) THEN comp = 1;
IF ( comp == 1 ) THEN
| adaptcount = 0;
| EXIT; /continue with clause 3.7/
Incrementation of adaptcount:
adaptcount = add( adaptcount, 1 );
IF ( adaptcount <= 8 ) THEN EXIT; /continue with clause 3.7/
Computation of thvad-(thvad/dec):
m_thvad = sub( m_thvad, (m_thvad >> 5 ) );
IF ( m_thvad < 16384) THEN
| m_thvad = m_thvad << 1;
| e_thvad = sub( e_thvad, 1 );
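The shift by 5 in the step above corresponds to dec = 32: the mantissa is reduced by 1/32 of its value and is then re-normalized if it has dropped below 16384. A minimal C sketch follows. The name thvad_decay is an assumption, and plain integer arithmetic replaces the saturating sub() operator, which is sufficient for non-negative mantissas.

```c
#include <assert.h>
#include <stdint.h>

/* thvad = thvad - thvad/32 on an (exponent, mantissa) pair. */
void thvad_decay(int16_t *e_thvad, int16_t *m_thvad)
{
    assert(*m_thvad >= 0);
    *m_thvad = (int16_t)(*m_thvad - (*m_thvad >> 5));
    if (*m_thvad < 16384) {
        /* Re-normalize: double the mantissa, decrement the exponent. */
        *m_thvad = (int16_t)(*m_thvad << 1);
        *e_thvad = (int16_t)(*e_thvad - 1);
    }
}
```

Starting from e_thvad = 20 and m_thvad = 16500, one call gives m_thvad = 31970 and e_thvad = 19.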
Computation of pvad*fac:
L_temp = L_add( m_pvad, m_pvad );
L_temp = L_add( L_temp, m_pvad );
L_temp = L_temp >> 1;
e_temp = add( e_pvad, 1 );
IF ( L_temp > 32767 ) THEN
| L_temp = L_temp >> 1;
| e_temp = add( e_temp, 1 );
m_temp = L_temp;
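Read literally, the step above adds m_pvad three times, which suggests fac = 3; halving the sum while incrementing the exponent keeps the value unchanged and brings the mantissa back into 16-bit range. The C sketch below follows that reading. The name pvad_times_fac is an assumption introduced here.

```c
#include <assert.h>
#include <stdint.h>

/* (e_temp, m_temp) = 3 * (e_pvad, m_pvad), with the mantissa kept
 * at or below 32767. */
void pvad_times_fac(int16_t e_pvad, int16_t m_pvad,
                    int16_t *e_temp, int16_t *m_temp)
{
    int32_t L_temp;

    assert(m_pvad >= 0);
    L_temp = 3 * (int32_t)m_pvad;     /* m_pvad + m_pvad + m_pvad */
    L_temp >>= 1;
    *e_temp = (int16_t)(e_pvad + 1);
    if (L_temp > 32767) {             /* still too large: halve again */
        L_temp >>= 1;
        *e_temp = (int16_t)(*e_temp + 1);
    }
    *m_temp = (int16_t)L_temp;
}
```

For e_pvad = 10 and m_pvad = 16384 the result is m_temp = 24576 with e_temp = 11; for m_pvad = 32767 a second halving is needed, giving m_temp = 24575 with e_temp = 12.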
Test if thvad < pvad*fac:
comp = 0;
IF ( e_thvad < e_temp) THEN comp = 1;
IF (e_thvad == e_temp) THEN IF (m_thvad < m_temp) THEN comp =1;
Computation of minimum (thvad+(thvad/inc), pvad*fac) if comp = 1:
IF ( comp == 1 ) THEN
| Compute thvad +(thvad/inc).
| L_temp = L_add( m_thvad, (
...







