European digital cellular telecommunications system; Half rate speech; Part 2: Half rate speech transcoding (GSM 06.20)

DE/SMG-020620


General Information

Status: Published
Publication Date: 18-Dec-1995
Current Stage: 12 - Completion
Due Date: 01-Dec-1995
Completion Date: 19-Dec-1995
Standard: ETS 300 581-2 E1:2003
Language: English
Pages: 49



SLOVENIAN STANDARD
01 December 2003
European digital cellular telecommunications system; Half rate speech; Part 2: Half rate
speech transcoding (GSM 06.20)
This Slovenian standard is identical to: ETS 300 581-2 Edition 1
ICS:
33.070.50 Global System for Mobile Communication (GSM)
2003-01. Slovenian Institute for Standardization. Reproduction of this standard, in whole or in part, is not permitted.

EUROPEAN TELECOMMUNICATION STANDARD
ETS 300 581-2
November 1995
Source: ETSI TC-SMG
Reference: DE/SMG-020620
ICS: 33.060.50
Key words: European digital cellular telecommunications system, Global System for Mobile communications (GSM), CODEC, GSM, speech
European digital cellular telecommunications system;
Half rate speech
Part 2: Half rate speech transcoding
(GSM 06.20)
ETSI
European Telecommunications Standards Institute
ETSI Secretariat
Postal address: F-06921 Sophia Antipolis CEDEX - FRANCE
Office address: 650 Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
X.400: c=fr, a=atlas, p=etsi, s=secretariat - Internet: secretariat@etsi.fr
Tel.: +33 92 94 42 00 - Fax: +33 93 65 47 16
Copyright Notification: No part may be reproduced except as authorized by written permission. The copyright and the
foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1995. All rights reserved.
New presentation - see History box

ETS 300 581-2: November 1995 (GSM 06.20 version 4.2.1)
Whilst every care has been taken in the preparation and publication of this document, errors in content,
typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to
"ETSI Editing and Committee Support Dept." at the address shown on the title page.

Contents
Foreword
1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
3.1 Definitions
3.2 Symbols
3.3 Abbreviations
4 Functional description of the GSM half rate speech codec
4.1 GSM half rate speech encoder
4.1.1 High-pass filter
4.1.2 Segmentation
4.1.3 Fixed Point Lattice Technique (FLAT)
4.1.4 Spectral quantization
4.1.4.1 Autocorrelation Fixed Point Lattice Technique (AFLAT)
4.1.5 Frame energy calculation and quantization
4.1.6 Soft interpolation of the spectral parameters
4.1.7 Spectral noise weighting filter coefficients
4.1.8 Long Term Predictor lag determination
4.1.8.1 Open loop long term search initialization
4.1.8.2 Open loop lag search
4.1.8.3 Frame lag trajectory search (Mode ≠ 0)
4.1.8.4 Voicing mode selection
4.1.8.5 Closed loop lag search
4.1.9 Harmonic noise weighting
4.1.10 Code search algorithm
4.1.10.1 Decorrelation of filtered basis vectors
4.1.10.2 Fast search technique
4.1.11 Multimode gain vector quantization
4.1.11.1 Coding GS and P0
4.2 GSM half rate speech decoder
4.2.1 Excitation generation
4.2.2 Adaptive pitch prefilter
4.2.3 Synthesis Filter
4.2.4 Adaptive spectral postfilter
4.2.5 Updating decoder states
5 Homing sequences
5.1 Functional description
5.2 Definitions
5.3 Encoder homing
5.4 Decoder homing
5.5 Encoder home state
5.6 Decoder home state
Annex A (normative): Codec parameter description
A.1 Codec parameter description
A.1.1 MODE
A.1.2 R0
A.1.3 LPC1 - LPC3
A.1.4 LAG_1 - LAG_4
A.1.5 CODEx_1 - CODEx_4
A.1.6 GSP0_1 - GSP0_4
A.2 Basic coder parameters
Annex B (normative): Order of occurrence of the codec parameters over Abis
Annex C (informative): Bibliography
History

Foreword
This European Telecommunication Standard (ETS) has been produced by the Special Mobile Group
(SMG) Technical Committee of the European Telecommunications Standards Institute (ETSI).
This ETS specifies the half rate speech traffic channels for the European digital cellular
telecommunications system. This ETS corresponds to GSM technical specification, GSM 06.20, version
4.2.1 and is part 2 of a multi-part ETS covering the half rate speech traffic channels as described below:
GSM 06.02 ETS 300 581-1: "European digital cellular telecommunications system; Half rate
speech Part 1: Half rate speech processing functions".
GSM 06.20 ETS 300 581-2: "European digital cellular telecommunications system;
Half rate speech Part 2: Half rate speech transcoding".
GSM 06.21 ETS 300 581-3: "European digital cellular telecommunications system; Half rate
speech Part 3: Substitution and muting of lost frames for half rate speech traffic
channels".
GSM 06.22 ETS 300 581-4: "European digital cellular telecommunications system; Half rate
speech Part 4: Comfort noise aspects for half rate speech traffic channels".
GSM 06.41 ETS 300 581-5: "European digital cellular telecommunications system; Half rate
speech Part 5: Discontinuous Transmission (DTX) for half rate speech traffic
channels".
GSM 06.42 ETS 300 581-6: "European digital cellular telecommunications system; Half rate
speech Part 6: Voice Activity Detection (VAD) for half rate speech traffic
channels".
GSM 06.06 ETS 300 581-7: "European digital cellular telecommunications system; Half rate
speech Part 7: ANSI-C code for the GSM half rate speech codec".
GSM 06.07 ETS 300 581-8: "European digital cellular telecommunications system; Half rate
speech Part 8: Test vectors for the GSM half rate speech codec".
NOTE: TC-SMG has produced documents which give the technical specifications for the
implementation of the European digital cellular telecommunications system.
Historically, these documents have been identified as GSM Technical Specifications
(GSM-TS). These TSs may have subsequently become Interim European
Telecommunication Standards (I-ETSs), (Phase 1), or European Telecommunication
Standards (ETSs), (Phase 2), whilst others may become ETSI Technical Reports
(ETRs).
Transposition dates
Date of adoption of this ETS: 27 October 1995
Date of latest announcement of this ETS (doa): 31 March 1996
Date of latest publication of new National Standard
or endorsement of this ETS (dop/e): 30 September 1996
Date of withdrawal of any conflicting National Standard (dow): 30 September 1996

1 Scope
This European Telecommunications Standard (ETS) specifies the speech codec to be used for the GSM
half rate channel. It also specifies the test methods to be used to verify that the codec implementation
complies with this ETS.
The requirements are mandatory for the codec to be used either in GSM Mobile Stations (MSs) or Base
Station Systems (BSSs) that utilise the half rate GSM speech traffic channel.
2 Normative references
This ETS incorporates, by dated and undated reference, provisions from other publications. These
normative references are cited at the appropriate places in the text and the publications are listed
hereafter. For dated references, subsequent amendments to or revisions of any of these publications
apply to this ETS only when incorporated in it by amendment or revision. For undated references, the
latest edition of the publication referred to applies.
[1] GSM 06.02 (ETS 300 581-1): "European digital cellular telecommunications
system; Half rate speech Part 1: Half rate speech processing functions".
[2] GSM 06.06 (ETS 300 581-7): "European digital cellular telecommunications
system; Half rate speech Part 7: ANSI-C code for the GSM half rate speech
codec".
[3] GSM 06.07 (ETS 300 581-8): "European digital cellular telecommunications
system; Half rate speech Part 8: Test vectors for the GSM half rate speech
codec".
3 Definitions, symbols and abbreviations
3.1 Definitions
For the purpose of this ETS, the following definitions apply.
adaptive codebook: The adaptive codebook is derived from the long term filter state. The lag value can
be viewed as an index into the adaptive codebook.
adaptive pitch prefilter: In the GSM half rate speech decoder, this filter is applied to the excitation signal
to enhance the periodicity of the reconstructed speech. Note that this is done prior to the application of the
short term filter.
adaptive spectral postfilter: In the GSM half rate speech decoder, this filter is applied to the output of
the short term filter to enhance the perceptual quality of the reconstructed speech.
allowable lags: The set of lag values which may be coded by the GSM half rate speech encoder and
transmitted to the GSM half rate speech decoder. This set contains both integer and fractional values. See
table 3.
analysis window: For each frame, the short term filter coefficients are computed using the high pass
filtered speech samples within the analysis window. The analysis window is 170 samples in length, and is
centered about the last 100 samples in the frame.
basis vectors: A set of M, M1, or M2 vectors of length Ns used to generate the VSELP codebook
vectors. These vectors are not necessarily orthogonal.
closed loop lag search: A process of determining the near optimal lag value from the weighted input
speech and the long term filter state.
closed loop lag trajectory: For a given frame, the sequence of near optimal lag values whose elements
correspond to each of the four subframes as determined by the closed loop lag search.

codebook: A set of vectors used in a vector quantizer.
codeword (or code): An M, M1, or M2 bit symbol indicating the vector to be selected from a VSELP
codebook.
Delta (LAG) code: A four bit code indicating the change in lag value for a subframe relative to the
previous subframe's coded lag. For frames in which the long term predictor is enabled (MODE 1, 2, or 3),
the lag for subframe 1 is independently coded using eight bits, and delta codes are used for subframes 2,
3, and 4.
direct form coefficients: One of the formats for storing the short term filter parameters. All filters which
are used to modify speech samples use direct form coefficients.
fractional lags: A set of lag values having sub-sample resolution. Note that not every fractional lag value
considered in the GSM half rate speech encoder is an allowable lag value.
frame: A time interval equal to 20 ms, or 160 samples at an 8 kHz sampling rate.
harmonic noise weighting filter: This filter exploits the noise masking properties of the spectral peaks
which occur at harmonics of the pitch frequency by weighting the residual error less in regions near the
pitch harmonics and more in regions away from them. Note that this filter is only used when the long term
filter is enabled (MODE = 1, 2 or 3).
high pass filter: This filter is used to de-emphasize the low frequency components of the input speech
signal.
integer lags: A set of lag values having whole sample resolution.
interpolating filter: An FIR filter used to estimate sub-sample resolution samples, given an input sampled
with integer sample resolution.
lag: The long term filter delay. This is typically the pitch period, or a multiple or sub-multiple of it.
long term filter: This filter is used to generate the periodic component in the excitation for the current
subframe. This filter is only enabled for MODE = 1, 2 or 3.
LPC coefficients: Linear Predictive Coding (LPC) coefficients is a generic term for the short term filter
coefficients.
open loop lag search: A process of estimating the near optimal lag directly from the weighted speech
input. This is done to narrow the range of lag values over which the closed loop lag search shall be
performed.
open loop lag trajectory: For a given frame, the sequence of near optimal lag values whose elements
correspond to the four subframes as determined by the open loop lag search.
reflection coefficients: An alternative representation of the information contained in the short term filter
parameters.
residual: The output signal resulting from an inverse filtering operation.
short term filter: This filter introduces, into the excitation signal, short term correlation which models the
impulse response of the vocal tract.
soft interpolation: A process wherein a decision is made for each frame to use either interpolated or
uninterpolated short term filter parameters for the four subframes in that frame.
soft interpolation bit: A one bit code indicating whether or not interpolation of the short term parameters
is to be used in the current frame.

spectral noise weighting filter: This filter exploits the noise masking properties of the formants (vocal
tract resonances) by weighting the residual error less in regions near the formant frequencies and more in
regions away from them.
subframe: A time interval equal to 5 ms, or 40 samples at an 8 kHz sampling rate.
vector quantization: A method of grouping several parameters into a vector and quantizing them
simultaneously.
GSP0 vector quantizer: The vector quantization process, together with its intermediate parameters (GS
and P0), used to code the excitation gains β and γ.
VSELP codebook: Vector-Sum Excited Linear Predictive (VSELP) codebook, used in the GSM half rate
speech coder, wherein each codebook vector is constructed as a linear combination of the fixed basis
vectors.
zero input response: The output of a filter due to all past inputs, i.e. due to the present state of the filter,
given that an input of zeros is applied.
zero state response: The output of a filter due to the present input, given that no past inputs have been
applied, i.e. given that the state information in the filter is all zeros.
3.2 Symbols
For the purpose of this ETS, the following symbols apply.
A(z) Short term spectral filter.
$\alpha_i$ The LPC coefficients.
$b_L(n)$ The output of the long term filter state (adaptive codebook) for lag L.
β The long term filter coefficient.
C(z) Second weighting filter.
e(n) Weighted error signal.
$f_j(i)$ The coefficients of the j-th phase of the 10th order interpolating filter used to evaluate candidate fractional lag values; i ranges from 0 to $P_f - 1$.
$g_j(i)$ The coefficients of the j-th phase of the 6th order interpolating filter used to interpolate C's and G's as well as fractional lags in the harmonic noise weighting; i ranges from 0 to $P_g - 1$.
γ The gain applied to the vector(s) selected from the VSELP codebook(s).
H A M2 bit code indicating the vector to be selected from the second VSELP codebook (when operating in mode 0).
I A M or M1 bit code indicating the vector to be selected from one of the two first VSELP codebooks.
L The long term filter lag value.
$L_{max}$ 142 (samples), the maximum possible value for the long term filter lag.
$L_{min}$ 21 (samples), the minimum possible value for the long term filter lag.
M 9, the number of basis vectors, and the number of bits in a codeword, for the VSELP codebook used in modes 1, 2, and 3.
M1 7, the number of basis vectors, and the number of bits in a codeword, for the first VSELP codebook used in mode 0.
M2 7, the number of basis vectors, and the number of bits in a codeword, for the second VSELP codebook used in mode 0.
MODE A two bit code indicating the mode for the current frame. See annex A.
$N_A$ 170, the length of the analysis window. This is the number of high pass filtered speech samples used to compute the short term filter parameters for each frame.
$N_F$ 160, the number of samples per frame (at a sampling rate of 8 kHz).
$N_p$ 10, the short term filter order.
$N_s$ 40, the number of samples per subframe (at a sampling rate of 8 kHz).
P1 6, the number of bits in the prequantizer for the r1 - r3 vector quantizer.
P2 5, the number of bits in the prequantizer for the r4 - r6 vector quantizer.
P3 4, the number of bits in the prequantizer for the r7 - r10 vector quantizer.

$P_f$ The order of one phase of an interpolating filter used to evaluate candidate fractional lag values. $P_f$ equals 10 for j ≠ 0 and 1 for j = 0.
$P_g$ The order of one phase of an interpolating filter, $g_j(n)$, used to interpolate C's and G's as well as fractional lags in the harmonic noise weighting. $P_g$ equals 6.
pitch The time duration between the glottal pulses which result when the vocal cords vibrate during speech production.
Q1 11, the number of bits in the r1 - r3 reflection coefficient vector quantizer.
Q2 9, the number of bits in the r4 - r6 reflection coefficient vector quantizer.
Q3 8, the number of bits in the r7 - r10 reflection coefficient vector quantizer.
R0 A five bit code used to indicate the energy level in the current frame.
r(n) The long term filter state (the history of the excitation signal); n < 0.
$r_L(n)$ The long term filter state with the adaptive codebook output for lag L appended.
s'(n) Synthesised speech.
W(z) Spectral weighting filter.
$\lambda_{hnw}$ The harmonic noise weighting filter coefficient.
ξ The adaptive pitch prefilter coefficient.
$\lceil x \rceil$ Ceiling function: the largest integer y where y < x + 1,0.
$\lfloor x \rfloor$ Floor function: the largest integer y where y ≤ x.
$\sum_{i=j}^{K} x(i)$ Summation: x(j) + x(j+1) + ... + x(K).
$\prod_{i=j}^{K} x(i)$ Product: x(j) · x(j+1) · ... · x(K).
max(x,y) Find the larger of two numbers x and y.
min(x,y) Find the smaller of two numbers x and y.
round(x) Round the non-integer x to the closest integer y: $y = \lfloor x + 0{,}5 \rfloor$.
3.3 Abbreviations
For the purposes of this ETS the following abbreviations apply.
AFLAT Autocorrelation Fixed point LAttice Technique
CELP Code Excited Linear Prediction
FLAT Fixed Point Lattice Technique
LTP Long Term Predictor
SST Spectral Smoothing Technique
VSELP Vector-Sum Excited Linear Prediction

4 Functional description of the GSM half rate speech codec
The GSM half rate codec uses the VSELP (Vector-Sum Excited Linear Prediction) algorithm. The VSELP
algorithm is an analysis-by-synthesis coding technique and belongs to the class of speech coding
algorithms known as CELP (Code Excited Linear Prediction).
The GSM half rate encoder processes speech one 20 ms frame at a time. For each frame of the sampled
speech waveform, the encoder derives 18 parameters that describe it, based on the current waveform and
its past history. The extracted parameters are grouped into the following three general classes:
- energy parameters (R0 and GSP0);
- spectral parameters (LPC and INT_LPC);
- excitation parameters (LAG and CODE).
These parameters are quantised into 112 bits for transmission as described in annex A and their order of
occurrence over Abis is given in annex B.
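As a quick check of the bit budget, 112 bits per 20 ms frame correspond to a gross speech coding rate of:

$$\frac{112\ \text{bits}}{20\ \text{ms}} = 5600\ \text{bit/s} = 5{,}6\ \text{kbit/s}$$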
The GSM half rate codec is an analysis-by-synthesis codec; therefore, the speech decoder is primarily a
subset of the speech encoder. The quantised parameters are decoded and a synthetic excitation is
generated using the energy and excitation parameters. The synthetic excitation is then filtered to impose
the spectral information, which produces the synthesised speech. See figure 1.
Figure 1: Block diagram of the GSM half rate speech codec
The ANSI-C code that describes the GSM half rate speech codec is given in
GSM 06.06 (ETS 300 581-7) [2] and the test sequences in GSM 06.07 (ETS 300 581-8) [3]. See clause 5
for the codec homing test vectors.
4.1 GSM half rate speech encoder
The GSM half rate speech encoder uses an analysis by synthesis approach to determine the code to use
to represent the excitation for each subframe. The codebook search procedure consists of trying each
codevector as a possible excitation for the Code Excited Linear Predictive (CELP) synthesizer. The
synthesized speech s'(n) is compared against the input speech and a difference signal is generated. This
difference signal is then filtered by a spectral weighting filter, W(z), (and possibly a second weighting filter,
C(z)) to generate a weighted error signal, e(n). The power in e(n) is computed. The codevector which
generates the minimum weighted error power is chosen as the codevector for that subframe. The spectral
weighting filter serves to weight the error spectrum based on perceptual considerations. This weighting
filter is a function of the speech spectrum and can be expressed in terms of the α parameters of the short
term (spectral) filter.
$$W(z) = \frac{1 - \sum_{i=1}^{N_p} \alpha_i z^{-i}}{1 - \sum_{i=1}^{N_p} \tilde{\alpha}_i z^{-i}} \tag{1}$$
The computation of the $\tilde{\alpha}_i$ coefficients is described in subclause 4.1.7.
The second weighting filter C(z), if used, is a harmonic weighting filter and is used to control the amount of
error in the harmonics of the speech signal. If the weighting filter(s) are moved to both input paths to the
subtracter, an equivalent configuration is obtained as shown in figure 2.
Figure 2: Block diagram of the GSM half rate speech encoder (MODE = 1,2 and 3)
Here H(z) is the combination of A(z), the short term (spectral) filter, and W(z), the spectral weighting filter.
These filters are combined since the denominator of A(z) is cancelled by the numerator of W(z).
$$H(z) = \frac{1}{1 - \sum_{i=1}^{N_p} \tilde{\alpha}_i z^{-i}} \tag{2}$$
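Because the combined filter of equation (2) is all-pole, its impulse response follows the recursion $h(n) = \delta(n) + \sum_{i=1}^{N_p} \tilde{\alpha}_i h(n-i)$. The Python sketch below computes it in floating point and truncates it to one subframe ($N_s$ = 40 samples). The function name and argument layout are illustrative assumptions, and the arithmetic is not the bit-exact fixed-point code of GSM 06.06 [2].

```python
def impulse_response_h(alpha_tilde, length=40):
    """Truncated impulse response of H(z) = 1 / (1 - sum_i alpha~_i z^-i), eq. (2).

    alpha_tilde[0] holds alpha~_1, alpha_tilde[1] holds alpha~_2, and so on.
    """
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # unit impulse delta(n)
        for i, a in enumerate(alpha_tilde, start=1):
            if n - i >= 0:
                acc += a * h[n - i]  # all-pole feedback term alpha~_i * h(n - i)
        h.append(acc)
    return h
```

For a single coefficient $\tilde{\alpha}_1$ = 0,5, the first four samples are 1, 0,5, 0,25 and 0,125.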
There are two approaches that can be used for calculating the gain, γ. The gain can be determined prior to
codebook search based on residual energy. This gain would then be fixed for the codebook search.
Another approach is to optimize the gain for each codevector during the codebook search. The
codevector which yields the minimum weighted error would be chosen and its corresponding optimal gain
would be used for γ. The latter approach generally yields better results since the gain is optimized for each
codevector. This approach also implies that the gain term needs to be updated at the subframe rate. The
optimal code and gain for this technique can be computed as follows:
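The following Python sketch illustrates the second approach in generic least-squares form. Each codevector is passed through H(z), the gain that minimizes the weighted error power is computed for that codevector, and the codevector with the lowest resulting error is kept. The target vector `p`, the truncated impulse response `h` and the function name are assumptions of this sketch. It is neither the fast VSELP search of subclause 4.1.10 nor the exact procedure of this ETS.

```python
def search_codebook(p, codevectors, h):
    """Return (index, gain, error) of the codevector c that minimizes
    ||p - gain * (h * c)||^2, with the gain optimized for each codevector.
    h and every codevector must be at least len(p) samples long."""
    pp = sum(v * v for v in p)
    best = None
    for idx, c in enumerate(codevectors):
        # zero-state response of H(z) to c: convolution with h, truncated to len(p)
        q = [sum(h[m] * c[n - m] for m in range(n + 1)) for n in range(len(p))]
        qq = sum(v * v for v in q)
        if qq == 0.0:
            continue  # an all-zero filtered vector cannot reduce the error
        pq = sum(a * b for a, b in zip(p, q))
        gain = pq / qq             # least-squares optimal gain for this codevector
        error = pp - pq * pq / qq  # weighted error power obtained with that gain
        if best is None or error < best[2]:
            best = (idx, gain, error)
    return best
```

With `h = [1, 0, 0, 0]`, codevectors `[1, 0, 0, 0]`, `[0, 1, 0, 0]` and `[1, 1, 0, 0]`, and target `[2, 2, 0, 0]`, the sketch selects index 2 with gain 2 and zero error.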
The input speech is first filtered by a high pass filter as described in subclause 4.1.1. The short term filter
parameters are computed from the filtered input speech once per frame. A fast fixed point covariance
lattice technique is used. Subclauses 4.1.3 and 4.1.4 describe in detail how the short term parameters
are determined and quantized. An overall frame energy is also computed and coded once per frame.
Once per frame, one of the four voicing modes is selected. If MODE≠0, the long term predictor is used
and the long term predictor lag, L, is updated at the subframe rate. L and a VSELP codeword are selected
sequentially. Each is chosen to minimize the weighted mean square error. The long-term filter coefficient,
β, and the codebook gain, γ, are optimized jointly. Subclause 4.1.8 describes the technique for selecting
from among the voicing modes and, if one of the voiced modes is chosen, determining the long-term filter lag.
Subclause 4.1.10 describes an efficient technique for jointly optimizing β, γ and the codeword selection.
Subclause 4.1.10 also includes the description of the fast VSELP codebook search technique. The β and γ
parameters are transformed to equivalent parameters using the frame energy term, and are vector
quantized every subframe. The coding of the frame energy and the β and γ parameters is described in
subclause 4.1.11.
4.1.1 High-pass filter
The 13 bit linear Pulse Code Modulated (PCM) input speech, x(n), is filtered by a fourth order pole-zero
high pass filter. This filter suppresses the frequency components of the input speech which are below 120
Hz. The filter is implemented as a cascade of two second-order Infinite Impulse Response (IIR) filters.
Incorporated into the filter coefficients is a gain of 0,5. The difference equation for the first filter is:
$$\tilde{y}(n) = \sum_{i=0}^{2} b_{1,i}\, x(n-i) + \sum_{j=1}^{2} a_{1,j}\, \tilde{y}(n-j) \tag{3}$$
where:
$b_{1,0} = 0{,}335052$
$b_{1,1} = -0{,}669983$, $a_{1,1} = 0{,}926117$
$b_{1,2} = 0{,}335052$, $a_{1,2} = -0{,}429413$
The difference equation for the second filter is:
$$y(n) = \sum_{i=0}^{2} b_{2,i}\, \tilde{y}(n-i) + \sum_{j=1}^{2} a_{2,j}\, y(n-j) \tag{4}$$
where:
$b_{2,0} = 0{,}335052$
$b_{2,1} = -0{,}669434$, $a_{2,1} = 0{,}965332$
$b_{2,2} = 0{,}335052$, $a_{2,2} = -0{,}469513$
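A floating-point Python sketch of the two cascaded sections of equations (3) and (4) follows, using the coefficients listed above. The zero initial filter state and the function names are assumptions of this sketch, and it is not bit-exact with the fixed-point reference code in GSM 06.06 [2].

```python
# Coefficients of the two second-order sections, equations (3) and (4).
B1 = (0.335052, -0.669983, 0.335052)  # b_{1,0}, b_{1,1}, b_{1,2}
A1 = (0.926117, -0.429413)            # a_{1,1}, a_{1,2}
B2 = (0.335052, -0.669434, 0.335052)  # b_{2,0}, b_{2,1}, b_{2,2}
A2 = (0.965332, -0.469513)            # a_{2,1}, a_{2,2}


def second_order_section(x, b, a):
    """y(n) = sum_i b_i x(n-i) + sum_j a_j y(n-j), starting from a zero state."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 + a[0] * y1 + a[1] * y2
        x2, x1 = x1, xn  # shift the input delay line
        y2, y1 = y1, yn  # shift the output delay line
        y.append(yn)
    return y


def high_pass(x):
    """Cascade of the two sections: x(n) -> y~(n) -> y(n)."""
    return second_order_section(second_order_section(x, B1, A1), B2, A2)
```

With a constant (DC) input, the output settles close to zero. A tone at a quarter of the sampling rate (2 kHz) still passes with substantial amplitude.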
4.1.2 Segmentation
A sample buffer containing the previous 195 input high pass filtered speech samples, y(n), is shifted so
that the oldest 160 samples are shifted out while the next 160 input samples are shifted in. The oldest 160
samples in the buffer correspond to the next frame of samples to be encoded. The analysis interval
comprises the most recent 170 samples in the buffer. The samples in the buffer are labelled as s(n) where
0≤n≤194 and s(0) is the first (oldest) sample.
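The buffer handling described above can be sketched in Python as follows. The function names are hypothetical. The indices follow the text: s(0) to s(159) form the frame to be encoded, and s(25) to s(194) are the most recent 170 samples used for analysis.

```python
FRAME_LEN = 160   # N_F, samples per frame
BUFFER_LEN = 195  # length of the high-pass filtered sample buffer s(0)..s(194)
WINDOW_LEN = 170  # N_A, analysis window length


def shift_buffer(buffer, new_samples):
    """Shift out the oldest 160 samples and shift in the next 160 samples."""
    if len(buffer) != BUFFER_LEN or len(new_samples) != FRAME_LEN:
        raise ValueError("expected a 195-sample buffer and 160 new samples")
    return buffer[FRAME_LEN:] + list(new_samples)


def frame_to_encode(buffer):
    """The oldest 160 samples, s(0)..s(159)."""
    return buffer[:FRAME_LEN]


def analysis_window(buffer):
    """The most recent 170 samples, s(25)..s(194)."""
    return buffer[BUFFER_LEN - WINDOW_LEN:]
```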
4.1.3 Fixed Point Lattice Technique (FLAT)
Let $r_j$ represent the j-th reflection coefficient. The FLAT algorithm for the determination of the reflection
coefficients is stated as follows:

STEP 1 Compute the covariance (autocorrelation) matrix from the input speech:
$$\phi(i,k) = \sum_{n=N_p}^{N_A} s(n+24-i)\, s(n+24-k), \qquad 0 \le i,k \le N_p \tag{5}$$
STEP 2 The φ(i,k) array is modified by windowing:
$$\phi'(i,k) = \phi(i,k)\, w(|i-k|), \qquad 0 \le i,k \le N_p \tag{6}$$
STEP 3
$$F_0(i,k) = \phi'(i,k), \qquad 0 \le i,k \le N_p - 1 \tag{7}$$
$$B_0(i,k) = \phi'(i+1,k+1), \qquad 0 \le i,k \le N_p - 1 \tag{8}$$
$$C_0(i,k) = \phi'(i,k+1), \qquad 0 \le i,k \le N_p - 1 \tag{9}$$
STEP 4 Set j = 1.
STEP 5 Compute $r_j$:
$$r_j = -2\,\frac{C_{j-1}(0,0) + C_{j-1}(N_p-j,\,N_p-j)}{F_{j-1}(0,0) + B_{j-1}(0,0) + F_{j-1}(N_p-j,\,N_p-j) + B_{j-1}(N_p-j,\,N_p-j)} \tag{10}$$
STEP 6 If $j = N_p$, then done.
STEP 7 Update $F_j(i,k)$, $B_j(i,k)$ and $C_j(i,k)$, $0 \le i,k \le N_p - j - 1$:
$$F_j(i,k) = F_{j-1}(i,k) + r_j\left[C_{j-1}(i,k) + C_{j-1}(k,i)\right] + r_j^2\, B_{j-1}(i,k) \tag{11}$$
$$B_j(i,k) = B_{j-1}(i+1,k+1) + r_j\left[C_{j-1}(i+1,k+1) + C_{j-1}(k+1,i+1)\right] + r_j^2\, F_{j-1}(i+1,k+1) \tag{12}$$
$$C_j(i,k) = C_{j-1}(i,k+1) + r_j\left[B_{j-1}(i,k+1) + F_{j-1}(i,k+1)\right] + r_j^2\, C_{j-1}(k+1,i) \tag{13}$$
STEP 8 j = j + 1
STEP 9 Go to STEP 5.
The windowing coefficients, w(|i-k|), are found in the table 1.
Table 1: Windowing coefficients
w(0) 0,998966 w(5) 0,974915
w(1) 0,996037 w(6) 0,969054
w(2) 0,991663 w(7) 0,963060
w(3) 0,986399 w(8) 0,956796
w(4) 0,980722 w(9) 0,950127
This algorithm can be simplified by noting that the φ', F and B matrices are symmetric, so that only the
upper triangular part of each matrix needs to be computed or updated. Also, step 7 is done so that $F_j(i,k)$,
$B_j(i-1,k-1)$, $C_j(i,k-1)$ and $C_j(k,i-1)$ are updated together, common terms are computed once, and the
recursion is done in place.
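A floating-point Python sketch of STEPs 3 to 9 (equations (7) to (13)) follows. It takes the already windowed $(N_p+1) \times (N_p+1)$ matrix φ' as input, so STEPs 1 and 2 are left out. For clarity it updates full matrices instead of exploiting symmetry or working in place. The function name is an assumption, and the sketch is not bit-exact with GSM 06.06 [2].

```python
def flat_reflection_coefficients(phi_w, n_p=10):
    """Reflection coefficients r_1..r_Np from the windowed covariance matrix
    phi_w, an (n_p + 1) x (n_p + 1) nested list holding phi'(i, k)."""
    # STEP 3: equations (7), (8) and (9)
    F = [[phi_w[i][k] for k in range(n_p)] for i in range(n_p)]
    B = [[phi_w[i + 1][k + 1] for k in range(n_p)] for i in range(n_p)]
    C = [[phi_w[i][k + 1] for k in range(n_p)] for i in range(n_p)]
    r = []
    for j in range(1, n_p + 1):  # STEP 4, STEP 8 and STEP 9
        e = n_p - j
        # STEP 5: equation (10)
        num = C[0][0] + C[e][e]
        den = F[0][0] + B[0][0] + F[e][e] + B[e][e]
        rj = -2.0 * num / den
        r.append(rj)
        if j == n_p:  # STEP 6
            break
        # STEP 7: equations (11), (12) and (13) for 0 <= i, k <= n_p - j - 1
        size, rj2 = n_p - j, rj * rj
        F_new = [[F[i][k] + rj * (C[i][k] + C[k][i]) + rj2 * B[i][k]
                  for k in range(size)] for i in range(size)]
        B_new = [[B[i + 1][k + 1] + rj * (C[i + 1][k + 1] + C[k + 1][i + 1])
                  + rj2 * F[i + 1][k + 1] for k in range(size)] for i in range(size)]
        C_new = [[C[i][k + 1] + rj * (B[i][k + 1] + F[i][k + 1])
                  + rj2 * C[k + 1][i] for k in range(size)] for i in range(size)]
        F, B, C = F_new, B_new, C_new
    return r
```

For a first-order correlation matrix φ'(i,k) = 0,9^|i-k|, the sketch returns $r_1$ = -0,9 and, up to rounding error, zero for the remaining coefficients.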
4.1.4 Spectral quantization
A three segment vector quantizer of the reflection coefficients is employed. A reduced complexity search
technique is used to select the vector of reflection coefficients for each segment. The reflection coefficient
vector quantizer codebooks are stored in compressed form to minimize their memory requirements.
The three segments of the vector quantizer span reflection coefficients r1 - r3, r4 - r6, and r7 - r10
respectively. The bit allocations for the vector quantizer segments are:
Q1 11 bits
Q2 9 bits
Q3 8 bits
A reflection coefficient vector prequantizer is used at each segment. The prequantizer size at each
segment is:
P1 6 bits
P2 5 bits
P3 4 bits
At a given segment, the residual error due to each vector from the prequantizer is computed and stored in
temporary memory. This list is searched to identify the four prequantizer vectors which have the lowest
distortion. The index of each selected prequantizer vector is used to calculate an offset into the vector
quantizer table at which the contiguous subset of quantizer vectors associated with that prequantizer
vector begins. The size of each vector quantizer subset at the k-th segment is given by:
$$S_k = \frac{2^{Q_k}}{2^{P_k}} \tag{14}$$
The four subsets of quantizer vectors, associated with the selected prequantizer vectors, are searched for
the quantizer vector which yields the lowest residual error. Thus at the first segment, 64 prequantizer
vectors and 128 quantizer vectors are evaluated, 32 prequantizer vectors and 64 quantizer vectors are
evaluated at the second segment, and 16 prequantizer vectors and 64 quantizer vectors are evaluated at
the third segment.
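The two-stage search of one segment can be sketched in Python as below. The distortion measures are passed in as hypothetical callables; in the encoder they would be residual errors computed with AFLAT (subclause 4.1.4.1). The sketch also assumes that the subset belonging to prequantizer vector p starts at offset p·$S_k$ in the quantizer table. The text does not state the actual table layout.

```python
def segment_search(prequant_error, quant_error, q_bits, p_bits, n_best=4):
    """Two-stage search of one reflection coefficient VQ segment.

    prequant_error(p): residual error for prequantizer vector p (hypothetical)
    quant_error(v):    residual error for quantizer vector v (hypothetical)
    Returns (best quantizer index, prequantizer vectors evaluated,
    quantizer vectors evaluated).
    """
    subset_size = 2 ** q_bits // 2 ** p_bits  # S_k, equation (14)
    # keep the n_best prequantizer vectors with the lowest distortion
    best_pre = sorted(range(2 ** p_bits), key=prequant_error)[:n_best]
    # each selected prequantizer vector points to a contiguous subset (assumed layout)
    candidates = [p * subset_size + m for p in best_pre for m in range(subset_size)]
    best = min(candidates, key=quant_error)
    return best, 2 ** p_bits, len(candidates)
```

With (Q1, P1) = (11, 6), (Q2, P2) = (9, 5) and (Q3, P3) = (8, 4), the evaluated counts are 64/128, 32/64 and 16/64, as stated above.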
4.1.4.1 Autocorrelation Fixed Point Lattice Technique (AFLAT)
An autocorrelation version of the FLAT algorithm, AFLAT, is used to compute the residual error energy for
a reflection coefficient vector being evaluated. Compute the autocorrelation sequence R(i) from the
optimal reflection coefficients, $r_j$, over the range $0 \le i \le N_p$.
STEP 1 Define the initial conditions for the AFLAT recursion:
$$P_0(i) = R(i), \qquad 0 \le i \le N_p - 1 \tag{15}$$
$$V_0(i) = R(|i+1|), \qquad 1 - N_p \le i \le N_p - 1 \tag{16}$$
STEP 2 Initialize k, the vector quantizer segment index:
$$k = 1 \tag{17}$$
STEP 3 Let $I_l(k)$ be the index of the first lattice stage in the k-th segment, and $I_h(k)$ be the index of the
last lattice stage in the k-th segment.

STEP 4 Initialize j, the index of the lattice stage, to point to the beginning of the k-th segment:
$$j = I_l(k) \tag{18}$$
STEP 5 Set the initial conditions $P_{j-1}$ and $V_{j-1}$ to:
$$P_{j-1}(i) = P_{j-1}(i), \qquad 0 \le i \le I_h(k) - I_l(k) \tag{19}$$
$$V_{j-1}(i) = V_{j-1}(i), \qquad -I_h(k) + I_l(k) \le i \le I_h(k) - I_l(k) \tag{20}$$
STEP 6 Compute the values of the $V_j$ and $P_j$ arrays using:
$$P_j(i) = \left(1 + \hat{r}_j^2\right) P_{j-1}(i) + \hat{r}_j\left[V_{j-1}(i) + V_{j-1}(-i)\right], \qquad 0 \le i \le I_h(k) - j - 1 \tag{21}$$
$$V_j(i) = V_{j-1}(i+1) + \hat{r}_j^2\, V_{j-1}(-i-1) + 2\hat{r}_j\, P_{j-1}(|i+1|), \qquad 1 + j - N_p \le i \le N_p - j - 1 \tag{22}$$
STEP 7 Increment j:
j = j + 1
STEP 8 If $j < I_h(k)$, go to STEP 6.
STEP 9 The residual error out of lattice stage $I_h(k)$, given the reflection coefficient vector $\hat{r}$, is
computed using equation (21):
$$E_r = P_{I_h(k)}(0) \tag{23}$$
STEP 10 Using the AFLAT recursion outlined, the residual error due to each vector from the
prequantizer at the k-th segment is evaluated, the four subsets of quantizer vectors to be
searched are identified, and the residual error due to each quantizer vector from the selected four
subsets is computed. The index of $\tilde{r}$, the quantizer vector which minimizes $E_r$ over all the
quantizer vectors in the four subsets, is encoded with $Q_k$ bits.
STEP 11 If k < 3, then the initial conditions for doing the recursion at segment k+1 need to be computed.
Set j, the lattice stage index, equal to:
$$j = I_l(k) \tag{24}$$
STEP 12 Compute:
$$P_j(i) = \left(1 + \tilde{r}_j^2\right) P_{j-1}(i) + \tilde{r}_j\left[V_{j-1}(i) + V_{j-1}(-i)\right], \qquad 0 \le i \le N_p - j - 1 \tag{25}$$
$$V_j(i) = V_{j-1}(i+1) + \tilde{r}_j^2\, V_{j-1}(-i-1) + 2\tilde{r}_j\, P_{j-1}(|i+1|), \qquad 1 + j - N_p \le i \le N_p - j - 1 \tag{26}$$
STEP 13 Increment j:
j = j + 1
STEP 14 If $j \le I_h(k)$, go to STEP 12.
STEP 15 Increment k, the vector quantizer segment index:
k = k + 1
STEP 16 If k ≤ 3 go to STEP 4.
Otherwise, the indices of the reflection coefficient vectors for the three segments have been
chosen, and the search of the reflection coefficient vector quantizer is terminated.
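The core of the AFLAT recursion can be sketched in Python as follows. It covers equations (15), (16), (21), (22) and (23), with (21) and (22) evaluated over the wider index ranges of equations (25) and (26), for the simple case of a single segment that starts at stage 1. The segment bookkeeping of STEPs 2 to 5 and 10 to 16 is omitted. The function name and the dictionary storage used for negative indices are illustrative choices, and the arithmetic is floating point rather than the fixed point of GSM 06.06 [2].

```python
def aflat_residual_energy(R, r_hat):
    """Residual error energy after stages 1..len(r_hat) of the lattice.

    R[0..N_p] is the autocorrelation sequence R(i); r_hat[j - 1] is the
    candidate reflection coefficient for stage j.
    """
    n_p = len(R) - 1
    if not 1 <= len(r_hat) <= n_p:
        raise ValueError("need between 1 and N_p reflection coefficients")
    # STEP 1: equations (15) and (16); dicts allow negative indices for V
    P = {i: R[i] for i in range(n_p)}
    V = {i: R[abs(i + 1)] for i in range(1 - n_p, n_p)}
    for j, rj in enumerate(r_hat, start=1):
        # equation (21); at the last stage only P_j(0) exists (cf. STEP 9)
        P_new = {i: (1.0 + rj * rj) * P[i] + rj * (V[i] + V[-i])
                 for i in range(max(n_p - j, 1))}
        # equation (22)
        V_new = {i: V[i + 1] + rj * rj * V[-i - 1] + 2.0 * rj * P[abs(i + 1)]
                 for i in range(1 + j - n_p, n_p - j)}
        P, V = P_new, V_new
    return P[0]  # equation (23): residual energy out of the last stage
```

For R(i) = 0,9^i and a single stage with $\hat{r}_1$ = -0,9, the residual energy is 1 - 0,9² = 0,19.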
To minimize the storage requirements for the reflection coefficient vector quantizer, eight bit codes for the
individual reflection coefficients are stored in the vector quantizer table, instead of the actual reflection
coefficient values. The codes are used to look up the values of the reflection coefficients from a scalar
quantization table with 256 entries.
4.1.5 Frame energy calculation and quantization
The unquantized value of R0, R(0), is computed during the computation of the short term predictor
parameters.
$$R(0) = \frac{\phi(0,0) + \phi(10,10)}{2} \tag{27}$$
where $\phi(i,k)$ is defined by equation (5). $R(0)$ is then converted into dB relative to full scale (full scale, $R_{max}$, is defined as the square of the maximum sample amplitude):
$$R_{dB} = 10\log_{10}\left(\frac{R(0)}{R_{max}}\right) \qquad (28)$$
$R_{dB}$ is then quantized to 32 levels. The 32 quantized values for $R_{dB}$ range from a minimum of -66 dB (corresponding to a code of 0 for R0) to a maximum of -4 dB (corresponding to a code of 31 for R0). The step size of the quantizer is 2 dB. R0 is chosen as:
$$R0 = \arg\min_{R0 \in \{0, \dots, 31\}} \left| R0 - \frac{R_{dB} + 66}{2} \right| \qquad (29)$$
where R0 can take on the integer values from 0 to 31, corresponding to the 32 codes for R0.
Decoding of the R0 code is given by:
$$R(0) = R_{max} \cdot 10^{\left((2 \cdot R0) - 66\right)/10} \qquad (30)$$
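Equations (28) to (30) amount to a uniform 2 dB quantizer on a logarithmic scale. A minimal sketch, assuming floating-point arithmetic; the handling of $R(0) \le 0$ and the choice of the lower code when two codes are equally close are assumptions, not taken from the text, and `encode_r0`/`decode_r0` are illustrative names.

```python
import math


def encode_r0(r0_value, r_max):
    """Quantize the unquantized frame energy R(0) to the 5-bit code R0."""
    if r0_value <= 0.0:
        return 0  # assumption: a silent frame maps to the lowest code
    r_db = 10.0 * math.log10(r0_value / r_max)                 # eq. (28)
    target = (r_db + 66.0) / 2.0
    # eq. (29): nearest integer code in 0..31; min() keeps the lower code on ties
    return min(range(32), key=lambda code: abs(code - target))


def decode_r0(code, r_max):
    """Reconstruct R(0) from the code R0, eq. (30)."""
    return r_max * 10.0 ** ((2 * code - 66) / 10.0)


code = encode_r0(0.01, 1.0)      # R_dB = -20 dB
print(code)                      # 23
print(decode_r0(code, 1.0))      # approximately 0.01
```

Values of $R_{dB}$ above -4 dB or below -66 dB simply saturate at codes 31 and 0, because the search is restricted to the 32 valid codes.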
4.1.6 Soft interpolation of the spectral parameters
Interpolation of the short term filter parameters improves the performance of the GSM half rate encoder. The direct form filter coefficients ($\alpha_i$'s), which correspond to quantized reflection coefficients, are the spectral parameters used for interpolation. The GSM half rate speech encoder uses either an interpolated set of $\alpha_i$'s or an uninterpolated set of $\alpha_i$'s, choosing the set which gives better prediction gain for the frame.
Two sets of LPC coefficient vectors are generated: the first corresponds to the interpolated coefficients,
the second to the uninterpolated coefficients. The frame's speech samples are inverse filtered using each
of the two coefficient sets, and the residual frame energy corresponding to each set is computed. The
coefficient set yielding the lower frame residual energy is then selected to be used. If the residual energies
are equal, the uninterpolated coefficient set is used. INT_LPC, a soft interpolation bit, is set to 1 when
interpolation is selected or to 0 otherwise.
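The selection rule can be sketched as follows. The sketch assumes the inverse filter $e(n) = s(n) - \sum_i \alpha_i\,s(n-i)$ (the inverse of the synthesis filter form of equation (33)), a frame split into equal subframes with one coefficient set each, and at least $N_p$ samples of filter history before the frame; `residual_energy` and `choose_int_lpc` are hypothetical names.

```python
def residual_energy(samples, history, coeff_sets):
    """Total inverse-filter residual energy over one frame.

    samples:    the frame's speech samples (length divisible by len(coeff_sets))
    history:    at least Np samples preceding the frame (assumed filter memory)
    coeff_sets: one list of direct-form alphas per subframe
    """
    x = list(history) + list(samples)
    start = len(history)
    sub_len = len(samples) // len(coeff_sets)
    energy = 0.0
    for m, alpha in enumerate(coeff_sets):
        for n in range(start + m * sub_len, start + (m + 1) * sub_len):
            # e(n) = s(n) - sum_i alpha_i s(n - i)
            e = x[n] - sum(a * x[n - i] for i, a in enumerate(alpha, start=1))
            energy += e * e
    return energy


def choose_int_lpc(samples, history, interpolated, uninterpolated):
    """Return (INT_LPC, chosen coefficient sets)."""
    if residual_energy(samples, history, interpolated) < residual_energy(
            samples, history, uninterpolated):
        return 1, interpolated
    return 0, uninterpolated  # equal energies select the uninterpolated set
```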
To generate the interpolated coefficient set, the coder interpolates the $\alpha_i$'s for the first, second, and third subframes of each frame. The fourth subframe uses the uninterpolated $\alpha_i$'s for that frame.
The interpolation is done as follows. Let $\alpha_{i,L}$ be the direct-form LPC coefficients corresponding to the last frame, $\alpha_{i,C}$ the direct-form LPC coefficients corresponding to the current frame, and Del the interpolation curve used. The interpolated direct-form LPC coefficient vector at the j-th subframe of the current frame, $\alpha_{i,j}$, is given by:
$$\alpha_{i,j} = \alpha_{i,L} + Del(j, \mathrm{INT\_SOFT})\,(\alpha_{i,C} - \alpha_{i,L}), \quad 1 \le i \le N_p, \ 1 \le j \le 4 \qquad (31)$$
The values of the interpolation curve Del are given in table 2.
Table 2: Values of the interpolation curve Del
j Del(j,0) Del(j,1)
1 0,0 0,30
2 1,0 0,62
3 1,0 0,92
4 1,0 1,00
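Equation (31) with Table 2 reduces to a per-subframe linear blend between the two frames' coefficients. A minimal sketch; `interpolate_alphas` and `DEL` are illustrative names, and `flag` stands for the second argument of Del in equation (31).

```python
# Table 2: Del(j, flag) for subframes j = 1..4, stored as DEL[flag][j - 1]
DEL = {0: (0.0, 1.0, 1.0, 1.0),
       1: (0.30, 0.62, 0.92, 1.00)}


def interpolate_alphas(alpha_last, alpha_cur, flag):
    """Eq. (31): direct-form coefficients for subframes 1..4 of the current frame.

    alpha_last: alpha_{i,L} of the last frame
    alpha_cur:  alpha_{i,C} of the current frame
    flag:       second argument of Del in eq. (31), 0 or 1
    """
    return [[a_l + DEL[flag][j] * (a_c - a_l) for a_l, a_c in zip(alpha_last, alpha_cur)]
            for j in range(4)]


print(interpolate_alphas([0.0, 0.5], [1.0, 0.5], 1))
# [[0.3, 0.5], [0.62, 0.5], [0.92, 0.5], [1.0, 0.5]]
```

With flag 0 the table reproduces the uninterpolated case: subframe 1 keeps the previous frame's coefficients and subframes 2 to 4 use the current frame's.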
From this point on, the subframe index j is omitted for simplicity when referring to $\alpha_{i,j}$ coefficients, although it is implied. For interpolated subframes, the $\alpha_i$'s are converted to reflection coefficients to check for filter stability. If the resulting filter is unstable, then uninterpolated coefficients are used for that subframe. The uninterpolated coefficients used for subframe 1 are the previous frame's coefficients. The uninterpolated coefficients used for subframes 2, 3, and 4 are the current frame's coefficients.
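The stability check can be sketched with the step-down (backward Levinson) recursion, which converts direct-form coefficients to reflection coefficients; the filter is stable when every reflection coefficient has magnitude below 1. The text does not specify the conversion routine, so this is an assumed implementation, with the synthesis filter taken as $1/(1 - \sum_i \alpha_i z^{-i})$ as in equation (33); `is_stable` and `stabilized_subframes` are hypothetical names.

```python
def is_stable(alpha):
    """Step-down stability test for the synthesis filter 1/(1 - sum alpha_i z^-i).

    Converts the direct-form coefficients to reflection coefficients k_m and
    reports stability iff every |k_m| < 1.
    """
    a = [-x for x in alpha]          # inverse filter written as 1 + sum a_i z^-i
    for m in range(len(a), 0, -1):
        k = a[m - 1]                 # k_m is the last coefficient of the order-m filter
        if abs(k) >= 1.0:
            return False
        a = [(a[i] - k * a[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    return True


def stabilized_subframes(interpolated, alpha_prev, alpha_cur):
    """Replace any unstable interpolated subframe by its uninterpolated set."""
    uninterpolated = [alpha_prev, alpha_cur, alpha_cur, alpha_cur]
    return [s if is_stable(s) else uninterpolated[j] for j, s in enumerate(interpolated)]
```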
4.1.7 Spectral noise weighting filter coefficients
To exploit the noise masking potential of the formants, spectral noise weighting is applied. The computation of the $\tilde{\alpha}_i$ coefficients, used by the spectral noise weighting filters W(z) and H(z), is now described. Define an impulse sequence $\delta(n)$ over $N_s$ samples:
$$\delta(0) = 1{,}0, \qquad \delta(n) = 0{,}0, \quad 1 \le n \le N_s - 1 \qquad (32)$$
Let $h_3(n)$ be the zero-state response of the cascade of three filters to $\delta(n)$. The three filters are an LPC synthesis filter, an inverse filter using a weighting factor of 0,93, and a synthesis filter with a weighting factor of 0,7. In equation form:
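$$h_1(n) = \delta(n) + \sum_{i=1}^{N_p} \alpha_i\,h_1(n-i), \quad 0 \le n \le N_s - 1 \qquad (33)$$
$$h_2(n) = h_1(n) - \sum_{i=1}^{N_p} (0{,}93)^i\,\alpha_i\,h_1(n-i), \quad 0 \le n \le N_s - 1 \qquad (34)$$
$$h_3(n) = h_2(n) + \sum_{i=1}^{N_p} (0{,}7)^i\,\alpha_i\,h_3(n-i), \quad 0 \le n \le N_s - 1 \qquad (35)$$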
where the $\alpha_i$'s are the direct form LP coefficients. The autocorrelation sequence of $h_3(n)$ is calculated using:
$$R_h(i) = \sum_{n=i}^{N_s-1} h_3(n)\,h_3(n-i), \quad 0 \le i \le N_p \qquad (36)$$
From $R_h(i)$, the reflection coefficients that define the combined spectrally noise weighted synthesis filter are computed using the AFLAT recursion once per frame.
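Equations (32) to (36) can be checked with a direct floating-point transcription. In this sketch the weighting factors 0,93 and 0,7 are the defaults, `Ns` and the coefficient list are supplied by the caller, and `weighting_autocorrelation` is a hypothetical name.

```python
def weighting_autocorrelation(alpha, Ns, g_inv=0.93, g_syn=0.7):
    """Autocorrelation R_h(0..Np) of h3(n), following eqs. (32)-(36).

    alpha: direct-form LP coefficients alpha_1..alpha_Np
           (synthesis filter 1/(1 - sum alpha_i z^-i))
    Ns:    number of impulse-response samples
    """
    Np = len(alpha)
    delta = [1.0] + [0.0] * (Ns - 1)                                     # eq. (32)
    h1, h2, h3 = [0.0] * Ns, [0.0] * Ns, [0.0] * Ns
    for n in range(Ns):
        taps = range(1, min(n, Np) + 1)  # zero state: terms with n - i < 0 vanish
        h1[n] = delta[n] + sum(alpha[i - 1] * h1[n - i] for i in taps)              # (33)
        h2[n] = h1[n] - sum(g_inv ** i * alpha[i - 1] * h1[n - i] for i in taps)    # (34)
        h3[n] = h2[n] + sum(g_syn ** i * alpha[i - 1] * h3[n - i] for i in taps)    # (35)
    return [sum(h3[n] * h3[n - i] for n in range(i, Ns)) for i in range(Np + 1)]  # (36)
```

Note that the inverse filter of equation (34) acts on $h_1(n-i)$, while the final synthesis filter of equation (35) is recursive in $h_3$.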

STEP 1 Define the initial conditions for the AFLAT recursion:
$$P_0(i) = R_h(i), \quad 0 \le i \le N_p - 1 \qquad (37)$$
$$V_0(i) = R_h(|i+1|), \quad 1 - N_p \le i \le N_p - 1 \qquad (38)$$
STEP 2 Initialize j, the index of the lattice stage, to point to the first lattice stage:
j = 1
STEP 3 Compute $r_j$, the j-th reflection coefficient, using:
$$r_j = -\frac{V_{j-1}(0)}{P_{j-1}(0)} \qquad (39)$$
STEP 4 Given $r_j$, update the values of the $V_j$ and $P_j$ arrays using:
$$P_j(i) = (1 + r_j^2)\,P_{j-1}(i) + r_j\,[V_{j-1}(i) + V_{j-1}(-i)], \quad 0 \le i \le N_p - j - 1 \qquad (40)$$
$$V_j(i) = V_{j-1}(i+1) + r_j^2\,V_{j-1}(-i-1) + 2r_j\,P_{j-1}(|i+1|), \quad 1 + j - N_p \le i \le N_p - j - 1 \qquad (41)$$
STEP 5 Increment j:
j = j+1
STEP 6 If $j \le N_p$ go to STEP 3; otherwise, all $N_p$ reflection coefficients have been obtained.
STEP 7 The reflection coefficients, $r_j$, are then converted to direct-form LPC filter coefficients, $\tilde{\alpha}_i$, for implementing the combined spectrally noise weighted synthesis filter H(z) and the filter W(z).
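STEPs 1 to 7 can be sketched as below. `aflat_weighting` transcribes equations (37) to (41). The conversion in STEP 7 is not spelled out in this subclause, so `reflection_to_direct` assumes the usual step-up recursion for a lattice with forward and backward errors $f_j(n) = f_{j-1}(n) + r_j\,b_{j-1}(n-1)$, the form consistent with equations (39) to (41), and returns $\tilde{\alpha}_i$ in the synthesis convention of equation (33). Both names are hypothetical.

```python
def aflat_weighting(Rh):
    """STEPs 1-6: reflection coefficients r_1..r_Np from Rh[0..Np]."""
    Np = len(Rh) - 1
    P = {i: Rh[i] for i in range(Np)}                      # eq. (37)
    V = {i: Rh[abs(i + 1)] for i in range(1 - Np, Np)}     # eq. (38)
    r = []
    for j in range(1, Np + 1):                             # STEPs 2, 5 and 6
        rj = -V[0] / P[0]                                  # eq. (39)
        r.append(rj)
        # Both new arrays are built from the stage j-1 arrays before rebinding
        P, V = (
            {i: (1 + rj * rj) * P[i] + rj * (V[i] + V[-i])
             for i in range(Np - j)},                                     # eq. (40)
            {i: V[i + 1] + rj * rj * V[-i - 1] + 2 * rj * P[abs(i + 1)]
             for i in range(1 + j - Np, Np - j)},                         # eq. (41)
        )
    return r


def reflection_to_direct(r):
    """STEP 7 (assumed step-up form): alphas for 1/(1 - sum alpha_i z^-i)."""
    a = []                                   # inverse filter 1 + sum a_i z^-i
    for m, k in enumerate(r, start=1):
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return [-x for x in a]
```

For an autocorrelation input, the resulting coefficients satisfy the normal equations $\sum_k \tilde{\alpha}_k R_h(|i-k|) = R_h(i)$ for $1 \le i \le N_p$, which makes a convenient consistency check.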
The spectral noise weighting filter coefficients are updated at the subframes of a frame in the same way as the direct form LPC filter coefficients (subclause 4.1.6). If the interpolation flag INT_LPC = "1", no stability check of the interpolated spectral noise weighting filter coefficients is done at subframes 1, 2, or 3. However, if uninterpolated coefficients are used at subframes 1, 2, and/or 3 due to instability of the unweighted coefficients (INT_LPC = "0"), uninterpolated weighting filter coefficients are also used at those subframes.
4.1.8 Long Term Predictor lag determination
As figure 3 illustrates, the long term lag optimization is equivalent to a codebook search in which the codebook is defined by the long term filter state and the specific vector in the codebook is pointed to by the long term predictor lag, L. The input p(n) is the weighted input speech for the subframe minus the zero input response of the H(z) filter alone.
Figure 3: Long term predictor lag search
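Viewed as a codebook search, each candidate lag $L$ selects a vector from the long term filter state, the vector is passed through $H(z)$, and the lag whose filtered vector best matches $p(n)$ after optimal gain scaling wins. The sketch below shows only that closed-loop view, not the combined open loop and closed loop procedure of the following paragraphs. It assumes lags no shorter than the subframe (so no periodic extension is needed), an unrestricted gain sign, and a truncated impulse response `h` for $H(z)$; `ltp_lag_search` is a hypothetical name.

```python
def ltp_lag_search(p, past_exc, h, lags):
    """Closed-loop lag search treated as a codebook search (hypothetical sketch).

    p:        target for the subframe (weighted speech minus H(z) zero-input response)
    past_exc: long term filter state, most recent sample last
    h:        truncated impulse response of H(z), used for zero-state filtering
    lags:     candidate lags; each must satisfy len(p) <= L <= len(past_exc)
    """
    N = len(p)
    best_lag, best_score = None, float("-inf")
    for L in lags:
        if not N <= L <= len(past_exc):
            raise ValueError("lag outside the range this sketch supports")
        start = len(past_exc) - L
        c = past_exc[start:start + N]                       # codebook vector for lag L
        y = [sum(h[m] * c[n - m] for m in range(min(n + 1, len(h))))
             for n in range(N)]                             # zero-state filtering by H(z)
        corr = sum(a * b for a, b in zip(p, y))
        energy = sum(b * b for b in y)
        # Minimizing ||p - g*y||^2 over the gain g is maximizing corr^2 / energy
        if energy > 0.0 and corr * corr / energy > best_score:
            best_lag, best_score = L, corr * corr / energy
    return best_lag
```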

The GSM half rate speech encoder uses a combination of open loop and closed loop techniques in
choosing the long term predictor lag. First an open loop search is conducted to determine "candidate" lags
at each subframe. Then at most two best candidate lags at each subframe are selected, with each
serving as an anchor point for constructing an open loop frame lag trajectory, subject to a maximum delta
coding constraint. The frame lag trajectory which minimizes the open loop LTP spectrally weighted error
energy for the frame is then chosen. The open loop LTP prediction gains corresponding to the winning
trajectory are used to select the voicing mode 1, 2 or 3.
...
