Terrestrial Trunked Radio (TETRA); Speech codec for full-rate traffic channel; Part 2: TETRA codec

DE/TETRA-00002-2

Radio Equipment and Systems (RES) – Trans-European Trunked Radio – Speech codec for full-rate traffic channel – Part 2: TETRA codec

General Information

Status: Published
Publication Date: 05-Dec-1996
Current Stage: 12 - Completion
Due Date: 13-Dec-1996
Completion Date: 06-Dec-1996
Standard: ETS 300 395-2 E1:2003
Language: English
Pages: 87

Standards Content (Sample)


SLOVENIAN STANDARD
01-December-2003
Radio Equipment and Systems (RES) – Trans-European Trunked Radio – Speech codec for full-rate traffic channel – Part 2: TETRA codec
Terrestrial Trunked Radio (TETRA); Speech codec for full-rate traffic channel; Part 2: TETRA codec
This Slovenian standard is identical to: ETS 300 395-2 Edition 1
ICS:
33.070.10 Terrestrial Trunked Radio (TETRA)
2003-01. Slovenian Institute for Standardization (Slovenski inštitut za standardizacijo). Reproduction of this standard, in whole or in part, is not permitted.

EUROPEAN TELECOMMUNICATION STANDARD
ETS 300 395-2
December 1996
Source: ETSI TC-RES Reference: DE/RES-06002-2
ICS: 33.060, 33.060.50
Key words: TETRA, codec
Radio Equipment and Systems (RES);
Trans-European Trunked Radio (TETRA);
Speech codec for full-rate traffic channel;
Part 2: TETRA codec
ETSI
European Telecommunications Standards Institute
ETSI Secretariat
Postal address: F-06921 Sophia Antipolis CEDEX - FRANCE
Office address: 650 Route des Lucioles - Sophia Antipolis - Valbonne - FRANCE
X.400: c=fr, a=atlas, p=etsi, s=secretariat - Internet: secretariat@etsi.fr
Tel.: +33 4 92 94 42 00 - Fax: +33 4 93 65 47 16
Copyright Notification: No part may be reproduced except as authorized by written permission. The copyright and the
foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1996. All rights reserved.

Page 2
ETS 300 395-2: December 1996
Whilst every care has been taken in the preparation and publication of this document, errors in content,
typographical or otherwise, may occur. If you have comments concerning its accuracy, please write to
"ETSI Editing and Committee Support Dept." at the address shown on the title page.

Contents
Foreword .7
1 Scope .9
2 Normative references.9
3 Abbreviations.9
4 Full rate codec.9
4.1 Structure of the codec.9
4.2 Functional description of the codec .12
4.2.1 Pre- and post-processing .12
4.2.2 Encoder .12
4.2.2.1 Short-term prediction .13
4.2.2.2 LP to LSP and LSP to LP conversion.14
4.2.2.3 Quantization and interpolation of LP parameters .16
4.2.2.4 Long-term prediction analysis.17
4.2.2.5 Algebraic codebook: structure and search .18
4.2.2.6 Quantization of the gains.21
4.2.2.7 Detailed bit allocation.23
4.2.3 Decoder.23
4.2.3.1 Decoding process.24
4.2.3.1.1 Decoding of LP filter parameters .24
4.2.3.1.2 Decoding of the adaptive codebook vector .24
4.2.3.1.3 Decoding of the innovation vector.25
4.2.3.1.4 Decoding of the adaptive and innovative codebook gains.25
4.2.3.1.5 Computation of the reconstructed speech .25
4.2.3.2 Error concealment .25
5 Channel coding for speech.26
5.1 General .26
5.2 Interfaces in the error control structure.26
5.3 Notations.28
5.4 Definition of sensitivity classes and error control codes .28
5.4.1 Sensitivity classes .28
5.4.2 CRC codes.28
5.4.3 16-state RCPC codes.30
5.4.3.1 Encoding by the 16-state mother code of rate 1/3.30
5.4.3.2 Puncturing of the mother code .30
5.5 Error control scheme for normal speech traffic channel.31
5.5.1 CRC code.31
5.5.2 RCPC codes.31
5.5.2.1 Puncturing scheme of the RCPC code of rate 8/12 (equal to 2/3).31
5.5.2.2 Puncturing scheme of the RCPC code of rate 8/18 .31
5.5.3 Matrix Interleaving .32
5.6 Error control scheme for speech traffic channel with frame stealing activated .34
5.6.1 CRC code.34
5.6.2 RCPC codes.35
5.6.2.1 Puncturing scheme of the RCPC code of rate 8/17 .36
5.6.3 Interleaving.36
6 Channel decoding for speech.36
6.1 General .36

6.2 Error control structure . 36
7 Codec performance. 37
8 Bit exact description of the TETRA codec. 37
Annex A (informative): Implementation of speech channel decoding . 39
A.1 Algorithmic description of speech channel decoding. 39
A.1.1 Definition of error control codes . 39
A.1.1.1 16-state RCPC codes. 39
A.1.1.1.1 Obtaining the mother code from punctured code . 39
A.1.1.1.2 Viterbi decoding of the 16-state mother code of the rate 1/3 . 39
A.1.1.2 CRC codes . 40
A.1.1.3 Type-4 bits . 40
A.1.2 Error control scheme for normal speech traffic channel . 40
A.1.2.1 Matrix Interleaving . 40
A.1.2.2 RCPC codes. 40
A.1.2.2.1 Puncturing scheme of the RCPC code of rate 8/12 (equal to 2/3). 41
A.1.2.2.2 Puncturing scheme of the RCPC code of rate 8/18. 41
A.1.2.3 CRC code . 41
A.1.2.4 Speech parameters . 41
A.1.3 Error control scheme for speech traffic channel with frame stealing activated. 41
A.1.3.1 Interleaving . 41
A.1.3.2 RCPC codes. 41
A.1.3.2.1 Puncturing scheme of the RCPC code of rate 8/17. 42
A.1.3.3 CRC code . 42
A.1.3.4 Speech parameters . 42
A.2 C Code for speech channel decoding . 42
Annex B (informative): Indexes . 43
B.1 Index of C code routines. 43
B.2 Index of files. 46
Annex C (informative): Bibliography . 47
Annex D (informative): Codec performance . 48
D.1 General. 48
D.2 Quality. 48
D.2.1 Subjective speech quality. 48
D.2.1.1 Description of characterization tests. 48
D.2.1.2 Absolute speech quality. 48
D.2.1.3 Effect of input level . 48
D.2.1.4 Effect of input frequency characteristic. 48
D.2.1.5 Effect of transmission errors. 48
D.2.1.6 Effect of tandeming. 49
D.2.1.7 Effect of acoustic background noise. 49
D.2.1.8 Effect of vocal effort. 49
D.2.1.9 Effect of frame stealing. 49
D.2.1.10 Speaker and language dependency . 49
D.2.2 Comparison with analogue FM. 49
D.2.2.1 Analogue and digital systems results . 49
D.2.2.2 All conditions. 50
D.2.2.3 Input level. 50
D.2.2.4 Error patterns. 51
D.2.2.5 Background noise. 51

D.2.3 Additional tests.51
D.2.3.1 Types of signals .51
D.2.3.2 Codec behaviour .51
D.3 Performance of the channel coding/decoding for speech.52
D.3.1 Classes of simulation environment conditions.52
D.3.2 Classes of equipment .52
D.3.3 Classes of bits .53
D.3.4 Channel conditions .53
D.3.5 Results for normal case.53
D.4 Complexity.54
D.4.1 Complexity analysis .54
D.4.1.1 Measurement methodology.54
D.4.1.2 TETRA basic operators.54
D.4.1.3 Worst case path for speech encoder .56
D.4.1.4 Worst case path for speech decoder .57
D.4.1.5 Condensed complexity values for encoder and decoder .58
D.4.2 DSP independence.59
D.4.2.1 Program control structure.59
D.4.2.2 Basic operator implementation.59
D.4.2.3 Additional operator implementation.59
D.5 Delay .59
Annex E (informative): Results of the TETRA codec characterization listening and complexity tests.60
E.1 Characterization listening test .60
E.1.1 Experimental conditions.60
E.1.2 Tables of results .61
E.2 TETRA codec complexity study .70
E.2.1 Computational analysis results .70
E.2.1.1 TETRA speech encoder.70
E.2.1.2 TETRA speech decoder.78
E.2.1.3 TETRA channel encoder and decoder.81
E.2.2 Memory requirements analysis results .83
E.2.2.1 TETRA speech encoder.83
E.2.2.2 TETRA speech decoder.84
E.2.2.3 TETRA speech channel encoder .84
E.2.2.4 TETRA speech channel decoder .85
Annex F (informative): Description of attached computer files .86
F.1 Directory C-WORD.86
F.2 Directory C-CODE.86
History.87

Foreword
This European Telecommunication Standard (ETS) has been produced by the Radio Equipment and
Systems (RES) Technical Committee of the European Telecommunications Standards Institute (ETSI).
This ETS consists of four parts as follows:
Part 1: "General description of speech functions".
Part 2: "TETRA codec".
Part 3: "Specific operating features".
Part 4: "Codec conformance testing".
Clause 4 provides a complete description of the full rate speech source encoder and decoder, whilst
clause 5 describes the speech channel encoder, and clause 6 the speech channel decoder.
Clause 7 describes the codec performance.
Finally, clause 8 introduces the bit-exact description of the codec, which is given as fixed-point, bit-exact ANSI C code. The complete C code of the TETRA codec is provided in computer files attached to this ETS; these files are an integral part of this ETS.
In addition to these clauses, six informative annexes are provided.
Annex A describes a possible implementation of the speech channel decoding function.
Annex B provides comprehensive indexes of all the routines and files included in the C code associated
with this ETS.
Annex C lists informative references relevant to the speech codec.
Annex D describes the actual quality, performance and complexity aspects of the codec.
Annex E reports detailed results from codec characterization listening and complexity tests.
Annex F contains instructions for the use of the attached electronic files.
Transposition dates
Date of adoption: 22 November 1996
Date of latest announcement of this ETS (doa): 31 March 1997
Date of latest publication of new National Standard or endorsement of this ETS (dop/e): 30 September 1997
Date of withdrawal of any conflicting National Standard (dow): 30 September 1997

1 Scope
This European Telecommunication Standard (ETS) contains the full specification of the speech codec for
use in the Trans-European Trunked Radio (TETRA) system.
2 Normative references
This ETS incorporates, by dated and undated reference, provisions from other publications.
These normative references are cited at the appropriate places in the text and the publications are listed
hereafter. For dated references, subsequent amendments to or revisions of any of these publications
apply to this ETS only when incorporated in it by amendment or revision. For undated references the latest
edition of the publication referred to applies.
[1] ETS 300 392-2: "Radio Equipment and Systems (RES); Trans-European
Trunked Radio (TETRA) system; Voice plus Data; Part 2: Air Interface".
[2] CCITT Recommendation P.48 (1988): "Specifications for an Intermediate
Reference System".
3 Abbreviations
For the purposes of this ETS, the following abbreviations apply:
ACELP Algebraic CELP
ANSI American National Standards Institute
BER Bit Error Ratio
BFI Bad Frame Indicator
BS Base Station
CELP Code-Excited Linear Predictive
CRC Cyclic Redundancy Code
DSP Digital Signal Processor
DTMF Dual Tone Multiple Frequency
EP Error Pattern
FIR Finite Impulse Response
IRS Intermediate Reference System
LP Linear Prediction
LPC Linear Predictive Coding
LSF Line Spectral Frequency
LSP Line Spectral Pair
MER Message Error Rate
MNRU Multiplicative Noise Reference Unit
MOS Mean Opinion Score
MS Mobile Station
MSE Mean Square Error
PDF Probability Density Function
PUEM Probability of Undetected Erroneous Message
RCPC Rate-Compatible Punctured Convolutional
RF Radio Frequency
VQ Vector Quantization
4 Full rate codec
4.1 Structure of the codec
The TETRA speech codec is based on the Code-Excited Linear Predictive (CELP) coding model. In this model, a block of N speech samples is synthesized by filtering an appropriate innovation sequence from a codebook, scaled by a gain factor $g_c$, through two time-varying filters. A simplified high-level block diagram of this synthesis process, as implemented in the TETRA codec, is shown in figure 1.

Figure 1: High level block diagram of the TETRA speech synthesizer
[Blocks: demultiplexer splitting the digital input into algebraic codebook index, pitch delay, gains and LPC info; gain prediction and VQ; adaptive codebook (past excitation, pitch delay T, gain g_p) forming the long-term synthesis filter; algebraic codebook (index k, gain g_c); short-term synthesis filter (LPC info); output speech.]
The first filter is a long-term prediction filter (pitch filter) aiming at modelling the pseudo-periodicity in the
speech signal and the second is a short-term prediction filter modelling the speech spectral envelope.
The long-term, or pitch, synthesis filter is given by:

$$B(z) = \frac{1}{1 - g_p z^{-T}} \qquad (1)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented as an adaptive codebook where, for delays less than the sub-frame length, the past excitation is repeated.
The short-term synthesis filter is given by:

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 + \sum_{i=1}^{p} a_i z^{-i}} \qquad (2)$$

where $a_i$, $i = 1, \ldots, p$, are the Linear Prediction (LP) parameters and $p$ is the predictor order. In the TETRA codec, $p$ shall be 10.
The TETRA encoder uses an analysis-by-synthesis technique to determine the pitch and excitation
codebook parameters. The simplified block diagram of the TETRA encoder is shown in figure 2.

Figure 2: High level block diagram of the TETRA speech encoder
[Blocks: LPC analysis; quantization and interpolation of the LPC info; open-loop pitch analysis; perceptual weighting; adaptive codebook (past excitation, gain g_p); algebraic codebook (gain g_c); short-term synthesis filter; MSE search; gain VQ; multiplexer combining pitch delay (T), codebook index (k), gains and LPC info into the digital output.]
In this analysis-by-synthesis technique, the synthetic speech is computed for all candidate innovation sequences, and the sequence that produces the output closest to the original signal, according to a perceptually weighted distortion measure, is retained. The perceptual weighting filter de-emphasizes the error in the formant regions of the speech spectrum and is given by:

$$W(z) = \frac{A(z)}{A(z/\gamma)} \qquad (3)$$

where $A(z)$ is the LP inverse filter (as in equation (2)) and $0 < \gamma \le 1$. The value $\gamma = 0{,}85$ shall be used. Both the weighting filter, $W(z)$, and the formant synthesis filter, $H(z)$, shall use the quantized LP parameters.
In the Algebraic CELP (ACELP) technique, special innovation codebooks having an algebraic structure are used. This algebraic structure has several advantages in terms of storage, search complexity and robustness. The TETRA codec shall use a specific dynamic algebraic excitation codebook whereby the fixed excitation vectors are shaped by a dynamic shaping matrix (see annex C {1}). The shaping matrix is a function of the LP model $A(z)$, and its main role is to shape the excitation vectors in the frequency domain so that their energies are concentrated in the important frequency bands. The shaping matrix used is a lower triangular Toeplitz matrix constructed from the impulse response of the filter:

$$F(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)} \qquad (4)$$

where $A(z)$ is the LP inverse filter. The values $\gamma_1 = 0{,}75$ and $\gamma_2 = 0{,}85$ shall be used.
In the TETRA codec, 30 ms speech frames shall be used. It is required that the short-term prediction parameters (or LP parameters) are computed and transmitted every speech frame. The speech frame shall be divided into 4 sub-frames of 7,5 ms (60 samples). The pitch and algebraic codebook parameters also have to be transmitted every sub-frame.
Table 1 gives the bit allocation for the TETRA codec. 137 bits shall be produced for each frame of 30 ms
resulting in a bit rate of 4 567 bit/s.
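Table 1: Bit allocation for the TETRA codec

| Parameter | 1st subframe | 2nd subframe | 3rd subframe | 4th subframe | Total per frame |
|---|---|---|---|---|---|
| LP filter | – | – | – | – | 26 |
| Pitch delay | 8 | 5 | 5 | 5 | 23 |
| Algebraic code | 16 | 16 | 16 | 16 | 64 |
| VQ of 2 gains | 6 | 6 | 6 | 6 | 24 |
| Total | | | | | 137 |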
More details about the sequence of bits within the speech frame of 137 bits per 30 ms, with reference to
the speech parameters, can be found in subclause 4.2.2.7, table 3.
4.2 Functional description of the codec
4.2.1 Pre- and post-processing
Before starting the encoding process, the speech signal shall be pre-processed using the offset compensation filter:

$$H_p(z) = \frac{1}{2} \cdot \frac{1 - z^{-1}}{1 - \alpha z^{-1}} \qquad (5)$$

where $\alpha = 32\,735/32\,768$. In the time domain, this filter corresponds to:

$$s'(n) = \frac{s(n)}{2} - \frac{s(n-1)}{2} + \alpha\, s'(n-1) \qquad (6)$$

where $s(n)$ is the input signal and $s'(n)$ is the pre-processed signal. The purpose of this pre-processing is firstly to remove the DC component from the signal (offset compensation), and secondly to scale down the input signal in order to avoid saturation of the synthesis filtering.
At the decoder, the post-processing consists of scaling up the reconstructed signal (multiplication by 2
with saturation control).
4.2.2 Encoder
Figure 3 presents a detailed block diagram of the TETRA encoder, illustrating the major parts of the codec as well as the signal flow. In this figure, the names appearing at the bottom of the various building blocks correspond to the C code routines associated with this ETS.

Figure 3: Signal flow at the encoder
[Frame-level blocks: pre-processing, i.e. offset compensation and division by 2 (Pre_Process); LPC analysis, i.e. windowing and autocorrelation, Levinson-Durbin, A(z) to LSP conversion, LSP quantization and interpolation for the 4 sub-frames (Autocorr, Lag_Window, Levin_32, Az_Lsp, Clsp_334, Int_Lpc4, Lsp_Az); open-loop pitch search, i.e. interpolation of the LSPs for the 4 sub-frames, computation of the weighted speech and search for the open-loop pitch (Int_Lpc4, Lsp_Az, Pond_Ai, Residu, Syn_Filt, Pitch_Ol_Dec). Sub-frame-level blocks: adaptive codebook search, i.e. target x(n), best delay and gain, adaptive codebook contribution (Syn_Filt, Pitch_Fr, Pred_Lt, G_Pitch); innovative codebook search, i.e. target xn2(n), best innovation and gain (D4i60_16, G_Code); gains quantization in the energy domain (Ener_Qua); computation of the excitation and update of the filter memories for the next sub-frame (Syn_Filt). Outputs: LSP index, pitch index, code index, gains index.]
4.2.2.1 Short-term prediction
Short-term prediction (LP or LPC analysis) shall be performed every 30 ms. The auto-correlation
approach shall be used with an asymmetric analysis window. The LP analysis window consists of two
halves of Hamming windows with different lengths. This window is given by:

$$w(n) = \begin{cases} 0{,}54 - 0{,}46\cos\left(\dfrac{\pi n}{L_1 - 1}\right), & n = 0, \ldots, L_1 - 1 \\[4pt] 0{,}54 + 0{,}46\cos\left(\dfrac{\pi (n - L_1)}{L_2 - 1}\right), & n = L_1, \ldots, L_1 + L_2 - 1 \end{cases} \qquad (7)$$

A 32 ms analysis window (corresponding to 256 samples at the sampling frequency of 8 kHz) shall be used with values $L_1 = 216$ and $L_2 = 40$. The window shall be positioned such that 40 samples are taken from the future frame (look-ahead of 40 samples).

The auto-correlations of the windowed speech $s'(n)$, $n = 0, \ldots, 255$, are computed by:

$$r(k) = \sum_{n=k}^{255} s'(n)\, s'(n-k), \quad k = 0, \ldots, 10 \qquad (8)$$
and a 60 Hz bandwidth expansion has to be applied by lag windowing the auto-correlations using the window (see annex C {2}):

$$w_{lag}(i) = \exp\left[-\frac{1}{2}\left(\frac{2\pi f_0 i}{f_s}\right)^2\right], \quad i = 1, \ldots, 10 \qquad (9)$$
where $f_0 = 60$ Hz is the bandwidth expansion and $f_s = 8\,000$ Hz is the sampling frequency. Further, $r(0)$ is multiplied by 1,00005, which is equivalent to adding a noise floor at -43 dB. In the TETRA coder, this is instead performed by dividing the lag window of equation (9) by 1,00005, resulting in $w'_{lag}(0) = 1$ and:

$$w'_{lag}(i) = \frac{w_{lag}(i)}{1{,}00005}, \quad i = 1, \ldots, 10 \qquad (10)$$
The modified auto-correlations

$$r'(k) = r(k)\, w'_{lag}(k), \quad k = 0, \ldots, 10 \qquad (11)$$

are used to obtain the LP filter coefficients $a_k$, $k = 1, \ldots, 10$, by solving the set of equations:

$$\sum_{k=1}^{10} a_k\, r'(|i-k|) = -r'(i), \quad i = 1, \ldots, 10 \qquad (12)$$

The set of equations in (12) shall be solved using the Levinson-Durbin algorithm (see annex C {3}).
4.2.2.2 LP to LSP and LSP to LP conversion
The LP filter coefficients of $A(z)$ ($a_k$, $k = 1, \ldots, 10$) shall be converted to the Line Spectral Pair (LSP) representation (see annex C {4}) for quantization and interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the sum and difference polynomials:

$$F'_1(z) = A(z) + z^{-11} A(z^{-1}) \qquad (13)$$

and

$$F'_2(z) = A(z) - z^{-11} A(z^{-1}) \qquad (14)$$

respectively. It can be proven that all roots of these polynomials lie on the unit circle and that they alternate with each other (see annex C {5}). $F'_1(z)$ has a root $z = -1$ ($\omega = \pi$) and $F'_2(z)$ has a root $z = 1$ ($\omega = 0$).
To eliminate these two roots, new polynomials are defined:

$$F_1(z) = \frac{F'_1(z)}{1 + z^{-1}} \qquad (15)$$

and

$$F_2(z) = \frac{F'_2(z)}{1 - z^{-1}} \qquad (16)$$

Each polynomial has 5 conjugate roots on the unit circle ($e^{\pm j\omega_i}$); therefore, the polynomials can be written as:

$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2q_i z^{-1} + z^{-2}\right) \qquad (17)$$

and

$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2q_i z^{-1} + z^{-2}\right) \qquad (18)$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the Line Spectral Frequencies (LSFs). They satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. The $q_i$ are referred to as the LSPs in the cosine domain.
The first five coefficients of each of the symmetric polynomials $F_1(z)$ and $F_2(z)$ are found by the recursive relations (for $i = 0$ to 4, with $f_1(0) = f_2(0) = 1$):

$$\begin{aligned} f_1(i+1) &= a_{i+1} + a_{p-i} - f_1(i) \\ f_2(i+1) &= a_{i+1} - a_{p-i} + f_2(i) \end{aligned} \qquad (19)$$
The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root, and the sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials have to be used to evaluate $F_1(z)$ and $F_2(z)$ (see annex C {6}). This method is computationally efficient since it bypasses the cosine computations: the roots are found directly in the cosine domain $\{q_i\}$. In the TETRA codec implementation, quantization and interpolation of the LSPs are performed in the cosine domain, so no trigonometric computations are needed to convert to the frequency domain.
Each of the polynomials $F_1(z)$ and $F_2(z)$, evaluated at $z = e^{j\omega}$, is given by:

$$F(\omega) = 2e^{-j5\omega}\left[T_5(x) + f(1)T_4(x) + f(2)T_3(x) + f(3)T_2(x) + f(4)T_1(x) + \frac{f(5)}{2}\right] \qquad (20)$$

where $x = \cos\omega$, $T_m(x) = \cos(m\omega)$ is the $m$th order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (19). The details of the Chebyshev polynomial evaluation method are given in annex C {6}. If this numerical process is not able to find all 10 roots, the previously computed set of LSPs is used.
Once the LSPs are quantized and interpolated, they are converted back to the LP coefficient domain $\{a_i\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ and $F_2(z)$ are found by expanding equations (17) and (18), knowing the quantized and interpolated LSPs $q_i$, $i = 1, \ldots, 10$.
The following recursive relation shall be used to compute $f_1(i)$:

for $i = 1$ to 5:
    $f_1(i) = -2q_{2i-1} f_1(i-1) + 2f_1(i-2)$
    for $j = i-1$ down to 1:
        $f_1(j) = f_1(j) - 2q_{2i-1} f_1(j-1) + f_1(j-2)$

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$. Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F'_1(z)$ and $F'_2(z)$; that is:

$$f'_1(i) = f_1(i) + f_1(i-1), \qquad f'_2(i) = f_2(i) - f_2(i-1), \qquad i = 1, \ldots, 5$$

Finally, the LP coefficients are found by:

$$a_i = \begin{cases} 0{,}5\, f'_1(i) + 0{,}5\, f'_2(i), & i = 1, \ldots, 5 \\ 0{,}5\, f'_1(11-i) - 0{,}5\, f'_2(11-i), & i = 6, \ldots, 10 \end{cases}$$

This is directly derived from the relation $A(z) = \left(F'_1(z) + F'_2(z)\right)/2$, considering the fact that $F'_1(z)$ and $F'_2(z)$ are symmetrical and anti-symmetrical polynomials, respectively.
4.2.2.3 Quantization and interpolation of LP parameters
The computed LP parameters have to be converted to LSPs and quantized with 26 bits using split VQ.
NOTE: Both the quantization and interpolation are performed on the LSPs in the cosine domain; that is:

$$q_i = \cos\left(\frac{2\pi f_i}{f_s}\right), \quad i = 1, \ldots, 10 \qquad (21)$$

where $f_i$ are the line spectral frequencies in Hz and $f_s$ is the sampling frequency.
The LSP vector $q$ shall be split into three sub-vectors of length 3, 3 and 4. The first sub-vector $\{q_1, q_2, q_3\}$ shall be quantized with 8 bits, while the sub-vectors $\{q_4, q_5, q_6\}$ and $\{q_7, q_8, q_9, q_{10}\}$ shall each be quantized with 9 bits. The search is performed using Mean Square Error (MSE) minimization in the $q$ domain with no LSP weighting.
The quantized LP parameters are used for the fourth sub-frame, whereas the first three sub-frames use a linear interpolation of the parameters of the present and previous frames. The interpolation is performed on the LSPs in the $q$ domain. Let $\hat{q}_n$ be the quantized LSP vector at the present frame and $\hat{q}_{n-1}$ the quantized LSP vector at the past frame. The interpolated LSP vectors at each of the 4 sub-frames are given by:

$$\begin{aligned} q_1 &= 0{,}75\,\hat{q}_{n-1} + 0{,}25\,\hat{q}_n \\ q_2 &= 0{,}50\,\hat{q}_{n-1} + 0{,}50\,\hat{q}_n \\ q_3 &= 0{,}25\,\hat{q}_{n-1} + 0{,}75\,\hat{q}_n \\ q_4 &= \hat{q}_n \end{aligned} \qquad (22)$$

The initial values of the past quantized LSP vector are given in Q15 by $\hat{q}_{-1}$ = {30 000, 26 000, 21 000, 15 000, 8 000, 0, -8 000, -15 000, -21 000, -26 000} (divide by $2^{15}$ to obtain the values in the range [-1, 1]). The interpolated LSP vectors shall be used to compute a different LP filter at each sub-frame.
4.2.2.4 Long-term prediction analysis
The aim of the long-term prediction analysis, or adaptive codebook search, is to find the best pitch parameters, which are the delay and gain values for the pitch filter. The pitch filter shall be implemented using the so-called adaptive codebook approach, whereby the excitation is repeated for delays less than the sub-frame length (60 samples). In this implementation, the excitation is extended by the LP residual in the search stage to simplify the closed-loop search. In the first sub-frame, a fractional pitch delay is used with a resolution of 1/3 in the range $19\tfrac{1}{3}$ to $84\tfrac{2}{3}$ and integers only in the range [85 - 143]. For the other sub-frames, a pitch resolution of 1/3 is always used in the range $T_1 - 5\tfrac{2}{3}$ to $T_1 + 4\tfrac{2}{3}$, where $T_1$ is the nearest integer to the fractional pitch lag of the first sub-frame.
To simplify the pitch analysis procedure, a two stage approach shall be used, comprising first an open
loop pitch search followed by a closed loop search.
The open-loop pitch has to be computed once every speech frame (30 ms) using a weighted speech signal $s_w(n)$. A pole-zero type weighting procedure shall be used to obtain $s_w(n)$. This procedure shall be performed with the help of a shaping filter $A(z/0{,}95)/A(z/0{,}60)$, for which the unquantized LP parameters shall be used.
The open-loop pitch search shall then be performed as follows. In a first step, 3 maxima of the correlation:

$$C_k = \sum_{j=0} s_w(2j)\, s_w(2j-k) \qquad (23)$$

are found in the three ranges [20 - 39], [40 - 79] and [80 - 142], respectively. The retained maxima $C_{k_i}$, $i = 1, \ldots, 3$, are normalized by dividing by $\sqrt{\sum_n s_w^2(n-k_i)}$, $i = 1, \ldots, 3$, respectively. The normalized maxima and corresponding delays are denoted by $(R_i, k_i)$, $i = 1, \ldots, 3$. The winner among the three normalized correlations is selected by favouring the delays in the lower ranges. That is, $k_i$ is selected if $R_i > 0{,}85\, R_{i+1}$. This procedure of dividing the delay range into 3 sections and favouring the lower sections is used to avoid choosing pitch multiples.
NOTE 1: The past weighted speech samples are initialized to zero.
Having found the open-loop pitch $T_{op}$, a closed-loop pitch analysis has to be performed around the open-loop pitch delay on a sub-frame basis. In the first sub-frame, the range $T_{op} \pm 2$, bounded by [20 - 143], is searched. For the other sub-frames, closed-loop pitch analysis is performed around the pitch selected in the first sub-frame. As mentioned earlier, a pitch resolution of 1/3 is always used for the other sub-frames in the range $T_1 - 5\tfrac{2}{3}$ to $T_1 + 4\tfrac{2}{3}$, where $T_1$ is the integer part of the first sub-frame pitch lag.
The pitch delay shall be encoded with 8 bits in the first sub-frame while the relative delays of the other
sub-frames shall be encoded with 5 bits per sub-frame.
The closed-loop pitch search shall be performed by minimizing the mean-square weighted error between the original and synthesized speech. This is achieved by maximizing the term:

$$\tau_k = \frac{\sum_{n=0}^{59} x(n)\, y_k(n)}{\sqrt{\sum_{n=0}^{59} y_k(n)\, y_k(n)}} \qquad (24)$$
where $x(n)$ is the target for the adaptive codebook search, given by the weighted input speech after subtracting the zero-input response of the weighted synthesis filter $H(z)W(z)$, and $y_k(n)$ is the past filtered excitation at delay $k$ (the past excitation is initialized to zero).
NOTE 2: The search range is limited around the open-loop pitch as explained earlier.
For delays $k < 60$, the excitation signal $u(n)$ is extended by the LP residual signal. Once the optimum integer pitch delay is determined, the fractions $-\tfrac{2}{3}$, $-\tfrac{1}{3}$, $\tfrac{1}{3}$ and $\tfrac{2}{3}$ around that integer are tested.
NOTE 3: For the first sub-frame, the fractions are tested only if the integer pitch lag is less than
85.
The fractional pitch search is performed by interpolating the normalized correlation in equation (24) and searching for its maximum. Once the non-integer pitch is determined, the adaptive codebook vector $v(n)$ is computed by interpolating the past excitation signal $u(n)$. The interpolation shall be performed using two FIR filters (Hamming windowed sinc functions): one for interpolating the term in equation (24), with the sinc truncated at ±12 (8 multiplications per fraction), and the other for interpolating the past excitation, with the sinc truncated at ±48 (32 multiplications per sample). The pitch gain is then found by:

$$g_p = \frac{\sum_{n=0}^{59} x(n)\, y(n)}{\sum_{n=0}^{59} y(n)\, y(n)}, \quad \text{bounded by } 0 \le g_p \le 1{,}2 \qquad (25)$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector (zero-state response of $H(z)W(z)$ to $v(n)$).
NOTE 4: Only positive pitch gains are allowed since by maximizing the term in equation (24) the
negative correlations are eliminated.
4.2.2.5 Algebraic codebook: structure and search
A 16-bit algebraic codebook shall be used in the innovative codebook search, the aim of which is to find
the best innovation and gain parameters. The innovation vector contains, at most, four non-zero pulses.
The 4 pulses can assume the amplitudes and positions given in the following table:
Table 2

| Codebook parameters | Positions of the pulses | Codebook bit allocation |
|---|---|---|
| Pulse amplitude: +1,4142 | 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58 | 5 |
| Pulse amplitude: -1 | 2, 10, 18, 26, 34, 42, 50, 58 | 3 |
| Pulse amplitude: +1 | 4, 12, 20, 28, 36, 44, 52, (60) | 3 |
| Pulse amplitude: -1 | 6, 14, 22, 30, 38, 46, 54, (62) | 3 |
| Global sign flag | | 1 |
| Shift flag | | 1 |
The pulses shall have fixed amplitudes of +1,4142, -1, +1 and -1, respectively. The first pulse position
shall be encoded with 5 bits while the positions of the other pulses shall be encoded with 3 bits. The
positions of all pulses can be simultaneously shifted by one, to occupy odd positions. One bit shall be
used to encode this shift and a global sign bit shall be used to invert all pulses simultaneously, giving a
total of 16 bits.
NOTE 1: From table 2, it is possible to position the last two pulses outside the sub-frame, which indicates that these pulses are not present.
The codebook is searched by minimizing the mean squared error between the weighted input speech and the weighted synthesized speech. The target signal used in the closed-loop pitch search is updated by subtracting the adaptive codebook contribution. That is, the target for the innovation is computed using:

$$x_2(n) = x(n) - g_p\, y(n), \quad n = 0, \ldots, 59 \qquad (26)$$

where $y(n) = v(n) * h(n)$ is the filtered adaptive codebook vector, with $h(n)$ being the impulse response of the weighted synthesis filter $H(z)W(z) = 1/A(z/\gamma)$.
As described in subclause 4.1, the algebraic codebook is dynamically shaped to enhance the important frequency regions. The shaping matrix used is a lower triangular convolution matrix consisting of the impulse response of the filter $F(z)$ in equation (4). Thus the shaping can be performed as a filtering process. To maintain the simplicity of the algebraic codebook search, the filter $F(z)$ is combined with the weighted synthesis filter $H(z)W(z)$, and the impulse response $h'(n)$ of the combined filter is computed (see annex C {1}). If $c_k$ is the algebraic codeword at index $k$, then the algebraic codebook is searched by maximizing the term:

$$\tau_k = \frac{C_k^2}{\varepsilon_k} = \frac{\left(d^t c_k\right)^2}{c_k^t\, \Phi\, c_k} \qquad (27)$$

where $H$ is a lower triangular Toeplitz convolution matrix with diagonal $h'(0)$ and lower diagonals $h'(1), \ldots, h'(59)$, $d = H^t x_2$ is the backward filtered target vector and $\Phi = H^t H$.
The algebraic structure of the codebook allows for very fast search procedures since the innovation vector $c_k$ contains only 4 non-zero pulses. The search shall be performed in 4 nested loops, corresponding to each pulse position, where in each loop the contributio
...
