ETSI EN 301 245 V4.0.1 (1997-12)
Digital cellular telecommunications system (Phase 2) (GSM); Enhanced Full Rate (EFR) speech transcoding (GSM 06.60 version 4.0.1)
DEN/SMG-110660P
Digital cellular telecommunications system (Phase 2) - Enhanced Full Rate (EFR) speech transcoding (GSM 06.60, version 4.0.1)
SLOVENIAN STANDARD
SIST EN 301 245 V4.0.1:2003
1 December 2003

Digital cellular telecommunications system (Phase 2) (GSM); Enhanced Full Rate (EFR) speech transcoding (GSM 06.60 version 4.0.1)

ICS: 33.070.50 Global System for Mobile Communication (GSM)

This Slovenian standard is identical to: EN 301 245 Version 4.0.1

Language: en

2003-01. Slovenski inštitut za standardizacijo (Slovenian Institute for Standardization). Reproduction of this standard, in whole or in part, is not permitted.
European Telecommunications Standards Institute
EN 301 245 V4.0.1 (1997-12)
European Standard (Telecommunications series)

Digital cellular telecommunications system (Phase 2);
Enhanced Full Rate (EFR) speech transcoding
(GSM 06.60 version 4.0.1)

GLOBAL SYSTEM FOR MOBILE COMMUNICATIONS
Reference
DEN/SMG-110660P (bj00200o.PDF)

Keywords
EFR, digital cellular telecommunications system, Global System for Mobile communications (GSM), speech

ETSI Secretariat

Postal address
F-06921 Sophia Antipolis Cedex - FRANCE

Office address
650 Route des Lucioles - Sophia Antipolis
Valbonne - FRANCE
Tel.: +33 4 92 94 42 00
Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Non-profit association registered at the Sous-Préfecture de Grasse (06) N° 7803/88

X.400
c= fr; a=atlas; p=etsi; s=secretariat

Internet
secretariat@etsi.fr
http://www.etsi.fr

Copyright Notification
No part may be reproduced except as authorized by written permission. The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 1997. All rights reserved.
Contents

Intellectual Property Rights
Foreword
1 Scope
2 Normative references
3 Definitions, symbols and abbreviations
  3.1 Definitions
  3.2 Symbols
  3.3 Abbreviations
4 Outline description
  4.1 Functional description of audio parts
  4.2 Preparation of speech samples
    4.2.1 PCM format conversion
  4.3 Principles of the GSM enhanced full rate speech encoder
  4.4 Principles of the GSM enhanced full rate speech decoder
  4.5 Sequence and subjective importance of encoded parameters
5 Functional description of the encoder
  5.1 Pre-processing
  5.2 Linear prediction analysis and quantization
    5.2.1 Windowing and auto-correlation computation
    5.2.2 Levinson-Durbin algorithm
    5.2.3 LP to LSP conversion
    5.2.4 LSP to LP conversion
    5.2.5 Quantization of the LSP coefficients
    5.2.6 Interpolation of the LSPs
  5.3 Open-loop pitch analysis
  5.4 Impulse response computation
  5.5 Target signal computation
  5.6 Adaptive codebook search
  5.7 Algebraic codebook structure and search
  5.8 Quantization of the fixed codebook gain
  5.9 Memory update
6 Functional description of the decoder
  6.1 Decoding and speech synthesis
  6.2 Post-processing
    6.2.1 Adaptive post-filtering
    6.2.2 Up-scaling
7 Variables, constants and tables in the C-code of the GSM EFR codec
  7.1 Description of the constants and variables used in the C code
8 Homing sequences
  8.1 Functional description
  8.2 Definitions
  8.3 Encoder homing
  8.4 Decoder homing
  8.5 Encoder home state
  8.6 Decoder home state
9 Bibliography
History
Intellectual Property Rights

IPRs essential or potentially essential to the present document may have been declared to ETSI. The information pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found in ETR 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in respect of ETSI standards", which is available free of charge from the ETSI Secretariat. Latest updates are available on the ETSI Web server (http://www.etsi.fr/ipr).

Pursuant to the ETSI Interim IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee can be given as to the existence of other IPRs not referenced in ETR 314 (or the updates on http://www.etsi.fr/ipr) which are, or may be, or may become, essential to the present document.

Foreword

This European Standard (Telecommunications series) has been produced by the Special Mobile Group (SMG) Technical Committee of the European Telecommunications Standards Institute (ETSI).

This EN describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples within the digital cellular telecommunications system.

This EN corresponds to GSM technical specification, GSM 06.60, version 5.1.2.

National transposition dates
Date of adoption of this EN: 19 December 1997
Date of latest announcement of this EN (doa): 31 March 1998
Date of latest publication of new National Standard or endorsement of this EN (dop/e): 30 September 1998
Date of withdrawal of any conflicting National Standard (dow): 30 September 1998
1 Scope

This EN describes the detailed mapping from input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits, and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples. The sampling rate is 8 000 samples/s, leading to a bit rate for the encoded bit stream of 12.2 kbit/s. The coding scheme is the so-called Algebraic Code Excited Linear Prediction Coder, hereafter referred to as ACELP.

This EN also specifies the conversion between A-law PCM and 13-bit uniform PCM. Performance requirements for the audio input and output parts are included only to the extent that they affect the transcoder performance. This part also describes the codec down to the bit level, thus enabling the verification of compliance to the part to a high degree of confidence by use of a set of digital test sequences. These test sequences are described in GSM 06.54 [7] and are available on disks.

In case of discrepancy between the requirements described in this EN and the fixed point computational description (ANSI-C code) of these requirements contained in GSM 06.53 [6], the description in GSM 06.53 [6] will prevail.

The transcoding procedure specified in this EN is applicable for the enhanced full rate speech traffic channel (TCH) in the GSM system.

In GSM 06.51 [5], a reference configuration for the speech transmission chain of the GSM enhanced full rate (EFR) system is shown. According to this reference configuration, the speech encoder takes its input as a 13-bit uniform PCM signal either from the audio part of the Mobile Station or, on the network side, from the PSTN via an 8-bit/A-law to 13-bit uniform PCM conversion. The encoded speech at the output of the speech encoder is delivered to a channel encoder unit which is specified in GSM 05.03 [3]. In the receive direction, the inverse operations take place.

2 Normative references

This EN incorporates by dated and undated reference, provisions from other publications. These normative references are cited at the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent amendments to or revisions of any of these publications apply to this EN only when incorporated in it by amendment or revision. For undated references, the latest edition of the publication referred to applies.

[1] GSM 01.04 (ETR 100): "Digital cellular telecommunications system (Phase 2); Abbreviations and acronyms".
[2] GSM 03.50 (ETS 300 540): "Digital cellular telecommunications system (Phase 2); Transmission planning aspects of the speech service in the GSM Public Land Mobile Network (PLMN) system".
[3] GSM 05.03 (ETS 300 575): "Digital cellular telecommunications system (Phase 2); Channel coding".
[4] GSM 06.32 (ETS 300 580-6): "Digital cellular telecommunications system (Phase 2); Voice Activity Detection (VAD)".
[5] GSM 06.51 (EN 301 243): "Digital cellular telecommunications system (Phase 2); Enhanced Full Rate (EFR) speech processing functions General description".
[6] GSM 06.53 (EN 301 244): "Digital cellular telecommunications system (Phase 2); ANSI-C code for the GSM Enhanced Full Rate (EFR) speech codec".
[7] GSM 06.54 (EN 301 250): "Digital cellular telecommunications system (Phase 2); Test vectors for the GSM Enhanced Full Rate (EFR) speech codec".
[8] ITU-T Recommendation G.711 (1988): "Coding of analogue signals by pulse code modulation; Pulse code modulation (PCM) of voice frequencies".
[9] ITU-T Recommendation G.726: "40, 32, 24, 16 kbit/s adaptive differential pulse code modulation (ADPCM)".
3 Definitions, symbols and abbreviations

3.1 Definitions

For the purposes of this EN, the following definitions apply:

adaptive codebook: The adaptive codebook contains excitation vectors that are adapted for every subframe. The adaptive codebook is derived from the long term filter state. The lag value can be viewed as an index into the adaptive codebook.
adaptive postfilter: This filter is applied to the output of the short term synthesis filter to enhance the perceptual quality of the reconstructed speech. In the GSM enhanced full rate codec, the adaptive postfilter is a cascade of two filters: a formant postfilter and a tilt compensation filter.
algebraic codebook: A fixed codebook where an algebraic code is used to populate the excitation vectors (innovation vectors). The excitation contains a small number of nonzero pulses with predefined interlaced sets of positions.
closed-loop pitch analysis: This is the adaptive codebook search, i.e., a process of estimating the pitch (lag) value from the weighted input speech and the long term filter state. In the closed-loop search, the lag is searched using an error minimization loop (analysis-by-synthesis). In the GSM enhanced full rate codec, closed-loop pitch search is performed for every subframe.
direct form coefficients: One of the formats for storing the short term filter parameters. In the GSM enhanced full rate codec, all filters which are used to modify speech samples use direct form coefficients.
fixed codebook: The fixed codebook contains excitation vectors for speech synthesis filters. The contents of the codebook are non-adaptive (i.e., fixed). In the GSM enhanced full rate codec, the fixed codebook is implemented using an algebraic codebook.
fractional lags: A set of lag values having sub-sample resolution. In the GSM enhanced full rate codec a sub-sample resolution of 1/6th of a sample is used.
frame: A time interval equal to 20 ms (160 samples at an 8 kHz sampling rate).
integer lags: A set of lag values having whole sample resolution.
interpolating filter: An FIR filter used to produce an estimate of sub-sample resolution samples, given an input sampled with integer sample resolution.
inverse filter: This filter removes the short term correlation from the speech signal. The filter models an inverse frequency response of the vocal tract.
lag: The long term filter delay. This is typically the true pitch period, or a multiple or sub-multiple of it.
Line Spectral Frequencies: (see Line Spectral Pair)
Line Spectral Pair: Transformation of LPC parameters. Line Spectral Pairs are obtained by decomposing the inverse filter transfer function A(z) to a set of two transfer functions, one having even symmetry and the other having odd symmetry. The Line Spectral Pairs (also called Line Spectral Frequencies) are the roots of these polynomials on the z-unit circle.
LP analysis window: For each frame, the short term filter coefficients are computed using the high pass filtered speech samples within the analysis window. In the GSM enhanced full rate codec, the length of the analysis window is 240 samples. For each frame, two asymmetric windows are used to generate two sets of LP coefficients. No samples of the future frames are used (no lookahead).
LP coefficients: Linear Prediction (LP) coefficients (also referred to as Linear Predictive Coding (LPC) coefficients) is a generic descriptive term for describing the short term filter coefficients.
open-loop pitch search: A process of estimating the near optimal lag directly from the weighted speech input. This is done to simplify the pitch analysis and confine the closed-loop pitch search to a small number of lags around the open-loop estimated lags. In the GSM enhanced full rate codec, open-loop pitch search is performed every 10 ms.
residual: The output signal resulting from an inverse filtering operation.
short term synthesis filter: This filter introduces, into the excitation signal, short term correlation which models the impulse response of the vocal tract.
perceptual weighting filter: This filter is employed in the analysis-by-synthesis search of the codebooks. The filter exploits the noise masking properties of the formants (vocal tract resonances) by weighting the error less in regions near the formant frequencies and more in regions away from them.
subframe: A time interval equal to 5 ms (40 samples at an 8 kHz sampling rate).
vector quantization: A method of grouping several parameters into a vector and quantizing them simultaneously.
zero input response: The output of a filter due to past inputs, i.e. due to the present state of the filter, given that an input of zeros is applied.
zero state response: The output of a filter due to the present input, given that no past inputs have been applied, i.e., given the state information in the filter is all zeroes.

3.2 Symbols

For the purposes of this EN, the following symbols apply:

$A(z)$: The inverse filter with unquantized coefficients
$\hat{A}(z)$: The inverse filter with quantified coefficients
$H(z) = 1/\hat{A}(z)$: The speech synthesis filter with quantified coefficients
$a_i$: The unquantized linear prediction parameters (direct form coefficients)
$\hat{a}_i$: The quantified linear prediction parameters
$m$: The order of the LP model
$1/B(z)$: The long-term synthesis filter
$W(z)$: The perceptual weighting filter (unquantized coefficients)
$\gamma_1, \gamma_2$: The perceptual weighting factors
$F_E(z)$: Adaptive pre-filter
$T$: The nearest integer pitch lag to the closed-loop fractional pitch lag of the subframe
$\beta$: The adaptive pre-filter coefficient (the quantified pitch gain)
$H_f(z) = \hat{A}(z/\gamma_n) / \hat{A}(z/\gamma_d)$: The formant postfilter
$\gamma_n$: Control coefficient for the amount of the formant post-filtering
$\gamma_d$: Control coefficient for the amount of the formant post-filtering
$H_t(z)$: Tilt compensation filter
$\gamma_t$: Control coefficient for the amount of the tilt compensation filtering
$\mu = \gamma_t k_1'$: A tilt factor, with $k_1'$ being the first reflection coefficient
$h_f(n)$: The truncated impulse response of the formant postfilter
$L_h$: The length of $h_f(n)$
$r_h(i)$: The auto-correlations of $h_f(n)$
$\hat{A}(z/\gamma_n)$: The inverse filter (numerator) part of the formant postfilter
$1/\hat{A}(z/\gamma_d)$: The synthesis filter (denominator) part of the formant postfilter
$\hat{r}(n)$: The residual signal of the inverse filter $\hat{A}(z/\gamma_n)$
$h_t(z)$: Impulse response of the tilt compensation filter
$\beta_{sc}(n)$: The AGC-controlled gain scaling factor of the adaptive postfilter
$\alpha$: The AGC factor of the adaptive postfilter
$H_{h1}(z)$: Pre-processing high-pass filter
$w_I(n)$, $w_{II}(n)$: LP analysis windows
$L_1^{(I)}$: Length of the first part of the LP analysis window $w_I(n)$
$L_2^{(I)}$: Length of the second part of the LP analysis window $w_I(n)$
$L_1^{(II)}$: Length of the first part of the LP analysis window $w_{II}(n)$
$L_2^{(II)}$: Length of the second part of the LP analysis window $w_{II}(n)$
$r_{ac}(k)$: The auto-correlations of the windowed speech $s'(n)$
$w_{lag}(i)$: Lag window for the auto-correlations (60 Hz bandwidth expansion)
$f_0$: The bandwidth expansion in Hz
$f_s$: The sampling frequency in Hz
$r'_{ac}(k)$: The modified (bandwidth expanded) auto-correlations
$E_{LD}(i)$: The prediction error in the ith iteration of the Levinson algorithm
$k_i$: The ith reflection coefficient
$a_j^{(i)}$: The jth direct form coefficient in the ith iteration of the Levinson algorithm
$F_1'(z)$: Symmetric LSF polynomial
$F_2'(z)$: Antisymmetric LSF polynomial
$F_1(z)$: Polynomial $F_1'(z)$ with root $z = -1$ eliminated
$F_2(z)$: Polynomial $F_2'(z)$ with root $z = 1$ eliminated
$q_i$: The line spectral pairs (LSPs) in the cosine domain
$\mathbf{q}$: An LSP vector in the cosine domain
$\hat{\mathbf{q}}_i^{(n)}$: The quantified LSP vector at the ith subframe of the frame n
$\omega_i$: The line spectral frequencies (LSFs)
$T_m(x)$: An mth order Chebyshev polynomial
$f_1(i), f_2(i)$: The coefficients of the polynomials $F_1(z)$ and $F_2(z)$
$f_1'(i), f_2'(i)$: The coefficients of the polynomials $F_1'(z)$ and $F_2'(z)$
$f(i)$: The coefficients of either $F_1(z)$ or $F_2(z)$
$C(x)$: Sum polynomial of the Chebyshev polynomials
$x$: Cosine of angular frequency $\omega$
$\lambda_k$: Recursion coefficients for the Chebyshev polynomial evaluation
$f_i$: The line spectral frequencies (LSFs) in Hz
$\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$: The vector representation of the LSFs in Hz
$\mathbf{z}^{(1)}(n)$, $\mathbf{z}^{(2)}(n)$: The mean-removed LSF vectors at frame n
$\mathbf{r}^{(1)}(n)$, $\mathbf{r}^{(2)}(n)$: The LSF prediction residual vectors at frame n
$\mathbf{p}(n)$: The predicted LSF vector at frame n
$\hat{\mathbf{r}}^{(2)}(n-1)$: The quantified second residual vector at the past frame
$\hat{\mathbf{f}}^k$: The quantified LSF vector at quantization index k
$E_{LSP}$: The LSP quantization error
$w_i, i = 1, \ldots, 10$: LSP-quantization weighting factors
$d_i$: The distance between the line spectral frequencies $f_{i+1}$ and $f_{i-1}$
$h(n)$: The impulse response of the weighted synthesis filter
$O_k$: The correlation maximum of open-loop pitch analysis at delay k
$O_{t_i}, i = 1, \ldots, 3$: The correlation maxima at delays $t_i, i = 1, \ldots, 3$
$(M_i, t_i), i = 1, \ldots, 3$: The normalized correlation maxima $M_i$ and the corresponding delays $t_i, i = 1, \ldots, 3$
$H(z)W(z) = \dfrac{A(z/\gamma_1)}{\hat{A}(z)\,A(z/\gamma_2)}$: The weighted synthesis filter
$A(z/\gamma_1)$: The numerator of the perceptual weighting filter
$1/A(z/\gamma_2)$: The denominator of the perceptual weighting filter
$T_1$: The nearest integer to the fractional pitch lag of the previous (1st or 3rd) subframe
$s'(n)$: The windowed speech signal
$s_w(n)$: The weighted speech signal
$\hat{s}(n)$: Reconstructed speech signal
$\hat{s}'(n)$: The gain-scaled post-filtered signal
$\hat{s}_f(n)$: Post-filtered speech signal (before scaling)
$x(n)$: The target signal for adaptive codebook search
$x_2(n)$, $\mathbf{x}_2^t$: The target signal for algebraic codebook search
$res_{LP}(n)$: The LP residual signal
$c(n)$: The fixed codebook vector
$v(n)$: The adaptive codebook vector
$y(n) = v(n) * h(n)$: The filtered adaptive codebook vector
$y_k(n)$: The past filtered excitation
$u(n)$: The excitation signal
$\hat{u}(n)$: The emphasized adaptive codebook vector
$\hat{u}'(n)$: The gain-scaled emphasized excitation signal
$T_{op}$: The best open-loop lag
$t_{min}$: Minimum lag search value
$t_{max}$: Maximum lag search value
$R(k)$: Correlation term to be maximized in the adaptive codebook search
$b_{24}$: The FIR filter for interpolating the normalized correlation term $R(k)$
$R(k)_t$: The interpolated value of $R(k)$ for the integer delay k and fraction t
$b_{60}$: The FIR filter for interpolating the past excitation signal $u(n)$ to yield the adaptive codebook vector $v(n)$
$A_k$: Correlation term to be maximized in the algebraic codebook search at index k
$C_k$: The correlation in the numerator of $A_k$ at index k
$E_{Dk}$: The energy in the denominator of $A_k$ at index k
$\mathbf{d} = \mathbf{H}^t \mathbf{x}_2$: The correlation between the target signal $x_2(n)$ and the impulse response $h(n)$, i.e., backward filtered target
$\mathbf{H}$: The lower triangular Toeplitz convolution matrix with diagonal $h(0)$ and lower diagonals $h(1), \ldots, h(39)$
$\Phi = \mathbf{H}^t \mathbf{H}$: The matrix of correlations of $h(n)$
$d(n)$: The elements of the vector $\mathbf{d}$
$\phi(i, j)$: The elements of the symmetric matrix $\Phi$
$\mathbf{c}_k$: The innovation vector
$C$: The correlation in the numerator of $A_k$
$m_i$: The position of the ith pulse
$\vartheta_i$: The amplitude of the ith pulse
$N_p$: The number of pulses in the fixed codebook excitation
$E_D$: The energy in the denominator of $A_k$
$res_{LTP}(n)$: The normalized long-term prediction residual
$b(n)$: The sum of the normalized $d(n)$ vector and normalized long-term prediction residual $res_{LTP}(n)$
$s_b(n)$: The sign signal for the algebraic codebook search
$d'(n)$: Sign extended backward filtered target
$\phi'(i, j)$: The modified elements of the matrix $\Phi$, including sign information
$\mathbf{z}^t$, $z(n)$: The fixed codebook vector convolved with $h(n)$
$E(n)$: The mean-removed innovation energy (in dB)
$\bar{E}$: The mean of the innovation energy
$\tilde{E}(n)$: The predicted energy
$[b_1\ b_2\ b_3\ b_4]$: The MA prediction coefficients
$\hat{R}(k)$: The quantified prediction error at subframe k
$E_I$: The mean innovation energy
$R(n)$: The prediction error of the fixed-codebook gain quantization
$E_Q$: The quantization error of the fixed-codebook gain quantization
$e(n)$: The states of the synthesis filter $1/\hat{A}(z)$
$e_w(n)$: The perceptually weighted error of the analysis-by-synthesis search
$\eta$: The gain scaling factor for the emphasized excitation
$g_c$: The fixed-codebook gain
$g_c'$: The predicted fixed-codebook gain
$\hat{g}_c$: The quantified fixed codebook gain
$g_p$: The adaptive codebook gain
$\hat{g}_p$: The quantified adaptive codebook gain
$\gamma_{gc} = g_c / g_c'$: A correction factor between the gain $g_c$ and the estimated one $g_c'$
$\hat{\gamma}_{gc}$: The optimum value for $\gamma_{gc}$
$\gamma_{sc}$: Gain scaling factor

3.3 Abbreviations

For the purposes of this EN, the following abbreviations apply. Further GSM related abbreviations may be found in GSM 01.04 [1].

ACELP: Algebraic Code Excited Linear Prediction
AGC: Adaptive Gain Control
CELP: Code Excited Linear Prediction
FIR: Finite Impulse Response
ISPP: Interleaved Single-Pulse Permutation
LP: Linear Prediction
LPC: Linear Predictive Coding
LSF: Line Spectral Frequency
LSP: Line Spectral Pair
LTP: Long Term Predictor (or Long Term Prediction)
MA: Moving Average

4 Outline description

This EN is structured as follows:

Section 4.1 contains a functional description of the audio parts including the A/D and D/A functions. Section 4.2 describes the conversion between 13-bit uniform and 8-bit A-law samples. Sections 4.3 and 4.4 present a simplified description of the principles of the GSM EFR encoding and decoding process respectively. In subclause 4.5, the sequence and subjective importance of encoded parameters are given.

Section 5 presents the functional description of the GSM EFR encoding, whereas clause 6 describes the decoding procedures. Section 7 describes variables, constants and tables of the C-code of the GSM EFR codec.

4.1 Functional description of audio parts

The analogue-to-digital and digital-to-analogue conversion will in principle comprise the following elements:

1) Analogue to uniform digital PCM
- microphone;
- input level adjustment device;
- input anti-aliasing filter;
- sample-hold device sampling at 8 kHz;
- analogue-to-uniform digital conversion to 13-bit representation.
The uniform format shall be represented in two's complement.

2) Uniform digital PCM to analogue
- conversion from 13-bit/8 kHz uniform PCM to analogue;
- a hold device;
- reconstruction filter including x/sin( x ) correction;
- output level adjustment device;
- earphone or loudspeaker.

In the terminal equipment, the A/D function may be achieved either
- by direct conversion to 13-bit uniform PCM format;
- or by conversion to 8-bit/A-law companded format, based on a standard A-law codec/filter according to ITU-T Recommendations G.711 [8] and G.714, followed by the 8-bit to 13-bit conversion as specified in subclause 4.2.1.

For the D/A operation, the inverse operations take place.

In the latter case it should be noted that the specifications in ITU-T G.714 (superseded by G.712) are concerned with PCM equipment located in the central parts of the network. When used in the terminal equipment, this EN does not on its own ensure sufficient out-of-band attenuation. The specification of out-of-band signals is defined in GSM 03.50 [2] in clause 2.

4.2 Preparation of speech samples

The encoder is fed with data comprising samples with a resolution of 13 bits, left justified in a 16-bit word. The three least significant bits are set to '0'. The decoder outputs data in the same format. Outside the speech codec, further processing must be applied if the traffic data occurs in a different representation.

4.2.1 PCM format conversion

The conversion between 8-bit A-Law compressed data and linear data with 13-bit resolution at the speech encoder input shall be as defined in ITU-T Rec. G.711 [8].

ITU-T Rec. G.711 [8] specifies the A-Law to linear conversion and vice versa by providing table entries. Examples of how to perform the conversion by fixed-point arithmetic can be found in ITU-T Rec. G.726 [9]. Section 4.2.1 of G.726 [9] describes A-Law to linear expansion and subclause 4.2.7 of G.726 [9] provides a solution for linear to A-Law compression.

4.3 Principles of the GSM enhanced full rate speech encoder

The codec is based on the code-excited linear predictive (CELP) coding model.
A 10th order linear prediction (LP), or short-term, synthesis filter is used which is given by:

$$H(z) = \frac{1}{\hat{A}(z)} = \frac{1}{1 + \sum_{i=1}^{m} \hat{a}_i z^{-i}}, \qquad (1)$$

where $\hat{a}_i, i = 1, \ldots, m$, are the (quantified) linear prediction (LP) parameters, and $m = 10$ is the predictor order. The long-term, or pitch, synthesis filter is given by:

$$\frac{1}{B(z)} = \frac{1}{1 - g_p z^{-T}}, \qquad (2)$$

where $T$ is the pitch delay and $g_p$ is the pitch gain. The pitch synthesis filter is implemented using the so-called adaptive codebook approach.

The CELP speech synthesis model is shown in figure 2. In this model, the excitation signal at the input of the short-term LP synthesis filter is constructed by adding two excitation vectors from adaptive and fixed (innovative) codebooks. The speech is synthesized by feeding the two properly chosen vectors from these codebooks through the short-term synthesis filter. The optimum excitation sequence in a codebook is chosen using an analysis-by-synthesis search procedure in which the error between the original and synthesized speech is minimized according to a perceptually weighted distortion measure.
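As a rough illustration of the two synthesis filters of equations (1) and (2), the following Python sketch filters a signal through an all-pole short-term filter and through the one-tap pitch filter. It is a floating-point sketch under stated assumptions, not the bit-exact fixed-point procedure of GSM 06.53 [6]: the function names `lp_synthesis` and `pitch_synthesis` are introduced here for illustration only, the lag is restricted to an integer, and the codec itself realizes the pitch filter through the adaptive codebook rather than as a literal recursive filter.

```python
def lp_synthesis(excitation, a, memory=None):
    """Filter `excitation` through H(z) = 1 / (1 + sum_{i=1..m} a[i-1] z^-i).

    `a` holds the m direct form coefficients a_1..a_m (hypothetical input).
    `memory` optionally holds the m previous outputs, most recent first.
    """
    m = len(a)
    past = list(memory) if memory is not None else [0.0] * m
    out = []
    for u in excitation:
        # s(n) = u(n) - sum_i a_i * s(n - i)
        s = u - sum(a[i] * past[i] for i in range(m))
        out.append(s)
        past = [s] + past[:-1]  # shift the filter state by one sample
    return out


def pitch_synthesis(innovation, gain, lag):
    """Filter through 1/B(z) = 1 / (1 - g_p z^-T) with an integer lag T.

    Samples before the start of the buffer are taken as zero.
    """
    out = []
    for n, c in enumerate(innovation):
        previous = out[n - lag] if n >= lag else 0.0
        out.append(c + gain * previous)  # u(n) = c(n) + g_p * u(n - T)
    return out
```

For example, `lp_synthesis([1.0, 0.0, 0.0], [-0.5])` produces the decaying impulse response of a first-order filter, and `pitch_synthesis([1.0, 0.0, 0.0, 0.0, 0.0], 0.5, 2)` repeats the initial pulse every two samples at half the previous amplitude.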
The perceptual weighting filter used in the analysis-by-synthesis search technique is given by:

$$W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}, \qquad (3)$$

where $A(z)$ is the unquantized LP filter and $0 < \gamma_2 < \gamma_1 \le 1$ are the perceptual weighting factors. The values $\gamma_1 = 0.9$ and $\gamma_2 = 0.6$ are used. The weighting filter uses the unquantized LP parameters while the formant synthesis filter uses the quantified ones.

The coder operates on speech frames of 20 ms corresponding to 160 samples at the sampling frequency of 8 000 samples/s. For every 160 speech samples, the speech signal is analysed to extract the parameters of the CELP model (LP filter coefficients, adaptive and fixed codebooks' indices and gains). These parameters are encoded and transmitted. At the decoder, these parameters are decoded and speech is synthesized by filtering the reconstructed excitation signal through the LP synthesis filter.

The signal flow at the encoder is shown in figure 3. LP analysis is performed twice per frame. The two sets of LP parameters are converted to line spectrum pairs (LSP) and jointly quantified using split matrix quantization (SMQ) with 38 bits. The speech frame is divided into 4 subframes of 5 ms each (40 samples). The adaptive and fixed codebook parameters are transmitted every subframe. The two sets of quantified and unquantized LP filters are used for the second and fourth subframes while in the first and third subframes interpolated LP filters are used (both quantified and unquantized).
An open-loop pitch lag is estimated twice per frame (every 10 ms) based on the perceptually weighted speech signal.

Then the following operations are repeated for each subframe:

- The target signal $x(n)$ is computed by filtering the LP residual through the weighted synthesis filter $W(z)H(z)$ with the initial states of the filters having been updated by filtering the error between LP residual and excitation (this is equivalent to the common approach of subtracting the zero input response of the weighted synthesis filter from the weighted speech signal).
- The impulse response, $h(n)$, of the weighted synthesis filter is computed.
- Closed-loop pitch analysis is then performed (to find the pitch lag and gain), using the target $x(n)$ and impulse response $h(n)$, by searching around the open-loop pitch lag. Fractional pitch with 1/6th of a sample resolution is used. The pitch lag is encoded with 9 bits in the first and third subframes and relatively encoded with 6 bits in the second and fourth subframes.
- The target signal $x(n)$ is updated by removing the adaptive codebook contribution (filtered adaptive codevector), and this new target, $x_2(n)$, is used in the fixed algebraic codebook search (to find the optimum innovation). An algebraic codebook with 35 bits is used for the innovative excitation.
- The gains of the adaptive and fixed codebook are scalar quantified with 4 and 5 bits respectively (with moving average (MA) prediction applied to the fixed codebook gain).
- Finally, the filter memories are updated (using the determined excitation signal) for finding the target signal in the next subframe.

The bit allocation of the codec is shown in table 1. In each 20 ms speech frame, 244 bits are produced, corresponding to a bit rate of 12.2 kbit/s. More detailed bit allocation is available in table 6. Note that the most significant bits (MSB) are always sent first.
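The frame arithmetic stated above can be checked with a few lines of Python. The sketch below sums the per-parameter bit counts given in the text and in table 1 and derives the bit rate from the 20 ms frame length. The names `LSP_BITS`, `PER_SUBFRAME_BITS`, `bits_per_frame` and `bit_rate_bps` are introduced here for illustration and are not taken from the ANSI-C code of GSM 06.53 [6].

```python
FRAME_MS = 20   # frame length in milliseconds
LSP_BITS = 38   # two LSP sets, jointly quantified once per frame

# Bits per subframe: (1st and 3rd subframes, 2nd and 4th subframes)
PER_SUBFRAME_BITS = {
    "pitch delay": (9, 6),
    "pitch gain": (4, 4),
    "algebraic code": (35, 35),
    "codebook gain": (5, 5),
}


def bits_per_frame():
    """Total number of encoded bits in one 20 ms frame."""
    total = LSP_BITS
    for odd_bits, even_bits in PER_SUBFRAME_BITS.values():
        # Two subframes use the first figure, two use the second.
        total += 2 * odd_bits + 2 * even_bits
    return total


def bit_rate_bps():
    """Bit rate in bit/s implied by the frame size and frame length."""
    return bits_per_frame() * 1000 // FRAME_MS
```

With these figures, `bits_per_frame()` gives 244 and `bit_rate_bps()` gives 12200, i.e. the 12.2 kbit/s quoted in the text.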
Table 1: Bit allocation of the 12.2 kbit/s coding algorithm for 20 ms frame

Parameter        1st & 3rd subframes   2nd & 4th subframes   Total per frame
2 LSP sets                                                   38
Pitch delay      9                     6                     30
Pitch gain       4                     4                     16
Algebraic code   35                    35                    140
Codebook gain    5                     5                     20
Total                                                        244

4.4 Principles of the GSM enhanced full rate speech decoder

The signal flow at the decoder is shown in figure 4. At the decoder, the transmitted indices are extracted from the received bitstream. The indices are decoded to obtain the coder parameters at each transmission frame. These parameters are the two LSP vectors, the 4 fractional pitch lags, the 4 innovative codevectors, and the 4 sets of pitch and innovative gains. The LSP vectors are converted to the LP filter coefficients and interpolated to obtain LP filters at each subframe. Then, at each 40-sample subframe:
- the excitation is constructed by adding the adaptive and innovative codevectors scaled by their respective gains;
- the speech is reconstructed by filtering the excitation through the LP synthesis filter.

Finally, the reconstructed speech signal is passed through an adaptive postfilter.

4.5 Sequence and subjective importance of encoded parameters

The encoder will produce the output information in a unique sequence and format, and the decoder must receive the same information in the same way. In table 6, the sequence of output bits s1 to s244 and the bit allocation for each parameter is shown.

The different parameters of the encoded speech and their individual bits have unequal importance with respect to subjective quality.
Before being submitted to the channel encoding function the bits have to be rearranged in the sequence of importance as given in table 6 in GSM 05.03 [3].

5 Functional description of the encoder

In this clause, the different functions of the encoder represented in figure 3 are described.

5.1 Pre-processing

Two pre-processing functions are applied prior to the encoding process: high-pass filtering and signal down-scaling.

Down-scaling consists of dividing the input by a factor of 2 to reduce the possibility of overflows in the fixed-point implementation.

The high-pass filter serves as a precaution against undesired low frequency components. A filter with a cut off frequency of 80 Hz is used, and it is given by:

$$H_{h1}(z) = \frac{0.92727435 - 1.8544941 z^{-1} + 0.92727435 z^{-2}}{1 - 1.9059465 z^{-1} + 0.9114024 z^{-2}}. \qquad (4)$$

Down-scaling and high-pass filtering are combined by dividing the coefficients at the numerator of $H_{h1}(z)$ by 2.
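The combined pre-processing step can be sketched in Python as a second-order recursive filter whose numerator coefficients are those of equation (4) divided by 2. This is a floating-point illustration only, assuming zero initial filter state; the function name `preprocess` and the constants `B` and `A` are hypothetical and the bit-exact behaviour is defined by the fixed-point code in GSM 06.53 [6].

```python
# Coefficients of equation (4): numerator B and denominator A.
B = (0.92727435, -1.8544941, 0.92727435)
A = (1.0, -1.9059465, 0.9114024)


def preprocess(samples):
    """High-pass filter and down-scale by 2 in a single pass.

    The numerator of H_h1(z) is halved so that the division by 2
    is absorbed into the filter, as described in the text.
    """
    b = [coefficient / 2.0 for coefficient in B]
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs (zero state)
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - A[1] * y1 - A[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

Feeding a constant (DC) input through `preprocess` leaves only a small residue once the transient has died out, whereas a signal alternating at half the sampling frequency passes at a little under half its input amplitude, which reflects the factor-of-2 down-scaling.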
5.2 Linear prediction analysis and quantization

Short-term prediction, or linear prediction (LP), analysis is performed twice per speech frame using the auto-correlation approach with 30 ms asymmetric windows. No lookahead is used in the auto-correlation computation.

The auto-correlations of windowed speech are converted to the LP coefficients using the Levinson-Durbin algorithm. Then the LP coefficients are transformed to the Line Spectral Pair (LSP) domain for quantization and interpolation purposes. The interpolated quantified and unquantized filter coefficients are converted back to the LP filter coefficients (to construct the synthesis and weighting filters at each subframe).

5.2.1 Windowing and auto-correlation computation

LP analysis is performed twice per frame using two different asymmetric windows. The first window has its weight concentrated at the second subframe and it consists of two halves of Hamming windows with different sizes. The window is given by:

$$w_I(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{\pi n}{L_1^{(I)} - 1}\right), & n = 0, \ldots, L_1^{(I)} - 1, \\[2ex] 0.54 + 0.46 \cos\left(\dfrac{\pi (n - L_1^{(I)})}{L_2^{(I)} - 1}\right), & n = L_1^{(I)}, \ldots, L_1^{(I)} + L_2^{(I)} - 1. \end{cases} \qquad (5)$$

The values $L_1^{(I)} = 160$ and $L_2^{(I)} = 80$ are used. The second window has its weight concentrated at the fourth subframe and it consists of two parts: the first part is half a Hamming window and the second part is a quarter of a cosine function cycle. The window is given by:

$$w_{II}(n) = \begin{cases} 0.54 - 0.46 \cos\left(\dfrac{2 \pi n}{2 L_1^{(II)} - 1}\right), & n = 0, \ldots, L_1^{(II)} - 1, \\[2ex] \cos\left(\dfrac{2 \pi (n - L_1^{(II)})}{4 L_2^{(II)} - 1}\right), & n = L_1^{(II)}, \ldots, L_1^{(II)} + L_2^{(II)} - 1, \end{cases} \qquad (6)$$

where the values $L_1^{(II)} = 232$ and $L_2^{(II)} = 8$ are used.

Note that both LP analyses are performed on the same set of speech samples. The windows are applied to 80 samples from the past speech frame in addition to the 160 samples of the present speech frame. No samples from future frames are used (no lookahead). A diagram of the two LP analysis windows is depicted below.

[Figure: the two LP analysis windows $w_I(n)$ and $w_{II}(n)$ plotted against time t across frame n-1 and frame n; frame = 20 ms (160 samples), subframe = 5 ms (40 samples)]
EN 301 245 V4.0.1 (1997-12)18GSM 06.60 version 4.0.1Figure 1: LP analysis windowsThe auto-correlations of the windowed speech snn'(),,,=0239, are computed by:rksnsnkkacnk()'()'(),,,,=-==å239010
(7)and a 60 Hz bandwidth expansion is used by lag windowing the auto-correlations using the window:wififilags(),,,,=-æèçöø÷éëêêùûúú=exp
12211002p(8)where f060= Hz is the bandwidth expansion and fs=8000 Hz is the sampling frequency. Further, rac()0 ismultiplied by the white noise correction factor 1.0001 which is equivalent to adding a noise floor at -40 dB.5.2.2Levinson-Durbin algorithmThe modified auto-correlations rracac'().()0100010=
and rkrkwkkacaclag'()()(),,,== 110 are used toobtain the direct form LP filter coefficients akk,,,,=110 by solving the set of equations.()arikriikackac''(),,,.-=-==å110110
(9)The set of equations in (9) is solved using the Levinson-Durbin algorithm. This algorithm uses the following recursion:[]EriakarijEiakjiaakaEikEiLDaciijiacjiLDiiijijiiijiLDiLD()'()'()/()()()()()()()()()()0011011111101101112====---==-=+=----=----åfor
to
do
for
to
do
end
endThe final solution is given as aajjj==(),,,10110.The LP filter coefficients are converted to the line spectral pair (LSP) representation for quantization and interpolationpurposes. The conversions to the LSP domain and back to the LP filter coefficient domain are described in the nextclause.5.2.3LP to LSP conversionThe LP filter coefficients akk,,,=110, are converted to the line spectral pair (LSP) representation for quantizationand interpolation purposes. For a 10th order LP filter, the LSPs are defined as the roots of the sum and differencepolynomials:FzAzzAz1111'()()()=+--(10)SIST EN 301 245 V4.0.1:2003
and

$$F_2'(z) = A(z) - z^{-11} A(z^{-1}), \tag{11}$$

respectively. The polynomials $F_1'(z)$ and $F_2'(z)$ are symmetric and anti-symmetric, respectively. It can be proven that all roots of these polynomials are on the unit circle and that they alternate with each other. $F_1'(z)$ has a root $z = -1$ ($\omega = \pi$) and $F_2'(z)$ has a root $z = 1$ ($\omega = 0$). To eliminate these two roots, we define the new polynomials:

$$F_1(z) = F_1'(z) / (1 + z^{-1}) \tag{12}$$

and

$$F_2(z) = F_2'(z) / (1 - z^{-1}). \tag{13}$$

Each polynomial has 5 conjugate roots on the unit circle $(e^{\pm j\omega_i})$; therefore, the polynomials can be written as

$$F_1(z) = \prod_{i=1,3,\ldots,9} \left(1 - 2q_i z^{-1} + z^{-2}\right) \tag{14}$$

and

$$F_2(z) = \prod_{i=2,4,\ldots,10} \left(1 - 2q_i z^{-1} + z^{-2}\right), \tag{15}$$

where $q_i = \cos(\omega_i)$, with $\omega_i$ being the line spectral frequencies (LSF), which satisfy the ordering property $0 < \omega_1 < \omega_2 < \ldots < \omega_{10} < \pi$. We refer to $q_i$ as the LSPs in the cosine domain.

Since both polynomials $F_1(z)$ and $F_2(z)$ are symmetric, only the first 5 coefficients of each polynomial need to be computed. The coefficients of these polynomials are found by the recursive relations (for i = 0 to 4):

$$\begin{aligned} f_1(i+1) &= a_{i+1} + a_{m-i} - f_1(i), \\ f_2(i+1) &= a_{i+1} - a_{m-i} + f_2(i), \end{aligned} \tag{16}$$

where m = 10 is the predictor order.

The LSPs are found by evaluating the polynomials $F_1(z)$ and $F_2(z)$ at 60 points equally spaced between 0 and $\pi$ and checking for sign changes. A sign change signifies the existence of a root, and the sign change interval is then divided 4 times to better track the root. The Chebyshev polynomials are used to evaluate $F_1(z)$ and $F_2(z)$. In this method the roots are found directly in the cosine domain $\{q_i\}$. The polynomials $F_1(z)$ or $F_2(z)$ evaluated at $z = e^{j\omega}$ can be written as:

$$F(\omega) = 2e^{-j5\omega}\, C(x),$$

with:

$$C(x) = T_5(x) + f(1)\,T_4(x) + f(2)\,T_3(x) + f(3)\,T_2(x) + f(4)\,T_1(x) + f(5)/2, \tag{17}$$
where $T_m(x) = \cos(m\omega)$ is the mth order Chebyshev polynomial, and $f(i)$, $i = 1, \ldots, 5$, are the coefficients of either $F_1(z)$ or $F_2(z)$, computed using the equations in (16). The polynomial $C(x)$ is evaluated at a certain value of $x = \cos(\omega)$ using the recursive relation:

$$\begin{array}{l}
\text{for } k = 4 \text{ down to } 1 \\
\quad \lambda_k = 2x\lambda_{k+1} - \lambda_{k+2} + f(5 - k) \\
\text{end} \\
C(x) = x\lambda_1 - \lambda_2 + f(5)/2,
\end{array}$$

with initial values $\lambda_5 = 1$ and $\lambda_6 = 0$. The details of the Chebyshev polynomial evaluation method are found in P. Kabal and R.P. Ramachandran [6].

5.2.4 LSP to LP conversion

Once the LSPs are quantized and interpolated, they are converted back to the LP coefficient domain $\{a_k\}$. The conversion to the LP domain is done as follows. The coefficients of $F_1(z)$ or $F_2(z)$ are found by expanding equations (14) and (15) knowing the quantized and interpolated LSPs $q_i$, $i = 1, \ldots, 10$. The following recursive relation is used to compute $f_1(i)$:

$$\begin{array}{l}
\text{for } i = 1 \text{ to } 5 \\
\quad f_1(i) = -2q_{2i-1}\, f_1(i-1) + 2 f_1(i-2) \\
\quad \text{for } j = i - 1 \text{ down to } 1 \\
\quad\quad f_1(j) = f_1(j) - 2q_{2i-1}\, f_1(j-1) + f_1(j-2) \\
\quad \text{end} \\
\text{end}
\end{array}$$

with initial values $f_1(0) = 1$ and $f_1(-1) = 0$. The coefficients $f_2(i)$ are computed similarly by replacing $q_{2i-1}$ by $q_{2i}$.

Once the coefficients $f_1(i)$ and $f_2(i)$ are found, $F_1(z)$ and $F_2(z)$ are multiplied by $1 + z^{-1}$ and $1 - z^{-1}$, respectively, to obtain $F_1'(z)$ and $F_2'(z)$; that is:

$$\begin{aligned} f_1'(i) &= f_1(i) + f_1(i-1), & i = 1, \ldots, 5, \\ f_2'(i) &= f_2(i) - f_2(i-1), & i = 1, \ldots, 5. \end{aligned} \tag{18}$$

Finally the LP coefficients are found by:

$$a_i = \begin{cases} 0.5\, f_1'(i) + 0.5\, f_2'(i), & i = 1, \ldots, 5, \\ 0.5\, f_1'(11 - i) - 0.5\, f_2'(11 - i), & i = 6, \ldots, 10. \end{cases} \tag{19}$$

This is directly derived from the relation $A(z) = (F_1'(z) + F_2'(z))/2$, and considering the fact that $F_1'(z)$ and $F_2'(z)$ are symmetric and anti-symmetric polynomials, respectively.
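The LSP to LP conversion of this clause can be sketched in floating point as follows. This is a minimal illustration, not the normative fixed-point procedure: the function names `expand_half` and `lsp_to_lp` and the list layout (index shifted by one so that $f(-1)$ can be stored) are assumptions made for the example.

```python
import math


def expand_half(q_subset):
    """Coefficients f(0)..f(5) of prod(1 - 2*q*z^-1 + z^-2) over five LSPs."""
    f = [0.0] * 7  # f[k + 1] holds f(k); f[0] holds f(-1) = 0
    f[1] = 1.0     # initial value f(0) = 1
    for i in range(1, 6):
        q = q_subset[i - 1]
        f[i + 1] = -2.0 * q * f[i] + 2.0 * f[i - 1]
        for j in range(i - 1, 0, -1):
            f[j + 1] += -2.0 * q * f[j] + f[j - 1]
    return f[1:]   # f(0), ..., f(5)


def lsp_to_lp(q):
    """Map ten cosine-domain LSPs q_1..q_10 to LP coefficients a_1..a_10."""
    f1 = expand_half(q[0::2])  # uses q_1, q_3, ..., q_9
    f2 = expand_half(q[1::2])  # uses q_2, q_4, ..., q_10
    # Equation (18): multiply by (1 + z^-1) and (1 - z^-1).
    f1p = [f1[i] + f1[i - 1] for i in range(1, 6)]  # f1'(1)..f1'(5)
    f2p = [f2[i] - f2[i - 1] for i in range(1, 6)]  # f2'(1)..f2'(5)
    # Equation (19): a_1..a_5, then a_6..a_10 by (anti-)symmetry.
    a = [0.5 * (f1p[i] + f2p[i]) for i in range(5)]
    a += [0.5 * (f1p[10 - i] - f2p[10 - i]) for i in range(6, 11)]
    return a


# Equally spaced LSFs (omega_i = i*pi/11) correspond to A(z) = 1,
# so every a_i comes out numerically close to zero.
flat = lsp_to_lp([math.cos(i * math.pi / 11) for i in range(1, 11)])
```

The equally spaced case is a convenient sanity check because $F_1'(z) = 1 + z^{-11}$ and $F_2'(z) = 1 - z^{-11}$ there, so their average is exactly 1.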
5.2.5 Quantization of the LSP coefficients

The two sets of LP filter coefficients per frame are quantized using the LSP representation in the frequency domain; that is:

$$f_i = \frac{f_s}{2\pi} \arccos(q_i), \quad i = 1, \ldots, 10, \tag{20}$$

where $f_i$ are the line spectral frequencies (LSF) in Hz [0, 4000] and $f_s = 8000$ is the sampling frequency. The LSF vector is given by $\mathbf{f}^t = [f_1\ f_2\ \ldots\ f_{10}]$, with t denoting transpose.

A 1st order MA prediction is applied, and the two residual LSF vectors are jointly quantized using split matrix quantization (SMQ). The prediction and quantization are performed as follows. Let $\mathbf{z}^{(1)}(n)$ and $\mathbf{z}^{(2)}(n)$ denote the mean-removed LSF vectors at frame n. The prediction residual vectors $\mathbf{r}^{(1)}(n)$ and $\mathbf{r}^{(2)}(n)$ are given by:

$$\mathbf{r}^{(1)}(n) = \mathbf{z}^{(1)}(n) - \mathbf{p}(n), \quad \text{and} \quad \mathbf{r}^{(2)}(n) = \mathbf{z}^{(2)}(n) - \mathbf{p}(n), \tag{21}$$

where $\mathbf{p}(n)$ is the predicted LSF vector at frame n. Fi
...
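For orientation, equations (20) and (21) of clause 5.2.5 can be sketched as below. The helper names `lsp_to_lsf_hz` and `prediction_residual` are hypothetical, and the mean LSF vector and the predictor $\mathbf{p}(n)$ are not given in this excerpt, so they are treated as plain inputs rather than computed.

```python
import math

FS = 8000.0  # sampling frequency in Hz, as stated in the text


def lsp_to_lsf_hz(q):
    """Equation (20): f_i = fs / (2*pi) * arccos(q_i), result in Hz."""
    return [FS / (2.0 * math.pi) * math.acos(qi) for qi in q]


def prediction_residual(z, p):
    """Equation (21): r(n) = z(n) - p(n), element by element.

    z is a mean-removed LSF vector and p the predicted LSF vector; both
    are assumed to be supplied by the caller.
    """
    return [zi - pi for zi, pi in zip(z, p)]
```

Because arccos maps the cosine-domain range [-1, 1] onto [0, π], the resulting frequencies always fall in the interval [0, 4000] Hz quoted above.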







