ETSI TS 126 243 V14.0.0 (2017-04)
Digital cellular telecommunications system (Phase 2+) (GSM); Universal Mobile Telecommunications System (UMTS); LTE; ANSI-C code for the fixed-point distributed speech recognition extended advanced front-end (3GPP TS 26.243 version 14.0.0 Release 14)
RTS/TSGS-0426243ve00
TECHNICAL SPECIFICATION
Digital cellular telecommunications system (Phase 2+) (GSM);
Universal Mobile Telecommunications System (UMTS);
LTE;
ANSI-C code for the fixed-point distributed speech recognition
extended advanced front-end
(3GPP TS 26.243 version 14.0.0 Release 14)
3GPP TS 26.243 version 14.0.0 Release 14 1 ETSI TS 126 243 V14.0.0 (2017-04)
Reference
RTS/TSGS-0426243ve00
Keywords
GSM,LTE,UMTS
ETSI
650 Route des Lucioles
F-06921 Sophia Antipolis Cedex - FRANCE
Tel.: +33 4 92 94 42 00 Fax: +33 4 93 65 47 16
Siret N° 348 623 562 00017 - NAF 742 C
Association à but non lucratif enregistrée à la
Sous-Préfecture de Grasse (06) N° 7803/88
Important notice
The present document can be downloaded from:
http://www.etsi.org/standards-search
The present document may be made available in electronic versions and/or in print. The content of any electronic and/or
print versions of the present document shall not be modified without the prior written authorization of ETSI. In case of any
existing or perceived difference in contents between such versions and/or in print, the only prevailing document is the
print of the Portable Document Format (PDF) version kept on a specific network drive within ETSI Secretariat.
Users of the present document should be aware that the document may be subject to revision or change of status.
Information on the current status of this and other ETSI documents is available at
https://portal.etsi.org/TB/ETSIDeliverableStatus.aspx
If you find errors in the present document, please send your comment to one of the following services:
https://portal.etsi.org/People/CommiteeSupportStaff.aspx
Copyright Notification
No part may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying
and microfilm except as authorized by written permission of ETSI.
The content of the PDF version shall not be modified without the written authorization of ETSI.
The copyright and the foregoing restriction extend to reproduction in all media.
© European Telecommunications Standards Institute 2017.
All rights reserved.
DECT™, PLUGTESTS™, UMTS™ and the ETSI logo are Trade Marks of ETSI registered for the benefit of its Members.
3GPP™ and LTE™ are Trade Marks of ETSI registered for the benefit of its Members and
of the 3GPP Organizational Partners.
GSM® and the GSM logo are Trade Marks registered and owned by the GSM Association.
Intellectual Property Rights
IPRs essential or potentially essential to the present document may have been declared to ETSI. The information
pertaining to these essential IPRs, if any, is publicly available for ETSI members and non-members, and can be found
in ETSI SR 000 314: "Intellectual Property Rights (IPRs); Essential, or potentially Essential, IPRs notified to ETSI in
respect of ETSI standards", which is available from the ETSI Secretariat. Latest updates are available on the ETSI Web
server (https://ipr.etsi.org/).
Pursuant to the ETSI IPR Policy, no investigation, including IPR searches, has been carried out by ETSI. No guarantee
can be given as to the existence of other IPRs not referenced in ETSI SR 000 314 (or the updates on the ETSI Web
server) which are, or may be, or may become, essential to the present document.
Foreword
This Technical Specification (TS) has been produced by ETSI 3rd Generation Partnership Project (3GPP).
The present document may refer to technical specifications or reports using their 3GPP identities, UMTS identities or
GSM identities. These should be interpreted as being references to the corresponding ETSI deliverables.
The cross reference between GSM, UMTS, 3GPP and ETSI identities can be found under
http://webapp.etsi.org/key/queryform.asp.
Modal verbs terminology
In the present document "shall", "shall not", "should", "should not", "may", "need not", "will", "will not", "can" and
"cannot" are to be interpreted as described in clause 3.2 of the ETSI Drafting Rules (Verbal forms for the expression of
provisions).
"must" and "must not" are NOT allowed in ETSI deliverables except when used in direct citation.
Contents
Intellectual Property Rights
Foreword
Modal verbs terminology
Foreword
1 Scope
2 References
3 Definitions and abbreviations
3.1 Definitions
3.2 Abbreviations
4 C code structure
4.1 Contents of the C source code
4.2 Program execution
4.3 Code hierarchy
4.5 Variables, constants and tables
4.5.1 Description of constants used in the C-code
4.5.2 Description of fixed tables used in the C-code
4.5.3 Static variables used in the C-code
5 File formats
5.1 Speech file
Annex A (informative): Change history
History
Foreword
This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).
The contents of the present document are subject to continuing work within the TSG and may change following formal
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an
identifying change of release date and an increase in version number as follows:
Version x.y.z
where:
x the first digit:
1 presented to TSG for information;
2 presented to TSG for approval;
3 or greater indicates TSG approved document under change control.
y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections,
updates, etc.
z the third digit is incremented when editorial only changes have been incorporated in the document.
1 Scope
The present document contains an electronic copy of the ANSI-C code for DSR Extended Advanced Front-end. The
ANSI-C code is necessary for a bit exact implementation of DSR Extended Advanced Front-end.
2 References
The following documents contain provisions which, through reference in this text, constitute provisions of the present
document.
[1] ETSI ES 202 050 (2007-01) V1.1.5: "Distributed Speech Recognition; Advanced Front-end
Feature Extraction Algorithm; Compression Algorithm".
[2] ETSI ES 202 212 (2005-11) V1.1.2: "Distributed Speech Recognition; Extended Advanced Front-
end Feature Extraction Algorithm; Compression Algorithm, Back-end Speech Reconstruction
Algorithm".
[3] 3GPP TS 26.177: "Speech Enabled Services (SES); Distributed Speech Recognition (DSR)
extended advanced front-end test sequences".
3 Definitions and abbreviations
3.1 Definitions
Definitions of terms used in the present document can be found in [1] and [2].
3.2 Abbreviations
For the purpose of the present document, the following abbreviations apply:
ANSI American National Standards Institute
I/O Input/Output
RAM Random Access Memory
ROM Read Only Memory
AFE Advanced Front-end
X-AFE eXtended Advanced Front-end
DSR Distributed Speech Recognition
4 C code structure
This clause gives an overview of the structure, contents and organization of the bit-exact C code attached to this
document.
The C code has been verified on the following systems:
- Sun Microsystems workstations and GNU gcc compiler
- IBM PC compatible computers with Linux operating system and GNU gcc compiler.
ANSI-C was selected as the programming language because portability was desirable.
4.1 Contents of the C source code
The distributed files with suffix "c" contain the source code and the files with suffix "h" are the header files.
Makefiles are provided for the platforms on which the C code has been verified (listed above).
4.2 Program execution
There are separate executables for the FrontEnd and Vector Quantization, with and without Extensions. The command
line options are described below.
<> - indicates parameters for the given option for running the executable
() – indicates default parameter.
FrontEnd w/ Extension:
USAGE: bin/ExtAdvFrontEnd infile HTK_outfile pitch_outfile class_outfile [options]
OPTIONS:
-q Quiet Mode (FALSE)
-F format Input file format (NIST)
-fs freq Sampling frequency in kHz <8,16> (8)
-swap Change input byte ordering (Native)
-noh No HTK header to output file (FALSE)
-noc0 No c0 coefficient to output feature vector (FALSE)
-nologE No logE component to output feature vector (FALSE)
-skip_header_bytes n Skip header, first n bytes (only for -F RAW)
-noh, -noc0, -nologE and -skip_header_bytes are not used and should not be changed.
FrontEnd w/o Extension:
USAGE: bin/AdvFrontEnd infile HTK_outfile [options]
OPTIONS: - Same as FrontEnd w/ Extension
Vector Quantization w/ Extension:
Usage: extcoder htk_file_in pitch_file_in class_file_in bitstream_file_out pitch_file_out txt_file_out -freq x -VAD/No_VAD
htk_file_in Input mel-frequency cepstral coefficient file in HTK MFCC format.
pitch_file_in Input pitch period file.
class_file_in Input classification file.
bitstream_file_out Output binary bitstream.
pitch_file_out Output quantised pitch period file.
txt_file_out Vector quantiser output in text format.
-freq x Sampling frequency in kHz (8 or 16).
-VAD Use voice activity detector data. Voice activity input file must have the same name as htk_file, but extension .vad
-No_VAD Do not incorporate voice activity detector information in output bitstream.
Vector Quantization w/o Extension:
Usage: coder htk_file_in bitstream_file_out txt_file_out -freq x -VAD/No_VAD
htk_file_in Input mel-frequency cepstral coefficient file in HTK MFCC format.
bitstream_file_out Binary output bitstream.
txt_file_out Vector quantiser output in text format.
-freq x Sampling frequency in kHz (8 or 16).
-VAD Use voice activity detector data. Voice activity input file must have the same name as htk_file, but extension .vad
-No_VAD Do not incorporate voice activity detector information in output bitstream.
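The -VAD naming rule above (same name as the HTK file, extension .vad) can be illustrated with a small helper. This is only a sketch: make_vad_path() is a hypothetical function that is not part of the distributed code, and it assumes '/' as the path separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the distributed code): writes into `out`
 * the path of htk_path with its extension replaced by ".vad".
 * Returns 0 on success, -1 if `out` is too small. */
static int make_vad_path(const char *htk_path, char *out, size_t out_size)
{
    const char *slash = strrchr(htk_path, '/');
    const char *dot = strrchr(htk_path, '.');
    size_t stem_len;

    /* A dot only marks an extension if it lies in the last path component. */
    if (dot == NULL || (slash != NULL && dot < slash))
        stem_len = strlen(htk_path);
    else
        stem_len = (size_t)(dot - htk_path);

    /* sizeof(".vad") counts the terminating NUL as well. */
    if (stem_len + sizeof(".vad") > out_size)
        return -1;

    memcpy(out, htk_path, stem_len);
    memcpy(out + stem_len, ".vad", sizeof(".vad"));
    return 0;
}
```

For an input such as data/utt1.cep the helper yields data/utt1.vad; a name without any extension simply gets .vad appended.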
File extension descriptions as generated by the sample script:
.cep – Binary file containing cepstral features in HTK format. Output from the FrontEnd, input to the vector quantizer.
.pitch – Binary file containing pitch information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
.class – ASCII file containing class information. Output from the FrontEnd, input to the vector quantizer. Only used for Extension.
.bs – Binary file containing the bitstream. Output from the vector quantizer.
.log – Log files from the different executables.
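As an illustration of the .cep file described above, the following sketch decodes an HTK parameter file header from a 12-byte buffer. It assumes the standard HTK layout (nSamples and sampPeriod as 32-bit integers, sampSize and parmKind as 16-bit integers, big-endian); the struct and function names are hypothetical, and the byte order actually written by the front-end should be checked against the distributed code.

```c
#include <stdint.h>

/* Assumed standard HTK header layout (12 bytes, big-endian). */
typedef struct {
    int32_t n_samples;    /* number of feature vectors in the file */
    int32_t samp_period;  /* frame period in units of 100 ns */
    int16_t samp_size;    /* bytes per feature vector */
    int16_t parm_kind;    /* parameter kind code and qualifier bits */
} HtkHeader;

/* Assemble big-endian integers byte by byte, independent of host order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Hypothetical helper: decode the 12 header bytes into *h. */
static void parse_htk_header(const unsigned char buf[12], HtkHeader *h)
{
    h->n_samples = (int32_t)be32(buf);
    h->samp_period = (int32_t)be32(buf + 4);
    h->samp_size = (int16_t)be16(buf + 8);
    h->parm_kind = (int16_t)be16(buf + 10);
}
```

Decoding from a byte buffer rather than reading the struct directly avoids any dependence on host endianness or struct padding.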
4.3 Code hierarchy
Tables 1 to 3 are call graphs that show the functions used for AFE (table 1), VQ (table 2), and Extension (table 3).
Each column represents a call level and each cell a function. The functions contain calls to the functions in rightwards
neighboring cells. The time order in the call graphs is from the top downwards as the processing of a frame advances.
All standard C functions: printf(), fwrite(), etc. have been omitted. Also, no basic operations (add(), L_add(), mac(),
etc.) or double precision extended operations (e.g. L_Extract()) appear in the graphs.
The basic operations are not counted as extending the depth, therefore the deepest level in this software is level 7.
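The basic operations omitted from the call graphs are saturating fixed-point primitives. The sketch below shows what such primitives typically look like; the names sat_add16(), sat_add32() and sat_mac() are hypothetical stand-ins, and the semantics are assumed to follow the usual ETSI/ITU-T basic-operator conventions rather than being copied from the distributed code.

```c
#include <stdint.h>

typedef int16_t Word16;
typedef int32_t Word32;

/* Saturating 16-bit addition: the result is clipped to [-32768, 32767]. */
static Word16 sat_add16(Word16 a, Word16 b)
{
    Word32 sum = (Word32)a + (Word32)b;  /* cannot overflow in 32 bits */
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (Word16)sum;
}

/* Saturating 32-bit addition, computed in 64 bits and then clipped. */
static Word32 sat_add32(Word32 a, Word32 b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (Word32)sum;
}

/* Multiply-accumulate under the assumed fractional convention:
 * the product 2*a*b is saturated first, then added to acc with saturation. */
static Word32 sat_mac(Word32 acc, Word16 a, Word16 b)
{
    int64_t prod = 2 * (int64_t)a * (int64_t)b;
    if (prod > INT32_MAX) prod = INT32_MAX;  /* only for a == b == -32768 */
    return sat_add32(acc, (Word32)prod);
}
```

Clipping instead of wrapping on overflow is what makes the results reproducible across platforms, which is a precondition for a bit-exact implementation.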
Table 1: AFE call structure
main()
AdvProcessInit_B()
DoNoiseSupInit_B()
DoWaveProcInit_B()
DoCompCepsInit_B()
DoPostProcInit_B()
DoVADInit_F()
Do16kProcInit_B()
QMF_FIR_Init_B()
fir_initialization_B()
DP_HP_filters_B()
BufIn32Alloc()
AdvProcessAlloc_B()
DoNoiseSupAlloc_B()
DoWaveProcAlloc_B()
DoCompCepsAlloc_B()
DoPostProcAlloc_B()
DoVADAlloc_F()
Do16kProcAlloc_B()
FlushAdvProcess_B()
DoVADFlush_F()
CvFeatInt2Float()
AdvProcessDelete_B()
DoNoiseSupDelete_B()
DoWaveProcDelete_B()
DoCompCepsDelete_B()
DoPostProcDelete_B()
DoVADDelete_B()
BufIn32Free()
DoAdvProcess_B()
Do16kProcessing_B()
DoNoiseSup_B()
Get16k_p_bufferData16k_B()
Get16k_bufData16kSize_B()
Get16k_p_BandsForCoding16k_B()
Get16k_p_CodeForBands16k_B()
Get16k_dataHP_B()
VAD_F()
Log_2()
DoSigWindowing16_F1()
DoSigWindowing16_F2()
ff4NRFix32_B()
GetL15()
GetH15()
Mult16x32()
Add_Mult16x16_16()
Sub_Mult16x16_16()
Permut()
FFTtoPSD_F()
Square24d2_B()
Square24_B()
Get16k_BFC_dec_B()
GetBandsForCoding16k_B()
PSDMean_F()
NoiseEstimation_F1()
Sqrt_2()
Sqrt16_2()
NoiseEstimation_F2()
Sqrt_2()
Sqrt16_2()
FilterCalc_F()
SpeechQVar()
FilterBank16()
SpeechQSpec()
SpeechQMel()
DoGainFact_F1()
Log_2()
DoGainFact_F2()
Log_2()
DoMelIDCT_F16()
ApplyWF()
Get16k_dec1()
Get16k_dec2()
Get16k_dec3()
DoSigWindowing16_F3()
ff4NRFix32_B()
GetL15()
GetH15()
Mult16x32()
Add_Mult16x16_16()
Sub_Mult16x16_16()
Permut()
FFTtoPSD_F()
Square24d2_B()
Square24_B()
DoMelFB_B()
CodeBands16k_B()
DoSpecSub16k_B()
Log_2()
UpDateDecal()
ApplyDecal()
DCOffsetFil_F()
Get16k_hpBandsSize_B()
Get16k_p_hpBands_B()
Get16k_p_bufferCodeForBands16k_B()
Get16k_p_CodeForBands16k_B()
Get16k_p_bufferCodeWeights_B()
Get16k_p_codeWeights_B()
Set16k_hpBands_dec_B()
DoWaveProc_B()
TeagerEng()
GetTeagerFilter()
GetMaximaPositions()
DoCompCeps_B()
CepsCompute()
Get16k_p_bufferCodeWeights_B()
Get16k_p_bufferCodeForBands16k_B()
PreEmphHamm()
ff4NB16_B()
GetBandsForDecoding16k_B()
DecodeBands16k_B()
FilterBank()
Get16k_hpBands_dec_B()
Get16k_p_hpBands_B()
MergeSSandCoded_B()
CorrectEnergy_B()
CosInv16Khz()
cosInv() (only for 8kHz)
DoPostProc_B()
DoVADProc_F()
focalpoint()
Table 2: VQ call structure
main()
quantize_and_print()
get_best_dataframe()
best_centroid()
quant_pitch_abs()
get_class_bit()
quant_pitch_diff()
get_class_bit()
mfcc_crc_encode()
pc_crc_encode()
Table 3: Extension call structure
main()
RVC_ConstructPitchRom_be()
RVC_ConstructPitchMeter_be()
Allocate_InterpolatedDft_be()
RVC_ResetPitchMeter_be()
RVC_DestructPitchRom_be()
RVC_DestructPitchMeter_be()
Deallocate_InterpolatedDft_be()
DoAdvProcess_B()
DoPitchExtract()
FilterBank()
dsr_afe_vad()
get_vm()
fnLog2()
IsLowBandNoise()
get_zcm()
pre_process()
iir_d()
iir_s()
RVC_MeasurePitch_be()
ClearPitch_be()
DirichletInterpolation_be()
IsLowLevelInput_be()
Finalize_be()
IsContinuousPitch_be()
Mpy_lw_sw()
Mpy_lw_sw()
PrepareSpectralPeaks_be()
CalcSpectrum_be()
Mpy_lw_sw()
Mpy_lw_sw_Add()
FindPeaks_be()
Prelim_ScaleDownAmpsOfHighFreqPeaks_be()
qsort_be()*
swap()
CompareIpointAmp_be()
RefineSpectralPeaks_be()
sqrt_l_fix()
Final_ScaleDownAmpsOfHighFreqPeaks_be()
Mpy_lw_sw()
FindPitchCandidates_be()
NormalizeAmplitudes_be()
CalcUtilityFunction_be()
CreatePieceWiseConstantFunction_be()
L_Extract()
Mpy_32_16()
qsort_be()*
swap()
Compare_ARRAY_OF_XPOINTS_be()
LinkArrayOfPoints_be()
AddSortedArrayOfPoints_be()
LinkArrayOfPoints_be()
ConvertLinkedListOfDiffPointsToUtilFunc_be()
FindDominantLocalMaximaInUtilityFunction_be()
Mpy_lw_sw()
UtilityFunctionAtGivenPitchFreq_be()
qsort_be()*
swap()
ComparePitchFreqAscending_be()
SelectTopPitchCandidates_be()
Mpy_lw_sw()
compute_pcorr_be()
interpolate_be()
Mpy_lw_sw()
Mpy_lw_lw()
sqrt_l_fix()
find_most_energetic_window_be()
accumulate_be()
find_most_energetic_window2_be()
Mpy_lw_sw()
SelectFinalPitch_be()
qsort_be()*
swap()
ComparePitchFreqDescending_be()
ClearPitch_be()
GOOD_ENOUGH_be()
CLOSELY_LOCATED_be()
Mpy_lw_sw()
BETTER_be()
IsContinuousPitch_be()
Mpy_lw_sw()
CalculateDoubleWindowDft_be()
classify_frame()
* qsort_be() is a recursive function
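Table 3 shows qsort_be() calling swap() and a comparison function, and the footnote notes that it is recursive. The following is a generic recursive quicksort with the same three-part shape, given only as an illustration; it operates on plain int values, all names are hypothetical, and it is not the qsort_be() implementation.

```c
/* Comparison callback: negative, zero or positive, as for the C library qsort(). */
typedef int (*CompareFn)(const int *a, const int *b);

static void swap_int(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

static int compare_ascending(const int *a, const int *b)
{
    return (*a > *b) - (*a < *b);
}

/* Sorts v[lo..hi] (inclusive) in the order defined by cmp. */
static void quicksort_sketch(int *v, int lo, int hi, CompareFn cmp)
{
    int i, last;

    if (lo >= hi)
        return;                                /* zero or one element */

    swap_int(&v[lo], &v[lo + (hi - lo) / 2]);  /* move the pivot to v[lo] */
    last = lo;
    for (i = lo + 1; i <= hi; i++)
        if (cmp(&v[i], &v[lo]) < 0)
            swap_int(&v[++last], &v[i]);       /* grow the "smaller" partition */
    swap_int(&v[lo], &v[last]);                /* put the pivot in its final slot */

    quicksort_sketch(v, lo, last - 1, cmp);    /* recurse on both partitions */
    quicksort_sketch(v, last + 1, hi, cmp);
}
```

Because the function calls itself, its stack use depends on the input size and ordering, which is worth keeping in mind when estimating worst-case memory on an embedded target.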
4.5 Variables, constants and tables
The data types of variables and tables used in the fixed point implementation are signed integers in 2's complement
representation, defined by:
- Word16 16 bit variable;
- Word32 32 bit variable.
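One way to obtain these two types portably is sketched below. It assumes a compiler that provides the C99 header <stdint.h>; the distributed headers may instead map Word16 and Word32 onto short, int or long per platform, so this is an illustration rather than the definition used in the code.

```c
#include <limits.h>
#include <stdint.h>

/* Assumption: <stdint.h> fixed-width types stand in for the
 * platform-specific definitions in the distributed headers. */
typedef int16_t Word16;  /* 16-bit signed, two's complement */
typedef int32_t Word32;  /* 32-bit signed, two's complement */

/* Compile-time width checks: the array size becomes -1, and the
 * build fails, if either type has an unexpected width. */
typedef char word16_is_16_bits[(sizeof(Word16) * CHAR_BIT == 16) ? 1 : -1];
typedef char word32_is_32_bits[(sizeof(Word32) * CHAR_BIT == 32) ? 1 : -1];
```

Checking the widths at compile time catches a mismatched type mapping before it can silently break bit exactness.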
4.5.1 Description of constants used in the C-code
Table 5a: Global constants for AFE
Constant Value Description
NS_SPEC_ORDER_16K 64 Noise suppression Array length
NS_HANGOVER_16K 15 Noise suppression hangover count
NS_MIN_SPEECH_FRAME_HANGOVER_16K 4 Noise suppression minimum speech frame hangover count
NS_ANALYSIS_WINDOW_16K 80 Noise suppression analysis window
PERC_CODED 0.7 lambda merge (empirically set constant)
LAMBDA_NSE16k 0.99 Noise estimation Lambda
NS_NB_FRAME_THRESHOLD_NSE 100 Noise suppression number of frame threshold used for NSE
LENGTH_QMF 118 QMF filter length
f24 1 multiplier for QMF filter coefficients
SHFF_H 8 shift to get higher value
L_H 16 shift to get lower value
HP16k_MEL_USED 3 Higher frequency band Mel used
NB_LP_BANDS_CODING 3 Lower frequency band used in coding
NE16k_FRAMES_THRESH 100 Noise estimation frames threshold
NB_TOPOSTPROC 12 Number of coefficients to postprocess
CEP_FRAME_LENGTH 200 Frame length for cepstral coefficients
CEP_NB_COEF 13 Number of cepstral coefficients (including c0)
CEP_NB_CHANNELS 23 Number of filters used for cepstral coefficients
CEP_FFT_LENGTH 256 FFT length for cepstral coefficients
FRAME_BUF_SIZE 241 Denoised Output buffer size
FRAME_SHIFT 80 WaveProcessing input frame shift
FRAME_LENGTH 200 WaveProcessing frame size
NS_SPEC_ORDER 65 Noise suppression array length (8 kHz)
NS_BUFFER_SIZE 180 Noise suppression past frame size
NS_FRAME_SHIFT 80 Noise suppression input frame shift
NS_HALF_FILTER_LENGTH 8 Noise suppression filter half size
NS_NB_FRAME_THRESHOLD_LTE 10 Noise suppression long term energy forgetting factor threshold (in frames)
NS_NB_FRAME_THRESHOLD_NSE 100 Noise suppression spectrum estimate forgetting factor threshold (in frames)
NS_MIN_FRAME 10 Number of frame threshold to update average energy for Noise suppression VAD
NS_FFT_LENGTH 256 FFT length for noise suppression
WF_MEL_ORDER 25 Noise suppression Wiener filter order
SHFT_NOISE 14 shift applied to noise spectrum estimate
SHFT_FACT_MUL 14 shift applied to gain coefficient (noise suppression gain factorization)
IDCT_ORDER 25 Noise suppression idct order
NS_BETA 0.98 Noiseless signal suppression factor
NS_RSB_MIN 0.079432823 Minimum a priori SNR
NS_LAMBDA_NSE 0.99 Forg
...
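Two of the constants listed in Table 5a, FRAME_LENGTH (200) and FRAME_SHIFT (80), can be related to time with a short calculation. The sketch below assumes both are sample counts at an 8 kHz sampling rate; the helper names are hypothetical, and the frame count it computes ignores any buffering, look-ahead or flushing performed by the actual front-end.

```c
#define FRAME_LENGTH 200  /* value from Table 5a */
#define FRAME_SHIFT   80  /* value from Table 5a */

/* Duration in microseconds of n samples at rate_hz samples per second. */
static long samples_to_us(long n, long rate_hz)
{
    return n * 1000000L / rate_hz;
}

/* Number of complete analysis frames that fit in n_samples,
 * with no padding of a partial frame at the end. */
static long count_full_frames(long n_samples)
{
    if (n_samples < FRAME_LENGTH)
        return 0;
    return 1 + (n_samples - FRAME_LENGTH) / FRAME_SHIFT;
}
```

Under the 8 kHz assumption, samples_to_us(FRAME_LENGTH, 8000) gives 25000 (a 25 ms frame) and samples_to_us(FRAME_SHIFT, 8000) gives 10000 (a 10 ms shift).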







