
CN112614495A - Software radio multi-system voice coder-decoder - Google Patents

Software radio multi-system voice coder-decoder

Info

Publication number
CN112614495A
CN112614495A (application CN202011452195.5A)
Authority
CN
China
Prior art keywords
algorithm
decoding
coding
encoding
codec
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011452195.5A
Other languages
Chinese (zh)
Inventor
周小青
李建
刘新
曹清亮
赵静怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huaxin Shengyuan Technology Co ltd
Original Assignee
Beijing Huaxin Shengyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huaxin Shengyuan Technology Co ltd filed Critical Beijing Huaxin Shengyuan Technology Co ltd
Priority to CN202011452195.5A priority Critical patent/CN112614495A/en
Publication of CN112614495A publication Critical patent/CN112614495A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 — using predictive techniques
    • G10L 19/16 — Vocoder architecture
    • G10L 19/18 — Vocoders using multiple modes
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04 — using predictive techniques
    • G10L 19/26 — Pre-filtering or post-filtering
    • G10L 19/265 — Pre-filtering, e.g. high frequency emphasis prior to encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a software radio multi-standard voice codec. The hardware adopts an embedded system platform and comprises an encoding module and a decoding module; encoding and decoding of multiple voice standards such as CVSD, G.729 and MELP are realized through an application made up of a main process, an audio process and a codec algorithm process. The main process provides a communication port for the user, coordinates the work of the audio process and the codec algorithm process, and receives input configuration parameters. The audio process provides volume adjustment, switching between MIC and line-in channels, and handling of the recording and playback interfaces. The codec algorithm process carries a plurality of algorithms that perform the encoding and decoding, use the configured parameters, and obtain the interface of the audio process. The invention integrates multiple voice coding algorithms in one chip, meets the need of the waveform system of a new-generation software radio station for flexible voice switching, and realizes flexible switching of station voice.

Description

Software radio multi-system voice coder-decoder
Technical Field
The invention relates to a software radio device, in particular to a software radio multi-mode voice coder-decoder.
Background
Language is an important means of human interaction, and speech is the most common form of data in communication systems. Voice communication is one of the most basic and important ways humans communicate. As society has moved rapidly into the information era, the demands on the utilization of all kinds of resources have grown, which has driven the development of voice coding and decoding technology.
At present, military and civil communication systems around the world use different voice coding and decoding schemes because of differing communication environments, distances, channel bandwidths and user requirements. In China, each military service currently runs its own independent communication system with its own voice coding standard, and the dedicated codec chips in the corresponding equipment differ, so different combat systems cannot interconnect and intercommunicate, which harms combat effectiveness. The new generation of military radio stations adopts the flexible architecture of a software radio system and can interconnect information (text, images and video) among stations. To achieve interconnection under such an architecture, a station must be able to switch flexibly among multiple coding standards, which urgently calls for a multi-system voice codec.
Disclosure of Invention
The invention provides a software radio multi-system voice codec, which solves the problems of the traditional customized radio station: a single communication mode, a fixed code rate and algorithm, and poor system flexibility. It adopts the following technical scheme:
A software radio multi-standard voice codec whose hardware adopts an embedded system platform, comprising an encoding module and a decoding module. Encoding and decoding of multiple voice standards such as CVSD, G.729 and MELP is realized through an application made up of a main process, an audio process and a codec algorithm process. The main process provides a communication port for the user, coordinates the work of the audio process and the codec algorithm process, and receives input configuration parameters. The audio process provides volume adjustment, switching between MIC and line-in channels, and handling of the recording and playback interfaces. The codec algorithm process carries a plurality of algorithms that perform the encoding and decoding, use the configured parameters, and obtain the interface of the audio process.
The encoding and decoding steps of the invention are opposite, wherein the encoding process comprises the following steps:
s1: setting input configuration parameters by the main process to form working parameters of a coding and decoding algorithm process;
s2: an adjustable gain amplifier in the audio process receives the collected audio data, amplifies the audio data and sends the amplified audio data to an ADC module of the audio process;
s3: the ADC module of the audio process performs analog-to-digital conversion on the audio data and transmits it to the encoding module of the codec algorithm process through the ring buffer;
s4: a coding module of the coding and decoding algorithm process selects a corresponding algorithm decision to perform coding processing according to the system;
s5: and finally, the coding module of the coding and decoding algorithm process outputs the corresponding code stream to other equipment through the network port.
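The buffering between the ADC and the encoding module in steps S2-S4 can be sketched as follows. This is a minimal illustration only; the class name, capacity and sample values are assumptions, not the patent's implementation:

```python
from collections import deque

class RingBuffer:
    """Hypothetical stand-in for the ring buffer ('annular buffer area')
    that carries A/D-converted samples from the audio process to the
    encoding module in steps S3-S4; capacity is an assumption."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest samples drop when full

    def write(self, samples):
        self.buf.extend(samples)

    def read(self, n):
        # drain up to n samples for the encoder
        return [self.buf.popleft() for _ in range(min(n, len(self.buf)))]

# S2-S3: amplified, A/D-converted audio lands in the buffer.
ring = RingBuffer(capacity=1024)
ring.write([0.1, 0.5, -0.3, 0.7])
# S4: the encoding module takes one frame and encodes it per the
# selected standard (the encoding itself is omitted here).
frame = ring.read(4)
```

A bounded buffer like this decouples the audio process's sample rate from the encoder's frame-by-frame consumption, which is why a ring structure is the natural choice between S3 and S4.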
The codec algorithm process is provided with a CVSD encoding algorithm. During encoding, the CVSD algorithm tracks signal changes by continuously varying the step size δ so as to reduce granular noise and slope-overload distortion; the step size δ is derived from the past 3 or 4 output samples:
1) when f(n) > g(n), the comparator outputs e(n) > 0, the digital code y(n) = 1, and the integrator outputs g(n) = g(n-1) + δ;
2) when f(n) < g(n), the comparator outputs e(n) < 0, the digital code y(n) = 0, and the integrator outputs g(n) = g(n-1) - δ.
The codec algorithm process is provided with a CVSD decoding algorithm. Decoding judges the received digital code y(n): the integrator output rises on a "1" code and falls on a "0" code, rising further on consecutive "1" codes and falling further on consecutive "0" codes, so that the input signal is recovered:
1) when y(n) = 1, the integrator outputs g(n) = g(n-1) + δ;
2) when y(n) = 0, the integrator outputs g(n) = g(n-1) - δ.
The codec algorithm process is provided with a G.729 encoding algorithm. When encoding with G.729, the input signal is first preprocessed by high-pass filtering; LP analysis is performed once per 10 ms frame, and the LP filter coefficients are calculated and converted into line spectral pairs (LSP). The excitation signal is searched by the A-B-S method, taking the minimum perceptually weighted error between the original and synthesized speech as the measure, with the perceptual weighting filter built from the unquantized LP coefficients. Excitation parameters are determined once per subframe; the quantized and unquantized LP filter coefficients are used for subframe 2, while interpolated LP coefficients are used in subframe 1, and the open-loop pitch delay is estimated from the perceptually weighted speech signal every 10 ms frame. For each subframe the following is repeated: the target signal is obtained from the LP residual filtered through the weighted synthesis filter; the impulse response of the weighted synthesis filter is calculated; closed-loop pitch analysis searches values near the open-loop pitch delay using the target signal and the impulse response; the adaptive-codebook contribution is subtracted from the target signal, and the new target signal is used in the fixed-codebook search to find the optimal excitation; finally, the filters are updated with the determined excitation signal.
The codec algorithm process is provided with a G.729 decoding algorithm. When decoding with G.729, parameter indices are first extracted from the received code stream and decoded to obtain the coding parameters of a 10 ms speech frame: the LSP parameters, two fractional pitch delays, two fixed codevectors, and two sets of adaptive and fixed codeword gains. The LSP parameters of each subframe are interpolated and converted into LPC filter coefficients, and then each 5 ms subframe is processed as follows: first, the adaptive and fixed codewords are multiplied by their respective gains and summed to form the excitation; second, the excitation drives the LPC synthesis filter to reconstruct the speech; third, the reconstructed speech signal is post-processed, including long-term post-filtering, short-term synthesis filtering and high-pass filtering.
The coding and decoding algorithm process is provided with a 2.4kbps coding algorithm, and the digitized voice signal passes through a four-order Chebyshev high-pass filter to filter direct-current power frequency interference; then, multi-band mixed excitation is adopted to carry out unvoiced and voiced sound judgment so as to accurately extract a fundamental tone signal; linear prediction mainly includes analysis of input speech and analysis of residual signals; when the periodicity of the voiced segment signal is not good, exciting unstable vocal cord pulses at a decoding end by adopting an excitation source adaptive to the aperiodic mark; according to the minimum principle of perceptual weighted distortion, a four-level codebook fast search vector quantization algorithm is adopted to quantize the related parameters; and packaging the error correction coded bit stream and then transmitting the error correction coded bit stream.
The coding and decoding algorithm process is provided with a 1.2kbps coding algorithm, and compared with a 2.4kbps coding algorithm, only the intra-frame correlation is removed in linear prediction, and the code rate is reduced.
The coding and decoding algorithm process is provided with a 0.6kbps coding algorithm, wherein coding is divided into parameter extraction and parameter quantization, and the parameter extraction of a coder is divided into four parts, namely fundamental tone extraction, band-pass unvoiced and voiced sound analysis, line spectrum frequency parameter extraction and gain estimation; during decoding, firstly, unpacking the received bit streams, arranging the bit streams according to the parameter sequence, distinguishing the coded bit streams of each parameter, then sending the coded bit streams of each parameter to a parameter decoding module, and decoding each parameter by adopting an inverse quantization means to obtain four parameters of line spectrum frequency, band-pass unvoiced and voiced sound judgment, pitch period and gain of the whole super frame; and finally, forming an excitation signal by using the fundamental tone period, the residual harmonic amplitude and the band-pass voiced and unvoiced decision, performing spectrum enhancement processing on the generated excitation signal by using the line spectrum frequency, and performing voice synthesis processing on the input excitation signal by using the line spectrum frequency and the gain to obtain two frames of synthesized voice signals and outputting the two frames of synthesized voice signals.
The software radio multi-system voice codec meets the need of the waveform system of a new-generation software radio station for flexible voice switching and realizes flexible switching of station voice, replacing the traditional customized station that relies on a single chip and can only provide a single mode of conversation. The codec adopts micro-system integration technology to integrate multiple voice coding algorithms, including CVSD, G.729 and MELP modes, into one chip. It solves the problem that combat communication systems cannot interconnect and intercommunicate, improves combat effectiveness, and plays an important role in the various communication systems of sea, land and air.
Drawings
FIG. 1 is a schematic diagram of a multi-process design of the software radio multi-mode speech codec;
FIG. 2 is a schematic diagram of the relationship of user space and kernel space of the present invention;
FIG. 3 is a schematic of the encoding flow process of the present invention;
FIG. 4 is a diagram of a delta modulation waveform and a corresponding digital code pattern in a CVSD speech codec algorithm;
FIG. 5 is a schematic diagram of the encoding operation of a CVSD;
FIG. 6 is a diagram illustrating the decoding operation of a CVSD;
FIG. 7 is a schematic diagram of the G.729 speech codec algorithm for encoding;
FIG. 8 is a schematic diagram of the decoding performed by the G.729 speech codec algorithm;
FIG. 9 is a schematic illustration of the encoding performed by the 2.4kbps speech codec algorithm;
FIG. 10 is a schematic illustration of the decoding performed by the 2.4kbps speech codec algorithm;
FIG. 11 is a schematic illustration of the encoding performed by the 1.2kbps speech codec algorithm;
FIG. 12 is a schematic illustration of the decoding performed by the 1.2kbps speech codec algorithm;
FIG. 13 is a schematic illustration of the encoding performed by the 0.6kbps speech codec algorithm;
FIG. 14 is a schematic illustration of decoding performed by the 0.6kbps speech codec algorithm.
Detailed Description
First, the software and hardware platform introduction of the invention
The invention loads the multi-system voice codec software onto an embedded system platform, so that one chip offers multiple voice coding and decoding modes such as CVSD, G.729 and MELP, with high voice quality, support for multiple rates and coding modes, and full-duplex codec capability. The multi-standard coding rate can be varied between 600 bps and 32000 bps; natural sound quality and speech intelligibility are maintained even at 600 bps.
The software radio multi-mode voice codec comprises hardware and software. The hardware adopts an embedded system and mainly provides the platform for the software; the software comprises device drivers and application programs. The software combines several voice codec algorithms and can switch among them flexibly according to customer requirements.
The codec contains an encoding module and a decoding module; the working mode can be encode-only, decode-only, or simultaneous encoding and decoding.
And (3) encoding: firstly, audio data collected by a microphone is subjected to analog-to-digital (A/D) conversion and then transmitted to a coding module; then, selecting a corresponding coding algorithm according to the system to perform coding processing, and finally outputting a corresponding coding code stream to other equipment through a network port;
and (3) decoding: firstly, transmitting code stream data of the network port to a decoding module; and then selecting a corresponding decoding algorithm according to the system for decoding, and finally outputting the decoded data to a loudspeaker after digital-to-analog (D/A) conversion.
As shown in fig. 1, the software-defined radio multi-mode speech codec adopts a multi-process design, and its functions are split into: a main process, an audio process, and a coding/decoding algorithm process (coding algorithm process, decoding algorithm process), which are related as follows:
(1) a main process: providing a communication port for a user, completing data analysis, and coordinating the work among an audio process, an encoding algorithm process and a decoding algorithm process;
(2) and (3) audio process: providing volume adjustment, MIC and linear input channel switching, input gain control, and processing of a recording interface and a playback interface;
(3) encoding and decoding algorithm process: algorithms 1-n are provided to provide algorithm support for encoding algorithm processes and decoding algorithm processes, and to provide process parameter configuration and interface processing for acquiring audio processes.
The multi-process design of fig. 1 is implemented across software and hardware. As shown in fig. 2, the software can be regarded as user space: its top layer is the application processes, realized as modular application programs built on library languages. The hardware side corresponds to kernel space and is reached through the device-driver layer.
The application process comprises a main process, an audio process, an encoding algorithm process and a decoding algorithm process in the figure 1; the modularized application program comprises a voice algorithm, state management, command interaction, data interaction, protocol processing, log management, CORBA service and a ring buffer area; the library language comprises a system C language library and other third party libraries; the device driver layer comprises hardware drivers such as an SPI driver, a UART driver, a network driver, a GPIO driver, an audio driver, a Flash driver and the like.
The following is a functional description of some of the main contents:
(1) host process
Responsible for scheduling the whole software and deciding the current software flow and the parameters in use.
(2) Audio process
The alsa-lib library is relied on to provide recording, playback and audio parameter setting services for the system. And a socket communication mode is adopted between the process and other processes.
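The socket communication between processes mentioned above can be illustrated with a minimal sketch. The message strings ("SET_VOLUME 80", "OK") are invented for illustration and are not defined by the patent:

```python
import socket

# A minimal sketch of the socket-based exchange between the main
# process and the audio process; in the real system these would be
# separate OS processes, here one script plays both ends.
main_end, audio_end = socket.socketpair()

# Main-process side: send a configuration command.
main_end.sendall(b"SET_VOLUME 80")

# Audio-process side: receive the command and acknowledge it.
command = audio_end.recv(64).decode()
audio_end.sendall(b"OK")

# Main-process side: read the acknowledgement.
reply = main_end.recv(64).decode()
main_end.close()
audio_end.close()
```

Sockets give each process a uniform, language-neutral channel, which matches the document's choice of the same mechanism for the codec processes as well.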
(3) Encoding and decoding process
Provides coding and decoding algorithm services for the various standards, relying on the codec algorithms. Communicates with other processes via sockets.
(4) Speech algorithm
Realizes the speech codec algorithm interfaces and configuration interfaces for CVSD, G729, MELP and so on; each standard is designed independently as a separate process.
(5) State management
Implements the state machine of system operation, outputs each state on GPIO, and indicates abnormal states on GPIO.
(6) Data interaction
TCP connection management is adopted as a middleware interface of a communication layer, a specific communication hardware port is shielded for upper-layer application, and data receiving and sending, communication timeout and the like are realized. The server is designed for concurrency and provides access of a plurality of clients.
(7) Protocol processing
Realizing CORBA protocol layer and self-defining TCP protocol encapsulation analysis.
(8) Command interaction
Implements human-machine interaction inside the program for troubleshooting during the test stage; this independent module can operate any other module.
(9) Log management
Writes and reads logs, adds a timestamp to each entry, and, according to parameters, decides whether to print immediately, write only to the log file, or both print and write.
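A minimal sketch of the log-management behavior described above (one timestamp per entry, output route chosen by a parameter). The timestamp format and mode names are assumptions:

```python
import datetime

def format_log(message, now=None):
    """Prefix a log entry with a timestamp, as the log-management
    module above does (the exact format string is an assumption)."""
    now = now or datetime.datetime.now()
    return f"[{now:%Y-%m-%d %H:%M:%S}] {message}"

def emit(line, mode, printed, logfile):
    """Route a formatted line according to the mode parameter:
    'print' immediately, 'file' only, or 'both'."""
    if mode in ("print", "both"):
        printed.append(line)   # stands in for printing to the console
    if mode in ("file", "both"):
        logfile.append(line)   # stands in for appending to the log file
```

Keeping formatting and routing separate means the same timestamped line can be replayed to any sink, which is what the "print, file, or both" parameter requires.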
Secondly, the invention describes the processing steps of coding and decoding
As shown in fig. 3, based on the platform provided by the software and hardware, the encoding process of the present invention includes the following processes:
s1: setting input configuration parameters by the main process to form working parameters;
s2: the adjustable gain amplifier receives the collected audio data, amplifies the audio data and then sends the amplified audio data to the ADC module;
s3: the ADC module transmits the audio data to the coding module through an annular buffer area after analog-to-digital (A/D) conversion;
s4: the coding module selects the corresponding algorithm for encoding according to the standard. The algorithm selection is realized through data frames: before transmission, the algorithm type is negotiated by exchanged data frames; the frame sent by the sender carries the algorithm-type information, and the receiver decides accordingly.
S5: and finally, the coding module outputs the corresponding code stream to other equipment through the network port.
Accordingly, the decoding process of the present invention is to perform the inverse process of the encoding process.
Thirdly, description of various algorithms in the invention
The coding and decoding algorithm process mainly realizes the coding and decoding functions of voice, has various voice coding and decoding modes such as CVSD (16K/32K), G.729(8K), MELP (2.4K/1.2K/0.6K) and the like, has high-quality voice quality, and can meet the requirements of various rates, various coding and decoding modes and full-duplex communication systems.
The voice coding and decoding method comprises a CVSD voice coding and decoding algorithm, a G.729 voice coding and decoding algorithm, a 2.4kbps voice algorithm, a 1.2kbps voice algorithm and a 0.6kbps voice algorithm, wherein the algorithms are as follows:
1. CVSD voice coding and decoding algorithm
Among the many voice coding modulations, continuously variable slope delta modulation (CVSD) is one kind of delta modulation. It belongs to differential waveform quantization, codes only one bit per sample, needs no code-pattern synchronization between sender and receiver, and its step size δ automatically tracks signal changes, so it is highly resistant to bit errors.
Dedicated CVSD encoder chips are available on the market, but their universality, flexibility and extensibility are very limited, and product development cycles and costs are high. A dedicated CVSD encoder can realize only one channel of coding and decoding, so multi-channel CVSD requires multiple dedicated chips, which is a limitation.
CVSD is a delta modulation mode in which the magnitude of the step delta varies continuously with the average slope of the input speech signal, as shown in fig. 4. The working principle is as follows: approximating the speech signal by a plurality of line segments with continuously variable slopes, wherein when the slope of the line segment is positive, the corresponding digital code is 1; when the slope of a line segment is negative, the corresponding number is encoded as 0.
When the CVSD operates in the encoding mode, the flow is shown in fig. 5. CVSD tracks signal changes to reduce grain noise and slope overload distortion by constantly changing the magnitude of the step δ, which is based on the past 3 or 4 sample outputs.
1) When f(n) > g(n), the comparator outputs e(n) > 0, the digital code y(n) = 1, and the integrator outputs
g(n) = g(n-1) + δ
2) When f(n) < g(n), the comparator outputs e(n) < 0, the digital code y(n) = 0, and the integrator outputs
g(n) = g(n-1) - δ
When the CVSD operates in the decoding mode, the process is shown in fig. 6, the decoding is to determine the received digital code y (n), the integrator outputs a rising value when receiving a "1" code, the integrator outputs a falling value when receiving a "0" code, and the output rises (or falls) when continuously receiving "1" codes (or "0" codes), so that the input signal can be approximately recovered.
1) When y(n) = 1, the integrator outputs g(n) = g(n-1) + δ.
2) When y(n) = 0, the integrator outputs g(n) = g(n-1) - δ.
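The CVSD rules above can be sketched end to end: emit y(n) = 1 when the input is above the integrator estimate, grow δ when the last few bits agree (slope overload), shrink it otherwise (granular noise), and let the decoder mirror the same adaptation. The numeric constants (step bounds, adaptation increment, history length) are illustrative assumptions, not values from the patent:

```python
def _adapt(delta, recent, d_min, d_max, step, history):
    """Syllabic companding: if the last `history` bits all agree the
    estimate is slope-overloaded, so enlarge delta; otherwise shrink."""
    if len(recent) == history and len(set(recent)) == 1:
        return min(d_max, delta + step)
    return max(d_min, delta - step)

def cvsd_encode(samples, d_min=0.01, d_max=1.0, step=0.05, history=3):
    bits, g, delta, recent = [], 0.0, d_min, []
    for f in samples:
        y = 1 if f > g else 0          # comparator: f(n) vs g(n)
        bits.append(y)
        recent = (recent + [y])[-history:]
        delta = _adapt(delta, recent, d_min, d_max, step, history)
        g = g + delta if y == 1 else g - delta  # g(n) = g(n-1) +/- delta
    return bits

def cvsd_decode(bits, d_min=0.01, d_max=1.0, step=0.05, history=3):
    """Integrate up on '1', down on '0', adapting delta from the same
    bit history as the encoder, so g(n) tracks the original input."""
    out, g, delta, recent = [], 0.0, d_min, []
    for y in bits:
        recent = (recent + [y])[-history:]
        delta = _adapt(delta, recent, d_min, d_max, step, history)
        g = g + delta if y == 1 else g - delta
        out.append(g)
    return out
```

Because encoder and decoder derive δ from the same bit history, no step-size side information needs to be transmitted, which is the property the text credits for CVSD's one-bit-per-sample rate.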
2. G.729 speech coding and decoding algorithm
In March 1996, ITU-T published G.729, an 8 kbps Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP) speech coding scheme. The scheme is characterized by a hybrid analysis window; two-stage vector quantization of the LSP (Line Spectral Pair) parameters; per-subframe codebook search split into adaptive-codebook and algebraic-codebook searches; and pitch analysis that combines open-loop pitch analysis with the adaptive-codebook search, reducing computation, lowering the number of pitch-quantization bits and improving pitch-prediction accuracy. The algebraic-codebook algorithm is simple, requires no stored codebook, and the recovered sound quality is clear.
The encoding workflow of the g.729 algorithm is shown in fig. 7, where the input signal is first subjected to high-pass filtering preprocessing, LP analysis is performed every 10ms frame, the LP filter coefficients are calculated, and these coefficients are converted into Line Spectral Pairs (LSPs). The excitation signal is searched by the a-B-S method, with a measure of the perceptually weighted minimum of error between the original speech and the synthesized speech, and the perceptually weighted filter is constructed using unquantized LP coefficients.
Excitation parameters (fixed-codebook and adaptive-codebook parameters) are determined once per subframe (5 ms, 40 samples). The quantized and unquantized LP filter coefficients are used for subframe 2, while interpolated LP coefficients are used in subframe 1. The open-loop pitch delay is estimated every 10 ms frame from the perceptually weighted speech signal. The following operations are repeated for each subframe: (1) the target signal is calculated from the LP residual filtered by the weighted synthesis filter; (2) the impulse response of the weighted synthesis filter is calculated; (3) closed-loop pitch analysis searches values near the open-loop pitch delay using the target signal and the impulse response (i.e., the adaptive-codebook delay and gain are found); (4) the adaptive-codebook contribution is subtracted from the target signal, and the new target signal is used for the fixed-codebook search to find the optimal excitation; (5) finally, the filters are updated with the determined excitation signal.
The decoding work flow of the G.729 algorithm is shown in FIG. 8: firstly, parameter numbers are extracted from a received code stream, and the numbers are decoded to obtain coding parameters corresponding to a 10ms voice frame. These parameters are the LSP parameters, two fractional pitch delays, two fixed codevectors and two sets of adaptive and fixed codeword gains. The LSP parameters per sub-frame are interpolated and converted to LPC filter coefficients, and then processed every 5ms sub-frame as follows: firstly, respectively multiplying self-adaptive code words and fixed code words by respective gains and adding to form excitation; secondly, exciting the LPC synthesis filter to reconstruct voice; and thirdly, the reconstructed voice signal is subjected to post-processing, including long-time post-filtering, short-time comprehensive filtering and high-pass filtering.
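The first two decoding steps above (forming the excitation from the gain-scaled codewords, then driving the LPC synthesis filter) can be illustrated with toy vectors. The helper names, codeword vectors and filter coefficients are assumptions for illustration, not G.729's actual tables:

```python
def build_excitation(adaptive_cw, fixed_cw, g_p, g_c):
    """Step one above: the excitation is the adaptive codeword scaled by
    its gain g_p plus the fixed codeword scaled by its gain g_c."""
    return [g_p * v + g_c * c for v, c in zip(adaptive_cw, fixed_cw)]

def lpc_synthesize(excitation, a):
    """Step two: an all-pole LPC synthesis filter 1/A(z),
    s(n) = u(n) - sum_k a[k] * s(n-1-k). Coefficients `a` are toy
    values; a real decoder derives them from the interpolated LSPs."""
    s = []
    for n, u in enumerate(excitation):
        acc = u
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * s[n - 1 - k]
        s.append(acc)
    return s

# Toy subframe: unit pulses as codewords, a single-tap synthesis filter.
exc = build_excitation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5, 0.2)
speech = lpc_synthesize(exc, [-0.5])
```

The post-processing of step three (long-term post-filter, short-term synthesis filter, high-pass filter) would follow the same pattern of chained filters and is omitted here.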
3. 2.4kbps speech algorithm
The encoding of the MELP algorithm is shown in fig. 9; the whole algorithm divides into parameter extraction and parameter quantization. Parameter extraction in the MELP coder comprises pitch extraction, band-pass unvoiced/voiced analysis, line spectral frequency (LSF) parameter extraction, gain estimation and Fourier spectral magnitude extraction; these parts are correlated, and one part may use the results of the others in its calculation. The parameter-quantization part of the MELP coder is characterized by multi-stage vector quantization, with excellent quantization performance, an effectively reduced number of bits for LSF quantization, and low computational complexity.
The MELP encoding process is as follows: the digitized speech signal is passed through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; multi-band mixed-excitation unvoiced/voiced decisions are then made so that the pitch signal can be extracted accurately; linear prediction mainly covers analysis of the input speech and of the residual signal; when the periodicity of a voiced segment is poor, an aperiodic flag directs the decoding end to use an excitation source adapted to unstable vocal-fold pulses; the relevant parameters are quantized with a four-stage-codebook fast-search vector quantization algorithm according to the minimum perceptually weighted distortion criterion; and the error-correction-coded bitstream is packed and transmitted.
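As an illustration of the high-pass preprocessing idea only (the codec specifies a fourth-order Chebyshev filter; the first-order DC blocker below is a simple stand-in, not the standard's filter):

```python
def dc_block(x, r=0.99):
    # y[n] = x[n] - x[n-1] + r*y[n-1]: passes speech, rejects DC offset.
    y, prev_x, prev_y = [], 0.0, 0.0
    for s in x:
        cur = s - prev_x + r * prev_y
        y.append(cur)
        prev_x, prev_y = s, cur
    return y

# A constant (pure DC) input decays geometrically toward zero.
out = dc_block([1.0] * 200)
```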
Decoding of the MELP algorithm is shown in FIG. 10. MELP synthesizes speech with a speech-production model that better matches the human articulation mechanism, and post-processes the synthesized speech with adaptive spectral enhancement and pulse dispersion filtering, improving how closely the synthesized speech matches the analyzed speech and thus yielding higher reconstructed speech quality.
The decoder unpacks the received bitstream and arranges the bits in parameter order; decoding then proceeds: the overall process covers data unpacking and mixed-excitation signal generation, after which a series of methods is applied to the mixed excitation to improve the quality of the synthesized speech; finally, the synthesized speech is obtained.
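The unpack step can be sketched as fixed-width field slicing; the field names and widths below are illustrative placeholders, not the codec's actual bit allocation:

```python
# Hypothetical field widths, MSB-first, in transmission order.
FIELDS = [("lsf", 25), ("pitch", 7), ("bandpass_uv", 4), ("gain", 8)]

def pack_frame(params):
    bits = []
    for name, width in FIELDS:
        bits += [(params[name] >> (width - 1 - i)) & 1 for i in range(width)]
    return bits

def unpack_frame(bits):
    params, pos = {}, 0
    for name, width in FIELDS:
        value = 0
        for b in bits[pos:pos + width]:     # rebuild the integer MSB-first
            value = (value << 1) | b
        params[name] = value
        pos += width
    return params

frame = {"lsf": 123456, "pitch": 99, "bandpass_uv": 5, "gain": 200}
```

Packing then unpacking must round-trip exactly; in the real decoder each recovered index would next be inverse-quantized to a parameter value.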
4. 1.2kbps speech algorithm
As shown in FIGS. 11 and 12, the 1.2 kbps speech coding algorithm builds on 2.4 kbps MELP. To further reduce the bit rate, multi-frame joint coding is adopted: three consecutive frames form a superframe for coding, and each frame within the superframe is called a subframe. Each subframe is 22.5 ms long (180 samples), so each superframe is 67.5 ms. The superframe is classified into different states according to the unvoiced/voiced (U/V) attributes of its three subframes, and each state uses a different bit-allocation scheme. The parameters of each subframe in the superframe are computed as in the 2.4 kbps algorithm; to improve quality, the 1.2 kbps algorithm adds two modules, pitch smoothing and band-pass voicing-strength smoothing, during parameter estimation.
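The superframe state selection described above can be sketched as follows: the three subframes' U/V flags index one of eight states, each mapped to its own bit allocation (the table below is purely illustrative, not the actual scheme):

```python
# Hypothetical bit-allocation table: states with two or more voiced
# subframes get a larger budget in this illustration.
BITS_FOR_STATE = {s: (54 if bin(s).count("1") >= 2 else 40) for s in range(8)}

def superframe_state(uv_flags):
    # uv_flags: (v1, v2, v3); 1 = voiced subframe, 0 = unvoiced.
    assert len(uv_flags) == 3
    return (uv_flags[0] << 2) | (uv_flags[1] << 1) | uv_flags[2]

state = superframe_state((1, 0, 1))   # voiced-unvoiced-voiced superframe
```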
5. 0.6kbps speech algorithm
Regarding parameter extraction in the 600 bps speech coder: because the algorithm is an improvement on MELP, the extraction method is the same, but to reduce the coding rate only the four parameters most important to speech intelligibility are retained: line spectral frequencies, the unvoiced/voiced decision, the pitch period, and the gain. Three consecutive frames are encoded as a superframe, each frame within it being called a subframe. Each subframe is 25 ms long (200 samples), each superframe is 75 ms, and quantization uses 45 bits. The encoding of these four parameters proceeds as follows.
The encoding side of the 0.6 kbps speech algorithm is shown in FIG. 13. The algorithm can be divided into two parts, parameter extraction and parameter quantization. Parameter extraction in the encoder comprises four parts: pitch extraction, band-pass unvoiced/voiced analysis, line spectral frequency (LSF) parameter extraction, and gain estimation.
The encoding process is as follows: the digitized speech signal is passed through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; the preprocessed speech is then fed to four modules: linear predictive analysis, band-pass voicing-strength analysis, pitch detection, and gain analysis.
After these four modules, the parameter vector of the superframe is obtained, and a suitable quantization scheme is selected for quantization coding. The resulting 45-bit speech-coded data frame is then output to the coding channel. The decoder synthesizes speech with a speech-production model that better matches the human articulation mechanism, and post-processes the synthesized speech with adaptive spectral enhancement and pulse dispersion filtering, improving the match between the synthesized and analyzed speech and yielding higher reconstructed speech quality.
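Packing one superframe's four quantization indices into the 45-bit frame can be sketched as below; the per-parameter split is a hypothetical example, not the algorithm's actual allocation:

```python
# Hypothetical bit budget summing to the 45 bits stated above.
ALLOC = {"lsf": 27, "uv_decision": 4, "pitch": 8, "gain": 6}

def pack45(indices):
    # Concatenate the index fields into a single 45-bit word, in ALLOC order.
    assert sum(ALLOC.values()) == 45
    word = 0
    for name, width in ALLOC.items():
        assert 0 <= indices[name] < (1 << width)
        word = (word << width) | indices[name]
    return word

idx = {"lsf": 100_000, "uv_decision": 9, "pitch": 200, "gain": 33}
word = pack45(idx)
```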
As shown in FIG. 14, on receiving the bitstream from the channel, the decoding end first unpacks it, arranges the bits in parameter order, and separates the coded bits of each parameter. The coded bits of each parameter are then sent to the parameter-decoding module, where each parameter is decoded with the appropriate inverse quantization to obtain the four parameters of the whole superframe: line spectral frequencies, band-pass unvoiced/voiced decisions, pitch period, and gain. An excitation signal is then formed from the pitch period, residual harmonic magnitudes, and band-pass voicing decisions, and spectral enhancement is applied to the generated excitation using the line spectral frequencies. Finally, speech synthesis is applied to the excitation using the line spectral frequencies and gain, producing and outputting two frames of synthesized speech.
The invention has the following characteristics: 1. The speech codec rate can vary from 600 bps to 32000 bps. 2. Multiple speech algorithms are optimized on the basis of the standards. 3. Multiple speech coding/decoding modes are realized digitally in an embedded system and can be switched freely.
The software radio multi-standard voice coder-decoder provided by the invention has the following physical characteristics:
(1) External dimensions: 35 × 35 × 5 mm (width × depth × height) (+0.01 mm); weight no greater than 50 g.
(2) Ambient temperature requirements:
Operating temperature: −40°C to +85°C.
Storage temperature: −55°C to +125°C.
(3) Operating voltage and frequency:
The operating voltage is 3.3 V and the operating frequency is 600 MHz.
(4) Application environment: suitable for both internal-field and external-field environments.
The invention uses microsystem SiP (System-in-Package) technology: small components are given a second-level package, yielding a product of still smaller volume; microsystem SiP packaging demands higher process capability and technical expertise.

Claims (9)

1. A software radio multi-standard speech codec, characterized in that: the hardware uses an embedded system platform comprising an encoding module and a decoding module, and encoding/decoding of multiple speech standards (CVSD, G.729, MELP) is realized by an application consisting of a main process, an audio process, and a codec-algorithm process; the main process provides the communication port for the user, coordinates the work between the audio process and the codec-algorithm process, and handles the input of configuration parameters; the audio process provides volume adjustment, switching between MIC and line inputs, and the recording and playback interfaces; the codec-algorithm process provides multiple algorithms implementing the encoding and decoding processing, the use of the configuration parameters, and the interface to the audio process.
2. The software radio multi-standard speech codec according to claim 1, characterized in that the encoding and decoding steps are mutually inverse, and the encoding processing comprises the following steps: S1: the main process sets the input configuration parameters, forming the working parameters of the codec-algorithm process; S2: the adjustable-gain amplifier in the audio process receives the captured audio data, amplifies it, and sends it to the ADC module of the audio process; S3: the ADC module of the audio process converts the audio data from analog to digital and transfers it through a ring buffer to the encoding module of the codec-algorithm process; S4: the encoding module of the codec-algorithm process selects the algorithm corresponding to the chosen standard and performs the encoding; S5: the encoding module of the codec-algorithm process finally outputs the corresponding encoded bitstream to other devices through the network port.
3. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a CVSD encoding algorithm which, during encoding, tracks changes in the signal by continually changing the step size δ so as to reduce granular noise and slope-overload distortion, the step-size adjustment of δ being based on the past 3 or 4 output samples: 1) when f(n) > g(n), the comparator output e(n) > 0, the digital code y(n) = 1, and the integrator output g(n) = g(n−1) + δ; 2) when f(n) ≤ g(n), the comparator output e(n) < 0, the digital code y(n) = 0, and the integrator output g(n) = g(n−1) − δ.
4. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a CVSD decoding algorithm which, during decoding, judges the received digital code y(n): each received "1" raises the integrator output by one step and each received "0" lowers it by one step, so that consecutive "1"s make the output rise continuously and consecutive "0"s make it fall continuously, thereby recovering the input signal: 1) when y(n) = 1, the integrator output g(n) = g(n−1) + δ; 2) when y(n) = 0, the integrator output g(n) = g(n−1) − δ.
5. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a G.729 encoding algorithm in which, during encoding, the input signal is first preprocessed by high-pass filtering; LP analysis is performed once per 10 ms frame to compute the LP filter coefficients, which are converted to line spectrum pairs (LSPs); the excitation is searched by the analysis-by-synthesis (A-B-S) method, minimizing the perceptually weighted error between the original and synthesized speech, the perceptual weighting filter being constructed from the unquantized LP coefficients; the excitation parameters are determined once per subframe, the quantized and unquantized LP filter coefficients are used for subframe 2 while interpolated LP coefficients are used in subframe 1, and the open-loop pitch delay is estimated once per 10 ms frame from the perceptually weighted speech; the following is repeated for each subframe: (1) the target signal is computed from the LP residual filtered through the weighted synthesis filter; (2) the impulse response of the weighted synthesis filter is computed; (3) closed-loop pitch analysis is performed with the target signal and the impulse response, searching values around the open-loop pitch delay; (4) the adaptive-codebook contribution is subtracted from the target signal, and the new target signal is used in the fixed-codebook search for the optimal excitation; (5) finally, the filter is updated with the determined excitation signal.
6. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a G.729 decoding algorithm in which parameter indices are first extracted from the received bitstream and decoded to obtain the coding parameters of a 10 ms speech frame: the LSP parameters, two fractional pitch delays, two fixed-codebook vectors, and two sets of adaptive- and fixed-codebook gains; the LSP parameters are interpolated per subframe and converted to LPC filter coefficients, and each 5 ms subframe is processed as follows: (1) the adaptive and fixed codewords are multiplied by their respective gains and summed to form the excitation; (2) the excitation drives the LPC synthesis filter to reconstruct the speech; (3) the reconstructed speech signal is post-processed, including long-term postfiltering, short-term synthesis filtering, and high-pass filtering.
7. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a 2.4 kbps encoding algorithm: the digitized speech signal passes through a fourth-order Chebyshev high-pass filter to remove DC and power-line interference; multi-band mixed-excitation unvoiced/voiced decisions are then made so that the pitch signal can be extracted accurately; linear prediction mainly covers analysis of the input speech and of the residual signal; when the periodicity of a voiced segment is poor, an aperiodic flag directs the decoding end to use an excitation source adapted to unstable vocal-fold pulses; the relevant parameters are quantized with a four-stage-codebook fast-search vector quantization algorithm according to the minimum perceptually weighted distortion criterion; and the error-correction-coded bitstream is packed and sent.
8. The software radio multi-standard speech codec according to claim 7, characterized in that the codec-algorithm process provides a 1.2 kbps encoding algorithm which, compared with the 2.4 kbps encoding algorithm, differs in that linear prediction removes only the intra-frame correlation, reducing the bit rate.
9. The software radio multi-standard speech codec according to claim 1, characterized in that the codec-algorithm process provides a 0.6 kbps encoding algorithm in which encoding is divided into parameter extraction and parameter quantization, the encoder's parameter extraction comprising four parts: pitch extraction, band-pass unvoiced/voiced analysis, line spectral frequency parameter extraction, and gain estimation; during decoding, the received bitstream is first unpacked, arranged in parameter order, and the coded bits of each parameter are separated; the coded bits of each parameter are then sent to the parameter-decoding module and decoded by the appropriate inverse quantization to obtain the four parameters of the whole superframe: line spectral frequencies, band-pass unvoiced/voiced decisions, pitch period, and gain; finally, an excitation signal is formed from the pitch period, residual harmonic magnitudes, and band-pass voicing decisions, spectral enhancement is applied to the generated excitation using the line spectral frequencies, and speech synthesis is applied to the input excitation using the line spectral frequencies and gain, producing and outputting two frames of synthesized speech.
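The CVSD update rules stated in claims 3 and 4 can be sketched directly; a fixed step size δ is used here, whereas the claimed codec adapts δ from the last 3-4 output bits (that syllabic adaptation is omitted for brevity):

```python
def cvsd_encode(samples, delta=0.1):
    # Claim 3: compare input f(n) with integrator g(n); emit 1/0 and step g.
    g, bits = 0.0, []
    for f in samples:
        if f > g:                 # e(n) > 0  ->  y(n) = 1, g(n) = g(n-1) + δ
            bits.append(1)
            g += delta
        else:                     # e(n) <= 0 ->  y(n) = 0, g(n) = g(n-1) - δ
            bits.append(0)
            g -= delta
    return bits

def cvsd_decode(bits, delta=0.1):
    # Claim 4: each "1" raises the integrator by δ, each "0" lowers it by δ.
    g, out = 0.0, []
    for y in bits:
        g += delta if y == 1 else -delta
        out.append(g)
    return out

bits = cvsd_encode([0.05, 0.15, 0.25, 0.2, 0.1])
```

The decoded staircase tracks the encoder input to within one step size, which is exactly the granular-noise bound the adaptive δ of the full codec is designed to shrink.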
CN202011452195.5A 2020-12-10 2020-12-10 Software radio multi-system voice coder-decoder Pending CN112614495A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011452195.5A CN112614495A (en) 2020-12-10 2020-12-10 Software radio multi-system voice coder-decoder


Publications (1)

Publication Number Publication Date
CN112614495A true CN112614495A (en) 2021-04-06

Family

ID=75234476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011452195.5A Pending CN112614495A (en) 2020-12-10 2020-12-10 Software radio multi-system voice coder-decoder

Country Status (1)

Country Link
CN (1) CN112614495A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113542401A (en) * 2021-07-13 2021-10-22 北京太极疆泰科技发展有限公司 Voice communication method based on Lora technology
CN115002751A (en) * 2022-05-27 2022-09-02 立讯电子科技(昆山)有限公司 Encryption and decryption method and encryption and decryption earphone
CN115294952A (en) * 2022-05-23 2022-11-04 神盾股份有限公司 Audio processing method and device, and non-transitory computer readable storage medium
CN117793077A (en) * 2024-02-23 2024-03-29 中国电子科技集团公司第三十研究所 Communication system and soft-hard volume adjusting method thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005315973A (en) * 2004-04-27 2005-11-10 Seiko Epson Corp Semiconductor integrated circuit
CN101506876A (en) * 2006-06-21 2009-08-12 哈里公司 Vocoder and associated method that transcodes between mixed excitation linear prediction (melp) vocoders with different speech frame rates
CN106098072A (en) * 2016-06-02 2016-11-09 重庆邮电大学 A kind of 600bps very low speed rate encoding and decoding speech method based on MELP


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MENG LI: "Design and Implementation of a Multi-Service Speech Processing Platform", China Master's Theses Full-text Database, Information Science and Technology, pages 5-21 *
WANG GUOWEN; ZHAO GENG; FANG XIAO et al.: "Research and Improvement of the MELP Low-Bit-Rate Digital Speech Algorithm", Proceedings of the 16th National Youth Conference on Communications (Vol. I), pages 80-82 *


Similar Documents

Publication Publication Date Title
US8346544B2 (en) Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision
US8032369B2 (en) Arbitrary average data rates for variable rate coders
CN1223989C (en) Frame erasure compensation method in variable rate speech coder
KR100804461B1 (en) Method and apparatus for predictively quantizing voiced speech
CN1158647C (en) Spectral magnetude quantization for a speech coder
US11282530B2 (en) Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
CN112614495A (en) Software radio multi-system voice coder-decoder
US8090573B2 (en) Selection of encoding modes and/or encoding rates for speech compression with open loop re-decision
FI113571B (en) speech Coding
JP4270866B2 (en) High performance low bit rate coding method and apparatus for non-speech speech
EP3338282A1 (en) High-band target signal control
CN1188832C (en) Multipulse interpolative coding of transition speech frames
US6678649B2 (en) Method and apparatus for subsampling phase spectrum information
KR20020033737A (en) Method and apparatus for interleaving line spectral information quantization methods in a speech coder
Drygajilo Speech Coding Techniques and Standards
KR100389898B1 (en) Quantization Method of Line Spectrum Pair Coefficients in Speech Encoding
HK40036813A (en) Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
HK40011418B (en) Method, device and computer-readable non-transitory memory for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
HK40011418A (en) Method, device and computer-readable non-transitory memory for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
Gardner et al. Survey of speech-coding techniques for digital cellular communication systems
Skog et al. Voice over IP application on TMS320C6701 EVM DSP Board
HK1091583B (en) Method and apparatus for subsampling phase spectrum information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210406