
CN110556121A - Frequency band extension method, device, electronic equipment and computer readable storage medium - Google Patents

Info

Publication number
CN110556121A
Authority
CN
China
Prior art keywords
spectrum
frequency
sub
low
frequency spectrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910882470.8A
Other languages
Chinese (zh)
Other versions
CN110556121B (en
Inventor
肖玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202410194890.8A priority Critical patent/CN117975976A/en
Priority to CN201910882470.8A priority patent/CN110556121B/en
Publication of CN110556121A publication Critical patent/CN110556121A/en
Application granted granted Critical
Publication of CN110556121B publication Critical patent/CN110556121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmitters (AREA)

Abstract

The invention provides a frequency band extension method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: performing low-pass filtering on a first narrowband signal to be processed to obtain a second narrowband signal; determining a low-frequency spectrum of the second narrowband signal; obtaining a target high-frequency spectrum based on the low-frequency spectrum; and obtaining a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum. Because the second narrowband signal is obtained by low-pass filtering the first narrowband signal, it contains no aliasing components. The wideband signal derived from the low-frequency spectrum of the second narrowband signal is therefore unaffected by aliasing, and its quality is better.

Description

Frequency band extension method, device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of audio signal processing technologies, and in particular, to a frequency band extension method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Band extension, which may also be referred to as band replication, is a classic technique in the field of audio coding. It is a parametric coding technique that extends the effective bandwidth at the receiving end, improving the quality of the audio signal so that the user perceives a brighter tone, greater volume, and better intelligibility.
In the prior art, an audio signal requiring band extension usually contains a large amount of aliasing, i.e., signal components whose spectra overlap. When band extension is performed on such a signal, the aliasing is carried into the extended wideband signal, so the signal quality of the wideband signal is poor.
Disclosure of Invention
Embodiments of the present invention provide a frequency band extension method and apparatus, an electronic device, and a computer-readable storage medium, so as to overcome at least one technical defect of the prior art and better meet the requirements of practical applications. The technical solutions provided by the embodiments of the invention are as follows:
In a first aspect, an embodiment of the present invention provides a frequency band extension method, the method comprising:
performing low-pass filtering on a first narrowband signal to be processed to obtain a second narrowband signal;
determining a low-frequency spectrum of the second narrowband signal;
obtaining a target high-frequency spectrum based on the low-frequency spectrum; and
obtaining a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum.
In a second aspect, an embodiment of the present invention provides a frequency band extension apparatus, comprising:
a second narrowband signal determination module, configured to perform low-pass filtering on a first narrowband signal to be processed to obtain a second narrowband signal;
a low-frequency spectrum determination module, configured to determine a low-frequency spectrum of the second narrowband signal;
a high-frequency spectrum determination module, configured to obtain a target high-frequency spectrum based on the low-frequency spectrum; and
a wideband signal determination module, configured to obtain a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum.
In an optional embodiment of the second aspect, when obtaining the target high-frequency spectrum based on the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
input the low-frequency spectrum into a neural network model and obtain a correlation parameter based on the output of the neural network model, wherein the correlation parameter characterizes the correlation between the high-frequency part and the low-frequency part of a target wideband spectrum and comprises a high-frequency spectral envelope; and
obtain the target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum.
In an optional embodiment of the second aspect, when inputting the low-frequency spectrum into the neural network model, the high-frequency spectrum determination module is specifically configured to:
determine a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum; and
input the low-frequency spectrum and the low-frequency spectral envelope into the neural network model.
In an optional embodiment of the second aspect, the apparatus further comprises:
a low-frequency spectrum processing module, configured to divide the low-frequency spectrum into a first number of sub-spectra and to obtain a sub-spectral envelope for each sub-spectrum based on the logarithmic values of the spectral coefficients in that sub-spectrum, wherein the low-frequency spectral envelope comprises the first number of sub-spectral envelopes so determined.
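The sub-spectral envelope computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `sub_spectral_envelopes`, the equal-width split, the choice of the mean natural-log magnitude as the envelope value, and the small `eps` guard against log(0) are all hypothetical.

```python
import math

def sub_spectral_envelopes(spectrum, num_sub, eps=1e-9):
    """Return one log-domain envelope value per sub-spectrum.

    Assumptions (not taken from the source): sub-spectra have equal width,
    and each envelope is the mean natural log of the coefficient magnitudes.
    """
    if num_sub <= 0 or len(spectrum) % num_sub != 0:
        raise ValueError("spectrum length must be a positive multiple of num_sub")
    width = len(spectrum) // num_sub
    envelopes = []
    for k in range(num_sub):
        band = spectrum[k * width:(k + 1) * width]
        # eps keeps the logarithm finite for zero-valued coefficients.
        envelopes.append(sum(math.log(abs(c) + eps) for c in band) / width)
    return envelopes

# Example: eight coefficients split into two sub-spectra of four coefficients.
low_env = sub_spectral_envelopes([1.0] * 4 + [math.e] * 4, num_sub=2)
```

Working in the logarithmic domain means that a later envelope difference corresponds to a multiplicative gain on the spectral coefficients.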
In an optional embodiment of the second aspect, when performing low-pass filtering on the first narrowband signal to be processed to obtain the second narrowband signal, the second narrowband signal determination module is specifically configured to:
up-sample the first narrowband signal by a sampling factor equal to a first preset value to obtain an up-sampled signal;
low-pass filter the up-sampled signal through a filter to obtain a filtered signal; and
down-sample the filtered signal by a sampling factor equal to a second preset value to obtain the second narrowband signal, wherein the second preset value is determined based on the number of filtering channels of the filter.
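The up-sample, low-pass filter, down-sample chain above can be illustrated with a short sketch. The filter design is an assumption: this section does not specify the filter, so a Hamming-windowed sinc FIR is used purely as a stand-in, and the names `upsample`, `lowpass_fir`, `fir_filter`, and `resample_lowpass`, as well as the default tap count and cutoff, are hypothetical.

```python
import math

def upsample(x, factor):
    """Insert factor-1 zeros between consecutive samples (zero stuffing)."""
    out = [0.0] * (len(x) * factor)
    out[::factor] = x
    return out

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass taps with unity DC gain.

    cutoff is in cycles per sample (0 < cutoff < 0.5). This design is an
    assumption made for the sketch, not the filter of the embodiment.
    """
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2.0 * cutoff if t == 0 else math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [h / total for h in taps]

def fir_filter(x, taps):
    """Causal direct-form FIR filtering with zero initial state."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        out.append(acc)
    return out

def resample_lowpass(x, up, down, num_taps=31, cutoff=0.2):
    """Up-sample by `up`, low-pass filter, then keep every `down`-th sample."""
    stuffed = upsample(x, up)
    # Zero stuffing scales the baseband amplitude by 1/up, so compensate by `up`.
    filtered = [up * v for v in fir_filter(stuffed, lowpass_fir(num_taps, cutoff))]
    return filtered[::down]

second_narrowband = resample_lowpass([1.0] * 64, up=2, down=2)
```

With `up` equal to `down`, the output keeps the input sampling rate while everything above the filter cutoff is attenuated, which is the role the low-pass stage plays in removing aliasing before the spectrum is computed.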
In an optional embodiment of the second aspect, when determining the low-frequency spectrum of the second narrowband signal, the low-frequency spectrum determination module is specifically configured to:
perform a discrete cosine transform on the second narrowband signal to obtain the low-frequency spectrum of the second narrowband signal.
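As an illustration of the time-to-frequency step, the sketch below implements an orthonormal DCT-II and its inverse in plain Python. The specific DCT variant, the scaling, and the function names `dct2` and `idct2` are assumptions; this section only states that a discrete cosine transform is used.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (direct O(N^2) evaluation)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct2(coeffs):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            acc += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(acc)
    return out

frame = [0.5, -1.0, 2.0, 3.25]
low_spectrum = dct2(frame)          # frequency-domain coefficients
reconstructed = idct2(low_spectrum)  # matches `frame` up to rounding error
```

The orthonormal scaling is chosen so that the same pair of functions can serve for the later frequency-to-time conversion without extra gain factors.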
In an optional embodiment of the second aspect, at least one of the low-frequency spectrum or the target high-frequency spectrum is obtained by filtering a corresponding initial spectrum.
In an optional embodiment of the second aspect, the apparatus further comprises:
a first filtering module, configured to filter the initial spectrum by:
determining a first filtering gain based on the spectral energy of the initial spectrum, and filtering the initial spectrum according to the first filtering gain.
In an optional embodiment of the second aspect, when determining the first filtering gain based on the spectral energy of the initial spectrum and filtering the initial spectrum according to the first filtering gain, the first filtering module is specifically configured to:
divide the initial spectrum into a first set number of sub-spectra and determine a first spectral energy corresponding to each sub-spectrum;
determine a second filtering gain corresponding to each sub-spectrum based on the first spectral energy of that sub-spectrum, wherein the first filtering gain comprises the first set number of second filtering gains; and
filter each sub-spectrum based on its corresponding second filtering gain.
In an optional embodiment of the second aspect, when determining the second filtering gain corresponding to each sub-spectrum based on the first spectral energy corresponding to each sub-spectrum, the first filtering module is specifically configured to:
divide the frequency band corresponding to the initial spectrum into a first sub-band and a second sub-band;
determine a first sub-band energy of the first sub-band from the first spectral energies of all sub-spectra in the first sub-band, and a second sub-band energy of the second sub-band from the first spectral energies of all sub-spectra in the second sub-band;
determine a spectral tilt coefficient of the initial spectrum from the first sub-band energy and the second sub-band energy; and
determine the second filtering gain corresponding to each sub-spectrum from the spectral tilt coefficient and the first spectral energy of that sub-spectrum.
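The first three steps above (per-sub-spectrum energies, two sub-band energies, and a tilt coefficient) can be sketched as below. The formulas are assumptions: this section gives no expression for the spectral tilt coefficient, so a base-10 log-energy ratio is used only as a plausible stand-in, and the mapping from tilt to filtering gains is deliberately left out because it is not specified here. The names `sub_spectrum_energies` and `spectral_tilt` are hypothetical.

```python
import math

def sub_spectrum_energies(spectrum, num_sub):
    """Sum of squared coefficients in each equal-width sub-spectrum."""
    if num_sub <= 0 or len(spectrum) % num_sub != 0:
        raise ValueError("spectrum length must be a positive multiple of num_sub")
    width = len(spectrum) // num_sub
    return [sum(c * c for c in spectrum[k * width:(k + 1) * width])
            for k in range(num_sub)]

def spectral_tilt(energies, split, eps=1e-12):
    """Hypothetical tilt measure: log10 ratio of the two sub-band energies.

    energies[:split] forms the first sub-band, energies[split:] the second.
    Positive values mean the first (lower) sub-band carries more energy.
    """
    first = sum(energies[:split])
    second = sum(energies[split:])
    return math.log10((first + eps) / (second + eps))

energies = sub_spectrum_energies([2.0, 2.0, 1.0, 1.0], num_sub=2)
tilt = spectral_tilt(energies, split=1)
```

A per-sub-spectrum gain would then be derived from `tilt` and the entries of `energies`; any concrete rule for that step would be a further assumption.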
In an optional embodiment of the second aspect, when the first narrowband signal is the speech signal of a current speech frame, the first filtering module, when determining the first spectral energy of a sub-spectrum, is specifically configured to:
determine a first initial spectral energy of the sub-spectrum;
if the current speech frame is the first speech frame, take the first initial spectral energy as the first spectral energy;
if the current speech frame is not the first speech frame, obtain a second initial spectral energy of the corresponding sub-spectrum of an associated speech frame, wherein the associated speech frame is at least one speech frame that precedes and is adjacent to the current speech frame; and
obtain the first spectral energy of the sub-spectrum based on the first initial spectral energy and the second initial spectral energy.
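The frame-to-frame rule above amounts to smoothing each sub-spectrum energy with the energy of the preceding frame. A minimal sketch follows, assuming a single associated frame and a weighted average; the weight of 0.5 and the names `first_spectral_energy` and `smooth_frames` are hypothetical, since this section does not say how the two energies are combined.

```python
def first_spectral_energy(initial_energy, previous_initial_energy=None, weight=0.5):
    """Combine the current and previous initial energies of one sub-spectrum.

    For the first speech frame there is no associated frame, so the initial
    energy is returned unchanged.
    """
    if previous_initial_energy is None:
        return initial_energy
    return weight * initial_energy + (1.0 - weight) * previous_initial_energy

def smooth_frames(initial_energies, weight=0.5):
    """Apply the rule to one sub-spectrum across consecutive frames."""
    smoothed = []
    previous = None
    for energy in initial_energies:
        smoothed.append(first_spectral_energy(energy, previous, weight))
        # The associated frame contributes its initial (unsmoothed) energy.
        previous = energy
    return smoothed

print(smooth_frames([4.0, 2.0, 6.0]))  # [4.0, 3.0, 4.0]
```

Smoothing of this kind keeps the derived filtering gains from jumping abruptly between adjacent frames.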
In an optional embodiment of the second aspect, when obtaining the target high-frequency spectrum based on the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
obtain an initial high-frequency spectrum based on the low-frequency spectrum; and
obtain the target high-frequency spectrum based on the high-frequency part of the initial high-frequency spectrum;
and when obtaining the band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum, the wideband signal determination module is specifically configured to:
determine a target low-frequency spectrum from the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum; and
obtain the band-extended wideband signal from the target low-frequency spectrum and the target high-frequency spectrum.
In an optional embodiment of the second aspect, when obtaining the band-extended wideband signal based on the target low-frequency spectrum and the target high-frequency spectrum, the wideband signal determination module is specifically configured to:
perform frequency-to-time conversion on the target low-frequency spectrum to obtain a first time-domain signal;
perform frequency-to-time conversion on the target high-frequency spectrum to obtain a second time-domain signal; and
generate the wideband signal based on the first time-domain signal and the second time-domain signal.
In an optional embodiment of the second aspect, when obtaining the target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
determine a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
generate an initial high-frequency spectrum based on the low-frequency spectrum;
determine a difference between the high-frequency spectral envelope and the low-frequency spectral envelope, both of which are spectral envelopes in the logarithmic domain; and
adjust the initial high-frequency spectrum based on the difference to obtain the target high-frequency spectrum.
In an optional embodiment of the second aspect, when generating the initial high-frequency spectrum based on the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
copy the spectrum of the high-band part of the low-frequency spectrum.
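The copy step above can be sketched as follows. The parameters are assumptions made for illustration: `copy_start` (the index where the high-band part of the low-frequency spectrum begins) and `num_high` (the number of high-frequency coefficients to fill) are hypothetical, as is the choice to repeat the copied band until the target length is reached.

```python
def initial_high_spectrum(low_spectrum, copy_start, num_high):
    """Build an initial high-frequency spectrum by copying the upper part
    of the low-frequency spectrum, repeating it as often as needed."""
    band = list(low_spectrum[copy_start:])
    if not band:
        raise ValueError("copy_start must leave at least one coefficient to copy")
    repeats = -(-num_high // len(band))  # ceiling division
    return (band * repeats)[:num_high]

low = [0, 1, 2, 3, 4, 5, 6, 7]
print(initial_high_spectrum(low, copy_start=4, num_high=8))  # [4, 5, 6, 7, 4, 5, 6, 7]
```

Copying from the upper part of the low band is a natural choice because those coefficients are spectrally closest to the band being synthesized; the copied values only provide fine structure, and their level is corrected afterwards by the envelope adjustment.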
In an optional embodiment of the second aspect, the high-frequency spectral envelope comprises a second number of first sub-spectral envelopes and the initial high-frequency spectrum comprises the second number of sub-spectra, wherein each first sub-spectral envelope is determined based on a corresponding sub-spectrum in the initial high-frequency spectrum;
when determining the difference between the high-frequency spectral envelope and the low-frequency spectral envelope and adjusting the initial high-frequency spectrum based on the difference to obtain the target high-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
determine a difference between each first sub-spectral envelope and the corresponding spectral envelope in the low-frequency spectral envelope;
adjust the corresponding sub-spectrum of the initial high-frequency spectrum based on the difference for each first sub-spectral envelope to obtain the second number of adjusted sub-spectra; and
obtain the target high-frequency spectrum based on the second number of adjusted sub-spectra.
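The per-sub-spectrum adjustment above can be sketched as a log-domain gain. The sketch assumes that both envelopes are mean natural-log magnitudes, so that an envelope difference `d` translates into a linear gain `exp(d)`; the function name `adjust_high_spectrum`, the equal-width sub-spectra, and the convention that `low_env[k]` is the low-frequency envelope matched to the k-th high-frequency sub-spectrum are hypothetical.

```python
import math

def adjust_high_spectrum(initial_high, high_env, low_env):
    """Scale each sub-spectrum of the initial high-frequency spectrum.

    high_env[k]: predicted log-domain envelope for sub-spectrum k.
    low_env[k]:  log-domain envelope of the low-frequency sub-spectrum that
                 sub-spectrum k corresponds to.
    """
    num_sub = len(high_env)
    if num_sub == 0 or len(low_env) != num_sub or len(initial_high) % num_sub != 0:
        raise ValueError("envelope counts and spectrum length are inconsistent")
    width = len(initial_high) // num_sub
    target = []
    for k in range(num_sub):
        # A difference in the natural-log domain is a multiplicative gain.
        gain = math.exp(high_env[k] - low_env[k])
        target.extend(gain * c for c in initial_high[k * width:(k + 1) * width])
    return target

target_high = adjust_high_spectrum([1.0] * 4, [math.log(2.0), 0.0], [0.0, 0.0])
```

In this example the first sub-spectrum is raised by a factor of about two while the second is left unchanged, because its two envelopes already agree.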
In an optional embodiment of the second aspect, the correlation parameter further comprises relative flatness information, which characterizes the correlation between the spectral flatness of the high-frequency part and the spectral flatness of the low-frequency part of the target wideband spectrum;
when determining the difference between the high-frequency spectral envelope and the low-frequency spectral envelope, the high-frequency spectrum determination module is specifically configured to:
determine a gain adjustment value for the high-frequency spectral envelope based on the relative flatness information and the spectral energy of the low-frequency spectrum;
adjust the high-frequency spectral envelope based on the gain adjustment value to obtain an adjusted high-frequency spectral envelope; and
determine the difference between the adjusted high-frequency spectral envelope and the low-frequency spectral envelope.
In an optional embodiment of the second aspect, the relative flatness information comprises relative flatness information for at least two sub-band regions of the high-frequency part, wherein the relative flatness information for one sub-band region characterizes the correlation between the spectral flatness of that sub-band region of the high-frequency part and the spectral flatness of the high-frequency band of the low-frequency part;
when determining the gain adjustment value for the high-frequency spectral envelope based on the relative flatness information and the spectral energy of the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
determine, based on the relative flatness information for each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum, a gain adjustment value for the part of the high-frequency spectral envelope corresponding to that sub-band region;
and when adjusting the high-frequency spectral envelope based on the gain adjustment value, the high-frequency spectrum determination module is specifically configured to:
adjust each corresponding part of the high-frequency spectral envelope based on the gain adjustment value for the spectral envelope part corresponding to each sub-band region.
In an optional embodiment of the second aspect, if the high-frequency spectral envelope comprises a second number of first sub-spectral envelopes, then when determining the gain adjustment value for the spectral envelope part corresponding to each sub-band region based on the relative flatness information for each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum, the high-frequency spectrum determination module is specifically configured to:
determine, for each first sub-spectral envelope, a gain adjustment value of the first sub-spectral envelope according to the spectral energy of the spectral envelope in the low-frequency spectral envelope that corresponds to the first sub-spectral envelope, the relative flatness information for the corresponding sub-band region, and the spectral energy for the corresponding sub-band region;
and when adjusting the corresponding spectral envelope part of the high-frequency spectral envelope based on the gain adjustment value for each sub-band region, the high-frequency spectrum determination module is specifically configured to:
adjust the corresponding first sub-spectral envelope in the high-frequency spectral envelope according to the gain adjustment value of the first sub-spectral envelope corresponding to each sub-band region.
In an optional embodiment of the second aspect, if the first narrowband signal comprises at least two correlated signals, the apparatus further comprises:
a narrowband signal determination module, configured to fuse the at least two correlated signals to obtain the first narrowband signal, or to take each of the at least two correlated signals as a first narrowband signal.
In a third aspect, an embodiment of the present invention provides an electronic device, where the electronic device includes a processor and a memory; the memory has stored therein readable instructions which, when loaded and executed by the processor, implement the method as shown in the first aspect or any one of the alternative embodiments of the first aspect described above.
In a fourth aspect, the embodiments of the present invention provide a computer-readable storage medium, in which readable instructions are stored, and when the readable instructions are loaded and executed by a processor, the method as shown in the first aspect or any optional embodiment of the first aspect is implemented.
According to the band extension scheme provided by the embodiments of the present invention, the first narrowband signal is low-pass filtered to remove its aliasing components, so the second narrowband signal contains no aliasing. The wideband signal obtained from the low-frequency spectrum of the second narrowband signal is therefore unaffected by aliasing, and its quality is better. Based on this scheme, a signal with a more resonant tone and greater volume can be obtained, giving the user a better listening experience.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly described below.
Fig. 1 is a flow chart of a frequency band extension method provided in an embodiment of the present invention;
Fig. 2 is an amplitude response diagram of the low-pass filtering provided in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the network structure of a neural network model provided in an embodiment of the present invention;
Fig. 4 is a flow chart of a frequency band extension method in an example provided in an embodiment of the present invention;
Fig. 5 is another flow chart of the frequency band extension method in the example provided in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a frequency band extension apparatus provided in an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention.
Detailed Description
To make the objects, features, and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
As used herein, the singular forms "a", "an", and "the" include plural referents unless the context clearly dictates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one of, and all combinations of, one or more of the associated listed items.
For a better understanding of the embodiments of the present invention, some technical terms used in the embodiments are briefly described below.
Band Extension (BWE): a technique in the field of audio coding for extending a narrowband signal into a wideband signal.
Frequency spectrum: short for frequency spectral density; the distribution curve of a signal over frequency.
Spectral Envelope (SE): a representation of the energy of the spectral coefficients of a signal along the frequency axis. For a sub-band, it represents the energy of the spectral coefficients in that sub-band, for example their average energy.
Spectral Flatness (SF): characterizes how flat the power of the signal under measurement is across the channel in which it lies.
Neural Network (NN): an algorithmic mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed, parallel information processing. Depending on the complexity of the system, the network processes information by adjusting the interconnections among a large number of internal nodes.
Deep Learning (DL): a type of machine learning that combines low-level features into more abstract high-level representations of attribute classes or features, in order to discover distributed feature representations of data.
PSTN (Public Switched Telephone Network): the traditional telephone system, i.e., the telephone network commonly used in daily life.
VoIP (Voice over Internet Protocol): a voice call technology that carries voice calls and multimedia conferences over the Internet Protocol, i.e., communication over the Internet.
3GPP EVS: 3GPP (3rd Generation Partnership Project) mainly defines third-generation technical specifications for radio interfaces based on GSM. The Enhanced Voice Services (EVS) codec is a new-generation audio codec that provides very high audio quality for both speech and music signals, is highly robust against frame loss and delay jitter, and can offer users a new experience.
IETF OPUS: Opus is a lossy audio coding format developed by the Internet Engineering Task Force (IETF).
SILK: a wideband audio codec that Skype licenses royalty-free to third-party developers and hardware manufacturers.
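To make the spectral flatness term defined above concrete, the sketch below uses the common textbook measure, the ratio of the geometric mean to the arithmetic mean of a power spectrum. This formula is an assumption supplied for illustration: the definition above is only qualitative, and the function name `spectral_flatness` and the `eps` guard are hypothetical.

```python
import math

def spectral_flatness(power_spectrum, eps=1e-12):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Values near 1 indicate a flat, noise-like spectrum; values near 0
    indicate a peaky, tonal spectrum.
    """
    n = len(power_spectrum)
    if n == 0:
        raise ValueError("power_spectrum must not be empty")
    # The geometric mean is computed in the log domain for numerical stability.
    log_mean = sum(math.log(p + eps) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / (arithmetic_mean + eps)

flat = spectral_flatness([1.0] * 8)            # close to 1
peaky = spectral_flatness([8.0] + [0.0] * 7)   # close to 0
```

A relative flatness measure between two bands could then be formed by comparing the flatness of the high-frequency part with that of the low-frequency part.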
Band extension is a classic technique in the field of audio coding. As the foregoing description shows, the prior art implements band extension in the following ways:
The first mode: for a narrowband signal at a low sampling rate, a portion of the low-frequency spectrum of the narrowband signal is selected and copied to the high-frequency range, and the narrowband signal is extended to a wideband signal according to side information recorded in advance (information describing the energy relationship between the high and low frequencies).
The second mode: blind band extension, which, as the name suggests, completes band extension directly without extra bits. For a narrowband signal at a low sampling rate, techniques such as neural networks or deep learning are used: the input of the network is the low-frequency spectrum of the narrowband signal, the output is the high-frequency spectrum, and the narrowband signal is extended to a wideband signal based on that high-frequency spectrum.
However, in the first mode the side information consumes bits, and there is a forward-compatibility problem. A typical scenario is interworking between PSTN (narrowband speech) and VoIP (wideband speech). In the transmission direction from PSTN to VoIP (abbreviated PSTN-VoIP), wideband speech cannot be output unless the transmission protocol is modified to add a corresponding band-extension bitstream. In the second mode, the input is the low-frequency spectrum and the output is the high-frequency spectrum. Although no extra bits are consumed, this places high demands on the generalization capability of the network; to keep the network output accurate, the network must be deep and large, which makes it complex and degrades performance. Neither of the two modes therefore meets the performance requirements of practical band extension.
To address these problems in the prior art and better meet the needs of practical applications, an embodiment of the invention provides a band extension method that eliminates the aliasing signal in a first narrowband signal, so that the extended wideband signal has better quality, no additional bits are required, and the network can be shallower, smaller, and less complex.
In the embodiment of the present invention, the scheme is described using the voice scenario of PSTN and VoIP interworking as an example, that is, narrowband voice is extended to wideband voice in the PSTN-VoIP transmission direction. In practical applications, the invention is not limited to this scenario and is also applicable to other coding systems, including but not limited to mainstream audio encoders such as 3GPP EVS, IETF OPUS, and SILK.
The following describes the technical solution of the present invention and how to solve the above technical problems with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
It should be noted that the following description of the scheme takes the PSTN and VoIP interworking voice scenario as an example, with a sampling rate of 8000 Hz and a speech frame length of 10 ms (equivalent to 80 sample points per frame). In practical applications the PSTN frame length is 20 ms, so the processing only needs to be performed twice for each PSTN frame.
In the description of the embodiments of the invention, the data frame length is fixed at 10 ms as an example. However, it is clear to those skilled in the art that the invention remains applicable when the frame length takes other values, such as 20 ms (equivalent to 160 sample points per frame), and no limitation is imposed here. Similarly, the sampling rate of 8000 Hz used in the embodiments is an example and does not limit the range of band extension provided by the embodiments. For example, although the main embodiment extends a signal with a sampling rate of 8000 Hz to a signal with a sampling rate of 16000 Hz, the invention can also be applied to other sampling rates, such as extending a signal sampled at 16000 Hz to one sampled at 32000 Hz, or a signal sampled at 8000 Hz to one sampled at 12000 Hz. The scheme of the embodiments can be applied to any scenario that requires signal band extension.
Fig. 1 shows a flowchart of the band extension method provided by the present invention. As shown in the figure, the method may include steps S110 to S140:
Step S110, low-pass filtering the first narrowband signal to be processed to obtain a second narrowband signal.
The first narrowband signal to be processed may be a speech frame signal that requires band extension. For example, in a PSTN-VoIP path, a PSTN narrowband speech signal needs to be extended to a VoIP wideband speech signal, so the first narrowband signal may be the PSTN narrowband speech signal. If the first narrowband signal is a speech frame, it may be all or part of one speech frame.
Specifically, in an actual application scenario, a signal to be processed may be treated as the first narrowband signal and band-extended in one pass, or it may be divided into several sub-signals that are processed separately. For example, since the PSTN frame length is 20 ms, the 20 ms speech frame may be band-extended in one pass, or it may be divided into two 10 ms speech frames that are band-extended separately.
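The frame arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming the 8000 Hz sampling rate of the running example; the function names `samples_per_frame` and `split_frame` are hypothetical and are not taken from the source.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate of the running PSTN example


def samples_per_frame(frame_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of sample points in a frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def split_frame(frame, sub_frame_ms=10, sample_rate_hz=SAMPLE_RATE_HZ):
    """Split one frame into consecutive sub-frames of sub_frame_ms each."""
    step = samples_per_frame(sub_frame_ms, sample_rate_hz)
    if len(frame) % step != 0:
        raise ValueError("frame length is not a multiple of the sub-frame length")
    return [frame[i:i + step] for i in range(0, len(frame), step)]


# Placeholder 20 ms PSTN frame (160 sample points), split into two 10 ms frames.
pstn_frame = list(range(samples_per_frame(20)))
sub_frames = split_frame(pstn_frame)
```

Each element of `sub_frames` can then be band-extended separately, which corresponds to performing the processing twice per PSTN frame.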
The first narrow-band signal contains an aliasing signal, the aliasing signal can be eliminated by low-pass filtering the first narrow-band signal, and the obtained second narrow-band signal is a low-frequency part signal in the first narrow-band signal. In the solution of the present invention, the low-pass filtering may be performed by using a QMF (Quadrature Mirror Filter) analysis Filter.
As an example, as shown in FIG. 2, the bandwidth of the first narrowband signal is 8kHz, wherein the bandwidth of the low frequency part is 0-4.6 kHz, and the bandwidth of the high frequency part is 3.3-8 kHz, wherein a certain aliasing exists between the low frequency part and the high frequency part. By low-pass filtering the first narrow-band signal, a second narrow-band signal with aliasing removed can be obtained. Correspondingly, in the signal synthesis stage, the corresponding QMF synthesis filter is used to perform low-pass filtering, i.e. the aliasing phenomenon can be eliminated in the signal synthesis.
Step S120, determining a low frequency spectrum of the second narrowband signal.
Step S130, a target high-frequency spectrum is obtained based on the low-frequency spectrum.
Specifically, one implementation manner of obtaining the target high-frequency spectrum based on the low-frequency spectrum may be to copy the low-frequency spectrum to obtain the target high-frequency spectrum.
Step S140, obtaining a wideband signal with a spread frequency band based on the low frequency spectrum and the target high frequency spectrum.
Specifically, the low-frequency spectrum and the target high-frequency spectrum may be combined, and the combined spectrum may be subjected to time-frequency inverse transformation, that is, frequency-time transformation, to obtain a wideband signal, thereby implementing the band extension of the first narrowband signal.
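The copy-based option for step S130 and the merging in step S140 can be sketched as follows. This is a minimal illustration, assuming each spectrum is a plain list of 70 real coefficients; the function names are hypothetical, and the frequency-time inverse transform that produces the wideband signal is deliberately left out.

```python
def copy_low_to_high(low_spectrum):
    """Target high-frequency spectrum obtained by copying the low-frequency one."""
    return list(low_spectrum)


def combine_spectra(low_spectrum, high_spectrum):
    """Wideband spectrum: low-frequency coefficients followed by high-frequency ones."""
    return list(low_spectrum) + list(high_spectrum)


# Placeholder low-frequency spectrum with 70 coefficients.
low = [0.1 * (j + 1) for j in range(70)]
high = copy_low_to_high(low)
wideband_spectrum = combine_spectra(low, high)
```

The combined list has twice as many coefficients as the low-frequency spectrum, which matches the doubling of the effective bandwidth in the example.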
Because the bandwidth of the extended wideband signal is greater than that of the first narrowband signal, speech with a brighter timbre and greater loudness can be obtained from the wideband signal, giving the user a better listening experience.
The band extension method provided by the embodiment of the invention low-pass filters the first narrowband signal to eliminate the aliasing signal it contains, so that the second narrowband signal contains no aliasing signal. The wideband signal obtained from the low-frequency spectrum of the second narrowband signal is therefore not affected by the aliasing signal and has better quality. Based on the band extension scheme of the embodiment, a signal with a brighter timbre and greater loudness can thus be obtained, giving the user a better listening experience.
To better explain the solution provided by the invention, it is described in further detail below with reference to an example. The example uses the PSTN and VoIP interworking voice scenario described above, in which the sampling rate of the voice signal is 8000 Hz and the length of one speech frame is 10 ms.
In this example, the sampling rate of the PSTN signal is 8000 Hz, so according to the Nyquist sampling theorem the effective bandwidth of the first narrowband signal is 4000 Hz. The purpose of the example is to obtain a signal with a bandwidth of 8000 Hz after band extension of the first narrowband signal, that is, the bandwidth of the wideband signal is 8000 Hz. In an actual voice communication scenario, although the effective bandwidth is nominally 4000 Hz, its upper bound is typically 3500 Hz. In this scheme, the effective bandwidth of the wideband signal actually obtained is therefore 7000 Hz: the example extends a signal with a bandwidth of 3500 Hz to a wideband signal with a bandwidth of 7000 Hz, that is, it extends a signal with a sampling rate of 8000 Hz to a signal with a sampling rate of 16000 Hz.
In an alternative embodiment of the present invention, low-pass filtering the first narrowband signal to be processed in step S110 to obtain a second narrowband signal may include:
Performing up-sampling processing on the first narrow-band signal with a sampling factor of a first preset value to obtain an up-sampled signal;
Performing low-pass filtering on the up-sampling signal through a filter to obtain a filtering signal;
And performing down-sampling processing on the filtered signal with the sampling factor being a second preset value to obtain a second narrow-band signal, wherein the second preset value is determined based on the number of filtering channels of the filter.
Specifically, the first preset value is usually 2. The low-pass filter applied to the up-sampled signal may be a multi-channel filter, and the second preset value is usually equal to the number of channels of that filter. For example, if the filter is a dual-channel filter, the second preset value is 2, that is, the signal filtered by the dual-channel filter is down-sampled with a sampling factor of 2.
Wherein if the filter is a dual-channel QMF filter, one channel of the filter low-pass filters the first narrowband signal and the other channel high-pass filters the first narrowband signal.
As an example, the sampling factor is 2 (the first preset value and the second preset value are the same and both are 2), the sampling rate of the first narrowband signal is 8000Hz, and the upsampling process with the sampling factor of 2 is performed on the first narrowband signal, so as to obtain an upsampled signal with the sampling rate of 16000 Hz. The up-sampled signal is then low-pass filtered and the filtered signal is down-sampled by a sampling factor of 2. The sampling rate of the second narrowband signal is the same as the sampling rate of the first narrowband signal and the sampling rate of the second narrowband signal is 8000 Hz.
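The up-sample, low-pass, down-sample chain in this example can be sketched as below. This is an illustrative sketch under stated assumptions, not the filter of the embodiment: it uses a single Hamming-windowed sinc FIR low-pass branch in place of a real QMF analysis bank, and the tap count (31) and the cutoff (one quarter of the up-sampled rate) are assumed values. All function names are hypothetical.

```python
import math


def upsample(signal, factor=2):
    """Zero stuffing: insert factor - 1 zeros after every sample."""
    out = [0.0] * (len(signal) * factor)
    out[::factor] = signal
    return out


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hamming-windowed sinc FIR; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalise to unity gain at DC


def fir_filter(signal, taps):
    """Causal FIR filtering; the output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def downsample(signal, factor=2):
    """Keep every factor-th sample."""
    return signal[::factor]


def lowpass_first_narrowband(first_narrowband, factor=2):
    """8000 Hz frame -> 16000 Hz -> low-pass -> back to 8000 Hz."""
    up = upsample(first_narrowband, factor)
    # Scale the taps by the factor to compensate for the zero stuffing.
    taps = [factor * t for t in lowpass_taps()]
    return downsample(fir_filter(up, taps), factor)
```

For an 80-sample frame the result again has 80 samples, so the second narrowband signal keeps the 8000 Hz sampling rate of the first, apart from the start-up transient and delay of the FIR filter.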
In an alternative embodiment of the present invention, in step S130, obtaining the target high frequency spectrum based on the low frequency spectrum may include:
Inputting the low-frequency spectrum into a neural network model and obtaining a correlation parameter based on the output of the neural network model, wherein the correlation parameter represents the correlation between the high-frequency part and the low-frequency part of the target wideband spectrum and includes a high-frequency spectral envelope;
Obtaining the target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum.
The target wideband spectrum is the spectrum of the wideband signal (the target wideband signal) to which the second narrowband signal is to be extended.
The neural network model may be a model trained in advance on sample data. Each sample datum includes a sample narrowband signal and the corresponding sample wideband signal. For each sample datum, a correlation parameter between the high-frequency part and the low-frequency part of the spectrum of the sample wideband signal can be determined; this parameter serves as the annotation of the sample datum, that is, the sample label, referred to below as the labeling result. The correlation parameter includes a high-frequency spectral envelope and may also include relative flatness information between the high-frequency part and the low-frequency part of the spectrum of the sample wideband signal. During training, the input of the initial neural network model is the low-frequency spectrum of the sample narrowband signal and the output is the predicted correlation parameter, referred to below as the prediction result. Whether training is complete can be judged from the similarity between the prediction result and the labeling result of each sample datum, for example by checking whether the loss function of the model has converged, where the loss function represents the degree of difference between the prediction result and the labeling result of each sample datum. The model obtained when training is complete is used as the neural network model in the embodiments of this application.
In the application stage of the neural network model, for the second narrowband signal, the low-frequency spectrum of the second narrowband signal may be input into the trained neural network model, so as to obtain the correlation parameter corresponding to the second narrowband signal. When the model is trained based on the sample data, the sample label of the sample data is the correlation parameter of the high-frequency part and the low-frequency part of the sample broadband signal, so that the correlation parameter of the second narrowband signal obtained based on the output of the neural network model can well represent the correlation of the high-frequency part and the low-frequency part of the frequency spectrum of the target broadband signal.
Since the correlation parameter can represent the correlation between the high frequency portion and the low frequency portion of the target wide frequency spectrum, the target high frequency spectrum (parameter corresponding to the high frequency portion) of the wide frequency signal that needs to be extended can be predicted based on the correlation parameter and the low frequency spectrum (parameter corresponding to the low frequency portion).
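One way such a prediction could work is sketched below. This passage does not spell out the procedure, so everything here is an assumption made for illustration: the initial high-frequency spectrum is a copy of the low-frequency spectrum, the envelope of a sub-band is the mean log magnitude of its coefficients (the definition given later in this description), and each sub-band of the copy is rescaled so that its envelope matches the predicted high-frequency envelope. The names `band_envelopes` and `target_high_spectrum` are hypothetical.

```python
import math


def band_envelopes(coeffs, band_size=5, floor=1e-9):
    """Mean log magnitude of the coefficients in each sub-band."""
    envelopes = []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        logs = [math.log(max(abs(c), floor)) for c in band]
        envelopes.append(sum(logs) / len(band))
    return envelopes


def target_high_spectrum(low_spectrum, predicted_high_envelope, band_size=5):
    """Rescale a copy of the low spectrum so its envelope matches the prediction."""
    initial_high = list(low_spectrum)  # assumed starting point: a plain copy
    copied_envelope = band_envelopes(initial_high, band_size)
    if len(copied_envelope) != len(predicted_high_envelope):
        raise ValueError("envelope dimensions do not match")
    target = []
    for k, (pred, cur) in enumerate(zip(predicted_high_envelope, copied_envelope)):
        gain = math.exp(pred - cur)  # a difference in the log domain is a gain
        band = initial_high[k * band_size:(k + 1) * band_size]
        target.extend(c * gain for c in band)
    return target


# Placeholder data: 70 low-frequency coefficients and a 14-dimensional envelope.
low = [(-1) ** j * (1.0 + j % 7) for j in range(70)]
predicted = [0.5] * 14
high = target_high_spectrum(low, predicted)
```

The rescaling keeps the sign and the fine structure of the copied coefficients and changes only the level of each sub-band.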
The correlation parameter is obtained from the output of the neural network model. Because the prediction is made by the neural network model, no extra bits need to be coded; the method is a blind analysis method and has good forward compatibility. Moreover, because the output of the model is a parameter that reflects the correlation between the high-frequency part and the low-frequency part of the target wideband spectrum, the model implements a mapping from spectral parameters to a correlation parameter, which generalizes better than the existing coefficient-to-coefficient mapping.
In an alternative aspect of the present invention, inputting the low-frequency spectrum to the neural network model may include:
Determining a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
Inputting the low-frequency spectrum and the low-frequency spectral envelope to the neural network model.
Specifically, to enrich the data input to the neural network model, parameters related to the spectrum of the low-frequency part can also be selected as model input. The low-frequency spectral envelope of the second narrowband signal is information related to the spectrum of the signal, so it can be used as an input of the neural network model, and a more accurate correlation parameter can be obtained from the low-frequency spectral envelope together with the low-frequency spectrum. The low-frequency spectral envelope and the low-frequency spectrum are therefore both input to the neural network model to obtain the correlation parameter.
In an alternative aspect of the present invention, determining the low frequency spectrum of the second narrowband signal may include:
Performing a time-frequency transform on the second narrowband signal to obtain the low-frequency spectrum of the second narrowband signal.
Specifically, the low-frequency spectrum of the second narrowband signal may be obtained by applying a time-frequency transform to it, where the time-frequency transform includes, but is not limited to, the wavelet transform, the Fourier transform, the discrete cosine transform, and the discrete sine transform.
In an alternative of the present invention, performing time-frequency transformation on the second narrowband signal to obtain a low-frequency spectrum of the second narrowband signal may include:
Performing a time-frequency transform on the second narrowband signal to obtain low-frequency domain coefficients;
A low frequency spectrum of the second narrowband signal is determined based on the low frequency domain coefficients.
Based on the above example, since the sampling rate of the down-sampled second narrowband signal is 8000 Hz and the frame length is 10 ms, the second narrowband signal corresponds to 80 sample points.
In an alternative of the present invention, the time-frequency transform may be a Short-Time Fourier Transform (STFT) or a Fast Fourier Transform (FFT). The low-frequency domain coefficients obtained by processing the second narrowband signal with these transforms comprise an amplitude spectrum and a phase spectrum for each frequency point, and because the high-frequency phase spectrum is mapped directly from the low-frequency phase spectrum, it may contain some error. The time-frequency transform in the invention may therefore be a Modified Discrete Cosine Transform (MDCT), that is, the low-frequency spectrum is obtained by applying the MDCT to the second narrowband signal. Compared with the STFT, the low-frequency domain coefficients produced by the MDCT are real numbers and carry more information, so the wideband signal obtained by band extension from them is more accurate.
It can be understood that the information input to the neural network model may differ for different time-frequency transforms. Because the low-frequency domain coefficients of an MDCT-transformed signal are real numbers, the low-frequency spectrum is obtained from the low-frequency domain coefficients, and the input of the model is the low-frequency spectrum.
The following examples take the MDCT as the time-frequency transform. The specific time-frequency transform process is as follows:
The MDCT is applied to the second narrowband signal. To eliminate discontinuities in the data between frames, the sample points of the previous speech frame and the sample points of the current speech frame (the second narrowband signal) may be combined into an array, and the sample points in the array are then windowed; in this embodiment a cosine window may be used. The MDCT is then applied to the windowed signal to obtain the low-frequency domain coefficients.
Specifically, for a second narrowband signal containing 80 sample points, the 80 sample points of the previous speech frame and the 80 sample points of the current speech frame form an array of 160 sample points. The sample points in the array are then windowed (for example, with a cosine window), and a 160-point MDCT is applied to the windowed signal to obtain 80 low-frequency domain coefficients $S_{Low}(i, j)$, where $i$ is the frame index of the speech frame and $j$ is the intra-frame sample index ($j = 0, 1, \dots, 79$).
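The windowing and 160-point transform described above can be sketched as follows. This is a minimal, direct (non-fast) illustration that uses the textbook MDCT definition; the specific window shape, a sine window of the kind often called a cosine window, is an assumption, and the function names are hypothetical.

```python
import math


def cosine_window(length):
    """Sine-shaped window (assumed form of the 'cosine window' in the text)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct MDCT: a block of 2N samples yields N real coefficients."""
    two_n = len(block)
    n_coef = two_n // 2
    n0 = 0.5 + n_coef / 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
            for n in range(two_n)
        )
        for k in range(n_coef)
    ]


def low_freq_coefficients(prev_frame, cur_frame):
    """Previous and current 80-sample frames -> 80 low-frequency domain coefficients."""
    block = list(prev_frame) + list(cur_frame)
    window = cosine_window(len(block))
    return mdct([s * w for s, w in zip(block, window)])


# Placeholder input: silence followed by a 1 kHz tone sampled at 8000 Hz.
prev = [0.0] * 80
cur = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(80)]
coeffs = low_freq_coefficients(prev, cur)
```

Every coefficient is a single real number, which is the property the text relies on when it prefers the MDCT to an STFT with separate amplitude and phase spectra.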
After obtaining the low-frequency domain coefficient, the low-frequency spectral envelope of the second narrowband signal may be determined based on the low-frequency domain coefficient, and in an alternative of the present invention, determining the low-frequency spectral envelope of the second narrowband signal based on the low-frequency domain coefficient may include:
Dividing the low-frequency spectrum into a first number of sub-spectra;
Determining the sub-spectrum envelope corresponding to each sub-spectrum, wherein the low-frequency spectral envelope includes the determined first number of sub-spectrum envelopes.
Specifically, one way to divide the low-frequency spectrum into $M$ (the first number) sub-spectra is to perform band division on the low-frequency spectrum of the second narrowband signal to obtain $M$ sub-spectra. The sub-bands may each correspond to the same number or to different numbers of low-frequency domain coefficients, and the total number of low-frequency domain coefficients over all sub-bands equals the number of low-frequency domain coefficients of the low-frequency spectrum.
After the division into $M$ sub-spectra, the sub-spectrum envelope corresponding to each sub-spectrum may be determined. In one implementation, the sub-spectrum envelope of each sub-band, that is, of each sub-spectrum, is determined from the low-frequency domain coefficients corresponding to that sub-spectrum. The $M$ sub-spectra thus correspond to $M$ sub-spectrum envelopes, and the low-frequency spectral envelope includes these $M$ sub-spectrum envelopes.
It should be noted that, if the sampling rate of the second narrowband signal is 8000 Hz, the effective bandwidth is 3500 Hz, and the frame length is 10 ms, only 70 of the 80 low-frequency domain coefficients obtained may be selected for subsequent processing, that is, the number of low-frequency domain coefficients $S_{Low}(i, j)$ corresponding to the low-frequency spectrum is 70, i.e., $j = 0, 1, \dots, 69$.
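The choice of 70 coefficients follows from simple arithmetic, sketched below. The sketch assumes that the 80 coefficients are spread uniformly over the 0 to 4000 Hz Nyquist range; the function names are hypothetical.

```python
def coefficient_bandwidth_hz(sample_rate_hz, num_coefficients):
    """Width of the frequency band represented by one coefficient."""
    return (sample_rate_hz / 2) / num_coefficients


def coefficients_for_bandwidth(effective_bandwidth_hz, sample_rate_hz, num_coefficients):
    """Number of coefficients that lie below the effective bandwidth."""
    width = coefficient_bandwidth_hz(sample_rate_hz, num_coefficients)
    return int(effective_bandwidth_hz // width)


bin_width = coefficient_bandwidth_hz(8000, 80)      # 50.0 Hz per coefficient
kept = coefficients_for_bandwidth(3500, 8000, 80)   # 70 coefficients
```

With 50 Hz per coefficient, the first 70 coefficients cover 0 to 3500 Hz, which is the effective bandwidth stated for the example.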
As an example, the first 70 of the above 80 low-frequency domain coefficients may be selected for subsequent processing. If each sub-band contains the same number of low-frequency domain coefficients, for example 5, the frequency band corresponding to every 5 low-frequency domain coefficients may be treated as one sub-band, giving 14 sub-bands ($M = 14$), each corresponding to 5 low-frequency domain coefficients. After the 14 sub-spectra are obtained, the 14 corresponding sub-spectrum envelopes may be determined.
In practical application, the 70 low-frequency domain coefficients may be directly used as the low-frequency domain coefficients of the second narrowband signal, and further, for convenience of calculation, the low-frequency domain coefficients may be further converted into a logarithmic domain, that is, the low-frequency domain coefficients obtained through time-frequency transform (such as MDCT) processing are subjected to logarithmic operation, and the low-frequency domain coefficients subjected to the logarithmic operation are used as the low-frequency domain coefficients in subsequent processing.
Specifically, determining the sub-spectrum envelope corresponding to each sub-spectrum may include:
Obtaining the sub-spectrum envelope of each sub-spectrum based on the logarithms of the spectral coefficients included in that sub-spectrum.
Specifically, based on the spectral coefficients (low-frequency domain coefficients) of each sub-spectrum, the sub-spectrum envelope corresponding to each sub-spectrum is determined by formula (1).
Wherein, formula (1) is:
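Consistent with the surrounding description (an average of the logarithms of the spectral coefficients in each sub-spectrum), formula (1) may take a form such as the following. This is an assumed reconstruction rather than the published expression: $N$ denotes the number of coefficients per sub-band ($N = 5$ in the running example), and taking the logarithm of the coefficient magnitude is likewise an assumption, made because MDCT coefficients can be negative.

```latex
e_{Low}(i, k) = \frac{1}{N} \sum_{j=0}^{N-1} \log \left| S_{Low}(i,\, kN + j) \right| \qquad (1)
```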
Where $S_{Low}(i, j)$ represents a low-frequency domain coefficient (spectral coefficient), $e_{Low}(i, k)$ represents a sub-spectrum envelope, $i$ is the frame index of the speech frame, and $k$ is the sub-band index. There are $M$ sub-bands in total, $k = 0, 1, 2, \dots, M-1$, so the low-frequency spectral envelope includes $M$ sub-spectrum envelopes.
Generally, the spectral envelope of a sub-band is defined as the average energy of adjacent coefficients (or that average converted to a logarithmic representation), but this definition may prevent coefficients with smaller amplitudes from playing a substantial role. The scheme provided by the embodiment of the invention instead directly averages the logarithms of the spectral coefficients included in each sub-spectrum to obtain the corresponding sub-spectrum envelope. Compared with the commonly used envelope determination scheme, this better protects coefficients with smaller amplitudes in the distortion control of the neural network training process, so that more signal parameters can contribute to band extension.
As an example, suppose the low-frequency spectrum corresponds to 70 low-frequency domain coefficients and every sub-band corresponds to the same number of coefficients. Dividing the spectrum into 14 sub-bands means that every 5 adjacent low-frequency domain coefficients form one sub-band, and the low-frequency spectral envelope includes 14 sub-spectrum envelopes.
Thus, if the low-frequency spectrum and the low-frequency spectral envelope are both input to the neural network model, the low-frequency spectrum is 70-dimensional data and the low-frequency spectral envelope is 14-dimensional data, so the input to the model is 84-dimensional data. The neural network model in this scheme is therefore small and of low complexity.
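Assembling the 84-dimensional model input can be sketched as below. This is an illustrative sketch under assumptions: the envelope is computed as the mean log magnitude of the 5 coefficients in each sub-band, a small floor guards against taking the logarithm of zero, and the coefficients are passed to the model in the linear domain rather than the logarithmic domain. The function names and the floor value are hypothetical.

```python
import math


def sub_spectrum_envelopes(coefficients, band_size=5, floor=1e-9):
    """One envelope per sub-band: the mean log magnitude of its coefficients."""
    if len(coefficients) % band_size != 0:
        raise ValueError("coefficient count is not a multiple of the band size")
    envelopes = []
    for start in range(0, len(coefficients), band_size):
        band = coefficients[start:start + band_size]
        logs = [math.log(max(abs(c), floor)) for c in band]
        envelopes.append(sum(logs) / band_size)
    return envelopes


def model_input(mdct_coefficients, num_kept=70, band_size=5):
    """70 low-frequency coefficients followed by their 14 sub-spectrum envelopes."""
    low_spectrum = list(mdct_coefficients[:num_kept])
    envelope = sub_spectrum_envelopes(low_spectrum, band_size)
    return low_spectrum + envelope


# Placeholder frame of 80 coefficients, all equal to e, so every envelope is 1.
features = model_input([math.e] * 80)
```

Averaging logarithms rather than energies means that a single large coefficient in a sub-band does not dominate the envelope, which is the behaviour the text attributes to this definition.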
In an alternative aspect of the invention, at least one of the low frequency spectrum or the target high frequency spectrum is derived based on a corresponding filtered initial spectrum.
Specifically, the initial spectrum corresponding to the low-frequency spectrum is the initial low-frequency spectrum, that is, the low-frequency spectrum before any filtering, and the initial spectrum corresponding to the target high-frequency spectrum is the initial high-frequency spectrum. The low-frequency spectrum referred to hereinafter may be the initial low-frequency spectrum or the filtered low-frequency spectrum (for distinction, the filtered low-frequency spectrum is called the first low-frequency spectrum). Likewise, the target high-frequency spectrum referred to hereinafter may be the initial high-frequency spectrum or the high-frequency spectrum obtained by filtering the initial high-frequency spectrum.
The low-frequency spectrum in steps S120 and S140 may be a spectrum obtained by filtering the corresponding initial spectrum, that is, the first low-frequency spectrum obtained by applying the time-frequency transform to the second narrowband signal and then filtering the resulting initial low-frequency spectrum; alternatively, it may be the unfiltered initial low-frequency spectrum. The motivation for using the first low-frequency spectrum is as follows. The second narrowband signal generally needs to be quantized before the time-frequency transform, and quantization generally introduces quantization noise. Filtering the initial low-frequency spectrum after the time-frequency transform removes this quantization noise and yields the first low-frequency spectrum, which prevents the quantization noise from being propagated into the high-frequency spectrum during subsequent band extension based on the first low-frequency spectrum.
Specifically, the target high-frequency spectrum in step S140 may be a spectrum obtained by filtering the corresponding initial high-frequency spectrum. Noise that may be present in the high-frequency spectrum can thus be effectively removed, which enhances the signal quality of the wideband signal and further improves the listening experience of the user.
In an alternative of the present invention, if the low-frequency spectrum is the first low-frequency spectrum after the filtering process, the sub-spectrum envelope corresponding to each sub-spectrum may be determined based on the sub-spectrum of the low-frequency spectrum after the filtering process.
Specifically, the sub-spectrum envelope of each sub-spectrum is obtained based on the logarithms of the spectral coefficients included in that sub-spectrum.
Specifically, the corresponding sub-spectrum envelope of each sub-spectrum can be determined by formula (2).
Wherein, the formula (2) is:
Where $S_{Low\_rev}(i, j)$ represents a low-frequency domain coefficient (spectral coefficient) of the filtered low-frequency spectrum, $e_{Low}(i, k)$ represents a sub-spectrum envelope, $i$ is the frame index of the speech frame, and $k$ is the sub-band index. There are $M$ sub-bands in total, $k = 0, 1, 2, \dots, M-1$, so the low-frequency spectral envelope includes $M$ sub-spectrum envelopes.
In an alternative aspect of the present invention, the filtering process performed on the initial spectrum includes:
Determining a first filter gain based on the spectral energy of the initial spectrum, and filtering the initial spectrum according to the first filter gain.
Specifically, filtering the initial spectrum means filtering the initial low-frequency domain coefficients (the low-frequency domain coefficients obtained by the MDCT) that correspond to the initial spectrum (the initial low-frequency spectrum). The initial low-frequency domain coefficients may be filtered by multiplying them by the first filter gain to obtain the filtered low-frequency domain coefficients. Let the initial low-frequency domain coefficients be $S_{Low}(i, j)$ and the filtered low-frequency domain coefficients be $S_{Low\_rev}(i, j)$. If the determined first filter gain is $G_{pre\_filt}(j)$, the initial low-frequency domain coefficients can be filtered according to the following equation (3):
SLow_rev(i,j)=Gpre_filt(j)*SLow(i,j) (3)
Where i is the frame index of the speech frame and j is the intra sample index (j ═ 0, 1, …, 69).
Specifically, determining the first filter gain based on the spectral energy of the initial spectrum and filtering the initial spectrum according to the first filter gain may include: dividing the initial low-frequency domain coefficients into a first set number of sub-spectrums and determining the first spectral energy corresponding to each sub-spectrum; determining a second filter gain corresponding to each sub-spectrum based on its first spectral energy, where the first filter gain comprises the first set number of second filter gains; and filtering each sub-spectrum according to its second filter gain.
For convenience of description, the first set number is denoted as $L$. One possible way of dividing the initial low-frequency domain coefficients into $L$ sub-spectrums is to perform band division on the initial low-frequency domain coefficients to obtain $L$ sub-spectrums, where each sub-band corresponds to $N$ initial low-frequency domain coefficients, $N \times L$ equals the total number of initial low-frequency domain coefficients, $L \geq 2$, and $N \geq 1$.
As an example, if there are 70 initial low-frequency domain coefficients, the frequency band corresponding to every 5 (i.e., $N = 5$) initial low-frequency domain coefficients may be treated as one sub-band, giving 14 (i.e., $L = 14$) sub-bands, each corresponding to 5 initial low-frequency domain coefficients.
One possible way of determining the first spectral energy corresponding to each sub-spectrum is to take the sum of the spectral energies of the $N$ initial low-frequency domain coefficients of that sub-spectrum as its first spectral energy. The spectral energy of an initial low-frequency domain coefficient is defined as the sum of the square of its real part and the square of its imaginary part.
As an example, still taking 70 initial low-frequency domain coefficients, $N = 5$ and $L = 14$, the first spectral energy corresponding to each sub-spectrum can be calculated by the following formula (4):
$$Pe(k) = \sum_{j=5k}^{5k+4} \left[ \mathrm{Re}\big(S_{Low}(i, j)\big)^2 + \mathrm{Im}\big(S_{Low}(i, j)\big)^2 \right] \tag{4}$$
where $i$ is the frame index of the speech frame, $j$ is the intra-frame sample index ($j = 0, 1, \ldots, 69$), $k = 0, 1, \ldots, 13$ is the sub-band index covering the 14 sub-bands, $Pe(k)$ is the first spectral energy corresponding to the $k$-th sub-band, and $S_{Low}(i, j)$ are the initial low-frequency domain coefficients.
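The per-sub-spectrum energy described above (the sum, over each group of N coefficients, of squared real part plus squared imaginary part) can be sketched as follows. This is only a minimal illustration; the function name `sub_spectrum_energies` and its parameter names are hypothetical and do not come from the description.

```python
def sub_spectrum_energies(coeffs, n_per_band=5):
    """Return Pe(k) for each sub-spectrum of n_per_band coefficients.

    coeffs: one frame of initial low-frequency domain coefficients
            (real or complex numbers), e.g. 70 values.
    The name and signature are illustrative assumptions.
    """
    if n_per_band < 1 or len(coeffs) % n_per_band != 0:
        raise ValueError("coefficient count must be a multiple of n_per_band")
    energies = []
    for start in range(0, len(coeffs), n_per_band):
        band = coeffs[start:start + n_per_band]
        # Energy of one coefficient: real part squared plus imaginary part squared.
        energies.append(sum(c.real ** 2 + c.imag ** 2 for c in band))
    return energies
```

With 70 coefficients and `n_per_band=5` the function returns 14 values, one per sub-band of the running example.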
Specifically, after the first spectral energy of each sub-spectrum is obtained, the second filter gain corresponding to each sub-spectrum may be determined based on it. To do so, the frequency band corresponding to the initial spectrum may be divided into a first sub-band and a second sub-band; the first sub-band energy is determined from the first spectral energies of all sub-spectrums belonging to the first sub-band, and the second sub-band energy is determined from the first spectral energies of all sub-spectrums belonging to the second sub-band; a spectral tilt coefficient of the initial spectrum is determined from the first sub-band energy and the second sub-band energy; and the second filter gain corresponding to each sub-spectrum is then determined from the spectral tilt coefficient and the first spectral energy of that sub-spectrum.
When the frequency band corresponding to the initial low-frequency domain coefficients is divided into a first sub-band and a second sub-band, the frequency band corresponding to the 1st to 35th initial low-frequency domain coefficients ($j = 0, 1, \ldots, 34$) may be taken as the first sub-band, and the frequency band corresponding to the 36th to 70th initial low-frequency domain coefficients ($j = 35, 36, \ldots, 69$) as the second sub-band. If $N = 5$, that is, every 5 initial low-frequency domain coefficients form one sub-spectrum, the first sub-band includes 7 sub-spectrums and the second sub-band also includes 7 sub-spectrums. The first sub-band energy can then be determined according to the sum of the first spectral energies of the 7 sub-spectrums in the first sub-band, and the second sub-band energy according to the sum of the first spectral energies of the 7 sub-spectrums in the second sub-band.
Specifically, when the first narrowband signal is the speech signal of the current speech frame, one possible way of determining the first spectral energy corresponding to each sub-spectrum is as follows. The first initial spectral energy $Pe(k)$ of each sub-spectrum is determined according to formula (4). If the current speech frame is the first speech frame, the first initial spectral energy $Pe(k)$ of each sub-spectrum may be taken as the first spectral energy of that sub-spectrum (denoted $Fe(k)$), that is, $Fe(k) = Pe(k)$. If the current speech frame is not the first speech frame, then in determining the first spectral energy of the $k$-th sub-spectrum, the second initial spectral energy, i.e. the initial spectral energy of the $k$-th sub-spectrum of the associated speech frame, is obtained and denoted $Pe_{pre}(k)$, where the associated speech frame is at least one (e.g. 1 or 2) speech frame that precedes and is adjacent to the current speech frame. After the second initial spectral energy is obtained, the first spectral energy of the $k$-th sub-spectrum may be obtained based on the first initial spectral energy and the second initial spectral energy.
In one example, the first spectral energy of the $k$-th sub-spectrum may be determined according to the following formula (5):
$$Fe(k) = 1.0 + Pe(k) + Pe_{pre}(k) \tag{5}$$
where $Pe(k)$ is the first initial spectral energy of the $k$-th sub-spectrum, $Pe_{pre}(k)$ is the second initial spectral energy, i.e. the initial spectral energy of the $k$-th sub-spectrum of the associated speech frame of the current speech frame, and $Fe(k)$ is the first spectral energy of the $k$-th sub-spectrum. In the above example, $k = 0, 1, \ldots, 13$.
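As a small illustration of the two cases above (first speech frame versus a later frame combined with its associated frame per formula (5)), the following sketch may help. The function name `first_spectral_energy` and the convention of passing `None` for a missing previous frame are assumptions made for this example only.

```python
def first_spectral_energy(pe, pe_prev=None):
    """Combine the current frame's Pe(k) with the associated frame's Pe_pre(k).

    pe:      first initial spectral energies Pe(k) of the current frame.
    pe_prev: initial spectral energies of the associated (previous) frame,
             or None when the current frame is the first speech frame.
    Both the name and the None convention are illustrative assumptions.
    """
    if pe_prev is None:
        # First speech frame: Fe(k) = Pe(k).
        return list(pe)
    if len(pe) != len(pe_prev):
        raise ValueError("both frames must have the same number of sub-spectrums")
    # Formula (5): Fe(k) = 1.0 + Pe(k) + Pe_pre(k).
    return [1.0 + cur + prev for cur, prev in zip(pe, pe_prev)]
```

The constant 1.0 keeps every `Fe(k)` at or above 1 for later frames, since the energies themselves are non-negative.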
In another example, after the first spectral energy of the $k$-th sub-spectrum is obtained according to the above formula, it may be smoothed, and the smoothed first spectral energy $Fe\_sm(k)$ is then used as the first spectral energy of the $k$-th sub-spectrum. The first spectral energy may be smoothed according to the following formula (6):
$$Fe\_sm(k) = \frac{Fe(k) + Fe_{pre}(k)}{2} \tag{6}$$
where $Fe(k)$ is the first spectral energy of the $k$-th sub-spectrum, $Fe_{pre}(k)$ is the first spectral energy of the $k$-th sub-spectrum of the associated speech frame, and $Fe\_sm(k)$ is the smoothed first spectral energy.
It should be noted that the associated speech frame in the above formula (6) is the one speech frame located before and adjacent to the current speech frame. When the associated speech frames are two or more speech frames located before and adjacent to the current speech frame, formula (6) above can be appropriately adjusted as needed, for example to the following formula (7):
where $Pe_{pre1}(k)$ is the first initial spectral energy of the first speech frame immediately preceding the current speech frame, and $Pe_{pre2}(k)$ is the first initial spectral energy of the speech frame immediately preceding that first speech frame.
In another example, after the first spectral energy of the $k$-th sub-spectrum is obtained according to the above formula, it may be smoothed, and the smoothed first spectral energy $Fe\_sm(k)$ is then used as the first spectral energy of the $k$-th sub-spectrum. The first spectral energy may be smoothed according to the following formula (8):
$$Fe\_sm(k) = \frac{Fe(k) + Fe_{pre}(k)}{2} \tag{8}$$
where $Fe(k)$ is the first spectral energy of the $k$-th sub-spectrum, $Fe_{pre}(k)$ is the first spectral energy of the $k$-th sub-spectrum of the associated speech frame, and $Fe\_sm(k)$ is the smoothed first spectral energy.
It should be noted that the associated speech frame in the above formula (8) is the one speech frame located before and adjacent to the current speech frame. When the associated speech frames are two or more speech frames located before and adjacent to the current speech frame, formula (8) above may be appropriately adjusted as needed, for example to:
$$Fe\_sm(k) = \frac{Fe(k) + Fe_{pre1}(k) + Fe_{pre2}(k)}{3} \tag{9}$$
where $Fe_{pre1}(k)$ is the first spectral energy of the first speech frame immediately preceding the current speech frame, and $Fe_{pre2}(k)$ is the first spectral energy of the speech frame immediately preceding that first speech frame.
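The inter-frame smoothing above is an unweighted mean over the current frame and its associated previous frame or frames. A minimal sketch, with the hypothetical helper name `smooth_energy`, could cover both the one-frame and the two-frame variants:

```python
def smooth_energy(fe, previous_frames):
    """Average Fe(k) of the current frame with Fe(k) of earlier frames.

    fe:              first spectral energies Fe(k) of the current frame.
    previous_frames: list of energy lists, one per associated previous frame
                     (one list gives formula (8), two lists give formula (9)).
    The helper name and argument layout are illustrative assumptions.
    """
    frames = [list(fe)] + [list(prev) for prev in previous_frames]
    if any(len(frame) != len(fe) for frame in frames):
        raise ValueError("all frames must have the same number of sub-spectrums")
    # Unweighted mean over the current frame and every associated frame.
    return [sum(values) / len(frames) for values in zip(*frames)]
```

Passing an empty `previous_frames` list simply returns the current energies unchanged, which matches the first-frame case.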
Specifically, after the first spectral energy $Fe(k)$ or $Fe\_sm(k)$ of each sub-spectrum is determined according to the above process, when the first spectral energy of each sub-spectrum is $Fe(k)$, the first sub-band energy of the first sub-band and the second sub-band energy of the second sub-band may be determined according to the following formula (10):
$$e1 = \sum_{k=0}^{6} Fe(k), \qquad e2 = \sum_{k=7}^{13} Fe(k) \tag{10}$$
where $e1$ is the first sub-band energy of the first sub-band and $e2$ is the second sub-band energy of the second sub-band.
When the first spectral energy of each sub-spectrum is $Fe\_sm(k)$, the first sub-band energy of the first sub-band and the second sub-band energy of the second sub-band can be determined according to the following formula (11):
$$e1 = \sum_{k=0}^{6} Fe\_sm(k), \qquad e2 = \sum_{k=7}^{13} Fe\_sm(k) \tag{11}$$
where $e1$ is the first sub-band energy of the first sub-band and $e2$ is the second sub-band energy of the second sub-band.
Specifically, after the first subband energy and the second subband energy are determined, the spectrum tilt coefficient of the initial spectrum may be determined according to the first subband energy and the second subband energy. In practical applications, the spectral tilt coefficient of the initial spectrum may be determined according to the following logic:
When the second sub-band energy is greater than or equal to the first sub-band energy, the initial spectral tilt coefficient is determined to be 0, and when the second sub-band energy is less than the first sub-band energy, the initial spectral tilt coefficient may be determined according to the following equation (12):
$$T\_para\_0 = 8 \cdot f\_cont\_low \cdot \sqrt{\frac{e1 - e2}{e1 + e2}} \tag{12}$$
where $T\_para\_0$ is the initial spectral tilt coefficient, $f\_cont\_low$ is a preset filter coefficient, $\sqrt{\cdot}$ denotes the square-root operation, $e1$ is the first sub-band energy, and $e2$ is the second sub-band energy.
Specifically, after the initial spectral tilt coefficient $T\_para\_0$ is obtained in the above manner, it may be used directly as the spectral tilt coefficient of the initial spectrum, or it may be further optimized as follows and the optimized value used as the spectral tilt coefficient of the initial spectrum. In one example, the optimization is $T\_para\_1 = \min(1.0, T\_para\_0)$ and $T\_para\_2 = T\_para\_1 / 7$, where $\min$ takes the smaller of its arguments, $T\_para\_1$ is the initial spectral tilt coefficient after the first optimization step, and $T\_para\_2$ is the initial spectral tilt coefficient after the final optimization step, that is, the spectral tilt coefficient of the initial spectrum.
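The tilt computation above (two half-band energies, the comparison of the second against the first, formula (12), and the optional min and divide-by-7 optimization) can be sketched as follows. The function names are hypothetical, and the default value 0.035 for `f_cont_low` is the preset value the description gives for the filter coefficient in connection with formula (13); treating it as the default here is an assumption of this sketch.

```python
import math

F_CONT_LOW = 0.035  # preset filter coefficient quoted in the description


def sub_band_energies(fe):
    """Split the sub-spectrum energies into two halves and sum each half."""
    half = len(fe) // 2
    return sum(fe[:half]), sum(fe[half:])


def spectral_tilt(e1, e2, f_cont_low=F_CONT_LOW):
    """Return the optimized spectral tilt coefficient (T_para_2 in the text).

    Function name and default argument are illustrative assumptions.
    """
    if e2 >= e1:
        # Second sub-band at least as strong as the first: tilt is 0.
        return 0.0
    t0 = 8.0 * f_cont_low * math.sqrt((e1 - e2) / (e1 + e2))  # formula (12)
    t1 = min(1.0, t0)  # first optimization step
    return t1 / 7.0    # final optimization step
```

For the 14-sub-spectrum example, `sub_band_energies` sums sub-spectrums 0 to 6 and 7 to 13 respectively.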
Specifically, after the spectral tilt coefficient of the initial spectrum is determined, the second filter gain corresponding to each sub-spectrum may be determined according to the spectral tilt coefficient and the first spectral energy corresponding to each sub-spectrum. In one example, the second filter gain corresponding to the $k$-th sub-spectrum may be determined according to the following formula (13):
$$gain_{f0}(k) = Fe(k)^{f\_cont\_low} \tag{13}$$
where $gain_{f0}(k)$ is the second filter gain corresponding to the $k$-th sub-spectrum; $Fe(k)$ is the first spectral energy of the $k$-th sub-spectrum; $f\_cont\_low = 0.035$ is a preset filter coefficient; and $k = 0, 1, \ldots, 13$ is the sub-band index covering the 14 sub-bands.
After the second filter gain $gain_{f0}(k)$ corresponding to the $k$-th sub-spectrum is determined, if the spectral tilt coefficient of the initial spectrum is not positive, $gain_{f0}(k)$ can be used directly as the second filter gain corresponding to the $k$-th sub-spectrum; if the spectral tilt coefficient of the initial spectrum is positive, $gain_{f0}(k)$ may be adjusted according to the spectral tilt coefficient, and the adjusted second filter gain is used as the second filter gain corresponding to the $k$-th sub-spectrum. In one example, $gain_{f0}(k)$ may be adjusted according to the following formula (14):
$$gain_{f1}(k) = gain_{f0}(k) \cdot (1 + k \cdot T_{para}) \tag{14}$$
where $gain_{f1}(k)$ is the adjusted second filter gain, $gain_{f0}(k)$ is the second filter gain corresponding to the $k$-th sub-spectrum, $T_{para}$ is the spectral tilt coefficient of the initial spectrum, and $k = 0, 1, \ldots, 13$ is the sub-band index covering the 14 sub-bands.
Specifically, after the second filter gain $gain_{f1}(k)$ corresponding to the $k$-th sub-spectrum is determined, $gain_{f1}(k)$ may be further adjusted, and the optimized value used as the final second filter gain of the $k$-th sub-spectrum. In one example, $gain_{f1}(k)$ may be adjusted according to the following formula (15):
$$G_{pre\_filt}(k) = \frac{1 + gain_{f1}(k)}{2} \tag{15}$$
where $G_{pre\_filt}(k)$ is the final second filter gain corresponding to the $k$-th sub-spectrum, $gain_{f1}(k)$ is the second filter gain adjusted according to formula (14), and $k = 0, 1, \ldots, 13$ is the sub-band index covering the 14 sub-bands, so that the filter gains corresponding to the 14 sub-bands (i.e., the second filter gains described above) are obtained.
Specifically, the above describes how the first filter gain of the initial low-frequency domain coefficients is calculated, taking as an example the case where every 5 initial low-frequency domain coefficients form one sub-band, that is, the 70 initial low-frequency domain coefficients are divided into 14 sub-bands of 5 coefficients each. The second filter gain obtained for each sub-band is the filter gain of the 5 initial low-frequency domain coefficients in that sub-band, so the first filter gain corresponding to the 70 initial low-frequency domain coefficients, $[G_{pre\_filt}(0), G_{pre\_filt}(1), \ldots, G_{pre\_filt}(13)]$, is obtained from the second filter gains of the 14 sub-bands. In other words, once the second filter gain $G_{pre\_filt}(k)$ corresponding to each $k$-th sub-spectrum is determined, the aforementioned first filter gain is obtained: the first filter gain comprises a first set number ($L$, for example 14) of second filter gains $G_{pre\_filt}(k)$, and each second filter gain $G_{pre\_filt}(k)$ is the filter gain of the $N$ spectral coefficients of the $k$-th sub-spectrum, which yields the first filter gain $G_{pre\_filt}(j)$.
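Putting formulas (13) to (15) and formula (3) together, one possible sketch of the gain computation and its application is shown below. Formula (13) is read here as $Fe(k)$ raised to the power `f_cont_low`, which is an interpretation of the flattened notation rather than something the text states outright; the function names are likewise hypothetical.

```python
def second_filter_gains(fe, tilt, f_cont_low=0.035):
    """Return the final per-sub-spectrum gains G_pre_filt(k).

    fe:   first spectral energies Fe(k), one per sub-spectrum.
    tilt: spectral tilt coefficient of the initial spectrum.
    Reading formula (13) as a power is an assumption of this sketch.
    """
    gains = []
    for k, energy in enumerate(fe):
        g0 = energy ** f_cont_low                 # formula (13), assumed power form
        # Formula (14) is applied only when the tilt coefficient is positive.
        g1 = g0 * (1.0 + k * tilt) if tilt > 0 else g0
        gains.append((1.0 + g1) / 2.0)            # formula (15)
    return gains


def apply_first_filter_gain(coeffs, band_gains, n_per_band=5):
    """Formula (3): scale coefficient j by the gain of its sub-spectrum j // N."""
    if len(coeffs) != len(band_gains) * n_per_band:
        raise ValueError("coefficient count must equal len(band_gains) * n_per_band")
    return [band_gains[j // n_per_band] * c for j, c in enumerate(coeffs)]
```

Expanding the 14 sub-band gains to 70 per-coefficient gains is done implicitly by the integer division `j // n_per_band`, so each group of 5 coefficients shares one gain.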
In an alternative embodiment of the present invention, in step S140, obtaining the target high frequency spectrum based on the correlation parameter and the low frequency spectrum (the low frequency spectrum may be the first low frequency spectrum or the initial low frequency spectrum), may include:
determining a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
generating an initial high frequency spectrum based on the low frequency spectrum;
adjusting the initial high-frequency spectrum based on the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain a target high-frequency spectrum.
The initial high-frequency spectrum may be obtained by copying the low-frequency spectrum. It will be appreciated that, in practical applications, the specific way of copying differs according to the bandwidth of the wideband signal to be obtained and the bandwidth of the portion of the low-frequency spectrum selected for copying. For example, suppose the bandwidth of the wideband signal is 2 times the bandwidth of the second narrowband signal. If the entire low-frequency spectrum of the second narrowband signal is selected for copying, only one copy is needed. If only part of the low-frequency spectrum of the second narrowband signal is selected, the number of copies depends on the bandwidth of the selected part: if 1/2 of the low-frequency spectrum is selected, 2 copies are needed, and if 1/4 is selected, 4 copies are needed.
As an example, if the effective bandwidth of the extended wideband signal is 7 kHz and the bandwidth of the low-frequency spectrum selected for copying is 1.75 kHz, the selected low-frequency spectrum may be copied 3 times, based on its bandwidth and the bandwidth of the extended wideband signal, to obtain the initial high-frequency spectrum (bandwidth 5.25 kHz). If the bandwidth of the low-frequency spectrum selected for copying is 3.5 kHz and the effective bandwidth of the extended wideband signal is 7 kHz, the selected low-frequency spectrum is copied once to obtain the initial high-frequency spectrum (bandwidth 3.5 kHz).
In an alternative embodiment of the present application, one way of generating the initial high-frequency spectrum based on the low-frequency spectrum (which may be the first low-frequency spectrum or the initial low-frequency spectrum) is to copy the spectrum of the high-band part of the low-frequency spectrum to obtain the initial high-frequency spectrum.
Since the low-band part of the low-frequency spectrum contains a large number of harmonics, which affect the signal quality of the extended wideband signal, the spectrum of the high-band part of the low-frequency spectrum can be selected for copying to obtain the initial high-frequency spectrum.
As an example, continuing the foregoing scenario, the low-frequency spectrum corresponds to 70 frequency points. If frequency points 35 to 69 of the low-frequency spectrum (the high-band part of the low-frequency spectrum) are selected as the frequency points to be copied, that is, as the "master", and the effective bandwidth of the extended wideband signal is 7000 Hz, the selected frequency points must be copied so as to obtain an initial high-frequency spectrum containing 70 frequency points; to this end, frequency points 35 to 69 of the low-frequency spectrum, 35 points in total, may be copied twice. Similarly, if frequency points 0 to 69 of the low-frequency spectrum are selected as the frequency points to be copied and the effective bandwidth of the extended wideband signal is 7000 Hz, these 70 frequency points may be copied once to generate an initial high-frequency spectrum containing 70 frequency points in total.
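The copying step in this example can be sketched as repeating a selected slice of the low-frequency spectrum until the required number of high-frequency points is reached. The function name `initial_high_spectrum` and its parameters are hypothetical, chosen only for illustration.

```python
def initial_high_spectrum(low_spectrum, start, stop, target_len):
    """Build an initial high-frequency spectrum by repeating a slice.

    low_spectrum: low-frequency domain coefficients (e.g. 70 values).
    start, stop:  half-open index range of the points selected for copying,
                  e.g. (35, 70) for points 35..69.
    target_len:   number of points required in the initial high spectrum.
    Name and signature are illustrative assumptions.
    """
    template = list(low_spectrum[start:stop])
    if not template:
        raise ValueError("the selected range is empty")
    copies = -(-target_len // len(template))  # ceiling division
    return (template * copies)[:target_len]
```

With `start=35, stop=70, target_len=70` the 35 selected points are copied twice; with `start=0, stop=70` the full spectrum is copied once.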
Since the signal corresponding to the low-frequency spectrum may contain a large number of harmonics, the signal corresponding to an initial high-frequency spectrum obtained only by copying may also contain a large number of harmonics. To reduce the harmonics in the wideband signal after band extension, the initial high-frequency spectrum may be adjusted according to the difference between the high-frequency spectrum envelope and the low-frequency spectrum envelope, and the adjusted initial high-frequency spectrum is used as the target high-frequency spectrum, which reduces the harmonics in the wideband signal finally obtained by band extension.
In an alternative aspect of the present invention, adjusting the initial high-frequency spectrum to obtain the target high-frequency spectrum may include:
Determining a difference between the high frequency spectral envelope and the low frequency spectral envelope;
and adjusting the initial high-frequency spectrum based on the difference value to obtain a target high-frequency spectrum.
Specifically, the high-frequency spectrum envelope and the low-frequency spectrum envelope can both be represented as log-domain spectral envelopes, which simplifies calculation; the initial high-frequency spectrum is then adjusted based on the difference determined from the log-domain spectral envelopes to obtain the target high-frequency spectrum.
In an alternative aspect of the invention, the high-frequency spectral envelope comprises a second number of first sub-spectral envelopes, and the initial high-frequency spectrum comprises a second number of sub-spectrums, where each first sub-spectral envelope is determined based on a corresponding sub-spectrum in the initial high-frequency spectrum.
Determining a difference between the high-frequency spectrum envelope and the low-frequency spectrum envelope, and adjusting the initial high-frequency spectrum based on the difference to obtain a target high-frequency spectrum, may include:
determining the difference between each first sub-spectral envelope and the corresponding sub-spectrum envelope in the low-frequency spectral envelope (hereinafter, the corresponding sub-spectrum envelope in the low-frequency spectral envelope is referred to as a second sub-spectral envelope);
adjusting the corresponding initial sub-spectrum based on the difference corresponding to each first sub-spectrum envelope to obtain a second number of adjusted sub-spectrums;
and obtaining a target high-frequency spectrum based on the second number of adjusted sub-spectrums.
Specifically, a first sub-spectral envelope may be determined based on the corresponding sub-spectrum in the initial high-frequency spectrum, and a second sub-spectral envelope may likewise be determined based on the corresponding sub-spectrum in the low-frequency spectrum. The number of spectral coefficients in each sub-spectrum may be the same or different; accordingly, if each sub-spectrum envelope is determined based on the corresponding sub-spectrum, the number of spectral coefficients underlying each sub-spectrum envelope may also differ. The second number may be the same as or different from the first number, and is usually not less than the first number.
Continuing with the above scenario as an example, if the second number is the same as the first number, the output of the model is a 14-dimensional high-frequency spectrum envelope (the second number is 14), and the input of the model includes the low-frequency spectrum and the low-frequency spectrum envelope, where the low-frequency spectrum includes 70-dimensional low-frequency domain coefficients and the low-frequency spectrum envelope includes 14-dimensional sub-spectrum envelopes (the first number is 14). The input of the model is therefore 84-dimensional data, and the output dimension is much smaller than the input dimension; dividing the low-frequency spectrum envelope into a first number of second sub-spectrum envelopes can therefore reduce the size and depth of the neural network model and hence the complexity of the model. Specifically, the high-frequency spectral envelope obtained by the neural network model may include a second number of first sub-spectral envelopes. As can be understood from the foregoing description, the second number of first sub-spectral envelopes is determined based on the corresponding sub-spectrums in the low-frequency spectrum, that is, one sub-spectrum envelope is determined based on one corresponding sub-spectrum in the low-frequency spectrum. Continuing the foregoing scenario, if there are 14 sub-spectrums in the low-frequency spectrum, the high-frequency spectrum envelope includes 14 sub-spectrum envelopes.
The difference between the high-frequency spectral envelope and the low-frequency spectral envelope is the difference between each first sub-spectral envelope and the corresponding second sub-spectral envelope. Adjusting the initial high-frequency spectrum based on the difference therefore means adjusting each initial sub-spectrum based on the difference between the corresponding first sub-spectral envelope and second sub-spectral envelope, where an initial sub-spectrum is one sub-spectrum of the initial high-frequency spectrum. Continuing with the above example, if the second number is the same as the first number, that is, the high-frequency spectral envelope includes 14 first sub-spectral envelopes and the low-frequency spectral envelope includes 14 second sub-spectral envelopes, 14 difference values may be determined from the 14 second sub-spectral envelopes and the corresponding 14 first sub-spectral envelopes, and the initial sub-spectrum of each corresponding sub-band is adjusted based on its difference value.
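The per-sub-band adjustment can be illustrated with a short sketch. The passage only states that each initial sub-spectrum is adjusted based on the difference between its first and second sub-spectral envelopes; turning a log-domain difference `d` into a linear gain `exp(d)` is an assumption made here purely for illustration, as are the function name and the fixed sub-band width.

```python
import math


def adjust_high_spectrum(initial_high, high_env, low_env, n_per_band=5):
    """Adjust each initial sub-spectrum using its envelope difference.

    initial_high: initial high-frequency coefficients (e.g. 70 values).
    high_env:     first sub-spectral envelopes, log domain (e.g. 14 values).
    low_env:      second sub-spectral envelopes, log domain (e.g. 14 values).
    Returns (differences, adjusted coefficients).
    """
    if len(high_env) != len(low_env):
        raise ValueError("envelopes must have the same number of sub-bands")
    if len(initial_high) != len(high_env) * n_per_band:
        raise ValueError("coefficient count must match the sub-band layout")
    diffs = [h - l for h, l in zip(high_env, low_env)]
    adjusted = []
    for j, coeff in enumerate(initial_high):
        # ASSUMPTION: a log-domain difference maps to a linear gain exp(diff).
        gain = math.exp(diffs[j // n_per_band])
        adjusted.append(gain * coeff)
    return diffs, adjusted
```

A zero difference leaves the copied sub-spectrum unchanged, while a positive difference raises its level and a negative one lowers it.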
In an alternative scheme of the invention, the correlation parameter further comprises relative flatness information, and the relative flatness information represents the correlation between the spectral flatness of the high-frequency part and the spectral flatness of the low-frequency part of the target broadband spectrum;
determining a difference between the high frequency spectral envelope and the low frequency spectral envelope may include:
Determining a gain adjustment value of the high frequency spectrum envelope based on the relative flatness information and the spectral energy of the low frequency spectrum;
adjusting the high-frequency spectrum envelope based on the gain adjustment value to obtain the adjusted high-frequency spectrum envelope;
A difference between the adjusted high frequency spectral envelope and the low frequency spectral envelope is determined.
Based on the foregoing description, in the process of training the neural network model, the labeling result may include relative flatness information; that is, the sample label of the sample data includes relative flatness information of the high-frequency part and the low-frequency part of the sample wideband signal, determined from the high-frequency part and the low-frequency part of the spectrum of the sample wideband signal. Therefore, when the neural network model is applied and its input is the low-frequency spectrum of the second narrowband signal, the relative flatness information of the high-frequency part and the low-frequency part of the target wideband spectrum can be predicted from the output of the neural network model. The relative flatness information reflects the relative spectral flatness of the high-frequency part and the low-frequency part of the target wideband spectrum, that is, whether the spectrum of the high-frequency part is flat relative to that of the low-frequency part. If the correlation parameter further includes the relative flatness information, the high-frequency spectrum envelope may first be adjusted based on the relative flatness information and the spectral energy of the low-frequency spectrum, and the initial high-frequency spectrum may then be adjusted based on the difference between the adjusted high-frequency spectrum envelope and the low-frequency spectrum envelope, so that the finally obtained wideband signal contains fewer harmonics. The spectral energy of the low-frequency spectrum may be determined based on the spectral coefficients of the low-frequency spectrum, and may represent spectral flatness.
In an alternative embodiment of the present application, the correlation parameters may include a high-frequency spectrum envelope and relative flatness information, and the neural network model includes at least an input layer and an output layer. The input layer receives the feature vector of the low-frequency spectrum (or of the low-frequency spectrum and the low-frequency spectrum envelope); for example, the feature vector may include a 70-dimensional low-frequency spectrum and a 14-dimensional low-frequency spectrum envelope. The output layer includes at least one Long Short-Term Memory (LSTM) layer and two fully-connected network layers each connected to the LSTM layer, and each fully-connected network layer may include at least one fully-connected layer. The LSTM layer transforms the feature vector processed by the input layer; one fully-connected network layer performs a first classification process on the vector values produced by the LSTM layer and outputs the high-frequency spectrum envelope (14 dimensions), and the other fully-connected network layer performs a second classification process on the vector values produced by the LSTM layer and outputs the relative flatness information (4 dimensions).
As an example, Fig. 3 illustrates a schematic structural diagram of a neural network model provided in an embodiment of the present application. As shown in the figure, the neural network model mainly includes two parts: a unidirectional LSTM layer and two fully-connected layers; that is, each fully-connected network layer in this example includes one fully-connected layer. The output of one fully-connected layer is the high-frequency spectrum envelope, and the output of the other fully-connected layer is the relative flatness information.
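To make the data flow of this two-output structure concrete, the following is a rough, dependency-free sketch of a single LSTM cell feeding two linear heads (an 84-dimensional input, a 14-dimensional envelope output and a 4-dimensional flatness output). It is not the model of the embodiment: the class name, the hidden size of 16, the random untrained weights, the omission of bias terms and the softmax on the 4-dimensional head are all assumptions for illustration.

```python
import math
import random


def _matvec(weights, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def _sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def _rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TwoHeadLstm:
    """One LSTM cell followed by two fully-connected heads (untrained sketch)."""

    def __init__(self, n_in=84, n_hidden=16, n_env=14, n_flat=4, seed=0):
        rng = random.Random(seed)
        # One weight matrix per gate (input, forget, candidate, output),
        # each acting on the concatenation [x; h]. Biases are omitted.
        self.gates = [_rand_matrix(n_hidden, n_in + n_hidden, rng) for _ in range(4)]
        self.env_head = _rand_matrix(n_env, n_hidden, rng)
        self.flat_head = _rand_matrix(n_flat, n_hidden, rng)
        self.h = [0.0] * n_hidden  # hidden state carried across frames
        self.c = [0.0] * n_hidden  # cell state carried across frames

    def step(self, features):
        """Process one frame; return (envelope, flatness probabilities)."""
        xh = list(features) + self.h
        i_g, f_g, g_g, o_g = (_matvec(w, xh) for w in self.gates)
        self.c = [_sigmoid(f) * c + _sigmoid(i) * math.tanh(g)
                  for i, f, g, c in zip(i_g, f_g, g_g, self.c)]
        self.h = [_sigmoid(o) * math.tanh(c) for o, c in zip(o_g, self.c)]
        envelope = _matvec(self.env_head, self.h)
        logits = _matvec(self.flat_head, self.h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        # ASSUMPTION: softmax turns the 4 outputs into probability values.
        return envelope, [e / total for e in exps]
```

Calling `step` once per speech frame keeps the recurrent state between frames, which is what distinguishes the LSTM layer from a purely feed-forward mapping.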
The LSTM layer is a recurrent neural network whose input is the feature vector (which may be referred to as the input vector for short) of the low-frequency spectrum (or of the low-frequency spectrum and the low-frequency spectrum envelope). The input vector is processed by the LSTM to obtain a hidden vector of a certain dimension, and the hidden vector is used as the input of each of the two fully-connected layers, which respectively perform classification prediction. One fully-connected layer predicts and outputs a 14-dimensional column vector corresponding to the high-frequency spectrum envelope. The other fully-connected layer predicts and outputs a 4-dimensional column vector whose 4 values are the 4 probability values described above, each representing the probability that the relative flatness information is the corresponding one of the 4 arrays.
In an alternative aspect of the invention, the relative flatness information includes relative flatness information of at least two sub-band regions of the high-frequency part, and the relative flatness information corresponding to one sub-band region characterizes the correlation between the spectral flatness of that sub-band region of the high-frequency part and the spectral flatness of the high-frequency band of the low-frequency part.
The relative flatness information is determined based on the high-frequency part and the low-frequency part of the spectrum of the sample wideband signal. Because the low-frequency band of the low-frequency part of the sample narrowband signal contains richer harmonics, the high-frequency band of the low-frequency part of the sample narrowband signal may be selected as the reference for determining the relative flatness information. With the high-frequency band of the low-frequency part taken as the master (reference), the high-frequency part of the sample wideband signal is divided into at least two sub-band regions, and the relative flatness information of each sub-band region is determined based on the spectrum of that sub-band region and the spectrum of the low-frequency part.
Based on the foregoing description, when the neural network model is trained, the labeling result may include the relative flatness information of each sub-band region. That is, the sample label of the sample data may include the relative flatness information between each sub-band region of the high-frequency part of the sample wideband signal and the low-frequency part, which is determined based on the spectrum of that sub-band region and the spectrum of the low-frequency part. Accordingly, when the model is applied and its input is the low-frequency spectrum of the second narrowband signal, the relative flatness information between the sub-band regions of the high-frequency part of the target wideband spectrum and the low-frequency part can be predicted from the output of the neural network model. If the high-frequency part includes the spectra of at least two sub-band regions, the relative flatness information correspondingly includes relative flatness information for each of the at least two sub-band regions.
In order to achieve the purpose of band expansion, the number of spectral coefficients of the spectrum of the low-frequency portion of the target wideband spectrum may be the same as or different from the number of spectral coefficients of the spectrum of the high-frequency portion, and the number of spectral coefficients corresponding to each subband region may be the same or different, as long as the total number of spectral coefficients corresponding to at least two subband regions is the same as the number of spectral coefficients corresponding to the initial high-frequency spectrum.
As an example, suppose the at least two sub-band regions are 2 sub-band regions, namely a first sub-band region and a second sub-band region, and the high-frequency band of the low-frequency part is the band corresponding to the 35th to 69th frequency points. The first and second sub-band regions have the same number of spectral coefficients, and their total number of spectral coefficients equals the number of spectral coefficients of the low-frequency part. The first sub-band region is the band corresponding to the 70th to 104th frequency points, and the second sub-band region is the band corresponding to the 105th to 139th frequency points. The spectrum of each sub-band region thus has 35 spectral coefficients, the same number as the spectrum of the high-frequency band of the low-frequency part. If instead the selected high-frequency band of the low-frequency part is the band corresponding to the 56th to 69th frequency points, the high-frequency part can be divided into 5 sub-band regions of 14 spectral coefficients each.
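By way of non-limiting illustration, the division described in this example can be sketched in Python as follows. The function name `split_subband_regions` and its parameters are hypothetical and do not appear in the application; the sketch only assumes that every sub-band region is as wide as the chosen reference band and that frequency point indices are inclusive.

```python
def split_subband_regions(ref_start, ref_stop, high_start, high_stop):
    """Split frequency points high_start..high_stop (inclusive) into
    regions that are each as wide as the reference band ref_start..ref_stop."""
    width = ref_stop - ref_start + 1
    total = high_stop - high_start + 1
    if total % width != 0:
        raise ValueError("high-frequency part is not a whole number of regions")
    # Each tuple is (first frequency point, last frequency point) of one region.
    return [(s, s + width - 1) for s in range(high_start, high_stop + 1, width)]


# Reference band 35..69 (35 points): two regions of 35 coefficients each.
two_regions = split_subband_regions(35, 69, 70, 139)
# Reference band 56..69 (14 points): five regions of 14 coefficients each.
five_regions = split_subband_regions(56, 69, 70, 139)
```

With the values of the example, `two_regions` holds the bands 70 to 104 and 105 to 139, and `five_regions` holds five bands of 14 frequency points.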
Further, determining a gain adjustment value for the high-frequency spectral envelope based on the relative flatness information and the spectral energy of the low-frequency spectrum may include:
determining a gain adjustment value of the spectral envelope part corresponding to each sub-band region in the high-frequency spectral envelope, based on the relative flatness information corresponding to each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum;
and adjusting the high-frequency spectral envelope based on the gain adjustment value may include:
adjusting the corresponding spectral envelope part in the high-frequency spectral envelope based on the gain adjustment value of the spectral envelope part corresponding to each sub-band region.
Specifically, if the high frequency portion includes at least two subband regions, the gain adjustment value of the corresponding spectral envelope portion in the high frequency spectral envelope corresponding to each subband region may be determined based on the relative flatness information corresponding to each subband region and the spectral energy corresponding to each subband region in the low frequency spectrum, and then the corresponding spectral envelope portion in the high frequency spectral envelope may be adjusted based on the determined gain adjustment value.
As an example, as described above, the at least two subband regions are two subband regions, which are respectively a first subband region and a second subband region, the relative flatness information between the first subband region and the high frequency band of the low frequency portion is first relative flatness information, and the relative flatness information between the second subband region and the high frequency band of the low frequency portion is second relative flatness information, a gain adjustment value determined based on the first relative flatness information and the spectral energy corresponding to the first subband region may be used to adjust an envelope portion of the high frequency spectral envelope corresponding to the first subband region, and a gain adjustment value determined based on the second relative flatness information and the spectral energy corresponding to the second subband region may be used to adjust an envelope portion of the high frequency spectral envelope corresponding to the second subband region.
In an alternative of the present invention, because the low-frequency band of the low-frequency part of the sample narrowband signal contains richer harmonics, the high-frequency band of the low-frequency part of the sample narrowband signal may be selected as the reference for determining the relative flatness information. With the high-frequency band of the low-frequency part taken as the master (reference), the high-frequency part of the sample wideband signal is divided into at least two sub-band regions, and the relative flatness information of each sub-band region is determined based on the spectrum of each sub-band region of the high-frequency part and the spectrum of the low-frequency part.
Based on the foregoing description, in the training phase of the neural network, the relative flatness information of each sub-band region of the high-frequency part of the spectrum of the sample wideband signal may be determined by a variance analysis method based on the sample data (the sample data includes the sample narrowband signal and the corresponding sample wideband signal).
As an example, if the high frequency portion of the sample wideband signal is divided into two subband regions, a first subband region and a second subband region, respectively, the relative flatness information of the high frequency portion and the low frequency portion of the sample wideband signal may be first relative flatness information of the first subband region and the high frequency band of the low frequency portion of the sample wideband signal, and second relative flatness information of the second subband region and the high frequency band of the low frequency portion of the sample wideband signal.
The first relative flatness information and the second relative flatness information may be determined as follows:
Based on the frequency-domain coefficients $S_{Low,sample}(i,j)$ of the sample narrowband signal in the sample data and the frequency-domain coefficients $S_{High,sample}(i,j)$ of the high-frequency part of the sample wideband signal in the sample data, the following three variances are calculated by equations (16) to (18):
$\mathrm{var}_{L}(S_{Low,sample}(i,j)),\ j=35,36,\ldots,69$ (16)
$\mathrm{var}_{H1}(S_{High,sample}(i,j)),\ j=70,71,\ldots,104$ (17)
$\mathrm{var}_{H2}(S_{High,sample}(i,j)),\ j=105,106,\ldots,139$ (18)
Here, equation (16) is the variance of the spectrum of the high-frequency band of the low-frequency part of the sample narrowband signal, equation (17) is the variance of the spectrum of the first sub-band region, and equation (18) is the variance of the spectrum of the second sub-band region. var() denotes the variance; the variance of a spectrum is expressed in terms of the corresponding frequency-domain coefficients, and $S_{Low,sample}(i,j)$ denotes the frequency-domain coefficients of the sample narrowband signal.
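As a minimal sketch of equations (16) to (18), the three variances may be computed as below. The function name `band_variances` is hypothetical. The sketch assumes that the low-frequency and high-frequency coefficients of one frame are held in a single list indexed by frequency point, and that the population variance is intended; the application does not state which variance estimator is used.

```python
from statistics import pvariance


def band_variances(spectrum):
    """Return (var_L, var_H1, var_H2) for one frame.

    spectrum: list of at least 140 real frequency-domain coefficients,
    indexed by frequency point j (assumed layout, not from the source).
    """
    if len(spectrum) < 140:
        raise ValueError("expected at least 140 frequency-domain coefficients")
    var_l = pvariance(spectrum[35:70])     # j = 35..69, equation (16)
    var_h1 = pvariance(spectrum[70:105])   # j = 70..104, equation (17)
    var_h2 = pvariance(spectrum[105:140])  # j = 105..139, equation (18)
    return var_l, var_h1, var_h2


# Illustrative frame: oscillating reference band, flat first region,
# strongly oscillating second region.
frame = [0.0] * 140
for j in range(35, 70):
    frame[j] = 1.0 if j % 2 == 0 else -1.0
for j in range(70, 105):
    frame[j] = 0.5
for j in range(105, 140):
    frame[j] = 2.0 if j % 2 == 0 else -2.0
var_l, var_h1, var_h2 = band_variances(frame)
```

In this illustrative frame the first sub-band region is perfectly flat, so its variance is zero, while the second region varies more than the reference band.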
In this example, in order to remove aliasing in the sample wideband signal, QMF filtering may first be applied, and the sample wideband signal is then determined as follows: an original sample wideband signal, i.e., a signal that has not been filtered, is obtained; the original sample wideband signal is low-pass filtered by the QMF filter to obtain a first signal and high-pass filtered to obtain a second signal; the first signal and the second signal are each down-sampled with a sampling factor of 2; and the resulting first and second signals are spliced together to form the sample wideband signal. The frequency-domain coefficients $S_{Low,sample}(i,j)$ for j = 35 to 69 correspond to the low-frequency part of the sample wideband signal; the frequency-domain coefficients $S_{High,sample}(i,j)$ for j = 70 to 104 correspond to both the low-frequency part (70-79) and the high-frequency part (80-104) of the sample wideband signal; and the frequency-domain coefficients $S_{High,sample}(i,j)$ for j = 105 to 139 correspond to the high-frequency part of the sample wideband signal.
It should be noted that the low-frequency-domain coefficients of the sample narrowband signal may also be filtered, yielding the frequency-domain coefficients $S_{Low,sample\_rev}(i,j)$; in that case, $S_{Low,sample}(i,j)$ in equations (16) to (18) is replaced with $S_{Low,sample\_rev}(i,j)$.
Based on the above three variances, the relative flatness information between the spectrum of each sub-band region and the spectrum of the high-frequency band of the low-frequency part is determined by formula (19) and formula (20):
where fc(0) represents the first relative flatness information between the spectrum of the first sub-band region and the spectrum of the high-frequency band of the low-frequency part, and fc(1) represents the second relative flatness information between the spectrum of the second sub-band region and the spectrum of the high-frequency band of the low-frequency part.
The two values fc(0) and fc(1) can each be classified according to whether they are greater than or equal to 0 (in the embodiment of the present application, 1 represents greater than or equal to 0, and 0 represents less than 0). fc(0) and fc(1) together are defined as a binary-classification array, so that the array has 4 possible combinations: {0, 0}, {0, 1}, {1, 0}, and {1, 1}.
Thus, the relative flatness information output by the model may be 4 probability values for identifying the probability that the relative flatness information belongs to the 4 arrays.
By the probability maximization principle, one of the 4 array combinations can be selected as the predicted relative flatness information between the spectra of the two sub-band regions and the spectrum of the high-frequency band of the low-frequency part. Specifically, it can be expressed by formula (21):
$v(i,k)=0\ \text{or}\ 1,\ k=0,1$ (21)
where v(i,k) represents the relative flatness information between the spectra of the two sub-band regions and the spectrum of the high-frequency band of the low-frequency part, and k is the index of the sub-band region; each sub-band region corresponds to one piece of relative flatness information. For example, when k=0, v(i,k)=0 indicates that the first sub-band region oscillates relative to the low-frequency part, i.e., its flatness is poor, and v(i,k)=1 indicates that the first sub-band region is relatively flat with respect to the low-frequency part, i.e., its flatness is good.
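The binary classification of fc(0) and fc(1) and the selection by maximum probability can be sketched as follows. The names `FLATNESS_ARRAYS`, `classify_flatness` and `decode_flatness` are hypothetical, and the order in which the 4 model outputs map to the 4 arrays is an assumption made for illustration; the application does not fix this order.

```python
# Assumed order of the 4 model outputs: {0,0}, {0,1}, {1,0}, {1,1}.
FLATNESS_ARRAYS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def classify_flatness(fc0, fc1):
    """Training label: 1 if the value is >= 0, otherwise 0."""
    return (1 if fc0 >= 0 else 0, 1 if fc1 >= 0 else 0)


def decode_flatness(probabilities):
    """Pick the array with the largest predicted probability.

    Returns (v(i,0), v(i,1)) for the two sub-band regions.
    """
    if len(probabilities) != 4:
        raise ValueError("expected 4 probability values")
    best = max(range(4), key=lambda idx: probabilities[idx])
    return FLATNESS_ARRAYS[best]


label = classify_flatness(-0.3, 0.0)          # first region < 0, second >= 0
v = decode_flatness([0.1, 0.2, 0.6, 0.1])     # largest probability at index 2
```

Under the assumed ordering, index 2 decodes to v(i,0)=1 and v(i,1)=0, i.e., a flat first sub-band region and an oscillating second sub-band region.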
In the embodiment of the invention, the low-frequency spectrum of the second narrowband signal is input to the trained neural network model, and the relative flatness information of the high-frequency part of the target broadband spectrum can be obtained through prediction of the neural network model. If the spectrum corresponding to the high-frequency band of the low-frequency part of the second narrowband signal is selected as the input of the neural network model, the relative flatness information of at least two sub-band regions of the high-frequency part of the target broadband spectrum can be predicted and obtained based on the trained neural network model.
In an alternative aspect of the present invention, if the high-frequency spectral envelope includes a second number of first sub-spectral envelopes, determining a gain adjustment value of a spectral envelope portion corresponding to each sub-band region in the high-frequency spectral envelope based on the relative flatness information corresponding to each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum, may include:
for each first sub-spectral envelope, determining the gain adjustment value of the first sub-spectral envelope according to: the spectral energy corresponding to the spectral envelope in the low-frequency spectral envelope that corresponds to the first sub-spectral envelope (hereinafter referred to as the second sub-spectral envelope); the relative flatness information of the sub-band region corresponding to the second sub-spectral envelope; and the spectral energy of the sub-band region corresponding to the second sub-spectral envelope;
and adjusting the corresponding spectral envelope part of the high-frequency spectral envelope based on the gain adjustment value of the spectral envelope part corresponding to each sub-band region in the high-frequency spectral envelope may include:
adjusting the corresponding first sub-spectral envelope in the high-frequency spectral envelope according to the gain adjustment value of the first sub-spectral envelope corresponding to each sub-band region.
Specifically, each first sub-spectral envelope in the high-frequency spectral envelope has one gain adjustment value, which is determined based on the spectral energy corresponding to the second sub-spectral envelope (the one corresponding to that first sub-spectral envelope), the relative flatness information of the sub-band region corresponding to the second sub-spectral envelope, and the spectral energy of that sub-band region. Since the high-frequency spectral envelope includes a second number of first sub-spectral envelopes, it correspondingly has a second number of gain adjustment values.
It is to be understood that, if the high frequency portion includes a high frequency spectral envelope corresponding to at least two subband regions, for the high frequency spectral envelope corresponding to at least two subband regions, the first sub-spectral envelope of the corresponding subband region may be adjusted based on the gain adjustment value corresponding to the first sub-spectral envelope corresponding to each subband region.
As an example, taking a first sub-band region containing 35 frequency points, one implementation for determining the gain adjustment value of the first sub-spectral envelope corresponding to a second sub-spectral envelope, based on the spectral energy corresponding to the second sub-spectral envelope, the relative flatness information of the sub-band region corresponding to the second sub-spectral envelope, and the spectral energy of that sub-band region, is as follows:
(1) Analyze v(i,k): a value of 1 indicates that the high-frequency part is very flat, and a value of 0 indicates that the high-frequency part oscillates.
(2) Divide the 35 frequency points of the first sub-band region into 7 sub-bands, each corresponding to one first sub-spectral envelope. Calculate the average energy pow_env of each sub-band (the spectral energy corresponding to the second sub-spectral envelope), and calculate the average Mpow_env of the 7 average energies (the spectral energy of the sub-band region corresponding to the second sub-spectral envelope). For example, the squared absolute value of each low-frequency spectral coefficient is taken as the energy of that coefficient; since one sub-band corresponds to 5 low-frequency spectral coefficients, the mean of their energies may be used as the average energy of the sub-band.
(3) Calculate the gain adjustment value of each first sub-spectral envelope based on the analyzed relative flatness information, the average energy pow_env, and the average Mpow_env corresponding to the first sub-band region, specifically:
when v(i,k)=1, $G(j)=a_1+b_1\cdot\mathrm{SQRT}(Mpow\_env/pow\_env(j)),\ j=0,1,\ldots,6$;
when v(i,k)=0, $G(j)=a_0+b_0\cdot\mathrm{SQRT}(Mpow\_env/pow\_env(j)),\ j=0,1,\ldots,6$;
where, as an alternative, $a_1=0.875$, $b_1=0.125$, $a_0=0.925$, $b_0=0.075$, and G(j) is the gain adjustment value.
Here, for the case v(i,k)=0, the gain adjustment value is 1, i.e., no flattening operation (adjustment) needs to be performed on the high-frequency spectral envelope.
In the above manner, the gain adjustment values of the 7 first sub-spectral envelopes in the high-frequency spectral envelope can be determined, and the corresponding first sub-spectral envelopes can be adjusted based on these 7 gain adjustment values. This operation narrows the average energy differences between sub-bands and applies different degrees of flattening to the spectrum corresponding to the first sub-band region.
It can be understood that the high-frequency spectral envelope corresponding to the second sub-band region may be adjusted in the same manner, which is not described again here. If the high-frequency spectral envelope comprises 14 sub-bands in total, 14 gain adjustment values may correspondingly be determined, and the corresponding sub-spectral envelopes are adjusted based on these 14 gain adjustment values.
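Steps (2) and (3) above can be sketched as follows. The function name `subband_gains`, the `COEFFS` table layout and the small `eps` guard against division by zero are hypothetical additions for illustration. The sketch follows the two gain expressions as written, using the optional coefficient values given in the text; with these values the v(i,k)=0 branch evaluates to approximately 1 when all sub-band energies are equal.

```python
import math

# (a, b) pairs from the text, keyed by the flatness flag v(i,k).
COEFFS = {1: (0.875, 0.125), 0: (0.925, 0.075)}


def subband_gains(low_coeffs, v, band_size=5, eps=1e-12):
    """Gain adjustment value G(j) for each sub-band of one region.

    low_coeffs: low-frequency spectral coefficients used as the energy
    reference (35 values give 7 sub-bands of 5 coefficients each).
    v: relative flatness flag of the region, 0 or 1.
    eps: hypothetical guard that avoids dividing by a zero energy.
    """
    n_bands = len(low_coeffs) // band_size
    # Average energy of each sub-band: mean of |coefficient|^2.
    pow_env = [
        sum(abs(c) ** 2 for c in low_coeffs[b * band_size:(b + 1) * band_size])
        / band_size
        for b in range(n_bands)
    ]
    mpow_env = sum(pow_env) / n_bands  # mean of the sub-band averages
    a, b = COEFFS[v]
    return [a + b * math.sqrt(mpow_env / max(p, eps)) for p in pow_env]


# One weak sub-band (amplitude 1) followed by six stronger ones (amplitude 2).
coeffs = [1.0] * 5 + [2.0] * 30
gains = subband_gains(coeffs, v=1)
```

In this illustrative input the weak sub-band receives a gain above 1 and the stronger sub-bands a gain slightly below 1, which pulls the sub-band energies toward their mean.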
In an alternative scheme of the present invention, after obtaining the adjusted initial high frequency spectrum, filtering processing may be performed on the adjusted initial high frequency spectrum, that is, filtering processing is performed on a high frequency domain coefficient corresponding to the adjusted initial high frequency spectrum to obtain a high frequency domain coefficient after filtering processing, and then a target high frequency spectrum is generated based on the low frequency domain coefficient after filtering processing and the high frequency domain coefficient after filtering processing, where the filtering processing process is substantially the same as the process of performing filtering processing on the low frequency domain coefficient, and is not described herein again.
In an alternative of the present invention, obtaining the target high frequency spectrum based on the low frequency spectrum may include:
Obtaining an initial high-frequency spectrum based on the low-frequency spectrum;
obtaining a target high-frequency spectrum based on the high-frequency part of the initial high-frequency spectrum;
obtaining a wideband signal after band expansion based on the low frequency spectrum and the target high frequency spectrum, which may include:
determining a target low-frequency spectrum according to the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum;
and obtaining a broadband signal with the expanded frequency band according to the target low-frequency spectrum and the target high-frequency spectrum.
Specifically, after the initial high-frequency spectrum is determined, since the low-frequency spectrum (which may be the initial low-frequency spectrum or the first low-frequency spectrum) corresponds to the low-frequency part of the wideband signal, one implementation of determining the target low-frequency spectrum according to the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum may be: splicing the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum, and taking the spliced spectrum as the target low-frequency spectrum. Splicing the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum actually means splicing the frequency-domain coefficients corresponding to the low-frequency spectrum with the frequency-domain coefficients corresponding to the low-frequency part of the initial high-frequency spectrum.
The initial high-frequency spectrum may be a filtered spectrum or an unfiltered spectrum. If the initial high-frequency spectrum is a filtered spectrum, the target high-frequency spectrum is the high-frequency part of the filtered initial high-frequency spectrum; if the initial high-frequency spectrum is an unfiltered spectrum, the target high-frequency spectrum is the high-frequency part of the unfiltered initial high-frequency spectrum.
As an example, suppose the second narrowband signal is obtained by QMF low-pass filtering, its sampling rate is 8000 Hz, its effective bandwidth is 3500 Hz, its frame length is 10 ms, the sampling rate of the wideband signal to be obtained is 16000 Hz, and the wideband signal is produced by a QMF synthesis filter. Based on the frame length and sampling rate of the second narrowband signal, the low-frequency spectrum has 80 low-frequency-domain coefficients. Because the effective bandwidth of the wideband signal to be obtained is 7000 Hz, only 70 of them (indices 0 to 69) may be selected for the subsequent band extension processing. From these 70 low-frequency-domain coefficients, 70 initial high-frequency-domain coefficients are obtained. To obtain a wideband signal with an effective bandwidth of 7000 Hz, one implementation is to splice the low-frequency part of the initial high-frequency-domain coefficients (for example, the initial high-frequency-domain coefficients with indices 0 to 9) onto the 70 low-frequency-domain coefficients, which yields target low-frequency-domain coefficients containing 80 frequency-domain coefficients and target high-frequency-domain coefficients containing 60 frequency-domain coefficients. A wideband signal with an effective bandwidth of 7000 Hz can then be obtained based on the target low-frequency spectrum and the target high-frequency spectrum.
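The splicing in this example amounts to moving the first 10 initial high-frequency-domain coefficients to the end of the low-frequency-domain coefficients. A minimal sketch, in which the function name `splice_target_spectra` and the parameter `n_borrow` are hypothetical:

```python
def splice_target_spectra(low_coeffs, initial_high_coeffs, n_borrow=10):
    """Build the target low- and high-frequency coefficient lists.

    low_coeffs: the 70 selected low-frequency-domain coefficients.
    initial_high_coeffs: the 70 initial high-frequency-domain coefficients.
    n_borrow: how many leading high-frequency coefficients are appended
    to the low-frequency coefficients (10 in the example of the text).
    """
    target_low = list(low_coeffs) + list(initial_high_coeffs[:n_borrow])
    target_high = list(initial_high_coeffs[n_borrow:])
    return target_low, target_high


low = [float(j) for j in range(70)]             # placeholder coefficients
high = [100.0 + float(j) for j in range(70)]    # placeholder coefficients
target_low, target_high = splice_target_spectra(low, high)
```

With 70 coefficients on each side, this yields 80 target low-frequency-domain coefficients and 60 target high-frequency-domain coefficients, matching the counts stated above.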
In an alternative of the present invention, obtaining a wideband signal after band extension based on the target low-frequency spectrum and the target high-frequency spectrum may include:
performing frequency-time conversion on the target low-frequency spectrum to obtain a first time-domain signal;
performing frequency-time conversion on the target high-frequency spectrum to obtain a second time-domain signal;
obtaining the wideband signal after band extension based on the first time-domain signal and the second time-domain signal.
In particular, the QMF analysis filter bank describes the spectral response relationship between the low-frequency part and the high-frequency part of the filtered signal. As an example, if the low-pass filter is $H_{Low}(z)$, with coefficients $h_{Low}(k)$, and the high-pass filter is $H_{High}(z)$, with coefficients $h_{High}(k)$, then the following relationship exists:
$h_{High}(k)=(-1)^{k}h_{Low}(k)$ (22)
According to QMF theory, the QMF synthesis filter bank can be described based on the QMF analysis filter bank ($H_{Low}(z)$ and $H_{High}(z)$):
$G_{Low}(z)=H_{Low}(z)$ (23)
$G_{High}(z)=(-1)\cdot H_{High}(z)$ (24)
Based on the perfect reconstruction principle of the QMF filter bank, a wideband signal can be synthesized by the QMF synthesis filter bank with aliasing effectively suppressed.
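Relations (22) to (24) can be checked numerically on a small filter. The sketch below uses a two-tap averaging prototype `[0.5, 0.5]` purely as an illustrative assumption; the application does not specify the filter coefficients. The helper names `qmf_bank`, `modulate` and `convolve` are hypothetical.

```python
def modulate(h):
    """Coefficients of H(-z): multiply tap k by (-1)^k."""
    return [((-1) ** k) * c for k, c in enumerate(h)]


def convolve(a, b):
    """Plain polynomial product of two coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def qmf_bank(h_low):
    """Derive the other three filters from the low-pass prototype."""
    h_high = modulate(h_low)          # equation (22)
    g_low = list(h_low)               # equation (23)
    g_high = [-c for c in h_high]     # equation (24)
    return h_high, g_low, g_high


h_low = [0.5, 0.5]  # assumed two-tap prototype, for illustration only
h_high, g_low, g_high = qmf_bank(h_low)

# Distortion term H_Low*G_Low + H_High*G_High: a pure delay means no distortion.
distortion = [p + q for p, q in zip(convolve(h_low, g_low),
                                    convolve(h_high, g_high))]
# Alias term H_Low(-z)*G_Low + H_High(-z)*G_High: all zeros means no aliasing.
alias = [p + q for p, q in zip(convolve(modulate(h_low), g_low),
                               convolve(modulate(h_high), g_high))]
```

For this prototype the distortion term reduces to a one-sample delay and the alias term vanishes, which is the behavior the synthesis bank relies on to suppress aliasing.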
In this embodiment, the second narrowband signal is the low-pass filtered version of the first narrowband signal. The first time-domain signal is filtered by the filter $G_{Low}(z)$ of the QMF synthesis filter bank, and the second time-domain signal is filtered by the filter $G_{High}(z)$ of the QMF synthesis filter bank, so as to remove the aliasing signals they contain. A wideband signal free of aliasing is then obtained based on the filtered first time-domain signal and the filtered second time-domain signal, thereby achieving perfect reconstruction.
Based on the above example, the bandwidth of the wideband signal is 8000 Hz and its effective bandwidth is 7000 Hz. As described above, the high-pass part after filtering by the QMF filter (corresponding to $H_{High}(z)$) has an energy "suppression" effect on portions of the target frequency band (e.g., frequency bands corresponding to 3500-.
It should be noted that, based on the foregoing solution, if the second narrowband signal was obtained by low-pass filtering the first narrowband signal and then down-sampling with a sampling factor of 2, then, before the wideband signal is obtained, the first time-domain signal and the second time-domain signal may each be up-sampled with a sampling factor of 2, so that the effective bandwidth of the finally obtained wideband signal is 7000 Hz.
In an alternative aspect of the present invention, if the first narrowband signal includes at least two associated signals, the method may further include:
fusing the at least two associated signals to obtain the first narrowband signal;
or,
taking each of the at least two associated signals as a first narrowband signal.
Specifically, the first narrowband signal may comprise multiple associated signals, for example adjacent speech frames. The at least two associated signals may be fused into a single signal, which is used as the first narrowband signal and then extended by the band extension method of the present invention to obtain a wideband signal.
Or, each of the at least two associated signals may be used as a first narrowband signal, and the first narrowband signal is extended by the frequency band extension method in the present invention to obtain at least two corresponding wideband signals, where the at least two wideband signals may be combined into one signal to be output, or may be output separately, and the present invention is not limited in this respect.
In order to better understand the method provided by the embodiment of the present invention, the scheme of the embodiment is described below in further detail with reference to an example of a specific application scenario.
As an example, the application scenario is a PSTN (narrowband speech) and VoIP (broadband speech) interworking scenario, that is, a narrowband speech corresponding to a PSTN telephone is used as a first narrowband signal to be processed, and the first narrowband signal to be processed is subjected to band extension, so that a speech frame received by a VoIP receiving end is a broadband speech, thereby improving the hearing experience of the receiving end.
In this example, the first narrowband signal to be processed is a signal with a sampling rate of 8000 Hz and a frame length of 10 ms. According to the Nyquist sampling theorem, the effective bandwidth of the first narrowband signal to be processed is 4000 Hz. In an actual voice communication scenario, the upper bound of the effective bandwidth is typically 3500 Hz. Therefore, in this example, an effective bandwidth of 7000 Hz is taken for the extended wideband signal.
As shown in fig. 4 and 5, the flow of the present embodiment includes the following steps:
Step S1, front-end signal processing:
Filtering: QMF low-pass filtering is performed on the first narrowband signal to be processed, $s_{Low}(i,j)$ (the QMF filtering shown in Fig. 4, which is a dual-channel QMF filter in this example), followed by down-sampling with a sampling factor of 2, to obtain the second narrowband signal $s_{QMF,Low}(i,j)$. The sampling rate of the second narrowband signal is 8000 Hz. The first narrowband signal may first be up-sampled with a sampling factor of 2 before filtering: the sampling rate of the first narrowband signal is 8000 Hz, and the sampling rate of the up-sampled signal is 16000 Hz.
Time-frequency transformation: because the sampling rate of the second narrowband signal is 8000 Hz and the frame length is 10 ms, the second narrowband signal corresponds to 80 sample points (frequency points). A modified discrete cosine transform (MDCT) is performed on the second narrowband signal to obtain 80 initial low-frequency-domain coefficients (MDCT coefficients). Specifically, the 80 sample points of the previous speech frame and the 80 sample points of the current speech frame (the second narrowband signal) are combined into an array of 160 sample points. The sample points in the array are then windowed, for example with a cosine window, and transformed to obtain the 80 initial low-frequency-domain coefficients $S_{Low}(i,j)$ (MDCT coefficients), where i is the frame index of the speech frame and j is the intra-frame sample index (j = 0, 1, ..., 79).
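A direct (non-optimized) sketch of the time-frequency transformation is given below, using the standard MDCT definition in which 2N windowed samples give N coefficients. The function name `mdct_frame` is hypothetical, and both the sine-shaped window and the absence of a scaling factor are assumptions; the application only states that a cosine-type window may be used.

```python
import math


def mdct_frame(previous, current):
    """MDCT of one frame: 2N input samples give N coefficients.

    previous: the N samples of the previous speech frame.
    current: the N samples of the current speech frame.
    """
    n = len(current)
    if len(previous) != n:
        raise ValueError("both frames must have the same length")
    block = list(previous) + list(current)  # 2N samples (160 in the text)
    # Assumed window: w(t) = sin(pi * (t + 0.5) / (2N)).
    windowed = [x * math.sin(math.pi * (t + 0.5) / (2 * n))
                for t, x in enumerate(block)]
    # X(k) = sum_t w(t) x(t) cos(pi/N * (t + 0.5 + N/2) * (k + 0.5)).
    return [
        sum(w * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t, w in enumerate(windowed))
        for k in range(n)
    ]


prev_frame = [math.sin(0.2 * t) for t in range(80)]
cur_frame = [math.sin(0.2 * (t + 80)) for t in range(80)]
s_low = mdct_frame(prev_frame, cur_frame)  # 80 initial low-frequency coefficients
```

A production implementation would typically use an FFT-based MDCT; the direct sum is shown only because it mirrors the definition.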
Step S2, low-frequency pre-filtering:
The low-frequency pre-filtering performs filtering processing on the initial low-frequency-domain coefficients, obtained by applying the discrete cosine transform to the second narrowband signal, to obtain the low-frequency-domain coefficients. In the filtering process, the initial low-frequency-domain coefficients are filtered by a filter gain determined from the initial low-frequency-domain coefficients themselves, as shown in formula (3):
$S_{Low\_rev}(i,j)=G_{pre\_filt}(j)\cdot S_{Low}(i,j)$ (3)
where $G_{pre\_filt}(j)$ is the filter gain calculated from the initial low-frequency-domain coefficients, $S_{Low}(i,j)$ is the initial low-frequency-domain coefficient, $S_{Low\_rev}(i,j)$ is the low-frequency-domain coefficient obtained by filtering, and the low-frequency spectrum corresponding to $S_{Low\_rev}(i,j)$ is the first low-frequency spectrum.
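Formula (3) is a per-coefficient multiplication. The sketch below applies one gain per sub-band of 5 coefficients, in line with the sub-band sharing assumed later in this example. The function name `apply_prefilter` is hypothetical, and leaving the coefficients beyond the last sub-band unchanged is an assumption made for illustration.

```python
def apply_prefilter(initial_low, band_gains, band_size=5):
    """Apply S_Low_rev(i,j) = G_pre_filt(j) * S_Low(i,j).

    initial_low: initial low-frequency-domain coefficients of one frame.
    band_gains: one filter gain per sub-band; every band_size adjacent
    coefficients share the same gain.
    Coefficients not covered by any sub-band are copied unchanged
    (assumption, not stated in the source).
    """
    if len(band_gains) * band_size > len(initial_low):
        raise ValueError("more sub-bands than coefficients")
    filtered = list(initial_low)
    for band, gain in enumerate(band_gains):
        for j in range(band * band_size, (band + 1) * band_size):
            filtered[j] = gain * initial_low[j]
    return filtered


s_low_rev = apply_prefilter([1.0] * 80, [0.5] * 14)
```

With 14 sub-band gains, the first 70 coefficients are scaled and the remaining 10 are passed through.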
in this example, it is assumed that every 5 initial low-frequency domain coefficients in the same subband share a filter gain, wherein the process of calculating the filter gain is specifically as follows:
(1) The initial low frequency domain coefficients are banded, e.g. adjacent 5 initial low frequency domain coefficients are combined into one sub-spectrum, corresponding to 14 sub-bands in this example. The average energy is calculated for each subband. In particular, the energy of each bin (i.e., the initial low frequency domain coefficients described above) is defined as the sum of the square of the real part and the square of the imaginary part. Calculating the energy values of the adjacent 5 frequency points by the following formula (4), wherein the sum of the energy values of the 5 frequency points is the first spectrum energy of the current sub-spectrum:
where i is the frame index of the speech frame, j is the intra-frame sample index (j = 0, 1, ..., 69), k = 0, 1, ..., 13 is the sub-band index (14 sub-bands in total), Pe(k) is the first spectral energy of the k-th sub-band, and $S_{Low}(i, j)$ is the low-frequency domain coefficient.
(2) Calculating a first spectral energy of the current sub-spectrum based on the inter-frame correlation by at least one of formula (5) and formula (6):
$$Fe(k) = 1.0 + Pe(k) + Pe_{pre}(k) \tag{5}$$
$$Fe\_sm(k) = \frac{Fe(k) + Fe_{pre}(k)}{2} \tag{6}$$
where Fe(k) is the smoothing term of the first spectral energy of the current sub-spectrum, Pe(k) is the first spectral energy of the current sub-spectrum of the current speech frame, $Pe_{pre}(k)$ is the second initial spectral energy of the sub-spectrum, in an associated speech frame of the current speech frame, that corresponds to the current sub-spectrum, Fe_sm(k) is the smoothing term of the cumulatively averaged first spectral energy, and $Fe_{pre}(k)$ is the smoothing term of the first spectral energy of the corresponding sub-spectrum in the associated speech frame. The associated speech frame is at least one speech frame that precedes and is adjacent to the current speech frame.
(3) Calculate the spectral tilt coefficient of the initial spectrum: the frequency band corresponding to the initial spectrum is divided equally into a first sub-band and a second sub-band, and the first sub-band energy of the first sub-band and the second sub-band energy of the second sub-band are calculated respectively, where the calculation formula (11) is as follows:
where e1 is the first sub-band energy of the first sub-band and e2 is the second sub-band energy of the second sub-band.
Next, from e1 and e2, the spectral tilt coefficient of the initial spectrum is determined based on the following logic:
If(e2>=e1):
    T_para=0;
Else:
    T_para=8*f_cont_low*SQRT((e1-e2)/(e1+e2));
    T_para=min(1.0,T_para);
    T_para=T_para/7;
where T_para is the spectral tilt coefficient, SQRT is the square-root operation, f_cont_low = 0.035 is a predetermined filter coefficient, and 7 is half of the total number of sub-spectra.
(4) The second filter gain for each sub-spectrum is calculated according to the following equation (13):
$$gain_{f0}(k) = Fe(k)^{f\_cont\_low} \tag{13}$$
where $gain_{f0}(k)$ is the second filter gain of the k-th sub-spectrum, f_cont_low = 0.035 is a predetermined filter coefficient, Fe(k) is the smoothing term of the first spectral energy of the k-th sub-spectrum calculated according to equation (5), and k = 0, 1, ..., 13.
Then, if the spectral tilt coefficient T_para is positive, the second filter gain $gain_{f0}(k)$ needs to be further adjusted according to the following equation (14):
If(T_para>0):
$$gain_{f1}(k) = gain_{f0}(k) \cdot (1 + k \cdot T\_para) \tag{14}$$
where $gain_{f1}(k)$ is the second filter gain adjusted according to the spectral tilt coefficient T_para.
(5) Obtaining a filter gain value of the low frequency pre-filtering according to the following formula (15):
$$G_{pre\_filt}(k) = \frac{1 + gain_{f1}(k)}{2} \tag{15}$$
where $gain_{f1}(k)$ is the second filter gain of equation (13) after adjustment, and $G_{pre\_filt}(k)$ is the filter gain (i.e., the second filter gain) finally obtained from $gain_{f1}(k)$ for the 5 low-frequency domain coefficients of the k-th sub-spectrum.
Specifically, once the second filter gain $G_{pre\_filt}(k)$ of the k-th sub-spectrum has been determined, the first filter gain $G_{pre\_filt}(j)$ can be obtained, because the first filter gain consists of a first number (e.g., L = 14) of second filter gains $G_{pre\_filt}(k)$, and each second filter gain $G_{pre\_filt}(k)$ is the filter gain of the N spectral coefficients of the k-th sub-spectrum.
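Steps (1) to (5) above can be sketched in Python as follows. This is only an illustration under stated assumptions: formula (13) is read as Fe(k) raised to the power f_cont_low, and e1 and e2 are taken as the summed Pe(k) of the lower and upper 7 sub-spectra, because formula (11) is not reproduced in the text. The function name `prefilter_gains` and the toy input are hypothetical.

```python
import math

F_CONT_LOW = 0.035   # predetermined filter coefficient given for pre-filtering
BAND = 5             # spectral coefficients per sub-spectrum

def prefilter_gains(coeffs, pe_prev, f_cont=F_CONT_LOW):
    """Return (per-coefficient gains, Pe of this frame) for one frame."""
    n_bands = len(coeffs) // BAND
    # Step (1): energy of each sub-spectrum. MDCT coefficients are real,
    # so the imaginary part contributes nothing.
    pe = [sum(c * c for c in coeffs[k * BAND:(k + 1) * BAND])
          for k in range(n_bands)]
    # Step (2), formula (5): smoothing with the previous frame.
    fe = [1.0 + pe[k] + pe_prev[k] for k in range(n_bands)]
    # Step (3): spectral tilt. ASSUMPTION: e1/e2 are the summed Pe(k) of the
    # lower/upper half of the sub-spectra (formula (11) is not available).
    half = n_bands // 2
    e1, e2 = sum(pe[:half]), sum(pe[half:])
    if e2 >= e1:
        t_para = 0.0
    else:
        t_para = 8 * f_cont * math.sqrt((e1 - e2) / (e1 + e2))
        t_para = min(1.0, t_para) / half
    # Step (4), formulas (13) and (14). ASSUMPTION: (13) is a power.
    gain = [fe[k] ** f_cont for k in range(n_bands)]
    if t_para > 0:
        gain = [gain[k] * (1 + k * t_para) for k in range(n_bands)]
    # Step (5), formula (15); each group of 5 coefficients shares one gain.
    g_band = [(1 + g) / 2 for g in gain]
    g_coeff = [g_band[j // BAND] for j in range(n_bands * BAND)]
    return g_coeff, pe

coeffs = [1.0 / (1 + j) for j in range(70)]         # toy low-heavy spectrum
gains, pe = prefilter_gains(coeffs, pe_prev=[0.0] * 14)
filtered = [g * c for g, c in zip(gains, coeffs)]   # formula (3)
```

The returned `pe` would be passed as `pe_prev` for the next frame, which is how the inter-frame term of formula (5) enters the calculation.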
Step S3, feature extraction:
a) Based on the spectral coefficients $S_{Low\_rev}$ of each sub-spectrum (the pre-filtered low-frequency domain coefficients), the sub-spectrum envelope corresponding to each sub-spectrum is determined by formula (2).
Formula (2) is:
where $S_{Low\_rev}$ represents the filtered low-frequency domain coefficients (spectral coefficients), $e_{Low}(i, k)$ represents the sub-spectrum envelope, i is the frame index of the speech frame, and k is the sub-band index; if the total number of sub-bands is M = 14, then k = 0, 1, 2, ..., 13 and the low-frequency spectrum envelope comprises 14 sub-spectrum envelopes.
Generally, the spectrum envelope of a sub-band is defined as the average energy of adjacent coefficients (possibly further converted to a logarithmic representation), but with that definition coefficients of small amplitude may have no substantial effect. The scheme provided by the embodiment of the present invention instead directly averages the logarithmic representations of the spectral coefficients in each sub-spectrum to obtain the corresponding sub-spectrum envelope. Compared with the commonly used envelope determination scheme, this better protects small-amplitude coefficients in the distortion control of neural network model training, so that more signal parameters contribute to the band extension.
Thus, the 70-dimensional low-frequency spectrum and the 14-dimensional low-frequency spectrum envelope can be used as the input of the neural network model.
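A minimal Python sketch of this feature extraction is given below. It follows the verbal description (averaging the logarithms of the coefficient magnitudes in each group of 5) rather than formula (2) itself, which is not reproduced here; the natural logarithm, the small `eps` guard against log(0), and the function name are assumptions.

```python
import math

def sub_spectrum_envelopes(coeffs, band=5, eps=1e-12):
    """Average the log-magnitudes over each group of `band` coefficients."""
    n_bands = len(coeffs) // band
    return [
        sum(math.log(abs(c) + eps) for c in coeffs[k * band:(k + 1) * band]) / band
        for k in range(n_bands)
    ]

s_low_rev = [1.0] * 35 + [0.5] * 35          # toy 70-point low-frequency spectrum
e_low = sub_spectrum_envelopes(s_low_rev)    # 14 sub-spectrum envelopes
features = s_low_rev + e_low                 # 84-dimensional model input
```

Averaging in the log domain means one large coefficient cannot dominate its sub-band the way it would when energies are averaged first, which matches the motivation stated above.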
Step S4, inputting the neural network model:
An input layer: the neural network model inputs the 84-dimensional feature vector described above.
Output layer: since the target bandwidth of the band extension in this embodiment is 7000 Hz, the high-frequency spectrum envelopes of the 14 sub-bands corresponding to the 3500-7000 Hz band need to be predicted to complete the basic band extension function. Generally, the low-frequency part of a speech frame contains many harmonic-like structures such as pitch and formants, while the spectrum of the high-frequency part is flatter. If the low-frequency spectrum is simply copied to the high band to obtain an initial high-frequency spectrum and sub-band-based gain control is applied to it, the reconstructed high-frequency part contains too many harmonic-like structures, which causes distortion and degrades the listening experience. Therefore, in this example, the relative flatness information predicted by the neural network model, which describes the relative flatness of the low-frequency part and the high-frequency part, is used to adjust the initial high-frequency spectrum, so that the adjusted high-frequency part is flatter and harmonic interference is reduced.
In this example, the spectrum of the upper band of the low-frequency spectrum is copied twice to generate the initial high-frequency spectrum, because the low-frequency part, particularly the band below 1000 Hz, has richer harmonic components. Therefore, in this embodiment, the spectral coefficients corresponding to frequency points 35-69 are selected as the "master", and the band of the high-frequency part is divided into two sub-band regions, a first sub-band region and a second sub-band region. The high-frequency part corresponds to 70 spectral coefficients and each sub-band region corresponds to 35 spectral coefficients, so two flatness analyses are performed on the high-frequency part, one per sub-band region. The band corresponding to the first sub-band region is that of frequency points 70 to 104, and the band corresponding to the second sub-band region is that of frequency points 105 to 139.
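The copy operation described above can be sketched in a few lines of Python. The function name and the toy input are illustrative assumptions; only the index ranges come from the text.

```python
def initial_high_spectrum(low_spectrum, start=35, stop=70):
    """Copy low_spectrum[start:stop] twice to form the initial high band."""
    template = list(low_spectrum[start:stop])   # frequency points 35..69
    return template + template                  # points 70..104 and 105..139

low = [float(j) for j in range(70)]             # toy low-frequency spectrum
high = initial_high_spectrum(low)
first_region, second_region = high[:35], high[35:]
```

Both sub-band regions start out identical to the master; they only diverge after the envelope and flatness adjustments of the later steps.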
The flatness analysis may use the variance analysis method defined in classical statistics. The variance describes the degree of oscillation of the spectrum: the higher the value, the richer the harmonic components.
Based on the foregoing description, since the low frequency band of the low frequency portion of the sample narrowband signal contains richer harmonics, the high frequency band of the low frequency portion of the sample narrowband signal may be selected as a reference for determining the relative flatness information, that is, the high frequency band of the low frequency portion (the frequency band corresponding to the frequency points of 35-69) is used as a master, the high frequency portion of the sample wideband signal is correspondingly divided into at least two subband regions, and the relative flatness information of each subband region is determined based on the frequency spectrum of each subband region of the high frequency portion and the frequency spectrum of the low frequency portion.
In a training phase of the neural network model, relative flatness information of each subband region of a high frequency portion of a spectrum of a sample wideband signal may be determined by an analysis of variance method based on sample data (sample data includes the sample narrowband signal and a corresponding sample wideband signal).
As an example, if the high frequency portion of the sample wideband signal is divided into two subband regions, a first subband region and a second subband region, respectively, the relative flatness information of the high frequency portion and the low frequency portion of the sample wideband signal may be first relative flatness information of the first subband region and the high frequency band of the low frequency portion of the sample wideband signal, and second relative flatness information of the second subband region and the high frequency band of the low frequency portion of the sample wideband signal.
The specific determination manner of the first relative flatness information and the second relative flatness information may be:
Based on the frequency-domain coefficients $S_{Low,sample}(i, j)$ of the sample narrowband signal in the sample data and the frequency-domain coefficients $S_{High,sample}(i, j)$ of the high-frequency part of the sample wideband signal in the sample data, the following three variances are calculated by equations (16) to (18):
$$\mathrm{var}_L(S_{Low,sample}(i, j)),\quad j = 35, 36, \ldots, 69 \tag{16}$$
$$\mathrm{var}_{H1}(S_{High,sample}(i, j)),\quad j = 70, 71, \ldots, 104 \tag{17}$$
$$\mathrm{var}_{H2}(S_{High,sample}(i, j)),\quad j = 105, 106, \ldots, 139 \tag{18}$$
where equation (16) is the variance of the spectrum of the high band of the low-frequency part of the sample narrowband signal, equation (17) is the variance of the spectrum of the first sub-band region, equation (18) is the variance of the spectrum of the second sub-band region, var() denotes the variance (the variance of a spectrum can be expressed in terms of the corresponding frequency-domain coefficients), and $S_{Low,sample}(i, j)$ represents the frequency-domain coefficients of the sample narrowband signal.
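The three variances of equations (16) to (18) could be computed as in the following Python sketch. The use of the population variance (`statistics.pvariance`) is an assumption, since the text does not say whether the population or sample variance is meant; the function name and toy spectra are hypothetical.

```python
from statistics import pvariance

def band_variances(s_low_sample, s_high_sample):
    """s_low_sample holds points 0..69, s_high_sample holds points 70..139."""
    var_l = pvariance(s_low_sample[35:70])     # equation (16), points 35..69
    var_h1 = pvariance(s_high_sample[0:35])    # equation (17), points 70..104
    var_h2 = pvariance(s_high_sample[35:70])   # equation (18), points 105..139
    return var_l, var_h1, var_h2

low = [(-1.0) ** j for j in range(70)]         # strongly oscillating toy spectrum
high = [0.1 * (-1.0) ** j for j in range(35)] + [0.0] * 35
var_l, var_h1, var_h2 = band_variances(low, high)
```

In this toy case the master band oscillates the most, the first sub-band region oscillates less, and the second is perfectly flat, so the variances come out in decreasing order.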
In this example, in order to remove aliasing in the sample wideband signal, QMF filtering may first be performed, and the sample wideband signal is then determined as follows: an original sample wideband signal, i.e., a signal that has not undergone filtering, is obtained; the QMF filter low-pass filters the original sample wideband signal to obtain a first signal and high-pass filters it to obtain a second signal; the first signal and the second signal are each downsampled with a sampling factor of 2; and the resulting first and second signals are spliced together as the sample wideband signal. The frequency-domain coefficients $S_{Low,sample}(i, j)$ corresponding to 35-69 are frequency-domain coefficients of the low-frequency part of the sample wideband signal; the frequency-domain coefficients $S_{High,sample}(i, j)$ corresponding to 70-104 are frequency-domain coefficients of the low-frequency part (70-79) and the high-frequency part (80-104) of the sample wideband signal; and the frequency-domain coefficients $S_{High,sample}(i, j)$ corresponding to 105-139 are frequency-domain coefficients of the high-frequency part of the sample wideband signal.
The low-frequency domain coefficients of the sample narrowband signal may be the filtered frequency-domain coefficients $S_{Low,sample\_rev}(i, j)$; that is, $S_{Low,sample}(i, j)$ in equations (16) to (18) above is replaced with $S_{Low,sample\_rev}(i, j)$.
Based on the above three variances, relative flatness information of the spectrum of each subband area and the spectrum of the high frequency band of the low frequency part is determined by formula (19) and formula (20):
Where fc (0) represents first relative flatness information of the frequency spectrum of the first subband region and the frequency spectrum of the high frequency band of the low frequency part, and fc (1) represents second relative flatness information of the frequency spectrum of the second subband region and the frequency spectrum of the high frequency band of the low frequency part.
The two values fc(0) and fc(1) can each be classified according to whether they are greater than or equal to 0 (in the embodiments of the present application, 1 denotes greater than or equal to 0 and 0 denotes less than 0). fc(0) and fc(1) are thus defined as a binary array, which has 4 possible combinations: {0, 0}, {0, 1}, {1, 0}, and {1, 1}.
Thus, the relative flatness information output by the model may be 4 probability values for identifying the probability that the relative flatness information belongs to the 4 arrays.
Through the probability maximization principle, one of 4 array permutation combinations can be selected as the relative flatness information of the predicted frequency spectrums of the two subband areas and the frequency spectrum of the high frequency band of the low frequency part. Specifically, it can be expressed by formula (21):
$$v(i, k) = 0 \text{ or } 1,\quad k = 0, 1 \tag{21}$$
where v(i, k) represents the relative flatness information between the spectra of the two sub-band regions and the spectrum of the high band of the low-frequency part, and k is the index of the sub-band region: k = 0 denotes the first sub-band region and k = 1 denotes the second sub-band region. Each sub-band region has its own relative flatness information. For example, when k = 0, v(i, k) = 0 indicates that the first sub-band region oscillates more than the low-frequency part, i.e., its flatness is relatively poor, and v(i, k) = 1 indicates that the first sub-band region is flatter than the low-frequency part, i.e., its flatness is relatively good.
Step S5, generating the target high-frequency spectrum:
As described above, the first low-frequency spectrum (frequency points 35-69, 35 points in total) is copied twice to generate the initial high-frequency spectrum (70 frequency points in total), and the predicted relative flatness information of the high-frequency part of the target wideband spectrum is obtained from the trained neural network model based on the initial low-frequency domain coefficients of the second narrowband signal or on the filtered low-frequency domain coefficients. Since the frequency-domain coefficients of the first low-frequency spectrum corresponding to 35-69 are selected in this example, the trained neural network model predicts the relative flatness information of at least two sub-band regions of the high-frequency part of the target wideband spectrum; that is, the high-frequency part of the target wideband spectrum is divided into at least two sub-band regions. Taking 2 sub-band regions as the example here, the output of the neural network model is the relative flatness information for those 2 sub-band regions.
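The selection of the relative flatness information from the model's 4 probability values, by the probability maximization principle mentioned earlier, might look like the following Python sketch. The order in which the 4 combinations appear in the model output is an assumption, as is the function name.

```python
# ASSUMED ordering of the 4 classes in the model output: (v(i,0), v(i,1)).
COMBINATIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def decode_flatness(probabilities):
    """Pick the most probable of the 4 flatness combinations."""
    assert len(probabilities) == len(COMBINATIONS)
    best = max(range(len(COMBINATIONS)), key=lambda idx: probabilities[idx])
    return COMBINATIONS[best]

v0, v1 = decode_flatness([0.1, 0.2, 0.6, 0.1])   # selects the pair (1, 0)
```

Here `v0` applies to the first sub-band region (frequency points 70-104) and `v1` to the second (frequency points 105-139).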
The reconstructed initial high-frequency spectrum is adjusted according to the predicted relative flatness information of the 2 sub-band regions. Taking the first sub-band region as an example, the procedure is as follows:
(1) Parse v(i, k): a value of 1 indicates that the high-frequency part is very flat, and a value of 0 indicates that the high-frequency part oscillates.
(2) The 35 frequency points in the first sub-band region are divided into 7 sub-bands. The high-frequency spectrum envelope includes 14 first sub-spectrum envelopes and the low-frequency spectrum envelope includes 14 second sub-spectrum envelopes, so each sub-band corresponds to one first sub-spectrum envelope. The average energy pow_env of each sub-band (the spectral energy corresponding to the second sub-spectrum envelope) is calculated, and then the average value Mpow_env of the 7 average energies (the spectral energy of the sub-band region corresponding to the second sub-spectrum envelopes) is calculated. For example, the square of the absolute value of each low-frequency spectral coefficient is taken as its energy; since one sub-band corresponds to 5 low-frequency spectral coefficients, the mean of those 5 energies can be used as the average energy of the sub-band.
(3) Calculate the gain adjustment value of each first sub-spectrum envelope based on the parsed relative flatness information, the average energy pow_env, and the average value Mpow_env corresponding to the first sub-band region, specifically:
When v(i, k) = 1, g(j) = a1 + b1*SQRT(Mpow_env/pow_env(j)), j = 0, 1, ..., 6;
When v(i, k) = 0, g(j) = a0 + b0*SQRT(Mpow_env/pow_env(j)), j = 0, 1, ..., 6;
where, in this example, a1 = 0.875, b1 = 0.125, a0 = 0.925, b0 = 0.075, and g(j) is the gain adjustment value.
Where, for the case where v (i, k) is 0, the gain adjustment value is 1, i.e., there is no need to perform a flattening operation (adjustment) on the high-frequency spectrum envelope.
(4) The high-frequency spectrum envelope $e_{High}(i, k)$ can be determined in the manner described above. Each first sub-spectrum envelope is then adjusted with its corresponding gain adjustment value. These operations reduce the differences in average energy between sub-bands and flatten the spectrum of the first sub-band region to varying degrees.
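The gain adjustment values of steps (2) and (3) can be sketched in Python as follows, using the two formulas with the constants a1, b1, a0, b0 given in this example. The function name, the `eps` guard against division by zero, and the toy inputs are assumptions added for illustration.

```python
import math

# (a, b) pairs from this example: v = 1 -> (a1, b1), v = 0 -> (a0, b0)
COEFFS = {1: (0.875, 0.125), 0: (0.925, 0.075)}

def gain_adjustments(region_coeffs, v, band=5, eps=1e-12):
    """Return one gain adjustment value per sub-band of a sub-band region."""
    n_bands = len(region_coeffs) // band
    # Average energy of each sub-band: mean of the squared magnitudes.
    pow_env = [
        sum(c * c for c in region_coeffs[j * band:(j + 1) * band]) / band
        for j in range(n_bands)
    ]
    mpow_env = sum(pow_env) / n_bands          # mean of the sub-band energies
    a, b = COEFFS[v]
    # eps is an assumed safeguard for empty sub-bands.
    return [a + b * math.sqrt(mpow_env / max(p, eps)) for p in pow_env]

flat = gain_adjustments([1.0] * 35, v=1)                    # already flat region
uneven = gain_adjustments([2.0] * 5 + [1.0] * 30, v=1)      # one loud sub-band
```

For a region whose sub-bands all have the same energy, every gain is a + b = 1, so nothing changes; a louder-than-average sub-band gets a gain below 1 and a quieter one a gain above 1, which is the flattening effect described in step (4).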
It can be understood that the high-frequency spectrum envelope corresponding to the second sub-band region may be adjusted in the same manner, which is not repeated here. If the high-frequency spectrum envelope comprises 14 sub-bands in total, 14 gain adjustment values can be determined correspondingly, and the corresponding sub-spectrum envelopes are adjusted based on these 14 gain adjustment values.
Further, based on the adjusted high-frequency spectrum envelope, the difference between the adjusted high-frequency spectrum envelope and the adjusted low-frequency spectrum envelope is determined, and the initial high-frequency spectrum is adjusted based on this difference to obtain the adjusted high-frequency spectrum $S_{High}(i, j)$. At this point, the processing of steps S2 to S5 above corresponds to the band extension shown in Fig. 4.
Step S6, high-frequency post-filtering: after the adjusted high-frequency spectrum is obtained, high-frequency post-filtering is applied to it. That is, the initial high-frequency domain coefficients corresponding to the adjusted high-frequency spectrum $S_{High}(i, j)$ are filtered, and the filtered coefficients are recorded as the target high-frequency domain coefficients. The initial high-frequency domain coefficients are filtered with a filter gain determined from those same coefficients, as shown in formula (25):
$$S_{High\_rev}(i, j) = G_{post\_filt}(j) \cdot S_{High}(i, j) \tag{25}$$
where $G_{post\_filt}(j)$ is the filter gain calculated from the high-frequency domain coefficients, $S_{High}(i, j)$ is the initial high-frequency domain coefficient, and $S_{High\_rev}(i, j)$ is the high-frequency domain coefficient obtained by filtering.
The specific procedure of the high-frequency post-filtering is similar to that of the low-frequency pre-filtering, as follows:
In this example, every 5 initial high-frequency domain coefficients in the same sub-band share one filter gain. The filter gain is calculated as follows:
(1) The initial high-frequency domain coefficients are banded, e.g., every 5 adjacent initial high-frequency domain coefficients are combined into one sub-spectrum, which corresponds to 14 sub-bands in this example. The average energy is calculated for each sub-band. In particular, the energy of each frequency point (i.e., each initial high-frequency domain coefficient above) is defined as the sum of the square of the real part and the square of the imaginary part. The energy values of 5 adjacent frequency points are calculated by the following formula (26), and the sum of the energy values of the 5 frequency points is the first spectral energy of the current sub-spectrum:
where $S_{High}(i, j)$ is the initial high-frequency domain coefficient, Pe(k) is the first spectral energy of the k-th sub-spectrum, i is the frame index of the speech frame, j is the intra-frame sample index (j = 0, 1, ..., 69), and k = 0, 1, ..., 13 is the sub-band index.
(2) Calculating a first spectral energy of the current sub-spectrum based on the inter-frame correlation by at least one of formula (5) and formula (6):
$$Fe(k) = 1.0 + Pe(k) + Pe_{pre}(k) \tag{5}$$
$$Fe\_sm(k) = \frac{Fe(k) + Fe_{pre}(k)}{2} \tag{6}$$
where Fe(k) is the smoothing term of the first spectral energy of the current sub-spectrum, Pe(k) is the first spectral energy of the current sub-spectrum of the current speech frame, $Pe_{pre}(k)$ is the second initial spectral energy of the sub-spectrum, in an associated speech frame of the current speech frame, that corresponds to the current sub-spectrum, Fe_sm(k) is the smoothing term of the cumulatively averaged first spectral energy, and $Fe_{pre}(k)$ is the smoothing term of the first spectral energy of the corresponding sub-spectrum in the associated speech frame. The associated speech frame is at least one speech frame that precedes and is adjacent to the current speech frame, so that both the short-term and the long-term correlation between speech frames are fully taken into account.
(3) Calculate the spectral tilt coefficient of the initial spectrum: the frequency band corresponding to the initial spectrum is divided equally into a first sub-band and a second sub-band, and the first sub-band energy of the first sub-band and the second sub-band energy of the second sub-band are calculated respectively, where the calculation formula (11) is as follows:
where e1 is the first sub-band energy of the first sub-band and e2 is the second sub-band energy of the second sub-band.
Next, from e1 and e2, the spectral tilt coefficient of the initial spectrum is determined based on the following logic:
If(e2>=e1):
    T_para=0;
Else:
    T_para=8*f_cont_low*SQRT((e1-e2)/(e1+e2));
    T_para=min(1.0,T_para);
    T_para=T_para/7;
where T_para is the spectral tilt coefficient, SQRT is the square-root operation, f_cont_low = 0.07 is a predetermined filter coefficient, and 7 is half of the total number of sub-spectra.
(4) The second filter gain for each sub-spectrum is calculated according to the following equation (13):
$$gain_{f0}(k) = Fe(k)^{f\_cont\_low} \tag{13}$$
where $gain_{f0}(k)$ is the second filter gain of the k-th sub-spectrum, f_cont_low = 0.07 is a predetermined filter coefficient, Fe(k) is the smoothing term of the first spectral energy of the k-th sub-spectrum calculated according to equation (5), and k = 0, 1, ..., 13.
Then, if the spectral tilt coefficient T_para is positive, the second filter gain $gain_{f0}(k)$ needs to be further adjusted according to the following equation (14):
If(T_para>0):
$$gain_{f1}(k) = gain_{f0}(k) \cdot (1 + k \cdot T\_para) \tag{14}$$
where $gain_{f1}(k)$ is the second filter gain adjusted according to the spectral tilt coefficient T_para.
(5) Obtaining a filter gain value of the high-frequency post-filtering according to the following formula (15):
$$G_{post\_filt}(k) = \frac{1 + gain_{f1}(k)}{2} \tag{15}$$
where $gain_{f1}(k)$ is the second filter gain of equation (13) after adjustment, and $G_{post\_filt}(k)$ is the filter gain (i.e., the second filter gain) finally obtained from $gain_{f1}(k)$ for the 5 high-frequency domain coefficients of the k-th sub-spectrum.
Specifically, once the second filter gain $G_{post\_filt}(k)$ of the k-th sub-spectrum has been determined, the first filter gain $G_{post\_filt}(j)$ can be obtained, because the first filter gain consists of a second number (e.g., L = 14) of second filter gains $G_{post\_filt}(k)$, and each second filter gain $G_{post\_filt}(k)$ is the filter gain of the N spectral coefficients of the k-th sub-spectrum.
Therefore, the initial high-frequency domain coefficient obtained through the frequency band expansion can be subjected to filtering processing to filter quantization noise in the initial low-frequency spectrum, so that a target high-frequency spectrum is obtained, and the quantization noise is prevented from being expanded into a broadband signal in the subsequent processing process based on the target high-frequency spectrum.
Step S7, generating frequency domain coefficients:
The sampling rate of the second narrowband signal is 8000 Hz and its effective bandwidth is 3500 Hz; the sampling rate of the wideband signal to be obtained by extension is 16000 Hz and its effective bandwidth is 7000 Hz. When the frame length of the second narrowband signal is 10 ms, 70 low-frequency domain coefficients can be determined, and, as described in step S6, 70 effective high-frequency domain coefficients (MDCT coefficients) $S_{High\_rev}(i, j)$ are generated by post-filtering. Among them, MDCT coefficients 10-69 are output separately as the coefficients of the QMF high-pass part, $S_{High,QMF}(i, j)$ (the frequency-domain coefficients corresponding to the target high-frequency spectrum).
Low-frequency splicing: the effective low-frequency domain coefficients after the time-frequency transform (either the pre-filtered frequency-domain coefficients $S_{Low\_rev}(i, j)$ or the frequency-domain coefficients $S_{Low}(i, j)$ without pre-filtering) are MDCT coefficients 0-69. Because MDCT coefficients 0-9 among the above 70 high-frequency domain coefficients, $S_{Low\_tran}(i, j)$, correspond to 7000-, $S_{Low\_tran}(i, j)$ and MDCT coefficients 0-69, $S_{Low\_rev}(i, j)$ or $S_{Low}(i, j)$, are combined to produce $S_{Low,QMF}(i, j)$, which is output separately as the coefficients of the QMF low-pass part, i.e., the frequency-domain coefficients corresponding to the target low-frequency spectrum.
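The index bookkeeping of this step can be sketched in Python as follows. The order of concatenation (low-band coefficients first, then the 10 transition coefficients) and the function name are assumptions; the text only states that the two sets are combined.

```python
def split_for_qmf(s_low, s_high_rev):
    """Build the QMF low-pass and high-pass coefficient sets.

    s_low:      70 low-band MDCT coefficients (pre-filtered or not)
    s_high_rev: 70 post-filtered high-band MDCT coefficients
    """
    assert len(s_low) == 70 and len(s_high_rev) == 70
    s_low_tran = list(s_high_rev[:10])        # coefficients 0..9 of the high band
    s_low_qmf = list(s_low) + s_low_tran      # ASSUMED order: low band, then transition
    s_high_qmf = list(s_high_rev[10:])        # coefficients 10..69
    return s_low_qmf, s_high_qmf

low = [1.0] * 70
high = [float(j) for j in range(70)]
s_low_qmf, s_high_qmf = split_for_qmf(low, high)
```

With these inputs the low-pass set has 80 coefficients, matching the 80-point MDCT frame of the 8000 Hz band, and the high-pass set has the remaining 60.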
Step S8, frequency-time conversion:
And obtaining a broadband signal after the frequency band is expanded based on the low-frequency spectrum and the high-frequency spectrum.
Specifically, the pre-filtered low-frequency domain coefficients $S_{Low,QMF}(i, j)$ and the post-filtered high-frequency domain coefficients $S_{High,QMF}(i, j)$ are each subjected to an inverse MDCT (frequency-time transform) to generate the QMF low-pass time-domain representation $s_{Low,QMF}(i, j)$ (first time-domain signal) and the high-pass time-domain representation $s_{High,QMF}(i, j)$ (second time-domain signal).
Step S9, QMF synthesis filtering:
According to the QMF principle, since the QMF filter is a two-channel QMF filter, $s_{Low,QMF}(i, j)$ and $s_{High,QMF}(i, j)$ are each upsampled with a sampling factor of 2 and then passed through the synthesis filter bank based on $G_{Low}(z)$ and $G_{High}(z)$ described above to complete the QMF synthesis filtering, yielding the wideband signal $s_{Rec}(i, j)$, whose effective bandwidth has been extended to 7000 Hz.
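To make the two-channel analysis and synthesis structure concrete, the following Python sketch uses the 2-tap Haar filter pair in polyphase form. This is a stand-in chosen because it reconstructs the input exactly in a few lines; it is not the $G_{Low}(z)$ and $G_{High}(z)$ of this scheme, whose coefficients are not given in this section, and the function names are hypothetical.

```python
import math

R2 = math.sqrt(2.0)

def qmf_analysis(x):
    """Split x (even length) into low and high bands, each downsampled by 2."""
    half = len(x) // 2
    low = [(x[2 * n] + x[2 * n + 1]) / R2 for n in range(half)]
    high = [(x[2 * n] - x[2 * n + 1]) / R2 for n in range(half)]
    return low, high

def qmf_synthesis(low, high):
    """Upsample both bands by 2 and recombine them into one signal."""
    out = []
    for lo_s, hi_s in zip(low, high):
        out.append((lo_s + hi_s) / R2)   # even-indexed output sample
        out.append((lo_s - hi_s) / R2)   # odd-indexed output sample
    return out

# One 10 ms frame at 16000 Hz is 160 samples.
wide = [math.sin(0.3 * t) + 0.2 * math.sin(2.5 * t) for t in range(160)]
s_low_qmf, s_high_qmf = qmf_analysis(wide)      # two 80-sample bands
s_rec = qmf_synthesis(s_low_qmf, s_high_qmf)    # 160 samples again
```

Each band carries half the samples of the full-rate signal, which is why the 16000 Hz wideband frame splits into two 8000 Hz frames of 80 samples. Longer QMF filters give much sharper band separation than the Haar pair at the cost of delay.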
In a voice communication scenario in which the PSTN and VoIP interoperate, the VoIP side can only receive narrowband speech from the PSTN (sampling rate 8 kHz, effective bandwidth typically 3.5 kHz). Users perceive the sound as not bright enough, not loud enough, and only moderately intelligible. With the band extension of the disclosed technical scheme, the effective bandwidth can be extended to 7 kHz at the receiving end on the VoIP side without any extra bits. Users then perceive a brighter timbre, greater volume, and better intelligibility. In addition, the scheme raises no forward-compatibility problem: the protocol does not need to be modified, and the scheme is fully compatible with the PSTN.
In the embodiments of the present invention, the method may be applied on the downstream side of the PSTN-VoIP path. For example, the functional module of the scheme provided by the embodiments of the present invention may be integrated into a client equipped with a conference system, so that band extension of the narrowband signal is performed at the client to obtain a wideband signal. Specifically, the signal processing in this scenario is a signal post-processing technique. Taking the PSTN (whose coding scheme may be ITU-T G.711) as an example, inside the conference system client the speech frame is recovered once G.711 decoding is complete; the post-processing technique of the present invention is then applied to the speech frame, so that the VoIP user receives a wideband signal even though the sending end sends a narrowband signal.
The method of the embodiments of the present invention may also be applied in a mixing server of the PSTN-VoIP path. After the mixing server performs band extension, the band-extended wideband signal is sent to the VoIP client; after receiving the VoIP code stream corresponding to the wideband signal, the VoIP client recovers the band-extended wideband speech by decoding the VoIP code stream. One typical function of the mixing server is transcoding, for example transcoding the code stream of the PSTN link (e.g., G.711-coded) into a code stream commonly used in VoIP (e.g., OPUS or SILK). In the mixing server, the G.711-decoded speech frame may be upsampled to 16000 Hz, after which the scheme provided by the embodiments of the present invention completes the band extension; the result is then transcoded into a code stream commonly used in VoIP. On receiving one or more VoIP code streams, the VoIP client recovers the band-extended wideband speech by decoding.
Based on the same principle as the method shown in fig. 1, the embodiment of the present invention further provides a band extending apparatus 20, as shown in fig. 6, the band extending apparatus 20 may include a second narrowband signal determining module 210, a low-frequency spectrum determining module 220, a high-frequency spectrum determining module 230 and a wideband signal determining module 240, wherein,
a second narrowband signal determining module 210, configured to perform low-pass filtering on the first narrowband signal to be processed to obtain a second narrowband signal;
a low-frequency spectrum determination module 220, configured to determine a low-frequency spectrum of the second narrowband signal;
a high-frequency spectrum determination module 230, configured to obtain a target high-frequency spectrum based on the low-frequency spectrum;
and a wideband signal determining module 240, configured to obtain a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum.
With the scheme in this embodiment, the first narrowband signal is low-pass filtered to eliminate the aliasing signal it contains, so that the second narrowband signal contains no aliasing signal. A wideband signal obtained based on the low-frequency spectrum of the second narrowband signal is therefore not affected by the aliasing signal, and its quality is better. Consequently, the band extension scheme of the embodiment of the invention can produce a signal with a more resonant timbre and greater loudness, giving the user a better listening experience.
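The four modules of the apparatus can be read as a simple processing chain. The following is a minimal sketch of that chain only, not an implementation of the embodiment: the class name `BandExtender`, its field names, and the use of plain Python lists for signals and spectra are assumptions made for illustration, and each stage is supplied by the caller as a function.

```python
from dataclasses import dataclass
from typing import Callable, List

Signal = List[float]
Spectrum = List[float]


@dataclass
class BandExtender:
    """Hypothetical wiring of modules 210 to 240 as caller-supplied stages."""
    lowpass: Callable[[Signal], Signal]                 # role of module 210
    low_spectrum: Callable[[Signal], Spectrum]          # role of module 220
    high_spectrum: Callable[[Spectrum], Spectrum]       # role of module 230
    synthesize: Callable[[Spectrum, Spectrum], Signal]  # role of module 240

    def extend(self, first_narrowband: Signal) -> Signal:
        # First narrowband signal -> second narrowband signal.
        second_narrowband = self.lowpass(first_narrowband)
        # Second narrowband signal -> low-frequency spectrum.
        low = self.low_spectrum(second_narrowband)
        # Low-frequency spectrum -> target high-frequency spectrum.
        high = self.high_spectrum(low)
        # Both spectra -> band-extended wideband signal.
        return self.synthesize(low, high)
```

Keeping each stage as a separate callable mirrors the module split in the text, so any one stage can be replaced without touching the others.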
Optionally, when the high-frequency spectrum determining module 230 obtains the target high-frequency spectrum based on the low-frequency spectrum, it is specifically configured to:
inputting the low-frequency spectrum into a neural network model, and obtaining a correlation parameter based on the output of the neural network model, wherein the correlation parameter represents the correlation between the high-frequency part and the low-frequency part of the target broadband spectrum, and the correlation parameter comprises a high-frequency spectrum envelope;
And obtaining a target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum.
Optionally, when the high-frequency spectrum determining module 230 inputs the low-frequency spectrum to the neural network model, it is specifically configured to:
determining a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
the low frequency spectrum and the low frequency spectrum envelope are input to a neural network model.
Optionally, the apparatus further comprises:
a low-frequency spectrum processing module, configured to divide the low-frequency spectrum into a first number of sub-spectra and to determine a sub-spectral envelope for each sub-spectrum, wherein the low-frequency spectral envelope comprises the first number of determined sub-spectral envelopes.
Optionally, when determining the sub-spectrum envelope corresponding to each sub-spectrum, the low-frequency spectrum processing module is specifically configured to:
and obtaining a corresponding sub-spectrum envelope of each sub-spectrum based on the logarithm value of the spectrum coefficient included in each sub-spectrum.
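The sub-spectrum division and the logarithm-based envelope can be sketched as follows. This is an illustration under stated assumptions: the passage only says that the envelope is obtained from the logarithm values of the spectral coefficients, so taking the mean of the natural-log magnitudes, using equal-width sub-spectra, and adding a small `eps` to avoid `log(0)` are all choices made here, and the function name is hypothetical.

```python
import math


def sub_spectral_envelopes(spectrum, num_sub, eps=1e-9):
    """Split `spectrum` into `num_sub` equal-width sub-spectra and return one
    log-domain envelope value (mean natural-log magnitude) per sub-spectrum."""
    if num_sub <= 0 or len(spectrum) % num_sub != 0:
        raise ValueError("spectrum length must be a positive multiple of num_sub")
    width = len(spectrum) // num_sub
    envelopes = []
    for k in range(num_sub):
        band = spectrum[k * width:(k + 1) * width]
        # Assumed envelope definition: average of log magnitudes in the band.
        envelopes.append(sum(math.log(abs(c) + eps) for c in band) / width)
    return envelopes
```

For example, a 70-coefficient low-frequency spectrum divided into 14 sub-spectra would give 14 envelope values of 5 coefficients each; those particular numbers are only an example of the shape of the result.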
Optionally, when performing low-pass filtering on the first narrowband signal to be processed to obtain the second narrowband signal, the second narrowband signal determining module 210 is specifically configured to:
Performing up-sampling processing on the first narrow-band signal with a sampling factor of a first preset value to obtain an up-sampled signal;
performing low-pass filtering on the up-sampling signal through a filter to obtain a filtering signal;
and performing down-sampling processing on the filtered signal with the sampling factor being a second preset value to obtain a second narrow-band signal, wherein the second preset value is determined based on the number of filtering channels of the filter.
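The up-sample, filter, down-sample sequence can be sketched in plain Python. The sketch swaps in a generic Hamming-windowed sinc FIR low-pass filter, because this passage does not specify the filter; the sampling factors of 2, the 31 taps, and the cutoff of 0.2 (as a fraction of the up-sampled rate) are likewise assumptions, as are all function names.

```python
import math


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter with unity DC gain.
    `cutoff` is a fraction of the sampling rate, between 0 and 0.5."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Causal FIR filtering; the output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if 0 <= i - j < len(signal):
                acc += t * signal[i - j]
        out.append(acc)
    return out


def resample_lowpass(signal, up=2, down=2, num_taps=31, cutoff=0.2):
    # Up-sample by zero insertion; scaling by `up` keeps the passband level.
    upsampled = []
    for s in signal:
        upsampled.append(s * up)
        upsampled.extend([0.0] * (up - 1))
    # Low-pass filter at the higher rate, then keep every `down`-th sample.
    filtered = fir_filter(upsampled, lowpass_fir(num_taps, cutoff))
    return filtered[::down]
```

Filtering at the higher rate lets the cutoff sit below the original Nyquist frequency with a gentler transition band; with `up == down` the output returns to the input sampling rate.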
Optionally, when determining the low-frequency spectrum of the second narrowband signal, the low-frequency spectrum determining module 220 is specifically configured to:
And carrying out discrete cosine transform processing on the second narrowband signal to obtain a low-frequency spectrum of the second narrowband signal.
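As an illustration of the time-frequency step, the following sketch computes an orthonormal DCT-II and its inverse directly from the definition. The passage says only "discrete cosine transform", so the DCT-II variant, the orthonormal scaling, and the absence of windowing or frame overlap are assumptions; the O(n^2) loops are for clarity rather than speed.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of `dct2` (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out
```

Because the transform is orthonormal, `idct2(dct2(x))` reproduces `x` up to rounding, which is the property a frequency-time conversion back to the time domain relies on.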
Optionally, at least one of the low-frequency spectrum or the target high-frequency spectrum is obtained based on a corresponding initial spectrum that has been filtered.
Optionally, the apparatus further comprises:
a first filtering module, configured to filter the initial spectrum by:
determining a first filtering gain based on the spectral energy of the initial spectrum, and filtering the initial spectrum according to the first filtering gain.
Optionally, when determining the first filtering gain based on the spectral energy of the initial spectrum and filtering the initial spectrum according to the first filtering gain, the first filtering module is specifically configured to:
dividing the initial frequency spectrum into a first set number of sub-frequency spectrums, and determining first frequency spectrum energy corresponding to each sub-frequency spectrum;
Determining a second filtering gain corresponding to each sub-spectrum based on the first spectrum energy corresponding to each sub-spectrum, wherein the first filtering gain comprises a first set number of second filtering gains;
and performing filtering processing on the corresponding sub-spectrums based on the second filtering gain corresponding to each sub-spectrum.
Optionally, when determining the second filtering gain corresponding to each sub-spectrum based on the first spectral energy corresponding to each sub-spectrum, the first filtering module is specifically configured to:
dividing a frequency band corresponding to the initial frequency spectrum into a first sub-band and a second sub-band;
Determining first sub-band energy of the first sub-band according to the first spectrum energy of all sub-spectra corresponding to the first sub-band, and determining second sub-band energy of the second sub-band according to the first spectrum energy of all sub-spectra corresponding to the second sub-band;
determining a spectrum tilt coefficient of the initial spectrum according to the first sub-band energy and the second sub-band energy;
and determining a second filtering gain corresponding to each sub-spectrum according to the spectrum inclination coefficient and the first spectrum energy corresponding to each sub-spectrum.
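The energy and tilt computations above can be sketched as follows. Several choices are assumptions, since this passage gives no formulas: the sub-spectrum energy is taken as a sum of squared coefficients, the two sub-bands are the lower and upper halves of the sub-spectra, and the tilt coefficient is the base-10 log of their energy ratio. The mapping from tilt and energy to the second filtering gains is not specified here either, so the sketch deliberately takes the gains as an input and only shows how they are applied per sub-spectrum. All names are hypothetical.

```python
import math


def sub_spectrum_energies(spectrum, num_sub):
    """Energy (sum of squares) of each of `num_sub` equal-width sub-spectra."""
    if num_sub <= 0 or len(spectrum) % num_sub != 0:
        raise ValueError("spectrum length must be a positive multiple of num_sub")
    width = len(spectrum) // num_sub
    return [
        sum(c * c for c in spectrum[k * width:(k + 1) * width])
        for k in range(num_sub)
    ]


def tilt_coefficient(energies, eps=1e-12):
    """Assumed tilt measure: log10 ratio of first to second sub-band energy."""
    half = len(energies) // 2
    first_subband = sum(energies[:half])
    second_subband = sum(energies[half:])
    return math.log10((first_subband + eps) / (second_subband + eps))


def apply_gains(spectrum, gains):
    """Scale every coefficient by the gain of the sub-spectrum it belongs to."""
    if not gains or len(spectrum) % len(gains) != 0:
        raise ValueError("spectrum length must be a positive multiple of len(gains)")
    width = len(spectrum) // len(gains)
    return [c * gains[i // width] for i, c in enumerate(spectrum)]
```

A positive tilt under this definition means the energy is concentrated in the first sub-band; a gain rule would then use that value together with each sub-spectrum's energy.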
Optionally, when the first narrowband signal is a speech signal of a current speech frame and the first spectrum energy of one sub-spectrum is determined, the first filtering module is specifically configured to:
determining a first initial spectral energy of a sub-spectrum;
if the current speech frame is the first speech frame, the first spectral energy is the first initial spectral energy;
if the current speech frame is not the first speech frame, acquiring a second initial spectral energy of the corresponding sub-spectrum of an associated speech frame, wherein the associated speech frame is at least one speech frame that precedes and is adjacent to the current speech frame;
a first spectral energy of a sub-spectrum is derived based on the first initial spectral energy and the second initial spectral energy.
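The frame-dependent energy rule can be sketched as a small stateful helper. The passage does not say how the two initial energies are combined, so the weighted average with `weight=0.5` is an assumption, as is using only the single immediately preceding frame; the class name is hypothetical.

```python
class EnergySmoother:
    """Tracks per-sub-spectrum initial energies from the preceding frame."""

    def __init__(self, weight=0.5):
        self.weight = weight
        self.previous = None  # initial energies of the preceding frame

    def update(self, initial_energies):
        if self.previous is None:
            # First speech frame: use the initial energies unchanged.
            smoothed = list(initial_energies)
        else:
            # Later frames: combine current and preceding initial energies.
            smoothed = [
                self.weight * cur + (1.0 - self.weight) * prev
                for cur, prev in zip(initial_energies, self.previous)
            ]
        # Store the unsmoothed values, since the rule refers to the
        # preceding frame's initial energy.
        self.previous = list(initial_energies)
        return smoothed
```

Calling `update` once per frame, in order, yields the energies used for gain computation while limiting abrupt frame-to-frame changes.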
Optionally, when the high-frequency spectrum determining module 230 obtains the target high-frequency spectrum based on the low-frequency spectrum, it is specifically configured to:
obtaining an initial high-frequency spectrum based on the low-frequency spectrum;
Obtaining a target high-frequency spectrum based on the high-frequency part of the initial high-frequency spectrum;
when obtaining the wideband signal after the band expansion based on the low frequency spectrum and the target high frequency spectrum, the wideband signal determining module 240 is specifically configured to:
determining a target low-frequency spectrum according to the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum;
And obtaining a broadband signal with the expanded frequency band according to the target low-frequency spectrum and the target high-frequency spectrum.
Optionally, when the wideband signal determining module 240 obtains the band-extended wideband signal based on the target low-frequency spectrum and the target high-frequency spectrum, it is specifically configured to:
performing frequency-time conversion on the target low-frequency spectrum to obtain a first time domain signal;
Performing frequency-time conversion on the target high-frequency spectrum to obtain a second time domain signal;
a wideband signal is generated based on the first time domain signal and the second time domain signal.
Optionally, when the high-frequency spectrum determining module 230 obtains the target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum, it is specifically configured to:
Determining a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
Generating an initial high frequency spectrum based on the low frequency spectrum;
And adjusting the initial high-frequency spectrum based on the high-frequency spectrum envelope and the low-frequency spectrum envelope to obtain a target high-frequency spectrum.
Optionally, when generating the initial high-frequency spectrum based on the low-frequency spectrum, the high-frequency spectrum determining module 230 is specifically configured to: copy the spectrum of the high-band part of the low-frequency spectrum.
Optionally, the high-frequency spectral envelope and the low-frequency spectral envelope are both logarithmic-domain spectral envelopes, and when adjusting the initial high-frequency spectrum based on the high-frequency spectral envelope and the low-frequency spectral envelope to obtain the target high-frequency spectrum, the high-frequency spectrum determining module 230 is specifically configured to:
Determining a difference between the high frequency spectral envelope and the low frequency spectral envelope;
and adjusting the initial high-frequency spectrum based on the difference value to obtain a target high-frequency spectrum.
Optionally, the high-frequency spectral envelope includes a second number of first sub-spectral envelopes, and the initial high-frequency spectrum includes a second number of sub-spectra, wherein each first sub-spectral envelope is determined based on a corresponding sub-spectrum in the initial high-frequency spectrum;
The high-frequency spectrum determining module 230 is specifically configured to, when determining a difference between the high-frequency spectrum envelope and the low-frequency spectrum envelope, and adjusting the initial high-frequency spectrum based on the difference to obtain a target high-frequency spectrum:
Determining a difference value of each first sub-spectral envelope and a corresponding spectral envelope of the low-frequency spectral envelopes;
adjusting the corresponding initial sub-spectrum based on the difference corresponding to each first sub-spectrum envelope to obtain a second number of adjusted sub-spectrums;
and obtaining a target high-frequency spectrum based on the second number of adjusted sub-spectrums.
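The copy-then-adjust procedure can be sketched as follows, assuming the log-domain envelope of a sub-spectrum is the mean natural-log magnitude of its coefficients. Under that assumption a log-domain difference `d` corresponds to multiplying the coefficients by `exp(d)`. The parameter `copy_start` (the index from which the high-band part of the low-frequency spectrum is copied) and the function names are hypothetical, and `high_envelopes` stands in for the envelope values predicted by the neural network model.

```python
import math


def log_envelope(band, eps=1e-9):
    """Assumed envelope: mean natural-log magnitude of the coefficients."""
    return sum(math.log(abs(c) + eps) for c in band) / len(band)


def extend_high_band(low_spectrum, high_envelopes, copy_start):
    """Copy the upper part of `low_spectrum` as the initial high-frequency
    spectrum, then scale each sub-spectrum by the log-envelope difference."""
    initial_high = list(low_spectrum[copy_start:])
    num_sub = len(high_envelopes)
    if num_sub == 0 or not initial_high or len(initial_high) % num_sub != 0:
        raise ValueError("copied spectrum must split evenly into sub-spectra")
    width = len(initial_high) // num_sub
    target = []
    for k, env_high in enumerate(high_envelopes):
        band = initial_high[k * width:(k + 1) * width]
        # Difference between the desired high-frequency envelope and the
        # envelope of the copied low-frequency sub-spectrum (log domain).
        diff = env_high - log_envelope(band)
        gain = math.exp(diff)
        target.extend(c * gain for c in band)
    return target
```

After scaling, each adjusted sub-spectrum has the requested log envelope while keeping the fine structure of the copied coefficients.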
Optionally, the correlation parameter further includes relative flatness information, where the relative flatness information represents a correlation between the spectral flatness of the high-frequency portion and the spectral flatness of the low-frequency portion of the target broadband spectrum;
When determining the difference between the high-frequency spectrum envelope and the low-frequency spectrum envelope, the high-frequency spectrum determining module 230 is specifically configured to:
determining a gain adjustment value of the high frequency spectrum envelope based on the relative flatness information and the spectral energy of the low frequency spectrum;
adjusting the high-frequency spectrum envelope based on the gain adjustment value to obtain the adjusted high-frequency spectrum envelope;
A difference between the adjusted high frequency spectral envelope and the low frequency spectral envelope is determined.
Optionally, the relative flatness information includes relative flatness information of at least two subband regions corresponding to the high frequency part, and the relative flatness information corresponding to one subband region represents a correlation between a spectral flatness of one subband region of the high frequency part and a spectral flatness of a high frequency band of the low frequency part;
The high-frequency spectrum determination module 230 is specifically configured to, when determining the gain adjustment value of the high-frequency spectrum envelope based on the relative flatness information and the spectral energy of the low-frequency spectrum:
determining a gain adjustment value of a spectrum envelope part corresponding to each sub-band region in a high-frequency spectrum envelope based on the relative flatness information corresponding to each sub-band region and the spectrum energy corresponding to each sub-band region in the low-frequency spectrum;
The high-frequency spectrum determining module 230 is specifically configured to, when adjusting the high-frequency spectrum envelope based on the gain adjustment value: and adjusting the corresponding spectrum envelope part in the high-frequency spectrum envelope based on the gain adjustment value of the spectrum envelope part corresponding to each sub-band region.
Optionally, if the high-frequency spectral envelope includes the second number of first sub-spectral envelopes, when determining the gain adjustment value of the spectral envelope portion corresponding to each sub-band region in the high-frequency spectral envelope based on the relative flatness information corresponding to each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum, the high-frequency spectral determining module 230 is specifically configured to:
For each first sub-spectrum envelope, determining a gain adjustment value of the first sub-spectrum envelope according to the spectrum energy corresponding to the spectrum envelope corresponding to the first sub-spectrum envelope in the low-frequency spectrum envelope, the relative flatness information corresponding to the corresponding sub-band region, and the spectrum energy corresponding to the corresponding sub-band region;
When the high-frequency spectrum determining module 230 adjusts the corresponding spectral envelope portion in the high-frequency spectral envelope based on the gain adjustment value of the spectral envelope portion corresponding to each subband region, it is specifically configured to:
and adjusting the corresponding first sub-spectrum envelope in the high-frequency spectrum envelope according to the gain adjustment value of the first sub-spectrum envelope corresponding to each sub-band region.
Optionally, if the first narrowband signal includes at least two associated signals, the apparatus further includes:
a narrowband signal determining module, configured to fuse the at least two associated signals to obtain the first narrowband signal, or to take each of the at least two associated signals as a first narrowband signal.
Since the band extending apparatus provided in the embodiment of the present invention is an apparatus capable of executing the band extending method in the embodiment of the present invention, those skilled in the art can understand the specific implementation of the apparatus and its variations based on that method; how the apparatus implements the method is therefore not described in detail here. Any band extending apparatus used by those skilled in the art to implement the band extending method in the embodiments of the present invention falls within the scope of protection of the present application.
Based on the same principle as the band extending method and the band extending apparatus provided by the embodiment of the present invention, an embodiment of the present invention also provides an electronic device, which may include a processor and a memory. The memory stores readable instructions which, when loaded and executed by the processor, implement the method shown in any embodiment of the present invention.
As an example, fig. 7 shows a schematic structural diagram of an electronic device 4000 to which the solution of the embodiment of the present application is applied. As shown in fig. 7, the electronic device 4000 may include a processor 4001 and a memory 4003. The processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiment of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.
The memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing application code for executing the scheme of the present application, and execution is controlled by the processor 4001. The processor 4001 is configured to execute the application code stored in the memory 4003 to implement the scheme shown in any one of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment and may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (21)

1. A method of band expansion, comprising:
Carrying out low-pass filtering on the first narrow-band signal to be processed to obtain a second narrow-band signal;
Determining a low frequency spectrum of the second narrowband signal;
Obtaining a target high-frequency spectrum based on the low-frequency spectrum;
and obtaining a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum.
2. The method of claim 1, wherein obtaining a target high frequency spectrum based on the low frequency spectrum comprises:
Inputting the low-frequency spectrum into a neural network model, and obtaining a correlation parameter based on the output of the neural network model, wherein the correlation parameter represents the correlation between a high-frequency part and a low-frequency part of a target broadband spectrum, and the correlation parameter comprises a high-frequency spectrum envelope;
And obtaining the target high-frequency spectrum based on the correlation parameter and the low-frequency spectrum.
3. The method of claim 2, wherein inputting the low frequency spectrum to a neural network model comprises:
determining a low-frequency spectral envelope of the second narrowband signal based on a low-frequency spectrum;
inputting the low frequency spectrum and the low frequency spectrum envelope to a neural network model.
4. The method of claim 3, further comprising:
Dividing the low frequency spectrum into a first number of sub-spectra;
Obtaining a sub-spectrum envelope corresponding to each sub-spectrum based on a logarithmic value of a spectrum coefficient included in each sub-spectrum, wherein the low-frequency spectrum envelope includes the determined first number of sub-spectrum envelopes.
5. The method according to any one of claims 1 to 4, wherein said low-pass filtering the first narrow-band signal to be processed to obtain a second narrow-band signal comprises:
Performing upsampling processing on the first narrow-band signal with a sampling factor of a first preset value to obtain an upsampled signal;
Performing low-pass filtering on the up-sampling signal through a filter to obtain a filtering signal;
And performing down-sampling processing on the filtering signal with a sampling factor of a second preset value to obtain the second narrow-band signal, wherein the second preset value is determined based on the number of filtering channels of the filter.
6. The method of any of claims 1-4, wherein the determining the low frequency spectrum of the second narrowband signal comprises:
And performing discrete cosine transform processing on the second narrowband signal to obtain a low-frequency spectrum of the second narrowband signal.
7. The method according to any of claims 1 to 4, characterized in that at least one of the low frequency spectrum or the target high frequency spectrum is derived based on a corresponding filtered initial spectrum.
8. The method of claim 7, wherein the filtering the initial spectrum comprises:
And determining a first filter gain based on the spectrum energy of the initial spectrum, and carrying out filter processing on the initial spectrum according to the first filter gain.
9. The method according to claim 8, wherein the determining a first filter gain based on the spectral energy of the initial spectrum, and the filtering the initial spectrum according to the first filter gain comprises:
Dividing the initial frequency spectrum into a first set number of sub-frequency spectrums, and determining first frequency spectrum energy corresponding to each sub-frequency spectrum;
determining a second filtering gain corresponding to each sub-spectrum based on the first spectral energy corresponding to each sub-spectrum, wherein the first filtering gain comprises the first set number of second filtering gains;
And performing filtering processing on the corresponding sub-spectrums based on the second filtering gain corresponding to each sub-spectrum.
10. The method of claim 9, wherein determining the second filter gain for each sub-spectrum based on the first spectral energy for each sub-spectrum comprises:
Dividing a frequency band corresponding to the initial frequency spectrum into a first sub-band and a second sub-band;
Determining first sub-band energy of the first sub-band according to first spectrum energy of all sub-spectrums corresponding to the first sub-band, and determining second sub-band energy of the second sub-band according to the first spectrum energy of all sub-spectrums corresponding to the second sub-band;
determining a spectrum tilt coefficient of the initial spectrum according to the first sub-band energy and the second sub-band energy;
and determining a second filter gain corresponding to each sub-spectrum according to the spectrum inclination coefficient and the first spectrum energy corresponding to each sub-spectrum.
11. The method of claim 10, wherein the first narrowband signal is a speech signal of a current speech frame, and wherein determining a first spectral energy of a sub-spectrum comprises:
Determining a first initial spectral energy of said one sub-spectrum;
if the current speech frame is the first speech frame, the first spectral energy is the first initial spectral energy;
if the current speech frame is not the first speech frame, acquiring a second initial spectral energy of the sub-spectrum, corresponding to said one sub-spectrum, of an associated speech frame, wherein the associated speech frame is at least one speech frame that precedes and is adjacent to the current speech frame;
Obtaining a first spectral energy of the sub-spectrum based on the first initial spectral energy and the second initial spectral energy.
12. The method according to any one of claims 1 to 4, wherein the deriving a target high frequency spectrum based on the low frequency spectrum comprises:
Obtaining an initial high-frequency spectrum based on the low-frequency spectrum;
obtaining the target high-frequency spectrum based on the high-frequency part of the initial high-frequency spectrum;
the obtaining of a band-extended wideband signal based on the low-frequency spectrum and the target high-frequency spectrum comprises:
determining a target low-frequency spectrum according to the low-frequency spectrum and the low-frequency part of the initial high-frequency spectrum;
and obtaining the band-extended wideband signal according to the target low-frequency spectrum and the target high-frequency spectrum.
13. The method of claim 12, wherein obtaining a band-extended wideband signal based on the target low-frequency spectrum and the target high-frequency spectrum comprises:
Performing frequency-time conversion on the target low-frequency spectrum to obtain a first time domain signal;
Performing frequency-time conversion on the target high-frequency spectrum to obtain a second time domain signal;
generating the wideband signal based on the first time domain signal and the second time domain signal.
14. The method of claim 2, wherein obtaining a target high frequency spectrum based on the correlation parameter and the low frequency spectrum comprises:
Determining a low-frequency spectral envelope of the second narrowband signal based on the low-frequency spectrum;
Generating an initial high frequency spectrum based on the low frequency spectrum;
Determining a difference between the high-frequency spectral envelope and the low-frequency spectral envelope, wherein the high-frequency spectral envelope and the low-frequency spectral envelope are both logarithmic domain spectral envelopes;
And adjusting the initial high-frequency spectrum based on the difference value to obtain the target high-frequency spectrum.
15. The method of claim 14, wherein the high-frequency spectral envelope comprises a second number of first sub-spectral envelopes, wherein the initial high-frequency spectrum comprises the second number of sub-spectra, and wherein each of the first sub-spectral envelopes is determined based on a corresponding sub-spectrum of the initial high-frequency spectrum;
The determining a difference between the high-frequency spectrum envelope and the low-frequency spectrum envelope, and adjusting the initial high-frequency spectrum based on the difference to obtain the target high-frequency spectrum includes:
Determining a difference value of each first sub-spectral envelope and a corresponding one of the low-frequency spectral envelopes;
Adjusting the corresponding initial sub-spectrum based on the difference corresponding to each first sub-spectrum envelope to obtain the second number of adjusted sub-spectrums;
and obtaining the target high-frequency spectrum based on the second number of adjusted sub-spectrums.
16. The method of claim 15, wherein the correlation parameters further include relative flatness information characterizing a correlation of spectral flatness of a high frequency portion of the target wideband spectrum and spectral flatness of a low frequency portion;
The determining a difference between the high frequency spectral envelope and the low frequency spectral envelope comprises:
determining a gain adjustment value for the high frequency spectral envelope based on the relative flatness information and the energy information of the low frequency spectrum;
adjusting the high-frequency spectrum envelope based on the gain adjustment value to obtain an adjusted high-frequency spectrum envelope;
determining a difference between the adjusted high frequency spectral envelope and the low frequency spectral envelope.
17. The method of claim 16, wherein the relative flatness information includes relative flatness information corresponding to at least two subband regions of the high frequency portion, the relative flatness information corresponding to one subband region characterizing a correlation of spectral flatness of one subband region of the high frequency portion and spectral flatness of a high frequency band of the low frequency portion;
The determining a gain adjustment value for the high frequency spectral envelope based on the relative flatness information and the spectral energy of the low frequency spectrum comprises:
Determining a gain adjustment value of a spectral envelope part corresponding to each sub-band region in the high-frequency spectral envelope based on the relative flatness information corresponding to each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum;
the adjusting the high frequency spectral envelope based on the gain adjustment value comprises:
And adjusting the corresponding spectrum envelope part in the high-frequency spectrum envelope based on the gain adjustment value of the spectrum envelope part corresponding to each sub-band region.
18. The method according to claim 17, wherein, if the high-frequency spectral envelope includes a second number of first sub-spectral envelopes, the determining a gain adjustment value of the spectral envelope part corresponding to each sub-band region in the high-frequency spectral envelope based on the relative flatness information corresponding to each sub-band region and the spectral energy corresponding to each sub-band region in the low-frequency spectrum comprises:
for each first sub-spectral envelope, determining a gain adjustment value of the first sub-spectral envelope according to the spectral energy of the spectral envelope that corresponds to the first sub-spectral envelope in the low-frequency spectral envelope, the relative flatness information corresponding to the corresponding sub-band region, and the spectral energy corresponding to the corresponding sub-band region;
the adjusting the corresponding spectral envelope part in the high-frequency spectral envelope based on the gain adjustment value of the spectral envelope part corresponding to each sub-band region comprises:
adjusting the corresponding first sub-spectral envelope in the high-frequency spectral envelope according to the gain adjustment value of the first sub-spectral envelope corresponding to each sub-band region.
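The per-sub-band adjustment recited in claims 16 to 18 can be illustrated with a minimal Python sketch. The claims name the inputs to the gain computation but do not give a formula, so the rule used here is purely an assumption for illustration: a sub-envelope is attenuated, in proportion to the flatness indicator of its sub-band region, when its low-frequency counterpart exceeds the energy of that region. The function names, the `max_cut_db` parameter, the dB units, and the `band_of` mapping are all hypothetical.

```python
from typing import List, Sequence


def gain_adjustments(low_env_db: Sequence[float],
                     band_of: Sequence[int],
                     flatness: Sequence[float],
                     region_energy_db: Sequence[float],
                     max_cut_db: float = 6.0) -> List[float]:
    """Return one gain adjustment (in dB) per high-frequency sub-envelope.

    Hypothetical rule, not taken from the claims: attenuate sub-envelope k
    when its low-frequency counterpart is above the energy of its sub-band
    region, scaled by that region's relative flatness value.
    """
    gains = []
    for k, low_db in enumerate(low_env_db):
        region = band_of[k]  # sub-band region that sub-envelope k belongs to
        weight = min(max(flatness[region], 0.0), 1.0)  # clamp to [0, 1]
        if low_db > region_energy_db[region]:
            gains.append(-max_cut_db * weight)
        else:
            gains.append(0.0)
    return gains


def adjust_envelope(high_env_db: Sequence[float],
                    gains_db: Sequence[float]) -> List[float]:
    """Apply each gain to its sub-envelope (addition in the dB domain)."""
    return [h + g for h, g in zip(high_env_db, gains_db)]


def envelope_difference(adjusted_high_db: Sequence[float],
                        low_env_db: Sequence[float]) -> List[float]:
    """Difference between the adjusted high and the low envelope."""
    return [h - l for h, l in zip(adjusted_high_db, low_env_db)]


# Example with four sub-envelopes split across two sub-band regions.
low_env = [10.0, 4.0, 8.0, 2.0]
high_env = [6.0, 5.0, 4.0, 3.0]
gains = gain_adjustments(low_env, band_of=[0, 0, 1, 1],
                         flatness=[0.5, 1.0], region_energy_db=[7.0, 5.0])
adjusted = adjust_envelope(high_env, gains)
difference = envelope_difference(adjusted, low_env)
```

With these made-up inputs the gains are `[-3.0, 0.0, -6.0, 0.0]`, so only the first and third sub-envelopes are attenuated. A real implementation would substitute whatever gain rule the description specifies.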
19. A frequency band extending apparatus, comprising:
a second narrowband signal determining module, configured to perform low-pass filtering on a first narrowband signal to be processed to obtain a second narrowband signal;
a low-frequency spectrum determining module, configured to determine a low-frequency spectrum of the second narrowband signal;
a high-frequency spectrum determining module, configured to obtain a target high-frequency spectrum based on the low-frequency spectrum; and
a broadband signal determining module, configured to obtain a band-extended broadband signal based on the low-frequency spectrum and the target high-frequency spectrum.
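The four modules of claim 19 form a simple processing chain, sketched below in dependency-free Python. Everything in it is an assumption made for illustration: the moving-average filter stands in for the low-pass step, a naive DFT stands in for the spectrum computation, and `predict_high` merely copies attenuated low-frequency bins upward as a placeholder for whatever procedure actually derives the target high-frequency spectrum. All function names and parameters are hypothetical.

```python
import cmath
import math
from typing import List


def low_pass(signal: List[float], taps: int = 5) -> List[float]:
    """Moving-average FIR filter (zero-padded start), a stand-in low-pass."""
    return [sum(signal[max(0, n - taps + 1): n + 1]) / taps
            for n in range(len(signal))]


def half_spectrum(signal: List[float]) -> List[complex]:
    """Naive DFT keeping bins 0..N/2 of a real signal of even length N."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n // 2 + 1)]


def predict_high(low_spec: List[complex], atten: float = 0.25) -> List[complex]:
    """Placeholder predictor: attenuated copy of the non-DC low bins."""
    return [atten * c for c in low_spec[1:]]


def synthesize(low_spec: List[complex], high_spec: List[complex]) -> List[float]:
    """Inverse real DFT of the concatenated low and high half-spectra.

    The output has twice as many samples as the narrowband input. The
    factor 2 compensates for the doubled transform length.
    """
    half = [2.0 * c for c in list(low_spec) + list(high_spec)]
    m = len(half) - 1          # index of the new Nyquist bin
    n = 2 * m                  # number of output samples
    out = []
    for t in range(n):
        acc = half[0].real + half[m].real * math.cos(math.pi * t)
        for k in range(1, m):
            acc += 2.0 * (half[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc / n)
    return out


def extend_band(first_narrowband: List[float]) -> List[float]:
    """Chain the four steps in the order the apparatus claim lists them."""
    second_narrowband = low_pass(first_narrowband)
    low_spec = half_spectrum(second_narrowband)
    high_spec = predict_high(low_spec)
    return synthesize(low_spec, high_spec)


# Example: a 16-sample tone becomes a 32-sample band-extended signal.
narrow = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
wide = extend_band(narrow)
```

The naive transforms are quadratic in the frame length and are used only to keep the sketch self-contained. A practical version would use an FFT library and a properly designed filter.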
20. An electronic device, comprising a processor and a memory;
the memory has stored therein readable instructions which, when loaded and executed by the processor, implement the method of any one of claims 1 to 18.
21. A computer readable storage medium having stored thereon readable instructions which, when loaded and executed by a processor, carry out the method of any one of claims 1 to 18.
CN201910882470.8A 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium Active CN110556121B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202410194890.8A CN117975976A (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium
CN201910882470.8A CN110556121B (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910882470.8A CN110556121B (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202410194890.8A Division CN117975976A (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110556121A (en) 2019-12-10
CN110556121B (en) 2024-01-09

Family

ID=68740656

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202410194890.8A Pending CN117975976A (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium
CN201910882470.8A Active CN110556121B (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202410194890.8A Pending CN117975976A (en) 2019-09-18 2019-09-18 Band expansion method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (2) CN117975976A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458930A (en) * 2007-12-12 2009-06-17 华为技术有限公司 Excitation signal generation in bandwidth spreading and signal reconstruction method and apparatus
CN102124518A (en) * 2008-08-05 2011-07-13 弗朗霍夫应用科学研究促进协会 Apparatus and method for processing an audio signal for speech enhancement using a feature extraction
US20100063802A1 (en) * 2008-09-06 2010-03-11 Huawei Technologies Co., Ltd. Adaptive Frequency Prediction
US20100114583A1 (en) * 2008-09-25 2010-05-06 Lg Electronics Inc. Apparatus for processing an audio signal and method thereof
US20130144614A1 (en) * 2010-05-25 2013-06-06 Nokia Corporation Bandwidth Extender
US20160275959A1 (en) * 2013-11-02 2016-09-22 Samsung Electronics Co., Ltd. Broadband signal generating method and apparatus, and device employing same
CN105070293A (en) * 2015-08-31 2015-11-18 武汉大学 Audio bandwidth extension coding and decoding method and device based on deep neural network
CN107705801A (en) * 2016-08-05 2018-02-16 中国科学院自动化研究所 The training method and Speech bandwidth extension method of Speech bandwidth extension model
WO2019081070A1 (en) * 2017-10-27 2019-05-02 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method or computer program for generating a bandwidth-enhanced audio signal using a neural network processor
CN107993672A (en) * 2017-12-12 2018-05-04 腾讯音乐娱乐科技(深圳)有限公司 Frequency expansion method and device
CN108198571A (en) * 2017-12-21 2018-06-22 中国科学院声学研究所 A kind of bandwidth expanding method judged based on adaptive bandwidth and system

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113470667A (en) * 2020-03-11 2021-10-01 腾讯科技(深圳)有限公司 Voice signal coding and decoding method and device, electronic equipment and storage medium
WO2021227783A1 (en) * 2020-05-15 2021-11-18 腾讯科技(深圳)有限公司 Voice processing method, apparatus and device, and storage medium
US11900954B2 (en) 2020-05-15 2024-02-13 Tencent Technology (Shenzhen) Company Limited Voice processing method, apparatus, and device and storage medium
CN112885362A (en) * 2021-01-14 2021-06-01 珠海市岭南大数据研究院 Target identification method, system, device and medium based on radiation noise
CN112885362B (en) * 2021-01-14 2024-04-09 珠海市岭南大数据研究院 Target identification method, system, device and medium based on radiation noise
CN113299313A (en) * 2021-01-28 2021-08-24 维沃移动通信有限公司 Audio processing method and device and electronic equipment
WO2022161475A1 (en) * 2021-01-28 2022-08-04 维沃移动通信有限公司 Audio processing method and apparatus, and electronic device
CN113299313B (en) * 2021-01-28 2024-03-26 维沃移动通信有限公司 Audio processing method, device and electronic equipment
CN114420140A (en) * 2022-03-30 2022-04-29 北京百瑞互联技术有限公司 Frequency band expansion method, encoding and decoding method and system based on generative adversarial network
CN114420140B (en) * 2022-03-30 2022-06-21 北京百瑞互联技术有限公司 Frequency band expansion method, encoding and decoding method and system based on generative adversarial network
CN115631760B (en) * 2022-09-29 2025-07-25 歌尔科技有限公司 Speech noise reduction method, device, equipment and computer readable storage medium
CN119337036A (en) * 2024-12-16 2025-01-21 昆山九华电子设备厂 A method for generating broadband complex background signals based on frequency domain processing

Also Published As

Publication number Publication date
CN117975976A (en) 2024-05-03
CN110556121B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN110556122B (en) Band expansion method, device, electronic equipment and computer readable storage medium
CN110556123B (en) Band expansion method, device, electronic equipment and computer readable storage medium
CN110556121B (en) Band expansion method, device, electronic equipment and computer readable storage medium
CN102089816B (en) Audio signal synthesizer and audio signal encoder
US8935156B2 (en) Enhancing performance of spectral band replication and related high frequency reconstruction coding
US9251800B2 (en) Generation of a high band extension of a bandwidth extended audio signal
US8639500B2 (en) Method, medium, and apparatus with bandwidth extension encoding and/or decoding
US9280978B2 (en) Packet loss concealment for bandwidth extension of speech signals
JP2008513848A (en) Method and apparatus for artificially expanding the bandwidth of an audio signal
TW201140563A (en) Determining an upperband signal from a narrowband signal
JP2005173607A (en) Method and device to generate up-sampled signal of time discrete audio signal
CN102612712A (en) Bandwidth extension of a low band audio signal
JP6289507B2 (en) Apparatus and method for generating a frequency enhancement signal using an energy limiting operation
CN112530446B (en) Band expansion method, device, electronic equipment and computer readable storage medium
Bhatt et al. A novel approach for artificial bandwidth extension of speech signals by LPC technique over proposed GSM FR NB coder using high band feature extraction and various extension of excitation methods
HK40013081A (en) Method, apparatus, electronic device and computer-readable storage medium for expanding frequency band
HK40038380A (en) Method and apparatus for expanding frequency band, electronic device, and computer readable storage medium
HK40013079A (en) Method, apparatus, electronic device, and computer-readable storage medium for expanding frequency band
HK40013085A (en) Method, apparatus, electronic device, and computer-readable storage medium for bandwidth extension
JP2025511991A (en) High Frequency Reconstruction Using Neural Network Systems

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40013081; Country of ref document: HK)
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment