
CN109496333A - A kind of frame loss compensation method and device - Google Patents

A kind of frame loss compensation method and device Download PDF

Info

Publication number
CN109496333A
Authority
CN
China
Prior art keywords
frame
historical
information
future
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780046044.XA
Other languages
Chinese (zh)
Inventor
高振东
肖建良
刘泽新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN109496333A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A frame loss compensation method and device, comprising: receiving a voice code stream sequence; obtaining historical frame information and future frame information from the voice code stream sequence, where the sequence includes frame information of a plurality of voice frames comprising at least one historical frame, at least one current frame, and at least one future frame, the at least one historical frame preceding the at least one current frame in the time domain and the at least one future frame following it, the historical frame information being the frame information of the at least one historical frame and the future frame information being the frame information of the at least one future frame; and estimating the frame information of the at least one current frame according to the historical frame information and the future frame information, thereby improving the accuracy of frame loss compensation.

Description

PCT national-phase application; the description has been published.

Claims (23)

  1. A method for frame loss compensation, the method comprising:
    receiving a voice code stream sequence;
    acquiring historical frame information and future frame information in the voice code stream sequence, wherein the voice code stream sequence comprises frame information of a plurality of voice frames, the plurality of voice frames comprise at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame is positioned in front of the at least one current frame in a time domain, the at least one future frame is positioned behind the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information.
  2. The method of claim 1, further comprising: storing the voice code stream sequence in a buffer area;
    the acquiring of the historical frame information and the future frame information in the voice code stream sequence comprises:
    decoding frame information of a plurality of voice frames of the voice code stream sequence in the buffer area to obtain decoded historical frame information;
    obtaining, from the buffer area, the future frame information that has not yet been decoded.
  3. The method of claim 1 or 2, wherein the historical frame information comprises formant spectrum information of the at least one historical frame, the future frame information comprises formant spectrum information of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining formant spectrum information of the at least one current frame based on the formant spectrum information of the at least one historical frame and the formant spectrum information of the at least one future frame.
  4. A method according to any one of claims 1 to 3, wherein said historical frame information comprises a pitch value of said at least one historical frame, and said future frame information comprises a pitch value of said at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining a pitch value of the at least one current frame based on the pitch value of the at least one historical frame and the pitch value of the at least one future frame.
  5. The method of any of claims 1-4, wherein the historical frame information comprises an energy of the at least one historical frame, the future frame information comprises an energy of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information comprises:
    determining the energy of the at least one current frame according to the energy of the at least one historical frame and the energy of the at least one future frame.
  6. The method of any of claims 1-5, wherein said estimating frame information for the at least one current frame based on the historical frame information and the future frame information comprises:
    determining a frame type of the at least one current frame, the frame type comprising an unvoiced sound or a voiced sound;
    determining at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame according to the frame type.
  7. The method of claim 6, wherein said determining the frame type of the at least one current frame comprises:
    determining a magnitude of spectral tilt of the at least one current frame;
    determining the frame type of the at least one current frame according to the spectral tilt of the at least one current frame.
  8. The method of claim 6, wherein said determining the frame type of the at least one current frame comprises:
    obtaining pitch change states of a plurality of subframes in the at least one current frame;
    determining the frame type of the at least one current frame according to the pitch change states of the plurality of subframes.
  9. The method of any of claims 6-8, wherein said determining at least one of an adaptive codebook gain and a fixed codebook gain for the at least one current frame based on the frame type comprises:
    if the frame type is voiced, determining the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average of the fixed codebook gains of a plurality of historical frames as the fixed codebook gain of the at least one current frame.
  10. The method of any of claims 6-9, wherein said determining at least one of an adaptive codebook gain and a fixed codebook gain for the at least one current frame based on the frame type comprises:
    if the frame type is unvoiced, determining the fixed codebook gain of the at least one current frame according to the fixed codebook gain and pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average of the adaptive codebook gains of a plurality of historical frames as the adaptive codebook gain of the at least one current frame.
  11. The method of claim 9 or 10, wherein the method further comprises:
    determining the energy gain of the at least one current frame according to the time-domain signal magnitude in the decoded historical frame information and the length of each subframe in the historical frame.
  12. A frame loss compensation apparatus, the apparatus comprising:
    a receiving module, configured to receive the voice code stream sequence;
    an obtaining module, configured to obtain historical frame information and future frame information in the speech code stream sequence, where the speech code stream sequence includes frame information of multiple speech frames, the multiple speech frames include at least one historical frame, at least one current frame, and at least one future frame, the at least one historical frame is located before the at least one current frame in a time domain, the at least one future frame is located after the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
    a processing module, configured to estimate the frame information of the at least one current frame according to the historical frame information and the future frame information.
  13. The apparatus of claim 12, wherein the sequence of speech codestreams is stored in a buffer;
    the acquisition module is specifically configured to:
    decoding frame information of a plurality of voice frames of the voice code stream sequence in the buffer area to obtain decoded historical frame information;
    obtaining, from the buffer area, the future frame information that has not yet been decoded.
  14. The apparatus of claim 12 or 13, wherein the historical frame information comprises formant spectrum information for the at least one historical frame, the future frame information comprising formant spectrum information for the at least one future frame;
    the processing module is specifically configured to:
    determining the formant spectrum information of the at least one current frame according to the formant spectrum information of the at least one historical frame and the formant spectrum information of the at least one future frame.
  15. The apparatus according to any one of claims 12 to 14, wherein said historical frame information comprises a pitch value of said at least one historical frame, and said future frame information comprises a pitch value of said at least one future frame;
    the processing module is specifically configured to:
    determining a pitch value of the at least one current frame based on the pitch value of the at least one historical frame and the pitch value of the at least one future frame.
  16. The apparatus of any of claims 12 to 15, wherein the historical frame information comprises an energy of the at least one historical frame, the future frame information comprises an energy of the at least one future frame;
    the processing module is specifically configured to:
    determining the energy of the at least one current frame according to the energy of the at least one historical frame and the energy of the at least one future frame.
  17. The apparatus according to any one of claims 12 to 16, wherein the processing module is specifically configured to:
    determining a frame type of the at least one current frame, the frame type comprising an unvoiced sound or a voiced sound;
    determining at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame according to the frame type.
  18. The apparatus of claim 17, wherein the processing module is further configured to determine a magnitude of spectral tilt of the at least one current frame;
    and to determine the frame type of the at least one current frame according to the spectral tilt of the at least one current frame.
  19. The apparatus of claim 17, wherein the processing module is further configured to obtain pitch change states of a plurality of subframes in the at least one current frame;
    and to determine the frame type of the at least one current frame according to the pitch change states of the plurality of subframes.
  20. The apparatus according to any one of claims 17 to 19, wherein the processing module is specifically configured to:
    if the frame type is voiced, determining the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average of the fixed codebook gains of a plurality of historical frames as the fixed codebook gain of the at least one current frame.
  21. The apparatus according to any one of claims 17 to 19, wherein the processing module is specifically configured to:
    if the frame type is unvoiced, determining the fixed codebook gain of the at least one current frame according to the fixed codebook gain and pitch period of one historical frame and the energy gain of the at least one current frame, and taking the average of the adaptive codebook gains of a plurality of historical frames as the adaptive codebook gain of the at least one current frame.
  22. The apparatus of claim 20 or 21, wherein the processing module is further configured to determine the energy gain of the at least one current frame according to the time-domain signal magnitude in the decoded historical frame information and the length of each subframe in the historical frame.
  23. A frame loss compensation apparatus, comprising: a memory, a communication bus, and a vocoder, the memory coupled to the vocoder through the communication bus; wherein the memory is configured to store program code, and the vocoder is configured to call the program code to:
    receiving a voice code stream sequence;
    acquiring historical frame information and future frame information in the voice code stream sequence, wherein the voice code stream sequence comprises frame information of a plurality of voice frames, the plurality of voice frames comprise at least one historical frame, at least one current frame and at least one future frame, the at least one historical frame is positioned in front of the at least one current frame in a time domain, the at least one future frame is positioned behind the at least one current frame in the time domain, the historical frame information is frame information of the at least one historical frame, and the future frame information is frame information of the at least one future frame;
    estimating the frame information of the at least one current frame according to the historical frame information and the future frame information.
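The core idea of claim 1, estimating a lost current frame from both past (historical) and future frame information, can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the `FrameInfo` fields, the single-history/single-future case, and the linear interpolation weight are all assumptions.

```python
# Sketch of the method of claim 1: when a current frame is lost,
# estimate its parameters by interpolating between the frame
# information of a decoded historical frame and a buffered future
# frame, instead of extrapolating from history alone.

from dataclasses import dataclass

@dataclass
class FrameInfo:
    pitch: float   # pitch (lag) value
    energy: float  # frame energy
    lsf: tuple     # formant spectrum information, e.g. LSF coefficients

def compensate_lost_frame(history: FrameInfo, future: FrameInfo,
                          weight: float = 0.5) -> FrameInfo:
    """Estimate the lost current frame from one historical and one
    future frame by weighted interpolation of each parameter."""
    lerp = lambda a, b: (1.0 - weight) * a + weight * b
    return FrameInfo(
        pitch=lerp(history.pitch, future.pitch),
        energy=lerp(history.energy, future.energy),
        lsf=tuple(lerp(h, f) for h, f in zip(history.lsf, future.lsf)),
    )

hist = FrameInfo(pitch=60.0, energy=1.0, lsf=(0.2, 0.5, 0.8))
fut = FrameInfo(pitch=64.0, energy=0.8, lsf=(0.3, 0.6, 0.9))
est = compensate_lost_frame(hist, fut)  # pitch 62.0, energy 0.9
```

Because the future frame is taken from the jitter buffer before it is decoded (claim 2), this two-sided estimate is what distinguishes the approach from purely history-based concealment.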
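Claim 3 determines the formant spectrum information of the lost frame from that of the historical and future frames. In CELP-family codecs formant spectrum information is commonly represented as line spectral frequencies (LSFs); the sketch below interpolates LSFs and enforces the ascending-order, minimum-gap property needed for a stable synthesis filter. The LSF representation, the equal weighting, and the gap value are assumptions, not details taken from the patent.

```python
# Sketch of claim 3: interpolate the LSF vectors of the historical and
# future frames, then repair ordering so the implied LPC filter stays
# stable.

def interpolate_lsf(hist_lsf, future_lsf, min_gap=0.01):
    """Estimate the lost frame's LSFs as the mean of its neighbours."""
    est = [0.5 * h + 0.5 * f for h, f in zip(hist_lsf, future_lsf)]
    est.sort()                    # LSFs must be strictly increasing
    for i in range(1, len(est)):  # enforce a minimum separation
        if est[i] - est[i - 1] < min_gap:
            est[i] = est[i - 1] + min_gap
    return est
```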
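Claims 4 and 5 derive the pitch value and the energy of the lost frame from the corresponding values of the historical and future frames. The claims do not specify how the values are combined; the weighted average below, biased toward the historical frame as many concealment schemes are, is purely an assumed illustration.

```python
# Sketch of claims 4 and 5: pitch and energy of the lost frame as
# history-weighted means of the neighbouring frames' values.

def estimate_pitch(hist_pitch: float, future_pitch: float,
                   w_hist: float = 0.7) -> float:
    """Pitch of the lost frame, weighted toward the historical frame."""
    return w_hist * hist_pitch + (1.0 - w_hist) * future_pitch

def estimate_energy(hist_energy: float, future_energy: float,
                    w_hist: float = 0.7) -> float:
    """Energy of the lost frame, same weighting as the pitch."""
    return w_hist * hist_energy + (1.0 - w_hist) * future_energy
```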
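Claims 6 to 8 classify the lost frame as voiced or unvoiced before choosing gains. Claim 7 bases the decision on spectral tilt (approximated here by the normalized first autocorrelation coefficient of recently decoded samples: positive for low-frequency-dominated voiced speech, negative for noise-like unvoiced speech) and claim 8 on the pitch change across subframes (stable pitch suggests voiced speech). The thresholds and the combination of the two cues are illustrative assumptions.

```python
# Sketch of claims 6-8: voiced/unvoiced decision from spectral tilt
# and subframe pitch stability.

def spectral_tilt(samples):
    """Normalized first autocorrelation coefficient r(1)/r(0)."""
    r0 = sum(x * x for x in samples)
    r1 = sum(a * b for a, b in zip(samples, samples[1:]))
    return r1 / r0 if r0 > 0 else 0.0

def frame_type(samples, subframe_pitches,
               tilt_threshold=0.2, max_pitch_jitter=0.1):
    """Classify the frame; both cues must indicate voiced speech."""
    tilt = spectral_tilt(samples)
    mean_pitch = sum(subframe_pitches) / len(subframe_pitches)
    jitter = max(abs(p - mean_pitch) for p in subframe_pitches) / mean_pitch
    if tilt > tilt_threshold and jitter < max_pitch_jitter:
        return "voiced"
    return "unvoiced"
```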
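Claims 9 to 11 estimate the excitation gains of the lost frame: an energy gain derived from the decoded time-domain signal and the subframe length (claim 11), a scaled gain for the codebook matching the frame type, and a historical average for the other codebook. The sketch below illustrates the voiced case of claim 9. The claims do not spell out how the historical gain, pitch period, and energy gain are combined, so the RMS-based energy gain and the multiplicative scaling (which omits the pitch period) are assumptions.

```python
# Sketch of claims 9 and 11, voiced case: scale the historical
# adaptive codebook gain by an energy gain computed from the decoded
# signal, and average recent fixed codebook gains.

def energy_gain(decoded_samples, subframe_len):
    """Average per-subframe RMS of the last decoded historical frame."""
    n = len(decoded_samples) // subframe_len
    rms = []
    for i in range(n):
        sf = decoded_samples[i * subframe_len:(i + 1) * subframe_len]
        rms.append((sum(x * x for x in sf) / subframe_len) ** 0.5)
    return sum(rms) / len(rms)

def voiced_gains(hist_adaptive_gain, e_gain, hist_fixed_gains):
    """Adaptive gain scaled by the energy gain; fixed gain averaged."""
    adaptive = hist_adaptive_gain * e_gain
    fixed = sum(hist_fixed_gains) / len(hist_fixed_gains)
    return adaptive, fixed
```

The unvoiced case of claim 10 is symmetric: the fixed codebook gain is scaled and the adaptive codebook gain is averaged over historical frames.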
CN201780046044.XA 2017-06-26 2017-06-26 A kind of frame loss compensation method and device Pending CN109496333A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/090035 WO2019000178A1 (en) 2017-06-26 2017-06-26 Frame loss compensation method and device

Publications (1)

Publication Number Publication Date
CN109496333A true CN109496333A (en) 2019-03-19

Family

ID=64740767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780046044.XA Pending CN109496333A (en) 2017-06-26 2017-06-26 A kind of frame loss compensation method and device

Country Status (2)

Country Link
CN (1) CN109496333A (en)
WO (1) WO2019000178A1 (en)


Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004239930A (en) * 2003-02-03 2004-08-26 Iwatsu Electric Co Ltd Pitch detection method and apparatus for packet loss compensation
KR20050024651A (en) * 2003-09-01 2005-03-11 한국전자통신연구원 Method and apparatus for frame loss concealment for packet network
CN1659625A (en) * 2002-05-31 2005-08-24 沃伊斯亚吉公司 Method and device for efficient frame erasure concealment in linear predictive based speech codecs
CN101009098A (en) * 2007-01-26 2007-08-01 清华大学 Sound coder gain parameter division-mode anti-channel error code method
CN101147190A (en) * 2005-01-31 2008-03-19 高通股份有限公司 Frame erasure concealment in voice communications
CN101379551A (en) * 2005-12-28 2009-03-04 沃伊斯亚吉公司 Method and device for efficient frame erasure concealment in speech codecs
US20090240490A1 (en) * 2008-03-20 2009-09-24 Gwangju Institute Of Science And Technology Method and apparatus for concealing packet loss, and apparatus for transmitting and receiving speech signal
CN101630242A (en) * 2009-07-28 2010-01-20 苏州国芯科技有限公司 Contribution module for rapidly computing self-adaptive code book by G723.1 coder
CN101894558A (en) * 2010-08-04 2010-11-24 华为技术有限公司 Lost frame recovering method and equipment as well as speech enhancing method, equipment and system
CN102449690A (en) * 2009-06-04 2012-05-09 高通股份有限公司 Systems and methods for reconstructing an erased speech frame
CN103325375A (en) * 2013-06-05 2013-09-25 上海交通大学 Coding and decoding device and method of ultralow-bit-rate speech
CN103714820A (en) * 2013-12-27 2014-04-09 广州华多网络科技有限公司 Packet loss hiding method and device of parameter domain
CN106251875A (en) * 2016-08-12 2016-12-21 广州市百果园网络科技有限公司 The method of a kind of frame losing compensation and terminal


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111836117A (en) * 2019-04-15 2020-10-27 深信服科技股份有限公司 Method and device for sending supplementary frame data and related components
CN111836117B (en) * 2019-04-15 2022-08-09 深信服科技股份有限公司 Method and device for sending supplementary frame data and related components
CN111554308A (en) * 2020-05-15 2020-08-18 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium
CN111711992A (en) * 2020-06-23 2020-09-25 瓴盛科技有限公司 Calibration method for CS voice downlink jitter
CN111711992B (en) * 2020-06-23 2023-05-02 瓴盛科技有限公司 CS voice downlink jitter calibration method
CN112489665A (en) * 2020-11-11 2021-03-12 北京融讯科创技术有限公司 Voice processing method and device and electronic equipment
CN112489665B (en) * 2020-11-11 2024-02-23 北京融讯科创技术有限公司 Voice processing method and device and electronic equipment
CN112634912A (en) * 2020-12-18 2021-04-09 北京猿力未来科技有限公司 Packet loss compensation method and device
CN112634912B (en) * 2020-12-18 2024-04-09 北京猿力未来科技有限公司 Packet loss compensation method and device

Also Published As

Publication number Publication date
WO2019000178A1 (en) 2019-01-03

Similar Documents

Publication Publication Date Title
CN109496333A (en) A kind of frame loss compensation method and device
JP5934259B2 (en) Noise generation in audio codecs
KR20230043250A (en) Synthesis of speech from text in a voice of a target speaker using neural networks
US7877253B2 (en) Systems, methods, and apparatus for frame erasure recovery
EP3611729B1 (en) Bandwidth extension method and apparatus
US8655656B2 (en) Method and system for assessing intelligibility of speech represented by a speech signal
EP2954524B1 (en) Systems and methods of performing gain control
KR102007972B1 (en) Unvoiced/voiced decision for speech processing
JP6616470B2 (en) Encoding method, decoding method, encoding device, and decoding device
JP6086999B2 (en) Apparatus and method for selecting one of first encoding algorithm and second encoding algorithm using harmonic reduction
CN104937662B (en) System, method, equipment and the computer-readable media that adaptive resonance peak in being decoded for linear prediction sharpens
JP2018528480A (en) Encoder and method for encoding an audio signal with reduced background noise using linear predictive coding
JP2016504637A5 (en)
JP2016539355A5 (en)
WO2008072671A1 (en) Audio decoding device and power adjusting method
CN105336336B (en) The temporal envelope processing method and processing device of a kind of audio signal, encoder
JP6626123B2 (en) Audio encoder and method for encoding audio signals
WO2014130083A1 (en) Systems and methods for determining pitch pulse period signal boundaries
CN111862967B (en) Voice recognition method and device, electronic equipment and storage medium
CN113409792A (en) Voice recognition method and related equipment thereof
JPWO2008072732A1 (en) Speech coding apparatus and speech coding method
CN101266798B (en) A method and device for gain smoothing in voice decoder
JP2006039559A (en) Voice coding apparatus and method using PLP of mobile communication terminal
JP2004061558A (en) Method and device for code conversion between speed encoding and decoding systems and storage medium therefor
JPH07135490A (en) Speech detector and speech encoder having speech detector

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190319
