CN107426200B - Multimedia data processing method and device

Info

Publication number: CN107426200B
Application number: CN201710569644.6A
Authority: CN
Inventors: 刘运; 张帆; 马跃
Original assignee: Guangzhou Baiguoyuan Network Technology Co Ltd
Current assignee: Bigo Technology Pte Ltd
Priority date: 2017-07-13
Filing date: 2017-07-13
Publication date: 2020-10-23
Anticipated expiration: 2037-07-13
Also published as: CN107426200A

Abstract

The embodiment of the invention discloses a multimedia data processing method and a device, wherein the method comprises the following steps: acquiring first multimedia data, and performing data processing on the first multimedia data according to a current complexity algorithm; recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining a preset threshold range in which the multimedia processing duration corresponding to each data frame is located, and counting the number of data frames in each preset threshold range; and if the number of the data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm into a target complexity algorithm so as to facilitate the follow-up multimedia processing on the acquired second multimedia data according to the target complexity algorithm. By adopting the invention, the corresponding complexity algorithm can be adjusted during the running period of the playing software so as to improve the processing efficiency of the multimedia data and ensure the fluency of the playing multimedia data.

Description

Multimedia data processing method and device

Technical Field

The present invention relates to the field of internet technologies, and in particular, to a multimedia data processing method and apparatus.

Background

With the rapid development of internet technology, audio and video live broadcast software (such as live makeup tutorials and live clothing-matching broadcasts) has become increasingly common on the internet, so that users interested in beauty and fashion can obtain skincare and dressing tips by installing such software. However, this software often integrates real-time tasks with a heavy computational load, such as video processing, video encoding, audio processing, and audio encoding. Therefore, after a mobile terminal running the live broadcast software has run for a period of time, the mobile terminal often actively reduces the processing frequency of its Central Processing Unit (CPU) to avoid overheating. With the processing frequency reduced, the processing capability of the CPU can no longer meet the real-time requirements of the audio and video live broadcast software, which may cause, for example, echo and audio stuttering.

Disclosure of Invention

The embodiment of the invention provides a multimedia data processing method and device, which can improve the processing efficiency of multimedia data so as to ensure the fluency of playing the multimedia data.

The invention provides a multimedia data processing method in a first aspect, which comprises the following steps:

acquiring first multimedia data, and performing data processing on the first multimedia data according to a current complexity algorithm;

recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining a preset threshold range in which the multimedia processing duration corresponding to each data frame is located, and counting the number of data frames in each preset threshold range;

and if the number of the data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm into a target complexity algorithm so as to facilitate the follow-up multimedia processing on the acquired second multimedia data according to the target complexity algorithm.
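The recording and counting step can be sketched as follows. This is only an illustration: it assumes processing durations are measured in milliseconds and that the preset threshold ranges are half-open intervals. The boundary values and the names `THRESHOLD_RANGES`, `range_index` and `count_frames_per_range` are hypothetical and do not come from the claims.

```python
from collections import Counter

# Hypothetical preset threshold ranges in milliseconds, as half-open [low, high).
THRESHOLD_RANGES = [(0.0, 5.0), (5.0, 10.0), (10.0, float("inf"))]


def range_index(duration_ms, ranges=THRESHOLD_RANGES):
    """Return the index of the preset threshold range containing duration_ms."""
    for i, (low, high) in enumerate(ranges):
        if low <= duration_ms < high:
            return i
    raise ValueError(f"duration {duration_ms} is outside every preset range")


def count_frames_per_range(durations_ms, ranges=THRESHOLD_RANGES):
    """Count how many data frames fall into each preset threshold range."""
    counts = Counter(range_index(d, ranges) for d in durations_ms)
    return [counts.get(i, 0) for i in range(len(ranges))]


# Per-frame processing durations recorded for the first multimedia data.
durations = [3.1, 4.8, 6.0, 12.5, 11.0, 13.2]
print(count_frames_per_range(durations))  # [2, 1, 3]
```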

Optionally, before the step of switching the current complexity algorithm to the target complexity algorithm if the number of data frames in each preset threshold range meets a preset switching condition, the method further includes:

acquiring the total number of data frames in the first multimedia data;

selecting a preset threshold range with the maximum number of data frames from all preset threshold ranges as a target threshold range;

calculating the ratio of the number of the data frames in the target threshold range to the total number of the data frames in the first multimedia data, and judging whether the ratio is greater than or equal to a preset ratio corresponding to the target threshold range;

if the ratio is greater than or equal to the preset ratio, determining that the number of the data frames in each preset threshold range meets a preset switching condition;

and if the ratio is smaller than the preset ratio, determining that the number of the data frames in each preset threshold range does not meet the preset switching condition.
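The switching condition described above can be sketched as a single function. The per-range preset ratios and the name `meets_switch_condition` are assumptions made for illustration; the text does not state how ties between equally populated ranges are broken, so this sketch simply takes the first one.

```python
def meets_switch_condition(counts, preset_ratios):
    """Return (condition_met, target_range_index).

    counts[i] is the number of data frames in preset threshold range i;
    preset_ratios[i] is the (hypothetical) preset ratio for that range.
    """
    total = sum(counts)
    if total == 0:
        return False, None  # no frames recorded yet, nothing to decide
    # Target threshold range: the range holding the most data frames.
    # Ties resolve to the lowest index, which is an assumption of this sketch.
    target = max(range(len(counts)), key=counts.__getitem__)
    ratio = counts[target] / total
    return ratio >= preset_ratios[target], target


print(meets_switch_condition([2, 1, 3], [0.5, 0.5, 0.5]))  # (True, 2)
print(meets_switch_condition([2, 1, 3], [0.5, 0.5, 0.6]))  # (False, 2)
```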

Wherein, if the number of data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm to a target complexity algorithm includes:

if the number of the data frames in each preset threshold range meets a preset switching condition, acquiring a target complexity algorithm corresponding to the target threshold range;

judging whether the target complexity algorithm is the same as the current complexity algorithm or not;

if not, switching the current complexity algorithm into a target complexity algorithm;

if the judgment result is yes, the current complexity algorithm is reserved.

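Optionally, if the number of data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm to a target complexity algorithm includes:

if the number of the data frames in each preset threshold range meets a preset switching condition, acquiring the level of a target complexity algorithm corresponding to the target threshold range;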

calculating a difference between the level of the target complexity algorithm and the level of the current complexity algorithm;

if the absolute value of the difference is a first preset difference, switching the current complexity algorithm to the target complexity algorithm;

if the difference is larger than the first preset difference, determining the complexity algorithm which is larger than the level of the current complexity algorithm and adjacent to the level as a new target complexity algorithm, and switching the current complexity algorithm into the new target complexity algorithm;

if the difference is smaller than the second preset difference, determining the complexity algorithm which is smaller than the level of the current complexity algorithm and adjacent to the level as a new target complexity algorithm, and switching the current complexity algorithm into the new target complexity algorithm;

and if the difference is zero, determining to reserve the current complexity algorithm.
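The level-based variant above moves at most one level per switch. A minimal sketch follows, assuming levels are integers, that the first preset difference is 1 and that the second preset difference is -1. These values, the integer encoding of levels, and the name `next_level` are assumptions made for illustration.

```python
FIRST_PRESET_DIFF = 1    # assumed value of the first preset difference
SECOND_PRESET_DIFF = -1  # assumed value of the second preset difference


def next_level(current_level, target_level):
    """Return the level to switch to, stepping at most one level at a time."""
    diff = target_level - current_level
    if diff == 0:
        return current_level          # keep the current complexity algorithm
    if abs(diff) == FIRST_PRESET_DIFF:
        return target_level           # adjacent level: switch directly
    if diff > FIRST_PRESET_DIFF:
        return current_level + 1      # adjacent level above the current one
    if diff < SECOND_PRESET_DIFF:
        return current_level - 1      # adjacent level below the current one
    return current_level              # unreachable with the assumed values


print(next_level(0, 2))  # 1
print(next_level(2, 0))  # 1
print(next_level(1, 2))  # 2
```

Stepping through the adjacent level rather than jumping straight to a distant target means a two-level change takes two consecutive switching decisions.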

The acquiring first multimedia data and performing multimedia processing on the first multimedia data according to a current complexity algorithm includes:

acquiring first multimedia data, and judging whether the first multimedia data belongs to a near-end multimedia data type; the first multimedia data comprises target multimedia data and environmental noise data;

if the first multimedia data is judged to belong to the near-end multimedia data type, extracting a near-end algorithm set associated with the near-end multimedia data type from the current complexity algorithm;

and filtering the environmental noise data carried in the first multimedia data according to the near-end algorithm set to obtain multimedia data to be coded corresponding to the target multimedia data, and carrying out multimedia coding on the multimedia data to be coded to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the current complexity algorithm is a preset high-fidelity algorithm;

the filtering, according to the near-end algorithm set, the environmental noise data carried in the first multimedia data to obtain multimedia data to be encoded corresponding to the target multimedia data, and performing multimedia encoding on the multimedia data to be encoded to obtain a multimedia processing result corresponding to the first multimedia data, includes:

performing frequency band splitting on the first multimedia data according to a down-sampling algorithm to obtain high-frequency component data and low-frequency component data in the first multimedia data;

processing high-frequency component data and low-frequency component data in the first multimedia data according to a high-complexity echo cancellation algorithm, a noise reduction algorithm, a high-complexity voice detection algorithm, a high-complexity automatic gain control algorithm and an up-sampling algorithm to obtain multimedia data to be coded corresponding to the target multimedia data;

and performing multimedia coding on the multimedia data to be coded according to a single-channel multimedia coding algorithm to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the current complexity algorithm is a preset medium complexity algorithm;

processing high-frequency component data and low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity voice detection algorithm and an up-sampling algorithm to obtain multimedia data to be coded corresponding to the target multimedia data;

Optionally, the current complexity algorithm is a preset low complexity algorithm;

performing frequency band splitting on the first multimedia data according to a down-sampling algorithm, and setting high-frequency component data in the first multimedia data to zero to obtain low-frequency component data in the first multimedia data;

processing low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity voice detection algorithm and an up-sampling algorithm to obtain multimedia data to be coded corresponding to the target multimedia data;

and carrying out multimedia coding on the multimedia data to be coded according to a low-complexity multimedia coding algorithm to obtain a multimedia processing result corresponding to the first multimedia data.
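The three near-end configurations can be summarised as a lookup table. The sketch below is illustrative only: the profile keys, stage labels and the `NearEndProfile` type are hypothetical names, and the medium profile's band handling and encoder follow the second splitting and coding sub-units described for the apparatus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NearEndProfile:
    keep_high_band: bool  # False means the high-frequency component is zeroed
    stages: tuple         # processing stages applied after band splitting
    encoder: str          # multimedia coding algorithm used at the end


NEAR_END_PROFILES = {
    "high_fidelity": NearEndProfile(
        keep_high_band=True,
        stages=("echo_cancel_high", "noise_reduce", "voice_detect_high",
                "auto_gain_high", "upsample"),
        encoder="single_channel",
    ),
    "medium": NearEndProfile(
        keep_high_band=True,
        stages=("echo_cancel_low", "noise_reduce_low", "voice_detect_low",
                "upsample"),
        encoder="single_channel",
    ),
    "low": NearEndProfile(
        keep_high_band=False,
        stages=("echo_cancel_low", "noise_reduce_low", "voice_detect_low",
                "upsample"),
        encoder="low_complexity",
    ),
}

for name, profile in NEAR_END_PROFILES.items():
    print(name, profile.keep_high_band, profile.encoder)
```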

Optionally, the method further comprises:

if the first multimedia data is judged not to belong to the near-end multimedia data type, extracting a far-end algorithm set associated with the far-end multimedia data type from the current complexity algorithm;

and performing multimedia decoding on the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and processing the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

The performing multimedia decoding on the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and processing the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data, includes:

multimedia decoding is carried out on the first multimedia data according to a decoding algorithm in the far-end algorithm set, and decoded multimedia data carrying a voice detection identifier are obtained;

and processing the decoded multimedia data according to a down-sampling algorithm, a howling suppression algorithm, a high-complexity noise reduction algorithm, a high-complexity voice detection algorithm and an up-sampling algorithm carried in the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

the multimedia decoding the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and processing the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data, includes:

and processing the decoded multimedia data according to a down-sampling algorithm, a low-complexity noise reduction algorithm, a low-complexity voice detection algorithm and an up-sampling algorithm carried in the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

and extracting a voice detection identifier carried in the decoded multimedia data, and processing the decoded multimedia data according to the voice detection identifier to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, after the switching the current complexity algorithm to the target complexity algorithm, the method further includes:

extracting a preset number of data frames from a multimedia processing result corresponding to the first multimedia data to be used as historical data frames;

processing the second multimedia data according to the target complexity algorithm to obtain multimedia data to be coded corresponding to the second multimedia data; the multimedia coding algorithm in the target complexity algorithm is different from the multimedia coding algorithm in the current complexity algorithm;

according to a multimedia coding algorithm in the target complexity algorithm, coding the historical data frame and the multimedia data to be coded corresponding to the second multimedia data to obtain a multimedia processing result corresponding to the second multimedia data; the multimedia coding algorithm is a low-complexity multimedia coding algorithm or a single-channel multimedia coding algorithm.
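The history-frame step can be sketched as below. `ToyEncoder` is a stand-in rather than a real codec, and `HISTORY_FRAMES` and `encode_after_switch` are hypothetical names. The sketch only shows the retained frames being passed through the newly selected encoder ahead of the second multimedia data; how the encoded history frames are used afterwards is not specified in this passage.

```python
from collections import deque

HISTORY_FRAMES = 3  # hypothetical "preset number" of historical data frames


class ToyEncoder:
    """Stand-in for a multimedia coding algorithm; it only tags each frame."""

    def __init__(self, name):
        self.name = name
        self.frames_seen = 0

    def encode(self, frame):
        self.frames_seen += 1
        return (self.name, frame)


def encode_after_switch(history, new_frames, encoder):
    """Encode the historical frames first, then the newly processed frames."""
    return [encoder.encode(f) for f in list(history) + list(new_frames)]


# Keep only the most recent processed frames of the first multimedia data.
history = deque(maxlen=HISTORY_FRAMES)
for frame in ["f1", "f2", "f3", "f4"]:
    history.append(frame)

# After switching, the target algorithm's encoder sees history + new frames.
new_encoder = ToyEncoder("low_complexity")
result = encode_after_switch(history, ["g1"], new_encoder)
print(result)
```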

A second aspect of the present invention provides a multimedia data processing apparatus comprising:

the acquisition processing module is used for acquiring first multimedia data and processing the first multimedia data according to a current complexity algorithm;

the recording and counting module is used for recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining the preset threshold range of the multimedia processing duration corresponding to each data frame, and counting the number of the data frames in each preset threshold range;

the algorithm switching module is used for switching the current complexity algorithm into a target complexity algorithm if the number of the data frames in each preset threshold range meets a preset switching condition;

and the notification module is used for notifying the acquisition processing module to continue to perform multimedia processing on the acquired second multimedia data according to the target complexity algorithm.

Optionally, the apparatus further comprises:

a total number obtaining module, configured to obtain a total number of data frames in the first multimedia data;

a target range selection module, configured to select a preset threshold range with the largest number of data frames from all preset threshold ranges as a target threshold range;

a calculating module, configured to calculate a ratio of the number of data frames in the target threshold range to the total number of data frames in the first multimedia data;

the ratio judging module is used for judging whether the ratio is larger than or equal to a preset ratio corresponding to the target threshold range;

a determining module, configured to determine that the number of data frames in each preset threshold range meets a preset switching condition if the ratio is greater than or equal to the preset ratio;

the determining module is further configured to determine that the number of data frames within each preset threshold range does not satisfy a preset switching condition if the ratio is smaller than the preset ratio.

Wherein the algorithm switching module comprises:

a target algorithm obtaining unit, configured to obtain a target complexity algorithm corresponding to the target threshold range if the number of data frames in each preset threshold range meets a preset switching condition;

the algorithm judging unit is used for judging whether the target complexity algorithm is the same as the current complexity algorithm or not;

the algorithm switching unit is used for switching the current complexity algorithm into a target complexity algorithm if the judgment result is negative;

and the algorithm retaining unit is used for retaining the current complexity algorithm if the judgment result is yes.

Optionally, the algorithm switching module includes:

a level obtaining unit, configured to obtain a level of a target complexity algorithm corresponding to the target threshold range if the number of data frames in each preset threshold range meets a preset switching condition;

a difference calculation unit, configured to calculate a difference between the level of the target complexity algorithm and the level of the current complexity algorithm;

a first determining and switching unit, configured to switch the current complexity algorithm to the target complexity algorithm if the absolute value of the difference is a first preset difference;

a second determining and switching unit, configured to determine, if the difference is greater than the first preset difference, a complexity algorithm that is greater than the level of the current complexity algorithm and is adjacent to the level as a new target complexity algorithm, and switch the current complexity algorithm to the new target complexity algorithm;

a third determining and switching unit, configured to determine, if the difference is smaller than the second preset difference, a complexity algorithm that is smaller than the level of the current complexity algorithm and is adjacent to the level as a new target complexity algorithm, and switch the current complexity algorithm to the new target complexity algorithm;

and the determining and reserving unit is used for determining and reserving the current complexity algorithm if the difference value is zero.

Wherein, the collection processing module includes:

the acquisition judging unit is used for acquiring first multimedia data and judging whether the first multimedia data belong to a near-end multimedia data type; the first multimedia data comprises target multimedia data and environmental noise data;

a first extracting unit, configured to, if it is determined that the first multimedia data belongs to the near-end multimedia data type, extract the near-end algorithm set associated with the near-end multimedia data type from the current complexity algorithm;

and the data processing and coding unit is used for filtering the environmental noise data carried in the first multimedia data according to the near-end algorithm set to obtain multimedia data to be coded corresponding to the target multimedia data, and carrying out multimedia coding on the multimedia data to be coded to obtain a multimedia processing result corresponding to the first multimedia data.

The current complexity algorithm is a preset high-fidelity algorithm;

the data processing encoding unit includes:

the first splitting subunit is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm to obtain high-frequency component data and low-frequency component data in the first multimedia data;

the first processing subunit is used for processing the high-frequency component data and the low-frequency component data in the first multimedia data according to a high-complexity echo cancellation algorithm, a noise reduction algorithm, a high-complexity voice detection algorithm, a high-complexity automatic gain control algorithm and an up-sampling algorithm to obtain multimedia data to be coded corresponding to the target multimedia data;

and the first coding subunit is used for carrying out multimedia coding on the multimedia data to be coded according to a single-channel multimedia coding algorithm to obtain a multimedia processing result corresponding to the first multimedia data.

the data processing encoding unit includes:

the second splitting subunit is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm to obtain high-frequency component data and low-frequency component data in the first multimedia data;

the second processing subunit is configured to process the high-frequency component data and the low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity speech detection algorithm, and an up-sampling algorithm, so as to obtain to-be-encoded multimedia data corresponding to the target multimedia data;

and the second coding subunit is used for carrying out multimedia coding on the multimedia data to be coded according to a single-channel multimedia coding algorithm to obtain a multimedia processing result corresponding to the first multimedia data.

the data processing encoding unit includes:

the third splitting subunit is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm, and set the high-frequency component data in the first multimedia data to zero to obtain low-frequency component data in the first multimedia data;

the third processing subunit is configured to process the low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity speech detection algorithm, and an up-sampling algorithm, so as to obtain to-be-encoded multimedia data corresponding to the target multimedia data;

and the third coding subunit is used for carrying out multimedia coding on the multimedia data to be coded according to a low-complexity multimedia coding algorithm to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the acquisition processing module further includes:

a second extracting unit, configured to extract a far-end algorithm set associated with a far-end multimedia data type from the current complexity algorithm if it is determined that the first multimedia data does not belong to a near-end multimedia data type;

and the data decoding processing unit is used for performing multimedia decoding on the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and processing the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

The current complexity algorithm is a preset high-fidelity algorithm;

the data decoding processing unit includes:

the first decoding subunit is configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier;

and the first result obtaining subunit is configured to process the decoded multimedia data according to a down-sampling algorithm, a howling suppression algorithm, a high-complexity noise reduction algorithm, a high-complexity voice detection algorithm, and an up-sampling algorithm carried in the far-end algorithm set, so as to obtain a multimedia processing result corresponding to the first multimedia data.

the data decoding processing unit includes:

the second decoding subunit is configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier;

and the second result obtaining subunit is configured to process the decoded multimedia data according to a down-sampling algorithm, a low-complexity noise reduction algorithm, a low-complexity speech detection algorithm, and an up-sampling algorithm carried in the far-end algorithm set, so as to obtain a multimedia processing result corresponding to the first multimedia data.

the data decoding processing unit includes:

a third decoding subunit, configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set, so as to obtain decoded multimedia data carrying a voice detection identifier;

and the third result obtaining subunit is configured to extract a voice detection identifier carried in the decoded multimedia data, and process the decoded multimedia data according to the voice detection identifier to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the apparatus further comprises:

the data frame extraction module is used for extracting a preset number of data frames from a multimedia processing result corresponding to the first multimedia data to serve as historical data frames;

the notification module is specifically configured to notify the acquisition processing module to process the second multimedia data according to the target complexity algorithm, so as to obtain to-be-encoded multimedia data corresponding to the second multimedia data; the multimedia coding algorithm in the target complexity algorithm is different from the multimedia coding algorithm in the current complexity algorithm;

the notification module is further specifically configured to notify the acquisition processing module to encode the historical data frame and the to-be-encoded multimedia data corresponding to the second multimedia data according to a multimedia encoding algorithm in the target complexity algorithm, so as to obtain a multimedia processing result corresponding to the second multimedia data; the multimedia coding algorithm is a low-complexity multimedia coding algorithm or a single-channel multimedia coding algorithm.

A third aspect of the present invention provides a multimedia data processing apparatus comprising: a processor, a memory, a network interface;

the processor is connected to a network interface and a memory, respectively, where the network interface is configured to receive and send multimedia data, the memory is configured to store a program code, and the processor is configured to call the program code to execute the method in the first aspect in the embodiments of the present invention.

A fourth aspect of embodiments of the present invention provides a computer storage medium storing a computer program comprising program instructions that, when executed by a processor, perform the method of the first aspect of embodiments of the present invention.

The embodiment of the invention collects first multimedia data and processes the first multimedia data according to the current complexity algorithm; recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining a preset threshold range in which the multimedia processing duration corresponding to each data frame is located, and counting the number of data frames in each preset threshold range; and if the number of the data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm into a target complexity algorithm so as to facilitate the follow-up multimedia processing on the acquired second multimedia data according to the target complexity algorithm. Therefore, the invention can further count the number of data frames in each preset threshold range by setting a plurality of preset threshold ranges, and switch the current complexity algorithm when the number in each preset threshold range meets the switching condition, so as to ensure that the switched complexity algorithm is matched with the adjusted CPU processing capacity, thereby improving the processing efficiency of the multimedia data and ensuring the fluency of playing the multimedia data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a multimedia data processing method according to an embodiment of the present invention;

FIG. 3 is a flow chart of another multimedia data processing method according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another multimedia data processing apparatus according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an algorithm switching module according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of another algorithm switching module according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an acquisition processing module according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of a data processing encoding unit according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a data decoding processing unit according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of another multimedia data processing apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present invention. As shown in FIG. 1, the network architecture may include an anchor terminal 3000, a server 2000, and an audience terminal cluster. The audience terminal cluster may include a plurality of audience terminals; as shown in FIG. 1, it specifically includes an audience terminal 4000a, an audience terminal 4000b, …, and an audience terminal 4000n.

The audience terminal 4000a, the audience terminal 4000b, …, and the audience terminal 4000n may each be connected to the server 2000 through a network, and the server 2000 may be connected to the anchor terminal 3000 through a network.

As shown in FIG. 1, the anchor terminal 3000 may select at least one audience terminal in the audience terminal cluster as a target audience terminal (taking the audience terminal 4000a as the target audience terminal as an example) and initiate a microphone connection invitation (i.e., a video interaction invitation) to the audience terminal 4000a through the server 2000. When the audience terminal 4000a responds to the microphone connection invitation forwarded by the server 2000, the server 2000 may process the multimedia data (which may include image data and voice data) uploaded by the audience terminal 4000a and the anchor terminal 3000, respectively, and deliver the processed multimedia data to the anchor terminal 3000 and the audience terminal 4000a, respectively, so that the audience terminal 4000a has the same live broadcast function as the anchor terminal 3000. That is, the audience terminal 4000a may then act as another anchor terminal: it may collect its own images and/or sounds through a microphone, and play the images and/or sounds recorded by the anchor terminal 3000 through a speaker. Thus, the remaining audience terminals in the cluster (audience terminal 4000b, …, audience terminal 4000n) can simultaneously see and/or hear the images and/or sounds of both anchors, that is, those from the anchor terminal 3000 and those from the audience terminal 4000a that has a microphone connection with the anchor terminal 3000.

For the specific process by which the anchor terminal 3000 processes the images and/or sounds, refer to the embodiments corresponding to fig. 2 and fig. 3 below.

Fig. 2 is a schematic flow chart illustrating a multimedia data processing method according to an embodiment of the present invention. As shown in fig. 2, the method may include:

Step S101, collecting first multimedia data, and performing data processing on the first multimedia data according to a current complexity algorithm;

specifically, an anchor terminal collects first multimedia data and judges whether the first multimedia data belongs to a near-end multimedia data type; the first multimedia data comprises target multimedia data and environmental noise data; if yes, the anchor terminal extracts a near-end algorithm set associated with the near-end multimedia data type from the current complexity algorithm, filters the environmental noise data carried in the first multimedia data according to the near-end algorithm set to obtain multimedia data to be coded corresponding to the target multimedia data, and performs multimedia coding on the multimedia data to be coded to obtain a multimedia processing result corresponding to the first multimedia data; optionally, if no, the anchor terminal extracts a far-end algorithm set associated with a far-end multimedia data type from the current complexity algorithm, performs multimedia decoding on the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and processes the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

The anchor terminal may be the anchor terminal 3000 in the embodiment corresponding to fig. 1, and the anchor terminal may include a terminal device with a camera function, such as a personal computer, a tablet computer, a notebook computer, a smart television, and a smartphone;

the target multimedia data may include voice data and/or image data related to the embodiment corresponding to fig. 1;

the ambient noise data may include, among other things, white gaussian noise, echoes, and other interfering noise in the environment of the anchor user, such as traffic noise generated by passing vehicles, airplanes, etc.

The current complexity algorithm may be any one of a high fidelity algorithm, a medium complexity algorithm, and a low complexity algorithm, and each complexity algorithm corresponds to a near-end algorithm set and a far-end algorithm set.

When the current complexity algorithm is a high fidelity algorithm, the near-end algorithm set comprises the following algorithms:

a down-sampling algorithm, a high-complexity echo cancellation (AEC_HQ) algorithm, a high-complexity noise reduction (NS_HQ) algorithm, a high-complexity voice detection (VAD_HQ) algorithm, a high-complexity automatic gain control (AGC_HQ) algorithm, an up-sampling algorithm, a single-channel multimedia coding (AAC_SBR) algorithm;

the down-sampling module corresponding to the down-sampling algorithm and the up-sampling module corresponding to the up-sampling algorithm are both carefully designed so as to keep the audio continuous when the sampling rate is switched. For this purpose, a splitting filter can be used to accomplish the resampling. The splitting filter has both down-sampling and up-sampling functions, that is, it includes an analysis portion corresponding to the down-sampling module and an integration (synthesis) portion corresponding to the up-sampling module, where the analysis portion may split an input audio into time domain signals of a plurality of sub-bands (for example, split audio data at a 32K sampling rate into two sub-bands of 16K each), and the integration portion combines the plurality of sub-band signals back into the original audio signal. When down-sampling by an integral multiple of the sampling rate is needed, the high-frequency sub-band can be set to zero directly after the analysis. Down-sampling with the splitting filter keeps the low-frequency component of the audio signal continuous; because the human ear is more sensitive to the low-frequency component, this continuity ensures that the processed audio signal also sounds continuous. When switching to the HQ processing flow, the added high-frequency component matches the original low-frequency component, because both are split from the same section of audio data; the switch therefore causes no audible discontinuity, and the sound becomes more detailed and clearer.
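The analysis and integration portions described above can be illustrated with a minimal two-band sketch. It assumes the simplest possible filter pair (Haar), which the description does not specify; a practical splitting filter would use longer filters with better band separation. The function names `analyze`, `synthesize`, and `drop_high_band` are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar normalisation factor (illustrative filter choice)


def analyze(samples):
    """Split an even-length signal into low and high sub-bands at half the rate."""
    if len(samples) % 2:
        raise ValueError("expected an even number of samples")
    low = [(samples[i] + samples[i + 1]) * _S for i in range(0, len(samples), 2)]
    high = [(samples[i] - samples[i + 1]) * _S for i in range(0, len(samples), 2)]
    return low, high


def synthesize(low, high):
    """Recombine two half-rate sub-bands into one full-rate signal."""
    out = []
    for lo, hi in zip(low, high):
        out.append((lo + hi) * _S)
        out.append((lo - hi) * _S)
    return out


def drop_high_band(samples):
    """Zero the high sub-band after analysis, then resynthesize."""
    low, high = analyze(samples)
    return synthesize(low, [0.0] * len(high))


signal = [1.0, 2.0, 3.0, 4.0]
low, high = analyze(signal)
restored = synthesize(low, high)    # both bands kept: the input is recovered
smoothed = drop_high_band(signal)   # high band zeroed: only low-frequency content remains
```

Because both sub-bands come from the same input samples, adding the high band back later does not disturb the low band, which is the continuity property the paragraph relies on.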

Accordingly, the far-end algorithm set comprises the following algorithms:

a decoding algorithm, a down-sampling algorithm, a howling suppression (AHC) algorithm, a high-complexity noise reduction (NS_HQ) algorithm, a high-complexity voice detection (VAD_HQ) algorithm, and an up-sampling algorithm.

The howling suppression module corresponding to the howling suppression algorithm can effectively suppress howling caused by echo in a high-fidelity algorithm, but is not an essential module in a medium-complexity algorithm and a low-complexity algorithm, so that the howling suppression module can be removed to reduce the complexity of CPU processing.

Wherein, when the current complexity algorithm is a medium complexity algorithm, the near-end algorithm set comprises the following algorithms:

a down-sampling algorithm, a low-complexity echo cancellation (AEC_LC) algorithm, a low-complexity noise reduction (NS_LC) algorithm, a low-complexity voice detection (VAD_LC) algorithm, an up-sampling algorithm, a single-channel multimedia coding (AAC_SBR) algorithm;

accordingly, the far-end algorithm set comprises the following algorithms:

a decoding algorithm, a down-sampling algorithm, a low-complexity noise reduction (NS_LC) algorithm, a low-complexity voice detection (VAD_LC) algorithm, and an up-sampling algorithm.

Wherein, when the current complexity algorithm is a low complexity algorithm, the near-end algorithm set comprises the following algorithms:

a down-sampling algorithm, a low-complexity echo cancellation (AEC_LC) algorithm, a low-complexity noise reduction (NS_LC) algorithm, a low-complexity voice detection (VAD_LC) algorithm, an up-sampling algorithm, a low-complexity multimedia coding (AAC_LC) algorithm;

accordingly, the far-end algorithm set comprises the following algorithms:

a decoding algorithm and voice detection (VAD) identifier recognition.
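The three complexity algorithms and their near-end and far-end algorithm sets listed above can be summarized as a lookup table. The following is only a sketch: the algorithm names follow the abbreviations in the text (with spacing normalized), while the dictionary layout, the key names, and the helper `algorithm_set` are illustrative assumptions.

```python
# Processing chains per complexity algorithm, in the order given in the text.
# The key names ("high_fidelity", "near_end", ...) are hypothetical labels.
ALGORITHM_SETS = {
    "high_fidelity": {
        "near_end": ["downsample", "AEC_HQ", "NS_HQ", "VAD_HQ", "AGC_HQ",
                     "upsample", "AAC_SBR"],
        "far_end": ["decode", "downsample", "AHC", "NS_HQ", "VAD_HQ", "upsample"],
    },
    "medium": {
        "near_end": ["downsample", "AEC_LC", "NS_LC", "VAD_LC", "upsample",
                     "AAC_SBR"],
        "far_end": ["decode", "downsample", "NS_LC", "VAD_LC", "upsample"],
    },
    "low": {
        "near_end": ["downsample", "AEC_LC", "NS_LC", "VAD_LC", "upsample",
                     "AAC_LC"],
        "far_end": ["decode", "VAD_identifier_recognition"],
    },
}


def algorithm_set(level, is_near_end):
    """Return the processing chain for a complexity level and data type."""
    side = "near_end" if is_near_end else "far_end"
    return ALGORITHM_SETS[level][side]


hq_far = algorithm_set("high_fidelity", is_near_end=False)
lc_near = algorithm_set("low", is_near_end=True)
```

Laid out this way, it is easy to see that howling suppression (AHC) appears only in the high fidelity far-end set and that only the low complexity near-end set ends with AAC_LC coding.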

The first multimedia data belongs to the near-end multimedia data type, and means that the first multimedia data may be audio data entered by an anchor user through a sound receiver (e.g., a microphone), and may also be image data entered by the anchor user through an image receiver (e.g., a camera). At this time, the anchor terminal for acquiring the first multimedia data may extract a near-end algorithm set associated with the near-end multimedia data type in the current complexity algorithm, and may further perform data processing and data encoding on the first multimedia data according to the near-end algorithm set, thereby obtaining a multimedia processing result corresponding to the first multimedia data.

The first multimedia data belongs to the far-end multimedia data type, which means that the first multimedia data may be audio data of a mic-connected user played through an audio player (e.g., a speaker) (the mic-connected user may be the audience user who holds the audience terminal 4000a in the embodiment corresponding to fig. 1), or may be image data of the mic-connected user shown through an image player (e.g., a display). At this time, the anchor terminal that acquires the first multimedia data may extract a far-end algorithm set associated with the far-end multimedia data type from the current complexity algorithm, and may further perform data decoding and data processing on the first multimedia data according to the far-end algorithm set, thereby obtaining a multimedia processing result corresponding to the first multimedia data.

For example, in a live display interface on a certain live platform, 4 viewers (viewer A, viewer B, viewer C, and viewer D, whose viewer terminals may respectively be the viewer terminal 4000a, the viewer terminal 4000b, the viewer terminal 4000c, and the viewer terminal 4000d in the embodiment corresponding to fig. 1) are watching a makeup teaching video recorded on the live platform by the anchor E. During the live teaching broadcast, in order to find out how well each viewer has mastered the makeup techniques, the anchor E may randomly select one of the 4 viewers as the mic-connected viewer (for example, the viewer A), so that a network communication relationship can be established between the anchor terminal held by the anchor E and the viewer terminal 4000a held by the viewer A. At this time, for the anchor terminal held by the anchor E, the audio data and the video data recorded by the anchor E through the microphone and camera on the anchor terminal may be taken as the first multimedia data, and the first multimedia data belongs to the near-end multimedia data type. Therefore, the anchor terminal can extract a near-end algorithm set corresponding to the near-end multimedia data type from the current complexity algorithm, and perform data processing and data encoding on the first multimedia data according to the near-end algorithm set;

meanwhile, when the anchor terminal held by the anchor E has a microphone connection with the viewer terminal 4000a held by the viewer A, the viewer terminal 4000a may act as a secondary anchor terminal having the same function as the anchor terminal, that is, the viewer terminal 4000b, the viewer terminal 4000c, and the viewer terminal 4000d may simultaneously see and hear the images and sounds of the anchor E and the viewer A. That is, for the anchor terminal held by the anchor E, the audio data and the video data transmitted by the viewer terminal 4000a may also be used as the first multimedia data; in this case the first multimedia data belongs to the far-end multimedia data type, so the anchor terminal held by the anchor E may extract the far-end algorithm set corresponding to the far-end multimedia data type from the current complexity algorithm, and perform data decoding and data processing on the first multimedia data according to the far-end algorithm set.

Step S102, recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining the preset threshold range of the multimedia processing duration corresponding to each data frame, and counting the number of the data frames in each preset threshold range;

specifically, the anchor terminal may record a multimedia processing duration corresponding to each data frame in the first multimedia data; the multimedia processing duration may be a time-consuming duration required for the anchor terminal to perform data processing and data encoding on each data frame, and therefore, the anchor terminal may further count the number of data frames within each preset threshold range according to the preset threshold range in which each multimedia processing duration is respectively located; when the number of data frames within each preset threshold range meets a preset switching condition, the anchor terminal may further perform step S103; optionally, after the step S102 is executed, the anchor terminal may further retain the current complexity algorithm when the number of data frames in each preset threshold range does not satisfy a preset switching condition.
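Step S102 can be sketched as timing each frame and tallying the durations by range. This is a minimal illustration under stated assumptions: the 4 ms and 16 ms thresholds are the ones given later in this description, the handling of a duration that falls exactly on a threshold is a guess, and `process_frame` is a hypothetical callback standing in for the real data processing and encoding.

```python
import time
from collections import Counter


def classify(duration_ms):
    """Name the preset threshold range that a processing duration falls in."""
    if duration_ms < 4.0:
        return "lt_4ms"
    if duration_ms <= 16.0:      # boundary handling is an assumption
        return "4_to_16ms"
    return "gt_16ms"


def process_and_count(frames, process_frame):
    """Process every frame, timing each one, and count frames per range."""
    counts = Counter()
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)     # hypothetical per-frame processing and encoding
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        counts[classify(elapsed_ms)] += 1
    return counts


counts = process_and_count(range(5), lambda frame: None)
```

The resulting counter is the input to the switching decision in step S103.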

Step S103, if the number of the data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm to a target complexity algorithm so as to facilitate the follow-up multimedia processing on the acquired second multimedia data according to the target complexity algorithm.

Specifically, the anchor terminal may obtain the total number of data frames in the first multimedia data, select a preset threshold range having the largest number of data frames from all preset threshold ranges as a target threshold range, calculate a ratio of the number of data frames in the target threshold range to the total number of data frames, and determine whether the ratio is greater than or equal to a preset ratio corresponding to the target threshold range, if the ratio is greater than or equal to the preset ratio, determine that the number of data frames in each preset threshold range meets a preset switching condition, and switch the current complexity algorithm to a target complexity algorithm; and if the ratio is smaller than the preset ratio, determining that the number of the data frames in each preset threshold range does not meet the preset switching condition.

When the current complexity algorithm is switched to the target complexity algorithm, the anchor terminal may try to keep the processing time of the subsequent second multimedia data consistent with that of the first multimedia data. For example, when the same segment of multimedia data is processed, the anchor terminal may need 16ms under the high-fidelity algorithm but only 4ms under the medium complexity algorithm. It can be seen that the load placed on the CPU differs greatly between two complexity algorithms. Therefore, to ensure seamless switching between complexity algorithms, while the anchor terminal is reducing the CPU frequency, the high-fidelity algorithm may be switched automatically to the corresponding medium complexity algorithm according to the processing capability of the CPU, and the second multimedia data is then processed according to the medium complexity algorithm. The processing then takes 4ms, because the anchor terminal improves the processing efficiency of the CPU at the expense of the quality of the audio output.

In order to avoid switching complexity too frequently, the anchor terminal may adopt two preset thresholds (4ms and 16ms) to control the increase and decrease of complexity. The preset threshold range less than 4ms may correspond to the high-fidelity algorithm, the preset threshold range from 4ms to 16ms may correspond to the medium-complexity algorithm, and the preset threshold range greater than 16ms may correspond to the low-complexity algorithm, so that each preset threshold range corresponds to one complexity algorithm, and each preset threshold range corresponds to its own preset ratio (for example, the preset ratio corresponding to the preset threshold range greater than 16ms is 50%, and the preset ratio corresponding to the preset threshold range less than 4ms is 80%).

Therefore, when the anchor terminal acquires the total number of the data frames in the first multimedia data, the anchor terminal may further count the number of the data frames within each preset threshold range according to the preset threshold range in which the multimedia processing duration corresponding to each data frame is located, so as to further select the preset threshold range having the maximum number of the data frames from all the preset threshold ranges as a target threshold range, thereby obtaining the ratio of the number of the data frames within the target threshold range to the total number of the data frames.
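The switching condition can be sketched as follows. The range names, the `RANGES` table, and the function `decide` are hypothetical; the 50% and 80% preset ratios are the ones stated above, and the 60% ratio for the 4ms to 16ms range is taken from the examples later in this description.

```python
# range name -> (complexity algorithm for that range, preset ratio)
RANGES = {
    "lt_4ms": ("high_fidelity", 0.80),
    "4_to_16ms": ("medium", 0.60),
    "gt_16ms": ("low", 0.50),
}


def decide(counts, current):
    """Return the complexity algorithm to use after one batch of frames."""
    total = sum(counts.values())
    if total == 0:
        return current
    # Target threshold range: the range holding the most data frames.
    # On a tie, max() keeps the first such range in iteration order.
    target_range = max(counts, key=counts.get)
    ratio = counts[target_range] / total
    algorithm, preset_ratio = RANGES[target_range]
    return algorithm if ratio >= preset_ratio else current


downgrade = decide({"gt_16ms": 13, "4_to_16ms": 7}, current="medium")
upgrade = decide({"lt_4ms": 17, "4_to_16ms": 3}, current="medium")
unchanged = decide({"lt_4ms": 15, "4_to_16ms": 5}, current="medium")
```

With 13 of 20 frames above 16ms the ratio is 65%, which meets the 50% preset ratio, so the sketch selects the low complexity algorithm; 15 of 20 frames below 4ms (75%) falls short of 80%, so the current algorithm is kept.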

For example, take the current complexity algorithm to be the medium complexity algorithm. The total number of data frames of the first multimedia data acquired by the anchor terminal is 20 frames; the multimedia processing durations of 13 of the 20 frames are all greater than 20ms (that is, these 13 frames are all within the preset threshold range greater than 16ms), and the multimedia processing durations of the other 7 frames are all within the preset threshold range of 4-16ms. Therefore, the anchor terminal may take the preset threshold range greater than 16ms as the target threshold range, and may further obtain the ratio of the number of data frames within the target threshold range to the total number of data frames in the first multimedia data (the ratio is 13/20 = 65%). This ratio is greater than the preset ratio (50%), so the current complexity algorithm (medium complexity algorithm) can be switched to the target complexity algorithm (e.g., low complexity algorithm) corresponding to the target threshold range. This improves the processing efficiency of multimedia data and ensures that the CPU of the anchor terminal can process subsequently acquired multimedia data in real time according to the target complexity algorithm.

It can be seen that the anchor terminal may determine the current processing capability of its processor (CPU) from the multimedia processing duration of each data frame. A longer multimedia processing duration indicates that the CPU has less processing capability to spare under the current complexity algorithm, that is, the current complexity algorithm needs to be degraded. Conversely, a shorter multimedia processing duration indicates that the CPU has more processing capability to spare under the current complexity algorithm, that is, the CPU currently has sufficient processing capability to process subsequently acquired multimedia data, so the current complexity algorithm can be upgraded.

Further, please refer to fig. 3, which is a flowchart illustrating another multimedia data processing method according to an embodiment of the present invention. As shown in fig. 3, the method may include:

step S201, collecting first multimedia data, and processing the first multimedia data according to a current complexity algorithm;

step S202, recording the multimedia processing duration corresponding to each data frame in the first multimedia data, determining the preset threshold range of the multimedia processing duration corresponding to each data frame, and counting the number of the data frames in each preset threshold range;

for a specific implementation manner of steps S201 to S202, reference may be made to the description of steps S101 to S102 in the embodiment corresponding to fig. 2, and details will not be further described here.

Step S203, acquiring the total number of data frames in the first multimedia data;

step S204, selecting a preset threshold range with the maximum number of data frames from all preset threshold ranges as a target threshold range;

specifically, when the anchor terminal acquires the total number of the data frames in the first multimedia data, the anchor terminal may further count the number of the data frames within each preset threshold range according to the preset threshold range in which the multimedia processing duration corresponding to each data frame is located, so as to further use the preset threshold range having the maximum number of data frames within each preset threshold range as the target threshold range.

The number of the preset threshold value ranges can be preset according to actual conditions, and a corresponding preset ratio is distributed to each preset threshold value range.

In order to avoid switching complexity too frequently, the anchor terminal may adopt two preset thresholds (4ms and 16ms) to control the increase and decrease of complexity. The preset threshold range less than 4ms corresponds to the high-fidelity algorithm, the preset threshold range from 4ms to 16ms corresponds to the medium-complexity algorithm, and the preset threshold range greater than 16ms corresponds to the low-complexity algorithm, so that each preset threshold range corresponds to one complexity algorithm, and each preset threshold range corresponds to its own preset ratio.

The preset ratio corresponding to the preset threshold range greater than 16ms is 50%. The anchor terminal counts the ratio of the number of data frames falling into this preset threshold range to the total number of data frames in the first multimedia data. When the ratio is greater than or equal to 50%, the anchor terminal cannot process subsequently acquired second multimedia data in real time under the current complexity algorithm, so it degrades the current complexity algorithm, that is, it reduces the complexity of the multimedia data processing so that the second multimedia data can be processed effectively and in real time, which avoids jamming and echo.

The preset ratio corresponding to the preset threshold range less than 4ms is 80%. The anchor terminal likewise counts the ratio of the number of data frames falling into this preset threshold range to the total number of data frames in the first multimedia data, and upgrades the current complexity algorithm when the ratio is greater than or equal to 80%.

Step S205, calculating the ratio of the number of the data frames in the target threshold range to the total number of the data frames in the first multimedia data, and judging whether the ratio is greater than or equal to a preset ratio corresponding to the target threshold range;

optionally, the anchor terminal may instead calculate the ratio of the number of data frames in each preset threshold range to the total number of data frames in the first multimedia data, select the maximum ratio among all the ratios, and use the preset threshold range corresponding to the maximum ratio as the target threshold range, and then determine whether the maximum ratio is greater than or equal to the preset ratio corresponding to the target threshold range.

If most of the multimedia processing durations corresponding to the data frames are within the preset threshold range greater than 16ms (for example, in first multimedia data of 32 frames, the processing durations of 28 data frames are all greater than 20ms), the processing capability of the CPU is insufficient for the current complexity algorithm, and steps S206 to S207 may be further executed to degrade the current complexity algorithm. Conversely, if most of the multimedia processing durations corresponding to the data frames are within the preset threshold range smaller than 4ms (for example, in first multimedia data of 32 frames, the processing durations of 25 data frames are all smaller than 3ms), the CPU has processing capability to spare under the current complexity algorithm, that is, the CPU currently has sufficient processing capability to process the subsequently acquired multimedia data, so steps S206 to S207 may be further performed to upgrade the current complexity algorithm.

For example, take the current complexity algorithm to be the medium complexity algorithm. The total number of data frames of the first multimedia data acquired by the anchor terminal is 20 frames; the multimedia processing durations of 13 of the 20 frames are all greater than 20ms (that is, these 13 frames are all within the preset threshold range greater than 16ms), and the multimedia processing durations of the other 7 frames are all within the preset threshold range of 4-16ms. Therefore, the anchor terminal may use the preset threshold range greater than 16ms as the target threshold range, and may further obtain the ratio of the number of data frames within the target threshold range to the total number of data frames in the first multimedia data (the ratio is 13/20 = 65%). This ratio is greater than the preset ratio (50%), so steps S206-S207 may be further performed to switch the current complexity algorithm (medium complexity algorithm) to the target complexity algorithm (e.g., low complexity algorithm) corresponding to the target threshold range, so as to ensure that the CPU of the anchor terminal can process the subsequently acquired multimedia data in real time according to the target complexity algorithm.

For another example, still take the current complexity algorithm to be the medium complexity algorithm. The total number of data frames in the acquired first multimedia data is 20 frames. While the anchor terminal performs data processing and data encoding on the first multimedia data, it counts that the multimedia processing durations of 17 data frames are all within the preset threshold range less than 4ms (the preset ratio for the preset threshold range less than 4ms is 80%), and that the multimedia processing durations of the other 3 data frames are all within the preset threshold range of 4-16ms (the preset ratio for the preset threshold range of 4-16ms is 60%). At this time, the anchor terminal may use the preset threshold range less than 4ms as the target threshold range, and may further calculate that the ratio of the 17 data frames to the total number of 20 frames is 85%. Since the ratio of 85% is greater than the preset ratio of 80%, the anchor terminal may further perform steps S206 to S207 to switch the current complexity algorithm to the high-fidelity algorithm corresponding to the target threshold range, and thus output higher-quality multimedia data.

Step S206, if the ratio is greater than or equal to the preset ratio, determining that the number of data frames in each preset threshold range meets a preset switching condition;

step S207, switching the current complexity algorithm into a target complexity algorithm so as to facilitate the follow-up multimedia processing of the collected second multimedia data according to the target complexity algorithm;

specifically, if the number of data frames in each preset threshold range meets a preset switching condition, acquiring a target complexity algorithm corresponding to the target threshold range, and judging whether the target complexity algorithm is the same as the current complexity algorithm, if not, switching the current complexity algorithm to the target complexity algorithm; optionally, if yes, the current complexity algorithm is retained.

When the complexity algorithm is switched in this way, the current complexity algorithm may be switched to the target complexity algorithm corresponding to the target threshold range as soon as the ratio of the number of data frames in the target threshold range to the total number of data frames in the first multimedia data reaches the preset ratio. It is therefore possible that the target complexity algorithm is the same as the current complexity algorithm, and skip-level switching is also possible (for example, switching from the high-fidelity algorithm directly to the low-complexity algorithm).

For example, the total number of data frames in the acquired first multimedia data is 20 frames. While performing data processing and data encoding on the first multimedia data, the anchor terminal counts that the multimedia processing durations of 16 data frames are all within the preset threshold range less than 4ms (the preset ratio for the preset threshold range less than 4ms is 80%), and that the multimedia processing durations of the other 4 data frames are all within the preset threshold range of 4-16ms (the preset ratio for the preset threshold range of 4-16ms is 60%). At this time, the anchor terminal may use the preset threshold range less than 4ms as the target threshold range, and may further calculate that the ratio of the 16 data frames to the total number of 20 frames is 80%. Since the ratio of 80% is equal to the preset ratio of 80%, the anchor terminal may determine that the number of data frames in each preset threshold range satisfies the preset switching condition, obtain the target complexity algorithm (high fidelity algorithm) corresponding to the preset threshold range less than 4ms, and determine whether the current complexity algorithm is the same as the target complexity algorithm. If the current complexity algorithm is the low complexity algorithm, the anchor terminal determines that the current complexity algorithm differs from the target complexity algorithm and switches the current complexity algorithm to the high fidelity algorithm, which realizes skip-level switching. Optionally, if the current complexity algorithm is the high fidelity algorithm, the anchor terminal determines that the current complexity algorithm is the same as the target complexity algorithm, so the current complexity algorithm (high fidelity algorithm) is retained.
Optionally, if the current complexity algorithm is the medium complexity algorithm, the anchor terminal determines that the current complexity algorithm differs from the target complexity algorithm and switches the current complexity algorithm to the high fidelity algorithm. Therefore, the anchor terminal can dynamically adjust the current complexity algorithm according to its actual processing capability.
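The dynamic adjustment described above, including skip-level switching, can be sketched as a small controller that decides once per batch of frames. The class `ComplexityController`, the helper `level_for`, the batch size of 20 frames (matching the examples), and the handling of durations exactly on a threshold are all assumptions made for illustration.

```python
# Preset ratio per target complexity algorithm, as used in the examples.
PRESETS = {"high_fidelity": 0.80, "medium": 0.60, "low": 0.50}


def level_for(duration_ms):
    """Complexity algorithm associated with the range a duration falls in."""
    if duration_ms < 4.0:
        return "high_fidelity"
    if duration_ms <= 16.0:          # boundary handling is an assumption
        return "medium"
    return "low"


class ComplexityController:
    def __init__(self, level="medium", window=20):
        self.level = level
        self.window = window         # frames per decision (hypothetical)
        self.durations = []

    def observe(self, duration_ms):
        """Record one frame; after a full batch, decide and return the level."""
        self.durations.append(duration_ms)
        if len(self.durations) < self.window:
            return self.level
        votes = {}
        for d in self.durations:
            key = level_for(d)
            votes[key] = votes.get(key, 0) + 1
        self.durations = []
        target = max(votes, key=votes.get)
        if votes[target] / self.window >= PRESETS[target]:
            self.level = target      # may skip a level, e.g. low -> high_fidelity
        return self.level


ctrl = ComplexityController(level="low")
for d in [3.0] * 16 + [10.0] * 4:    # 16 of 20 frames under 4 ms
    level = ctrl.observe(d)
```

Starting from the low complexity algorithm, 16 of 20 frames under 4ms gives a ratio of exactly 80%, so the sketch jumps straight to the high fidelity algorithm, as in the example above.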

Optionally, the anchor terminal may further obtain the level of the target complexity algorithm corresponding to the target threshold range and calculate the difference between the level of the target complexity algorithm and the level of the current complexity algorithm. If the absolute value of the difference equals a first preset difference, the current complexity algorithm is switched to the target complexity algorithm. If the difference is greater than the first preset difference, the complexity algorithm whose level is higher than and adjacent to the level of the current complexity algorithm is determined as a new target complexity algorithm, and the current complexity algorithm is switched to the new target complexity algorithm. Optionally, if the difference is smaller than a second preset difference, the complexity algorithm whose level is lower than and adjacent to the level of the current complexity algorithm is determined as a new target complexity algorithm, and the current complexity algorithm is switched to the new target complexity algorithm. Optionally, if the difference is zero, the current complexity algorithm is retained.

The first preset difference is 1, which indicates that the level difference between the current complexity algorithm and the target complexity algorithm is +/-1, that is, the two algorithms belong to adjacent levels.

The second preset difference is a value less than -1 (e.g., -2 or -3), which indicates that the preset complexity algorithms are not limited to the three complexity algorithms described above (the high fidelity algorithm, the medium complexity algorithm, and the low complexity algorithm) and that other types of complexity algorithms (e.g., a sub-fidelity algorithm) also exist.

Similarly, when the difference is greater than the first preset difference, the difference between the level of the target complexity algorithm and the level of the current complexity algorithm may be 2 or 3, which likewise indicates that the preset complexity algorithms are not limited to the three complexity algorithms described above and that other types of complexity algorithms (e.g., a sub-fidelity algorithm) also exist.
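
The level-based variant amounts to moving at most one level per adjustment. A minimal sketch, assuming levels are plain integers (a hypothetical encoding) and treating any difference below -1 as the downward case, which is one reading of the second preset difference:

```python
FIRST_PRESET_DIFF = 1  # from the text: adjacent levels differ by +/-1


def next_level(current_level, target_level):
    """Return the level to switch to, moving at most one level at a time."""
    diff = target_level - current_level
    if diff == 0:
        return current_level          # retain the current algorithm
    if abs(diff) == FIRST_PRESET_DIFF:
        return target_level           # adjacent level: switch directly
    if diff > FIRST_PRESET_DIFF:
        return current_level + 1      # target is far above: step up once
    return current_level - 1          # target is far below: step down once


print(next_level(0, 3))  # steps to the adjacent higher level
```

Compared with the skip-level switch described earlier, this variant trades a slower convergence for a smaller change in processing load at each switch.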

Step S208, if the ratio is smaller than the preset ratio, it is determined that the number of data frames within each preset threshold range does not satisfy a preset switching condition.

Specifically, when it is determined that the number of data frames within each preset threshold range does not satisfy the preset switching condition, the anchor terminal retains the current complexity algorithm. Optionally, if the processing capability of the CPU under the current complexity algorithm is not sufficient to process subsequently collected second multimedia data in real time, a complexity switching scheme corresponding to the anchor terminal may be obtained from the server, where the complexity switching scheme includes a level lower than the level corresponding to the low complexity algorithm. In this case at least three preset thresholds are set, for example 4 ms, 16 ms and 26 ms, so that when the original dynamic adjustment fails, the preset threshold range greater than 16 ms is further divided into two sub-ranges and a preset ratio is assigned to each of the two sub-ranges, thereby avoiding phenomena such as echo and stuttering.

The server may be the server 2000 in the embodiment corresponding to fig. 1, and the description thereof will not be repeated here.

Optionally, the anchor terminal may further extract a preset number of data frames from the multimedia processing result corresponding to the first multimedia data and use them as historical data frames, process the second multimedia data according to the target complexity algorithm to obtain multimedia data to be encoded corresponding to the second multimedia data, and encode the historical data frames and the multimedia data to be encoded corresponding to the second multimedia data according to the multimedia encoding algorithm in the target complexity algorithm to obtain a multimedia processing result corresponding to the second multimedia data.

The frame length of the historical data frames is at least 160 ms, so that when the multimedia coding algorithm is switched (for example, from the AAC_SBR algorithm to the AAC_LC algorithm), the historical data frames can be preprocessed to initialize the encoder corresponding to the target complexity algorithm, which reduces the coding delay between the two multimedia coding algorithms during algorithm switching.

The multimedia coding algorithm in the target complexity algorithm is different from the multimedia coding algorithm in the current complexity algorithm.

The multimedia coding algorithm may be a low-complexity multimedia coding algorithm or a single-channel multimedia coding algorithm.
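
The encoder warm-up described above can be illustrated as follows. This is a sketch under assumptions: the 20 ms frame length, the `ToyEncoder` class, and the choice to discard the output produced while feeding the historical data frames are all hypothetical; only the 160 ms minimum comes from the text.

```python
import math
from collections import deque

MIN_HISTORY_MS = 160  # minimum history length stated in the text
FRAME_MS = 20         # hypothetical frame length


def history_capacity(frame_ms=FRAME_MS, min_history_ms=MIN_HISTORY_MS):
    """Number of frames needed to cover at least min_history_ms."""
    return math.ceil(min_history_ms / frame_ms)


class ToyEncoder:
    """Stand-in for a real encoder; it only counts the frames it has seen."""

    def __init__(self, name):
        self.name = name
        self.frames_seen = 0

    def encode(self, frame):
        self.frames_seen += 1
        return (self.name, self.frames_seen, frame)


def switch_encoder(new_encoder, history, pending_frames):
    """Feed the history to the new encoder first, then encode new frames."""
    for frame in history:
        new_encoder.encode(frame)  # warm-up only; output dropped (assumption)
    return [new_encoder.encode(frame) for frame in pending_frames]


# Keep only the most recent frames of the first multimedia data.
history = deque(range(20), maxlen=history_capacity())
encoder = ToyEncoder("AAC_LC")
encoded = switch_encoder(encoder, history, ["a", "b"])
print(len(history), encoded[0])
```

A bounded `deque` keeps the history at a fixed size, so the cost of the warm-up at switch time stays constant.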

Further, please refer to fig. 4, which is a schematic structural diagram of a multimedia data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the multimedia data processing apparatus 1 is applicable to the anchor terminal 3000 in the embodiment corresponding to fig. 1, and the multimedia data processing apparatus 1 may include: an acquisition processing module 10, a record counting module 20, an algorithm switching module 30 and a notification module 40;

the acquisition processing module 10 is configured to acquire first multimedia data and perform data processing on the first multimedia data according to a current complexity algorithm;

the recording and counting module 20 is configured to record a multimedia processing duration corresponding to each data frame in the first multimedia data, determine a preset threshold range in which the multimedia processing duration corresponding to each data frame is located, and count the number of data frames in each preset threshold range;

the algorithm switching module 30 is configured to switch the current complexity algorithm to a target complexity algorithm if the number of data frames within each preset threshold range meets a preset switching condition;

the notifying module 40 is configured to notify the acquiring and processing module 10 to continue to perform multimedia processing on the acquired second multimedia data according to the target complexity algorithm.

For specific implementations of the acquisition processing module 10, the record counting module 20, the algorithm switching module 30, and the notification module 40, reference may be made to the description of step S101 to step S103 in the embodiment corresponding to fig. 2, which will not be repeated here.
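
As an illustration of the per-frame timing that the recording and counting module 20 relies on, the following sketch measures how long each frame takes to process. The function name `process_batch` and the injectable `clock` parameter are assumptions made for the example; the clock is injectable only so that the behaviour can be checked deterministically.

```python
import time


def process_batch(frames, process_frame, clock=time.perf_counter):
    """Process each frame and record its processing duration in ms."""
    results = []
    durations_ms = []
    for frame in frames:
        start = clock()
        results.append(process_frame(frame))
        durations_ms.append((clock() - start) * 1000.0)
    return results, durations_ms


# Deterministic demo with a fake clock that returns preset timestamps (seconds).
ticks = iter([0.0, 0.002, 0.010, 0.030])
demo_results, demo_durations = process_batch(
    [1, 2], lambda frame: frame * 2, clock=lambda: next(ticks)
)
print(demo_results, demo_durations)
```

The recorded durations can then be sorted into the preset threshold ranges and counted before the switching condition is evaluated.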

Further, please refer to fig. 5, which is a schematic structural diagram of another multimedia data processing apparatus according to an embodiment of the present invention. As shown in fig. 5, the multimedia data processing apparatus 1 can be applied to the anchor terminal 3000 in the embodiment corresponding to fig. 1, and the multimedia data processing apparatus 1 can include the acquisition processing module 10, the record counting module 20, the algorithm switching module 30 and the notification module 40 in the embodiment corresponding to fig. 4; further, the multimedia data processing apparatus 1 can also include: a total number obtaining module 50, a target range selecting module 60, a calculating module 70, a ratio judging module 80, a determining module 90 and a data frame extracting module 100;

the total number obtaining module 50 is configured to obtain the total number of data frames in the first multimedia data;

the target range selection module 60 is configured to select a preset threshold range with the largest number of data frames from all preset threshold ranges as a target threshold range;

the calculating module 70 is configured to calculate a ratio of the number of data frames in the target threshold range to the total number of data frames in the first multimedia data;

the ratio determining module 80 is configured to determine whether the ratio is greater than or equal to a preset ratio corresponding to the target threshold range;

the determining module 90 is configured to determine that the number of data frames in each preset threshold range meets a preset switching condition if the ratio is greater than or equal to the preset ratio;

the determining module 90 is further configured to determine that the number of data frames in each preset threshold range does not satisfy a preset switching condition if the ratio is smaller than the preset ratio.

For specific implementation manners of the total number obtaining module 50, the target range selecting module 60, the calculating module 70, the ratio determining module 80, and the determining module 90, reference may be made to the description of step S203 to step S209 in the embodiment corresponding to fig. 3, and details will not be further described here.

Further, please refer to fig. 6, which is a schematic structural diagram of an algorithm switching module according to an embodiment of the present invention. As shown in fig. 6, the algorithm switching module 30 includes: a target algorithm obtaining unit 301, an algorithm judging unit 302, an algorithm switching unit 303 and an algorithm retaining unit 304;

the target algorithm obtaining unit 301 is configured to obtain a target complexity algorithm corresponding to the target threshold range if the number of data frames in each preset threshold range meets a preset switching condition;

the algorithm determining unit 302 is configured to determine whether a target complexity algorithm is the same as the current complexity algorithm;

the algorithm switching unit 303 is configured to switch the current complexity algorithm to the target complexity algorithm if the decision is negative;

the algorithm retaining unit 304 is configured to retain the current complexity algorithm if the determination result is yes.

For specific implementation manners of the target algorithm obtaining unit 301, the algorithm determining unit 302, the algorithm switching unit 303, and the algorithm retaining unit 304, reference may be made to the description of step S207 in the embodiment corresponding to fig. 3, which will not be further described here.

Optionally, please refer to fig. 7, which is a schematic structural diagram of another algorithm switching module according to an embodiment of the present invention. As shown in fig. 7, the algorithm switching module 30 includes: a level obtaining unit 305, a difference calculating unit 306, a first determining and switching unit 307, a second determining and switching unit 308, a third determining and switching unit 309, and a determining and retaining unit 310;

the level obtaining unit 305 is configured to obtain a level of a target complexity algorithm corresponding to the target threshold range if the number of data frames in each preset threshold range meets a preset switching condition;

the difference calculating unit 306 is configured to calculate a difference between the level of the target complexity algorithm and the level of the current complexity algorithm;

the first determining and switching unit 307 is configured to switch the current complexity algorithm to the target complexity algorithm if the absolute value of the difference is a first preset difference;

the second determining and switching unit 308 is configured to determine, if the difference is greater than the first preset difference, a complexity algorithm that is greater than the level of the current complexity algorithm and is adjacent to the level as a new target complexity algorithm, and switch the current complexity algorithm to the new target complexity algorithm;

the third determining and switching unit 309 is configured to determine, if the difference is smaller than the second preset difference, a complexity algorithm that is smaller than the level of the current complexity algorithm and is adjacent to the level as a new target complexity algorithm, and switch the current complexity algorithm to the new target complexity algorithm;

the determining and retaining unit 310 is configured to determine to retain the current complexity algorithm if the difference is zero.

For specific implementations of the level obtaining unit 305, the difference calculating unit 306, the first determining and switching unit 307, the second determining and switching unit 308, the third determining and switching unit 309, and the determining and retaining unit 310, reference may be made to the description of step S207 in the embodiment corresponding to fig. 3, which will not be repeated here.

Further, please refer to fig. 8, which is a schematic structural diagram of an acquisition processing module according to an embodiment of the present invention. As shown in fig. 8, the acquisition processing module 10 includes: an acquisition judging unit 101, a first extracting unit 102, a data processing and encoding unit 103, a second extracting unit 104 and a data decoding processing unit 105;

the acquisition judging unit 101 is configured to acquire first multimedia data and judge whether the first multimedia data belongs to a near-end multimedia data type; the first multimedia data comprises target multimedia data and environmental noise data;

the first extracting unit 102 is configured to, if the determination result is yes, extract a near-end algorithm set associated with the near-end multimedia data type from the current complexity algorithm;

the data processing and encoding unit 103 is configured to filter the environmental noise data carried in the first multimedia data according to the near-end algorithm set to obtain multimedia data to be encoded corresponding to the target multimedia data, and perform multimedia encoding on the multimedia data to be encoded to obtain a multimedia processing result corresponding to the first multimedia data.

The second extracting unit 104 is configured to, if it is determined that the first multimedia data does not belong to a near-end multimedia data type, extract a far-end algorithm set associated with a far-end multimedia data type from the current complexity algorithm;

the data decoding processing unit 105 is configured to perform multimedia decoding on the first multimedia data according to the far-end algorithm set to obtain decoded multimedia data carrying a voice detection identifier, and process the decoded multimedia data according to the far-end algorithm set to obtain a multimedia processing result corresponding to the first multimedia data.

For specific implementations of the acquisition judging unit 101, the first extracting unit 102, the data processing and encoding unit 103, the second extracting unit 104, and the data decoding processing unit 105, reference may be made to the description of step S101 in the embodiment corresponding to fig. 2, which will not be repeated here.
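
The near-end/far-end branching performed by units 101 to 105 can be pictured as selecting one of two stage lists and running the data through it. The sketch below is purely illustrative: the dictionary layout, the function name `run_current_algorithm`, and the string-based toy stages are hypothetical stand-ins for real noise filtering, encoding, decoding and post-processing.

```python
def run_current_algorithm(data, is_near_end, algorithm):
    """Run data through the near-end or far-end algorithm set.

    `algorithm` is assumed to be a dict with "near_end" and "far_end"
    lists of callables, applied in order.
    """
    stages = algorithm["near_end"] if is_near_end else algorithm["far_end"]
    for stage in stages:
        data = stage(data)
    return data


# Toy stages operating on strings instead of audio samples.
toy_algorithm = {
    # near end: filter environmental noise, then encode
    "near_end": [lambda d: d.replace("+noise", ""), lambda d: "enc(" + d + ")"],
    # far end: decode, then post-process the decoded data
    "far_end": [lambda d: d[4:-1], lambda d: d.upper()],
}

print(run_current_algorithm("voice+noise", True, toy_algorithm))
print(run_current_algorithm("enc(voice)", False, toy_algorithm))
```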

Further, please refer to fig. 9, which is a schematic structural diagram of a data processing encoding unit according to an embodiment of the present invention. As shown in fig. 9, the data processing encoding unit 103 includes: a first splitting subunit 1031, a first processing subunit 1032, a first encoding subunit 1033, a second splitting subunit 1034, a second processing subunit 1035, a second encoding subunit 1036, a third splitting subunit 1037, a third processing subunit 1038, and a third encoding subunit 1039;

the first splitting subunit 1031 is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm to obtain high-frequency component data and low-frequency component data in the first multimedia data;

the first processing subunit 1032 is configured to process the high-frequency component data and the low-frequency component data in the first multimedia data according to a high-complexity echo cancellation algorithm, a noise reduction algorithm, a high-complexity speech detection algorithm, a high-complexity automatic gain control algorithm, and an up-sampling algorithm, so as to obtain to-be-encoded multimedia data corresponding to the target multimedia data;

the first encoding subunit 1033 is configured to perform multimedia encoding on the multimedia data to be encoded according to a single-channel multimedia encoding algorithm, so as to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the second splitting subunit 1034 is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm, so as to obtain high-frequency component data and low-frequency component data in the first multimedia data;

the second processing subunit 1035 is configured to process the high-frequency component data and the low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity speech detection algorithm, and an up-sampling algorithm, so as to obtain multimedia data to be encoded corresponding to the target multimedia data;

the second encoding subunit 1036 is configured to perform multimedia encoding on the multimedia data to be encoded according to a single-channel multimedia encoding algorithm, so as to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the third splitting subunit 1037 is configured to perform frequency band splitting on the first multimedia data according to a down-sampling algorithm, and set zero to high-frequency component data in the first multimedia data, so as to obtain low-frequency component data in the first multimedia data;

the third processing subunit 1038 is configured to process the low-frequency component data in the first multimedia data according to a low-complexity echo cancellation algorithm, a low-complexity noise reduction algorithm, a low-complexity speech detection algorithm, and an up-sampling algorithm, so as to obtain to-be-encoded multimedia data corresponding to the target multimedia data;

the third encoding subunit 1039 is configured to perform multimedia encoding on the multimedia data to be encoded according to a low-complexity multimedia encoding algorithm, so as to obtain a multimedia processing result corresponding to the first multimedia data.

For specific implementation manners of the first splitting sub-unit 1031, the first processing sub-unit 1032, the first encoding sub-unit 1033, the second splitting sub-unit 1034, the second processing sub-unit 1035, the second encoding sub-unit 1036, the third splitting sub-unit 1037, the third processing sub-unit 1038, and the third encoding sub-unit 1039, reference may be made to the description of the three complexity algorithms in the near-end algorithm set in the embodiment corresponding to fig. 2, and further description will not be repeated here.
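
The three near-end variants differ only in which bands are kept, which processing stages run, and which encoder is used, so they can be summarized as a lookup table. In this sketch the stage names are hypothetical labels for the algorithms listed above, and assigning the first, second and third groups of subunits to the high fidelity, medium complexity and low complexity algorithms is an assumption based on the order in which they are described.

```python
NEAR_END_PIPELINES = {
    "high_fidelity": {
        "bands": ("low", "high"),
        "stages": ["echo_cancel_high", "noise_reduction",
                   "voice_detect_high", "agc_high", "upsample"],
        "encoder": "single_channel",
    },
    "medium": {
        "bands": ("low", "high"),
        "stages": ["echo_cancel_low", "noise_reduction_low",
                   "voice_detect_low", "upsample"],
        "encoder": "single_channel",
    },
    "low": {
        "bands": ("low",),  # high-frequency component is set to zero
        "stages": ["echo_cancel_low", "noise_reduction_low",
                   "voice_detect_low", "upsample"],
        "encoder": "low_complexity",
    },
}


def describe(level):
    """Return the ordered list of steps for one complexity level."""
    pipeline = NEAR_END_PIPELINES[level]
    return ["band_split"] + pipeline["stages"] + ["encode:" + pipeline["encoder"]]


print(describe("low"))
```

Keeping the variants in one table makes a switch between complexity algorithms a change of key rather than a change of code path.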

Further, please refer to fig. 10, which is a schematic structural diagram of a data decoding processing unit according to an embodiment of the present invention. As shown in fig. 10, the data decoding processing unit 105 includes: a first decoding sub-unit 1051, a first result obtaining sub-unit 1052, a second decoding sub-unit 1053, a second result obtaining sub-unit 1054, a third decoding sub-unit 1055, and a third result obtaining sub-unit 1056;

the first decoding subunit 1051 is configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set, so as to obtain decoded multimedia data carrying a voice detection identifier;

the first result obtaining subunit 1052 is configured to process the decoded multimedia data according to a down-sampling algorithm, a howling suppression algorithm, a high-complexity noise reduction algorithm, a high-complexity voice detection algorithm, and an up-sampling algorithm carried in the far-end algorithm set, so as to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the second decoding subunit 1053 is configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set, so as to obtain decoded multimedia data carrying a voice detection identifier;

the second result obtaining subunit 1054 is configured to process the decoded multimedia data according to a down-sampling algorithm, a low-complexity noise reduction algorithm, a low-complexity voice detection algorithm, and an up-sampling algorithm carried in the far-end algorithm set, so as to obtain a multimedia processing result corresponding to the first multimedia data.

Optionally, the third decoding subunit 1055 is configured to perform multimedia decoding on the first multimedia data according to a decoding algorithm in the far-end algorithm set, so as to obtain decoded multimedia data carrying a voice detection identifier;

the third result obtaining subunit 1056 is configured to extract a voice detection identifier carried in the decoded multimedia data, and process the decoded multimedia data according to the voice detection identifier to obtain a multimedia processing result corresponding to the first multimedia data.

For specific implementations of the first decoding subunit 1051, the first result obtaining subunit 1052, the second decoding subunit 1053, the second result obtaining subunit 1054, the third decoding subunit 1055, and the third result obtaining subunit 1056, reference may be made to the description of the three complexity algorithms in the far-end algorithm set in the embodiment corresponding to fig. 2, which will not be repeated here.
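
The third far-end variant avoids running voice detection locally by reusing the voice detection identifier carried in the decoded data. The text does not say what processing is driven by the identifier, so the sketch below makes a loud assumption: frames flagged as non-speech are replaced with silence. The `DecodedFrame` type and the function name are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedFrame:
    samples: List[int]
    is_speech: bool  # voice detection identifier carried with the frame


def low_complexity_far_end(frames):
    """Process decoded frames using only the carried identifier."""
    output = []
    for frame in frames:
        if frame.is_speech:
            output.append(list(frame.samples))
        else:
            # Hypothetical action: silence frames flagged as non-speech.
            output.append([0] * len(frame.samples))
    return output


demo_frames = [DecodedFrame([1, 2], True), DecodedFrame([3, 4], False)]
print(low_complexity_far_end(demo_frames))
```

Because no detection algorithm runs on the receiving side, the per-frame cost of this path is limited to reading one flag.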

Optionally, the data frame extracting module 100 is configured to extract a preset number of data frames from a multimedia processing result corresponding to the first multimedia data, where the data frames are used as historical data frames;

the notifying module 40 is specifically configured to notify the collecting and processing module 10 to process the second multimedia data according to the target complexity algorithm, so as to obtain to-be-encoded multimedia data corresponding to the second multimedia data; the multimedia coding algorithm in the target complexity algorithm is different from the multimedia coding algorithm in the current complexity algorithm;

the notifying module 40 is further specifically configured to notify the collecting and processing module 10 to encode the historical data frame and the to-be-encoded multimedia data corresponding to the second multimedia data according to a multimedia encoding algorithm in the target complexity algorithm, so as to obtain a multimedia processing result corresponding to the second multimedia data; the multimedia coding algorithm is a low-complexity multimedia coding algorithm or a single-channel multimedia coding algorithm.

Further, please refer to fig. 11, which is a schematic structural diagram of another multimedia data processing apparatus according to an embodiment of the present invention. As shown in fig. 11, the multimedia data processing apparatus 1000 may be applied to the anchor terminal 3000 in the embodiment corresponding to fig. 1, and the multimedia data processing apparatus 1000 may include: a processor 1001, a network interface 1004, and a memory 1004; the multimedia data processing apparatus 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display and a keyboard; optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1004 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1004 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 11, the memory 1004, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.

In the multimedia data processing apparatus 1000 shown in fig. 11, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be used to invoke a device control application stored in the memory 1004 to implement:

It should be understood that the multimedia data processing apparatus 1000 described in the embodiment of the present invention may carry out the multimedia data processing method described in the embodiment corresponding to fig. 2 or fig. 3, and may also implement the multimedia data processing apparatus described in the embodiment corresponding to fig. 4, which is not repeated here. The beneficial effects of using the same method are likewise not repeated.

Further, it should be noted that an embodiment of the present invention also provides a computer storage medium. The computer storage medium stores the aforementioned computer program executed by the multimedia data processing apparatus 1, and the computer program includes program instructions. When the processor executes the program instructions, the multimedia data processing method described in the embodiment corresponding to fig. 2 or fig. 3 can be carried out, so details are not repeated here. The beneficial effects of using the same method are likewise not repeated. For technical details not disclosed in the embodiments of the computer storage medium according to the present invention, reference is made to the description of the method embodiments of the present invention.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure merely illustrates preferred embodiments of the present invention and is not intended to limit the scope of the appended claims.

Claims

1. A method for processing multimedia data, comprising:

if the number of the data frames in each preset threshold range meets a preset switching condition, switching the current complexity algorithm into a target complexity algorithm so as to facilitate the follow-up multimedia processing on the acquired second multimedia data according to the target complexity algorithm; the target complexity algorithm is matched with the processing capacity of the adjusted CPU (processor) corresponding to the second multimedia data, and the processing time consumption of the first multimedia data and the processing time consumption of the second multimedia data are consistent.

2. The method according to claim 1, wherein before the step of switching the current complexity algorithm to the target complexity algorithm if the number of data frames in each of the predetermined threshold ranges satisfies a predetermined switching condition, the method further comprises:

acquiring the total number of data frames in the first multimedia data;

3. The method according to claim 2, wherein if the number of data frames in each preset threshold range satisfies a preset switching condition, switching the current complexity algorithm to the target complexity algorithm comprises:

if the judgment result is yes, the current complexity algorithm is reserved.

4. The method according to claim 2, wherein if the number of data frames in each preset threshold range satisfies a preset switching condition, switching the current complexity algorithm to the target complexity algorithm comprises:

if the difference is smaller than a second preset difference, determining the complexity algorithm which is smaller than the level of the current complexity algorithm and adjacent to the level as a new target complexity algorithm, and switching the current complexity algorithm into the new target complexity algorithm;

5. The method of claim 1, wherein collecting first multimedia data and multimedia processing the first multimedia data according to a current complexity algorithm comprises:

6. The method of claim 5, wherein the current complexity algorithm is a preset high fidelity algorithm;

7. The method of claim 5, wherein the current complexity algorithm is a preset medium complexity algorithm;

8. The method of claim 5, wherein the current complexity algorithm is a preset low complexity algorithm;

9. The method of claim 5, further comprising:

10. The method of claim 9, wherein the current complexity algorithm is a preset high fidelity algorithm;

11. The method of claim 9, wherein the current complexity algorithm is a preset medium complexity algorithm;

12. The method of claim 9, wherein the current complexity algorithm is a preset low complexity algorithm;

13. The method of claim 5, further comprising, after the switching the current complexity algorithm to a target complexity algorithm:

14. A multimedia data processing apparatus, comprising:

the notification module is used for notifying the acquisition processing module to continue to perform multimedia processing on the acquired second multimedia data according to the target complexity algorithm; the target complexity algorithm is matched with the processing capacity of the adjusted CPU (processor) corresponding to the second multimedia data, and the processing time consumption of the first multimedia data and the processing time consumption of the second multimedia data are consistent.

15. The apparatus of claim 14, further comprising:

16. The apparatus of claim 15, wherein the algorithm switching module comprises:

17. The apparatus of claim 15, wherein the algorithm switching module comprises:

a third determining and switching unit, configured to determine, if the difference is smaller than a second preset difference, a complexity algorithm that is smaller than the level of the current complexity algorithm and is adjacent to the level as a new target complexity algorithm, and switch the current complexity algorithm to the new target complexity algorithm;

18. The apparatus of claim 14, wherein the acquisition processing module comprises:

an acquisition judging unit, configured to acquire first multimedia data and judge whether the first multimedia data belongs to a near-end multimedia data type; the first multimedia data comprises target multimedia data and environmental noise data;

19. The apparatus of claim 18, wherein the current complexity algorithm is a preset high fidelity algorithm;

the data processing encoding unit includes:

20. The apparatus of claim 18, wherein the current complexity algorithm is a preset medium complexity algorithm;

the data processing encoding unit includes:

21. The apparatus of claim 18, wherein the current complexity algorithm is a preset low complexity algorithm;

the data processing encoding unit includes:

22. The apparatus of claim 18, wherein the acquisition processing module further comprises:

23. The apparatus of claim 22, wherein the current complexity algorithm is a preset high fidelity algorithm;

the data decoding processing unit includes:

24. The apparatus of claim 22, wherein the current complexity algorithm is a preset medium complexity algorithm;

the data decoding processing unit includes:

25. The apparatus of claim 22, wherein the current complexity algorithm is a preset low complexity algorithm;

the data decoding processing unit includes:

26. The apparatus of claim 18, further comprising:

27. A multimedia data processing apparatus, comprising: a processor, a memory, a network interface;

the processor is connected to the memory and the network interface, wherein the network interface is used for receiving and transmitting multimedia data, the memory is used for storing program code, and the processor is used for calling the program code to perform the method according to any one of claims 1 to 13.

28. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by the processor, perform the method according to any one of claims 1-13.