CN108347529B - Audio playing method and mobile terminal

Info

Publication number: CN108347529B
Application number: CN201810096473.4A
Authority: CN
Inventors: 张立来
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2018-01-31
Filing date: 2018-01-31
Publication date: 2021-02-23
Anticipated expiration: 2038-01-31
Also published as: CN108347529A

Abstract

The invention discloses an audio playing method and a mobile terminal. The method includes: acquiring recording data that is collected by a microphone and cached in a recording thread, and acquiring accompanying sound data sent by a target application to a mixer; mixing the recording data and the accompanying sound data; and sending the mixed playing data to a playing thread. Because the recording data collected by the microphone is transmitted directly to the mixer through the recording thread, its transmission time is reduced, the delay of audio playback is shortened, and the user experience is improved.

Description

Audio playing method and mobile terminal

Technical Field

The present invention relates to the field of audio processing, and in particular, to an audio playing method and a mobile terminal.

Background

As living standards improve, people's love of music keeps growing. With the rise of large music events such as "Chinese Red Singer" and "Chinese Good Voice", more people have moved from passively listening to music to actively singing, and many have begun to publish their own singing on social platforms, so karaoke software has become increasingly popular.

In the prior art, recording data captured by a microphone must pass in sequence through the recording thread and the recording thread of the karaoke software before it reaches the playing thread of the karaoke software. In that playing thread the recording data is mixed with the accompanying sound data to obtain mixed audio data, which is then transmitted to the loudspeaker for playing.

Because the recording data undergoes a series of processing stages on its way from the microphone to the loudspeaker, such as the virtual machine, inter-process communication, and resampling, each of these stages adds to the delay between recording a sound and playing it. Moreover, the processors and flash memory of some mobile phones are relatively dated, so the delay problem is more serious on those phones. In prior-art mobile phones, the shortest delay is about 70 ms, and the worst delay can exceed 300 ms. However, the smallest gap between sounds that the human ear can distinguish is 20 ms, so a delay this long makes the recorded voice seriously out of step with the accompaniment and results in a very poor experience.

In summary, in the prior art, the recording data is transmitted to the playing thread of the karaoke software through a plurality of processes, which results in a long delay time of audio playing.

Disclosure of Invention

The invention provides an audio playing method and a mobile terminal, which aim to solve the problem of the long audio playing delay during karaoke singing.

In a first aspect, an embodiment of the present invention provides an audio playing method applied to a mobile terminal, including:

acquiring recording data that is collected by a microphone and cached in a recording thread, and acquiring accompanying sound data sent to a mixer by a target application program;

mixing the recording data and the accompanying sound data;

and sending the play data after the sound mixing processing to a play thread.

In a second aspect, an embodiment of the present invention further provides a mobile terminal, including:

the acquisition module is used for acquiring the recording data which is cached in the recording thread and is acquired by the microphone and acquiring the accompanying sound data which is sent to the playing thread by the target application program;

the processing module is used for carrying out sound mixing processing on the recording data and the accompanying sound data;

and the first sending module is used for sending the play data after the sound mixing processing to a play thread.

In a third aspect, an embodiment of the present invention further provides a mobile terminal, including a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the audio playing method as described above.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program implements the steps of the audio playing method as described above.

Therefore, in the embodiment of the invention, the recording data acquired by the microphone is directly transmitted to the audio mixer through the recording thread, so that the transmission time of the recording data acquired by the microphone is reduced, the delay time of audio data playing is shortened, and the user experience is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.

FIG. 1 shows one of the flow charts of an embodiment of the audio playback method of the present invention;

FIG. 2 is a schematic diagram of mixing the recording data and the accompanying sound data to obtain playing data according to an embodiment of the present invention;

FIG. 3 is a second flowchart of an audio playing method according to an embodiment of the present invention;

FIG. 4 shows one of the flow charts of an embodiment of the mobile terminal of the present invention;

FIG. 5 is a second flowchart of an embodiment of the mobile terminal of the present invention;

FIG. 6 shows a block diagram of an embodiment of a mobile terminal of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1, an embodiment of the present invention provides an audio playing method applied to a mobile terminal, including:

Step 101, acquiring recording data that is collected by a microphone and cached in a recording thread, and acquiring accompanying sound data sent to a mixer by a target application program.

Specifically, in the step of acquiring the recording data acquired by the microphone cached in the recording thread, the recording data acquired by the microphone is sequentially subjected to signal conversion and encoding processing in the recording thread.

The recording data collected by the microphone is an analog signal. After the microphone collects the analog signal, the microphone transmits the analog signal to a codec (coder/decoder) for signal conversion and encoding, and after the codec converts the analog signal into a digital signal, the codec encodes the digital recording data according to a specific audio file format.

The digital audio data encoded by the codec is transmitted to the audio mixer of the mobile terminal, passing in sequence through the ALSA driver (Advanced Linux Sound Architecture, the sound card driver) and tinyalsa, the adaptation layer of the sound card driver.

After receiving the digital audio data, the audio mixer of the mobile terminal encapsulates it into a recording audio encapsulation packet. Specifically, the recording audio encapsulation packet is the unified encapsulation packet ("audio replay") that the native layer of the Android system uses for played audio data.

In the embodiment of the invention, the target application program divides the accompaniment of a song into three parts in sequence: the prelude accompanying sound, the main-song accompanying sound, and the ending accompanying sound. For the main-song accompanying sound, the target application program sends the accompanying sound data corresponding to each line of lyrics.

Step 102, mixing the recording data and the accompanying sound data.

Specifically, the step of mixing the recording data and the accompanying sound data includes:

the mixer acquires the audio frames of the recording data and of the accompanying sound data together with the time stamp of each audio frame; it superimposes the audio frames that have the same time stamp in the recording data and the accompanying sound data to obtain synthesized audio frames, each of which keeps the original time stamp; and it generates the playing data by arranging, in order of time stamp from front to back, the synthesized audio frames, the audio frames of the recording data that have no counterpart in the accompanying sound data, and the audio frames of the accompanying sound data that have no counterpart in the recording data.

Fig. 2 is a schematic diagram of mixing the recording data and the accompanying sound data to obtain playing data according to an embodiment of the present invention. As shown in fig. 2, the recording data 20 and the accompanying sound data 30 each include a plurality of audio frames, and each audio frame has its own time stamp. If audio frames with the same time stamp exist in the recording data 20 and the accompanying sound data 30, the mobile terminal superimposes them: for example, it superimposes the audio frames a1 and b1, which have the same time stamp, to obtain a synthesized audio frame c1; superimposes the audio frames a3 and b3 to obtain a synthesized audio frame c2; and superimposes the audio frames a4 and b6 to obtain a synthesized audio frame c3. Finally, the audio frames of the recording data 20 and the accompanying sound data 30 that were not superimposed, together with the synthesized audio frames, are arranged from front to back by time stamp to generate the playing data 40.
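The time-stamp-based mixing described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the `AudioFrame` type, the millisecond time stamps, the 16-bit sample range, and the choice of sample-wise addition with clipping as the "superimposing" operation are all assumptions, since the description does not specify them. It also assumes that time stamps are unique within each stream.

```python
from dataclasses import dataclass
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit PCM sample range


@dataclass
class AudioFrame:
    timestamp: int      # hypothetical unit: milliseconds
    samples: List[int]  # assumed signed 16-bit PCM samples


def superimpose(a: AudioFrame, b: AudioFrame) -> AudioFrame:
    """Add two frames sample by sample, clipping to the 16-bit range."""
    mixed = [max(INT16_MIN, min(INT16_MAX, x + y))
             for x, y in zip(a.samples, b.samples)]
    # The synthesized frame keeps the original time stamp.
    return AudioFrame(a.timestamp, mixed)


def mix_by_timestamp(recording: List[AudioFrame],
                     accompaniment: List[AudioFrame]) -> List[AudioFrame]:
    """Superimpose frames with equal time stamps and keep all other frames."""
    merged = {frame.timestamp: frame for frame in recording}
    for frame in accompaniment:
        if frame.timestamp in merged:
            merged[frame.timestamp] = superimpose(merged[frame.timestamp], frame)
        else:
            merged[frame.timestamp] = frame
    # Playing data: every frame, ordered by time stamp from front to back.
    return [merged[ts] for ts in sorted(merged)]
```

Frames that exist in only one of the two streams pass through unchanged, which mirrors the description's statement that non-superimposed frames are also placed into the playing data.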

Step 103, sending the play data after the sound mixing processing to a play thread.

Specifically, in the step of sending the play data after the audio mixing process to the play thread, the play data is subjected to decoding process and signal conversion in the play thread.

After receiving the playing data, the playing thread decodes it using the decoding mode that corresponds to the encoding mode adopted by the recording thread. For example, when the recording thread encodes the recording data using PCM encoding, the playing thread decodes the playing data using PCM decoding.
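The requirement that the playing thread use the decoding mode matching the recording thread's encoding mode can be illustrated with a small round trip. This is a sketch under stated assumptions: the description names PCM only as an example, and the 16-bit little-endian layout, the `CODECS` table, and the function names here are hypothetical.

```python
import struct
from typing import Callable, Dict, List, Tuple


def encode_pcm16(samples: List[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian bytes (assumed layout)."""
    return struct.pack("<%dh" % len(samples), *samples)


def decode_pcm16(data: bytes) -> List[int]:
    """Unpack little-endian bytes back into signed 16-bit samples."""
    return list(struct.unpack("<%dh" % (len(data) // 2), data))


# Hypothetical table pairing each encoding mode with its matching decoder.
CODECS: Dict[str, Tuple[Callable[[List[int]], bytes],
                        Callable[[bytes], List[int]]]] = {
    "pcm16": (encode_pcm16, decode_pcm16),
}


def record_side_encode(mode: str, samples: List[int]) -> bytes:
    encoder, _ = CODECS[mode]
    return encoder(samples)


def play_side_decode(mode: str, data: bytes) -> List[int]:
    # The playing side looks up the decoder for the same mode name.
    _, decoder = CODECS[mode]
    return decoder(data)
```

Keeping the encoder and decoder in one table entry is one simple way to make sure both threads agree on the format.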

According to the audio playing method provided by the embodiment of the invention, the recording data acquired by the microphone is directly transmitted to the audio mixer through the recording thread, so that the transmission time of the recording data acquired by the microphone is reduced, the delay time of audio data playing is shortened, and the user experience is improved.

Further, in an embodiment of the present invention, referring to fig. 3, before the step of acquiring the accompanying sound data sent by the target application program to the playing thread, the method further includes:

Step 104, receiving an accompanying sound playing request sent by the target application program.

The accompanying sound playing request is a request by which the target application program asks the mixer for an encapsulation space for storing the accompanying sound data.

Since the target application program divides the accompaniment data of a song into three parts (prelude, main song, and ending), the accompanying sound playing requests corresponding to the three parts differ. The target application program first divides the accompanying sound data of the song into these three major parts, and then subdivides the main-song part into the accompanying sound data corresponding to each line of lyrics. It packages each piece of accompanying sound data into a separate Java audio data encapsulation packet. Before the accompanying sound data is sent to the mixer, each Java audio data encapsulation packet needs to be converted into a C/C++ audio data encapsulation packet.
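The segmentation performed by the target application program can be sketched as below. The description does not say how segment boundaries are determined, so this sketch assumes the application knows the start time of each lyric line and the start time of the ending, and that each lyric segment runs until the next one begins. The `Segment` type and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    kind: str        # "prelude", "lyric", or "ending"
    start_ms: int
    end_ms: int
    is_lyric: bool   # would become the identification information in the request


def split_accompaniment(line_starts_ms: List[int],
                        ending_start_ms: int,
                        total_ms: int) -> List[Segment]:
    """Split a song's accompaniment into prelude, per-line, and ending segments."""
    # Prelude: everything before the first lyric line.
    segments = [Segment("prelude", 0, line_starts_ms[0], False)]
    # Main song: one segment per lyric line, ending where the next line starts.
    bounds = line_starts_ms + [ending_start_ms]
    for start, end in zip(bounds, bounds[1:]):
        segments.append(Segment("lyric", start, end, True))
    # Ending: everything after the last lyric line.
    segments.append(Segment("ending", ending_start_ms, total_ms, False))
    return segments
```

For example, `split_accompaniment([5000, 9000, 14000], 20000, 25000)` yields one prelude segment, three lyric segments, and one ending segment.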

Step 105, judging whether the accompanying sound playing request carries identification information for indicating that the accompanying sound data is lyric accompanying sound data.

Before sending the accompanying sound playing request, the target application program checks whether the accompanying sound data to be sent is lyric accompanying sound data. If so, it adds identification information to the accompanying sound playing request, so that the mixer can determine that the accompanying sound data to be sent is lyric accompanying sound data.

Step 106, if the accompanying sound playing request carries the identification information for indicating that the accompanying sound data is lyric accompanying sound data, sending to the target application program first indication information that instructs it to send the accompanying sound data after a delay of a predetermined time.

Specifically, the predetermined time is a duration input by the user.

After receiving the first indication information, the target application program waits for the predetermined time and then sends the accompanying sound data to the mixer.

In this process, because the target application program delays sending the accompanying sound data, the playing thread outputs no accompanying sound data during the predetermined time. The user can use this time to review the line of lyrics that was just recorded and check whether its pronunciation and tune are correct. The user also does not need to manually input a pause instruction, which improves the user experience.

Further, in an embodiment of the present invention, the method further includes:

Step 107, if the accompanying sound playing request does not carry the identification information for indicating that the accompanying sound data is lyric accompanying sound data, sending to the target application program second indication information that instructs it to send the accompanying sound data immediately.

If the accompanying sound playing request does not carry the identification information, the accompanying sound data that the target application program is about to send to the mixer is prelude data or ending data. The song accompaniment therefore does not need to be paused for these parts, which shortens the playing time of the accompaniment.
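The decision made in steps 104 to 107 can be summarized in a short sketch. The names `PlayRequest`, `Indication`, `Reply`, and `handle_play_request` are hypothetical, and modeling the identification information as a boolean field is an assumption; the description only states that the request may or may not carry it.

```python
from dataclasses import dataclass
from enum import Enum


class Indication(Enum):
    SEND_AFTER_DELAY = 1   # the "first indication information"
    SEND_IMMEDIATELY = 2   # the "second indication information"


@dataclass
class PlayRequest:
    # Assumed representation of the identification information:
    # True when the data to be sent is lyric accompanying sound data.
    is_lyric_accompaniment: bool


@dataclass
class Reply:
    indication: Indication
    delay_seconds: float


def handle_play_request(request: PlayRequest, user_delay_seconds: float) -> Reply:
    """Mixer-side handling: step 105 checks the flag, steps 106/107 reply."""
    if request.is_lyric_accompaniment:
        # Step 106: ask the application to wait for the user-chosen time.
        return Reply(Indication.SEND_AFTER_DELAY, user_delay_seconds)
    # Step 107: prelude or ending data is sent without any pause.
    return Reply(Indication.SEND_IMMEDIATELY, 0.0)
```

The delay value comes from the user, as stated for step 106, so the sketch takes it as a parameter rather than hard-coding it.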

In the embodiment of the invention, delayed playing of the karaoke accompaniment can be provided as a special playing mode. If the user does not enable this playing mode, the target application program outputs the accompanying sound data continuously, as in the existing method; only when the user enables the playing mode does the target application program output the accompanying sound data in the manner of steps 104 to 107.
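The effect of the special playing mode on timing can be illustrated with a small calculation. This sketch assumes each segment is described by its duration and a flag saying whether it is lyric accompanying sound data, and that the predetermined delay is inserted before every lyric segment when the mode is enabled; the function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple


def send_times_ms(segments: List[Tuple[int, bool]],
                  delay_ms: int,
                  delay_mode_on: bool) -> List[int]:
    """Return the time at which each segment is sent to the mixer.

    segments: (duration_ms, is_lyric) pairs in playing order.
    """
    clock = 0
    times = []
    for duration_ms, is_lyric in segments:
        if delay_mode_on and is_lyric:
            # Steps 104 to 107: lyric segments wait for the predetermined time.
            clock += delay_ms
        times.append(clock)
        clock += duration_ms
    return times


song = [(5000, False), (4000, True), (5000, True), (3000, False)]
continuous = send_times_ms(song, 2000, delay_mode_on=False)
delayed = send_times_ms(song, 2000, delay_mode_on=True)
```

With the mode disabled the segments follow one another continuously; with it enabled, each lyric segment starts later by the accumulated delays, while the prelude and ending add no pause of their own.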

With the audio playing method provided by the embodiment of the invention, the audio playing delay during karaoke singing is shortened. In addition, the accompanying sound data of the sung part can be automatically delayed by the time set by the user, so that the user can promptly and carefully review what was just sung without manually inputting a pause instruction, which improves the user experience.

Referring to fig. 4, according to another aspect of the present invention, an embodiment of the present invention further provides a mobile terminal 200, including:

an obtaining module 201, configured to obtain recording data that is collected by a microphone and cached in a recording thread, and to obtain accompanying sound data sent by a target application to an audio mixer;

the processing module 202 is configured to perform sound mixing processing on the recording data and the accompanying sound data;

and the first sending module 203 is configured to send the play data after the audio mixing process to a play thread.

Preferably, in the step of acquiring the recording data collected by the microphone cached in the recording thread, the recording data collected by the microphone is sequentially subjected to signal conversion and encoding processing in the recording thread.

Preferably, in the step of sending the mixed playing data to a playing thread, the playing data is decoded and signal-converted in the playing thread.

Preferably, referring to fig. 5, the mobile terminal 200 further includes:

a receiving module 204, configured to receive an accompanying sound playing request sent by a target application;

a judging module 205, configured to judge whether the accompanying sound playing request carries identification information for indicating that the accompanying sound data is lyric accompanying sound data;

a second sending module 206, configured to send, to the target application program, first indication information for sending the accompanying sound data after a predetermined time delay if the identification information for indicating that the accompanying sound data is the lyric accompanying sound data is carried.

Preferably, referring to fig. 5, the mobile terminal 200 further includes:

a third sending module 207, configured to send a second indication message for immediately sending the accompanying sound data to the target application program if the accompanying sound playing request does not carry the identification information for indicating that the accompanying sound data is the lyric accompanying sound data.

The mobile terminal provided in the embodiment of the present invention can implement each process implemented by the mobile terminal in the method embodiments of fig. 1 to fig. 3, and is not described herein again to avoid repetition. Its advantages are that the audio playing delay during karaoke singing is shortened and that the accompanying sound data of the sung part can be automatically delayed by the time set by the user, so that the user can promptly and carefully review what was just sung without manually inputting a pause instruction, which improves the user experience.

Fig. 6 is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present invention, where the mobile terminal 300 includes but is not limited to: radio frequency unit 301, network module 302, audio output unit 303, input unit 304, sensor 305, display unit 306, user input unit 307, interface unit 308, memory 309, processor 310, and power supply 311. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 6 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.

Wherein, the rf unit 301 is configured to transmit and receive signals under the control of the processor 310;

the processor 310 is configured to obtain recording data that is collected by the microphone and cached in the recording thread, and to obtain accompanying sound data sent to the mixer by the target application program; mix the recording data and the accompanying sound data; and send the play data after the sound mixing processing to a play thread.

It should be understood that, in the embodiment of the present invention, the radio frequency unit 301 may be used for receiving and sending signals while messages are being sent and received or during a call. Specifically, it receives downlink data from a base station and passes the data to the processor 310 for processing; in addition, it transmits uplink data to the base station. In general, the radio frequency unit 301 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 301 can also communicate with a network and other devices through a wireless communication system.

The mobile terminal provides the user with wireless broadband internet access through the network module 302, such as helping the user send and receive e-mails, browse webpages, access streaming media, and the like.

The audio output unit 303 may convert audio data received by the radio frequency unit 301 or the network module 302 or stored in the memory 309 into an audio signal and output as sound. Also, the audio output unit 303 may also provide audio output related to a specific function performed by the mobile terminal 300 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 303 includes a speaker, a buzzer, a receiver, and the like.

The input unit 304 is used to receive audio or video signals. The input unit 304 may include a Graphics Processing Unit (GPU) 3041 and a microphone 3042. The graphics processor 3041 processes image data of a still picture or video obtained by an image capturing apparatus (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 306. The image frames processed by the graphics processor 3041 may be stored in the memory 309 (or other storage medium) or transmitted via the radio frequency unit 301 or the network module 302. The microphone 3042 may receive sounds and may be capable of processing such sounds into audio data. In the phone call mode, the processed audio data may be converted into a format that can be transmitted to a mobile communication base station via the radio frequency unit 301 and then output.

The mobile terminal 300 also includes at least one sensor 305, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that adjusts the brightness of the display panel 3061 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 3061 and/or a backlight when the mobile terminal 300 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 305 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.

The display unit 306 is used to display information input by the user or information provided to the user. The display unit 306 may include a display panel 3061, and the display panel 3061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.

The user input unit 307 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 307 includes a touch panel 3071 and other input devices 3072. The touch panel 3071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 3071 (e.g., operations by a user on or near the touch panel 3071 using a finger, a stylus, or any suitable object or attachment). The touch panel 3071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 310, and receives and executes commands sent by the processor 310. In addition, the touch panel 3071 may be implemented using various types, such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 307 may include other input devices 3072 in addition to the touch panel 3071. Specifically, the other input devices 3072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described herein.

Further, the touch panel 3071 may be overlaid on the display panel 3061, and when the touch panel 3071 detects a touch operation on or near the touch panel, the touch operation is transmitted to the processor 310 to determine the type of the touch event, and then the processor 310 provides a corresponding visual output on the display panel 3061 according to the type of the touch event. Although the touch panel 3071 and the display panel 3061 are shown as two separate components in fig. 6 to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 3071 and the display panel 3061 may be integrated to implement the input and output functions of the mobile terminal, which is not limited herein.

The interface unit 308 is an interface through which an external device is connected to the mobile terminal 300. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 308 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 300 or may be used to transmit data between the mobile terminal 300 and external devices.

The memory 309 may be used to store software programs as well as various data. The memory 309 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 309 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The processor 310 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 309 and calling data stored in the memory 309, thereby performing overall monitoring of the mobile terminal. Processor 310 may include one or more processing units; preferably, the processor 310 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 310.

The mobile terminal 300 may further include a power supply 311 (such as a battery) for supplying power to various components, and preferably, the power supply 311 may be logically connected to the processor 310 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.

In addition, the mobile terminal 300 includes some functional modules that are not shown, and thus, the detailed description thereof is omitted.

Preferably, an embodiment of the present invention further provides a mobile terminal, which includes a processor 310, a memory 309, and a computer program stored in the memory 309 and capable of running on the processor 310, where the computer program is executed by the processor 310 to implement each process of the above-mentioned audio playing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned audio playing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

While the preferred embodiments of the present invention have been described, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the following claims.

Claims

1. An audio playback method, applied to a mobile terminal, characterized by comprising:

acquiring recording data collected by a microphone and buffered in a recording thread, and acquiring accompanying sound data sent by a target application to a mixer;

performing mixing processing on the recording data and the accompanying sound data;

sending the playback data obtained by the mixing processing to a playback thread;

wherein, before the step of acquiring the accompanying sound data sent by the target application to the mixer, the method further comprises:

receiving an accompanying sound playback request sent by the target application, wherein the accompanying sound playback request is a request by the target application to the mixer for an encapsulation space for storing the accompanying sound data;

judging whether the accompanying sound playback request carries identification information indicating that the accompanying sound data is lyric accompanying sound data;

wherein the judging whether the accompanying sound playback request carries identification information indicating that the accompanying sound data is lyric accompanying sound data specifically comprises:

before the target application sends the accompanying sound playback request, if the accompanying sound data to be sent is lyric accompanying sound data, adding identification information to the accompanying sound playback request, so that the mixer can determine that the accompanying sound data to be sent is lyric accompanying sound data;

if the accompanying sound playback request carries the identification information indicating that the accompanying sound data is lyric accompanying sound data, sending to the target application first instruction information for sending the accompanying sound data after a delay of a predetermined time, the predetermined time being a time input by a user;

if the accompanying sound playback request does not carry the identification information indicating that the accompanying sound data is lyric accompanying sound data, sending to the target application second instruction information for immediately sending the accompanying sound data.

2. The audio playback method according to claim 1, wherein in the step of acquiring the recording data collected by the microphone and buffered in the recording thread, the recording data collected by the microphone successively undergoes signal conversion and encoding processing in the recording thread.

3. The audio playback method according to claim 1, wherein in the step of sending the playback data obtained by the mixing processing to a playback thread, the playback data undergoes decoding processing and signal conversion in the playback thread.

4. A mobile terminal, characterized by comprising:

an acquisition module, configured to acquire recording data collected by a microphone and buffered in a recording thread, and to acquire accompanying sound data sent by a target application to a mixer;

a processing module, for performing mixing processing on the recording data and the accompanying sound data;

a first sending module, configured to send the playback data after the mixing process to a playback thread;

wherein the mobile terminal further comprises:

a receiving module, configured to receive an accompanying sound playback request sent by the target application, wherein the accompanying sound playback request is a request by the target application to the mixer for an encapsulation space for storing the accompanying sound data;

a judging module, configured to judge whether the accompanying sound playback request carries identification information indicating that the accompanying sound data is lyric accompanying sound data;

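wherein the judging module is specifically configured to: before the target application sends the accompanying sound playback request, if the accompanying sound data to be sent is lyric accompanying sound data, add identification information to the accompanying sound playback request, so that the mixer can determine that the accompanying sound data to be sent is lyric accompanying sound data;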

a second sending module, configured to send to the target application, if the accompanying sound playback request carries the identification information indicating that the accompanying sound data is lyric accompanying sound data, first instruction information for sending the accompanying sound data after a delay of a predetermined time, the predetermined time being a time input by a user;

a third sending module, configured to send to the target application, if the accompanying sound playback request does not carry the identification information indicating that the accompanying sound data is lyric accompanying sound data, second instruction information for immediately sending the accompanying sound data.

5. The mobile terminal according to claim 4, wherein, in acquiring the recording data collected by the microphone and buffered in the recording thread, the recording data collected by the microphone sequentially undergoes signal conversion and encoding processing.

6. The mobile terminal according to claim 4, wherein, in sending the playback data obtained by the mixing processing to the playback thread, the playback data undergoes decoding processing and signal conversion in the playback thread.