CN105979469B

CN105979469B - A recording processing method and terminal

Info

Publication number: CN105979469B
Application number: CN201610509141.5A
Authority: CN
Inventors: 黄业伟
Original assignee: Vivo Mobile Communication Co Ltd
Current assignee: Vivo Mobile Communication Co Ltd
Priority date: 2016-06-29
Filing date: 2016-06-29
Publication date: 2020-01-31
Anticipated expiration: 2036-06-29
Also published as: CN105979469A

Abstract

The invention provides a sound recording processing method and a terminal, wherein the sound recording processing method comprises the steps of obtaining scene image information collected by a camera and sound information collected by a microphone during sound recording, obtaining position information of each sound source in a scene image according to the scene image information, generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted during sound recording playing, and synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information.

Description

A recording processing method and terminal

Technical field

The present invention relates to the technical field of terminals, and in particular to a recording processing method and a terminal.

Background

With the rapid development of mobile terminal technology, people often use mobile terminals to make audio and video recordings of events in their lives.

Stereo audio and video recording reproduces a scene more faithfully and sounds more three-dimensional when played through a mobile terminal's dual speakers or through headphones, which improves the user experience. A common approach to stereo recording is to collect sound with multiple microphones in the mobile terminal, since multiple microphones give better sound localization.

However, many mobile terminals have only a single microphone, and a multi-microphone configuration is generally limited by the size of the terminal. If the terminal is small, the microphones sit close to one another, sound localization is poor, and the recording quality suffers.

Summary of the invention

In view of this, the present invention provides a recording processing method and a terminal, to solve the problem that an existing mobile terminal recording with a single microphone has difficulty synthesizing multi-channel sound.

To solve the above technical problem, in one aspect, the present invention provides a recording processing method applied to a terminal, the method comprising:

acquiring scene image information collected by a camera and sound information collected by a microphone during audio and video recording;

acquiring position information of each sound source in the scene image according to the scene image information;

generating channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played;

synthesizing the sound information collected by the microphone into multi-channel audio data according to the channel coefficient information.

In another aspect, the present invention further provides a terminal, comprising:

an acquisition module, configured to acquire scene image information collected by a camera and sound information collected by a microphone during audio and video recording;

a position information acquisition module, configured to acquire position information of each sound source in the scene image according to the scene image information;

a channel coefficient determination module, configured to generate channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played;

a synthesis module, configured to synthesize the sound information collected by the microphone into multi-channel audio data according to the channel coefficient information.

The beneficial effects of the above technical solutions of the present invention are as follows:

Multi-channel audio data can be synthesized from the mono sound information collected by a single microphone, so the recording terminal does not need multiple microphones, which reduces its cost.

Brief description of the drawings

FIG. 1 is a flowchart of a recording processing method according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of a recording processing method according to Embodiment 2 of the present invention;

FIG. 3 is a flowchart of a recording processing method according to Embodiment 3 of the present invention;

FIG. 4 is a structural block diagram of a terminal according to an embodiment of the present invention.

Detailed description

The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following embodiments illustrate the present invention but do not limit its scope.

Please refer to FIG. 1, which is a flowchart of a recording processing method according to Embodiment 1 of the present invention. The method is applied to a terminal and includes the following steps:

Step S11: Acquire scene image information collected by the camera and sound information collected by the microphone during audio and video recording.

The sound information collected by the microphone is mono sound information.

Step S12: Acquire position information of each sound source in the scene image according to the scene image information.

The scene image may contain one sound source or more than one.

The position information of a sound source refers to its position in the scene image. For example, it may indicate whether the sound source is located in the left half or in the right half of the scene image. Alternatively, it may be a scale factor of the sound source in the horizontal direction of the scene image: the scale factor is [1, 0] when the sound source is at the leftmost end of the scene image, [0, 1] when it is at the rightmost end, and [0.5, 0.5] when it is at the horizontal center. Alternatively, it may include front-to-back information of the sound source in the scene image, that is, the relative distance between the sound source and the camera. Other types of position information may of course also be used.

Step S13: Generate channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played.

The channel coefficient information refers to the proportion assigned to each channel when the recording is played.

For example, suppose the channels used when playing the recording are a left channel and a right channel, and the position information indicates whether the sound source is located in the left half or in the right half of the scene image. When the sound source is located in the left half of the scene image, the corresponding channel coefficient information may be [1, 0]: the left channel plays the sound of the source and the right channel is silent. When the sound source is located in the right half of the scene image, the corresponding channel coefficient information may be [0, 1]: the right channel plays the sound of the source and the left channel is silent.

As another example, suppose the channels used when playing the recording are a left channel and a right channel, and the position information is the scale factor of the sound source in the horizontal direction of the scene image. If that scale factor is [0.2, 0.8], the left channel plays 20% of the sound of the source and the right channel plays 80%.

Of course, besides two channels, more channels may be used when playing the recording, for example three channels, with channel coefficient information such as [0.2, 0.4, 0.4].

Step S14: Synthesize the sound information collected by the microphone into multi-channel audio data according to the channel coefficient information.

In this embodiment of the present invention, multi-channel audio data can be synthesized from the mono sound information collected by a single microphone, so the recording terminal does not need multiple microphones, which reduces its cost.
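As an illustration of step S14, the following minimal Python sketch scales a mono sample sequence by each channel coefficient. It assumes a single active sound source and a plain per-channel gain; the function name `synthesize_channels` and the sample values are hypothetical, and the text does not prescribe this exact arithmetic.

```python
from typing import List, Sequence


def synthesize_channels(mono: Sequence[float], coeffs: Sequence[float]) -> List[List[float]]:
    """Return one sample list per output channel: the mono signal scaled by that channel's coefficient."""
    if any(c < 0 for c in coeffs):
        raise ValueError("channel coefficients must be non-negative")
    return [[c * sample for sample in mono] for c in coeffs]


# Hypothetical mono samples, combined with the [0.2, 0.8] example coefficients.
mono_samples = [0.0, 0.5, -0.5, 1.0]
left, right = synthesize_channels(mono_samples, [0.2, 0.8])
print(left)
print(right)
```

The same function accepts a three-element coefficient list such as [0.2, 0.4, 0.4] and then returns three channels.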

In the above embodiment, the terminal that executes the recording processing method may itself be the recording terminal, or it may be a separate terminal used only to process the recording. For example, the terminal may be a computer and the recording terminal may be a video camera; the camera transmits the recorded audio and video to the computer, which performs the multi-channel sound synthesis.

That is, the multi-channel sound may be synthesized during audio and video recording or afterwards, for example when the recording is played.

In addition, the terminal that executes the recording processing method may also play the synthesized multi-channel audio; that is, the multiple channels used for playing the recording are the channels on the terminal that executes the method. In that case, after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method may further include: playing the multi-channel audio data.

Of course, the multiple channels used for playing the recording may instead be the channels on another playback device. In that case, after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method may further include: transmitting the multi-channel audio data to a playback device for playback, where the multiple channels are the channels on that playback device. That is, the terminal is only responsible for synthesizing the recording into multi-channel sound and is not responsible for playing it.

Please refer to FIG. 2, which is a flowchart of a recording processing method according to Embodiment 2 of the present invention. The method is applied to a terminal that includes a camera and a microphone, and includes the following steps:

Step S21: When a request to enable the audio and video recording function is received, turn on the camera to collect scene image information and turn on the microphone to collect sound information.

The audio and video recording function may be the one in the camera application of the terminal, or the one in a real-time communication application of the terminal, such as the video chat function of WeChat.

Step S22: Acquire scene image information collected by the camera and sound information collected by the microphone during audio and video recording.

Step S23: Acquire position information of each sound source in the scene image according to the scene image information.

Step S24: Generate channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played.

Step S25: Synthesize the sound information collected by the microphone into multi-channel audio data according to the channel coefficient information.

In this embodiment of the present invention, the terminal that executes the recording processing method is also the terminal that records the audio and video, and it can synthesize the multi-channel audio data while recording.

In the embodiments of the present invention, image recognition technology may be used to obtain the position information of each sound source in the scene image, as described in the example below.

Please refer to FIG. 3, which is a flowchart of a recording processing method according to Embodiment 3 of the present invention. The method is applied to a terminal and includes the following steps:

Step S31: Acquire scene image information collected by the camera and sound information collected by the microphone during audio and video recording.

Step S32: Identify sound-producing organisms in the scene image according to the scene image information.

The organisms include humans and animals.

Step S33: Perform facial recognition on the sound-producing organisms in the scene image to determine each sound source.

For example, changes in the lips and face are detected across consecutive frames, and the sound source is identified from those changes.

Step S34: Acquire the position information of each sound source.

Step S35: Generate channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played.

Step S36: Synthesize the sound information collected by the microphone into multi-channel audio data according to the channel coefficient information.

In this embodiment of the present invention, the position information of the sound source is determined through facial recognition technology, which is simple to implement.

Of course, in some other embodiments of the present invention, the position information of the sound source may also be determined by other methods.
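The lip-change cue mentioned for step S33 can be sketched as follows. This is a hedged illustration only: it assumes that an upstream face tracker (not shown, and not specified in the text) already supplies a per-frame mouth-opening measurement for each face, and it treats a face as a sound source when that measurement varies by more than a threshold. The name `find_sound_sources`, the threshold, and the data are all hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def find_sound_sources(mouth_openings: Dict[str, List[float]], threshold: float = 1.0) -> List[str]:
    """Return IDs of faces whose mouth opening varies across frames by more than `threshold`."""
    return [
        face_id
        for face_id, series in mouth_openings.items()
        if len(series) > 1 and pvariance(series) > threshold
    ]


# Hypothetical per-frame mouth openings (in pixels) for two tracked faces.
tracks = {
    "face_a": [2.0, 8.0, 3.0, 9.0, 2.0],  # mouth opens and closes: likely producing sound
    "face_b": [3.0, 3.1, 2.9, 3.0, 3.0],  # mouth nearly still
}
print(find_sound_sources(tracks))
```

Variance is used here only because it is a simple measure of movement; a real detector would likely need something more robust.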

As mentioned in the above embodiment, the position information of a sound source may indicate whether the sound source is located in the left half or in the right half of the scene image. When the position information indicates that the sound source is located in the left half of the scene image, the corresponding channel coefficient information is configured so that the left channel plays the sound of the source and the right channel is silent; for example, the channel coefficient information may be expressed as [1, 0]. When the position information indicates that the sound source is located in the right half of the scene image, the corresponding channel coefficient information is configured so that the right channel plays the sound of the source and the left channel is silent; for example, the channel coefficient information may be expressed as [0, 1].
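The left-half and right-half rule can be written as a short Python sketch. The function name `half_frame_coefficients` and the pixel values are hypothetical, and assigning a source that lies exactly on the center line to the right half is an assumption, since the text does not cover that case.

```python
from typing import List


def half_frame_coefficients(x_center: float, image_width: float) -> List[int]:
    """Return [left, right] channel coefficients for a source at horizontal pixel x_center."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    # Assumption: a source exactly on the center line counts as being in the right half.
    return [1, 0] if x_center < image_width / 2 else [0, 1]


print(half_frame_coefficients(100, 1920))   # source in the left half
print(half_frame_coefficients(1500, 1920))  # source in the right half
```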

When multiple sound sources are present in the same period, the channel coefficient information can be expressed in the form of a matrix. For example, if the same period includes two sound sources, the channel coefficient information of the two sound sources can be expressed together as a single matrix.
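One possible layout for such a matrix is sketched below in Python, assuming one row per sound source and one column per channel. That layout, the function name `stack_coefficients`, the source labels, and the values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def stack_coefficients(per_source: Dict[str, Sequence[float]]) -> Tuple[List[str], List[List[float]]]:
    """Stack per-source coefficient lists into a matrix (rows: sources, columns: channels)."""
    ids = list(per_source)
    matrix = [list(per_source[source_id]) for source_id in ids]
    # Every source must provide a coefficient for the same number of channels.
    if len({len(row) for row in matrix}) > 1:
        raise ValueError("all sources must have the same number of channel coefficients")
    return ids, matrix


# Hypothetical case: one source in the left half, one in the right half.
ids, matrix = stack_coefficients({"source_1": [1, 0], "source_2": [0, 1]})
for source_id, row in zip(ids, matrix):
    print(source_id, row)
```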

As mentioned in the above embodiment, the position information of a sound source may also be its scale factor in the horizontal direction of the scene image: the scale factor is [1, 0] when the sound source is at the leftmost end of the scene image, [0, 1] when it is at the rightmost end, and [0.5, 0.5] when it is at the horizontal center. In this case, the step of generating channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple channels to be used when the recording is played includes: calculating the scale factor of the sound source in the horizontal direction of the scene image according to the position information of the sound source; and calculating the coefficients assigned to the left channel and the right channel according to that scale factor, to obtain the channel coefficient information corresponding to the position information of the sound source. For example, if the scale factor of the sound source in the horizontal direction of the scene image is [1, 0], the channel coefficient information is likewise [1, 0].
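The two calculation steps can be sketched in Python as follows. The text fixes only three points (leftmost, center, rightmost), so the linear interpolation between them is an assumption, as are the function names `scale_coefficient` and `channel_coefficients` and the 1920-pixel frame width.

```python
from typing import List


def scale_coefficient(x_center: float, image_width: float) -> List[float]:
    """Return the [left, right] scale factor of a source at horizontal pixel x_center."""
    if image_width <= 0:
        raise ValueError("image_width must be positive")
    # Clamp so a detection slightly outside the frame still maps into [0, 1].
    right = min(max(x_center / image_width, 0.0), 1.0)
    return [1.0 - right, right]


def channel_coefficients(scale: List[float]) -> List[float]:
    """Use the scale factor directly as the [left, right] channel coefficients."""
    return list(scale)


for x in (0, 960, 1920):
    print(channel_coefficients(scale_coefficient(x, 1920)))
```

The three printed rows correspond to the leftmost, center, and rightmost positions described above.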

As mentioned in the above embodiment, the position information of a sound source may also be front-to-back information of the sound source in the scene image, that is, the relative distance between the sound source and the camera. This suits the case where the terminal used for playing the recording has multiple channels arranged front to back.
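A speculative Python sketch of how front-to-back information might be combined with the horizontal position is given below. The text does not define this mapping: the four-channel layout, the normalized `lateral` and `depth` inputs, the choice to send near sources to the front channels, and the name `four_channel_coefficients` are all assumptions.

```python
from typing import Dict


def four_channel_coefficients(lateral: float, depth: float) -> Dict[str, float]:
    """lateral: 0 = left edge, 1 = right edge; depth: 0 = nearest the camera, 1 = farthest."""
    for name, value in (("lateral", lateral), ("depth", depth)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within [0, 1]")
    left, right = 1.0 - lateral, lateral
    # Assumption: sources near the camera are weighted toward the front channels.
    front, rear = 1.0 - depth, depth
    return {
        "front_left": front * left,
        "front_right": front * right,
        "rear_left": rear * left,
        "rear_right": rear * right,
    }


print(four_channel_coefficients(0.5, 0.5))
```

With this construction the four coefficients always sum to 1, which keeps the total gain the same as in the two-channel examples.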

Of course, the position information of the sound source may also be of other types, which are not enumerated here.

Referring to FIG. 4, an embodiment of the present invention further provides a terminal, including:

The above terminal may be a mobile phone, a tablet computer, a video camera, a desktop computer, or a similar terminal.

Preferably, the terminal further includes:

a playing module, configured to play the multi-channel audio data.

Preferably, the terminal further includes:

the camera and the microphone; and

a control module, configured to, when a request to enable the audio and video recording function is received, control the camera to turn on and collect scene image information, and control the microphone to turn on and collect sound information.

In an embodiment of the present invention, the position information acquisition module includes:

a first recognition unit, configured to identify sound-producing organisms in the scene image according to the scene image information;

a second recognition unit, configured to perform facial recognition on the sound-producing organisms in the scene image to determine each sound source;

an acquiring unit, configured to acquire the position information of each sound source.

The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A recording processing method, applied to a terminal, characterized in that the method comprises:

acquiring scene image information acquired by a camera and sound information acquired by a microphone during recording; the sound information collected by the microphone is single sound channel sound information;

acquiring position information of each sound source in the scene image according to the scene image information;

generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be adopted when the sound record is played;

synthesizing sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information;

the multiple sound channels required to be adopted when the recording is played comprise a left sound channel and a right sound channel of a terminal used for playing the recording, and the step of generating sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and the multiple sound channels required to be adopted when the recording is played comprises the following steps:

calculating a proportionality coefficient of the sound source in the transverse direction of the scene image according to the position information of the sound source;

calculating coefficient information occupied by a left sound channel and a right sound channel according to the proportional coefficient of the sound source in the transverse direction of the scene image to obtain sound channel coefficient information corresponding to the position information of the sound source;

the plurality of sound channels required to be adopted when the recording is played further comprise a plurality of sound channels arranged front to back on the terminal used for playing the recording, and the position information of the sound source further comprises: front-to-back information of the sound source in the scene image.

2. The recording processing method according to claim 1, wherein after the step of synthesizing the sound information collected by the microphone into multi-channel audio data, the method further comprises:

playing the multi-channel audio data.

3. The audio recording processing method according to claim 1, wherein the terminal includes the camera and the microphone, and before the step of acquiring the scene image information collected by the camera and the sound information collected by the microphone during audio recording, the method further includes:

when a request for opening a recording function is received, the camera is started to collect scene image information, and the microphone is started to collect sound information.

4. The audio recording method according to claim 3, wherein the audio recording function is an audio recording function in a camera application in a terminal or an audio recording function in a real-time communication application in the terminal.

5. The audio recording method according to claim 1, wherein the step of obtaining location information of each sound source in the scene image according to the scene image information comprises:

identifying a sound-producing organism in the scene image according to the scene image information;

performing facial recognition on the sound-producing organism in the scene image, and determining each sound source;

acquiring the position information of each sound source.

6. The recording processing method of claim 1, wherein the step of generating channel coefficient information corresponding to the position information of each of the sound sources based on the position information of each of the sound sources and a plurality of channels required to be used when the recording is played comprises:

when the position information of the sound source indicates that the sound source is positioned in the left half part of the scene image, the sound channel coefficient information corresponding to the position information of the sound source is configured to play the sound information of the sound source by using a left sound channel;

when the position information of the sound source indicates that the sound source is located in the right-half part of the scene image, the channel coefficient information corresponding to the position information of the sound source is configured to play the sound information of the sound source in the right channel.

7. The audio recording processing method according to claim 1, wherein the terminal executes the audio recording processing method during audio and video recording; or the terminal executes the audio recording processing method when playing the audio and video recording.

8. A terminal, comprising:

the acquisition module is used for acquiring scene image information acquired by the camera and sound information acquired by the microphone during recording; the sound information collected by the microphone is single sound channel sound information;

the position information acquisition module is used for acquiring the position information of each sound source in the scene image according to the scene image information;

a sound channel coefficient determining module, configured to generate sound channel coefficient information corresponding to the position information of each sound source according to the position information of each sound source and a plurality of sound channels required to be used when a recording is played;

the synthesis module is used for synthesizing the sound information collected by the microphone into multi-channel audio data according to the sound channel coefficient information;

the plurality of sound channels required to be adopted when the recording is played comprise a left sound channel and a right sound channel of a terminal for playing the recording;

the sound channel coefficient determining module is used for calculating a proportionality coefficient of the sound source in the transverse direction of the scene image according to the position information of the sound source; calculating coefficient information occupied by a left sound channel and a right sound channel according to the proportional coefficient of the sound source in the transverse direction of the scene image to obtain sound channel coefficient information corresponding to the position information of the sound source;

9. The terminal of claim 8, further comprising:

a playing module, configured to play the multi-channel audio data.

10. The terminal of claim 8, further comprising:

the camera and the microphone; and

a control module, configured to control the camera to start and collect scene image information, and to control the microphone to start and collect sound information, when a request for starting the audio and video recording function is received.

11. The terminal of claim 8, wherein the location information obtaining module comprises:

a first recognition unit, configured to recognize a sound-producing organism in the scene image according to the scene image information;

a second recognition unit, configured to perform facial recognition on the sound-producing organism in the scene image, and determine each sound source;

an obtaining unit configured to obtain the position information of each sound source.