
CN111212032B - Audio processing method, device, electronic equipment and storage medium based on Internet of Vision - Google Patents


Publication number
CN111212032B
Authority
CN
China
Prior art keywords
audio data
audio
microphone
mobile terminal
internet
Legal status
Active
Application number
CN201911285285.7A
Other languages
Chinese (zh)
Other versions
CN111212032A (en)
Inventor
蔡耀
曾绳涛
韩杰
杨春晖
Current Assignee
Visionvera Information Technology Co Ltd
Original Assignee
Visionvera Information Technology Co Ltd
Application filed by Visionvera Information Technology Co Ltd
Priority to CN201911285285.7A
Publication of CN111212032A
Application granted
Publication of CN111212032B
Legal status: Active


Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
            • H04L65/60 Network streaming of media packets
              • H04L65/75 Media network packet handling
            • H04L65/1066 Session management
            • H04L65/40 Support for services or applications
        • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N7/00 Television systems
            • H04N7/14 Systems for two-way working
              • H04N7/15 Conference systems
    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
                • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                  • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
                    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
                • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract


The embodiments of the present application provide an audio processing method, device, electronic equipment and storage medium based on the Internet of Vision. An Internet of Vision terminal is deployed in the Internet of Vision and is communicatively connected to a streaming media server, which in turn is communicatively connected to a mobile terminal configured with an audio playback component. The method is applied to an application object set in the mobile terminal and includes: when it is detected that a preset audio and video call service has been enabled, triggering a preset audio collection mode on the mobile terminal; receiving first audio data sent by the streaming media server and calling the audio playback component to play the first audio data; acquiring second audio data collected by the mobile terminal in the audio collection mode; and performing echo cancellation processing on the second audio data according to the first audio data and the second audio data, to obtain echo-cancelled target audio data. The application can improve the audio call quality of Internet of Vision conferences.


Description

Audio processing method, device, electronic equipment and storage medium based on Internet of Vision

Technical Field

The present application relates to the technical field of data processing, and in particular to an audio processing method, device, electronic equipment and storage medium based on the Internet of Vision.

Background

At present, with the nationwide adoption of Internet of Vision services, Internet of Vision high-definition interactive technology plays a pivotal role in government departments and other industries. The Internet of Vision adopts the world's most advanced VisionVera real-time high-definition video switching technology, which realizes network-wide real-time transmission of high-definition video that the current Internet cannot achieve. It integrates dozens of video, voice, picture, text, communication and data services, such as high-definition video conferencing, video surveillance, remote training, intelligent monitoring and analysis, emergency command, video telephony, live broadcasting, TV mail and information release, into one system platform, and realizes real-time interconnection of high-definition video communication through a variety of terminal devices.

With the widespread use of Internet of Vision video conferences, some of these conferences are held across the Internet of Vision and a 4G network. For example, when an unmanned aerial vehicle (UAV) joins a conference, the UAV is usually connected to a mobile phone. The user controls the UAV through an Internet of Vision application installed on the phone and, at the same time, holds an audio and video call through that application with the Internet of Vision terminal in the command hall. In such conferences the UAV participant is usually in the field, so the user typically connects a microphone to the phone. With the microphone connected, the phone plays the other party's voice while the microphone captures sound. After the phone plays the other party's voice, however, that voice produces an echo, which is transmitted to the Internet of Vision terminal together with the newly captured sound. As a result, when the Internet of Vision terminal in the command hall plays the returned audio, the people there hear an echo of what they themselves said earlier in the call.

In the prior art, Internet of Vision video conferences generally suppress this echo by setting a time interval so that the human ear cannot distinguish the echo from the newly captured sound. This approach cannot completely eliminate the echo and places high demands on the choice of time interval. Moreover, because the loop runs continuously, the echo keeps accumulating until a buzzing sound appears, which degrades call quality.

Summary

In view of the above problems, the embodiments of the present application are proposed to provide an audio processing method, device, electronic equipment and storage medium based on the Internet of Vision that overcome, or at least partially solve, the above problems.

In a first aspect, an embodiment of the present application provides an audio processing method based on the Internet of Vision. An Internet of Vision terminal is deployed in the Internet of Vision and is communicatively connected to a streaming media server; the streaming media server is communicatively connected to a mobile terminal, and the mobile terminal is configured with an audio playback component. The method is applied to an application object set in the mobile terminal and includes:

when it is detected that a preset audio and video call service has been enabled, triggering a preset audio collection mode on the mobile terminal;

receiving first audio data sent by the streaming media server in the audio and video call service, and calling the audio playback component to play the first audio data, wherein the first audio data is sent to the streaming media server by the Internet of Vision terminal;

acquiring second audio data collected by the mobile terminal in the audio collection mode;

performing echo cancellation processing on the second audio data according to the first audio data and the second audio data, to obtain echo-cancelled target audio data; and

sending the target audio data to the streaming media server, the streaming media server being configured to send the target audio data to the Internet of Vision terminal.
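The five steps of the first aspect can be sketched as a per-frame call loop. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the callables `receive_from_server`, `play`, `capture` and `send_to_server`, the `EchoCanceller` protocol and the frame representation (a list of float PCM samples) are all hypothetical stand-ins for the streaming media client, the audio playback component and the capture path in the audio collection mode.

```python
from typing import Callable, List, Protocol

Frame = List[float]  # hypothetical: one block of PCM samples normalised to [-1.0, 1.0]


class EchoCanceller(Protocol):
    """Anything that removes far-end echo from a near-end frame (assumed interface)."""

    def process(self, far_end: Frame, near_end: Frame) -> Frame: ...


def run_call_frame(
    receive_from_server: Callable[[], Frame],
    play: Callable[[Frame], None],
    capture: Callable[[], Frame],
    aec: EchoCanceller,
    send_to_server: Callable[[Frame], None],
) -> Frame:
    """Process one frame of an audio and video call on the mobile terminal."""
    first_audio = receive_from_server()  # far-end audio relayed by the streaming media server
    play(first_audio)                    # audio playback component
    second_audio = capture()             # near-end audio captured in the audio collection mode
    target_audio = aec.process(first_audio, second_audio)  # echo cancellation
    send_to_server(target_audio)         # the server forwards it to the Internet of Vision terminal
    return target_audio
```

Passing the echo canceller in as a parameter keeps the call flow independent of whether the filter comes from a platform interface or from a fallback implementation.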

Optionally, the mobile terminal is configured with a first microphone and a second microphone, and triggering the preset audio collection mode on the mobile terminal includes:

calling the first microphone and the second microphone;

and acquiring the second audio data collected by the mobile terminal in the audio collection mode includes:

acquiring first microphone audio data collected by the first microphone and second microphone audio data collected by the second microphone; and

performing noise reduction processing on the second microphone audio data according to the first microphone audio data and the second microphone audio data, to obtain the second audio data.
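The dual-microphone step uses the first microphone's signal to clean the second microphone's signal, consistent with the classification of one microphone receiving mainly noise and the other mainly speech. The sketch below is a deliberate simplification and an assumption: it treats the first microphone as a pure noise reference and applies one broadband power-subtraction gain per frame, whereas practical dual-microphone suppressors usually work per frequency bin or use an adaptive noise canceller. The function name, the 160-sample frame (10 ms at an assumed 16 kHz) and the gain floor are hypothetical.

```python
import math
from typing import List


def dual_mic_noise_reduction(primary: List[float], reference: List[float],
                             frame_len: int = 160, floor: float = 0.05) -> List[float]:
    """Suppress noise in `primary` (second microphone) using `reference` (first microphone).

    Assumes the reference microphone picks up mostly noise and the primary one
    speech plus noise. One broadband gain is applied per frame (crude power subtraction).
    """
    if len(primary) != len(reference):
        raise ValueError("microphone buffers must have the same length")
    out: List[float] = []
    for start in range(0, len(primary), frame_len):
        p = primary[start:start + frame_len]
        r = reference[start:start + frame_len]
        p_pow = sum(v * v for v in p) / len(p)  # speech + noise power
        r_pow = sum(v * v for v in r) / len(r)  # noise power estimate
        # Fraction of primary power attributed to speech, floored to limit musical noise.
        power_gain = 1.0 if p_pow == 0.0 else max(floor, 1.0 - r_pow / p_pow)
        amp_gain = math.sqrt(power_gain)        # convert power gain to amplitude gain
        out.extend(amp_gain * v for v in p)
    return out
```

When the reference microphone is silent the primary signal passes through unchanged; when both microphones carry the same signal, the frame is attenuated down to the floor.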

Optionally, performing echo cancellation processing on the second audio data according to the first audio data and the second audio data to obtain the echo-cancelled target audio data includes:

determining, in the second audio data, third audio data corresponding to the first audio data; and

filtering the third audio data out of the second audio data to obtain target audio data from which the third audio data has been removed.
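In signal terms, the determine-then-filter-out step can be written as below. The notation is introduced here for illustration and does not appear in the application: x(n) is the first audio data (the far-end signal played by the mobile terminal), d(n) the second audio data captured by the microphone, s(n) the local speech, h_k an assumed linear echo path of length L, ŷ(n) the third audio data (the estimated echo), and e(n) the target audio data.

```latex
\begin{aligned}
d(n) &= s(n) + y(n), \qquad y(n) = \sum_{k=0}^{L-1} h_k \, x(n-k), \\
e(n) &= d(n) - \hat{y}(n) \approx s(n) \quad \text{when } \hat{y}(n) \approx y(n).
\end{aligned}
```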

Optionally, while triggering the preset audio collection mode on the mobile terminal, the method further includes:

calling an adaptive filter set in the mobile terminal;

and determining, in the second audio data, the third audio data having the same frequency as the first audio data includes:

inputting the first audio data into the adaptive filter to obtain output audio data from the adaptive filter; and

determining, in the second audio data, third audio data having the same frequency as the output audio data.
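One common way to realize "input the first audio data into the adaptive filter and use its output to locate the echo" is a normalized least-mean-squares (NLMS) filter. The application does not name the adaptation algorithm, so the class below is a hedged illustration: `NlmsEchoCanceller`, its tap count, step size and regularization constant are assumptions. The filter output `y_hat` plays the role of the third audio data and `d - y_hat` is the target audio data; note that NLMS identifies the echo by modelling the echo path rather than by explicit frequency matching.

```python
from typing import List


class NlmsEchoCanceller:
    """Normalised LMS adaptive filter used as an acoustic echo canceller (illustrative sketch)."""

    def __init__(self, taps: int = 64, step: float = 0.5, eps: float = 1e-6) -> None:
        self.w = [0.0] * taps       # adaptive estimate of the echo path
        self.x_hist = [0.0] * taps  # most recent far-end samples, newest first
        self.step = step            # adaptation step size, stable for 0 < step < 2
        self.eps = eps              # regulariser that avoids division by zero

    def process(self, far_end: List[float], near_end: List[float]) -> List[float]:
        out: List[float] = []
        for x, d in zip(far_end, near_end):
            self.x_hist.insert(0, x)  # shift the newest far-end sample in
            self.x_hist.pop()         # and drop the oldest
            y_hat = sum(wi * xi for wi, xi in zip(self.w, self.x_hist))  # estimated echo
            e = d - y_hat                                                # echo-cancelled sample
            norm = self.eps + sum(xi * xi for xi in self.x_hist)
            g = self.step * e / norm
            self.w = [wi + g * xi for wi, xi in zip(self.w, self.x_hist)]  # NLMS update
            out.append(e)
        return out
```

With white far-end input and no local speech, the residual should shrink toward zero as the filter converges. Double-talk detection, bulk delay estimation and nonlinear residual suppression, which deployed cancellers need, are omitted.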

Optionally, calling the adaptive filter configured on the mobile terminal includes:

determining at least one application programming interface on the Internet of Vision terminal adapted to the application object, and determining whether a target interface exists among the at least one application programming interface;

when the target interface exists among the at least one application programming interface, calling the adaptive filter corresponding to the target interface through the target interface; and

when the target interface does not exist among the at least one application programming interface, calling the adaptive filter corresponding to a preset application programming interface through the preset application programming interface.
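The interface-selection logic reduces to a lookup with a fallback. In the sketch below, `available_apis` (a hypothetical mapping from interface names to filter factories), `target_api` and `preset_factory` are illustrative names only. On Android, for instance, a platform effect such as `android.media.audiofx.AcousticEchoCanceler` could play the role of the target interface, with a bundled software filter as the preset one, but the application does not say which interfaces it uses.

```python
from typing import Callable, Dict, Tuple, TypeVar

F = TypeVar("F")  # type of the adaptive-filter object, left generic on purpose


def acquire_adaptive_filter(
    available_apis: Dict[str, Callable[[], F]],
    target_api: str,
    preset_factory: Callable[[], F],
) -> Tuple[F, str]:
    """Return (filter, source): use the target interface if present, else the preset one."""
    factory = available_apis.get(target_api)
    if factory is not None:
        return factory(), target_api   # target interface found among the adapted interfaces
    return preset_factory(), "preset"  # fall back to the preset interface
```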

In a second aspect, an embodiment of the present application provides an audio processing device based on the Internet of Vision. An Internet of Vision terminal is deployed in the Internet of Vision and is communicatively connected to a streaming media server; the streaming media server is communicatively connected to a mobile terminal, and the mobile terminal is configured with an audio playback component. The device is applied to an application object set in the mobile terminal, may specifically be a virtual device, and may include the following modules:

an audio mode trigger module, configured to trigger a preset audio data collection mode on the mobile terminal when it is detected that a preset audio call service has been enabled;

an audio data receiving and playing module, configured to receive first audio data sent by the streaming media server and call the audio playback component to play the first audio data, the first audio data being sent to the streaming media server by the Internet of Vision terminal;

an audio data collection module, configured to acquire second audio data collected by the mobile terminal in the audio data collection mode;

an audio data processing module, configured to perform echo cancellation processing on the second audio data according to the first audio data and the second audio data, to obtain echo-cancelled target audio data; and

an audio data sending module, configured to send the target audio data to the streaming media server, the streaming media server being configured to send the target audio data to the Internet of Vision terminal.

Optionally, the mobile terminal is configured with a first microphone and a second microphone, and the audio mode trigger module may specifically be configured to call the first microphone and the second microphone;

the audio data collection module may specifically include the following units:

a microphone audio data acquisition unit, configured to acquire first microphone audio data collected by the first microphone and second microphone audio data collected by the second microphone; and

a noise reduction processing unit, configured to perform noise reduction processing on the second microphone audio data according to the first microphone audio data and the second microphone audio data, to obtain the second audio data.

Optionally, the audio data processing module may specifically include the following units:

an audio data search unit, configured to determine, in the second audio data, third audio data corresponding to the first audio data; and

an audio data filtering unit, configured to filter the third audio data out of the second audio data to obtain target audio data from which the third audio data has been removed.

Optionally, the device may further include the following module:

a calling module, configured to call the adaptive filter set in the mobile terminal;

and the audio data search unit may specifically include the following units:

an audio data input unit, configured to input the first audio data into the adaptive filter to obtain output audio data from the adaptive filter; and

an audio data determining unit, configured to determine, in the second audio data, third audio data having the same frequency as the output audio data.

Optionally, the calling module may specifically include the following units:

a target interface determining unit, configured to determine at least one application programming interface on the Internet of Vision terminal adapted to the application object, and to determine whether a target interface exists among the at least one application programming interface;

a first calling unit, configured to call, through the target interface, the adaptive filter corresponding to the target interface when the target interface exists among the at least one application programming interface; and

a second calling unit, configured to call, through a preset application programming interface, the adaptive filter corresponding to the preset application programming interface when the target interface does not exist among the at least one application programming interface.

In a third aspect, an embodiment of the present application further discloses an electronic device, including:

one or more processors; and

one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the device to perform one or more of the Internet of Vision-based audio processing methods described in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application further discloses a computer-readable storage medium storing a computer program that causes a processor to perform the Internet of Vision-based audio processing method described in the embodiments of the present application.

Compared with the prior art, the embodiments of the present application have the following advantages:

In the embodiments of the present application, when the application object set in the mobile terminal detects that the preset audio and video call service has been enabled, it triggers the preset audio collection mode. Then, on receiving first audio data sent by the streaming media server, it can call the audio playback component on the mobile terminal to play the first audio data, acquire the second audio data collected by the mobile terminal in the audio collection mode, and perform echo cancellation processing on the second audio data according to the first audio data to obtain target audio data, which can then be sent to the Internet of Vision terminal via the streaming media server. Because the application object causes the mobile terminal to collect the second audio data in the preset audio collection mode, the audio quality of the second audio data can be improved. And because the application performs echo cancellation on the collected second audio data according to the first audio data, the echo produced by playing the first audio data can be removed from the second audio data, which improves echo handling and, in turn, call quality.

Brief Description of the Drawings

To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic diagram of the networking of an Internet of Vision of the present application;

Fig. 2 is a schematic diagram of the hardware structure of a node server of the present application;

Fig. 3 is a schematic diagram of the hardware structure of an access switch of the present application;

Fig. 4 is a schematic diagram of the hardware structure of an Ethernet protocol conversion gateway of the present application;

Fig. 5 is an application scenario diagram of an Internet of Vision-based audio processing method according to an embodiment of the present application;

Fig. 6 is a flowchart of the steps of an Internet of Vision-based audio processing method according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an Internet of Vision-based audio processing device according to an embodiment of the present application.

Detailed Description

To make the above objects, features and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.

The Internet of Vision is an important milestone in network development. It is a real-time network that can transmit high-definition video in real time, bringing many Internet applications to high-definition video and high-definition face-to-face communication.

The Internet of Vision uses real-time high-definition video switching technology to integrate dozens of required video, voice, picture, text, communication and data services on one network platform, such as high-definition video conferencing, video surveillance, intelligent monitoring and analysis, emergency command, digital broadcast TV, time-shifted TV, online teaching, live broadcasting, video on demand (VOD), TV mail, personal video recording (PVR), intranet (self-operated) channels, intelligent video broadcast control and information release, and delivers high-definition video playback through a TV or computer.

To help those skilled in the art better understand the embodiments of the present application, the Internet of Vision is introduced below.

Some of the technologies used in the Internet of Vision are as follows:

Network Technology

The network technology innovation of the Internet of Vision improves on traditional Ethernet to cope with the potentially huge volume of video traffic on the network. Unlike pure packet switching or pure circuit switching, Internet of Vision technology uses packet switching to meet streaming requirements. It combines the flexibility, simplicity and low cost of packet switching with the quality and security guarantees of circuit switching, realizing network-wide switched virtual circuits and seamless connection of data formats.

Switching Technology

The Internet of Vision adopts two advantages of Ethernet, asynchronous operation and packet switching, and eliminates Ethernet's defects while remaining fully compatible with it. It provides seamless end-to-end connection across the whole network, connects directly to user terminals, and directly carries IP packets. User data requires no format conversion anywhere in the network. The Internet of Vision is a more advanced form of Ethernet and a real-time switching platform; it can achieve network-wide, large-scale real-time transmission of high-definition video that the current Internet cannot, pushing many network video applications toward high definition and unification.

Server Technology

The server technology of the Internet of Vision and the unified video platform differs from servers in the traditional sense. Its streaming media transmission is connection-oriented, its data processing capability is independent of traffic volume and communication time, and a single network layer can carry both signaling and data. For voice and video services, streaming media processing on the Internet of Vision and the unified video platform is much simpler than general data processing, and its efficiency is more than a hundred times that of traditional servers.

Storage Technology

To handle media content of very large capacity and very high traffic, the ultra-high-speed storage technology of the unified video platform uses the most advanced real-time operating system, which maps the program information in server instructions to specific hard disk space. Media content no longer passes through the server but is delivered directly and instantly to the user terminal, with a typical user waiting time of less than 0.2 seconds. Optimized sector distribution greatly reduces the mechanical movement of hard disk head seeks: resource consumption is only 20% of that of an IP Internet system of the same class, yet the concurrent traffic generated is more than 3 times that of a traditional hard disk array, and overall efficiency is improved by more than 10 times.

Network Security Technology

Through its structural design, including separate authorization for every service and complete isolation of device data from user data, the video network structurally eradicates the network security problems that plague the Internet. It generally needs no antivirus software or firewall, prevents hacker and virus attacks, and gives users a structurally secure, worry-free network.

Service Innovation Technology

The unified video platform fuses services with transmission: whether for a single user, a private-network user, or an entire network, service is simply one automatic connection. User terminals, set-top boxes, or PCs connect directly to the unified video platform to obtain rich multimedia video services in many forms. Instead of traditional complex application programming, the platform uses a "recipe-style" table-configuration approach, so complex applications can be built with very little code, enabling virtually unlimited new service innovation.

The networking of the video network is as follows:

The video network is a centrally controlled network structure. It may be a tree, star, ring, or other type of network, but in every case it needs a centralized control node to control the whole network.

As shown in Figure 1, the video network is divided into two parts: the access network and the metropolitan area network (MAN).

Devices in the access network fall mainly into three categories: node servers, access switches, and terminals (including various set-top boxes, encoding boards, storage devices, and so on). A node server connects to access switches, and an access switch can connect to multiple terminals as well as to Ethernet.

The node server is the node that performs centralized control in the access network and can control access switches and terminals. A node server can connect directly to an access switch or directly to a terminal.

Similarly, devices in the MAN fall into three categories: metro servers, node switches, and node servers. A metro server connects to node switches, and a node switch can connect to multiple node servers.

Here the node server is the same node server as in the access network; that is, a node server belongs both to the access network and to the MAN.

The metro server is the node that performs centralized control in the MAN and can control node switches and node servers. A metro server can connect directly to a node switch or directly to a node server.

Thus the whole video network is a hierarchical, centrally controlled network structure, and the networks controlled by node servers and metro servers can take various forms such as tree, star, and ring.

Figuratively, the access network part can form a unified video platform (the part inside the dashed circle), and multiple unified video platforms can form a video network; the unified video platforms are interconnected through metropolitan-area and wide-area video networks.

Classification of video network devices

1.1 Devices in the video network of the embodiments of this application fall mainly into three categories: servers, switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, encoding boards, storage devices, and so on). As a whole, the video network can be divided into a metropolitan area network (or a national network, global network, and so on) and an access network.

1.2 Devices in the access network fall mainly into three categories: node servers, access switches (including Ethernet protocol conversion gateways), and terminals (including various set-top boxes, encoding boards, storage devices, and so on).

The specific hardware structure of each access network device is as follows.

Node server:

As shown in Figure 2, a node server mainly comprises a network interface module 201, a switching engine module 202, a CPU module 203, and a disk array module 204.

Packets arriving from the network interface module 201, the CPU module 203, and the disk array module 204 all enter the switching engine module 202. The switching engine module 202 looks up the address table 205 for each incoming packet to obtain the packet's steering information, and stores the packet in the queue of the corresponding packet buffer 206 according to that information; if the queue of the packet buffer 206 is nearly full, the packet is discarded. The switching engine module 202 polls all packet buffer queues and forwards a packet when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero. The disk array module 204 mainly controls the hard disks, including initialization, reading, and writing. The CPU module 203 is mainly responsible for protocol processing with access switches and terminals (not shown in the figure), for configuring the address table 205 (including the downlink protocol packet address table, the uplink protocol packet address table, and the data packet address table), and for configuring the disk array module 204.
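The enqueue-and-poll rule of switching engine module 202 can be sketched as a small Python model. This is an illustration under stated assumptions, not the device's implementation: the address table is a plain dict from destination address to output port, "nearly full" is modeled as a fixed margin below a hypothetical queue capacity, packets to unknown destinations are dropped (the text does not say what happens to them), and one packet per queue is forwarded per polling round. All class and parameter names are hypothetical.

```python
from collections import deque


class SwitchingEngine:
    """Hypothetical model of switching engine module 202 in a node server."""

    def __init__(self, address_table, queue_capacity=64, near_full_margin=4):
        # Address table 205 modeled as: destination address -> output port
        # (the "steering information"). Capacity and margin are assumptions.
        self.address_table = address_table
        self.near_full = queue_capacity - near_full_margin
        self.queues = {}  # output port -> deque of packets (packet buffer 206)

    def ingress(self, dest, packet):
        """Look up the destination and enqueue; return False if the packet is dropped."""
        port = self.address_table.get(dest)
        if port is None:
            return False  # assumption: unknown destinations are discarded
        queue = self.queues.setdefault(port, deque())
        if len(queue) >= self.near_full:
            return False  # queue nearly full: discard
        queue.append(packet)
        return True

    def poll(self, send_buffer_free):
        """One polling round over all queues.

        send_buffer_free maps port -> free slots in that port's send buffer.
        A queue forwards one packet only if 1) its port's send buffer is not
        full and 2) its packet counter (queue length) is greater than zero.
        """
        forwarded = []
        for port, queue in self.queues.items():
            if send_buffer_free.get(port, 0) > 0 and len(queue) > 0:
                forwarded.append((port, queue.popleft()))
                send_buffer_free[port] -= 1
        return forwarded
```

Dropping at "nearly full" rather than "full" leaves headroom for packets already in flight inside the engine; the margin used here is only a placeholder.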

Access switch:

As shown in Figure 3, an access switch mainly comprises network interface modules (a downlink network interface module 301 and an uplink network interface module 302), a switching engine module 303, and a CPU module 304.

Packets (uplink data) arriving from the downlink network interface module 301 enter the packet detection module 305. The packet detection module 305 checks whether the packet's destination address (DA), source address (SA), packet type, and packet length meet the requirements; if they do, it assigns a corresponding stream identifier (stream-id) and passes the packet to the switching engine module 303, otherwise the packet is discarded. Packets (downlink data) arriving from the uplink network interface module 302 enter the switching engine module 303, as do packets from the CPU module 304. The switching engine module 303 looks up the address table 306 for each incoming packet to obtain its steering information. If a packet entering the switching engine module 303 is going from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 307 according to its stream identifier (stream-id); if that queue is nearly full, the packet is discarded. If the packet is not going from a downlink network interface to an uplink network interface, it is stored in the queue of the corresponding packet buffer 307 according to its steering information; if that queue is nearly full, the packet is discarded.
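The admission check of packet detection module 305 can be sketched as follows. The required payload lengths come from the packet definitions in section 2.1 of this description (64 bytes for protocol packets, 1056 bytes for unicast or multicast data packets), and the packet type is read from the first DA byte as defined there. The numeric type codes, the rule that a stream is identified by (SA, DA, type), and all names are hypothetical.

```python
from itertools import count


class PacketDetector:
    """Hypothetical sketch of packet detection module 305 in an access switch."""

    def __init__(self, expected_payload_len):
        # Maps packet-type byte -> required payload length, e.g. {0x01: 64, 0x02: 1056}.
        self.expected_payload_len = expected_payload_len
        self._next_id = count(1)
        self.stream_ids = {}  # (sa, da, type) -> stream-id

    def check(self, da: bytes, sa: bytes, payload_len: int):
        """Return a stream-id if DA, SA, type and length are valid, else None (discard)."""
        if len(da) != 8 or len(sa) != 8:
            return None  # both addresses are 8 bytes long
        pkt_type = da[0]  # the first DA byte encodes the packet type
        expected = self.expected_payload_len.get(pkt_type)
        if expected is None or payload_len != expected:
            return None
        key = (sa, da, pkt_type)
        if key not in self.stream_ids:
            self.stream_ids[key] = next(self._next_id)
        return self.stream_ids[key]
```

Reusing the same stream-id for every packet of a flow is what lets the switching engine queue uplink traffic per stream and apply rate control to it.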

The switching engine module 303 polls all packet buffer queues, distinguishing two cases:

If the queue goes from a downlink network interface to an uplink network interface, a packet is forwarded when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero; 3) a token generated by the rate control module has been obtained.

If the queue does not go from a downlink network interface to an uplink network interface, a packet is forwarded when the following conditions are met: 1) the port's send buffer is not full; 2) the queue's packet counter is greater than zero.

The rate control module 308 is configured by the CPU module 304. At programmable intervals it generates tokens for all packet buffer queues going from downlink network interfaces to uplink network interfaces, thereby controlling the bit rate of uplink forwarding.
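The two forwarding cases and the token-based rate control combine into a short sketch. It models rate control module 308 as a simple token bucket refilled once per interval; the refill amount and bucket size stand in for whatever the CPU module configures and are assumptions, as are all names.

```python
class TokenBucket:
    """Hypothetical model of rate control module 308."""

    def __init__(self, tokens_per_interval, max_tokens):
        # Both values would be set by the CPU module; here they are plain parameters.
        self.tokens_per_interval = tokens_per_interval
        self.max_tokens = max_tokens
        self.tokens = 0

    def tick(self):
        """Called once per programmable interval: add tokens up to the cap."""
        self.tokens = min(self.max_tokens, self.tokens + self.tokens_per_interval)

    def take(self):
        """Consume one token if available."""
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False


def may_forward(is_uplink_queue, send_buffer_full, queue_count, bucket):
    """Apply the forwarding conditions for one queue of access switch module 303."""
    # Conditions 1) and 2) apply to every queue.
    if send_buffer_full or queue_count <= 0:
        return False
    # Condition 3) applies only to downlink-to-uplink queues.
    if is_uplink_queue:
        return bucket.take()
    return True
```

Checking the send buffer and packet counter before taking a token means tokens are not consumed in rounds where nothing could be sent anyway.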

The CPU module 304 is mainly responsible for protocol processing with the node server, for configuring the address table 306, and for configuring the rate control module 308.

Ethernet protocol conversion gateway:

As shown in Figure 4, the gateway mainly comprises network interface modules (a downlink network interface module 401 and an uplink network interface module 402), a switching engine module 403, a CPU module 404, a packet detection module 405, a rate control module 408, an address table 406, a packet buffer 407, a MAC adding module 409, and a MAC deleting module 410.

Packets arriving from the downlink network interface module 401 enter the packet detection module 405. The packet detection module 405 checks whether the packet's Ethernet MAC DA, Ethernet MAC SA, Ethernet length or frame type, video network destination address DA, video network source address SA, video network packet type, and packet length meet the requirements. If they do, it assigns a corresponding stream identifier (stream-id), the MAC deleting module 410 then strips the MAC DA, MAC SA, and length or frame type (2 bytes), and the packet enters the corresponding receive buffer; otherwise the packet is discarded.

The downlink network interface module 401 checks the port's send buffer. If it holds a packet, the module determines the Ethernet MAC DA of the corresponding terminal from the packet's video network destination address DA, adds the terminal's Ethernet MAC DA, the Ethernet MAC SA of the Ethernet protocol conversion gateway, and the Ethernet length or frame type, and sends the packet.
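The header handling of the Ethernet protocol conversion gateway amounts to removing and re-adding the 14-byte Ethernet header (6-byte MAC DA, 6-byte MAC SA, 2-byte length or frame type). In this sketch the mapping from video network DA to terminal MAC address is a plain dict, and the EtherType 0x88B5 (one of the IEEE local experimental EtherTypes) is only a placeholder; the text does not state which value the gateway writes. Function names are hypothetical.

```python
import struct

ETH_HEADER_LEN = 14  # 6-byte MAC DA + 6-byte MAC SA + 2-byte length/frame type


def strip_ethernet(frame: bytes) -> bytes:
    """MAC deleting module 410: drop MAC DA, MAC SA and length/type, keep the video network packet."""
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame shorter than an Ethernet header")
    return frame[ETH_HEADER_LEN:]


def add_ethernet(vn_packet: bytes, mac_table: dict, gateway_mac: bytes,
                 eth_type: int = 0x88B5) -> bytes:
    """MAC adding module 409: prepend terminal MAC DA, gateway MAC SA and length/type.

    The terminal MAC is looked up from the video network DA (first 8 bytes);
    a KeyError means the terminal is unknown to the gateway.
    """
    if len(gateway_mac) != 6:
        raise ValueError("MAC addresses are 6 bytes")
    terminal_mac = mac_table[vn_packet[:8]]
    return terminal_mac + gateway_mac + struct.pack("!H", eth_type) + vn_packet
```

Because the video network packet is carried unchanged as the Ethernet payload, the gateway can strip and add headers without parsing anything past the first 8 bytes.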

The other modules of the Ethernet protocol conversion gateway function like those of the access switch.

Terminal:

A terminal mainly comprises a network interface module, a service processing module, and a CPU module. For example, a set-top box mainly comprises a network interface module, a video and audio codec engine module, and a CPU module; an encoding board mainly comprises a network interface module, a video and audio encoding engine module, and a CPU module; and a storage device mainly comprises a network interface module, a CPU module, and a disk array module.

1.3 Devices in the metropolitan area network fall mainly into three categories: node servers, node switches, and metro servers. A node switch mainly comprises a network interface module, a switching engine module, and a CPU module; a metro server likewise mainly comprises a network interface module, a switching engine module, and a CPU module.

2. Video network packet definitions

2.1 Access network packet definition

A packet in the access network mainly consists of the following parts: destination address (DA), source address (SA), reserved bytes, payload (PDU), and CRC.

As shown in the table below, an access network packet mainly consists of the following parts:

DA (8 bytes) | SA (8 bytes) | Reserved (2 bytes) | Payload (PDU, variable length) | CRC (4 bytes)

where:

The destination address (DA) consists of 8 bytes. The first byte indicates the packet type (for example, various protocol packets, multicast data packets, or unicast data packets), allowing up to 256 types; the second through sixth bytes are the metropolitan area network address, and the seventh and eighth bytes are the access network address;

The source address (SA) also consists of 8 bytes and is defined in the same way as the destination address (DA);

The reserved field consists of 2 bytes;

The payload length depends on the packet type: 64 bytes for the various protocol packets and 32 + 1024 = 1056 bytes for unicast and multicast data packets, though other types are not excluded;

The CRC consists of 4 bytes and is computed with the standard Ethernet CRC algorithm.
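The access network packet layout above can be packed and checked with Python's standard library. The text fixes only the field sizes, so this sketch assumes big-endian encoding of the address sub-fields and a CRC computed over DA through payload and appended least-significant byte first, the way the Ethernet FCS is usually handled with zlib.crc32 (which implements the IEEE 802.3 CRC-32). These byte-level choices and all function names are assumptions.

```python
import struct
import zlib

HEADER = struct.Struct("!8s8s2s")  # DA (8 bytes), SA (8 bytes), reserved (2 bytes)


def make_address(pkt_type: int, metro: int, access: int) -> bytes:
    """Byte 1: packet type; bytes 2-6: MAN address (40 bits); bytes 7-8: access address (16 bits)."""
    if not (0 <= pkt_type < 2**8 and 0 <= metro < 2**40 and 0 <= access < 2**16):
        raise ValueError("address field out of range")
    return bytes([pkt_type]) + metro.to_bytes(5, "big") + access.to_bytes(2, "big")


def build_packet(da: bytes, sa: bytes, payload: bytes) -> bytes:
    """Assemble DA | SA | Reserved | Payload and append a 4-byte CRC-32."""
    body = HEADER.pack(da, sa, b"\x00\x00") + payload
    return body + struct.pack("<I", zlib.crc32(body))


def parse_packet(pkt: bytes) -> dict:
    """Verify the CRC and split the DA into type, MAN address and access address."""
    if len(pkt) < HEADER.size + 4:
        raise ValueError("packet too short")
    body, (crc,) = pkt[:-4], struct.unpack("<I", pkt[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    da, sa, _reserved = HEADER.unpack_from(body)
    return {
        "type": da[0],
        "metro": int.from_bytes(da[1:6], "big"),
        "access": int.from_bytes(da[6:8], "big"),
        "sa": sa,
        "payload": body[HEADER.size:],
    }
```

With a 1056-byte data payload the whole packet is 8 + 8 + 2 + 1056 + 4 = 1078 bytes; a 64-byte protocol packet comes to 86 bytes.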

2.2 Metropolitan area network packet definition

The topology of the metropolitan area network is a graph, so there may be two or even more connections between two devices, for example between a node switch and a node server, or between two node switches. However, each MAN device has only one MAN address, so the address alone cannot distinguish these connections. To describe the connection relationships between MAN devices precisely, the embodiments of this application introduce a parameter, the label, to uniquely describe a MAN device.

Labels in this specification are defined similarly to labels in MPLS (Multi-Protocol Label Switching). Suppose there are two connections between device A and device B; then packets from device A to device B have 2 labels, and packets from device B to device A also have 2 labels. Labels are divided into incoming labels and outgoing labels: if the label of a packet entering device A (its incoming label) is 0x0000, its label when it leaves device A (its outgoing label) may become 0x0001. Network access in the MAN is a centrally controlled process, meaning that address allocation and label allocation in the MAN are directed by the metro server, while node switches and node servers merely execute passively. This differs from MPLS, where label allocation is the result of negotiation among switches and servers.
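The incoming/outgoing label behavior resembles an MPLS-style swap table. The hypothetical sketch below keys each entry by (incoming port, incoming label), so two parallel links between the same pair of devices stay distinguishable, and exposes only an install method intended to be driven by the metro server, mirroring centralized rather than negotiated allocation. The 16-bit range check follows the label format in section 2.2 of this description; port numbers and all names are assumptions.

```python
class LabelTable:
    """Hypothetical per-device label table in a MAN node switch or node server."""

    def __init__(self):
        # (incoming port, incoming label) -> (outgoing port, outgoing label)
        self.entries = {}

    def install(self, in_port, in_label, out_port, out_label):
        """Install an entry; in this model only the metro server calls this."""
        for label in (in_label, out_label):
            if not 0 <= label <= 0xFFFF:
                raise ValueError("only the low 16 bits of a label are used")
        self.entries[(in_port, in_label)] = (out_port, out_label)

    def swap(self, in_port, in_label):
        """Return (outgoing port, outgoing label), or None if no entry exists."""
        return self.entries.get((in_port, in_label))
```

For example, installing (1, 0x0000) -> (3, 0x0001) reproduces the 0x0000 to 0x0001 swap in the paragraph above, while a second link arriving on port 2 can carry its own entry with the same incoming label.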

As shown in the table below, a metropolitan area network packet mainly consists of the following parts:

DA (8 bytes) | SA (8 bytes) | Reserved (2 bytes) | Label (4 bytes) | Payload (PDU) | CRC (4 bytes)

That is: destination address (DA), source address (SA), reserved bytes (Reserved), label, payload (PDU), and CRC. The label format can be defined as follows: the label is 32 bits, of which the upper 16 bits are reserved and only the lower 16 bits are used, and it sits between the reserved bytes and the payload of the packet.
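The position and width of the label field can be illustrated with a short sketch that builds a MAN packet and reads the label back. Big-endian packing of the label and the CRC-32 placement (computed over everything before it, appended least-significant byte first) are assumptions, as are the function names.

```python
import struct
import zlib

LABEL_OFFSET = 8 + 8 + 2  # the label follows DA (8), SA (8) and the 2 reserved bytes


def build_metro_packet(da: bytes, sa: bytes, label: int, payload: bytes) -> bytes:
    """Assemble DA | SA | Reserved | Label | Payload | CRC (byte order assumed)."""
    if len(da) != 8 or len(sa) != 8:
        raise ValueError("DA and SA must be 8 bytes each")
    if not 0 <= label <= 0xFFFF:
        raise ValueError("only the low 16 bits of the 32-bit label are used")
    # The upper 16 bits of the 32-bit label field stay zero (reserved).
    body = da + sa + b"\x00\x00" + struct.pack("!I", label) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def read_label(packet: bytes) -> int:
    """Extract the 16-bit label, ignoring the reserved upper half of the field."""
    (field,) = struct.unpack_from("!I", packet, LABEL_OFFSET)
    return field & 0xFFFF
```

Compared with the access network packet, the only structural change is the extra 4 bytes at offset 18, so a node switch can rewrite the label in place without touching the payload (it would then need to recompute the CRC).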

Owing to these characteristics of the video network, more and more video conferences are held over it, and their application scenarios are increasingly diverse. To guarantee conference quality across these scenarios, the audio call quality in a video conference must remain stable.

For example, one application scenario is a video conference for commanding drones. In this conference, a video network terminal on the video network carries out audio and video communication with a mobile terminal on the Internet. The mobile terminal has an application installed for this scenario, which provides the user with local services for drone control and for audio and video communication with the video network terminal.

In this scenario, to guarantee the audio call quality of a mobile terminal that is often used in the field, echo must be kept out of the audio the mobile terminal sends. The current approach is to connect the application to a third-party echo suppression tool, which avoids echo by setting a time interval. However, connecting to a third-party echo suppression tool requires wrapping the third-party library in C as a .so library (a dynamic link library) that the mobile terminal can call, and invoking that library through JNI (Java Native Interface), which is very cumbersome. This not only increases developers' workload, but the echo suppression is also poor: after a call has been going on for a while, a buzzing sound appears and degrades call quality.

Against this background, and taking into account both the characteristics of the video network and the application described above, the applicant conceived one of the technical ideas of this application, aimed at improving audio call quality on the mobile terminal side in this scenario and at least solving the problem of buzzing that appears after a call has gone on for some time. Specifically, when the application object detects that the audio and video call service has been started, it triggers a preset audio collection mode on the mobile terminal, so that the mobile terminal collects second audio data in that mode. Echo cancellation is then performed on the second audio data according to the first audio data (the far-end audio received and played on the mobile terminal), so that the echo produced by playing the first audio data is removed from the second audio data. This avoids calling a third-party echo suppression library that relies on a time interval chosen so that the human ear cannot distinguish the echo, an approach that makes third-party integration cumbersome and lets echo accumulate over time, lowering call quality.

Figure 5 shows an application scenario of the video-network-based audio processing method of an embodiment of this application. In this scenario, a video network terminal and an Internet terminal hold a video network video conference. The Internet terminal establishes a communication connection with a drone through a remote controller: the remote controller sends control instructions to the drone to control its flight altitude and heading, and the pictures and flight data captured by the drone are sent back to the Internet terminal, which can in turn send them to the video network terminal over the video network.

A video network terminal is deployed in the video network and is communicatively connected to a streaming media server, which is in turn communicatively connected to the mobile terminal. The mobile terminal, the drone, and the remote controller can all be deployed on the Internet, and the streaming media server can communicate both with the video network terminal on the video network and with the mobile terminal on the Internet.

An audio playback component is configured on the mobile terminal; it may be, but is not limited to, an audio player for formats such as MP3 or MP4. An application object is also provided on the mobile terminal and corresponds to the streaming media server. The application object can provide the user with local services such as drone data analysis, storage, and transmission, as well as video network audio and video conferencing, while the streaming media server provides back-end services for them, such as data forwarding.

In the embodiments of this application, the mobile terminal may be a terminal running the Android system, such as an Android phone or an Android tablet.

Figure 6 shows a flowchart of the steps of a video-network-based audio processing method according to an embodiment of this application. The method can be applied to the application object provided on the mobile terminal and, as shown in Figure 6, may specifically include the following steps:

Step S601: when it is detected that a preset audio and video call service has been started, trigger a preset audio collection mode on the mobile terminal.

In the embodiments of this application, a preset audio and video call service means that the service is preconfigured in the application object as one of several services it provides. For example, the application object may provide drone data analysis, audio and video calls, surveillance playback, and other services.

In a specific implementation, the application object can detect whether the audio and video call service has been started from the user's operations on that service: when the user performs a start operation on the audio and video call service, the application object detects it and, in response, triggers the preset audio collection mode on the mobile terminal.

The preset audio collection mode is an audio collection mode preset on the mobile terminal that corresponds to the audio and video call service of the application object. It controls how the mobile terminal collects audio data and preprocesses it. In practice, when the audio and video call service starts, the application object triggers this mode so that the mobile terminal collects audio data in it. "Triggering" in this embodiment means starting, that is, starting the preset audio collection mode.

In practice, the mobile terminal may be configured with several native audio collection modes. In this embodiment, the audio data collected by the mobile terminal in each native mode is compared, and the mode whose audio data contains the least environmental noise is chosen as the preset audio collection mode.

For example, taking an Android phone as the mobile terminal, the device provides 9 native audio collection modes:

AudioSource.DEFAULT: default mode;

AudioSource.MIC: microphone mode;

AudioSource.VOICE_UPLINK: telephone uplink mode;

AudioSource.VOICE_DOWNLINK: telephone downlink mode;

AudioSource.VOICE_CALL: telephone uplink + downlink mode;

AudioSource.CAMCORDER: camcorder mode;

AudioSource.VOICE_RECOGNITION: voice recognition mode;

AudioSource.VOICE_COMMUNICATION: voice communication mode, for example VoIP (Voice over Internet Protocol);

AudioSource.REMOTE_SUBMIX: remote submix mode, for example Wi-Fi Display (wireless display).

Actual testing showed that the audio data collected by the mobile terminal in the voice communication mode contains the least noise, so the voice communication mode can be set as the preset audio collection mode. The application object on the Android phone then triggers the voice communication mode when it detects that the audio and video call service has started.
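The selection rule, choosing the native mode whose capture contains the least ambient noise, can be sketched by comparing the RMS level of a background recording made in each mode. The dict below lists the integer values of the android.media.MediaRecorder.AudioSource constants named above; the "record while nobody speaks" procedure, the RMS criterion, and the function names are assumptions, since the text only says the choice came from actual testing.

```python
import math

# Integer values of the android.media.MediaRecorder.AudioSource constants listed above.
AUDIO_SOURCES = {
    "DEFAULT": 0,
    "MIC": 1,
    "VOICE_UPLINK": 2,
    "VOICE_DOWNLINK": 3,
    "VOICE_CALL": 4,
    "CAMCORDER": 5,
    "VOICE_RECOGNITION": 6,
    "VOICE_COMMUNICATION": 7,
    "REMOTE_SUBMIX": 8,
}


def rms(samples):
    """Root-mean-square level of a list of PCM samples."""
    if not samples:
        return float("inf")  # an empty capture can never be chosen as quietest
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def pick_quietest_source(noise_captures):
    """Return the AudioSource name whose background capture has the lowest RMS level.

    noise_captures maps a source name to samples recorded in that mode while
    nobody is talking (a hypothetical test procedure).
    """
    unknown = set(noise_captures) - set(AUDIO_SOURCES)
    if unknown:
        raise ValueError(f"unknown audio sources: {sorted(unknown)}")
    return min(noise_captures, key=lambda name: rms(noise_captures[name]))
```

On a device, the chosen integer would be passed as the audioSource argument of android.media.AudioRecord when the call service starts.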

Step S602: receive the first audio data sent by the streaming media server in the audio and video call service, and call the audio playback component to play the first audio data.

The first audio data is sent to the streaming media server by the video network terminal.

In this embodiment, the first audio data is collected by the video network terminal, sent by it over the video network to the streaming media server, and then forwarded by the streaming media server to the application object on the mobile terminal.

In a specific implementation, the mobile terminal can call the audio playback component to decode and play the first audio data while also caching it, so that the cached first audio data can later be used to perform echo cancellation on audio data newly collected by the mobile terminal.
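Caching the played first audio data as an echo reference can be as simple as a bounded buffer fed at the moment each decoded frame is handed to the player. The class below is a hypothetical sketch; a real implementation would also have to account for the playback latency of the audio output path, which the text does not discuss.

```python
from collections import deque


class FarEndBuffer:
    """Hypothetical cache of recently played far-end samples (the first audio data)."""

    def __init__(self, max_samples):
        # Oldest samples fall off automatically once max_samples is exceeded.
        self.samples = deque(maxlen=max_samples)

    def on_play(self, frame):
        """Record a decoded frame at the moment it is handed to the audio player."""
        self.samples.extend(frame)

    def reference(self, n):
        """Return the last n played samples, zero-padded at the front if fewer exist."""
        if n <= 0:
            return []
        recent = list(self.samples)[-n:]
        return [0.0] * (n - len(recent)) + recent
```

Bounding the buffer keeps memory constant during long calls; it only needs to cover the longest echo delay the canceller is expected to handle.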

Step S603: obtain the second audio data collected by the mobile terminal in the audio collection mode.

In practice, after the first audio data has been played, the application object starts obtaining the second audio data collected by the mobile terminal in the audio collection mode. While the first audio data is playing, the sound emitted by the playback component is reflected by the surroundings and forms an echo. Because the echo travels a longer path than the directly propagated sound, it arrives later, so after the first audio data has been played, its echo is picked up by the mobile terminal together with the user's speech. The second audio data therefore contains echo audio data, that is, the played first audio data as reflected by the environment.
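Because the echo arrives later than the played signal, the delay between the cached first audio data and the echo inside the second audio data can be estimated by cross-correlation. This is a standard technique swapped in for illustration, not one the text prescribes; the white-noise test signal and all names are assumptions.

```python
import random


def white_noise(n, seed=0):
    """Deterministic test signal: n uniform samples in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def estimate_echo_delay(far_end, mic, max_delay):
    """Return the lag (in samples) at which mic correlates best with far_end."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        # Correlate far_end[i] with mic[i + lag] over the overlapping range.
        score = sum(
            far_end[i] * mic[i + lag]
            for i in range(min(len(far_end), len(mic) - lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For example, if the microphone capture is the far-end signal delayed by 7 samples, scaled by 0.5, plus unrelated near-end speech, the correlation peaks at lag 7. Knowing this lag tells an echo canceller how much far-end history it must keep.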

In this embodiment, because the audio data collected in the preset audio collection mode contains the least environmental noise, the second audio data collected in this mode also contains little environmental noise, which improves its audio quality. Environmental noise here means noise produced in the surroundings of the mobile terminal, and echo is one kind of environmental noise; when the second audio data collected in the preset mode contains little environmental noise, it also contains little echo. The collected second audio data is therefore of higher quality, which yields a better echo cancellation result.

Step S604: perform echo cancellation on the second audio data according to the first audio data and the second audio data, to obtain echo-cancelled target audio data.

In this embodiment, once the second audio data is obtained, the echo audio data it contains can be removed according to the cached first audio data, yielding target audio data free of echo audio data. The echo audio data is the audio data, collected by the mobile terminal, that is produced when the environment reflects the first audio data being played.

这样,在得到的目标音频数据中便不包括回声音频数据,由此,实现了在音视频通话业务中,在移动终端侧的回声消除处理。In this way, the echo audio data is not included in the obtained target audio data, thereby realizing the echo cancellation processing on the mobile terminal side in the audio and video call service.

步骤S605,将所述目标音频数据发送给所述流媒体服务器,所述流媒体服务器用于将所述目标音频数据发送给所述视联网终端。Step S605, sending the target audio data to the streaming media server, and the streaming media server is used to send the target audio data to the Internet-of-Video terminal.

在得到目标音频数据后,则应用程序对象可以将该目标音频数据发送给流媒体服务器,以使流媒体服务器通过视联网将该目标音频数据发送给视联网终端。由于目标音频数据中不包括回声音频数据,则视联网终端在播放该目标音频数据时,视联网终端的用户听到的便是清晰的移动终端的用户的声音,而不会听到自己前次发出的语音,从而提高了通话质量。After obtaining the target audio data, the application program object can send the target audio data to the streaming media server, so that the streaming media server can send the target audio data to the video networking terminal through the video network. Since echo audio data is not included in the target audio data, when the video network terminal plays the target audio data, what the user of the video network terminal hears is the clear voice of the user of the mobile terminal, and will not hear his last voice, thereby improving call quality.

本申请实施例中,由于在该预设的音频采集模式下采集的第二音频数据中包括的环境噪声较少,表征其中的回声也较少。因此,本申请可以提高了采集的第二音频数据的质量。又由于是根据第一音频数据对该第二音频数据进行回声消除处理,使得在得到的目标音频数据中不包括回声音频数据,提高了传送出去的目标音频数据的音频质量。相比于设置时间间隔的方式,本申请发送出去的目标音频数据本身并不携带回声音频数据,因此,可以保证通话质量,实现清晰通话。避免了时间间隔设置不合理所带来的人耳仍能听到回声,以及随着时间的推移,在传送出去的音频数据中回声累积越来越多以产生嗡嗡声的问题。In the embodiment of the present application, since the second audio data collected in the preset audio collection mode contains less environmental noise, less echoes are represented therein. Therefore, the present application can improve the quality of the collected second audio data. Furthermore, since the echo cancellation process is performed on the second audio data according to the first audio data, the obtained target audio data does not include echo audio data, thereby improving the audio quality of the transmitted target audio data. Compared with the way of setting the time interval, the target audio data sent by this application does not carry the echo audio data itself, so the call quality can be guaranteed and a clear call can be realized. It avoids the problem that the human ear can still hear the echo caused by the unreasonable time interval setting, and as time goes by, the echo accumulates more and more in the transmitted audio data to generate a buzzing sound.

Building on the above embodiments, in an optional example in which the mobile terminal is equipped with a first microphone and a second microphone, triggering the preset audio collection mode on the mobile terminal in step S601 specifically includes the following step:

Step S6011: invoke the first microphone and the second microphone.

In practice, the first microphone and the second microphone can be placed at different positions on the mobile terminal; optionally, the first microphone is at the bottom of the mobile terminal and the second microphone is at the top. As described for the audio collection modes in step S601, the mobile terminal can provide several built-in audio collection modes. Each mode may use a different set of microphones, and the audio data captured by those microphones may be preprocessed differently.

In this embodiment, the preset audio collection mode may correspond to invoking the first microphone and the second microphone simultaneously. That is, when the preset audio collection mode is triggered, the application program object invokes both microphones at the same time and uses them to capture audio data.

Correspondingly, step S603 may specifically include the following steps:

Step S6031: acquire the first microphone audio data captured by the first microphone and the second microphone audio data captured by the second microphone.

In practice, once invoked, the first microphone and the second microphone capture audio data simultaneously.

Because the first microphone and the second microphone are located at different positions on the mobile terminal, the audio data they capture differs. Specifically, during a call the top (second) microphone and the bottom (first) microphone are at different distances from the user, so the user's voice has a different volume in the first microphone audio data than in the second microphone audio data, while the background noise picked up by the two microphones has roughly the same volume. This difference can be used to filter out the noise and preserve the voice.

Step S6032: perform noise reduction on the second microphone audio data according to the first microphone audio data and the second microphone audio data, to obtain the second audio data.

In this embodiment, in the preset audio collection mode, noise reduction may be performed on the second microphone audio data captured by the second microphone. Specifically, the first microphone audio data and the second microphone audio data can be decoded to generate a compensation signal, and the second microphone audio data is then denoised according to that compensation signal, removing the ambient noise it contains and yielding the noise-reduced second audio data.
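The source does not say how the compensation signal is computed. As a hedged illustration of the level-difference principle described under step S6031, the sketch below swaps in a simpler technique: a frame-wise gate that attenuates frames in which both microphones see similar levels (treated as noise) and keeps frames with a clear level difference (treated as the talker's voice). The function name, frame length, threshold, and attenuation are all assumptions; practical dual-microphone suppressors usually work per frequency band rather than per whole frame.

```python
import numpy as np


def level_difference_gate(target, other, frame_len=160, threshold_db=6.0, noise_gain=0.1):
    """Attenuate frames of `target` whose level is close to that of `other`.

    Speech from the nearby talker reaches the two microphones at clearly different
    levels, while diffuse background noise reaches them at similar levels, so frames
    with a small inter-microphone level difference are treated as noise.
    """
    target = np.asarray(target, dtype=float)
    other = np.asarray(other, dtype=float)
    if target.shape != other.shape:
        raise ValueError("both microphone signals must have the same length")

    out = target.copy()
    eps = 1e-12  # avoids log10(0) on silent frames
    for start in range(0, len(target), frame_len):
        t = target[start:start + frame_len]
        o = other[start:start + frame_len]
        level_t = 10 * np.log10(np.mean(t ** 2) + eps)
        level_o = 10 * np.log10(np.mean(o ** 2) + eps)
        if abs(level_t - level_o) < threshold_db:
            out[start:start + frame_len] *= noise_gain  # similar levels: likely noise
    return out
```

Using the absolute level difference keeps the sketch neutral about which microphone is nearer the mouth; `target` is simply the channel being denoised (the second microphone audio data in this embodiment).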

Building on the above embodiments, in an optional example, step S604 may specifically include the following steps:

Step S6041: determine, within the second audio data, third audio data corresponding to the first audio data.

In this optional example, when echo cancellation is performed on the second audio data, the echo audio data it contains is the reflection produced by playing the first audio data, so the echo audio data is related to the first audio data. For example, the echo audio data and the first audio data both carry the same user's voice; although the echo has been reflected, its acoustic features remain essentially the same. Therefore, speech recognition technology can be used to find, within the second audio data, third audio data whose degree of match with the acoustic features of the first audio data exceeds a preset matching degree. The third audio data determined in this way is the echo audio data reflected from playback of the first audio data. The acoustic features may be frequency features or spectral features of the audio data.

Step S6042: filter the third audio data out of the second audio data, to obtain target audio data from which the third audio data has been removed.

In practice, once the third audio data is determined, it can be removed from the second audio data to obtain the target audio data.
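Steps S6041 and S6042 can be illustrated with a deliberately simple stand-in. The source matches acoustic features using speech recognition technology; the sketch below instead uses normalized time-domain cross-correlation as the "degree of match", with `threshold` playing the role of the preset matching degree. It assumes the echo is a single delayed, scaled copy of the first audio data, which real rooms do not satisfy; the function name and parameters are hypothetical.

```python
import numpy as np


def remove_matched_echo(captured, reference, max_lag, threshold=0.5):
    """Find a delayed, scaled copy of `reference` in `captured` and subtract it.

    Returns (cleaned, lag); lag is None when no segment matches well enough.
    """
    captured = np.asarray(captured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(reference)
    ref_energy = float(reference @ reference)

    # S6041: locate the segment that best matches the reference above the threshold.
    best_lag, best_score = None, threshold
    for lag in range(max_lag + 1):
        segment = captured[lag:lag + n]
        if len(segment) < n:
            break  # the reference no longer fits inside the capture
        denom = np.sqrt(ref_energy * float(segment @ segment))
        if denom == 0.0:
            continue  # silent segment or silent reference: nothing to match
        score = float(segment @ reference) / denom  # normalized correlation in [-1, 1]
        if score > best_score:
            best_lag, best_score = lag, score

    # S6042: subtract the least-squares scaled reference at the matched position.
    cleaned = captured.copy()
    if best_lag is not None:
        segment = captured[best_lag:best_lag + n]
        gain = float(segment @ reference) / ref_energy
        cleaned[best_lag:best_lag + n] -= gain * reference
    return cleaned, best_lag
```

If near-end speech overlaps the echo, the gain estimate is biased; the adaptive-filter approach described next is the more robust way to handle that case.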

Correspondingly, in an optional example, when the preset audio collection mode on the mobile terminal is triggered, the method may further include the following step:

Step S6012: invoke the adaptive filter provided on the mobile terminal.

In this optional example, to make echo cancellation on the second audio data more efficient and to reduce the amount of low-level development required for the application program object, the application program object can invoke the adaptive filter already provided on the mobile terminal and have that filter perform echo cancellation on the second audio data.

Optionally, step S6012 may specifically include the following steps:

Step S6012-1: determine at least one application program interface on the video network terminal that is adapted to the application program object, and determine whether a target interface exists among the at least one application program interface.

In practice, different low-level development software yields different adaptive filters, which may differ in the efficiency and quality of their echo cancellation, and each adaptive filter corresponds to a different application program interface.

In this embodiment, a calling relationship can be established between the application program object and the application program interface of each of these adaptive filters. Determining the application program interfaces on the video network terminal that are adapted to the application program object thus means determining the at least one application program interface on that terminal with which the application program object has a calling relationship. When performing echo cancellation on the second audio data, the application program object can then invoke the adaptive filter provided on the mobile terminal simply by calling one of these interfaces. Compared with relying on a third-party echo suppression tool, this simplifies the calling process and improves efficiency.

The target interface may be preset. Specifically, each application program interface has its own identifier, so determining whether a target interface exists among the at least one application program interface can mean determining whether any of them has an identifier matching the identifier of the target interface.

In an optional example, because different adaptive filters differ in the efficiency and quality of their echo cancellation, each of the at least one application program interface may have its own priority: the higher the priority, the more efficient and higher-quality the echo cancellation performed by the corresponding adaptive filter. In this embodiment, the target interface may then be an application program interface whose priority is higher than a preset priority; that is, it can be determined whether the at least one application program interface includes a target interface with a priority higher than the preset priority.

In a specific implementation, taking an Android phone as the mobile terminal, the target interface may be the AEC (Acoustic Echo Canceler) interface. Because AEC allows an echo cancellation routine to be developed very quickly, an adaptive filter built on AEC can cancel echo in audio data quickly and with good quality, which improves the audio call quality of the present application.

If the target interface exists, proceed to step S6012-2; if it does not, proceed to step S6012-3.

Step S6012-2: invoke the adaptive filter corresponding to the target interface through the target interface.

In practice, when the target interface exists among the at least one application program interface, the adaptive filter corresponding to that interface can be invoked through it.

For example, if an AEC interface exists, the adaptive filter corresponding to the AEC interface is invoked through the AEC interface.

Step S6012-3: invoke the adaptive filter corresponding to a preset application program interface through the preset application program interface.

In this embodiment, the preset application program interface may be a backup interface among the at least one application program interface. In practice, one of the application program interfaces that have a calling relationship with the application program object can be designated in advance as the backup interface; when it is determined that no target interface exists among the at least one application program interface, the application program object calls this backup interface instead.

For example, with an Android phone as the mobile terminal and the AEC interface as the target interface, the AEC interface may not be supported on every device model. In that case the Speex interface can serve as the preset application program interface, and the adaptive filter is invoked through it (Speex is an open-source speech processing project whose SpeexDSP library includes an acoustic echo canceller). Because Speex works across mobile terminal models and is therefore broadly compatible, the Speex interface is suitable as the backup application program interface.

Correspondingly, in an optional example, because each application program interface can have its own priority, the preset application program interface can be assigned the priority level immediately below the preset priority.
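The selection logic of steps S6012-1 to S6012-3 can be modeled as below. This is a sketch of the decision only, with hypothetical names (`EchoApi`, `select_echo_api`) and priorities as plain integers; it does not call any platform API. On a real Android device, the `available` flag for the platform canceller would typically come from `android.media.audiofx.AcousticEchoCanceler.isAvailable()`, while the backup (for example a Speex-based filter) is assumed to work on every model, as the text states.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EchoApi:
    identifier: str  # hypothetical identifier, e.g. "aec" or "speex"
    priority: int    # higher means better echo-cancellation efficiency and quality
    available: bool  # whether this interface works on the current device model


def select_echo_api(apis: Sequence[EchoApi], preset_priority: int, fallback_id: str) -> EchoApi:
    """Return the interface through which the adaptive filter should be invoked."""
    # S6012-1: target interfaces are usable interfaces ranked above the preset priority.
    targets = [a for a in apis if a.available and a.priority > preset_priority]
    if targets:
        # S6012-2: invoke the filter through the highest-priority target interface.
        return max(targets, key=lambda a: a.priority)
    # S6012-3: otherwise fall back to the designated backup interface.
    for a in apis:
        if a.identifier == fallback_id:
            return a
    raise LookupError(f"backup interface {fallback_id!r} is not registered")


apis = [EchoApi("aec", priority=2, available=False), EchoApi("speex", priority=1, available=True)]
print(select_echo_api(apis, preset_priority=1, fallback_id="speex").identifier)  # speex
```

Giving the backup the priority just below the preset priority, as the preceding paragraph describes, means it is never chosen as a target but always sits first in line as the fallback.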

Once the adaptive filter has been invoked successfully, it can be used to perform echo cancellation on the second audio data. Specifically, the adaptive filter can carry out steps S6041 and S6042 described above, where step S6041 may specifically include the following steps:

S60411: input the first audio data into the adaptive filter to obtain the output audio data produced by the adaptive filter.

Because the first audio data was buffered when it was received, it can be retrieved from the buffer and fed into the adaptive filter, which processes it to produce the output audio data.
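The buffering described here can be pictured as a first-in, first-out cache of played frames: each frame of first audio data is stored when it is handed to the player, and the oldest stored frame is retrieved as the reference for the echo canceller when the corresponding captured frame arrives. The class below is a hypothetical sketch; frame granularity, the one-to-one pairing of played and captured frames, and the bounded size are assumptions not stated in the source.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


class FarEndBuffer:
    """FIFO cache of played far-end frames (the first audio data)."""

    def __init__(self, max_frames: int) -> None:
        # Bounded so that a stalled capture path cannot grow memory without limit;
        # when full, the oldest frame is silently dropped.
        self._frames: Deque[List[float]] = deque(maxlen=max_frames)

    def on_played(self, frame: Iterable[float]) -> None:
        """Store a copy of a frame at the moment it is handed to the player."""
        self._frames.append(list(frame))

    def pop_reference(self) -> Optional[List[float]]:
        """Return the oldest unconsumed played frame, or None if nothing is buffered."""
        return self._frames.popleft() if self._frames else None
```

When `pop_reference()` returns `None`, a caller would typically feed a frame of zeros to the filter, since nothing was played that could have produced an echo.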

Specifically, taking the adaptive filter behind the AEC interface as an example, let the first audio data be x(n) and input x(n) into the adaptive filter. For each sample of the input sequence x(n), the adaptive filter updates and adjusts its weighting coefficients according to a specific algorithm so as to minimize the mean square error between its output sequence y(n) and the desired signal sequence d(n); that is, the output y(n) approximates d(n). In echo cancellation, d(n) is the signal captured by the microphone, and only its echo component is correlated with x(n), so the closer y(n), which is computed from x(n), approaches d(n), the more accurately it reproduces the echo of x(n). The output sequence y(n) is the output audio data, and the coefficients of an adaptive filter designed under the minimum mean square error criterion are given by the solution of the Wiener-Hopf equation.
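The criterion in the previous paragraph can be written out as follows. This is the textbook formulation for a length-L FIR adaptive filter, not something the source specifies: the filter length L, the step size μ, the small regularizer ε, and the choice of the normalized LMS (NLMS) update in the last line are assumptions used for illustration. The third line is the Wiener-Hopf equation, whose solution w_o gives the minimum-MSE coefficients; the last line is a common iterative approximation that avoids estimating R and p explicitly.

```latex
\begin{aligned}
\mathbf{x}(n) &= [\,x(n),\ x(n-1),\ \dots,\ x(n-L+1)\,]^{T}, \qquad
y(n) = \mathbf{w}^{T}(n)\,\mathbf{x}(n), \qquad
e(n) = d(n) - y(n) \\
J(\mathbf{w}) &= E\{e^{2}(n)\} \\
\mathbf{R}\,\mathbf{w}_{o} &= \mathbf{p}, \qquad
\mathbf{R} = E\{\mathbf{x}(n)\,\mathbf{x}^{T}(n)\}, \qquad
\mathbf{p} = E\{\mathbf{x}(n)\,d(n)\} \\
\mathbf{w}(n+1) &= \mathbf{w}(n) + \frac{\mu\, e(n)\, \mathbf{x}(n)}{\epsilon + \mathbf{x}^{T}(n)\,\mathbf{x}(n)}
\end{aligned}
```

In this notation the residual e(n) is the echo-cancelled signal, which corresponds to the target audio data.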

S60412: determine, within the second audio data, third audio data having the same frequency as the output audio data.

Here, the application program object can use the adaptive filter to determine, within the second audio data, third audio data having the same frequency as the output audio data. Because the output audio data is derived from x(n), third audio data with the same frequency as the output audio data can be taken to be audio data related to the first audio data.

Correspondingly, step S6042 may specifically be the following step:

Step S6043: use the adaptive filter to filter the third audio data out of the second audio data, obtaining target audio data from which the third audio data has been removed.

In practice, once the third audio data is determined, the adaptive filter can also be used to filter it out of the second audio data.
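Steps S60411, S60412, and S6043 together amount to estimating the echo from the buffered first audio data and subtracting it from the captured signal. The following is a minimal NLMS sketch with NumPy, implementing the update given for the Wiener-Hopf criterion. It assumes single-channel float samples that are already time-aligned and of equal length; `taps`, `mu`, and `eps` are illustrative values, and this is not a description of the algorithm inside Android's AEC or SpeexDSP, which the source does not detail.

```python
import numpy as np


def nlms_echo_cancel(far_end, mic, taps=64, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` (first audio data) from `mic` (second audio data).

    Returns (residual, weights): the residual e(n) = d(n) - y(n) plays the role of
    the target audio data, and `weights` is the final estimate of the echo path.
    """
    x = np.asarray(far_end, dtype=float)
    d = np.asarray(mic, dtype=float)
    if x.shape != d.shape:
        raise ValueError("far_end and mic must have the same length")

    w = np.zeros(taps)      # adaptive filter coefficients
    x_buf = np.zeros(taps)  # most recent far-end samples, x(n) first
    residual = np.empty_like(d)

    for n in range(len(d)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y = w @ x_buf             # echo estimate (the "output audio data")
        residual[n] = d[n] - y    # echo-cancelled sample
        # NLMS update: step size normalized by the far-end energy in the window.
        w += mu * residual[n] * x_buf / (eps + x_buf @ x_buf)

    return residual, w


rng = np.random.default_rng(1)
far = rng.standard_normal(4000)
echo_path = np.array([0.0, 0.0, 0.5, 0.0, -0.25])  # hypothetical room response
mic = np.convolve(far, echo_path)[:len(far)]
residual, w = nlms_echo_cancel(far, mic)
```

With no near-end speech, the residual decays toward zero as `w` converges to the echo path. During double-talk the near-end voice disturbs the update, which is why production cancellers add double-talk detection.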

It should be noted that, for simplicity of description, the method embodiments are presented as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, since according to these embodiments some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.

Referring to FIG. 7, a structural block diagram of a video-network-based audio processing device according to an embodiment of the present application is shown. A video network terminal is deployed in the video network and is communicatively connected to a streaming media server, which is in turn communicatively connected to a mobile terminal equipped with an audio playback component. The device is applied to an application program object provided on the mobile terminal, may specifically be a virtual device, and may include the following modules:

An audio mode trigger module 701, configured to trigger the preset audio data collection mode on the mobile terminal when it detects that the preset audio call service has been started;

An audio data receiving and playing module 702, configured to receive the first audio data sent by the streaming media server and to invoke the audio playback component to play it, the first audio data having been sent to the streaming media server by the video network terminal;

An audio data collection module 703, configured to acquire the second audio data captured by the mobile terminal in the audio data collection mode;

An audio data processing module 704, configured to perform echo cancellation on the second audio data according to the first audio data and the second audio data, to obtain target audio data after echo cancellation;

An audio data sending module 705, configured to send the target audio data to the streaming media server, which forwards the target audio data to the video network terminal.
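How modules 701 to 705 cooperate can be sketched as a small pipeline in which each module is a pluggable callable. The class and field names are hypothetical, and frames are plain lists of floats for brevity; the ordering (play, then capture, then cancel, then send) follows the method steps, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[float]


@dataclass
class AudioProcessingDevice:
    """Hypothetical wiring of modules 701-705; each field stands in for one module."""
    trigger_capture_mode: Callable[[], None]      # module 701
    play: Callable[[Frame], None]                 # module 702 (playback side)
    capture: Callable[[], Frame]                  # module 703
    cancel_echo: Callable[[Frame, Frame], Frame]  # module 704: (first, second) -> target
    send: Callable[[Frame], None]                 # module 705: to the streaming media server

    def on_call_started(self) -> None:
        self.trigger_capture_mode()

    def on_first_audio(self, first: Frame) -> None:
        """Handle one frame of first audio data received from the streaming media server."""
        self.play(first)
        second = self.capture()
        target = self.cancel_echo(first, second)
        self.send(target)
```

Injecting each module as a callable keeps the sketch testable with stubs and lets, for example, the echo canceller be swapped according to the interface selected in step S6012.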

Optionally, the mobile terminal is equipped with a first microphone and a second microphone, and the audio mode trigger module may specifically be configured to invoke the first microphone and the second microphone;

The audio data collection module may specifically include the following units:

A microphone audio data acquisition unit, configured to acquire the first microphone audio data captured by the first microphone and the second microphone audio data captured by the second microphone;

A noise reduction unit, configured to perform noise reduction on the second microphone audio data according to the first microphone audio data and the second microphone audio data, to obtain the second audio data.

Optionally, the audio data processing module may specifically include the following units:

An audio data search unit, configured to determine, within the second audio data, third audio data corresponding to the first audio data;

An audio data filtering unit, configured to filter the third audio data out of the second audio data, to obtain target audio data from which the third audio data has been removed.

Optionally, the device may further include the following module:

A calling module, configured to invoke the adaptive filter provided on the mobile terminal;

The audio data search unit may specifically include the following units:

An audio data input unit, configured to input the first audio data into the adaptive filter to obtain the output audio data produced by the adaptive filter;

An audio data determining unit, configured to determine, within the second audio data, third audio data having the same frequency as the output audio data.

Optionally, the calling module may specifically include the following units:

A target interface determining unit, configured to determine at least one application program interface on the video network terminal that is adapted to the application program object, and to determine whether a target interface exists among the at least one application program interface;

A first calling unit, configured to invoke the adaptive filter through the target interface when the target interface exists among the at least one application program interface;

A second calling unit, configured to invoke the adaptive filter through a preset application program interface when the target interface does not exist among the at least one application program interface.

The device embodiment is essentially similar to the method embodiment of video-network-based audio processing, so it is described only briefly; for related details, refer to the corresponding parts of the method embodiment.

本申请实施例还提供了一种电子设备,包括:The embodiment of the present application also provides an electronic device, including:

一个或多个处理器;和其上存储有指令的一个或多个机器可读介质,当由所述一个或多个处理器执行时,使得所述设备执行如本申请实施例所述的一个或多个的基于视联网的音频处理方法。One or more processors; and one or more machine-readable media with instructions stored thereon, when executed by the one or more processors, make the device execute one of the methods described in the embodiments of the present application or multiple video-network-based audio processing methods.

本申请实施例还提供了一种计算机可读存储介质,其存储的计算机程序使得处理器执行如本申请实施例所述的基于视联网的音频处理方法。The embodiment of the present application also provides a computer-readable storage medium, which stores a computer program to enable a processor to execute the audio processing method based on the Internet of Vision as described in the embodiment of the present application.

本说明书中的各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似的部分互相参见即可。Each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same and similar parts of each embodiment can be referred to each other.

本领域内的技术人员应明白,本申请实施例的实施例可提供为方法、装置、或计算机程序产品。因此,本申请实施例可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the embodiment of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

本申请实施例是参照根据本申请实施例的方法、终端设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理终端设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理终端设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing terminal equipment to produce a machine such that instructions executed by the computer or processor of other programmable data processing terminal equipment Produce means for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理终端设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the The instruction means implements the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理终端设备上,使得在计算机或其他可编程终端设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程终端设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded into a computer or other programmable data processing terminal equipment, so that a series of operational steps are performed on the computer or other programmable terminal equipment to produce computer-implemented processing, thereby The instructions executed above provide steps for implementing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

尽管已描述了本申请实施例的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例做出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请实施例范围的所有变更和修改。While the preferred embodiments of the embodiments of the present application have been described, additional changes and modifications can be made to these embodiments by those skilled in the art once the basic inventive concept is understood. Therefore, the appended claims are intended to be interpreted to cover the preferred embodiment and all changes and modifications that fall within the scope of the embodiments of the application.

最后,还需要说明的是,在本文中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者终端设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者终端设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者终端设备中还存在另外的相同要素。Finally, it should also be noted that in this text, relational terms such as first and second etc. are only used to distinguish one entity or operation from another, and do not necessarily require or imply that these entities or operations, any such actual relationship or order exists. Furthermore, the term "comprises", "comprises" or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or terminal equipment comprising a set of elements includes not only those elements, but also includes elements not expressly listed. other elements identified, or also include elements inherent in such a process, method, article, or end-equipment. Without further limitations, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or terminal device comprising said element.

The audio processing method, device, electronic equipment, and storage medium based on the Internet of Vision provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the present application; the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (10)

1. An audio processing method based on a video network, characterized in that a video network terminal is deployed in the video network, the video network terminal is communicatively connected to a streaming media server, the streaming media server is communicatively connected to a mobile terminal, and an audio playing component is configured on the mobile terminal; the method is applied to an application program object on the mobile terminal and comprises the following steps:
when it is detected that a preset audio and video call service has started, triggering a preset audio acquisition mode on the mobile terminal, wherein the preset audio acquisition mode is the audio acquisition mode that yields audio data with the least environmental noise;
receiving first audio data sent by the streaming media server in the audio and video call service, and invoking the audio playing component to play the first audio data, wherein the first audio data is sent to the streaming media server by the video network terminal;
acquiring second audio data acquired by the mobile terminal in the audio acquisition mode;
according to the first audio data and the second audio data, performing echo cancellation processing on the second audio data to obtain target audio data after echo cancellation processing;
and sending the target audio data to the streaming media server, wherein the streaming media server is configured to send the target audio data to the video network terminal.
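The per-frame flow of claim 1 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the callables `play`, `capture`, `cancel_echo` and `send` are hypothetical stand-ins for the audio playing component, microphone capture in the preset acquisition mode, the echo canceller, and the upload to the streaming media server, and audio frames are assumed to be plain lists of float samples.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def handle_call_frame(
    first_audio: Sequence[float],
    play: Callable[[Sequence[float]], None],
    capture: Callable[[], Frame],
    cancel_echo: Callable[[Sequence[float], Sequence[float]], Frame],
    send: Callable[[Frame], None],
) -> Frame:
    """Run one frame of the claim-1 flow on the mobile terminal (hypothetical hooks)."""
    # Play the far-end audio relayed by the streaming media server.
    play(first_audio)
    # Read the near-end signal captured in the preset acquisition mode;
    # it may contain an echo of first_audio picked up from the loudspeaker.
    second_audio = capture()
    # Remove that echo, using first_audio as the reference signal.
    target_audio = cancel_echo(first_audio, second_audio)
    # Upload the cleaned frame; the server forwards it to the video network terminal.
    send(target_audio)
    return target_audio
```

In a real client these hooks would run continuously on audio callbacks; the sketch keeps them synchronous so the order of the claimed steps is easy to follow.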
2. The method according to claim 1, wherein a first microphone and a second microphone are configured on the mobile terminal, and triggering a preset audio acquisition mode on the mobile terminal comprises:
invoking the first microphone and the second microphone;
acquiring second audio data acquired by the mobile terminal in the audio acquisition mode, including:
obtaining first microphone audio data captured by the first microphone and second microphone audio data captured by the second microphone;
and according to the first microphone audio data and the second microphone audio data, carrying out noise reduction processing on the second microphone audio data to obtain second audio data.
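Claim 2 does not name a noise-reduction algorithm. One common dual-microphone technique is adaptive noise cancellation with an LMS filter; the sketch below assumes, for illustration only, that the first microphone mainly picks up ambient noise (the reference) and the second microphone picks up speech plus correlated noise (the primary). The function name, tap count and step size are hypothetical choices.

```python
from typing import List, Sequence


def lms_noise_reduction(
    primary: Sequence[float],
    reference: Sequence[float],
    taps: int = 4,
    mu: float = 0.05,
) -> List[float]:
    """Adaptive noise cancellation with an LMS filter (illustrative sketch).

    primary:   second-microphone samples (speech + noise), assumed
    reference: first-microphone samples (mostly noise), assumed
    Returns the primary signal with the estimated noise removed.
    """
    if len(primary) != len(reference):
        raise ValueError("microphone streams must have equal length")
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    cleaned = []
    for p, r in zip(primary, reference):
        history = [r] + history[:-1]
        # Filtered reference: estimate of the noise that reached the primary mic.
        noise_estimate = sum(w * x for w, x in zip(weights, history))
        error = p - noise_estimate  # also the noise-reduced output sample
        # LMS update: move the weights along error * input.
        weights = [w + mu * error * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

The method only removes noise that is correlated between the two microphones; diffuse noise that reaches each microphone differently needs other techniques.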
3. The method of claim 1, wherein performing echo cancellation processing on the second audio data according to the first audio data and the second audio data to obtain target audio data after echo cancellation processing, comprises:
determining third audio data corresponding to the first audio data in the second audio data;
and filtering the third audio data from the second audio data to obtain target audio data with the third audio data filtered.
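One simple way to realize claim 3, assuming the echo (the "third audio data") is a single delayed and attenuated copy of the first audio data, is to search for the delay with the best least-squares fit and subtract the fitted copy. Real acoustic echo paths are usually multi-tap and time-varying, so this is only a sketch; `estimate_echo`, `remove_echo` and `max_delay` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def estimate_echo(
    far: Sequence[float], mic: Sequence[float], max_delay: int
) -> Tuple[int, float]:
    """Find (delay, gain) such that gain * far[n - delay] best fits mic[n]."""
    if len(far) != len(mic):
        raise ValueError("far-end and microphone frames must have equal length")
    best_delay, best_gain, best_score = 0, 0.0, -1.0
    for d in range(min(max_delay, len(mic) - 1) + 1):
        num = sum(mic[n] * far[n - d] for n in range(d, len(mic)))
        den = sum(far[n - d] ** 2 for n in range(d, len(mic)))
        if den == 0.0:
            continue
        # num**2 / den is the mic energy explained by the least-squares gain num / den.
        score = num * num / den
        if score > best_score:
            best_delay, best_gain, best_score = d, num / den, score
    return best_delay, best_gain


def remove_echo(
    far: Sequence[float], mic: Sequence[float], max_delay: int = 32
) -> List[float]:
    """Subtract the estimated echo ("third audio data") from the mic signal."""
    delay, gain = estimate_echo(far, mic, max_delay)
    echo = [gain * far[n - delay] if n >= delay else 0.0 for n in range(len(mic))]
    return [m - e for m, e in zip(mic, echo)]
```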
4. The method according to claim 3, wherein, while triggering a preset audio acquisition mode on the mobile terminal, the method further comprises:
invoking an adaptive filter configured on the mobile terminal;
determining, in the second audio data, third audio data having the same frequency as the first audio data comprises:
inputting the first audio data into the adaptive filter to obtain output audio data from the adaptive filter;
determining, in the second audio data, third audio data having the same frequency as the output audio data.
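Claim 4 feeds the first audio data into an adaptive filter and uses the filter output to locate the echo. A normalized LMS (NLMS) filter is a standard choice for acoustic echo cancellation; the sketch below assumes that choice and removes the echo by time-domain subtraction of the filter output, whereas the claim speaks of matching by frequency. Parameter values are illustrative, not taken from the patent.

```python
from typing import List, Sequence


def nlms_echo_cancel(
    far: Sequence[float],
    mic: Sequence[float],
    taps: int = 16,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Acoustic echo cancellation with a normalized LMS adaptive filter (sketch)."""
    if len(far) != len(mic):
        raise ValueError("far-end and microphone streams must have equal length")
    weights = [0.0] * taps
    history = [0.0] * taps  # recent far-end (first audio) samples, newest first
    out = []
    for f, m in zip(far, mic):
        history = [f] + history[:-1]
        # Filter output: the estimated echo of the first audio data in the mic signal.
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate  # mic signal with the estimated echo removed
        # Normalizing by input power keeps the step size independent of playback volume.
        norm = eps + sum(x * x for x in history)
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        out.append(error)
    return out
```

The tap count must cover the loudspeaker-to-microphone delay; production cancellers typically add double-talk detection so the filter does not adapt while the near-end user is speaking.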
5. The method of claim 4, wherein invoking the adaptive filter configured on the mobile terminal comprises:
determining at least one application program interface which is matched with the application program object on the video network terminal, and determining whether a target interface exists in the at least one application program interface;
when the target interface exists in the at least one application program interface, calling an adaptive filter corresponding to the target interface through the target interface;
and when the target interface does not exist in the at least one application program interface, calling the adaptive filter corresponding to the preset application program interface through a preset application program interface.
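The interface selection in claim 5 amounts to a lookup with a fallback. In this hypothetical sketch, the available interfaces are a mapping from interface name to a factory that builds an adaptive filter; if the target interface is absent, the preset interface's filter is used instead. On Android, for comparison, `android.media.audiofx.AcousticEchoCanceler.isAvailable()` reports whether a platform echo canceller exists, but the patent does not say which platform API, if any, it relies on.

```python
from typing import Callable, Mapping, Tuple, TypeVar

F = TypeVar("F")


def select_adaptive_filter(
    available: Mapping[str, Callable[[], F]],
    target_name: str,
    preset_name: str,
    preset_factory: Callable[[], F],
) -> Tuple[str, F]:
    """Return (interface used, filter instance), preferring the target interface."""
    factory = available.get(target_name)
    if factory is not None:
        # Target interface exists: build the filter through it.
        return target_name, factory()
    # Target interface missing: fall back to the preset interface.
    return preset_name, preset_factory()
```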
6. An audio processing device based on a video network, wherein a video network terminal is deployed in the video network, the video network terminal is communicatively connected to a streaming media server, the streaming media server is communicatively connected to a mobile terminal, and an audio playing component is configured on the mobile terminal; the device is applied to an application program object on the mobile terminal and comprises:
an audio mode triggering module, configured to trigger a preset audio acquisition mode on the mobile terminal when it is detected that a preset audio communication service has started, wherein the preset audio acquisition mode is the audio acquisition mode that yields audio data with the least environmental noise;
the audio data receiving and playing module is used for receiving first audio data sent by the streaming media server, calling the audio playing component and playing the first audio data; the first audio data is sent to the streaming media server by the video networking terminal;
the audio data acquisition module is used for acquiring second audio data acquired by the mobile terminal in the audio acquisition mode;
the audio data processing module is used for performing echo cancellation processing on the second audio data according to the first audio data and the second audio data to obtain target audio data after echo cancellation processing;
and the audio data sending module is used for sending the target audio data to the streaming media server, and the streaming media server is used for sending the target audio data to the video network terminal.
7. The apparatus according to claim 6, wherein a first microphone and a second microphone are configured on the mobile terminal, and the audio mode triggering module is specifically configured to invoke the first microphone and the second microphone;
the audio data acquisition module comprises:
a microphone audio data acquisition unit configured to acquire first microphone audio data acquired by the first microphone and second microphone audio data acquired by the second microphone;
and the noise reduction processing unit is used for carrying out noise reduction processing on the second microphone audio data according to the first microphone audio data and the second microphone audio data to obtain second audio data.
8. The apparatus of claim 6, wherein the audio data processing module comprises:
the audio data searching unit is used for determining third audio data corresponding to the first audio data in the second audio data;
and the audio data filtering unit is used for filtering the third audio data from the second audio data to obtain target audio data with the third audio data filtered.
9. An electronic device, comprising:
one or more processors; and
one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform the video networking-based audio processing method of any of claims 1-5.
10. A computer-readable storage medium storing a computer program for causing a processor to execute the video network-based audio processing method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911285285.7A CN111212032B (en) 2019-12-13 2019-12-13 Audio processing method, device, electronic equipment and storage medium based on Internet of Vision

Publications (2)

Publication Number Publication Date
CN111212032A CN111212032A (en) 2020-05-29
CN111212032B true CN111212032B (en) 2022-12-23

Family

ID=70789252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911285285.7A Active CN111212032B (en) 2019-12-13 2019-12-13 Audio processing method, device, electronic equipment and storage medium based on Internet of Vision

Country Status (1)

Country Link
CN (1) CN111212032B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562639B (en) * 2020-11-30 2023-09-19 努比亚技术有限公司 Audio processing method, terminal and computer readable storage medium
CN112543202B (en) * 2020-12-28 2022-04-12 创想空间信息技术(苏州)有限公司 Method, system and readable storage medium for transmitting shared sound in network conference
CN112995541B (en) * 2021-04-26 2021-08-13 北京易真学思教育科技有限公司 Method for eliminating video echo and computer storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
CN108616487A (en) * 2016-12-09 2018-10-02 北京视联动力国际信息技术有限公司 Audio mixing method and device based on video networking
CN108962273A (en) * 2017-12-29 2018-12-07 北京视联动力国际信息技术有限公司 Audio input method and device for a microphone

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7012630B2 (en) * 1996-02-08 2006-03-14 Verizon Services Corp. Spatial sound conference system and apparatus
CN108630215B (en) * 2017-09-21 2020-02-21 视联动力信息技术股份有限公司 Echo suppression method and device based on video networking
CN110557595A (en) * 2018-05-31 2019-12-10 视联动力信息技术股份有限公司 Method and device for accessing mobile terminal to video conference
CN109147812B (en) * 2018-09-19 2020-09-08 视联动力信息技术股份有限公司 Echo cancellation method and device
CN110324644A (en) * 2019-07-05 2019-10-11 视联动力信息技术股份有限公司 UAV video live broadcasting method, system, electronic equipment and readable storage medium
CN110445526A (en) * 2019-07-10 2019-11-12 视联动力信息技术股份有限公司 Data transmission method, device, system, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108632558B (en) Video call method and device
CN111212032B (en) Audio processing method, device, electronic equipment and storage medium based on Internet of Vision
CN110087102B (en) Status query method, device and storage medium
CN108616487B (en) Audio mixing method and device based on video networking
CN109547728B (en) Recorded broadcast source conference entering and conference recorded broadcast method and system
CN110572607A (en) Video conference method, system and device and storage medium
CN108965220B (en) Method and system for synchronizing conference control right
CN110191304B (en) Data processing method, device and storage medium
CN109788235B (en) Video networking-based conference recording information processing method and system
CN111327868B (en) Method, terminal, server, device and medium for setting conference speaker role
CN108630215B (en) Echo suppression method and device based on video networking
CN109147812B (en) Echo cancellation method and device
CN110457575B (en) File pushing method, device and storage medium
CN110149305B (en) A method and transfer server for multi-party playing audio and video based on video networking
CN109905616B (en) Method and device for switching video pictures
CN108965777B (en) Echo cancellation method and device
CN111182258B (en) Data transmission method and device for network conference
CN111131751B (en) Information display method, device and system for video network conference
CN111131840B (en) Method and device for switching network of video service system
CN111212263B (en) Method and device for filtering monitoring resource data
CN110049275B (en) Information processing method and device in video conference and storage medium
CN111131758B (en) Audio and video data calling method and device and storage medium
CN109714316B (en) Audio mixing processing method of video network and video network system
CN110049069B (en) Data acquisition method and device
CN110198384A Communication method and transfer server based on video networking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 33rd Floor, No.1 Huasheng Road, Yuzhong District, Chongqing 400013

Patentee after: VISIONVERA INFORMATION TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 100000 Beijing Dongcheng District Qinglong Hutong 1 Song Hua Building A1103-1113

Patentee before: VISIONVERA INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China
