CN106713818A - Speech processing system and method during video call - Google Patents
- Publication number
- CN106713818A (application CN201710093114.9A)
- Authority
- CN
- China
- Prior art keywords
- call
- module
- video
- online
- atmosphere
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention provides a speech processing system and method for video calls. Video call terminals are interconnected through a basic communication network, and the video call makes use of an external online server that provides enhanced call functions. This server includes an online speech-to-text module and an online call atmosphere module; the online speech-to-text module includes a speech recognition unit. While a user is in a call on a video call terminal, either the terminal's local speech-to-text module or the speech recognition unit of the online speech-to-text module performs speech recognition on the other party's audio data and converts it into text. The text is stored in the text-to-subtitle storage module, and the recognized content is superimposed on the terminal's video picture for display. The terminal's local call atmosphere module or the online call atmosphere module of the external enhanced-call-function server is then invoked: based on the recognized text, it renders the overall atmosphere of the call as image and text effects, which are composited with the video image and rendered on the terminal.
Description
Technical Field
The invention relates to a speech processing system and method for video calls.
Background
With the advancement of technology, remote communication between people has evolved from letters, telegrams, and voice calls to video calls. A video call must transmit video data and audio data at the same time; although both are compressed, the data volume is still one to two orders of magnitude larger than that of pure voice communication. Video calls therefore place much higher demands on the underlying network and on terminal hardware.
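To make the "one to two orders of magnitude" claim concrete, the following sketch compares an assumed voice-only bitrate with an assumed video call bitrate. The figures are illustrative assumptions, not values from the patent: 12.2 kbit/s is the highest AMR-NB voice codec mode, and 384 kbit/s stands in for a modest mobile video stream.

```python
import math

# Illustrative bitrates (assumptions, not taken from the patent).
VOICE_KBPS = 12.2   # e.g. the highest AMR-NB voice codec mode
VIDEO_KBPS = 384.0  # an assumed modest mobile video stream


def orders_of_magnitude(call_kbps: float, voice_kbps: float) -> float:
    """log10 of the ratio between a call's bitrate and a voice-only call's bitrate."""
    return math.log10(call_kbps / voice_kbps)


# A video call carries the video stream in addition to the voice stream.
video_call_orders = orders_of_magnitude(VIDEO_KBPS + VOICE_KBPS, VOICE_KBPS)
print(f"{video_call_orders:.2f} orders of magnitude")  # prints "1.51 orders of magnitude"
```

With higher-resolution video the ratio moves toward the upper end of the stated range.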
A video call is essentially the simultaneous transmission of audio and video, but technological progress allows video calls to carry more content, improving the user experience and increasing user stickiness.
Summary of the Invention
The purpose of the present invention is to provide a speech processing method and system for video calls that adds new features to video calls, makes them more engaging, and increases user stickiness of the video call function.
The present invention adopts the following technical solution:
A speech processing system for a video call, characterized in that it comprises a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, a local call atmosphere module, a text-to-subtitle storage module, a text effect user setting module, a call atmosphere user setting module, and an external online server providing enhanced call functions. The external online server includes an online speech-to-text module and an online call atmosphere module; the online speech-to-text module includes a speech recognition unit. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it into video data and audio data. The local or online speech-to-text module passes the audio data to a speech-to-text interface and obtains the user's text content. The local or online call atmosphere module renders the overall atmosphere of the call as an image, which is composited with the video image and rendered on the terminal.
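The demultiplexing step performed by the video call middleware module can be sketched as follows. This is a minimal illustration, assuming a hypothetical packet model in which each incoming packet is already tagged with its stream kind; real containers (RTP sessions, MP4, MPEG-TS) carry this information in their own headers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class StreamKind(Enum):
    AUDIO = "audio"
    VIDEO = "video"


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: stream kind, presentation timestamp, and payload."""
    kind: StreamKind
    pts_ms: int       # presentation timestamp in milliseconds
    payload: bytes


def demultiplex(packets: Iterable[Packet]) -> Tuple[List[Packet], List[Packet]]:
    """Split an interleaved audio/video packet sequence into (video, audio),
    preserving arrival order within each stream."""
    video: List[Packet] = []
    audio: List[Packet] = []
    for pkt in packets:
        (video if pkt.kind is StreamKind.VIDEO else audio).append(pkt)
    return video, audio


# Usage: the audio list would go to a speech-to-text module,
# the video list to the renderer.
incoming = [
    Packet(StreamKind.VIDEO, 0, b"v0"),
    Packet(StreamKind.AUDIO, 0, b"a0"),
    Packet(StreamKind.AUDIO, 20, b"a1"),
]
video, audio = demultiplex(incoming)
```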
The present invention also provides a speech processing method for a video call, characterized in that it comprises the following steps:
S1: video call terminals are interconnected through a basic communication network, and an external online server providing enhanced call functions is provided; this server includes an online speech recognition server and an online call atmosphere server.
S2: a user makes a call through a video call terminal. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it into video data and audio data. Either the terminal's local speech-to-text module or the speech recognition unit of the online speech-to-text module performs speech recognition on the other party's audio data; the result is converted into text, stored in the text-to-subtitle storage module, and superimposed on the terminal's video picture for display.
S3: the terminal's local call atmosphere module or the online call atmosphere module of the external enhanced-call-function server is invoked; based on the text recognized in S2, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered on the terminal.
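Step S2 can be sketched as a small pipeline: pick the local or online recognizer, store the recognized text, and hand the current caption to the overlay. The recognizer signature (audio bytes in, text out) and the `SubtitleStore` class are hypothetical stand-ins for the local/online speech-to-text modules and the text-to-subtitle storage module; the stubs replace a real ASR engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical recognizer signature: audio bytes in, recognized text out.
Recognizer = Callable[[bytes], str]


@dataclass
class SubtitleEntry:
    start_ms: int
    text: str


@dataclass
class SubtitleStore:
    """Stand-in for the text-to-subtitle storage module."""
    entries: List[SubtitleEntry] = field(default_factory=list)

    def add(self, start_ms: int, text: str) -> None:
        if text:  # ignore chunks that produced no text (e.g. silence)
            self.entries.append(SubtitleEntry(start_ms, text))

    def latest(self) -> Optional[str]:
        return self.entries[-1].text if self.entries else None


def process_audio_chunk(chunk: bytes, start_ms: int, use_online: bool,
                        local_asr: Recognizer, online_asr: Recognizer,
                        store: SubtitleStore) -> Optional[str]:
    """Step S2 sketch: recognize the peer's audio locally or online, store the
    text, and return the most recent caption to overlay (possibly from an
    earlier chunk if this one produced no text)."""
    recognize = online_asr if use_online else local_asr
    store.add(start_ms, recognize(chunk).strip())
    return store.latest()


def local_stub(audio: bytes) -> str:    # stub standing in for a local engine
    return "hello"


def online_stub(audio: bytes) -> str:   # stub standing in for the online server
    return "hello there"


store = SubtitleStore()
caption = process_audio_chunk(b"\x00" * 320, 0, True, local_stub, online_stub, store)
```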
Further, the user chooses, as needed, whether to invoke the local or the online call atmosphere module.
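The user's choice between the local and online call atmosphere modules amounts to a dispatch on a stored setting. In this sketch, the `OFF` option and the fallback to the local module when the server is unreachable are assumptions, not stated in the patent; the renderer signature is hypothetical.

```python
from enum import Enum
from typing import Callable, Dict, Optional


class AtmosphereMode(Enum):
    OFF = "off"        # assumption: the user may disable the effect entirely
    LOCAL = "local"
    ONLINE = "online"


# Hypothetical renderer signature: recognized text in, effect description out.
AtmosphereRenderer = Callable[[str], str]


def pick_renderer(mode: AtmosphereMode,
                  renderers: Dict[AtmosphereMode, AtmosphereRenderer],
                  online_available: bool = True) -> Optional[AtmosphereRenderer]:
    """Return the atmosphere renderer selected by the user setting, or None."""
    if mode is AtmosphereMode.OFF:
        return None
    if mode is AtmosphereMode.ONLINE and not online_available:
        # Assumption: fall back to the local module when the server is unreachable.
        return renderers[AtmosphereMode.LOCAL]
    return renderers[mode]


def local_effects(text: str) -> str:
    return "local:" + text


def online_effects(text: str) -> str:
    return "online:" + text


RENDERERS = {AtmosphereMode.LOCAL: local_effects, AtmosphereMode.ONLINE: online_effects}
```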
Further, multiple templates for superimposing text on the video picture are stored in advance, and the user selects one.
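A minimal sketch of pre-stored subtitle templates follows. The template fields (position, font size, color, format string) and the template names are illustrative assumptions; the patent only states that several templates exist and the user picks one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SubtitleTemplate:
    """Hypothetical template describing how recognized text is drawn on the frame."""
    position: str   # e.g. "bottom" or "top"
    font_size: int  # in pixels
    color: str      # e.g. "#FFFFFF"
    fmt: str        # format string applied to the recognized text


# Pre-stored templates (names and values are illustrative assumptions).
TEMPLATES: Dict[str, SubtitleTemplate] = {
    "classic": SubtitleTemplate("bottom", 24, "#FFFFFF", "{text}"),
    "speech_bubble": SubtitleTemplate("top", 28, "#000000", '"{text}"'),
}


def apply_template(name: str, text: str) -> dict:
    """Build draw parameters for the user's chosen template; unknown names
    fall back to the default "classic" template."""
    tpl = TEMPLATES.get(name, TEMPLATES["classic"])
    return {
        "position": tpl.position,
        "font_size": tpl.font_size,
        "color": tpl.color,
        "text": tpl.fmt.format(text=text),
    }
```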
Further, data communication between video call terminals includes a user authentication process.
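The patent does not specify an authentication mechanism. One common option is an HMAC challenge-response, sketched here under the assumption that the terminal and server already share a per-user secret key. The function names are hypothetical; `hmac.new`, `hmac.compare_digest`, and `secrets.token_bytes` are standard-library calls.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: a fresh random nonce for each authentication attempt."""
    return secrets.token_bytes(16)


def sign_challenge(shared_key: bytes, user_id: str, challenge: bytes) -> str:
    """Terminal side: prove knowledge of the shared key without sending it."""
    msg = user_id.encode("utf-8") + b"|" + challenge
    return hmac.new(shared_key, msg, hashlib.sha256).hexdigest()


def verify(shared_key: bytes, user_id: str, challenge: bytes, response: str) -> bool:
    """Server side: recompute the tag and compare in constant time."""
    expected = sign_challenge(shared_key, user_id, challenge)
    return hmac.compare_digest(expected, response)
```

A fresh challenge per attempt prevents a captured response from being replayed later.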
Further, the method also includes S4: when the online call atmosphere module of the external enhanced-call-function server is invoked in S3, the terminal transmits its audio/video data to that online server; after processing, the server obtains text data and atmosphere data, which are transmitted to the other party together with the terminal's audio/video data.
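In step S4, the server forwards text data and atmosphere data together with the original audio/video data. The bundle might be shaped as in this sketch. The `EnhancedFrame` fields are hypothetical, and JSON with base64 is chosen only for readability; a real system would more likely use a binary framing.

```python
import base64
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EnhancedFrame:
    """Hypothetical unit forwarded to the other party in step S4."""
    av_payload: bytes       # the sender's original audio/video data
    text: str               # text recognized by the server
    atmosphere: List[str]   # atmosphere effect identifiers


def encode(frame: EnhancedFrame) -> bytes:
    """Serialize the bundle; binary A/V data is base64-encoded for JSON."""
    return json.dumps({
        "av": base64.b64encode(frame.av_payload).decode("ascii"),
        "text": frame.text,
        "atmosphere": frame.atmosphere,
    }, ensure_ascii=False).encode("utf-8")


def decode(blob: bytes) -> EnhancedFrame:
    """Inverse of encode()."""
    obj = json.loads(blob.decode("utf-8"))
    return EnhancedFrame(base64.b64decode(obj["av"]), obj["text"], list(obj["atmosphere"]))
```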
Compared with the prior art, the present invention has the following advantages: it extends the functionality of video calls with speech-to-text, which increases user stickiness; and it adds call atmosphere rendering (additional effects on the displayed text), which likewise increases user stickiness.
Description of Drawings
Figure 1 is an overall structural diagram of the speech processing system for video calls.
Figure 2 is a block diagram of the core modules of the speech processing system for video calls.
Figure 3 is an operation sequence diagram of speech processing in a video call.
Detailed Description
The present invention is further explained below with reference to the accompanying drawings and specific embodiments.
Figure 1 shows the overall structure of the speech processing system for video calls. The video call terminals are interconnected through a basic communication network (such as the Internet). The video call makes use of external online servers that enhance call functions, such as an online speech recognition server and an online call atmosphere server. The servers are divided by logical function rather than physically; that is, the online speech recognition server and the online call atmosphere server may run on the same server host.
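Because the server division is logical rather than physical, a terminal can address each function by a logical service name and resolve it to whatever host serves it. This sketch uses a hypothetical registry; the host name and port are placeholders under the reserved `example.com` domain.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]

# Hypothetical deployment: two logical services hosted on one physical machine.
SERVICE_REGISTRY: Dict[str, Endpoint] = {
    "speech_recognition": ("enhance.example.com", 8443),
    "call_atmosphere": ("enhance.example.com", 8443),
}


def resolve(service: str) -> Endpoint:
    """Map a logical service name to its (host, port) endpoint."""
    try:
        return SERVICE_REGISTRY[service]
    except KeyError:
        raise LookupError(f"no endpoint registered for logical service {service!r}") from None


def shares_host(a: str, b: str) -> bool:
    """True if two logical services are deployed on the same host."""
    return resolve(a)[0] == resolve(b)[0]
```

Splitting the services onto separate machines later only changes the registry entries, not the terminal code.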
The video call terminals are connected to the online speech recognition server and the online call atmosphere server through the basic communication network, and data communication between them is bidirectional. This data communication may include any necessary user authentication process.
Figure 2 shows the core modules of the speech processing system for video calls. The system includes a hardware driver and operating system module, a video call middleware module, a local speech-to-text module, an online speech-to-text module, a text-to-subtitle storage module, a text effect user setting module, a local call atmosphere module, an online call atmosphere module, and a call atmosphere user setting module. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it into video data and audio data. The local or online speech-to-text module passes the audio data to a speech-to-text interface and obtains the user's text content. The local or online call atmosphere module renders the overall atmosphere of the call as an image, which is composited with the video image and rendered on the terminal.
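Compositing a rendered caption or atmosphere image onto the video picture is typically done with "source-over" alpha blending. The sketch uses plain nested lists of pixel tuples so it is self-contained; the frame layout and the use of `None` for fully transparent overlay pixels are assumptions for illustration, not details from the patent.

```python
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite(frame: List[List[RGB]], overlay: List[List[Optional[RGBA]]],
              top: int, left: int) -> List[List[RGB]]:
    """Alpha-blend an RGBA overlay (e.g. a rendered caption or atmosphere effect)
    onto a copy of an RGB video frame at (top, left). None pixels are transparent;
    overlay pixels falling outside the frame are clipped."""
    out = [row[:] for row in frame]  # leave the decoded frame untouched
    for dy, overlay_row in enumerate(overlay):
        y = top + dy
        if not 0 <= y < len(out):
            continue
        for dx, px in enumerate(overlay_row):
            x = left + dx
            if px is None or not 0 <= x < len(out[y]):
                continue
            r, g, b, a = px
            alpha = a / 255.0
            base = out[y][x]
            out[y][x] = tuple(round(alpha * c + (1 - alpha) * bc)
                              for c, bc in zip((r, g, b), base))
    return out
```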
The present invention also provides a speech processing method for a video call, which comprises the following steps:
S1: video call terminals are interconnected through a basic communication network, and an external online server providing enhanced call functions is provided; this server includes an online speech recognition server and an online call atmosphere server.
S2: a user makes a call through a video call terminal. The video call middleware module receives the audio/video data of the other party's video call and demultiplexes it into video data and audio data. Either the terminal's local speech-to-text module or the speech recognition unit of the online speech-to-text module performs speech recognition on the other party's audio data; the result is converted into text, stored in the text-to-subtitle storage module, and superimposed on the terminal's video picture for display.
S3: the terminal's local call atmosphere module or the online call atmosphere module of the external enhanced-call-function server is invoked; based on the text recognized in S2, the overall atmosphere of the call is rendered as image and text effects, which are composited with the video image and rendered on the terminal.
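The patent does not say how the call atmosphere is derived from the recognized text in S3. A simple keyword lookup is one possible stand-in, sketched below; the keyword table and effect identifiers are hypothetical, and a production system might use a sentiment or intent classifier instead.

```python
from typing import Dict, List

# Hypothetical keyword-to-effect table (an assumption, not from the patent).
KEYWORD_EFFECTS: Dict[str, str] = {
    "happy birthday": "cake_and_confetti",
    "congratulations": "fireworks",
    "miss you": "floating_hearts",
    "haha": "laughing_emoji",
}


def atmosphere_effects(text: str, table: Dict[str, str] = KEYWORD_EFFECTS) -> List[str]:
    """Return effect ids for every keyword found in the text (case-insensitive),
    in table order and without duplicates."""
    lowered = text.lower()
    effects: List[str] = []
    for keyword, effect in table.items():
        if keyword in lowered and effect not in effects:
            effects.append(effect)
    return effects
```

The returned effect ids would then be rendered as images and composited with the video frame.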
Figure 3 shows the operation sequence of speech processing in a video call. After the audio/video data of the other party's video call is received, the speech-to-text and call atmosphere functions can be performed either on the server or on the video call terminal. The detailed interaction is shown in the operation sequence diagram.
Further, the user chooses, as needed, whether to invoke the local or the online call atmosphere module.
Further, multiple templates for superimposing text on the video picture are stored in advance, and the user selects one.
Further, data communication between video call terminals includes a user authentication process.
Further, the method also includes S4: when the online call atmosphere module of the external enhanced-call-function server is invoked in S3, the terminal transmits its audio/video data to that online server; after processing, the server obtains text data and atmosphere data, which are transmitted to the other party together with the terminal's audio/video data.
The above are preferred embodiments of the present invention. Any change made according to the technical solution of the present invention whose resulting functional effect does not exceed the scope of that technical solution falls within the protection scope of the present invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710093114.9A CN106713818A (en) | 2017-02-21 | 2017-02-21 | Speech processing system and method during video call |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106713818A (en) | 2017-05-24 |
Family
ID=58917095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710093114.9A (CN106713818A, Pending) | Speech processing system and method during video call | 2017-02-21 | 2017-02-21 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106713818A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415706A (en) * | 2019-08-08 | 2019-11-05 | 常州市小先信息技术有限公司 | A kind of technology and its application of superimposed subtitle real-time in video calling |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
US11044287B1 (en) | 2020-11-13 | 2021-06-22 | Microsoft Technology Licensing, Llc | Caption assisted calling to maintain connection in challenging network conditions |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002165193A (en) * | 2000-11-24 | 2002-06-07 | Sharp Corp | Visual telephone system |
CN1741656A (en) * | 2004-08-27 | 2006-03-01 | 乐金电子(中国)研究开发中心有限公司 | Information service method for camera mobile telephone and apparatus thereof |
CN1747546A (en) * | 2004-09-07 | 2006-03-15 | 乐金电子(中国)研究开发中心有限公司 | Device and method for providing video effect as communication between mobile communication terminals |
CN104902212A (en) * | 2015-04-30 | 2015-09-09 | 努比亚技术有限公司 | Video communication method and apparatus |
CN105244023A (en) * | 2015-11-09 | 2016-01-13 | 上海语知义信息技术有限公司 | System and method for reminding teacher emotion in classroom teaching |
CN105260416A (en) * | 2015-09-25 | 2016-01-20 | 百度在线网络技术(北京)有限公司 | Voice recognition based searching method and apparatus |
CN105530521A (en) * | 2015-12-16 | 2016-04-27 | 广东欧珀移动通信有限公司 | A streaming media search method, device and system |
Application timeline
- 2017-02-21: CN201710093114.9A filed in CN, published as CN106713818A (status: Pending)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2007346312B2 (en) | A communication network and devices for text to speech and text to facial animation conversion | |
CN108234735A (en) | A kind of media display methods and terminal | |
US20060281064A1 (en) | Image communication system for compositing an image according to emotion input | |
CN113592985B (en) | Method and device for outputting mixed deformation value, storage medium and electronic device | |
CN114339069B (en) | Video processing method, video processing device, electronic equipment and computer storage medium | |
CN111201786B (en) | Display control device, communication device, display control method, and storage medium | |
CN115942039B (en) | Video generation method, device, electronic equipment and storage medium | |
CN102236986A (en) | Sign language translation system, device and method | |
CN107993646A (en) | A kind of method for realizing real-time voice intertranslation | |
KR20140146965A (en) | Translation system comprising of display apparatus and server and display apparatus controlling method thereof | |
CN106713818A (en) | Speech processing system and method during video call | |
WO2007070734A2 (en) | Method and system for directing attention during a conversation | |
CN108664536A (en) | Interactive audio-visual sharing method and system | |
CN103796181A (en) | Playing method of sending message, system and related equipment thereof | |
CN113593587B (en) | Voice separation method and device, storage medium, and electronic device | |
US20150341565A1 (en) | Low data-rate video conference system and method, sender equipment and receiver equipment | |
CN206339975U (en) | A kind of talkback unit for realizing real-time voice intertranslation | |
US20210249007A1 (en) | Conversation assistance device, conversation assistance method, and program | |
CN107707866A (en) | A kind of remote video communication method based on Internet of Things | |
US20230247131A1 (en) | Presentation of communications | |
US8553855B2 (en) | Conference support apparatus and conference support method | |
US20140129228A1 (en) | Method, System, and Relevant Devices for Playing Sent Message | |
CN101119545B (en) | Information processing system and information processing method based on coding labels | |
JP2004015478A (en) | Speech communication terminal device | |
CN116264073A (en) | Dubbing method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Wu Wenhuan; Inventor before: Chen Tianwu |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170524 |