CN107205131A - A kind of methods, devices and systems for realizing video calling - Google Patents

Info

Publication number
CN107205131A
Authority
CN
China
Prior art keywords
bag
video
text
terminal
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610161286.0A
Other languages
Chinese (zh)
Inventor
程岑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201610161286.0A
Priority to PCT/CN2017/075195 (WO2017157168A1)
Publication of CN107205131A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/75 Media network packet handling
    • H04L65/762 Media network packet handling at the source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80 Responding to QoS
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/278 Subtitling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Telephone Function (AREA)

Abstract

A method, device and system for implementing video calls, including: a first terminal captures a digital audio signal and a digital video signal separately; the first terminal converts the digital audio signal into text information, encapsulates the text information into text packets, encapsulates the digital audio signal into audio packets, and encapsulates the digital video signal into video packets; the first terminal sends the text packets, audio packets and video packets to a second terminal separately.

Description

A method, device and system for implementing video calls

Technical field

This document relates to, but is not limited to, the field of video calls, and in particular to a method, device and system for implementing video calls.

Background

With the rapid development of mobile and Internet broadband technology, value-added visual communication services have spread quickly among home users. Technology built on these services enables face-to-face communication, online video teaching and similar value-added services. Adding synchronized subtitles to the audio of a visual communication service not only serves users with poor hearing better, but also usefully supplements the actual audio when the network is poor.

In the related art, a method for adding speech subtitles to a video call roughly includes the following:

The first terminal captures a digital audio signal and a digital video signal separately. It speech-encodes the captured digital audio signal and encapsulates the encoded signal into audio packets. It also converts the captured digital audio signal into text information by speech recognition, overlays the text information on the captured digital video signal, video-encodes the composite, and encapsulates the encoded digital video signal into video packets. It then sends the audio packets and the video packets to the second terminal separately.

The second terminal receives the audio packets and the video packets. It speech-decodes the encoded digital audio signal in the audio packets to recover the digital audio signal and plays it, and it video-decodes the encoded digital video signal in the video packets to recover the digital video signal and displays it.

In this method, when network conditions are poor, video packets are more likely to suffer loss and jitter because they are relatively large. The text information is then lost together with the video packets, so information is lost during the video call.

Summary of the invention

Embodiments of the present invention provide a method, device and system for implementing video calls that can reduce information loss during a video call when network conditions are poor.

An embodiment of the present invention provides a method for implementing a video call, including:

The first terminal captures a digital audio signal and a digital video signal separately;

The first terminal converts the digital audio signal into text information, encapsulates the text information into text packets, encapsulates the digital audio signal into audio packets, and encapsulates the digital video signal into video packets;

The first terminal sends the text packets, audio packets and video packets to the second terminal separately.

Optionally, before the digital audio signal is encapsulated into audio packets, the method further includes: the first terminal speech-encodes the digital audio signal;

Encapsulating the digital audio signal into audio packets then includes: the first terminal encapsulates the speech-encoded digital audio signal into the audio packets.

Optionally, before the digital video signal is encapsulated into video packets, the method further includes: the first terminal video-encodes the digital video signal;

Encapsulating the digital video signal into video packets then includes: the first terminal encapsulates the video-encoded digital video signal into the video packets.

An embodiment of the present invention further provides a method for implementing a video call, including:

The second terminal receives a text packet from the first terminal;

When the second terminal determines that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, it displays the text information of those text packets, among the received text packet and the buffered text packets, whose timestamp field corresponds to a time less than or equal to that of the audio packet being played or the video packet being displayed.

Optionally, when the second terminal determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, the method further includes:

The second terminal buffers the received text packet.

Optionally, when the second terminal receives no audio packet and no video packet within a preset time after receiving the text packet, the method further includes:

The second terminal displays the text information in the buffered text packets.

Optionally, after the second terminal receives the text packet from the first terminal and before it compares the timestamp in the received text packet with the timestamp of the audio packet being played or the video packet being displayed, the method further includes:

The second terminal determines that the subtitle display function is turned on.

An embodiment of the present invention further provides a first terminal, including:

A capture module, configured to capture a digital audio signal and a digital video signal separately;

A first processing module, configured to convert the digital audio signal into text information, encapsulate the text information into text packets, encapsulate the digital audio signal into audio packets, and encapsulate the digital video signal into video packets;

A sending module, configured to send the text packets, audio packets and video packets to the second terminal separately.

Optionally, the first processing module is specifically configured to:

Convert the digital audio signal into text information, speech-encode the digital audio signal, encapsulate the text information into text packets, encapsulate the speech-encoded digital audio signal into the audio packets, video-encode the digital video signal, and encapsulate the video-encoded digital video signal into the video packets.

An embodiment of the present invention further provides a second terminal, including:

A receiving module, configured to receive a text packet from the first terminal;

A second processing module, configured to determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, and to display the text information of those text packets, among the received text packet and the buffered text packets, whose timestamp field corresponds to a time less than or equal to that of the audio packet being played or the video packet being displayed.

Optionally, the second processing module is further configured to:

Buffer the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.

Optionally, the second processing module is further configured to:

Display the text information in the buffered text packets when no audio packet and no video packet is received within a preset time after the text packet is received.

An embodiment of the present invention further provides a system for implementing video calls, including:

A first terminal, configured to capture a digital audio signal and a digital video signal separately; convert the digital audio signal into text information, encapsulate the text information into text packets, encapsulate the digital audio signal into audio packets, and encapsulate the digital video signal into video packets; and send the text packets, audio packets and video packets to the second terminal separately;

A second terminal, configured to receive a text packet from the first terminal; determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed; and display the text information of those text packets, among the received text packet and the buffered text packets, whose timestamp field corresponds to a time less than or equal to that of the audio packet being played or the video packet being displayed.

Optionally, the second terminal is further configured to:

Buffer the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.

Optionally, the second terminal is further configured to:

Display the text information in the buffered text packets when no audio packet and no video packet is received within a preset time after the text packet is received.

Compared with the related art, the technical solution of the embodiments of the present invention includes: the first terminal captures a digital audio signal and a digital video signal separately; the first terminal converts the digital audio signal into text information, encapsulates the text information into text packets, encapsulates the digital audio signal into audio packets, and encapsulates the digital video signal into video packets; the first terminal sends the text packets, audio packets and video packets to the second terminal separately. Because the first terminal sends the text packets, audio packets and video packets separately, the loss of a video packet under poor network conditions does not cause the text to be lost, which reduces information loss during the video call.

Description of drawings

The drawings of the embodiments of the present invention are described below. They are provided for further understanding of the present invention and, together with the description, serve to explain the present invention; they do not limit its scope of protection.

FIG. 1 is a flowchart of a method for implementing a video call at the sending end according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for implementing a video call at the receiving end according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a first terminal according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a second terminal according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a system for implementing video calls according to an embodiment of the present invention.

Detailed description

To help those skilled in the art understand the present invention, it is further described below with reference to the drawings; this description does not limit the scope of protection of the present invention. Where no conflict arises, the embodiments in this application and the various approaches within them may be combined with one another.

Referring to FIG. 1, an embodiment of the present invention provides a method for implementing a video call, including:

Step 100: the first terminal captures a digital audio signal and a digital video signal separately.

In this step, the first terminal may capture the digital audio signal at the capture interval specified in G.711 (an audio coding standard defined by the International Telecommunication Union) and capture the digital video signal at a preset video frame rate. For example, it captures the digital audio signal once every 10 milliseconds (ms) and the digital video signal once every 40 ms.
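The example intervals in step 100 can be turned into concrete numbers. The sketch below is illustrative arithmetic only: it assumes G.711's standard 8 kHz sampling rate and 8-bit samples, and the function names are hypothetical rather than taken from the patent.

```python
# Illustrative arithmetic for the capture intervals in step 100.
# Assumption: G.711 samples speech at 8 kHz and encodes each sample in 8 bits.
G711_SAMPLE_RATE_HZ = 8000
G711_BYTES_PER_SAMPLE = 1


def audio_bytes_per_capture(interval_ms: int) -> int:
    """Encoded G.711 bytes produced by one audio capture interval."""
    samples = G711_SAMPLE_RATE_HZ * interval_ms // 1000
    return samples * G711_BYTES_PER_SAMPLE


def frames_per_second(video_interval_ms: int) -> float:
    """Video frame rate implied by a capture interval in milliseconds."""
    return 1000 / video_interval_ms


def audio_captures_per_frame(audio_interval_ms: int, video_interval_ms: int) -> int:
    """How many audio captures fall inside one video frame interval."""
    return video_interval_ms // audio_interval_ms


print(audio_bytes_per_capture(10))       # bytes of G.711 audio per 10 ms capture
print(frames_per_second(40))             # frame rate for one frame every 40 ms
print(audio_captures_per_frame(10, 40))  # audio captures per video frame
```

With the intervals from the text, each video frame spans several audio captures, which is why the text packets described later are timed against the audio stream rather than the video stream.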

Step 101: the first terminal converts the digital audio signal into text information, encapsulates the text information into text packets, encapsulates the digital audio signal into audio packets, and encapsulates the digital video signal into video packets.

In this step, the first terminal may use speech recognition to convert the digital audio signal into text information.

In this step, the text packets, audio packets and video packets may be encapsulated according to the Real-time Transport Protocol (RTP) packet specification.

The format of the RTP packet header is shown in Table 1.

Table 1

In Table 1, V is the protocol version, 2 bits.

P is the padding bit, 1 bit. When P is set, the RTP packet carries additional padding bytes at its end.

X is the extension bit, 1 bit. When X is set, an extension header follows the RTP packet header.

CC is the number of contributing source (CSRC) identifiers, 4 bits.

M is the marker bit, 1 bit.

PT is the payload type, 7 bits. For text packets, a type not used in the related art may be chosen, for example 20.

Sequence number, 16 bits: incremented by 1 for each RTP packet sent. In the embodiments of the present invention, text packets, audio packets and video packets are numbered independently.

Timestamp, 32 bits: records the sampling instant of the first byte in the RTP packet. For audio and video packets, the timestamp is the time at which capture started; for a text packet, it is the time at which capture of the corresponding audio packet started.

Synchronization source identifier (SSRC), 32 bits: identifies the source of the RTP packet. No two sources in the same RTP session may have the same SSRC value.

CSRC list, 0 to 15 entries of 32 bits each: this field is optional in the RTP packet header.

In a text packet, the timestamp field is the capture time of the digital audio signal or digital video signal, and the payload type is the speech-text information type (an undefined value may be used, for example 20).
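The header fields described above can be packed into the 12-byte fixed RTP header with Python's `struct` module. This is a minimal sketch, not the patent's implementation: the payload type 20 is the example value from the text, while the UTF-8 payload encoding and the names `pack_rtp_header` and `build_text_packet` are assumptions made for illustration.

```python
import struct

RTP_VERSION = 2
TEXT_PAYLOAD_TYPE = 20  # example value from the text, not a registered type


def pack_rtp_header(payload_type, sequence, timestamp, ssrc,
                    marker=False, padding=False, extension=False, csrcs=()):
    """Pack the fixed RTP header (V, P, X, CC, M, PT, seq, timestamp, SSRC)
    followed by any CSRC entries, in network byte order."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field: at most 15 CSRC entries")
    byte0 = (RTP_VERSION << 6) | (int(padding) << 5) | (int(extension) << 4) | len(csrcs)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    for csrc in csrcs:
        header += struct.pack("!I", csrc & 0xFFFFFFFF)
    return header


def build_text_packet(text, sequence, audio_timestamp, ssrc):
    """Build a text packet whose timestamp is copied from the matching audio packet.
    Assumption: the recognized text is carried as UTF-8 bytes."""
    header = pack_rtp_header(TEXT_PAYLOAD_TYPE, sequence, audio_timestamp, ssrc)
    return header + text.encode("utf-8")


packet = build_text_packet("hello", sequence=1, audio_timestamp=160, ssrc=0x1234)
print(len(packet), packet[:12].hex())
```

Reusing the audio packet's timestamp in the text packet is what later lets the receiver line subtitles up with playback.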

Step 102: the first terminal sends the text packets, audio packets and video packets to the second terminal separately.

In this step, different types of packets may be sent separately according to different strategies. For example, audio packets are sent at the audio coding sampling frequency, video packets are sent at the agreed frame-rate interval, and text packets are sent at the audio coding sampling frequency.
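One way to picture the separate sending strategies is as a schedule in which each packet type keeps its own cadence and its own sequence counter. The sketch below is an assumption-laden illustration: it uses the 10 ms and 40 ms example intervals, assumes the video interval is a multiple of the audio interval, and `build_send_schedule` is a hypothetical helper.

```python
from itertools import count


def build_send_schedule(duration_ms, audio_interval_ms=10, video_interval_ms=40):
    """Return (time_ms, kind, sequence_number) tuples in send order.

    Audio and text packets share the audio cadence; video packets follow the
    frame interval. Each kind is numbered independently, as in the text.
    Assumption: video_interval_ms is a multiple of audio_interval_ms.
    """
    sequence = {"audio": count(), "text": count(), "video": count()}
    schedule = []
    for t in range(0, duration_ms, audio_interval_ms):
        schedule.append((t, "audio", next(sequence["audio"])))
        schedule.append((t, "text", next(sequence["text"])))
        if t % video_interval_ms == 0:
            schedule.append((t, "video", next(sequence["video"])))
    return schedule


for entry in build_send_schedule(50):
    print(entry)
```

Because the counters are independent, a gap in the video sequence numbers says nothing about the text stream, so the receiver can detect loss per stream.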

Optionally, before the digital audio signal is encapsulated into audio packets, the method further includes: the first terminal speech-encodes the digital audio signal. Correspondingly,

Encapsulating the digital audio signal into audio packets includes: the first terminal encapsulates the speech-encoded digital audio signal into audio packets.

Optionally, before the digital video signal is encapsulated into video packets, the method further includes: the first terminal video-encodes the digital video signal. Correspondingly,

Encapsulating the digital video signal into video packets includes: the first terminal encapsulates the video-encoded digital video signal into the video packets.

With the solution of the embodiments of the present invention, the first terminal sends the text packets, audio packets and video packets to the second terminal separately, so that the loss of a video packet under poor network conditions does not cause the text to be lost, which reduces information loss during the video call.
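The difference from the related art can be illustrated with a toy loss model. This is a simplified sketch under stated assumptions: captions are keyed by capture timestamp, one caption maps to one packet, and both function names are hypothetical.

```python
def captions_shown_related_art(captions, lost_video):
    """Related art: captions are burned into video frames, so a lost video
    packet takes its caption with it."""
    return {ts: text for ts, text in captions.items() if ts not in lost_video}


def captions_shown_separate_packets(captions, lost_video, lost_text):
    """This scheme: captions travel in their own text packets, so video loss
    (lost_video) is ignored and only a lost text packet drops a caption."""
    return {ts: text for ts, text in captions.items() if ts not in lost_text}


captions = {0: "hi", 40: "there", 80: "bye"}
lost_video = {40, 80}

print(captions_shown_related_art(captions, lost_video))
print(captions_shown_separate_packets(captions, lost_video, lost_text=set()))
```

Text packets can still be lost on their own, but they are much smaller than video packets, which is the property the scheme relies on.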

Referring to FIG. 2, an embodiment of the present invention further provides a method for implementing a video call, including:

Step 200: the second terminal receives a text packet from the first terminal.

Step 201: when the second terminal determines that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, it displays the text information of those text packets, among the received text packet and the buffered text packets, whose timestamp corresponds to a time less than or equal to that of the audio packet being played or the video packet being displayed.

In this step, the text information may be displayed according to a preset display area and/or font size. Specifically, the number of characters that fit on the screen at once can be determined from the display area and/or font size; the number of times the text information of one text packet must be displayed can then be calculated; and the dwell time of each display can be determined from the capture interval of the audio packet corresponding to the text packet. The text is then displayed according to that dwell time.

For example, if the audio packet corresponding to a text packet is captured once every 20 ms, the text packet contains 100 characters, and 10 characters can be displayed at a time, then the text must be displayed 10 times, with a dwell time of 20 ms / 10 = 2 ms per display.
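The dwell-time calculation in this example can be written out directly. The sketch below uses hypothetical helper names and assumes the text is split into fixed-size pages of characters.

```python
import math


def subtitle_pages(text, chars_per_screen):
    """Split the text of one text packet into screen-sized pages."""
    return [text[i:i + chars_per_screen] for i in range(0, len(text), chars_per_screen)]


def dwell_time_ms(audio_interval_ms, text_length, chars_per_screen):
    """Dwell time per page: the audio capture interval divided by the number
    of pages needed to show the whole text (at least one page)."""
    pages = max(1, math.ceil(text_length / chars_per_screen))
    return audio_interval_ms / pages


text = "x" * 100
print(len(subtitle_pages(text, 10)))      # number of pages to display
print(dwell_time_ms(20, len(text), 10))   # dwell time per page in ms
```

Rounding the page count up covers text whose length is not an exact multiple of the screen capacity, a case the example in the text does not address.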

In this step, the text information may be displayed on the graphics layer of the screen, that is, overlaid on the video layer that displays the digital video signal.

Optionally, between step 200 and step 201, the method further includes:

The second terminal determines that the subtitle display function is turned on.

When the second terminal determines that the subtitle display function is turned off, the procedure ends.

The method further includes:

When the second terminal determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, it buffers the received text packet.

The method further includes:

When the second terminal receives no audio packet and no video packet within the preset time, it displays the text information in the buffered text packets.
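The receiver-side rules (subtitle switch, display when due, buffer when early, flush on timeout) can be sketched as a small state machine. This is one possible reading rather than the patent's implementation: the class and method names, the 200 ms preset time and the millisecond clock are assumptions, and 32-bit RTP timestamp wrap-around is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class TextPacket:
    timestamp: int  # timestamp copied from the matching audio packet
    text: str


@dataclass
class SubtitleReceiver:
    subtitles_enabled: bool = True
    timeout_ms: int = 200            # hypothetical "preset time"
    playback_timestamp: int = -1     # timestamp of the media packet being rendered
    last_media_arrival_ms: int = -1  # arrival time of the latest audio/video packet
    buffered: list = field(default_factory=list)   # (arrival_ms, TextPacket)
    displayed: list = field(default_factory=list)  # text shown so far, in order

    def on_media_packet(self, timestamp: int, now_ms: int) -> None:
        """Record the audio or video packet that is now being played or shown."""
        self.playback_timestamp = timestamp
        self.last_media_arrival_ms = now_ms

    def on_text_packet(self, packet: TextPacket, now_ms: int) -> None:
        """Steps 200-201: drop if subtitles are off, otherwise show or buffer."""
        if not self.subtitles_enabled:
            return
        self.buffered.append((now_ms, packet))
        if packet.timestamp <= self.playback_timestamp:
            # Show this packet plus any buffered packet that is now due.
            self._show(lambda arrival, p: p.timestamp <= self.playback_timestamp)

    def on_timer(self, now_ms: int) -> None:
        """Show text that has waited past the preset time with no media since."""
        self._show(lambda arrival, p: now_ms - arrival >= self.timeout_ms
                   and self.last_media_arrival_ms < arrival)

    def _show(self, is_due) -> None:
        due = [entry for entry in self.buffered if is_due(*entry)]
        self.buffered = [entry for entry in self.buffered if not is_due(*entry)]
        for _, packet in sorted(due, key=lambda entry: entry[1].timestamp):
            self.displayed.append(packet.text)


receiver = SubtitleReceiver()
receiver.on_media_packet(100, now_ms=0)
receiver.on_text_packet(TextPacket(120, "later"), now_ms=1)  # early: buffered
receiver.on_media_packet(130, now_ms=5)
receiver.on_text_packet(TextPacket(130, "now"), now_ms=6)    # due: shows both
print(receiver.displayed)
```

The timeout path is what keeps subtitles flowing when audio and video stall entirely: text that would otherwise wait for a playback timestamp is shown once the preset time expires.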

In the above method, after the second terminal receives audio packets and/or video packets, it may play or display them according to the rules defined in the audio and video decoding protocol standards.

Specifically, after receiving an audio packet, the second terminal may play it according to the rules defined in an audio decoding protocol standard (such as G.711); after receiving a video packet, the second terminal may display it according to the rules defined in a video decoding protocol (such as H.264).

Referring to FIG. 3, an embodiment of the present invention further provides a first terminal, including:

A capture module, configured to capture a digital audio signal and a digital video signal separately;

A first processing module, configured to convert the digital audio signal into text information, encapsulate the text information into text packets, encapsulate the digital audio signal into audio packets, and encapsulate the digital video signal into video packets;

A sending module, configured to send the text packets, audio packets and video packets to the second terminal separately.

In the first terminal of this embodiment, the first processing module is specifically configured to:

Convert the digital audio signal into text information, speech-encode the digital audio signal, encapsulate the text information into text packets, encapsulate the speech-encoded digital audio signal into audio packets, video-encode the digital video signal, and encapsulate the video-encoded digital video signal into video packets.

Referring to FIG. 4, an embodiment of the present invention further provides a second terminal, including:

A receiving module, configured to receive a text packet from the first terminal;

A second processing module, configured to determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, and to display the text information of those text packets, among the received text packet and the buffered text packets, whose timestamp field corresponds to a time less than or equal to that of the audio packet being played or the video packet being displayed.

In the second terminal of this embodiment, the second processing module is further configured to:

Buffer the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.

In the second terminal of this embodiment, the second processing module is further configured to:

Display the text information in the buffered text packets when no audio packet and no video packet is received within the preset time after the text packet is received.

Referring to Figure 5, an embodiment of the present invention further provides a system for implementing a video call, comprising:

a first terminal, configured to capture a digital audio signal and a digital video signal separately; convert the digital audio signal into text information, encapsulate the text information into a text packet, encapsulate the digital audio signal into an audio packet, and encapsulate the digital video signal into a video packet; and send the text packet, the audio packet and the video packet separately to the second terminal;

a second terminal, configured to receive the text packet from the first terminal; determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed; and display the text information in those text packets, among the received text packet and the cached text packets, whose timestamp field corresponds to a time less than or equal to the time corresponding to the timestamp field of the audio packet being played or the video packet being displayed.

In the system of this embodiment of the present invention, the second terminal is further configured to:

cache the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.

In the system of this embodiment of the present invention, the second terminal is further configured to:

display the text information in the cached text packets when no audio packet and no video packet is received within a preset time after the text packet is received.

It should be noted that the embodiments described above are provided only to help those skilled in the art understand the invention and are not intended to limit its scope of protection. Any obvious substitution or improvement made to the present invention by those skilled in the art, without departing from its inventive concept, falls within the scope of protection of the present invention.

Claims (15)

1. A method for implementing a video call, characterized by comprising:
a first terminal capturing a digital audio signal and a digital video signal separately;
the first terminal converting the digital audio signal into text information, encapsulating the text information into a text packet, encapsulating the digital audio signal into an audio packet, and encapsulating the digital video signal into a video packet;
the first terminal sending the text packet, the audio packet and the video packet separately to a second terminal.
2. The method according to claim 1, characterized in that, before the digital audio signal is encapsulated into the audio packet, the method further comprises: the first terminal performing speech coding on the digital audio signal;
and the encapsulating of the digital audio signal into the audio packet comprises: the first terminal encapsulating the speech-coded digital audio signal into the audio packet.
3. The method according to claim 1, characterized in that, before the digital video signal is encapsulated into the video packet, the method further comprises: the first terminal performing video coding on the digital video signal;
and the encapsulating of the digital video signal into the video packet comprises: the first terminal encapsulating the video-coded digital video signal into the video packet.
4. A method for implementing a video call, characterized by comprising:
a second terminal receiving a text packet from a first terminal;
the second terminal determining that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, and displaying the text information in those text packets, among the received text packet and the cached text packets, whose timestamp field corresponds to a time less than or equal to the time corresponding to the timestamp field of the audio packet being played or the video packet being displayed.
5. The method according to claim 4, characterized in that, when the second terminal determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, the method further comprises:
the second terminal caching the received text packet.
6. The method according to claim 5, characterized in that, when the second terminal receives no audio packet and no video packet within a preset time after receiving the text packet, the method further comprises:
the second terminal displaying the text information in the cached text packets.
7. The method according to claim 4, characterized in that, after the second terminal receives the text packet from the first terminal and before the second terminal determines that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, the method further comprises:
the second terminal determining that the subtitle display function is enabled.
8. A first terminal, characterized by comprising:
an acquisition module, configured to capture a digital audio signal and a digital video signal separately;
a first processing module, configured to convert the digital audio signal into text information, encapsulate the text information into a text packet, encapsulate the digital audio signal into an audio packet, and encapsulate the digital video signal into a video packet;
a sending module, configured to send the text packet, the audio packet and the video packet separately to a second terminal.
9. The first terminal according to claim 8, characterized in that the first processing module is specifically configured to:
convert the digital audio signal into text information and encapsulate the text information into the text packet; perform speech coding on the digital audio signal and encapsulate the speech-coded digital audio signal into the audio packet; perform video coding on the digital video signal and encapsulate the video-coded digital video signal into the video packet.
10. A second terminal, characterized by comprising:
a receiving module, configured to receive a text packet from a first terminal;
a second processing module, configured to determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed, and to display the text information in those text packets, among the received text packet and the cached text packets, whose timestamp field corresponds to a time less than or equal to the time corresponding to the timestamp field of the audio packet being played or the video packet being displayed.
11. The second terminal according to claim 10, characterized in that the second processing module is further configured to:
cache the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.
12. The second terminal according to claim 11, characterized in that the second processing module is further configured to:
display the text information in the cached text packets when no audio packet and no video packet is received within a preset time after the text packet is received.
13. A system for implementing a video call, characterized by comprising:
a first terminal, configured to capture a digital audio signal and a digital video signal separately; convert the digital audio signal into text information, encapsulate the text information into a text packet, encapsulate the digital audio signal into an audio packet, and encapsulate the digital video signal into a video packet; and send the text packet, the audio packet and the video packet separately to a second terminal;
the second terminal, configured to receive the text packet from the first terminal; determine that the time corresponding to the timestamp in the received text packet is less than or equal to the time corresponding to the timestamp of the audio packet being played or the video packet being displayed; and display the text information in those text packets, among the received text packet and the cached text packets, whose timestamp field corresponds to a time less than or equal to the time corresponding to the timestamp field of the audio packet being played or the video packet being displayed.
14. The system according to claim 13, characterized in that the second terminal is further configured to:
cache the received text packet when it determines that the time corresponding to the timestamp in the received text packet is greater than the time corresponding to the timestamp of the audio packet being played or the video packet being displayed.
15. The system according to claim 14, characterized in that the second terminal is further configured to:
display the text information in the cached text packets when no audio packet and no video packet is received within a preset time after the text packet is received.
CN201610161286.0A 2016-03-18 2016-03-18 A kind of methods, devices and systems for realizing video calling Pending CN107205131A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610161286.0A CN107205131A (en) 2016-03-18 2016-03-18 A kind of methods, devices and systems for realizing video calling
PCT/CN2017/075195 WO2017157168A1 (en) 2016-03-18 2017-02-28 Method, terminal, system and computer storage medium for video calling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610161286.0A CN107205131A (en) 2016-03-18 2016-03-18 A kind of methods, devices and systems for realizing video calling

Publications (1)

Publication Number Publication Date
CN107205131A true CN107205131A (en) 2017-09-26

Family

ID=59851446

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610161286.0A Pending CN107205131A (en) 2016-03-18 2016-03-18 A kind of methods, devices and systems for realizing video calling

Country Status (2)

Country Link
CN (1) CN107205131A (en)
WO (1) WO2017157168A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1992861A (en) * 2005-12-26 2007-07-04 财团法人工业技术研究院 Recording medium for storing subtitle data structure and method for playing the subtitle data
CN101035262A (en) * 2007-04-19 2007-09-12 深圳市融合视讯科技有限公司 Video information transmission method
US20100194979A1 (en) * 2008-11-02 2010-08-05 Xorbit, Inc. Multi-lingual transmission and delay of closed caption content through a delivery system
CN102957892A (en) * 2011-08-24 2013-03-06 三星电子(中国)研发中心 Method, system and device for realizing audio and video conference
CN103685985A (en) * 2012-09-17 2014-03-26 联想(北京)有限公司 Communication method, transmitting device, receiving device, voice processing equipment and terminal equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7561178B2 (en) * 2005-09-13 2009-07-14 International Business Machines Corporation Method, apparatus and computer program product for synchronizing separate compressed video and text streams to provide closed captioning and instant messaging integration with video conferencing
KR20150021258A (en) * 2013-08-20 2015-03-02 삼성전자주식회사 Display apparatus and control method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林健浩: "基于SIP协议的音视频会话技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933574A (en) * 2019-02-27 2019-06-25 常州猛犸电动科技有限公司 A kind of unique key generation method, device and terminal device
CN109933574B (en) * 2019-02-27 2021-03-19 常州猛犸电动科技有限公司 Unique key generation method and device and terminal equipment
CN110290341A (en) * 2019-07-24 2019-09-27 长沙世邦通信技术有限公司 Follow video intercom method, system and the storage medium of face synchronously displaying subtitle
CN110415706A (en) * 2019-08-08 2019-11-05 常州市小先信息技术有限公司 A kind of technology and its application of superimposed subtitle real-time in video calling

Also Published As

Publication number Publication date
WO2017157168A1 (en) 2017-09-21

Similar Documents

Publication Publication Date Title
TWI501673B (en) Method of synchronized playing video and audio data and system thereof
KR102229109B1 (en) Transmitting apparatus and receiving apparatus and signal processing method thereof
KR101733501B1 (en) Broadcast signal transmitting method, broadcast signal receiving method, broadcast signal transmitting apparatus, and broadcast signal receiving apparatus
KR101721884B1 (en) Method for transmitting broadcast signal, method for receiving broadcast signal, apparatus for transmitting broadcast signal, and apparatus for receiving broadcast signal
US7730380B2 (en) Method and apparatus for transmitting/receiving voice over internet protocol packets with a user datagram protocol checksum in a mobile communication system
WO2009006804A1 (en) A method for transmitting mobile multimedia broadcast service data flow and a multiplexing frame for transmitting
WO2000072549A3 (en) Method and apparatus for telecommunications using internet protocol
KR101764634B1 (en) Method for transmitting broadcast signal, method for receiving broadcast signal, apparatus for transmitting broadcast signal, and apparatus for receiving broadcast signal
KR20060054662A (en) Apparatus and method for header compression in broadband wireless communication system
CN108055566A (en) Method, device, equipment and computer-readable storage medium for synchronizing audio and video
CN109600341B (en) Instant messaging detection method, equipment and computer storage medium
CN107205131A (en) A kind of methods, devices and systems for realizing video calling
CN108597529A (en) A kind of police digital cluster system air interface speech monitoring system and method
US20030067922A1 (en) Communication method, communication device, and communication terminal
FI961442A (en) Voice over packet data
CN101163226B (en) Method and system for implementing mobile video session using WiMAX network
CN107707726A (en) A kind of terminal and call method communicated for normal person with deaf-mute
CN105323250B (en) A kind of data transmission method based on PTT public network cluster intercom system
CN112887497B (en) Communication method, apparatus and computer storage medium
CN109714295B (en) Voice encryption and decryption synchronous processing method and device
CN101022558A (en) Information source adapter based on SAF
WO2015154557A1 (en) Data packet transmission processing method and device
CN115103228A (en) Video streaming method, device, electronic device, storage medium and product
CN101453286A (en) Method for digital audio multiplex transmission in multimedia broadcasting system
JP2008245061A (en) PCR regeneration method for IP stream transmission

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170926