CN113766165A

CN113766165A - Interaction method, device, terminal and storage medium for realizing barrier-free video chat

Info

Publication number: CN113766165A
Application number: CN202110917966.1A
Authority: CN
Inventors: 宋涛; 柏超
Original assignee: Guangzhou Yiyu Intelligent Technology Co ltd
Current assignee: Guangzhou Yiyu Intelligent Technology Co ltd
Priority date: 2021-08-11
Filing date: 2021-08-11
Publication date: 2021-12-07

Abstract

The invention discloses an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat. The method includes: establishing a session for client A, which first enters a chat room and accesses a chat signaling control service; the chat signaling control service authenticates the client and assigns it a client ID, an audio and video stream playback address, an audio and video stream push address and a text chat session address; the client collects local audio and video through an audio and video collector and plays the local video on a local audio and video player; the audio and video stream collector starts pushing the stream to the audio and video stream push address; and the text chat module starts a chat at the text chat session address. By combining user interface (UI) design with artificial intelligence technology, the invention provides text communication during video chat and automatic conversion of speech into subtitles during chat, meeting the need for barrier-free video chat between hearing-impaired people and hearing people.

Description

Interaction method, device, terminal and storage medium for realizing barrier-free video chat

Technical Field

The invention relates to the technical field of video chat, and in particular to an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat.

Background

Video chat is a real-time audio and video communication mode that is widely used in scenarios such as instant messaging (IM) chat, video conferencing, and video authentication for banking, securities, fund, insurance and other financial services. Because it presents the images and voices of both parties simultaneously, video chat makes up for the limited information conveyed by telephone calls, IM voice messages and text chat. With the development of smartphones, the Internet of Things and WiFi/4G/5G wireless communication technologies, the barrier to using video chat has dropped considerably, and it has become increasingly common.

However, existing video chat products generally cannot meet the communication needs of hearing-impaired people, and there are two main problems. First, typical video chat products have no accompanying text communication function, so a hearing-impaired person who cannot hear or speak during a chat cannot fall back on supplementary text communication. Second, typical video chat products have no accompanying ASR (automatic speech recognition) or TTS (text-to-speech) function: they cannot automatically convert speech into subtitles during a chat so that the hearing-impaired person can follow what the hearing person says by reading, nor can they convert text typed by the hearing-impaired person into speech so that the hearing person can understand what the hearing-impaired person wants to express.

These problems have four main causes. First, hearing-impaired people are a socially disadvantaged group, and most software lacks optimization aimed specifically at their needs. Second, video chat only became widespread after 4G/5G networks were rolled out and the mobile internet matured, and the related applications are still being refined. Third, adding text communication to video chat involves many scenarios, and existing UI designs that add text chat to a video chat interface struggle to balance the service experience. Fourth, ASR and TTS technologies have only come into wide use in recent years, following the rapid development of cloud computing, big data, machine learning and artificial intelligence. Adding ASR and TTS to video chat also carries a certain technical threshold: for example, the audio stream must be separated and transcoded before ASR can be applied, and the text stream must be converted by TTS into an audio stream and then packaged together with the video stream into a media stream, all of which adds audio/video and artificial intelligence computing cost.

The third and fourth causes above are also the main technical challenges encountered in developing and implementing the present invention. Therefore, an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat are proposed to solve these problems.

Disclosure of Invention

The invention aims to provide an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat, solving the problem that existing video chat products have no accompanying text communication function and cannot convert speech into subtitles during a chat.

To achieve the above object, a first aspect of the present invention provides the following solution: an interaction method for realizing barrier-free video chat, comprising the following steps:

establishing a session for client A: client A first enters a chat room and accesses the chat signaling control service; the chat signaling control service authenticates the client and allocates a client identifier, an audio and video stream playback address, an audio and video stream push address and a text chat session address; client A collects local audio and video through the audio and video collector and plays the local video on the local audio and video player;

the audio and video stream collector starts pushing the stream to the audio and video stream push address;

the text chat module starts to establish a chat at the text chat session address;

the audio and video stream player prepares to pull the stream from the audio and video stream playback address;

sharing the video chat and inviting client B to join;

establishing a session for client B: client B enters the chat room and accesses the chat signaling control service; the chat signaling control service authenticates the client and allocates a client identifier, an audio and video stream playback address, an audio and video stream push address and a text chat session address; client B collects local audio and video through the audio and video collector and plays the local video on the local audio and video player;

adding ASR subtitle processing to the audio and video of the hearing person: the audio and video streaming service separates the audio of client B; the ASR service converts the audio into text, delivers the text conversion result to the text chat room service for mixing, and delivers the text conversion result to the audio and video streaming service for video subtitle mixing; the audio and video streaming service mixes the video subtitles into the audio and video of client B and provides client A with a playback service for client B's audio and video plus subtitles; the text chat room service provides client B's text conversion result to client A for reading, and also provides client B with its own text conversion result for reading; the audio and video stream player plays the server-side video;

adding TTS audio processing to the audio and video of the hearing-impaired person: the hearing-impaired user types text in the text chat module to communicate; the text chat module pushes client A's text chat content to the text chat room service at the text chat session address; the text chat room service presents client A's text chat content in the chat window of the text chat module, showing the text conversation of both parties; the text chat room service hands client A's text chat content to the TTS service; the TTS service converts the text into speech and hands the speech conversion result to the audio and video streaming service for audio mixing; the audio and video streaming service mixes the TTS speech into the audio and video of client A and provides client B with a playback service for client A's audio and video; the audio and video stream player plays the server-side video;

maintaining the chat session, taking the disconnection and re-entry of client A as an example: client A accidentally drops out of the chat room; client A re-enters the chat room and accesses the chat signaling control service; the chat signaling control service gives the session ID to client A so that client A re-enters the chat session;

ending the chat session, taking client A actively ending the session as an example: client A notifies the chat signaling control service that the chat session is ending; the chat signaling control service notifies client B that the chat session has ended; the chat signaling control service notifies the audio and video streaming service and the text chat room service to end the chat session service.

To achieve the above object, a second aspect of the present invention provides the following solution: a device for realizing barrier-free video chat, comprising:

the barrier-free video chat device comprises a client system and a server system.

Preferably, the client system has two or more clients in a chat session.

Preferably, the client is implemented using two or more of the following technologies: software, app, applet, web page and H5.

Preferably, the basic modules of the client system include an audio and video stream player, an audio and video stream collector and a text chat module.

Preferably, the server system is deployed either stand-alone or in a distributed manner, and its hardware is a local hardware server or a cloud server that has a public network IP or domain name and is equipped with a CPU computing unit, a memory processing unit and a hard disk storage unit.

Preferably, the basic modules of the server system include a chat signaling control service, an audio and video streaming service, a text chat room service, an ASR service and a TTS service.

Preferably, the operating system of the server system is Windows, Linux or Unix.

To achieve the above object, a third aspect of the present invention provides the following solution: a terminal for realizing barrier-free video chat, comprising:

the system terminal comprises a system memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line;

the at least one processor invokes the instructions in the memory to cause the server to perform the steps of the barrier-free video chat interaction method of claim 1.

To achieve the above object, a fourth aspect of the present invention provides the following solution: a readable storage medium for realizing barrier-free video chat, comprising:

the computer-readable storage medium stores instructions which, when executed by a processor, implement the steps of the interaction method for barrier-free video chat for hearing-impaired people as claimed in claim 1.

Compared with the prior art, the invention has the beneficial effects that:

By combining user interface (UI) design with artificial intelligence technology, the invention provides text communication during video chat and automatic conversion of speech into subtitles during chat, meeting the need for barrier-free communication and for barrier-free video chat between hearing-impaired people and hearing people.

Drawings

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a first schematic flowchart of the interaction method of the present invention;

FIG. 3 is a second schematic flowchart of the interaction method of the present invention;

FIG. 4 is a third schematic flowchart of the interaction method of the present invention;

FIG. 5 is a fourth schematic flowchart of the interaction method of the present invention.

Detailed Description

The present invention will now be described in more detail by way of examples, which are given by way of illustration only and are not intended to limit the scope of the present invention in any way.

In a first aspect, the present invention provides a technical solution: an interaction method for realizing barrier-free video chat, comprising the following steps:

First, a session is established for client A: client A enters a chat room and accesses the chat signaling control service; the chat signaling control service authenticates the client and allocates a client identifier, an audio and video stream playback address, an audio and video stream push address and a text chat session address; client A collects local audio and video through the audio and video collector and plays the local video on the local audio and video player;

the audio and video stream player prepares to pull the stream from the audio and video stream playback address.

Second, the video chat is shared and client B is invited to join.

Fourth, ASR subtitle processing is added to the audio and video of the hearing person: the audio and video streaming service separates the audio of client B; the ASR service converts the audio into text, delivers the text conversion result to the text chat room service for mixing, and delivers the text conversion result to the audio and video streaming service for video subtitle mixing; the audio and video streaming service mixes the video subtitles into the audio and video of client B and provides client A with a playback service for client B's audio and video plus subtitles; the text chat room service provides client B's text conversion result to client A for reading, and also provides client B with its own text conversion result for reading; the audio and video stream player plays the server-side video.

Fifth, TTS audio processing is added to the audio and video of the hearing-impaired person: the hearing-impaired user types text in the text chat module to communicate; the text chat module pushes client A's text chat content to the text chat room service at the text chat session address; the text chat room service displays the text conversation of both parties, including client A's text chat content, in the chat window of the text chat module; the text chat room service hands client A's text chat content to the TTS service; the TTS service converts the text into speech and hands the speech conversion result to the audio and video streaming service for audio mixing; the audio and video streaming service mixes the TTS speech into the audio and video of client A and provides client B with a playback service for client A's audio and video; the audio and video stream player plays the server-side video.

Sixth, the chat session is maintained. Taking the disconnection and re-entry of client A as an example: client A accidentally drops out of the chat room; client A re-enters the chat room and accesses the chat signaling control service; the chat signaling control service gives the session ID to client A so that client A re-enters the chat session.

Seventh, the chat session is ended. Taking client A actively ending the session as an example: client A notifies the chat signaling control service that the chat session is ending; the chat signaling control service notifies client B that the chat session has ended, and notifies the audio and video streaming service and the text chat room service to end the chat session service.
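The session-keeping and session-ending steps above can be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the patented implementation: the class `SessionRegistry`, its method names, and the service labels `"av_stream_service"` and `"text_chat_room_service"` are hypothetical names chosen for illustration, and authentication and address allocation are left out.

```python
import uuid


class SessionRegistry:
    """Hypothetical in-memory stand-in for the chat signaling control service."""

    def __init__(self):
        # session_id -> {"clients": {client_id: is_online}, "ended": bool}
        self.sessions = {}

    def create(self):
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"clients": {}, "ended": False}
        return session_id

    def enter(self, session_id):
        """A new client enters the chat room and is assigned a client identifier."""
        session = self._active(session_id)
        client_id = uuid.uuid4().hex
        session["clients"][client_id] = True
        return client_id

    def drop(self, session_id, client_id):
        """The client accidentally drops out; the session itself is kept."""
        self._active(session_id)["clients"][client_id] = False

    def reenter(self, session_id, client_id):
        """A known client comes back and is handed the same session ID again."""
        session = self._active(session_id)
        if client_id not in session["clients"]:
            raise KeyError("unknown client for this session")
        session["clients"][client_id] = True
        return session_id

    def end(self, session_id, ended_by):
        """One client ends the session; return everyone who must be notified."""
        session = self._active(session_id)
        session["ended"] = True
        peers = [c for c in session["clients"] if c != ended_by]
        return peers + ["av_stream_service", "text_chat_room_service"]

    def _active(self, session_id):
        session = self.sessions[session_id]
        if session["ended"]:
            raise RuntimeError("chat session has ended")
        return session
```

In this sketch, a dropped client is only marked offline, so re-entry returns the original session ID, while `end` marks the session as finished and lists the peer clients and the two services that would be notified.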

In a second aspect, the present invention provides a technical solution: a device for realizing barrier-free video chat, comprising:

The client system has two or more clients in one chat session.

The client is specifically implemented by two or more of software, APP, applet, webpage and H5.

The basic modules of the client system need to include an audio and video stream player, an audio and video stream collector and a text chat module.

The audio and video stream player is used for playing the server-side video and the local video, playing the server-side audio, and overlaying subtitles when playing the server-side audio and video.

The audio and video stream collector is used for collecting local audio and video data through the camera and microphone modules of the operating system on which the client runs, and transmitting the collected data to the server for processing, after which it is finally played by one or more clients at the opposite end.

The text chat module is used for handling text communication in the chat session; it supports entering text in the session and displaying the chat conversation between the local end and the opposite end.

The server system is deployed either stand-alone or in a distributed manner, and its hardware is a local hardware server or a cloud server that has a public network IP or domain name and is equipped with a CPU (central processing unit), a memory processing unit and a hard disk storage unit.

The basic modules of the server system need to comprise chat signaling control service, audio and video streaming service, text chat room service, ASR service and TTS service.

The chat signaling control service is used for controlling the chat process. It provides the client with a method for establishing a chat session, an audio and video stream playback address, an audio and video stream push address and a text chat session address; notifies each client when the chat session is closed; provides a maintenance and recovery service for the chat session; provides the ASR service with the chat session address to which text recognition results are sent; and provides the TTS service with the audio and video streaming service push address for speech results.
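The address allocation described above can be sketched as follows. This is an illustrative sketch only: the patent does not specify address formats or protocols, so the `av:` and `chat:` schemes, the host `media.example.invalid`, the `/tts` suffix, and the names `ClientAddresses` and `SignalingAllocator` are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientAddresses:
    client_id: str
    play_address: str       # where the client pulls the audio/video stream
    push_address: str       # where the client pushes its own audio/video
    text_chat_address: str  # text chat session address shared by the room


class SignalingAllocator:
    """Hypothetical allocator; address formats are placeholders, not real URLs."""

    def __init__(self, host="media.example.invalid"):
        self.host = host
        self.by_session = {}  # session_id -> {client_id: ClientAddresses}

    def allocate(self, session_id):
        """Assign a client identifier and the three addresses for one client."""
        client_id = uuid.uuid4().hex
        base = f"//{self.host}/{session_id}"
        addresses = ClientAddresses(
            client_id=client_id,
            play_address=f"av:{base}/{client_id}/play",
            push_address=f"av:{base}/{client_id}/push",
            text_chat_address=f"chat:{base}/room",
        )
        self.by_session.setdefault(session_id, {})[client_id] = addresses
        return addresses

    def text_chat_address_for_asr(self, session_id, client_id):
        """Where the ASR service should send text recognition results."""
        return self.by_session[session_id][client_id].text_chat_address

    def push_address_for_tts(self, session_id, client_id):
        """Where the TTS service should push speech for this client (assumed layout)."""
        return self.by_session[session_id][client_id].push_address + "/tts"
```

Under these assumptions, every client in a session shares one text chat address while receiving its own playback and push addresses, and the ASR and TTS services look up their destinations from the same table.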

The audio and video streaming service is used for processing the audio and video streaming media in the chat process. It provides the client with an audio and video stream playback service and an audio and video push service; mixes the audio stream provided by the TTS service into the audio and video stream to be played by the client; separates the client's audio stream and hands it to the ASR service for recognition and conversion; and generates subtitles from the text produced by the ASR service and mixes them into the audio and video stream to be played by the client.
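The audio mixing mentioned above can be illustrated with a small sketch. The patent does not say how the TTS audio is mixed in; the example below assumes both inputs are already decoded to signed 16-bit mono PCM at the same sample rate and uses plain sample-wise addition with clipping. The function name `mix_pcm16` and the `tts_gain` parameter are hypothetical.

```python
from array import array


def mix_pcm16(stream_audio, tts_audio, tts_gain=1.0):
    """Mix TTS samples into stream samples (assumed: signed 16-bit mono PCM).

    The shorter input is padded with silence, and sums outside the
    16-bit range are clipped rather than wrapped.
    """
    length = max(len(stream_audio), len(tts_audio))
    mixed = array("h")
    for i in range(length):
        a = stream_audio[i] if i < len(stream_audio) else 0
        b = tts_audio[i] if i < len(tts_audio) else 0
        total = a + int(b * tts_gain)
        mixed.append(max(-32768, min(32767, total)))
    return mixed
```

A real streaming service would also need to demultiplex, decode, resample and re-encode the media, which is the computing cost the background section refers to; none of that is shown here.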

The text chat room service is used for processing the text chat in the chat process. It receives and distributes text chat content from the clients; provides the ASR service with a service for receiving text recognition results and mixes those results into the corresponding chat room; and provides text chat content from the clients to the TTS service for TTS processing.
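The routing performed by the text chat room service can be sketched as follows. This is a hedged illustration: `TextChatRoom`, `ChatMessage`, the `source` labels and the callback-based delivery are assumptions for the example, and the patent does not prescribe any particular data model or transport.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChatMessage:
    sender: str
    text: str
    source: str  # "typed" by a client, or "asr" from speech recognition


class TextChatRoom:
    """Hypothetical chat room: distributes messages and forwards typed text to TTS."""

    def __init__(self, send_to_tts):
        self.send_to_tts = send_to_tts  # callable(sender, text), stands in for the TTS service
        self.members = {}               # client_id -> callable(ChatMessage)
        self.history = []

    def join(self, client_id, deliver):
        self.members[client_id] = deliver

    def post_typed(self, sender, text):
        """Text typed by a client: show it to both parties and hand it to TTS."""
        self._distribute(ChatMessage(sender, text, "typed"))
        self.send_to_tts(sender, text)

    def post_asr_result(self, speaker, text):
        """Recognition result from the ASR service: mix it into the conversation."""
        self._distribute(ChatMessage(speaker, text, "asr"))

    def _distribute(self, message):
        self.history.append(message)
        for deliver in self.members.values():
            deliver(message)
```

In this sketch every message goes to all members, including the sender, so both parties see the same conversation; only typed text is forwarded to TTS, since ASR results already originate from speech.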

The ASR service is used for converting the speech in the chat process into text. It recognizes the audio stream and converts it into text, sends the text recognition result to the text chat room address provided by the chat signaling control service, and sends the text recognition result to the audio and video streaming service to be mixed in as subtitles.
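The two destinations of each recognition result can be sketched as follows. The recognizer itself is out of scope: `recognize` is a placeholder callable supplied by the caller, and `SubtitleCue`, `format_timestamp` and `handle_audio_chunk` are hypothetical names. The timestamp layout `HH:MM:SS.mmm` is an assumption for illustration, not something the patent specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubtitleCue:
    speaker: str
    start: str
    end: str
    text: str


def format_timestamp(seconds):
    """Render a non-negative offset in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def handle_audio_chunk(speaker, audio, start_s, end_s,
                       recognize, send_to_chat, send_to_av):
    """Recognize one separated audio chunk and fan the text out twice.

    recognize(audio) -> str is a stand-in for a real ASR engine.
    send_to_chat(speaker, text) stands in for the text chat room service.
    send_to_av(cue) stands in for subtitle mixing in the streaming service.
    """
    text = recognize(audio).strip()
    if not text:
        return None  # silence or nothing recognized: send nothing
    send_to_chat(speaker, text)
    cue = SubtitleCue(speaker, format_timestamp(start_s), format_timestamp(end_s), text)
    send_to_av(cue)
    return cue
```

Sending the same text to both sinks mirrors the description: the chat room keeps a readable transcript, while the streaming service receives a timed cue that it can overlay on the video.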

The TTS service is used for converting the text in the chat process into speech. It converts the text into speech and sends the speech conversion result to the audio and video streaming service address provided by the chat signaling control service.
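The TTS path can be sketched in the same spirit. This is a minimal sketch assuming that typed messages are queued and synthesized in arrival order; the patent does not state an ordering policy. `TtsRelay` and its three callables (`synthesize`, `push_address_for`, `push`) are hypothetical placeholders for a real TTS engine, the signaling lookup and the streaming push.

```python
from collections import deque


class TtsRelay:
    """Hypothetical relay: text in, synthesized audio pushed to a stream address."""

    def __init__(self, synthesize, push_address_for, push):
        self.synthesize = synthesize              # callable(text) -> bytes
        self.push_address_for = push_address_for  # callable(sender) -> address
        self.push = push                          # callable(address, audio_bytes)
        self.pending = deque()

    def submit(self, sender, text):
        """Queue one chat message; blank messages are ignored."""
        cleaned = text.strip()
        if cleaned:
            self.pending.append((sender, cleaned))

    def drain(self):
        """Synthesize queued messages in order and push each result."""
        sent = 0
        while self.pending:
            sender, text = self.pending.popleft()
            audio = self.synthesize(text)
            self.push(self.push_address_for(sender), audio)
            sent += 1
        return sent
```

Keeping a first-in, first-out queue is one simple way to make the synthesized speech follow the order in which the hearing-impaired user typed the messages.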

The operating system of the server system is Windows, Linux or Unix.

In a third aspect, the present invention provides a technical solution: a terminal for realizing barrier-free video chat, comprising:

the at least one processor invokes the instructions in the memory to cause the server to perform the steps of the barrier-free video chat interaction method of claim 1.

In a fourth aspect, the present invention provides a technical solution: a readable storage medium for realizing barrier-free video chat, comprising:

the computer-readable storage medium stores instructions which, when executed by the processor, implement the steps of the interaction method for barrier-free video chat for hearing-impaired people as claimed in claim 1.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. An interaction method for realizing barrier-free video chat, characterized in that the interaction method comprises:

A session is established for client A: client A first enters the chat room and accesses the chat signaling control service; the chat signaling control service authenticates the client and assigns a client identifier, an audio and video stream playback address, an audio and video stream push address and a text chat session address; client A collects local audio and video through the audio and video collector and plays the local video on the local audio and video player;

The audio and video stream collector starts to push the stream to the audio and video stream push address;

The text chat module starts to establish a chat to the text chat session address;

The audio and video stream player starts to prepare to pull the stream from the audio and video stream playback address;

Share video chat and invite client B to join;

A session is established for client B: client B enters the chat room and accesses the chat signaling control service; the chat signaling control service authenticates the client and assigns a client ID, an audio and video stream playback address, an audio and video stream push address and a text chat session address; client B collects local audio and video through the audio and video collector and plays the local video on the local audio and video player;

ASR subtitle processing is added to the audio and video: the audio and video streaming service separates the audio of client B; the ASR service converts the audio into text, sends the text conversion result to the text chat room service for mixing, and hands the text conversion result to the audio and video streaming service for video subtitle mixing; the audio and video streaming service mixes the video subtitles into the audio and video of client B and provides client A with a playback service for client B's audio and video plus subtitles; the text chat room service provides client A with the text conversion result of client B for client A to read, and provides client B with its own text conversion result for client B to read; the audio and video stream player plays the server-side video;

TTS audio processing is added to the audio and video of the hearing-impaired person: the hearing-impaired user enters text in the text chat module to communicate; the text chat module pushes the text chat content of client A to the text chat room service at the text chat session address; the text chat room service displays the text chat content of client A in the chat window of the text chat module, showing the text conversation of both parties; the text chat room service hands the text chat content of client A to the TTS service; the TTS service converts the text into speech and transfers the speech conversion result to the audio and video streaming service for audio mixing; the audio and video streaming service mixes the TTS speech into the audio and video of client A and provides client B with a playback service for client A's audio and video; the audio and video stream player plays the server-side video;

The chat session is maintained, taking the disconnection and re-entry of client A as an example: client A accidentally drops out of the chat room; client A re-enters the chat room and accesses the chat signaling control service; the chat signaling control service gives the session ID to client A so that client A re-enters the chat session;

The chat session is ended, taking client A actively ending the session as an example: client A notifies the chat signaling control service that the chat session is ending; the chat signaling control service notifies client B that the chat session has ended; the chat signaling control service notifies the audio and video streaming service and the text chat room service to end the chat session service.

2. A device for realizing barrier-free video chat, characterized in that the device comprises a client system and a server system.

3. The device for realizing barrier-free video chat according to claim 2, wherein the client system has two or more clients in one chat session.

4. The device for realizing barrier-free video chat according to claim 3, wherein the client is implemented using two or more of the following technologies: software, app, applet, web page and H5.

5. The device for realizing barrier-free video chat according to claim 3, wherein the basic modules of the client system need to include an audio and video stream player, an audio and video stream collector and a text chat module.

6. The device for realizing barrier-free video chat according to claim 2, characterized in that the server system adopts stand-alone deployment or distributed deployment, and its hardware is a local hardware server or a cloud server that has a public network IP or domain name and is provided with a CPU computing unit, a memory processing unit and a hard disk storage unit.

7. The device for realizing barrier-free video chat according to claim 6, wherein the basic modules of the server system need to include a chat signaling control service, an audio and video streaming service, a text chat room service, an ASR service and a TTS service.

8. The device for realizing barrier-free video chat according to claim 7, wherein the operating system of the server system is Windows, Linux or Unix.

9. A terminal for realizing barrier-free video chat, characterized in that the system terminal comprises a system memory and at least one processor, instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line;

The at least one processor invokes the instructions in the memory to cause the server to perform the steps of the barrier-free video chat interaction method of claim 1.

10. A readable storage medium for realizing barrier-free video chat, characterized in that the computer-readable storage medium stores instructions which, when executed by a processor, implement the steps of the interaction method for barrier-free video chat for hearing-impaired people as in claim 1.