Interaction method, device, terminal and storage medium for realizing barrier-free video chat
Technical Field
The invention relates to the technical field of video chat, in particular to an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat.
Background
Video chat is a real-time audio and video communication mode that is widely applied in scenarios such as IM chat, video conferencing, and video authentication for financial services including banking, securities, funds and insurance. Because it presents the images and voices of both parties at the same time, video chat makes up for the limited amount of information conveyed by telephone, IM voice and text chat. With the development of smartphones, the Internet of Things and WiFi/4G/5G wireless communication technologies, the threshold for using video chat has dropped greatly and video chat has become increasingly common.
However, existing video chat products generally cannot meet the communication needs of hearing-impaired people, and there are two main problems. First, general video chat products have no matching text communication function, so supplementary text communication is not possible when a hearing-impaired person cannot hear the other party or cannot speak during a chat. Second, general video chat products have no matching ASR (automatic speech recognition) and TTS (text-to-speech) functions: speech cannot be automatically converted into subtitles during a chat so that a hearing-impaired person can understand what a hearing person says by reading text, and a hearing-impaired person cannot input text and have it converted into speech so that the hearing person can understand what the hearing-impaired person expresses.
These problems have four main causes. First, hearing-impaired people are a socially disadvantaged group, and software is rarely optimized specifically for their needs. Second, video chat only became widespread after the popularization of 4G/5G networks and the rapid development of the mobile Internet, and related applications are still being improved. Third, adding text communication to video chat involves many scenarios, and existing video chat interfaces that add text chat find it difficult to balance UI design against service experience. Fourth, ASR and TTS technologies have only come into large-scale use in recent years, following the rapid development of cloud computing, big data, machine learning and artificial intelligence. Adding ASR and TTS to video chat also has a certain technical threshold: for example, the audio stream must be separated and transcoded before ASR, and the text stream must be converted by TTS into an audio stream and then encapsulated with the video stream into a media stream, all of which adds audio/video and artificial intelligence computation cost.
The third and fourth causes above are also the main technical challenges encountered in developing and implementing the present invention. Therefore, an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat are proposed to solve these problems.
Disclosure of Invention
The invention aims to provide an interaction method, a device, a terminal and a storage medium for realizing barrier-free video chat, so as to solve the problem that existing video chat products have no matching text communication function and cannot convert speech into subtitles during a chat.
In order to achieve the above object, the first aspect of the present invention provides the following solution: an interaction method for realizing barrier-free video chat, comprising the following steps:
establishing a session of client A: client A enters a chat room and accesses the chat signaling control service; the chat signaling control service performs authentication, allocates a client identifier, allocates an audio and video stream playing address, allocates an audio and video stream pushing address and allocates a text chat session address; local audio and video are captured by the audio and video stream collector, and the local video is played on the local audio and video stream player;
the audio and video stream collector starts pushing the stream to the audio and video stream pushing address;
the text chat module starts establishing a chat with the text chat session address;
the audio and video stream player prepares to pull the stream from the audio and video stream playing address;
sharing the video chat and inviting client B to join;
establishing a session of client B: client B enters the chat room and accesses the chat signaling control service; the chat signaling control service performs authentication, allocates a client identifier, allocates an audio and video stream playing address, allocates an audio and video stream pushing address and allocates a text chat session address; local audio and video are captured by the audio and video stream collector, and the local video is played on the local audio and video stream player;
the audio and video stream collector starts pushing the stream to the audio and video stream pushing address;
the text chat module starts establishing a chat with the text chat session address;
the audio and video stream player prepares to pull the stream from the audio and video stream playing address;
adding ASR subtitle processing to the audio and video of the hearing person: the audio and video streaming service separates the audio of client B; the ASR service converts the audio into text, delivers the text conversion result to the text chat room service to be mixed into the chat, and delivers the text conversion result to the audio and video streaming service to be mixed in as video subtitles; the audio and video streaming service mixes the video subtitles into the audio and video of client B and provides client A with the audio and video plus subtitle playing service of client B; the text chat room service provides the text conversion result of client B to client A for reading, and at the same time provides client B with its own text conversion result for reading; the audio and video stream player plays the video from the server side;
adding TTS audio processing to the audio and video of the hearing-impaired person: the hearing-impaired user inputs text in the text chat module to communicate, and the text chat content of client A is pushed to the text chat room service at the text chat session address; the text chat room service presents the text conversation of both parties in the chat window of the text chat module; the text chat room service delivers the text chat content of client A to the TTS service; the TTS service converts the text into speech and delivers the speech conversion result to the audio and video streaming service for audio mixing; the audio and video streaming service mixes the TTS speech into the audio and video of client A and provides client B with the audio and video playing service of client A; the audio and video stream player plays the video from the server side;
keeping the chat session, taking the case where client A is disconnected and re-enters as an example: client A accidentally drops out of the chat room; client A re-enters the chat room and accesses the chat signaling control service; the chat signaling control service gives the session ID to client A so that client A re-enters the chat session;
ending the chat session, taking the case where client A actively ends the session as an example: client A notifies the chat signaling control service to end the chat session; the chat signaling control service notifies client B that the chat session has ended, notifies the audio and video streaming service, notifies the text chat room service, and ends the chat session service.
In order to achieve the above object, the second aspect of the present invention provides the following solution: a device for realizing barrier-free video chat, comprising:
the barrier-free video chat device comprises a client system and a server system.
Preferably, the client system has two or more clients in a chat session.
Preferably, the client is implemented by two or more of software, APP, applet, webpage and H5.
Preferably, the basic module of the client system needs to include an audio/video stream player, an audio/video stream collector and a text chat module.
Preferably, the server system is deployed in a stand-alone manner or in a distributed manner, and the hardware of the server system is deployed in a local hardware server or a cloud server which has a public network IP or a domain name and is provided with a CPU operation unit, a memory processing unit and a hard disk storage unit.
Preferably, the basic module of the server system needs to include a chat signaling control service, an audio and video streaming service, a text chat room service, an ASR service and a TTS service.
Preferably, the operating system of the server system is Windows, Linux or Unix.
In order to achieve the above object, a third aspect of the present invention provides the following solution: a terminal for realizing barrier-free video chat, comprising:
the system terminal comprises a system memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line;
the at least one processor invokes the instructions in the memory to cause the server to perform the steps of the interaction method for realizing barrier-free video chat of claim 1.
In order to achieve the above object, a fourth aspect of the present invention provides the following solution: a computer-readable storage medium for realizing barrier-free video chat, wherein:
the computer-readable storage medium has stored thereon instructions which, when executed by a processor, perform the steps of the interaction method for realizing barrier-free video chat for hearing-impaired people as claimed in claim 1.
Compared with the prior art, the invention has the beneficial effects that:
the invention combines user interface (UI) design with artificial intelligence technology to provide text communication during video chat and automatic conversion of speech into subtitles during chat, thereby meeting the need for barrier-free communication, and in particular for barrier-free video chat between hearing-impaired people and hearing people.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a first schematic flowchart of the interaction method of the present invention;
FIG. 3 is a second schematic flowchart of the interaction method of the present invention;
FIG. 4 is a third schematic flowchart of the interaction method of the present invention;
FIG. 5 is a fourth schematic flowchart of the interaction method of the present invention.
Detailed Description
The present invention will now be described in more detail by way of examples, which are given by way of illustration only and are not intended to limit the scope of the present invention in any way.
In a first aspect, the present invention provides a technical solution: an interaction method for realizing barrier-free video chat, comprising the following steps:
Firstly, a session of client A is established. Client A enters a chat room and accesses the chat signaling control service. The chat signaling control service performs authentication, allocates a client identifier, allocates an audio and video stream playing address, allocates an audio and video stream pushing address and allocates a text chat session address. Local audio and video are captured by the audio and video stream collector, and the local video is played on the local audio and video stream player;
the audio and video stream collector starts pushing the stream to the audio and video stream pushing address;
the text chat module starts establishing a chat with the text chat session address;
the audio and video stream player prepares to pull the stream from the audio and video stream playing address.
Secondly, the video chat is shared and client B is invited to join.
Thirdly, a session of client B is established. Client B enters the chat room and accesses the chat signaling control service. The chat signaling control service performs authentication, allocates a client identifier, allocates an audio and video stream playing address, allocates an audio and video stream pushing address and allocates a text chat session address. Local audio and video are captured by the audio and video stream collector, and the local video is played on the local audio and video stream player;
the audio and video stream collector starts pushing the stream to the audio and video stream pushing address;
the text chat module starts establishing a chat with the text chat session address;
the audio and video stream player prepares to pull the stream from the audio and video stream playing address.
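The session establishment for client A and client B can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the names `SignalingControl`, `ClientSession` and `join`, the non-empty-token authentication check, the URL schemes and the host `media.example.invalid` are all hypothetical. The sketch only shows that each client receives its own identifier, pushing address and playing address, while both clients share one text chat session address.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ClientSession:
    client_id: str
    push_url: str   # where this client's collector pushes its own stream
    play_url: str   # where this client's player pulls the stream to play
    chat_url: str   # text chat session address shared by the whole room


class SignalingControl:
    """Hypothetical chat signaling control service (in-memory sketch)."""

    def __init__(self, host="media.example.invalid"):
        self.host = host
        self.rooms = {}                  # room_id -> {client_id: ClientSession}
        self._ids = itertools.count(1)

    def join(self, room_id, token):
        # Authentication is reduced to a non-empty token check (assumption).
        if not token:
            raise PermissionError("authentication failed")
        client_id = f"c{next(self._ids)}"
        session = ClientSession(
            client_id=client_id,
            push_url=f"rtmp://{self.host}/{room_id}/{client_id}/push",
            play_url=f"rtmp://{self.host}/{room_id}/{client_id}/play",
            chat_url=f"wss://{self.host}/{room_id}/chat",
        )
        self.rooms.setdefault(room_id, {})[client_id] = session
        return session


signaling = SignalingControl()
a = signaling.join("room1", token="token-a")   # client A enters the chat room
b = signaling.join("room1", token="token-b")   # client B joins by invitation
```

After both calls, the collector of each client would push to its `push_url`, the player would pull from its `play_url`, and the text chat module would connect to the shared `chat_url`.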
Fourthly, ASR subtitle processing is added to the audio and video of the hearing person. The audio and video streaming service separates the audio of client B. The ASR service converts the audio into text, delivers the text conversion result to the text chat room service to be mixed into the chat, and delivers the text conversion result to the audio and video streaming service to be mixed in as video subtitles. The audio and video streaming service mixes the video subtitles into the audio and video of client B and provides client A with the audio and video plus subtitle playing service of client B. The text chat room service provides the text conversion result of client B to client A for reading, and at the same time provides client B with its own text conversion result for reading. The audio and video stream player plays the video from the server side.
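The routing in this step, where one recognition result goes both to the text chat and to the subtitle track, can be sketched as follows. The sketch makes loud assumptions: no real ASR engine is called, `fake_asr` simply decodes bytes that already contain a transcript, and the names `CaptionRouter`, `Recognized` and `on_audio` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Recognized:
    speaker: str
    text: str


def fake_asr(speaker, audio_chunk):
    """Stand-in for an ASR service: the 'audio' is already a UTF-8 transcript."""
    return Recognized(speaker=speaker, text=audio_chunk.decode("utf-8"))


@dataclass
class CaptionRouter:
    """Sends each ASR result both to the chat log and to the subtitle track."""
    chat_log: list = field(default_factory=list)
    subtitles: list = field(default_factory=list)

    def on_audio(self, speaker, audio_chunk):
        result = fake_asr(speaker, audio_chunk)
        if not result.text.strip():
            return None                      # nothing recognized, nothing to mix
        self.chat_log.append((result.speaker, result.text, "asr"))
        self.subtitles.append(result.text)   # to be mixed into B's video for A
        return result


router = CaptionRouter()
router.on_audio("B", b"Hello, can you see me?")
router.on_audio("B", b"   ")                 # silence produces no caption
```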
Fifthly, TTS audio processing is added to the audio and video of the hearing-impaired person. The hearing-impaired user inputs text in the text chat module to communicate, and the text chat content of client A is pushed to the text chat room service at the text chat session address. The text chat room service presents the text conversation of both parties in the chat window of the text chat module and delivers the text chat content of client A to the TTS service. The TTS service converts the text into speech and delivers the speech conversion result to the audio and video streaming service for audio mixing. The audio and video streaming service mixes the TTS speech into the audio and video of client A and provides client B with the audio and video playing service of client A. The audio and video stream player plays the video from the server side.
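The text does not specify how the TTS speech is mixed into the audio of client A. One common approach, shown here purely as an assumption, is to add 16-bit PCM samples and clamp the sum to the valid range. The function name `mix_pcm16` and the sample values are hypothetical, and the TTS output is represented by a plain list of integers rather than by a real synthesizer.

```python
def mix_pcm16(voice, tts, tts_gain=1.0):
    """Mix two mono 16-bit PCM sample sequences by summing and clamping.

    The shorter input is padded with silence, so the result has the
    length of the longer input.
    """
    length = max(len(voice), len(tts))
    mixed = []
    for i in range(length):
        v = voice[i] if i < len(voice) else 0
        t = tts[i] if i < len(tts) else 0
        s = int(round(v + tts_gain * t))
        mixed.append(max(-32768, min(32767, s)))   # clamp to the int16 range
    return mixed


microphone = [0, 100, -200, 30000]       # client A's captured audio samples
synthesized = [50, 50, 50, 10000, 7]     # stand-in for TTS output samples
out = mix_pcm16(microphone, synthesized)
```

Clamping avoids integer wrap-around when both inputs are loud at the same time; a production mixer would more likely lower the gain or apply a limiter instead of hard clipping.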
Sixthly, the chat session is kept, taking the case where client A is disconnected and re-enters as an example. Client A accidentally drops out of the chat room. Client A re-enters the chat room and accesses the chat signaling control service, and the chat signaling control service gives the session ID to client A so that client A re-enters the chat session.
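A minimal sketch of this keep-and-recover behaviour is given below, assuming the signaling side simply remembers which session each client belongs to. The class `SessionKeeper` and its method names are hypothetical, and persistence, timeouts and re-authentication are deliberately left out.

```python
class SessionKeeper:
    """Hypothetical in-memory record that lets a dropped client re-enter."""

    def __init__(self):
        self.session_of = {}    # client_id -> session_id
        self.online = set()     # client ids currently connected

    def enter(self, client_id, session_id):
        self.session_of[client_id] = session_id
        self.online.add(client_id)

    def on_disconnect(self, client_id):
        # The membership record is kept, so the session survives the drop.
        self.online.discard(client_id)

    def reenter(self, client_id):
        """Return the session ID for a returning client, or None if unknown."""
        session_id = self.session_of.get(client_id)
        if session_id is not None:
            self.online.add(client_id)
        return session_id


keeper = SessionKeeper()
keeper.enter("A", "session-1")
keeper.enter("B", "session-1")
keeper.on_disconnect("A")          # client A drops unexpectedly
resumed = keeper.reenter("A")      # client A comes back and gets the same ID
```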
Seventhly, the chat session is ended, taking the case where client A actively ends the session as an example. Client A notifies the chat signaling control service to end the chat session. The chat signaling control service notifies client B that the chat session has ended, notifies the audio and video streaming service, notifies the text chat room service, and ends the chat session service.
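The notification order in this step can be sketched as follows. The class `ChatSignalingControl`, the callback registry and the rule that a second request for the same session is ignored are assumptions made for illustration; the text only fixes the order: the peer client first, then the audio and video streaming service, then the text chat room service.

```python
class ChatSignalingControl:
    """Hypothetical signaling service that fans out the end-of-session notice."""

    def __init__(self):
        self.clients = {}     # client_id -> callable(session_id)
        self.services = []    # (name, callable(session_id)), in notification order
        self.ended = set()

    def end_session(self, session_id, initiator):
        if session_id in self.ended:
            return False                    # already ended, notify nobody twice
        self.ended.add(session_id)
        for client_id, notify in self.clients.items():
            if client_id != initiator:      # the initiator already knows
                notify(session_id)
        for _name, stop in self.services:
            stop(session_id)
        return True


log = []
control = ChatSignalingControl()
control.clients["A"] = lambda sid: log.append(("client A", sid))
control.clients["B"] = lambda sid: log.append(("client B", sid))
control.services.append(("av", lambda sid: log.append(("audio/video streaming", sid))))
control.services.append(("chat", lambda sid: log.append(("text chat room", sid))))

first = control.end_session("s1", initiator="A")
second = control.end_session("s1", initiator="A")   # repeated request is ignored
```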
In a second aspect, the present invention provides a technical solution: a device for realizing barrier-free video chat, comprising:
the barrier-free video chat device comprises a client system and a server system.
The client system has two or more clients in one chat session.
The client is specifically implemented by two or more of software, APP, applet, webpage and H5.
The basic modules of the client system need to include an audio and video stream player, an audio and video stream collector and a text chat module.
The audio and video stream player is used for playing the server-side video and the local video, playing the server-side audio, and overlaying subtitles when playing the server-side audio and video.
The audio and video stream collector is used for capturing local audio and video data through the camera and microphone modules of the operating system on which the client runs, and transmitting the captured audio and video data to the server for processing, after which the data is played by one or more peer clients.
The text chat module is used for processing the text communication content in the chat session; it supports inputting text content in the session and displaying the chat conversation between the local end and the peer end.
The server system adopts single machine deployment or distributed deployment, and the hardware of the server system adopts a local hardware server or a cloud server which has a public network IP or a domain name and is provided with a CPU (Central processing Unit), a memory processing unit and a hard disk storage unit.
The basic modules of the server system need to comprise chat signaling control service, audio and video streaming service, text chat room service, ASR service and TTS service.
The chat signaling control service is used for controlling the chat process: it provides the client with a method for establishing a chat session, an audio and video stream playing address, an audio and video stream pushing address and a text chat session address; notifies each client when the chat session is closed; provides a keep-and-recover service for the chat session; provides the ASR service with the chat session address to which text recognition results are sent; and provides the TTS service with the audio and video streaming service pushing address for speech results.
The audio and video streaming service is used for processing the audio and video streaming media part of the chat process: it provides the client with an audio and video stream playing service and an audio and video pushing service; mixes the audio stream provided by the TTS service into the audio and video stream to be played by the client; separates the client's audio stream and delivers it to the ASR service for recognition and conversion; and generates subtitles from the text produced by the ASR service and mixes the subtitles into the audio and video stream to be played by the client.
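The text does not name a subtitle format. As one possible illustration of generating subtitles from ASR text, the sketch below turns timed text segments into a WebVTT document; the choice of WebVTT, the function names `vtt_timestamp` and `to_webvtt`, and the segment timings are all assumptions. Mixing the subtitles into the media stream itself is outside the scope of this sketch.

```python
def vtt_timestamp(seconds):
    """Format a time offset in seconds as a WebVTT timestamp hh:mm:ss.ttt."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments):
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]
    for start, end, text in segments:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")          # a blank line terminates each cue
    return "\n".join(lines)


# Hypothetical ASR output for client B, with timings relative to stream start.
captions = to_webvtt([
    (0.0, 1.5, "Hello"),
    (1.5, 4.0, "Can you see me?"),
])
```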
The text chat room service is used for processing the text chat part of the chat process: it receives and distributes text chat content from the clients; provides the ASR service with a service for receiving text recognition results and mixes the text recognition results into the corresponding chat room; and provides the text chat content from the clients to the TTS service for TTS processing.
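The two kinds of input handled by the text chat room service can be sketched as follows: typed messages are distributed to every member and also forwarded to TTS, while ASR results are distributed only. The class `TextChatRoom`, the inbox lists and the `tts_sink` callback are hypothetical stand-ins; a real service would deliver messages over network connections.

```python
class TextChatRoom:
    """Hypothetical chat room that distributes typed and recognized text."""

    def __init__(self, tts_sink):
        self.members = {}         # client_id -> list acting as that client's inbox
        self.tts_sink = tts_sink  # callable(sender, text) standing in for TTS

    def join(self, client_id):
        self.members[client_id] = []

    def _distribute(self, sender, text, source):
        for inbox in self.members.values():
            inbox.append({"from": sender, "text": text, "source": source})

    def post_typed(self, sender, text):
        self._distribute(sender, text, "typed")
        self.tts_sink(sender, text)     # typed text is also handed to TTS

    def post_asr(self, speaker, text):
        # Recognized speech is shown in the room but not spoken again.
        self._distribute(speaker, text, "asr")


spoken = []
room = TextChatRoom(tts_sink=lambda sender, text: spoken.append((sender, text)))
room.join("A")
room.join("B")
room.post_typed("A", "Nice to meet you")        # hearing-impaired user types
room.post_asr("B", "Nice to meet you too")      # hearing user's speech, via ASR
```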
The ASR service is used for converting the speech part of the chat process into text: it recognizes the audio stream and converts it into text, sends the text recognition result to the text chat room address provided by the chat signaling control service, and sends the text recognition result to the audio and video streaming service to be mixed in as subtitles.
The TTS service is used for converting the text part of the chat process into speech and sending the speech conversion result to the audio and video streaming service address provided by the chat signaling control service.
The operating system of the server system is Windows, Linux or Unix.
In a third aspect, the present invention provides a technical solution: a terminal for realizing barrier-free video chat, comprising:
the system terminal comprises a system memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected through a line;
the at least one processor invokes the instructions in the memory to cause the server to perform the steps of the interaction method for realizing barrier-free video chat as in claim 1.
In a fourth aspect, the present invention provides a technical solution: a computer-readable storage medium for realizing barrier-free video chat, wherein:
the computer-readable storage medium has stored thereon instructions which, when executed by a processor, perform the steps of the interaction method for realizing barrier-free video chat for hearing-impaired people as claimed in claim 1.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.