Real-time intercom intervention and alarm platform based on AI technology
Technical Field
The invention belongs to the technical field of intercom intervention and alarm, and particularly relates to a real-time intercom intervention and alarm platform based on an AI technology.
Background
In daily life, communication is ubiquitous and takes many forms. In certain scenarios, even when an administrator monitors a call, it is difficult to detect sensitive information in the call and intervene in real time.
Disclosure of Invention
To remedy the defects of the prior art, the invention aims to provide a real-time intercom intervention and alarm platform based on AI technology.
In order to achieve the above purpose, the invention provides a real-time intercom intervention and alarm platform based on AI technology, which comprises a communication server, a media resource control server and an AI voice training and recognition platform.
The communication server is electrically connected and in signal connection with the media resource control server. The communication server comprises an MRCP client, a user agent, a session communication module and an intervention module, wherein the user agent is used for accessing a plurality of user terminals, the session communication module is used for acquiring the communication contents of the user terminals in real time and converting the communication contents into audio media streams, and the MRCP client is used for pulling the audio media streams in real time and sending the audio media streams to the media resource control server.
The media resource control server is electrically connected and in signal connection with the AI voice training and recognition platform, and is used for converting the audio media streams into text content and sending the text content to the AI voice training and recognition platform.
The AI voice training and recognition platform is electrically connected and in signal connection with the communication server, and comprises a voice recognition engine, a training module and an alarm module. The voice recognition engine is used for receiving the text content from the media resource control server. The training module comprises a voice recognition model and a sound classification model, and is used for recognizing sensitive information related to the text content through the voice recognition model and classifying the audio related to the text content through the sound classification model. The alarm module is used for generating alarm information of the corresponding category with a message code and sending the alarm information to the communication server when the training module detects sensitive information, and the intervention module in the communication server starts the corresponding intervention action according to the alarm information with the message code.
With this technical scheme, when user terminals are in a call, the call content is converted into an audio stream in real time and sensitive words are identified. When sensitive information is detected, the platform identifies what kind of sound the current audio is, or what state or scene it represents, and sends alarm information of the corresponding category to the communication server so that intervention is timely. The platform can be used to identify sensitive words, distress sounds, abnormal sounds and other risk information arising in a variety of business scenarios and to start the corresponding intervention actions, thereby purifying the conversation environment and handling unexpected events promptly.
As a further explanation of the real-time intercom intervention and alert platform based on AI technology of the present invention, preferably, the media resource control server includes a master server and a plurality of slave servers, the MRCP client communicates with the master server, the master server communicates with the plurality of slave servers, so that the MRCP client sends the IP address and port number of the user terminal to the master server, and the master server controls the idle slave servers to establish communication connection with the MRCP client.
As a further explanation of the real-time intercom intervention and warning platform based on the AI technology, the voice recognition engine preferably comprises a word segmentation module and a semantic analysis module, wherein the word segmentation module is used for dividing text content into word vector sets according to word segmentation sets and transmitting the word vector sets to the semantic analysis module, and the semantic analysis module is used for carrying out semantic analysis on the word vector sets, preliminarily determining classification types corresponding to the word vector sets and transmitting the classification types to the training module.
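The division of labor between the word segmentation module and the semantic analysis module can be illustrated with a minimal Python sketch. The vocabulary, the category names, the bag-of-words representation and all function names are assumptions made for illustration; the source does not specify how word vectors are built or how the preliminary category is chosen.

```python
from collections import Counter

# Hypothetical segmentation set: the vocabulary used to cut text into tokens.
SEGMENTATION_SET = {"help", "fire", "transfer", "money", "password"}

# Hypothetical mapping from a token to a preliminary classification category.
CATEGORY_OF = {
    "help": "distress",
    "fire": "distress",
    "transfer": "fraud",
    "money": "fraud",
    "password": "fraud",
}


def segment(text):
    """Word segmentation: keep the tokens that appear in the segmentation set."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return [t for t in tokens if t in SEGMENTATION_SET]


def to_word_vector(tokens):
    """Bag-of-words counts over the vocabulary, in sorted vocabulary order."""
    counts = Counter(tokens)
    return [counts[w] for w in sorted(SEGMENTATION_SET)]


def preliminary_category(tokens):
    """Semantic analysis stand-in: majority vote over token categories."""
    votes = Counter(CATEGORY_OF[t] for t in tokens)
    return votes.most_common(1)[0][0] if votes else None


tokens = segment("Please transfer the money now!")
vector = to_word_vector(tokens)
category = preliminary_category(tokens)
```

In a real deployment the majority vote would be replaced by the trained models held by the training module; the sketch only shows the data that flows from one module to the next.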
As a further explanation of the real-time intercom intervention and warning platform based on the AI technology, the warning module preferably comprises an encoder, a warning information generating module and a warning information transmitting module, wherein the training module is connected with the encoder, the encoder is connected with the warning information generating module, the warning information generating module is connected with the warning information transmitting module, the warning information transmitting module is connected with the intervention module, the encoder is used for receiving the sensitive information and the audio classification result of the training module and generating corresponding message codes and sending the corresponding message codes to the warning information generating module, the warning information generating module is used for generating warning information with the message codes after receiving the message codes, and the warning information transmitting module is used for sending the warning information with the message codes to the intervention module.
As a further explanation of the real-time intercom intervention and alarm platform based on the AI technology, the intervention module preferably comprises an alarm information receiving module, a decoder, an interruption intervention module, a reminding intervention module and a keyword silencing module, wherein the alarm information transmitting module is connected with the alarm information receiving module, the alarm information receiving module is connected with the decoder, the decoder is respectively connected with the interruption intervention module, the reminding intervention module and the keyword silencing module, the alarm information receiving module is used for receiving the alarm information with message codes from the alarm information transmitting module and sending the alarm information to the decoder, the decoder is used for analyzing the message codes and starting the interruption intervention module, the reminding intervention module or the keyword silencing module according to the message codes, the interruption intervention module is used for cutting off a call of a user terminal, the reminding intervention module is used for sending a text warning or inserting voice to the user terminal, and the keyword silencing module is used for silencing sensitive words in the communication content of the user terminal.
As a further illustration of the AI-based real-time intercom intervention and alert platform of the present invention, preferably, the AI voice training and recognition platform includes a database module for storing a sensitive word dataset and an audio classification dataset and for providing the training module with model training data in the form of a training set and a test set.
As a further explanation of the real-time intercom intervention and alert platform based on AI technology of the present invention, preferably, the communication server, the media resource control server and the AI speech training and recognition platform are connected by real-time media streaming.
As a further illustration of the AI-technology based real-time intercom intervention and alert platform of the present invention, preferably the communication server communicates with the media resource control server via SIP protocol.
As a further illustration of the AI technology-based real-time intercom intervention and alert platform of the present invention, preferably the speech recognition model and the sound classification model are deployed on a private CPU/GPU server.
With this technical scheme, the models are used in an intranet or offline (no-network) environment, so that data privacy is ensured.
As a further illustration of the real-time intercom intervention and alert platform based on AI technology of the present invention, preferably, the speech recognition model and the sound classification model are built on DeepASR, a speech recognition system based on PaddlePaddle Fluid and Kaldi.
With this technical scheme, DeepASR uses the Fluid framework to configure and train the acoustic model for speech recognition and integrates the Kaldi decoder, which enables fast, large-scale training of the acoustic model while Kaldi handles the complex speech data preprocessing and the final decoding.
The beneficial effects of the invention are as follows. The invention provides an intervention and alarm platform supporting real-time intercom, in which a plurality of different user terminals can be accessed through the communication server. When the user terminals are in a call, the call content is converted into an audio stream and sensitive words are identified in real time through the communication connections established among the communication server, the media resource control server and the AI voice training and recognition platform. When sensitive information is detected, the platform identifies what kind of sound the current audio is, or what state or scene it represents, and sends alarm information of the corresponding category to the communication server so that intervention is timely. The invention can be used to identify sensitive words, distress sounds, abnormal sounds and other risk information arising in a variety of business scenarios and to start the corresponding intervention actions, thereby purifying the conversation environment and handling unexpected events promptly.
Drawings
Fig. 1 is a schematic structural diagram of the real-time intercom intervention and warning platform based on AI technology of the present invention.
Fig. 2 is a schematic diagram of a structure of a media resource control server according to the present invention.
Fig. 3 is a schematic diagram of the speech recognition engine of the present invention.
Fig. 4 is a schematic diagram of the structure of the alarm module and the intervention module of the present invention.
Detailed Description
For a further understanding of the structure, features, and other objects of the invention, preferred embodiments are described in detail below with reference to the accompanying drawings. The embodiments illustrate the concepts of the invention and do not limit it.
First, referring to fig. 1, fig. 1 is a schematic structural diagram of an AI technology-based real-time intercom intervention and warning platform of the present invention. The real-time intercom intervention and alarm platform based on the AI technology comprises a communication server 1, a media resource control server 2 and an AI voice training and recognition platform 3.
The communication server 1 is electrically and signally connected with the media resource control server 2, and is used for providing communication services, the communication server 1 comprises an MRCP client 11, a user agent 12, a session communication component 13 and an intervention module 14, wherein the user agent 12 is used for accessing a plurality of user terminals, the session communication component 13 is connected with the user agent 12, the session communication component 13 is used for acquiring the communication content of the user terminals in real time and converting the communication content into audio media streams, the MRCP client 11 is connected with the session communication component 13, the MRCP client 11 is connected with the media resource control server 2, and the MRCP client 11 is used for pulling the audio media streams in real time and sending the audio media streams to the media resource control server 2.
The media resource control server 2 is electrically and signally connected to the AI speech training and recognition platform 3, and the media resource control server 2 is configured to convert the audio media stream into text content and send the text content to the AI speech training and recognition platform 3. As shown in fig. 2, the media resource control server 2 includes a master server 21 and a plurality of slave servers 22, where the MRCP client 11 communicates with the master server 21, and the master server 21 communicates with the plurality of slave servers 22, so that the MRCP client 11 sends the IP address and the port number of the user terminal to the master server 21, and the master server 21 controls the idle slave servers 22 to establish a communication connection with the MRCP client 11.
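The master/slave allocation described above can be sketched in Python as follows. The class names, the first-idle selection rule and the behavior when every slave is busy are assumptions for illustration; the source only states that the master server directs an idle slave server to connect with the MRCP client using the reported IP address and port number.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SlaveServer:
    name: str
    busy: bool = False
    peer: Optional[Tuple[str, int]] = None  # (IP address, port) currently served

    def connect(self, ip: str, port: int) -> None:
        """Mark this slave as serving the given endpoint."""
        self.busy = True
        self.peer = (ip, port)

    def release(self) -> None:
        """Return this slave to the idle pool."""
        self.busy = False
        self.peer = None


@dataclass
class MasterServer:
    slaves: List[SlaveServer] = field(default_factory=list)

    def assign(self, ip: str, port: int) -> Optional[SlaveServer]:
        """Bind the first idle slave to the endpoint reported by the MRCP client."""
        for slave in self.slaves:
            if not slave.busy:
                slave.connect(ip, port)
                return slave
        # All slaves are busy; the source does not describe a queueing policy.
        return None


master = MasterServer([SlaveServer("slave-1"), SlaveServer("slave-2")])
first = master.assign("10.0.0.5", 5060)
second = master.assign("10.0.0.6", 5062)
```

Returning `None` when no slave is idle keeps the sketch honest about what the source leaves open; a production system would need an explicit policy such as queueing or rejecting the request.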
The AI voice training and recognition platform 3 is electrically and signally connected with the communication server 1, and the AI voice training and recognition platform 3 comprises a voice recognition engine 31, a training module 32 and an alarm module 33. The media resource control server 2 is connected with the voice recognition engine 31, the voice recognition engine 31 is used for receiving the text content from the media resource control server 2, and the voice recognition engine 31 is connected with the training module 32. As shown in fig. 3, the voice recognition engine 31 comprises a word segmentation module 311 and a semantic analysis module 312, wherein the word segmentation module 311 is used for dividing the text content into word vector sets according to word segmentation sets and transmitting the word vector sets to the semantic analysis module 312, and the semantic analysis module 312 is used for carrying out semantic analysis on the word vector sets, preliminarily determining the classification categories corresponding to the word vector sets and transmitting the classification categories to the training module 32. The training module 32 comprises a voice recognition model and a sound classification model, and the training module 32 is used for recognizing sensitive information related to the text content through the voice recognition model and classifying the audio related to the text content through the sound classification model. The training module 32 is connected with the alarm module 33, and the alarm module 33 is connected with the intervention module 14. The alarm module 33 is used for generating alarm information of the corresponding category with a message code and sending the alarm information to the communication server 1 when the training module 32 detects sensitive information, and the intervention module 14 in the communication server 1 starts the corresponding intervention action according to the alarm information with the message code.
As shown in fig. 4, the alarm module 33 comprises an encoder 331, an alarm information generating module 332 and an alarm information transmitting module 333, wherein the training module 32 is connected with the encoder 331, the encoder 331 is connected with the alarm information generating module 332, the alarm information generating module 332 is connected with the alarm information transmitting module 333, and the alarm information transmitting module 333 is connected with the intervention module 14. The encoder 331 is used for receiving the sensitive information and the audio classification result from the training module 32, generating the corresponding message code and sending the message code to the alarm information generating module 332; the alarm information generating module 332 is used for generating alarm information with the message code after receiving the message code; and the alarm information transmitting module 333 is used for sending the alarm information with the message code to the intervention module 14.
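The encoder, alarm information generating module and alarm information transmitting module form a short pipeline, which the following Python sketch mirrors. The numeric code values, the category-to-code policy, the field names and the JSON wire format are all assumptions for illustration; the source does not define the message code scheme or the transport encoding.

```python
import json
from enum import IntEnum


class MessageCode(IntEnum):
    # Hypothetical code values; the source does not specify them.
    INTERRUPT = 1  # cut off the call
    REMIND = 2     # text warning or inserted voice
    MUTE = 3       # silence sensitive words


# Hypothetical policy mapping a detection category to a message code.
POLICY = {
    "fraud": MessageCode.INTERRUPT,
    "distress": MessageCode.REMIND,
    "profanity": MessageCode.MUTE,
}


def encode(category):
    """Encoder: turn a detection category into a message code."""
    return POLICY[category]


def generate_alarm(code, category, sensitive_words, terminal_id):
    """Alarm information generating module: attach the code to the alarm body."""
    return {
        "code": int(code),
        "category": category,
        "words": list(sensitive_words),
        "terminal": terminal_id,
    }


def serialize(alarm):
    """Alarm information transmitting module: produce bytes for the wire."""
    return json.dumps(alarm, sort_keys=True).encode("utf-8")


alarm = generate_alarm(encode("profanity"), "profanity", ["darn"], "T-7")
payload = serialize(alarm)
```

Keeping the code as a small integer in a fixed field is one way to let the receiving side route the alarm without inspecting the rest of the message.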
The intervention module 14 comprises an alarm information receiving module 141, a decoder 142, an interruption intervention module 143, a reminding intervention module 144 and a keyword silencing module 145, wherein the alarm information transmitting module 333 is connected with the alarm information receiving module 141, the alarm information receiving module 141 is connected with the decoder 142, and the decoder 142 is respectively connected with the interruption intervention module 143, the reminding intervention module 144 and the keyword silencing module 145. The alarm information receiving module 141 is used for receiving the alarm information with message codes from the alarm information transmitting module 333 and transmitting the alarm information to the decoder 142, and the decoder 142 is used for analyzing the message codes and starting the interruption intervention module 143, the reminding intervention module 144 or the keyword silencing module 145 according to the message codes. The interruption intervention module 143 is used for cutting off the call of the user terminal, the reminding intervention module 144 is used for sending a text warning or inserting voice to the user terminal, and the keyword silencing module 145 is used for silencing sensitive words in the communication content of the user terminal. Thus, the intervention actions initiated by the intervention module 14 include cutting off the call, issuing a reminder, inserting voice into the call, and silencing sensitive words in the communication content of the user terminal. The encoder 331 in the alarm module 33 is matched with the decoder 142 in the intervention module 14, and the alarm information transmitting module 333 in the alarm module 33 is matched with the alarm information receiving module 141 in the intervention module 14, which ensures that the alarm information is transmitted and decoded correctly and also improves security.
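The decoder's role, selecting one of the three intervention modules from the message code, amounts to a dispatch table. The Python sketch below assumes hypothetical code values (1, 2, 3), hypothetical alarm field names, and a fixed warning text. It also masks words in a text transcript as a stand-in for silencing audio, since the source does not describe how silencing is performed on the media stream.

```python
import re

# Hypothetical message codes; the source does not define their values.
INTERRUPT, REMIND, MUTE = 1, 2, 3


def interrupt_call(alarm, transcript):
    """Interruption intervention: cut off the terminal's call."""
    return {"action": "hang_up", "terminal": alarm["terminal"]}


def remind(alarm, transcript):
    """Reminding intervention: send a text warning (wording is illustrative)."""
    return {"action": "warn", "terminal": alarm["terminal"],
            "text": "Sensitive content detected."}


def mute_keywords(alarm, transcript):
    """Keyword silencing: mask each sensitive word with asterisks."""
    masked = transcript
    if alarm["words"]:
        pattern = re.compile(
            "|".join(re.escape(w) for w in alarm["words"]), re.IGNORECASE)
        masked = pattern.sub(lambda m: "*" * len(m.group()), transcript)
    return {"action": "mute", "terminal": alarm["terminal"], "text": masked}


HANDLERS = {INTERRUPT: interrupt_call, REMIND: remind, MUTE: mute_keywords}


def decode_and_dispatch(alarm, transcript):
    """Decoder: parse the message code and start the matching module."""
    handler = HANDLERS.get(alarm["code"])
    if handler is None:
        raise ValueError(f"unknown message code: {alarm['code']}")
    return handler(alarm, transcript)


result = decode_and_dispatch(
    {"code": MUTE, "terminal": "T-7", "words": ["darn"]}, "Well darn it")
```

Rejecting unknown codes with an exception, rather than silently ignoring them, is a design choice of this sketch that makes a mismatch between encoder and decoder visible.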
When the AI voice training and recognition platform 3 detects sensitive information, it identifies what kind of sound the current audio is, or what state or scene it represents, and sends alarm information of the corresponding category to the communication server 1 so that intervention is timely. The platform can be used to identify sensitive words, distress sounds, abnormal sounds and other risk information arising in a variety of business scenarios and to start the corresponding intervention actions, thereby purifying the conversation environment and handling unexpected events promptly.
Preferably, the AI speech training and recognition platform 3 further comprises a database module 34 connected to the training module 32. The database module 34 is used to store the sensitive word data set and the audio classification data set and to provide the training module 32 with model training data in the form of a training set and a test set. The AI voice training and recognition platform 3 is provided with a real-time voice transcription interface that connects over the websocket protocol, so that recognition results are obtained while the audio is being uploaded and the audio stream is recognized as text in real time. The speech recognition model and the sound classification model are built on DeepASR, a speech recognition system based on PaddlePaddle Fluid and Kaldi. DeepASR uses the Fluid framework to configure and train the acoustic model for speech recognition and integrates the Kaldi decoder, which enables fast, large-scale training of the acoustic model while Kaldi handles the complex speech data preprocessing and the final decoding. The trained speech recognition model and the trained sound classification model are deployed on a private CPU/GPU server, and the models are used in an intranet or offline (no-network) environment to ensure data privacy. The models may also be published as an API and used by calling that API.
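How the database module might hand a training set and a test set to the training module can be sketched in Python as a seeded shuffle-and-split. The function name, the 80/20 ratio, the fixed seed and the sample file names are assumptions for illustration; the source does not state how the split is made.

```python
import random


def split_dataset(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled samples reproducibly and split them into train and test."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # local RNG, so global state is untouched
    n_test = max(1, int(len(items) * test_ratio)) if items else 0
    return items[n_test:], items[:n_test]


# Hypothetical audio classification records: (file name, label).
audio_samples = [
    (f"clip_{i}.wav", "scream" if i % 2 else "normal") for i in range(10)
]
train_set, test_set = split_dataset(audio_samples)
```

A fixed seed makes the split repeatable, so a model retrained on the same stored data is evaluated against the same held-out samples.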
Preferably, the communication server 1, the media resource control server 2 and the AI speech training and recognition platform 3 are connected through real-time media streaming. The communication server 1 communicates with the media resource control server 2 via the SIP protocol. The MRCP client 11 comprises a SIP protocol stack and an MRCP protocol stack, wherein the MRCP protocol stack of the MRCP client 11 is used for calling an API interface of the media resource control server 2, the API interface creates a SIP dialog through the SIP protocol stack of the MRCP client 11 and carries information of the media resource control server 2, and the SIP protocol stack of the MRCP client 11 is used for initializing a media session for the media resource control server 2 through RTP and creating a control session for the media resource control server 2 through the MRCP protocol stack of the MRCP client 11. The media resource control server 2 also comprises an MRCP protocol stack and a SIP protocol stack, and the media resource control server 2 comprises various media resources such as speech recognition, speech synthesis, speech recording, speaker verification, voiceprint matching.
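The combination of a SIP dialog, an RTP media session and an MRCP control session matches the session model of MRCPv2 (RFC 6787), although the source does not name the protocol version. Assuming MRCPv2, the following Python sketch builds the kind of SDP offer an MRCP client could carry in its SIP INVITE to request a speech recognizer resource. The function name, IP address, port and codec choice are illustrative assumptions, not details from the source.

```python
def build_mrcp_sdp_offer(client_ip, rtp_port, resource="speechrecog"):
    """Build an SDP offer with one MRCPv2 control channel and one RTP audio stream."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {client_ip}",
        "s=-",
        f"c=IN IP4 {client_ip}",
        "t=0 0",
        # Control channel: port 9 (discard) because the client opens the TCP connection.
        "m=application 9 TCP/MRCPv2 1",
        "a=setup:active",
        "a=connection:new",
        f"a=resource:{resource}",
        "a=cmid:1",  # ties the control channel to the media stream with mid 1
        # Media stream: the client only sends audio to the recognizer.
        f"m=audio {rtp_port} RTP/AVP 0",
        "a=rtpmap:0 PCMU/8000",
        "a=sendonly",
        "a=mid:1",
    ]
    return "\r\n".join(lines) + "\r\n"


offer = build_mrcp_sdp_offer("192.0.2.10", 49170)
```

The server's answer would normally supply the actual control port and a channel identifier; handling that answer is outside the scope of this sketch.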
The real-time intercom intervention and alarm platform based on AI technology can be applied to live audio streaming. The live streaming equipment is connected to the user agent 12 of the communication server 1, and the communication server 1 sends the live audio to the media resource control server 2. The media resource control server 2 converts the audio into text and sends the text to the AI voice training and recognition platform 3, which detects whether the text of the live audio contains sensitive words. If it does, alarm information is sent to the communication server 1, which can silence the sensitive words, cut off the live stream, or send a warning to the live room. This saves manual supervision costs, ensures the content safety of the live room and purifies the network environment. The platform can also be applied to recognizing call content so that unexpected events receive timely intervention, and to public places such as schools and banks, where the AI voice training and recognition platform 3 recognizes audio content so that unexpected events are handled promptly.
It should be noted that the foregoing summary and detailed description are intended to demonstrate practical applications of the technical solution provided by the present invention and should not be construed as limiting its scope. Modifications, equivalent alterations and improvements made by those skilled in the art within the spirit and principles of the invention fall within its scope. The scope of the invention is defined by the appended claims.