CN106409283A - Audio frequency-based man-machine mixed interaction system and method - Google Patents
- Publication number
- CN106409283A CN106409283A CN201610791966.0A CN201610791966A CN106409283A CN 106409283 A CN106409283 A CN 106409283A CN 201610791966 A CN201610791966 A CN 201610791966A CN 106409283 A CN106409283 A CN 106409283A
- Authority
- CN
- China
- Prior art keywords
- unit
- message
- intervention
- module
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses an audio-based human-machine hybrid interaction system. A speech recognition module is connected to a semantic recognition module and transmits the text corresponding to the speech; an exception handling module is connected to both the speech recognition module and the semantic recognition module, receiving the text from the former and the semantic parsing result from the latter; the exception handling module is connected to a speech synthesis module and transmits intervention information. The invention also discloses an audio-based human-machine hybrid interaction method: the speech recognition module converts the speech into text and outputs it to the semantic recognition module; the semantic recognition module extracts the user's goal and the corresponding key information from the text; the exception handling module judges, from the recognized text and the semantic information, whether the current human-machine dialogue is abnormal and handles the reply for the abnormal case. The technical solution of the invention provides a unified human-machine dialogue experience.
Description
Technical Field
The present invention relates to the field of information processing, and in particular to an audio-based human-machine hybrid interaction system and method.
Background Art
As shown in Figure 1, current audio-based human-machine dialogue systems all present the machine's reply to the user as the final reply. When the machine decision system cannot determine the user's intention, most dialogue systems respond with something like "please say that again" to make the user re-enter the input; some dialogue systems have introduced a manual intervention method based on a call center.
At present, exception handling in human-machine dialogue is mainly implemented through a call center: when the machine cannot process the user's audio input, or the user explicitly asks for human service, a human call-center operator is requested to intervene. A one-to-one call is then established between the user and the operator, who communicates with the user directly, learns the user's needs, and issues the corresponding instructions through the call platform.
The main problems with the manual intervention of existing call centers are: low efficiency, since the interventionist and the user must hold a one-to-one voice conversation and the interventionist cannot serve anyone else while waiting for the user's input; high cost, since a large-scale call center needs a series of telecom equipment and corresponding service integration, and the low efficiency requires more interventionists, indirectly raising labor cost; and strong dependence on the network environment, since transmitting audio directly over the network requires a stable connection, and network fluctuation degrades audio quality, harms the dialogue experience, and can even interrupt the dialogue flow.
Therefore, those skilled in the art are working to develop an audio-based human-machine hybrid interaction system and method that combines manual intervention replies with machine replies, so as to unify the dialogue flow and improve the user experience.
Summary of the Invention
In view of the above defects of the prior art, the technical problem to be solved by the present invention is how to improve the efficiency and user experience of human-machine dialogue in customer service.
To achieve the above object, the present invention provides an audio-based human-machine hybrid interaction system comprising a speech recognition module, a speech synthesis module, a semantic recognition module, and an exception handling module. The speech recognition module is connected to the semantic recognition module and transmits the text corresponding to the speech; the exception handling module is connected to the speech recognition module and the semantic recognition module, the speech recognition module transmitting text to it and the semantic recognition module transmitting the semantic parsing result to it; the exception handling module is connected to the speech synthesis module and transmits intervention information.
Further, the speech recognition module comprises a signal processing and feature extraction unit, an acoustic model, a language model, and a decoder. The signal processing and feature extraction unit is connected to the acoustic model and transmits acoustic feature information; the decoder is connected to the acoustic model and the language model and outputs the recognition result.
Further, the speech synthesis module comprises a text analysis unit, a prosody control unit, and a speech synthesis unit. The text analysis unit receives text, processes it, and transmits the result to the prosody control unit and the synthesis unit; the prosody control unit is connected to the synthesis unit and transmits pitch, duration, intensity, pause, and intonation information; the synthesis unit combines the analysis result of the text analysis unit with the control parameters of the prosody control unit to synthesize the output speech.
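The dataflow of the synthesis module described above can be sketched as follows. This is an illustrative toy only: the function names and all prosody values are assumptions, not from the patent, and the "synthesis" step merely pairs text units with parameters where a real unit would generate a waveform.

```python
def text_analysis(text):
    # Stand-in for real text normalization and linguistic analysis:
    # simply split the input into word units.
    return text.split()

def prosody_control(units):
    # Attach pitch, duration, intensity, and pause parameters to each unit.
    # A real prosody controller derives these from context; flat defaults here.
    return [{"pitch": 1.0, "duration": 0.3, "intensity": 0.8, "pause": 0.05}
            for _ in units]

def synthesize(units, prosody):
    # Stand-in for waveform generation: combine each analyzed unit with its
    # prosody control parameters, mirroring the module's two inputs.
    return list(zip(units, prosody))

units = text_analysis("hello world")
frames = synthesize(units, prosody_control(units))
```

The point of the sketch is the wiring: the synthesis unit takes both the text analysis result and the prosody parameters, matching the two connections the patent describes.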
Further, the semantic recognition module comprises a domain labeling unit, an intention judgment unit, and an information extraction unit. The domain labeling unit is connected to the intention judgment unit and transmits domain information; the intention judgment unit is connected to the information extraction unit and transmits user intention information; the information extraction unit outputs the result of the semantic analysis.
Further, the exception handling module comprises an anomaly detection unit, a database query unit, and an interventionist unit. The anomaly detection unit receives the output of the speech recognition module and the semantic recognition module and decides whether to intervene; the database query unit receives the intervention signal from the anomaly detection unit and the semantic information from the semantic recognition module, then queries for and outputs an intervention message; the interventionist unit lets a human interventionist select among and, where necessary, modify the intervention messages output by the database query unit, and finally outputs the reply message to the user.
The present invention also provides an audio-based human-machine hybrid interaction method, comprising the following steps:
Step 1: provide a speech recognition module, a speech synthesis module, a semantic recognition module, and an exception handling module;
Step 2: the speech recognition module converts the speech into text and outputs it to the semantic recognition module;
Step 3: the semantic recognition module extracts the user's goal and the corresponding key information from the text;
Step 4: the exception handling module judges, from the text produced by the speech recognition module and the semantic information produced by the semantic recognition module, whether the current human-machine dialogue is abnormal, and handles the reply for the abnormal case.
Further, step 2 specifically comprises the following steps:
Step 2.1: extract features from the input audio stream for the acoustic model, while reducing the influence of environmental noise, channel, and speaker factors on the features;
Step 2.2: based on the acoustic model's output, the language model, and the dictionary, the decoder searches for the word string that outputs the audio stream with maximum probability, which becomes the speech recognition result.
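The search in step 2.2 amounts to choosing the word string W that maximizes P(O|W)·P(W), the product of acoustic-model and language-model probabilities. A minimal sketch in log space follows; the candidate strings and every score are invented for illustration, and a real decoder searches a lattice rather than a fixed candidate list.

```python
import math

def decode(candidates, acoustic_score, lm_score, lm_weight=1.0):
    """Return the candidate word string with the best combined log score:
    log P(O|W) + lm_weight * log P(W)."""
    best, best_score = None, -math.inf
    for words in candidates:
        score = acoustic_score(words) + lm_weight * lm_score(words)
        if score > best_score:
            best, best_score = words, score
    return best

# Hand-made log scores for three candidate transcriptions of one utterance.
acoustic = {"I want to go": -5.0, "I want to know": -4.8, "eye wander": -9.0}
lm = {"I want to go": -2.0, "I want to know": -3.5, "eye wander": -8.0}

result = decode(acoustic.keys(), acoustic.get, lm.get)
```

Here "I want to know" has the best acoustic score, but the language model tips the combined score toward "I want to go", which is exactly the interplay the patent's decoder relies on.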
Further, step 3 specifically comprises the following steps:
Step 3.1: use marker keywords in the text to label the domain of the current dialogue;
Step 3.2: judge the user's intention within that domain based on rules;
Step 3.3: extract the specific key information according to the domain and the user's intention, in combination with rules.
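Steps 3.1 to 3.3 can be sketched as a small keyword-and-rule pipeline. The keyword tables, intent names, and slot-extraction rule below are all invented to show the shape of the approach; the patent specifies only that domains come from marker keywords and intents and slots from rules or templates.

```python
# Illustrative rule tables; a real system would have far richer rules.
DOMAIN_KEYWORDS = {"navigation": ["go to", "navigate", "directions"],
                   "music": ["play", "song"]}
INTENT_RULES = {"navigation": [("go to", "navigate")],
                "music": [("play", "play_music")]}

def parse(text):
    """Steps 3.1-3.3: label the domain, judge the intent, extract key info."""
    text = text.lower()
    # 3.1 domain labeling via marker keywords
    domain = next((d for d, kws in DOMAIN_KEYWORDS.items()
                   if any(k in text for k in kws)), None)
    if domain is None:
        return None
    # 3.2 rule-based intent judgment within that domain
    intent = next((i for kw, i in INTENT_RULES[domain] if kw in text), None)
    # 3.3 template-style slot extraction: take what follows the trigger keyword
    slot = None
    for kw, _ in INTENT_RULES[domain]:
        if kw in text:
            slot = text.split(kw, 1)[1].strip()
    return {"domain": domain, "intent": intent, "slot": slot}

result = parse("Please go to a fun place")
```

On the patent's own example utterance this yields domain "navigation" with the remainder of the sentence as the slot, which is the kind of key information step 3.3 produces.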
Further, step 4 specifically comprises the following steps:
Step 4.1: the anomaly detection unit judges, from the recognized text and the semantic information, whether the current human-machine dialogue is abnormal; if so, the interventionist unit takes over the dialogue;
Step 4.2: the database query unit queries the database using the semantic information and obtains intervention messages with recommendation scores; if an intervention message's score is high, it is used directly for intervention; if the score is low, a human interventionist is requested to step in;
Step 4.3: when the machine algorithm cannot find an intervention message with a high recommendation score, the interventionist steps in to select and modify an intervention message, which is then sent to the client.
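The routing logic of steps 4.2 and 4.3 reduces to a threshold test on the best candidate's recommendation score. The sketch below assumes a numeric score in [0, 1] and a threshold of 0.8; the patent gives no concrete scale or threshold, and the candidate messages mimic its worked example.

```python
def handle_exception(semantic_info, db_query, threshold=0.8):
    """Step 4.2/4.3: query candidate intervention messages and route by
    recommendation score; below the threshold, escalate to a human."""
    candidates = db_query(semantic_info)  # list of (message, score) pairs
    if not candidates:
        return ("escalate_to_interventionist", None)
    message, score = max(candidates, key=lambda c: c[1])
    if score >= threshold:
        return ("auto_reply", message)        # high score: intervene directly
    return ("escalate_to_interventionist", message)  # low score: human steps in

# Invented low-score candidates, echoing the patent's "fun place" example.
def fake_db_query(info):
    return [("Would you like fun snacks in Suzhou?", 0.3),
            ("Found 5 fun-related places for you", 0.4)]

action, msg = handle_exception({"intent": "navigation", "tag": "fun"}, fake_db_query)
```

With both candidates scoring low, the call escalates to the interventionist, carrying the best candidate along so the human can modify rather than write from scratch, as step 4.3 describes.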
Further, the key information includes the dialogue domain and dialogue keywords, the dialogue keywords comprising content keywords and emotion keywords.
Compared with the prior art, the technical effects of the present invention include:
1. Higher efficiency: the time an interventionist spends waiting for user input is fully used, so one interventionist can serve several users at once, improving the efficiency of intervention.
2. Lower cost: no call-center telecom equipment needs to be purchased; the intervention platform can be built with existing computers and servers.
3. Richer work scenarios: since the interventionist interface uses a Browser/Server (B/S) architecture, an interventionist only needs to open a browser and log in to the corresponding website to intervene; there is no need to answer phones at a workstation, and intervention can be performed on mobile terminals such as tablets, smartphones, and laptops.
4. Low network requirements: the amount of data in text transmission is small, which lowers the demands on the network; meanwhile the speech the user hears is synthesized locally and is unaffected by network conditions.
5. A unified human-machine dialogue experience: the interventionist is transparent to the user, whose experience is that of talking to a sufficiently intelligent "machine"; this connects seamlessly with current human-machine dialogue.
The idea, specific structure, and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its purpose, features, and effects can be fully understood.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the intervention mode of an existing traditional call center;
Figure 2 is a schematic diagram of the system modules of the present invention;
Figure 3 is a schematic flow diagram of the system in a preferred embodiment of the present invention;
Figure 4 is a schematic diagram of the role dialogue flow in a preferred embodiment of the present invention.
Detailed Description
The present invention is realized through the following technical solutions:
As shown in Figure 2, the present invention relates to an audio-based system for handling exceptions in human-machine dialogue, comprising a speech recognition module, a speech synthesis module, a semantic recognition module, and an exception handling module. The speech recognition module is connected to the semantic recognition module and transmits the text corresponding to the speech; both the speech recognition module and the semantic recognition module are connected to the exception handling module, transmitting the text and the semantic parsing result respectively; the exception handling module is connected to the speech synthesis module and transmits intervention information.
The speech recognition module comprises a signal processing and feature extraction unit, an acoustic model, a language model, and a decoder. The signal processing and feature extraction unit is connected to the acoustic model and transmits acoustic feature information; the decoder is connected to the acoustic model and the language model and outputs the recognition result.
The speech synthesis module comprises a text analysis unit, a prosody control unit, and a speech synthesis unit. The text analysis unit receives text, processes it, and transmits the result to the prosody control unit and the synthesis unit; the prosody control unit is connected to the synthesis unit and transmits the target's pitch, duration, intensity, pause, and intonation information; the synthesis unit receives the analysis result of the text analysis unit and the control parameters of the prosody control unit and outputs the synthesized speech.
The semantic recognition module comprises a domain labeling unit, an intention judgment unit, and an information extraction unit. The domain labeling unit is connected to the intention judgment unit and transmits domain information; the intention judgment unit is connected to the information extraction unit and transmits user intention information; the information extraction unit outputs the semantic analysis information.
The exception handling module comprises an anomaly detection unit, a database query unit, and an interventionist unit. The anomaly detection unit receives the output of the speech recognition module and the semantic recognition module and decides whether to intervene; the database query unit receives the intervention signal from the anomaly detection unit and the semantic information from the semantic recognition module, then queries for and outputs an intervention message; the interventionist unit lets the interventionist select among and modify the intervention messages output by the database query unit as necessary, finally outputting the reply message to the user.
The present invention also relates to a method for handling exceptions in human-machine dialogue using the above system, which specifically comprises the following steps:
Step 1: provide a speech recognition module, a speech synthesis module, a semantic recognition module, and an exception handling module.
Step 2: the speech recognition module converts the speech into text and outputs it to the semantic recognition module; the specific steps include:
2.1 The front end processes the audio stream and extracts features from the input signal for the acoustic model, while reducing as far as possible the influence of environmental noise, channel, speaker, and other factors on the features.
2.2 For the input signal, the decoder searches, according to the acoustic and linguistic models and the dictionary, for the word string that outputs the signal with maximum probability, which becomes the speech recognition result.
Step 3: the semantic recognition module extracts the user's goal and the corresponding key information from the text; the specific steps include:
3.1 Use marker keywords in the text to label the domain of the current dialogue.
3.2 Judge the user's intention within the specific domain based on rules.
3.3 Extract the specific key information according to the domain and the user's intention, combined with rules such as preset templates.
Step 4: the exception handling module judges, from the recognized text and the semantic information, whether the current human-machine dialogue is abnormal, and performs exception handling and the message reply; the specific steps include:
4.1 The anomaly detection unit judges, from the recognized text and the semantic information, whether the current human-machine dialogue is abnormal. If not, the local client handles it; if so, the intervention server takes over the dialogue.
4.2 The database query unit queries the database using the semantic information and obtains recommended intervention messages. If an intervention message's recommendation score is high, it is used directly for intervention; if the score is low, a human interventionist is requested to step in.
4.3 When the machine algorithm cannot find an intervention message with a high recommendation score, the interventionist steps in to select and modify an intervention message, which is then sent to the client.
During exception handling, after the user's speech input passes through the machine's speech recognition and semantic parsing, the recognition result and the semantic parsing result are sent to the interventionist as text. On receiving them, the interventionist can choose to send either a dialogue message or a command message. A dialogue message is transmitted to the machine as text, then synthesized by a text-to-speech (TTS) system and played to the user; a command message is executed directly by the machine.
This embodiment, as shown in Figures 3 and 4, comprises three steps, introduced in turn: user input, intervention message generation, and client push of the intervention message.
1) User input
While the user speaks, the speech recognition system converts the user's audio into text and performs semantic analysis on the sentence (the analysis results include the user's current dialogue domain, the key information of the requested service, and so on); finally, the text and the semantic analysis results are transmitted as text to the exception handling module via the POST method of the HTTP protocol.
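The handoff described above (recognized text plus semantic-analysis results sent as text over HTTP POST) might carry a body like the following. The patent specifies only "text form via HTTP POST", not a schema, so every field name here is an assumption; the sketch only builds and round-trips the JSON body rather than performing a network call.

```python
import json

# Hypothetical request body for the POST to the exception handling module.
payload = {
    "asr_text": "I want to go to a fun place",          # speech recognition result
    "semantics": {                                       # semantic analysis result
        "domain": "navigation",
        "keywords": {"content": ["fun place"], "emotion": []},
    },
}
body = json.dumps(payload)     # the text actually sent in the POST body
decoded = json.loads(body)     # what the exception handling module would parse
```

Because only short text travels over the wire, the bandwidth needed is tiny, which is the basis of the "low network requirements" effect claimed earlier.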
2) Intervention message generation
Under abnormal conditions, the exception handling module queries the database with the recognized text and the semantic slots and obtains candidate intervention messages. If a candidate's recommendation score is high, it is used directly for intervention; if the score is low, a human interventionist is requested. On the interface the interventionist sees auxiliary data provided by the exception handling module, such as the recognition result of the user's input and the semantic analysis results, and with this information can screen and modify candidate intervention messages more accurately and quickly. Intervention messages are divided into dialogue messages and command messages; both are transmitted as text over a unified Websocket protocol and differ only in their content and in how the machine processes them.
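The two message kinds above share one text channel (the patent names Websocket) and diverge only at the client: a dialogue message goes to TTS, a command message is executed. A sketch of that dispatch follows; the JSON schema and field names are invented for illustration, and the `tts`/`executor` callables stand in for the real subsystems.

```python
import json

def make_dialogue_message(text):
    # Text the machine should speak to the user via TTS.
    return json.dumps({"type": "dialogue", "text": text})

def make_command_message(command, params):
    # An instruction the machine should execute directly.
    return json.dumps({"type": "command", "command": command, "params": params})

def client_handle(raw, tts, executor):
    """Dispatch one received intervention message by its type field."""
    msg = json.loads(raw)
    if msg["type"] == "dialogue":
        return tts(msg["text"])                          # synthesize and play
    return executor(msg["command"], msg["params"])       # run the command

spoken = client_handle(make_dialogue_message("What kind of entertainment would you like?"),
                       tts=lambda t: f"TTS:{t}",
                       executor=lambda c, p: f"RUN:{c}")
```

This mirrors the patent's worked example: the clarifying question travels as a dialogue message, while the final "navigation" instruction with POI information travels as a command message over the same channel.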
3) Client push of the intervention message
On receiving an intervention message, the client immediately returns a "message received" confirmation to the interventionist and caches the message in a message queue. The client monitors the current dialogue state and, under certain conditions, tries to take a message from the queue and push it to the user. The push is attempted (1) when an intervention message arrives and (2) when the playback of a TTS-synthesized voice message finishes; the conditions to be met are (1) the message queue is not empty and (2) the client's audio player is currently idle. If the push succeeds, an "intervention message pushed" confirmation is returned to the interventionist.
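The queue-and-push logic above (two triggers, two conditions, two acknowledgements) can be captured in a few lines. The class and method names are assumptions; playback is modeled simply as the player becoming busy when a message is pushed and idle again when `on_playback_finished` fires.

```python
from collections import deque

class InterventionClient:
    """Sketch of the client push logic: queue incoming intervention messages
    and push only when the queue is non-empty and the audio player is idle."""
    def __init__(self):
        self.queue = deque()
        self.player_idle = True
        self.acks = []                          # confirmations sent back

    def on_message(self, msg):
        self.acks.append("message received")    # immediate ack on arrival
        self.queue.append(msg)
        return self.try_push()                  # trigger 1: message arrival

    def on_playback_finished(self):
        self.player_idle = True
        return self.try_push()                  # trigger 2: TTS playback done

    def try_push(self):
        # Both conditions: queue non-empty AND player idle.
        if self.queue and self.player_idle:
            self.player_idle = False            # player starts speaking
            self.acks.append("message pushed")
            return self.queue.popleft()
        return None

c = InterventionClient()
first = c.on_message("What kind of entertainment would you like?")
second_attempt = c.on_message("Recommended: xxx. Go there?")  # player busy
after_playback = c.on_playback_finished()
```

A second message arriving while the first is being played is queued rather than pushed, and goes out only once playback finishes, which keeps the interventionist's messages from talking over each other.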
例如:For example:
1、用户A发出语音指令“我要去一个好玩的地方”。1. User A issues a voice command "I am going to a fun place".
2、语音识别模块将语音输入转换为文字。2. The voice recognition module converts voice input into text.
3、语义分析模块处理后得到用户意图为“导航”,导航的目标地的标签为“好玩”。3. After processing by the semantic analysis module, it is obtained that the user's intention is "navigation", and the label of the navigation destination is "fun".
4、异常处理模块中的异常检测单元收到用户A的服务请求,包含完整的语音识别结果“我要去一个好玩的地方”,和语义分析的结果“导航”、"好玩",同时检测到当前的对话状态出现异常。4. The abnormal detection unit in the abnormal processing module receives the service request from user A, including the complete voice recognition result "I am going to a fun place", and the semantic analysis results "navigation" and "fun", and simultaneously detects There is an exception in the current dialog state.
5、异常处理模块中的数据库查询单元根据”导航“、”好玩“进行数据库查询,得到一些备选消息比如”请问您要去苏州的好玩小吃吗?“、”为您找个5个与好玩相关的地点“,这两条消息的推荐度都比较低,故请求干预师单元的人工介入。干预师利用异常处理模块得到的数据库查询结果以及语义分析结果和语音识别的文字结果进行干预消息的选择和修改,将干预消息改为”请问您想要怎样的娱乐方式?“,向用户发送该文本消息。5. The database query unit in the exception handling module performs database query according to "navigation" and "fun", and gets some alternative messages such as "do you want to go to Suzhou's fun snacks?", "find 5 for you and fun" Relevant locations", the recommendation of these two messages is relatively low, so manual intervention from the interventionist unit is requested. The interventionist selects and modifies the intervention message by using the database query results obtained by the exception handling module, the semantic analysis results, and the text results of speech recognition, and changes the intervention message to "What kind of entertainment do you want?", and sends the message to the user. text message.
6、客户机收到干预消息后将其存入消息队列,向异常处理模块发送“消息已收到”的反馈,并尝试进行推送。6. After receiving the intervention message, the client computer stores it in the message queue, sends a feedback of "the message has been received" to the exception handling module, and tries to push it.
7、条件满足后进行干预消息的语音合成系统合成以及播报,用户听到音频“请问您想要怎样的娱乐方式”,客户机向异常处理模块发送“消息已推送”反馈。7. After the conditions are met, the speech synthesis system synthesizes and broadcasts the intervention message. The user hears the audio "What kind of entertainment do you want?", and the client computer sends a "message has been pushed" feedback to the exception handling module.
8、客户进行进一步的语音输入“我要去唱歌”8. The customer makes further voice input "I'm going to sing"
9、ASR系统将语音输入转换为文字9. ASR system converts speech input into text
10、语义分析得到用户意图为“导航”,导航的目标为“KTV”10. Semantic analysis shows that the user's intention is "navigation", and the navigation target is "KTV"
11、异常检测单元得到用户A的具体服务需求,包含完整的语音识别结果“我要去唱歌”,和语义分析的结果”导航“、”KTV“。11. The anomaly detection unit obtains the specific service requirements of user A, including the complete voice recognition result "I'm going to sing", and the semantic analysis results "navigation" and "KTV".
12. The database query unit searches the database using "navigation", "KTV", and the user's profile information, and obtains the candidate intervention message "We recommend xxx. Would you like to go?". Because this message's recommendation score is high, the interventionist unit is bypassed and the text message "We recommend xxx. Would you like to go?" is sent directly to the client.
13. The user confirms.
14. The exception handling system pushes a command-type intervention message to the user, containing the command type "navigation" and the destination's POI information.
15. The client takes the command-type "navigation" message and the corresponding POI information out of the message queue and performs the navigation operation. The client sends a "message pushed" acknowledgment to the exception handling module, and the interaction ends.
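The routing logic in the steps above — candidate replies below a recommendation threshold are escalated to the human interventionist (step 5), high-scoring replies bypass the interventionist unit (step 12), and the client queues, acknowledges, and pushes each text or command message (steps 6-7, 15) — can be sketched as follows. This is an illustrative sketch only, not the patented implementation; all names (`Candidate`, `Client`, `RECOMMENDATION_THRESHOLD`, the `0.8` cutoff) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field

RECOMMENDATION_THRESHOLD = 0.8  # assumed cutoff for bypassing the interventionist


@dataclass
class Candidate:
    text: str
    score: float  # recommendation score from the database query unit


@dataclass
class Message:
    kind: str      # "text" or "command"
    payload: dict


def exception_handling(intent: str, target: str, candidates: list[Candidate],
                       ask_interventionist) -> Message:
    """Pick the best candidate reply, or escalate to a human when all scores are low."""
    best = max(candidates, key=lambda c: c.score, default=None)
    if best is not None and best.score >= RECOMMENDATION_THRESHOLD:
        # Step 12: high recommendation score -> bypass the interventionist unit.
        return Message("text", {"text": best.text})
    # Step 5: low recommendation scores -> the interventionist selects/edits the reply.
    edited = ask_interventionist(intent, target, candidates)
    return Message("text", {"text": edited})


@dataclass
class Client:
    """Steps 6-7 and 15: queue each message, acknowledge receipt, then push it."""
    queue: deque = field(default_factory=deque)
    acks: list = field(default_factory=list)

    def receive(self, msg: Message) -> None:
        self.queue.append(msg)
        self.acks.append("message received")  # feedback to the exception handling module

    def push(self) -> str:
        msg = self.queue.popleft()
        if msg.kind == "command" and msg.payload.get("command") == "navigation":
            action = f"navigate to {msg.payload['poi']}"   # step 15: command message
        else:
            action = f"speak: {msg.payload['text']}"       # step 7: TTS broadcast
        self.acks.append("message pushed")
        return action
```

In this sketch the interventionist is modeled as a callback so the same `exception_handling` function covers both the escalated path and the bypass path; the real system described above instead routes the candidates, semantic analysis results, and recognized text to an interventionist workstation.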
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations based on the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain, based on the concept of the present invention and on the basis of the prior art, through logical analysis, reasoning, or limited experimentation shall fall within the scope of protection defined by the claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610791966.0A CN106409283B (en) | 2016-08-31 | 2016-08-31 | Man-machine mixed interaction system and method based on audio |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106409283A true CN106409283A (en) | 2017-02-15 |
| CN106409283B CN106409283B (en) | 2020-01-10 |
Family
ID=58001464
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610791966.0A Active CN106409283B (en) | 2016-08-31 | 2016-08-31 | Man-machine mixed interaction system and method based on audio |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106409283B (en) |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1920948A (en) * | 2005-08-24 | 2007-02-28 | 富士通株式会社 | Voice recognition system and voice processing system |
| CN101276584A (en) * | 2007-03-28 | 2008-10-01 | 株式会社东芝 | Prosodic Pattern Generating Device, Speech Synthesizing Device and Method |
| CN102509483A (en) * | 2011-10-31 | 2012-06-20 | 苏州思必驰信息科技有限公司 | Distributive automatic grading system for spoken language test and method thereof |
| CN102982799A (en) * | 2012-12-20 | 2013-03-20 | 中国科学院自动化研究所 | Speech recognition optimization decoding method integrating guide probability |
| CN104678868A (en) * | 2015-01-23 | 2015-06-03 | 贾新勇 | Business and equipment operation and maintenance monitoring system |
| CN105227790A (en) * | 2015-09-24 | 2016-01-06 | 北京车音网科技有限公司 | A kind of voice answer method, electronic equipment and system |
| CN105723362A (en) * | 2013-10-28 | 2016-06-29 | 余自立 | Natural expression processing method, processing and response method, device, and system |
Cited By (24)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107204185B (en) * | 2017-05-03 | 2021-05-25 | 深圳车盒子科技有限公司 | Vehicle-mounted voice interaction method and system and computer readable storage medium |
| CN107122807A (en) * | 2017-05-24 | 2017-09-01 | 努比亚技术有限公司 | A kind of family's monitoring method, service end and computer-readable recording medium |
| CN107122807B (en) * | 2017-05-24 | 2021-05-21 | 努比亚技术有限公司 | Home monitoring method, server and computer readable storage medium |
| CN107733780A (en) * | 2017-09-18 | 2018-02-23 | 上海量明科技发展有限公司 | Task smart allocation method, apparatus and JICQ |
| CN107733780B (en) * | 2017-09-18 | 2020-07-03 | 上海量明科技发展有限公司 | Intelligent task allocation method and device and instant messaging tool |
| CN109697226A (en) * | 2017-10-24 | 2019-04-30 | 上海易谷网络科技股份有限公司 | Text silence seat monitoring robot interactive method |
| CN107992587A (en) * | 2017-12-08 | 2018-05-04 | 北京百度网讯科技有限公司 | A kind of voice interactive method of browser, device, terminal and storage medium |
| CN110069607A (en) * | 2017-12-14 | 2019-07-30 | 株式会社日立制作所 | For the method, apparatus of customer service, electronic equipment, computer readable storage medium |
| CN110069607B (en) * | 2017-12-14 | 2024-03-05 | 株式会社日立制作所 | Methods, devices, electronic devices, and computer-readable storage media for customer service |
| US10983526B2 (en) | 2018-09-17 | 2021-04-20 | Huawei Technologies Co., Ltd. | Method and system for generating a semantic point cloud map |
| WO2020057446A1 (en) * | 2018-09-17 | 2020-03-26 | Huawei Technologies Co., Ltd. | Method and system for generating a semantic point cloud map |
| CN110970017A (en) * | 2018-09-27 | 2020-04-07 | 北京京东尚科信息技术有限公司 | Human-computer interaction method and system, computer system |
| CN111125384B (en) * | 2018-11-01 | 2023-04-07 | 阿里巴巴集团控股有限公司 | Multimedia answer generation method and device, terminal equipment and storage medium |
| CN111125384A (en) * | 2018-11-01 | 2020-05-08 | 阿里巴巴集团控股有限公司 | Multimedia answer generation method and device, terminal equipment and storage medium |
| CN110602334A (en) * | 2019-09-03 | 2019-12-20 | 上海航动科技有限公司 | Intelligent outbound method and system based on man-machine cooperation |
| CN110926493A (en) * | 2019-12-10 | 2020-03-27 | 广州小鹏汽车科技有限公司 | Navigation method, navigation device, vehicle and computer readable storage medium |
| CN111540353A (en) * | 2020-04-16 | 2020-08-14 | 重庆农村商业银行股份有限公司 | Semantic understanding method, device, equipment and storage medium |
| CN112509575A (en) * | 2020-11-26 | 2021-03-16 | 上海济邦投资咨询有限公司 | Financial consultation intelligent guiding system based on big data |
| CN112735427A (en) * | 2020-12-25 | 2021-04-30 | 平安普惠企业管理有限公司 | Radio reception control method and device, electronic equipment and storage medium |
| CN112735410A (en) * | 2020-12-25 | 2021-04-30 | 中国人民解放军63892部队 | Automatic voice interactive force model control method and system |
| CN112735427B (en) * | 2020-12-25 | 2023-12-05 | 海菲曼(天津)科技有限公司 | Radio reception control method and device, electronic equipment and storage medium |
| CN112735410B (en) * | 2020-12-25 | 2024-06-07 | 中国人民解放军63892部队 | Automatic voice interactive force model control method and system |
| CN116453540A (en) * | 2023-06-15 | 2023-07-18 | 山东贝宁电子科技开发有限公司 | Underwater frogman voice communication quality enhancement processing method |
| CN116453540B (en) * | 2023-06-15 | 2023-08-29 | 山东贝宁电子科技开发有限公司 | A method for enhancing the quality of underwater frogman voice communication |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106409283A (en) | Audio frequency-based man-machine mixed interaction system and method | |
| US12332948B2 (en) | In-conversation search | |
| CN107895578B (en) | Voice interaction method and device | |
| CN106777018B (en) | Method and device for optimizing input sentences in intelligent chat robot | |
| KR102108500B1 (en) | Supporting Method And System For communication Service, and Electronic Device supporting the same | |
| CN106354835A (en) | Artificial dialogue auxiliary system based on context semantic understanding | |
| CN107609092A (en) | Intelligent response method and apparatus | |
| CN108777751A (en) | A kind of call center system and its voice interactive method, device and equipment | |
| CN113782022B (en) | Communication method, device, equipment and storage medium based on intention recognition model | |
| WO2017128775A1 (en) | Voice control system, voice processing method and terminal device | |
| CN110956955A (en) | A method and device for voice interaction | |
| KR20240046508A (en) | Decision and visual display of voice menu for calls | |
| KR20230163649A (en) | Intelligent response recommendation system and method for supporting customer consultation based on speech signal in realtime | |
| CN112669842A (en) | Man-machine conversation control method, device, computer equipment and storage medium | |
| CN112866086A (en) | Information pushing method, device, equipment and storage medium for intelligent outbound | |
| CN113076397A (en) | Intention recognition method and device, electronic equipment and storage medium | |
| CN114064943A (en) | Conference management method, conference management device, storage medium and electronic equipment | |
| CN109887490A (en) | Method and apparatus for recognizing speech | |
| CN105279168A (en) | Data query method supporting natural language, open platform, and user terminal | |
| CN110740212B (en) | Call answering method and device based on intelligent voice technology and electronic equipment | |
| CN117809641A (en) | Terminal equipment and voice interaction method based on query text rewriting | |
| CN109360565A (en) | A method of precision of identifying speech is improved by establishing resources bank | |
| US20140067401A1 (en) | Provide services using unified communication content | |
| JP6625772B2 (en) | Search method and electronic device using the same | |
| CN109545203A (en) | Audio recognition method, device, equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant | ||
| TR01 | Transfer of patent right |
Effective date of registration: 20200619 Address after: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120 Patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Address before: No. 800 Dongchuan Road, Shanghai, 200240 Patentee before: SHANGHAI JIAO TONG University |
|
| TR01 | Transfer of patent right | ||
| TR01 | Transfer of patent right |
Effective date of registration: 20201105 Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: AI SPEECH Ltd. Address before: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120 Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
|
| TR01 | Transfer of patent right | ||
| CP01 | Change in the name or title of a patent holder |
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province Patentee after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province Patentee before: AI SPEECH Ltd. |
|
| CP01 | Change in the name or title of a patent holder |