CN115376509A - Method and device for implementing voice recognition interaction - Google Patents

Info

Publication number
CN115376509A
Authority
CN
China
Prior art keywords
voice
wake
awakening
recognized
delay information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210799286.9A
Other languages
Chinese (zh)
Inventor
赵茂祥
刘威
李全忠
何国涛
蒲瑶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Original Assignee
Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Puqiang Times Zhuhai Hengqin Information Technology Co ltd
Priority to CN202210799286.9A
Publication of CN115376509A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L2015/088: Word spotting
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a method and a device for implementing voice recognition interaction. The method comprises acquiring a voice to be recognized, sending it to a voice wake-up engine for analysis and processing, and outputting a wake-up result and wake-up delay information; and sending the wake-up delay information to a speech recognition engine and outputting a recognition result. Through the boundary processing and the wake-up delay handling provided in the voice wake-up engine, the invention addresses the problems of extra or missing words in the recognition result, so that speech recognition accuracy is higher, the response speed and accuracy of the intelligent voice assistant are improved, and the voice assistant becomes more intelligent.

Description

Method and device for implementing voice recognition interaction

Technical field

The invention belongs to the technical field of artificial intelligence, and in particular relates to a method and device for implementing voice recognition interaction.

Background art

With the continuous development of artificial intelligence and speech recognition technology, the demand for more efficient human-computer interaction keeps growing, which places higher requirements on the response speed and accuracy of machines. Speech recognition converts speech into the corresponding text; voice wake-up detects, in real time, a characteristic segment of the speaker's speech within a continuous speech stream.

Oneshot is an interaction mode in which wake-up and recognition are handled in a single utterance. For example, if the wake-up word is "Hi Xiaoyi", the user can say "Hi Xiaoyi, turn on the radio"; this is one voice recognition interaction. This interaction mode requires a voice wake-up engine and a speech recognition engine. The original approach sent the complete utterance to the speech recognition engine for processing and then cut the wake-up word out of the recognition result. This approach has two problems. First, the recognition engine recognizes the wake-up word less reliably than the wake-up engine's keyword detection does; if the wake-up word is misrecognized, it is not cut out, and the recognition result is wrong. Second, the wake-up word is designed to be replaceable; if it is replaced and the recognition model does not support the new wake-up word, recognition errors result.
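The original cut-after-recognition approach can be illustrated with a minimal Python sketch. The function name `cut_wake_word` and the misrecognized transcript are hypothetical and chosen only for illustration; the patent does not disclose how the cut is performed.

```python
def cut_wake_word(transcript: str, wake_word: str) -> str:
    """Remove a leading wake-up word from a full-utterance transcript.

    Hypothetical helper: it only cuts when the recognizer reproduced the
    wake-up word exactly, which is the weakness described above.
    """
    if transcript.startswith(wake_word):
        # Drop the wake-up word and any separator that follows it.
        return transcript[len(wake_word):].lstrip(" ,，")
    # The wake-up word was misrecognized (or replaced): nothing is cut.
    return transcript


# Correctly recognized wake-up word: the command is isolated.
print(cut_wake_word("Hi Xiaoyi, turn on the radio", "Hi Xiaoyi"))
# Misrecognized wake-up word: the garbled prefix stays in the result.
print(cut_wake_word("High show ye, turn on the radio", "Hi Xiaoyi"))
```

The second call shows the first problem: because the cut depends on the recognizer's own output, one misrecognized wake-up word corrupts the whole result.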

In the related art, wake-up boundary processing withholds the wake-up portion of the audio from the speech recognition engine. This solves the problems above but introduces a new one: because voice wake-up has a delay, part of the speech that should reach the recognition engine is lost, so it cannot be recognized correctly. This is called the word-loss problem. Without a wake-up delay, the missing posterior information would sharply increase the false wake-up rate, so the wake-up delay is unavoidable. If the audio captured during the wake-up delay is also sent to the recognition engine, the word-loss problem is solved, but another problem appears: that audio may contain the trailing sound of the wake-up word, so the recognition result may contain extra words. This is called the extra-word problem.
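The trade-off between the two problems can be made concrete with simple timeline arithmetic. The sketch below is only an illustration: the function name, the millisecond timestamps, and the idea of a single wake-up boundary timestamp are assumptions, not values from the patent.

```python
def forwarded_audio_effects(boundary_ms, detect_ms, command_start_ms, include_delay):
    """Return (lost_ms, pre_command_ms) for the audio forwarded to the recognizer.

    boundary_ms:      assumed end of the wake-up word as estimated by the wake-up engine
    detect_ms:        assumed time at which the wake-up is actually reported
    command_start_ms: assumed time at which the user starts the command
    include_delay:    whether audio from the wake-up delay is forwarded as well
    """
    start_ms = boundary_ms if include_delay else detect_ms
    # Command audio that never reaches the recognizer (word-loss problem).
    lost_ms = max(0, start_ms - command_start_ms)
    # Audio before the command that may hold the wake-up word's trailing
    # sound (extra-word problem).
    pre_command_ms = max(0, command_start_ms - start_ms)
    return lost_ms, pre_command_ms


# Assumed timeline: boundary at 800 ms, wake-up reported at 1000 ms,
# command starts at 900 ms because the user does not pause.
print(forwarded_audio_effects(800, 1000, 900, include_delay=False))
print(forwarded_audio_effects(800, 1000, 900, include_delay=True))
```

With these assumed numbers, dropping the delay audio loses 100 ms of the command, while forwarding it adds 100 ms of pre-command audio that may carry the tail of the wake-up word.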

Summary of the invention

In view of this, the purpose of the present invention is to overcome the deficiencies of the prior art and to provide a method and device for implementing voice recognition interaction, so as to solve the problems of missing words and extra words in speech recognition in the prior art.

To achieve the above purpose, the present invention adopts the following technical solution: a method for implementing voice recognition interaction, comprising:

acquiring the voice to be recognized and sending it to a voice wake-up engine for analysis and processing, outputting a wake-up result, and outputting wake-up delay information after the wake-up result is determined;

sending the wake-up delay information to a speech recognition engine, and outputting a recognition result.

Further, the voice wake-up engine comprises a data processing unit and a boundary processing unit; outputting the wake-up result and the wake-up delay information comprises:

the data processing unit analyzes the voice to be recognized, judges whether the voice to be recognized satisfies a wake-up condition, and outputs a wake-up result according to the judgment;

the boundary processing unit processes the voice to be recognized to obtain the wake-up delay information.

Further, judging whether the voice to be recognized satisfies the wake-up condition comprises:

judging whether a wake-up word is present in the voice to be recognized; if so, the voice to be recognized satisfies the wake-up condition; otherwise, it does not.

Further, processing the voice to be recognized to obtain the wake-up delay information comprises:

analyzing the voice to be recognized, calculating the average of the sampling-point values within an interval of a preset time period, and taking the average as the voice energy value;

judging, according to the voice energy value, whether to retain the voice data of the interval as the wake-up delay information; wherein the wake-up delay information contains the trailing sound of the last character of the wake-up word.

Further, judging according to the voice energy value whether to retain the voice data of the interval as the wake-up delay information comprises:

if the voice energy value of the wake-up delay information is less than the energy value of the first frame of the wake-up delay information, discarding the wake-up delay information;

if the voice energy value of the wake-up delay information is greater than the energy value of the first frame of the wake-up delay information, retaining the wake-up delay information.

Further, the preset time period is 10 s.

An embodiment of the present application provides a device for implementing voice recognition interaction, comprising:

a wake-up module, configured to acquire the voice to be recognized and send it to a voice wake-up engine for analysis and processing, output a wake-up result, and output wake-up delay information after the wake-up result is determined;

a recognition module, configured to send the wake-up delay information to a speech recognition engine and output a recognition result.

Further, the wake-up module comprises:

a data processing unit and a boundary processing unit;

the data processing unit is configured to analyze the voice to be recognized, judge whether the voice to be recognized satisfies a wake-up condition, and output a wake-up result according to the judgment;

the boundary processing unit is configured to process the voice to be recognized to obtain the wake-up delay information.

An embodiment of the present application provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of any one of the above methods for implementing voice recognition interaction.

An embodiment of the present application also provides a computer storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of any one of the above methods for implementing voice recognition interaction.

By adopting the above technical solution, the present invention achieves the following beneficial effects:

The present invention provides a method and device for implementing voice recognition interaction. The method comprises acquiring the voice to be recognized and sending it to a voice wake-up engine for analysis and processing, and outputting a wake-up result and wake-up delay information; and sending the wake-up delay information to a speech recognition engine and outputting a recognition result. Through the boundary processing and the wake-up delay handling provided in the voice wake-up engine, the invention addresses the problems of extra or missing words, so that speech recognition accuracy is higher, the response speed and accuracy of the intelligent voice assistant are improved, and the voice assistant becomes more intelligent.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the steps of the method for implementing voice recognition interaction of the present invention;

FIG. 2 is a schematic flow diagram of the method for implementing voice recognition interaction of the present invention;

FIG. 3 is a schematic structural diagram of the device for implementing voice recognition interaction of the present invention;

FIG. 4 is a schematic diagram of the hardware structure of the operating environment of the method for implementing voice recognition interaction of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention are described in detail below. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

A specific method and device for implementing voice recognition interaction provided in the embodiments of the present application are described below with reference to the accompanying drawings.

As shown in FIG. 1, the method for implementing voice recognition interaction provided in the embodiments of the present application comprises:

S101: acquiring the voice to be recognized and sending it to a voice wake-up engine for analysis and processing, outputting a wake-up result, and outputting wake-up delay information after the wake-up result is determined;

In some embodiments, the voice wake-up engine comprises a data processing unit and a boundary processing unit; outputting the wake-up result and the wake-up delay information comprises:

the data processing unit analyzes the voice to be recognized, judges whether the voice to be recognized satisfies a wake-up condition, and outputs a wake-up result according to the judgment;

the boundary processing unit processes the voice to be recognized to obtain the wake-up delay information.

In some embodiments, judging whether the voice to be recognized satisfies the wake-up condition comprises:

judging whether a wake-up word is present in the voice to be recognized; if so, the voice to be recognized satisfies the wake-up condition; otherwise, it does not.

Specifically, as shown in FIG. 2, the present application first acquires the voice to be recognized, for example "Hi Xiaoyi, open XXAPP", where "Hi Xiaoyi" is the wake-up word. It can be understood that the wake-up word is stored in the voice wake-up engine by the wake-up algorithm and that there may be multiple wake-up words; the present application places no limitation on this. When the data processing unit analyzes the voice to be recognized and recognizes the wake-up word, the wake-up result is "awake", and the boundary processing unit then processes the voice to be recognized to obtain the wake-up delay information. If the data processing unit does not recognize the wake-up word, the wake-up result is "failed": the device is not woken up and no wake-up delay information is output.
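The division of work between the two units can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `detect_wake_end` stands in for an unspecified keyword spotter, and `WakeOutput`, `run_wake_engine` and `delay_len` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WakeOutput:
    awake: bool                        # wake-up result
    delay_samples: Optional[List[int]]  # wake-up delay segment, None on failure


def run_wake_engine(
    samples: List[int],
    detect_wake_end: Callable[[List[int]], Optional[int]],
    delay_len: int,
) -> WakeOutput:
    """Sketch of a wake-up engine with two units.

    detect_wake_end is a hypothetical keyword spotter that returns the sample
    index at which the wake-up word ends, or None if no wake-up word is found.
    """
    # Data processing unit: decide whether the wake-up condition is met.
    wake_end = detect_wake_end(samples)
    if wake_end is None:
        # Wake-up failed: no wake-up delay information is produced.
        return WakeOutput(awake=False, delay_samples=None)
    # Boundary processing unit: runs only after a successful wake-up and
    # extracts the segment that follows the wake-up boundary.
    return WakeOutput(awake=True, delay_samples=samples[wake_end:wake_end + delay_len])


audio = list(range(10))
print(run_wake_engine(audio, lambda s: 3, delay_len=4))
print(run_wake_engine(audio, lambda s: None, delay_len=4))
```

Passing the detector in as a callable keeps the sketch independent of any particular keyword-spotting model.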

In some embodiments, processing the voice to be recognized to obtain the wake-up delay information comprises:

analyzing the voice to be recognized, calculating the average of the sampling-point values within an interval of a preset time period, and taking the average as the voice energy value;

judging, according to the voice energy value, whether to retain the voice data of the interval as the wake-up delay information; wherein the wake-up delay information contains the trailing sound of the last character of the wake-up word.

In a specific embodiment, judging according to the voice energy value whether to retain the voice data of the interval as the wake-up delay information comprises:

if the voice energy value of the wake-up delay information is less than the energy value of the first frame of the wake-up delay information, discarding the wake-up delay information;

if the voice energy value of the wake-up delay information is greater than the energy value of the first frame of the wake-up delay information, retaining the wake-up delay information.

In some embodiments, the preset time period is 10 s.

Specifically, the present application processes the voice data and, according to the data energy value of the wake-up delay segment, selectively discards the segment or retains it and sends it to the recognition engine. The energy value is calculated as the average of the sampling-point values within an interval; for example, the interval may be 10 ms.
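A per-interval energy calculation of this kind might look like the sketch below. Two points are assumptions rather than statements from the patent: the 16 kHz sample rate, and the use of the mean absolute sample value (a plain mean of signed PCM samples would tend toward zero).

```python
def interval_energies(samples, sample_rate_hz=16000, interval_ms=10):
    """Return one energy value per complete interval of `interval_ms`.

    Assumption: energy is the mean absolute value of the samples in the
    interval; the source only says the sample values are averaged.
    """
    n = sample_rate_hz * interval_ms // 1000  # samples per interval
    if n <= 0:
        raise ValueError("interval is shorter than one sample")
    energies = []
    # Step through complete intervals only; a trailing partial one is ignored.
    for start in range(0, len(samples) - n + 1, n):
        chunk = samples[start:start + n]
        energies.append(sum(abs(s) for s in chunk) / n)
    return energies


# Two 10 ms intervals at 16 kHz: a louder one followed by a quieter one.
pcm = [100] * 160 + [-20] * 160
print(interval_energies(pcm))
```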

Specifically, for example, a voice interval of the voice to be recognized is selected as the wake-up delay segment, where the segment includes the trailing sound of the last character of the wake-up word. The energy values of multiple sampling points in the wake-up delay segment are calculated and averaged, and the average is taken as the voice energy value. The voice energy value is then compared with the energy value of the first frame of the wake-up delay information. If the voice energy value is less than the first-frame energy value, the wake-up delay information is discarded, which solves the extra-word problem. If the voice energy value is greater than the first-frame energy value, the wake-up delay segment is retained and sent to the recognition engine for processing, which solves the word-loss problem.
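The retain-or-discard rule can be sketched as a comparison between the segment's mean energy and its first-frame energy. The function name `keep_wake_delay` is hypothetical, and the handling of an empty segment or of exactly equal values is an assumption, since the source specifies only the "less than" and "greater than" cases.

```python
def keep_wake_delay(frame_energies):
    """Decide whether the wake-up delay segment is sent to the recognizer.

    frame_energies: per-frame energy values of the wake-up delay segment,
    with the first frame at index 0.
    """
    if not frame_energies:
        return False  # assumption: nothing to forward
    voice_energy = sum(frame_energies) / len(frame_energies)
    # Energy falling below the first frame suggests a decaying wake-word
    # tail (discard); energy rising above it suggests new speech (retain).
    # Assumption: equal values are treated as "discard".
    return voice_energy > frame_energies[0]


print(keep_wake_delay([50, 20, 5]))   # decaying tail
print(keep_wake_delay([5, 40, 60]))   # command speech starting
```

In the first call the mean (25) is below the first-frame energy (50), so the segment is dropped; in the second the mean (35) exceeds it (5), so the segment is kept.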

S102: sending the wake-up delay information to the speech recognition engine, and outputting a recognition result.

Finally, the wake-up delay information is sent to the speech recognition engine for recognition, and the recognition result is output.

The method for implementing voice recognition interaction works as follows. Referring to FIG. 2, the data processing unit in the voice wake-up engine first analyzes the voice to be recognized. When it recognizes the wake-up word, the wake-up result is "awake", and the boundary processing unit then processes the voice to be recognized to obtain the wake-up delay information. If the data processing unit does not recognize the wake-up word, the wake-up result is "failed": the device is not woken up and no wake-up delay information is output. Once obtained, the wake-up delay information is sent to the speech recognition engine for recognition, yielding the recognition result. The method provided by the present application can solve the problems of extra or missing words during speech recognition and improve recognition accuracy.
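The overall flow of S101 and S102 can be summarized in one function. Everything here is a sketch under assumptions: the three callables are hypothetical stand-ins for the wake-up detector, the energy-based decision and the recognition engine, and forwarding the audio after the delay segment together with the retained segment is an assumption, since the source only states that the wake-up delay information is sent.

```python
from typing import Callable, List, Optional


def oneshot(
    samples: List[int],
    detect_wake_end: Callable[[List[int]], Optional[int]],
    keep_delay: Callable[[List[int]], bool],
    recognize: Callable[[List[int]], str],
    delay_len: int,
) -> Optional[str]:
    """Run wake-up, boundary processing and recognition on one utterance."""
    # S101: wake-up engine decides whether a wake-up word is present.
    wake_end = detect_wake_end(samples)
    if wake_end is None:
        return None  # wake-up failed, nothing is recognized
    # Boundary processing: split off the wake-up delay segment.
    delay = samples[wake_end:wake_end + delay_len]
    rest = samples[wake_end + delay_len:]
    # Retain or discard the delay segment before recognition.
    audio = delay + rest if keep_delay(delay) else rest
    # S102: recognition engine produces the result.
    return recognize(audio)


# Stub components, used only to show the control flow.
utterance = list(range(10))
describe = lambda audio: f"{len(audio)} samples recognized"
print(oneshot(utterance, lambda s: 2, lambda d: True, describe, delay_len=3))
print(oneshot(utterance, lambda s: 2, lambda d: False, describe, delay_len=3))
print(oneshot(utterance, lambda s: None, lambda d: True, describe, delay_len=3))
```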

As shown in FIG. 3, an embodiment of the present application provides a device for implementing voice recognition interaction, comprising:

a wake-up module 301, configured to acquire the voice to be recognized and send it to a voice wake-up engine for analysis and processing, output a wake-up result, and output wake-up delay information after the wake-up result is determined;

a recognition module 302, configured to send the wake-up delay information to a speech recognition engine and output a recognition result.

The device for implementing voice recognition interaction provided by the present application works as follows: the wake-up module 301 acquires the voice to be recognized and sends it to the voice wake-up engine for analysis and processing, outputs a wake-up result, and outputs wake-up delay information after the wake-up result is determined; the recognition module 302 sends the wake-up delay information to the speech recognition engine and outputs a recognition result.

The present application provides a computer device comprising a memory and a processor, and optionally a network interface. The memory stores a computer program and may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The computer device stores an operating system, and the memory is an example of a computer-readable medium. When the computer program is executed by the processor, the processor executes the method for implementing voice recognition interaction. The structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, the method for implementing voice recognition interaction provided in the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in FIG. 4.

In some embodiments, when the computer program is executed by the processor, the processor performs the following steps: acquiring the voice to be recognized and sending it to a voice wake-up engine for analysis and processing, outputting a wake-up result, and outputting wake-up delay information after the wake-up result is determined; and sending the wake-up delay information to a speech recognition engine and outputting a recognition result.

The present application also provides a computer storage medium. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette or tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device.

In some embodiments, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the following: acquiring the voice to be recognized and sending it to a voice wake-up engine for analysis and processing, outputting a wake-up result, and outputting wake-up delay information after the wake-up result is determined; and sending the wake-up delay information to a speech recognition engine and outputting a recognition result.

综上所述,本发明提供一种语音识别交互的实现方法及装置,所述方法包括获取待识别语音并发送至语音唤醒引擎中进行分析处理,输出唤醒结果和唤醒延时信息;将所述唤醒延时信息发送至语音识别引擎,输出识别结果。本发明通过语音唤醒引擎中设置的边界处理以及唤醒时延,能够识别多字或少字的问题,使得语音识别准确率更高,提高了智能语音助手的反应速度和准确度,使得智能语音变的更加智能。To sum up, the present invention provides a method and device for realizing voice recognition interaction. The method includes acquiring the voice to be recognized and sending it to the voice wake-up engine for analysis and processing, outputting the wake-up result and wake-up delay information; The wake-up delay information is sent to the speech recognition engine, and the recognition result is output. The present invention can recognize the problem of many words or few words through the boundary processing and wake-up delay set in the voice wake-up engine, so that the accuracy of voice recognition is higher, the response speed and accuracy of the intelligent voice assistant are improved, and the intelligent voice changes more intelligent.

可以理解的是,上述提供的方法实施例与上述的装置实施例对应,相应的具体内容可以相互参考,在此不再赘述。It can be understood that the method embodiments provided above correspond to the above device embodiments, and the corresponding specific contents can be referred to each other, and will not be repeated here.

本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art should understand that the embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) having computer-usable program code embodied therein.

本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the present application. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a An apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令方法的制造品,该指令方法实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising the instructions, the instructions The method implements the function specified in the procedure or procedures of the flowchart and/or the block or blocks of the block diagram.

这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, thereby The instructions provide steps for implementing the functions specified in the flow chart or blocks of the flowchart and/or the block or blocks of the block diagrams.

The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the protection scope of the claims.

Claims (10)

1. A method for implementing voice recognition interaction, characterized by comprising the following steps:
acquiring a voice to be recognized, sending the voice to be recognized to a voice wake-up engine for analysis and processing, outputting a wake-up result, and outputting wake-up delay information after the wake-up result is determined;
and sending the wake-up delay information to a voice recognition engine, and outputting a recognition result.
2. The method of claim 1, wherein the voice wake-up engine comprises a data processing unit and a boundary processing unit, and the outputting of the wake-up result and the wake-up delay information comprises:
the data processing unit analyzing the voice to be recognized, judging whether the voice to be recognized meets a wake-up condition, and outputting the wake-up result according to the judgment result;
and the boundary processing unit processing the voice to be recognized to obtain the wake-up delay information.
3. The method according to claim 2, wherein judging whether the voice to be recognized meets the wake-up condition comprises:
judging whether a wake-up word exists in the voice to be recognized; if so, determining that the voice to be recognized meets the wake-up condition; otherwise, determining that the voice to be recognized does not meet the wake-up condition.
4. The method according to claim 2, wherein the processing of the voice to be recognized to obtain the wake-up delay information comprises:
analyzing the voice to be recognized, calculating an average of the values of the sampling points in an interval of a preset time period, and taking the average as a voice energy value;
judging, according to the voice energy value, whether to retain the voice data of the interval as the wake-up delay information, wherein the wake-up delay information contains the tail sound of the last word of the wake-up word.
5. The method according to claim 4, wherein the judging, according to the voice energy value, whether to retain the voice data of the interval as the wake-up delay information comprises:
if the voice energy value of the wake-up delay information is smaller than the energy value of the first frame in the wake-up delay information, discarding the wake-up delay information;
and if the voice energy value of the wake-up delay information is larger than the energy value of the first frame in the wake-up delay information, retaining the wake-up delay information.
6. The method of claim 4, wherein the preset time period is 10 s.
7. An apparatus for implementing voice recognition interaction, comprising:
a wake-up module, configured to acquire a voice to be recognized, send the voice to be recognized to a voice wake-up engine for analysis and processing, output a wake-up result, and output wake-up delay information after the wake-up result is determined;
and a recognition module, configured to send the wake-up delay information to a voice recognition engine and output a recognition result.
8. The apparatus of claim 7, wherein the wake-up module comprises:
a data processing unit and a boundary processing unit;
the data processing unit is configured to analyze the voice to be recognized, judge whether the voice to be recognized meets a wake-up condition, and output the wake-up result according to the judgment result;
and the boundary processing unit is configured to process the voice to be recognized to obtain the wake-up delay information.
9. A computer device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the method for implementing voice recognition interaction according to any one of claims 1 to 6.
10. A computer storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to perform the method for implementing voice recognition interaction according to any one of claims 1 to 6.
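
The energy test described in claims 4 and 5 can be illustrated with a minimal sketch. This is not the applicant's implementation: the function names `mean_energy` and `select_wake_delay`, the use of mean absolute amplitude as the "average value" of the sampling points, the `frame_len` parameter, and the choice to discard when the two energies are equal are all assumptions made here for illustration, since the claims do not specify them.

```python
from typing import List, Optional, Sequence


def mean_energy(samples: Sequence[float]) -> float:
    """Average absolute amplitude of the samples (assumed energy measure)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def select_wake_delay(candidate: Sequence[float], frame_len: int) -> Optional[List[float]]:
    """Return the candidate as wake-up delay information, or None to discard it.

    The candidate's overall energy is compared with the energy of its first
    frame, following the wording of claim 5. A tie is treated as "discard",
    which is an assumption; the claims leave that case open.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    if not candidate:
        return None
    first_frame_energy = mean_energy(candidate[:frame_len])
    candidate_energy = mean_energy(candidate)
    if candidate_energy > first_frame_energy:
        return list(candidate)  # retain as wake-up delay information
    return None  # discard


if __name__ == "__main__":
    # Hypothetical sample values, chosen only to exercise both branches.
    rising = [0.1, 0.1, 0.5, 0.5]
    falling = [0.5, 0.5, 0.1, 0.1]
    print(select_wake_delay(rising, frame_len=2) is not None)
    print(select_wake_delay(falling, frame_len=2) is not None)
```

In a pipeline matching claim 1, a retained segment would then be passed to the voice recognition engine together with the wake-up result; how the engine uses it is not specified in the claims.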
CN202210799286.9A 2022-07-06 2022-07-06 Method and device for implementing voice recognition interaction Pending CN115376509A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210799286.9A CN115376509A (en) 2022-07-06 2022-07-06 Method and device for implementing voice recognition interaction

Publications (1)

Publication Number Publication Date
CN115376509A true CN115376509A (en) 2022-11-22

Family

ID=84061206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210799286.9A Pending CN115376509A (en) 2022-07-06 2022-07-06 Method and device for implementing voice recognition interaction

Country Status (1)

Country Link
CN (1) CN115376509A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102687196A (en) * 2009-10-08 2012-09-19 西班牙电信公司 Method for the detection of speech segments
US20180033436A1 (en) * 2015-04-10 2018-02-01 Huawei Technologies Co., Ltd. Speech recognition method, speech wakeup apparatus, speech recognition apparatus, and terminal
CN107808670A (en) * 2017-10-25 2018-03-16 百度在线网络技术(北京)有限公司 Voice data processing method, device, equipment and storage medium
CN108711427A (en) * 2018-05-18 2018-10-26 出门问问信息科技有限公司 The acquisition method and device of voice messaging
CN108962262A (en) * 2018-08-14 2018-12-07 苏州思必驰信息科技有限公司 Voice data processing method and device
CN111916068A (en) * 2019-05-07 2020-11-10 北京地平线机器人技术研发有限公司 Audio detection method and device
EP3923272A1 (en) * 2020-06-10 2021-12-15 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus for adapting a wake-up model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination