CN106128462A

CN106128462A - Speech Recognition Method and System

Info

Publication number: CN106128462A
Application number: CN201610465607.6A
Authority: CN
Inventors: Lin Ruihua (林瑞华); Li Chen (黎琛)
Original assignee: Dongguan Coolpad Software Tech Co ltd
Current assignee: Dongguan Coolpad Software Tech Co ltd
Priority date: 2016-06-21
Filing date: 2016-06-21
Publication date: 2016-11-16
Also published as: WO2017219495A1

Abstract

The present invention provides a speech recognition method, which is applied to an electronic device, and the method includes: obtaining speech information input by a user; using a first speech recognition method to recognize the speech information to obtain a first speech recognition result, and using a second speech recognition method to recognize the speech information to obtain a second speech recognition result, wherein the first speech recognition method and the second speech recognition method are run in parallel; and displaying the first speech recognition result and the second speech recognition result according to a preset rule. The present invention also provides a speech recognition system. Using the present invention, the second speech recognition method can be used to assist the first speech recognition method in recognizing the user's speech information, thereby improving the speech recognition rate.

Description

Speech Recognition Method and System

【Technical Field】

The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and system assisted by geographic location.

【Background Art】

In recent years, speech recognition technology has made remarkable progress and has moved from the laboratory to the market. In practical applications such as automatic telephone answering systems, interaction with the user is completed automatically by recognizing the user's speech input.

At present, Mandarin (Putonghua) is widely used as a common language, but people in different regions still speak different dialects. Influenced by these dialects, the Mandarin spoken in different places has different characteristics. People within the same region, however, tend to speak Mandarin with a similar speaking rate, pronunciation, and semantics.

Users' Mandarin in different regions is not fully standard and carries regional features. Existing speech input records the user's speech data and outputs a result using a speech recognition algorithm. It does not use data such as the user's geographic location as an auxiliary reference. As a result, the recognition rate is low for speech that carries a local accent or contains some local dialect.

【Summary of the Invention】

In view of the above, it is necessary to provide a speech recognition method and system that invoke an auxiliary speech data package according to the user's current geographic location to recognize the user's speech, thereby improving the accuracy of speech recognition.

A speech recognition method, applied to an electronic device, the method comprising:

obtaining speech information input by a user;

recognizing the speech information using a first speech recognition method to obtain a first speech recognition result, and recognizing the speech information using a second speech recognition method to obtain a second speech recognition result; and

displaying the first speech recognition result and the second speech recognition result according to a preset rule.

According to a preferred embodiment of the present invention, the first speech recognition method is a large-vocabulary speech recognition method based on a preset model, and the second speech recognition method is a speech recognition method based on auxiliary speech data packages.

According to a preferred embodiment of the present invention, the speech recognition method based on auxiliary speech data packages comprises:

when the speech information is received, obtaining the user's current geographic location information;

invoking the corresponding auxiliary speech data package according to the geographic location information; and

recognizing the speech information according to the auxiliary speech data package to obtain the second speech recognition result.
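The three steps above can be illustrated with a minimal Python sketch. It is not taken from the patent. The package structure, the `PACKAGES` table, its lexicon entry, and `recognize_with_auxiliary` are all hypothetical. The sketch also assumes that an upstream acoustic front end has already converted the audio into syllable tokens, and it uses a toy lookup in place of a real acoustic model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AuxiliaryPackage:
    """Hypothetical in-memory stand-in for one auxiliary speech data package."""
    region: str
    # Maps accent-specific syllable tokens to the intended standard characters.
    lexicon: Dict[str, str] = field(default_factory=dict)


# Hypothetical package table keyed by region name. The entry assumes an accent
# in which the retroflex initial "zh" is pronounced as the flat initial "z".
PACKAGES: Dict[str, AuxiliaryPackage] = {
    "Xiamen": AuxiliaryPackage("Xiamen", {"zong": "中", "guo": "国"}),
}


def recognize_with_auxiliary(tokens: List[str], location: str) -> Optional[str]:
    """Second recognition method: invoke the package for the user's current
    location and decode the syllable tokens with it."""
    package = PACKAGES.get(location)  # invoke the package for this location
    if package is None:
        return None  # no package for this location, so no second result
    # Unknown syllables are marked with "?" rather than guessed.
    return "".join(package.lexicon.get(token, "?") for token in tokens)
```

When no package exists for the location, the function returns `None`. In that case only the first result would be available for display.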

According to a preferred embodiment of the present invention, the method further comprises:

presetting a plurality of geographic-location-based speech data packages, and storing the speech data packages in the electronic device or in a server connected to the electronic device.
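The two storage options can be combined. The following hypothetical Python sketch looks up a package on the device first and, only if it is missing, asks a connected server through an injected `fetch_remote` callable. The class name, the callable, and the choice to cache downloaded packages on the device are assumptions, not details from the patent.

```python
from typing import Callable, Dict, Optional


class PackageStore:
    """Hypothetical store: checks the device first, then a connected server."""

    def __init__(self, local: Dict[str, bytes],
                 fetch_remote: Callable[[str], Optional[bytes]]) -> None:
        self.local = local                # packages stored on the electronic device
        self.fetch_remote = fetch_remote  # hypothetical call to the server

    def get(self, region: str) -> Optional[bytes]:
        if region in self.local:
            return self.local[region]
        data = self.fetch_remote(region)
        if data is not None:
            self.local[region] = data  # assumed design choice: cache on the device
        return data
```

Caching trades device storage for fewer server round trips. A device with little storage could skip the cache and query the server on every request.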

According to a preferred embodiment of the present invention, before the corresponding auxiliary speech data package is invoked according to the geographic location information, the method further comprises:

determining the user's speech type according to the speech information, the speech type including accent and dialect; and

jointly determining the corresponding auxiliary speech data package based on the speech type and the geographic location information.

When the speech information is received, obtaining the user's current geographic location information and historical geographic location information; and

determining the auxiliary speech data package to be invoked according to the historical geographic location information and the current geographic location information.
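The text does not say how historical and current locations are combined. One plausible rule is sketched below in Python, and it is purely an assumption. If a clear majority of the user's past locations is a single region, that region is treated as the speaker's likely accent region and its package is preferred. Otherwise the current location is used. The function name and the `min_share` threshold are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Set


def choose_package_region(current: str, history: List[str], available: Set[str],
                          min_share: float = 0.5) -> Optional[str]:
    """Hypothetical rule: prefer the region where the user has mostly been
    (a proxy for the speaker's accent); otherwise use the current region."""
    if history:
        home, count = Counter(history).most_common(1)[0]
        if count / len(history) >= min_share and home in available:
            return home
    # Fall back to the current location, if a package exists for it.
    return current if current in available else None
```

Under this rule, a user from Xiamen who is travelling in Beijing would still get the Xiamen package. This reflects the idea that accent follows the speaker rather than the place.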

Updating the preset rule in combination with obtained user feedback information, the preset rule including:

pre-assigning a first weight to the first speech recognition result and a second weight to the second speech recognition result, and determining the display manner of each speech recognition result according to the magnitude of its weight; or

presetting a first recognition score for the first speech recognition result and a second recognition score for the second speech recognition result, and determining the display manner of each speech recognition result according to the magnitude of its recognition score,

wherein the display manner includes display time or display position.

According to a preferred embodiment of the present invention, updating the preset rule comprises:

according to the speech recognition result selected by the user, increasing the weight or recognition score corresponding to that result, and/or decreasing the weight or recognition score corresponding to the result not selected by the user.

A speech recognition system, running in an electronic device, the system comprising:

an acquisition module, configured to acquire speech information input by a user;

a first recognition module, configured to recognize the speech information to obtain a first speech recognition result;

a second recognition module, configured to recognize the speech information to obtain a second speech recognition result; and

a display module, configured to display the first speech recognition result and the second speech recognition result according to a preset rule.

According to a preferred embodiment of the present invention, the first recognition module is a large-vocabulary speech recognition module based on a preset model, and the second recognition module is a speech recognition module based on auxiliary speech data packages.

According to a preferred embodiment of the present invention,

the acquisition module is further configured to obtain the user's current geographic location information when the speech information is received;

the second recognition module comprises:

an invoking submodule, configured to invoke the corresponding auxiliary speech data package according to the geographic location information; and

the second recognition module is configured to recognize the speech information according to the auxiliary speech data package to obtain the second speech recognition result.

According to a preferred embodiment of the present invention, the system further comprises:

a setting module, configured to preset a plurality of geographic-location-based speech data packages and store the speech data packages in the electronic device or in a server connected to the electronic device.

According to a preferred embodiment of the present invention, the system further comprises a determining submodule,

configured to determine the user's speech type according to the speech information, the speech type including accent and dialect; and

According to a preferred embodiment of the present invention,

the acquisition module is further configured to obtain the user's current geographic location information and historical geographic location information when the speech information is received; and

the invoking submodule is further configured to determine the auxiliary speech data package to be invoked according to the historical geographic location information and the current geographic location information.

An update module, configured to update the preset rule in combination with obtained user feedback information, the preset rule being set by the setting module and including:

According to a preferred embodiment of the present invention, updating the preset rule by the update module comprises:

As can be seen from the above technical solutions, the speech recognition method and system of the present invention can build multiple auxiliary speech data packages according to the characteristics of Mandarin in different regions. They invoke different auxiliary speech data packages for users in different geographic locations. This effectively reduces the number of speech recognition libraries required and improves the speech recognition rate.

【Brief Description of the Drawings】

FIG. 1 is a schematic diagram of the hardware architecture of a preferred embodiment of an electronic device that runs a speech recognition system according to the present invention.

FIG. 2 is a flowchart of a preferred embodiment of the speech recognition method of the present invention.

FIG. 3 is a flowchart of a preferred embodiment of the speech recognition method based on auxiliary speech data packages of the present invention.

FIG. 4 is a functional block diagram of a first embodiment of the speech recognition system of the present invention.

FIG. 5 is a functional block diagram of a second embodiment of the speech recognition system of the present invention.

【Description of Main Reference Numerals】

【Detailed Description of the Embodiments】

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. It should also be understood that the specific embodiments described herein are intended only to explain the present invention, not to limit it.

FIG. 1 is a schematic diagram of the hardware architecture of a preferred embodiment of an electronic device that runs a speech recognition system according to the present invention. As shown in the diagram, the electronic device 1 includes a speech recognition system 10. The electronic device 1 further includes a storage unit 20, a display unit 30, a processing unit 40, and a speech receiving unit 50.

Preferably, the speech recognition method of the present invention is implemented by the speech recognition system 10 in the electronic device 1.

The electronic device 1 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), and an embedded device. The electronic device 1 may also include user equipment. The user equipment includes, but is not limited to, any electronic product that can interact with the user through a keyboard, mouse, remote control, touchpad, or voice-control device. Examples include a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game console, Internet Protocol television (IPTV), or smart wearable device. The network in which the user equipment is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a virtual private network (VPN).

It should be noted that the above user equipment is merely an example. Other existing or future user equipment that is applicable to the present invention also falls within the protection scope of the present invention and is incorporated herein by reference.

In one embodiment, when the user inputs speech information, the speech recognition system 10 acquires that speech information and recognizes it in two ways. First, it uses a large-vocabulary speech recognition method based on a preset model (for example, one based on a hidden Markov model) to obtain a first speech recognition result. Second, it uses a speech recognition method based on auxiliary speech data packages (for example, by invoking the auxiliary speech data package corresponding to the user's current geographic location) to obtain a second speech recognition result. By comparing the first and second speech recognition results, the speech recognition system 10 obtains an optimal recognition result. This improves both the speech recognition rate and the user experience.

In one embodiment, the storage unit 20 stores software programs and data installed in the electronic device 1, such as the speech recognition system 10. The storage unit 20 may be an internal storage unit of the electronic device 1, such as its hard disk or memory. It may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 1. Further, the storage unit 20 may include both an internal storage unit and an external storage device of the electronic device 1.

In this embodiment, the storage unit 20 pre-stores a plurality of auxiliary speech data packages and the speech information corresponding to them. An auxiliary speech data package may be a geographic-location-based speech data package. Correspondingly, the storage unit 20 stores speech information that has the speech characteristics of that geographic location.

In this embodiment, geographic locations are divided at the level of prefecture-level cities. In other embodiments, locations with complex dialects can be subdivided into areas below the prefecture level, for example into county-level cities or into custom-defined areas.
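The mixed granularity just described can be sketched as a lookup that falls back from a county-level package to the prefecture-level one. The following Python is a hypothetical illustration. The index keys and package identifiers are invented, and Jinjiang and Nan'an are used only as examples of county-level cities under Quanzhou.

```python
from typing import Dict, Optional, Tuple

# Hypothetical index. Most packages are keyed by prefecture-level city; a
# dialect-complex prefecture may also have finer, county-level packages.
PACKAGE_INDEX: Dict[Tuple[str, Optional[str]], str] = {
    ("Quanzhou", None): "pkg_quanzhou",
    ("Quanzhou", "Jinjiang"): "pkg_quanzhou_jinjiang",
    ("Guangzhou", None): "pkg_guangzhou",
}


def lookup_package(prefecture: str, county: Optional[str] = None) -> Optional[str]:
    """Use the county-level package if one exists, else fall back to the prefecture."""
    if county is not None and (prefecture, county) in PACKAGE_INDEX:
        return PACKAGE_INDEX[(prefecture, county)]
    return PACKAGE_INDEX.get((prefecture, None))
```

With this structure, finer packages can be added only where dialects require them, without changing how locations in other regions are resolved.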

Even within the same geographic location, the Mandarin spoken may differ in accent and dialect. Conversely, people in different geographic locations may share the same dialect or accent. Therefore, in some other embodiments, the geographic-location-based speech data packages stored in the storage unit 20 further include speech data packages based on dialect and geographic location, and speech data packages based on accent and geographic location.

For example, speech data packages based on dialect and geographic location may include: Cantonese_Hong Kong, Cantonese_Guangzhou, Hokkien_Quanzhou, and Hokkien_Xiamen. Speech data packages based on accent and geographic location may include: Accent_Fujian and Accent_Guangzhou. It should be noted that speech data packages based on accent and geographic location cover, but are not limited to, the way syllable initials and finals are articulated.
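One way to determine a package jointly from the speech type and the location is to form a `<type>_<location>` name, following the naming pattern of the examples above. The Python sketch below is hypothetical. It assumes that a separate classifier, not shown here, has already labeled the speech as a dialect (such as "Cantonese" or "Hokkien") or as "Accent" for accented Mandarin.

```python
from typing import Optional

# Package names taken from the examples above.
PACKAGE_NAMES = {
    "Cantonese_Hong Kong", "Cantonese_Guangzhou",
    "Hokkien_Quanzhou", "Hokkien_Xiamen",
    "Accent_Fujian", "Accent_Guangzhou",
}


def select_package(speech_type: str, location: str) -> Optional[str]:
    """Combine the detected speech type (a dialect name, or "Accent" for
    accented Mandarin) with the location to form a package name."""
    name = f"{speech_type}_{location}"
    return name if name in PACKAGE_NAMES else None
```

In this sketch, the same location (Guangzhou) yields `Cantonese_Guangzhou` for a Cantonese speaker and `Accent_Guangzhou` for a speaker of accented Mandarin. This is the distinction the speech type is meant to capture.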

In one embodiment, the display unit 30 displays a graphical user interface (GUI). The GUI may include a plurality of application icons and/or virtual keys representing the functions that the electronic device 1 can provide. For example, a voice input icon represents the function of speech input. A geographic location selection list key represents the function of selecting a geographic location. A text input box key represents the function of entering a geographic location.

The display unit 30 may be, but is not limited to, a touch-enabled display unit such as a touchscreen. Therefore, in addition to viewing the application icons and/or virtual keys displayed by the electronic device 1, the user can also enter function instructions through the display unit 30. Examples include an instruction to run the application corresponding to an icon, or an instruction to activate a virtual key to start the corresponding function.

In one embodiment, the processing unit 40 is one or more central processing units (CPUs), microprocessors, or other digital processing chips. The processing unit 40 executes software program code or processes data, for example by executing the speech recognition system 10. In this embodiment, the processing unit 40 receives the speech information input by the user and at the same time obtains the user's current geographic location information. During recognition, it combines two approaches, which output a first recognition result and a second recognition result respectively:

- large-vocabulary speech recognition based on a preset model, for example a method based on a hidden Markov model or on an artificial neural network model; and
- speech recognition based on auxiliary speech data packages, for example geographic-location-based packages.

The user compares the first and second recognition results and selects one. Based on that selection, the processing unit 40 dynamically adjusts the weights of the two recognition approaches to improve recognition accuracy.

The processing unit 40 is communicatively connected to the speech recognition system 10, the storage unit 20, the display unit 30, and the speech receiving unit 50. The communication may be implemented through a Universal Serial Bus (USB) or other communication paths or protocols.

The speech receiving unit 50 captures the user's speech information. The speech receiving unit 50 includes, but is not limited to, a microphone.

FIG. 2 is a flowchart of a preferred embodiment of the speech recognition method of the present invention. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.

S100: Obtain the speech information input by the user.

In this embodiment, the user can input speech directly through the speech receiving unit 50 of the electronic device 1. The speech recognition system 10 then obtains the speech information from the content of the user's speech input.

In other embodiments, the display unit 30 of the electronic device 1 provides a graphical user interface that includes a voice input icon. When the user taps the voice input icon, the speech recognition system 10 obtains the speech information input by the user through the speech receiving unit 50.

S102: Recognize the speech information using the first speech recognition method to obtain a first recognition result, and recognize it using the second speech recognition method to obtain a second recognition result.

In this embodiment, the first speech recognition method may be a large-vocabulary speech recognition method based on a preset model. The second speech recognition method may be a speech recognition method based on auxiliary speech data packages. In other words, the method based on auxiliary speech data packages assists the preset-model large-vocabulary method in recognizing speech. The auxiliary speech data packages it uses may be built per geographic location. In some embodiments, the speech recognition system 10 may run the two methods sequentially: it first executes the first speech recognition method on the speech information, and then executes the second speech recognition method on the same speech information.

In some embodiments, to improve recognition efficiency, the speech recognition system 10 may execute the first speech recognition method and the second speech recognition method in parallel, each recognizing the speech information. That is, while the speech information is being recognized with the preset-model large-vocabulary method, it is simultaneously recognized with the method based on auxiliary speech data packages. The speech recognition system 10 runs the preset-model large-vocabulary method in a first thread and, in parallel, runs the auxiliary-package method in a second thread.
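The two-thread arrangement can be sketched in Python with the standard `concurrent.futures` module. The two recognizer functions are hypothetical stand-ins that return fixed strings. Only the threading structure reflects the text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def recognize_large_vocabulary(speech: bytes) -> str:
    """Hypothetical stand-in for the preset-model (e.g. HMM) recognizer."""
    return "first result"


def recognize_auxiliary(speech: bytes, location: str) -> str:
    """Hypothetical stand-in for the auxiliary-package recognizer."""
    return f"second result ({location})"


def recognize_in_parallel(speech: bytes, location: str) -> Tuple[str, str]:
    # A first and a second worker thread each run one recognition method
    # on the same speech input.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(recognize_large_vocabulary, speech)
        second = pool.submit(recognize_auxiliary, speech, location)
        # result() blocks until the corresponding thread has finished.
        return first.result(), second.result()
```

In CPython, the two threads only overlap in practice if the recognizers release the global interpreter lock. This is the case, for example, when they call native decoding code or wait on a server request. Purely Python-bound recognizers would need a process pool instead.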

In this embodiment, the large-vocabulary speech recognition method based on a preset model relies on a speech recognition library built for standard Mandarin. Any user can invoke this library, and speech is recognized as standard Mandarin. This method does not account for the influence of dialect and geographic location and/or accent and geographic location. It may adopt a prior-art speech recognition method that learns from and is trained on multiple pre-built models to recognize the user's speech and convert the speech information into text.

The speech recognition method based on auxiliary speech data packages is hereinafter referred to as the "auxiliary speech recognition method" for brevity. It accounts for the influence of dialect and geographic location and/or accent and geographic location. It therefore requires geographic-location-based speech data packages to be built in advance through training and learning. For details of the geographic-location-based speech recognition method, see FIG. 3 and the corresponding description.

S104: Display the first speech recognition result and the second speech recognition result according to a preset rule.

In this embodiment, the preset rule may be as follows. The speech recognition system 10 pre-assigns a first weight to the first speech recognition result and a second weight to the second speech recognition result. It then determines the display manner of each result according to the magnitude of its weight. The sum of the first and second weights may be a fixed number, for example 1. Preferably, the first weight preset by the speech recognition system 10 is greater than the second weight. In other words, the speech recognition system 10 assigns a larger weight to the first speech recognition method than to the second.
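The fixed-sum weighting can be captured in a small Python data structure. The class name and the default values 0.6 and 0.4 are assumptions; the text only requires that the weights add up to a fixed number such as 1 and that the first weight is preferably the larger one. The preference is not enforced, because later feedback updates may reverse the order.

```python
from dataclasses import dataclass


@dataclass
class WeightRule:
    """Hypothetical holder for the preset display weights of the two results."""
    first: float = 0.6   # preset-model recognizer; assumed value, preferably the larger
    second: float = 0.4  # auxiliary-package recognizer; assumed value
    total: float = 1.0   # the fixed sum, e.g. 1

    def __post_init__(self) -> None:
        # Tolerance guards against floating-point rounding in the sum.
        if abs(self.first + self.second - self.total) > 1e-9:
            raise ValueError("weights must add up to the fixed total")

    @classmethod
    def from_first(cls, first: float, total: float = 1.0) -> "WeightRule":
        # With a fixed sum, choosing the first weight determines the second.
        return cls(first=first, second=total - first, total=total)
```

The recognition-score variant in the next paragraph could use the same structure without the fixed-sum check, because the text places no constraint on the sum of the scores.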

在其他实施例中，所述预先设置的规则还可以是，所述语音识别系统10为所述第一语音识别结果预先设置第一识别分数，为所述第二语音识别结果预先设置第二识别分数，根据识别分数的大小确定对应该识别分数的语音识别结果的显示方式。优选地，所述语音识别系统10预先设置的第一识别分数值大于第二识别分数值。In other embodiments, the preset rule may also be that the speech recognition system 10 presets a first recognition score for the first speech recognition result, and presets a second recognition score for the second speech recognition result. Score, the display mode of the speech recognition result corresponding to the recognition score is determined according to the size of the recognition score. Preferably, the speech recognition system 10 presets a first recognition score value greater than a second recognition score value.

所述语音识别结果的显示方式包括，但不限于：显示的时间及/或显示的位置。The display manner of the voice recognition result includes, but not limited to: display time and/or display location.

例如，所述语音识别系统10预先设置的规则是为语音识别结果分配权重，则当预先设置的第一权重值大于预先设置的第二权重值时，可以在所述电子设备1的显示单元30上将对应权重值大的第一语音识别结果显示在第一位置，如所述显示单元30提供的用户界面的上半部分；当预先设置的第一权重值小于预先设置的第二权重值时，将对应权重值小的第一语音识别结果显示在第二位置，如所述显示单元30提供的用户界面的下半部分。For example, the preset rule of the speech recognition system 10 is to assign weights to the speech recognition results, then when the preset first weight value is greater than the preset second weight value, the display unit 30 of the electronic device 1 Display the first speech recognition result corresponding to a large weight value in the first position, such as the upper part of the user interface provided by the display unit 30; when the preset first weight value is smaller than the preset second weight value , displaying the first speech recognition result corresponding to a smaller weight value in a second position, such as the lower half of the user interface provided by the display unit 30 .

In addition, when the preset first weight is greater than the preset second weight, the first speech recognition result is displayed on the display unit 30 of the electronic device 1 first. The second speech recognition result is displayed on the display unit 30 after a preset time (for example, 2 seconds).
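The display behaviour in the two examples above could be planned roughly as follows. This sketch rests on several assumptions:

- The `DisplayItem` type and the region names "upper" and "lower" are illustrative.
- Combining position and delay in one plan is an assumption, since the text presents them as separate examples.
- Ties are assumed to favour the first result.
- The 2-second default is simply the example value from the text.
- Actual rendering on the display unit 30 is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DisplayItem:
    text: str
    region: str     # "upper" or "lower" half of the user interface (assumed labels)
    delay_s: float  # seconds to wait before showing this result


def plan_display(first_text: str, second_text: str,
                 w_first: float, w_second: float,
                 delay_s: float = 2.0) -> List[DisplayItem]:
    """Put the higher-weighted result in the upper half and show it at once;
    put the other result in the lower half and show it after `delay_s`."""
    if w_first >= w_second:  # assumed: ties favour the first (large-vocabulary) result
        high, low = first_text, second_text
    else:
        high, low = second_text, first_text
    return [DisplayItem(high, "upper", 0.0), DisplayItem(low, "lower", delay_s)]
```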

In this embodiment, the speech recognition method further includes updating the preset rule based on the acquired user feedback information.

The user feedback information can be obtained from the user's operations. For example, if the user selects the first speech recognition result, the feedback acquired by the speech recognition system 10 indicates that the best result was obtained with the first speech recognition method. If the user selects the second speech recognition result, the feedback indicates that the best result was obtained with the second speech recognition method.

Updating the preset rule may consist of adjusting the preset weights or adjusting the preset recognition scores.

Specifically, based on the result the user selected, the speech recognition system 10 increases the weight or recognition score of that result and/or decreases the weight or recognition score of the result the user did not select. For example, if the feedback indicates that the first speech recognition result was selected, the first weight or first recognition score is increased, and/or the second weight or second recognition score is decreased. If the feedback indicates that the second speech recognition result was selected, the second weight or second recognition score is increased, and/or the first weight or first recognition score is decreased.

The weights or scores may be increased or decreased by a preset ratio or by a preset amount.
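A minimal sketch of the weight variant of this update is shown below. It supports both a preset amount (`mode="value"`) and a preset ratio (`mode="ratio"`). Several details are assumptions and not taken from the text: the function name, the default step of 0.05, clamping at zero, and renormalizing so that the weights keep summing to the fixed total. For recognition scores the same increase and decrease would apply, without the renormalization step.

```python
from typing import Dict


def update_weights(weights: Dict[str, float], chosen: str,
                   step: float = 0.05, mode: str = "value",
                   total: float = 1.0) -> Dict[str, float]:
    """Reward the method whose result the user selected and penalize the others.

    mode="value": add/subtract the fixed amount `step`.
    mode="ratio": scale each weight up/down by the fraction `step` (0.1 = 10%).
    Weights are clamped at zero and renormalized so they still sum to `total`.
    """
    if chosen not in weights:
        raise KeyError(chosen)
    updated = {}
    for name, w in weights.items():
        if mode == "value":
            delta = step
        elif mode == "ratio":
            delta = w * step
        else:
            raise ValueError("mode must be 'value' or 'ratio'")
        updated[name] = max(w + delta if name == chosen else w - delta, 0.0)
    s = sum(updated.values())
    if s == 0:
        raise ValueError("all weights dropped to zero")
    return {name: w * total / s for name, w in updated.items()}
```

For example, with weights 0.6 and 0.4, selecting the second result once shifts them to about 0.55 and 0.45.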

Please also refer to FIG. 3, which is a flowchart of a preferred embodiment of the speech recognition method based on auxiliary voice data packages. Depending on requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.

S1020: When the user's voice information is received, acquire the user's current geographic location information.

In this embodiment, the speech recognition system 10 acquires the current geographic location of the electronic device 1 through a positioning module and/or a network connection module built into the electronic device 1. The positioning module includes, but is not limited to, the Global Positioning System (GPS). The network connection module includes, but is not limited to, third-generation mobile communication (3G), General Packet Radio Service (GPRS), and wireless fidelity (Wi-Fi). The current geographic location of the electronic device 1 is taken to be the user's current geographic location.

In some embodiments, the speech recognition system 10 may also receive an instruction set by the user and determine the user's current geographic location according to that instruction.

For example, the electronic device 1 provides a location selection list containing the names of all cities in China. By opening this list, the user selects the geographic location that corresponds to the voice input.

As another example, the electronic device 1 provides a text input box. By activating it, the user enters the current geographic location in the corresponding interface.
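One way these location sources might be combined is sketched below. The text does not state a priority between a user-set location and the device's positioning or network modules, so the following is an assumption:

- A location the user set explicitly (selection list or text box) wins.
- Otherwise, each device source is tried in the given order.
- A source that raises `OSError` or returns nothing is skipped.

The sources are modelled as plain callables, such as a hypothetical GPS reader or network lookup, rather than any real platform API.

```python
from typing import Callable, Optional, Sequence

# A device location source returns a place name, or None when it has no fix.
LocationSource = Callable[[], Optional[str]]


def resolve_location(user_setting: Optional[str],
                     device_sources: Sequence[LocationSource]) -> Optional[str]:
    """Determine the user's current location.

    Assumed priority: an explicit user setting first, then each device source
    (e.g. GPS, then a network-based lookup) in order.
    """
    if user_setting and user_setting.strip():
        return user_setting.strip()
    for source in device_sources:
        try:
            location = source()
        except OSError:  # treat an unavailable module as "no fix"
            continue
        if location:
            return location
    return None
```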

S1022: Invoke the corresponding auxiliary voice data package according to the geographic location information.

In this embodiment, the electronic device 1 retrieves the corresponding auxiliary voice data package from the storage unit 20 according to the geographic location information.

The storage unit 20 stores the auxiliary voice data packages in advance, together with the voice information they contain that carries the speech characteristics of each geographic location.

For example, if the geographic location is Guangdong, the speech recognition system 10 invokes the auxiliary voice data package for recognizing the speech characteristics of Guangdong.

In some embodiments, the storage unit 20 of the electronic device 1 may not already store an auxiliary voice data package for the geographic location. In that case, upon acquiring the user's current geographic location, the speech recognition system 10 downloads the package from a server communicatively connected to the electronic device 1. The communication connection may be wireless. The auxiliary voice data packages are obtained in advance through training and learning by the user and are deployed on the server. The speech recognition system 10 may request, over the network, that the server send the package for that geographic location.
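The lookup-then-download behaviour can be sketched as follows. Here a dictionary stands in for the storage unit 20, and `download` is a hypothetical callable standing in for the server request; no real network API is implied. Caching the downloaded package locally for later requests is an added assumption.

```python
from typing import Callable, MutableMapping


def get_auxiliary_package(location: str,
                          local_store: MutableMapping[str, bytes],
                          download: Callable[[str], bytes]) -> bytes:
    """Return the auxiliary voice data package for `location`.

    A package already in local storage is used directly; on a miss it is
    fetched with `download` (placeholder for the server request).
    """
    package = local_store.get(location)
    if package is None:
        package = download(location)
        local_store[location] = package  # assumed: keep it for later requests
    return package
```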

S1024: Recognize the voice information according to the auxiliary voice data package to obtain a second speech recognition result.

In this embodiment, the speech recognition system 10 recognizes the voice information with the second speech recognition method to obtain the second speech recognition result.

Further, differences in dialect or accent can exist even within the same geographic location and lead to a low recognition rate. To address this, step S1022 may include a further step before the speech recognition system 10 invokes the auxiliary voice data package. In this step, the system determines the user's voice type from the voice information. It then determines the corresponding auxiliary voice data package jointly from the voice type and the geographic location information.

The user's voice type is determined by the pronunciation and intonation of the user's speech, and may include dialect and accent.

For example, if the user's current location is Guangzhou and the user's voice type is an accent (for example, Cantonese), the speech recognition system 10 invokes the "accent_Guangzhou" auxiliary voice data package to recognize the voice information. In some embodiments, the speech recognition system 10 may also obtain the user's voice type from information entered in an interface that the display unit 30 provides. This interface includes a text input box.
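A small sketch of selecting a package key from voice type and location follows. It mirrors the "accent_Guangzhou" naming in the example. Three details are assumptions, not part of the text: falling back to a location-only package when no combined package exists, the function name, and representing available packages as a set of keys.

```python
from typing import Collection, Optional


def select_package_key(location: str, voice_type: Optional[str],
                       available: Collection[str]) -> Optional[str]:
    """Pick an auxiliary package key from voice type and location.

    A combined key such as "accent_Guangzhou" is preferred; otherwise the
    location-only package is used (assumed fallback); None means no match.
    """
    if voice_type:
        combined = f"{voice_type}_{location}"
        if combined in available:
            return combined
    return location if location in available else None
```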

Furthermore, when the user is temporarily away on a business trip or traveling, invoking the package that matches the user's current location could lower the recognition rate. To avoid this, step S1022 may further include two steps. First, acquire the user's current geographic location information and historical geographic location information. Second, determine which auxiliary voice data package to invoke based on both.

In this embodiment, the historical geographic location information refers to the geographic location of the user's usual place of residence.

For example, if the user's current location is Guangzhou but the user usually lives in Fujian, the electronic device 1 invokes the auxiliary voice data package for recognizing Fujian speech characteristics to recognize the voice information.
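The text does not say how the usual residence is derived from the location history. The sketch below assumes it is the most frequent entry in a recorded history, accepted only if it makes up at least a hypothetical share `min_share` of that history. Otherwise the current location is kept.

```python
from collections import Counter
from typing import Sequence


def choose_package_location(current: str, history: Sequence[str],
                            min_share: float = 0.5) -> str:
    """Prefer the user's usual residence over a temporary current location.

    Assumed heuristic: the usual residence is the most frequent location in
    `history`, used only if it accounts for at least `min_share` of entries.
    """
    if not history:
        return current
    residence, count = Counter(history).most_common(1)[0]
    return residence if count / len(history) >= min_share else current
```

With a history that is 80% Fujian and a current location of Guangzhou, this returns "Fujian", matching the example above.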

In summary, in the speech recognition method disclosed in the embodiments of the present invention, multiple auxiliary voice data packages are obtained in advance through training and learning. These packages are speech databases partitioned by geographic location. Based on the user's voice type, they are further subdivided into packages based on dialect and geographic location and packages based on accent and geographic location. The user's voice information is recognized with the large-vocabulary speech recognition method based on a preset model. At the same time, it is recognized with the auxiliary voice data package, which assists the large-vocabulary method. This improves both the recognition rate for the user and the user experience.

FIG. 4 is a functional block diagram of a first embodiment of the speech recognition system of the present invention. The speech recognition system 10 includes an acquisition module 100, a first recognition module 102, a second recognition module 104, a display module 106, a setting module 108, and an update module 110. A module, as used in the present invention, is a series of computer program segments stored in the storage unit 20 that the processing unit 40 can execute to perform a fixed function. The function of each module is described in detail below.

The acquisition module 100 is configured to acquire the voice information input by the user.

In this embodiment, the user can input speech directly through the voice receiving unit 50 of the electronic device 1. The acquisition module 100 obtains the voice information from the content of that speech.

In other embodiments, the display unit 30 of the electronic device 1 provides a graphical user interface that includes a voice input icon. When the user taps the icon, the acquisition module 100 acquires the voice information input by the user through the voice receiving unit 50.

The first recognition module 102 is configured to recognize the voice information to obtain a first recognition result.

The second recognition module 104 is configured to recognize the voice information to obtain a second recognition result.

In this embodiment, the first recognition module 102 may be a large-vocabulary speech recognition module based on a preset model. The second recognition module 104 may be a speech recognition module based on auxiliary voice data packages. That is, the module based on auxiliary voice data packages assists the large-vocabulary module in performing speech recognition. The module based on auxiliary voice data packages may use packages built per geographic location. In some embodiments, the speech recognition system 10 may first run the first recognition module 102 to recognize the voice information, and then run the second recognition module 104 to recognize the same voice information.

In some embodiments, to improve recognition efficiency, the speech recognition system 10 may run the first recognition module 102 and the second recognition module 104 in parallel, each recognizing the voice information. While the large-vocabulary module based on the preset model recognizes the voice information, the module based on auxiliary voice data packages recognizes it at the same time. That is, the speech recognition system 10 runs the first recognition module 102 in a first thread and, in parallel, runs the second recognition module 104 in a second thread.
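The two-thread arrangement can be sketched with the standard-library `concurrent.futures.ThreadPoolExecutor`. The recognizers are modelled as hypothetical callables that take raw audio bytes and return text. The actual recognition engines are outside the scope of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

# A recognizer takes raw audio and returns recognized text (hypothetical interface).
Recognizer = Callable[[bytes], str]


def recognize_in_parallel(audio: bytes, first: Recognizer,
                          second: Recognizer) -> Tuple[str, str]:
    """Run both recognizers on the same audio, each in its own worker thread,
    and return (first_result, second_result) once both have finished."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(first, audio)
        second_future = pool.submit(second, audio)
        return first_future.result(), second_future.result()
```

In CPython, threads overlap usefully mainly when the recognizers wait on I/O (for example, a network request) or run native code that releases the GIL. For pure-Python, CPU-bound recognizers, `ProcessPoolExecutor` would be the more natural choice.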

In this embodiment, the large-vocabulary speech recognition module based on a preset model refers to a speech recognition library built on standard Mandarin. Any user can call this library to perform recognition according to standard Mandarin. Large-vocabulary recognition based on a preset model does not take into account the influence of dialect and geographic location and/or accent and geographic location. This module is the same as in the prior art.

The speech recognition module based on auxiliary voice data packages (hereinafter the "auxiliary speech recognition module") takes into account the influence of dialect and geographic location and/or accent and geographic location. It requires location-based voice data packages to be built in advance through training and learning. For the location-based speech recognition module, refer to FIG. 5 and the corresponding description.

The display module 106 is configured to display the first speech recognition result and the second speech recognition result according to a preset rule.

In this embodiment, the preset rule is set in advance by the setting module 108. The setting module 108 may pre-assign a first weight to the first speech recognition result and a second weight to the second speech recognition result. The display manner of each result is then determined by the magnitude of its weight. The sum of the first weight and the second weight may be a fixed number, for example 1. Preferably, the first weight preset by the setting module 108 is greater than the second weight; that is, the setting module 108 assigns a larger weight to the first speech recognition method than to the second speech recognition method.

In other embodiments, the rule preset by the setting module 108 may instead work as follows. A first recognition score is preset for the first speech recognition result and a second recognition score for the second speech recognition result. The display manner of each result is then determined by the magnitude of its recognition score. Preferably, the first recognition score preset by the setting module 108 is greater than the second recognition score.

The display manner of a speech recognition result includes, but is not limited to, the display time and/or the display position.

For example, suppose the rule preset by the setting module 108 assigns weights to the speech recognition results. When the preset first weight is greater than the preset second weight, the display module 106 may display the first speech recognition result, which has the larger weight, on the display unit 30 of the electronic device 1 in a first position. The upper half of the user interface provided by the display unit 30 is one such position. When the preset first weight is smaller than the preset second weight, the display module 106 displays the first speech recognition result, which now has the smaller weight, in a second position, such as the lower half of the user interface provided by the display unit 30.

In addition, when the preset first weight is greater than the preset second weight, the display module 106 displays the first speech recognition result on the display unit 30 of the electronic device 1 first. It displays the second speech recognition result on the display unit 30 after a preset time (for example, 2 seconds).

In this embodiment, the speech recognition system 10 further includes the update module 110, which is configured to update the preset rule based on the acquired user feedback information.

In this embodiment, the user feedback information can be obtained from the user's operations. For example, if the user selects the first speech recognition result, the feedback acquired by the acquisition module 100 indicates that the best result was obtained with the first speech recognition method. If the user selects the second speech recognition result, the feedback indicates that the best result was obtained with the second speech recognition method.

The update module 110 may update the preset rule by adjusting the preset weights or the preset recognition scores.

Specifically, based on the result the user selected, the update module 110 increases the weight or recognition score of that result. It may also, or instead, decrease the weight or recognition score of the result the user did not select. For example, if the feedback indicates that the first speech recognition result was selected, the update module 110 increases the first weight or first recognition score, and/or decreases the second weight or second recognition score. If the feedback indicates that the second speech recognition result was selected, the update module 110 increases the second weight or second recognition score, and/or decreases the first weight or first recognition score.

Please also refer to FIG. 5, which is a functional block diagram of a second embodiment of the speech recognition system of the present invention. In this embodiment, the second recognition module 104 includes an invoking submodule 1040, a downloading submodule 1042, and a determining submodule 1044. A module, as used in the present invention, is a series of computer program segments stored in the storage unit 20 that the processing unit 40 can execute to perform a fixed function. The function of each module is described in detail below.

The acquisition module 100 is further configured to acquire the user's current geographic location information when the user's voice information is received.

In this embodiment, the acquisition module 100 acquires the current geographic location of the electronic device 1 through a positioning module and/or a network connection module built into the electronic device 1. The positioning module includes, but is not limited to, the Global Positioning System (GPS). The network connection module includes, but is not limited to, third-generation mobile communication (3G), General Packet Radio Service (GPRS), and wireless fidelity (Wi-Fi). The current geographic location of the electronic device 1 is taken to be the user's current geographic location.

In some embodiments, the acquisition module 100 may also receive an instruction set by the user and determine the user's current geographic location according to that instruction.

The invoking submodule 1040 is configured to invoke the corresponding auxiliary voice data package according to the geographic location information.

In this embodiment, the invoking submodule 1040 retrieves the corresponding auxiliary voice data package from the storage unit 20 according to the geographic location information.

For example, if the geographic location is Guangdong, the invoking submodule 1040 invokes the auxiliary voice data package for recognizing the speech characteristics of Guangdong.

In some embodiments, the storage unit 20 of the electronic device 1 may not already store an auxiliary voice data package for the geographic location. In that case, the downloading submodule 1042 is executed when the acquisition module 100 acquires the user's current geographic location. The downloading submodule 1042 downloads the package from a server communicatively connected to the electronic device 1. The communication connection may be wireless. The auxiliary voice data packages are obtained in advance through training and learning by the user and are deployed on the server. The downloading submodule 1042 may request, over the network, that the server send the package for that geographic location.

The second recognition module 104 is configured to recognize the voice information according to the auxiliary voice data package to obtain a second speech recognition result.

In this embodiment, the second recognition module 104 recognizes the voice information with the second speech recognition method to obtain the second speech recognition result.

Further, differences in dialect or accent can exist even within the same geographic location and lead to a low recognition rate. To address this, the second recognition module 104 may also include a determining submodule 1044, configured to determine the user's voice type from the voice information. The invoking submodule 1040 then determines the corresponding auxiliary voice data package jointly from the voice type and the geographic location information.

For example, if the user's current location is Guangzhou and the user's voice type is an accent (for example, Cantonese), the invoking submodule 1040 invokes the "accent_Guangzhou" auxiliary voice data package to recognize the voice information.

In some embodiments, the acquisition module 100 may also obtain the user's voice type from information entered in an interface that the display unit 30 provides. This interface includes a text input box.

Furthermore, when the user is temporarily away on a business trip or traveling, the acquisition module 100 would acquire the user's current location. The invoking submodule 1040 would then invoke the matching auxiliary voice data package, and this could lower the recognition rate. To avoid this, the acquisition module 100 is also configured to acquire the user's current geographic location information and historical geographic location information. The invoking submodule 1040 then determines which auxiliary voice data package to invoke based on both.

For example, if the user's current location is Guangzhou but the user usually lives in Fujian, the invoking submodule 1040 invokes the auxiliary voice data package for recognizing Fujian speech characteristics to recognize the voice information.

In summary, in the speech recognition system disclosed in the embodiments of the present invention, multiple auxiliary voice data packages are obtained in advance through training and learning. These packages are speech databases partitioned by geographic location. Based on the user's voice type, they are further subdivided into packages based on dialect and geographic location and packages based on accent and geographic location. The large-vocabulary speech recognition module based on a preset model recognizes the user's voice information. At the same time, the auxiliary voice data package is used to recognize it, assisting the large-vocabulary speech recognition method based on the preset model. This improves both the recognition rate for the user and the user experience.

In the embodiments provided by the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. An integrated unit may be implemented in hardware, or in hardware combined with software functional modules.

An integrated unit implemented as software functional modules may be stored in a computer-readable storage medium. Such software functional modules are stored in a storage medium and include several instructions that cause a computer unit (for example a personal computer, a server, or a network unit) or a processor to execute some of the steps of the methods described in the embodiments of the present invention.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the exemplary embodiments above and may be embodied in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and not restrictive. The scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" denote names and do not imply any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified or equivalently replaced without departing from their spirit and scope.
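Claims 7 and 8 below describe the preset display rule and its feedback update only in general terms. The following minimal Python sketch shows one reading of the weight-based variant, assuming equal initial weights of 0.5, a fixed adjustment step of 0.1, and that a higher weight means a result is displayed earlier or in a more prominent position. `DisplayRule` and all of these values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DisplayRule:
    """Hypothetical weight-based display rule for competing recognition results."""

    # Assumed starting weights for the first and second recognition results.
    weights: Dict[str, float] = field(
        default_factory=lambda: {"first": 0.5, "second": 0.5}
    )
    step: float = 0.1   # assumed adjustment per user selection
    floor: float = 0.0  # weights are not allowed to drop below this value

    def order(self, results: Dict[str, str]) -> List[str]:
        # Higher weight -> displayed earlier (or in a more prominent position).
        # sorted() is stable even with reverse=True, so ties keep input order.
        ranked = sorted(results, key=lambda name: self.weights.get(name, 0.0), reverse=True)
        return [results[name] for name in ranked]

    def update(self, selected: str) -> None:
        # Feedback: raise the selected result's weight and lower the others.
        for name in self.weights:
            if name == selected:
                self.weights[name] += self.step
            else:
                self.weights[name] = max(self.floor, self.weights[name] - self.step)
```

The score-based alternative in claim 7 would follow the same pattern, with per-result recognition scores in place of weights.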

Claims

1. A speech recognition method applied to an electronic device, characterized in that the method comprises:

Obtaining voice information input by a user;

Recognizing the voice information using a first speech recognition method to obtain a first speech recognition result, and recognizing the voice information using a second speech recognition method to obtain a second speech recognition result; and

Displaying the first speech recognition result and the second speech recognition result according to preset rules.

2. The speech recognition method according to claim 1, wherein the first speech recognition method is a large-vocabulary speech recognition method based on a preset model, and the second speech recognition method is a speech recognition method based on auxiliary voice data packets.

3. The speech recognition method according to claim 2, wherein the speech recognition method based on auxiliary voice data packets comprises:

When the voice information is received, obtaining the user's current geographic location information;

Invoking the corresponding auxiliary voice data packet according to the geographic location information; and

Recognizing the voice information according to the auxiliary voice data packet to obtain the second speech recognition result.

4. The speech recognition method according to claim 3, wherein the method further comprises:

Presetting a plurality of voice data packets based on geographic location, and storing the voice data packets in the electronic device or in a server connected to the electronic device.

5. The speech recognition method according to claim 3, wherein, before the corresponding auxiliary voice data packet is invoked according to the geographic location information, the method further comprises:

Determining the voice type of the user according to the voice information, the voice type including accent and dialect; and

Jointly determining the corresponding auxiliary voice data packet based on the voice type and the geographic location information.

6. The speech recognition method according to any one of claims 3 to 5, wherein the method further comprises:

When the voice information is received, obtaining the user's current geographic location information and historical geographic location information; and

Determining the auxiliary voice data packet to be invoked according to the historical geographic location information and the current geographic location information.

7. The speech recognition method according to claim 6, wherein the method further comprises:

Updating the preset rules in combination with the obtained user feedback information, wherein the preset rules include:

Pre-assigning a first weight to the first speech recognition result and a second weight to the second speech recognition result, and determining the display manner of each speech recognition result according to its weight value; or

Presetting a first recognition score for the first speech recognition result and a second recognition score for the second speech recognition result, and determining the display manner of each speech recognition result according to the magnitude of its recognition score,

Wherein the display manner includes display time or display position.

8. The speech recognition method according to claim 7, wherein updating the preset rules comprises:

Increasing the weight value or recognition score of the speech recognition result selected by the user, and/or decreasing the weight value or recognition score of the speech recognition result not selected by the user.

9. A speech recognition system running in an electronic device, characterized in that the system comprises:

An acquisition module, configured to acquire voice information input by the user;

A first recognition module, configured to recognize the voice information to obtain a first speech recognition result;

A second recognition module, configured to recognize the voice information to obtain a second speech recognition result; and

A display module, configured to display the first speech recognition result and the second speech recognition result according to preset rules.

10. The speech recognition system according to claim 9, wherein the first recognition module is a large-vocabulary speech recognition module based on a preset model, and the second recognition module is a speech recognition module based on auxiliary voice data packets.

11. The speech recognition system according to claim 10, wherein:

The acquisition module is further configured to acquire the user's current geographic location information when the voice information is received;

The second recognition module includes:

An invoking sub-module, configured to invoke the corresponding auxiliary voice data packet according to the geographic location information; and

The second recognition module is configured to recognize the voice information according to the auxiliary voice data packet to obtain the second speech recognition result.

12. The speech recognition system according to claim 11, wherein the system further comprises:

A setting module, configured to preset a plurality of voice data packets based on geographic location and to store the voice data packets in the electronic device or in a server connected to the electronic device.

13. The speech recognition system according to claim 11, wherein the system further comprises a determining sub-module:

configured to determine the voice type of the user according to the voice information, the voice type including accent and dialect; and

14. The speech recognition system according to any one of claims 11 to 13, wherein:

The acquisition module is further configured to acquire the user's current geographic location information and historical geographic location information when the voice information is received; and

The invoking sub-module is further configured to determine the auxiliary voice data packet to be invoked according to the historical geographic location information and the current geographic location information.

15. The speech recognition system according to claim 14, wherein the system further comprises:

An update module, configured to update the preset rules in combination with the acquired user feedback information, the preset rules being set by the setting module and including:

Wherein, the display manner includes display time or display position.

16. The speech recognition system according to claim 15, wherein the update module updating the preset rules comprises: