
WO2021000497A1 - Retrieval method and apparatus, and computer device and storage medium - Google Patents

Retrieval method and apparatus, and computer device and storage medium

Info

Publication number
WO2021000497A1
WO2021000497A1 · PCT application PCT/CN2019/118254 (priority CN2019118254W)
Authority
WO
WIPO (PCT)
Prior art keywords
model
text
feature
recognized
language
Prior art date
Application number
PCT/CN2019/118254
Other languages
French (fr)
Chinese (zh)
Inventor
王建华 (Wang Jianhua)
马琳 (Ma Lin)
张晓东 (Zhang Xiaodong)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021000497A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L 15/063 Training
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G10L 15/26 Speech to text systems
    • G10L 2015/088 Word spotting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to a retrieval method, device, computer equipment and storage medium.
  • a retrieval method, device, computer equipment, and storage medium are provided.
  • a retrieval method including:
  • the second feature data is an analysis result of sentiment analysis on the recognized text
  • the target text is obtained; wherein the word preprocessing includes word segmentation, removal of stop words, and word filtering;
  • the first feature data, the second feature data, and the target text are input into a text classification model; the text classification model obtains a successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information;
  • the target retrieval content is obtained by searching according to the key information.
  • a retrieval device including
  • the voice acquisition module is used to acquire the voice to be recognized
  • a speech recognition module configured to input the to-be-recognized speech into a trained speech recognition model for recognition to obtain recognized text;
  • the key information confirmation module is used to input the recognized text into the trained semantic analysis model and the sentiment analysis model to obtain the first feature data and the second feature data respectively, wherein the first feature data is the analysis result of semantic analysis on the recognized text and the second feature data is the analysis result of sentiment analysis on the recognized text; it is also used to obtain the target text after performing word preprocessing on the recognized text, wherein the word preprocessing includes word segmentation, removal of stop words, and word filtering; it is also used to input the first feature data, the second feature data, and the target text into a text classification model, which obtains a successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information; and
  • the retrieval module is used to retrieve the target retrieval content according to the key information.
  • a computer device including a memory and one or more processors is provided; the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to execute the following steps:
  • the second feature data is an analysis result of sentiment analysis on the recognized text
  • the target text is obtained; wherein the word preprocessing includes word segmentation, removal of stop words, and word filtering;
  • the first feature data, the second feature data, and the target text are input into a text classification model; the text classification model obtains a successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information;
  • the target retrieval content is obtained by searching according to the key information.
  • One or more non-volatile storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the second feature data is an analysis result of sentiment analysis on the recognized text
  • the target text is obtained; wherein the word preprocessing includes word segmentation, removal of stop words, and word filtering;
  • the first feature data, the second feature data, and the target text are input into a text classification model; the text classification model obtains a successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information;
  • the target retrieval content is obtained by searching according to the key information.
  • Fig. 1 is an application scenario diagram of the retrieval method according to one or more embodiments.
  • Fig. 2 is a schematic flowchart of a retrieval method according to one or more embodiments.
  • Fig. 3 is a schematic flow diagram of speech recognition according to one or more embodiments.
  • Fig. 4 is a schematic diagram of a process of speech recognition according to one or more embodiments.
  • Fig. 5 is a schematic flowchart of training steps of a model to be trained according to one or more embodiments.
  • Fig. 6 is a schematic flowchart of a training step of a speech recognition model according to one or more embodiments.
  • Fig. 7 is a block diagram of a retrieval device according to one or more embodiments.
  • Figure 8 is a block diagram of a computer device according to one or more embodiments.
  • Fig. 1 is a diagram of the application environment of the retrieval method in an embodiment.
  • the application environment includes a terminal 110 and a server 120.
  • the terminal 110 and the server 120 communicate through a network.
  • the communication network may be a wireless or wired communication network, such as an IP network or a cellular mobile communication network; the number of terminals and servers is not limited.
  • the terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple servers.
  • the terminal 110 obtains the voice to be recognized.
  • the terminal 110 inputs the voice to be recognized into the trained voice recognition model for recognition, and the recognized text is obtained.
  • the terminal 110 inputs the recognized text into the trained semantic analysis model and the sentiment analysis model to obtain the first feature data and the second feature data respectively.
  • the terminal 110 performs word preprocessing on the recognized text to obtain the target text, and inputs the first feature data, the second feature data, and the target text into the text classification model.
  • the text classification model obtains the successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information; the terminal 110 then searches according to the key information to obtain the target retrieval content.
  • the steps of processing the voice on the terminal 110 and finally obtaining the target retrieval content can also be performed on the server 120. Specifically, after the terminal 110 obtains the voice to be recognized, it sends the voice to be recognized to the server 120, where the voice to be recognized is processed on the server 120 to obtain the target retrieval content, and the server 120 returns the target retrieval content to the terminal.
  • a retrieval method is provided. Taking the method applied to the terminal in FIG. 1 as an example, the method includes the following steps:
  • Step 210 Obtain a voice to be recognized.
  • the terminal records the user voice, and uses the user voice as the voice to be recognized.
  • the voice to be recognized is voice data expressed by the user in a relatively colloquial manner.
  • the voice data is used when the user performs retrieval in the enterprise application system, freeing the user's hands for human-computer interaction and automatically retrieving the content the user wants.
  • the operation of triggering the terminal to record the user's voice may be triggered by the user, such as by clicking a control on the terminal, or may be automatic, such as the terminal recording speech whenever it detects a person's voice.
  • the enterprise application system can refer to a pure software system running in the enterprise, or an application system composed of three levels: a standardized management mode, a knowledge-based business model, and an integrated software system, such as an OA collaborative office system, the Ping An CSTS system, or a fingertip office system.
  • Step 220 Input the to-be-recognized speech into the trained speech recognition model for recognition, and obtain the recognized text.
  • the terminal inputs the to-be-recognized speech into a trained speech recognition model for recognition, and obtains the recognized text.
  • the speech recognition model is a speech recognition algorithm that converts speech into text: it recognizes the text content in the speech and outputs the recognized text.
  • Step 230 Input the recognized text into the trained semantic analysis model and sentiment analysis model to obtain first feature data and second feature data respectively; wherein, the first feature data is an analysis of semantic analysis of the recognized text Result; the second feature data is an analysis result of sentiment analysis on the recognized text.
  • the terminal inputs the recognized text into the trained semantic analysis model to obtain the first feature data.
  • the semantic analysis model is a semantic analysis algorithm that analyzes and processes the content of the recognized text based on the contextual relations between words in the recognized text.
  • the first feature data refers to the analysis result of the semantic analysis of the recognized text.
  • the same word often represents different meanings in different contexts. Therefore, it is necessary to combine the meanings of the words adjacent to each word in context to judge and analyze the word, and to determine the meaning of the word in its semantic context.
  • the tasks of semantic analysis are different for different language units.
  • the basic tasks of semantic analysis are word sense disambiguation (WSD) at the word level, semantic role labeling (SRL) at the sentence level, and reference disambiguation, also known as coreference resolution, at the text level.
  • the terminal inputs the recognized text into the trained sentiment analysis model to obtain the second feature data.
  • the sentiment analysis model refers to a sentiment analysis algorithm that judges the emotional coloring of the text, or its attitude of praise or criticism, based on analysis of the recognized text.
  • Sentiment analysis is also called tendency analysis: analyzing a subjective text to judge the speaker's emotional coloring or attitude of praise or criticism.
  • the second feature data refers to the analysis result of sentiment analysis on the recognized text.
  • Step 240 After performing word preprocessing on the recognized text, the target text is obtained, where the word preprocessing includes word segmentation, removal of stop words, and word filtering.
  • Word preprocessing refers to preliminary processing of the recognized text; after word preprocessing, the target text is obtained, which makes subsequent processing more accurate.
  • Word preprocessing may consist of performing word segmentation on the recognized text, removing stop words, and filtering words.
  • Word segmentation refers to segmenting the recognized text into words, and removing stop words means removing words in the recognized text that carry no substantive meaning, such as function words with no special meaning (e.g. the particle 吗 "ma").
  • Word filtering is a way of managing keywords in the recognized text, and is used to filter out undesirable information.
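  • The preprocessing step above can be sketched as follows; the stop-word set, the filter list, and whitespace-based segmentation are illustrative placeholders (a real system would use a proper segmenter, especially for Chinese), not details from this application:

```python
# Hypothetical sketch of word preprocessing: segmentation, stop-word
# removal, and word filtering. Lists here are illustrative only.

STOP_WORDS = {"the", "a", "an", "of", "is"}   # illustrative stop-word list
FILTERED_WORDS = {"badword"}                  # illustrative filter list

def preprocess(recognized_text: str) -> list[str]:
    """Segment the text, drop stop words, then filter unwanted words."""
    tokens = recognized_text.lower().split()          # naive segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [t for t in tokens if t not in FILTERED_WORDS]
    return tokens

print(preprocess("The turnover of the fourth quarter"))
# → ['turnover', 'fourth', 'quarter']
```

The output of this step would be the "target text" that the text classification model consumes.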
  • Step 250 Input the first feature data, the second feature data, and the target text into the text classification model.
  • the text classification model obtains the successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information.
  • the terminal inputs the first feature data, the second feature data, and the target text into the text classification model.
  • the text classification model refers to an algorithm that classifies the target text according to the first feature data and the second feature data.
  • the text classification model obtains a first logical rule that is successfully matched according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information. That is, through the results of semantic analysis and sentiment analysis, the target text is classified and extracted to obtain key information for retrieval.
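  • The matching-and-classification step just described could look roughly like the following sketch; the rule table, its fields, and the token vocabulary are invented for illustration, since the application does not define a concrete rule format:

```python
# Hedged sketch: match (semantic, sentiment) feature data against a
# table of logical rules, then use the matched rule to extract key
# information from the target text. All names are hypothetical.

RULES = [
    # (semantic label, sentiment label) -> which token categories are key
    {"semantic": "finance_query", "sentiment": "neutral", "keep": {"metric", "time"}},
    {"semantic": "complaint", "sentiment": "negative", "keep": {"product"}},
]

def classify(first_feature, second_feature, target_tokens, vocabulary):
    """Find the first rule matching both features; keep matching tokens."""
    for rule in RULES:
        if rule["semantic"] == first_feature and rule["sentiment"] == second_feature:
            return [t for t in target_tokens if vocabulary.get(t) in rule["keep"]]
    return []   # no rule matched successfully

vocab = {"turnover": "metric", "quarter": "time", "fourth": "time"}
print(classify("finance_query", "neutral", ["turnover", "fourth", "quarter"], vocab))
# → ['turnover', 'fourth', 'quarter']
```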
  • Step 260 Retrieve based on the key information to obtain the target retrieval content.
  • the terminal retrieves the target retrieval content according to the key information.
  • speech recognition and natural language processing (NLP) technologies are introduced into the existing retrieval function of the enterprise Internet application system: the user's speech is entered, speech recognition and natural language processing are performed, and the search is completed automatically according to the key information finally obtained, avoiding manual, frequent, and complex information retrieval and greatly improving retrieval efficiency.
  • Natural language processing is an important technology that embodies language intelligence. It is an important branch of artificial intelligence that helps analyze, understand or generate natural language, realize the natural communication between humans and machines, and also help communication between people.
  • the entered user voice refers to any type of voice.
  • a series of information most likely to be needed by the user is retrieved based on any type of voice of the user, which improves the accuracy of retrieval.
  • the types of voice include standardized and colloquial terms.
  • For example, the input voice can be the user saying, in standardized language, "Please check the turnover of the fourth quarter of 2018", or the user saying, in a colloquial expression, "How much money did we make this quarter?" Whether the speech is standardized or colloquial, speech recognition and natural language processing can be performed on it.
  • the key information obtained through text classification model matching and classification is "turnover" and "current quarter", retrieval is performed automatically based on the key information, and the target retrieval content the user needs is finally obtained, such as the specific operating income and operating income sources for each quarter.
  • the speech to be recognized is input into the trained speech recognition model for recognition to obtain the recognized text
  • the recognized text is input into the trained semantic analysis model and sentiment analysis model to obtain the first feature data and the second feature data.
  • after word preprocessing is performed on the recognized text, the target text is obtained.
  • the first feature data, the second feature data, and the target text are input into the text classification model.
  • the text classification model obtains the successfully matched first logical rule according to the first feature data and the second feature data, classifies the target text according to the first logical rule to obtain key information, and retrieval is then performed according to the key information to obtain the target retrieval content.
  • the recognized text is obtained, and then natural language processing is performed on the recognized text through the semantic analysis model, the sentiment analysis model and the text classification model to obtain the key information for retrieval , And finally get the target retrieval content based on the key information.
  • By replacing the traditional keyword input with voice input, the user's input time is saved.
  • the accuracy and comprehensiveness of the key information can be ensured, and retrieval based on the key information can then be performed automatically to accurately retrieve the corresponding target retrieval content, improving the efficiency of information retrieval.
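  • The overall flow summarized above (speech → recognized text → features plus preprocessed text → classification → key information → retrieval) can be sketched end to end with stub models; every function below is an illustrative placeholder, not the application's trained models:

```python
# Illustrative end-to-end sketch of the retrieval flow. Each model is a
# stand-in stub passed in as a callable.

def retrieve(speech, asr, semantic, sentiment, preprocess, classify, search):
    recognized = asr(speech)              # speech recognition model
    first = semantic(recognized)          # first feature data
    second = sentiment(recognized)        # second feature data
    target = preprocess(recognized)       # target text after preprocessing
    key_info = classify(first, second, target)
    return search(key_info)               # target retrieval content

result = retrieve(
    b"...",                               # raw audio stand-in
    asr=lambda s: "turnover this quarter",
    semantic=lambda t: "finance_query",
    sentiment=lambda t: "neutral",
    preprocess=lambda t: t.split(),
    classify=lambda f1, f2, toks: [t for t in toks if t != "this"],
    search=lambda keys: f"results for {' '.join(keys)}",
)
print(result)
# → results for turnover quarter
```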
  • the speech recognition model includes an acoustic model and a language model.
  • step 220 includes:
  • Step 221 Perform signal processing and feature extraction on the audio signal of the voice to be recognized to obtain a feature sequence.
  • Step 222 Input the characteristic sequence into the trained acoustic model and the trained language model to obtain acoustic model scores and language model scores, respectively.
  • Step 223 Perform a decoding search on the acoustic model score and the language model score to obtain the recognized text.
  • the terminal performs signal processing and feature extraction on the audio signal of the voice to be recognized to obtain a feature sequence.
  • the audio signals have characteristic parameters, such as frequency, period, energy, etc. Therefore, signal processing and feature extraction on the voice audio signals can obtain a characteristic sequence.
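  • As a minimal illustration of signal processing and feature extraction, the sketch below frames the audio samples and computes one log-energy feature per frame; real systems use richer features such as MFCCs, and the frame length here is arbitrary:

```python
# Minimal sketch of feature extraction: frame the audio samples and
# compute a per-frame log-energy value. Frame size is illustrative.
import math

def feature_sequence(samples, frame_len=4):
    """Split samples into frames and return one log-energy per frame."""
    features = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame)
        features.append(math.log(energy + 1e-10))  # floor avoids log(0)
    return features

# Quieter first frame, louder second frame:
print(feature_sequence([0.0, 0.1, 0.2, 0.1, 0.5, 0.4, 0.3, 0.2]))
```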
  • the feature sequence includes multiple voice features of the voice to be recognized.
  • the terminal inputs the characteristic sequence into the trained acoustic model and the trained language model to obtain acoustic model scores and language model scores, respectively.
  • the language model score refers to an evaluation, produced by the language model, of how well a candidate recognition result fits the language.
  • the acoustic model score is generated by the acoustic model, which integrates acoustics and phonetics, according to the input feature sequence.
  • the terminal performs a decoding search on the acoustic model score and the language model score to obtain the recognized text.
  • the decoding search refers to the process of matching preset words according to the feature sequence and the score of the feature sequence to obtain the recognized text.
  • a feature sequence is obtained, the acoustic model score and the language model score are obtained, and then the recognized text is obtained through decoding search, so as to realize accurate conversion of speech to text.
  • step 223 further includes:
  • Step 223A obtain the preset hypothesis word sequence.
  • Step 223B Calculate the acoustic model score of the preset hypothesis word sequence according to the feature vector in the feature sequence to obtain acoustic model groups.
  • Step 223C Calculate the language model score of the preset hypothesis word sequence according to the feature vector in the feature sequence to obtain language model groups.
  • Step 223D According to the grouping of the acoustic model and the grouping of the language model, the overall score of the hypothetical word in the preset hypothesis word sequence is calculated, and the hypothetical word with the highest overall score is used as the recognized text.
  • the terminal obtains a preset hypothetical word sequence
  • the preset hypothetical word sequence is a number of preset hypothetical words.
  • the acoustic model grouping refers to comparing the hypothesis words in the hypothesis word sequence against the feature vectors in the feature sequence to obtain the acoustic score set of the hypothesis words.
  • the language model grouping refers to comparing the hypothesis words in the hypothesis word sequence against the feature vectors in the feature sequence to obtain the language score set of the hypothesis words.
  • the overall score of each hypothesis word in the preset hypothesis word sequence is calculated according to the acoustic score set and the language score set, and the hypothesis word with the highest overall score is selected as the recognized text.
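  • The selection of the highest-scoring hypothesis word can be sketched as follows; the log-likelihood values and the language-model weight are made-up illustrations, not values from the application:

```python
# Sketch of hypothesis selection: combine each hypothesis word's
# acoustic score and language score, then pick the highest total.

def best_hypothesis(acoustic_scores, language_scores, lm_weight=0.5):
    """Combine per-hypothesis scores and return the top-scoring word."""
    totals = {
        word: acoustic_scores[word] + lm_weight * language_scores[word]
        for word in acoustic_scores
    }
    return max(totals, key=totals.get)

acoustic = {"turnover": -3.2, "turnip": -5.8}   # illustrative log scores
language = {"turnover": -1.1, "turnip": -7.4}
print(best_hypothesis(acoustic, language))
# → turnover
```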
  • the model to be trained includes the semantic analysis model, the sentiment analysis model, and the text classification model, as shown in FIG. 5, the method further includes:
  • Step 310 Obtain a training sample set.
  • the training sample set includes granular data samples, language data samples, and modal data samples.
  • the granular data samples include granular data features, language data features, and modal data features.
  • Step 320 Obtain the text to be trained, and input the text to be trained into the initial model to be trained to obtain the initial text.
  • Step 330 Adjust the parameters of the initial model to be trained according to the initial text, granular data features, language data features, and modal data features, until convergence conditions are met, to obtain the semantic analysis model, the sentiment analysis model, and the text classification model.
  • the training sample set refers to the big data samples used to train semantic analysis models, sentiment analysis models, and text classification models. Big data samples can be obtained through crawlers or purchased.
  • the training sample set includes granular data samples, language data samples and modal data samples.
  • the granular data sample is detailed and comprehensive multi-granular monolingual data.
  • Multilingual data is information data representing different languages, such as Chinese, English, Korean, Japanese, and dialects of different regions.
  • Multi-modal data is data that represents multiple manifestations of the same thing, similar to the way humans perceive and learn information. From the perspective of a machine, it is equivalent to descriptions of the same thing by different sensors, such as camera, X-ray, and infrared images of the same target in the same scene.
  • the sample to be trained is the sample used for training.
  • the sample to be trained can be a human sentence, or a novel, a paper, or even a large amount of industry data.
  • the speech recognition model includes an acoustic model and a language model, as shown in FIG. 6, and the method further includes:
  • Step 341 Obtain training samples, where the training samples include language features and acoustic features.
  • Step 342 Obtain the training speech to be recognized, and input the training speech to be recognized into the initial language model and the initial acoustic model to obtain the initial language score and the initial acoustic score.
  • Step 343 Adjust the parameters of the initial language model according to the language features and the initial language score, and adjust the parameters of the initial acoustic model according to the acoustic features and the initial acoustic score, until both the initial language model and the initial acoustic model meet the convergence conditions, to obtain the speech recognition model.
  • the training sample refers to the sample data used for speech training, and the training sample includes language features and acoustic features.
  • Linguistic features refer to the features used to distinguish different languages. For example, Chinese has the characteristics of Chinese, and English has the characteristics of English, etc., just as the human ear can recognize different languages according to the characteristics of different national languages.
  • Acoustic feature refers to the feature obtained by combining acoustics and pronunciation.
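  • The "adjust parameters until the convergence condition is met" loop described in the training steps above can be illustrated, under heavy simplification, with a single-parameter model fitted by gradient descent; the model, data, learning rate, and tolerance are toy stand-ins, not the application's actual acoustic or language model training:

```python
# Toy sketch of iterative parameter adjustment with a convergence
# condition: fit y = w * x by gradient descent on squared error.

def train(samples, lr=0.1, tol=1e-6, max_iters=1000):
    """Adjust w until the update step is below tol (convergence)."""
    w = 0.0
    for _ in range(max_iters):
        grad = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        step = lr * grad
        w -= step
        if abs(step) < tol:        # convergence condition met
            break
    return w

print(round(train([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]), 3))
# → 2.0
```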
  • a retrieval device including: a voice acquisition module 510, a voice recognition module 520, a key information confirmation module 530, and a retrieval module 540, wherein:
  • the voice acquisition module 510 is used to acquire the voice to be recognized.
  • the voice recognition module 520 is configured to input the to-be-recognized voice into a trained voice recognition model for recognition to obtain recognized text.
  • the key information confirmation module 530 is configured to input the recognized text into the trained semantic analysis model and the sentiment analysis model to obtain the first feature data and the second feature data respectively, wherein the first feature data is the analysis result of semantic analysis on the recognized text and the second feature data is the analysis result of sentiment analysis on the recognized text; it is also used to obtain the target text after performing word preprocessing on the recognized text, wherein the word preprocessing includes word segmentation, removal of stop words, and word filtering; it is also used to input the first feature data, the second feature data, and the target text into a text classification model, which obtains the successfully matched first logical rule according to the first feature data and the second feature data, and classifies the target text according to the first logical rule to obtain key information.
  • the retrieval module is used to retrieve the target retrieval content according to the key information.
  • the speech recognition model includes an acoustic model and a language model
  • the speech recognition module 520 includes:
  • the feature sequence extraction unit is used to perform signal processing and feature extraction on the audio signal of the voice data to obtain a feature sequence.
  • the score confirmation unit is used to input the feature sequence into the trained acoustic model and the trained language model to obtain the acoustic model score and the language model score respectively.
  • the recognized-text acquisition unit is used to perform a decoding search on the acoustic model score and the language model score to obtain the recognized text.
  • the recognized text obtaining unit further includes:
  • the preset hypothesis word sequence obtaining unit is used to obtain the preset hypothesis word sequence.
  • the score calculation unit is configured to calculate the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain the acoustic model grouping, and to calculate the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain the language model grouping.
  • the recognition text confirmation unit is configured to calculate the overall score of hypothetical words in the preset hypothesis word sequence according to the grouping of the acoustic model and the grouping of the language model, and use the hypothesis word with the highest overall score as the recognized text.
  • Each module in the above retrieval device can be implemented in whole or in part by software, hardware and a combination thereof.
  • the foregoing modules may be embedded in, or independent of, the processor of the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 8.
  • the computer equipment includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer readable instructions.
  • the internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer-readable instructions are executed by the processor to implement a retrieval method.
  • the display screen of the computer device can be a liquid crystal display or an electronic ink display.
  • the input device of the computer device can be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer device, or an external keyboard, touchpad, or mouse.
  • FIG. 8 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • a computer device includes a memory and one or more processors.
  • the memory stores computer readable instructions.
  • when the computer readable instructions are executed by the one or more processors, the steps of the retrieval method provided in any embodiment of the present application are implemented.
  • One or more non-volatile storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors implement the steps of the retrieval method provided in any of the embodiments of the present application.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A retrieval method, comprising: performing speech recognition on a user's colloquial speech, taken as the speech to be recognized, to obtain recognized text; performing natural language processing on the recognized text by means of a semantic analysis model, a sentiment analysis model and a text classification model to obtain key information used for retrieval; and finally obtaining target retrieval content according to the key information.

Description

Retrieval method, apparatus, computer device and storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201910594101.9, entitled "Retrieval method, apparatus, computer device and storage medium" and filed with the Chinese Patent Office on July 3, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to a retrieval method, apparatus, computer device and storage medium.
Background
The rapid development of computer technology and Internet systems has given rise to application systems serving many purposes across industries and positions. At present, when information retrieval is involved in an application system, traditional retrieval methods require the user to make selections and manually fill in keywords in order to retrieve the corresponding content. However, as the number of Internet users, the complexity of business scenarios in daily work, the timeliness requirements of data and the sheer volume of data continue to grow, the retrieval workload of traditional retrieval methods increases accordingly, and traditional information retrieval greatly slows down work efficiency.
Summary
According to various embodiments disclosed in the present application, a retrieval method, apparatus, computer device and storage medium are provided.
A retrieval method, including:
acquiring speech to be recognized;
inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text;
inputting the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, the first feature data being an analysis result of semantic analysis performed on the recognized text and the second feature data being an analysis result of sentiment analysis performed on the recognized text;
performing word preprocessing on the recognized text to obtain target text, the word preprocessing including word segmentation, stop-word removal and word filtering;
inputting the first feature data, the second feature data and the target text into a text classification model, the text classification model obtaining a successfully matched first logic rule according to the first feature data and the second feature data, and classifying the target text according to the first logic rule to obtain key information; and
performing retrieval according to the key information to obtain target retrieval content.
A retrieval device, including:
a speech acquisition module, configured to acquire speech to be recognized;
a speech recognition module, configured to input the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text;
a key information confirmation module, configured to input the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, the first feature data being an analysis result of semantic analysis performed on the recognized text and the second feature data being an analysis result of sentiment analysis performed on the recognized text; further configured to perform word preprocessing on the recognized text to obtain target text, the word preprocessing including word segmentation, stop-word removal and word filtering; and further configured to input the first feature data, the second feature data and the target text into a text classification model, the text classification model obtaining a successfully matched first logic rule according to the first feature data and the second feature data, and classifying the target text according to the first logic rule to obtain key information; and
a retrieval module, configured to perform retrieval according to the key information to obtain target retrieval content.
A computer device, including a memory and one or more processors, the memory storing computer readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
acquiring speech to be recognized;
inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text;
inputting the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, the first feature data being an analysis result of semantic analysis performed on the recognized text and the second feature data being an analysis result of sentiment analysis performed on the recognized text;
performing word preprocessing on the recognized text to obtain target text, the word preprocessing including word segmentation, stop-word removal and word filtering;
inputting the first feature data, the second feature data and the target text into a text classification model, the text classification model obtaining a successfully matched first logic rule according to the first feature data and the second feature data, and classifying the target text according to the first logic rule to obtain key information; and
performing retrieval according to the key information to obtain target retrieval content.
One or more non-volatile storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring speech to be recognized;
inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text;
inputting the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, the first feature data being an analysis result of semantic analysis performed on the recognized text and the second feature data being an analysis result of sentiment analysis performed on the recognized text;
performing word preprocessing on the recognized text to obtain target text, the word preprocessing including word segmentation, stop-word removal and word filtering;
inputting the first feature data, the second feature data and the target text into a text classification model, the text classification model obtaining a successfully matched first logic rule according to the first feature data and the second feature data, and classifying the target text according to the first logic rule to obtain key information; and
performing retrieval according to the key information to obtain target retrieval content.
The details of one or more embodiments of the application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, the drawings and the claims.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
Fig. 1 is an application scenario diagram of the retrieval method according to one or more embodiments.
Fig. 2 is a schematic flowchart of the retrieval method according to one or more embodiments.
Fig. 3 is a schematic flowchart of speech recognition according to one or more embodiments.
Fig. 4 is a schematic flowchart of speech recognition according to one or more embodiments.
Fig. 5 is a schematic flowchart of the training steps of a model to be trained according to one or more embodiments.
Fig. 6 is a schematic flowchart of the training steps of a speech recognition model according to one or more embodiments.
Fig. 7 is a block diagram of a retrieval device according to one or more embodiments.
Fig. 8 is a block diagram of a computer device according to one or more embodiments.
Detailed description
In order to make the technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
The retrieval method provided in this application can be applied in the application environment shown in Fig. 1. Fig. 1 is a diagram of the application environment in which the retrieval method runs in an embodiment. As shown in Fig. 1, the application environment includes a terminal 110 and a server 120. The terminal 110 and the server 120 communicate through a network; the communication network may be a wireless or wired network, such as an IP network or a cellular mobile communication network, and the number of terminals and servers is not limited.
The terminal 110 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers. The terminal 110 acquires the speech to be recognized and inputs it into a trained speech recognition model for recognition to obtain recognized text. The terminal 110 then inputs the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, performs word preprocessing on the recognized text to obtain target text, and inputs the first feature data, the second feature data and the target text into a text classification model. The text classification model obtains a successfully matched first logic rule according to the first feature data and the second feature data, and classifies the target text according to the first logic rule to obtain key information. Finally, the terminal 110 performs retrieval according to the key information to obtain target retrieval content.
In one embodiment, the above steps of processing the speech on the terminal 110 and finally obtaining the target retrieval content may also be performed on the server 120. Specifically, after acquiring the speech to be recognized, the terminal 110 sends it to the server 120, the server 120 processes the speech to obtain the target retrieval content, and the server 120 then returns the target retrieval content to the terminal.
In one embodiment, as shown in Fig. 2, a retrieval method is provided. Taking its application to the terminal in Fig. 1 as an example, the method includes the following steps:
Step 210: acquire the speech to be recognized.
Specifically, the terminal records the user's voice and uses it as the speech to be recognized. The speech to be recognized is relatively colloquial voice data from the user; when the user needs retrieval while using an enterprise application system, this voice data frees the user's hands, enables human-computer interaction, and allows the desired content to be retrieved automatically. The operation that triggers the terminal to record the user's voice may be triggered by the user, for example by clicking a control on the terminal, or performed automatically by the terminal, for example by automatically recording when a human voice is detected. The enterprise application system may be a pure software system running inside an enterprise, or an application system composed of three levels (a standardized management mode, a knowledge-based business model and an integrated software system), such as an OA collaborative office system, the Ping An CSTS system, or a fingertip office system.
Step 220: input the speech to be recognized into the trained speech recognition model for recognition to obtain the recognized text.
Specifically, the terminal inputs the speech to be recognized into the trained speech recognition model for recognition and obtains the recognized text. The speech recognition model mainly converts speech into text: it is a speech recognition algorithm that recognizes the textual content in the speech and produces the recognized text.
Step 230: input the recognized text into the trained semantic analysis model and sentiment analysis model to obtain the first feature data and the second feature data respectively, the first feature data being an analysis result of semantic analysis performed on the recognized text and the second feature data being an analysis result of sentiment analysis performed on the recognized text.
Specifically, the terminal inputs the recognized text into the trained semantic analysis model to obtain the first feature data. The semantic analysis model is a semantic analysis algorithm that builds tasks from the context words in the recognized text to analyze and process its content; the first feature data refers to the analysis result of this semantic analysis. In different semantic contexts the same word often carries different meanings, so each word must be judged and analyzed in combination with the meanings of its neighboring context words, to determine the sense of the word that fits the given context. For different language units, the tasks of semantic analysis differ: at the word level the basic task is word sense disambiguation (WSD), at the sentence level it is semantic role labeling (SRL), and at the discourse level it is reference disambiguation, also called coreference resolution.
Specifically, the terminal inputs the recognized text into the trained sentiment analysis model to obtain the second feature data. The sentiment analysis model is a sentiment analysis algorithm that judges the emotional coloring or the positive/negative attitude of the text based on the recognized text. Sentiment analysis, also called tendency analysis, analyzes a subjective text to judge the speaker's emotional coloring or attitude; the second feature data refers to the analysis result of this sentiment analysis.
Step 240: perform word preprocessing on the recognized text to obtain the target text, the word preprocessing including word segmentation, stop-word removal and word filtering.
Specifically, the terminal performs word preprocessing on the recognized text to obtain the target text. Word preprocessing refers to a preliminary processing of the recognized text; the target text obtained through it is more accurate in subsequent processing. In one embodiment, word preprocessing may consist of word segmentation, stop-word removal and word filtering on the recognized text. Word segmentation splits the recognized text into words; stop-word removal discards words without substantive meaning, for example particles such as "的, 吗, 呢"; word filtering is a way of managing keywords in the recognized text and is used to filter out undesirable information.
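The word preprocessing described above can be sketched as follows. This is a minimal illustration only: the stop-word list, the filter list and the whitespace tokenization (standing in for a real word segmenter) are assumptions made for the example and are not specified by this application.

```python
# Hypothetical stop-word and filter lists for illustration
# (in Chinese text these would contain particles such as "的, 吗, 呢").
STOP_WORDS = {"the", "a", "of", "please"}
FILTER_WORDS = {"badword"}  # "undesirable information" to filter out

def preprocess(recognized_text):
    # Word segmentation: whitespace split stands in for a real segmenter.
    tokens = recognized_text.lower().split()
    # Stop-word removal: drop words without substantive meaning.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Word filtering: drop undesirable words.
    tokens = [t for t in tokens if t not in FILTER_WORDS]
    return tokens

print(preprocess("Please check the turnover of the fourth quarter"))
# → ['check', 'turnover', 'fourth', 'quarter']
```

The three list comprehensions correspond one-to-one to the three preprocessing operations named in step 240.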
Step 250: input the first feature data, the second feature data and the target text into the text classification model; the text classification model obtains a successfully matched first logic rule according to the first feature data and the second feature data, and classifies the target text according to the first logic rule to obtain the key information.
Specifically, the terminal inputs the first feature data, the second feature data and the target text into the text classification model. The text classification model is an algorithm that classifies the target text according to the first feature data and the second feature data: it obtains a successfully matched first logic rule from the two kinds of feature data, and classifies the target text according to that rule to obtain the key information. In other words, the target text is classified and extracted according to the semantic analysis result and the sentiment analysis result, yielding the key information used for retrieval.
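A hedged sketch of this rule-matching step is shown below. The rule table, the feature labels and the category lexicon are all invented for illustration; the application does not specify the form of its logic rules.

```python
# Hypothetical rule table: a (semantic label, sentiment label) pair that
# matches the first and second feature data selects the word categories
# to extract from the target text.
RULES = [
    (("finance_query", "neutral"), {"metric", "time"}),
    (("complaint", "negative"), {"product"}),
]

# Hypothetical lexicon mapping words to categories.
LEXICON = {"turnover": "metric", "quarter": "time", "revenue": "metric"}

def classify(first_feature, second_feature, target_tokens):
    # Find the first logic rule matched by the two feature data.
    for condition, categories in RULES:
        if condition == (first_feature, second_feature):
            # Classify the target text: keep words in the selected categories.
            return [t for t in target_tokens if LEXICON.get(t) in categories]
    return target_tokens  # no rule matched: keep everything

print(classify("finance_query", "neutral", ["check", "turnover", "quarter"]))
# → ['turnover', 'quarter']
```

The surviving tokens play the role of the key information passed on to step 260.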
Step 260: perform retrieval according to the key information to obtain the target retrieval content.
Specifically, the terminal performs retrieval according to the key information to obtain the target retrieval content. In one embodiment, speech recognition and natural language processing (NLP) technologies are introduced into the existing retrieval function of an enterprise Internet application system: the user's speech is recorded, speech recognition and natural language processing are performed, and the search is completed automatically according to the key information finally obtained, avoiding frequent manual retrieval of complex information and greatly improving retrieval efficiency.
NLP (Natural Language Processing) is a sub-field of artificial intelligence (AI) and plays a role in the overall artificial intelligence system. Natural language processing is an important technology embodying language intelligence; as an important branch of artificial intelligence, it helps analyze, understand or generate natural language, realizes natural communication between humans and machines, and also helps communication between people.
The recorded user speech may be of any type; based on any type of user speech, the series of information the user most likely needs is retrieved, which improves the accuracy of retrieval. The types of speech include standardized expressions and colloquial expressions. In one embodiment, for example, the recorded speech may be the user speaking in standardized language, such as "Please check the turnover of the fourth quarter of 2018", or in a colloquial way, such as "How much money did we make this quarter?". Whether the speech is standardized or colloquial, speech recognition and natural language processing can be performed on it; through matching and classification by the text classification model, the key information obtained is "turnover and the current quarter", retrieval is then performed automatically according to this key information, and the target retrieval content the user needs is finally obtained, such as "the specific operating income of each quarter and the sources of that income".
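The final retrieval step can be sketched as a simple match of the key information against stored records. The record store and the matching strategy (all keywords must appear) are illustrative assumptions; a real system would query the application system's search backend.

```python
# Hypothetical record store standing in for the application system's data.
RECORDS = [
    "Q4 2018 turnover: 1.2M, main source: insurance products",
    "Q4 2018 headcount report",
    "Q3 2018 turnover: 0.9M",
]

def retrieve(key_info):
    # Return records that contain every piece of key information.
    return [r for r in RECORDS if all(k in r for k in key_info)]

print(retrieve(["turnover", "Q4"]))
# → ['Q4 2018 turnover: 1.2M, main source: insurance products']
```

With key information such as "turnover" and the quarter, only the matching record is returned as the target retrieval content.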
In this embodiment, the speech to be recognized is acquired and input into the trained speech recognition model for recognition to obtain the recognized text; the recognized text is input into the trained semantic analysis model and sentiment analysis model to obtain the first feature data and the second feature data; word preprocessing is performed on the recognized text to obtain the target text; the first feature data, the second feature data and the target text are input into the text classification model, which obtains a successfully matched first logic rule according to the first feature data and the second feature data and classifies the target text according to the first logic rule to obtain the key information; and retrieval is performed according to the key information to obtain the target retrieval content. The user's colloquial speech is taken as the speech to be recognized and speech recognition is performed on it to obtain the recognized text; natural language processing is then performed on the recognized text through the semantic analysis model, the sentiment analysis model and the text classification model to obtain the key information used for retrieval; finally, the target retrieval content is obtained from the key information. Replacing traditional keyword input with voice input saves the user's input time; natural language processing ensures the accuracy and comprehensiveness of the key information; and automatic retrieval based on the key information accurately retrieves the corresponding target retrieval content, improving the efficiency of information retrieval.
In one embodiment, the speech recognition model includes an acoustic model and a language model. As shown in Fig. 3, step 220 includes:
Step 221: perform signal processing and feature extraction on the audio signal of the speech to be recognized to obtain a feature sequence.
Step 222: input the feature sequence into the trained acoustic model and the trained language model to obtain an acoustic model score and a language model score respectively.
Step 223: perform a decoding search on the acoustic model score and the language model score to obtain the recognized text.
Specifically, the terminal performs signal processing and feature extraction on the audio signal of the speech to be recognized to obtain the feature sequence. It can be understood that the audio signals of different speech differ: an audio signal has characteristic parameters such as frequency, period and energy, so signal processing and feature extraction on the speech audio signal yield a feature sequence. The feature sequence contains multiple speech features of the speech to be recognized.
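As a minimal sketch of this step, the signal can be cut into frames and one feature computed per frame (here short-time energy, one of the characteristic parameters mentioned above). Real systems typically extract richer features such as filter-bank or MFCC vectors; the frame length below is an arbitrary example.

```python
# Frame the signal and compute short-time energy per frame, yielding a
# toy "feature sequence" (one scalar feature per frame).
def feature_sequence(signal, frame_len=4):
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    return [sum(x * x for x in f) for f in frames]

print(feature_sequence([0.0, 1.0, 0.0, -1.0, 2.0, 0.0, 0.0, 0.0]))
# → [2.0, 4.0]
```

Each element of the returned list corresponds to one frame of the audio; a real feature sequence would hold a vector per frame rather than a single energy value.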
Specifically, the terminal inputs the feature sequence into the trained acoustic model and the trained language model to obtain the acoustic model score and the language model score respectively. The language model score is used to evaluate the quality of candidate word sequences and to analyze the recognition result of speech recognition. The acoustic model score is the score generated from the input feature sequence by the acoustic model, which integrates acoustics and pronunciation.
Specifically, the terminal performs a decoding search on the acoustic model score and the language model score to obtain the recognized text. The decoding search refers to the process of matching preset words according to the feature sequence and its scores to obtain the recognized text.
In this embodiment, signal processing and feature extraction are performed on the speech to be recognized to obtain the feature sequence, the acoustic model score and the language model score are obtained, and the recognized text is then obtained through the decoding search, realizing accurate conversion from speech to text.
在其中一个实施例中,如图4所示,步骤223还包括:In one of the embodiments, as shown in FIG. 4, step 223 further includes:
步骤223A,获取预设假设词序列。 Step 223A, obtain the preset hypothesis word sequence.
步骤223B，根据所述特征序列中的特征向量计算所述预设假设词序列的所述声学模型得分，得到声学模型得分组。 Step 223B: Calculate the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group.
步骤223C，根据所述特征序列中的特征向量计算所述预设假设词序列的所述语言模型得分，得到语言模型得分组。 Step 223C: Calculate the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group.
步骤223D，根据所述声学模型得分组和语言模型得分组，计算所述预设假设词序列中假设词的总体得分，将所述总体得分最高的假设词作为所述识别文本。 Step 223D: Calculate the overall score of each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and use the hypothesis word with the highest overall score as the recognized text.
具体地，终端获取预设假设词序列，预设假设词序列是预先设置的若干假设词。声学模型得分组是指假设词序列中的假设词与特征序列中的特征向量进行对比计算，得到的假设词的声学得分集合。语言模型得分组是指假设词序列中的假设词与特征序列中的特征向量进行对比计算，得到的假设词的语言得分集合。根据声学得分集合和语言得分集合计算所述预设假设词序列中每一个假设词的总体得分，并选择总体得分最高的假设词作为识别文本。Specifically, the terminal obtains a preset hypothesis word sequence, which is a set of preset hypothesis words. The acoustic model score group is the set of acoustic scores obtained by comparing each hypothesis word in the hypothesis word sequence against the feature vectors in the feature sequence; the language model score group is the set of language scores obtained in the same way. The overall score of each hypothesis word in the preset hypothesis word sequence is then calculated from the acoustic score set and the language score set, and the hypothesis word with the highest overall score is selected as the recognized text.
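The overall-score computation described above can be pictured with the following sketch. The log-domain addition of the two scores and the language-model weight are illustrative assumptions, not details taken from this disclosure:

```python
import math

def pick_best_hypothesis(hypotheses, acoustic_scores, language_scores,
                         lm_weight=1.0):
    """Toy decoding search: combine per-hypothesis acoustic and language
    model scores (log domain) and return the hypothesis with the
    highest overall score. The weighting scheme is assumed."""
    best_word, best_total = None, -math.inf
    for word in hypotheses:
        total = acoustic_scores[word] + lm_weight * language_scores[word]
        if total > best_total:
            best_word, best_total = word, total
    return best_word, best_total

word, score = pick_best_hypothesis(
    ["hello", "hollow"],
    acoustic_scores={"hello": -3.2, "hollow": -3.5},
    language_scores={"hello": -1.0, "hollow": -4.0},
)
```

Here the acoustically similar hypothesis with the better language score wins, which is the point of combining the two score groups before selecting the recognized text.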
在其中一个实施例中,待训练模型包括所述语义分析模型、所述情感分析模型和所述文本分类模型,如图5示,方法还包括:In one of the embodiments, the model to be trained includes the semantic analysis model, the sentiment analysis model, and the text classification model, as shown in FIG. 5, the method further includes:
步骤310,获取训练样本集,所述训练样本集包括粒度数据样本、语言数据样本和模态数据样本,所述粒度数据样本包括粒度数据特征、语言数据特征、模态数据特征。Step 310: Obtain a training sample set. The training sample set includes granular data samples, language data samples, and modal data samples. The granular data samples include granular data features, language data features, and modal data features.
步骤320,获取待训练文本,将待训练文本输入初始待训练模型,得到初始文本。Step 320: Obtain the text to be trained, and input the text to be trained into the initial model to be trained to obtain the initial text.
步骤330,根据初始文本、粒度数据特征、语言数据特征和模态数据特征对初始待训练模型进行参数调整,直到满足收敛条件,得到所述语义分析模型、所述情感分析模型、所述文本分类模型。Step 330: Adjust the parameters of the initial model to be trained according to the initial text, granular data features, language data features, and modal data features, until convergence conditions are met, to obtain the semantic analysis model, the sentiment analysis model, and the text classification model.
训练样本集是指用于训练语义分析模型、情感分析模型和文本分类模型的大数据样本，大数据样本可以通过爬虫或购买得到。训练样本集包括粒度数据样本、语言数据样本和模态数据样本。粒度数据样本是详细全面的多粒度单语数据。多语言数据是代表不同语言的信息数据，比如中文、英文、韩语、日语、不同地区方言等。多模态数据是表示同一个事物的多种表现形态的数据，类似于人类感知学习的信息形式，站在机器的角度上说相当于不同传感器对同一事物的描述，比如相机、X光、红外线对同一个场景同一个目标照出的图片。The training sample set refers to the big data samples used to train the semantic analysis model, the sentiment analysis model, and the text classification model; big data samples can be obtained through crawlers or purchase. The training sample set includes granular data samples, language data samples, and modal data samples. A granular data sample is detailed and comprehensive multi-granularity monolingual data. Multilingual data is information data representing different languages, such as Chinese, English, Korean, Japanese, and dialects of different regions. Multi-modal data is data representing multiple manifestations of the same thing, similar to the information forms of human perceptual learning; from the perspective of a machine, it is equivalent to descriptions of the same thing by different sensors, for example pictures of the same target in the same scene taken by a camera, X-ray, and infrared.
待训练样本是用于训练的样本,待训练样本可以是人类的一句话,或者一篇小说,一篇论文,乃至大量的行业数据。通过不断训练调整初始待训练模型的参数,直到满足收敛条件,得到语义分析模型、情感分析模型和文本分类模型。The sample to be trained is the sample used for training. The sample to be trained can be a human sentence, or a novel, a paper, or even a large amount of industry data. Through continuous training and adjusting the parameters of the initial model to be trained, until the convergence condition is met, a semantic analysis model, a sentiment analysis model and a text classification model are obtained.
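A minimal sketch of "adjust the parameters of the initial model until the convergence condition is met" follows. The gradient-descent update and the tolerance-based stopping rule are illustrative assumptions, and the quadratic objective stands in for the actual semantic analysis, sentiment analysis, and text classification models:

```python
def train_until_converged(params, grad_fn, lr=0.1, tol=1e-6, max_iter=1000):
    """Generic parameter-adjustment loop: repeatedly update each
    parameter against its gradient and stop once the update becomes
    smaller than a tolerance (the 'convergence condition')."""
    for _ in range(max_iter):
        grads = [grad_fn(p) for p in params]
        new_params = [p - lr * g for p, g in zip(params, grads)]
        # Convergence condition: largest parameter change below tol.
        if max(abs(n - p) for n, p in zip(new_params, params)) < tol:
            return new_params
        params = new_params
    return params

# Minimize f(p) = (p - 3)^2 for each parameter; gradient is 2(p - 3).
result = train_until_converged([0.0, 10.0], grad_fn=lambda p: 2 * (p - 3))
```

Both parameters converge to the minimizer at 3, illustrating the loop structure; the real training step would instead compare the initial text against the granular, language, and modal data features.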
在其中一个实施例中,所述语音识别模型包括声学模型和语言模型,如图6示,方法还包括:In one of the embodiments, the speech recognition model includes an acoustic model and a language model, as shown in FIG. 6, and the method further includes:
步骤341,获取训练样本,所述训练样本包括语言特征和声学特征。Step 341: Obtain training samples, where the training samples include language features and acoustic features.
步骤342，获取待识别训练语音，将待识别训练语音输入初始语言模型，得到初始语言得分；将待识别训练语音输入初始声学模型，得到初始声学得分。Step 342: Obtain the training speech to be recognized, input it into the initial language model to obtain an initial language score, and input it into the initial acoustic model to obtain an initial acoustic score.
步骤343，根据语言特征、初始语言得分对初始语言模型进行参数调整，根据声学特征、初始声学得分对初始声学模型进行参数调整，直到初始语言模型和初始声学模型都满足收敛条件，得到语音识别模型。Step 343: Adjust the parameters of the initial language model according to the language features and the initial language score, and adjust the parameters of the initial acoustic model according to the acoustic features and the initial acoustic score, until both the initial language model and the initial acoustic model meet the convergence condition, to obtain the speech recognition model.
训练样本是指用来语音训练的样本数据,训练样本包括语言特征和声学特征。语言特征是指用来区分不同的语言的特征,比如中文具有中文的特征,英文具有英文的特征等等,就像人耳能够根据不同国家语言的特色能够识别出不同的语言一样。声学特征是指将声学和发音学结合所得到的特征。The training sample refers to the sample data used for speech training, and the training sample includes language features and acoustic features. Linguistic features refer to the features used to distinguish different languages. For example, Chinese has the characteristics of Chinese, and English has the characteristics of English, etc., just as the human ear can recognize different languages according to the characteristics of different national languages. Acoustic feature refers to the feature obtained by combining acoustics and pronunciation.
应该理解的是，虽然图2-6的流程图中的各个步骤按照箭头的指示依次显示，但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明，这些步骤的执行并没有严格的顺序限制，这些步骤可以以其它的顺序执行。而且，图2-6中的至少一部分步骤可以包括多个子步骤或者多个阶段，这些子步骤或者阶段并不必然是在同一时刻执行完成，而是可以在不同的时刻执行，这些子步骤或者阶段的执行顺序也不必然是依次进行，而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the steps in the flowcharts of FIGS. 2-6 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-6 may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily completed at the same moment but can be executed at different moments, and their execution order is not necessarily sequential, as they may be performed in turn or alternately with other steps or with at least a part of the sub-steps or stages of other steps.
在其中一个实施例中，如图7所示，提供了一种检索装置，包括：语音获取模块510、语音识别模块520、关键信息确认模块530和检索模块540，其中：In one of the embodiments, as shown in FIG. 7, a retrieval apparatus is provided, including: a voice acquisition module 510, a voice recognition module 520, a key information confirmation module 530, and a retrieval module 540, wherein:
语音获取模块510,用于获取待识别语音。The voice acquisition module 510 is used to acquire the voice to be recognized.
语音识别模块520,用于将所述待识别语音输入已训练的语音识别模型中进行识别,得到识别文本。The voice recognition module 520 is configured to input the to-be-recognized voice into a trained voice recognition model for recognition to obtain recognized text.
关键信息确认模块530，用于将所述识别文本输入已训练的语义分析模型和情感分析模型中，分别得到第一特征数据和第二特征数据，其中，所述第一特征数据为对所述识别文本进行语义分析的分析结果；所述第二特征数据为对所述识别文本进行情感分析的分析结果；还用于对所述识别文本进行词语预处理后，得到目标文本，其中，所述词语预处理包括分词、去除停用词、词语过滤；还用于将所述第一特征数据、第二特征数据、目标文本输入文本分类模型中，所述文本分类模型根据所述第一特征数据和第二特征数据得到匹配成功的第一逻辑规则，根据所述第一逻辑规则对所述目标文本进行分类处理，得到关键信息。The key information confirmation module 530 is configured to input the recognized text into the trained semantic analysis model and the trained sentiment analysis model to obtain first feature data and second feature data, respectively, where the first feature data is the analysis result of performing semantic analysis on the recognized text and the second feature data is the analysis result of performing sentiment analysis on the recognized text; it is further configured to perform word preprocessing on the recognized text to obtain a target text, where the word preprocessing includes word segmentation, stop word removal, and word filtering; and it is further configured to input the first feature data, the second feature data, and the target text into a text classification model, which obtains a successfully matched first logic rule according to the first feature data and the second feature data and classifies the target text according to the first logic rule to obtain key information.
检索模块540，用于根据所述关键信息进行检索得到目标检索内容。The retrieval module 540 is configured to perform retrieval according to the key information to obtain the target retrieval content.
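The rule matching performed by the key information confirmation module can be pictured with the following sketch. The rule structure (a semantic label, a sentiment label, and a keyword set) is an assumed simplification of the first logic rule described above, not the patent's actual data layout:

```python
def classify_key_info(first_feature, second_feature, target_words, rules):
    """Find the first logic rule whose semantic/sentiment conditions
    match the feature data, then use that rule's keyword set to pick
    key information out of the preprocessed target text."""
    for rule in rules:
        if (rule["semantic"] == first_feature
                and rule["sentiment"] == second_feature):
            # Classify: keep only target-text words the rule selects.
            return [w for w in target_words if w in rule["keywords"]]
    return []

rules = [
    {"semantic": "query", "sentiment": "neutral",
     "keywords": {"insurance", "premium"}},
    {"semantic": "complaint", "sentiment": "negative",
     "keywords": {"refund", "delay"}},
]
key_info = classify_key_info(
    "query", "neutral", ["check", "insurance", "premium", "please"], rules)
```

The matched rule's keyword filter yields the key information that the retrieval module would then use to obtain the target retrieval content.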
在其中一个实施例中，所述语音识别模型包括声学模型和语言模型，语音识别模块520包括：In one of the embodiments, the speech recognition model includes an acoustic model and a language model, and the speech recognition module 520 includes:
特征序列提取单元,用于对所述语音数据的音频信号进行信号处理和特征提取,得到特征序列。The feature sequence extraction unit is used to perform signal processing and feature extraction on the audio signal of the voice data to obtain a feature sequence.
得分确认单元,用于将所述特征序列输入已训练的声学模型和已训练的语言模型中,分别得到声学模型得分和语言模型得分。The score confirmation unit is used to input the feature sequence into the trained acoustic model and the trained language model to obtain the acoustic model score and the language model score respectively.
识别文本获取单元，用于对所述声学模型得分和所述语言模型得分进行解码搜索，得到所述识别文本。The recognized text acquisition unit is configured to perform a decoding search on the acoustic model score and the language model score to obtain the recognized text.
在其中一个实施例中,所述识别文本获取单元还包括:In one of the embodiments, the recognized text obtaining unit further includes:
预设假设词序列获取单元,用于获取预设假设词序列。The preset hypothesis word sequence obtaining unit is used to obtain the preset hypothesis word sequence.
得分计算单元，用于根据所述特征序列中的特征向量计算所述预设假设词序列的所述声学模型得分，得到声学模型得分组；还用于根据所述特征序列中的特征向量计算所述预设假设词序列的所述语言模型得分，得到语言模型得分组。The score calculation unit is configured to calculate the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group, and to calculate the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group.
识别文本确认单元，用于根据所述声学模型得分组和语言模型得分组，计算所述预设假设词序列中假设词的总体得分，将所述总体得分最高的假设词作为所述识别文本。The recognized text confirmation unit is configured to calculate the overall score of each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and use the hypothesis word with the highest overall score as the recognized text.
关于检索装置的具体限定可以参见上文中对于检索方法的限定,在此不再赘述。上述检索装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific limitation of the retrieval device, please refer to the above limitation on the retrieval method, which will not be repeated here. Each module in the above retrieval device can be implemented in whole or in part by software, hardware and a combination thereof. The foregoing modules may be embedded in the form of hardware or independent of the processor in the computer device, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the foregoing modules.
在其中一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图8所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、显示屏和输入装置。该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机可读指令。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令被处理器执行时以实现一种检索方法。该计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。In one of the embodiments, a computer device is provided. The computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 8. The computer equipment includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus. The processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and computer readable instructions. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by the processor to implement a retrieval method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, or it can be a button, a trackball or a touchpad set on the housing of the computer equipment , It can also be an external keyboard, touchpad, or mouse.
本领域技术人员可以理解，图8中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定，具体的计算机设备可以包括比图中所示更多或更少的部件，或者组合某些部件，或者具有不同的部件布置。Those skilled in the art can understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; the specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
一种计算机设备,包括存储器和一个或多个处理器,存储器中存储有计算机可读指令,计算机可读指令被处理器执行时实现本申请任意一个实施例中提供的检索方法的步骤。A computer device includes a memory and one or more processors. The memory stores computer readable instructions. When the computer readable instructions are executed by the processor, the steps of the retrieval method provided in any embodiment of the present application are implemented.
一个或多个存储有计算机可读指令的非易失性存储介质，计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器实现本申请任意一个实施例中提供的检索方法的步骤。One or more non-volatile storage media storing computer-readable instructions, where the computer-readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps of the retrieval method provided in any embodiment of the present application.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机可读指令来指令相关的硬件来完成，所述的计算机可读指令可存储于一非易失性计算机可读取存储介质中，该计算机可读指令在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用，均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限，RAM以多种形式可得，诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
以上实施例的各技术特征可以进行任意的组合，为使描述简洁，未对上述实施例中的各个技术特征所有可能的组合都进行描述，然而，只要这些技术特征的组合不存在矛盾，都应当认为是本说明书记载的范围。The technical features of the above embodiments can be combined arbitrarily. For conciseness of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in the combinations of these technical features, they should be considered within the scope of this specification.
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims (18)

  1. 一种检索方法,包括:A retrieval method including:
    获取待识别语音;Obtain the voice to be recognized;
    将所述待识别语音输入已训练的语音识别模型中进行识别,得到识别文本;Input the to-be-recognized speech into a trained speech recognition model for recognition to obtain a recognized text;
    将所述识别文本输入已训练的语义分析模型和情感分析模型中，分别得到第一特征数据和第二特征数据；其中，所述第一特征数据为对所述识别文本进行语义分析的分析结果；所述第二特征数据为对所述识别文本进行情感分析的分析结果；Inputting the recognized text into the trained semantic analysis model and the trained sentiment analysis model to obtain first feature data and second feature data, respectively, where the first feature data is the analysis result of performing semantic analysis on the recognized text and the second feature data is the analysis result of performing sentiment analysis on the recognized text;
    对所述识别文本进行词语预处理后，得到目标文本；其中，所述词语预处理包括分词、去除停用词、词语过滤；After performing word preprocessing on the recognized text, a target text is obtained, where the word preprocessing includes word segmentation, stop word removal, and word filtering;
    将所述第一特征数据、第二特征数据、目标文本输入文本分类模型中，所述文本分类模型根据所述第一特征数据和第二特征数据得到匹配成功的第一逻辑规则，根据所述第一逻辑规则对所述目标文本进行分类处理，得到关键信息；及Inputting the first feature data, the second feature data, and the target text into a text classification model, where the text classification model obtains a successfully matched first logic rule according to the first feature data and the second feature data, and classifies the target text according to the first logic rule to obtain key information; and
    根据所述关键信息进行检索得到目标检索内容。The target retrieval content is obtained by searching according to the key information.
  2. 根据权利要求1所述的方法，其特征在于，所述语音识别模型包括声学模型和语言模型，所述将所述待识别语音输入已训练的语音识别模型中进行识别，得到识别文本的步骤，包括：The method according to claim 1, wherein the speech recognition model includes an acoustic model and a language model, and the step of inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text includes:
    对所述待识别语音的音频信号进行信号处理和特征提取,得到特征序列;Signal processing and feature extraction on the audio signal of the voice to be recognized to obtain a feature sequence;
    将所述特征序列输入已训练的声学模型和已训练的语言模型中,分别得到声学模型得分和语言模型得分;及Input the feature sequence into the trained acoustic model and the trained language model to obtain acoustic model scores and language model scores respectively; and
    对所述声学模型得分和所述语言模型得分进行解码搜索，得到所述识别文本。A decoding search is performed on the acoustic model score and the language model score to obtain the recognized text.
  3. 根据权利要求2所述的方法,其特征在于,所述对所述声学模型得分和所述语音模型得分进行解码搜索,得到识别文本的步骤,包括:The method according to claim 2, wherein the step of decoding and searching the acoustic model score and the speech model score to obtain the recognized text comprises:
    获取预设假设词序列;Obtain the presupposition word sequence;
    根据所述特征序列中的特征向量计算所述预设假设词序列的所述声学模型得分，得到声学模型得分组；Calculating the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group;
    根据所述特征序列中的特征向量计算所述预设假设词序列的所述语言模型得分，得到语言模型得分组；及Calculating the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group; and
    根据所述声学模型得分组和语言模型得分组，计算所述预设假设词序列中假设词的总体得分，将所述总体得分最高的假设词作为所述识别文本。Calculating the overall score of each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and using the hypothesis word with the highest overall score as the recognized text.
  4. 根据权利要求1所述的方法,其特征在于,待训练模型包括所述语义分析模型、所述情感分析模型和所述文本分类模型,所述待训练模型的训练步骤,包括:The method according to claim 1, wherein the model to be trained comprises the semantic analysis model, the sentiment analysis model and the text classification model, and the training step of the model to be trained comprises:
    获取训练样本集,所述训练样本集包括粒度数据样本、语言数据样本和模态数据样本,所述粒度数据样本包括粒度数据特征、语言数据特征、模态数据特征;Acquiring a training sample set, the training sample set including granular data samples, language data samples and modal data samples, the granular data samples including granular data features, language data features, and modal data features;
    获取待训练文本,将待训练文本输入初始待训练模型,得到初始文本;及Obtain the text to be trained, and input the text to be trained into the initial model to be trained to obtain the initial text; and
    根据所述初始文本、所述粒度数据特征、所述语言数据特征、所述模态数据特征对所述初始待训练模型进行参数调整，直到满足收敛条件，得到所述语义分析模型、所述情感分析模型、所述文本分类模型。Adjusting the parameters of the initial model to be trained according to the initial text, the granular data features, the language data features, and the modal data features until the convergence condition is met, to obtain the semantic analysis model, the sentiment analysis model, and the text classification model.
  5. 根据权利要求1所述的方法,其特征在于,所述语音识别模型包括声学模型和语言模型,所述语音识别模型的训练步骤包括:The method according to claim 1, wherein the speech recognition model includes an acoustic model and a language model, and the training step of the speech recognition model comprises:
    获取训练样本,所述训练样本包括语言特征和声学特征;Acquiring training samples, the training samples including language features and acoustic features;
    获取待识别训练语音,将待识别训练语音输入初始语言模型,得到初始语言得分;Obtain the training speech to be recognized, and input the training speech to be recognized into the initial language model to obtain the initial language score;
    获取待识别训练语音,将待识别训练语音输入初始声学模型,得到初始声学得分;Obtain the training speech to be recognized, and input the training speech to be recognized into the initial acoustic model to obtain the initial acoustic score;
    根据所述语言特征、所述初始语言得分对所述初始语言模型进行参数调整，根据所述声学特征、所述初始声学得分对所述初始声学模型进行参数调整，直到所述初始语言模型和所述初始声学模型都满足收敛条件，得到所述语音识别模型。Adjusting the parameters of the initial language model according to the language features and the initial language score, and adjusting the parameters of the initial acoustic model according to the acoustic features and the initial acoustic score, until both the initial language model and the initial acoustic model meet the convergence condition, to obtain the speech recognition model.
  6. 一种检索装置,包括:A retrieval device, including:
    语音获取模块,用于获取待识别语音;The voice acquisition module is used to acquire the voice to be recognized;
    语音识别模块,用于将所述待识别语音输入已训练的语音识别模型中进行识别,得到识别文本;A speech recognition module, configured to input the to-be-recognized speech into a trained speech recognition model for recognition to obtain recognized text;
    关键信息确认模块，用于将所述识别文本输入已训练的语义分析模型和情感分析模型中，分别得到第一特征数据和第二特征数据，其中，所述第一特征数据为对所述识别文本进行语义分析的分析结果；所述第二特征数据为对所述识别文本进行情感分析的分析结果；还用于对所述识别文本进行词语预处理后，得到目标文本；其中，所述词语预处理包括分词、去除停用词、词语过滤；还用于将所述第一特征数据、第二特征数据、目标文本输入文本分类模型中，所述文本分类模型根据所述第一特征数据和第二特征数据得到匹配成功的第一逻辑规则，根据所述第一逻辑规则对所述目标文本进行分类处理，得到关键信息；The key information confirmation module is configured to input the recognized text into the trained semantic analysis model and the trained sentiment analysis model to obtain first feature data and second feature data, respectively, where the first feature data is the analysis result of performing semantic analysis on the recognized text and the second feature data is the analysis result of performing sentiment analysis on the recognized text; it is further configured to perform word preprocessing on the recognized text to obtain a target text, where the word preprocessing includes word segmentation, stop word removal, and word filtering; and it is further configured to input the first feature data, the second feature data, and the target text into a text classification model, which obtains a successfully matched first logic rule according to the first feature data and the second feature data and classifies the target text according to the first logic rule to obtain key information;
    检索模块,用于根据所述关键信息进行检索得到目标检索内容。The retrieval module is used to retrieve the target retrieval content according to the key information.
  7. 根据权利要求6所述的装置,其特征在于,所述语音识别模型包括声学模型和语言模型,所述语音识别模块包括:The device according to claim 6, wherein the speech recognition model comprises an acoustic model and a language model, and the speech recognition module comprises:
    特征序列提取单元,用于对所述语音数据的音频信号进行信号处理和特征提取,得到特征序列;The feature sequence extraction unit is configured to perform signal processing and feature extraction on the audio signal of the voice data to obtain a feature sequence;
    得分确认单元,用于将所述特征序列输入已训练的声学模型和已训练的语言模型中,分别得到声学模型得分和语言模型得分;The score confirmation unit is used to input the feature sequence into the trained acoustic model and the trained language model to obtain the acoustic model score and the language model score respectively;
    识别文本获取单元，用于对所述声学模型得分和所述语言模型得分进行解码搜索，得到所述识别文本。The recognized text acquisition unit is configured to perform a decoding search on the acoustic model score and the language model score to obtain the recognized text.
  8. 根据权利要求7所述的装置,其特征在于,所述识别文本获取单元包括:The device according to claim 7, wherein the recognition text obtaining unit comprises:
    预设假设词序列获取单元,用于获取预设假设词序列;The preset hypothesis word sequence obtaining unit is used to obtain the preset hypothesis word sequence;
    得分计算单元，用于根据所述特征序列中的特征向量计算所述预设假设词序列的所述声学模型得分，得到声学模型得分组；还用于根据所述特征序列中的特征向量计算所述预设假设词序列的所述语言模型得分，得到语言模型得分组；The score calculation unit is configured to calculate the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group, and to calculate the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group;
    识别文本确认单元，用于根据所述声学模型得分组和语言模型得分组，计算所述预设假设词序列中假设词的总体得分，将所述总体得分最高的假设词作为所述识别文本。The recognized text confirmation unit is configured to calculate the overall score of each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and use the hypothesis word with the highest overall score as the recognized text.
  9. 一种计算机设备，包括存储器及一个或多个处理器，所述存储器中储存有计算机可读指令，所述计算机可读指令被所述一个或多个处理器执行时，使得所述一个或多个处理器执行以下步骤：A computer device, including a memory and one or more processors, where the memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:
    获取待识别语音;Obtain the voice to be recognized;
    将所述待识别语音输入已训练的语音识别模型中进行识别,得到识别文本;Input the to-be-recognized speech into a trained speech recognition model for recognition to obtain a recognized text;
    将所述识别文本输入已训练的语义分析模型和情感分析模型中，分别得到第一特征数据和第二特征数据；其中，所述第一特征数据为对所述识别文本进行语义分析的分析结果；所述第二特征数据为对所述识别文本进行情感分析的分析结果；Inputting the recognized text into the trained semantic analysis model and the trained sentiment analysis model to obtain first feature data and second feature data, respectively, where the first feature data is the analysis result of performing semantic analysis on the recognized text and the second feature data is the analysis result of performing sentiment analysis on the recognized text;
    对所述识别文本进行词语预处理后，得到目标文本；其中，所述词语预处理包括分词、去除停用词、词语过滤；After performing word preprocessing on the recognized text, a target text is obtained, where the word preprocessing includes word segmentation, stop word removal, and word filtering;
    将所述第一特征数据、第二特征数据、目标文本输入文本分类模型中，所述文本分类模型根据所述第一特征数据和第二特征数据得到匹配成功的第一逻辑规则，根据所述第一逻辑规则对所述目标文本进行分类处理，得到关键信息；及Inputting the first feature data, the second feature data, and the target text into a text classification model, where the text classification model obtains a successfully matched first logic rule according to the first feature data and the second feature data, and classifies the target text according to the first logic rule to obtain key information; and
    根据所述关键信息进行检索得到目标检索内容。The target retrieval content is obtained by searching according to the key information.
  10. 根据权利要求9所述的计算机设备，其特征在于，所述语音识别模型包括声学模型和语言模型，所述将所述待识别语音输入已训练的语音识别模型中进行识别，得到识别文本的步骤，包括：The computer device according to claim 9, wherein the speech recognition model includes an acoustic model and a language model, and the step of inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text includes:
    对所述待识别语音的音频信号进行信号处理和特征提取,得到特征序列;Signal processing and feature extraction on the audio signal of the voice to be recognized to obtain a feature sequence;
    将所述特征序列输入已训练的声学模型和已训练的语言模型中,分别得到声学模型得分和语言模型得分;及Input the feature sequence into the trained acoustic model and the trained language model to obtain acoustic model scores and language model scores respectively; and
    对所述声学模型得分和所述语言模型得分进行解码搜索，得到所述识别文本。A decoding search is performed on the acoustic model score and the language model score to obtain the recognized text.
  11. 根据权利要求10所述的计算机设备,其特征在于,所述对所述声学模型得分和所述语音模型得分进行解码搜索,得到识别文本的步骤,包括:The computer device according to claim 10, wherein the step of decoding and searching the acoustic model score and the speech model score to obtain the recognized text comprises:
    获取预设假设词序列;Obtain the presupposition word sequence;
    根据所述特征序列中的特征向量计算所述预设假设词序列的所述声学模型得分，得到声学模型得分组；Calculating the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group;
    根据所述特征序列中的特征向量计算所述预设假设词序列的所述语言模型得分，得到语言模型得分组；及Calculating the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group; and
    根据所述声学模型得分组和语言模型得分组，计算所述预设假设词序列中假设词的总体得分，将所述总体得分最高的假设词作为所述识别文本。Calculating the overall score of each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and using the hypothesis word with the highest overall score as the recognized text.
  12. The computer device according to claim 9, wherein the models to be trained comprise the semantic analysis model, the sentiment analysis model, and the text classification model, and the step of training the models to be trained comprises:
    acquiring a training sample set, the training sample set comprising granularity data samples, language data samples, and modality data samples, the granularity data samples comprising granularity data features, language data features, and modality data features;
    acquiring text to be trained, and inputting the text to be trained into an initial model to be trained to obtain initial text; and
    adjusting parameters of the initial model to be trained according to the initial text, the granularity data features, the language data features, and the modality data features until a convergence condition is met, to obtain the semantic analysis model, the sentiment analysis model, and the text classification model.
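The "adjust parameters until a convergence condition is met" loop of claim 12 is a standard iterative fit; a minimal sketch follows. The squared-error loss, learning rate, and tolerance are illustrative assumptions only; the application does not specify the optimizer or the convergence condition.

```python
def train_until_converged(initial_param, target, lr=0.1, tol=1e-6,
                          max_steps=10000):
    """Repeat: measure the discrepancy between model output and the
    training features, adjust the parameter, and stop once the
    convergence condition (loss below tol) holds."""
    param = initial_param
    for _ in range(max_steps):
        loss = (param - target) ** 2   # discrepancy with training features
        if loss < tol:                 # convergence condition met
            break
        grad = 2 * (param - target)
        param -= lr * grad             # parameter adjustment
    return param

print(round(train_until_converged(0.0, 3.0), 3))
```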
  13. The computer device according to claim 9, wherein the speech recognition model comprises an acoustic model and a language model, and the step of training the speech recognition model comprises:
    acquiring training samples, the training samples comprising language features and acoustic features;
    acquiring training speech to be recognized, and inputting the training speech to be recognized into an initial language model to obtain an initial language score;
    acquiring the training speech to be recognized, and inputting the training speech to be recognized into an initial acoustic model to obtain an initial acoustic score; and
    adjusting parameters of the initial language model according to the language features and the initial language score, and adjusting parameters of the initial acoustic model according to the acoustic features and the initial acoustic score, until both the initial language model and the initial acoustic model meet the convergence condition, to obtain the speech recognition model.
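The distinctive point of claim 13 is that training continues until *both* the initial acoustic model and the initial language model satisfy the convergence condition. A sketch of that joint stopping criterion, with each model reduced to a single illustrative parameter and a squared-error loss (assumptions, not details from the claims):

```python
def train_two_models(am_param, lm_param, am_target, lm_target,
                     lr=0.2, tol=1e-8):
    """Keep adjusting whichever model has not yet converged; stop only
    when BOTH models meet the convergence condition."""
    def loss(p, t):
        return (p - t) ** 2

    while loss(am_param, am_target) >= tol or loss(lm_param, lm_target) >= tol:
        if loss(am_param, am_target) >= tol:      # acoustic model update
            am_param -= lr * 2 * (am_param - am_target)
        if loss(lm_param, lm_target) >= tol:      # language model update
            lm_param -= lr * 2 * (lm_param - lm_target)
    return am_param, lm_param

print(train_two_models(0.0, 0.0, 1.0, 2.0))
```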
  14. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring speech to be recognized;
    inputting the speech to be recognized into a trained speech recognition model for recognition to obtain recognized text;
    inputting the recognized text into a trained semantic analysis model and a trained sentiment analysis model to obtain first feature data and second feature data respectively, wherein the first feature data is an analysis result of performing semantic analysis on the recognized text, and the second feature data is an analysis result of performing sentiment analysis on the recognized text;
    performing word preprocessing on the recognized text to obtain target text, wherein the word preprocessing comprises word segmentation, stop word removal, and word filtering;
    inputting the first feature data, the second feature data, and the target text into a text classification model, the text classification model obtaining a successfully matched first logic rule according to the first feature data and the second feature data, and classifying the target text according to the first logic rule to obtain key information; and
    performing retrieval according to the key information to obtain target retrieval content.
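The word-preprocessing step in claim 14 (word segmentation, stop word removal, word filtering) can be sketched as below. A whitespace split stands in for a real Chinese word segmenter, and the stop-word list and minimum-length filter are illustrative assumptions; the claims do not name a concrete segmenter or filter rule.

```python
STOP_WORDS = {"the", "a", "of", "to"}   # illustrative stop-word list

def preprocess(recognized_text, min_len=2):
    """Word preprocessing: segmentation, stop-word removal, word filtering."""
    tokens = recognized_text.lower().split()             # segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [t for t in tokens if len(t) >= min_len]    # word filtering
    return tokens

print(preprocess("The balance of my account"))
```

The resulting target text is what the text classification model classifies (together with the semantic and sentiment feature data) to extract the key information used for retrieval.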
  15. The storage medium according to claim 14, wherein the speech recognition model comprises an acoustic model and a language model, and the step of inputting the speech to be recognized into the trained speech recognition model for recognition to obtain the recognized text comprises:
    performing signal processing and feature extraction on an audio signal of the speech to be recognized to obtain a feature sequence;
    inputting the feature sequence into the trained acoustic model and the trained language model to obtain an acoustic model score and a language model score respectively; and
    performing a decoding search on the acoustic model score and the language model score to obtain the recognized text.
  16. The storage medium according to claim 15, wherein the step of performing a decoding search on the acoustic model score and the language model score to obtain the recognized text comprises:
    obtaining a preset hypothesis word sequence;
    calculating the acoustic model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain an acoustic model score group;
    calculating the language model score of the preset hypothesis word sequence according to the feature vectors in the feature sequence to obtain a language model score group; and
    calculating an overall score for each hypothesis word in the preset hypothesis word sequence according to the acoustic model score group and the language model score group, and taking the hypothesis word with the highest overall score as the recognized text.
  17. The storage medium according to claim 14, wherein the models to be trained comprise the semantic analysis model, the sentiment analysis model, and the text classification model, and the step of training the models to be trained comprises:
    acquiring a training sample set, the training sample set comprising granularity data samples, language data samples, and modality data samples, the granularity data samples comprising granularity data features, language data features, and modality data features;
    acquiring text to be trained, and inputting the text to be trained into an initial model to be trained to obtain initial text; and
    adjusting parameters of the initial model to be trained according to the initial text, the granularity data features, the language data features, and the modality data features until a convergence condition is met, to obtain the semantic analysis model, the sentiment analysis model, and the text classification model.
  18. The storage medium according to claim 14, wherein the speech recognition model comprises an acoustic model and a language model, and the step of training the speech recognition model comprises:
    acquiring training samples, the training samples comprising language features and acoustic features;
    acquiring training speech to be recognized, and inputting the training speech to be recognized into an initial language model to obtain an initial language score;
    acquiring the training speech to be recognized, and inputting the training speech to be recognized into an initial acoustic model to obtain an initial acoustic score; and
    adjusting parameters of the initial language model according to the language features and the initial language score, and adjusting parameters of the initial acoustic model according to the acoustic features and the initial acoustic score, until both the initial language model and the initial acoustic model meet the convergence condition, to obtain the speech recognition model.
PCT/CN2019/118254 2019-07-03 2019-11-14 Retrieval method and apparatus, and computer device and storage medium WO2021000497A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910594101.9 2019-07-03
CN201910594101.9A CN110444198B (en) 2019-07-03 2019-07-03 Retrieval method, retrieval device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021000497A1

Family

ID=68428519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118254 WO2021000497A1 (en) 2019-07-03 2019-11-14 Retrieval method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110444198B (en)
WO (1) WO2021000497A1 (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297353A (en) * 2021-06-16 2021-08-24 深圳前海微众银行股份有限公司 Text matching method, device, equipment and storage medium
CN113593535A (en) * 2021-06-30 2021-11-02 青岛海尔科技有限公司 Voice data processing method and device, storage medium and electronic device
CN113704447A (en) * 2021-03-03 2021-11-26 腾讯科技(深圳)有限公司 Text information identification method and related device
CN113724698A (en) * 2021-09-01 2021-11-30 马上消费金融股份有限公司 Training method, device and equipment of speech recognition model and storage medium
CN113761894A (en) * 2021-01-18 2021-12-07 北京沃东天骏信息技术有限公司 Target word removing method, model training method, device, electronic equipment and medium
CN113971203A (en) * 2021-10-26 2022-01-25 福建云知声智能科技有限公司 Information processing method, information processing apparatus, storage medium, and electronic apparatus
CN114299918A (en) * 2021-12-22 2022-04-08 标贝(北京)科技有限公司 Acoustic model training and speech synthesis method, device and system, and storage medium
CN114333790A (en) * 2021-12-03 2022-04-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and program product
CN114387678A (en) * 2022-01-11 2022-04-22 凌云美嘉(西安)智能科技有限公司 Method and apparatus for evaluating language readability using non-verbal body symbols
CN115035984A (en) * 2022-06-17 2022-09-09 上海暖禾脑科学技术有限公司 Method and system for assessing level of psychological consultant in real-time patient persuasion
CN117540917A (en) * 2023-11-14 2024-02-09 大能手教育科技(北京)有限公司 Training platform aided training method, device, equipment and medium
CN117594060A (en) * 2023-10-31 2024-02-23 北京邮电大学 Audio signal content analysis method, device, equipment and storage medium
CN117877525A (en) * 2024-03-13 2024-04-12 广州汇智通信技术有限公司 Audio retrieval method and device based on variable granularity characteristics

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110444198B (en) * 2019-07-03 2023-05-30 平安科技(深圳)有限公司 Retrieval method, retrieval device, computer equipment and storage medium
CN110866410B (en) * 2019-11-15 2023-07-25 深圳市赛为智能股份有限公司 Multilingual conversion method, multilingual conversion device, computer device, and storage medium
CN112069796B (en) * 2020-09-03 2023-08-04 阳光保险集团股份有限公司 Voice quality inspection method and device, electronic equipment and storage medium
CN112600834B (en) * 2020-12-10 2023-03-24 同盾控股有限公司 Content security identification method and device, storage medium and electronic equipment
CN112466278B (en) * 2020-12-16 2022-02-18 北京百度网讯科技有限公司 Voice recognition method and device and electronic equipment
CN113314106A (en) * 2021-05-19 2021-08-27 国网辽宁省电力有限公司 Electric power information query and regulation function calling method based on voice and intention recognition
CN114360533A (en) * 2021-12-20 2022-04-15 日立楼宇技术(广州)有限公司 Interaction method and system based on machine learning, elevator equipment and medium
CN114547474A (en) * 2022-04-21 2022-05-27 北京泰迪熊移动科技有限公司 Data searching method, system, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005275601A (en) * 2004-03-23 2005-10-06 Fujitsu Ltd Information retrieval system by voice
CN105095406A (en) * 2015-07-09 2015-11-25 百度在线网络技术(北京)有限公司 Method and apparatus for voice search based on user feature
CN105260416A (en) * 2015-09-25 2016-01-20 百度在线网络技术(北京)有限公司 Voice recognition based searching method and apparatus
CN106095799A (en) * 2016-05-30 2016-11-09 广州多益网络股份有限公司 The storage of a kind of voice, search method and device
US20180301145A1 (en) * 2010-09-17 2018-10-18 Nuance Communications, Inc. System and Method for Using Prosody for Voice-Enabled Search
CN108961887A (en) * 2018-07-24 2018-12-07 广东小天才科技有限公司 Voice search control method and family education equipment
CN110444198A (en) * 2019-07-03 2019-11-12 平安科技(深圳)有限公司 Search method, device, computer equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8374865B1 (en) * 2012-04-26 2013-02-12 Google Inc. Sampling training data for an automatic speech recognition system based on a benchmark classification distribution
CN104143329B (en) * 2013-08-19 2015-10-21 腾讯科技(深圳)有限公司 Carry out method and the device of voice keyword retrieval
US10347244B2 (en) * 2017-04-21 2019-07-09 Go-Vivace Inc. Dialogue system incorporating unique speech to text conversion method for meaningful dialogue response

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113761894A (en) * 2021-01-18 2021-12-07 北京沃东天骏信息技术有限公司 Target word removing method, model training method, device, electronic equipment and medium
CN113704447B (en) * 2021-03-03 2024-05-03 腾讯科技(深圳)有限公司 Text information identification method and related device
CN113704447A (en) * 2021-03-03 2021-11-26 腾讯科技(深圳)有限公司 Text information identification method and related device
CN113297353A (en) * 2021-06-16 2021-08-24 深圳前海微众银行股份有限公司 Text matching method, device, equipment and storage medium
CN113593535A (en) * 2021-06-30 2021-11-02 青岛海尔科技有限公司 Voice data processing method and device, storage medium and electronic device
CN113593535B (en) * 2021-06-30 2024-05-24 青岛海尔科技有限公司 Voice data processing method and device, storage medium and electronic device
CN113724698A (en) * 2021-09-01 2021-11-30 马上消费金融股份有限公司 Training method, device and equipment of speech recognition model and storage medium
CN113724698B (en) * 2021-09-01 2024-01-30 马上消费金融股份有限公司 Training method, device, equipment and storage medium of voice recognition model
CN113971203A (en) * 2021-10-26 2022-01-25 福建云知声智能科技有限公司 Information processing method, information processing apparatus, storage medium, and electronic apparatus
CN114333790A (en) * 2021-12-03 2022-04-12 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and program product
CN114299918A (en) * 2021-12-22 2022-04-08 标贝(北京)科技有限公司 Acoustic model training and speech synthesis method, device and system, and storage medium
CN114387678A (en) * 2022-01-11 2022-04-22 凌云美嘉(西安)智能科技有限公司 Method and apparatus for evaluating language readability using non-verbal body symbols
CN115035984A (en) * 2022-06-17 2022-09-09 上海暖禾脑科学技术有限公司 Method and system for assessing level of psychological consultant in real-time patient persuasion
CN117594060A (en) * 2023-10-31 2024-02-23 北京邮电大学 Audio signal content analysis method, device, equipment and storage medium
CN117540917A (en) * 2023-11-14 2024-02-09 大能手教育科技(北京)有限公司 Training platform aided training method, device, equipment and medium
CN117540917B (en) * 2023-11-14 2024-05-28 大能手教育科技(北京)有限公司 Training platform aided training method, device, equipment and medium
CN117877525A (en) * 2024-03-13 2024-04-12 广州汇智通信技术有限公司 Audio retrieval method and device based on variable granularity characteristics

Also Published As

Publication number Publication date
CN110444198A (en) 2019-11-12
CN110444198B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
CN113094578B (en) Deep learning-based content recommendation method, device, equipment and storage medium
Ramet et al. Context-aware attention mechanism for speech emotion recognition
US20230089308A1 (en) Speaker-Turn-Based Online Speaker Diarization with Constrained Spectral Clustering
CN111984766A (en) Missing semantic completion method and device
CN114580382A (en) Text error correction method and device
CN110427610A (en) Text analyzing method, apparatus, computer installation and computer storage medium
An et al. Lexical and Acoustic Deep Learning Model for Personality Recognition.
CN109509470A (en) Voice interactive method, device, computer readable storage medium and terminal device
CN113343108B (en) Recommended information processing method, device, equipment and storage medium
US12100388B2 (en) Method and apparatus for training speech recognition model, electronic device and storage medium
CN112307770A (en) Sensitive information detection method and device, electronic equipment and storage medium
WO2021129410A1 (en) Method and device for text processing
WO2021129411A1 (en) Text processing method and device
CN114398902A (en) Chinese semantic extraction method and related equipment based on artificial intelligence
CN110931002B (en) Man-machine interaction method, device, computer equipment and storage medium
CN119047494B (en) Neural network text translation enhancement method and system in multilingual cross-language environment
CN111126084A (en) Data processing method and device, electronic equipment and storage medium
CN114218356B (en) Semantic recognition method, device, equipment and storage medium based on artificial intelligence
CN113158052B (en) Chat content recommendation method, chat content recommendation device, computer equipment and storage medium
CN115174285A (en) Conference record generation method and device and electronic equipment
KR20210085694A (en) Apparatus for image captioning and method thereof
CN116579333A (en) Keyword extraction method, device, computer equipment and storage medium
CN115358216A (en) Text error correction method and device, electronic terminal and storage medium
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19936275

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19936275

Country of ref document: EP

Kind code of ref document: A1