CN116486796A - Intelligent voice recognition service system

Info

Publication number: CN116486796A
Application number: CN202310639419.0A
Authority: CN
Inventors: 谢民雄; 朱立谷
Original assignee: Beijing Jinshangqi Technology Co ltd
Current assignee: Beijing Jinshangqi Technology Co ltd
Priority date: 2023-06-01
Filing date: 2023-06-01
Publication date: 2023-07-25

Abstract

The invention discloses an intelligent voice recognition service system comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module. The signal acquisition module acquires an initial voice signal through an access interface and performs voice recognition on the initial voice signal to obtain initial voice recognition data; the preprocessing module preprocesses the initial voice recognition data; the feature extraction module inputs the preprocessed initial voice recognition data into a pre-trained voice recognition model and outputs a target voice recognition result; the semantic analysis module performs semantic analysis on the target voice recognition result to obtain a corresponding semantic analysis result; and the result output module judges whether the semantic analysis result is complete and, if so, outputs it through an output interface. The invention improves the accuracy of voice recognition and enhances the user experience.

Description

Intelligent voice recognition service system

Technical Field

The invention relates to the technical field of voice recognition, in particular to an intelligent voice recognition service system.

Background

In recent years, modern science and technology have developed rapidly. Artificial intelligence technology in particular has advanced at an unprecedented pace, and people have come to enjoy the convenience it brings. Having experienced this convenience, people have begun to pursue a higher quality of life, and intelligent speech recognition technology has developed rapidly to make life and work easier. Intelligent speech recognition is increasingly used in daily life and work. For example, an intelligent interaction system supports the functions of a professional call center system, such as high-capacity incoming and outgoing call processing, call transfer, incoming-call screen pop-up, intelligent IVR, intelligent ACD, call records, agent seat management, work order management, data reports and performance statistics. It also offers on-premises deployment and cloud SaaS call center editions, stable calls, clear sound quality, multi-level IVR voice menus and VIP private lines, with the aim of ensuring the experience of every call.

At present, recognizing a user's voice input raises new problems for automatic completion and user interaction: the recognition rate of voice recognition is not high, and the user experience is poor.

Disclosure of Invention

The invention aims to solve the above problems and provides an intelligent voice recognition service system.

The technical solution for achieving the above aim is an intelligent voice recognition service system comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,

the signal acquisition module is used for acquiring an initial voice signal through the access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;

the preprocessing module is used for preprocessing the initial voice recognition data;

the feature extraction module is used for inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training and outputting a target voice recognition result;

the semantic analysis module is used for carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;

and the result output module is used for judging whether the semantic analysis result is complete or not, and outputting the semantic analysis result through an output interface if the semantic analysis result is complete.

Further, in the above intelligent voice recognition service system, the signal acquisition module comprises a sound input unit, a voice conversion unit, a feature extraction unit and a normalization processing unit, wherein,

the sound input unit is used for receiving user sound data acquired through a voice device;

the voice conversion unit is used for converting the user sound data into an initial voice signal;

the feature extraction unit is used for dividing the initial voice signal into a plurality of channel signals and extracting time-domain features from each channel signal;

the normalization processing unit is used for normalizing the extracted time-domain features to obtain feature vectors, and obtaining initial voice recognition data based on the feature vectors.
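The feature extraction and normalization units described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the text does not say what a "channel" is or which time-domain features are used, so the code assumes the channels are the columns of a multi-channel audio array and picks three common features (mean absolute amplitude, RMS energy, zero-crossing rate). All function names are hypothetical.

```python
import numpy as np


def split_channels(signal: np.ndarray) -> list[np.ndarray]:
    """Split an (n_samples, n_channels) array into 1-D per-channel signals.

    Assumption: "channel" means an audio channel stored as a column.
    """
    if signal.ndim == 1:
        signal = signal[:, None]
    return [signal[:, c] for c in range(signal.shape[1])]


def time_domain_features(x: np.ndarray) -> np.ndarray:
    """Return [mean absolute amplitude, RMS energy, zero-crossing rate]."""
    mean_abs = np.mean(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    signs = np.signbit(x)
    zcr = np.mean(signs[1:] != signs[:-1])
    return np.array([mean_abs, rms, zcr])


def normalize(v: np.ndarray) -> np.ndarray:
    """Min-max scale a vector to [0, 1]; a constant vector maps to zeros."""
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span


def extract_feature_vector(signal: np.ndarray) -> np.ndarray:
    """Concatenate the per-channel features and normalize the result."""
    feats = [time_domain_features(ch) for ch in split_channels(signal)]
    return normalize(np.concatenate(feats))
```

Min-max scaling is one possible normalization; a z-score would fit the description equally well.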

Further, in the above intelligent voice recognition service system, the preprocessing module includes a filtering processing unit, a weighting processing unit, a framing processing unit and a segmentation processing unit, wherein,

the filtering processing unit is used for carrying out filtering processing on the initial voice recognition data by utilizing spectral subtraction to obtain a first voice electric signal;

the weighting processing unit is used for boosting the high-frequency components of the speech in the first voice electric signal by pre-emphasis (pre-weighting) to obtain a second voice electric signal;

the framing processing unit is used for framing the second voice electric signal to obtain a third voice electric signal consisting of a plurality of short segments;

and the segmentation processing unit is used for separating the vocabulary (speech) signal from the noise signal in the third voice electric signal by an endpoint detection method to obtain the preprocessed initial voice recognition data.
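The weighting, framing and segmentation units above can be sketched as follows. The text gives no parameters, so the pre-emphasis coefficient (0.97), the frame and hop lengths (400 and 160 samples, i.e. 25 ms and 10 ms at an assumed 16 kHz sampling rate) and the energy-threshold endpoint detector are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def pre_emphasis(x: np.ndarray, alpha: float = 0.97) -> np.ndarray:
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])


def frame_signal(x: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Cut x into overlapping frames; trailing samples that do not fill a
    whole frame are dropped."""
    if len(x) < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def detect_speech_frames(frames: np.ndarray, ratio: float = 0.1) -> np.ndarray:
    """Energy-based endpoint detection (an assumed method): mark a frame as
    speech when its short-time energy exceeds ratio * the maximum energy."""
    energy = np.sum(frames ** 2, axis=1)
    if energy.size == 0 or energy.max() == 0:
        return np.zeros(len(frames), dtype=bool)
    return energy > ratio * energy.max()
```

Calling `frame_signal(pre_emphasis(x))` and then indexing the frames with the mask from `detect_speech_frames` keeps only the frames treated as speech. Practical endpoint detectors often combine energy with zero-crossing rate, which this sketch omits.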

Further, in the above intelligent voice recognition service system, the training process of the voice recognition model includes the following steps:

acquiring voice features covering a large number of vocabulary words, and numbering the words to obtain a sample training set;

inputting the sample training set into a neural network model, and outputting result data;

the vocabulary numbers corresponding to the voice features are used as reference data, the result data and the reference data are compared, and data errors are calculated;

and stopping training the neural network model when the data errors of the result data and the reference data are lower than a preset threshold value, so as to obtain a voice recognition model.

Further, in the above-mentioned intelligent speech recognition service system, the neural network is composed of an input layer, a hidden layer and an output layer, the basic unit is a neuron, the neuron of the input layer is the extracted speech feature, the neuron of the hidden layer is constructed by sample training, and the neuron of the output layer is the speech recognition vocabulary.
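The training procedure and the three-layer network described above can be sketched as follows. The text does not fix the error measure, the activation function or the optimizer; this sketch assumes cross-entropy as the "data error", a tanh hidden layer and full-batch gradient descent. The function names, layer size and hyperparameters are hypothetical.

```python
import numpy as np


def train_until_threshold(features, word_ids, n_words, hidden=16,
                          threshold=0.05, lr=0.1, max_epochs=5000, seed=0):
    """Train input -> hidden -> output until the error drops below threshold.

    features: (n_samples, n_features) voice features (input-layer values).
    word_ids: (n_samples,) integer vocabulary numbers (reference data).
    Returns the parameters and the list of per-epoch errors.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w1, b1 = rng.normal(0, 0.5, (d, hidden)), np.zeros(hidden)
    w2, b2 = rng.normal(0, 0.5, (hidden, n_words)), np.zeros(n_words)
    targets = np.eye(n_words)[word_ids]           # one-hot reference data
    errors = []
    for _ in range(max_epochs):
        h = np.tanh(features @ w1 + b1)           # hidden layer
        logits = h @ w2 + b2                      # output layer
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Compare result data with reference data (cross-entropy, assumed).
        error = -np.mean(np.log(probs[np.arange(n), word_ids] + 1e-12))
        errors.append(error)
        if error < threshold:                     # stop condition from the text
            break
        grad_logits = (probs - targets) / n
        grad_h = (grad_logits @ w2.T) * (1 - h ** 2)
        w2 -= lr * (h.T @ grad_logits)
        b2 -= lr * grad_logits.sum(axis=0)
        w1 -= lr * (features.T @ grad_h)
        b1 -= lr * grad_h.sum(axis=0)
    return (w1, b1, w2, b2), errors


def predict(params, features):
    """Return the vocabulary number with the highest output-layer score."""
    w1, b1, w2, b2 = params
    return np.argmax(np.tanh(features @ w1 + b1) @ w2 + b2, axis=1)
```

The `max_epochs` cap is an addition not mentioned in the text; it keeps the loop from running forever if the threshold is never reached.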

Further, in the above intelligent voice recognition service system, the semantic analysis module comprises a vocabulary analysis unit, a ranking processing unit, an expansion processing unit and a semantic recognition unit, wherein,

the vocabulary analysis unit is used for acquiring the voice recognition vocabulary in the target voice recognition result, and sequentially performing lexical analysis and grammar analysis on the voice recognition vocabulary to obtain a plurality of candidate semantic representations;

the ranking processing unit is used for ranking the plurality of candidate semantic representations by score from high to low, and screening out the high-score candidate semantic representations within a preset range;

the expansion processing unit is used for expanding the high-score candidate semantic representations to obtain semantic representation data;

the semantic recognition unit is used for performing semantic role labeling and word sense disambiguation on the semantic representation data, and outputting a semantic analysis result corresponding to the target voice recognition result.
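The ranking step above can be sketched as follows. The text does not define the "preset range" or how candidates are scored, so this sketch assumes each candidate already carries a numeric score and reads the range as a top-k cutoff combined with a minimum score. The `Candidate` type and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    representation: str  # a candidate semantic representation
    score: float         # assumed to come from the lexical/grammar analysis


def rank_and_screen(candidates, top_k=3, min_score=0.5):
    """Sort candidates by score from high to low, then keep at most top_k
    of them whose score is at least min_score."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    return [c for c in ranked[:top_k] if c.score >= min_score]


candidates = [
    Candidate("play(song)", 0.90),
    Candidate("pay(song)", 0.40),
    Candidate("play(sound)", 0.70),
    Candidate("plan(song)", 0.55),
]
best = rank_and_screen(candidates, top_k=2)
```

With these illustrative scores, `best` holds the "play(song)" and "play(sound)" candidates, in that order.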

Further, in the above intelligent speech recognition service system, the lexical analysis processing includes word segmentation and part-of-speech tagging, and the grammar analysis processing includes dependency syntax analysis and dependency relation analysis.

Further, the above intelligent speech recognition service system further comprises a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions that, when executed by the processor, effect the storage.

The beneficial effects of the invention are as follows. The intelligent voice recognition service system comprises a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module. The preprocessing module reduces the influence of noise on voice recognition, the feature extraction module learns by means of a neural network model, and the semantic analysis module and the result output module perform semantic analysis on the voice recognition result. This improves voice recognition accuracy and thereby enhances the user experience.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.

FIG. 1 is a schematic diagram of an intelligent voice recognition service system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an operation method of the intelligent voice recognition service system according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The invention is described in detail below with reference to the accompanying drawings. As shown in FIG. 1, an intelligent voice recognition service system includes a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,

and the result output module is used for judging whether the semantic analysis result is complete or not, and if yes, outputting the semantic analysis result through the output interface.

In this embodiment, the signal acquisition module includes a sound input unit, a voice conversion unit, a feature extraction unit, and a normalization processing unit, wherein,

a voice conversion unit for converting user sound data into an initial voice signal;

In this embodiment, the preprocessing module includes a filtering processing unit, a weighting processing unit, a framing processing unit, and a segmentation processing unit, where,

the filtering processing unit is used for performing filtering processing on the initial voice recognition data by utilizing spectral subtraction to obtain a first voice electric signal;

the weighting processing unit is used for boosting the high-frequency components of the speech in the first voice electric signal by pre-emphasis (pre-weighting) to obtain a second voice electric signal;

the framing processing unit is used for framing the second voice electric signal to obtain a third voice electric signal consisting of a plurality of short segments;

and the segmentation processing unit is used for separating the vocabulary (speech) signal from the noise signal in the third voice electric signal by an endpoint detection method to obtain the preprocessed initial voice recognition data.
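The spectral-subtraction filtering named for the filtering processing unit can be sketched as follows. The embodiment gives no details, so this is a simplified magnitude-domain variant under stated assumptions: the first few frames are taken to contain only noise, frames are non-overlapping with a rectangular window, and a small spectral floor avoids negative magnitudes. The function name and parameters are hypothetical.

```python
import numpy as np


def spectral_subtraction(noisy: np.ndarray, frame_len: int = 256,
                         noise_frames: int = 5, floor: float = 0.01) -> np.ndarray:
    """Subtract an estimated noise magnitude spectrum from every frame.

    Assumption: the first `noise_frames` frames hold noise only.
    Samples beyond the last whole frame are discarded.
    """
    n_frames = len(noisy) // frame_len
    frames = noisy[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Average magnitude of the leading frames serves as the noise estimate.
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract, but never go below a small fraction of the original magnitude.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Rebuild the waveform using the noisy phase.
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)
```

A production implementation would normally use overlapping windowed frames with overlap-add resynthesis to reduce frame-boundary artifacts; the non-overlapping form is used here only to keep the sketch short.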

In this embodiment, the training process of the speech recognition model includes the following steps:

the vocabulary numbers corresponding to the voice features are used as reference data, result data and the reference data are compared, and data errors are calculated;

and stopping training the neural network model when the data errors of the result data and the reference data are lower than a preset threshold value, so as to obtain the voice recognition model.

In this embodiment, the neural network is composed of an input layer, a hidden layer and an output layer, the basic unit is a neuron, the neuron of the input layer is the extracted voice feature, the neuron of the hidden layer is constructed through sample training, and the neuron of the output layer is the voice recognition vocabulary.

In this embodiment, the semantic analysis module includes a vocabulary analysis unit, a ranking processing unit, an expansion processing unit, and a semantic recognition unit, wherein,

the vocabulary analysis unit is used for acquiring the voice recognition vocabulary in the target voice recognition result, and sequentially performing lexical analysis and grammar analysis on the voice recognition vocabulary to obtain a plurality of candidate semantic representations;

the semantic recognition unit is used for carrying out semantic role labeling and word sense disambiguation on the semantic representation data and outputting a semantic analysis result corresponding to the target voice recognition result.

In this embodiment, the intelligent speech recognition service system further comprises a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions that, when executed by the processor, effect the storage.

In the embodiment of the invention, the intelligent voice recognition service system comprises a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module. The preprocessing module reduces the influence of noise on voice recognition, the feature extraction module learns by means of a neural network model, and the semantic analysis module and the result output module perform semantic analysis on the voice recognition result. This improves voice recognition accuracy and thereby enhances the user experience.

The following describes an operation method of the intelligent voice recognition service system according to an embodiment of the present invention, as shown in fig. 2, the operation method includes the following steps:

step 201, collecting an initial voice signal through an access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;

step 202, preprocessing initial voice recognition data;

step 203, inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training, and outputting a target voice recognition result;

step 204, carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;

and step 205, judging whether the semantic analysis result is complete, and if so, outputting the semantic analysis result through an output interface.
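Steps 201 to 205 can be sketched as a chain of callables. The stage functions below are toy stand-ins operating on text rather than audio, used only to show the control flow. The text does not say what happens when the semantic analysis result is incomplete; returning `None` in that case is an assumption. All names are hypothetical.

```python
from typing import Any, Callable, Optional


def run_pipeline(raw_signal: Any,
                 acquire: Callable[[Any], Any],
                 preprocess: Callable[[Any], Any],
                 recognize: Callable[[Any], Any],
                 parse: Callable[[Any], Any],
                 is_complete: Callable[[Any], bool]) -> Optional[Any]:
    """Run the five steps in order and output only a complete result."""
    data = acquire(raw_signal)       # step 201: acquisition + initial recognition
    cleaned = preprocess(data)       # step 202: preprocessing
    words = recognize(cleaned)       # step 203: pre-trained recognition model
    result = parse(words)            # step 204: semantic analysis
    # Step 205: output only if complete (None otherwise is an assumption).
    return result if is_complete(result) else None


# Toy stand-ins for the real modules (hypothetical).
def toy_parse(words):
    return {"action": words[0] if words else None,
            "object": words[1] if len(words) > 1 else None}


def toy_is_complete(result):
    return all(value is not None for value in result.values())


complete = run_pipeline("  Play Music ", str.strip, str.lower, str.split,
                        toy_parse, toy_is_complete)
incomplete = run_pipeline("  Play ", str.strip, str.lower, str.split,
                          toy_parse, toy_is_complete)
```

Here `complete` is a filled-in dictionary, while `incomplete` is `None` because the toy parser found no object for the action.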

In the embodiment of the invention, an initial voice signal is acquired through an access interface, and voice recognition is carried out on the initial voice signal to obtain initial voice recognition data; preprocessing initial voice recognition data; inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training, and outputting a target voice recognition result; carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result; judging whether the semantic analysis result is complete, if so, outputting the semantic analysis result through an output interface; the invention improves the accuracy of voice recognition, thereby enhancing the user experience.

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An intelligent voice recognition service system is characterized by comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,

2. The intelligent speech recognition service system of claim 1 wherein the signal acquisition module comprises a sound input unit, a speech conversion unit, a feature extraction unit, and a normalization processing unit, wherein,

3. The intelligent speech recognition service system of claim 1 wherein the preprocessing module comprises a filtering processing unit, a weighting processing unit, a framing processing unit, and a segmentation processing unit, wherein,

4. The intelligent speech recognition service system of claim 1 wherein the training process of the speech recognition model comprises the steps of:

5. The intelligent speech recognition service system according to claim 4, wherein the neural network is composed of an input layer, a hidden layer and an output layer, the basic unit is a neuron, the neuron of the input layer is the extracted speech feature, the neuron of the hidden layer is constructed by sample training, and the neuron of the output layer is a speech recognition vocabulary.

6. The intelligent speech recognition service system of claim 1 wherein the semantic parsing module comprises a vocabulary analysis unit, a ranking processing unit, an expansion processing unit, and a semantic recognition unit, wherein,

7. The intelligent speech recognition service system according to claim 6, wherein the lexical analysis processing means comprises word segmentation processing and part-of-speech tagging processing, and the syntax analysis processing comprises dependency syntax analysis processing and dependency relation analysis processing.

8. The intelligent speech recognition service system of claim 1, further comprising a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions that, when executed by the processor, effect the storage.