
CN116486796A - Intelligent voice recognition service system - Google Patents


Info

Publication number
CN116486796A
Authority
CN
China
Prior art keywords
voice recognition
voice
data
result
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310639419.0A
Other languages
Chinese (zh)
Inventor
谢民雄
朱立谷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jinshangqi Technology Co ltd
Original Assignee
Beijing Jinshangqi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-06-01
Filing date
2023-06-01
Publication date
2023-07-25
Application filed by Beijing Jinshangqi Technology Co ltd
Priority to CN202310639419.0A
Publication of CN116486796A
Current legal status: Pending


Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/08 Speech classification or search
              • G10L15/18 Speech classification or search using natural language modelling
                • G10L15/1822 Parsing for meaning understanding
              • G10L15/16 Speech classification or search using artificial neural networks
            • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
              • G10L15/063 Training
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/78 Detection of presence or absence of voice signals
              • G10L25/87 Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an intelligent voice recognition service system comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module. The signal acquisition module acquires an initial voice signal through an access interface and performs voice recognition on it to obtain initial voice recognition data. The preprocessing module preprocesses the initial voice recognition data. The feature extraction module inputs the preprocessed initial voice recognition data into a pre-trained voice recognition model and outputs a target voice recognition result. The semantic analysis module performs semantic analysis on the target voice recognition result to obtain the corresponding semantic analysis result. The result output module judges whether the semantic analysis result is complete and, if so, outputs it through an output interface. The invention improves the accuracy of voice recognition and enhances the user experience.

Description

Intelligent voice recognition service system
Technical Field
The invention relates to the technical field of voice recognition, in particular to an intelligent voice recognition service system.
Background
In recent years, science and technology have developed rapidly, and artificial intelligence in particular has advanced at an unprecedented pace, bringing considerable convenience to daily life. As this convenience has spread, people have begun to pursue a higher quality of life, and intelligent speech recognition technology has developed rapidly to make life and work easier; it is now used more and more widely in both. For example, an intelligent interaction system may provide the functions of a professional call center, such as high-capacity inbound/outbound call handling, call transfer, incoming-call screen pop-ups, intelligent IVR, intelligent ACD, call recording, agent management, work order management, data reports and performance statistics. Such a system may be offered as an on-premises deployment or as a cloud SaaS call center, with stable calls, clear audio, multi-level IVR voice menus and VIP dedicated lines, so that every call delivers a good calling experience.
At present, recognizing a user's voice input raises new problems for automatic completion and user interaction: the recognition rate of speech recognition is low, and the user experience is poor.
Disclosure of Invention
The invention aims to solve the above problems by providing an intelligent voice recognition service system.
The technical solution for achieving this aim is an intelligent voice recognition service system comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,
the signal acquisition module is used for acquiring an initial voice signal through the access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;
the preprocessing module is used for preprocessing the initial voice recognition data;
the feature extraction module is used for inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training and outputting a target voice recognition result;
the semantic analysis module is used for carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;
and the result output module is used for judging whether the semantic analysis result is complete or not, and outputting the semantic analysis result through an output interface if the semantic analysis result is complete.
Further, in the above intelligent voice recognition service system, the signal acquisition module comprises a voice input unit, a voice conversion unit, a feature extraction unit and a normalization processing unit, wherein,
the voice input unit is used for receiving user sound data captured by a voice device;
the voice conversion unit is used for converting the user sound data into an initial voice signal;
the feature extraction unit is used for dividing the initial voice signal into a plurality of channels and extracting time-domain features from each channel signal;
the normalization processing unit is used for performing normalization processing to obtain feature vectors and obtaining initial voice recognition data based on the feature vectors.
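The signal acquisition units above can be illustrated with a minimal sketch. It assumes the multi-channel signal arrives as interleaved samples, uses mean energy, zero-crossing rate and peak amplitude as the time-domain features, and uses L2 normalization; the patent names none of these choices, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def split_channels(interleaved: Sequence[float], num_channels: int) -> List[List[float]]:
    """De-interleave a sample stream (ch0, ch1, ch0, ch1, ...) into one list per channel."""
    if num_channels <= 0 or not interleaved or len(interleaved) % num_channels:
        raise ValueError("need a non-empty stream whose length is a multiple of num_channels")
    return [list(interleaved[c::num_channels]) for c in range(num_channels)]


def time_domain_features(channel: Sequence[float]) -> List[float]:
    """Mean energy, zero-crossing rate and peak amplitude of one channel."""
    n = len(channel)
    energy = sum(x * x for x in channel) / n
    crossings = sum(1 for a, b in zip(channel, channel[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (n - 1) if n > 1 else 0.0
    peak = max(abs(x) for x in channel)
    return [energy, zcr, peak]


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return list(vector) if norm == 0 else [v / norm for v in vector]


def acquire_features(interleaved: Sequence[float], num_channels: int) -> List[float]:
    """Concatenate per-channel time-domain features, then normalize the result."""
    features: List[float] = []
    for channel in split_channels(interleaved, num_channels):
        features.extend(time_domain_features(channel))
    return l2_normalize(features)
```

For a two-channel stream, `acquire_features(samples, 2)` returns a six-element unit-length vector (three features per channel). A real system would normally compute these features per short frame rather than over the whole signal.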
Further, in the above intelligent voice recognition service system, the preprocessing module includes a filtering processing unit, a weighting processing unit, a framing processing unit and a segmentation processing unit, wherein,
the filtering processing unit is used for carrying out filtering processing on the initial voice recognition data by utilizing spectral subtraction to obtain a first voice electric signal;
the weighting processing unit is used for applying pre-emphasis (pre-weighting) to the first voice electric signal to boost the high-frequency components of the speech, obtaining a second voice electric signal;
the framing processing unit is used for dividing the second voice electric signal into frames, obtaining a third voice electric signal consisting of a plurality of short segments;
and the segmentation processing unit is used for separating the word (speech) segments of the third voice electric signal from its noise segments by an endpoint detection method, obtaining the preprocessed initial voice recognition data.
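The weighting, framing and segmentation steps above can be sketched as follows. This assumes first-order pre-emphasis with the commonly used coefficient 0.97, fixed-length overlapping frames, and a single short-time-energy threshold for endpoint detection; the patent does not specify the filter, frame sizes or detection rule, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pre_emphasis(signal: Sequence[float], alpha: float = 0.97) -> List[float]:
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1], which boosts high frequencies."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]


def frame_signal(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Cut the signal into frames of frame_len samples starting every hop samples.

    A trailing partial frame is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    return [list(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]


def detect_endpoints(frames: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """Energy-based endpoint detection.

    Returns inclusive (first_frame, last_frame) pairs for each run of frames whose mean
    energy exceeds threshold; every other frame is treated as noise or silence.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, frame in enumerate(frames):
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold and start is None:
            start = i                          # speech onset
        elif energy <= threshold and start is not None:
            segments.append((start, i - 1))    # speech offset
            start = None
    if start is not None:
        segments.append((start, len(frames) - 1))
    return segments
```

Practical endpoint detectors usually combine energy with the zero-crossing rate and use two thresholds to avoid clipping weak word onsets; a single threshold keeps the sketch short.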
Further, in the above intelligent voice recognition service system, the training process of the voice recognition model includes the following steps:
acquiring speech features covering a large vocabulary and assigning a number to each word to obtain a sample training set;
inputting the sample training set into a neural network model, and outputting result data;
the vocabulary numbers corresponding to the voice features are used as reference data, the result data and the reference data are compared, and data errors are calculated;
and stopping training the neural network model when the data errors of the result data and the reference data are lower than a preset threshold value, so as to obtain a voice recognition model.
Further, in the above-mentioned intelligent speech recognition service system, the neural network consists of an input layer, a hidden layer and an output layer whose basic unit is the neuron; the neurons of the input layer correspond to the extracted speech features, the neurons of the hidden layer are constructed through sample training, and the neurons of the output layer correspond to the speech recognition vocabulary.
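The training procedure and three-layer structure described above can be sketched with a tiny pure-Python network: input neurons take a speech feature vector, a tanh hidden layer is learned from the samples, and each output neuron stands for one word number. Cross-entropy is assumed as the "data error" and plain stochastic gradient descent as the optimizer; the patent specifies neither, and `TinyMLP` and `train_until_threshold` are hypothetical names.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (speech feature vector, reference word number)


class TinyMLP:
    """Input layer -> one tanh hidden layer -> softmax output (one neuron per word number)."""

    def __init__(self, n_in: int, n_hidden: int, n_words: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_words)]
        self.b2 = [0.0] * n_words

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        """Return (hidden activations, output probabilities)."""
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return h, [v / total for v in e]

    def predict(self, x: Sequence[float]) -> int:
        """Word number of the most probable output neuron."""
        _, p = self.forward(x)
        return max(range(len(p)), key=p.__getitem__)

    def error(self, data: Sequence[Sample]) -> float:
        """Mean cross-entropy between the outputs and the reference word numbers."""
        return sum(-math.log(self.forward(x)[1][y]) for x, y in data) / len(data)

    def sgd_step(self, x: Sequence[float], y: int, lr: float) -> None:
        h, p = self.forward(x)
        dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # softmax + cross-entropy gradient
        # Back-propagate to the hidden layer before w2 is modified.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(len(dz))) * (1.0 - h[j] ** 2)
              for j in range(len(h))]
        for k, g in enumerate(dz):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * g * hj
            self.b2[k] -= lr * g
        for j, g in enumerate(dh):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * g * xi
            self.b1[j] -= lr * g


def train_until_threshold(model: TinyMLP, data: Sequence[Sample], threshold: float,
                          lr: float = 0.1, max_epochs: int = 1000) -> int:
    """Run SGD epochs until the error falls below threshold; return the number of epochs used."""
    for epoch in range(1, max_epochs + 1):
        for x, y in data:
            model.sgd_step(x, y, lr)
        if model.error(data) < threshold:
            return epoch
    return max_epochs
```

The `max_epochs` cap is an added safeguard: the described procedure stops only when the error falls below the preset threshold, which may never happen on noisy data.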
Further, in the above intelligent voice recognition service system, the semantic analysis module comprises a vocabulary analysis unit, a ranking processing unit, an expansion processing unit and a semantic recognition unit, wherein,
the vocabulary analysis unit is used for acquiring the speech recognition vocabulary in the target voice recognition result and performing lexical analysis and then syntactic analysis on it to obtain a plurality of candidate semantic representations;
the ranking processing unit is used for ranking the plurality of candidate semantic representations by score from high to low and screening out the high-scoring candidate semantic representations within a preset range;
the expansion processing unit is used for expanding the high-scoring candidate semantic representations to obtain semantic representation data;
the semantic recognition unit is used for performing semantic role labeling and word sense disambiguation on the semantic representation data and outputting the semantic analysis result corresponding to the target voice recognition result.
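The ranking and screening step can be sketched as below. It assumes every candidate semantic representation already has a numeric score, and it reads "a preset range" as "within `margin` of the best score, at most `max_keep` candidates"; that interpretation, the `Candidate` type and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    representation: str  # hypothetical textual form of a candidate semantic representation
    score: float


def rank_and_screen(candidates: List[Candidate], margin: float, max_keep: int) -> List[Candidate]:
    """Sort candidates by score (high to low) and keep those within `margin` of the best one."""
    if not candidates:
        return []
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    best = ranked[0].score
    return [c for c in ranked if best - c.score <= margin][:max_keep]
```

A margin relative to the best score adapts to how confident the parser is overall, whereas a fixed top-k keeps a constant number of candidates; combining both bounds the work done by the expansion step.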
Further, in the above intelligent speech recognition service system, the lexical analysis processing comprises word segmentation and part-of-speech tagging, and the syntactic analysis processing comprises dependency syntax analysis and dependency relation analysis.
Further, in the above intelligent speech recognition service system, the intelligent speech recognition service system further comprises a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions which, when executed by the processor, effect the storage.
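For Chinese text, the lexical analysis step is often done with dictionary-based forward maximum matching followed by part-of-speech lookup. The sketch below shows that classic technique only; the patent does not say which segmenter or tagger it uses, the lexicon is a toy example, and dependency parsing is not covered.

```python
from typing import Dict, List, Tuple


def forward_max_match(text: str, lexicon: Dict[str, str], max_len: int) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word,
    falling back to a single character when nothing matches."""
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def pos_tag(words: List[str], lexicon: Dict[str, str], unknown: str = "x") -> List[Tuple[str, str]]:
    """Dictionary-lookup part-of-speech tagging; words not in the lexicon get the `unknown` tag."""
    return [(w, lexicon.get(w, unknown)) for w in words]
```

With a lexicon containing 语音识别, 服务 and 系统, the string 语音识别服务系统 segments into those three words. Greedy matching is fast but can mis-segment ambiguous strings, which is why statistical or neural segmenters are common in practice.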
The beneficial effects of the intelligent voice recognition service system are as follows. The system comprises a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module; the preprocessing module reduces the influence of noise on voice recognition, the feature extraction module learns with a neural network model, and the semantic analysis module and result output module perform semantic analysis on the voice recognition result. Together these improve voice recognition accuracy and thereby enhance the user experience.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention.
FIG. 1 is a schematic diagram of an intelligent speech recognition service system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an operation method of the intelligent voice recognition service system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The invention will now be described in detail with reference to the accompanying drawings. As shown in FIG. 1, an intelligent voice recognition service system includes a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,
the signal acquisition module is used for acquiring an initial voice signal through the access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;
the preprocessing module is used for preprocessing the initial voice recognition data;
the feature extraction module is used for inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training and outputting a target voice recognition result;
the semantic analysis module is used for carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;
and the result output module is used for judging whether the semantic analysis result is complete or not, and if yes, outputting the semantic analysis result through the output interface.
In this embodiment, the signal acquisition module includes a voice input unit, a voice conversion unit, a feature extraction unit, and a normalization processing unit, wherein,
the voice input unit is used for receiving user sound data captured by a voice device;
the voice conversion unit is used for converting the user sound data into an initial voice signal;
the feature extraction unit is used for dividing the initial voice signal into a plurality of channels and extracting time-domain features from each channel signal;
the normalization processing unit is used for performing normalization processing to obtain feature vectors and obtaining initial voice recognition data based on the feature vectors.
In this embodiment, the preprocessing module includes a filtering processing unit, a weighting processing unit, a framing processing unit, and a segmentation processing unit, where,
the filtering processing unit is used for performing filtering processing on the initial voice recognition data by utilizing spectral subtraction to obtain a first voice electric signal;
the weighting processing unit is used for applying pre-emphasis (pre-weighting) to the first voice electric signal to boost the high-frequency components of the speech, obtaining a second voice electric signal;
the framing processing unit is used for dividing the second voice electric signal into frames, obtaining a third voice electric signal consisting of a plurality of short segments;
and the segmentation processing unit is used for separating the word (speech) segments of the third voice electric signal from its noise segments by an endpoint detection method, obtaining the preprocessed initial voice recognition data.
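The spectral subtraction performed by the filtering processing unit can be sketched at the magnitude-spectrum level. This assumes the short-time magnitude spectra have already been computed (for example with an STFT), that the first few frames contain noise only, and that the noisy phase is reused when resynthesizing; the floor factor `beta` and the function names are hypothetical choices, not taken from the patent.

```python
from typing import List, Sequence


def estimate_noise(mag_frames: Sequence[Sequence[float]], n_noise_frames: int) -> List[float]:
    """Average magnitude spectrum of the leading frames, assumed to contain noise only."""
    lead = mag_frames[:n_noise_frames]
    if n_noise_frames < 1 or not lead:
        raise ValueError("need at least one noise-only frame")
    return [sum(bin_values) / len(lead) for bin_values in zip(*lead)]


def spectral_subtract(mag_frames: Sequence[Sequence[float]], noise: Sequence[float],
                      beta: float = 0.01) -> List[List[float]]:
    """Magnitude spectral subtraction with a spectral floor.

    For every frame and frequency bin: |S| = max(|Y| - |N|, beta * |Y|).
    """
    return [[max(y - n, beta * y) for y, n in zip(frame, noise)] for frame in mag_frames]
```

The floor `beta * |Y|` keeps bins from going negative or to exactly zero, which reduces the "musical noise" artifacts that plain subtraction tends to produce.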
In this embodiment, the training process of the speech recognition model includes the following steps:
acquiring speech features covering a large vocabulary and assigning a number to each word to obtain a sample training set;
inputting the sample training set into a neural network model, and outputting result data;
the vocabulary numbers corresponding to the voice features are used as reference data, result data and the reference data are compared, and data errors are calculated;
and stopping training the neural network model when the data errors of the result data and the reference data are lower than a preset threshold value, so as to obtain the voice recognition model.
In this embodiment, the neural network consists of an input layer, a hidden layer and an output layer whose basic unit is the neuron; the neurons of the input layer correspond to the extracted voice features, the neurons of the hidden layer are constructed through sample training, and the neurons of the output layer correspond to the voice recognition vocabulary.
In this embodiment, the semantic analysis module includes a vocabulary analysis unit, a ranking processing unit, an expansion processing unit, and a semantic recognition unit, wherein,
the vocabulary analysis unit is used for acquiring the voice recognition vocabulary in the target voice recognition result and performing lexical analysis and then syntactic analysis on it to obtain a plurality of candidate semantic representations;
the ranking processing unit is used for ranking the plurality of candidate semantic representations by score from high to low and screening out the high-scoring candidate semantic representations within a preset range;
the expansion processing unit is used for expanding the high-scoring candidate semantic representations to obtain semantic representation data;
the semantic recognition unit is used for carrying out semantic role labeling and word sense disambiguation on the semantic representation data and outputting a semantic analysis result corresponding to the target voice recognition result.
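Word sense disambiguation, one of the two tasks of the semantic recognition unit, can be illustrated with the simplified Lesk algorithm, a classic gloss-overlap method. The patent does not name its disambiguation technique, so this is only an example of one; the sense inventory is a toy, and semantic role labeling is not shown.

```python
from typing import Dict, Iterable


def simplified_lesk(word: str, context: Iterable[str], senses: Dict[str, str]) -> str:
    """Simplified Lesk: choose the sense whose gloss shares the most words with the context.

    Ties go to the sense listed first in `senses`.
    """
    ctx = {w.lower() for w in context if w.lower() != word.lower()}

    def overlap(sense: str) -> int:
        return len(ctx & set(senses[sense].lower().split()))

    return max(senses, key=overlap)
```

For example, with a "finance" gloss mentioning money and a "river" gloss mentioning water and a river, the context "I deposited money at the bank" selects the finance sense, while "we sat on the river bank" selects the river sense.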
In this embodiment, the intelligent speech recognition service system further comprises a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions which, when executed by the processor, effect the storage.
In this embodiment of the invention, the intelligent voice recognition service system comprises a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module. The preprocessing module reduces the influence of noise on voice recognition, the feature extraction module learns with a neural network model, and the semantic analysis module and result output module perform semantic analysis on the voice recognition result, which improves voice recognition accuracy and thereby enhances the user experience.
The operation method of the intelligent voice recognition service system according to an embodiment of the present invention is described below. As shown in FIG. 2, the operation method includes the following steps:
Step 201: collecting an initial voice signal through an access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;
Step 202: preprocessing the initial voice recognition data;
Step 203: inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training, and outputting a target voice recognition result;
Step 204: carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;
Step 205: judging whether the semantic analysis result is complete, and if so, outputting the semantic analysis result through an output interface.
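Steps 201 to 205 can be wired together as a simple staged pipeline. In the sketch below each stage is a pluggable callable and the completeness check is a caller-supplied predicate; returning `None` for an incomplete result is an assumption, since the method does not say what happens in that case. `RecognitionPipeline` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

Stage = Callable[[Any], Any]


@dataclass
class RecognitionPipeline:
    acquire: Stage                                 # step 201: signal -> initial recognition data
    preprocess: Stage                              # step 202: denoise, pre-emphasize, frame, segment
    recognize: Stage                               # step 203: pre-trained model -> target result
    parse: Callable[[Any], Dict[str, Any]]         # step 204: semantic analysis
    is_complete: Callable[[Dict[str, Any]], bool]  # step 205: completeness check

    def run(self, raw_audio: Any) -> Optional[Dict[str, Any]]:
        """Run steps 201-205; return the semantic result only if it is judged complete."""
        data = self.preprocess(self.acquire(raw_audio))
        semantics = self.parse(self.recognize(data))
        return semantics if self.is_complete(semantics) else None
```

Keeping each stage behind a plain callable lets the modules of FIG. 1 be swapped or tested independently, for example by substituting stub stages during development.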
In this embodiment of the invention, an initial voice signal is acquired through an access interface and voice recognition is performed on it to obtain initial voice recognition data; the initial voice recognition data are preprocessed; the preprocessed data are input into a pre-trained voice recognition model, which outputs a target voice recognition result; semantic analysis is performed on the target voice recognition result to obtain the corresponding semantic analysis result; and the semantic analysis result is judged for completeness and, if complete, output through an output interface. The invention thereby improves the accuracy of voice recognition and enhances the user experience.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (8)

1. An intelligent voice recognition service system is characterized by comprising a signal acquisition module, a preprocessing module, a feature extraction module, a semantic analysis module and a result output module, wherein,
the signal acquisition module is used for acquiring an initial voice signal through the access interface, and performing voice recognition on the initial voice signal to obtain initial voice recognition data;
the preprocessing module is used for preprocessing the initial voice recognition data;
the feature extraction module is used for inputting the preprocessed initial voice recognition data into a voice recognition model obtained by pre-training and outputting a target voice recognition result;
the semantic analysis module is used for carrying out semantic analysis on the target voice recognition result to obtain a semantic analysis result corresponding to the target voice recognition result;
and the result output module is used for judging whether the semantic analysis result is complete or not, and outputting the semantic analysis result through an output interface if the semantic analysis result is complete.
2. The intelligent speech recognition service system of claim 1, wherein the signal acquisition module comprises a voice input unit, a voice conversion unit, a feature extraction unit and a normalization processing unit, wherein,
the voice input unit is used for receiving user sound data captured by a voice device;
the voice conversion unit is used for converting the user sound data into an initial voice signal;
the feature extraction unit is used for dividing the initial voice signal into a plurality of channels and extracting time-domain features from each channel signal;
the normalization processing unit is used for performing normalization processing to obtain feature vectors and obtaining initial voice recognition data based on the feature vectors.
3. The intelligent speech recognition service system of claim 1 wherein the preprocessing module comprises a filtering processing unit, a weighting processing unit, a framing processing unit, and a segmentation processing unit, wherein,
the filtering processing unit is used for carrying out filtering processing on the initial voice recognition data by utilizing spectral subtraction to obtain a first voice electric signal;
the weighting processing unit is used for applying pre-emphasis (pre-weighting) to the first voice electric signal to boost the high-frequency components of the speech, obtaining a second voice electric signal;
the framing processing unit is used for dividing the second voice electric signal into frames, obtaining a third voice electric signal consisting of a plurality of short segments;
and the segmentation processing unit is used for separating the word (speech) segments of the third voice electric signal from its noise segments by an endpoint detection method, obtaining the preprocessed initial voice recognition data.
4. The intelligent speech recognition service system of claim 1 wherein the training process of the speech recognition model comprises the steps of:
acquiring speech features covering a large vocabulary and assigning a number to each word to obtain a sample training set;
inputting the sample training set into a neural network model, and outputting result data;
the vocabulary numbers corresponding to the voice features are used as reference data, the result data and the reference data are compared, and data errors are calculated;
and stopping training the neural network model when the data errors of the result data and the reference data are lower than a preset threshold value, so as to obtain a voice recognition model.
5. The intelligent speech recognition service system according to claim 4, wherein the neural network consists of an input layer, a hidden layer and an output layer whose basic unit is the neuron; the neurons of the input layer correspond to the extracted speech features, the neurons of the hidden layer are constructed through sample training, and the neurons of the output layer correspond to the speech recognition vocabulary.
6. The intelligent speech recognition service system of claim 1, wherein the semantic analysis module comprises a vocabulary analysis unit, a ranking processing unit, an expansion processing unit and a semantic recognition unit, wherein,
the vocabulary analysis unit is used for acquiring the speech recognition vocabulary in the target voice recognition result and performing lexical analysis and then syntactic analysis on it to obtain a plurality of candidate semantic representations;
the ranking processing unit is used for ranking the plurality of candidate semantic representations by score from high to low and screening out the high-scoring candidate semantic representations within a preset range;
the expansion processing unit is used for expanding the high-scoring candidate semantic representations to obtain semantic representation data;
the semantic recognition unit is used for performing semantic role labeling and word sense disambiguation on the semantic representation data and outputting the semantic analysis result corresponding to the target voice recognition result.
7. The intelligent speech recognition service system according to claim 6, wherein the lexical analysis processing comprises word segmentation and part-of-speech tagging, and the syntactic analysis processing comprises dependency syntax analysis and dependency relation analysis.
8. The intelligent speech recognition service system of claim 1, further comprising a processor and a computer storage memory, wherein the computer storage memory stores data and has stored thereon computer-executable instructions which, when executed by the processor, effect the storage.
CN202310639419.0A 2023-06-01 2023-06-01 Intelligent voice recognition service system Pending CN116486796A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310639419.0A CN116486796A (en) 2023-06-01 2023-06-01 Intelligent voice recognition service system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310639419.0A CN116486796A (en) 2023-06-01 2023-06-01 Intelligent voice recognition service system

Publications (1)

Publication Number Publication Date
CN116486796A true CN116486796A (en) 2023-07-25

Family

ID=87214085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310639419.0A Pending CN116486796A (en) 2023-06-01 2023-06-01 Intelligent voice recognition service system

Country Status (1)

Country Link
CN (1) CN116486796A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination