
CN114882886A - CTC simulation training voice recognition processing method, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114882886A
Authority
CN
China
Prior art keywords
statement
flow
ctc
recognition
online
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210452691.3A
Other languages
Chinese (zh)
Other versions
CN114882886B (en)
Inventor
王磊
郭欢
王兴利
王钦兰
孙雄峰
刘君
王文杰
刘述昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Casco Signal Ltd
Original Assignee
Casco Signal Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Casco Signal Ltd filed Critical Casco Signal Ltd
Priority to CN202210452691.3A priority Critical patent/CN114882886B/en
Publication of CN114882886A publication Critical patent/CN114882886A/en
Application granted granted Critical
Publication of CN114882886B publication Critical patent/CN114882886B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 Simulators for teaching or training purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a CTC simulation training speech recognition processing method, a storage medium, and electronic equipment. The method comprises the following steps: acquiring dynamic parameters of a rail transit emergency disposal flow and forming multiple groups of flow statements according to those parameters; collecting, online, audio data sent by a calling object, converting the audio data into sentence text with a speech recognizer to obtain an online recognition statement, and returning the online recognition statement in real time; and matching the groups of flow statements against the online recognition statement to obtain, from among them, the flow statement whose confidence is greater than a preset threshold. By combining the characteristics of flow speech events, the method realizes machine assessment of standardized language in fault emergency disposal training and improves the accuracy of system speech recognition.

Description

CTC simulation training voice recognition processing method, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of speech recognition, and in particular to a CTC simulation training speech recognition processing method, a storage medium, and electronic equipment.
Background
At present, the assessment of standard phrases in fault emergency disposal training for railway dispatchers relies mainly on subjective human judgment. This approach places high demands on the professional level of instructors, yields low training efficiency, and invisibly increases the training period and cost; furthermore, different instructors judge differently and are affected by many interfering factors, so manual assessment lacks consistency and general applicability.
Although speech recognition technology is now fairly mature, mainstream speech recognition libraries provide no established vertical engine for the railway domain, and the recognition of railway-specific terms such as train numbers, speed-limit values, and station names falls far short of practical requirements, which disrupts the normal circulation of flow speech events and degrades the user experience. Because speech recognition spans multiple disciplines, including acoustics, linguistics, signal processing, and computer science, in-house development is difficult, the engineering workload is large, and the result is hard to guarantee. In addition, constrained by factors such as railway network information security, some training environments are not connected to the Internet and therefore require offline recognition capability.
Therefore, the railway field still lacks an effective solution for machine assessment of standard flow expressions that has both online and offline speech recognition capability and improves system speech recognition accuracy.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, the first object of the invention is to provide a CTC simulation training speech recognition processing method that provides online and offline speech recognition capability, combines the characteristics of flow speech events, realizes machine assessment of standardized language in fault emergency disposal training, and improves the speech recognition accuracy of the system.
A second object of the present invention is to provide a computer-readable storage medium.
A third object of the present invention is to provide an electronic apparatus.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a CTC simulation training voice recognition processing method comprises the following steps: acquiring dynamic parameters of a rail transit emergency disposal flow, and forming a plurality of groups of flow statements according to the dynamic parameters of the emergency disposal flow; acquiring audio data sent by a calling object on line, converting the audio data into a sentence text by adopting a voice recognizer to obtain an on-line recognition sentence, and returning the on-line recognition sentence in real time; and matching the plurality of sets of flow statements with the online identification statement so as to obtain the flow statements with the confidence degrees larger than the preset threshold value from the plurality of sets of flow statements.
Optionally, when the speech recognizer returns the online recognition statement in real time, the method further includes: dynamically correcting, by the speech recognizer, the online recognition statement according to the dialogue context and the relationships between preceding and following sentences in the conversation content.
Optionally, after the audio data is collected and before the audio data is converted, the method further includes: and detecting the current network connection state, and if the current network connection state is in a disconnection state, switching the current online voice recognition mode to a voice offline recognition mode so as to perform offline recognition processing on the acquired audio data through the voice offline recognition mode.
Optionally, the speech recognizer is a third-party speech recognition device, the third-party speech recognition device is configured with a public cloud, and before the audio data is uploaded to the speech recognizer for conversion, the method further includes: uploading hot words to the public cloud, and performing online voice dictation basic parameter setting on the voice recognizer, wherein the hot words comprise at least one of a train number, flow statement keywords and a railway special name word library, and the online voice dictation basic parameters comprise at least one of punctuation addition parameters, language area parameters, text return format parameters and engine type parameters.
Optionally, before matching the plurality of sets of flow statements with the online identification statement, the method further includes: and carrying out online preprocessing on the flow statement.
Optionally, the online preprocessing of the flow statement includes at least one of the following processing modes: when the flow statement contains a plurality of train numbers, carrying out multi-train statement segmentation; deleting items of content repetition or un-replaced items of the dynamic parameters of the emergency disposal flow in the plurality of sets of flow statements; replacing characters which do not accord with the dispatching command standard expression in the flow statement with Chinese characters; changing the train number to meet the standard reading requirement; and deleting the symbols which do not accord with the dispatching command standard in the flow statement.
Optionally, after performing online preprocessing on the flow statement, the method further includes: acquiring the word number of each group of flow sentences and the word number of online identification sentences; determining the number difference and the number difference rate of each group of flow sentences and online identification sentences according to the number of words of each group of flow sentences and the number of words of the online identification sentences, and screening a plurality of groups of flow sentences of which the number differences and the number difference rates both meet preset conditions from each group of flow sentences to obtain a first flow sentence set, wherein the preset conditions are expressed by the following formulas:
A = |X - Y|, R = A / Y
wherein X is the word count of the flow statement, Y is the word count of the online recognition statement, A is the word-count difference, and R is the word-count difference rate; the preset condition is that A and R each fall below their preset limits.
Optionally, after obtaining the first flow statement set, the method further includes: calculating the single-double word matching rate of each group of flow statements in the first flow statement set and the online identification statement; and screening a plurality of groups of flow statements with the single-word and double-word matching rate larger than a first preset value from the first flow statement set to obtain a second flow statement set.
Optionally, after obtaining the second flow statement set, the method further includes: calculating the edit-distance similarity between each group of flow statements in the second flow statement set and the online recognition statement using the Levenshtein distance algorithm; and screening, from the second flow statement set, groups of flow statements whose edit-distance similarity is greater than a second preset value to obtain a third flow statement set.
Optionally, the step of obtaining, from the groups of flow statements, the flow statements whose confidence is greater than the preset threshold includes: calculating the pinyin-set single-character matching degree between each group of flow statements in the third flow statement set and the online recognition statement; determining the confidence of each group of flow statements from its single/double-character matching rate, edit-distance similarity, and pinyin-set single-character matching degree with respect to the online recognition statement; and screening from the third flow statement set the flow statement whose confidence is greater than the preset threshold as the best-matching flow statement.
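The confidence combination can be sketched as follows. The linear weighting and the threshold are assumptions, since the patent does not specify how the three scores are combined, and the pinyin matching degree is taken as a precomputed input in [0, 1]:

```python
# Sketch of the confidence combination; the weights (0.4, 0.4, 0.2) and
# the 0.8 threshold are illustrative assumptions, not the patent's values.
def confidence(match_rate, edit_similarity, pinyin_match,
               weights=(0.4, 0.4, 0.2)):
    w1, w2, w3 = weights
    return w1 * match_rate + w2 * edit_similarity + w3 * pinyin_match

def best_match(candidates, threshold=0.8):
    # candidates: (statement, match_rate, edit_similarity, pinyin_match)
    scored = [(s, confidence(m, e, p)) for s, m, e, p in candidates]
    best = max(scored, key=lambda t: t[1], default=None)
    return best if best is not None and best[1] > threshold else None
```

If no candidate clears the threshold, `best_match` returns `None`, mirroring the fallback of returning the raw recognition statement.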
Optionally, before performing offline recognition processing on the acquired audio data in a voice offline recognition manner, the method further includes: and dynamically constructing an offline command word grammar file, wherein the offline command word grammar file is applied to offline recognition processing of the acquired audio data.
Optionally, the step of dynamically constructing the offline command word grammar file includes: receiving a voice event, identifying the voice event to obtain a corresponding text statement, and performing offline preprocessing on the text statement to obtain a dynamic parameter field, wherein the voice event comprises a flow voice event and/or a car control event; and acquiring a grammar file template, and replacing corresponding parameters in the grammar file template by adopting the dynamic parameter field so as to dynamically construct an offline command word grammar file.
Optionally, the step of performing offline preprocessing on the text statement to obtain a dynamic parameter field includes: performing two times of preprocessing on the text statement, wherein the first preprocessing is to perform Chinese character replacement processing and invalid statement deletion processing on the text statement, and the second preprocessing is to perform statement segmentation on the text statement after the first preprocessing to obtain a plurality of statement arrays; and packaging the preprocessed statement arrays into a plurality of rule items comprising rule names and rule contents, and connecting the rule names in the rule items in series into a sequence according to the original text statement semantics to obtain the dynamic parameter field.
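The grammar-file construction described above can be sketched as a template substitution. The grammar syntax below is a loose illustration in the style of BNF command-word grammars; the exact format required by the recognition engine, and the rule and slot names, are assumptions:

```python
# Sketch: replace the reserved items (grammar name, slot bodies, station
# and train lists) in a grammar-file template with dynamic parameters.
GRAMMAR_TEMPLATE = """#BNF+IAT 1.0 UTF-8;
!grammar {name};
!slot <station>;
!slot <train>;
{slots}
!start <commands>;
<commands>:{rules};
{bodies}
<station>:{stations};
<train>:{trains};
"""

def build_grammar(name, rules, stations, trains, slots="", bodies=""):
    return GRAMMAR_TEMPLATE.format(
        name=name,
        slots=slots,
        rules="|".join("<%s>" % r for r in rules),
        bodies=bodies,
        stations="|".join(stations),
        trains="|".join(trains),
    )
```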
Optionally, the step of performing sentence segmentation on the text sentence after the first preprocessing includes: judging whether the current text sentence contains a train number; if not, dividing the text sentence into groups of 16 characters and storing them; if a train number is included, extracting the train-number portion and dividing the remaining characters into groups of 16 characters for storage.
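The 16-character segmentation rule can be sketched as below; the train-number pattern (a letter followed by digits) is an assumption for illustration:

```python
import re

# Sketch of the segmentation rule: extract the train-number portion, then
# split the remaining characters into groups of 16 for storage.
def segment(sentence, group=16):
    numbers = re.findall(r"[A-Z]?\d+", sentence)     # train-number parts
    rest = re.sub(r"[A-Z]?\d+", "", sentence)
    chunks = [rest[i:i + group] for i in range(0, len(rest), group)]
    return numbers, chunks
```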
Optionally, the dynamic parameter replacement term in the grammar file template includes at least one of a self-identification header, a grammar name, a slot statement and a dynamic parameter reservation term, and the dynamic parameter reservation term includes at least one of a newly added slot statement, a grammar body, a station name and a train number.
Optionally, the step of performing offline recognition processing on the acquired audio data in the speech offline recognition mode includes: storing the collected audio data in an audio buffer queue, retrieving the audio data from the audio buffer queue, and performing command-word matching on the audio data with the speech recognizer to parse out an offline recognition statement; and replacing the offline recognition statement with the standard flow statement format according to the dynamically constructed offline command word grammar file.
optionally, after storing the collected audio data in an audio buffer queue, before retrieving the audio data from the audio buffer queue, the method further includes: and performing offline voice recognition basic parameter setting on the voice recognizer, wherein the offline voice recognition basic parameters comprise resource storage paths and/or matching result return types.
In order to achieve the above object, a second aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the CTC simulation training speech recognition processing method described above.
In order to achieve the above object, a third aspect of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the CTC simulation training speech recognition processing method is implemented.
The invention has at least the following technical effects:
(1) the invention carries out voice recognition optimization processing aiming at two voice recognition modes of on-line and off-line according to different network configuration conditions, thereby meeting the requirements of different use scenes and providing a plurality of man-machine interaction modes for users.
(2) The method dynamically constructs the off-line command word grammar file according to the process content and the scene information, thereby satisfying the identification of the process sentences and the off-line vehicle control commands in different processes and different scenes, being free from the influence of network factors, and having the characteristics of high response speed and high speech recognition precision.
(3) The invention is based on the voice recognition technology, converts voice content of conversation with other roles in the process of dispatcher simulated fault handling into text information through recognition and processing, and carries out voice/vehicle control event circulation through a CTC simulated training platform so as to complete the standard language assessment of machine automation.
(4) For online speech dictation, the speech recognition effect is optimized mainly in two respects: uploading hot words and post-processing the recognition results. In the recognition-result matching stage, text similarity is compared through single/double-character matching, an edit-distance algorithm, pinyin-set single-character matching, and the like, and the best-matching flow statement and its confidence are calculated, giving the method a flexible recognition mode and a high fault-tolerance rate.
(5) According to the invention, according to the service of a third-party speech recognition library, the characteristics of the process speech event are combined, the accuracy of system speech recognition can be effectively improved through an optimization processing algorithm, the problem that process sentences are difficult to recognize can be effectively solved, the software development cost can be reduced, the system intelligence level is improved, and the product competitiveness is enhanced.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flow chart of speech recognition processing information of a CTC simulation training platform according to an embodiment of the present invention;
fig. 2 is a flowchart of a CTC simulation training speech recognition processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a CTC simulation training speech online identification processing method according to a specific example of the present invention;
fig. 4 is a flowchart illustrating processing of an online speech dictation recognition result according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a dynamic construction of an offline command word grammar file according to an embodiment of the present invention;
fig. 6 is a flowchart of a CTC simulation training speech offline recognition processing method according to a specific example of the present invention.
Detailed Description
Reference will now be made in detail to the present embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
As described in the background art, the railway field still lacks an effective solution for machine assessment of standard flow expressions that has both online and offline speech recognition capability and improves system speech recognition accuracy. Therefore, this embodiment proposes a CTC (Centralized Traffic Control) simulation training speech recognition processing method, applied to a CTC simulation training platform; the speech recognition processing information flow of the platform is shown in fig. 1. SimFAS is the simulated FAS radio-station software provided by the invention, used to simulate communication between the dispatcher and other roles. The SCU (student machine) sends flow content and scene-related information to SimFAS for constructing the offline grammar file, matching online recognition results, setting the call-role buttons, and prompting the flow state. SimFAS starts the Mic recording device, collects the user's speech content, performs speech recognition on it by calling the third-party real-time dictation / offline command-word interface, computes the flow statement most similar to the user's speech through algorithmic processing of the recognition result, displays the recognition result to the user on the human-machine interface, and returns the recognition result and its confidence to the SCU for the circulation and automatic assessment of speech events and vehicle-control events in the flow.
The CTC simulation training speech recognition processing method, the storage medium, and the electronic device according to the embodiment are described below with reference to the drawings.
Fig. 2 is a flowchart of a CTC simulation training speech recognition processing method according to an embodiment of the present invention. As shown in fig. 2, the method includes:
step S1: acquiring dynamic parameters of a rail transit emergency disposal flow, and forming a plurality of groups of flow statements according to the dynamic parameters of the emergency disposal flow.
Because different dispatching desks govern different stations, and because the section diagram, CTC operations, equipment states, route states, fault information, and train running states in the CTC simulation training platform change over time, the corresponding parameters in the grammar file template can be replaced according to the dynamic changes of the emergency disposal flow parameters to obtain multiple groups of parameter-substituted flow statements; these flow statements and their keywords are stored in a two-dimensional array for matching against subsequent online recognition statements.
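The parameter replacement described above can be sketched with simple string templates; the template syntax, placeholder names, and example statements below are illustrative assumptions, not the patent's actual data:

```python
# Sketch: fill flow-statement templates with dynamic emergency-disposal
# parameters and keep (statement, keywords) pairs for later matching.
def build_flow_statements(templates, params):
    statements = []
    for template, keywords in templates:
        try:
            text = template.format(**params)
        except KeyError:
            continue  # skip templates whose parameters are not available
        statements.append((text, keywords))
    return statements

templates = [
    ("{train}次司机，请立即停车", ["停车"]),
    ("{station}站，{track}道开放", ["开放"]),
]
params = {"train": "G107", "station": "北京南", "track": "3"}
```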
Step S2: and collecting audio data sent by the calling object on line, converting the audio data into sentence texts by adopting a voice recognizer to obtain on-line recognition sentences, and returning the on-line recognition sentences in real time.
When the speech recognizer returns the online recognition statement in real time, the method further comprises: the speech recognizer dynamically corrects the online recognition statement according to the dialogue context and the relationships between preceding and following sentences in the conversation content.
After the audio data is collected, and before the audio data is converted, the method further comprises: and detecting the current network connection state, and if the current network connection state is in a disconnection state, switching the current online voice recognition mode to a voice offline recognition mode so as to perform offline recognition processing on the acquired audio data in the voice offline recognition mode.
It should be noted that the speech recognizer is a third-party speech recognition device (such as an iFlytek speech recognition device), and the third-party speech recognition device is configured with a public cloud. Before uploading the audio data to the speech recognizer for conversion, the method further includes: uploading hot words to the public cloud and setting the basic online speech dictation parameters of the speech recognizer, wherein the hot words include at least one of train numbers, flow statement keywords, and a railway proper-noun lexicon, and the basic online speech dictation parameters include at least one of a punctuation-addition parameter, a language/region parameter, a text return-format parameter, and an engine-type parameter.
In this embodiment, the audio data sent by the calling object may be collected online, and a public cloud interface of a third-party speech recognition device, such as an iFlytek speech recognizer, is then invoked to convert the natural-language audio data into sentence text, obtain an online recognition statement, and return it in real time.
As a specific example, the CTC simulation training speech online identification processing method of the present embodiment will be described in detail with reference to fig. 3 and steps S21-S29.
Step S21: start the recording device, collect audio data in PCM (Pulse Code Modulation) format at a 16 kHz sampling rate, 16-bit, mono, and store it in an audio buffer queue;
Step S22: detect the network connection state; if the network connection fails, switch to the offline speech recognition mode and end this call procedure; if the connection succeeds, proceed to the next step;
Step S23: upload the train numbers, flow statement keywords, and railway proper-noun lexicon as hot words to the iFlytek public cloud;
step S24, setting basic parameters of on-line voice dictation, such as punctuation addition, language area, return format and engine type;
step S25, a group of audio data is taken out from the audio buffer queue, a science news audio writing interface is called, and the audio data is converted into text information based on a natural language processing technology;
step S26, after the audio data are uploaded successfully, acquiring uplink flow for detecting the network connection state;
step S27, judging whether an intermediate recognition result is returned or not according to the flag bit, if the condition is true, reading and analyzing a Json file returned by the voice dictation, and acquiring downlink flow, otherwise, performing the next step;
step S28, judging the end state of the identification process, if the user actively hangs up or automatically hangs up when the silence overtime happens, then proceeding the next step, otherwise, entering step S25 again;
and step S29, writing the last frame of empty audio after the recognition is finished, and acquiring and analyzing the final recognition result to obtain the online recognition statement.
Further, before matching the plurality of sets of flow statements with the online identification statement, the method further includes online preprocessing the flow statements, that is, processing the contents of the flow statements as shown in fig. 4.
Wherein, the on-line preprocessing of the flow statement comprises at least one of the following processing modes: when the flow statement contains a plurality of train numbers, carrying out multi-train statement segmentation; deleting items of content repetition or un-replaced items of the dynamic parameters of the emergency disposal flow in the plurality of sets of flow statements; replacing characters which do not accord with the dispatching command standard expression in the flow statement with Chinese characters; changing the train number to meet the standard reading requirement; and deleting the symbols which do not accord with the dispatching command standard in the flow statement.
Specifically, since the dispatcher can call only one train at a time, a speech/control event that includes several called train numbers must be split into separate statements; for example, "driver of G102#G107, please stop immediately" is split into "driver of G107, please stop immediately" and "driver of G102, please stop immediately". In addition, repeated content and items whose dynamic parameters were never replaced are deleted from the flow statement array, and characters with special meanings in the flow statements and keywords are replaced with Chinese characters according to the dispatch command standard expression, for example replacing "FG" with "反高". In this embodiment, all digits in the flow statements are also replaced with Chinese characters, and the symbols contained in the statements are deleted with a regular expression to avoid affecting the matching rate; the train number is also made to support two readings, for example reading the number digit by digit or as a grouped numeral.
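The preprocessing rules above can be sketched as follows; the substitution table, the train-number pattern, and the regular expressions are illustrative assumptions:

```python
import re

# Sketch of the online preprocessing: multi-train splitting, special
# character substitution, digit-to-Chinese replacement, symbol removal.
DIGITS = {"0": "零", "1": "一", "2": "二", "3": "三", "4": "四",
          "5": "五", "6": "六", "7": "七", "8": "八", "9": "九"}
SUBST = {"FG": "反高"}  # special-meaning characters -> Chinese characters

def preprocess(statement):
    # split a multi-train statement such as "G102#G107..." per train number
    m = re.match(r"(.*?)((?:[A-Z]\d+#)+[A-Z]\d+)(.*)", statement)
    if m:
        head, trains, tail = m.groups()
        return [preprocess(head + t + tail)[0] for t in trains.split("#")]
    for k, v in SUBST.items():
        statement = statement.replace(k, v)
    statement = "".join(DIGITS.get(ch, ch) for ch in statement)
    return [re.sub(r"[^\w]", "", statement)]  # drop punctuation/symbols
```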
Step S3: and matching the plurality of sets of flow statements with the online identification statement so as to obtain the flow statement with the confidence coefficient larger than the preset threshold value from the plurality of sets of flow statements.
Specifically, after the online preprocessing of the flow statement, the method further includes: acquiring the word number of each group of flow sentences and the word number of online identification sentences; determining the number difference and the number difference rate of each group of flow sentences and online identification sentences according to the number of words of each group of flow sentences and the number of words of the online identification sentences, and screening a plurality of groups of flow sentences of which the number differences and the number difference rates both meet preset conditions from each group of flow sentences to obtain a first flow sentence set, wherein the preset conditions are expressed by the following formulas:
A = |X - Y|, R = A / Y
wherein X is the word count of the flow statement, Y is the word count of the online recognition statement, A is the word-count difference, and R is the word-count difference rate; the preset condition is that A and R each fall below their preset limits.
As shown in FIG. 4, the present embodiment applies a word-count (length) matching limit to the flow statements. For example, if X is the word count of a flow statement and Y is the word count of the recognition result, the word-count difference is A = |X - Y|, and the word-count difference rate is

R = A / Y
Then, the sets of flow statements in which both A and R meet the preset condition are screened out from all the sets of flow statements.
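A minimal sketch of this word-count screening pass; because the preset-condition formula is an image in the source, the concrete threshold values used here are assumptions for illustration only:

```python
def word_count_filter(flow_statements, recognized, max_diff=8, max_rate=0.4):
    """First screening pass: keep flow statements whose word-count
    difference A = |X - Y| and difference rate R = A / Y stay inside
    thresholds (threshold values are illustrative assumptions)."""
    y = len(recognized)
    kept = []
    for statement in flow_statements:
        a = abs(len(statement) - y)
        r = a / y if y else 1.0
        if a <= max_diff and r <= max_rate:
            kept.append(statement)
    return kept
```

This cheap length check prunes obviously wrong candidates before the more expensive character-level comparisons that follow.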
In one embodiment of the invention, after obtaining the first flow statement set, the method further comprises: calculating the single-double word matching rate between each set of flow statements in the first flow statement set and the online recognition statement; and screening out, from the first flow statement set, the sets whose single-double word matching rate is greater than a first preset value, to obtain a second flow statement set.
Specifically, let the single-word matching rate be ΔS and the double-word matching rate be ΔD; the single-double word matching rate ΔM between a flow statement and the online recognition statement is then calculated as:

ΔM = (ΔS + ΔD) / 2
wherein, if the number of identical single characters is M and the number of identical double-character pairs is N, then:

ΔS = 2M / (X + Y)

ΔD = 2N / (X + Y - 2)
Thus:

ΔM = M / (X + Y) + N / (X + Y - 2)
In this embodiment, the first preset value may be set to 0.7; that is, flow statements in the first flow statement set with ΔM ≥ 0.7 are considered similar in content to the online recognition statement and are screened out for the next operation. If ΔM < 0.7 for all flow statements, the online recognition statement is converted into the standard flow-statement format and output.
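The single-double word matching might be sketched as follows, using Dice-style overlap ratios for ΔS and ΔD (an assumption, since the source formulas are only images):

```python
def single_double_match(flow, recog):
    """Single/double-character matching rate (Delta-M), sketched as the
    average of Dice-style overlap ratios over single characters and
    adjacent character pairs (assumed definitions)."""
    def dice(a, b):
        sa, sb = set(a), set(b)
        if not sa or not sb:
            return 0.0
        return 2 * len(sa & sb) / (len(sa) + len(sb))
    bigrams = lambda s: [s[i:i + 2] for i in range(len(s) - 1)]
    delta_s = dice(flow, recog)                     # single-character rate
    delta_d = dice(bigrams(flow), bigrams(recog))   # double-character rate
    return (delta_s + delta_d) / 2
```

Note how a fully reversed string still scores 0.5 (perfect character overlap, zero pair overlap), which illustrates why character matching alone ignores word order and an edit-distance check is still needed.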
In one embodiment of the invention, after obtaining the second flow statement set, the method further comprises: calculating the edit-distance similarity between each set of flow statements in the second flow statement set and the online recognition statement using the Levenshtein distance (edit distance) algorithm; and screening out, from the second flow statement set, the sets whose edit-distance similarity is greater than a second preset value, to obtain a third flow statement set.
Specifically, the single-double word matching method can only measure character-level similarity between two strings: even if the word order of a sentence is inverted, the matching rate is unaffected, so the relevance and coherence of the words within a sentence are easily ignored. This embodiment therefore introduces the edit-distance algorithm to calculate the minimum number of edit operations between sentences, which avoids the problem of "same characters, different meaning" and further reduces ambiguity between matched sentences.
For example, let string str1 be a 19-character flow statement reporting that a red light band appears on the up line between the Wuxi and Suzhou North stations, and let string str2 be the corresponding 20-character online recognition result; a two-dimensional array x[i, j] records the number of operations needed to convert str1 into str2. The basic steps of the algorithm are as follows:
(1) if the length of str1 or str2 is 0, the shortest edit distance is the length of the other string;
(2) construct the x[i, j] array matrix for storing the number of conversion operations between the strings, where the minimum number of operations to convert str1 into str2 is matrix[19][20];
(3) initialize the x[i, j] array matrix, with the first row and the first column filled with values incrementing from 0, as shown in Table 1 below:
TABLE 1 str1 and str2 character comparison table
[Table image: the initialized character comparison matrix of str1 and str2; not reproduced in this text.]
(4) starting from the first character of str2, fill the cells matrix[1][1] through matrix[19][20] in turn; if the two characters being compared are equal:

matrix[i][j] = Min(matrix[i-1][j] + 1, matrix[i][j-1] + 1, matrix[i-1][j-1]) (6)

and if the two characters are not equal:

matrix[i][j] = Min(matrix[i-1][j], matrix[i][j-1], matrix[i-1][j-1]) + 1 (7)
(5) after scanning all rows and columns in sequence, the shortest edit distance matrix[19][20] = 5 is obtained, as shown in Table 2 below:
TABLE 2 Str1 and Str2 shortest edit distance operation Table
[Table image: the completed shortest-edit-distance operation matrix for str1 and str2; not reproduced in this text.]
(6) the edit-distance similarity ΔT of the two strings str1 and str2 is then:

ΔT = 1 - matrix[19][20] / Max(X, Y) = 1 - 5/20 = 0.75
In this embodiment, the second preset value may be set to 0.7 to filter out flow statements whose relevance and coherence differ greatly from the online recognition statement. If the flow statement array is not empty, the next step is performed; otherwise, the online recognition statement is converted into the standard flow-statement format and output.
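Steps (1) through (5) correspond to the classic Levenshtein dynamic program, which can be sketched as (the normalization in the similarity function is an assumption consistent with the worked example, d = 5 over a 20-character string giving 0.75):

```python
def edit_distance(s1, s2):
    """Classic Levenshtein dynamic program over the two strings."""
    m, n = len(s1), len(s2)
    if m == 0 or n == 0:               # step (1)
        return m or n
    # matrix[i][j] = minimum edits to turn s1[:i] into s2[:j]
    matrix = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        matrix[i][0] = i               # step (3): first column 0..m
    for j in range(n + 1):
        matrix[0][j] = j               # step (3): first row 0..n
    for i in range(1, m + 1):          # step (4)
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            matrix[i][j] = min(matrix[i - 1][j] + 1,        # deletion
                               matrix[i][j - 1] + 1,        # insertion
                               matrix[i - 1][j - 1] + cost) # substitution
    return matrix[m][n]                # step (5)

def edit_similarity(s1, s2):
    """Delta-T = 1 - d / max(len1, len2), an assumed normalization."""
    d = edit_distance(s1, s2)
    return 1 - d / max(len(s1), len(s2), 1)
```

Unlike the character-overlap score, this value drops sharply when the word order of a candidate differs from the recognition result.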
In an embodiment of the present invention, the step of obtaining, by matching, a flow statement whose confidence is greater than the preset threshold includes: calculating the pinyin-set single-character matching degree between each set of flow statements in the third flow statement set and the online recognition statement; determining the confidence of each set of flow statements from its single-double word matching rate, edit-distance similarity and pinyin-set single-character matching degree with respect to the online recognition statement; and screening out, from the third flow statement set, the flow statement whose confidence is greater than the preset threshold as the best matching flow statement.
Specifically, a flow statement that passes the Levenshtein distance processing is already highly similar to the online recognition statement, but homophone errors can still lower both the single-double word matching rate and the edit-distance score, possibly producing a wrong match or a low confidence in the final result. Pinyin single-character matching can compensate for such homophone errors; the specific processing is as follows:
(1) convert the flow statements, their keywords and the online recognition statement into pinyin sets and store them;
(2) search the pinyin set of the online recognition statement for the pinyin set of each flow statement keyword; if it is present, retain the flow statement, otherwise discard the flow statement. After this calculation, if the flow statement array is empty (i.e., the third flow statement set is empty), the online recognition statement is converted into the standard flow-statement format and output; otherwise the next step is performed;
(3) sequentially calculate the single-character matching degree ΔP between the pinyin sets of each flow statement and of the online recognition statement;
(4) average the single-double word matching rate ΔM, the edit-distance similarity ΔT and the pinyin-set single-character matching degree ΔP to obtain the overall matching degree, i.e., the confidence ΔZ:

ΔZ = (ΔM + ΔT + ΔP) / 3
If the number of items in the flow statement array of the third flow statement set is not 0, the confidence is output together with the flow statement with the highest matching degree; otherwise, the online recognition statement is converted into the standard flow-statement format and output for flow circulation and automatic evaluation.
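A sketch of the pinyin compensation and confidence steps; the tiny pinyin table and the equal weighting in ΔZ are assumptions (the patent gives neither the weights nor a conversion library):

```python
# Illustrative pinyin table; a real system would use a full Chinese-to-pinyin
# conversion library, which the patent does not name.
SAMPLE_PINYIN = {"十": "shi", "时": "shi", "司": "si", "机": "ji"}

def pinyin_single_match(flow, recog, pinyin):
    """Delta-P: fraction of the flow statement's characters whose pinyin
    also occurs in the recognition result's pinyin set, so homophones
    (e.g. two characters both read 'shi') still count as matches."""
    flow_py = [pinyin.get(ch, ch) for ch in flow]
    recog_py = {pinyin.get(ch, ch) for ch in recog}
    if not flow_py:
        return 0.0
    return sum(1 for p in flow_py if p in recog_py) / len(flow_py)

def confidence(delta_m, delta_t, delta_p):
    """Delta-Z: the patent averages the three scores with unspecified
    weights; equal weights are assumed here."""
    return (delta_m + delta_t + delta_p) / 3
```

The pinyin pass rewards homophone substitutions that the character-level scores penalize, which is exactly the compensation described above.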
Fig. 3 is a flowchart of the CTC simulation training online speech recognition processing method, and fig. 4 is a flowchart of the online voice dictation recognition result processing method. As shown in figs. 3 and 4, in this embodiment the emphasis of online voice dictation is on hot words and recognition result processing: uploading hot words helps improve the recognition rate of proper nouns, and the recognition result processing computes the closest standard flow term through a similarity matching algorithm, which resolves disordered recognition content and greatly improves the recognition effect. In addition, when the dispatcher places a call, the recording device is started first and the audio data is buffered; the network state is then checked, to avoid a communication failure making recognition impossible; the hot words extracted from the flow statements are then uploaded to the iFLYTEK cloud. During recognition, the upload and download traffic is monitored to detect the connection state, and the other interfaces are called in a way similar to the offline speech recognition mode. The recognition result is divided into an intermediate result and a final result: the intermediate result is real-time recognition content dynamically corrected according to context and is only used to display in-progress recognition content to the user, while the final result is the last recognition statement obtained after the call ends. The closest flow statement is calculated through the similarity algorithm; if the recognition content is not similar to any flow statement, the recognition content converted into the standard flow format is output.
It should be noted that, because the iFLYTEK voice dictation service does not recognize railway proper nouns well, after online speech recognition ends the invention combines flow statement preprocessing, the word-count length limiting rule, single-double word matching, the edit-distance algorithm and pinyin-set single-character matching to calculate the flow statement most similar to the online recognition statement. If the resulting flow statement array is not empty, the flow statement with the highest matching rate and its confidence are taken out and output to the student computer SCU; otherwise, the online recognition statement is converted into the standard flow format and output, so that the SCU performs flow circulation and automatic evaluation according to the output of the simulated FAS radio station software SimFAS.
Further, as described above, when the current network connection is detected to be in the disconnected state, the current online speech recognition mode may be switched to the offline speech recognition mode, so that the acquired audio data is recognized offline.
In an embodiment of the present invention, before performing offline recognition processing on the acquired audio data in the offline speech recognition mode, the method further includes: dynamically constructing an offline command word grammar file, the offline command word grammar file being applied to the offline recognition processing of the acquired audio data.
Specifically, as described above, since the stations under the jurisdiction of different dispatching desks differ, and the section picture, CTC operations, device states, route states, fault information and train operation states in the CTC simulation training platform change over time, the offline command word grammar file needs to be constructed dynamically from the parameter replacement results and the scene information of the emergency handling flow, so as to support speech recognition of flow voice events, train control events and out-of-flow train control operations.
As shown in fig. 5, the step of dynamically constructing the offline command word grammar file includes:
Step S10: receiving a voice event, recognizing the voice event to obtain the corresponding text statement, and performing offline preprocessing on the text statement to obtain the dynamic parameter fields, wherein the voice event includes a flow voice event and/or a train control event.
In this embodiment, the step of performing offline preprocessing on the text statement to obtain the dynamic parameter fields includes: preprocessing the text statement twice, where the first pass performs Chinese-character replacement and invalid-statement deletion on the text statement, and the second pass segments the first-pass output into a plurality of statement arrays.
The step of segmenting the text statement after the first preprocessing includes: judging whether the current text statement contains a train number; if not, dividing the text statement into groups of 16 characters and storing them; if so, extracting the train number part and dividing the remaining characters into groups of 16 characters for storage.
Furthermore, the preprocessed statement arrays can be encapsulated into a plurality of rule items each comprising a rule name and rule content, and the rule names of the rule items are concatenated into a sequence according to the semantics of the original text statement, so as to obtain the dynamic parameter fields.
Specifically, the flow voice event and train control event statements can be preprocessed a first time according to the characteristics of dispatch command phrases, which improves the accuracy of offline speech recognition. The processing is as follows:

(1) letters with special railway meanings are replaced with Chinese characters, for example: "S1LQ" is replaced with its Chinese-character reading;

(2) kilometer values in kilometer posts are replaced with Chinese characters, for example: the 800 in K1805+800 is replaced with "eight hundred";

(3) train speed-limit values are replaced with Chinese characters, for example: 160 is replaced with "one hundred sixty";

(4) digits are replaced with Chinese characters, for example: 2 is replaced with "two";

(5) special symbols in the statements are removed, and repeated invalid statements containing parameters are eliminated.
Further, the text statement after the first preprocessing can be preprocessed a second time; for example, the flow statement content is divided according to the rules defined for iFLYTEK offline command word grammar files. There are two cases:
(1) if the text statement contains no train number, the whole statement is divided into groups of 16 characters and stored;

(2) if the text statement contains a train number, the train number part is extracted; the train number must support two readings, for example "G1974" can be read digit-by-digit or in the railway radio-telephony style (with 1 read as "yao" and 7 as "guai"), and the remaining characters are divided into groups of 16 and stored.
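The two segmentation cases above can be sketched as follows; the train-number pattern and the example inputs are illustrative assumptions:

```python
import re

def segment_statement(text, group=16):
    """Second preprocessing pass: pull out a train-number token (if any)
    as its own segment, then store the remaining characters in groups of
    16, as described for the grammar-rule construction."""
    segments = []
    m = re.search(r"[A-Z]\d{1,4}", text)   # assumed train-number pattern
    if m:
        segments.append(m.group())
        text = text[:m.start()] + text[m.end():]
    segments += [text[i:i + group] for i in range(0, len(text), group)]
    return segments
```

Each returned segment later becomes the content of one grammar rule item.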
Furthermore, the statement arrays produced by the two preprocessing passes can be encapsulated into a plurality of rules (each consisting of a rule name and rule content); the rule names of the segments belonging to the same original statement are chained into a sequence through rule references, and the rules plus the sequence form the grammar file body. In this embodiment, during dynamic construction of the offline command word grammar file, rule names with identical rule content are reused, which reduces software resource consumption and improves grammar construction efficiency. The specific process is as follows:
(1) suppose the statement is: "G1805 driver, the adjacent train is under a speed restriction; the on-board mechanic is permitted to get off for an inspection on the non-crossing side."

(2) the statement is divided into:

① "G one eight zero five | G yao eight dong five" (the digit-by-digit and radio-telephony readings of the train number, with 1 read as "yao" and 0 as "dong");

② "driver, the adjacent train is under a speed restriction; the on-board";

③ "mechanic is permitted to get off for an inspection on the non-crossing side";
(3) the segments are encapsulated into a plurality of rule items:

① <voice11>: G one eight zero five | G yao eight dong five;

② <voice12>: driver, the adjacent train is under a speed restriction; the on-board;

③ <voice13>: mechanic is permitted to get off for an inspection on the non-crossing side;

(4) the above three rules are chained into a sequence through rule references:

<voicetext1>:<voice11><voice12><voice13>;
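The rule encapsulation and sequence chaining of steps (3) and (4) might be sketched as follows; the naming scheme mirrors the worked example:

```python
def build_grammar_rules(segments, seq_name="voicetext1", prefix="voice1"):
    """Wrap each segment as a rule item '<name>:content;' and chain the
    rule names into one sequence rule, mirroring the worked example."""
    rules, names = [], []
    for k, segment in enumerate(segments, start=1):
        name = "<%s%d>" % (prefix, k)
        rules.append("%s:%s;" % (name, segment))
        names.append(name)
    sequence = "<%s>:%s;" % (seq_name, "".join(names))
    return rules, sequence
```

Deduplicating identical rule contents before calling this (reusing rule names, as the embodiment suggests) keeps the generated grammar file small.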
step S20: and acquiring a grammar file template, and replacing corresponding parameters in the grammar file template by adopting dynamic parameter fields to dynamically construct an offline command word grammar file.
The dynamic parameter replacement items in the grammar file template comprise at least one of a self-identification head, a grammar name, a slot statement and a dynamic parameter retention word, and the dynamic parameter retention word comprises at least one of a newly added slot statement, a grammar main body, a station name and a train number.
In this embodiment, the parameter fields in the grammar file template may be sequentially replaced with the dynamic parameter fields obtained in step S10 to complete the dynamic construction of the offline command word grammar file, so as to prepare for the offline speech recognition described below.
In one embodiment of the present invention, the step of performing offline recognition processing on the collected audio data in the offline speech recognition mode includes: storing the collected audio data in an audio buffer queue, fetching audio data from the queue, performing command-word matching on the audio data with the speech recognizer, and parsing the result to obtain an offline recognition statement; and converting the offline recognition statement into the standard flow-statement format according to the dynamically constructed offline command word grammar file.
After the collected audio data is stored in the audio buffer queue and before it is fetched, the method further includes: setting the basic parameters of offline speech recognition on the speech recognizer, where these parameters include a resource storage path and/or a matching result return type.
Specifically, the audio data can be collected, the iFLYTEK interface can then be called to obtain the offline recognition statement, and content such as kilometer posts, speed-limit values, train numbers and special characters in the offline recognition statement is converted into the standard flow format.
As a specific example, the CTC simulation training speech offline recognition processing method of the present embodiment will be described in detail with reference to fig. 6 and steps S31-S36.
Step S31: selecting and constructing different grammar files according to the call object, a train control grammar when calling a train driver, and a flow statement grammar when calling other roles such as a station attendant;

Step S32: starting the recording device, collecting PCM audio data at a 16 kHz sampling rate, 16-bit, mono, and storing it in the audio buffer queue;

Step S33: setting the basic parameters of offline speech recognition, such as the resource storage path and the result return type;

Step S34: fetching a group of audio data from the audio buffer queue and calling the iFLYTEK audio-write interface, so that the speech recognizer matches command words based on the grammar rules and the speech feature information;

Step S35: judging whether the recognition session has ended; if the user hangs up actively, or the session hangs up automatically on silence timeout, proceed to the next step, otherwise return to step S34;

Step S36: writing a final empty audio frame after recognition ends, acquiring and parsing the recognition result to obtain the offline recognition statement, and converting the offline recognition statement into the standard flow-statement format.
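Steps S34 through S36 can be sketched around a hypothetical recognizer object; its write/finalize methods stand in for the real SDK calls (they are not the actual iFLYTEK API), a None chunk models an active hang-up, and a queue timeout models the silence-based automatic hang-up:

```python
from queue import Empty

def offline_recognition_loop(audio_queue, recognizer, silence_timeout=3.0):
    """Drain buffered audio into a (hypothetical) recognizer until the
    caller hangs up actively or the silence timeout expires, then close
    the session and return the parsed result."""
    while True:
        try:
            chunk = audio_queue.get(timeout=silence_timeout)
        except Empty:
            break                      # silence timeout: automatic hang-up
        if chunk is None:
            break                      # caller hung up actively
        recognizer.write(chunk)        # step S34: feed audio for matching
    recognizer.write(b"")              # step S36: final empty frame
    return recognizer.finalize()      # parsed offline recognition statement
```

Buffering through a queue decouples the recording thread from the recognition loop, matching the "cache the audio first" behavior described above.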
In this embodiment, since the offline speech recognition method performs path matching based on the grammar network and the speech feature information, the grammar network is a prerequisite for offline speech recognition. In addition, because the statement content differs between fault handling flows, the dynamic parameter calculation results differ between scenes, and the flow statement content changes as the flow circulates, the grammar files cannot stay fixed; the SimFAS therefore constructs the grammar files dynamically from the flow statements and scene information sent by the SCU.
In the offline speech recognition process, when the dispatcher places a call, the recording device is started first and the audio data is buffered; the iFLYTEK interfaces for user verification, grammar construction and parameter setting are then called in turn to complete the preparation; the buffered audio is read continuously and written to the dictation interface until the dispatcher hangs up actively or the session hangs up automatically after 3 seconds of silence; the matching result of the speech recognizer is obtained through the result return interface; and finally the recognition result is converted into the standard flow format for interface display and flow circulation.
In summary, the invention can optimize speech recognition for both the online and the offline recognition modes according to the network configuration, meeting the requirements of different usage scenarios and providing users with multiple human-machine interaction modes. The invention can also dynamically construct the offline command word grammar file from the flow content and scene information, so that flow statements and offline train control commands can be recognized in different flows and scenes without being affected by network conditions, with fast response and high recognition accuracy. Based on speech recognition technology, the voice content of the dispatcher's conversations with other roles during simulated fault handling is recognized and converted into text, and voice/train-control events are circulated through the CTC simulation training platform, so that automated assessment of standard phraseology can be completed conveniently. For online voice dictation, the recognition effect is optimized mainly through hot word uploading and recognition result matching: in the matching stage, text similarity is compared through single-double word matching, the edit-distance algorithm and pinyin-set single-character matching, and the best matching flow statement and its confidence are calculated, giving the invention a flexible recognition mode and high fault tolerance. Finally, by optimizing the processing algorithms around a third-party speech recognition library and exploiting the characteristics of flow voice events, the invention effectively improves the accuracy of system speech recognition, solves the difficulty of recognizing flow statements, reduces software development cost, raises the intelligence level of the system and enhances product competitiveness.
Further, the present invention also provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the CTC simulation practical training speech recognition processing method is implemented.
Furthermore, the invention also provides electronic equipment which comprises a processor and a memory, wherein the memory is stored with a computer program, and when the computer program is executed by the processor, the CTC simulation practical training voice recognition processing method is realized.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (19)

1.一种CTC仿真实训语音识别处理方法,其特征在于,包括:1. a CTC simulation training speech recognition processing method, is characterized in that, comprises: 获取轨道交通应急处置流程动态参数,并根据应急处置流程动态参数形成多组流程语句;Obtain the dynamic parameters of the emergency disposal process of rail transit, and form multiple sets of process statements according to the dynamic parameters of the emergency disposal process; 在线采集呼叫对象发出的音频数据,采用语音识别器将所述音频数据转换为语句文本,得到在线识别语句,并实时返回所述在线识别语句;Collecting the audio data sent by the call object online, converting the audio data into sentence text using a speech recognizer, obtaining the online recognition sentence, and returning the online recognition sentence in real time; 将多组流程语句与所述在线识别语句进行匹配,以从多组流程语句中匹配得到置信度大于预设阈值的流程语句。Matching multiple groups of process statements with the online identification statement, so as to obtain a process statement with a confidence greater than a preset threshold from the multiple sets of process statements. 2.如权利要求1所述的CTC仿真实训语音识别处理方法,其特征在于,所述语音识别器实时返回所述在线识别语句时,所述方法还包括:所述语音识别器根据对话语境及对话内容上下语句关系动态修正所述在线识别语句。2. CTC simulation training speech recognition processing method as claimed in claim 1, is characterized in that, when described speech recognizer returns described online recognition sentence in real time, described method also comprises: described speech recognizer is based on dialogue words The online recognition sentence is dynamically revised according to the context and the contextual sentence relationship of the dialogue content. 3.如权利要求1所述的CTC仿真实训语音识别处理方法,其特征在于,在采集音频数据之后,对所述音频数据进行转换之前,所述方法还包括:检测当前网络连接状态,若当前网络连接状态处于断开连接状态,则将当前在线语音识别方式切换至语音离线识别方式,以通过所述语音离线识别方式对采集的音频数据进行离线识别处理。3. 
CTC simulation training speech recognition processing method as claimed in claim 1, is characterized in that, after collecting audio data, before described audio data is converted, described method also comprises: detect current network connection state, if If the current network connection state is in the disconnected state, the current online voice recognition mode is switched to the voice offline recognition mode, so as to perform offline recognition processing on the collected audio data through the voice offline recognition mode. 4.如权利要求1所述的CTC仿真实训语音识别处理方法,其特征在于,所述语音识别器为第三方语音识别设备,所述第三方语音识别设备配置有公有云,在将所述音频数据上传至语音识别器进行转换之前,所述方法还包括:向所述公有云上传热词,并对所述语音识别器进行在线语音听写基本参数设置,所述热词包括车次号、流程语句关键字和铁路专有名词库中的至少一种,所述在线语音听写基本参数包括标点添加参数、语言区域参数、文本返回格式参数和引擎类型参数中的至少一种。4. CTC simulation training speech recognition processing method as claimed in claim 1, is characterized in that, described speech recognizer is a third-party speech recognition device, and described third-party speech recognition device is configured with public cloud, in the described Before the audio data is uploaded to the speech recognizer for conversion, the method further includes: uploading a hot word to the public cloud, and setting basic parameters of online voice dictation on the speech recognizer, the hot word including the vehicle number, the process flow At least one of sentence keywords and railway proper noun database, and the basic parameters of online voice dictation include at least one of punctuation addition parameters, language area parameters, text return format parameters and engine type parameters. 5.如权利要求1所述的CTC仿真实训语音识别处理方法,其特征在于,将多组流程语句与所述在线识别语句进行匹配之前,所述方法还包括:对所述流程语句进行在线预处理。5. CTC simulation training speech recognition processing method as claimed in claim 1, is characterized in that, before multiple groups of flow statement and described online recognition statement are matched, described method also comprises: described flow statement is carried out online preprocessing. 
6.如权利要求5所述的CTC仿真实训语音识别处理方法,其特征在于,对所述流程语句进行在线预处理包括以下处理方式中的至少一种:当流程语句含有多车次号时,进行多车次语句分割;删除多组流程语句中内容重复或应急处置流程动态参数未替换的项;将流程语句中不符合调度指挥标准用语的字符替换为汉字;更改车次号以满足标准读法要求;删除流程语句中不符合调度指挥标准的符号。6. CTC simulation training speech recognition processing method as claimed in claim 5, is characterized in that, carrying out online preprocessing to described flow statement and comprising at least one in the following processing modes: when flow statement contains multi-vehicle number, Perform multi-vehicle sentence segmentation; delete multiple sets of process statements with duplicate content or items that are not replaced by dynamic parameters of the emergency response process; replace characters in the process statements that do not meet the standard terms of dispatch and command with Chinese characters; change the train number to meet the standard reading requirements ;Delete the symbols that do not meet the dispatch command standard in the process statement. 7.如权利要求6所述的CTC仿真实训语音识别处理方法,其特征在于,在对所述流程语句进行在线预处理之后,所述方法还包括:7. 
CTC simulation training speech recognition processing method as claimed in claim 6, is characterized in that, after carrying out online preprocessing to described flow statement, described method also comprises: 获取各组流程语句的字数和在线识别语句的字数;Obtain the word count of each group of process statements and the word count of online recognition statements; 根据各组流程语句的字数和在线识别语句的字数确定各组流程语句与在线识别语句的字数差和字数差率,并从各组流程语句中筛选出字数差和字数差率均符合预设条件的若干组流程语句,得到第一流程语句集合,所述预设条件采用如下公式表示:According to the number of words in each group of process statements and the number of words in the online recognition statement, determine the word number difference and word number difference rate between each group of process statements and the online recognition statement, and filter out the word number difference and word number difference rate from each group of process statements that meet the preset conditions Several groups of process statements are obtained, and the first process statement set is obtained, and the preset condition is expressed by the following formula:
Figure FDA0003617520900000021
Figure FDA0003617520900000021
其中,Y为在线识别语句的字数,A为字数差,R为字数差率。Among them, Y is the word count of the online recognition sentence, A is the word count difference, and R is the word count difference rate.
8.如权利要求7所述的CTC仿真实训语音识别处理方法,其特征在于,在得到第一流程语句集合之后,所述方法还包括:8. CTC simulation training speech recognition processing method as claimed in claim 7, is characterized in that, after obtaining the first flow statement set, described method also comprises: 计算所述第一流程语句集合中各组流程语句与所述在线识别语句的单双字匹配率;Calculating the single-double-word matching rate between each group of process statements in the first process statement set and the online recognition statement; 从所述第一流程语句集合中筛选出单双字匹配率大于第一预设值的若干组流程语句,得到第二流程语句集合。From the first process statement set, several groups of process statements whose single-double-word matching rate is greater than the first preset value are screened out to obtain a second process statement set. 9.如权利要求8所述的CTC仿真实训语音识别处理方法,其特征在于,在得到第二流程语句集合之后,所述方法还包括:9. CTC simulation training speech recognition processing method as claimed in claim 8, is characterized in that, after obtaining the second process statement set, described method also comprises: 采用Levenshtein-Distance算法计算所述第二流程语句集合中各组流程语句与所述在线识别语句的距离-编辑相似度;Using the Levenshtein-Distance algorithm to calculate the distance-editing similarity between each group of process statements in the second process statement set and the online recognition statement; 从所述第二流程语句集合中筛选出距离-编辑相似度大于第二预设值的若干组流程语句,得到第三流程语句集合。From the second process statement set, several groups of process statements whose distance-editing similarity is greater than the second preset value are selected to obtain a third process statement set. 10.如权利要求9所述的CTC仿真实训语音识别处理方法,其特征在于,所述从多组流程语句中匹配得到置信度大于预设阈值的流程语句的步骤,包括:10. 
The CTC simulation training speech recognition processing method according to claim 9, wherein the step of matching, from the multiple groups of flow statements, a flow statement whose confidence is greater than a preset threshold comprises: calculating the pinyin-set single-word matching degree between each group of flow statements in the third flow statement set and the online recognition statement; determining the confidence of each group of flow statements according to its single/double-word matching rate, distance-edit similarity, and pinyin-set single-word matching degree with respect to the online recognition statement; and screening out, from the third flow statement set, the flow statement whose confidence is greater than the preset threshold as the best-matching flow statement.
11. The CTC simulation training speech recognition processing method according to claim 3, wherein before performing offline recognition processing on the collected audio data by means of offline speech recognition, the method further comprises: dynamically constructing an offline command-word grammar file, the offline command-word grammar file being applied to the offline recognition processing of the collected audio data.
12.
The CTC simulation training speech recognition processing method according to claim 11, wherein the step of dynamically constructing the offline command-word grammar file comprises: receiving a voice event, recognizing the voice event to obtain a corresponding text statement, and performing offline preprocessing on the text statement to obtain dynamic parameter fields, the voice event comprising a flow voice event and/or a train-control event; and obtaining a grammar file template and replacing the corresponding parameters in the grammar file template with the dynamic parameter fields, so as to dynamically construct the offline command-word grammar file.
13. The CTC simulation training speech recognition processing method according to claim 12, wherein the step of performing offline preprocessing on the text statement to obtain the dynamic parameter fields comprises: preprocessing the text statement twice, the first pass performing Chinese-character replacement and invalid-statement deletion on the text statement, and the second pass performing statement segmentation on the text statement produced by the first pass, to obtain multiple statement arrays; and encapsulating the preprocessed statement arrays into multiple rule items, each comprising a rule name and rule content, and concatenating the rule names of the rule items into a sequence according to the semantics of the original text statement, to obtain the dynamic parameter fields.
14.
The CTC simulation training speech recognition processing method according to claim 13, wherein the step of performing statement segmentation on the text statement produced by the first preprocessing pass comprises: judging whether the current text statement contains a train number; if it does not contain a train number, splitting the text statement into groups of 16 characters and saving them; and if it does contain a train number, extracting the train-number part and splitting the remaining characters into groups of 16 characters for saving.
15. The CTC simulation training speech recognition processing method according to claim 12, wherein the dynamic-parameter replacement items in the grammar file template comprise at least one of a self-identifying header, a grammar name, a slot declaration, and dynamic-parameter reserved-word items, the dynamic-parameter reserved-word items comprising at least one of a newly added slot declaration, a grammar body, a station name, and a train number.
16.
The CTC simulation training speech recognition processing method according to claim 11, wherein the step of performing offline recognition processing on the collected audio data by means of offline speech recognition comprises: storing the collected audio data in an audio buffer queue, retrieving the audio data from the audio buffer queue, and performing command-word matching on the audio data with the speech recognizer, so as to parse out an offline recognition statement; and replacing the offline recognition statement with the standard flow-statement format according to the dynamically constructed offline command-word grammar file.
17. The CTC simulation training speech recognition processing method according to claim 16, wherein after storing the collected audio data in the audio buffer queue and before retrieving the audio data from the audio buffer queue, the method further comprises: setting basic offline speech recognition parameters for the speech recognizer, the basic offline speech recognition parameters comprising a resource storage path and/or a matching-result return type.
18. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the CTC simulation training speech recognition processing method according to any one of claims 1-17 is implemented.
19.
An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein, when the computer program is executed by the processor, the CTC simulation training speech recognition processing method according to any one of claims 1-17 is implemented.
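The matching cascade of claims 8-10 can be sketched end to end. The weights and thresholds below are illustrative assumptions (the claims leave them as unspecified preset values), and the pinyin-set single-word matching degree is stood in for by a character-level match, since real pinyin conversion would require an external library such as pypinyin:

```python
# Sketch of the claim 8-10 cascade: single/double-word match rate,
# Levenshtein edit similarity, and a combined confidence score.
# Weights and thresholds are assumed for illustration; the pinyin step is
# approximated at the character level.

def ngram_match_rate(candidate, recognized):
    """Single/double-word (1-gram + 2-gram) match rate against the
    online-recognized statement."""
    def ngrams(s, n):
        return {s[i:i + n] for i in range(len(s) - n + 1)}
    grams_c = ngrams(candidate, 1) | ngrams(candidate, 2)
    grams_r = ngrams(recognized, 1) | ngrams(recognized, 2)
    return len(grams_c & grams_r) / len(grams_r) if grams_r else 0.0

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (Levenshtein-Distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def edit_similarity(a, b):
    """Distance-edit similarity: 1 - distance / max length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def confidence(candidate, recognized, weights=(0.4, 0.4, 0.2)):
    """Combine the three scores into one confidence value (weights assumed).
    The third term stands in for the pinyin-set single-word matching degree."""
    w1, w2, w3 = weights
    return (w1 * ngram_match_rate(candidate, recognized)
            + w2 * edit_similarity(candidate, recognized)
            + w3 * ngram_match_rate(candidate, recognized))

def best_match(flow_statements, recognized, threshold=0.6):
    """Return the flow statement whose confidence exceeds the preset
    threshold, or None if no statement qualifies."""
    scored = [(confidence(s, recognized), s) for s in flow_statements]
    best = max(scored, default=(0.0, None))
    return best[1] if best[0] > threshold else None
```

Each stage in the claims filters the candidate set before the next, more expensive score is computed; the sketch above collapses the three stages into one pass for brevity.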
CN202210452691.3A 2022-04-27 2022-04-27 CTC simulation training speech recognition processing method, storage medium and electronic device Active CN114882886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210452691.3A CN114882886B (en) 2022-04-27 2022-04-27 CTC simulation training speech recognition processing method, storage medium and electronic device


Publications (2)

Publication Number Publication Date
CN114882886A true CN114882886A (en) 2022-08-09
CN114882886B CN114882886B (en) 2024-10-01

Family

ID=82672430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210452691.3A Active CN114882886B (en) 2022-04-27 2022-04-27 CTC simulation training speech recognition processing method, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN114882886B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092342A (en) * 2022-11-18 2023-05-09 四川大学 A method and system for automatic response and quality assessment of controller simulation training

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102862587A (en) * 2012-08-20 2013-01-09 泉州市铁通电子设备有限公司 Method and equipment for analyzing rolling stock and locomotive inter-control voice of railways
EP2798634A1 (en) * 2011-12-29 2014-11-05 Intel Corporation Speech recognition utilizing a dynamic set of grammar elements
CN109204007A (en) * 2018-08-29 2019-01-15 江西理工大学 A kind of unpiloted suspension type magnetic suspension train and its control method
US20190332915A1 (en) * 2018-04-26 2019-10-31 Wipro Limited Method and system for interactively engaging a user of a vehicle
US20190371295A1 (en) * 2017-03-21 2019-12-05 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for speech information processing
CN111326147A (en) * 2018-12-12 2020-06-23 北京嘀嘀无限科技发展有限公司 Voice recognition method and device, electronic equipment and storage medium
CN112017642A (en) * 2019-05-31 2020-12-01 华为技术有限公司 Method, device and equipment for speech recognition and computer readable storage medium
CN112435664A (en) * 2020-11-11 2021-03-02 郑州捷安高科股份有限公司 Evaluation system and method based on voice recognition and electronic equipment



Also Published As

Publication number Publication date
CN114882886B (en) 2024-10-01

Similar Documents

Publication Publication Date Title
CN114547329B (en) Method for establishing pre-trained language model, semantic parsing method and device
US11380327B2 (en) Speech communication system and method with human-machine coordination
CN113361266B (en) Text error correction method, electronic device and storage medium
US10418032B1 (en) System and methods for a virtual assistant to manage and use context in a natural language dialog
CN114547274B (en) Multi-turn question and answer method, device and equipment
US7636657B2 (en) Method and apparatus for automatic grammar generation from data entries
CN111177324B (en) Method and device for intent classification based on speech recognition results
CA2437620C (en) Hierarchichal language models
JP4105841B2 (en) Speech recognition method, speech recognition apparatus, computer system, and storage medium
CN114120985B (en) Soothing interaction method, system, device and storage medium for intelligent voice terminal
CN116450799B (en) Intelligent dialogue method and equipment applied to traffic management service
CN114333838A (en) Method and system for correcting voice recognition text
WO2025000835A1 (en) Instruction execution method and apparatus based on language model, and storage medium
CN116092342A (en) A method and system for automatic response and quality assessment of controller simulation training
CN118968978A (en) A human-computer interaction method, device and electronic device based on semantic analysis
CN112148845A (en) Method and device for inputting verbal resources of robot, electronic equipment and storage medium
JP2004094257A (en) Method and apparatus for generating question of decision tree for speech processing
CN114882886A (en) CTC simulation training voice recognition processing method, storage medium and electronic equipment
CN115620732A (en) A human-computer interaction method, system, electronic device, storage medium and vehicle
CN114547068A (en) Data generation method, device, equipment and computer readable storage medium
HK40070308A (en) CTC simulation training speech recognition processing method, storage medium and electronic equipment
CN113591441B (en) Voice editing method and device, storage medium and electronic device
CN114218424B (en) Voice interaction method and system for tone word insertion based on wav2vec
HK40070308B (en) CTC simulation training speech recognition processing method, storage medium and electronic equipment
Komatani et al. Efficient dialogue strategy to find users’ intended items from information query results

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (country of ref document: HK; legal event code: DE; document number: 40070308)
GR01 Patent grant