
CN107305541A - Speech recognition text segmentation method and device - Google Patents

Speech recognition text segmentation method and device

Info

Publication number
CN107305541A
CN107305541A (application CN201610256898.8A; granted as CN107305541B)
Authority
CN
China
Prior art keywords
segmentation; voice segments; identification text; feature
Prior art date
Legal status
Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN201610256898.8A
Other languages
Chinese (zh)
Other versions
CN107305541B (en)
Inventor
胡尹
潘清华
王金钖
胡国平
胡郁
Current Assignee (the listed assignee may be inaccurate; Google has not performed a legal analysis)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Application filed by iFlytek Co Ltd
Priority to CN201610256898.8A (CN107305541B)
Publication of CN107305541A
Application granted; publication of CN107305541B
Legal status: Active

Classifications

    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/30 Semantic analysis
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/04 Segmentation; Word boundary detection


Abstract

The invention discloses a speech recognition text segmentation method and device. The method includes: performing endpoint detection on speech data to obtain each speech segment and the start and end frame numbers of each segment; performing speech recognition on each speech segment to obtain the recognized text corresponding to each segment; extracting segmentation features from the recognized text corresponding to each segment; using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text of the speech data, so as to determine the positions where paragraph breaks are needed; and segmenting the recognized text of the speech data according to the segmentation detection result. The invention can segment recognized text automatically, making its discourse structure clearer.

Description

Speech recognition text segmentation method and device
Technical field
The present invention relates to the field of natural language processing, and in particular to a speech recognition text segmentation method and device.
Background art
With the development of voice technology, automatic speech recognition has been widely applied in every field of life, and converting speech into text brings great convenience, for example transcribing a meeting recording into text to distribute as minutes to the participants, or transcribing an interview recording into text as the basis for a news article. However, the text produced by speech recognition lacks the clear discourse structure of human-edited text, such as division into paragraphs. As a result, users reading the recognized text often find it hard to locate its key points or topics; especially when the text is long and covers several topics, it is even harder to grasp its structure and accurately find the content of each topic. How to present recognized text clearly and help users understand its content is therefore particularly important for the display of speech recognition text.
In the prior art, the recognized text of speech data is usually shown to the user directly, without any processing of the recognition result; alternatively, the structure of the recognized text is adjusted manually before display, for example by dividing the text into paragraphs according to its content and showing the adjusted text to the user. When the recognized text is long, such manual adjustment is labor-intensive, inefficient, and time-consuming, so a recognition system can hardly achieve practical effect.
Summary of the invention
The present invention provides a speech recognition text segmentation method and device, to solve the prior-art problem that manually adjusting the structure of recognized text involves heavy workload and low efficiency.
To this end, the present invention provides the following technical solution:
A speech recognition text segmentation method, including:
performing endpoint detection on speech data to obtain each speech segment and the start and end frame numbers of each speech segment;
performing speech recognition on each speech segment to obtain the recognized text corresponding to each segment;
extracting segmentation features from the recognized text corresponding to each segment;
using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text of the speech data, so as to determine the positions where segmentation is needed;
segmenting the recognized text of the speech data according to the segmentation detection result.
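The claimed steps can be sketched as a small pipeline. This is a hypothetical illustration only: the patent names no APIs, so every callable here (`detect_endpoints`, `recognize`, `extract_features`, `seg_model`) is an assumed placeholder injected by the caller.

```python
# Hypothetical sketch of the claimed method; all callables are placeholders
# injected by the caller, since the patent specifies no concrete APIs.

def segment_pipeline(speech_data, detect_endpoints, recognize,
                     extract_features, seg_model):
    # Step 1: endpoint detection -> (start_frame, end_frame) per speech segment
    segments = detect_endpoints(speech_data)
    # Step 2: speech recognition -> recognized text per segment
    texts = [recognize(speech_data, s, e) for s, e in segments]
    # Step 3: segmentation features per segment's recognized text
    feats = [extract_features(segments, texts, i) for i in range(len(segments))]
    # Step 4: segmentation detection with the pre-built model
    needs_break = [seg_model(f) for f in feats]
    # Step 5: a True flag marks an end position that needs segmentation
    return texts, needs_break

# Toy run with stub components: three segments, break where the pause >= 5 frames
texts, needs_break = segment_pipeline(
    "dummy audio",
    lambda sd: [(0, 10), (12, 20), (25, 40)],
    lambda sd, s, e: "t%d" % s,
    lambda segs, txts, i: {"gap": segs[i][0] - segs[i - 1][1] if i else 0},
    lambda f: f["gap"] >= 5)
```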
Preferably, the method also includes building the segmentation model as follows:
collecting speech data;
performing endpoint detection on the collected speech data to obtain each speech segment;
performing speech recognition on each speech segment to obtain its recognized text;
annotating the segmentation information of each segment's recognized text, where the segmentation information indicates whether the end position of the current segment's recognized text needs segmentation;
extracting the segmentation features of each segment's recognized text;
building the segmentation model with the segmentation features and the segmentation information as training data.
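The patent leaves the model family open, so the following stand-in (an assumption, not the patent's method) fits a tiny logistic-regression classifier purely to illustrate how the annotated segmentation labels and extracted features become supervised training data.

```python
# Minimal logistic-regression stand-in for the segmentation model; the patent
# does not specify the model type, so this is an illustrative assumption.
import math

def _sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))

def build_seg_model(samples, labels, lr=0.1, epochs=500):
    """Train on segmentation features (lists of floats) and 0/1 labels;
    returns a function mapping a feature vector to a break probability."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy data: single feature = pause before the next segment, in frames;
# long pauses tend to coincide with annotated paragraph boundaries.
X = [[2.0], [40.0], [3.0], [55.0], [1.0], [60.0]]
y = [0, 1, 0, 1, 0, 1]
model = build_seg_model(X, y)
```

In a real system the feature vectors would come from the acoustic and semantic extraction steps described below, and the labels from the annotation step.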
Preferably, extracting the segmentation features of each segment's recognized text includes:
extracting segmentation features of each speech segment from the acoustics of the speech data, and taking these as the first segmentation features of the segment's recognized text; and/or
extracting segmentation features from the semantics of the recognized text, and taking these as the second segmentation features of the recognized text.
Preferably, the first segmentation features include the duration of the current speech segment, and additionally the distance between the current segment and the previous segment, and/or the distance between the current segment and the next segment;
extracting the segmentation features of each segment from the acoustics of the speech data includes:
computing the difference between the end frame number and the start frame number of the current segment, and taking the difference as the duration of the current segment;
and also:
computing the difference between the start frame number of the current segment and the end frame number of the previous segment, and taking the difference as the distance between the current and previous segments; and/or
computing the difference between the start frame number of the next segment and the end frame number of the current segment, and taking the difference as the distance between the current and next segments.
Preferably, the first segmentation features also include whether the speaker of the current segment is the same as the speaker of the previous segment, and/or whether the speaker of the current segment is the same as the speaker of the next segment;
extracting the segmentation features of each segment from the acoustics of the speech data also includes:
performing speaker change point detection on the speech data using speaker separation techniques;
determining from the speaker change point detection result whether the speaker of the current segment is the same as that of the previous segment, and/or whether the speaker of the current segment is the same as that of the next segment.
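Once change point detection has assigned a speaker label to each segment, the two speaker features reduce to neighbour comparisons. A minimal realization (an assumption about the data layout, not specified by the patent):

```python
# Assumed layout: one speaker ID per speech segment, produced upstream by
# speaker change point detection. The feature is just a neighbour comparison.

def speaker_features(speakers, i):
    """speakers: per-segment speaker IDs; returns (same_as_prev, same_as_next)
    for segment i. The first segment has no previous neighbour and the last
    has no next one, so those comparisons default to False."""
    same_prev = i > 0 and speakers[i] == speakers[i - 1]
    same_next = i < len(speakers) - 1 and speakers[i] == speakers[i + 1]
    return same_prev, same_next

speakers = ["A", "A", "B", "B"]
```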
Preferably, the second segmentation features include any one or more of the following:
the forward unsegmented sentence count, i.e. the total number of sentences in all recognized text between the start of the current segment's recognized text and the previous segmentation marker;
the backward unsegmented sentence count, i.e. the total number of sentences in all recognized text after the current segment's recognized text;
the number of sentences in the current segment's recognized text;
the similarity between the current segment's recognized text and the previous segment's recognized text;
the similarity between the current segment's recognized text and the next segment's recognized text.
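The patent does not fix the similarity measure for the last two features; cosine similarity over word counts is one common choice, sketched here under that assumption with pre-tokenized word lists as input.

```python
# Assumed similarity measure (the patent leaves it unspecified): cosine
# similarity over word-count vectors of two recognized texts.
from collections import Counter
import math

def text_similarity(a_words, b_words):
    """Inputs are pre-tokenized word lists; returns cosine similarity in [0, 1]."""
    ca, cb = Counter(a_words), Counter(b_words)
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Adjacent segments that continue one topic tend to share vocabulary and score high; a drop in similarity is evidence for a paragraph break.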
Preferably, extracting segmentation features from the semantics of the recognized text includes:
correcting the recognized text of the speech data, where the correction includes adding punctuation to the recognized text of the speech data;
extracting segmentation features from the semantics of the corrected recognized text.
Preferably, the correction also includes any one or more of the following:
filtering abnormal words from the recognized text of the speech data;
smoothing the recognized text of the speech data;
normalizing digits in the recognized text of the speech data;
performing text replacement on the recognized text of the speech data, where the replacement includes converting lowercase English letters in the recognized text to uppercase or vice versa, and/or replacing sensitive words in the recognized text with special characters.
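The text replacement step described above (case conversion plus sensitive-word masking) is simple to realize; the sketch below is an assumed implementation, with `*` chosen arbitrarily as the special character.

```python
# Sketch of the described text replacement: English case conversion and
# masking of sensitive words. The '*' mask character is an assumption.
import re

def replace_text(text, sensitive_words, to_upper=True):
    # Case conversion of English letters
    text = text.upper() if to_upper else text.lower()
    # Replace each sensitive word with '*' of the same length, ignoring case
    for w in sensitive_words:
        pattern = re.compile(re.escape(w), re.IGNORECASE)
        text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text
```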
Preferably, using the extracted segmentation features and the pre-built segmentation model to perform segmentation detection on the recognized text of the speech data, so as to determine the positions needing segmentation, includes:
taking speech segments as the unit, feeding the segmentation features of each segment's recognized text into the segmentation model in turn for segmentation detection, and determining whether the end position of each segment's recognized text needs segmentation.
Preferably, the method also includes:
showing the segmented recognized text to the user; or
extracting the topic of each paragraph of the segmented recognized text and showing the topics to the user;
and, when the user is detected to be interested in a topic, showing the user the recognized text of the paragraph corresponding to that topic.
A speech recognition text segmentation device, including:
an endpoint detection module, for performing endpoint detection on speech data to obtain each speech segment and the start and end frame numbers of each segment;
a speech recognition module, for performing speech recognition on each speech segment to obtain the recognized text corresponding to each segment;
a feature extraction module, for extracting the segmentation features of each segment's recognized text;
a segmentation detection module, for using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text of the speech data, so as to determine the positions needing segmentation;
a segmentation module, for segmenting the recognized text of the speech data according to the segmentation detection result.
Preferably, the device also includes a segmentation model building module for building the segmentation model; the segmentation model building module includes:
a data collection unit, for collecting speech data;
an endpoint detection unit, for performing endpoint detection on the speech data collected by the data collection unit to obtain each speech segment;
a speech recognition unit, for performing speech recognition on each segment to obtain its recognized text;
an annotation unit, for annotating the segmentation information of each segment's recognized text, the segmentation information indicating whether the end position of the current segment's recognized text needs segmentation;
a feature extraction unit, for extracting the segmentation features of each segment's recognized text;
a training unit, for building the segmentation model with the segmentation features and the segmentation information as training data.
Preferably, the feature extraction module includes:
a first feature extraction module, for extracting segmentation features of each segment from the acoustics of the speech data and taking them as the first segmentation features of the segment's recognized text; and/or
a second feature extraction module, for extracting segmentation features from the semantics of the recognized text and taking them as the second segmentation features of the recognized text.
Preferably, the first feature extraction module includes:
a duration calculation unit, for computing the difference between the end frame number and the start frame number of the current segment, and taking the difference as the duration of the current segment;
a distance calculation unit, for computing the difference between the start frame number of the current segment and the end frame number of the previous segment and taking it as the distance between the current and previous segments, and/or computing the difference between the start frame number of the next segment and the end frame number of the current segment and taking it as the distance between the current and next segments.
Preferably, the first feature extraction module also includes:
a speaker change point detection unit, for performing speaker change point detection on the speech data using speaker separation techniques;
a speaker determination unit, for determining from the speaker change point detection result whether the speaker of the current segment is the same as that of the previous segment, and/or whether the speaker of the current segment is the same as that of the next segment.
Preferably, the second segmentation features include any one or more of the following:
the forward unsegmented sentence count, i.e. the total number of sentences in all recognized text between the start of the current segment's recognized text and the previous segmentation marker;
the backward unsegmented sentence count, i.e. the total number of sentences in all recognized text after the current segment's recognized text;
the number of sentences in the current segment's recognized text;
the similarity between the current segment's recognized text and the previous segment's recognized text;
the similarity between the current segment's recognized text and the next segment's recognized text.
Preferably, the second feature extraction module includes:
a correction unit for correcting the recognized text of the speech data, the correction unit including a punctuation subunit for adding punctuation to the recognized text of the speech data;
a feature extraction unit, for extracting segmentation features from the semantics of the corrected recognized text.
Preferably, the correction unit also includes any one or more of the following subunits:
a filtering subunit, for filtering abnormal words from the recognized text of the speech data;
a smoothing subunit, for smoothing the recognized text of the speech data;
a normalization subunit, for normalizing digits in the recognized text of the speech data;
a text replacement subunit, for performing text replacement on the recognized text of the speech data, the replacement including converting lowercase English letters in the recognized text to uppercase or vice versa, and/or replacing sensitive words in the recognized text with special characters.
Preferably, the segmentation detection module is specifically used, taking speech segments as the unit, to feed the segmentation features of each segment's recognized text into the segmentation model in turn for segmentation detection, determining whether the end position of each segment's recognized text needs segmentation.
Preferably, the device also includes:
a first display module, for showing the segmented recognized text to the user; or
a topic extraction module, for extracting the topic of each paragraph of the segmented recognized text;
a second display module, for showing the topics to the user;
a sensing module, for detecting the topic the user is interested in and, when such a topic is detected, triggering the second display module to show the user the recognized text of the paragraph corresponding to that topic.
The present invention provides a speech recognition text segmentation method and device. Endpoint detection is performed on speech data to obtain each speech segment; speech recognition is performed on each segment to obtain its recognized text; segmentation features are then extracted from each segment's recognized text; and, using the extracted segmentation features and a pre-built segmentation model, segmentation detection is performed on the recognized text of the speech data to determine the positions needing segmentation, with the recognized text segmented according to the detection result. The discourse structure of the recognized text is thereby adjusted automatically and made clearer, helping users quickly understand the content and improving reading efficiency.
Further, the segmentation features can be extracted from the acoustics of the speech data or from the semantics of the recognized text; of course, the features extracted at both levels can also be combined, with a corresponding segmentation model used to perform segmentation detection on the recognized text and determine the positions needing segmentation, which can further improve segmentation accuracy.
Further, all of the segmented recognized text can be shown to the user; alternatively, the topic of each paragraph can be extracted and shown to the user first, and the content of a paragraph shown only when the user wants to view it, helping the user quickly find the content of interest.
Brief description of the drawings
To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can also obtain other drawings from them.
Fig. 1 is a flowchart of the speech recognition text segmentation method of an embodiment of the present invention;
Fig. 2 is a flowchart of building the segmentation model in an embodiment of the present invention;
Fig. 3 is a structural diagram of a speech recognition text segmentation device of an embodiment of the present invention;
Fig. 4 is a structural diagram of the segmentation model building module in an embodiment of the present invention;
Fig. 5 is another structural diagram of the speech recognition text segmentation device of an embodiment of the present invention;
Fig. 6 is yet another structural diagram of the speech recognition text segmentation device of an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
As shown in Fig. 1, the speech recognition text segmentation method of an embodiment of the present invention includes the following steps:
Step 101: perform endpoint detection on the speech data to obtain each speech segment and the start and end frame numbers of each segment.
The speech data can be recordings from practical applications, such as meeting recordings or interview recordings.
Endpoint detection finds the start point and end point of each speech segment in a given speech signal. Any existing endpoint detection method can be used; the embodiments of the present invention place no limit on this.
Step 102: perform speech recognition on each speech segment to obtain the recognized text corresponding to each segment.
Specifically, features can be extracted from each speech segment, such as MFCC (Mel Frequency Cepstrum Coefficient) features; the extracted feature data is then decoded using a pre-trained acoustic model and language model, and the recognized text of each segment is obtained from the decoding result. The detailed speech recognition process is the same as in the prior art and is not described here.
Step 103: extract the segmentation features of the recognized text corresponding to each speech segment.
In practice, the segmentation features can be extracted from the acoustics of the speech data or from the semantics of the recognized text; of course, features extracted at both levels can also be combined, with a corresponding segmentation model used to perform segmentation detection on the recognized text of the speech data and determine the positions needing segmentation, which can further improve segmentation accuracy.
Step 104: use the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text of the speech data, so as to determine the positions needing segmentation.
Specifically, taking speech segments as the unit, the segmentation features of each segment's recognized text are fed into the segmentation model for segmentation detection, determining whether the end position of each segment's recognized text needs segmentation.
Note that in practice the output of the segmentation model can be either whether the end position of the current segment's recognized text needs segmentation, or the probability that it needs segmentation. The choice of output parameter does not affect the training process of the model; only different input/output parameters need to be set during training. The specific training process of the segmentation model is described in detail later.
If the segmentation model outputs a probability, a corresponding threshold can be preset: when the probability exceeds the threshold, the end position of the current segment's recognized text is considered to need segmentation.
Step 105: segment the recognized text of the speech data according to the segmentation detection result.
Specifically, a segmentation marker can be added at each end position of the recognized text that needs segmentation, so that at display time the recognized text of the speech data can conveniently be shown in paragraphs according to the markers.
Note that in practice the segmentation features can be extracted one segment at a time, with segmentation detection performed on that segment's recognized text immediately; or the features of all segments can be extracted first and then, taking speech segments as the unit, fed into the segmentation model segment by segment for detection. The embodiments of the present invention place no limit on this.
In another embodiment of the method, a step of showing the segmented recognized text to the user can be further included. At display time, text belonging to the same section is placed in one paragraph according to the segmentation markers in the recognized text, and different sections are displayed as separate paragraphs.
For example, suppose the recognized text is A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, where Ai denotes the recognized text of one speech segment, and segmentation detection determines that segmentation is needed at A2 and A5. The displayed form is then:
A1,A2
A3,A4,A5
A6,A7,A8,A9,A10
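The grouping shown above can be reproduced with a few lines; this is an illustrative sketch, with the break positions taken from the example (segmentation after A2 and A5, i.e. zero-based indices 1 and 4).

```python
# Group per-segment recognized texts into display paragraphs; `break_after`
# holds the indices whose end positions were detected as needing segmentation.

def group_paragraphs(texts, break_after):
    paragraphs, current = [], []
    for i, t in enumerate(texts):
        current.append(t)
        if i in break_after:
            paragraphs.append(current)
            current = []
    if current:  # trailing texts after the last marker form the final paragraph
        paragraphs.append(current)
    return paragraphs

texts = ["A%d" % i for i in range(1, 11)]
paragraphs = group_paragraphs(texts, {1, 4})  # breaks after A2 and A5
```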
In another embodiment of the method, the following steps can be further included:
extracting the topic of each paragraph of the segmented recognized text and showing the topics to the user;
when the user is detected to be interested in a topic, showing the user the recognized text of the paragraph corresponding to that topic.
The user can select a topic of interest in many ways, for example by clicking or circling the corresponding topic, or by assigning each topic a sequence number and having the user enter the number via the keyboard.
As mentioned above, in the embodiments of the present invention the segmentation features can be extracted from the acoustics of the speech data or from the semantics of the recognized text, and the two can of course be combined: segmentation features of each speech segment are extracted acoustically and taken as the first segmentation features of the segment's recognized text, and segmentation features are extracted from the semantics of the recognized text and taken as the second segmentation features of the recognized text. Correspondingly, the segmentation model can be trained solely on the acoustic segmentation features, solely on the semantic segmentation features, or on both.
The segmentation features at these two levels are described in detail below.
1. Segmentation features extracted from the acoustics of the speech data, i.e. the aforementioned first segmentation features.
In practice, the first segmentation features can include the duration of the current speech segment, and additionally the distance between the current segment and the previous segment, and/or the distance between the current segment and the next segment.
Further, the first segmentation features may also include whether the speaker of the current segment is the same as that of the previous segment, and/or whether the speaker of the current segment is the same as that of the next segment.
These features are described in detail below.
A) Duration of the current speech segment
The duration of a speech segment can be represented by the number of frames the segment contains. Therefore, the difference between the end frame number and the start frame number of the current speech segment is computed, and this difference is taken as the duration of the current speech segment.
B) Distance between the current speech segment and the previous speech segment
The distance between the current speech segment and the previous speech segment can be represented by the difference between the start frame number of the current speech segment and the end frame number of the previous speech segment. Therefore, this difference is computed and taken as the distance between the current speech segment and the previous speech segment.
It should be noted that when the current speech segment is the first speech segment, the distance between the current speech segment and the previous speech segment is 0.
C) Distance between the current speech segment and the next speech segment
Similarly, the distance between the current speech segment and the next speech segment can be represented by the difference between the start frame number of the next speech segment and the end frame number of the current speech segment. This difference is computed and taken as the distance between the current speech segment and the next speech segment.
It should be noted that when the current speech segment is the last speech segment, the distance between the current speech segment and the next speech segment is 0.
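The three acoustic features above can be sketched as follows; this is a minimal illustration assuming each speech segment from endpoint detection is given as a (start_frame, end_frame) pair, with the function name chosen for illustration only:

```python
def acoustic_features(segments, i):
    """Acoustic segmentation features of segment i.

    `segments` is a list of (start_frame, end_frame) pairs, one per
    speech segment, as produced by endpoint detection."""
    start, end = segments[i]
    duration = end - start  # duration in frames
    # Distance to the previous segment (0 for the first segment).
    dist_prev = start - segments[i - 1][1] if i > 0 else 0
    # Distance to the next segment (0 for the last segment).
    dist_next = segments[i + 1][0] - end if i < len(segments) - 1 else 0
    return duration, dist_prev, dist_next
```

For segments [(0, 100), (120, 200), (230, 300)], the middle segment yields a duration of 80 frames, a distance of 20 frames to the previous segment and 30 frames to the next.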
D) Whether the speaker of the current speech segment is the same as the speaker of the previous speech segment
E) Whether the speaker of the current speech segment is the same as the speaker of the next speech segment
To detect whether adjacent speech segments share the same speaker, speaker separation techniques can be applied to the speech data to perform speaker change point detection; from the change point detection result it is determined whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and whether the speaker of the current speech segment is the same as the speaker of the next speech segment.
A speaker change point is the place where one speaker stops speaking and another starts. The specific detection method is the same as in the prior art and is not described in detail here.
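Once the change points are detected, mapping them onto the per-segment speaker flags is straightforward. A hedged sketch, under the simplifying assumption that change points are frame indices and that adjacent segments share a speaker unless a change point falls in the gap between them (the function and parameter names are illustrative, not from the patent):

```python
def same_speaker_flags(segments, change_points, i):
    """Whether segment i shares a speaker with its neighbours.

    `segments`: list of (start_frame, end_frame) pairs;
    `change_points`: detected speaker change points as frame indices."""
    def no_change_between(a, b):
        # a, b: frame ranges of two adjacent segments; a change point
        # inside the gap [a.end, b.start] separates the speakers.
        return not any(a[1] <= cp <= b[0] for cp in change_points)
    same_prev = i > 0 and no_change_between(segments[i - 1], segments[i])
    same_next = (i < len(segments) - 1
                 and no_change_between(segments[i], segments[i + 1]))
    return same_prev, same_next
```

With a change point at frame 110, a segment ending at frame 100 and the next one starting at frame 120 would be attributed to different speakers.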
2. Segmentation features extracted from the semantics of the recognized text, i.e. the aforementioned second segmentation features.
In practical applications, the second segmentation features may include any one or more of the following:
A) Forward unsegmented sentence number: the total number of sentences contained in all recognized text between the start position of the recognized text corresponding to the current speech segment and the previous segmentation marker.
The previous segmentation marker can be obtained from the segmentation markers of the recognized text preceding the recognized text corresponding to the current speech segment, and those markers can in turn be obtained from the segmentation detection results of the recognized text corresponding to the preceding speech segments.
It should be noted that, in practical applications, if the second segmentation features include the forward unsegmented sentence number, segmentation detection must be performed in the manner described above, i.e. extracting the segmentation features of the recognized text corresponding to one speech segment at a time and performing segmentation detection on that recognized text.
In addition, it should be noted that if the recognized text corresponding to the current speech segment is the beginning of the recognized text, the forward unsegmented sentence number is 0.
B) Backward unsegmented sentence number: the total number of sentences contained in all recognized text after the recognized text corresponding to the current speech segment.
The backward unsegmented sentence number can be obtained by counting the sentences in the recognized text after the recognized text corresponding to the current speech segment.
It should be noted that if the recognized text corresponding to the current speech segment is the ending of all the recognized text, the backward unsegmented sentence number is 0.
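The two counts can be sketched as follows, assuming the per-segment sentence counts are already known and the position of the previous segmentation marker is tracked as a segment index (names are illustrative, not from the patent):

```python
def unsegmented_sentence_counts(sent_counts, last_marker, i):
    """Forward/backward unsegmented sentence numbers for segment i.

    `sent_counts[k]`: number of sentences in the recognized text of
    segment k; `last_marker`: index of the first segment after the
    previous segmentation marker (0 if no break detected yet)."""
    forward = sum(sent_counts[last_marker:i])   # 0 at the very beginning
    backward = sum(sent_counts[i + 1:])         # 0 at the very ending
    return forward, backward
```

For sentence counts [2, 3, 1, 4] with the previous marker after segment 0, segment 2 has a forward count of 3 and a backward count of 4.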
C) The number of sentences contained in the recognized text corresponding to the current speech segment.
Specifically, the number of sentences can be obtained directly by analyzing the punctuation in the recognized text corresponding to the current speech segment.
D) The similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the previous speech segment.
The similarity is typically measured by the distance or angle between vectors, for example the cosine of the angle between the two vectors: the smaller the angle, the more similar the two recognized-text vectors. The word vectorization process is prior art and is not described in detail here.
To exclude the interference of stop words with the text similarity computation, the stop words contained in the recognized texts corresponding to the current and previous speech segments are deleted first. Stop words are words, symbols or garbled characters that occur frequently in recognized text but carry no practical meaning, such as "this", "and", "will", "is". Specifically, the deletion can be realized by looking up the recognized text against a pre-built stop-word list. Then, the word vectors remaining in each recognized text after stop-word deletion are combined per speech segment, yielding the recognized-text vector of the current speech segment and that of the previous speech segment, and the similarity of the two recognized-text vectors is computed.
It should be noted that when the recognized text corresponding to the current speech segment is the beginning of all the recognized text, this similarity is 0.
E) The similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the next speech segment.
The computation is the same as the similarity computation above: after deleting the stop words in the recognized texts, the texts are vectorized and the similarity is computed.
It should be noted that when the recognized text corresponding to the current speech segment is the ending of all the recognized text, this similarity is 0.
It should be noted that before the second segmentation features are extracted, the recognized text corresponding to the speech data must also be corrected; the second segmentation features are then extracted from the semantics of the corrected recognized text.
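The similarity computation described above can be sketched with a simple bag-of-words stand-in for the word-vector combination; the stop-word list here is illustrative only:

```python
import math
from collections import Counter

STOP_WORDS = {"this", "and", "will", "is"}  # illustrative stop-word list

def text_vector(words):
    """Bag-of-words vector of a recognized text after stop-word deletion
    (a simple stand-in for combining word vectors per speech segment)."""
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine_similarity(a, b):
    """Cosine of the angle between two recognized-text vectors."""
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Two texts sharing two of three content words give a cosine of 2/3; a text made only of stop words vectorizes to the empty vector, for which the similarity is defined as 0.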
Correction of the recognized text mainly includes adding punctuation to it, i.e. adding the corresponding punctuation marks to the recognized text, for example based on a conditional random field model. To make the added punctuation more accurate, different thresholds can be set for adding punctuation between speech segments and within a speech segment: the threshold for adding punctuation between segments is set lower and the threshold within a segment is set higher, which increases the likelihood of adding punctuation between segments and reduces the likelihood of adding it within a segment. In the punctuated text, each piece of text separated by a punctuation mark (comma "，", question mark "？", exclamation mark "！" or full stop "。") is counted as one sentence.
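The sentence counting implied above — one sentence per punctuation-delimited piece of text — can be sketched as follows; the delimiter set (full- and half-width marks) and the function name are illustrative assumptions:

```python
import re

SENTENCE_DELIMS = "，？！。,?!."  # full- and half-width punctuation marks

def split_sentences(punctuated_text):
    """Split punctuated recognized text into sentences: each piece of
    text separated by a punctuation mark counts as one sentence."""
    parts = re.split("[" + re.escape(SENTENCE_DELIMS) + "]", punctuated_text)
    return [p for p in parts if p.strip()]
```

The length of the returned list directly gives the per-segment sentence count used by the second segmentation features.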
Secondly, the correction may further include any one or more of the following:
(1) Performing abnormal-word filtering on the recognized text corresponding to the speech data.
Text filtering mainly filters out erroneous abnormal words in the recognized text; specifically, the filtering can be based on word confidence and the results of syntactic analysis.
(2) Performing smoothing on the recognized text corresponding to the speech data.
Text smoothing mainly straightens out incoherent sentences. Of repeated words with no practical meaning, only one is kept; for example, for "very very good" only one "very" is kept. Modal particles with no practical meaning can be ignored and not transcribed; for example, the "uh" in "uh, this problem" needs to be smoothed away.
(3) Performing digit normalization on the recognized text corresponding to the speech data.
All the numbers in the recognized text obtained by speech recognition are represented with Chinese numerals, but some numbers better match the user's reading habits when represented with Arabic numerals; for example, "二十一点五元" should be expressed as "21.5元". Digit normalization converts such Chinese numerals into Arabic numerals, for example using a method based on the ABNF grammar.
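A minimal sketch of such a conversion for small numbers (below one hundred, with an optional decimal part) is given below; it is a toy illustration, not the ABNF-grammar method mentioned above, and handles none of the larger units (百, 千, 万):

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

def chinese_to_arabic(s):
    """Convert a simple Chinese numeral such as "二十一点五" to a
    string of Arabic digits ("21.5")."""
    if "点" in s:  # integer part 点 decimal digits
        intpart, frac = s.split("点", 1)
        return (chinese_to_arabic(intpart) + "."
                + "".join(str(DIGITS[c]) for c in frac))
    if "十" in s:  # tens: [digit] 十 [digit]
        tens, _, units = s.partition("十")
        value = (DIGITS[tens] if tens else 1) * 10 \
                + (DIGITS[units] if units else 0)
        return str(value)
    return "".join(str(DIGITS[c]) for c in s)
```

So "二十一点五" becomes "21.5", matching the example above.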
(4) Performing text replacement on the recognized text corresponding to the speech data, where the text replacement covers two cases:
One case is replacement between English upper and lower case, i.e. converting English lower-case letters in the recognized text corresponding to the speech data to upper case or vice versa; for example, "nba" is replaced with "NBA", "罗c" with "罗C", etc.
The other case is replacing sensitive words in the recognized text corresponding to the speech data with special symbols, achieving a masking effect. Specifically, a sensitive-word list can be built; the recognized text is then searched against this list, and any sensitive word found is replaced with special symbols. For example, violence-related words such as "robbery" are sensitive words, so every occurrence of "robbery" in the text is replaced with "****".
It should be noted that either or both of the above text replacement cases can be selected according to the practical application; this is not limited by the embodiments of the present invention.
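Both replacement cases can be sketched in a few lines; the word lists passed in are illustrative assumptions, as is the choice of '*' as the special symbol:

```python
def replace_text(text, sensitive_words, uppercase_terms):
    """Text replacement sketch: upper-casing listed terms and masking
    sensitive words with '*'. The word lists are illustrative."""
    for term in uppercase_terms:          # e.g. "nba" -> "NBA"
        text = text.replace(term, term.upper())
    for word in sensitive_words:          # e.g. "robbery" -> "*******"
        text = text.replace(word, "*" * len(word))
    return text
```

A real system would typically look terms up via a pre-built list rather than hard-coding them, as described for the sensitive-word list above.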
As shown in Fig. 2, the flow of building the segmentation model in an embodiment of the present invention includes the following steps:
Step 201: collect speech data.
Step 202: perform endpoint detection on the speech data to obtain the speech segments.
Step 203: perform speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment.
Step 204: label the segmentation information of the recognized text corresponding to each speech segment, where the segmentation information indicates whether the end position of the recognized text corresponding to the current speech segment needs segmentation.
For example, a segmentation is labeled 1 and no segmentation is labeled 0. Other symbols can of course be used, which the embodiments of the present invention do not limit.
Step 205: extract the segmentation features of the recognized text corresponding to each speech segment.
Step 206: build the segmentation model with the segmentation features and the segmentation information as training data.
The segmentation model can be a model commonly used in pattern recognition, such as a Bayesian model or a support vector machine model. During training, the segmentation features of the recognized text serve as the model input and the labeled segmentation information serves as the model output; model training then yields the segmentation model. The specific training process of the segmentation model is the same as in the prior art and is not described in detail here. It should be noted that the classification model can be trained offline.
It should be noted that when training the segmentation model, the model can be trained solely on the acoustic segmentation features, solely on the semantic segmentation features, or on both the acoustic and the semantic segmentation features. Correspondingly, when extracting the segmentation features of the recognized text corresponding to each speech segment in step 205 above, only the acoustic features or only the semantic features may be extracted, or both may be extracted at the same time; this is not limited by the embodiments of the present invention.
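The train/predict interface of steps 201-206 can be illustrated with a deliberately tiny stand-in classifier; a real system would use e.g. the support vector machine mentioned above, and the class and method names here are assumptions made for illustration:

```python
class CentroidSegmenter:
    """Minimal stand-in for the segmentation model: one centroid per
    label (1 = segmentation, 0 = no segmentation), prediction by
    nearest centroid. Illustrates the interface only, not the actual
    Bayesian/SVM models named above."""

    def train(self, features, labels):
        # features: list of feature vectors; labels: list of 0/1
        # segmentation marks, as produced by steps 204-205.
        self.centroids = {}
        for lab in set(labels):
            rows = [f for f, l in zip(features, labels) if l == lab]
            self.centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(self, feat):
        def dist(c):  # squared Euclidean distance to a centroid
            return sum((a - b) ** 2 for a, b in zip(feat, c))
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab]))
```

Training consumes (segmentation features, segmentation information) pairs exactly as in step 206; prediction then answers, per speech segment, whether the end position of its recognized text needs segmentation.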
In addition, it should be noted that for a segmentation model trained on a given type of segmentation features, when the model is used to perform segmentation detection on the recognized text to be presented, the segmentation features of that same type must be extracted from the text to be presented; the extracted features are then input into the segmentation model to determine the positions in the recognized text to be presented that need segmentation.
With the speech recognition text segmentation method provided by the present invention, endpoint detection is performed on the speech data to obtain the speech segments, and speech recognition is performed on each speech segment to obtain the corresponding recognized text; then the segmentation features of the recognized text corresponding to each speech segment are extracted, segmentation detection is performed on the recognized text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, and the recognized text is segmented according to the segmentation detection result. The article structure of the recognized text is thus adjusted automatically and made clearer, which helps the user quickly understand the content of the recognized text and improves the user's reading efficiency.
In practical applications, the segmentation features can be extracted from the acoustics of the speech data or from the semantics of the recognized text; of course, both kinds of features extracted from the two aspects can also be combined, with segmentation detection performed on the recognized text corresponding to the speech data using the corresponding segmentation model to determine the positions that need segmentation, which can further improve segmentation accuracy.
Further, the speech recognition text segmentation method provided by the present invention can also present all the segmented recognized text to the user, or extract the topic of each paragraph of recognized text, present the topics to the user first, and display the paragraph content when the user wants to view a paragraph of interest, thereby helping the user quickly find the content he or she is interested in.
Correspondingly, an embodiment of the present invention also provides a speech recognition text segmentation device, whose structure is schematically shown in Fig. 3.
In this embodiment, the device includes:
an endpoint detection module 301, configured to perform endpoint detection on the speech data to obtain the speech segments and the start frame number and end frame number of each speech segment;
a speech recognition module 302, configured to perform speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
a feature extraction module 303, configured to extract the segmentation features of the recognized text corresponding to each speech segment;
a segmentation detection module 304, configured to perform segmentation detection on the recognized text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, to determine the positions that need segmentation; specifically, taking one speech segment at a time, the segmentation features of the recognized text corresponding to each speech segment are input into the segmentation model for segmentation detection, determining whether the end position of the recognized text corresponding to each speech segment needs segmentation;
a segmentation module 305, configured to segment the recognized text corresponding to the speech data according to the segmentation detection result.
It should be noted that in practical applications the feature extraction module 303 can extract segmentation features from the acoustics of the speech data or from the semantics of the recognized text; of course, both kinds of features extracted from the two aspects can also be combined. Correspondingly, the feature extraction module 303 may include a first feature extraction module and/or a second feature extraction module, wherein:
the first feature extraction module is configured to extract the segmentation features of each speech segment from the acoustics of the speech data and use them as the first segmentation features of the recognized text corresponding to the speech segment;
the second feature extraction module is configured to extract segmentation features from the semantics of the recognized text and use them as the second segmentation features of the recognized text.
One embodiment of the first feature extraction module includes a duration calculation unit and a distance calculation unit; another embodiment may further include a speaker change point detection unit and a speaker determination unit. These units are described below.
The duration calculation unit is configured to calculate the difference between the end frame number and the start frame number of the current speech segment, and take the difference as the duration of the current speech segment.
The distance calculation unit is configured to calculate the difference between the start frame number of the current speech segment and the end frame number of the previous speech segment and take it as the distance between the current speech segment and the previous speech segment, and/or to calculate the difference between the start frame number of the next speech segment and the end frame number of the current speech segment and take it as the distance between the current speech segment and the next speech segment.
The speaker change point detection unit is configured to perform speaker change point detection on the speech data using speaker separation techniques.
The speaker determination unit is configured to determine, according to the speaker change point detection result, whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or whether the speaker of the current speech segment is the same as the speaker of the next speech segment.
One embodiment of the second feature extraction module includes:
a correction unit, configured to correct the recognized text corresponding to the speech data, the correction unit including a punctuation addition subunit configured to add punctuation to the recognized text corresponding to the speech data, for example based on a conditional random field model;
a feature extraction unit, configured to extract segmentation features from the semantics of the corrected recognized text.
The second segmentation features extracted by the feature extraction unit may include any one or more of the following: the forward unsegmented sentence number, the backward unsegmented sentence number, the number of sentences contained in the recognized text corresponding to the current speech segment, the similarity between the recognized text corresponding to the current speech segment and that corresponding to the previous speech segment, and the similarity between the recognized text corresponding to the current speech segment and that corresponding to the next speech segment.
In practical applications, the correction unit may also include any one or more of the following subunits:
a filtering subunit, configured to perform abnormal-word filtering on the recognized text corresponding to the speech data;
a smoothing subunit, configured to perform smoothing on the recognized text corresponding to the speech data;
a normalization subunit, configured to perform digit normalization on the recognized text corresponding to the speech data;
a text replacement subunit, configured to perform text replacement on the recognized text corresponding to the speech data, the text replacement including: converting English lower-case letters in the recognized text corresponding to the speech data to upper case or vice versa; and/or replacing sensitive words in the recognized text corresponding to the speech data with special symbols.
In embodiments of the present invention, the segmentation model can be built offline by a corresponding segmentation model building module, which may be independent of the speech recognition text segmentation device of the present invention or integrated with it; this is not limited by the embodiments of the present invention.
As shown in Fig. 4, a schematic structural diagram of the segmentation model building module in an embodiment of the present invention includes:
a data collection unit 401, configured to collect speech data;
an endpoint detection unit 402, configured to perform endpoint detection on the speech data collected by the data collection unit to obtain the speech segments;
a speech recognition unit 403, configured to perform speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
a labeling unit 404, configured to label the segmentation information of the recognized text corresponding to each speech segment, the segmentation information indicating whether the end position of the recognized text corresponding to the current speech segment needs segmentation;
a feature extraction unit 405, configured to extract the segmentation features of the recognized text corresponding to each speech segment;
a training unit 406, configured to build the segmentation model with the segmentation features and the segmentation information as training data.
It should be noted that when training the segmentation model, the model can be trained solely on the acoustic segmentation features (i.e. the first segmentation features above), solely on the semantic segmentation features (i.e. the second segmentation features above), or on both the acoustic and the semantic segmentation features. Correspondingly, when the feature extraction unit 405 extracts the segmentation features of the recognized text corresponding to each speech segment, it may extract only the acoustic features or only the semantic features, or both at the same time; this is not limited by the embodiments of the present invention.
In addition, the output of the segmentation model can be either whether the end position of the recognized text corresponding to the current speech segment needs segmentation, or the probability that it needs segmentation. Of course, the different output parameter types do not affect the training process of the segmentation model; only different input/output parameters need to be set during model training.
The present invention provides a speech recognition text segmentation device: endpoint detection is performed on the speech data to obtain the speech segments, and speech recognition is performed on each speech segment to obtain the corresponding recognized text; then the segmentation features of the recognized text corresponding to each speech segment are extracted, segmentation detection is performed on the recognized text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, and the recognized text is segmented according to the segmentation detection result. The article structure of the recognized text is thus adjusted automatically and made clearer, which helps the user quickly understand the content of the recognized text and improves the user's reading efficiency.
Further, the segmentation features can be extracted from the acoustics of the speech data or from the semantics of the recognized text; of course, both kinds of features extracted from the two aspects can also be combined, with segmentation detection performed on the recognized text corresponding to the speech data using the corresponding segmentation model to determine the positions that need segmentation, which can further improve segmentation accuracy.
Fig. 5 is another schematic structural diagram of the speech recognition text segmentation device of an embodiment of the present invention.
Unlike in Fig. 3, in this embodiment the device also includes:
a first display module 501, configured to present the segmented recognized text to the user.
Fig. 6 is another schematic structural diagram of the speech recognition text segmentation device of an embodiment of the present invention.
Unlike in Fig. 3, in this embodiment the device also includes:
a topic extraction module 601, configured to extract the topic of the recognized text of each paragraph after segmentation;
a second display module 602, configured to present each topic to the user;
a sensing module 603, configured to sense the topic the user is interested in and, upon sensing it, trigger the second display module 602 to present the recognized text of the paragraph corresponding to that topic to the user.
The speech recognition text segmentation device provided by the present invention can thus present the segmented recognized text to the user in several ways, not only displaying recognized text with a clear article structure to the user but also helping the user quickly find the content he or she is interested in, further improving reading efficiency.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments can be referred to each other, and each embodiment focuses on its differences from the others. The device embodiments are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device embodiments described above are merely schematic: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The embodiments of the present invention are described in detail above; specific examples are used herein to set forth the present invention, and the description of the above embodiments is only intended to help understand the method and device of the present invention. Meanwhile, for those of ordinary skill in the art, the specific implementation and the application scope may change according to the idea of the present invention. In summary, the content of this specification should not be understood as a limitation of the present invention.

Claims (20)

1. A speech recognition text segmentation method, characterized by including:
performing endpoint detection on speech data to obtain speech segments and the start frame number and end frame number of each speech segment;
performing speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
extracting the segmentation features of the recognized text corresponding to each speech segment;
performing segmentation detection on the recognized text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, to determine the positions that need segmentation;
segmenting the recognized text corresponding to the speech data according to the segmentation detection result.
2. The method according to claim 1, characterized in that the method also includes building the segmentation model in the following manner:
collecting speech data;
performing endpoint detection on the collected speech data to obtain speech segments;
performing speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
labeling the segmentation information of the recognized text corresponding to each speech segment, the segmentation information indicating whether the end position of the recognized text corresponding to the current speech segment needs segmentation;
extracting the segmentation features of the recognized text corresponding to each speech segment;
building the segmentation model with the segmentation features and the segmentation information as training data.
3. The method according to claim 1, characterized in that extracting the segmentation features of the recognized text corresponding to each speech segment includes:
extracting the segmentation features of each speech segment from the acoustics of the speech data and using them as the first segmentation features of the recognized text corresponding to the speech segment; and/or
extracting segmentation features from the semantics of the recognized text and using them as the second segmentation features of the recognized text.
4. method according to claim 3, it is characterised in that first segmentation feature includes: The duration of current speech segment, in addition to:The distance between current speech segment and previous voice segments, and/or The distance between current speech segment and latter voice segments;
It is described to include from the segmentation feature for acoustically extracting each voice segments of the speech data:
The difference for terminating frame number and the beginning frame number of current speech segment of current speech segment is calculated, and Using the difference as current speech segment duration;
Also include:
The difference for starting frame number and the end frame number of previous voice segments of current speech segment is calculated, and It regard the difference as the distance between current speech segment and previous voice segments;And/or
The difference for starting frame number and the end frame number of current speech segment of latter voice segments is calculated, and It regard the difference as the distance between current speech segment and latter voice segments.
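The frame arithmetic of claim 4 reduces to subtractions over (start frame, end frame) pairs. A minimal sketch, assuming each voice segment from endpoint detection is such a pair (the dictionary keys are illustrative):

```python
def acoustic_distance_features(segments, i):
    """Duration of segment i and its frame gaps to neighbouring segments."""
    start, end = segments[i]
    feats = {"duration": end - start}          # end frame - start frame
    if i > 0:                                  # distance to previous segment
        feats["gap_to_previous"] = start - segments[i - 1][1]
    if i + 1 < len(segments):                  # distance to next segment
        feats["gap_to_next"] = segments[i + 1][0] - end
    return feats
```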
5. The method according to claim 4, wherein the first segmentation feature further comprises: whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or whether the speaker of the current voice segment is the same as the speaker of the next voice segment;
wherein extracting the segmentation feature of each voice segment acoustically from the speech data further comprises:
performing speaker change point detection on the speech data by using a speaker separation technique;
determining, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or determining, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the next voice segment.
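Claim 5 leaves the speaker separation technique open. The sketch below assumes each segment already carries a speaker embedding and uses a cosine-similarity threshold as a stand-in for change-point detection; the embedding representation and the threshold value are assumptions, not part of the claim:

```python
def same_speaker(emb_a, emb_b, threshold=0.5):
    """Crude stand-in for change-point detection: cosine similarity test."""
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    na = sum(x * x for x in emb_a) ** 0.5
    nb = sum(x * x for x in emb_b) ** 0.5
    return (dot / (na * nb)) >= threshold if na and nb else False

def speaker_features(embeddings, i):
    """Whether segment i shares its speaker with its neighbours."""
    feats = {}
    if i > 0:
        feats["same_as_previous"] = same_speaker(embeddings[i - 1], embeddings[i])
    if i + 1 < len(embeddings):
        feats["same_as_next"] = same_speaker(embeddings[i], embeddings[i + 1])
    return feats
```

A production system would derive these embeddings from an i-vector or x-vector speaker separation front end; only the same/different decisions feed the first segmentation feature.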
6. The method according to claim 3, wherein the second segmentation feature comprises any one or more of the following:
a forward unsegmented sentence number, being the total number of sentences contained in all identification texts between the previous segmentation marker and the start position of the identification text corresponding to the current voice segment;
a backward unsegmented sentence number, being the total number of sentences contained in all identification texts after the identification text corresponding to the current voice segment;
the number of sentences contained in the identification text corresponding to the current voice segment;
the similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the previous voice segment;
the similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the next voice segment.
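The semantic features of claim 6 can be illustrated on punctuated identification texts. Splitting sentences on terminal punctuation and using a bag-of-words cosine similarity are illustrative choices only; `break_flags` is a hypothetical list marking where previous segmentation markers fall:

```python
import re

def sentence_count(text):
    """Number of sentences, splitting on terminal punctuation."""
    return len([s for s in re.split(r"[.!?]", text) if s.strip()])

def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two identification texts."""
    wa, wb = a.lower().split(), b.lower().split()
    vocab = set(wa) | set(wb)
    va = [wa.count(t) for t in vocab]
    vb = [wb.count(t) for t in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    na = sum(x * x for x in va) ** 0.5
    nb = sum(x * x for x in vb) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def forward_unsegmented_sentences(texts, i, break_flags):
    """Sentences between the previous segmentation marker and segment i."""
    total, j = 0, i - 1
    while j >= 0 and not break_flags[j]:
        total += sentence_count(texts[j])
        j -= 1
    return total
```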
7. The method according to claim 3, wherein extracting the segmentation feature semantically from the identification text comprises:
modifying the identification text corresponding to the speech data, the modification comprising: adding punctuation to the identification text corresponding to the speech data;
extracting the segmentation feature semantically from the modified identification text.
8. The method according to claim 7, wherein the modification further comprises any one or more of the following:
performing abnormal word filtering on the identification text corresponding to the speech data;
performing smoothing on the identification text corresponding to the speech data;
performing digit normalization on the identification text corresponding to the speech data;
performing text replacement on the identification text corresponding to the speech data, the text replacement comprising: converting English lowercase letters in the identification text corresponding to the speech data to uppercase or vice versa; and/or replacing sensitive words in the identification text corresponding to the speech data with special characters.
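Of the modifications in claims 7 and 8, the text-replacement step is the easiest to sketch. The sensitive-word list, mask string, and the choice of uppercasing are assumptions for illustration only; punctuation addition and smoothing would normally run before this step:

```python
import re

def correct_text(text, sensitive_words=(), mask="***", uppercase=True):
    """Text replacement per claim 8: mask sensitive words, convert case."""
    for w in sensitive_words:                  # sensitive word -> mask string
        text = text.replace(w, mask)
    convert = str.upper if uppercase else str.lower
    # Convert only English letters, leaving other characters untouched.
    return re.sub(r"[A-Za-z]+", lambda m: convert(m.group()), text)
```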
9. The method according to any one of claims 1 to 8, wherein performing segmentation detection on the identification text corresponding to the speech data by using the extracted segmentation feature and a pre-built segmented model, to determine positions that need segmentation, comprises:
inputting, in units of voice segments, the segmentation feature of the identification text corresponding to each voice segment into the segmented model in turn for segmentation detection, to determine whether the end position of the identification text corresponding to each voice segment needs segmentation.
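The per-segment detection loop of claim 9 can be sketched as follows. Here `model` is any callable returning True when the end position of a segment's identification text needs segmentation; joining texts with spaces is an illustrative choice:

```python
def segment_texts(texts, feature_vectors, model):
    """Group per-segment identification texts into paragraphs."""
    paragraphs, current = [], []
    for text, feats in zip(texts, feature_vectors):
        current.append(text)
        if model(feats):            # end position needs segmentation
            paragraphs.append(" ".join(current))
            current = []
    if current:                     # flush any trailing unsegmented texts
        paragraphs.append(" ".join(current))
    return paragraphs
```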
10. The method according to any one of claims 1 to 8, wherein the method further comprises:
displaying the segmented identification text to a user; or
extracting a theme of the identification text of each paragraph after segmentation, and presenting each theme to the user;
when it is perceived that the user is interested in a theme, displaying the identification text of the paragraph corresponding to the theme to the user.
11. A speech recognition text segmentation device, comprising:
an endpoint detection module, configured to perform endpoint detection on speech data to obtain each voice segment and the start frame number and end frame number of each voice segment;
a speech recognition module, configured to perform speech recognition on each voice segment to obtain an identification text corresponding to each voice segment;
a feature extraction module, configured to extract a segmentation feature of the identification text corresponding to each voice segment;
a segmentation detection module, configured to perform segmentation detection on the identification text corresponding to the speech data by using the extracted segmentation feature and a pre-built segmented model, to determine positions that need segmentation;
a segmentation module, configured to segment the identification text corresponding to the speech data according to the segmentation detection result.
12. The device according to claim 11, wherein the device further comprises a segmented model building module configured to build the segmented model, the segmented model building module comprising:
a data collection module, configured to collect speech data;
an endpoint detection unit, configured to perform endpoint detection on the speech data collected by the data collection module to obtain each voice segment;
a speech recognition unit, configured to perform speech recognition on each voice segment to obtain the identification text corresponding to each voice segment;
a labeling unit, configured to label segment information of the identification text corresponding to each voice segment, the segment information indicating whether the end position of the identification text corresponding to the current voice segment needs segmentation;
a feature extraction unit, configured to extract the segmentation feature of the identification text corresponding to each voice segment;
a training unit, configured to build the segmented model by using the segmentation feature and the segment information as training data.
13. The device according to claim 11, wherein the feature extraction module comprises:
a first feature extraction module, configured to extract a segmentation feature of each voice segment acoustically from the speech data, and use the segmentation feature as a first segmentation feature of the identification text corresponding to the voice segment; and/or
a second feature extraction module, configured to extract a segmentation feature semantically from the identification text, and use the segmentation feature as a second segmentation feature of the identification text.
14. The device according to claim 13, wherein the first feature extraction module comprises:
a duration calculation unit, configured to calculate the difference between the end frame number and the start frame number of the current voice segment, and use the difference as the duration of the current voice segment;
a distance calculation unit, configured to calculate the difference between the start frame number of the current voice segment and the end frame number of the previous voice segment, and use the difference as the distance between the current voice segment and the previous voice segment; and/or calculate the difference between the start frame number of the next voice segment and the end frame number of the current voice segment, and use the difference as the distance between the current voice segment and the next voice segment.
15. The device according to claim 14, wherein the first feature extraction module further comprises:
a speaker change point detection unit, configured to perform speaker change point detection on the speech data by using a speaker separation technique;
a speaker determining unit, configured to determine, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or determine, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the next voice segment.
16. The device according to claim 13, wherein the second segmentation feature comprises any one or more of the following:
a forward unsegmented sentence number, being the total number of sentences contained in all identification texts between the previous segmentation marker and the start position of the identification text corresponding to the current voice segment;
a backward unsegmented sentence number, being the total number of sentences contained in all identification texts after the identification text corresponding to the current voice segment;
the number of sentences contained in the identification text corresponding to the current voice segment;
the similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the previous voice segment;
the similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the next voice segment.
17. The device according to claim 13, wherein the second feature extraction module comprises:
a modification unit, configured to modify the identification text corresponding to the speech data, the modification unit comprising: a punctuation adding subunit, configured to add punctuation to the identification text corresponding to the speech data;
a feature extraction unit, configured to extract the segmentation feature semantically from the modified identification text.
18. The device according to claim 17, wherein the modification unit further comprises any one or more of the following subunits:
a filtering subunit, configured to perform abnormal word filtering on the identification text corresponding to the speech data;
a smoothing subunit, configured to perform smoothing on the identification text corresponding to the speech data;
a normalization subunit, configured to perform digit normalization on the identification text corresponding to the speech data;
a text replacement subunit, configured to perform text replacement on the identification text corresponding to the speech data, the text replacement comprising: converting English lowercase letters in the identification text corresponding to the speech data to uppercase or vice versa; and/or replacing sensitive words in the identification text corresponding to the speech data with special characters.
19. The device according to any one of claims 11 to 18, wherein
the segmentation detection module is specifically configured to input, in units of voice segments, the segmentation feature of the identification text corresponding to each voice segment into the segmented model in turn for segmentation detection, to determine whether the end position of the identification text corresponding to each voice segment needs segmentation.
20. The device according to any one of claims 11 to 18, wherein the device further comprises:
a first display module, configured to display the segmented identification text to a user; or
a theme extraction module, configured to extract a theme of the identification text of each paragraph after segmentation;
a second display module, configured to present each theme to the user;
a sensing module, configured to perceive a theme the user is interested in, and, when perceiving the theme the user is interested in, trigger the second display module to display the identification text of the paragraph corresponding to the theme to the user.
CN201610256898.8A 2016-04-20 2016-04-20 Method and device for segmenting speech recognition text Active CN107305541B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610256898.8A CN107305541B (en) 2016-04-20 2016-04-20 Method and device for segmenting speech recognition text

Publications (2)

Publication Number Publication Date
CN107305541A true CN107305541A (en) 2017-10-31
CN107305541B CN107305541B (en) 2021-05-04

Family

ID=60150228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610256898.8A Active CN107305541B (en) 2016-04-20 2016-04-20 Method and device for segmenting speech recognition text

Country Status (1)

Country Link
CN (1) CN107305541B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1296587A (en) * 1998-02-02 2001-05-23 Randall C. Walker Text processor
US20040024585A1 (en) * 2002-07-03 2004-02-05 Amit Srivastava Linguistic segmentation of speech
US20160026618A1 (en) * 2002-12-24 2016-01-28 At&T Intellectual Property Ii, L.P. System and method of extracting clauses for spoken language understanding
CN1771494A (en) * 2003-05-28 2006-05-10 Loquendo S.p.A. Automatic segmentation of texts comprising chunks without separators
CN1894686A (en) * 2003-11-21 2007-01-10 Koninklijke Philips Electronics N.V. Text segmentation and topic annotation for document structuring
US20150302849A1 (en) * 2005-07-13 2015-10-22 Intellisist, Inc. System And Method For Identifying Special Information
US20100169318A1 (en) * 2008-12-30 2010-07-01 Microsoft Corporation Contextual representations from data streams
CN103150294A (en) * 2011-12-06 2013-06-12 Shengle Information Technology (Shanghai) Co., Ltd. Method and system for correction based on speech recognition results
CN103164399A (en) * 2013-02-26 2013-06-19 Beijing Jietong Huasheng Speech Technology Co., Ltd. Punctuation addition method and device in speech recognition
CN103345922A (en) * 2013-07-05 2013-10-09 Zhang Wei Fully automatic segmentation method for long speech
CN103488723A (en) * 2013-09-13 2014-01-01 Fudan University Automatic navigation method and system for semantic ranges of interest in electronic reading
CN105244029A (en) * 2015-08-28 2016-01-13 iFlytek Co., Ltd. Voice recognition post-processing method and system
CN105427858A (en) * 2015-11-06 2016-03-23 iFlytek Co., Ltd. Method and system for achieving automatic voice classification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOUG BEEFERMAN: "Statistical Models for Text Segmentation", Machine Learning *
ELIZABETH SHRIBERG: "Prosody-based automatic segmentation of speech into sentences and topics", Speech Communication *
REN Xinshe et al.: "Research on a speech segmentation algorithm based on improved feature values", Journal of Nanjing Normal University (Engineering and Technology Edition) *

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090051A (en) * 2017-12-20 2018-05-29 深圳市沃特沃德股份有限公司 The interpretation method and translator of continuous long voice document
CN108363765A (en) * 2018-02-06 2018-08-03 深圳市鹰硕技术有限公司 The recognition methods of audio paragraph and device
CN108363765B (en) * 2018-02-06 2020-12-08 深圳市鹰硕技术有限公司 Audio paragraph identification method and device
CN108446389A (en) * 2018-03-22 2018-08-24 平安科技(深圳)有限公司 Speech message searching and displaying method, device, computer equipment and storage medium
CN108446389B (en) * 2018-03-22 2021-12-24 平安科技(深圳)有限公司 Voice message search display method and device, computer equipment and storage medium
CN108364650A (en) * 2018-04-18 2018-08-03 北京声智科技有限公司 The adjusting apparatus and method of voice recognition result
CN108364650B (en) * 2018-04-18 2024-01-19 北京声智科技有限公司 Device and method for adjusting voice recognition result
CN110503943A (en) * 2018-05-17 2019-11-26 蔚来汽车有限公司 A voice interaction method and voice interaction system
CN108830639A (en) * 2018-05-17 2018-11-16 科大讯飞股份有限公司 Content data processing method and device, computer readable storage medium
CN110503943B (en) * 2018-05-17 2023-09-19 蔚来(安徽)控股有限公司 Voice interaction method and voice interaction system
CN108830639B (en) * 2018-05-17 2022-04-26 科大讯飞股份有限公司 Content data processing method and device, and computer readable storage medium
CN109344411A (en) * 2018-09-19 2019-02-15 深圳市合言信息科技有限公司 A kind of interpretation method for listening to formula simultaneous interpretation automatically
CN109361823A (en) * 2018-11-01 2019-02-19 深圳市号互联科技有限公司 A kind of intelligent interaction mode that voice is mutually converted with text
CN109743589A (en) * 2018-12-26 2019-05-10 百度在线网络技术(北京)有限公司 Article generation method and device
CN109743589B (en) * 2018-12-26 2021-12-14 百度在线网络技术(北京)有限公司 Article generation method and device
US11580463B2 (en) 2019-05-06 2023-02-14 Hithink Royalflush Information Network Co., Ltd. Systems and methods for report generation
CN110264997A (en) * 2019-05-30 2019-09-20 北京百度网讯科技有限公司 The method, apparatus and storage medium of voice punctuate
CN110379413A (en) * 2019-06-28 2019-10-25 联想(北京)有限公司 A kind of method of speech processing, device, equipment and storage medium
CN110379413B (en) * 2019-06-28 2022-04-19 联想(北京)有限公司 Voice processing method, device, equipment and storage medium
CN110399489B (en) * 2019-07-08 2022-06-17 厦门市美亚柏科信息股份有限公司 Chat data segmentation method, device and storage medium
CN110399489A (en) * 2019-07-08 2019-11-01 厦门市美亚柏科信息股份有限公司 A kind of chat data segmentation method, device and storage medium
CN110502631B (en) * 2019-07-17 2022-11-04 招联消费金融有限公司 Input information response method and device, computer equipment and storage medium
CN110502631A (en) * 2019-07-17 2019-11-26 招联消费金融有限公司 A kind of input information response method, apparatus, computer equipment and storage medium
CN110588524A (en) * 2019-08-02 2019-12-20 精电有限公司 A method for displaying information and a vehicle-mounted auxiliary display system
CN110619897A (en) * 2019-08-02 2019-12-27 精电有限公司 Conference summary generation method and vehicle-mounted recording system
CN110827825A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Punctuation prediction method, system, terminal and storage medium for speech recognition text
CN111079384A (en) * 2019-11-18 2020-04-28 佰聆数据股份有限公司 Identification method and system for intelligent quality inspection service forbidden words
WO2021109000A1 (en) * 2019-12-03 2021-06-10 深圳市欢太科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN113041623A (en) * 2019-12-26 2021-06-29 波克科技股份有限公司 Game parameter configuration method and device and computer readable storage medium
CN113041623B (en) * 2019-12-26 2023-04-07 波克科技股份有限公司 Game parameter configuration method and device and computer readable storage medium
CN111862980A (en) * 2020-08-07 2020-10-30 斑马网络技术有限公司 Incremental semantic processing method
CN112036128A (en) * 2020-08-21 2020-12-04 百度在线网络技术(北京)有限公司 A text content processing method, apparatus, device and storage medium
CN111931482B (en) * 2020-09-22 2021-09-24 思必驰科技股份有限公司 Text segmentation method and device
CN111931482A (en) * 2020-09-22 2020-11-13 苏州思必驰信息科技有限公司 Text segmentation method and device
CN112185424A (en) * 2020-09-29 2021-01-05 国家计算机网络与信息安全管理中心 Voice file cutting and restoring method, device, equipment and storage medium
CN112214965A (en) * 2020-10-21 2021-01-12 科大讯飞股份有限公司 Case regularization method, apparatus, electronic device and storage medium
CN112712794A (en) * 2020-12-25 2021-04-27 苏州思必驰信息科技有限公司 Speech recognition marking training combined system and device
CN112733660B (en) * 2020-12-31 2022-05-27 蚂蚁胜信(上海)信息技术有限公司 Method and device for splitting video strip
CN112818077A (en) * 2020-12-31 2021-05-18 科大讯飞股份有限公司 Text processing method, device, equipment and storage medium
CN112818077B (en) * 2020-12-31 2023-05-30 科大讯飞股份有限公司 Text processing method, device, equipment and storage medium
CN112733660A (en) * 2020-12-31 2021-04-30 支付宝(杭州)信息技术有限公司 Method and device for splitting video strip
CN112699687A (en) * 2021-01-07 2021-04-23 北京声智科技有限公司 Content cataloging method and device and electronic equipment
CN113076720A (en) * 2021-04-29 2021-07-06 新声科技(深圳)有限公司 Long text segmentation method and device, storage medium and electronic device
CN115394295A (en) * 2021-05-25 2022-11-25 阿里巴巴新加坡控股有限公司 Segmentation processing method, device, equipment and storage medium
CN114254587A (en) * 2021-12-15 2022-03-29 科大讯飞股份有限公司 Topic paragraph dividing method and device, electronic equipment and storage medium
CN114841171A (en) * 2022-04-29 2022-08-02 北京思源智通科技有限责任公司 Text segmentation subject extraction method, system, readable medium and device
CN117113974A (en) * 2023-04-26 2023-11-24 荣耀终端有限公司 Text segmentation method, device, chip, electronic equipment and medium
CN117113974B (en) * 2023-04-26 2024-05-24 荣耀终端有限公司 Text segmentation method, device, chip, electronic equipment and medium

Also Published As

Publication number Publication date
CN107305541B (en) 2021-05-04

Similar Documents

Publication Publication Date Title
CN107305541A (en) Speech recognition text segmentation method and device
US10950242B2 (en) System and method of diarization and labeling of audio data
US11037553B2 (en) Learning-type interactive device
CN109887497B (en) Modeling method, device and equipment for speech recognition
CN109686383B (en) Voice analysis method, device and storage medium
CN108536654B (en) Method and device for displaying identification text
US10432789B2 (en) Classification of transcripts by sentiment
CN111341305B (en) Audio data labeling method, device and system
WO2018108080A1 (en) Voiceprint search-based information recommendation method and device
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
CN104598644B (en) Favorite tag mining method and device
CN105427858A (en) Method and system for achieving automatic voice classification
US20180047387A1 (en) System and method for generating accurate speech transcription from natural speech audio signals
CN105654943A (en) Voice wakeup method, apparatus and system thereof
CN107077843A (en) Session control and dialog control method
JP2006190006A5 (en)
US9251808B2 (en) Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof
WO2014187096A1 (en) Method and system for adding punctuation to voice files
JP5824829B2 (en) Speech recognition apparatus, speech recognition method, and speech recognition program
CN112233680A (en) Speaker role identification method and device, electronic equipment and storage medium
CN112818680B (en) Corpus processing method and device, electronic equipment and computer readable storage medium
CN114120425A (en) Emotion recognition method and device, electronic equipment and storage medium
US9805740B2 (en) Language analysis based on word-selection, and language analysis apparatus
JP2019124952A (en) Information processing device, information processing method, and program
JP2004094257A (en) Method and apparatus for generating question of decision tree for speech processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant