CN107305541A - Speech recognition text segmentation method and device - Google Patents
- Publication number
- CN107305541A (application number CN201610256898.8A)
- Authority
- CN
- China
- Prior art keywords
- segmentation
- voice segments
- identification text
- text
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
Abstract
The invention discloses a speech recognition text segmentation method and device. The method includes: performing endpoint detection on speech data to obtain each speech segment together with the start frame number and end frame number of each speech segment; performing speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment; extracting segmentation features of the recognized text corresponding to each speech segment; using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed; and segmenting the recognized text corresponding to the speech data according to the segmentation detection result. The present invention can segment recognized text automatically, making the discourse structure of the recognized text clearer.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a speech recognition text segmentation method and device.
Background
With the development of voice technology, automatic speech recognition has been widely applied in many areas of daily life; converting speech into text greatly facilitates people's work, for example turning a meeting recording into text to be distributed to participants as meeting minutes, or turning an interview recording into text on which a news article is then drafted. However, the recognized text produced by speech recognition lacks the clear discourse structure of human-edited text, such as division into paragraphs. As a result, when reading the recognized text, users often struggle to find the key points or themes of the whole text, especially when the text is long and covers multiple topics; it is then even harder for users to grasp the structure of the recognized text and accurately locate the content of each topic. Therefore, presenting recognized text clearly to users and helping them understand its content is particularly important for the display of speech recognition results.
In the prior art, the recognized text of speech data is usually shown directly to the user without any processing of the recognition result; or the discourse structure of the recognized text is adjusted manually before display, for example by dividing the text into different paragraphs according to its content and showing the adjusted text to the user. When the recognized text is long, such manual adjustment involves heavy labor, low efficiency, and long turnaround, making it difficult for a recognition system to reach practical usability.
Summary of the invention
The present invention provides a speech recognition text segmentation method and device, to solve the prior-art problems of heavy workload and low efficiency when the discourse structure of recognized text is adjusted manually.
To this end, the present invention provides the following technical solution:
A speech recognition text segmentation method, including:
performing endpoint detection on speech data to obtain each speech segment together with the start frame number and end frame number of each speech segment;
performing speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
extracting segmentation features of the recognized text corresponding to each speech segment;
using the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed;
segmenting the recognized text corresponding to the speech data according to the segmentation detection result.
Preferably, the method also includes building the segmentation model as follows:
collecting speech data;
performing endpoint detection on the collected speech data to obtain each speech segment;
performing speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment;
annotating the segmentation information of the recognized text corresponding to each speech segment, the segmentation information indicating whether the end position of the recognized text corresponding to the current speech segment needs segmentation;
extracting the segmentation features of the recognized text corresponding to each speech segment;
building the segmentation model with the segmentation features and the segmentation information as training data.
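The patent leaves the model family open. Purely as an illustration of training on (segmentation features, segmentation label) pairs, where label 1 means a paragraph break follows the segment, here is a minimal logistic-regression sketch; all names and hyperparameters are illustrative, not from the patent:

```python
import math

def train_segment_model(X, y, lr=0.5, epochs=200):
    """Train a logistic-regression segmentation model on feature vectors X
    and 0/1 labels y (1 = paragraph break after this segment), returning a
    function that maps a feature vector to a break probability."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # gradient of log loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    def predict_prob(x):
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        return 1.0 / (1.0 + math.exp(-z))
    return predict_prob
```

The returned function plays the role of the segmentation model in the detection step: features in, break probability out.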
Preferably, extracting the segmentation features of the recognized text corresponding to each speech segment includes:
extracting segmentation features acoustically from the speech data, and taking them as first segmentation features of the recognized text corresponding to the speech segments; and/or
extracting segmentation features semantically from the recognized text, and taking them as second segmentation features of the recognized text.
Preferably, the first segmentation features include the duration of the current speech segment, as well as the distance between the current speech segment and the previous speech segment and/or the distance between the current speech segment and the next speech segment;
extracting the segmentation features acoustically from the speech data includes:
calculating the difference between the end frame number and the start frame number of the current speech segment and taking the difference as the duration of the current speech segment;
and also:
calculating the difference between the start frame number of the current speech segment and the end frame number of the previous speech segment and taking the difference as the distance between the current speech segment and the previous speech segment; and/or
calculating the difference between the start frame number of the next speech segment and the end frame number of the current speech segment and taking the difference as the distance between the current speech segment and the next speech segment.
Preferably, the first segmentation features also include whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or whether the speaker of the current speech segment is the same as the speaker of the next speech segment;
extracting the segmentation features acoustically from the speech data also includes:
performing speaker change point detection on the speech data using speaker separation techniques;
determining from the speaker change point detection result whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or whether the speaker of the current speech segment is the same as the speaker of the next speech segment.
Preferably, the second segmentation features include any one or more of the following:
the number of preceding unsegmented sentences, i.e. the total number of sentences in all recognized text between the start of the recognized text corresponding to the current speech segment and the previous segmentation mark;
the number of following unsegmented sentences, i.e. the total number of sentences in all recognized text after the recognized text corresponding to the current speech segment;
the number of sentences in the recognized text corresponding to the current speech segment;
the similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the previous speech segment;
the similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the next speech segment.
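The patent does not specify how the adjacent-segment similarity is computed. One common choice is bag-of-words cosine similarity; the sketch below assumes whitespace-tokenized text (Chinese recognized text would first need word segmentation), and the function name is illustrative:

```python
from collections import Counter
import math

def text_similarity(a, b):
    """Bag-of-words cosine similarity between two recognized texts,
    one simple realisation of the adjacent-segment similarity feature."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

A low similarity between adjacent segments suggests a topic shift and hence a candidate paragraph break.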
Preferably, extracting the segmentation features semantically from the recognized text includes:
revising the recognized text corresponding to the speech data, the revision including adding punctuation to the recognized text corresponding to the speech data;
extracting the segmentation features semantically from the revised recognized text.
Preferably, the revision also includes any one or more of the following:
filtering abnormal words out of the recognized text corresponding to the speech data;
applying disfluency smoothing to the recognized text corresponding to the speech data;
normalizing digits in the recognized text corresponding to the speech data;
performing text replacement on the recognized text corresponding to the speech data, the text replacement including converting lowercase English letters in the recognized text to uppercase or vice versa, and/or replacing sensitive words in the recognized text with special characters.
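As a toy illustration of the replacement step only, the sketch below masks sensitive words and upper-cases English letters; the function name, word list, and mask character are placeholders, not part of the patent:

```python
def normalize_text(text, sensitive_words=(), mask="*"):
    """Replace each sensitive word with a run of mask characters of the
    same length, then convert English letters to uppercase."""
    for w in sensitive_words:
        text = text.replace(w, mask * len(w))
    return text.upper()
```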
Preferably, using the extracted segmentation features and the pre-built segmentation model to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed, includes:
taking speech segments as the unit, feeding the segmentation features of the recognized text corresponding to each speech segment into the segmentation model in turn for segmentation detection, and determining whether the end position of the recognized text corresponding to each speech segment needs segmentation.
Preferably, the method also includes:
showing the segmented recognized text to the user; or
extracting the theme of each paragraph of the segmented recognized text and showing each theme to the user; and,
upon detecting a theme the user is interested in, showing the user the recognized text of the paragraph corresponding to that theme.
A speech recognition text segmentation device, including:
an endpoint detection module, configured to perform endpoint detection on speech data and obtain each speech segment together with the start frame number and end frame number of each speech segment;
a speech recognition module, configured to perform speech recognition on each speech segment and obtain the recognized text corresponding to each speech segment;
a feature extraction module, configured to extract the segmentation features of the recognized text corresponding to each speech segment;
a segmentation detection module, configured to use the extracted segmentation features and a pre-built segmentation model to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed;
a segmentation module, configured to segment the recognized text corresponding to the speech data according to the segmentation detection result.
Preferably, the device also includes a segmentation model building module, configured to build the segmentation model; the segmentation model building module includes:
a data collection module, configured to collect speech data;
an endpoint detection unit, configured to perform endpoint detection on the speech data collected by the data collection module and obtain each speech segment;
a speech recognition unit, configured to perform speech recognition on each speech segment and obtain the recognized text corresponding to each speech segment;
an annotation unit, configured to annotate the segmentation information of the recognized text corresponding to each speech segment, the segmentation information indicating whether the end position of the recognized text corresponding to the current speech segment needs segmentation;
a feature extraction unit, configured to extract the segmentation features of the recognized text corresponding to each speech segment;
a training unit, configured to build the segmentation model with the segmentation features and the segmentation information as training data.
Preferably, the feature extraction module includes:
a first feature extraction module, configured to extract segmentation features acoustically from the speech data and take them as first segmentation features of the recognized text corresponding to the speech segments; and/or
a second feature extraction module, configured to extract segmentation features semantically from the recognized text and take them as second segmentation features of the recognized text.
Preferably, the first feature extraction module includes:
a duration calculation unit, configured to calculate the difference between the end frame number and the start frame number of the current speech segment and take the difference as the duration of the current speech segment;
a distance calculation unit, configured to calculate the difference between the start frame number of the current speech segment and the end frame number of the previous speech segment and take the difference as the distance between the current speech segment and the previous speech segment, and/or to calculate the difference between the start frame number of the next speech segment and the end frame number of the current speech segment and take the difference as the distance between the current speech segment and the next speech segment.
Preferably, the first feature extraction module also includes:
a speaker change point detection unit, configured to perform speaker change point detection on the speech data using speaker separation techniques;
a speaker determination unit, configured to determine from the speaker change point detection result whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or whether the speaker of the current speech segment is the same as the speaker of the next speech segment.
Preferably, the second segmentation features include any one or more of the following:
the number of preceding unsegmented sentences, i.e. the total number of sentences in all recognized text between the start of the recognized text corresponding to the current speech segment and the previous segmentation mark;
the number of following unsegmented sentences, i.e. the total number of sentences in all recognized text after the recognized text corresponding to the current speech segment;
the number of sentences in the recognized text corresponding to the current speech segment;
the similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the previous speech segment;
the similarity between the recognized text corresponding to the current speech segment and the recognized text corresponding to the next speech segment.
Preferably, the second feature extraction module includes:
a revision unit, configured to revise the recognized text corresponding to the speech data, the revision unit including a punctuation addition subunit configured to add punctuation to the recognized text corresponding to the speech data;
a feature extraction unit, configured to extract the segmentation features semantically from the revised recognized text.
Preferably, the revision unit also includes any one or more of the following subunits:
a filtering subunit, configured to filter abnormal words out of the recognized text corresponding to the speech data;
a smoothing subunit, configured to apply disfluency smoothing to the recognized text corresponding to the speech data;
a normalization subunit, configured to normalize digits in the recognized text corresponding to the speech data;
a text replacement subunit, configured to perform text replacement on the recognized text corresponding to the speech data, the text replacement including converting lowercase English letters in the recognized text to uppercase or vice versa, and/or replacing sensitive words in the recognized text with special characters.
Preferably, the segmentation detection module is specifically configured, taking speech segments as the unit, to feed the segmentation features of the recognized text corresponding to each speech segment into the segmentation model in turn for segmentation detection, and to determine whether the end position of the recognized text corresponding to each speech segment needs segmentation.
Preferably, the device also includes:
a first display module, configured to show the segmented recognized text to the user; or
a theme extraction module, configured to extract the theme of each paragraph of the segmented recognized text;
a second display module, configured to show each theme to the user;
a sensing module, configured to detect a theme the user is interested in and, upon detecting such a theme, trigger the second display module to show the user the recognized text of the paragraph corresponding to that theme.
The present invention provides a speech recognition text segmentation method and device: endpoint detection is performed on speech data to obtain each speech segment; speech recognition is performed on each speech segment to obtain the recognized text corresponding to each speech segment; the segmentation features of the recognized text corresponding to each speech segment are then extracted; the extracted segmentation features and a pre-built segmentation model are used to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed; and the recognized text is segmented according to the segmentation detection result. The discourse structure of the recognized text is thus adjusted automatically and made clearer, which helps users quickly understand the content of the text and improves reading efficiency.
Further, the segmentation features can be extracted acoustically from the speech data or semantically from the recognized text; the features extracted at these two different levels can of course also be combined, with a corresponding segmentation model used to perform segmentation detection on the recognized text corresponding to the speech data and determine the positions where segmentation is needed, which can further improve segmentation accuracy.
Further, either all of the segmented recognized text can be shown to the user, or the theme of each paragraph of recognized text can be extracted and the themes shown to the user first; when the user wants to view a paragraph of interest, its content is then displayed, helping the user quickly find the content of interest.
Brief description of the drawings
, below will be right in order to illustrate more clearly of the embodiment of the present application or technical scheme of the prior art
The accompanying drawing used required in embodiment is briefly described, it should be apparent that, it is attached in describing below
Figure is only some embodiments described in the present invention, for those of ordinary skill in the art, also
Other accompanying drawings can be obtained according to these accompanying drawings.
Fig. 1 is a flowchart of a speech recognition text segmentation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of building the segmentation model in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech recognition text segmentation device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the segmentation model building module in an embodiment of the present invention;
Fig. 5 is another schematic structural diagram of the speech recognition text segmentation device of an embodiment of the present invention;
Fig. 6 is a further schematic structural diagram of the speech recognition text segmentation device of an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.
As shown in Fig. 1, the flowchart of the speech recognition text segmentation method of an embodiment of the present invention includes the following steps:
Step 101: perform endpoint detection on the speech data to obtain each speech segment together with the start frame number and end frame number of each speech segment.
The speech data can be recorded according to the practical application, e.g. a meeting recording or an interview recording. Endpoint detection means finding the start point and end point of each speech segment in a given stretch of speech signal. Any existing endpoint detection method can be used; the embodiments of the present invention place no restriction on this.
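The patent deliberately does not prescribe an endpoint detection method. As one minimal sketch of the idea, an energy-based detector (function name and thresholds are illustrative, not from the patent) labels frames by short-time energy and closes a segment after a sufficiently long run of silence frames:

```python
def detect_endpoints(frames, energy_threshold=0.01, min_silence_frames=20):
    """Label each frame as speech/silence by mean-square energy, then
    merge speech runs into (start_frame, end_frame) segments, splitting
    wherever at least min_silence_frames consecutive silence frames occur."""
    is_speech = [sum(s * s for s in f) / len(f) > energy_threshold
                 for f in frames]
    segments, start, silence_run = [], None, 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i            # segment begins
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # close segment at the last speech frame
                segments.append((start, i - silence_run))
                start, silence_run = None, 0
    if start is not None:            # segment running at end of audio
        segments.append((start, len(is_speech) - 1 - silence_run))
    return segments
```

The returned (start_frame, end_frame) pairs are exactly the quantities Step 101 needs for the later duration and distance features.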
Step 102: perform speech recognition on each speech segment to obtain the recognized text corresponding to each speech segment.
Specifically, features can be extracted from each speech segment, e.g. MFCC (Mel Frequency Cepstral Coefficient) features; the extracted feature data together with a pre-trained acoustic model and language model are then used for decoding, and the recognized text corresponding to each speech segment is finally obtained from the decoding result. The detailed speech recognition process is the same as in the prior art and is not described here.
Step 103: extract the segmentation features of the recognized text corresponding to each speech segment.
In practical applications, the segmentation features can be extracted acoustically from the speech data or semantically from the recognized text; the features extracted at these two different levels can of course also be combined, with a corresponding segmentation model used to perform segmentation detection on the recognized text corresponding to the speech data and determine the positions where segmentation is needed, which can further improve segmentation accuracy.
Step 104: use the extracted segmentation features and the pre-built segmentation model to perform segmentation detection on the recognized text corresponding to the speech data, so as to determine the positions where segmentation is needed.
Specifically, taking speech segments as the unit, the segmentation features of the recognized text corresponding to each speech segment are fed into the segmentation model for segmentation detection, and it is determined whether the end position of the recognized text corresponding to each speech segment needs segmentation.
It should be noted that in practical applications the output of the segmentation model can be either a decision on whether the end position of the recognized text corresponding to the current speech segment needs segmentation, or the probability that it does. Of course, the output parameter type does not affect the training process of the segmentation model; it suffices to set different input/output parameters when training the model. The specific training process of the segmentation model is described in detail later.
If the segmentation model outputs a probability, a corresponding threshold can be preset; if the probability exceeds the threshold, the end position of the recognized text corresponding to the current speech segment is considered to need segmentation.
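The patent leaves the model interface abstract. Assuming the probability-output variant just described, the detection loop reduces to thresholding each segment in turn (function and parameter names are illustrative):

```python
def detect_breaks(segment_features, model, threshold=0.5):
    """Run the segmentation model over each speech segment's features in
    order; return the indices of segments whose recognized text should
    end a paragraph. `model` maps a feature vector to a probability."""
    return [i for i, feats in enumerate(segment_features)
            if model(feats) > threshold]
```

The returned indices are then used in Step 105 to add segmentation marks at the corresponding end positions.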
Step 105: segment the recognized text corresponding to the speech data according to the segmentation detection result.
Specifically, a segmentation mark can be added at the end position of any recognized text that needs segmentation, which makes it convenient to display the recognized text corresponding to the speech data paragraph by paragraph according to the segmentation marks.
It should be noted that in practical applications the segmentation features can be extracted and segmentation detection performed on the recognized text of one speech segment at a time; alternatively, the segmentation features of the recognized texts corresponding to all speech segments can be extracted first, and then, taking speech segments as the unit, the segmentation features of the recognized text corresponding to each speech segment are fed into the segmentation model in turn for segmentation detection. The embodiments of the present invention place no restriction on this.
Another embodiment of the method of the present invention can further include the step of showing the segmented recognized text to the user. During display, the texts belonging to the same paragraph, according to the segmentation marks in the recognized text, can be placed in one paragraph, and texts of different paragraphs displayed separately.
For example, suppose the recognized text is A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, where Ai denotes the recognized text corresponding to one speech segment, and segmentation detection determines that segmentation is needed after A2 and A5. The displayed form can then be:
A1, A2
A3, A4, A5
A6, A7, A8, A9, A10
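The grouping implied by this example can be sketched as follows, where `break_after` holds the indices of the segments whose recognized text ends a paragraph (all names are illustrative):

```python
def group_paragraphs(texts, break_after):
    """Group per-segment recognized texts into paragraphs, starting a
    new paragraph after each index listed in break_after."""
    paragraphs, current = [], []
    breaks = set(break_after)
    for i, t in enumerate(texts):
        current.append(t)
        if i in breaks:
            paragraphs.append(current)
            current = []
    if current:                     # flush the trailing paragraph
        paragraphs.append(current)
    return paragraphs
```

With breaks after A2 and A5 (indices 1 and 4), this reproduces the three paragraphs shown above.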
Another embodiment of the method of the present invention can further include the following steps:
extracting the theme of each paragraph of the segmented recognized text and showing each theme to the user; and,
upon detecting a theme the user is interested in, showing the user the recognized text of the paragraph corresponding to that theme.
The user can select a theme of interest in various ways, for example by clicking on or circling the corresponding theme, or, where each theme is given a serial number, by entering the corresponding number on the keyboard.
As noted above, in the embodiments of the present invention the segmentation features can be extracted acoustically from the speech data or semantically from the recognized text, and the features extracted at these two different levels can of course also be combined: segmentation features are extracted acoustically from the speech data and taken as first segmentation features of the recognized text corresponding to the speech segments, and segmentation features are extracted semantically from the recognized text and taken as second segmentation features of the recognized text. Correspondingly, the segmentation model can be trained solely on the acoustic segmentation features, solely on the semantic segmentation features, or on both the acoustic and the semantic segmentation features.
The segmentation features at these two different levels are described in detail below.
1. Segmentation features extracted acoustically from the speech data, i.e. the aforementioned first segmentation features.
In practical applications, the first segmentation features can include the duration of the current speech segment, as well as the distance between the current speech segment and the previous speech segment and/or the distance between the current speech segment and the next speech segment.
Further, the first segmentation features may also include whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or whether the speaker of the current speech segment is the same as the speaker of the next speech segment.
This several segmentation feature is described in detail separately below.
A) duration of current speech segment
The frame number that the duration of voice segments can use voice segments to include is represented.Therefore, current speech is calculated
The difference for terminating frame number and the beginning frame number of current speech segment of section, you can obtain current speech segment
Duration, that is to say, that using the difference as current speech segment duration.
B) Distance between the current voice segment and the previous voice segment
The distance between the current voice segment and the previous voice segment can be represented by the difference between the start frame number of the current voice segment and the end frame number of the previous voice segment. Therefore, this difference is calculated and taken as the distance between the current voice segment and the previous voice segment.
It should be noted that when the current voice segment is the first voice segment, the distance between the current voice segment and the previous voice segment is 0.
C) Distance between the current voice segment and the next voice segment
Similarly, the distance between the current voice segment and the next voice segment can be represented by the difference between the start frame number of the next voice segment and the end frame number of the current voice segment. This difference is calculated and taken as the distance between the current voice segment and the next voice segment.
It should be noted that when the current voice segment is the last voice segment, the distance between the current voice segment and the next voice segment is 0.
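Given the start and end frame numbers produced by endpoint detection, the duration and both inter-segment distances reduce to simple frame-number differences. A minimal sketch (the `(start_frame, end_frame)` pair representation of a voice segment is an assumption made here for illustration):

```python
def acoustic_features(segments, i):
    """Duration and inter-segment distances for voice segment i,
    where each segment is a (start_frame, end_frame) pair."""
    start, end = segments[i]
    duration = end - start
    # Distance to the previous segment: 0 for the first segment.
    prev_gap = start - segments[i - 1][1] if i > 0 else 0
    # Distance to the next segment: 0 for the last segment.
    next_gap = segments[i + 1][0] - end if i < len(segments) - 1 else 0
    return duration, prev_gap, next_gap

segs = [(0, 120), (150, 300), (320, 400)]
print(acoustic_features(segs, 1))  # (150, 30, 20)
```

The same function covers the boundary notes above: the first segment reports a previous-segment distance of 0, and the last segment reports a next-segment distance of 0.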
D) Whether the speaker of the current voice segment is the same as the speaker of the previous voice segment
E) Whether the speaker of the current voice segment is the same as the speaker of the next voice segment
To detect whether the speakers of adjacent voice segments are the same, a speaker separation technique can be used to perform speaker change point detection on the speech data; according to the speaker change point detection result, it is determined whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or whether the speaker of the current voice segment is the same as the speaker of the next voice segment.
A speaker change point is the place where one speaker stops speaking and another speaker starts. The specific detection method is the same as in the prior art and is not described in detail here.
2. Segmentation features extracted at the semantic level of the identification text, i.e., the aforementioned second segmentation feature.
In practical applications, the second segmentation feature may include any one or more of the following:
a) Forward unsegmented sentence count, which refers to the total number of sentences contained in all identification text between the start position of the identification text corresponding to the current voice segment and the previous segmentation marker.
The previous segmentation marker can be obtained from the segmentation markers in the identification text preceding the identification text of the current voice segment, and those segmentation markers are in turn obtained from the segmentation detection results for the identification text corresponding to the preceding voice segments.
It should be noted that, in practical applications, if the second segmentation feature includes the forward unsegmented sentence count, segmentation detection must proceed in the manner described above, i.e., by extracting the segmentation features of the identification text corresponding to each voice segment in turn and performing segmentation detection on that identification text.
In addition, it should be noted that if the identification text corresponding to the current voice segment is the beginning of the identification text, the forward unsegmented sentence count is 0.
b) Backward unsegmented sentence count, which refers to the total number of sentences contained in all identification text after the identification text corresponding to the current voice segment. It can be obtained by counting the sentences following the identification text of the current voice segment.
It should be noted that if the identification text corresponding to the current voice segment is the ending of all identification text, the backward unsegmented sentence count is 0.
c) The number of sentences contained in the identification text corresponding to the current voice segment.
Specifically, this sentence count can be obtained by directly analyzing the punctuation in the identification text corresponding to the current voice segment.
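The three sentence-count features a) to c) can be computed directly from the punctuation and the position of the previous segmentation marker. A sketch, assuming sentences are delimited by punctuation marks and the per-segment list-of-texts representation is purely illustrative:

```python
import re

# Sentence-separating punctuation (Chinese and ASCII variants).
SENT_END = re.compile(r'[，。！？,.!?]')

def count_sentences(text):
    # One sentence per separating punctuation mark.
    return len(SENT_END.findall(text))

def sentence_count_features(texts, i, last_marker):
    """texts: identification text of each voice segment;
    last_marker: index of the first segment after the previous
    segmentation marker (0 when nothing has been segmented yet)."""
    forward = sum(count_sentences(t) for t in texts[last_marker:i])
    backward = sum(count_sentences(t) for t in texts[i + 1:])
    current = count_sentences(texts[i])
    return forward, backward, current

texts = ["A. B.", "C.", "D. E. F."]
print(sentence_count_features(texts, 1, 0))  # (2, 3, 1)
```

As in the notes above, the forward count is 0 at the very beginning of the identification text, and the backward count is 0 at its ending.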
d) Similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the previous voice segment.
The similarity is typically measured by the distance or angle between vectors, for example the cosine of the angle between the two text vectors: the smaller the angle, the more similar the two identification texts. The word vectorization process is prior art and is not described in detail here.
To exclude the interference of stop words with the text similarity computation, the stop words contained in the identification text of the current voice segment and in that of the previous voice segment can first be deleted. Stop words are words, symbols, or garbled characters that occur frequently in the identification text but carry no practical meaning, such as "this", "and", "is". Specifically, the deletion can be implemented by looking up the identification text against a stop-word list built in advance. Then the word vectors remaining in each identification text after stop-word deletion are combined per voice segment, yielding the text vector of the identification text corresponding to the current voice segment and that corresponding to the previous voice segment, and the similarity of the two text vectors is calculated.
It should be noted that when the identification text corresponding to the current voice segment is the beginning of all identification text, the similarity is 0.
e) Similarity between the identification text corresponding to the current voice segment and the identification text corresponding to the next voice segment.
This is computed in the same way as the similarity above: the stop words are deleted from the identification text, the text is vectorized, and the similarity is then calculated.
It should be noted that when the identification text corresponding to the current voice segment is the ending of all identification text, the similarity is 0.
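Features d) and e) can both be sketched as a bag-of-words cosine similarity computed after stop-word deletion. The stop-word set below is a hypothetical stand-in for the list built in advance, and the bag-of-words vectors stand in for the word-vector combination described above:

```python
import math
from collections import Counter

STOP_WORDS = {"的", "了", "嗯", "this", "and", "uh"}  # illustrative stop list

def text_vector(words):
    """Bag-of-words vector of a segment's identification text,
    with stop words deleted first."""
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine_similarity(v1, v2):
    dot = sum(v1[w] * v2[w] for w in v1)
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    # An empty vector (e.g. at the text boundary) yields similarity 0.
    return dot / (n1 * n2) if n1 and n2 else 0.0

cur = text_vector(["weather", "today", "uh", "sunny"])
prev = text_vector(["weather", "yesterday", "rain"])
print(round(cosine_similarity(cur, prev), 3))  # 0.333
```

The higher the cosine value, the smaller the angle between the two text vectors, matching the description above.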
It should be noted that before the second segmentation feature is extracted, the identification text corresponding to the speech data also needs to be corrected; the second segmentation feature is then extracted at the semantic level of the corrected identification text.
The correction of the identification text mainly consists of adding punctuation, i.e., adding the corresponding punctuation marks to the identification text, for example based on a conditional random field model. To make the added punctuation more accurate, separate thresholds can be set for adding punctuation between voice segments and within a voice segment: the threshold for adding punctuation between voice segments is set smaller, and the threshold for adding punctuation within a voice segment is set larger, which increases the likelihood of adding punctuation between segments and reduces the likelihood of adding punctuation within a segment. In the text after punctuation has been added, each piece of text separated by a punctuation mark (comma ",", question mark "?", exclamation mark "!" or full stop "。") is treated as one sentence.
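The two-threshold scheme can be sketched as follows. Here `punct_prob` is a stand-in for the posterior of a punctuation model such as the conditional random field mentioned above, and the two threshold values are purely illustrative:

```python
def add_punct(tokens, boundary_flags, punct_prob,
              inter_seg_thresh=0.3, intra_seg_thresh=0.6):
    """tokens: recognized words; boundary_flags[i] is True when a
    voice-segment boundary follows token i; punct_prob(i) is the
    punctuation model's posterior for a break after token i.
    The lower threshold at voice-segment boundaries makes adding
    punctuation between segments more likely than within a segment."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        thresh = inter_seg_thresh if boundary_flags[i] else intra_seg_thresh
        if punct_prob(i) >= thresh:
            out.append("。")
    return "".join(out)

# A posterior of 0.4 clears only the lower inter-segment threshold.
print(add_punct(["a", "b", "c"], [False, True, False], lambda i: 0.4))
```

With the uniform posterior of 0.4, punctuation is added only at the voice-segment boundary after the second token, illustrating how the asymmetric thresholds bias punctuation toward segment boundaries.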
In addition, the correction may further include any one or more of the following:
(1) Performing abnormal-word filtering on the identification text corresponding to the speech data.
Text filtering mainly removes erroneous abnormal words from the identification text; specifically, the filtering can be performed according to word confidence scores and the results of syntactic analysis.
(2) Performing smoothing on the identification text corresponding to the speech data.
Text smoothing mainly straightens out incoherent sentences. Of a meaninglessly repeated word, only one copy is retained; for example, in "very very good" only one "very" is kept. Modal particles without practical meaning can be ignored and left untranscribed; for example, the "oh" in "oh, this problem" needs to be smoothed away.
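The two smoothing operations just described, dropping fillers and collapsing immediate repetitions, can be sketched in a few lines; the filler set is an illustrative stand-in:

```python
FILLERS = {"嗯", "啊", "哦", "uh", "um", "oh"}  # illustrative filler list

def smooth(tokens):
    """Drop meaningless fillers and keep only one copy of any
    immediately repeated word."""
    out = []
    for tok in tokens:
        if tok in FILLERS:
            continue  # modal particle: ignore, do not transcribe
        if out and out[-1] == tok:
            continue  # repeated word: keep only one copy
        out.append(tok)
    return out

print(smooth(["oh", "very", "very", "good"]))  # ['very', 'good']
```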
(3) Performing digit normalization on the identification text corresponding to the speech data.
All numbers in the identification text obtained by speech recognition are rendered in Chinese numerals, but some numbers better match users' reading habits when expressed in Arabic numerals; for example, "twenty-one point five yuan" should be expressed as 21.5 yuan. Digit normalization converts such Chinese numerals into Arabic numerals, for example using a method based on the ABNF grammar.
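The text points to an ABNF-grammar-based method; as a simpler illustration of the same normalization, here is a table-driven sketch restricted to Chinese numerals below one hundred with an optional decimal part after 点:

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

def cn_int(s):
    """Integer part: handles the 十 (tens) construction."""
    if "十" in s:
        tens, _, units = s.partition("十")
        return (DIGITS[tens] if tens else 1) * 10 + (DIGITS[units] if units else 0)
    return DIGITS[s]

def cn_to_arabic(s):
    """Convert a simple Chinese numeral to its Arabic form,
    e.g. 二十一点五 -> 21.5."""
    if "点" in s:
        whole, frac = s.split("点", 1)
        return f"{cn_int(whole)}.{''.join(str(DIGITS[c]) for c in frac)}"
    return str(cn_int(s))

print(cn_to_arabic("二十一点五"))  # 21.5
```

A production implementation would cover larger magnitudes (百, 千, 万) and decide, per the reading-habit rule above, which occurrences should be converted at all.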
(4) Performing text replacement on the identification text corresponding to the speech data. The text replacement covers two cases:
One case is replacement between English upper and lower case, i.e., converting the lowercase English letters in the identification text corresponding to the speech data to uppercase, or vice versa; for example, "nba" is replaced with "NBA", and "luo c" is replaced with "luo C".
The other case is replacing the sensitive words in the identification text corresponding to the speech data with special symbols, so as to hide them. For the replacement, a sensitive-word list can be built; the identification text is then searched against this list, and any sensitive word found is replaced with special symbols. For example, if a violent term such as "robbery" is a sensitive word, every occurrence of "robbery" in the text is replaced with "****".
It should be noted that either one or both of the above text replacement cases may be applied as the practical application requires; the embodiments of the present invention are not limited in this respect.
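Both replacement cases can be sketched in a few lines; the acronym table and the sensitive-word set below are illustrative stand-ins for the lists built in practice:

```python
SENSITIVE = {"robbery", "抢劫"}          # illustrative sensitive-word list
ACRONYMS = {"nba": "NBA", "cctv": "CCTV"}  # illustrative case table

def replace_text(text):
    # Case replacement for known English acronyms.
    for low, up in ACRONYMS.items():
        text = text.replace(low, up)
    # Mask each sensitive word with asterisks of the same length.
    for word in SENSITIVE:
        text = text.replace(word, "*" * len(word))
    return text

print(replace_text("the nba star discussed the robbery case"))
# the NBA star discussed the ******* case
```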
As shown in Fig. 2, the flowchart of building the segmentation model in an embodiment of the present invention includes the following steps:
Step 201: collect speech data.
Step 202: perform endpoint detection on the speech data to obtain the voice segments.
Step 203: perform speech recognition on each voice segment to obtain the identification text corresponding to each voice segment.
Step 204: label the segmentation information of the identification text corresponding to each voice segment, the segmentation information indicating whether the end position of the identification text corresponding to the current voice segment requires segmentation.
For example, a segmentation point may be labeled 1 and a non-segmentation point 0; other symbols may of course be used, and the embodiments of the present invention are not limited in this respect.
Step 205: extract the segmentation features of the identification text corresponding to each voice segment.
Step 206: build the segmentation model using the segmentation features and the segmentation information as training data.
The segmentation model may be a model commonly used in pattern recognition, such as a Bayesian model or a support vector machine model. During training, the segmentation features of the identification text serve as the model input and the labeled segmentation information serves as the model output; model training then yields the segmentation model. The specific training process is the same as in the prior art and is not described in detail here. It should be noted that the model can be trained offline.
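The data flow of steps 204 to 206 and of the later segmentation detection can be sketched with a minimal stand-in classifier; the patent itself contemplates e.g. Bayesian or support vector machine models, so the nearest-centroid model and all feature values below are illustrative only:

```python
import math

def train(X, y):
    """Build a nearest-centroid model from segmentation-feature
    vectors X and the labeled segmentation information y (step 204)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def detect(model, features):
    """Segmentation detection: the label of the nearest centroid."""
    return min(model, key=lambda lab: math.dist(model[lab], features))

# [duration, gap_before, gap_after, fwd_sents, bwd_sents, cur_sents]
X = [[120, 10,  5, 1, 9, 1],   # mid-paragraph: no segmentation
     [300, 80, 90, 6, 3, 3],   # long pauses on both sides: segment here
     [100,  5,  8, 2, 8, 1],
     [280, 70, 85, 5, 2, 2]]
y = [0, 1, 0, 1]               # labeled segmentation information

model = train(X, y)            # step 206: build the segmentation model
print(detect(model, [290, 75, 88, 5, 2, 2]))  # 1
```

The output label 1 indicates that the end position of the corresponding identification text requires segmentation, matching the 1/0 labeling convention of step 204.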
It should be noted that when training the segmentation model, the model can be trained solely on the acoustic segmentation features, solely on the semantic segmentation features, or on both the acoustic and semantic segmentation features. Correspondingly, when extracting the segmentation features of the identification text corresponding to each voice segment in step 205 above, only the acoustic features or only the semantic features may be extracted, or both may be extracted simultaneously; the embodiments of the present invention are not limited in this respect.
In addition, it should be noted that for a segmentation model trained on a given type of segmentation feature, when the model is used to perform segmentation detection on the identification text to be presented, the segmentation features of the corresponding type must be extracted from that identification text; the extracted segmentation features are then input into the segmentation model to determine the positions in the identification text to be presented at which segmentation is needed.
In the speech recognition text segmentation method provided by the present invention, endpoint detection is performed on the speech data to obtain the voice segments, speech recognition is performed on each voice segment to obtain its corresponding identification text, the segmentation features of the identification text corresponding to each voice segment are then extracted, and segmentation detection is performed on the identification text corresponding to the speech data using the extracted segmentation features and a segmentation model built in advance; the identification text is segmented according to the segmentation detection result. The paragraph structure of the identification text is thus adjusted automatically and made clearer, which helps the user quickly understand the content of the identification text and improves reading efficiency.
In practical applications, the segmentation features can be extracted at the acoustic level of the speech data or at the semantic level of the identification text; of course, the segmentation features extracted at both levels can also be combined, with the corresponding segmentation model used to perform segmentation detection on the identification text corresponding to the speech data and determine the positions requiring segmentation, which can further improve segmentation accuracy.
Further, the speech recognition text segmentation method provided by the present invention can also present all of the segmented identification text to the user, or extract the topic of each segment of identification text and present the topics to the user first, displaying the content of a paragraph only when the user wants to examine a paragraph of interest; this helps the user quickly find the content of interest.
Correspondingly, an embodiment of the present invention also provides a speech recognition text segmentation device, a structural schematic of which is shown in Fig. 3.
In this embodiment, the device includes:
an endpoint detection module 301, configured to perform endpoint detection on the speech data to obtain the voice segments and the start frame number and end frame number of each voice segment;
a speech recognition module 302, configured to perform speech recognition on each voice segment to obtain the identification text corresponding to each voice segment;
a feature extraction module 303, configured to extract the segmentation features of the identification text corresponding to each voice segment;
a segmentation detection module 304, configured to perform segmentation detection on the identification text corresponding to the speech data using the extracted segmentation features and a segmentation model built in advance, so as to determine the positions requiring segmentation; specifically, taking voice segments as units, the segmentation features of the identification text corresponding to each voice segment are input into the segmentation model in turn for segmentation detection, determining whether the end position of the identification text corresponding to each voice segment requires segmentation;
a segmentation module 305, configured to segment the identification text corresponding to the speech data according to the segmentation detection result.
It should be noted that, in practical applications, the feature extraction module 303 can extract the segmentation features at the acoustic level of the speech data or at the semantic level of the identification text; of course, the segmentation features extracted at both levels can also be combined. Correspondingly, the feature extraction module 303 may include a first feature extraction module and/or a second feature extraction module, wherein:
the first feature extraction module is configured to extract the segmentation features of each voice segment at the acoustic level of the speech data and use them as the first segmentation features of the identification text corresponding to the voice segment;
the second feature extraction module is configured to extract the segmentation features at the semantic level of the identification text and use them as the second segmentation features of the identification text.
One embodiment of the first feature extraction module includes a duration calculation unit and a distance calculation unit; another embodiment of the first feature extraction module may further include a speaker change point detection unit and a speaker determining unit. These units are described separately below.
The duration calculation unit is configured to calculate the difference between the end frame number and the start frame number of the current voice segment and take this difference as the duration of the current voice segment.
The distance calculation unit is configured to calculate the difference between the start frame number of the current voice segment and the end frame number of the previous voice segment and take this difference as the distance between the current voice segment and the previous voice segment; and/or to calculate the difference between the start frame number of the next voice segment and the end frame number of the current voice segment and take this difference as the distance between the current voice segment and the next voice segment.
The speaker change point detection unit is configured to perform speaker change point detection on the speech data using a speaker separation technique.
The speaker determining unit is configured to determine, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or whether the speaker of the current voice segment is the same as the speaker of the next voice segment.
One embodiment of the second feature extraction module includes:
a correction unit, configured to correct the identification text corresponding to the speech data, the correction unit including a punctuation adding subunit configured to add punctuation to the identification text corresponding to the speech data, for example based on a conditional random field model;
a feature extraction unit, configured to extract the segmentation features at the semantic level of the corrected identification text.
The second segmentation features extracted by the feature extraction unit may include any one or more of the following: the forward unsegmented sentence count, the backward unsegmented sentence count, the number of sentences contained in the identification text corresponding to the current voice segment, the similarity between the identification text corresponding to the current voice segment and that corresponding to the previous voice segment, and the similarity between the identification text corresponding to the current voice segment and that corresponding to the next voice segment.
In practical applications, the correction unit may also include any one or more of the following subunits:
a filtering subunit, configured to perform abnormal-word filtering on the identification text corresponding to the speech data;
a smoothing subunit, configured to perform smoothing on the identification text corresponding to the speech data;
a normalization subunit, configured to perform digit normalization on the identification text corresponding to the speech data;
a text replacement subunit, configured to perform text replacement on the identification text corresponding to the speech data, the text replacement including: converting the lowercase English letters in the identification text corresponding to the speech data to uppercase, or vice versa; and/or replacing the sensitive words in the identification text corresponding to the speech data with special symbols.
In an embodiment of the present invention, the segmentation model can be built offline by a corresponding segmentation model building module. The segmentation model building module may be independent of the speech recognition text segmentation device of the present invention, or may be integrated with the speech recognition text segmentation device of the present invention; the embodiments of the present invention are not limited in this respect.
As shown in Fig. 4, a structural schematic of the segmentation model building module in an embodiment of the present invention includes:
a data collection unit 401, configured to collect speech data;
an endpoint detection unit 402, configured to perform endpoint detection on the speech data collected by the data collection unit to obtain the voice segments;
a speech recognition unit 403, configured to perform speech recognition on each voice segment to obtain the identification text corresponding to each voice segment;
a labeling unit 404, configured to label the segmentation information of the identification text corresponding to each voice segment, the segmentation information indicating whether the end position of the identification text corresponding to the current voice segment requires segmentation;
a feature extraction unit 405, configured to extract the segmentation features of the identification text corresponding to each voice segment;
a training unit 406, configured to build the segmentation model using the segmentation features and the segmentation information as training data.
It should be noted that when training the segmentation model, the model can be trained solely on the acoustic segmentation features (i.e., the above first segmentation features), solely on the semantic segmentation features (i.e., the above second segmentation features), or on both the acoustic and semantic segmentation features. Correspondingly, when the feature extraction unit 405 extracts the segmentation features of the identification text corresponding to each voice segment, it may extract only the acoustic features or only the semantic features, or both simultaneously; the embodiments of the present invention are not limited in this respect.
In addition, the output of the segmentation model may be whether the end position of the identification text corresponding to the current voice segment requires segmentation, or the probability that this end position requires segmentation. Of course, outputting different types of parameters does not affect the training process of the segmentation model; only different input and output parameters need to be set during model training.
The present invention provides a speech recognition text segmentation device: endpoint detection is performed on the speech data to obtain the voice segments, speech recognition is performed on each voice segment to obtain its corresponding identification text, the segmentation features of the identification text corresponding to each voice segment are then extracted, and segmentation detection is performed on the identification text corresponding to the speech data using the extracted segmentation features and a segmentation model built in advance; the identification text is segmented according to the segmentation detection result. The paragraph structure of the identification text is thus adjusted automatically and made clearer, which helps the user quickly understand the content of the identification text and improves reading efficiency.
Further, the segmentation features can be extracted at the acoustic level of the speech data or at the semantic level of the identification text; of course, the segmentation features extracted at both levels can also be combined, with the corresponding segmentation model used to perform segmentation detection on the identification text corresponding to the speech data and determine the positions requiring segmentation, which can further improve segmentation accuracy.
As shown in Fig. 5, another structural schematic of the speech recognition text segmentation device of an embodiment of the present invention is given.
Unlike Fig. 3, in this embodiment the device further includes:
a first display module 501, configured to present the segmented identification text to the user.
As shown in Fig. 6, yet another structural schematic of the speech recognition text segmentation device of an embodiment of the present invention is given.
Unlike Fig. 3, in this embodiment the device further includes:
a topic extraction module 601, configured to extract the topic of the identification text of each paragraph after segmentation;
a second display module 602, configured to present the topics to the user;
a sensing module 603, configured to sense a topic the user is interested in and, upon sensing such a topic, trigger the second display module 602 to present the identification text of the paragraph corresponding to that topic to the user.
The speech recognition text segmentation device provided by the present invention can thus present the segmented identification text to the user in several ways: it not only shows the user identification text with a clear paragraph structure, but also helps the user quickly find the content of interest, further improving reading efficiency.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort.
The embodiments of the present invention are described in detail above. Specific examples are used herein to set forth the present invention, and the description of the above embodiments is only intended to help understand the method and device of the present invention. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementation and the scope of application according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.
Claims (20)
1. A speech recognition text segmentation method, characterized by comprising:
performing endpoint detection on speech data to obtain voice segments and the start frame number and end frame number of each voice segment;
performing speech recognition on each voice segment to obtain the identification text corresponding to each voice segment;
extracting the segmentation features of the identification text corresponding to each voice segment;
performing segmentation detection on the identification text corresponding to the speech data using the extracted segmentation features and a segmentation model built in advance, to determine the positions requiring segmentation;
segmenting the identification text corresponding to the speech data according to the segmentation detection result.
2. The method according to claim 1, characterized in that the method further comprises building the segmentation model in the following manner:
collecting speech data;
performing endpoint detection on the collected speech data to obtain voice segments;
performing speech recognition on each voice segment to obtain the identification text corresponding to each voice segment;
labeling the segmentation information of the identification text corresponding to each voice segment, the segmentation information indicating whether the end position of the identification text corresponding to the current voice segment requires segmentation;
extracting the segmentation features of the identification text corresponding to each voice segment;
building the segmentation model using the segmentation features and the segmentation information as training data.
3. The method according to claim 1, characterized in that extracting the segmentation features of the identification text corresponding to each voice segment comprises:
extracting the segmentation features of each voice segment at the acoustic level of the speech data, and using the segmentation features as the first segmentation features of the identification text corresponding to the voice segment; and/or
extracting the segmentation features at the semantic level of the identification text, and using the segmentation features as the second segmentation features of the identification text.
4. The method according to claim 3, characterized in that the first segmentation features comprise the duration of the current voice segment, and further comprise the distance between the current voice segment and the previous voice segment, and/or the distance between the current voice segment and the next voice segment;
extracting the segmentation features of each voice segment at the acoustic level of the speech data comprises:
calculating the difference between the end frame number and the start frame number of the current voice segment, and taking the difference as the duration of the current voice segment;
and further comprises:
calculating the difference between the start frame number of the current voice segment and the end frame number of the previous voice segment, and taking the difference as the distance between the current voice segment and the previous voice segment; and/or
calculating the difference between the start frame number of the next voice segment and the end frame number of the current voice segment, and taking the difference as the distance between the current voice segment and the next voice segment.
5. The method according to claim 4, characterized in that the first segmentation features further comprise: whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or whether the speaker of the current voice segment is the same as the speaker of the next voice segment;
extracting the segmentation features of each voice segment at the acoustic level of the speech data further comprises:
performing speaker change point detection on the speech data using a speaker separation technique;
determining, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the previous voice segment, and/or determining, according to the speaker change point detection result, whether the speaker of the current voice segment is the same as the speaker of the next voice segment.
6. The method according to claim 3, characterized in that the second segmentation feature comprises any one or more of the following:
a forward unsegmented sentence count, namely the total number of sentences contained in all recognition texts from the starting position of the recognition text corresponding to the current speech segment back to the previous segmentation mark;
a backward unsegmented sentence count, namely the total number of sentences contained in all recognition texts after the recognition text corresponding to the current speech segment;
the number of sentences contained in the recognition text corresponding to the current speech segment;
the similarity between the recognition text corresponding to the current speech segment and the recognition text corresponding to the previous speech segment;
the similarity between the recognition text corresponding to the current speech segment and the recognition text corresponding to the subsequent speech segment.
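A minimal sketch of these semantic features, assuming per-segment sentence counts are already available and using a plain bag-of-words cosine for the similarity (the claim does not fix a particular similarity measure):

```python
from collections import Counter
from math import sqrt
from typing import List

def forward_unsegmented(sentence_counts: List[int], idx: int,
                        last_boundary: int) -> int:
    """Sentences from the previous segmentation mark (segment index
    last_boundary) up to, but not including, segment idx."""
    return sum(sentence_counts[last_boundary:idx])

def backward_unsegmented(sentence_counts: List[int], idx: int) -> int:
    """Sentences in all recognition texts after segment idx."""
    return sum(sentence_counts[idx + 1:])

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Bag-of-words cosine similarity between two recognition texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Intuitively, a high forward count and a low similarity with the previous segment both push toward placing a paragraph boundary.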
7. The method according to claim 3, characterized in that the semantically extracting a segmentation feature from the recognition text comprises:
modifying the recognition text corresponding to the speech data, the modification comprising adding punctuation to the recognition text corresponding to the speech data;
semantically extracting the segmentation feature from the modified recognition text.
8. The method according to claim 7, characterized in that the modification further comprises any one or more of the following:
performing abnormal word filtering on the recognition text corresponding to the speech data;
performing smoothing on the recognition text corresponding to the speech data;
performing digit normalization on the recognition text corresponding to the speech data;
performing text replacement on the recognition text corresponding to the speech data, the text replacement comprising: converting lowercase English letters in the recognition text corresponding to the speech data to uppercase, or vice versa; and/or replacing sensitive words in the recognition text corresponding to the speech data with special symbols.
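Toy versions of three of the corrections above (digit normalization, case conversion, sensitive-word replacement) are sketched below. The mapping tables are placeholder assumptions, not the patent's resources:

```python
import re

# Hypothetical normalization tables; a real system would use larger resources.
DIGIT_MAP = {"零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}
SENSITIVE = {"badword"}

def normalize_digits(text: str) -> str:
    # Digit normalization: rewrite spelled-out digits as Arabic numerals.
    return "".join(DIGIT_MAP.get(ch, ch) for ch in text)

def uppercase_english(text: str) -> str:
    # Text replacement: convert lowercase English letters to uppercase.
    return re.sub(r"[a-z]", lambda m: m.group().upper(), text)

def mask_sensitive(text: str) -> str:
    # Replace each sensitive word with a run of special symbols.
    for w in SENSITIVE:
        text = text.replace(w, "*" * len(w))
    return text
```

Punctuation addition and disfluency smoothing, the other corrections of claims 7 and 8, would normally be model-based and are omitted from this sketch.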
9. The method according to any one of claims 1 to 8, characterized in that the performing segmentation detection on the recognition text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, to determine the positions requiring segmentation, comprises:
inputting, in units of speech segments, the segmentation features of the recognition text corresponding to each speech segment into the segmentation model in turn for segmentation detection, to determine whether the end position of the recognition text corresponding to each speech segment requires segmentation.
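The detection loop of claim 9 amounts to feeding each segment's feature vector to the model in turn; a sketch with the model abstracted as any boolean-valued callable, since the patent does not fix a model family here:

```python
from typing import Callable, List, Sequence

def detect_paragraphs(features: List[Sequence[float]],
                      model: Callable[[Sequence[float]], bool]) -> List[int]:
    """Feed each segment's feature vector to the segmentation model in turn
    and collect the indices of segments whose recognition-text end position
    the model marks as a paragraph boundary."""
    return [i for i, f in enumerate(features) if model(f)]
```

A threshold on a single score, `lambda f: f[0] > 0.5`, is enough to exercise the loop; a trained classifier would be dropped in its place.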
10. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
displaying the segmented recognition text to a user; or
extracting the topic of each paragraph of the segmented recognition text, and displaying each topic to the user; and
when it is perceived that the user is interested in a topic, displaying to the user the recognition text of the paragraph corresponding to that topic.
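A crude stand-in for the per-paragraph topic extraction of claim 10, taking the most frequent non-stopword terms as the displayed topic; real systems would use a proper keyword or topic model, and the stopword list is an assumption:

```python
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "of", "and", "to", "is"}  # toy list, an assumption

def paragraph_topic(paragraph: str, top_k: int = 3) -> List[str]:
    """Return the top_k most frequent non-stopword terms of a segmented
    paragraph, to be shown to the user as its topic."""
    words = [w.lower() for w in paragraph.split() if w.lower() not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(top_k)]
```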
11. A speech recognition text segmentation apparatus, characterized by comprising:
an endpoint detection module, configured to perform endpoint detection on speech data, to obtain each speech segment and the start frame number and end frame number of each speech segment;
a speech recognition module, configured to perform speech recognition on each speech segment, to obtain the recognition text corresponding to each speech segment;
a feature extraction module, configured to extract the segmentation features of the recognition text corresponding to each speech segment;
a segmentation detection module, configured to perform segmentation detection on the recognition text corresponding to the speech data using the extracted segmentation features and a pre-built segmentation model, to determine the positions requiring segmentation;
a segmentation module, configured to segment the recognition text corresponding to the speech data according to the segmentation detection result.
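The five modules of claim 11 form a pipeline; a sketch with each module injected as a callable stand-in (all names and signatures below are assumptions for illustration):

```python
from typing import Callable, List, Sequence, Tuple

def segment_transcript(audio: bytes,
                       endpoint_detect: Callable[[bytes], List[Tuple[int, int]]],
                       recognize: Callable[[Tuple[int, int]], str],
                       extract_features: Callable[[int, List[str]], Sequence[float]],
                       model: Callable[[Sequence[float]], bool]) -> List[List[str]]:
    """Wire the modules together: endpoint detection -> speech recognition ->
    feature extraction -> segmentation detection -> segmentation."""
    segments = endpoint_detect(audio)            # endpoint detection module
    texts = [recognize(s) for s in segments]     # speech recognition module
    paragraphs, current = [], []
    for i, text in enumerate(texts):
        current.append(text)
        feats = extract_features(i, texts)       # feature extraction module
        if model(feats):                         # segmentation detection module
            paragraphs.append(current)           # segmentation module
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```

Dependency injection keeps the sketch honest: each slot corresponds to one claimed module, and any real detector, recognizer, or model can be substituted without changing the wiring.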
12. The apparatus according to claim 11, characterized in that the apparatus further comprises a segmentation model building module, configured to build the segmentation model; the segmentation model building module comprises:
a data collection unit, configured to collect speech data;
an endpoint detection unit, configured to perform endpoint detection on the speech data collected by the data collection unit, to obtain each speech segment;
a speech recognition unit, configured to perform speech recognition on each speech segment, to obtain the recognition text corresponding to each speech segment;
a labeling unit, configured to label the segmentation information of the recognition text corresponding to each speech segment, the segmentation information indicating whether the end position of the recognition text corresponding to the current speech segment requires segmentation;
a feature extraction unit, configured to extract the segmentation features of the recognition text corresponding to each speech segment;
a training unit, configured to build the segmentation model using the segmentation features and the segmentation information as training data.
13. The apparatus according to claim 11, characterized in that the feature extraction module comprises:
a first feature extraction module, configured to acoustically extract the segmentation feature of each speech segment from the speech data, and take the segmentation feature as a first segmentation feature of the recognition text corresponding to the speech segment; and/or
a second feature extraction module, configured to semantically extract the segmentation feature from the recognition text, and take the segmentation feature as a second segmentation feature of the recognition text.
14. The apparatus according to claim 13, characterized in that the first feature extraction module comprises:
a duration calculation unit, configured to calculate the difference between the end frame number of the current speech segment and the start frame number of the current speech segment, and take the difference as the duration of the current speech segment;
a distance calculation unit, configured to calculate the difference between the start frame number of the current speech segment and the end frame number of the previous speech segment, and take the difference as the distance between the current speech segment and the previous speech segment; and/or calculate the difference between the start frame number of the subsequent speech segment and the end frame number of the current speech segment, and take the difference as the distance between the current speech segment and the subsequent speech segment.
15. The apparatus according to claim 14, characterized in that the first feature extraction module further comprises:
a speaker change point detection unit, configured to perform speaker change point detection on the speech data using a speaker separation technique;
a speaker determination unit, configured to determine, according to the speaker change point detection result, whether the speaker of the current speech segment is the same as the speaker of the previous speech segment, and/or determine, according to the speaker change point detection result, whether the speaker of the current speech segment is the same as the speaker of the subsequent speech segment.
16. The apparatus according to claim 13, characterized in that the second segmentation feature comprises any one or more of the following:
a forward unsegmented sentence count, namely the total number of sentences contained in all recognition texts from the starting position of the recognition text corresponding to the current speech segment back to the previous segmentation mark;
a backward unsegmented sentence count, namely the total number of sentences contained in all recognition texts after the recognition text corresponding to the current speech segment;
the number of sentences contained in the recognition text corresponding to the current speech segment;
the similarity between the recognition text corresponding to the current speech segment and the recognition text corresponding to the previous speech segment;
the similarity between the recognition text corresponding to the current speech segment and the recognition text corresponding to the subsequent speech segment.
17. The apparatus according to claim 13, characterized in that the second feature extraction module comprises:
a modification unit, configured to modify the recognition text corresponding to the speech data, the modification unit comprising a punctuation addition subunit, configured to add punctuation to the recognition text corresponding to the speech data;
a feature extraction unit, configured to semantically extract the segmentation feature from the modified recognition text.
18. The apparatus according to claim 17, characterized in that the modification unit further comprises any one or more of the following subunits:
a filtering subunit, configured to perform abnormal word filtering on the recognition text corresponding to the speech data;
a smoothing subunit, configured to perform smoothing on the recognition text corresponding to the speech data;
a normalization subunit, configured to perform digit normalization on the recognition text corresponding to the speech data;
a text replacement subunit, configured to perform text replacement on the recognition text corresponding to the speech data, the text replacement comprising: converting lowercase English letters in the recognition text corresponding to the speech data to uppercase, or vice versa; and/or replacing sensitive words in the recognition text corresponding to the speech data with special symbols.
19. The apparatus according to any one of claims 11 to 18, characterized in that the segmentation detection module is specifically configured to input, in units of speech segments, the segmentation features of the recognition text corresponding to each speech segment into the segmentation model in turn for segmentation detection, to determine whether the end position of the recognition text corresponding to each speech segment requires segmentation.
20. The apparatus according to any one of claims 11 to 18, characterized in that the apparatus further comprises:
a first display module, configured to display the segmented recognition text to a user; or
a topic extraction module, configured to extract the topic of each paragraph of the segmented recognition text;
a second display module, configured to display each topic to the user; and
a perception module, configured to perceive a topic the user is interested in, and, when a topic of interest is perceived, trigger the second display module to display to the user the recognition text of the paragraph corresponding to that topic.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610256898.8A CN107305541B (en) | 2016-04-20 | 2016-04-20 | Method and device for segmenting speech recognition text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107305541A true CN107305541A (en) | 2017-10-31 |
CN107305541B CN107305541B (en) | 2021-05-04 |
Family
ID=60150228
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610256898.8A Active CN107305541B (en) | 2016-04-20 | 2016-04-20 | Method and device for segmenting speech recognition text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107305541B (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108090051A (en) * | 2017-12-20 | 2018-05-29 | 深圳市沃特沃德股份有限公司 | The interpretation method and translator of continuous long voice document |
CN108364650A (en) * | 2018-04-18 | 2018-08-03 | 北京声智科技有限公司 | The adjusting apparatus and method of voice recognition result |
CN108363765A (en) * | 2018-02-06 | 2018-08-03 | 深圳市鹰硕技术有限公司 | The recognition methods of audio paragraph and device |
CN108446389A (en) * | 2018-03-22 | 2018-08-24 | 平安科技(深圳)有限公司 | Speech message searching and displaying method, device, computer equipment and storage medium |
CN108830639A (en) * | 2018-05-17 | 2018-11-16 | 科大讯飞股份有限公司 | Content data processing method and device, computer readable storage medium |
CN109344411A (en) * | 2018-09-19 | 2019-02-15 | 深圳市合言信息科技有限公司 | A kind of interpretation method for listening to formula simultaneous interpretation automatically |
CN109361823A (en) * | 2018-11-01 | 2019-02-19 | 深圳市号互联科技有限公司 | A kind of intelligent interaction mode that voice is mutually converted with text |
CN109743589A (en) * | 2018-12-26 | 2019-05-10 | 百度在线网络技术(北京)有限公司 | Article generation method and device |
CN110264997A (en) * | 2019-05-30 | 2019-09-20 | 北京百度网讯科技有限公司 | The method, apparatus and storage medium of voice punctuate |
CN110379413A (en) * | 2019-06-28 | 2019-10-25 | 联想(北京)有限公司 | A kind of method of speech processing, device, equipment and storage medium |
CN110399489A (en) * | 2019-07-08 | 2019-11-01 | 厦门市美亚柏科信息股份有限公司 | A kind of chat data segmentation method, device and storage medium |
CN110502631A (en) * | 2019-07-17 | 2019-11-26 | 招联消费金融有限公司 | A kind of input information response method, apparatus, computer equipment and storage medium |
CN110503943A (en) * | 2018-05-17 | 2019-11-26 | 蔚来汽车有限公司 | A voice interaction method and voice interaction system |
CN110588524A (en) * | 2019-08-02 | 2019-12-20 | 精电有限公司 | A method for displaying information and a vehicle-mounted auxiliary display system |
CN110619897A (en) * | 2019-08-02 | 2019-12-27 | 精电有限公司 | Conference summary generation method and vehicle-mounted recording system |
CN110827825A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Punctuation prediction method, system, terminal and storage medium for speech recognition text |
CN111079384A (en) * | 2019-11-18 | 2020-04-28 | 佰聆数据股份有限公司 | Identification method and system for intelligent quality inspection service forbidden words |
CN111862980A (en) * | 2020-08-07 | 2020-10-30 | 斑马网络技术有限公司 | Incremental semantic processing method |
CN111931482A (en) * | 2020-09-22 | 2020-11-13 | 苏州思必驰信息科技有限公司 | Text segmentation method and device |
CN112036128A (en) * | 2020-08-21 | 2020-12-04 | 百度在线网络技术(北京)有限公司 | A text content processing method, apparatus, device and storage medium |
CN112185424A (en) * | 2020-09-29 | 2021-01-05 | 国家计算机网络与信息安全管理中心 | Voice file cutting and restoring method, device, equipment and storage medium |
CN112214965A (en) * | 2020-10-21 | 2021-01-12 | 科大讯飞股份有限公司 | Case regularization method, apparatus, electronic device and storage medium |
CN112699687A (en) * | 2021-01-07 | 2021-04-23 | 北京声智科技有限公司 | Content cataloging method and device and electronic equipment |
CN112712794A (en) * | 2020-12-25 | 2021-04-27 | 苏州思必驰信息科技有限公司 | Speech recognition marking training combined system and device |
CN112733660A (en) * | 2020-12-31 | 2021-04-30 | 支付宝(杭州)信息技术有限公司 | Method and device for splitting video strip |
CN112818077A (en) * | 2020-12-31 | 2021-05-18 | 科大讯飞股份有限公司 | Text processing method, device, equipment and storage medium |
WO2021109000A1 (en) * | 2019-12-03 | 2021-06-10 | 深圳市欢太科技有限公司 | Data processing method and apparatus, electronic device, and storage medium |
CN113041623A (en) * | 2019-12-26 | 2021-06-29 | 波克科技股份有限公司 | Game parameter configuration method and device and computer readable storage medium |
CN113076720A (en) * | 2021-04-29 | 2021-07-06 | 新声科技(深圳)有限公司 | Long text segmentation method and device, storage medium and electronic device |
CN114254587A (en) * | 2021-12-15 | 2022-03-29 | 科大讯飞股份有限公司 | Topic paragraph dividing method and device, electronic equipment and storage medium |
CN114841171A (en) * | 2022-04-29 | 2022-08-02 | 北京思源智通科技有限责任公司 | Text segmentation subject extraction method, system, readable medium and device |
CN115394295A (en) * | 2021-05-25 | 2022-11-25 | 阿里巴巴新加坡控股有限公司 | Segmentation processing method, device, equipment and storage medium |
US11580463B2 (en) | 2019-05-06 | 2023-02-14 | Hithink Royalflush Information Network Co., Ltd. | Systems and methods for report generation |
CN117113974A (en) * | 2023-04-26 | 2023-11-24 | 荣耀终端有限公司 | Text segmentation method, device, chip, electronic equipment and medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1296587A (en) * | 1998-02-02 | 2001-05-23 | Randall C. Walker | Text processor |
US20040024585A1 (en) * | 2002-07-03 | 2004-02-05 | Amit Srivastava | Linguistic segmentation of speech |
CN1771494A (en) * | 2003-05-28 | 2006-05-10 | Loquendo S.p.A. | Automatic segmentation of texts comprising chunks without separators |
CN1894686A (en) * | 2003-11-21 | 2007-01-10 | Koninklijke Philips Electronics N.V. | Text segmentation and topic annotation for document structuring |
US20100169318A1 (en) * | 2008-12-30 | 2010-07-01 | Microsoft Corporation | Contextual representations from data streams |
CN103150294A (en) * | 2011-12-06 | 2013-06-12 | Shengle Information Technology (Shanghai) Co., Ltd. | Method and system for correction based on speech recognition results |
CN103164399A (en) * | 2013-02-26 | 2013-06-19 | Beijing Jietong Huasheng Speech Technology Co., Ltd. | Punctuation addition method and device in speech recognition |
CN103345922A (en) * | 2013-07-05 | 2013-10-09 | Zhang Wei | Fully automatic segmentation method for long speech |
CN103488723A (en) * | 2013-09-13 | 2014-01-01 | Fudan University | Automatic navigation method and system for semantic ranges of interest in electronic reading |
US20150302849A1 (en) * | 2005-07-13 | 2015-10-22 | Intellisist, Inc. | System And Method For Identifying Special Information |
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | iFLYTEK Co., Ltd. | Voice recognition post-processing method and system |
US20160026618A1 (en) * | 2002-12-24 | 2016-01-28 | At&T Intellectual Property Ii, L.P. | System and method of extracting clauses for spoken language understanding |
CN105427858A (en) * | 2015-11-06 | 2016-03-23 | iFLYTEK Co., Ltd. | Method and system for achieving automatic voice classification |
Non-Patent Citations (3)
Title |
---|
DOUG BEEFERMAN: "Statistical Models for Text Segmentation", Machine Learning *
ELIZABETH SHRIBERG: "Prosody-based automatic segmentation of speech into sentences and topics", Speech Communication *
REN Xinshe et al.: "Research on a speech segmentation algorithm based on improved feature values", Journal of Nanjing Normal University (Engineering and Technology Edition) *
Also Published As
Publication number | Publication date |
---|---|
CN107305541B (en) | 2021-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107305541A (en) | Speech recognition text segmentation method and device | |
US10950242B2 (en) | System and method of diarization and labeling of audio data | |
US11037553B2 (en) | Learning-type interactive device | |
CN109887497B (en) | Modeling method, device and equipment for speech recognition | |
CN109686383B (en) | Voice analysis method, device and storage medium | |
CN108536654B (en) | Method and device for displaying identification text | |
US10432789B2 (en) | Classification of transcripts by sentiment | |
CN111341305B (en) | Audio data labeling method, device and system | |
WO2018108080A1 (en) | Voiceprint search-based information recommendation method and device | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
CN104598644B (en) | Favorite tag mining method and device | |
CN105427858A (en) | Method and system for achieving automatic voice classification | |
US20180047387A1 (en) | System and method for generating accurate speech transcription from natural speech audio signals | |
CN105654943A (en) | Voice wakeup method, apparatus and system thereof | |
CN107077843A (en) | Session control and dialog control method | |
JP2006190006A5 (en) | ||
US9251808B2 (en) | Apparatus and method for clustering speakers, and a non-transitory computer readable medium thereof | |
WO2014187096A1 (en) | Method and system for adding punctuation to voice files | |
JP5824829B2 (en) | Speech recognition apparatus, speech recognition method, and speech recognition program | |
CN112233680A (en) | Speaker role identification method and device, electronic equipment and storage medium | |
CN112818680B (en) | Corpus processing method and device, electronic equipment and computer readable storage medium | |
CN114120425A (en) | Emotion recognition method and device, electronic equipment and storage medium | |
US9805740B2 (en) | Language analysis based on word-selection, and language analysis apparatus | |
JP2019124952A (en) | Information processing device, information processing method, and program | |
JP2004094257A (en) | Method and apparatus for generating question of decision tree for speech processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||