CN111508522A - Statement analysis processing method and system - Google Patents
- Publication number: CN111508522A
- Application number: CN201910094372.8A
- Authority
- CN
- China
- Prior art keywords
- sentence
- chunk
- exercise
- word
- prosodic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B5/04—Electrically-operated educational appliances with audible presentation of the material to be studied
Abstract
The invention discloses a sentence analysis and processing method and system. The method includes: performing prosodic hierarchy analysis on a practice sentence to determine the chunk time boundary of each prosodic chunk in the sentence; setting intonation marks for the practice sentence; setting stress marks for the practice sentence; and taking the practice sentence, annotated with the determined chunk time boundaries, intonation marks, and stress marks, as a standard prosody-level sentence. The method performs prosodic hierarchy analysis on the text of an input sentence, converting the linear word sequence of the whole sentence into a prosodic hierarchy, so that a user can learn how to analyze the prosodic structure of a text and apply that analysis in pronunciation. With this method, the user can master the use of intonation and stress when reading a sentence aloud.
Description
Technical Field
The present application relates to the field of data analysis technologies, and in particular, to a method and a system for analyzing and processing sentences.
Background
Reading aloud is an important method in language learning: it can improve the accuracy and fluency of the learner's pronunciation and the learner's comprehension of sentences and even passages, thereby reinforcing the correct use of prosodic features such as stress and intonation.
In reading aloud, the learner may exhibit the following errors or inaccuracies: mispronunciation or inaccurate pronunciation of words (including vowels, consonants, syllable boundaries, stress, linking, elision, etc.); intra-word and inter-word disfluency (including inappropriate durations and pauses); prosodic problems such as omission or misuse of stress and insufficient pitch control; lack of the intonation changes required by grammar and semantics (e.g., a rising or falling tone at the end of a sentence); and an inability to correctly understand a sentence and control the rhythm of speech output through phrasing (Phrasing).
Currently, traditional schemes support reading-aloud practice in three ways:
Method 1: talking dictionary
A standalone electronic dictionary device, desktop software, or software running on a mobile device (including WeChat applets, web pages, etc.). After the user looks up a word, the talking dictionary provides the traditional definition of the word along with playable audio of its pronunciation (recorded human speech or computer-synthesized speech). The learner learns the pronunciation of the word by playing the audio and may imitate it orally. The talking dictionary may also provide a number of example sentences for the word, which may likewise be accompanied by playable audio.
Method 2: talking book
This can be an independently distributed audio file (mp3, etc.), a companion optical disc for a book, an older cassette recording, or a program on a content platform such as a podcast, Ximalaya FM, or a WeChat public account. Learners usually use audiobooks by listening, and may also imitate on their own.
Method 3: pronunciation evaluation software
This includes software running on desktop systems, software running on mobile devices (mobile applications, WeChat applets, web programs, etc.), and other smart devices running an operating system (smart televisions, smart speakers, etc.). Such software typically provides demonstration audio and compares the learner's spoken speech with the demonstration speech to produce an overall score; it usually also provides scores on individual dimensions, including pronunciation accuracy, completeness, and fluency.
Although these schemes can guide the user in reading aloud, the first and second methods cannot evaluate the user's reading level, so the learner gets no immediate feedback.
The third method can score the learner's reading, but it only provides sentence-level scores and cannot give the learner targeted training on structural segments. Moreover, it only provides recorded demonstration audio and offers no teaching function, which limits the user's mastery of reading skills.
Disclosure of Invention
The invention provides a sentence analysis and processing method and system, which solve the problem in the prior art that whole-sentence evaluation of the user's reading data prevents targeted training.
The specific technical scheme is as follows:
a method of statement analysis processing, the method comprising:
performing prosodic hierarchy analysis on the exercise sentences to determine chunk time boundaries of each prosodic chunk in each sentence, wherein the prosodic chunk comprises at least one word, and the time boundaries represent pause positions of the sentences;
setting intonation marks for the exercise sentence according to the determined chunk time boundaries;
setting stress marks for the exercise sentence according to the determined chunk time boundaries;
and taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence.
Optionally, performing prosody hierarchy analysis on the exercise sentences to determine chunk time boundaries of each prosody chunk in each sentence, includes:
performing prosodic hierarchy analysis on the practice sentences to determine word time boundaries corresponding to all words in the practice sentences;
determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Optionally, determining the chunk time boundary of each prosodic chunk according to the word time boundary of each word includes:
determining a sentence layer in the practice sentence according to the word time boundary of each word;
determining an intonation phrase layer in the sentence layer;
determining a prosodic phrase layer in the intonation phrase layer;
determining the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Optionally, setting intonation marks for the exercise sentence according to the determined chunk time boundary includes:
acquiring data in the exercise sentence and acquiring an intonation labeling set, wherein the data comprises each line of text and the speech corresponding to each line of text, and the labeling set comprises each intonation type;
and setting an intonation mark for each word based on the data in the exercise sentence and the labeling set, according to the determined word time boundaries.
Optionally, setting stress marks for the exercise sentence according to the determined chunk time boundary includes:
acquiring data in the exercise sentence and acquiring a stress labeling set;
and setting a stress label for each word based on the data in the exercise sentence and the acquired stress labeling set, according to the determined word time boundaries.
Optionally, after taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence, the method further includes:
acquiring an exercise sentence of the user based on the standard prosody level sentence;
determining, based on the prosody hierarchy, the prosodic chunks that are erroneous in the exercise sentence;
and outputting prompt information prompting the user to repeatedly practice those prosodic chunks.
Optionally, after outputting the prompt information prompting the user to repeatedly practice a prosodic chunk, the method further includes:
detecting whether the prosodic chunk currently practiced by the user passes the evaluation;
if not, prompting the user to continue practicing the current prosodic chunk;
if yes, switching from the current prosodic chunk to the next erroneous prosodic chunk so that the user can practice it.
A system of statement analysis processing, the system comprising:
the analysis module is used for performing prosodic hierarchy analysis on the exercise sentence, determining the chunk time boundary of each prosodic chunk in each sentence, and setting intonation marks for the exercise sentence according to the determined chunk time boundaries; and for setting stress marks for the exercise sentence according to the determined chunk time boundaries, wherein a prosodic chunk comprises at least one word and the time boundary represents a pause position in the sentence;
and the processing module is used for taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence.
Optionally, the analysis module is specifically configured to perform prosody hierarchy analysis on the exercise sentence, and determine a word time boundary corresponding to each word in the exercise sentence; determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Optionally, the analysis module is specifically configured to determine a sentence layer in the practice sentence according to the word time boundary of each word; determine an intonation phrase layer in the sentence layer; determine a prosodic phrase layer in the intonation phrase layer; and determine the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Optionally, the analysis module is specifically configured to acquire data in the exercise sentence and acquire an intonation labeling set, where the data includes each line of text and the speech corresponding to each line of text, and the labeling set includes each intonation type; and to set an intonation mark for each word based on the data in the exercise sentence and the labeling set, according to the determined word time boundaries.
Optionally, the analysis module is specifically configured to acquire data in the exercise sentence and acquire a stress labeling set; and to set a stress label for each word based on the data in the exercise sentence and the acquired stress labeling set, according to the determined word time boundaries.
Optionally, the processing module is further configured to acquire the user's exercise sentence based on the standard prosody-level sentence; determine, based on the prosody hierarchy, the prosodic chunks that are erroneous in the exercise sentence; and output prompt information prompting the user to repeatedly practice those prosodic chunks.
Optionally, the processing module is further configured to detect whether the prosodic chunk currently practiced by the user passes the evaluation; if not, prompt the user to continue practicing the current prosodic chunk; if yes, switch from the current prosodic chunk to the next erroneous prosodic chunk so that the user can practice it.
With the method provided by the invention, prosodic hierarchy analysis is performed on the text of an input sentence, converting the linear word sequence of the whole sentence into a prosodic hierarchy, so that the user can learn how to analyze the prosodic structure of a text and apply that analysis in pronunciation. In this way, the user can master the use of intonation and stress when reading a sentence aloud.
In addition, the user's sentences can be decomposed and analyzed by prosodic chunk, and the user's errors in each prosodic chunk can be determined, so that the user can do partial exercises targeting each prosodic chunk, or even a single word, improving the focus and efficiency of reading-aloud learning.
Drawings
FIG. 1 is a flowchart of a method for analyzing and processing a statement according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a prosodic hierarchy in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a statement analysis processing system according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments and the specific technical features therein are merely illustrative of the technical solutions of the present invention, not restrictive, and that the embodiments and their specific technical features may be combined with each other where there is no conflict.
First, the terms to which the present invention relates are explained:
the sentence: sequences of words and punctuation, organized according to grammatical rules and semantic requirements, express specific meanings, usually ending with punctuation.
And (3) voice: human vocal organs (vocal cords, vocal tract, tongue, mouth, lips, teeth) naturally convert a specific sentence into sound in the form of a sequence of phonemes under the coordination of the brain.
Rhythm: for the expression needs in human natural language, specific phones/syllables are assigned different prosodic parameters: duration (Duration), Pitch (Pitch), Energy (Energy), and Pause (Pause) to produce a "twitch-down, twitch-down" effect. Humans can perceive whether prosodic parameters match text.
Tone: refers to the trend of the pitch trajectory of the pronunciation of a sentence or a segment of a sentence. In general, statement sentences and special interrogations use down-tones, while general interrogations use up-tones.
Semantic rereading: unlike the repeated reading of symbols (stress) in english words, speakers often make the prosody of some words more prominent in peripheral words according to the semantic and expression requirements of sentences, such as increased pitch, increased energy (volume), increased duration, additional pause, and so on.
The grammar structure is as follows: the process of parsing the sentence described by the natural language text into a syntax tree describing the aforementioned components according to the linguistic criteria, such as main, predicate, object, predicate, shape, complement. The syntax tree is usually represented as a nested structure, e.g., S ═ NP + VP, meaning that a sentence S is composed of a name phrase (as subject) plus a verb phrase (as predicate).
The rhythm structure is as follows: the prosodic structure is a process of reorganizing a text sequence into an 'interconnected block structure' according to the communication requirement during the speaking process of a speaker. The correct and proper prosodic structure can reduce the communication cost of a speaker and a listener. The prosodic structure affects the prosodic features of the text after it is read. This block-like structure also has nested (hierarchical) junctions
However, the structure is much shallower than the grammar structure, and generally has only 2-3 layers. For example: S-IP 1+ IP2, IP 1-PP 1+ PP2 indicate that a sentence S is composed of two intonation phrases IP1 and IP2, where IP1 is composed of a prosodic phrase PP1 and a prosodic phrase PP 2.
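As a concrete illustration of this shallow nesting, the prosodic structure can be represented as a small tree. The sketch below is illustrative only (the class name and dummy tokens are invented, not part of the invention); it models the example S = IP1 + IP2 with IP1 = PP1 + PP2:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Chunk:
    """One node of the 2-3 layer prosodic hierarchy: S, IP, or PP."""
    label: str
    words: List[str] = field(default_factory=list)      # leaf chunks hold words
    children: List["Chunk"] = field(default_factory=list)

    def flat_words(self) -> List[str]:
        """Recover the original linear word sequence from the tree."""
        if not self.children:
            return list(self.words)
        out: List[str] = []
        for child in self.children:
            out.extend(child.flat_words())
        return out

# S = IP1 + IP2, IP1 = PP1 + PP2 (the example from the text, with dummy tokens)
pp1 = Chunk("PP", words=["w0", "w1"])
pp2 = Chunk("PP", words=["w2", "w3"])
ip1 = Chunk("IP", children=[pp1, pp2])
ip2 = Chunk("IP", words=["w4", "w5"])
s = Chunk("S", children=[ip1, ip2])
print(s.flat_words())
```

Flattening the tree recovers the linear word sequence; prosodic hierarchy analysis is the inverse direction, from the linear sequence to the tree.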
Fig. 1 shows a statement analysis processing method according to an embodiment of the present invention, where the method includes:
s1, performing prosody hierarchy analysis on the exercise sentences to determine the chunk time boundary of each prosody chunk in each sentence;
first, the prosodic hierarchy needs to be analyzed, and in the embodiment of the present invention, the prosodic hierarchy can be divided into 3 layers: sentence layer (S), intonation phrase layer (IP), prosodic phrase layer (PP). One S may be composed of one to several IPs, one IP may also be composed of one to several PPs, and the marks between the chunks are chunk time boundaries.
Specifically, for S = [w0, w1, w2, ..., wi, ..., wn], the result of the hierarchy division is: S = [[w0, w1, w2], [[w3, w4], [w5, ..., wn]]], where the sentence S includes two IPs: IP1 = [w0, w1, w2] and IP2 = [w3, ..., wn], and IP2 in turn contains two PPs: PP1 = [w3, w4] and PP2 = [w5, ..., wn].
Before performing prosodic hierarchy analysis on an exercise sentence, the word time boundary of each word in the sentence must first be determined. In the embodiment of the present invention, the word boundary types are: IP_Boundary, PP_Boundary, and None_Boundary.
For example, for the sentence "This is a serious issue and something we will discuss with Moscow", the word time boundaries of each word are shown in Table 1:
TABLE 1
The time boundaries of the individual prosodic chunks in the sentence can then be determined from the word time boundaries. For example, when a learner reads the sentence "This is a serious issue and something we will discuss with Moscow" (12 words, 17 syllables), the learner should not try to complete the pronunciation in one breath, but should analyze and plan the prosodic structure appropriately according to the characteristics of the sentence.
Thus, after prosodic hierarchy analysis, the results are as shown in fig. 2, where the complete exercise sentence is divided into the corresponding hierarchy levels.
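The step from per-word boundary labels to prosodic chunks can be sketched as follows. The boundary placements below are invented for illustration and do not reproduce the patent's Table 1:

```python
def chunks_from_boundaries(words, labels):
    """Split a word sequence into prosodic chunks: a chunk ends after any
    word labeled PP_Boundary or IP_Boundary (None_Boundary means no cut)."""
    chunks, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label in ("PP_Boundary", "IP_Boundary"):
            chunks.append(current)
            current = []
    if current:                      # trailing words without a final boundary
        chunks.append(current)
    return chunks

words = ["This", "is", "a", "serious", "issue", "and",
         "something", "we", "will", "discuss", "with", "Moscow"]
# Illustrative labels: IP boundary after "issue", PP boundary after "discuss".
labels = ["None_Boundary"] * 4 + ["IP_Boundary"] + ["None_Boundary"] * 4 + \
         ["PP_Boundary", "None_Boundary", "IP_Boundary"]
print(chunks_from_boundaries(words, labels))
```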
It should be noted that in the embodiment of the present invention, the prosodic hierarchy analysis may be computed using a machine-learning model such as a conditional random field, a hidden Markov model, or a recurrent neural network.
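As a minimal sketch of what such a sequence model does, the pure-Python Viterbi decoder below finds the most likely boundary-label sequence for a run of coarse part-of-speech observations. The log-score parameters are hand-set toys, not trained values, and stand in for the hidden Markov model mentioned above:

```python
def viterbi(obs, states, start, trans, emit, floor=-9.0):
    """Most likely state sequence under log-score HMM-style parameters.
    Unseen observations receive the floor emission score."""
    V = [{s: start[s] + emit[s].get(obs[0], floor) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + trans[(p, s)])
            V[t][s] = V[t - 1][prev] + trans[(prev, s)] + emit[s].get(obs[t], floor)
            back[t][s] = prev
    best = max(states, key=lambda s: V[-1][s])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy parameters: prosodic boundaries tend to follow nouns.
states = ["None_Boundary", "PP_Boundary"]
start = {"None_Boundary": -0.1, "PP_Boundary": -2.3}
trans = {("None_Boundary", "None_Boundary"): -0.3,
         ("None_Boundary", "PP_Boundary"): -1.4,
         ("PP_Boundary", "None_Boundary"): -0.2,
         ("PP_Boundary", "PP_Boundary"): -2.0}
emit = {"None_Boundary": {"DET": -0.2, "VERB": -0.5, "NOUN": -1.5},
        "PP_Boundary": {"NOUN": -0.2, "PUNCT": -0.3}}
print(viterbi(["DET", "NOUN", "VERB", "DET", "NOUN"], states, start, trans, emit))
```

With these toy scores the decoder places a boundary after each noun, which is the behavior the parameters encode.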
Through prosodic hierarchy analysis, the prosodic levels in the speech can be annotated; these annotations can be used to evaluate the user's speech practice and serve as a scoring basis for subsequent practice.
S2, setting intonation marks for the exercise sentence according to the determined chunk time boundaries;
first, in the embodiment of the present invention, the intonation types, the applicable cases, and the intonation trends are shown in table 2:
TABLE 2
Based on the contents of Table 2, the intonation label set is I = {None, Low, High, Low_Low, Low_High, High_Low, High_High};
the training data set is D = {D0, D1, ..., Di, ..., Dk}, where Di = (Si, Ti), Si = [w0, w1, ..., wi, ..., wn], and Ti = [ti0, ti1, ..., tij, ..., tin], with each tij ∈ I.
Further, in the embodiment of the present invention, intonation is labeled based on an unsupervised clustering algorithm; the specific steps are as follows:
1. specifying standard documents to determine record formats, decision bases, arbitration schemes, and the like;
2. preparing data to be marked, including each line of text and corresponding voice;
3. running the prosodic hierarchy labeling process to determine the prosodic hierarchy boundaries;
4. calculating the word time boundary of each word in the speech using a forced-alignment algorithm;
5. extracting acoustic features from the sentence, generating Ai = [ai0, ai1, ..., aij, ..., ain];
6. performing unsupervised clustering, similar to K-Means, on the set of aij over all Ai:
6-1) for None-Boundary, skip;
6-2) for PP _ Boundary, the clustering target is 2 types;
6-3) for IP _ Boundary, the clustering target is 4 types;
based on the method, model training is established, and the steps of establishing a machine learning model are as follows:
1. processing the large-scale data set according to the labeling method;
2. extracting text features and constructing pairs between text feature representations and intonation types;
3. training a model of the relationship between the text feature representation and the intonation type using a learning algorithm.
The steps for using the model to classify intonation are as follows:
a. initializing the classification computation and loading the learned model;
b. extracting the text feature representation;
c. inputting the text feature representation to the classification algorithm and generating the output target intonation type.
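Steps a-c can be sketched as follows. The nearest-centroid "model" and the pitch-slope feature are simplified stand-ins for the trained classifier and feature extraction described above, and the two labels used are a subset of the intonation label set:

```python
# Hypothetical "loaded model": one centroid per intonation label (step a).
CENTROIDS = {"Low": -2.0, "High": 2.0}

def extract_feature(pitch_track):
    """Step b (simplified): overall pitch slope across the segment."""
    return pitch_track[-1] - pitch_track[0]

def classify_intonation(pitch_track):
    """Step c: map the feature to the label of the nearest centroid."""
    f = extract_feature(pitch_track)
    return min(CENTROIDS, key=lambda label: abs(f - CENTROIDS[label]))

print(classify_intonation([110.0, 108.0, 107.5]))  # falling pitch
print(classify_intonation([100.0, 101.5, 102.5]))  # rising pitch
```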
The intonation output results are shown in table 3:
TABLE 3
S3, setting stress marks for the exercise sentence according to the determined chunk time boundaries;
and S4, taking the exercise sentence with the determined chunk time boundaries, intonation marks, and stress marks as a standard prosody-level sentence.
After the intonation analysis of the exercise sentence is completed, the sentence must also be analyzed for stress; that is, each word segment in the speech is given a stress mark, where the stress types are shown in Table 4:
type of rereading | Is suitable for | Rereading situation |
None | Common to the null word, or weakened real word | Weak reading |
Normal | Common to real words | Is normal |
Emphasized | The real word is required for highlighting the semantic meaning | Rereading |
TABLE 4
Based on the contents of Table 4, the stress label set is E = {None, Normal, Emphasized}. Each word in the training data is labeled with one label from the set E. The training data set is D = {D0, D1, ..., Di, ..., Dk}, where Di = (Si, Ai, Ti) represents one document (sentence) in the training set;
Si = [wi0, wi1, ..., wij, ..., win] is the word (Token) sequence of the document (sentence), of length n+1; Ai = [ai0, ai1, ..., aij, ..., ain] is the sequence of acoustic features corresponding to each word (Token); and Ti = [ti0, ti1, ..., tij, ..., tin] is the label sequence corresponding to each word (Token), with each tij ∈ E.
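The training-data structures Si, Ai, and Ti can be sketched as a simple container. The field names, the example tokens, and the single scalar acoustic feature per token are simplifications for illustration:

```python
from dataclasses import dataclass
from typing import List

E = ("None", "Normal", "Emphasized")   # the stress label set

@dataclass
class Document:
    words: List[str]        # Si: the token sequence
    acoustics: List[float]  # Ai: one acoustic feature per token (simplified)
    labels: List[str]       # Ti: one stress label per token, each from E

d = Document(words=["This", "is", "a", "serious", "issue"],
             acoustics=[0.2, 0.1, 0.05, 0.9, 0.6],
             labels=["Normal", "None", "None", "Emphasized", "Normal"])

# All three sequences must be aligned token-by-token.
assert len(d.words) == len(d.acoustics) == len(d.labels)
assert all(label in E for label in d.labels)
print(d.words[d.labels.index("Emphasized")])
```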
Further, stress labeling is performed using unsupervised clustering; the labeling method is as follows:
1. specifying standard documents to determine record formats, decision bases, arbitration schemes, and the like;
2. preparing data to be marked, including each line of text and corresponding voice;
3. running the prosodic hierarchy labeling process to determine the prosodic hierarchy boundaries;
4. calculating the word time boundary of each word in the speech using a forced-alignment algorithm;
5. extracting acoustic features from the sentence, generating Ai = [ai0, ai1, ..., aij, ..., ain];
6. performing unsupervised clustering, similar to K-Means, on the set of aij over all Ai, with 3 target classes.
On the basis of this labeling method, a machine learning model is built as follows:
1. processing the large-scale data set according to the labeling method;
2. extracting text features and constructing pairs between text feature representations and stress types;
3. training a model of the relationship between the text feature representation and the stress type using a learning algorithm.
The steps for using the model to classify stress are as follows:
a. initializing classification calculation and loading the learning model;
b. extracting text feature representation;
c. the text feature representation is input to the classification algorithm, and the output target stress type is generated.
The stress output results are shown in Table 5:
TABLE 5
After the prosodic hierarchy analysis is completed, the user's exercise sentence is acquired based on the standard prosody-level sentence, erroneous prosodic chunks in the exercise sentence are determined based on the prosody hierarchy, and prompt information is output prompting the user to practice those prosodic chunks. In brief, a sentence contains several prosodic chunks; the system analyzes each prosodic chunk to determine whether the user's exercise sentence contains errors, and if so, prompts the user and indicates the position of the error.
When errors exist, the system enters a repeated-practice stage for the erroneous prosodic chunks and detects whether the chunk the user is currently practicing passes the evaluation (the evaluation is based on the method above). If the evaluation fails, the system prompts the user to continue practicing the current chunk; if it passes, the system switches from the current chunk to the next erroneous chunk for practice.
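The chunk-by-chunk practice loop just described can be sketched as follows; the `evaluate` callback stands in for the evaluation method described above, and the chunk texts are illustrative:

```python
def practice_session(error_chunks, evaluate):
    """Drill each erroneous prosodic chunk until it passes evaluation,
    then move on to the next erroneous chunk."""
    log = []
    for chunk in error_chunks:
        attempts = 0
        while True:
            attempts += 1
            if evaluate(chunk, attempts):   # passed: switch to the next chunk
                log.append((chunk, attempts))
                break
            # failed: the user is prompted to practice this chunk again
    return log

# Stub evaluator: every chunk passes on the second attempt.
result = practice_session(["a serious issue", "with Moscow"],
                          lambda chunk, attempt: attempt >= 2)
print(result)
```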
With the method provided by the invention, prosodic hierarchy analysis is performed on the text of an input sentence, converting the linear word sequence of the whole sentence into a prosodic hierarchy, so that the user can learn how to analyze the prosodic structure of a text and apply that analysis in pronunciation. In this way, the user can master the use of intonation and stress when reading a sentence aloud.
In addition, the user's sentences can be decomposed and analyzed by prosodic chunk, and the user's errors in each prosodic chunk can be determined, so that the user can do partial exercises targeting each prosodic chunk, or even a single word, improving the focus and efficiency of reading-aloud learning.
Corresponding to the method provided by the embodiment of the present invention, an embodiment of the present invention further provides a statement analysis processing system, and as shown in fig. 3, the present invention is a schematic structural diagram of a statement analysis processing system in the embodiment of the present invention, where the system includes:
the analysis module 301 is configured to perform prosodic hierarchy analysis on the exercise sentence, determine the chunk time boundary of each prosodic chunk in each sentence, and set intonation marks for the exercise sentence according to the determined chunk time boundaries; and to set stress marks for the exercise sentence according to the determined chunk time boundaries, wherein a prosodic chunk comprises at least one word and the time boundary represents a pause position in the sentence;
and the processing module 302 is configured to take the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosody-level sentence.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to perform prosody level analysis on the practice sentence, and determine a word time boundary corresponding to each word in the practice sentence; determining the chunk time boundaries for each prosodic chunk based on the word time boundaries for each word.
Further, in the embodiment of the present invention, the analysis module 301 is specifically configured to determine a sentence layer in the practice sentence according to the word time boundary of each word; determine an intonation phrase layer in the sentence layer; determine a prosodic phrase layer in the intonation phrase layer; and determine the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to acquire data in the exercise sentence and acquire an intonation labeling set, where the data includes each line of text and the speech corresponding to each line of text, and the labeling set includes each intonation type; and to set an intonation mark for each word based on the data in the exercise sentence and the labeling set, according to the determined word time boundaries.
Further, in this embodiment of the present invention, the analysis module 301 is specifically configured to acquire data in the exercise sentence and acquire a stress annotation set; and to perform stress labeling on each word based on the data in the exercise sentence and the obtained stress annotation set, according to the determined word time boundaries.
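A minimal sketch of attaching the two kinds of marks to words at their time boundaries follows. The annotation-set format (labels keyed by a word's start time) and the label values are assumptions made for illustration only.

```python
# Sketch: set intonation and stress marks per word of an exercise sentence,
# given its word time boundaries and two annotation sets. Formats are assumed.

def label_words(word_boundaries, intonation_set, stress_set):
    """word_boundaries: list of (word, start_sec, end_sec).
    intonation_set: {start_sec: label}, e.g. "rise" / "fall" (assumed labels).
    stress_set: set of start_sec values of stressed words.
    Returns one dict per word combining the word with its marks."""
    return [
        {
            "word": word,
            "start": start,
            "end": end,
            "intonation": intonation_set.get(start),  # None if unmarked
            "stressed": start in stress_set,
        }
        for word, start, end in word_boundaries
    ]

words = [("really", 0.0, 0.4), ("good", 0.5, 0.9)]
marks = label_words(words, intonation_set={0.5: "fall"}, stress_set={0.0})
print(marks)
```

The marked-up word list, together with the chunk time boundaries, is what the processing module would then package as the standard sentence.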
Further, in this embodiment of the present invention, the processing module 302 is further configured to acquire an exercise sentence of a user based on the standard prosodic-hierarchy sentence; determine, based on the prosodic hierarchy, that an erroneous prosodic chunk exists in the exercise sentence; and output prompt information prompting the user to repeatedly practice the erroneous prosodic chunk.
Further, in this embodiment of the present invention, the processing module 302 is further configured to detect whether the prosodic chunk currently trained by the user passes evaluation; if not, prompt the user to continue training the current prosodic chunk; and if so, switch from the current prosodic chunk to the next erroneous prosodic chunk so that the user practices the next prosodic chunk.
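The chunk-by-chunk practice loop just described can be sketched as below. The pass threshold, attempt cap, and scoring callback stand in for the real prosody evaluator and are assumptions, not part of the disclosure.

```python
# Sketch of the practice loop: the learner repeats each erroneous prosodic
# chunk until it passes evaluation, then switches to the next erroneous chunk.

PASS_THRESHOLD = 0.8  # assumed minimum evaluation score to pass a chunk

def practice(error_chunks, score_fn, max_attempts=10):
    """error_chunks: ids of the prosodic chunks the learner got wrong.
    score_fn(chunk, attempt) -> float score for that attempt (stands in for
    the real evaluator). Returns the number of attempts used per chunk."""
    attempts_used = {}
    for chunk in error_chunks:
        for attempt in range(1, max_attempts + 1):
            if score_fn(chunk, attempt) >= PASS_THRESHOLD:
                break  # passed: switch to the next erroneous chunk
            # otherwise: prompt the user to keep training the current chunk
        attempts_used[chunk] = attempt
    return attempts_used

# Toy scorer: every chunk fails its first attempt and passes the second.
print(practice(["chunk1", "chunk2"], lambda c, a: 0.5 if a == 1 else 0.9))
```

Keeping the loop per chunk, rather than per sentence, mirrors the claim language: evaluation gates progress from one erroneous chunk to the next.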
While the preferred embodiments of the present application have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
Claims (14)
1. A sentence analysis processing method, the method comprising:
performing prosodic hierarchy analysis on an exercise sentence to determine the chunk time boundary of each prosodic chunk in each sentence, wherein a prosodic chunk comprises at least one word and a time boundary represents a pause position in the sentence;
setting intonation marks for the exercise sentence according to the determined chunk time boundaries;
setting stress marks for the exercise sentence according to the determined chunk time boundaries;
and taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosodic-hierarchy sentence.
2. The method of claim 1, wherein performing prosodic hierarchy analysis on the exercise sentence to determine the chunk time boundary of each prosodic chunk in each sentence comprises:
performing prosodic hierarchy analysis on the exercise sentence to determine the word time boundary corresponding to each word in the exercise sentence;
and determining the chunk time boundary of each prosodic chunk based on the word time boundaries of the words.
3. The method of claim 2, wherein determining the chunk time boundary of each prosodic chunk based on the word time boundaries of the words comprises:
determining a sentence layer in the exercise sentence according to the word time boundary of each word;
determining an intonation phrase layer in the sentence layer;
determining a prosodic phrase layer in the intonation phrase layer;
and determining the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
4. The method of claim 2, wherein setting intonation marks for the exercise sentence according to the determined chunk time boundary comprises:
acquiring data in the exercise sentence and acquiring an intonation annotation set, wherein the data comprises each line of text and the speech corresponding to each line of text, and the annotation set comprises each intonation;
and setting an intonation mark for each word based on the data in the exercise sentence and the annotation set, according to the determined word time boundaries.
5. The method of claim 2, wherein setting stress marks for the exercise sentence according to the determined chunk time boundary comprises:
acquiring data in the exercise sentence and acquiring a stress annotation set;
and performing stress labeling on each word based on the data in the exercise sentence and the obtained stress annotation set, according to the determined word time boundaries.
6. The method of claim 1, wherein after taking the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosodic-hierarchy sentence, the method further comprises:
acquiring an exercise sentence of a user based on the standard prosodic-hierarchy sentence;
determining, based on the prosodic hierarchy, that an erroneous prosodic chunk exists in the exercise sentence;
and outputting prompt information prompting the user to repeatedly practice the erroneous prosodic chunk.
7. The method of claim 6, wherein after outputting the prompt information prompting the user to repeatedly practice the prosodic chunk, the method further comprises:
detecting whether the prosodic chunk currently trained by the user passes evaluation;
if not, prompting the user to continue training the current prosodic chunk;
if so, switching from the current prosodic chunk to the next erroneous prosodic chunk so that the user practices the next prosodic chunk.
8. A sentence analysis processing system, the system comprising:
an analysis module, configured to perform prosodic hierarchy analysis on an exercise sentence, determine the chunk time boundary of each prosodic chunk in each sentence, set intonation marks for the exercise sentence according to the determined chunk time boundaries, and set stress marks for the exercise sentence according to the determined chunk time boundaries, wherein a prosodic chunk comprises at least one word and a time boundary represents a pause position in the sentence;
and a processing module, configured to take the exercise sentence with the determined chunk time boundaries, the intonation marks, and the stress marks as a standard prosodic-hierarchy sentence.
9. The system of claim 8, wherein the analysis module is specifically configured to perform prosodic hierarchy analysis on the exercise sentence to determine the word time boundary corresponding to each word in the exercise sentence, and to determine the chunk time boundary of each prosodic chunk based on the word time boundaries of the words.
10. The system of claim 9, wherein the analysis module is specifically configured to determine a sentence layer in the exercise sentence according to the word time boundary of each word; determine an intonation phrase layer in the sentence layer; determine a prosodic phrase layer in the intonation phrase layer; and determine the chunk time boundary of each prosodic chunk according to the sentence layer, the intonation phrase layer, and the prosodic phrase layer.
11. The system of claim 9, wherein the analysis module is specifically configured to acquire data in the exercise sentence and acquire an intonation annotation set, wherein the data comprises each line of text and the speech corresponding to each line of text, and the annotation set comprises each intonation; and to set an intonation mark for each word based on the data in the exercise sentence and the annotation set, according to the determined word time boundaries.
12. The system of claim 9, wherein the analysis module is specifically configured to acquire data in the exercise sentence and acquire a stress annotation set; and to perform stress labeling on each word based on the data in the exercise sentence and the obtained stress annotation set, according to the determined word time boundaries.
13. The system of claim 8, wherein the processing module is further configured to acquire an exercise sentence of a user based on the standard prosodic-hierarchy sentence; determine, based on the prosodic hierarchy, that an erroneous prosodic chunk exists in the exercise sentence; and output prompt information prompting the user to repeatedly practice the erroneous prosodic chunk.
14. The system of claim 13, wherein the processing module is further configured to detect whether the prosodic chunk currently trained by the user passes evaluation; if not, prompt the user to continue training the current prosodic chunk; and if so, switch from the current prosodic chunk to the next erroneous prosodic chunk so that the user practices the next prosodic chunk.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910094372.8A CN111508522A (en) | 2019-01-30 | 2019-01-30 | Statement analysis processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111508522A true CN111508522A (en) | 2020-08-07 |
Family
ID=71868946
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910094372.8A Pending CN111508522A (en) | 2019-01-30 | 2019-01-30 | Statement analysis processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111508522A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112686018A (en) * | 2020-12-23 | 2021-04-20 | 科大讯飞股份有限公司 | Text segmentation method, device, equipment and storage medium |
CN113327615A (en) * | 2021-08-02 | 2021-08-31 | 北京世纪好未来教育科技有限公司 | Voice evaluation method, device, equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1333501A (en) * | 2001-07-20 | 2002-01-30 | 北京捷通华声语音技术有限公司 | Dynamic Chinese speech synthesizing method |
US20030149558A1 (en) * | 2000-04-12 | 2003-08-07 | Martin Holsapfel | Method and device for determination of prosodic markers |
CN101000764A (en) * | 2006-12-18 | 2007-07-18 | 黑龙江大学 | Speech synthetic text processing method based on rhythm structure |
CN104464751A (en) * | 2014-11-21 | 2015-03-25 | 科大讯飞股份有限公司 | Method and device for detecting pronunciation rhythm problem |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101551947A (en) | Computer system for assisting spoken language learning | |
Gao et al. | A study on robust detection of pronunciation erroneous tendency based on deep neural network. | |
Duan et al. | A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners. | |
Cahill et al. | Natural language processing for writing and speaking | |
Tseng | ILAS Chinese spoken language resources | |
CN111508522A (en) | Statement analysis processing method and system | |
CN113452871A (en) | System and method for automatically generating lessons from videos | |
Dai | [Retracted] An Automatic Pronunciation Error Detection and Correction Mechanism in English Teaching Based on an Improved Random Forest Model | |
Delmonte | Exploring speech technologies for language learning | |
Fata | Is my stress right or wrong? Studying the production of stress by non-native speaking teachers of English | |
CN101727764A (en) | Method and device for assisting in correcting pronunciation | |
Xu et al. | Application of multimodal NLP instruction combined with speech recognition in oral english practice | |
CN115440193A (en) | Pronunciation evaluation scoring method based on deep learning | |
Bang et al. | An automatic feedback system for English speaking integrating pronunciation and prosody assessments. | |
Ibejih et al. | EDUSTT: In-domain speech recognition for Nigerian accented educational contents in English | |
Pellegrini et al. | Extension of the lectra corpus: classroom lecture transcriptions in european portuguese | |
CN114783412B (en) | Spanish spoken language pronunciation training correction method and system | |
Holaj et al. | L2 Czech Annotation for Automatic Feedback on Pronunciation | |
Liu et al. | Speech disorders classification in phonetic exams with MFCC and DTW | |
Ling et al. | A research on guangzhou dialect's negative transfer on british english pronunciation by speech analyzer software Praat and ear recognition method | |
CN118821737B (en) | Intelligent training system based on end-to-end multi-mode teaching big model | |
Dassanayake | Production of Mandarin Chinese Tones by Sri Lankan CFL Learners: An Acoustic Analysis | |
CN114373454B (en) | Oral language assessment method, device, electronic device and computer-readable storage medium | |
TWI731493B (en) | Multi-lingual speech recognition and theme-semanteme analysis method and device | |
이담 | Revisiting Characteristics of Korean-accented English Using Large-scale Corpus and Automatic Phonetic Transcription |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 2020-08-07