Method for establishing and analyzing a speech library for motor dysarthria in a big data context
Technical Field
The invention relates to a method for establishing and analyzing a speech library for motor dysarthria in a big data context.
Background
(1) Current research on motor dysarthria:
Motor dysarthria refers to a group of speech disorders caused by disturbed muscular control resulting from damage to the central or peripheral nervous system. It typically manifests as slowed, weakened, imprecise and uncoordinated movement of the speech musculature, and may also affect respiration, resonance, laryngeal phonation, articulation and prosody; clinically it is often referred to simply as dysarthria. Common causes include brain trauma, cerebral palsy, amyotrophic lateral sclerosis, multiple sclerosis, stroke, Parkinson's disease, spinocerebellar ataxia and the like. According to neuroanatomy and speech acoustics, dysarthria can be classified into flaccid, spastic, ataxic, hyperkinetic and mixed types. Among the communication disorders associated with brain damage, dysarthria has an incidence of up to 54%. Clinically, the acoustic characteristics of dysarthric speech can currently be assessed both subjectively and objectively through examination of voice, resonance, prosody and related aspects, which helps provide targeted treatment and clarify the acoustic and pathological mechanisms of dysarthria comprehensively and scientifically.
Few domestic or foreign studies have reported the overall incidence of motor dysarthria. In a study of 125 Parkinson's disease patients, Miller et al found that 69.6% had mean speech intelligibility below that of a normal control group, with 51.2% more than one standard deviation below it, indicating a high incidence of dysarthria in Parkinson's patients. Bogousslavsky et al screened 1,000 patients with first stroke and found speech impairment in up to 46%, of whom 12.4% were diagnosed with dysarthria. Hartelius et al likewise found a 51% prevalence of dysarthria in patients with multiple sclerosis. These findings indicate that dysarthria is common. At present there is no unified assessment method for dysarthria in China and no dedicated assessment standard for motor dysarthria; the dysarthria assessment method (or its modified version) and the dysarthria examination table of the China Rehabilitation Research Center are mostly used, with the degree and type of dysarthria examined, scored, recorded and evaluated by clinicians or rehabilitation physicians.
(2) Current status of domestic speech corpus research:
With the development of information technology and computer science, speech technology has made interaction between machines and human natural language possible, and research on speech synthesis and speech recognition necessarily depends on the construction of a high-quality back-end speech corpus. At present, foreign speech corpora are relatively mature, research on Chinese speech corpora has advanced rapidly over the last decade, and speech corpora have been researched and established in different languages and cultural contexts. However, the construction of speech corpora for dysarthria is still at an exploratory stage.
Domestic research on the evaluation of articulation and phonation focuses mainly on subjective assessment, and only a few researchers distinguish the concepts of articulation and phonation. Huang Zhaying et al proposed a "Chinese articulation ability test word list" containing 50 words; by evaluating a subject's articulation of these 50 test words, a speech rehabilitation therapist can comprehensively assess the articulation of 21 initial consonants and 4 tones, while phoneme contrast ability is evaluated through 18 phoneme contrasts and 37 minimal pairs. Chen Sanding et al evaluated the Mandarin initials, finals and tones of 50 deaf children, revealed developmental patterns in deaf children's Mandarin articulation, and further proposed rehabilitation education principles of early intervention, sequencing, error tolerance and consolidation. Dr. Zhang Jing, of a university in East China, studied the main error tendencies of hearing-impaired children in consonant articulation, analyzed their causes, and proposed a corresponding consonant phoneme treatment framework for hearing-impaired children.
(3) Current status of big data research in the medical field:
A currently popular definition of big data is: data that exceeds the capture, storage, processing and analysis capabilities of typical database software tools. Big data differs from traditional concepts such as ultra-large-scale data and massive data, and has four basic characteristics: volume, variety, timeliness and value. Kayyali B et al studied the impact of big data on the U.S. medical industry and indicated that its value to the industry will become more and more significant over time. At present, big data in the medical field mainly comes from pharmaceutical enterprises, clinical diagnostic data, patient treatment data, health management and social network data. For example, drug development is a data-intensive process; even for small and medium-sized enterprises, drug development data is at the terabyte level or above. Hospital data also grow rapidly every day: a single dual-source CT examination produces about 3,000 images per patient, generating roughly 1.5 GB of image data, a standard pathology examination image is about 5 GB, and together with treatment records, electronic medical records and other patient data, the volume increases quickly every day. Research methods based on the analysis of massive data have prompted reflection on scientific methodology: research need not involve direct contact with the research subject, and new findings can be obtained by directly analyzing and mining massive data, which may give rise to a new mode of scientific research.
Establishing a speech corpus is a complicated undertaking, and issues remain to be addressed as the corpus is refined later, for example making full use of existing inter-word tone sandhi rules and reflecting tone sandhi and neutral tones as faithfully as possible. Deficiencies of the corpus can be compensated in the preprocessing stage by improving the utilization of the existing material. For these reasons, the speech corpus should be an open database that can be added to and modified at any time so as to complete it. Because speech conditions differ, building any specific speech corpus will encounter various difficulties; the approach discussed herein is only one way of establishing a speech corpus, and it is hoped that it can provide data support for speech research and play a role in the further development and improvement of speech corpora.
In addition, while large data volume is undoubtedly a major advantage of network big data analysis, guaranteeing the quality of massive data and cleaning, managing and analyzing it are also major technical difficulties of this subject. Massive network big data is multi-source and heterogeneous, interactive, time-sensitive, bursty and noisy, so it is of great value but also noisy and of low value density. This poses a significant challenge to ensuring data quality in network big data analysis research.
Disclosure of Invention
The invention provides a method for establishing and analyzing a speech library for motor dysarthria in a big data context, addressing the technical problem that, although large data volume is a major advantage of network big data analysis, ensuring the quality of massive data and cleaning, managing and analyzing it remain major technical difficulties.
In order to solve the technical problems, the invention adopts the following scheme:
A method for establishing and analyzing a speech library for motor dysarthria in a big data context comprises the following steps: step 1, designing pronunciation texts;
step 2, recording speech;
step 3, labeling the speech files;
step 4, analyzing acoustic parameters of the speech files;
step 5, establishing a database management system;
and step 6, data analysis using big data technology.
Preferably, the big data analysis in step 6 is based on a speech classification mechanism on a Hadoop platform and specifically includes the following sub-steps:
step 61, collecting speech files from multiple patients, segmenting and labeling the speech segments, constructing a speech database, analyzing the extracted acoustic parameters, and obtaining effective features for speech classification;
step 62, on the Hadoop platform, subdividing the big data speech classification problem with the Map function and solving the sub-problems in parallel and distributed across multiple nodes to obtain the corresponding speech classification results;
and step 63, finally combining the classification results of the sub-problems with the Reduce function, so as to meet the online requirements of big data speech classification.
Preferably, the design of the pronunciation texts in step 1 includes selection of the pronunciation texts, and the selection principles of the pronunciation-text corpus include one or more of the following:
a. the single characters in the corpus should cover all phonological phenomena as far as possible, so as to better reflect the phonological characteristics of different patients' speech;
b. the vocabulary in the corpus is based on the standard Chinese survey word list, so that it can be conveniently compared with Standard Chinese;
c. sentences in the corpus are mainly obtained through dialogue with the patient on several related topics, which better matches the real situations faced by speech recognition; "several related topics" include daily-life topics or medical-history topics, such as questions about the time of first onset and the medical history;
d. sentences in the corpus are complete in content and meaning, so that the prosodic information of a whole sentence can be reflected as far as possible;
e. triphones are selected without classification, which effectively alleviates the problem of sparse training data.
Preferably, the design of the pronunciation texts in step 1 further includes compilation of the pronunciation texts, and the compilation principles include one or more of the following:
a. single-character part: commonly used characters covering the initials, finals and tones listed in the survey word list are taken as the main corpus recorded for the speech library;
b. vocabulary part: based on, but not limited to, a four-thousand-word list, relevant words are recorded according to existing conclusions about the sound system concerned, so that phonetic features, including segmental and suprasegmental characteristics, can be comprehensively reflected; example words may be added to reflect particularly characteristic phonetic phenomena. "Recording relevant words according to conclusions about the sound system" refers to a general vocabulary summarized from the sounds, combination rules, rhythm and intonation used within the same language.
A characteristic phonetic phenomenon refers to sounds that are easily misread in a dialect, such as difficulty distinguishing dental sibilants from retroflex sibilants, or failure to distinguish f from h.
c. sentence material part: the amount of material is determined by the language proficiency of different speakers, and the material is chosen to be representative while covering as wide a range as possible; "representative" here refers to typical sentences that characterize dysarthric speech.
d. natural conversation part: 20-40 minutes of speech material is recorded from the speaker in the form of question answering and free conversation, covering everyday spoken words that differ from Standard Chinese, with the speaker required to speak in dialect.
Preferably, the speech recording in step 2 includes determination of the speaker. The speaker is selected to be a native speaker with clear articulation, a moderate speech rate ("moderate" meaning a rate controlled at about 150 words per minute) and proficient command of the local language, who is willing to cooperate actively with the investigation, whose language environment is relatively stable, and who has a certain level of education; and/or the speech recording further includes speech acquisition with a recording device, in two modes: one is reading aloud with a prompt text, the prompt being written Chinese material that the speaker converts into his or her own native language and reads aloud; the other is natural speech, in which the speaker, given prompts, tells folk stories, describes local folk life, or hums local folk songs.
Preferably, the analysis of acoustic parameters of the speech files in step 4 includes phonetic labeling of the speech library. The basic labeling includes segmentation and alignment of the initial and final of each syllable and labeling of the initials and finals, and comprises two parts. The first part is character labeling, where "Chinese character + pinyin" transcribes the pronunciation; recording the speech information in Chinese characters both serves the recognition system and provides material for linguistic research. Character labeling must mark the basic character information and any paralinguistic phenomena, which can be represented in the basic labels by general paralinguistic symbols. The second part is syllable labeling; Mandarin syllables are labeled with standard Mandarin syllable labels, and the syllable labels carry tone marks: 0 indicates the neutral tone, 1 the yin-ping (first) tone, 2 the yang-ping (second) tone, 3 the shang (third) tone, and 4 the qu (falling, fourth) tone.
Preferably, the analysis of acoustic parameters of the speech files in step 4 further includes extraction of acoustic parameters: first, the recorded speech is segmented and silent sections are removed, so that the analyzed objects are single characters, phrases, sentences and conversations; then the start and end points of the speech signal are determined in the waveform data and the speech is labeled; finally, the corresponding fundamental frequency and formant acoustic parameters are obtained using an autocorrelation algorithm.
Preferably, the establishment of the database management system in step 5 includes selection of a database; an SQL database management system, which is easier to implement, is selected.
A big data speech classification method based on a Hadoop platform comprises the following steps: the establishment method described above is used to build a speech library; on the basis of the speech library, the Map function is used on the Hadoop platform to subdivide the big data speech classification problem, and the sub-problems are solved in parallel and distributed across multiple nodes to obtain the corresponding speech classification results; finally, the Reduce function combines the classification results of the sub-problems so as to meet the online requirements of big data speech classification.
The specific steps are as follows:
(1) the Client submits a speech classification job to the JobTracker of the Hadoop platform, and the JobTracker copies the speech feature data to the distributed file system;
(2) the speech classification job is initialized and placed in a task queue, and the JobTracker assigns tasks to the corresponding nodes (TaskTrackers) according to the processing capacity of the different nodes;
(3) according to its assigned tasks, each TaskTracker uses a support vector machine to fit the relation between the speech features to be classified and the speech feature library, obtaining the class of each speech sample;
(4) the class of each speech sample is taken as a Key/Value pair and stored on the local disk;
(5) intermediate results with the same Key are merged and handed to Reduce for processing to obtain the speech classification results, which are written to the distributed file system;
(6) the JobTracker clears the task state, and the user obtains the speech classification results from the distributed file system.
The method for establishing and analyzing a speech library for motor dysarthria in a big data context has the following beneficial effects:
(1) the invention studies the speech characteristics of patients with motor dysarthria caused by nervous system diseases; relying on the advantages of an open network platform, it enables measurement of large-scale groups and collection of related information, establishes speech libraries covering Mandarin, dialects, healthy speakers and patients, and on this basis builds a word library suitable for diagnosing the condition of patients with motor dysarthria.
(2) As the speech library is continuously expanded, a rich data resource center is ultimately established from information such as Mandarin, dialects, different medical histories and different disease conditions, providing a means of autonomous online diagnosis for patients with nervous system diseases, assisting doctors in clinical diagnosis and treatment, and providing a rich and accurate data platform for quantifying the conditions of nervous system diseases.
(3) On the basis of the speech library, the Map function is used on the Hadoop platform to subdivide the big data speech classification problem, and the sub-problems are solved in parallel and distributed across multiple nodes to obtain the corresponding speech classification results; finally, the Reduce function combines the classification results of the sub-problems so as to meet the online requirements of big data speech classification.
Drawings
FIG. 1: an example of the phonetic annotation of the syllable "bao" in an embodiment of the invention.
FIG. 2: formant data of the syllable "bao" in an embodiment of the invention.
FIG. 3: the basic framework of the Hadoop platform in an embodiment of the invention.
FIG. 4: the big data speech classification flow based on the Hadoop platform in an embodiment of the invention.
Detailed Description
The invention is further illustrated below with reference to fig. 1 to 4:
the voice library is composed of an unvoiced sound library, a voiced sound library, a tone library, a voice synthesis program and a Chinese-pinyin conversion program.
1. Establishing the unvoiced sound library:
The quality of the synthesized speech is improved by exploiting the characteristics of unvoiced sounds. The unvoiced sound library is built by direct sampling: the unvoiced portions preceding the voiced segments of the various pinyin combinations are sampled to form the unvoiced library. Since the unvoiced part of a syllable actually occupies only a small fraction of it, an unvoiced library built from the unvoiced sounds extracted from 400 syllables occupies little storage space.
2. Establishing the voiced sound library:
Voiced sounds are synthesized by a voiced-sound synthesis program that calls the VTFR of each voiced sound. The voiced sound library therefore actually consists of the VTFRs of the various voiced sounds: a VTFR extraction program extracts the VTFR of each voiced sound in turn, and the VTFRs are stored together with the voiced-sound synthesis program in one data packet, forming the voiced sound library. Each extracted VTFR is only a single curve, so the voiced sound library it forms occupies very little space.
The establishment of the speech corpus mainly comprises the following five processes: designing pronunciation texts; recording speech; analyzing parameters of the speech files; establishing a database management system; and data analysis with big data technology.
1. Designing pronunciation texts
1.1 Selection of pronunciation texts:
How to select the corpus is the key to corpus construction. To ensure that the database-building work is orderly and effective and that the corpus is of high quality, selection principles were formulated before the corpus was built. The selection principles of the speech corpus are as follows: first, the single characters in the corpus should cover all phonological phenomena as far as possible, so as to better reflect the phonological characteristics of the dialect speech; second, the vocabulary in the corpus is based on the standard Chinese survey word list, so that it can be conveniently compared with Standard Chinese; third, sentences in the corpus are mainly selected from spoken-language material, which better matches the real situations faced by speech recognition; fourth, sentences in the corpus are complete in content and meaning, so that the prosodic information of a whole sentence can be reflected as far as possible; and fifth, triphones are selected without classification, which effectively alleviates the problem of sparse training data.
1.2 Compilation of pronunciation texts:
The compilation of pronunciation texts is one of the key links in establishing a speech database. When determining the pronunciation material, the principles cover four parts. The first is the single-character part: commonly used characters covering the initials, finals and tones listed in the survey word list are taken as the main corpus recorded for the speech library. The second is the vocabulary part: based on, but not limited to, a four-thousand-word list, relevant words are recorded according to existing conclusions about the sound system concerned, so that phonetic features, including segmental and suprasegmental characteristics, can be comprehensively reflected; example words may be added to reflect particularly characteristic phonetic phenomena. The third is the sentence material part: the amount of material is determined by the language proficiency of different speakers, and the material is chosen to be representative while covering as wide a range as possible. The fourth is the natural conversation part: on topics of daily life, about half an hour of speech material is recorded from the speaker in the form of question answering and free conversation, covering everyday spoken words that differ from Standard Chinese, with the speaker required to speak in dialect.
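Purely as an illustration of how the four parts of the pronunciation text could be organized as corpus records, a minimal sketch is given below; the record layout and field names are assumptions, not part of the claimed method.

```python
# Hypothetical record layout for the four parts of the pronunciation text
# (single characters, vocabulary, sentences, natural conversation).
# Field names and the example items are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CorpusItem:
    item_id: str                 # e.g. "C0001"
    part: str                    # "char" | "word" | "sentence" | "conversation"
    text: str                    # prompt text shown to the speaker
    pinyin: str                  # Mandarin pinyin with tone digits
    target_features: list = field(default_factory=list)

corpus = [
    CorpusItem("C0001", "char", "宝", "bao3",
               ["initial b", "final ao", "tone 3"]),
    CorpusItem("S0001", "sentence", "我去医院了。",
               "wo3 qu4 yi1 yuan4 le0",
               ["sentence prosody", "neutral tone"]),
]
```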
2. Recording speech
2.1 Determination of the speaker:
The speaker is selected to be a native speaker with clear articulation, a moderate speech rate and proficient command of the local language, who is willing to cooperate actively with the investigation; the speaker's language environment should be relatively stable and the speaker should have a certain level of education.
2.2 Speech collection:
The speaking mode during recording directly determines the purpose the speech library can serve. Because of the particular nature of the corpus being collected, two modes are adopted according to the different research purposes: one is reading aloud with a prompt text, the prompt being written Chinese material that the speaker converts into his or her own native language and reads aloud; the other is natural speech, in which the speaker, given prompts, tells folk stories, describes local folk life, hums local folk songs, and so on.
3. Parameter analysis of the speech files:
After the pronunciation texts have been recorded, the speech data must be analyzed to obtain the different features of the speech signal; this is key to the design of the speech corpus and a necessary basis for later speech processing. The invention focuses on speech information, so the basic attributes of the speech waveform must be labeled and the relevant acoustic parameters extracted at the same time.
3.1 Annotation of the speech library:
Speech annotation is carried out hierarchically with the Praat software, with reference to the Chinese segmental labeling system SAMPA-C. The labels in the speech library comprise character labels and syllable labels; the syllable "bao" is taken as an example, as shown in FIG. 1.
The first part is character labeling, where "Chinese character + pinyin" transcribes the pronunciation; recording the speech information in Chinese characters both serves the recognition system and provides material for linguistic research. Character labeling must mark the basic character information and any paralinguistic phenomena, which can be represented in the basic labels by general paralinguistic symbols.
The second part is syllable labeling; Mandarin syllables are labeled with standard Mandarin syllable labels, and the syllable labels carry tone marks: 0 indicates the neutral tone, 1 the yin-ping (first) tone, 2 the yang-ping (second) tone, 3 the shang (third) tone, and 4 the qu (falling, fourth) tone.
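A minimal sketch of the two-tier labeling described above (character tier plus tone-marked syllable tier), using the syllable "bao" of FIG. 1 as an example, is given below; the record structure, the tone digit chosen and the boundary times are assumptions, and in practice such tiers would typically be stored as Praat TextGrid files.

```python
# Hypothetical two-tier annotation record for the syllable "bao" (FIG. 1).
# Tone digits follow the convention in the text: 0 neutral, 1 yin-ping,
# 2 yang-ping, 3 shang (third) tone, 4 qu (falling) tone.
TONE_NAMES = {0: "neutral", 1: "yin-ping", 2: "yang-ping", 3: "shang", 4: "qu"}

annotation = {
    "character_tier": {"hanzi": "宝", "pinyin": "bao3"},   # character + pinyin
    "syllable_tier": [                                     # boundaries in seconds (illustrative values)
        {"label": "b",   "start": 0.000, "end": 0.085},    # initial
        {"label": "ao3", "start": 0.085, "end": 0.410},    # final with tone digit
    ],
    "paralinguistic": [],                                  # e.g. cough or laughter markers
}

tone = int(annotation["character_tier"]["pinyin"][-1])
print(TONE_NAMES[tone])    # -> "shang"
```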
3.2 Extraction of acoustic parameters:
For the recorded speech signals, the acoustic parameters of each segment must be extracted. In practice, the recording is first segmented and silent sections are removed, so that the analyzed objects are single characters; then the start and end points of the speech signal are determined in the waveform data and the range of the final is marked; finally, the corresponding fundamental frequency and formant data are obtained using an autocorrelation algorithm, taking the syllable "bao" as an example, as shown in FIG. 2.
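A minimal sketch, under assumed parameter values, of the endpoint detection and autocorrelation-based fundamental-frequency estimation described above; the frame length, energy threshold and 75-500 Hz pitch range are illustrative assumptions, and formant extraction (commonly done with LPC in tools such as Praat) is omitted here.

```python
# Hypothetical endpoint detection + autocorrelation F0 estimation for one
# labeled segment. Frame length, thresholds and pitch range are assumptions.
import numpy as np
from scipy.io import wavfile

def estimate_f0(path, frame_ms=40, fmin=75, fmax=500):
    sr, x = wavfile.read(path)
    x = x.astype(np.float64) / 32768.0
    n = int(sr * frame_ms / 1000)

    # crude endpoint detection: keep frames above 5% of the peak frame energy
    frames = [x[i:i + n] for i in range(0, len(x) - n, n // 2)]
    energy = np.array([np.mean(f ** 2) for f in frames])
    active = [f for f, e in zip(frames, energy) if e > 0.05 * energy.max()]

    f0 = []
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    for f in active:
        f = f - f.mean()
        ac = np.correlate(f, f, mode="full")[len(f) - 1:]    # non-negative lags
        if ac[0] <= 0:
            continue
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))  # strongest period in range
        f0.append(sr / lag)
    return float(np.median(f0)) if f0 else None              # median F0 (Hz) of voiced frames
```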
4. Establishing a database management system:
4.1 Database selection
Regarding the choice of database: the speech database must store a large amount of speech waveform data, which is characterized by large volume and variable length, while the requirements on transaction processing and recovery, security, network support and the like are relatively low. An SQL database management system, which is easier to implement, can therefore be chosen.
4.2 Creation of the database management system
The database management system of the speech corpus needs to store four kinds of material: first, speaker attribute data, such as the speaker's age, gender, education, command of Standard Chinese and native-language use; second, pronunciation text material, recording the texts read by the speaker together with the corresponding dialect pronunciations and the Mandarin International Phonetic Alphabet transcriptions; third, the actual speech data, mainly the raw parameters of the recorded speech waveforms; and fourth, acoustic analysis parameter data, i.e. the acoustic parameters extracted from the processed speech waveforms.
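As one possible illustration of the four storage categories, a minimal SQLite sketch is given below; any SQL database management system could be substituted, and the table and column names are assumptions rather than part of the claimed design.

```python
# Hypothetical SQLite schema for the four kinds of material described above:
# speaker attributes, pronunciation texts, raw speech data, and extracted
# acoustic parameters. Table and column names are illustrative only.
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS speaker (
    speaker_id   INTEGER PRIMARY KEY,
    age          INTEGER,
    gender       TEXT,
    education    TEXT,
    mandarin     TEXT,      -- command of Standard Chinese
    native_lang  TEXT       -- native language / dialect use
);
CREATE TABLE IF NOT EXISTS prompt_text (
    text_id      INTEGER PRIMARY KEY,
    content      TEXT,      -- prompt shown to the speaker
    dialect_ipa  TEXT,      -- dialect pronunciation (IPA)
    mandarin_ipa TEXT       -- Mandarin pronunciation (IPA)
);
CREATE TABLE IF NOT EXISTS recording (
    rec_id       INTEGER PRIMARY KEY,
    speaker_id   INTEGER REFERENCES speaker(speaker_id),
    text_id      INTEGER REFERENCES prompt_text(text_id),
    wav_path     TEXT,      -- location of the raw waveform
    sample_rate  INTEGER
);
CREATE TABLE IF NOT EXISTS acoustic_param (
    rec_id       INTEGER REFERENCES recording(rec_id),
    f0_hz        REAL,      -- fundamental frequency
    formant1_hz  REAL,
    formant2_hz  REAL
);
"""

conn = sqlite3.connect("dysarthria_corpus.db")
conn.executescript(schema)
conn.close()
```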
5. Data analysis with big data technology
Big data refers to data sets so large that they greatly exceed the acquisition, storage, management and analysis capabilities of traditional database software tools, and it has four characteristics: large scale, rapid data flow, diverse data types and low value density. The strategic significance of big data technology lies not in possessing huge amounts of data but in the specialized processing of meaningful data. In other words, if big data is compared to an industry, the key to profitability is improving the "processing capability" of the data and achieving "value added" through that processing. In the construction of the word library, the important value of big data technology is that targeted analysis and study of the data make it possible to evaluate the quality of the speech elements in the library, so that the library becomes more complete.
Sharing the word library through a network platform facilitates testing of different populations and obtaining more data samples, thereby enriching the speech library. In the future, more targeted word libraries for patients with motor dysarthria can be established for different regions and different dialects, providing richer and more reliable data samples for subsequent automatic recognition of disease classification and grading.
As shown in FIG. 3, a speech classification mechanism based on the Hadoop platform is proposed. It includes collecting a large number of speech files, constructing a speech database, and extracting effective features for speech classification; then, on the Hadoop platform, the big data speech classification problem is subdivided with the Map function and the sub-problems are solved in parallel and distributed across multiple nodes to obtain the corresponding speech classification results; finally, the Reduce function combines the classification results of the sub-problems so as to meet the online requirements of big data speech classification.
As shown in FIG. 4, the big data speech classification flow based on the Hadoop platform comprises the following specific steps:
(1) the Client submits a speech classification job to the JobTracker of the Hadoop platform, and the JobTracker copies the speech feature data to the distributed file system;
(2) the speech classification job is initialized and placed in a task queue, and the JobTracker assigns tasks to the corresponding nodes (TaskTrackers) according to the processing capacity of the different nodes;
(3) according to its assigned tasks, each TaskTracker uses a support vector machine to fit the relation between the speech features to be classified and the speech feature library, obtaining the class of each speech sample;
(4) the class of each speech sample is taken as a Key/Value pair and stored on the local disk;
(5) intermediate results with the same Key are merged and handed to Reduce for processing to obtain the speech classification results, which are written to the distributed file system;
(6) the JobTracker clears the task state, and the user obtains the speech classification results from the distributed file system.
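The flow above could be prototyped, for example, with Hadoop Streaming, where the Map and Reduce steps are plain scripts reading from standard input and the script is submitted with the standard hadoop-streaming jar. The sketch below is only such an illustration: it assumes a pre-trained scikit-learn SVM saved as svm_model.pkl, an input format of "utterance_id<TAB>comma-separated features", and a simple "class<TAB>count" key/value convention, none of which are prescribed by the method.

```python
# Hypothetical Hadoop Streaming mapper/reducer for the classification flow.
# Assumptions: each input line is "utterance_id<TAB>comma-separated features";
# svm_model.pkl is a scikit-learn SVC trained offline on the feature library
# and shipped to each node (e.g. via the -files option).
import sys
import pickle
import numpy as np

def mapper():
    with open("svm_model.pkl", "rb") as fh:
        svm = pickle.load(fh)
    for line in sys.stdin:
        utt_id, feats = line.rstrip("\n").split("\t")
        x = np.array([float(v) for v in feats.split(",")]).reshape(1, -1)
        label = svm.predict(x)[0]          # step (3): SVM assigns a class
        print(f"{label}\t1")               # step (4): emit class as Key, count as Value

def reducer():
    counts = {}
    for line in sys.stdin:                 # step (5): merge identical keys
        label, cnt = line.rstrip("\n").split("\t")
        counts[label] = counts.get(label, 0) + int(cnt)
    for label, cnt in counts.items():
        print(f"{label}\t{cnt}")           # written back to the distributed file system

if __name__ == "__main__":
    (mapper if sys.argv[1:] == ["map"] else reducer)()
```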
The invention has been described above with reference to the accompanying drawings. It is obvious that implementation of the invention is not limited to the manner described above; various modifications of the inventive method concept and technical solution, or direct application of the inventive concept and solution to other situations without modification, all fall within the protection scope of the invention.