CN109346058B - A system for expanding speech acoustic features - Google Patents
- Publication number
- CN109346058B (application CN201811443497.9A)
- Authority
- CN
- China
- Prior art keywords
- voice
- speech
- sound
- video
- submodule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Electrically Operated Instructional Devices (AREA)
Abstract
The application belongs to the technical field of sound processing and particularly relates to a speech acoustic feature expansion system. In language learning, the acoustic features of speech must be expanded to produce corpus suited to brain perception, which is then used to stimulate the learner's brain. The application provides a speech acoustic feature expansion system comprising a speech acquisition unit, a speech processing unit, and a video editing unit, wherein the speech acquisition unit is connected to the speech processing unit and the speech processing unit is connected to the video editing unit. The speech acquisition unit acquires natural speech; the speech processing unit expands the spectral features of the natural speech to different degrees to produce corpus; and the video editing unit edits speech video together with the processed speech to synthesize video clips. The system can produce corpus better suited to brain perception, helping learners form speech categories in the brain closer to those of native speakers.
Description
Technical Field
The application belongs to the technical field of sound processing and particularly relates to a speech acoustic feature expansion system.
Background
With the rapid development of related fields such as bioengineering, computer science, statistical data processing, and brain imaging, brain science has combined the strengths of these disciplines to explore anew how brain development interacts with the language learning environment. Studies have shown that after about 12 months of age, infants gradually lose sensitivity to non-native speech sounds, creating an obstacle to later foreign-language learning. A person habitually learns a new language through the lens of existing speech perception, so foreign sounds similar to native-language pronunciation are acquired quickly, while sounds absent from the native language are acquired with difficulty. At the same time, when learning a sound similar to one in the native language, the learner is more easily influenced by the native language and so develops an accent. For example, for the same English utterance, an American's brain and a Chinese speaker's brain may perceive it differently.
Because the learner is insensitive to non-native speech sounds, the learner cannot fully receive the language information auditorily in the first place, and so finds it difficult to pronounce the sounds correctly. Moreover, each time a learner acquires a phoneme, a speech category for that phoneme must be established in the brain. This speech category is not a single point but a set. Because the language environment a foreign-language learner is exposed to cannot compare with that of a native speaker, the speech categories established in their brains fall far short of native ones.
In the language learning process, the acoustic features of natural speech are therefore expanded to produce corpus suited to brain perception. This stimulates the learner's nervous system, which has lost sensitivity to non-native speech, to reopen and receive the speech information comprehensively, helping the learner form speech categories in the brain closer to those of native speakers.
Disclosure of Invention
1. Technical problem to be solved
Based on the need, in the language learning process, to expand the acoustic features of natural speech and produce corpus suited to brain perception, so that the learner's nervous system, which has lost sensitivity to non-native speech, is stimulated to reopen and receive speech information comprehensively, thereby helping the learner form speech categories in the brain closer to those of native speakers.
2. Technical solution
In order to achieve the above object, the present application provides a speech acoustic feature expansion system comprising a voice acquisition unit; the voice acquisition unit is connected with a voice processing unit, and the voice processing unit is connected with a video editing unit;
the voice acquisition unit is used for acquiring natural voice;
the voice processing unit is used for expanding the frequency spectrum characteristics in the natural voice to different degrees so as to produce corpus;
the video editing unit is used for editing the voice video and the processed voice to synthesize different video clips.
Optionally, the voice processing unit comprises a MATLAB-based sound processing module.
Optionally, the MATLAB-based sound processing module includes a formant frequency difference expansion sub-module, a pitch synchronization overlap sub-module, a frequency separation sub-module, a bandwidth separation sub-module, and a gap separation sub-module.
Optionally, the MATLAB-based sound processing module includes a sound analysis sub-module and a sound synthesis sub-module.
Optionally, the video editing unit includes a format processing module and a frame rate processing module.
Optionally, the speech processing unit is configured to expand the spectral features in speech to 3 different degrees, namely 300%, 208%, and 144%, so as to produce corpus.
3. Advantageous effects
Compared with the prior art, the speech acoustic feature expansion system provided by the application has the following beneficial effects:
In the speech acoustic feature expansion system provided by the application, a voice acquisition unit, a voice processing unit, and a video editing unit are connected in sequence, and the spectral features of natural speech are expanded to produce video. The system simulates the acoustic features of the speech an infant is exposed to while learning language, and produces corpus suited to brain perception to stimulate the learner's brain, so that a brain whose sensitivity to foreign speech has declined can clearly perceive the physical acoustic features of the speech, establish speech categories similar to native ones, and thereby improve pronunciation accuracy.
Drawings
FIG. 1 is a schematic diagram of a speech acoustic feature augmentation system of the present application;
In the figure: 1-voice acquisition unit; 2-voice processing unit; 3-video editing unit; 4-MATLAB-based sound processing module; 5-formant frequency difference expansion sub-module; 6-pitch-synchronous overlap sub-module; 7-frequency separation sub-module; 8-bandwidth separation sub-module; 9-gap separation sub-module; 10-sound analysis sub-module; 11-sound synthesis sub-module; 12-format processing module; 13-frame rate processing module.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and according to these detailed descriptions, those skilled in the art can clearly understand the present application and can practice the present application. Features from various embodiments may be combined to obtain new implementations, or substituted for certain features from certain embodiments to obtain further preferred implementations, without departing from the principles of the application.
In infant-directed speech, speech units are exaggerated through the vibration frequency of the vocal cords and the resonance frequencies of the oral, laryngeal, and nasal cavities, and the gaps between the formants characteristic of vowels are artificially widened. This exaggeration not only lets the infant discern speech units easily, but also lets it perceive the key speech elements that distinguish word meanings in the native language. A mother's speech to her child shows great flexibility and variability, and this variation helps the infant establish an effective acoustic pattern for speech classification, that is, a native speech category for each phoneme in the brain. Brain science has found that the process by which infants learn native speech has the following characteristics: 1) infants have the opportunity to hear many different people speaking; 2) they have the opportunity to see the mouth shapes of different people pronouncing; 3) the mother's speech to the infant is exaggerated through the vibration frequency of the vocal cords and the resonance frequencies of the oral, laryngeal, and nasal cavities. These three elements are very useful in helping infants distinguish speech differences and build comprehensive native-language speech categories.
Corpus, i.e., language material, is the content of linguistic study and the basic unit from which a corpus collection is built.
Motherese (infant-directed speech) is the register adults, especially mothers, use when speaking to infants. Its content and form (words, intonation, speed, etc.) are adapted to children's language and cognitive abilities, taking the infant's comprehension into account. Studies have shown that motherese has physical acoustic features that are expanded relative to normal speech.
Referring to fig. 1, the application provides a voice acoustic feature expanding system, which comprises a voice acquisition unit 1, wherein the voice acquisition unit 1 is connected with a voice processing unit 2, and the voice processing unit 2 is connected with a video editing unit 3;
the voice acquisition unit 1 is used for acquiring natural voice;
the voice processing unit 2 is used for expanding the frequency spectrum characteristics in the natural voice to different degrees and manufacturing corpus;
the video editing unit 3 is configured to edit the voice video and the processed voice to synthesize different video clips.
Optionally, the speech processing unit 2 comprises a MATLAB-based sound processing module 4.
Optionally, the MATLAB-based sound processing module 4 includes a formant frequency difference expansion sub-module 5, a pitch-synchronous overlap sub-module 6, a frequency separation sub-module 7, a bandwidth separation sub-module 8, and a gap separation sub-module 9.
Optionally, the MATLAB-based sound processing module 4 includes a sound analysis sub-module 10 and a sound synthesis sub-module 11. The sound analysis sub-module 10 analyzes the acquired sound, and the sound synthesis sub-module 11 then synthesizes a new sound.
Optionally, the video editing unit 3 includes a format processing module 12 and a frame rate processing module 13.
Optionally, the speech processing unit 2 is configured to expand the spectral features in speech to 3 different degrees, namely 300%, 208%, and 144%, so as to produce corpus.
Examples
Amplifying the target speech sounds is important for distinguishing their acoustic elements. For each pair of sounds to be trained, the physical parameters of the specific natural-sound processing must be determined according to the factors that distinguish the acoustic features of the two sounds.
A natural sound recording is obtained by the voice acquisition unit 1 and passed to the voice processing unit 2, where the MATLAB sound processing module 4 expands the spectral features in the speech to 3 different degrees, namely 300%, 208%, and 144%; together with the original speech, this yields a four-level training corpus. For example, for the English /r/-/l/ pair, the 3 parameters are the F3 separation frequency, the F3 bandwidth, and the F3 transition time. During synthesis, the formant frequency difference expansion sub-module 5 widens the formant frequency difference of /r/-/l/ and narrows the F3 bandwidth. The pitch-synchronous overlap sub-module 6 then uses a time-warping technique to additionally expand the temporal characteristics of /r/-/l/. As another example, for an English tense-lax vowel pair such as /i/-/ɪ/, the frequency separation sub-module 7, the bandwidth separation sub-module 8, and the gap separation sub-module 9 separate the frequencies and bandwidths of F1 and F2 and adjust the gap between them.
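The description does not disclose how the percentage levels are applied, and the actual processing is performed in MATLAB. One plausible reading treats each level as a scale factor on the separation between two formants; the hypothetical Python sketch below widens a formant gap about its midpoint under that assumption. The F2/F3 values are invented for illustration and do not come from the patent.

```python
def expand_formant_gap(f_low, f_high, factor):
    """Widen the gap between two formant frequencies (Hz) about their midpoint.

    factor=3.00 triples the original separation (the 300% level named in
    the description); factor=1.0 leaves the formants unchanged.
    """
    midpoint = (f_low + f_high) / 2.0
    half_gap = (f_high - f_low) / 2.0
    return midpoint - half_gap * factor, midpoint + half_gap * factor

# The three expansion levels named in the description.
levels = [3.00, 2.08, 1.44]

# Hypothetical F2/F3 values (Hz) for an English /r/, where F3 sits close to F2.
f2, f3 = 1200.0, 1600.0
for level in levels:
    new_f2, new_f3 = expand_formant_gap(f2, f3, level)
    print(f"{level:.0%}: F2 = {new_f2:.0f} Hz, F3 = {new_f3:.0f} Hz")
```

Scaling about the midpoint keeps the spectral center of the pair fixed while exaggerating only the cue that distinguishes the two sounds, which matches the stated goal of enlarging the acoustic contrast rather than shifting the whole spectrum.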
The "LPC Analysis and Synthesis of Speech" sub-module in the MATLAB sound processing module 4 is used for this processing. LPC stands for linear predictive coding. The module includes the sound analysis sub-module 10 and the sound synthesis sub-module 11, with which new sounds can be analyzed and synthesized. (See the DSP System Toolbox™ functionality available at the MATLAB command line.)
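The analysis-synthesis loop that sub-modules 10 and 11 rely on can be sketched independently of MATLAB. The following is a hypothetical Python re-implementation of the textbook autocorrelation method (Levinson-Durbin recursion), not the patent's actual code; it shows why LPC is a convenient point for modifying spectral parameters: inverse-filtering a frame to a residual and re-filtering through the all-pole model reconstructs the signal exactly, so the model coefficients can be edited in between.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns coefficients a with a[0] = 1."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= (1.0 - k * k)
    return a

def lpc_analyze(x, order):
    """Autocorrelation-method LPC: model coefficients and prediction residual."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = levinson_durbin(r, order)
    # Analysis (inverse) filter A(z): residual[n] = sum_k a[k] * x[n-k]
    residual = np.convolve(x, a)[:len(x)]
    return a, residual

def lpc_synthesize(residual, a):
    """All-pole synthesis 1/A(z): out[n] = res[n] - sum_{k>=1} a[k] * out[n-k]."""
    out = np.zeros(len(residual))
    p = len(a) - 1
    for n in range(len(out)):
        lo = max(0, n - p)
        out[n] = residual[n] - np.dot(a[1:n - lo + 1], out[lo:n][::-1])
    return out

rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 0.05 * np.arange(240)) + 0.01 * rng.standard_normal(240)
a, residual = lpc_analyze(frame, order=10)
rebuilt = lpc_synthesize(residual, a)
print(np.allclose(rebuilt, frame))  # analysis and synthesis are exact inverses
```

Because the two filters invert each other sample for sample, any edit to `a` (for instance, moving the pole frequencies that correspond to formants) carries through cleanly to the resynthesized sound.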
After the sound processing is finished, Final Cut Pro 7, comprising the format processing module 12 and the frame rate processing module 13, is used; different formats and frame rates can be mixed and matched on the timeline. The videos are processed by synchronizing slow-motion videos of different versions with the time-stretched audio tracks, and the processed videos and sounds are then edited together into different video clips, which serve as corpus for the further production of training software.
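The bookkeeping behind keeping a time-stretched audio track in sync with slow-motion video can be illustrated with a toy calculation. Final Cut Pro's internal processing is not disclosed; the hypothetical Python sketch below uses a naive linear-interpolation stretch, which, unlike the pitch-synchronous processing described earlier, also lowers the pitch, so it illustrates only the duration arithmetic.

```python
import math

def stretch_track(samples, factor):
    """Naive linear-interpolation time stretch of a mono track.

    factor=2.0 doubles the duration, matching 2x slow-motion video.
    """
    n_out = int(round(len(samples) * factor))
    out = []
    for i in range(n_out):
        pos = i * (len(samples) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out

# A 30 fps clip played back at 15 fps runs twice as long, so the audio
# track must be stretched by the same factor to stay synchronized.
factor = 30 / 15
audio = [math.sin(2 * math.pi * 220 * n / 48000) for n in range(4800)]
stretched = stretch_track(audio, factor)
print(len(audio), len(stretched))  # 4800 9600
```

In practice a pitch-preserving stretch (e.g. an overlap-add technique like the one sub-module 6 applies to speech) would be used so the slowed audio does not sound lower in pitch.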
In the speech acoustic feature expansion system provided by the application, a voice acquisition unit, a voice processing unit, and a video editing unit are connected in sequence, and the spectral features of speech are expanded to produce video. The system simulates the acoustic features of the speech an infant is exposed to while learning language, and produces corpus suited to brain perception to stimulate the learner's brain, so that a brain whose sensitivity to foreign speech has declined can auditorily perceive the physical acoustic features of the speech clearly, establish speech categories similar to native ones, and thus improve pronunciation accuracy.
Although the application has been described with reference to specific embodiments, those skilled in the art will appreciate that many modifications are possible in the construction and detail of the application disclosed within the spirit and scope thereof. The scope of the application is to be determined by the appended claims, and it is intended that the claims cover all modifications that are within the literal meaning or range of equivalents of the technical features of the claims.
Claims (1)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201811443497.9A CN109346058B (en) | 2018-11-29 | 2018-11-29 | A system for expanding speech acoustic features |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN109346058A CN109346058A (en) | 2019-02-15 |
| CN109346058B (en) | 2024-06-28 |
Family
ID=65319541
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201811443497.9A Active CN109346058B (en) | 2018-11-29 | 2018-11-29 | A system for expanding speech acoustic features |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN109346058B (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1669074A (en) * | 2002-10-31 | 2005-09-14 | 富士通株式会社 | voice enhancement device |
| CN109378015A (en) * | 2018-11-29 | 2019-02-22 | 西安交通大学 | A kind of language learning system and method |
| CN209388698U (en) * | 2018-11-29 | 2019-09-13 | 西安交通大学 | A kind of speech acoustics feature expansion system |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JPH0493980A (en) * | 1990-08-06 | 1992-03-26 | Takeshige Fujitani | Language learning system |
| GB9714001D0 (en) * | 1997-07-02 | 1997-09-10 | Simoco Europ Limited | Method and apparatus for speech enhancement in a speech communication system |
| US20020128839A1 (en) * | 2001-01-12 | 2002-09-12 | Ulf Lindgren | Speech bandwidth extension |
| KR100427243B1 (en) * | 2002-06-10 | 2004-04-14 | 휴먼씽크(주) | Method and apparatus for analysing a pitch, method and system for discriminating a corporal punishment, and computer readable medium storing a program thereof |
| CN1564245A (en) * | 2004-04-20 | 2005-01-12 | 上海上悦通讯技术有限公司 | Stunt method and device for baby's crying |
| US7676362B2 (en) * | 2004-12-31 | 2010-03-09 | Motorola, Inc. | Method and apparatus for enhancing loudness of a speech signal |
| US20070168187A1 (en) * | 2006-01-13 | 2007-07-19 | Samuel Fletcher | Real time voice analysis and method for providing speech therapy |
| US9117455B2 (en) * | 2011-07-29 | 2015-08-25 | Dts Llc | Adaptive voice intelligibility processor |
| CN105023574B (en) * | 2014-04-30 | 2018-06-15 | 科大讯飞股份有限公司 | A kind of method and system for realizing synthesis speech enhan-cement |
| CN105982641A (en) * | 2015-01-30 | 2016-10-05 | 上海泰亿格康复医疗科技股份有限公司 | Speech and language hypoacousie multi-parameter diagnosis and rehabilitation apparatus and cloud rehabilitation system |
| CN106710604A (en) * | 2016-12-07 | 2017-05-24 | 天津大学 | Formant enhancement apparatus and method for improving speech intelligibility |
- 2018-11-29: application CN201811443497.9A filed in China; granted as patent CN109346058B (active)
Also Published As
| Publication number | Publication date |
|---|---|
| CN109346058A (en) | 2019-02-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||