
CN109326281B - Rhythm labeling method, device and equipment - Google Patents

Rhythm labeling method, device and equipment

Info

Publication number
CN109326281B
CN109326281B (application CN201810988973.9A)
Authority
CN
China
Prior art keywords
prosody
text
voice data
information
marked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810988973.9A
Other languages
Chinese (zh)
Other versions
CN109326281A (en)
Inventor
孟君
曹琼
廖晓玲
郝玉峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haitian Rui Sheng Polytron Technologies Inc
Original Assignee
Beijing Haitian Rui Sheng Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation: first worldwide family litigation filed. https://patents.darts-ip.com/?family=65263729&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=CN109326281(B) ("Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.)
Application filed by Beijing Haitian Rui Sheng Polytron Technologies Inc
Priority to CN201810988973.9A
Publication of CN109326281A
Application granted
Publication of CN109326281B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/10 - Text processing
    • G06F 40/103 - Formatting, i.e. changing of presentation of documents
    • G06F 40/117 - Tagging; Marking up; Designating a block; Setting of attributes
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/04 - Segmentation; Word boundary detection
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/1807 - Speech classification or search using natural language modelling using prosody or stress
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 2015/025 - Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides a prosody labeling method, device, and equipment. The prosody labeling method comprises the following steps: acquiring voice data of a text to be labeled; determining prosody information in the voice data according to the voice data, wherein the prosody information is used to indicate pause durations in the voice data; and labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data. The prosody labeling method provided by the invention improves the efficiency and accuracy of prosody labeling.

Description

Rhythm labeling method, device and equipment
Technical Field
The invention relates to the technical field of prosody annotation, and in particular to a prosody annotation method, device, and equipment.
Background
Prosody, also known as suprasegmental features, typically covers rhythm, stress, and intonation. Prosodic information is an essential means by which people express thought and emotion: the same text can convey completely different meanings when spoken with different tones and rhythms. Prosodic information therefore plays a very important role in speech synthesis systems.
Currently, prosody labeling in speech synthesis systems generally predicts prosody from text information alone. Taking Mandarin Chinese as an example, prosody is predicted from the text, and the prediction result is usually determined from information such as initials, finals, words, phrases, and paragraphs. Professional annotators then complete the prosody annotation based on the prediction result.
However, spoken language is richly expressive. When prosody is labeled manually from text information alone, prosodic information cannot be correctly predicted for the parts of the text that call for a pronounced pause or clear silence, so annotators must revise labels in many places. This makes prosody labeling both inefficient and inaccurate.
Disclosure of Invention
The invention provides a prosody labeling method, device, and equipment that improve the efficiency and accuracy of prosody labeling.
In a first aspect, the present invention provides a prosody labeling method, including:
acquiring voice data of a text to be labeled;
determining prosody information in the voice data according to the voice data, wherein the prosody information is used to indicate pause durations in the voice data;
and labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data.
Optionally, in a possible implementation, the method further includes:
acquiring prosody information in text data of the text to be labeled.
Optionally, in a possible implementation, the labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data includes:
labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data.
Optionally, in a possible implementation, the labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data includes:
labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data;
and updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
Optionally, in a possible implementation, the updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data includes:
if the prosody information in the text data indicates that no prosodic symbol should appear at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting that at least one labeled prosodic symbol.
Optionally, in a possible implementation, the determining prosody information in the voice data according to the voice data includes:
acquiring at least one silent segment in the voice data according to the voice data;
and, for each silent segment, determining the prosody information corresponding to that silent segment in the voice data.
Optionally, in a possible implementation, the acquiring at least one silent segment in the voice data according to the voice data includes:
performing phoneme segmentation on the text data of the text to be labeled to obtain a speech annotation sequence;
and performing phoneme segmentation on the voice data according to the speech annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, where the preset acoustic model represents the speech features corresponding to different phonemes.
In a second aspect, the present invention provides a prosody labeling apparatus, including:
a first acquisition module, configured to acquire voice data of a text to be labeled;
a prosody information determining module, configured to determine prosody information in the voice data according to the voice data, wherein the prosody information is used to indicate pause durations in the voice data;
and a labeling module, configured to label the text to be labeled with prosodic symbols according to the prosody information in the voice data.
Optionally, in a possible implementation, the apparatus further includes a second acquisition module;
the second acquisition module is configured to acquire prosody information in the text data of the text to be labeled;
the labeling module is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data.
Optionally, in a possible implementation, the labeling module is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosody information in the voice data;
and update the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
In a third aspect, the present invention provides a prosody labeling device comprising a processor and a memory. The memory is configured to store instructions, and the processor is configured to execute the instructions stored in the memory, so that the prosody labeling device executes the prosody labeling method provided in any embodiment of the first aspect of the present invention.
In a fourth aspect, the present invention provides a storage medium, comprising a readable storage medium and a computer program, the computer program being used to implement the prosody labeling method provided in any embodiment of the first aspect of the present invention.
The invention provides a prosody labeling method, device, and equipment in which prosodic symbols are applied to a text to be labeled according to the voice data of that text. This takes the richness of spoken language into account, in particular pronounced pauses and clear silent segments in the speech, and thereby improves the efficiency and accuracy of prosody labeling while reducing its cost.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below depict some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a prosody labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a prosody labeling apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a prosody labeling device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a prosody labeling method according to an embodiment of the present invention. The execution body of the prosody labeling method provided by this embodiment may be a prosody labeling apparatus or a prosody labeling device. As shown in fig. 1, the prosody labeling method provided in this embodiment may include:
S101, acquire voice data of the text to be labeled.
S102, determine prosody information in the voice data according to the voice data.
Here, the prosody information is used to indicate pause durations in the voice data.
S103, label the text to be labeled with prosodic symbols according to the prosody information in the voice data.
Specifically, in this embodiment, a text that is to undergo prosody labeling is referred to as a text to be labeled. The voice data of the text to be labeled is produced by a reader reading the text aloud; this embodiment places no restriction on who the reader is. Prosody information, which indicates the pause durations in the voice data, can be determined from the voice data of the text to be labeled. The text to be labeled can then be labeled with prosodic symbols according to those pause durations.
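To make the flow of S101 to S103 concrete, the following minimal Python sketch labels one sentence, assuming the pause duration after each word is already known (how those durations are extracted from the voice data is described further below). The thresholds T3, T4, T6 and the mapping from duration ranges to symbols are illustrative assumptions consistent with the example given later (a 60 ms pause with t3 = 30 ms and t4 = 90 ms yields #2); they are not values prescribed by this embodiment.

```python
# Illustrative sketch of S101-S103; thresholds and the range-to-symbol
# mapping are assumptions, not values fixed by this embodiment.
T3, T4, T6 = 30, 90, 300  # milliseconds (t4 = t5 assumed, as in the example)

def symbol_for_pause(ms: float) -> str | None:
    """Map a pause duration to a prosodic symbol (None: too short to label)."""
    if ms < T3:
        return None
    if ms < T4:
        return "#2"
    if ms < T6:
        return "#3"
    return "#4"

def label(words: list[str], pause_after_ms: list[float]) -> str:
    """Insert prosodic symbols after words according to the pauses that follow them."""
    out = []
    for word, pause in zip(words, pause_after_ms):
        out.append(word)
        symbol = symbol_for_pause(pause)
        if symbol:
            out.append(symbol)
    return "".join(out)

print(label(["xxxx", "xxx", "xxx", "xxxxx"], [60, 150, 60, 400]))
# -> xxxx#2xxx#3xxx#2xxxxx#4, matching the form of the example given later
```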
The prosody labeling method provided by this embodiment labels the text to be labeled with prosodic symbols according to the voice data of that text, taking the richness of spoken language into account. Because the voice data is produced by a reader reading the text aloud, pronounced pauses and clear silent segments in the speech are fully considered. Compared with manual prosody labeling based on the text alone, the accuracy of the labels improves; and because fewer labels need to be revised, labeling becomes more efficient and less costly.
Note that this embodiment does not limit how the prosodic symbols are realized; they can be defined as needed. The pause duration range corresponding to each prosodic symbol may be preset, and this embodiment does not limit the specific values of those ranges.
For example, the prosodic symbols may include #1, #2, #3, and #4. In that case, four kinds of pause duration can be distinguished in the voice data.
This is illustrated by way of example.
Table 1 shows the correspondence between prosodic symbols, the meanings they indicate, and pause duration ranges. In this embodiment, the pause duration ranges for #1 and #2 need not be defined, because the corresponding pauses are barely noticeable to the ear and judging them is highly subjective; of course, ranges may also be defined for them, and this embodiment does not limit the choice. Here t3 < t4 ≤ t5 < t6, and the specific values of t3 to t6 are not limited in this embodiment; for example, t4 = t5 = 90 ms. Suppose one example of a text to be labeled is xxxxxxx, xxxxxxxx. After prosodic symbol labeling, the text may read xxxx#2xxx#3, xxx#2xxxxx#4.
TABLE 1
Optionally, the prosody labeling method provided in this embodiment may further include:
acquiring prosody information in the text data of the text to be labeled.
In that case, S103, labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data, may include:
labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data.
Specifically, the prosody information in the text data of the text to be labeled indicates the pause durations implied by that text. Note that this embodiment does not limit how this information is obtained; any existing method of predicting prosody from text information may be used.
Labeling the text to be labeled with prosodic symbols according to both the prosody information in the voice data and the prosody information in the text data combines the text prosody prediction result with the speech prosody analysis result, further improving the efficiency and accuracy of prosody labeling.
Optionally, labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data may include:
labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data;
and updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
That is, prosodic symbols are first applied to the text to be labeled on the basis of the prosody information in the voice data, and the labels are then updated according to the prosody information in the text data. Building the text prosody prediction result on top of the speech prosody analysis result further improves the efficiency and accuracy of prosody labeling.
Optionally, updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data may include:
if the prosody information in the text data indicates that no prosodic symbol should appear at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting that at least one labeled prosodic symbol.
Specifically, the prosody information in the text data is the text prosody prediction result determined from the text data of the text to be labeled. It typically reflects the durations of pauses that grammar permits, and it also identifies positions where no pause may occur. In some cases this information indicates that a prosodic symbol already labeled in the text must not appear at its position: for example, there is normally no pause in the middle of a grammatical word, which may be a phrase, an idiom, or a colloquialism. The offending prosodic symbols can then be deleted from the text to be labeled according to the prosody information in the text data, further improving labeling accuracy.
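As an illustration of this update step, the sketch below assumes that the speech-based labels are embedded in the text (as in the xxxx#2xxx#3 example above) and that the text prosody prediction supplies the character offsets after which no prosodic boundary is allowed, for example offsets inside an idiom. Both representations and the helper function are illustrative assumptions, not interfaces prescribed by this embodiment.

```python
import re

def delete_forbidden_symbols(labeled: str, forbidden_gaps: set[int]) -> str:
    """Remove prosodic symbols that sit at gaps where the text prosody
    prediction says no boundary may occur. A 'gap' is the position after
    the n-th character of the unlabeled text."""
    out, gap = [], 0
    for piece in re.split(r"(#\d)", labeled):
        if re.fullmatch(r"#\d", piece):
            if gap not in forbidden_gaps:  # keep only permitted symbols
                out.append(piece)
        else:
            gap += len(piece)
            out.append(piece)
    return "".join(out)

# The "#2" after the 4th character falls inside the idiom 一心一意 and is deleted.
print(delete_forbidden_symbols("我们#1一心#2一意#3", {4}))
# -> 我们#1一心一意#3
```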
Optionally, in S102, determining prosody information in the voice data according to the voice data may include:
acquiring at least one silent segment in the voice data according to the voice data;
and, for each silent segment, determining the prosody information corresponding to that silent segment in the voice data.
Specifically, at least one silent segment is obtained from the voice data, and the duration of a silent segment is a pause duration in the voice data.
Optionally, for each silent segment, determining the prosody information corresponding to that silent segment may include:
acquiring the duration of the silent segment from its start time and end time in the voice data.
This is illustrated by way of example.
Assume that a silent segment starts at 00:22:07:300 and ends at 00:22:07:360, so its duration is 60 ms. Referring to Table 1, let t3 = 30 ms and t4 = 90 ms. The prosodic symbol #2 can then be labeled in the text to be labeled according to the duration of the silent segment.
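A small sketch of this computation, assuming the HH:MM:SS:mmm timestamp format used in the example and the [t3, t4) range for #2 that the example implies (the threshold values are the illustrative ones above):

```python
def timestamp_to_ms(ts: str) -> int:
    """Parse an 'HH:MM:SS:mmm' timestamp into milliseconds."""
    h, m, s, ms = (int(part) for part in ts.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms

start, end = "00:22:07:300", "00:22:07:360"
duration = timestamp_to_ms(end) - timestamp_to_ms(start)  # 60 ms
t3, t4 = 30, 90
assert t3 <= duration < t4  # 30 <= 60 < 90, so the segment is labeled "#2"
```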
Optionally, acquiring at least one silent segment in the voice data according to the voice data may include:
performing phoneme segmentation on the text data of the text to be labeled to obtain a speech annotation sequence;
and performing phoneme segmentation on the voice data according to the speech annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, where the preset acoustic model represents the speech features corresponding to different phonemes.
Specifically, a phoneme is the smallest unit of speech distinguished by sound quality. Performing phoneme segmentation on the text data of the text to be labeled divides it into a series of chronologically adjacent segments corresponding to phonemes; this sequence of segments is the speech annotation sequence. Because the preset acoustic model represents the speech features of the different phonemes, the voice data can be phoneme-segmented against the speech annotation sequence to obtain the at least one silent segment.
This embodiment does not limit the phoneme segmentation method; a conventional one may be used, for example an automatic speech segmentation algorithm based on hidden Markov models (HMMs). In such an algorithm, given the annotation sequence, a Viterbi algorithm can force-align the speech signal with the HMM sequence corresponding to the phonetic annotation units (phonemes).
Note that this embodiment does not limit the type of the preset acoustic model or how it is obtained. For example, it may be trained with the open-source toolkit Kaldi on the speech data whose prosody is to be predicted and the corresponding text; it may also be obtained with a deep neural network (DNN) algorithm. Alternatively, when the amount of speech data is small, the preset acoustic model may be a GMM-HMM acoustic model; when it is large, a DNN-HMM model may be used.
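For illustration, the sketch below recovers silent segments from a phone-level alignment, assuming CTM-style lines of the form "utterance channel start duration phone" (the format produced, for example, by Kaldi's ali-to-phones with CTM output) and assuming silence and pauses are modeled by phones named sil, sp, or spn; the actual phone names depend on the lexicon used.

```python
from typing import List, Tuple

SILENCE_PHONES = {"sil", "sp", "spn"}  # assumed silence/pause phone labels

def silent_segments(ctm_lines: List[str]) -> List[Tuple[float, float]]:
    """Return (start, end) pairs, in seconds, of the aligned silence phones."""
    segments = []
    for line in ctm_lines:
        _utt, _channel, start, dur, phone = line.split()
        if phone.lower() in SILENCE_PHONES:
            # round to ms precision to avoid floating-point noise
            segments.append((float(start), round(float(start) + float(dur), 3)))
    return segments

ctm = [
    "utt1 1 0.00 0.12 sil",
    "utt1 1 0.12 0.30 n",
    "utt1 1 0.42 0.06 sp",
]
print(silent_segments(ctm))  # [(0.0, 0.12), (0.42, 0.48)]
```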
Optionally, performing phoneme segmentation on the text data of the text to be labeled to obtain the speech annotation sequence may include:
performing phoneme segmentation on the text data of the text to be labeled and inserting a pause symbol between every two adjacent syllables to obtain the speech annotation sequence.
This is illustrated by way of example.
Assume that a phoneme is an initial or a final. The text to be labeled is "你好，亲爱的祖国。" ("Hello, dear motherland."), whose text data in pinyin is "ni hao, qin ai de zu guo". The speech annotation sequence may then be "n i sp h ao sp q in sp ai sp d e sp z u sp g uo", where sp denotes the pause symbol.
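The sketch below produces such a sequence from the pinyin syllables, splitting each syllable into initial and final and inserting sp between adjacent syllables. The initial inventory is a partial, illustrative subset of Mandarin initials (listed with the two-letter initials first so they match greedily); it is an assumption for the example, not a complete lexicon.

```python
# Partial Mandarin initial inventory; two-letter initials first for greedy match.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")

def split_syllable(syllable: str) -> list[str]:
    """Split a pinyin syllable into [initial, final], or [final] alone for
    zero-initial syllables such as 'ai'."""
    for initial in INITIALS:
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return [initial, syllable[len(initial):]]
    return [syllable]

def annotation_sequence(syllables: list[str]) -> str:
    """Join the phonemes of all syllables, inserting the pause symbol 'sp'
    between adjacent syllables."""
    parts: list[str] = []
    for i, syllable in enumerate(syllables):
        if i:
            parts.append("sp")
        parts.extend(split_syllable(syllable))
    return " ".join(parts)

print(annotation_sequence("ni hao qin ai de zu guo".split()))
# -> n i sp h ao sp q in sp ai sp d e sp z u sp g uo
```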
This embodiment provides a prosody labeling method: acquire the voice data of the text to be labeled, determine the prosody information in the voice data from that data, and label the text to be labeled with prosodic symbols according to that prosody information. By labeling from the voice data of the text itself, the method improves both the efficiency and the accuracy of prosody labeling.
Fig. 2 is a schematic structural diagram of a prosody labeling apparatus according to an embodiment of the present invention. The prosody labeling apparatus provided in this embodiment executes the prosody labeling method of the embodiment shown in fig. 1. As shown in fig. 2, the apparatus may include:
the first acquisition module 11, configured to acquire voice data of the text to be labeled;
the prosody information determining module 12, configured to determine prosody information in the voice data according to the voice data, where the prosody information indicates pause durations in the voice data;
and the labeling module 13, configured to label the text to be labeled with prosodic symbols according to the prosody information in the voice data.
Optionally, a second acquisition module 14 is further included.
The second acquisition module 14 is configured to acquire prosody information in the text data of the text to be labeled.
The labeling module 13 is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosody information in the voice data and the prosody information in the text data.
Optionally, the labeling module 13 is specifically configured to:
label the text to be labeled with prosodic symbols according to the prosody information in the voice data;
and update the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
Optionally, the labeling module 13 is specifically configured to:
delete at least one labeled prosodic symbol if the prosody information in the text data indicates that no prosodic symbol should appear at its position in the text to be labeled.
Optionally, the prosody information determining module 12 is specifically configured to:
acquire at least one silent segment in the voice data according to the voice data;
and, for each silent segment, determine the prosody information corresponding to that silent segment in the voice data.
Optionally, the prosody information determining module 12 is specifically configured to:
perform phoneme segmentation on the text data of the text to be labeled to obtain a speech annotation sequence;
and perform phoneme segmentation on the voice data according to the speech annotation sequence, the voice data, and the preset acoustic model to obtain the at least one silent segment in the voice data, where the preset acoustic model represents the speech features corresponding to different phonemes.
The prosody labeling apparatus provided in this embodiment executes the prosody labeling method of the embodiment shown in fig. 1; its principle and technical effects are similar and are not repeated here.
Fig. 3 is a schematic structural diagram of a prosody labeling device according to an embodiment of the present invention. The prosody labeling device provided in this embodiment executes the prosody labeling method of the embodiment shown in fig. 1.
As shown in fig. 3, the prosody labeling device may include a processor 21 and a memory 22. The memory 22 is configured to store instructions, and the processor 21 is configured to execute the instructions stored in the memory 22, so that the prosody labeling device executes the prosody labeling method of the embodiment shown in fig. 1.
An embodiment of the present invention further provides a storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the prosody labeling method according to the embodiment shown in fig. 1.
An embodiment of the present invention further provides a program product. The program product includes a computer program stored in a storage medium; at least one processor can read the computer program from the storage medium, and when the at least one processor executes it, the prosody labeling method of the embodiment shown in fig. 1 is implemented.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware driven by program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The aforementioned storage media include various media capable of storing program code, such as Read-Only Memory (ROM), Random Access Memory (RAM), and magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A prosody labeling method, comprising:
acquiring voice data of a text to be labeled;
acquiring prosody information in text data of the text to be labeled;
determining prosody information in the voice data according to the voice data, wherein the prosody information is used to indicate pause durations in the voice data;
labeling the text to be labeled with prosodic symbols according to the prosody information in the voice data;
and updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
2. The method according to claim 1, wherein the updating the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data comprises:
if the prosody information in the text data indicates that no prosodic symbol should appear at the position of at least one prosodic symbol already labeled in the text to be labeled, deleting that at least one labeled prosodic symbol.
3. The method of claim 1 or 2, wherein the determining prosody information in the voice data according to the voice data comprises:
acquiring at least one silent segment in the voice data according to the voice data;
and, for each silent segment, determining the prosody information corresponding to that silent segment in the voice data.
4. The method of claim 3, wherein the acquiring at least one silent segment in the voice data according to the voice data comprises:
performing phoneme segmentation on the text data of the text to be labeled to obtain a speech annotation sequence;
and performing phoneme segmentation on the voice data according to the speech annotation sequence, the voice data, and a preset acoustic model to obtain the at least one silent segment in the voice data, wherein the preset acoustic model represents the speech features corresponding to different phonemes.
5. A prosody labeling apparatus, comprising:
a first acquisition module, configured to acquire voice data of a text to be labeled;
a prosody information determining module, configured to determine prosody information in the voice data according to the voice data, wherein the prosody information is used to indicate pause durations in the voice data;
a second acquisition module, configured to acquire prosody information in text data of the text to be labeled;
and a labeling module, configured to label the text to be labeled with prosodic symbols according to the prosody information in the voice data, and to update the prosodic symbols labeled in the text to be labeled according to the prosody information in the text data.
6. A prosody labeling device, comprising: a memory and a processor;
the memory being configured to store program instructions;
the processor being configured to invoke the program instructions stored in the memory to implement the prosody labeling method of any one of claims 1-4.
CN201810988973.9A 2018-08-28 2018-08-28 Rhythm labeling method, device and equipment Active CN109326281B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810988973.9A (CN109326281B) | 2018-08-28 | 2018-08-28 | Rhythm labeling method, device and equipment

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810988973.9A (CN109326281B) | 2018-08-28 | 2018-08-28 | Rhythm labeling method, device and equipment

Publications (2)

Publication Number | Publication Date
CN109326281A (en) | 2019-02-12
CN109326281B (en) | 2020-01-07

Family

ID=65263729

Family Applications (1)

Application Number | Priority Date | Filing Date | Title | Status
CN201810988973.9A (CN109326281B) | 2018-08-28 | 2018-08-28 | Rhythm labeling method, device and equipment | Active

Country Status (1)

Country Link
CN (1) CN109326281B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111161725B (en) * 2019-12-17 2022-09-27 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111105785B (en) * 2019-12-17 2023-06-16 广州多益网络股份有限公司 Text prosody boundary recognition method and device
CN111754978B (en) * 2020-06-15 2023-04-18 北京百度网讯科技有限公司 Prosodic hierarchy labeling method, device, equipment and storage medium
CN115862584A (en) * 2021-09-24 2023-03-28 华为云计算技术有限公司 Rhythm information labeling method and related equipment
CN115116427B (en) * 2022-06-22 2023-11-14 马上消费金融股份有限公司 Labeling method, voice synthesis method, training method and training device
CN116030789B (en) * 2022-12-28 2024-01-26 南京硅基智能科技有限公司 Method and device for generating speech synthesis training data

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100524457C (en) * 2004-05-31 2009-08-05 国际商业机器公司 Device and method for text-to-speech conversion and corpus adjustment
CN104916284B (en) * 2015-06-10 2017-02-22 百度在线网络技术(北京)有限公司 Prosody and acoustics joint modeling method and device for voice synthesis system
CN105244020B (en) * 2015-09-24 2017-03-22 百度在线网络技术(北京)有限公司 Prosodic hierarchy model training method, text-to-speech method and text-to-speech device
CN105355193B (en) * 2015-10-30 2020-09-25 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106601228B (en) * 2016-12-09 2020-02-04 百度在线网络技术(北京)有限公司 Sample labeling method and device based on artificial intelligence rhythm prediction

Also Published As

Publication number Publication date
CN109326281A (en) 2019-02-12

Similar Documents

Publication Publication Date Title
CN109326281B (en) Rhythm labeling method, device and equipment
CN109065031B (en) Voice labeling method, device and equipment
CN108447486B (en) Voice translation method and device
CN109389968B (en) Waveform splicing method, device, equipment and storage medium based on double syllable mixing and lapping
CN111369974B (en) Dialect pronunciation marking method, language identification method and related device
CN107039034B (en) Rhythm prediction method and system
CN110459202B (en) Rhythm labeling method, device, equipment and medium
US11810471B2 (en) Computer implemented method and apparatus for recognition of speech patterns and feedback
CN110265028B (en) Method, device and equipment for constructing speech synthesis corpus
KR101587866B1 (en) Apparatus and method for extension of articulation dictionary by speech recognition
CN105336322A (en) Polyphone model training method, and speech synthesis method and device
JP6370749B2 (en) Utterance intention model learning device, utterance intention extraction device, utterance intention model learning method, utterance intention extraction method, program
CN109300468B (en) Voice labeling method and device
US9508338B1 (en) Inserting breath sounds into text-to-speech output
CN113593522B (en) Voice data labeling method and device
JP6585022B2 (en) Speech recognition apparatus, speech recognition method and program
CN112259083B (en) Audio processing method and device
CN112397056A (en) Voice evaluation method and computer storage medium
Proença et al. Automatic evaluation of reading aloud performance in children
JP5180800B2 (en) Recording medium for storing statistical pronunciation variation model, automatic speech recognition system, and computer program
JPWO2016103652A1 (en) Audio processing apparatus, audio processing method, and program
CN112686041A (en) Pinyin marking method and device
JP6718787B2 (en) Japanese speech recognition model learning device and program
Qian et al. Capturing L2 segmental mispronunciations with joint-sequence models in computer-aided pronunciation training (CAPT)
US20230206899A1 (en) Spontaneous text to speech (tts) synthesis

Legal Events

Code | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant