CN114550691A - A polyphonic word disambiguation method, device, electronic device and readable storage medium - Google Patents
- Publication number: CN114550691A
- Application number: CN202210086347.7A
- Authority: CN (China)
- Prior art keywords: target, polyphonic, characters, character, disambiguation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L13/02 — Methods for producing synthetic speech; speech synthesisers
- G10L13/08 — Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F40/284 — Lexical analysis, e.g. tokenisation or collocates
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
Description
Technical Field

The present application relates to the technical field of speech processing, and in particular to a polyphonic character disambiguation method and apparatus, an electronic device, and a readable storage medium.
Background

Text-to-speech (TTS) refers to technology that uses a computer to convert arbitrary text into speech. The input text must be converted into its corresponding pronunciation, and whether polyphonic characters are converted correctly greatly affects the user's comprehension of the synthesized speech; if a polyphonic character is converted incorrectly, the quality of the synthesized speech is severely degraded. Polyphone disambiguation is therefore an important task in a speech synthesis system.

Existing approaches to polyphone disambiguation are based on decision trees, the maximum-entropy algorithm, or expert knowledge (large rule sets). The decision-tree approach presets a number of questions and, according to those questions and preset probabilities, assigns a final probability to every candidate pronunciation; because the questions and probability values must be preset, they become ill-suited when the context or scenario changes, so this approach disambiguates polyphonic characters poorly. The maximum-entropy model is a classification model designed on the maximum-entropy principle and demands a large amount of data; however, large sample data leads to heavy computation, which limits the approach in practice. Approaches based on expert knowledge (large rule sets) are laborious to maintain, and the rules easily conflict with or influence one another. The polyphone disambiguation methods in the prior art therefore suffer from poor disambiguation performance, and a polyphone disambiguation method that solves this problem is urgently needed.
Summary of the Invention

To overcome the problems in the related art, the present application provides a polyphonic character disambiguation method and apparatus, an electronic device, and a readable storage medium.

According to a first aspect of the embodiments of the present application, a polyphone disambiguation method is provided, the method comprising:

dividing text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters;

for each character, obtaining a first identifier corresponding to the character;

inputting the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model, and determining the pronunciation of each target polyphonic character according to the output of the target polyphone disambiguation model.
Optionally, before the step of dividing the text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters, the method further includes:

pre-generating the target polyphone disambiguation model;

obtaining the text to be processed.
Optionally, pre-generating the target polyphone disambiguation model includes:

obtaining training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation;

taking the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, training a preset initial model, and determining the trained model as the target polyphone disambiguation model.
Optionally, determining the pronunciation of the target polyphonic character according to the output of the target polyphone disambiguation model includes:

obtaining the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model;

determining the pronunciation of the target polyphonic character according to that annotation information.
According to a second aspect of the embodiments of the present application, a polyphone disambiguation apparatus is provided, the apparatus comprising:

a division module, configured to divide text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters;

a first-identifier obtaining module, configured to obtain, for each character, the first identifier corresponding to the character;

a polyphone disambiguation module, configured to input the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model, and to determine the pronunciation of each target polyphonic character according to the output of the target polyphone disambiguation model.
Optionally, the apparatus further includes:

a polyphone disambiguation model training module, configured to pre-generate the target polyphone disambiguation model;

a to-be-processed text obtaining module, configured to obtain the text to be processed.
Optionally, the polyphone disambiguation model training module includes:

a training sample obtaining unit, configured to obtain training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation;

a polyphone disambiguation model training unit, configured to take the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, train a preset initial model, and determine the trained model as the target polyphone disambiguation model.
Optionally, the polyphone disambiguation module further includes:

an annotation information obtaining unit, configured to obtain the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model;

a pronunciation obtaining unit, configured to determine the pronunciation of the target polyphonic character according to its annotation information.
According to a third aspect of the embodiments of the present application, an electronic device is provided, including:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the polyphone disambiguation method.

According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided; when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the polyphone disambiguation method.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

The present application divides text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters; obtains, for each character, the first identifier corresponding to the character; inputs the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model; and determines the pronunciation of each target polyphonic character according to the output of that model. By using the target polyphone disambiguation model to disambiguate the target polyphonic characters in the text to be processed, the technical solutions of the embodiments of the present application increase the prediction speed of polyphone disambiguation and further improve the disambiguation results.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart of a polyphone disambiguation method according to an exemplary embodiment;

Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment;

Fig. 3 is a flowchart of step 201 in the flowchart of the other polyphone disambiguation method shown in Fig. 2 according to an exemplary embodiment;

Fig. 4 is a flowchart of step 103 in the flowchart of the polyphone disambiguation method shown in Fig. 1 according to an exemplary embodiment;

Fig. 5 is a block diagram of a polyphone disambiguation apparatus according to an exemplary embodiment;

Fig. 6 is a block diagram of another polyphone disambiguation apparatus according to an exemplary embodiment;

Fig. 7 is a block diagram of the polyphone disambiguation model training module 601 in the block diagram of the other polyphone disambiguation apparatus shown in Fig. 6 according to an exemplary embodiment;

Fig. 8 is a block diagram of the polyphone disambiguation module 503 in the block diagram of the polyphone disambiguation apparatus shown in Fig. 5 according to an exemplary embodiment;

Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description

Exemplary embodiments will be described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with some aspects of the invention as recited in the appended claims.
It should be noted that, in the embodiments of the present application, the target polyphone disambiguation model is preferably the fully convolutional model Unet. The Unet structure is chosen because it can replace the recurrent neural network (RNN) model used in the prior art and thereby achieve better polyphone disambiguation. The fully convolutional Unet model is most often used in the field of image segmentation; it is named after its symmetric U-shaped structure, with convolutional layers on the left side and upsampling layers on the right. The feature map produced by each convolutional layer is concatenated onto the corresponding upsampling layer, ensuring that the features obtained at every level are used in subsequent computation. This allows the Unet model to combine features from all levels, improving the model's grasp of global features and hence the performance of the polyphone disambiguation model. The present application uses the fully convolutional Unet model as an embodiment of the target polyphone disambiguation model; in practical applications, other fully convolutional models, including but not limited to IDCNN, may also be applied to the technical solutions of the present application.
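The U-shaped skip connection described above (each encoder feature map concatenated onto the corresponding upsampling layer) can be sketched at the shape level for a 1-D character sequence as follows. This is an illustrative NumPy sketch, not the patent's actual model: the identity "processing" stands in for convolutions, and all function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """Halve the sequence length by max-pooling pairs of timesteps."""
    t, c = x.shape
    return x[: t - t % 2].reshape(-1, 2, c).max(axis=1)

def upsample(x):
    """Double the sequence length by repeating each timestep."""
    return np.repeat(x, 2, axis=0)

def unet_skip_sketch(x):
    """One encoder/decoder level with a Unet-style skip connection.

    The encoder feature map `x` is pooled, upsampled back to the original
    length, and concatenated with the saved encoder features along the
    channel axis, so later layers see both coarse and fine features.
    """
    skip = x                       # encoder features kept for the skip
    bottleneck = downsample(x)     # (T/2, C)
    up = upsample(bottleneck)      # back to (T, C)
    up = up[: skip.shape[0]]       # crop in case T was odd
    return np.concatenate([skip, up], axis=1)  # (T, 2C)

seq = np.random.rand(8, 4)         # 8 characters, 4 features each
out = unet_skip_sketch(seq)
print(out.shape)                   # (8, 8): channels doubled by the skip
```

A real Unet would apply trained convolutions at each level and stack several such levels; the sketch only shows how the skip concatenation preserves per-character resolution while mixing in downsampled context.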
Fig. 1 is a flowchart of a polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 1, the method includes the following steps.

Step 101: divide the text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters.

It should be noted that, in the embodiments of the present application, polyphone disambiguation is, from the perspective of the task as a whole, a sequence-to-sequence labeling task; broken down, however, it is a classification task for each character. For example, the pronunciation options of "行" are xing2 and hang2, where 2 indicates that the tone of the character is yang ping (the second tone); the pronunciation options of "转" are zhuan2, zhuan3, and zhuan4, where 3 indicates shang sheng (the third tone) and 4 indicates qu sheng (the fourth tone).

Therefore, the text to be processed can first be divided into individual characters. For example, taking "登陆人行征信系统" ("log in to the People's Bank credit reference system") as the text to be processed and dividing it into individual characters yields "登, 陆, 人, 行, 征, 信, 系, 统".
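A minimal sketch of this per-character split, with a small pronunciation-option table taken from the examples above (the table and variable names are illustrative, not part of the patent):

```python
# Candidate pronunciations per polyphonic character, as in the examples
# above (trailing digits are Mandarin tone numbers).
PRONUNCIATIONS = {
    "行": ["xing2", "hang2"],
    "转": ["zhuan2", "zhuan3", "zhuan4"],
}

def split_chars(text):
    """Divide the text to be processed into individual characters."""
    return list(text)

chars = split_chars("登陆人行征信系统")
print(chars)        # ['登', '陆', '人', '行', '征', '信', '系', '统']

# Characters with an entry in PRONUNCIATIONS are target polyphonic
# characters; the rest are non-target characters.
targets = [c for c in chars if c in PRONUNCIATIONS]
print(targets)      # ['行']
```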
Further, in the embodiments of the present application, Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 2, the following steps may also be included before step 101.

Step 201: pre-generate the target polyphone disambiguation model.

Further, in the embodiments of the present application, Fig. 3 is a flowchart of step 201 in the flowchart of the other polyphone disambiguation method shown in Fig. 2 according to an exemplary embodiment; as shown in Fig. 3, step 201 may further include the following steps.

Step 301: obtain training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation.

Step 302: take the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, train a preset initial model, and determine the trained model as the target polyphone disambiguation model.
It should be noted that, in the embodiments of the present application, the preset initial model can be trained on the training samples, and the trained model is the target polyphone disambiguation model, where the target polyphone disambiguation model has the fully convolutional Unet structure. Specifically, the several sample texts contained in the obtained training samples serve as the input of the initial model, and the annotation information of the target polyphonic characters in those sample texts serves as the target of the initial model's output; the preset initial model is trained, and the trained model is determined as the target polyphone disambiguation model, i.e. the fully convolutional model Unet.

For example, for the pronunciation hang2 of the polyphonic character "行", ten thousand sentences containing the character "行" are obtained; among them, the various natural-language pronunciations of "行", such as hang2 and xing2, must be covered. In a text where the pronunciation is hang2, every character other than the polyphonic character "行" is tagged with the identifier 0, while the character "行" pronounced hang2 obtains its corresponding annotation information from a preset polyphone list, e.g. hang2_4, where 2 indicates that the tone of the character is yang ping (the second tone) and 4 is the identifier of this character in the polyphone list. The ten thousand sentences containing the polyphonic character "行" with the pronunciation hang2 serve as the input of the initial model, and the corresponding annotation information, i.e. hang2_4, serves as the target of the output; the initial model is trained, and the trained model is the target polyphone disambiguation model. The present application places no specific limitation on the form of the annotation information or on the specific values of the identifiers.
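The labeling scheme above (0 for non-target characters, a pinyin-plus-list-identifier tag such as hang2_4 for the target polyphonic character) can be sketched as follows. The polyphone-list entry and identifier 4 come from the example; everything else, including the function name, is illustrative:

```python
# Preset polyphone list mapping a (character, pronunciation) pair to its
# identifier, following the hang2_4 example (the id 4 is illustrative).
POLYPHONE_LIST = {("行", "hang2"): 4}

def build_labels(text, target_char, pronunciation):
    """Tag non-target characters '0' and the target character with its
    pronunciation joined to its polyphone-list identifier."""
    list_id = POLYPHONE_LIST[(target_char, pronunciation)]
    return [
        f"{pronunciation}_{list_id}" if ch == target_char else "0"
        for ch in text
    ]

labels = build_labels("登陆人行征信系统", "行", "hang2")
print(labels)   # ['0', '0', '0', 'hang2_4', '0', '0', '0', '0']
```

Each sentence paired with such a label sequence forms one training sample for the sequence-labeling model.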
By pre-training the target polyphone disambiguation model, which is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional Unet model in the polyphone disambiguation method, thereby improving the efficiency of polyphone disambiguation.
Further, in the embodiments of the present application, Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 2, the following step may also be included before step 101.

Step 202: obtain the text to be processed.

It should be noted that, in the embodiments of the present application, the text to be processed may be any text that includes at least one target polyphonic character. The text to be processed may be obtained in any manner; the present application places no specific limitation on how it is obtained.
Step 102: for each character, obtain the first identifier corresponding to the character.

It should be noted that, in the embodiments of the present application, a first identifier is added to each character into which the text to be processed has been divided. For example, adding first identifiers to the divided characters "登, 陆, 人, 行, 征, 信, 系, 统" yields "登_1, 陆_2, 人_3, 行_4, 征_5, 信_6, 系_7, 统_8". The present application places no specific limitation on the specific values of the first identifiers or on the form in which they are added.
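Attaching the first identifiers as above amounts to a simple positional enumeration; the char_id string form below mirrors the example and is one of the unconstrained formats the patent allows:

```python
def add_first_identifiers(chars):
    """Attach a 1-based positional identifier to each character, in the
    char_id form used in the example above."""
    return [f"{ch}_{i}" for i, ch in enumerate(chars, start=1)]

tagged = add_first_identifiers(list("登陆人行征信系统"))
print(tagged)   # ['登_1', '陆_2', '人_3', '行_4', '征_5', '信_6', '系_7', '统_8']
```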
Step 103: input the characters and their corresponding first identifiers into the pre-generated target polyphone disambiguation model, and determine the pronunciation of each target polyphonic character according to the output of that model.
Further, in the embodiments of the present application, Fig. 4 is a flowchart of step 103 in the flowchart of the polyphone disambiguation method shown in Fig. 1 according to an exemplary embodiment; as shown in Fig. 4, step 103 may further include the following steps.

Step 401: obtain the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model.

Step 402: determine the pronunciation of the target polyphonic character according to that annotation information.

It should be noted that, in the embodiments of the present application, the annotation information of the target polyphonic character is obtained from the output of the target polyphone disambiguation model, i.e. the fully convolutional model Unet; the annotation information contains the pronunciation of the target polyphonic character and the second identifier corresponding to that character, from which the pronunciation of the target polyphonic character can be determined. For example, if the annotation information xing2_8 is obtained for the polyphonic character "行", then the identifier of this character in the polyphone list is 8 and its pronunciation is xing in yang ping (the second tone).
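Decoding such an annotation string back into a pronunciation and a list identifier is a straightforward split; the field layout assumed below (pinyin syllable, trailing tone digit, underscore, polyphone-list id) is inferred from the xing2_8 and hang2_4 examples:

```python
def parse_annotation(label):
    """Split an annotation such as 'xing2_8' into (pinyin, tone, list_id).

    Assumes the layout shown in the examples: pinyin syllable followed by
    a tone digit, an underscore, then the polyphone-list identifier.
    """
    pronunciation, list_id = label.rsplit("_", 1)
    pinyin, tone = pronunciation[:-1], int(pronunciation[-1])
    return pinyin, tone, int(list_id)

print(parse_annotation("xing2_8"))   # ('xing', 2, 8)
print(parse_annotation("hang2_4"))   # ('hang', 2, 4)
```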
本申请通过将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符;针对每个字符,获取所述字符对应的第一标识;将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。通过本申请的实施例提供的技术方案,利用目标多音字消歧模型,也即全卷积模型Unet,对待处理文本中的目标多音字字符进行多音字消歧,由于全卷积模型Unet更易并行化,不需要保留以及更新state,因而输出之间不存在依序关系,进而通过利用全卷积模型Unet不仅能够提高多音字消歧的预测速度,同时还能够很好地对上下文信息进行把握,进一步地,提高了多音字消歧的效果。本申请通过预先生成目标多音字消歧模型,其中,目标多音字消歧模型为全卷积模型Unet,进而能够将预先训练好的全卷积模型Unet应用于多音字消歧的方法中,进一步地,提高了多音字消歧的处理效率和预测准确率。In the present application, the text to be processed is divided into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters; for each character, the first identifier corresponding to the character is obtained; the The character and the first identifier corresponding to the character are input into a pre-generated target polyphone disambiguation model, and the pronunciation of the target polyphone character is determined according to the output of the target polyphone disambiguation model. Through the technical solutions provided by the embodiments of the present application, the target polyphone disambiguation model, that is, the full convolution model Unet, is used to disambiguate the target polyphone characters in the text to be processed. Since the full convolution model Unet is easier to parallelize Therefore, there is no sequential relationship between the outputs. By using the fully convolutional model Unet, the prediction speed of polyphonic word disambiguation can not only be improved, but also the context information can be well grasped. Further, the effect of polyphonic word disambiguation is improved. 
By generating the target polyphonic-character disambiguation model in advance, where that model is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional model Unet in the polyphonic-character disambiguation method, further improving both the processing efficiency and the prediction accuracy of polyphonic-character disambiguation.
图5是根据一示例性实施例示出的一种多音字消歧的装置框图,参照图5,该装置包括划分模块501、第一标识获取模块502、多音字消歧模块503。FIG. 5 is a block diagram of an apparatus for polyphonic-character disambiguation according to an exemplary embodiment. Referring to FIG. 5, the apparatus includes a division module 501, a first identifier acquisition module 502, and a polyphonic-character disambiguation module 503.
划分模块501,用于将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符。The division module 501 is configured to divide the text to be processed into several characters, where the several characters include target polyphonic characters and non-target polyphonic characters.
第一标识获取模块502,用于针对每个字符,获取所述字符对应的第一标识。The first identifier acquisition module 502 is configured to acquire, for each character, the first identifier corresponding to that character.
多音字消歧模块503,用于将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。The polyphonic-character disambiguation module 503 is configured to input the character and its corresponding first identifier into the pre-generated target polyphonic-character disambiguation model, and to determine the pronunciation of the target polyphonic character from the output of that model.
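The three modules above (divide, acquire identifiers, disambiguate) can be sketched as plain functions. This is an illustration only: the character-to-identifier vocabulary, the out-of-vocabulary convention, and the stubbed model below are all assumptions, since the patent does not specify how the first identifiers are assigned or how the Unet is invoked.

```python
# Hypothetical sketch of the divide -> identify -> disambiguate pipeline
# described for modules 501-503.

VOCAB = {"银": 1, "行": 2, "很": 3, "好": 4}  # character -> first identifier (assumed)

def divide(text: str) -> list[str]:
    """Module 501: split the text to be processed into individual characters."""
    return list(text)

def first_ids(chars: list[str]) -> list[int]:
    """Module 502: look up the first identifier for each character."""
    return [VOCAB.get(c, 0) for c in chars]  # 0 = out-of-vocabulary (assumed)

def disambiguate(chars, ids, model) -> dict:
    """Module 503: feed characters and identifiers to the pre-trained model.

    `model` stands in for the fully convolutional Unet: it maps the input
    sequence to a label such as 'hang2_8' for each target polyphonic character.
    """
    return model(chars, ids)

chars = divide("银行很好")
ids = first_ids(chars)
labels = disambiguate(chars, ids, lambda c, i: {"行": "hang2_8"})
print(labels)  # {'行': 'hang2_8'}
```

Because every output label depends only on the convolutional receptive field around its character, the per-character predictions can be computed in parallel, which is the speed advantage the text attributes to the fully convolutional model.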
可选地,图6是根据一示例性实施例示出的另一种多音字消歧的装置框图。参照图6,该装置包括多音字消歧模型训练模块601、待处理文本获取模块602。Optionally, FIG. 6 is a block diagram of another apparatus for polyphonic-character disambiguation according to an exemplary embodiment. Referring to FIG. 6, the apparatus includes a polyphonic-character disambiguation model training module 601 and a to-be-processed text acquisition module 602.
多音字消歧模型训练模块601,用于预先生成目标多音字消歧模型。The polyphonic-character disambiguation model training module 601 is configured to generate the target polyphonic-character disambiguation model in advance.
待处理文本获取模块602,用于获取待处理文本。The to-be-processed text acquisition module 602 is configured to acquire the text to be processed.
可选地,图7是图6所示的根据一示例性实施例示出的另一种多音字消歧的装置框图中多音字消歧模型训练模块601的装置框图。参照图7,该装置包括训练样本获取单元701、多音字消歧模型训练单元702。Optionally, FIG. 7 is a block diagram of the polyphonic-character disambiguation model training module 601 in the apparatus shown in FIG. 6. Referring to FIG. 7, the module includes a training sample acquisition unit 701 and a polyphonic-character disambiguation model training unit 702.
训练样本获取单元701,用于获取训练样本,其中,所述训练样本包括若干样本文本和若干所述样本文本中目标多音字字符的标注信息,所述标注信息用于指示所述样本文本中目标多音字字符的发音以及所述样本文本中目标多音字字符的发音对应的第二标识。The training sample acquisition unit 701 is configured to acquire training samples, where the training samples include several sample texts and the labeling information of the target polyphonic characters in those sample texts; the labeling information indicates the pronunciation of each target polyphonic character in the sample text and the second identifier corresponding to that pronunciation.
多音字消歧模型训练单元702,用于将所述若干样本文本作为输入,将若干所述样本文本中目标多音字字符的标注信息作为输出的目标,对预设的初始模型进行训练,将训练完成的模型确定为目标多音字消歧模型。The polyphonic-character disambiguation model training unit 702 is configured to train a preset initial model with the several sample texts as input and the labeling information of the target polyphonic characters in those sample texts as the training target, and to determine the trained model as the target polyphonic-character disambiguation model.
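A training sample in the form the unit above describes pairs a sample text with per-position labeling information. The sketch below shows one plausible shape for such samples; the dataclass layout and the example sentences are illustrative assumptions, and only the "pinyin+tone, underscore, second identifier" label format (e.g. "xing2_8") comes from the text.

```python
from dataclasses import dataclass

@dataclass
class TrainingSample:
    text: str               # sample text
    labels: dict[int, str]  # character position -> label "pinyin+tone_secondId"

# Two hypothetical samples for the polyphonic character 行 (list identifier 8,
# per the earlier example): read xing2 in 自行车 and hang2 in 银行.
samples = [
    TrainingSample("自行车很快", {1: "xing2_8"}),
    TrainingSample("银行在左边", {1: "hang2_8"}),
]

# A trainer would feed sample.text (and the first identifiers derived from it)
# as input, and sample.labels as the target the initial model is fit against.
for s in samples:
    print(s.text, s.labels)
```

Note that both readings of 行 carry the same second identifier (8), since that identifier indexes the character in the polyphonic-character list, while the pinyin-with-tone prefix distinguishes the pronunciations.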
可选地,图8是图5所示的根据一示例性实施例示出的一种多音字消歧的装置框图中多音字消歧模块503的装置框图。参照图8,该装置包括标注信息获取单元801、目标多音字字符的发音获取单元802。Optionally, FIG. 8 is a block diagram of the polyphonic-character disambiguation module 503 in the apparatus shown in FIG. 5. Referring to FIG. 8, the module includes a labeling information acquisition unit 801 and a pronunciation acquisition unit 802 for the target polyphonic character.
标注信息获取单元801,用于根据所述目标多音字消歧模型的输出得到所述目标多音字字符的标注信息。The labeling information acquisition unit 801 is configured to obtain the labeling information of the target polyphonic character from the output of the target polyphonic-character disambiguation model.
目标多音字字符的发音获取单元802,用于根据所述目标多音字字符的标注信息确定所述目标多音字字符的发音。The pronunciation acquisition unit 802 is configured to determine the pronunciation of the target polyphonic character from its labeling information.
本申请通过将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符;针对每个字符,获取所述字符对应的第一标识;将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。通过本申请的实施例提供的技术方案,利用目标多音字消歧模型,也即全卷积模型Unet,对待处理文本中的目标多音字字符进行多音字消歧,由于全卷积模型Unet更易并行化,不需要保留以及更新state,因而输出之间不存在依序关系,进而通过利用全卷积模型Unet不仅能够提高多音字消歧的预测速度,同时还能够很好地对上下文信息进行把握,进一步地,提高了多音字消歧的效果。In the present application, the text to be processed is divided into several characters, where the several characters include target polyphonic characters and non-target polyphonic characters; for each character, the first identifier corresponding to the character is acquired; and the character together with its corresponding first identifier is input into a pre-generated target polyphonic-character disambiguation model, with the pronunciation of the target polyphonic character determined from the output of that model. With the technical solutions provided by the embodiments of the present application, the target polyphonic-character disambiguation model, that is, the fully convolutional model Unet, is used to disambiguate the target polyphonic characters in the text to be processed. Because the fully convolutional model Unet is easier to parallelize and does not need to keep or update state, there is no sequential dependency between its outputs; using the fully convolutional model Unet therefore not only improves the prediction speed of polyphonic-character disambiguation but also captures context information well, further improving the disambiguation effect.
By generating the target polyphonic-character disambiguation model in advance, where that model is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional model Unet in the polyphonic-character disambiguation method, further improving both the processing efficiency and the prediction accuracy of polyphonic-character disambiguation.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.
图9是根据一示例性实施例示出的一种用于电子设备900的框图。例如,电子设备900可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。FIG. 9 is a block diagram of an electronic device 900 according to an exemplary embodiment. For example, the electronic device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
参照图9,电子设备900可以包括以下一个或多个组件:处理组件902,存储器904,电源组件906,多媒体组件908,音频组件910,输入/输出接口912,传感器组件914,以及通信组件916。Referring to FIG. 9, the electronic device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output interface 912, a sensor component 914, and a communication component 916.
处理组件902通常控制装置900的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件902可以包括一个或多个处理器920来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件902可以包括一个或多个模块,便于处理组件902和其他组件之间的交互。例如,处理组件902可以包括多媒体模块,以方便多媒体组件908和处理组件902之间的交互。The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 902 may include one or more processors 920 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
存储器904被配置为存储各种类型的数据以支持在设备900的操作。这些数据的示例包括用于在装置900上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器904可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions of any application or method operated on the device 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
电源组件906为电子设备900的各种组件提供电力。电源组件906可以包括电源管理系统,一个或多个电源,及其他与为电子设备900生成、管理和分配电力相关联的组件。The power component 906 supplies power to the various components of the electronic device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 900.
多媒体组件908包括在所述电子设备900和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件908包括一个前置摄像头和/或后置摄像头。当电子设备900处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。The multimedia component 908 includes a screen that provides an output interface between the electronic device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the electronic device 900 is in an operation mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focal length and optical zoom capability.
音频组件910被配置为输出和/或输入音频信号。例如,音频组件910包括一个麦克风(MIC),当电子设备900处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器904或经由通信组件916发送。在一些实施例中,音频组件910还包括一个扬声器,用于输出音频信号。The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 900 is in an operation mode such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
输入/输出接口912为处理组件902和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。The input/output interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
传感器组件914包括一个或多个传感器,用于为电子设备900提供各个方面的状态评估。例如,传感器组件914可以检测到电子设备900的打开/关闭状态,组件的相对定位,例如所述组件为电子设备900的显示器和小键盘,传感器组件914还可以检测电子设备900或电子设备900一个组件的位置改变,用户与电子设备900接触的存在或不存在,电子设备900方位或加速/减速和电子设备900的温度变化。传感器组件914可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件914还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件914还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。The sensor component 914 includes one or more sensors for providing state assessments of various aspects of the electronic device 900. For example, the sensor component 914 may detect the open/closed state of the electronic device 900 and the relative positioning of components (for example, the display and keypad of the electronic device 900); it may also detect a change in position of the electronic device 900 or one of its components, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a change in its temperature. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
通信组件916被配置为便于电子设备900和其他设备之间有线或无线方式的通信。电子设备900可以接入基于通信标准的无线网络,如WiFi,运营商网络(如2G、3G、4G或5G),或它们的组合。在一个示例性实施例中,通信组件916经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件916还包括近场通信(NFC)模块,以促进短程通信。例如,NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
在示例性实施例中,电子设备900可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。In an exemplary embodiment, the electronic device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above method.
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器904,上述指令可由电子设备900的处理器920执行以完成上述方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the electronic device 900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本发明的其它实施方案。本发明旨在涵盖本发明的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本发明的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本发明的真正范围和精神由下面的权利要求指出。Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any modifications, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in this application. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
应当理解的是,本发明并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本发明的范围仅由所附的权利要求来限制。It should be understood that the present invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210086347.7A CN114550691A (en) | 2022-01-25 | 2022-01-25 | A polyphonic word disambiguation method, device, electronic device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114550691A true CN114550691A (en) | 2022-05-27 |
Family
ID=81670657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210086347.7A Pending CN114550691A (en) | 2022-01-25 | 2022-01-25 | A polyphonic word disambiguation method, device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114550691A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115273809A (en) * | 2022-06-22 | 2022-11-01 | 北京市商汤科技开发有限公司 | Training method, voice generation method and device for polyphonic word pronunciation prediction network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110377882A (en) * | 2019-07-17 | 2019-10-25 | 标贝(深圳)科技有限公司 | For determining the method, apparatus, system and storage medium of the phonetic of text |
CN111599340A (en) * | 2020-07-27 | 2020-08-28 | 南京硅基智能科技有限公司 | Polyphone pronunciation prediction method and device and computer readable storage medium |
CN111611810A (en) * | 2020-05-29 | 2020-09-01 | 河北数云堂智能科技有限公司 | Polyphone pronunciation disambiguation device and method |
CN111967260A (en) * | 2020-10-20 | 2020-11-20 | 北京金山数字娱乐科技有限公司 | Polyphone processing method and device and model training method and device |
CN112818657A (en) * | 2019-11-15 | 2021-05-18 | 北京字节跳动网络技术有限公司 | Method and device for determining polyphone pronunciation, electronic equipment and storage medium |
CN113380223A (en) * | 2021-05-26 | 2021-09-10 | 标贝(北京)科技有限公司 | Method, device, system and storage medium for disambiguating polyphone |
- 2022-01-25: CN application CN202210086347.7A filed; patent CN114550691A (en); status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2017535007A (en) | Classifier training method, type recognition method and apparatus | |
CN109961791B (en) | Voice information processing method and device and electronic equipment | |
CN109558599B (en) | Conversion method and device and electronic equipment | |
CN107133354B (en) | Method and device for acquiring image description information | |
WO2021046958A1 (en) | Speech information processing method and apparatus, and storage medium | |
US11335348B2 (en) | Input method, device, apparatus, and storage medium | |
CN111326138A (en) | Voice generation method and device | |
CN112735396A (en) | Speech recognition error correction method, device and storage medium | |
CN110619325A (en) | Text recognition method and device | |
CN114154459A (en) | Speech recognition text processing method, device, electronic device and storage medium | |
CN113923517B (en) | Background music generation method and device and electronic equipment | |
CN115394283A (en) | Speech synthesis method, speech synthesis device, electronic equipment and storage medium | |
CN114550691A (en) | A polyphonic word disambiguation method, device, electronic device and readable storage medium | |
CN114155849A (en) | Method, device and medium for processing virtual objects | |
CN105913841B (en) | Voice recognition method, device and terminal | |
CN112837668A (en) | Voice processing method and device for processing voice | |
WO2024124913A1 (en) | Entity information determining method and apparatus, and device | |
CN113807540B (en) | A data processing method and device | |
CN113115104B (en) | Video processing method and device, electronic equipment and storage medium | |
CN111414731B (en) | Text labeling method and device | |
CN115409200A (en) | Database operation method, device and medium | |
CN113345451B (en) | Sound changing method and device and electronic equipment | |
CN113221581B (en) | Text translation method, device and storage medium | |
CN110334338B (en) | Word segmentation method, device and equipment | |
CN115273852B (en) | Voice response method, device, readable storage medium and chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |