CN114550691A - A polyphonic word disambiguation method, device, electronic device and readable storage medium - Google Patents
- Publication number: CN114550691A
- Application number: CN202210086347.7A
- Authority: CN (China)
- Prior art keywords: target, polyphonic, characters, character, disambiguation
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L13/02 — Methods for producing synthetic speech; speech synthesisers
- G10L13/08 — Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L25/30 — Speech or voice analysis techniques characterised by the analysis technique using neural networks
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F40/284 — Lexical analysis, e.g. tokenisation or collocates
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
Description
Technical Field

The present application relates to the technical field of speech processing, and in particular to a polyphonic character disambiguation method and apparatus, an electronic device, and a readable storage medium.
Background

Text-to-speech (TTS) refers to technology that uses a computer to convert arbitrary text into speech. The input text must be converted into its corresponding pronunciation, and whether polyphonic characters are converted correctly greatly affects the user's comprehension of the synthesized speech; if a polyphonic character is converted incorrectly, the quality of the synthesized speech is severely degraded. Polyphone disambiguation is therefore an important task in a speech synthesis system.

Existing approaches to polyphone disambiguation are based on decision trees, the maximum-entropy algorithm, or expert knowledge (large rule sets). The decision-tree approach presets a number of questions and, according to those questions and preset probabilities, assigns a final probability to every candidate pronunciation; because the questions and probability values must be preset, they become ill-suited when the context or scenario changes, so this approach disambiguates polyphonic characters poorly. The maximum-entropy model is a classification model designed on the maximum-entropy principle and demands a large amount of data; however, large sample data leads to heavy computation, which limits the approach in practice. Approaches based on expert knowledge (large rule sets) are laborious to maintain, and the rules easily conflict with or influence one another. The polyphone disambiguation methods in the prior art therefore suffer from poor disambiguation performance, and a polyphone disambiguation method that solves this problem is urgently needed.
Summary of the Invention

To overcome the problems in the related art, the present application provides a polyphonic character disambiguation method and apparatus, an electronic device, and a readable storage medium.

According to a first aspect of the embodiments of the present application, a polyphone disambiguation method is provided, the method comprising:

dividing text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters;

for each character, obtaining a first identifier corresponding to the character;

inputting the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model, and determining the pronunciation of each target polyphonic character according to the output of the target polyphone disambiguation model.
Optionally, before the step of dividing the text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters, the method further includes:

pre-generating the target polyphone disambiguation model;

obtaining the text to be processed.
Optionally, pre-generating the target polyphone disambiguation model includes:

obtaining training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation;

taking the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, training a preset initial model, and determining the trained model as the target polyphone disambiguation model.
Optionally, determining the pronunciation of the target polyphonic character according to the output of the target polyphone disambiguation model includes:

obtaining the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model;

determining the pronunciation of the target polyphonic character according to that annotation information.
According to a second aspect of the embodiments of the present application, a polyphone disambiguation apparatus is provided, the apparatus comprising:

a division module, configured to divide text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters;

a first-identifier obtaining module, configured to obtain, for each character, the first identifier corresponding to the character;

a polyphone disambiguation module, configured to input the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model, and to determine the pronunciation of each target polyphonic character according to the output of the target polyphone disambiguation model.
Optionally, the apparatus further includes:

a polyphone disambiguation model training module, configured to pre-generate the target polyphone disambiguation model;

a to-be-processed text obtaining module, configured to obtain the text to be processed.
Optionally, the polyphone disambiguation model training module includes:

a training sample obtaining unit, configured to obtain training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation;

a polyphone disambiguation model training unit, configured to take the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, train a preset initial model, and determine the trained model as the target polyphone disambiguation model.
Optionally, the polyphone disambiguation module further includes:

an annotation information obtaining unit, configured to obtain the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model;

a pronunciation obtaining unit, configured to determine the pronunciation of the target polyphonic character according to its annotation information.
According to a third aspect of the embodiments of the present application, an electronic device is provided, including:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the polyphone disambiguation method.

According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided; when the instructions in the computer-readable storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the polyphone disambiguation method.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:

The present application divides text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters; obtains, for each character, the first identifier corresponding to the character; inputs the characters and their corresponding first identifiers into a pre-generated target polyphone disambiguation model; and determines the pronunciation of each target polyphonic character according to the output of that model. By using the target polyphone disambiguation model to disambiguate the target polyphonic characters in the text to be processed, the technical solutions of the embodiments of the present application increase the prediction speed of polyphone disambiguation and further improve the disambiguation results.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present application.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart of a polyphone disambiguation method according to an exemplary embodiment;

Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment;

Fig. 3 is a flowchart of step 201 in the flowchart of the other polyphone disambiguation method shown in Fig. 2 according to an exemplary embodiment;

Fig. 4 is a flowchart of step 103 in the flowchart of the polyphone disambiguation method shown in Fig. 1 according to an exemplary embodiment;

Fig. 5 is a block diagram of a polyphone disambiguation apparatus according to an exemplary embodiment;

Fig. 6 is a block diagram of another polyphone disambiguation apparatus according to an exemplary embodiment;

Fig. 7 is a block diagram of the polyphone disambiguation model training module 601 in the block diagram of the other polyphone disambiguation apparatus shown in Fig. 6 according to an exemplary embodiment;

Fig. 8 is a block diagram of the polyphone disambiguation module 503 in the block diagram of the polyphone disambiguation apparatus shown in Fig. 5 according to an exemplary embodiment;

Fig. 9 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description

Exemplary embodiments will be described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with some aspects of the invention as recited in the appended claims.
It should be noted that, in the embodiments of the present application, the target polyphone disambiguation model is preferably the fully convolutional model Unet. The Unet structure is chosen because it can replace the recurrent neural network (RNN) model used in the prior art and thereby achieve better polyphone disambiguation. The fully convolutional Unet model is most often used in the field of image segmentation; it is named after its symmetric U-shaped structure, with convolutional layers on the left side and upsampling layers on the right. The feature map produced by each convolutional layer is concatenated onto the corresponding upsampling layer, ensuring that the features obtained at every level are used in subsequent computation. This allows the Unet model to combine features from all levels, improving the model's grasp of global features and hence the performance of the polyphone disambiguation model. The present application uses the fully convolutional Unet model as an embodiment of the target polyphone disambiguation model; in practical applications, other fully convolutional models, including but not limited to IDCNN, may also be applied to the technical solutions of the present application.
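The U-shaped skip connection described above (each encoder feature map concatenated onto the corresponding upsampling layer) can be sketched at the shape level for a 1-D character sequence as follows. This is an illustrative NumPy sketch, not the patent's actual model: the identity "processing" stands in for convolutions, and all function names are hypothetical.

```python
import numpy as np

def downsample(x):
    """Halve the sequence length by max-pooling pairs of timesteps."""
    t, c = x.shape
    return x[: t - t % 2].reshape(-1, 2, c).max(axis=1)

def upsample(x):
    """Double the sequence length by repeating each timestep."""
    return np.repeat(x, 2, axis=0)

def unet_skip_sketch(x):
    """One encoder/decoder level with a Unet-style skip connection.

    The encoder feature map `x` is pooled, upsampled back to the original
    length, and concatenated with the saved encoder features along the
    channel axis, so later layers see both coarse and fine features.
    """
    skip = x                       # encoder features kept for the skip
    bottleneck = downsample(x)     # (T/2, C)
    up = upsample(bottleneck)      # back to (T, C)
    up = up[: skip.shape[0]]       # crop in case T was odd
    return np.concatenate([skip, up], axis=1)  # (T, 2C)

seq = np.random.rand(8, 4)         # 8 characters, 4 features each
out = unet_skip_sketch(seq)
print(out.shape)                   # (8, 8): channels doubled by the skip
```

A real Unet would apply trained convolutions at each level and stack several such levels; the sketch only shows how the skip concatenation preserves per-character resolution while mixing in downsampled context.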
Fig. 1 is a flowchart of a polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 1, the method includes the following steps.

Step 101: divide the text to be processed into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters.

It should be noted that, in the embodiments of the present application, polyphone disambiguation is, from the perspective of the task as a whole, a sequence-to-sequence labeling task; broken down, however, it is a classification task for each character. For example, the pronunciation options of "行" are xing2 and hang2, where 2 indicates that the tone of the character is yang ping (the second tone); the pronunciation options of "转" are zhuan2, zhuan3, and zhuan4, where 3 indicates shang sheng (the third tone) and 4 indicates qu sheng (the fourth tone).

Therefore, the text to be processed can first be divided into individual characters. For example, taking "登陆人行征信系统" ("log in to the People's Bank credit reference system") as the text to be processed and dividing it into individual characters yields "登, 陆, 人, 行, 征, 信, 系, 统".
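A minimal sketch of this per-character split, with a small pronunciation-option table taken from the examples above (the table and variable names are illustrative, not part of the patent):

```python
# Candidate pronunciations per polyphonic character, as in the examples
# above (trailing digits are Mandarin tone numbers).
PRONUNCIATIONS = {
    "行": ["xing2", "hang2"],
    "转": ["zhuan2", "zhuan3", "zhuan4"],
}

def split_chars(text):
    """Divide the text to be processed into individual characters."""
    return list(text)

chars = split_chars("登陆人行征信系统")
print(chars)        # ['登', '陆', '人', '行', '征', '信', '系', '统']

# Characters with an entry in PRONUNCIATIONS are target polyphonic
# characters; the rest are non-target characters.
targets = [c for c in chars if c in PRONUNCIATIONS]
print(targets)      # ['行']
```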
Further, in the embodiments of the present application, Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 2, the following steps may also be included before step 101.

Step 201: pre-generate the target polyphone disambiguation model.

Further, in the embodiments of the present application, Fig. 3 is a flowchart of step 201 in the flowchart of the other polyphone disambiguation method shown in Fig. 2 according to an exemplary embodiment; as shown in Fig. 3, step 201 may further include the following steps.

Step 301: obtain training samples, wherein the training samples include several sample texts and annotation information for the target polyphonic characters in those sample texts, the annotation information indicating the pronunciation of each target polyphonic character in the sample text and a second identifier corresponding to that pronunciation.

Step 302: take the several sample texts as input and the annotation information of the target polyphonic characters in those sample texts as the output target, train a preset initial model, and determine the trained model as the target polyphone disambiguation model.
It should be noted that, in the embodiments of the present application, the preset initial model can be trained on the training samples, and the trained model is the target polyphone disambiguation model, where the target polyphone disambiguation model has the fully convolutional Unet structure. Specifically, the several sample texts contained in the obtained training samples serve as the input of the initial model, and the annotation information of the target polyphonic characters in those sample texts serves as the target of the initial model's output; the preset initial model is trained, and the trained model is determined as the target polyphone disambiguation model, i.e. the fully convolutional model Unet.

For example, for the pronunciation hang2 of the polyphonic character "行", ten thousand sentences containing the character "行" are obtained; among them, the various natural-language pronunciations of "行", such as hang2 and xing2, must be covered. In a text where the pronunciation is hang2, every character other than the polyphonic character "行" is tagged with the identifier 0, while the character "行" pronounced hang2 obtains its corresponding annotation information from a preset polyphone list, e.g. hang2_4, where 2 indicates that the tone of the character is yang ping (the second tone) and 4 is the identifier of this character in the polyphone list. The ten thousand sentences containing the polyphonic character "行" with the pronunciation hang2 serve as the input of the initial model, and the corresponding annotation information, i.e. hang2_4, serves as the target of the output; the initial model is trained, and the trained model is the target polyphone disambiguation model. The present application places no specific limitation on the form of the annotation information or on the specific values of the identifiers.
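The labeling scheme above (0 for non-target characters, a pinyin-plus-list-identifier tag such as hang2_4 for the target polyphonic character) can be sketched as follows. The polyphone-list entry and identifier 4 come from the example; everything else, including the function name, is illustrative:

```python
# Preset polyphone list mapping a (character, pronunciation) pair to its
# identifier, following the hang2_4 example (the id 4 is illustrative).
POLYPHONE_LIST = {("行", "hang2"): 4}

def build_labels(text, target_char, pronunciation):
    """Tag non-target characters '0' and the target character with its
    pronunciation joined to its polyphone-list identifier."""
    list_id = POLYPHONE_LIST[(target_char, pronunciation)]
    return [
        f"{pronunciation}_{list_id}" if ch == target_char else "0"
        for ch in text
    ]

labels = build_labels("登陆人行征信系统", "行", "hang2")
print(labels)   # ['0', '0', '0', 'hang2_4', '0', '0', '0', '0']
```

Each sentence paired with such a label sequence forms one training sample for the sequence-labeling model.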
By pre-training the target polyphone disambiguation model, which is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional Unet model in the polyphone disambiguation method, thereby improving the efficiency of polyphone disambiguation.
Further, in the embodiments of the present application, Fig. 2 is a flowchart of another polyphone disambiguation method according to an exemplary embodiment; as shown in Fig. 2, the following step may also be included before step 101.

Step 202: obtain the text to be processed.

It should be noted that, in the embodiments of the present application, the text to be processed may be any text that includes at least one target polyphonic character. The text to be processed may be obtained in any manner; the present application places no specific limitation on how it is obtained.
Step 102: for each character, obtain the first identifier corresponding to the character.

It should be noted that, in the embodiments of the present application, a first identifier is added to each character into which the text to be processed has been divided. For example, adding first identifiers to the divided characters "登, 陆, 人, 行, 征, 信, 系, 统" yields "登_1, 陆_2, 人_3, 行_4, 征_5, 信_6, 系_7, 统_8". The present application places no specific limitation on the specific values of the first identifiers or on the form in which they are added.
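Attaching the first identifiers as above amounts to a simple positional enumeration; the char_id string form below mirrors the example and is one of the unconstrained formats the patent allows:

```python
def add_first_identifiers(chars):
    """Attach a 1-based positional identifier to each character, in the
    char_id form used in the example above."""
    return [f"{ch}_{i}" for i, ch in enumerate(chars, start=1)]

tagged = add_first_identifiers(list("登陆人行征信系统"))
print(tagged)   # ['登_1', '陆_2', '人_3', '行_4', '征_5', '信_6', '系_7', '统_8']
```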
Step 103: input the characters and their corresponding first identifiers into the pre-generated target polyphone disambiguation model, and determine the pronunciation of each target polyphonic character according to the output of that model.
Further, in the embodiments of the present application, Fig. 4 is a flowchart of step 103 in the flowchart of the polyphone disambiguation method shown in Fig. 1 according to an exemplary embodiment; as shown in Fig. 4, step 103 may further include the following steps.

Step 401: obtain the annotation information of the target polyphonic character according to the output of the target polyphone disambiguation model.

Step 402: determine the pronunciation of the target polyphonic character according to that annotation information.

It should be noted that, in the embodiments of the present application, the annotation information of the target polyphonic character is obtained from the output of the target polyphone disambiguation model, i.e. the fully convolutional model Unet; the annotation information contains the pronunciation of the target polyphonic character and the second identifier corresponding to that character, from which the pronunciation of the target polyphonic character can be determined. For example, if the annotation information xing2_8 is obtained for the polyphonic character "行", then the identifier of this character in the polyphone list is 8 and its pronunciation is xing in yang ping (the second tone).
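Decoding such an annotation string back into a pronunciation and a list identifier is a straightforward split; the field layout assumed below (pinyin syllable, trailing tone digit, underscore, polyphone-list id) is inferred from the xing2_8 and hang2_4 examples:

```python
def parse_annotation(label):
    """Split an annotation such as 'xing2_8' into (pinyin, tone, list_id).

    Assumes the layout shown in the examples: pinyin syllable followed by
    a tone digit, an underscore, then the polyphone-list identifier.
    """
    pronunciation, list_id = label.rsplit("_", 1)
    pinyin, tone = pronunciation[:-1], int(pronunciation[-1])
    return pinyin, tone, int(list_id)

print(parse_annotation("xing2_8"))   # ('xing', 2, 8)
print(parse_annotation("hang2_4"))   # ('hang', 2, 4)
```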
本申请通过将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符;针对每个字符,获取所述字符对应的第一标识;将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。通过本申请的实施例提供的技术方案,利用目标多音字消歧模型,也即全卷积模型Unet,对待处理文本中的目标多音字字符进行多音字消歧,由于全卷积模型Unet更易并行化,不需要保留以及更新state,因而输出之间不存在依序关系,进而通过利用全卷积模型Unet不仅能够提高多音字消歧的预测速度,同时还能够很好地对上下文信息进行把握,进一步地,提高了多音字消歧的效果。本申请通过预先生成目标多音字消歧模型,其中,目标多音字消歧模型为全卷积模型Unet,进而能够将预先训练好的全卷积模型Unet应用于多音字消歧的方法中,进一步地,提高了多音字消歧的处理效率和预测准确率。In the present application, the text to be processed is divided into several characters, wherein the several characters include target polyphonic characters and non-target polyphonic characters; for each character, the first identifier corresponding to the character is obtained; the The character and the first identifier corresponding to the character are input into a pre-generated target polyphone disambiguation model, and the pronunciation of the target polyphone character is determined according to the output of the target polyphone disambiguation model. Through the technical solutions provided by the embodiments of the present application, the target polyphone disambiguation model, that is, the full convolution model Unet, is used to disambiguate the target polyphone characters in the text to be processed. Since the full convolution model Unet is easier to parallelize Therefore, there is no sequential relationship between the outputs. By using the fully convolutional model Unet, the prediction speed of polyphonic word disambiguation can not only be improved, but also the context information can be well grasped. Further, the effect of polyphonic word disambiguation is improved. 
By generating the target polyphonic-character disambiguation model in advance, where that model is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional model Unet in the polyphonic-character disambiguation method, further improving both the processing efficiency and the prediction accuracy of polyphonic-character disambiguation.
图5是根据一示例性实施例示出的一种多音字消歧的装置框图,参照图5,该装置包括划分模块501、第一标识获取模块502、多音字消歧模块503。FIG. 5 is a block diagram of an apparatus for polyphonic-character disambiguation according to an exemplary embodiment. Referring to FIG. 5, the apparatus includes a division module 501, a first identifier acquisition module 502, and a polyphonic-character disambiguation module 503.
划分模块501,用于将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符。The division module 501 is configured to divide the text to be processed into several characters, where the several characters include target polyphonic characters and non-target polyphonic characters.
第一标识获取模块502,用于针对每个字符,获取所述字符对应的第一标识。The first identifier acquisition module 502 is configured to acquire, for each character, the first identifier corresponding to that character.
多音字消歧模块503,用于将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。The polyphonic-character disambiguation module 503 is configured to input the character and its corresponding first identifier into the pre-generated target polyphonic-character disambiguation model, and to determine the pronunciation of the target polyphonic character from the output of that model.
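The three modules above (divide, acquire identifiers, disambiguate) can be sketched as plain functions. This is an illustration only: the character-to-identifier vocabulary, the out-of-vocabulary convention, and the stubbed model below are all assumptions, since the patent does not specify how the first identifiers are assigned or how the Unet is invoked.

```python
# Hypothetical sketch of the divide -> identify -> disambiguate pipeline
# described for modules 501-503.

VOCAB = {"银": 1, "行": 2, "很": 3, "好": 4}  # character -> first identifier (assumed)

def divide(text: str) -> list[str]:
    """Module 501: split the text to be processed into individual characters."""
    return list(text)

def first_ids(chars: list[str]) -> list[int]:
    """Module 502: look up the first identifier for each character."""
    return [VOCAB.get(c, 0) for c in chars]  # 0 = out-of-vocabulary (assumed)

def disambiguate(chars, ids, model) -> dict:
    """Module 503: feed characters and identifiers to the pre-trained model.

    `model` stands in for the fully convolutional Unet: it maps the input
    sequence to a label such as 'hang2_8' for each target polyphonic character.
    """
    return model(chars, ids)

chars = divide("银行很好")
ids = first_ids(chars)
labels = disambiguate(chars, ids, lambda c, i: {"行": "hang2_8"})
print(labels)  # {'行': 'hang2_8'}
```

Because every output label depends only on the convolutional receptive field around its character, the per-character predictions can be computed in parallel, which is the speed advantage the text attributes to the fully convolutional model.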
可选地,图6是根据一示例性实施例示出的另一种多音字消歧的装置框图。参照图6,该装置包括多音字消歧模型训练模块601、待处理文本获取模块602。Optionally, FIG. 6 is a block diagram of another apparatus for polyphonic-character disambiguation according to an exemplary embodiment. Referring to FIG. 6, the apparatus includes a polyphonic-character disambiguation model training module 601 and a to-be-processed text acquisition module 602.
多音字消歧模型训练模块601,用于预先生成目标多音字消歧模型。The polyphonic-character disambiguation model training module 601 is configured to generate the target polyphonic-character disambiguation model in advance.
待处理文本获取模块602,用于获取待处理文本。The to-be-processed text acquisition module 602 is configured to acquire the text to be processed.
可选地,图7是图6所示的根据一示例性实施例示出的另一种多音字消歧的装置框图中多音字消歧模型训练模块601的装置框图。参照图7,该装置包括训练样本获取单元701、多音字消歧模型训练单元702。Optionally, FIG. 7 is a block diagram of the polyphonic-character disambiguation model training module 601 in the apparatus shown in FIG. 6. Referring to FIG. 7, the module includes a training sample acquisition unit 701 and a polyphonic-character disambiguation model training unit 702.
训练样本获取单元701,用于获取训练样本,其中,所述训练样本包括若干样本文本和若干所述样本文本中目标多音字字符的标注信息,所述标注信息用于指示所述样本文本中目标多音字字符的发音以及所述样本文本中目标多音字字符的发音对应的第二标识。The training sample acquisition unit 701 is configured to acquire training samples, where the training samples include several sample texts and the labeling information of the target polyphonic characters in those sample texts; the labeling information indicates the pronunciation of each target polyphonic character in the sample text and the second identifier corresponding to that pronunciation.
多音字消歧模型训练单元702,用于将所述若干样本文本作为输入,将若干所述样本文本中目标多音字字符的标注信息作为输出的目标,对预设的初始模型进行训练,将训练完成的模型确定为目标多音字消歧模型。The polyphonic-character disambiguation model training unit 702 is configured to train a preset initial model with the several sample texts as input and the labeling information of the target polyphonic characters in those sample texts as the training target, and to determine the trained model as the target polyphonic-character disambiguation model.
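A training sample in the form the unit above describes pairs a sample text with per-position labeling information. The sketch below shows one plausible shape for such samples; the dataclass layout and the example sentences are illustrative assumptions, and only the "pinyin+tone, underscore, second identifier" label format (e.g. "xing2_8") comes from the text.

```python
from dataclasses import dataclass

@dataclass
class TrainingSample:
    text: str               # sample text
    labels: dict[int, str]  # character position -> label "pinyin+tone_secondId"

# Two hypothetical samples for the polyphonic character 行 (list identifier 8,
# per the earlier example): read xing2 in 自行车 and hang2 in 银行.
samples = [
    TrainingSample("自行车很快", {1: "xing2_8"}),
    TrainingSample("银行在左边", {1: "hang2_8"}),
]

# A trainer would feed sample.text (and the first identifiers derived from it)
# as input, and sample.labels as the target the initial model is fit against.
for s in samples:
    print(s.text, s.labels)
```

Note that both readings of 行 carry the same second identifier (8), since that identifier indexes the character in the polyphonic-character list, while the pinyin-with-tone prefix distinguishes the pronunciations.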
可选地,图8是图5所示的根据一示例性实施例示出的一种多音字消歧的装置框图中多音字消歧模块503的装置框图。参照图8,该装置包括标注信息获取单元801、目标多音字字符的发音获取单元802。Optionally, FIG. 8 is a block diagram of the polyphonic-character disambiguation module 503 in the apparatus shown in FIG. 5. Referring to FIG. 8, the module includes a labeling information acquisition unit 801 and a pronunciation acquisition unit 802 for the target polyphonic character.
标注信息获取单元801,用于根据所述目标多音字消歧模型的输出得到所述目标多音字字符的标注信息。The labeling information acquisition unit 801 is configured to obtain the labeling information of the target polyphonic character from the output of the target polyphonic-character disambiguation model.
目标多音字字符的发音获取单元802,用于根据所述目标多音字字符的标注信息确定所述目标多音字字符的发音。The pronunciation acquisition unit 802 is configured to determine the pronunciation of the target polyphonic character from its labeling information.
本申请通过将待处理文本划分为若干个字符,其中,所述若干个字符包括目标多音字字符和非目标多音字字符;针对每个字符,获取所述字符对应的第一标识;将所述字符以及所述字符对应的第一标识输入至预先生成的目标多音字消歧模型,根据所述目标多音字消歧模型的输出确定所述目标多音字字符的发音。通过本申请的实施例提供的技术方案,利用目标多音字消歧模型,也即全卷积模型Unet,对待处理文本中的目标多音字字符进行多音字消歧,由于全卷积模型Unet更易并行化,不需要保留以及更新state,因而输出之间不存在依序关系,进而通过利用全卷积模型Unet不仅能够提高多音字消歧的预测速度,同时还能够很好地对上下文信息进行把握,进一步地,提高了多音字消歧的效果。In the present application, the text to be processed is divided into several characters, where the several characters include target polyphonic characters and non-target polyphonic characters; for each character, the first identifier corresponding to the character is acquired; and the character together with its corresponding first identifier is input into a pre-generated target polyphonic-character disambiguation model, with the pronunciation of the target polyphonic character determined from the output of that model. With the technical solutions provided by the embodiments of the present application, the target polyphonic-character disambiguation model, that is, the fully convolutional model Unet, is used to disambiguate the target polyphonic characters in the text to be processed. Because the fully convolutional model Unet is easier to parallelize and does not need to keep or update state, there is no sequential dependency between its outputs; using the fully convolutional model Unet therefore not only improves the prediction speed of polyphonic-character disambiguation but also captures context information well, further improving the disambiguation effect.
By generating the target polyphonic-character disambiguation model in advance, where that model is the fully convolutional model Unet, the present application can apply the pre-trained fully convolutional model Unet in the polyphonic-character disambiguation method, further improving both the processing efficiency and the prediction accuracy of polyphonic-character disambiguation.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the apparatus in the above-mentioned embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.
图9是根据一示例性实施例示出的一种用于电子设备900的框图。例如,电子设备900可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。FIG. 9 is a block diagram of an electronic device 900 according to an exemplary embodiment. For example, the electronic device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
参照图9,电子设备900可以包括以下一个或多个组件:处理组件902,存储器904,电源组件906,多媒体组件908,音频组件910,输入/输出接口912,传感器组件914,以及通信组件916。Referring to FIG. 9, the electronic device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output interface 912, a sensor component 914, and a communication component 916.
处理组件902通常控制装置900的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件902可以包括一个或多个处理器920来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件902可以包括一个或多个模块,便于处理组件902和其他组件之间的交互。例如,处理组件902可以包括多媒体模块,以方便多媒体组件908和处理组件902之间的交互。The processing component 902 generally controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 902 may include one or more processors 920 to execute instructions so as to complete all or part of the steps of the above method. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
存储器904被配置为存储各种类型的数据以支持在设备900的操作。这些数据的示例包括用于在装置900上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。存储器904可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions of any application or method operated on the device 900, contact data, phonebook data, messages, pictures, videos, and the like. The memory 904 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
电源组件906为电子设备900的各种组件提供电力。电源组件906可以包括电源管理系统,一个或多个电源,及其他与为电子设备900生成、管理和分配电力相关联的组件。The power component 906 supplies power to the various components of the electronic device 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 900.
多媒体组件908包括在所述电子设备900和用户之间的提供一个输出接口的屏幕。在一些实施例中,屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件908包括一个前置摄像头和/或后置摄像头。当电子设备900处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。The multimedia component 908 includes a screen that provides an output interface between the electronic device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the electronic device 900 is in an operation mode such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focal length and optical zoom capability.
音频组件910被配置为输出和/或输入音频信号。例如,音频组件910包括一个麦克风(MIC),当电子设备900处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器904或经由通信组件916发送。在一些实施例中,音频组件910还包括一个扬声器,用于输出音频信号。The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the electronic device 900 is in an operation mode such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
输入/输出接口912为处理组件902和外围接口模块之间提供接口,上述外围接口模块可以是键盘,点击轮,按钮等。这些按钮可包括但不限于:主页按钮、音量按钮、启动按钮和锁定按钮。The input/output interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
传感器组件914包括一个或多个传感器,用于为电子设备900提供各个方面的状态评估。例如,传感器组件914可以检测到电子设备900的打开/关闭状态,组件的相对定位,例如所述组件为电子设备900的显示器和小键盘,传感器组件914还可以检测电子设备900或电子设备900一个组件的位置改变,用户与电子设备900接触的存在或不存在,电子设备900方位或加速/减速和电子设备900的温度变化。传感器组件914可以包括接近传感器,被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件914还可以包括光传感器,如CMOS或CCD图像传感器,用于在成像应用中使用。在一些实施例中,该传感器组件914还可以包括加速度传感器,陀螺仪传感器,磁传感器,压力传感器或温度传感器。The sensor component 914 includes one or more sensors for providing state assessments of various aspects of the electronic device 900. For example, the sensor component 914 may detect the open/closed state of the electronic device 900 and the relative positioning of components (for example, the display and keypad of the electronic device 900); it may also detect a change in position of the electronic device 900 or one of its components, the presence or absence of user contact with the electronic device 900, the orientation or acceleration/deceleration of the electronic device 900, and a change in its temperature. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
通信组件916被配置为便于电子设备900和其他设备之间有线或无线方式的通信。电子设备900可以接入基于通信标准的无线网络,如WiFi,运营商网络(如2G、3G、4G或5G),或它们的组合。在一个示例性实施例中,通信组件916经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,所述通信组件916还包括近场通信(NFC)模块,以促进短程通信。例如,NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。The communication component 916 is configured to facilitate wired or wireless communication between the electronic device 900 and other devices. The electronic device 900 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 916 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 916 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
在示例性实施例中,电子设备900可以被一个或多个应用专用集成电路(ASIC)、数字信号处理器(DSP)、数字信号处理设备(DSPD)、可编程逻辑器件(PLD)、现场可编程门阵列(FPGA)、控制器、微控制器、微处理器或其他电子元件实现,用于执行上述方法。In an exemplary embodiment, the electronic device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above method.
在示例性实施例中,还提供了一种包括指令的非临时性计算机可读存储介质,例如包括指令的存储器904,上述指令可由电子设备900的处理器920执行以完成上述方法。例如,所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 904 including instructions executable by the processor 920 of the electronic device 900 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本发明的其它实施方案。本发明旨在涵盖本发明的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本发明的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本发明的真正范围和精神由下面的权利要求指出。Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present invention is intended to cover any modifications, uses, or adaptations of the present invention that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in this application. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
应当理解的是,本发明并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本发明的范围仅由所附的权利要求来限制。It should be understood that the present invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210086347.7A CN114550691A (en) | 2022-01-25 | 2022-01-25 | A polyphonic word disambiguation method, device, electronic device and readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114550691A true CN114550691A (en) | 2022-05-27 |
Family
ID=81670657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210086347.7A Pending CN114550691A (en) | 2022-01-25 | 2022-01-25 | A polyphonic word disambiguation method, device, electronic device and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114550691A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115273809A (en) * | 2022-06-22 | 2022-11-01 | 北京市商汤科技开发有限公司 | Training method, voice generation method and device for polyphonic word pronunciation prediction network |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110377882A (en) * | 2019-07-17 | 2019-10-25 | 标贝(深圳)科技有限公司 | For determining the method, apparatus, system and storage medium of the phonetic of text |
CN111599340A (en) * | 2020-07-27 | 2020-08-28 | 南京硅基智能科技有限公司 | Polyphone pronunciation prediction method and device and computer readable storage medium |
CN111611810A (en) * | 2020-05-29 | 2020-09-01 | 河北数云堂智能科技有限公司 | Polyphone pronunciation disambiguation device and method |
CN111967260A (en) * | 2020-10-20 | 2020-11-20 | 北京金山数字娱乐科技有限公司 | Polyphone processing method and device and model training method and device |
CN112818657A (en) * | 2019-11-15 | 2021-05-18 | 北京字节跳动网络技术有限公司 | Method and device for determining polyphone pronunciation, electronic equipment and storage medium |
CN113380223A (en) * | 2021-05-26 | 2021-09-10 | 标贝(北京)科技有限公司 | Method, device, system and storage medium for disambiguating polyphone |
- 2022-01-25: CN application CN202210086347.7A filed; patent CN114550691A (en); status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2017535007A (en) | Classifier training method, type recognition method and apparatus | |
CN109961791B (en) | Voice information processing method and device and electronic equipment | |
CN109558599B (en) | Conversion method and device and electronic equipment | |
CN107133354B (en) | Method and device for acquiring image description information | |
WO2021046958A1 (en) | Speech information processing method and apparatus, and storage medium | |
US11335348B2 (en) | Input method, device, apparatus, and storage medium | |
CN111326138A (en) | Voice generation method and device | |
CN112735396A (en) | Speech recognition error correction method, device and storage medium | |
CN110619325A (en) | Text recognition method and device | |
CN114154459A (en) | Speech recognition text processing method, device, electronic device and storage medium | |
CN113923517B (en) | Background music generation method and device and electronic equipment | |
CN115394283A (en) | Speech synthesis method, speech synthesis device, electronic equipment and storage medium | |
CN114550691A (en) | A polyphonic word disambiguation method, device, electronic device and readable storage medium | |
CN114155849A (en) | Method, device and medium for processing virtual objects | |
CN105913841B (en) | Voice recognition method, device and terminal | |
CN112837668A (en) | Voice processing method and device for processing voice | |
WO2024124913A1 (en) | Entity information determining method and apparatus, and device | |
CN113807540B (en) | A data processing method and device | |
CN113115104B (en) | Video processing method and device, electronic equipment and storage medium | |
CN111414731B (en) | Text labeling method and device | |
CN115409200A (en) | Database operation method, device and medium | |
CN113345451B (en) | Sound changing method and device and electronic equipment | |
CN113221581B (en) | Text translation method, device and storage medium | |
CN110334338B (en) | Word segmentation method, device and equipment | |
CN115273852B (en) | Voice response method, device, readable storage medium and chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |