WO2021102754A1 - Data processing method and device and storage medium - Google Patents
- Publication number
- WO2021102754A1 (PCT/CN2019/121331)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- knowledge
- simultaneous interpretation
- recognized text
- additional information
- audio data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
Definitions
- This application relates to simultaneous interpretation technology, in particular to a data processing method, device and storage medium.
- Machine simultaneous interpretation technology is a speech translation product for conferences, reports, and other scenarios that has emerged in recent years. It combines automatic speech recognition (ASR) technology and machine translation (MT) technology to provide multilingual subtitles for the speaker's speech content, in place of manual simultaneous interpretation services.
- In related technologies, the speech content is usually translated and displayed as text, but the displayed content alone cannot enable users to fully and accurately understand the speech content.
- the embodiments of the present application provide a data processing method, device, and storage medium.
- the embodiment of the present application provides a data processing method, including:
- obtaining audio data; recognizing the audio data to obtain recognized text; extracting at least two feature data of the recognized text, and searching the knowledge graph database for additional information associated with each of the at least two feature data; using the found additional information and the recognized text to generate a simultaneous interpretation result; and outputting the simultaneous interpretation result, where the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- the embodiment of the present application also provides a data processing device, including:
- the obtaining unit is configured to obtain audio data
- the first processing unit is configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
- the second processing unit is configured to use the found additional information and the recognized text to generate a simultaneous interpretation result;
- the output unit is configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- the embodiment of the present application further provides a data processing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above-mentioned methods when executing the program.
- the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any one of the foregoing methods are implemented.
- The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is presented on the first terminal when the audio data is played. Because the user is provided with additional information associated with the audio data, the user can more fully and accurately understand the speech content of the producer of the audio data, which reduces the user's difficulty in understanding the speech content.
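The method summarized above can be sketched end to end as follows. This is a minimal illustration only: the ASR stand-in, the toy knowledge-graph dictionary, and all function names are assumptions for demonstration, not components named by the application.

```python
from typing import Dict, List

def recognize(audio_data: bytes) -> str:
    # Stand-in for an ASR engine (hypothetical); returns recognized text.
    return "Ren Zhengfei founded Huawei"

def extract_features(text: str) -> List[str]:
    # Stand-in for keyword + entity extraction producing "at least two
    # feature data"; a real system would not use a fixed entity list.
    known_entities = ["Ren Zhengfei", "Huawei"]
    return [e for e in known_entities if e in text]

# Toy knowledge-graph database: entity word -> additional information.
KNOWLEDGE_GRAPH: Dict[str, str] = {
    "Huawei": "Chinese telecommunications equipment company",
    "Ren Zhengfei": "founder of Huawei",
}

def simultaneous_interpretation(audio_data: bytes) -> dict:
    text = recognize(audio_data)
    features = extract_features(text)
    additional = {f: KNOWLEDGE_GRAPH[f] for f in features if f in KNOWLEDGE_GRAPH}
    # The simultaneous interpretation result combines the recognized text
    # with the additional information found, for presentation on the first
    # terminal when the audio data is played.
    return {"text": text, "additional_info": additional}
```

The point of the sketch is the data flow (recognize, extract features, look up additional information, combine), not any particular model choice.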
- Figure 1 is a schematic diagram of the implementation process of simultaneous interpretation in related technologies
- FIG. 2 is a schematic diagram of an implementation process of a data processing method according to an embodiment of the application
- FIG. 3 is a schematic diagram 1 of the implementation process of extracting at least two feature data of the recognized text by the server according to the embodiment of the application;
- FIG. 4 is a schematic diagram 1 of the implementation process of extracting at least two feature data of the translated text by the server according to the embodiment of the application;
- FIG. 5 is a second schematic diagram of the implementation process of extracting at least two feature data of the recognized text by the server according to the embodiment of the application;
- FIG. 6 is a second schematic diagram of the implementation process of extracting at least two feature data of the translated text by the server according to the embodiment of the application;
- FIG. 7 is a schematic diagram of the implementation process of searching the additional information corresponding to the recognized text by the server according to the embodiment of the application;
- FIG. 8 is a schematic diagram of the implementation process of searching the additional information corresponding to the translated text by the server according to the embodiment of the application;
- FIG. 9 is a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text according to the embodiment of the application;
- FIG. 10 is a schematic diagram of the implementation process of searching the additional information corresponding to the translated text by the server according to the embodiment of the application;
- FIG. 11 is a schematic diagram of the simultaneous interpretation result output by the server according to an embodiment of the application.
- FIG. 12 is a schematic diagram of an implementation process of the server generating and outputting simultaneous interpretation results according to an embodiment of the application;
- FIG. 13 is a schematic diagram of another implementation process of the server generating and outputting simultaneous interpretation results according to an embodiment of the application;
- FIG. 14 is a schematic diagram of a structure of a data processing device according to an embodiment of the application.
- FIG. 15 is a schematic diagram of another composition structure of the data processing device according to an embodiment of the application.
- Machine simultaneous interpretation technology is a speech translation product for conferences, reports, and other scenarios that has appeared in recent years. It combines artificial intelligence (AI) technology, MT, ASR, and text-to-speech (TTS) technology to realize simultaneous interpretation (SI).
- Machine simultaneous interpretation may also be referred to as machine SI, AI simultaneous interpretation, and the like.
- The lecturer can give a conference lecture through the client and project the displayed content onto a display screen, where it is shown to the user.
- Figure 1 is a schematic diagram of the implementation process of simultaneous interpretation in related technologies.
- The client uses a microphone to collect the speaker's audio and sends the collected audio to the server.
- The server recognizes the audio data to obtain the recognized text corresponding to the source language, and then performs machine translation on the recognized text to obtain the translation result corresponding to the target language; finally, the translation result is shown to the user on a display screen or broadcast as speech through headphones and other devices, so that the lecturer's speech content is translated into the language required by the user.
- Simultaneous interpretation solutions in related technologies can display simultaneous interpretation content in different languages for users, but they only interpret the speaker's verbal content and do not translate information related to it (such as professional terms, citations, events, character profiles, etc.). If the user lacks or is unfamiliar with such information, it will be difficult to correctly and fully understand the lecture content, so these solutions do not effectively reduce the difficulty of understanding the lecture content, which degrades the user experience.
- Some current machine simultaneous interpretation technology uses a string matching method to match the interpreted string against a preset dictionary and display the matching result.
- However, the target application scenario of such matching is often not a simultaneous interpretation scenario, and building the preset dictionary requires a lot of manual development, which is time-consuming and labor-intensive.
- Since the content of a speech cannot be determined in advance, it is even more difficult to preset the dictionary, and such a solution can hardly give a full understanding of the speaker's speech content; it lacks flexibility.
- In the embodiments of the present application, audio data is obtained; the audio data is recognized to obtain recognized text; at least two feature data of the recognized text are extracted, and additional information associated with each of the at least two feature data (that is, supplementary information that can help users fully and accurately understand the content of the speech) is searched from the knowledge graph database; the found additional information and the recognized text are used to generate a simultaneous interpretation result; and the simultaneous interpretation result is output, to be presented on the first terminal when the audio data is played.
- FIG. 2 is a schematic diagram of the implementation process of the data processing method according to the embodiment of the application; as shown in FIG. 2, the method includes:
- Step 201 Acquire audio data; recognize the audio data to obtain recognized text;
- The audio data is the audio generated when a user is giving a speech in a scenario where simultaneous interpretation is applied.
- the client may be provided with or connected to a voice collection module, such as a microphone, through which the voice collection module collects the user's speech content in a scene where simultaneous interpretation is applied to obtain the audio data.
- A communication connection is established between the client and the server, and the audio data is sent to the server through a wireless communication module.
- the wireless communication module may be a Bluetooth module, a wireless fidelity (WiFi, Wireless Fidelity) module, or the like.
- the specific type of the client is not limited in this application.
- It may be a smart phone, a personal computer, a notebook computer, a tablet computer, or a portable wearable device.
- the method further includes:
- The server uses speech recognition technology to perform speech recognition on the audio data to obtain the recognized text.
- the language of the recognized text is consistent with the language of the user who generates the audio data in the scenario where simultaneous interpretation is applied, that is, the source language.
- a user gives a speech on how terminals in the communication field perform uplink transmission.
- The speech content contains two terms, "unlicensed spectrum" and "LBT type". After the server obtains the speech content, it can query the knowledge graph database for additional information matching the term "unlicensed spectrum", such as its definition: unlicensed spectrum refers to shared spectrum, in other words, spectrum that communication devices in different communication systems can use as long as they meet the regulatory requirements set by the country or region on the spectrum, without applying for a proprietary spectrum authorization.
- The server can likewise query the knowledge graph database for additional information matching the term "LBT type", such as its definition: the LBT type includes three types.
- LBT Category 1 (Cat1): the communication device does not need to perform channel detection on the unlicensed spectrum, and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
- LBT Category 2 (Cat2): the communication device performs channel detection on the unlicensed spectrum; within a single detection period, if the channel is idle the signal can be sent, and if the channel is occupied the signal cannot be sent; the single detection time is 16 μs or 25 μs.
- LBT Category 4 (Cat4): the communication device performs channel detection on the unlicensed spectrum; the length of the channel detection needs to be further determined according to the priority of the transmission service.
- In this way, based on the recognized text in the source language of the producer of the audio data, the server can subsequently search the knowledge graph database for additional information that helps the recipient of the audio data understand the speech content of the producer of the audio data.
- The server can thus provide the recipient of the audio data not only with the speech content but also with additional information that helps in understanding it, so the presented information and content are richer.
- the method further includes:
- the recognized text is translated using a preset translation model to obtain the translated text.
- the language of the translated text is consistent with the language of the receiver of the audio data, that is, the target language.
- the translation model is used to translate a text in a first language into at least one text in a second language; the first language is different from the second language.
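As a minimal illustration of the translation-model interface described above (text in a first language in, text in a second language out), the following toy phrase-table function is a hypothetical stand-in, not the preset translation model itself; a real system would use a trained machine-translation model.

```python
from typing import Dict, Tuple

# Toy phrase table keyed by (first language, second language); illustrative only.
PHRASE_TABLE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("fr", "en"): {"bonjour": "hello", "monde": "world"},
}

def translate(text: str, first_lang: str, second_lang: str) -> str:
    # The first language must differ from the second language,
    # as stated in the text above.
    assert first_lang != second_lang
    table = PHRASE_TABLE[(first_lang, second_lang)]
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text
```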
- a user gives a speech on how the terminal in the communication field performs uplink transmission.
- The speech content contains two terms, "NRU" and "LBT Category"; after the server obtains the speech content, it translates it according to the target language to obtain the translated text.
- The server can then query the knowledge graph database for additional information matching "unlicensed spectrum" in the translated text, such as its definition: unlicensed spectrum refers to shared spectrum, in other words, communication devices in different communication systems can use the spectrum as long as they meet the regulatory requirements set by the country or region on the spectrum, without applying for a proprietary spectrum authorization from the government.
- The server can also query the knowledge graph database for additional information matching "LBT type" in the translated text, such as its definition: the LBT type includes the following types.
- LBT Category 1 (Cat1): the communication device does not need to perform channel detection on the unlicensed spectrum, and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
- LBT Category 2 (Cat2): the communication device performs channel detection on the unlicensed spectrum; within a single detection period, if the channel is idle the signal can be sent, and if the channel is occupied the signal cannot be sent; the single detection time is 16 μs or 25 μs.
- LBT Category 4 (Cat4): the communication device performs channel detection on the unlicensed spectrum; the length of the channel detection needs to be further determined according to the priority of the transmission service.
- In this way, based on the translated text corresponding to the target language of the recipient of the audio data, the server can subsequently search the knowledge graph database for additional information that helps the recipient understand the speech content of the producer of the audio data, thereby reducing the recipient's difficulty in understanding the speech content.
- Step 202 Extract at least two feature data of the recognized text, and search for additional information associated with each feature data of the at least two feature data from the knowledge graph database;
- the additional information may refer to supplementary explanatory information that can help the user fully and accurately understand the content of the speech.
- the knowledge graph database supports searching for additional information that helps understand the speech content of the audio data producer based on the recognized text corresponding to the source language of the audio data producer.
- the extracting at least two feature data of the recognized text includes:
- keyword extraction is performed on the recognized text to obtain at least two keywords, and entity recognition is performed on the extracted keywords to obtain at least two entity words;
- the at least two entity words are used as at least two feature data of the recognized text.
- Entity recognition refers to the recognition of entity words with specific meanings in the text, such as person names, place names, times, organization names, event names, and professional terms.
- In some embodiments, the server may use keyword extraction technology to extract keywords from the recognized text to obtain at least two keywords; it may also convert the recognized text into multiple pieces of bit data, search the bit data for pieces whose number of bits meets a preset threshold, and use the found bit data that meets the preset condition as keywords.
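One literal reading of the bit-data variant can be sketched as follows. The hash-based conversion, the 8-bit width, and the threshold value are all illustrative assumptions; the application does not specify how text becomes bit data or what the preset condition is.

```python
def to_bit_data(word: str) -> int:
    # Hypothetical conversion of a word to bit data via a stable 8-bit hash.
    return sum(ord(c) for c in word) & 0xFF

def extract_keywords(text: str, threshold: int = 4) -> list:
    # Keep words whose bit data has at least `threshold` set bits
    # (a stand-in for "number of bits meets a preset threshold").
    keywords = []
    for word in text.split():
        set_bits = bin(to_bit_data(word)).count("1")
        if set_bits >= threshold:
            keywords.append(word)
    return keywords
```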
- entity recognition is performed on the at least two keywords to obtain at least two entity words.
- a regular expression is used to search for a keyword matching a preset character string from the at least two keywords, and the searched keyword that matches the preset character string is used as an entity word.
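The regular-expression variant above can be sketched as follows; the particular pattern (capitalized single- or multi-word names) is an illustrative choice of "preset character string", not one given by the application.

```python
import re

# Illustrative preset pattern: one or more capitalized words,
# e.g. "Huawei" or "Ren Zhengfei".
PRESET_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")

def entity_words(keywords: list) -> list:
    # Keywords matching the preset pattern are treated as entity words.
    return [k for k in keywords if PRESET_PATTERN.match(k)]
```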
- the classification model is used to perform entity recognition on the at least two key words to obtain at least two entity words with specific meanings.
- For example, a neural network model is used to map the at least two keywords from input to output to obtain a recognition result; when the recognition result indicates that a keyword has a specific meaning, that keyword is used as an entity word.
- a sequence labeling model is used to perform entity recognition on the at least two keywords to obtain at least two entity words.
- For example, semantic analysis technology is used to analyze the semantics of the at least two keywords to obtain semantic information of the keywords; the parts of speech of the at least two keywords are marked according to the obtained semantic information, and keywords with specific meanings are selected from the marked keywords as entity words.
- a schematic diagram describing the implementation process of extracting at least two feature data of the recognized text by the server, as shown in FIG. 3, includes:
- Step 1 Extract at least two keywords in the recognized text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 2 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- the at least two entity words may be used as the at least two feature data of the recognized text, and the knowledge graph database may be subsequently searched for additional information that helps to understand the speech content of the creator of the audio data To help the receiver of the audio data quickly understand the content of the speech.
- a schematic diagram describing the implementation process of extracting at least two feature data of the translated text by the description server, as shown in FIG. 4, includes:
- Step 1 Translate the recognized text to obtain the translated text.
- the recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
- Step 2 Extract at least two keywords in the translated text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 3 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- the at least two entity words may be used as the at least two feature data of the recognized text, and the knowledge graph database may be subsequently searched for additional information that helps to understand the speech content of the creator of the audio data To help the receiver of the audio data quickly understand the content of the speech.
- In some embodiments, when extracting the at least two feature data, the method further includes:
- the at least two entity words and at least two event-related information are used as at least two feature data of the recognized text.
- the event-related information may be associated with one entity word among the at least two entity words, or may be associated with multiple entity words among the at least two entity words.
- In some embodiments, the found entity words can be combined with preset rules, classification models, and sequence labeling models to extract, from the recognized text, event-related information associated with the corresponding entity words.
- a preset rule is used to extract at least two event-related information from the recognized text in combination with the at least two entity words.
- For example, a regular expression is used to search the recognized text for a first text containing a preset character string, and it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a certain entity word of the at least two entity words, the text information corresponding to the first text is used as the event-related information associated with that entity word.
- the text corresponding to the preset character string may be "event".
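The rule-based variant above can be sketched as follows, using "event" as the preset character string per the example. Sentence splitting and the substring-containment association check are simplifying assumptions; the application does not fix how association is determined.

```python
import re

def event_related_info(recognized_text: str, entity_words: list) -> dict:
    # Map each entity word to the event-related sentences associated with it.
    related = {}
    for sentence in re.split(r"[.!?]\s*", recognized_text):
        if "event" in sentence.lower():           # preset character string
            for entity in entity_words:
                if entity in sentence:            # association check (assumed)
                    related.setdefault(entity, []).append(sentence.strip())
    return related
```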
- a classification model is used to extract at least two event-related information from the recognized text in combination with the at least two entity words.
- For example, a neural network model is used to map the recognized text from input to output to obtain a recognition result; when the recognition result indicates that the recognized text contains a first text with a preset character string, it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with an entity word of the at least two entity words, the text information corresponding to the first text serves as the event-related information associated with that entity word.
- a schematic diagram describing the implementation process of extracting at least two characteristic data of the recognized text by the server, as shown in FIG. 5, includes:
- Step 1 Extract at least two keywords in the recognized text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 2 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- Step 3 Use the at least two entity words to extract event-related information associated with the corresponding entity words in the recognized text to obtain at least two event-related information.
- In this way, the at least two entity words and the at least two event-related information may be used as the at least two feature data of the recognized text, and the knowledge graph database may subsequently be searched for additional information that helps to understand the speech content of the producer of the audio data, helping the receiver of the audio data understand the background of the speech content.
- a schematic diagram describing the implementation process of extracting at least two feature data of the translated text by the server, as shown in FIG. 6, includes:
- Step 1 Translate the recognized text to obtain the translated text.
- the recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
- Step 2 Extract at least two keywords in the translated text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 3 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- Step 4 Use the at least two entity words to extract event-related information associated with the corresponding entity words in the translated text to obtain at least two event-related information.
- searching for additional information associated with each of the at least two feature data from the knowledge graph database includes:
- the knowledge graph database stores the relationship between the index identifier and the knowledge node
- Combining the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the recognized text;
- the context refers to the context of the recognized text.
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text, as shown in FIG. 7, including:
- Step 1 Determine the first index identifier corresponding to each of the at least two entity words.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- Step 2 Use the first index identifier to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 3 Combining the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the recognized text.
- Disambiguation is performed on the knowledge information corresponding to the at least two knowledge nodes, and the obtained at least two second knowledge nodes are:
- the node identifier of the knowledge node corresponding to 01 is B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifier of the knowledge node corresponding to 03 is E.
- Step 4 Acquire knowledge information corresponding to the at least two second knowledge nodes.
- the additional information may be knowledge information corresponding to the at least two second knowledge nodes.
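The Figure 7 flow (index lookup, then context-based disambiguation, then retrieval of knowledge information) can be sketched with toy data as follows. The node contents and the word-overlap heuristic standing in for disambiguation are illustrative assumptions; the application does not specify the disambiguation algorithm.

```python
# Step 1 data: entity word -> first index identifier.
INDEX = {"Ren Zhengfei": "01", "Huawei": "02", "Meng Wanzhou": "03"}

# Step 2 data: index identifier -> candidate knowledge nodes (id -> info).
KNOWLEDGE_NODES = {
    "01": {"A": "a village name", "B": "founder of Huawei"},
    "02": {"C": "telecommunications company"},
    "03": {"D": "a ship name", "E": "Huawei CFO"},
}

def disambiguate(candidates: dict, context: str) -> str:
    # Step 3: keep the node whose knowledge information shares the most
    # words with the context of the recognized text (toy heuristic).
    context_words = set(context.lower().split())
    return max(candidates,
               key=lambda nid: len(set(candidates[nid].split()) & context_words))

def lookup(entity_words: list, context: str) -> dict:
    result = {}
    for word in entity_words:
        candidates = KNOWLEDGE_NODES[INDEX[word]]    # steps 1-2: index lookup
        node_id = disambiguate(candidates, context)  # step 3: disambiguation
        result[word] = candidates[node_id]           # step 4: knowledge info
    return result
```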
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for additional information corresponding to the translated text, as shown in FIG. 8, including:
- Step 1 Determine the first index identifier corresponding to each of the at least two entity words.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- Step 2 Use the first index identifier to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 3 Combining the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the translated text.
- Disambiguation is performed on the knowledge information corresponding to the at least two knowledge nodes, and the obtained at least two second knowledge nodes are:
- the node identifier of the knowledge node corresponding to 01 is B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifier of the knowledge node corresponding to 03 is E.
- Step 4 Acquire knowledge information corresponding to the at least two second knowledge nodes.
- the additional information may be knowledge information corresponding to the at least two second knowledge nodes.
- the using the at least two second knowledge nodes to obtain additional information includes:
- knowledge information is selected, according to its importance level, from the knowledge information corresponding to the at least two second knowledge nodes, and the selected knowledge information is used as the additional information.
- The importance level can be determined according to users' historical access counts for the knowledge information. For example, if the historical access count for a piece of knowledge information is between 0 and 1000, its importance level is the third level, the lowest level; if the historical access count is between 1000 and 2000, its importance level is the second level; if the historical access count is between 2000 and 5000, its importance level is the first level, the highest level.
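The level rule above, together with the later filtering step, can be sketched as follows. Boundary handling at the stated thresholds is an assumption (the text leaves the endpoints ambiguous), and "above the second level" is read here as the second level or higher.

```python
def importance_level(access_count: int) -> int:
    # Map historical access counts to levels 1 (highest) through 3 (lowest).
    if access_count > 2000:        # 2000-5000 and above: first level
        return 1
    if access_count > 1000:        # 1000-2000: second level
        return 2
    return 3                       # 0-1000: third (lowest) level

def select_additional_info(knowledge_info: dict) -> dict:
    # Keep knowledge information whose level is the second level or higher,
    # i.e. drop the lowest-level (most redundant) information.
    return {node: info for node, info in knowledge_info.items()
            if importance_level(info["accesses"]) <= 2}
```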
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text, as shown in FIG. 9, including:
- Step 1 Use the at least two entity words to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B; the node identifier of the knowledge node corresponding to 02 is C; the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 2 Combining the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the recognized text.
- the at least two second knowledge nodes are: the node identifier of the knowledge node corresponding to 01 is B; the node identifier of the knowledge node corresponding to 02 is C; and the node identifier of the knowledge node corresponding to 03 is E.
- Step 3 Sort the knowledge information corresponding to the at least two second knowledge nodes according to the importance level, and use the knowledge information of the knowledge nodes with the importance level above the second level as additional information.
- the knowledge information of nodes whose importance level is above the second level can be identified as knowledge nodes corresponding to B and C as additional information.
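A minimal sketch of Steps 1–3 above, using the example identifiers from the text (index identifiers 01/02/03, nodes A–E). The dictionaries and the `context_nodes` argument, which stands in for the output of the context-based disambiguation of Step 2, are illustrative assumptions, not the patent's actual data structures:

```python
INDEX = {"word1": "01", "word2": "02", "word3": "03"}      # entity word -> first index id
NODES = {"01": ["A", "B"], "02": ["C"], "03": ["D", "E"]}  # index id -> knowledge node ids
KNOWLEDGE = {                                              # node id -> (knowledge info, importance level)
    "A": ("knowledge of A", 3), "B": ("knowledge of B", 1),
    "C": ("knowledge of C", 2), "D": ("knowledge of D", 3),
    "E": ("knowledge of E", 3),
}

def find_additional_info(entity_words, context_nodes, max_level=2):
    """Steps 1-3: index lookup, disambiguation, importance filtering.
    context_nodes holds the node ids already matched to the text's context."""
    additional = []
    for word in entity_words:                          # Step 1: index lookup
        for node in NODES.get(INDEX.get(word, ""), []):
            if node not in context_nodes:              # Step 2: disambiguation result
                continue
            info, level = KNOWLEDGE[node]
            if level <= max_level:                     # Step 3: keep levels above the second
                additional.append(info)
    return additional

print(find_additional_info(["word1", "word2", "word3"], {"B", "C", "E"}))
# → ['knowledge of B', 'knowledge of C']  (E is third level and dropped)
```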
- Taking at least two entity words as an example, FIG. 10 shows a schematic diagram of the process by which the server searches for the additional information corresponding to the translated text, including:
- Step 1: Use the at least two entity words to search the knowledge graph database for at least two knowledge nodes corresponding to the at least two entity words.
- The corresponding first index identifiers are 01, 02, and 03, respectively.
- The node identifiers of the knowledge nodes corresponding to 01 are A and B; the node identifier corresponding to 02 is C; the node identifiers corresponding to 03 are D and E.
- Step 2: Combining the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the translated text.
- The at least two second knowledge nodes are: for 01, the knowledge node with node identifier B; for 02, the knowledge node with node identifier C; and for 03, the knowledge node with node identifier E.
- Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is above the second level as the additional information.
- Here, the knowledge information of the nodes whose importance level is above the second level, namely the knowledge nodes corresponding to B and C, is used as the additional information.
- Removing the knowledge information with a lower importance level from the knowledge information corresponding to the at least two knowledge nodes found can avoid providing redundant additional information to the receiver of the audio data.
- The knowledge graph database not only supports searching, based on the recognized text corresponding to the source language of the producer of the audio data, for additional information that helps to understand the producer's speech content; it also supports searching, based on the translated text corresponding to the target language of the receiver of the audio data, for such additional information.
- The process of using the translated text to find the additional information from the knowledge graph database is similar to the process of using the recognized text to find the additional information from the knowledge graph database.
- Step 203: Use the found additional information and the recognized text to generate a simultaneous interpretation result, and output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- That the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played may mean that the simultaneous interpretation result is presented while the audio data is being played; that is, the data processing method can be applied to simultaneous interpretation scenarios.
- The found additional information and the recognized text are used to generate the simultaneous interpretation result.
- Alternatively, the recognition result is translated to obtain the translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
- The additional information may be searched from the knowledge graph database based on the recognized text, or based on the translated text.
- When generating the simultaneous interpretation result, the method further includes: excluding, based on a preset filtering rule, first information that meets a preset condition from the additional information; and using the second information in the additional information other than the first information, together with the recognized text, to generate the simultaneous interpretation result.
- The preset filtering rule may be that the number of words contained in the additional information is greater than a preset word-count threshold. For example, the number of words contained in the additional information is counted; when the counted number exceeds 100 words, the first information with a lower importance level is deleted from the additional information, ensuring that the additional information stays within 100 words.
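The filtering rule above (drop the lowest-importance items once the combined word count exceeds a threshold, 100 words in the example) can be sketched as follows; the function name and the greedy drop order are assumptions for illustration:

```python
def filter_additional(items, max_words=100):
    """items: list of (text, importance_level), where 1 is most important.
    Drops the least important items until the total word count fits."""
    kept = sorted(items, key=lambda item: item[1])      # most important first
    while kept and sum(len(text.split()) for text, _ in kept) > max_words:
        kept.pop()                                      # remove least important item
    return [text for text, _ in kept]

# 60 + 30 + 50 = 140 words exceeds the budget, so the third-level item goes.
items = [("alpha " * 60, 1), ("gamma " * 50, 3), ("beta " * 30, 2)]
print(len(filter_additional(items)))  # → 2
```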
- The server may use the additional information and the translated text to generate and output the simultaneous interpretation result in the form of an audio stream.
- The outputting of the simultaneous interpretation result includes:
- sending the simultaneous interpretation audio data to the first terminal; the simultaneous interpretation audio data is used for playback by the first terminal.
- The simultaneous interpretation audio data may be played through the headset of the first terminal, so as to help the user of the first terminal understand the speech content of the producer of the audio data.
- The server can also use structured formats such as tables, graphics, and web pages to generate the simultaneous interpretation result based on the additional information and the translated text.
- The outputting of the simultaneous interpretation result includes: sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in a first display box of the display screen and to display the recognized text in a second display box of the display screen.
- Alternatively, the outputting of the simultaneous interpretation result includes: sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in a first display box of the display screen and to display the translated text corresponding to the recognized text in a second display box of the display screen.
- The simultaneous interpretation result of the audio data may be presented through the display screen of the first terminal, so as to help the user of the first terminal understand the speech content of the producer of the audio data.
- FIG. 11 is a schematic diagram of the server outputting the simultaneous interpretation result.
- The first terminal may place the additional information in the simultaneous interpretation result in a first display box at an upper position of the display screen associated with the first terminal; the upper position may be top center, top center-right, top center-left, etc. When the language of the recognized text and the language of the receiver of the audio data belong to the same language, the first terminal may display the recognized text in the simultaneous interpretation result in a second display box at a lower position of the display screen, where the lower position may be bottom center, bottom center-right, bottom center-left, etc. The display mode may include at least one of the following: pictures, multimedia, text boxes, and rich text boxes.
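As an illustrative sketch only (the field names are assumptions, not the patent's actual format), a structured simultaneous interpretation result with the upper and lower display boxes described above might look like:

```python
def build_result(additional_info, main_text, upper_align="center"):
    """Structured result: additional information in the upper (first)
    display box, recognized or translated text in the lower (second) box."""
    return {
        "first_box":  {"position": "top", "align": upper_align,
                       "content": additional_info, "render": "rich_text_box"},
        "second_box": {"position": "bottom", "align": "center",
                       "content": main_text, "render": "text_box"},
    }

result = build_result(["ASR: Automatic Speech Recognition"], "Welcome to the talk")
print(result["first_box"]["position"])   # → top
```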
- Step 1: The client terminal collects the speaker's audio data and sends it to the server.
- The client uses a microphone to collect the content of the speaker's speech to obtain audio data, and sends it to the server.
- Step 2: The server performs speech recognition on the audio data to obtain the recognized text corresponding to the source language.
- Step 3: The server searches the knowledge graph database for additional information matching the recognized text, based on the recognized text.
- The server extracts at least two entity words from the recognized text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words; in the search process, combining the context of the recognized text, the first knowledge nodes that meet a preset condition are excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is obtained.
- The additional information may specifically be entity words/phrases, that is, additional information that supplements difficult-to-understand content.
- The difficult-to-understand content may include technical terms, person names, quoted event names, etc.; the additional information may also be explanations of entity words/phrases, such as definitions of terms, example information, introductions of persons, explanatory information, elements of events, and related impact information.
- Step 4: The server generates a simultaneous interpretation result based on the additional information and the recognized text.
- The found additional information and the recognized text are used to generate the simultaneous interpretation result; alternatively, the recognition result is translated to obtain the translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
- Step 5: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
- The server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal plays the simultaneous interpretation audio data through a headset.
- Step 6: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
- The first terminal displays the additional information in the first display box of the display screen and displays the recognized text in the second display box of the display screen;
- or the first terminal displays the additional information in the first display box of the display screen and displays the translated text corresponding to the recognized text in the second display box of the display screen.
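The steps above (collect, recognize, look up, generate, synthesize, send) can be sketched end-to-end as follows. Every stage is a stub: the real system would call ASR, knowledge-graph, and TTS engines, which are out of scope here, so the function bodies and names are assumptions for illustration:

```python
def recognize(audio: bytes) -> str:          # Step 2: ASR stub
    return "the speaker mentioned ASR"

def lookup(text: str) -> list:               # Step 3: knowledge-graph stub
    return ["ASR: Automatic Speech Recognition"] if "ASR" in text else []

def synthesize(result: str) -> bytes:        # Step 5: TTS stub
    return b"tts:" + result.encode()

def simultaneous_interpret(audio: bytes):
    text = recognize(audio)                  # Step 2: recognized text
    extra = lookup(text)                     # Step 3: additional information
    result = " | ".join([text] + extra)      # Step 4: simultaneous interpretation result
    return synthesize(result), result        # Steps 5-6: audio + display payloads

audio_out, display = simultaneous_interpret(b"raw pcm ...")
print(display)
# → the speaker mentioned ASR | ASR: Automatic Speech Recognition
```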
- Step 1: The client terminal collects the speaker's audio data and sends it to the server.
- The client uses a microphone to collect the content of the speaker's speech to obtain audio data, and sends it to the server.
- Step 2: The server performs speech recognition on the audio data to obtain the recognized text corresponding to the source language.
- Step 3: The server translates the recognized text to obtain the translated text.
- Step 4: The server searches the knowledge graph database for additional information matching the translated text, based on the translated text.
- The server extracts at least two entity words from the translated text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words; in the search process, combining the context of the translated text, the first knowledge nodes that meet a preset condition are excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the translated text, and the additional information corresponding to the at least two second knowledge nodes is obtained.
- The additional information may specifically be entity words/phrases, that is, additional information that supplements difficult-to-understand content.
- The difficult-to-understand content may include technical terms, person names, quoted event names, etc.; the additional information may also be explanations of entity words/phrases, such as definitions of terms, example information, introductions of persons, explanatory information, elements of events, and related impact information.
- Step 5: The server generates a simultaneous interpretation result based on the additional information and the translated text.
- Step 6: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
- The server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal plays the simultaneous interpretation audio data through a headset.
- Step 7: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
- The first terminal displays the additional information in the first display box of the display screen and displays the translated text in the second display box of the display screen.
- The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is used for presentation on the first terminal when the audio data is played. The user is thus provided with additional information about the speech content of the producer of the audio data, which can help the user fully and accurately understand that speech content and reduces the difficulty of understanding it.
- FIG. 14 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 14, the data processing device includes:
- the obtaining unit 141 is configured to obtain audio data
- The first processing unit 142 is configured to recognize the audio data to obtain the recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
- the second processing unit 143 is configured to use the found additional information and the recognized text to generate a simultaneous interpretation result
- the output unit 144 is configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- The first processing unit 142 is configured to extract at least two keywords from the recognized text; for each of the at least two keywords, perform entity recognition on the corresponding keyword to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to extract, for each of the at least two entity words, based on a preset rule and a preset neural network model, event-related information associated with the corresponding entity word from the recognized text to obtain at least two pieces of event-related information, and to use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
- the first processing unit 142 is configured to translate the recognized text to obtain the translated text.
- The first processing unit 142 is configured to extract at least two keywords from the translated text; for each of the at least two keywords, perform entity recognition on the corresponding keyword to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to extract, for each of the at least two entity words, based on a preset rule and a preset neural network model, event-related information associated with the corresponding entity word from the translated text to obtain at least two pieces of event-related information, and to use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to determine, for each of the at least two feature data, a first index identifier corresponding to the corresponding feature data; search the knowledge graph database for the at least two knowledge nodes corresponding to the first index identifiers, the knowledge graph database storing the relationship between index identifiers and knowledge nodes; combining the context of the recognized text, exclude from the at least two knowledge nodes the first knowledge nodes that meet a preset condition to determine at least two second knowledge nodes that match the context of the recognized text; and obtain the additional information corresponding to the at least two second knowledge nodes.
- The first processing unit 142 is configured to use the at least two second knowledge nodes to obtain at least two pieces of knowledge information; sort the at least two pieces of knowledge information by importance level to obtain a sorting result; select, from the sorting result, knowledge information with an importance level greater than a preset level threshold; and use the selected knowledge information as the additional information.
- The first processing unit 142 is configured to exclude, based on a preset filtering rule, first information that meets a preset condition from the additional information, and to use the second information in the additional information other than the first information, together with the recognized text, to generate the simultaneous interpretation result.
- The output unit 144 is configured to perform speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data, and to send the simultaneous interpretation audio data to the first terminal; the simultaneous interpretation audio data is used for playback by the first terminal.
- The output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in the first display box of the display screen and to display the recognized text in the second display box of the display screen.
- The output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in the first display box of the display screen and to display the translated text corresponding to the recognized text in the second display box of the display screen.
- the acquisition unit 141 and the output unit 144 can be implemented through a communication interface; the first processing unit 142 and the second processing unit 143 can both be implemented by a processor in the device.
- When the device provided in the above embodiment performs data processing, the division into the above-mentioned program modules is only used as an example for illustration; in practical applications, the above processing can be allocated to different program modules as needed, that is, the internal structure of the terminal can be divided into different program modules to complete all or part of the processing described above.
- the device provided in the foregoing embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
- FIG. 15 is a schematic diagram of the hardware composition structure of the data processing apparatus according to an embodiment of the application.
- the data processing apparatus 150 includes a memory 153.
- When the processor 152 located in the data processing device 150 executes the program, it realizes: acquiring audio data, the audio data being collected by the first terminal; recognizing the audio data to obtain the recognized text; extracting the at least two feature data of the recognized text and searching the knowledge graph database for additional information associated with each of the at least two feature data; generating a simultaneous interpretation result by using the found additional information and the recognized text; and outputting the simultaneous interpretation result, the simultaneous interpretation result being output synchronously with the collection of the audio data.
- the data processing device further includes a communication interface 151; various components in the data processing device are coupled together through the bus system 154. It can be understood that the bus system 154 is configured to implement connection and communication between these components. In addition to the data bus, the bus system 154 also includes a power bus, a control bus, and a status signal bus.
- the memory 153 in this embodiment may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- The non-volatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
- The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as a static random access memory (SRAM, Static Random Access Memory), a synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), a dynamic random access memory (DRAM, Dynamic Random Access Memory), a synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), a double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), an enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), a synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and a direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).
- the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
- the method disclosed in the foregoing embodiments of the present application may be applied to the processor 152 or implemented by the processor 152.
- the processor 152 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 152 or instructions in the form of software.
- the aforementioned processor 152 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
- the processor 152 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a storage medium, and the storage medium is located in a memory.
- the processor 152 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
- The embodiment of the present application also provides a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium. Stored thereon are computer instructions, that is, a computer program; when the computer instructions are executed by a processor, they implement the method provided by one or more of the technical solutions on the data processing device side.
- the disclosed method and smart device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- The coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- The functional units in the embodiments of the present application may all be integrated into a second processing unit, or each unit may be individually used as a unit, or two or more units may be integrated into one unit; the above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
- A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by a program instructing the relevant hardware. The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
- If the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
- The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a data processing device, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
- The aforementioned storage media include media that can store program code, such as removable storage devices, ROMs, RAMs, magnetic disks, or optical disks.
Description
This application relates to simultaneous interpretation technology, and in particular to a data processing method, device, and storage medium.
Machine simultaneous interpretation technology is a speech translation product that has emerged in recent years for conferences, reports, and other scenarios. It combines automatic speech recognition (ASR, Automatic Speech Recognition) technology and machine translation (MT, Machine Translation) technology to provide multilingual subtitle display for the speaker's speech content, replacing manual simultaneous interpretation services.
In related machine simultaneous interpretation technology, the speech content is usually translated and displayed as text, but the displayed content cannot enable users to fully and accurately understand the speech content.
Summary of the Invention
The embodiments of the present application provide a data processing method, device, and storage medium.
An embodiment of the present application provides a data processing method, including:
obtaining audio data;
recognizing the audio data to obtain recognized text;
extracting at least two feature data of the recognized text, and searching a knowledge graph database for additional information associated with each of the at least two feature data;
using the found additional information and the recognized text to generate a simultaneous interpretation result; and
outputting the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
An embodiment of the present application also provides a data processing device, including:
an obtaining unit configured to obtain audio data;
a first processing unit configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search a knowledge graph database for additional information associated with each of the at least two feature data;
a second processing unit configured to use the found additional information and the recognized text to generate a simultaneous interpretation result; and
an output unit configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
An embodiment of the present application further provides a data processing device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of any one of the above methods when executing the program.
An embodiment of the present application also provides a storage medium on which computer instructions are stored; when the instructions are executed by a processor, the steps of any one of the foregoing methods are implemented.
The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is used for presentation on the first terminal when the audio data is played. The user is thus provided with additional information associated with the audio data, which can help the user fully and accurately understand the speech content of the producer of the audio data and reduce the difficulty of understanding that speech content.
Figure 1 is a schematic flowchart of simultaneous interpretation in the related art;
Figure 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Figure 3 is a first schematic flowchart of the server extracting at least two pieces of feature data from the recognized text according to an embodiment of the present application;
Figure 4 is a first schematic flowchart of the server extracting at least two pieces of feature data from the translated text according to an embodiment of the present application;
Figure 5 is a second schematic flowchart of the server extracting at least two pieces of feature data from the recognized text according to an embodiment of the present application;
Figure 6 is a second schematic flowchart of the server extracting at least two pieces of feature data from the translated text according to an embodiment of the present application;
Figure 7 is a schematic flowchart of the server searching for additional information corresponding to the recognized text according to an embodiment of the present application;
Figure 8 is a schematic flowchart of the server searching for additional information corresponding to the translated text according to an embodiment of the present application;
Figure 9 is another schematic flowchart of the server searching for additional information corresponding to the recognized text according to an embodiment of the present application;
Figure 10 is another schematic flowchart of the server searching for additional information corresponding to the translated text according to an embodiment of the present application;
Figure 11 is a schematic diagram of the server outputting a simultaneous interpretation result according to an embodiment of the present application;
Figure 12 is a schematic flowchart of one implementation in which the server generates and outputs a simultaneous interpretation result according to an embodiment of the present application;
Figure 13 is a schematic flowchart of another implementation in which the server generates and outputs a simultaneous interpretation result according to an embodiment of the present application;
Figure 14 is a schematic structural diagram of a data processing device according to an embodiment of the present application;
Figure 15 is another schematic structural diagram of a data processing device according to an embodiment of the present application.
Before describing the technical solutions of the embodiments of the present application in detail, the related art is briefly introduced.
In the related art, machine simultaneous interpretation is a speech translation product for conferences, reports, and similar scenarios that has emerged in recent years. It combines artificial intelligence (AI) technology, MT, ASR, and text-to-speech (TTS) technology to realize simultaneous interpretation (SI). Machine simultaneous interpretation may also be referred to as AI simultaneous interpretation and the like.
In practice, a lecturer can give a conference presentation through a client, projecting the presented content onto a display screen that is shown to the audience. Figure 1 is a schematic flowchart of simultaneous interpretation in the related art. As shown in Figure 1, during a conference speech, the client collects the speaker's audio through a microphone and sends the collected audio to the server. The server recognizes the audio data to obtain recognized text in the source language, then machine-translates the recognized text to obtain a translation result in the target language; finally, the translation result is shown to the user on a screen or broadcast as speech through headphones or other devices, so that the lecturer's speech is rendered in the language the user needs.
The simultaneous interpretation solutions in the related art can display interpreted content in different languages, but they interpret only what the speaker actually says and do not translate information related to that content (such as technical terms, referenced events, or biographical notes). If users lack or are unfamiliar with such information, they will find it hard to understand the speech correctly and fully, so the difficulty of understanding is not effectively reduced and the user experience suffers. To make the speech easier to understand, current machine simultaneous interpretation technology matches strings to be explained against a preset dictionary using string matching and displays the matching result. However, the targeted application scenario is not a simultaneous interpretation scenario, and building the preset dictionary requires substantial manual effort, which is time-consuming and laborious. Moreover, when the speech content cannot be determined in advance, it is even harder to prepare a dictionary beforehand, and it is difficult to provide information that helps the audience understand improvised speech; the approach lacks flexibility.
For these reasons, in the various embodiments of the present application: audio data is acquired; the audio data is recognized to obtain recognized text; at least two pieces of feature data are extracted from the recognized text, and a knowledge graph database is searched for additional information associated with each of the at least two pieces of feature data (that is, supplementary information that helps users fully and accurately understand the speech); a simultaneous interpretation result is generated from the found additional information and the recognized text; and the simultaneous interpretation result is output for presentation on a first terminal when the audio data is played.
The present application is further described in detail below with reference to the drawings and specific embodiments.
An embodiment of the present application provides a data processing method applied to a server. Figure 2 is a schematic flowchart of the data processing method according to an embodiment of the present application; as shown in Figure 2, the method includes:
Step 201: Acquire audio data, and recognize the audio data to obtain recognized text.
Here, the audio data is the audio produced when a user gives a speech in a scenario where simultaneous interpretation is applied.
The process by which the server obtains the audio data is described below.
The client may be provided with, or connected to, a voice collection module such as a microphone, through which the speech of the user in the simultaneous-interpretation scenario is captured to obtain the audio data. A communication connection is established between the client and the server, and the audio data is sent to the server through a wireless communication module. The wireless communication module may be a Bluetooth module, a Wireless Fidelity (WiFi) module, or the like.
The specific type of the client is not limited in this application; for example, it may be a smartphone, a personal computer, a laptop, a tablet, or a portable wearable device.
In an embodiment, after the server obtains the audio data, the method further includes:
performing speech recognition on the audio data using speech recognition technology to obtain recognized text.
The language of the recognized text is the same as the language of the user who produced the audio data in the simultaneous-interpretation scenario, i.e., the source language.
For example, in a conference scenario using simultaneous interpretation, a user gives a speech on how terminals in the communications field perform uplink transmission, and the speech contains two terms, "unlicensed spectrum" and "LBT type". After obtaining the speech content, the server can subsequently query the knowledge graph database for additional information matching the term "unlicensed spectrum", such as its definition: unlicensed spectrum refers to shared spectrum; in other words, communication devices in different communication systems may use the spectrum as long as they satisfy the regulatory requirements set by the country or region for that spectrum, without applying to the government for an exclusive spectrum license. The server can also query the knowledge graph database for additional information matching the term "LBT type", such as its definition, i.e., the LBT types include three categories. LBT Category 1 (Cat 1): the communication device performs no channel sensing on the unlicensed spectrum and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
LBT Category 2 (Cat 2): the communication device performs channel sensing on the unlicensed spectrum; if the channel is idle within a single sensing period, a signal may be sent, and if the channel is occupied, it may not; the single sensing period is 16 μs or 25 μs. LBT Category 3 (Cat 4): the communication device performs channel sensing on the unlicensed spectrum; the sensing duration is further determined according to the priority of the transmission service.
It should be noted that after the server obtains the recognized text, it can subsequently search the knowledge graph database, based on the recognized text, for additional information that corresponds to the source language of the producer of the audio data and that helps the receiver of the audio data understand the producer's speech. In this way, the server can provide the receiver not only with the speech content but also with additional information that aids its understanding, making the content richer.
In an embodiment, after the server obtains the audio data, the method further includes:
performing speech recognition on the audio data using speech recognition technology to obtain recognized text;
translating the recognized text using a preset translation model to obtain translated text.
The language of the translated text is the same as the language of the receiver of the audio data, i.e., the target language. The translation model is used to translate text in a first language into text in at least one second language; the first language is different from the second language.
For example, in a conference scenario using simultaneous interpretation, a user gives a speech on how terminals in the communications field perform uplink transmission, and the speech contains the two terms "NRU" and "LBT Category". After obtaining the speech content, the server translates it into the target language to obtain translated text, and can subsequently query the knowledge graph database for additional information matching "unlicensed spectrum" in the translated text, such as its definition: unlicensed spectrum refers to shared spectrum; in other words, communication devices in different communication systems may use the spectrum as long as they satisfy the regulatory requirements set by the country or region for that spectrum, without applying to the government for an exclusive spectrum license. The server can also query the knowledge graph database for additional information matching "LBT type" in the translated text, such as its definition, i.e., the LBT types include: LBT Category 1 (Cat 1): the communication device performs no channel sensing on the unlicensed spectrum and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
LBT Category 2 (Cat 2): the communication device performs channel sensing on the unlicensed spectrum; if the channel is idle within a single sensing period, a signal may be sent, and if the channel is occupied, it may not; the single sensing period is 16 μs or 25 μs. LBT Category 3 (Cat 4): the communication device performs channel sensing on the unlicensed spectrum; the sensing duration is further determined according to the priority of the transmission service.
It should be noted that after the server translates the recognized text to obtain the translated text, it can subsequently search the knowledge graph database, based on the translated text, for additional information that corresponds to the target language of the receiver of the audio data and that helps the receiver understand the producer's speech, thereby reducing the receiver's difficulty in understanding the speech content.
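The two-stage flow described above (speech recognition, then translation into the receiver's target language) can be sketched as follows. The `recognize` and `translate` callables are placeholders standing in for a real ASR engine and a preset translation model, neither of which is specified by this application; the toy implementations below exist only to make the data flow concrete:

```python
from typing import Callable

def simultaneous_pipeline(audio_data: bytes,
                          recognize: Callable[[bytes], str],
                          translate: Callable[[str, str], str],
                          target_lang: str) -> dict:
    """Recognize audio into source-language text, then translate it
    into the receiver's target language."""
    recognized_text = recognize(audio_data)            # source-language text
    translated_text = translate(recognized_text, target_lang)
    return {"recognized": recognized_text, "translated": translated_text}

# Toy stand-ins for a real ASR engine and translation model (hypothetical).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")                       # pretend audio is text

def fake_translate(text: str, target_lang: str) -> str:
    glossary = {"unlicensed spectrum": "spectre sans licence"}  # hypothetical
    return glossary.get(text, text) if target_lang == "fr" else text

result = simultaneous_pipeline(b"unlicensed spectrum",
                               fake_recognize, fake_translate, "fr")
```

Either the recognized text or the translated text can then drive the knowledge-graph search, depending on whether source-language or target-language additional information is wanted.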
Step 202: Extract at least two pieces of feature data from the recognized text, and search the knowledge graph database for additional information associated with each of the at least two pieces of feature data.
Here, the additional information may be supplementary explanatory information that helps the user fully and accurately understand the speech.
To broaden the search scope, the knowledge graph database supports searching, based on the recognized text in the source language of the producer of the audio data, for additional information that helps the audience understand the producer's speech.
In an embodiment, extracting at least two pieces of feature data from the recognized text includes:
extracting at least two keywords from the recognized text;
performing entity recognition on each of the at least two keywords to obtain at least two entity words;
using the at least two entity words as the at least two pieces of feature data of the recognized text.
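The keyword-then-entity extraction just described can be sketched as follows. The stopword list and the entity gazetteer are hypothetical stand-ins; a real system would use a trained keyword extractor and entity recognizer rather than these toy lookups:

```python
import re

STOPWORDS = {"the", "a", "of", "on", "in", "and", "about", "with", "by"}

# Hypothetical gazetteer of known entity words (person/company names).
ENTITY_GAZETTEER = ("Ren Zhengfei", "Huawei", "Meng Wanzhou")

def extract_keywords(text: str) -> list:
    """Crude keyword extraction: known multi-word entity names first,
    then remaining non-stopword tokens, in order of first appearance."""
    keywords = []
    for name in ENTITY_GAZETTEER:                 # catch multi-word names
        if name in text and name not in keywords:
            keywords.append(name)
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token.lower() not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords

def recognize_entities(keywords: list) -> list:
    """Entity recognition by gazetteer lookup: keep only keywords that
    are known entity words."""
    return [kw for kw in keywords if kw in ENTITY_GAZETTEER]

text = "Huawei president Ren Zhengfei interviewed about the Meng Wanzhou incident"
features = recognize_entities(extract_keywords(text))
```

The resulting entity words serve as the feature data that is later looked up in the knowledge graph database.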
Entity recognition refers to identifying entity words with specific meanings in text, such as person names, place names, times, organization names, event names, and technical terms.
At least two entity phrases with specific meanings may also be extracted; this is not limited here.
Specifically, the server may use keyword extraction technology to extract keywords from the recognized text to obtain at least two keywords. It may also convert the first text into multiple pieces of bit data, search the multiple pieces of bit data for bit data whose bit count satisfies a preset threshold, and use the found bit data satisfying the preset condition as the keywords.
The following describes specific ways of performing entity recognition on the at least two keywords.
In the first case, entity recognition is performed on the at least two keywords using preset rules to obtain at least two entity words.
For example, a regular expression is used to search the at least two keywords for keywords matching a preset character string, and the matched keywords are used as entity words.
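A minimal sketch of this first, rule-based case follows. The two patterns are illustrative assumptions (the application does not specify the preset character strings); any keyword matching a pattern is kept as an entity word:

```python
import re

# Hypothetical preset rules: each pattern describes one class of entity word.
ENTITY_PATTERNS = [
    re.compile(r"^\d{4}$"),                 # four-digit years, e.g. "2019"
    re.compile(r"^LBT\s?Cat(egory)?\d$"),   # terms like "LBT Category2"
]

def rule_based_entities(keywords: list) -> list:
    """Keep every keyword matching any preset pattern (first case above)."""
    return [kw for kw in keywords
            if any(p.match(kw) for p in ENTITY_PATTERNS)]

entities = rule_based_entities(["2019", "Beijing", "LBT Category2", "event"])
```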
In the second case, entity recognition is performed on the at least two keywords using a classification model to obtain at least two entity words with specific meanings.
For example, a neural network model maps the at least two keywords from input to output to obtain a recognition result; when the recognition result indicates that a keyword has a specific meaning, that keyword is used as an entity word.
In the third case, entity recognition is performed on the at least two keywords using a sequence labeling model to obtain at least two entity words.
For example, semantic analysis technology is used to analyze the semantics of the at least two keywords to obtain their semantic information, the parts of speech of the at least two keywords are labeled according to the semantic information, and keywords with specific meanings are selected from the labeled keywords as entity words.
In one example, taking entity words as an example, the flow by which the server extracts at least two pieces of feature data from the recognized text is described; as shown in Figure 3, it includes:
Step 1: Extract at least two keywords from the recognized text.
Suppose the recognized text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 2: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Here, the at least two entity words can be used as the at least two pieces of feature data of the recognized text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data quickly understand the speech.
In one example, taking entity words as an example, the flow by which the server extracts at least two pieces of feature data from the translated text is described; as shown in Figure 4, it includes:
Step 1: Translate the recognized text to obtain the translated text.
The recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
Step 2: Extract at least two keywords from the translated text.
Suppose the translated text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 3: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Here, the at least two entity words can be used as the at least two pieces of feature data of the translated text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data quickly understand the speech.
In an embodiment, when extracting the at least two pieces of feature data, the method further includes:
for each of the at least two entity words, extracting event-related information associated with the corresponding entity word from the recognized text based on preset rules and a preset neural network model, to obtain at least two pieces of event-related information;
using the at least two entity words and the at least two pieces of event-related information as the at least two pieces of feature data of the recognized text.
The event-related information may be associated with one of the at least two entity words, or with several of them.
In practice, if the number of the at least two entity words is below a preset threshold, the found entity words can be combined with preset rules, a classification model, or a sequence labeling model to extract event-related information associated with the corresponding entity words from the recognized text.
The following describes specific ways of extracting event-related information from the recognized text in combination with the at least two entity words.
In the first case, preset rules are used, in combination with the at least two entity words, to extract at least two pieces of event-related information from the recognized text.
For example, a regular expression is used to search the recognized text for a first text containing a preset character string, and it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a particular entity word, the text information corresponding to the first text is used as the event-related information associated with that entity word. The text corresponding to the preset character string may be "incident".
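This first, rule-based case can be sketched as below. The trigger word "incident" plays the role of the preset character string; since the application leaves the association test unspecified, co-occurrence of the entity word in the same text is used here as a simple stand-in:

```python
import re

def extract_event_info(text: str, entity_words: list,
                       trigger: str = "incident") -> dict:
    """Find a span ('first text') containing the preset trigger string,
    then associate it with every entity word that co-occurs in the text
    (co-occurrence approximates the unspecified association check)."""
    # Match one or more capitalized words followed by the trigger,
    # e.g. "Meng Wanzhou incident".
    match = re.search(r"((?:[A-Z][A-Za-z]*\s)+" + re.escape(trigger) + r")",
                      text)
    if not match:
        return {}
    event = match.group(1)
    return {entity: event for entity in entity_words if entity in text}

text = ("Huawei president Ren Zhengfei gave an interview about "
        "the Meng Wanzhou incident")
events = extract_event_info(text, ["Ren Zhengfei", "Huawei", "Meng Wanzhou"])
```

Here every entity word ends up associated with the same event span, matching the worked example that follows in Figure 5.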
In the second case, a classification model is used, in combination with the at least two entity words, to extract at least two pieces of event-related information from the recognized text.
For example, a neural network model maps the recognized text from input to output to obtain a recognition result; when the recognition result indicates that the recognized text contains a first text with a preset character string, it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a particular entity word, the text information corresponding to the first text is used as the event-related information associated with that entity word.
In one example, taking entity words and event-related information as an example, the flow by which the server extracts at least two pieces of feature data from the recognized text is described; as shown in Figure 5, it includes:
Step 1: Extract at least two keywords from the recognized text.
Suppose the recognized text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 2: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Step 3: Using the at least two entity words, extract event-related information associated with the corresponding entity words from the recognized text to obtain at least two pieces of event-related information.
For the entity word "Ren Zhengfei", the event-related information extracted from the recognized text is "the Meng Wanzhou incident"; for the entity word "Huawei", it is "the Meng Wanzhou incident"; for the entity word "Meng Wanzhou", it is "the Meng Wanzhou incident".
Here, the at least two entity words and the at least two pieces of event-related information can be used as the at least two pieces of feature data of the recognized text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data understand the background of the speech.
In one example, taking entity words and event-related information as an example, the flow by which the server extracts at least two pieces of feature data from the translated text is described; as shown in Figure 6, it includes:
Step 1: Translate the recognized text to obtain the translated text.
The recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
Step 2: Extract at least two keywords from the translated text.
Suppose the translated text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 3: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Step 4: Using the at least two entity words, extract event-related information associated with the corresponding entity words from the translated text to obtain at least two pieces of event-related information.
In an embodiment, searching the knowledge graph database for additional information associated with each of the at least two pieces of feature data includes:
for each of the at least two pieces of feature data, determining a first index identifier corresponding to the piece of feature data;
searching the knowledge graph database for at least two knowledge nodes corresponding to the first index identifiers, where the knowledge graph database stores correspondences between index identifiers and knowledge nodes;
in combination with the context of the recognized text, excluding from the at least two knowledge nodes a first knowledge node that satisfies a preset condition, so as to determine at least two second knowledge nodes that match the context of the recognized text;
acquiring additional information corresponding to the at least two second knowledge nodes.
Here, the context refers to the surrounding text of the recognized text.
In an example, taking at least two entity words as an example, Fig. 7 is a schematic flowchart of the server searching for the additional information corresponding to the recognized text, including:
Step 1: Determine the first index identifier corresponding to each of the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively.
Step 2: Using the first index identifiers, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
The knowledge nodes corresponding to 01 have node identifiers A and B;
the knowledge node corresponding to 02 has node identifier C;
the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 3: In combination with the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the recognized text.
After disambiguating the knowledge information corresponding to the at least two knowledge nodes, the at least two second knowledge nodes obtained are:
the knowledge node corresponding to 01, with node identifier B;
the knowledge node corresponding to 02, with node identifier C;
the knowledge node corresponding to 03, with node identifier E.
Step 4: Acquire the knowledge information corresponding to the at least two second knowledge nodes.
The additional information may be the knowledge information corresponding to the at least two second knowledge nodes.
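The index lookup and context disambiguation in the steps above can be sketched as follows. The index table, the node store, and the keyword-overlap score are hypothetical placeholders for the knowledge graph database and the disambiguation model the embodiment assumes; the node identifiers mirror the example (01→A/B, 02→C, 03→D/E).

```python
# Hypothetical index: entity word -> first index identifier.
INDEX = {"Ren Zhengfei": "01", "Huawei": "02", "Meng Wanzhou": "03"}

# Hypothetical graph: index identifier -> candidate knowledge nodes as
# (node id, knowledge information, context keywords used for disambiguation).
GRAPH = {
    "01": [("A", "Ren Zhengfei (athlete)", {"sports"}),
           ("B", "Ren Zhengfei, founder and CEO of Huawei", {"Huawei", "CEO"})],
    "02": [("C", "Huawei, a Chinese telecom company", {"telecom"})],
    "03": [("D", "Meng Wanzhou (novelist)", {"fiction"}),
           ("E", "Meng Wanzhou, CFO of Huawei", {"Huawei", "CFO"})],
}

def disambiguate(entity, context_words):
    """Pick the candidate node whose keywords overlap the context the most."""
    candidates = GRAPH[INDEX[entity]]
    return max(candidates, key=lambda node: len(node[2] & context_words))

context = {"Huawei", "CEO", "CFO", "interview"}
selected = {entity: disambiguate(entity, context)[0]
            for entity in ["Ren Zhengfei", "Huawei", "Meng Wanzhou"]}
print(selected)  # {'Ren Zhengfei': 'B', 'Huawei': 'C', 'Meng Wanzhou': 'E'}
```

A production system would score candidates with a learned entity-linking model rather than raw keyword overlap, but the selection of the B, C, and E nodes follows the same pattern as the example.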
In an example, taking at least two entity words as an example, Fig. 8 is a schematic flowchart of the server searching for the additional information corresponding to the translated text, including:
Step 1: Determine the first index identifier corresponding to each of the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively.
Step 2: Using the first index identifiers, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
The knowledge nodes corresponding to 01 have node identifiers A and B;
the knowledge node corresponding to 02 has node identifier C;
the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 3: In combination with the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the translated text.
After disambiguating the knowledge information corresponding to the at least two knowledge nodes, the at least two second knowledge nodes obtained are:
the knowledge node corresponding to 01, with node identifier B;
the knowledge node corresponding to 02, with node identifier C;
the knowledge node corresponding to 03, with node identifier E.
Step 4: Acquire the knowledge information corresponding to the at least two second knowledge nodes.
The additional information may be the knowledge information corresponding to the at least two second knowledge nodes.
In an embodiment, using the at least two second knowledge nodes to obtain the additional information includes:
obtaining at least two pieces of knowledge information from the at least two second knowledge nodes;
sorting the at least two pieces of knowledge information by importance level to obtain a sorting result;
selecting, from the sorting result, the knowledge information whose importance level is greater than a preset level threshold;
using the selected knowledge information as the additional information.
Here, the importance level may be determined from the number of times users have historically accessed the knowledge information. For example, if the historical access count of a piece of knowledge information is between 0 and 1,000, its importance level is the third level, representing the lowest level; if the historical access count is between 1,000 and 2,000, its importance level is the second level; if the historical access count is between 2,000 and 5,000, its importance level is the first level, representing the highest level.
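The level mapping and the threshold selection described above can be sketched as follows. The access counts and knowledge texts are illustrative; the three-level mapping follows the ranges given in the text, with a lower level number meaning higher importance.

```python
def importance_level(access_count):
    """Map a historical access count to an importance level (1 is highest),
    following the 0-1000 / 1000-2000 / 2000-5000 ranges from the example."""
    if access_count >= 2000:
        return 1
    if access_count >= 1000:
        return 2
    return 3

def select_additional_info(knowledge, max_level=2):
    """Sort (text, access_count) pairs most-important-first and keep those
    at or above the threshold level (numerically <= max_level)."""
    ranked = sorted(knowledge, key=lambda item: importance_level(item[1]))
    return [text for text, count in ranked
            if importance_level(count) <= max_level]

knowledge = [("B: founder and CEO of Huawei", 1800),
             ("C: Chinese telecom company", 2500),
             ("E: CFO of Huawei", 600)]
print(select_additional_info(knowledge))
# ['C: Chinese telecom company', 'B: founder and CEO of Huawei']
```

With these counts, node C lands in the first level and node B in the second, so both pass the second-level threshold, while node E (third level) is dropped — matching the example given with Figs. 9 and 10.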
In an example, taking at least two entity words as an example, Fig. 9 is a schematic flowchart of the server searching for the additional information corresponding to the recognized text, including:
Step 1: Using the at least two entity words, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively. The knowledge nodes corresponding to 01 have node identifiers A and B; the knowledge node corresponding to 02 has node identifier C; the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 2: In combination with the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the recognized text.
The at least two second knowledge nodes are: the knowledge node corresponding to 01, with node identifier B; the knowledge node corresponding to 02, with node identifier C; and the knowledge node corresponding to 03, with node identifier E.
Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is at or above the second level as the additional information.
Assume the importance level of the knowledge node with identifier B is the second level, that of the node with identifier C is the second level, and that of the node with identifier E is the third level. The knowledge information of nodes B and C, whose importance levels are at or above the second level, can then be used as the additional information.
In an example, taking at least two entity words as an example, Fig. 10 is a schematic flowchart of the server searching for the additional information corresponding to the translated text, including:
Step 1: Using the at least two entity words, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively. The knowledge nodes corresponding to 01 have node identifiers A and B; the knowledge node corresponding to 02 has node identifier C; the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 2: In combination with the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the translated text.
The at least two second knowledge nodes are: the knowledge node corresponding to 01, with node identifier B; the knowledge node corresponding to 02, with node identifier C; and the knowledge node corresponding to 03, with node identifier E.
Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is at or above the second level as the additional information.
Assume the importance level of the knowledge node with identifier B is the second level, that of the node with identifier C is the second level, and that of the node with identifier E is the third level. The knowledge information of nodes B and C, whose importance levels are at or above the second level, can then be used as the additional information.
It should be noted that removing the knowledge information with lower importance levels from the knowledge information corresponding to the retrieved knowledge nodes avoids providing redundant additional information to the recipient of the audio data.
The knowledge graph database supports searching for additional information that helps in understanding the speech content of the producer of the audio data not only based on the recognized text corresponding to the producer's source language, but also based on the translated text corresponding to the target language of the recipient of the audio data.
Here, the process of using the translated text to search the knowledge graph database for the additional information is similar to the process of using the recognized text.
Step 203: Generate a simultaneous interpretation result using the found additional information and the recognized text; output the simultaneous interpretation result, the simultaneous interpretation result being used for presentation on the first terminal when the audio data is played.
Here, that the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played may mean that the simultaneous interpretation result is presented while the audio data is being played, i.e., the data processing method can be applied to simultaneous interpretation scenarios.
Here, when the language of the recognized text and the language of the recipient of the audio data are the same, the found additional information and the recognized text are used to generate the simultaneous interpretation result.
Here, when the language of the recognized text and the language of the recipient of the audio data are different, the recognition result is translated to obtain a translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
The additional information may be retrieved from the knowledge graph database based on either the recognized text or the translated text.
In an embodiment, when generating the simultaneous interpretation result, the method further includes:
excluding, based on a preset filtering rule, first information in the additional information that meets a preset condition;
generating the simultaneous interpretation result using the recognized text and the second information, i.e., the additional information other than the first information.
Here, the preset filtering rule may be that the number of words contained in the additional information is greater than a preset word-count threshold. For example, the number of words in the additional information is counted; when the count exceeds 100 words, the first information with lower importance levels is deleted from the additional information, ensuring that the additional information stays within 100 words.
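The word-count filter above can be sketched as follows: when the combined additional information exceeds the threshold, the least important pieces are dropped first until the total fits. The 60-word threshold and the item texts are illustrative (the text's own example uses 100 words).

```python
def filter_by_length(items, max_words=100):
    """items: list of (text, importance_level) pairs, level 1 being highest.
    Drops the lowest-importance items until the total word count fits."""
    kept = sorted(items, key=lambda item: item[1])  # most important first
    while kept and sum(len(text.split()) for text, _ in kept) > max_words:
        kept.pop()  # remove the least important remaining item
    return [text for text, _ in kept]

items = [("Ren Zhengfei is the founder and CEO of Huawei. " * 4, 1),
         ("Huawei is a Chinese telecommunications company. " * 4, 2),
         ("Meng Wanzhou is the CFO of Huawei. " * 4, 3)]
# 36 + 24 + 28 = 88 words in total; with a 60-word budget the third-level
# item is dropped and the remaining two (exactly 60 words) are kept.
result = filter_by_length(items, max_words=60)
print(len(result))  # 2
```

Counting words is a simplification for languages written with spaces; for Chinese subtitles a character count would play the same role.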
In practice, the server may use the additional information and the translated text to generate and output the simultaneous interpretation result in the form of an audio stream.
In an embodiment, outputting the simultaneous interpretation result includes:
performing speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data;
sending the simultaneous interpretation audio data to the first terminal, the simultaneous interpretation audio data being used for playback by the first terminal.
Here, the simultaneous interpretation audio data may be played through the headset of the first terminal, helping the user of the first terminal understand the speech content of the producer of the audio data.
In practice, the server may also generate the simultaneous interpretation result in a structured format such as a table, graphic, or web page, based on the additional information and the translated text.
In an embodiment, when the language of the recognized text and the language of the recipient of the audio data are the same, outputting the simultaneous interpretation result includes:
sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen.
In an embodiment, when the language of the recognized text and the language of the recipient of the audio data are different, outputting the simultaneous interpretation result includes:
sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the translated text corresponding to the recognized text in a second display frame of the display screen.
Here, the simultaneous interpretation result may be presented on the display screen of the first terminal, helping the user of the first terminal understand the speech content of the producer of the audio data.
Fig. 11 is a schematic diagram of the server outputting the simultaneous interpretation result. As shown in Fig. 11, the first terminal may display the additional information in the simultaneous interpretation result in a first display frame at an upper position of the display screen associated with the first terminal, where the upper position may be top center, top center-right, top center-left, etc. When the language of the recognized text and the language of the recipient of the audio data are the same, the first terminal may display the recognized text in the simultaneous interpretation result in a second display frame at a lower position of the display screen, where the lower position may be bottom center, bottom center-right, bottom center-left, etc. The display manner may include at least one of the following: pictures, multimedia, text boxes, and rich text boxes.
In an example, taking searching for additional information based on the recognized text as an example, Fig. 12 is a schematic flowchart of the server generating and outputting the simultaneous interpretation result, including:
Step 1: The client collects the speaker's audio data and sends it to the server.
In a conference scenario using simultaneous interpretation, the client captures the speaker's speech through a microphone, obtains audio data, and sends it to the server.
Step 2: The server performs speech recognition on the audio data to obtain recognized text corresponding to the source language.
Step 3: Based on the recognized text, the server searches the knowledge graph database for additional information matching the recognized text.
The server extracts at least two entity words from the recognized text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words. During the search, in combination with the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes so as to determine at least two second knowledge nodes matching the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is acquired.
Here, the additional information may be entity words/phrases, i.e., extra information that supplements hard-to-understand content such as technical terms, person names, and names of referenced events; it may also be explanatory information about entity words/phrases, such as definitions of terms, examples, introductions of persons, explanatory information, elements of events, and related impact information.
Step 4: The server generates a simultaneous interpretation result based on the additional information and the recognized text.
When the language of the recognized text and the language of the recipient of the audio data are the same, the found additional information and the recognized text are used to generate the simultaneous interpretation result.
Here, when the language of the recognized text and the language of the recipient of the audio data are different, the recognition result is translated to obtain a translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
Step 5: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
Here, the server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal broadcasts the simultaneous interpretation audio data through a headset.
Step 6: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
When the language of the recognized text and the language of the recipient of the audio data are the same, the first terminal displays the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen;
when the language of the recognized text and the language of the recipient of the audio data are different, the first terminal displays the additional information in the first display frame of the display screen and the translated text corresponding to the recognized text in the second display frame of the display screen.
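The server-side flow of Fig. 12 can be sketched end to end as follows. The recognizer and the knowledge base are hypothetical stubs standing in for the ASR engine and the knowledge graph database the method assumes; only the data flow between the steps is illustrated.

```python
# Hypothetical knowledge base: entity -> knowledge information.
KNOWLEDGE = {
    "Huawei": "Huawei: a Chinese telecommunications company.",
    "Ren Zhengfei": "Ren Zhengfei: founder and CEO of Huawei.",
}

def recognize(audio_chunks):
    """Step 2 (stub ASR): the 'audio' is already text in this sketch."""
    return " ".join(audio_chunks)

def search_additional_info(recognized_text):
    """Step 3 (sketch): look up additional info for entities found in the text."""
    return [info for entity, info in KNOWLEDGE.items()
            if entity in recognized_text]

def build_result(recognized_text, additional_info):
    """Step 4 (sketch): bundle the recognized text with its additional info
    so the terminal can render them in separate display frames."""
    return {"text": recognized_text, "additional": additional_info}

audio = ["Ren Zhengfei", "gave an interview"]
text = recognize(audio)
result = build_result(text, search_additional_info(text))
print(result["additional"])  # ['Ren Zhengfei: founder and CEO of Huawei.']
```

Steps 5 and 6 (speech synthesis and delivery to the display screen) are omitted because they depend on the TTS engine and transport the deployment actually uses.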
In an example, taking searching for additional information based on the translated text as an example, Fig. 13 is a schematic flowchart of the server generating and outputting the simultaneous interpretation result, including:
Step 1: The client collects the speaker's audio data and sends it to the server.
In a conference scenario using simultaneous interpretation, the client captures the speaker's speech through a microphone, obtains audio data, and sends it to the server.
Step 2: The server performs speech recognition on the audio data to obtain recognized text corresponding to the source language.
Step 3: The server translates the recognized text to obtain the translated text.
Step 4: Based on the translated text, the server searches the knowledge graph database for additional information matching the recognized text.
The server extracts at least two entity words from the translated text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words. During the search, in combination with the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes so as to determine at least two second knowledge nodes matching the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is acquired.
Here, the additional information may be entity words/phrases, i.e., extra information that supplements hard-to-understand content such as technical terms, person names, and names of referenced events; it may also be explanatory information about entity words/phrases, such as definitions of terms, examples, introductions of persons, explanatory information, elements of events, and related impact information.
Step 5: The server generates a simultaneous interpretation result based on the additional information and the translated text.
Step 6: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
Here, the server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal broadcasts the simultaneous interpretation audio data through a headset.
Step 7: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
The first terminal displays the additional information in a first display frame of the display screen and the translated text in a second display frame of the display screen.
It should be understood that the order in which the steps are described in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
With the data processing method, device, and storage medium provided by the embodiments of the present application, audio data is obtained; the audio data is recognized to obtain recognized text; at least two feature data of the recognized text are extracted, and additional information associated with each of the at least two feature data is searched for in the knowledge graph database; a simultaneous interpretation result is generated using the found additional information and the recognized text; and the simultaneous interpretation result is output for presentation on the first terminal when the audio data is played. Providing the user with additional information about the speech content of the producer of the audio data helps the user fully and accurately understand that speech content and reduces the difficulty of understanding it.
To implement the data processing method of the embodiments of the present application, an embodiment of the present application further provides a data processing device. Fig. 14 is a schematic structural diagram of the data processing device according to an embodiment of the present application; as shown in Fig. 14, the data processing device includes:
an obtaining unit 141, configured to obtain audio data;
a first processing unit 142, configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
a second processing unit 143, configured to generate a simultaneous interpretation result using the found additional information and the recognized text;
an output unit 144, configured to output the simultaneous interpretation result, the simultaneous interpretation result being used for presentation on the first terminal when the audio data is played.
In an embodiment, the first processing unit 142 is configured to extract at least two keywords from the recognized text; perform entity recognition on each of the at least two keywords to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two entity words, extract the event-related information associated with that entity word from the recognized text based on preset rules and a preset neural network model, obtaining at least two pieces of event-related information; and use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to translate the recognized text to obtain the translated text.
In an embodiment, the first processing unit 142 is configured to extract at least two keywords from the translated text; perform entity recognition on each of the at least two keywords to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two entity words, extract the event-related information associated with that entity word from the translated text based on preset rules and a preset neural network model, obtaining at least two pieces of event-related information; and use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two feature data, determine the first index identifier corresponding to that feature data; search the knowledge graph database for at least two knowledge nodes corresponding to the first index identifier, the knowledge graph database storing correspondences between index identifiers and knowledge nodes; in combination with the context of the recognized text, exclude from the at least two knowledge nodes a first knowledge node that meets a preset condition so as to determine at least two second knowledge nodes matching the context of the recognized text; and acquire the additional information corresponding to the at least two second knowledge nodes.
In an embodiment, the first processing unit 142 is configured to obtain at least two pieces of knowledge information from the at least two second knowledge nodes; sort the at least two pieces of knowledge information by importance level to obtain a sorting result; select from the sorting result the knowledge information whose importance level is greater than a preset level threshold; and use the selected knowledge information as the additional information.
In an embodiment, the first processing unit 142 is configured to exclude, based on a preset filtering rule, first information in the additional information that meets a preset condition; and generate the simultaneous interpretation result using the recognized text and the second information, i.e., the additional information other than the first information.
In an embodiment, the output unit 144 is configured to perform speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data, and send the simultaneous interpretation audio data to the first terminal, the simultaneous interpretation audio data being used for playback by the first terminal.
In an embodiment, the output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen.
In an embodiment, the output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the translated text corresponding to the recognized text in a second display frame of the display screen.
In practical applications, the obtaining unit 141 and the output unit 144 may be implemented through communication interfaces; the first processing unit 142 and the second processing unit 143 may each be implemented by a processor in the device.
It should be noted that, when the device provided by the above embodiment performs data processing, the division into the above program modules is merely illustrative; in practical applications, the above processing may be allocated to different program modules as needed, i.e., the internal structure of the terminal may be divided into different program modules to complete all or part of the processing described above. In addition, the device provided by the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Based on the hardware implementation of the above device, an embodiment of the present application further provides a data processing apparatus. FIG. 15 is a schematic diagram of the hardware composition of the data processing apparatus according to an embodiment of the present application. As shown in FIG. 15, the data processing apparatus 150 includes a memory 153, a processor 152, and a computer program stored in the memory 153 and executable on the processor 152; when the processor 152 of the data processing apparatus executes the program, it implements the method provided by one or more of the above technical solutions on the data processing apparatus side.
Specifically, when the processor 152 of the data processing apparatus 150 executes the program, it implements: obtaining audio data, the audio data being collected by the first terminal; translating the audio data to obtain recognized text; extracting at least two pieces of feature data of the recognized text, and looking up, in a knowledge graph database, additional information associated with each of the at least two pieces of feature data; generating a simultaneous interpretation result using the found additional information and the recognized text; and outputting the simultaneous interpretation result, the simultaneous interpretation result being output synchronously as the audio data is collected.
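The overall flow executed by the processor can be sketched end to end as follows. Every stage here is a stub under stated assumptions: uppercasing stands in for ASR plus machine translation, word length stands in for feature extraction, and a plain dict stands in for the knowledge graph database. The generator yields results chunk by chunk, mirroring the synchronous output described above.

```python
def translate_audio(audio_chunk):
    # Stub for ASR + machine translation; here "audio" is already a string.
    return audio_chunk.upper()

def extract_features(text):
    # Stub feature extraction: keep words longer than three characters.
    return [w for w in text.split() if len(w) > 3]

def lookup_additional_info(features, graph):
    # Stub knowledge graph lookup keyed directly by feature.
    return [graph[f] for f in features if f in graph]

def interpret_stream(audio_chunks, graph):
    # Results are yielded as chunks arrive, mirroring the requirement that
    # the interpretation result is output synchronously with collection.
    for chunk in audio_chunks:
        text = translate_audio(chunk)
        features = extract_features(text)
        yield {"text": text, "extra": lookup_additional_info(features, graph)}

if __name__ == "__main__":
    graph = {"NEURAL": "relating to networks of artificial neurons"}
    for result in interpret_stream(["neural nets", "are fun"], graph):
        print(result)
```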
It should be noted that the specific steps implemented when the processor 152 of the data processing apparatus 150 executes the program have been described in detail above and will not be repeated here.
It can be understood that the data processing apparatus further includes a communication interface 151, and the components of the data processing apparatus are coupled together through a bus system 154. It can be understood that the bus system 154 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 154 also includes a power bus, a control bus, and a status signal bus.
It can be understood that the memory 153 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
The method disclosed in the foregoing embodiments of the present application may be applied to the processor 152 or implemented by the processor 152. The processor 152 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 152 or by instructions in the form of software. The processor 152 may be a general-purpose processor, a DSP, or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 152 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium; the storage medium is located in the memory, and the processor 152 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
An embodiment of the present application further provides a storage medium, specifically a computer storage medium, and more specifically a computer-readable storage medium, on which computer instructions, that is, a computer program, are stored; when the computer instructions are executed by a processor, the method provided by one or more of the above technical solutions on the data processing apparatus side is implemented.
In the several embodiments provided in this application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may all be integrated into one second processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.

Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a data processing apparatus, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
It should be noted that "first", "second", and the like are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

In addition, the technical solutions described in the embodiments of the present application may be combined arbitrarily, provided that there is no conflict.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201980100993.0A CN114556969A (en) | 2019-11-27 | 2019-11-27 | Data processing method, device and storage medium |
| PCT/CN2019/121331 WO2021102754A1 (en) | 2019-11-27 | 2019-11-27 | Data processing method and device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021102754A1 true WO2021102754A1 (en) | 2021-06-03 |