WO2021102754A1 - Data processing method and device and storage medium - Google Patents
- Publication number
- WO2021102754A1 (PCT/CN2019/121331)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- knowledge
- simultaneous interpretation
- recognized text
- additional information
- audio data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
Definitions
- This application relates to simultaneous interpretation technology, in particular to a data processing method, device and storage medium.
- Machine simultaneous interpretation technology is a speech translation product for conferences, reports, and other scenarios that has emerged in recent years. It combines automatic speech recognition (ASR) technology and machine translation (MT) technology to provide multilingual subtitles for the speaker's speech content, in place of manual simultaneous interpretation services.
- In related technologies, the speech content is usually translated and displayed as text, but the displayed content alone cannot enable users to fully and accurately understand the speech content.
- the embodiments of the present application provide a data processing method, device, and storage medium.
- the embodiment of the present application provides a data processing method, including:
- obtaining audio data; recognizing the audio data to obtain recognized text; extracting at least two feature data of the recognized text, and searching the knowledge graph database for additional information associated with each of the at least two feature data; using the found additional information and the recognized text to generate a simultaneous interpretation result; and outputting the simultaneous interpretation result, where the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- the embodiment of the present application also provides a data processing device, including:
- the obtaining unit is configured to obtain audio data
- the first processing unit is configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
- the second processing unit is configured to use the found additional information and the recognized text to generate a simultaneous interpretation result;
- the output unit is configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- the embodiment of the present application further provides a data processing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of any one of the above-mentioned methods when executing the program.
- the embodiment of the present application also provides a storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the steps of any one of the foregoing methods are implemented.
- The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is presented on the first terminal when the audio data is played. Because the user is provided with additional information associated with the audio data, the user can more fully and accurately understand the speech content of the producer of the audio data, which reduces the user's difficulty in understanding the speech content.
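The method summarized above can be sketched end to end as follows. This is a minimal illustration only: the ASR stand-in, the toy knowledge-graph dictionary, and all function names are assumptions for demonstration, not components named by the application.

```python
from typing import Dict, List

def recognize(audio_data: bytes) -> str:
    # Stand-in for an ASR engine (hypothetical); returns recognized text.
    return "Ren Zhengfei founded Huawei"

def extract_features(text: str) -> List[str]:
    # Stand-in for keyword + entity extraction producing "at least two
    # feature data"; a real system would not use a fixed entity list.
    known_entities = ["Ren Zhengfei", "Huawei"]
    return [e for e in known_entities if e in text]

# Toy knowledge-graph database: entity word -> additional information.
KNOWLEDGE_GRAPH: Dict[str, str] = {
    "Huawei": "Chinese telecommunications equipment company",
    "Ren Zhengfei": "founder of Huawei",
}

def simultaneous_interpretation(audio_data: bytes) -> dict:
    text = recognize(audio_data)
    features = extract_features(text)
    additional = {f: KNOWLEDGE_GRAPH[f] for f in features if f in KNOWLEDGE_GRAPH}
    # The simultaneous interpretation result combines the recognized text
    # with the additional information found, for presentation on the first
    # terminal when the audio data is played.
    return {"text": text, "additional_info": additional}
```

The point of the sketch is the data flow (recognize, extract features, look up additional information, combine), not any particular model choice.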
- Figure 1 is a schematic diagram of the implementation process of simultaneous interpretation in related technologies
- FIG. 2 is a schematic diagram of an implementation process of a data processing method according to an embodiment of the application
- FIG. 3 is a schematic diagram 1 of the implementation process of extracting at least two feature data of the recognized text by the server according to the embodiment of the application;
- FIG. 4 is a schematic diagram 1 of the implementation process of extracting at least two feature data of the translated text by the server according to the embodiment of the application;
- FIG. 5 is a second schematic diagram of the implementation process of extracting at least two feature data of the recognized text by the server according to the embodiment of the application;
- FIG. 6 is a second schematic diagram of the implementation process of extracting at least two feature data of the translated text by the server according to the embodiment of the application;
- FIG. 7 is a schematic diagram of the implementation process of searching the additional information corresponding to the recognized text by the server according to the embodiment of the application;
- FIG. 8 is a schematic diagram of the implementation process of searching the additional information corresponding to the translated text by the server according to the embodiment of the application;
- FIG. 9 is a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text according to the embodiment of the application;
- FIG. 10 is a schematic diagram of the implementation process of searching the additional information corresponding to the translated text by the server according to the embodiment of the application;
- FIG. 11 is a schematic diagram of the simultaneous interpretation result output by the server according to an embodiment of the application.
- FIG. 12 is a schematic diagram of an implementation process of the server generating and outputting simultaneous interpretation results according to an embodiment of the application;
- FIG. 13 is a schematic diagram of another implementation process of the server generating and outputting simultaneous interpretation results according to an embodiment of the application;
- FIG. 14 is a schematic diagram of a structure of a data processing device according to an embodiment of the application.
- FIG. 15 is a schematic diagram of another composition structure of the data processing device according to an embodiment of the application.
- Machine simultaneous interpretation technology is a speech translation product for conferences, reports, and other scenarios that has appeared in recent years. It combines artificial intelligence (AI) technology, MT, ASR, and text-to-speech (TTS) technology to realize simultaneous interpretation (SI).
- Machine simultaneous interpretation may also be referred to as machine SI, AI simultaneous interpretation, and the like.
- The lecturer can give a conference lecture through the client and project the displayed content onto a display screen, where it is shown to the user.
- Figure 1 is a schematic diagram of the implementation process of simultaneous interpretation in related technologies.
- The client uses a microphone to collect the speaker's audio and sends the collected audio to the server.
- The server recognizes the audio data to obtain the recognized text corresponding to the source language, and then performs machine translation on the recognized text to obtain the translation result corresponding to the target language; finally, the translation result is shown to the user on a display screen or broadcast as speech through headphones and other devices, so that the lecturer's speech content is translated into the language required by the user.
- Simultaneous interpretation solutions in related technologies can display simultaneous interpretation content in different languages for users, but they only interpret the speaker's verbal content and do not translate information related to it (such as professional terms, citations, events, character profiles, etc.). If the user lacks or is unfamiliar with such information, it will be difficult to correctly and fully understand the lecture content, so these solutions do not effectively reduce the difficulty of understanding the lecture content, which degrades the user experience.
- Some current machine simultaneous interpretation technology uses a string matching method to match the interpreted string against a preset dictionary and display the matching result.
- However, the target application scenario of such matching is often not a simultaneous interpretation scenario, and building the preset dictionary requires a lot of manual development, which is time-consuming and labor-intensive.
- Since the content of a speech cannot be determined in advance, it is even more difficult to preset the dictionary, and such a solution can hardly give a full understanding of the speaker's speech content; it lacks flexibility.
- In the embodiments of the present application, audio data is obtained; the audio data is recognized to obtain recognized text; at least two feature data of the recognized text are extracted, and additional information associated with each of the at least two feature data (that is, supplementary information that can help users fully and accurately understand the content of the speech) is searched from the knowledge graph database; the found additional information and the recognized text are used to generate a simultaneous interpretation result; and the simultaneous interpretation result is output, to be presented on the first terminal when the audio data is played.
- FIG. 2 is a schematic diagram of the implementation process of the data processing method according to the embodiment of the application; as shown in FIG. 2, the method includes:
- Step 201 Acquire audio data; recognize the audio data to obtain recognized text;
- The audio data is the audio generated when a user is giving a speech in a scenario where simultaneous interpretation is applied.
- the client may be provided with or connected to a voice collection module, such as a microphone, through which the voice collection module collects the user's speech content in a scene where simultaneous interpretation is applied to obtain the audio data.
- A communication connection is established between the client and the server, and the audio data is sent to the server through a wireless communication module.
- the wireless communication module may be a Bluetooth module, a wireless fidelity (WiFi, Wireless Fidelity) module, or the like.
- the specific type of the client is not limited in this application.
- It may be a smart phone, a personal computer, a notebook computer, a tablet computer, or a portable wearable device.
- the method further includes:
- The server uses speech recognition technology to perform speech recognition on the audio data to obtain the recognized text.
- the language of the recognized text is consistent with the language of the user who generates the audio data in the scenario where simultaneous interpretation is applied, that is, the source language.
- a user gives a speech on how terminals in the communication field perform uplink transmission.
- The speech content contains two terms, "unlicensed spectrum" and "LBT type". After the server obtains the speech content, it can query the knowledge graph database for additional information matching the term "unlicensed spectrum", such as its definition: unlicensed spectrum refers to shared spectrum, in other words, spectrum that communication devices in different communication systems can use as long as they meet the regulatory requirements set by the country or region on the spectrum, without applying for a proprietary spectrum authorization.
- The server can likewise query the knowledge graph database for additional information matching the term "LBT type", such as its definition: the LBT type includes three types.
- LBT Category 1 (Cat1): the communication device does not need to perform channel detection on the unlicensed spectrum, and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
- LBT Category 2 (Cat2): the communication device performs channel detection on the unlicensed spectrum; within a single detection period, if the channel is idle the signal can be sent, and if the channel is occupied the signal cannot be sent; the single detection time is 16 μs or 25 μs.
- LBT Category 4 (Cat4): the communication device performs channel detection on the unlicensed spectrum; the length of the channel detection needs to be further determined according to the priority of the transmission service.
- In this way, based on the recognized text in the source language of the producer of the audio data, the server can subsequently search the knowledge graph database for additional information that helps the recipient of the audio data understand the speech content of the producer of the audio data.
- The server can thus provide the recipient of the audio data not only with the speech content but also with additional information that helps in understanding it, so the presented information and content are richer.
- the method further includes:
- the recognized text is translated using a preset translation model to obtain the translated text.
- the language of the translated text is consistent with the language of the receiver of the audio data, that is, the target language.
- the translation model is used to translate a text in a first language into at least one text in a second language; the first language is different from the second language.
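As a minimal illustration of the translation-model interface described above (text in a first language in, text in a second language out), the following toy phrase-table function is a hypothetical stand-in, not the preset translation model itself; a real system would use a trained machine-translation model.

```python
from typing import Dict, Tuple

# Toy phrase table keyed by (first language, second language); illustrative only.
PHRASE_TABLE: Dict[Tuple[str, str], Dict[str, str]] = {
    ("fr", "en"): {"bonjour": "hello", "monde": "world"},
}

def translate(text: str, first_lang: str, second_lang: str) -> str:
    # The first language must differ from the second language,
    # as stated in the text above.
    assert first_lang != second_lang
    table = PHRASE_TABLE[(first_lang, second_lang)]
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text
```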
- a user gives a speech on how the terminal in the communication field performs uplink transmission.
- The speech content contains two terms, "NRU" and "LBT Category"; after the server obtains the speech content, it translates it according to the target language to obtain the translated text.
- The server can then query the knowledge graph database for additional information matching "unlicensed spectrum" in the translated text, such as its definition: unlicensed spectrum refers to shared spectrum, in other words, communication devices in different communication systems can use the spectrum as long as they meet the regulatory requirements set by the country or region on the spectrum, without applying for a proprietary spectrum authorization from the government.
- The server can also query the knowledge graph database for additional information matching "LBT type" in the translated text, such as its definition: the LBT type includes the following types.
- LBT Category 1 (Cat1): the communication device does not need to perform channel detection on the unlicensed spectrum, and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
- LBT Category 2 (Cat2): the communication device performs channel detection on the unlicensed spectrum; within a single detection period, if the channel is idle the signal can be sent, and if the channel is occupied the signal cannot be sent; the single detection time is 16 μs or 25 μs.
- LBT Category 4 (Cat4): the communication device performs channel detection on the unlicensed spectrum; the length of the channel detection needs to be further determined according to the priority of the transmission service.
- In this way, based on the translated text corresponding to the target language of the recipient of the audio data, the server can subsequently search the knowledge graph database for additional information that helps the recipient understand the speech content of the producer of the audio data, thereby reducing the recipient's difficulty in understanding the speech content.
- Step 202 Extract at least two feature data of the recognized text, and search for additional information associated with each feature data of the at least two feature data from the knowledge graph database;
- the additional information may refer to supplementary explanatory information that can help the user fully and accurately understand the content of the speech.
- the knowledge graph database supports searching for additional information that helps understand the speech content of the audio data producer based on the recognized text corresponding to the source language of the audio data producer.
- the extracting at least two feature data of the recognized text includes:
- keyword extraction is performed on the recognized text to obtain at least two keywords, and entity recognition is performed on the extracted keywords to obtain at least two entity words;
- the at least two entity words are used as at least two feature data of the recognized text.
- Entity recognition refers to the recognition of entity words with specific meanings in the text, such as person names, place names, times, organization names, event names, and professional terms.
- In some embodiments, the server may use keyword extraction technology to extract keywords from the recognized text to obtain at least two keywords; it may also convert the recognized text into multiple pieces of bit data, search the bit data for pieces whose number of bits meets a preset threshold, and use the found bit data that meets the preset condition as keywords.
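One literal reading of the bit-data variant can be sketched as follows. The hash-based conversion, the 8-bit width, and the threshold value are all illustrative assumptions; the application does not specify how text becomes bit data or what the preset condition is.

```python
def to_bit_data(word: str) -> int:
    # Hypothetical conversion of a word to bit data via a stable 8-bit hash.
    return sum(ord(c) for c in word) & 0xFF

def extract_keywords(text: str, threshold: int = 4) -> list:
    # Keep words whose bit data has at least `threshold` set bits
    # (a stand-in for "number of bits meets a preset threshold").
    keywords = []
    for word in text.split():
        set_bits = bin(to_bit_data(word)).count("1")
        if set_bits >= threshold:
            keywords.append(word)
    return keywords
```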
- entity recognition is performed on the at least two keywords to obtain at least two entity words.
- a regular expression is used to search for a keyword matching a preset character string from the at least two keywords, and the searched keyword that matches the preset character string is used as an entity word.
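The regular-expression variant above can be sketched as follows; the particular pattern (capitalized single- or multi-word names) is an illustrative choice of "preset character string", not one given by the application.

```python
import re

# Illustrative preset pattern: one or more capitalized words,
# e.g. "Huawei" or "Ren Zhengfei".
PRESET_PATTERN = re.compile(r"^[A-Z][a-z]+(?: [A-Z][a-z]+)*$")

def entity_words(keywords: list) -> list:
    # Keywords matching the preset pattern are treated as entity words.
    return [k for k in keywords if PRESET_PATTERN.match(k)]
```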
- the classification model is used to perform entity recognition on the at least two key words to obtain at least two entity words with specific meanings.
- For example, a neural network model is used to map the at least two keywords from input to output to obtain a recognition result; when the recognition result indicates that a keyword has a specific meaning, that keyword is used as an entity word.
- a sequence labeling model is used to perform entity recognition on the at least two keywords to obtain at least two entity words.
- For example, semantic analysis technology is used to analyze the semantics of the at least two keywords to obtain semantic information of the keywords; the parts of speech of the at least two keywords are marked according to the obtained semantic information, and keywords with specific meanings are selected from the marked keywords as entity words.
- a schematic diagram describing the implementation process of extracting at least two feature data of the recognized text by the server, as shown in FIG. 3, includes:
- Step 1 Extract at least two keywords in the recognized text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 2 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- the at least two entity words may be used as the at least two feature data of the recognized text, and the knowledge graph database may be subsequently searched for additional information that helps to understand the speech content of the creator of the audio data To help the receiver of the audio data quickly understand the content of the speech.
- a schematic diagram describing the implementation process of extracting at least two feature data of the translated text by the description server, as shown in FIG. 4, includes:
- Step 1 Translate the recognized text to obtain the translated text.
- the recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
- Step 2 Extract at least two keywords in the translated text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 3 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- the at least two entity words may be used as the at least two feature data of the recognized text, and the knowledge graph database may be subsequently searched for additional information that helps to understand the speech content of the creator of the audio data To help the receiver of the audio data quickly understand the content of the speech.
- In some embodiments, when extracting the at least two feature data, the method further includes:
- the at least two entity words and at least two event-related information are used as at least two feature data of the recognized text.
- the event-related information may be associated with one entity word among the at least two entity words, or may be associated with multiple entity words among the at least two entity words.
- In some embodiments, the found entity words can be combined with preset rules, classification models, and sequence labeling models to extract, from the recognized text, event-related information associated with the corresponding entity words.
- a preset rule is used to extract at least two event-related information from the recognized text in combination with the at least two entity words.
- For example, a regular expression is used to search the recognized text for a first text containing a preset character string, and it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a certain entity word of the at least two entity words, the text information corresponding to the first text is used as the event-related information associated with that entity word.
- the text corresponding to the preset character string may be "event".
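The rule-based variant above can be sketched as follows, using "event" as the preset character string per the example. Sentence splitting and the substring-containment association check are simplifying assumptions; the application does not fix how association is determined.

```python
import re

def event_related_info(recognized_text: str, entity_words: list) -> dict:
    # Map each entity word to the event-related sentences associated with it.
    related = {}
    for sentence in re.split(r"[.!?]\s*", recognized_text):
        if "event" in sentence.lower():           # preset character string
            for entity in entity_words:
                if entity in sentence:            # association check (assumed)
                    related.setdefault(entity, []).append(sentence.strip())
    return related
```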
- a classification model is used to extract at least two event-related information from the recognized text in combination with the at least two entity words.
- For example, a neural network model is used to map the recognized text from input to output to obtain a recognition result; when the recognition result indicates that the recognized text contains a first text with a preset character string, it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with an entity word of the at least two entity words, the text information corresponding to the first text serves as the event-related information associated with that entity word.
- a schematic diagram describing the implementation process of extracting at least two characteristic data of the recognized text by the server, as shown in FIG. 5, includes:
- Step 1 Extract at least two keywords in the recognized text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 2 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- Step 3 Use the at least two entity words to extract event-related information associated with the corresponding entity words in the recognized text to obtain at least two event-related information.
- In this way, the at least two entity words and the at least two event-related information may be used as the at least two feature data of the recognized text, and the knowledge graph database may subsequently be searched for additional information that helps to understand the speech content of the producer of the audio data, helping the receiver of the audio data understand the background of the speech content.
- a schematic diagram describing the implementation process of extracting at least two feature data of the translated text by the server, as shown in FIG. 6, includes:
- Step 1 Translate the recognized text to obtain the translated text.
- the recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
- Step 2 Extract at least two keywords in the translated text.
- the at least two keywords may be "2019, Huawei, Ren Zhengfei, the United States, Meng Wanzhou, events, Beijing";
- Step 3 Perform entity recognition on the at least two keywords to obtain at least two entity words.
- the at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
- Step 4 Use the at least two entity words to extract event-related information associated with the corresponding entity words in the translated text to obtain at least two event-related information.
- searching for additional information associated with each of the at least two feature data from the knowledge graph database includes:
- the knowledge graph database stores the relationship between the index identifier and the knowledge node
- Combining the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the recognized text;
- the context refers to the context of the recognized text.
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text, as shown in FIG. 7, including:
- Step 1 Determine the first index identifier corresponding to each of the at least two entity words.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- Step 2 Use the first index identifier to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 3 Combining the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the recognized text.
- Disambiguation is performed on the knowledge information corresponding to the at least two knowledge nodes, and the obtained at least two second knowledge nodes are:
- the node identifier of the knowledge node corresponding to 01 is B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifier of the knowledge node corresponding to 03 is E.
- Step 4 Acquire knowledge information corresponding to the at least two second knowledge nodes.
- the additional information may be knowledge information corresponding to the at least two second knowledge nodes.
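The Figure 7 flow (index lookup, then context-based disambiguation, then retrieval of knowledge information) can be sketched with toy data as follows. The node contents and the word-overlap heuristic standing in for disambiguation are illustrative assumptions; the application does not specify the disambiguation algorithm.

```python
# Step 1 data: entity word -> first index identifier.
INDEX = {"Ren Zhengfei": "01", "Huawei": "02", "Meng Wanzhou": "03"}

# Step 2 data: index identifier -> candidate knowledge nodes (id -> info).
KNOWLEDGE_NODES = {
    "01": {"A": "a village name", "B": "founder of Huawei"},
    "02": {"C": "telecommunications company"},
    "03": {"D": "a ship name", "E": "Huawei CFO"},
}

def disambiguate(candidates: dict, context: str) -> str:
    # Step 3: keep the node whose knowledge information shares the most
    # words with the context of the recognized text (toy heuristic).
    context_words = set(context.lower().split())
    return max(candidates,
               key=lambda nid: len(set(candidates[nid].split()) & context_words))

def lookup(entity_words: list, context: str) -> dict:
    result = {}
    for word in entity_words:
        candidates = KNOWLEDGE_NODES[INDEX[word]]    # steps 1-2: index lookup
        node_id = disambiguate(candidates, context)  # step 3: disambiguation
        result[word] = candidates[node_id]           # step 4: knowledge info
    return result
```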
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for additional information corresponding to the translated text, as shown in FIG. 8, including:
- Step 1 Determine the first index identifier corresponding to each of the at least two entity words.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- Step 2 Use the first index identifier to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 3 Combining the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the translated text.
- Disambiguation is performed on the knowledge information corresponding to the at least two knowledge nodes, and the obtained at least two second knowledge nodes are:
- the node identifier of the knowledge node corresponding to 01 is B;
- the node identifier of the knowledge node corresponding to 02 is C;
- the node identifier of the knowledge node corresponding to 03 is E.
- Step 4 Acquire knowledge information corresponding to the at least two second knowledge nodes.
- the additional information may be knowledge information corresponding to the at least two second knowledge nodes.
- the using the at least two second knowledge nodes to obtain additional information includes:
- knowledge information is selected, according to its importance level, from the knowledge information corresponding to the at least two second knowledge nodes, and the selected knowledge information is used as the additional information.
- The importance level can be determined according to users' historical access counts for the knowledge information. For example, if the historical access count for a piece of knowledge information is between 0 and 1000, its importance level is the third level, the lowest level; if the historical access count is between 1000 and 2000, its importance level is the second level; if the historical access count is between 2000 and 5000, its importance level is the first level, the highest level.
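The level rule above, together with the later filtering step, can be sketched as follows. Boundary handling at the stated thresholds is an assumption (the text leaves the endpoints ambiguous), and "above the second level" is read here as the second level or higher.

```python
def importance_level(access_count: int) -> int:
    # Map historical access counts to levels 1 (highest) through 3 (lowest).
    if access_count > 2000:        # 2000-5000 and above: first level
        return 1
    if access_count > 1000:        # 1000-2000: second level
        return 2
    return 3                       # 0-1000: third (lowest) level

def select_additional_info(knowledge_info: dict) -> dict:
    # Keep knowledge information whose level is the second level or higher,
    # i.e. drop the lowest-level (most redundant) information.
    return {node: info for node, info in knowledge_info.items()
            if importance_level(info["accesses"]) <= 2}
```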
- At least two entity words are taken as an example to describe a schematic diagram of the implementation process of the server searching for the additional information corresponding to the recognized text, as shown in FIG. 9, including:
- Step 1 Use the at least two entity words to search for at least two knowledge nodes corresponding to the at least two entity words from the knowledge graph database.
- the corresponding first index identifiers are 01, 02, and 03, respectively.
- the node identifiers of the knowledge nodes corresponding to 01 are A and B; the node identifier of the knowledge node corresponding to 02 is C; the node identifiers of the knowledge nodes corresponding to 03 are D and E.
- Step 2 Combining the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the recognized text.
- the at least two second knowledge nodes are: the node identifier of the knowledge node corresponding to 01 is B; the node identifier of the knowledge node corresponding to 02 is C; and the node identifier of the knowledge node corresponding to 03 is E.
- Step 3 Sort the knowledge information corresponding to the at least two second knowledge nodes according to the importance level, and use the knowledge information of the knowledge nodes with the importance level above the second level as additional information.
- the knowledge information of nodes whose importance level is above the second level can be identified as knowledge nodes corresponding to B and C as additional information.
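A minimal sketch of Steps 1–3 above, using the example identifiers from the text (index identifiers 01/02/03, nodes A–E). The dictionaries and the `context_nodes` argument, which stands in for the output of the context-based disambiguation of Step 2, are illustrative assumptions, not the patent's actual data structures:

```python
INDEX = {"word1": "01", "word2": "02", "word3": "03"}      # entity word -> first index id
NODES = {"01": ["A", "B"], "02": ["C"], "03": ["D", "E"]}  # index id -> knowledge node ids
KNOWLEDGE = {                                              # node id -> (knowledge info, importance level)
    "A": ("knowledge of A", 3), "B": ("knowledge of B", 1),
    "C": ("knowledge of C", 2), "D": ("knowledge of D", 3),
    "E": ("knowledge of E", 3),
}

def find_additional_info(entity_words, context_nodes, max_level=2):
    """Steps 1-3: index lookup, disambiguation, importance filtering.
    context_nodes holds the node ids already matched to the text's context."""
    additional = []
    for word in entity_words:                          # Step 1: index lookup
        for node in NODES.get(INDEX.get(word, ""), []):
            if node not in context_nodes:              # Step 2: disambiguation result
                continue
            info, level = KNOWLEDGE[node]
            if level <= max_level:                     # Step 3: keep levels above the second
                additional.append(info)
    return additional

print(find_additional_info(["word1", "word2", "word3"], {"B", "C", "E"}))
# → ['knowledge of B', 'knowledge of C']  (E is third level and dropped)
```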
- Taking at least two entity words as an example, FIG. 10 shows a schematic diagram of the process by which the server searches for the additional information corresponding to the translated text, including:
- Step 1: Use the at least two entity words to search the knowledge graph database for at least two knowledge nodes corresponding to the at least two entity words.
- The corresponding first index identifiers are 01, 02, and 03, respectively.
- The node identifiers of the knowledge nodes corresponding to 01 are A and B; the node identifier corresponding to 02 is C; the node identifiers corresponding to 03 are D and E.
- Step 2: Combining the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes that match the context of the translated text.
- The at least two second knowledge nodes are: for 01, the knowledge node with node identifier B; for 02, the knowledge node with node identifier C; and for 03, the knowledge node with node identifier E.
- Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is above the second level as the additional information.
- Here, the knowledge information of the nodes whose importance level is above the second level, namely the knowledge nodes corresponding to B and C, is used as the additional information.
- Removing the knowledge information with a lower importance level from the knowledge information corresponding to the at least two knowledge nodes found can avoid providing redundant additional information to the receiver of the audio data.
- The knowledge graph database not only supports searching, based on the recognized text corresponding to the source language of the producer of the audio data, for additional information that helps to understand the producer's speech content; it also supports searching, based on the translated text corresponding to the target language of the receiver of the audio data, for such additional information.
- The process of using the translated text to find the additional information from the knowledge graph database is similar to the process of using the recognized text to find the additional information from the knowledge graph database.
- Step 203: Use the found additional information and the recognized text to generate a simultaneous interpretation result, and output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- That the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played may mean that the simultaneous interpretation result is presented while the audio data is being played; that is, the data processing method can be applied to simultaneous interpretation scenarios.
- The found additional information and the recognized text are used to generate the simultaneous interpretation result.
- Alternatively, the recognition result is translated to obtain the translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
- The additional information may be searched from the knowledge graph database based on the recognized text, or based on the translated text.
- When generating the simultaneous interpretation result, the method further includes: excluding, based on a preset filtering rule, first information that meets a preset condition from the additional information; and using the second information in the additional information other than the first information, together with the recognized text, to generate the simultaneous interpretation result.
- The preset filtering rule may be that the number of words contained in the additional information is greater than a preset word-count threshold. For example, the number of words contained in the additional information is counted; when the counted number exceeds 100 words, the first information with a lower importance level is deleted from the additional information, ensuring that the additional information stays within 100 words.
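The filtering rule above (drop the lowest-importance items once the combined word count exceeds a threshold, 100 words in the example) can be sketched as follows; the function name and the greedy drop order are assumptions for illustration:

```python
def filter_additional(items, max_words=100):
    """items: list of (text, importance_level), where 1 is most important.
    Drops the least important items until the total word count fits."""
    kept = sorted(items, key=lambda item: item[1])      # most important first
    while kept and sum(len(text.split()) for text, _ in kept) > max_words:
        kept.pop()                                      # remove least important item
    return [text for text, _ in kept]

# 60 + 30 + 50 = 140 words exceeds the budget, so the third-level item goes.
items = [("alpha " * 60, 1), ("gamma " * 50, 3), ("beta " * 30, 2)]
print(len(filter_additional(items)))  # → 2
```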
- The server may use the additional information and the translated text to generate and output the simultaneous interpretation result in the form of an audio stream.
- The outputting of the simultaneous interpretation result includes:
- sending the simultaneous interpretation audio data to the first terminal; the simultaneous interpretation audio data is used for playback by the first terminal.
- The simultaneous interpretation audio data may be played through the headset of the first terminal, so as to help the user of the first terminal understand the speech content of the producer of the audio data.
- The server can also use structured formats such as tables, graphics, and web pages to generate the simultaneous interpretation result based on the additional information and the translated text.
- The outputting of the simultaneous interpretation result includes: sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in a first display box of the display screen and to display the recognized text in a second display box of the display screen.
- Alternatively, the outputting of the simultaneous interpretation result includes: sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in a first display box of the display screen and to display the translated text corresponding to the recognized text in a second display box of the display screen.
- The simultaneous interpretation result of the audio data may be presented through the display screen of the first terminal, so as to help the user of the first terminal understand the speech content of the producer of the audio data.
- FIG. 11 is a schematic diagram of the server outputting the simultaneous interpretation result.
- The first terminal may place the additional information in the simultaneous interpretation result in a first display box at an upper position of the display screen associated with the first terminal; the upper position may be top center, top center-right, top center-left, etc. When the language of the recognized text and the language of the receiver of the audio data belong to the same language, the first terminal may display the recognized text in the simultaneous interpretation result in a second display box at a lower position of the display screen, where the lower position may be bottom center, bottom center-right, bottom center-left, etc. The display mode may include at least one of the following: pictures, multimedia, text boxes, and rich text boxes.
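As an illustrative sketch only (the field names are assumptions, not the patent's actual format), a structured simultaneous interpretation result with the upper and lower display boxes described above might look like:

```python
def build_result(additional_info, main_text, upper_align="center"):
    """Structured result: additional information in the upper (first)
    display box, recognized or translated text in the lower (second) box."""
    return {
        "first_box":  {"position": "top", "align": upper_align,
                       "content": additional_info, "render": "rich_text_box"},
        "second_box": {"position": "bottom", "align": "center",
                       "content": main_text, "render": "text_box"},
    }

result = build_result(["ASR: Automatic Speech Recognition"], "Welcome to the talk")
print(result["first_box"]["position"])   # → top
```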
- Step 1: The client terminal collects the speaker's audio data and sends it to the server.
- The client uses a microphone to collect the content of the speaker's speech to obtain audio data, and sends it to the server.
- Step 2: The server performs speech recognition on the audio data to obtain the recognized text corresponding to the source language.
- Step 3: The server searches the knowledge graph database for additional information matching the recognized text, based on the recognized text.
- The server extracts at least two entity words from the recognized text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words; in the search process, combining the context of the recognized text, the first knowledge nodes that meet a preset condition are excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is obtained.
- The additional information may specifically be entity words/phrases, that is, additional information that supplements difficult-to-understand content.
- The difficult-to-understand content may include technical terms, person names, quoted event names, etc.; the additional information may also be explanations of entity words/phrases, such as definitions of terms, example information, introductions of persons, explanatory information, elements of events, and related impact information.
- Step 4: The server generates a simultaneous interpretation result based on the additional information and the recognized text.
- The found additional information and the recognized text are used to generate the simultaneous interpretation result; alternatively, the recognition result is translated to obtain the translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
- Step 5: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
- The server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal plays the simultaneous interpretation audio data through a headset.
- Step 6: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
- The first terminal displays the additional information in the first display box of the display screen and displays the recognized text in the second display box of the display screen;
- or the first terminal displays the additional information in the first display box of the display screen and displays the translated text corresponding to the recognized text in the second display box of the display screen.
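The steps above (collect, recognize, look up, generate, synthesize, send) can be sketched end-to-end as follows. Every stage is a stub: the real system would call ASR, knowledge-graph, and TTS engines, which are out of scope here, so the function bodies and names are assumptions for illustration:

```python
def recognize(audio: bytes) -> str:          # Step 2: ASR stub
    return "the speaker mentioned ASR"

def lookup(text: str) -> list:               # Step 3: knowledge-graph stub
    return ["ASR: Automatic Speech Recognition"] if "ASR" in text else []

def synthesize(result: str) -> bytes:        # Step 5: TTS stub
    return b"tts:" + result.encode()

def simultaneous_interpret(audio: bytes):
    text = recognize(audio)                  # Step 2: recognized text
    extra = lookup(text)                     # Step 3: additional information
    result = " | ".join([text] + extra)      # Step 4: simultaneous interpretation result
    return synthesize(result), result        # Steps 5-6: audio + display payloads

audio_out, display = simultaneous_interpret(b"raw pcm ...")
print(display)
# → the speaker mentioned ASR | ASR: Automatic Speech Recognition
```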
- Step 1: The client terminal collects the speaker's audio data and sends it to the server.
- The client uses a microphone to collect the content of the speaker's speech to obtain audio data, and sends it to the server.
- Step 2: The server performs speech recognition on the audio data to obtain the recognized text corresponding to the source language.
- Step 3: The server translates the recognized text to obtain the translated text.
- Step 4: The server searches the knowledge graph database for additional information matching the translated text, based on the translated text.
- The server extracts at least two entity words from the translated text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words; in the search process, combining the context of the translated text, the first knowledge nodes that meet a preset condition are excluded from the at least two knowledge nodes, so as to determine at least two second knowledge nodes that match the context of the translated text, and the additional information corresponding to the at least two second knowledge nodes is obtained.
- The additional information may specifically be entity words/phrases, that is, additional information that supplements difficult-to-understand content.
- The difficult-to-understand content may include technical terms, person names, quoted event names, etc.; the additional information may also be explanations of entity words/phrases, such as definitions of terms, example information, introductions of persons, explanatory information, elements of events, and related impact information.
- Step 5: The server generates a simultaneous interpretation result based on the additional information and the translated text.
- Step 6: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
- The server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal plays the simultaneous interpretation audio data through a headset.
- Step 7: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
- The first terminal displays the additional information in the first display box of the display screen and displays the translated text in the second display box of the display screen.
- The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is used for presentation on the first terminal when the audio data is played. The user is thus provided with additional information about the speech content of the producer of the audio data, which can help the user fully and accurately understand that speech content and reduces the difficulty of understanding it.
- FIG. 14 is a schematic diagram of the composition structure of a data processing device according to an embodiment of the application; as shown in FIG. 14, the data processing device includes:
- the obtaining unit 141 is configured to obtain audio data
- The first processing unit 142 is configured to recognize the audio data to obtain the recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
- the second processing unit 143 is configured to use the found additional information and the recognized text to generate a simultaneous interpretation result
- the output unit 144 is configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
- The first processing unit 142 is configured to extract at least two keywords from the recognized text; for each of the at least two keywords, perform entity recognition on the corresponding keyword to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to extract, for each of the at least two entity words, based on a preset rule and a preset neural network model, event-related information associated with the corresponding entity word from the recognized text to obtain at least two pieces of event-related information, and to use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
- the first processing unit 142 is configured to translate the recognized text to obtain the translated text.
- The first processing unit 142 is configured to extract at least two keywords from the translated text; for each of the at least two keywords, perform entity recognition on the corresponding keyword to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to extract, for each of the at least two entity words, based on a preset rule and a preset neural network model, event-related information associated with the corresponding entity word from the translated text to obtain at least two pieces of event-related information, and to use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
- The first processing unit 142 is configured to determine, for each of the at least two feature data, a first index identifier corresponding to the corresponding feature data; search the knowledge graph database for the at least two knowledge nodes corresponding to the first index identifiers, the knowledge graph database storing the relationship between index identifiers and knowledge nodes; combining the context of the recognized text, exclude from the at least two knowledge nodes the first knowledge nodes that meet a preset condition to determine at least two second knowledge nodes that match the context of the recognized text; and obtain the additional information corresponding to the at least two second knowledge nodes.
- The first processing unit 142 is configured to use the at least two second knowledge nodes to obtain at least two pieces of knowledge information; sort the at least two pieces of knowledge information by importance level to obtain a sorting result; select, from the sorting result, knowledge information with an importance level greater than a preset level threshold; and use the selected knowledge information as the additional information.
- The first processing unit 142 is configured to exclude, based on a preset filtering rule, first information that meets a preset condition from the additional information, and to use the second information in the additional information other than the first information, together with the recognized text, to generate the simultaneous interpretation result.
- The output unit 144 is configured to perform speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data, and to send the simultaneous interpretation audio data to the first terminal; the simultaneous interpretation audio data is used for playback by the first terminal.
- The output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in the first display box of the display screen and to display the recognized text in the second display box of the display screen.
- The output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used for the first terminal to display the additional information in the first display box of the display screen and to display the translated text corresponding to the recognized text in the second display box of the display screen.
- the acquisition unit 141 and the output unit 144 can be implemented through a communication interface; the first processing unit 142 and the second processing unit 143 can both be implemented by a processor in the device.
- When the device provided in the above embodiment performs data processing, the division into the above-mentioned program modules is only used as an example for illustration; in practical applications, the above processing can be allocated to different program modules as needed, that is, the internal structure of the terminal can be divided into different program modules to complete all or part of the processing described above.
- the device provided in the foregoing embodiment and the data processing method embodiment belong to the same concept, and the specific implementation process is detailed in the method embodiment, which will not be repeated here.
- FIG. 15 is a schematic diagram of the hardware composition structure of the data processing apparatus according to an embodiment of the application.
- the data processing apparatus 150 includes a memory 153.
- When the processor 152 located in the data processing device 150 executes the program, it realizes: acquiring audio data, the audio data being collected by the first terminal; recognizing the audio data to obtain the recognized text; extracting the at least two feature data of the recognized text and searching the knowledge graph database for additional information associated with each of the at least two feature data; generating a simultaneous interpretation result by using the found additional information and the recognized text; and outputting the simultaneous interpretation result, the simultaneous interpretation result being output synchronously with the collection of the audio data.
- the data processing device further includes a communication interface 151; various components in the data processing device are coupled together through the bus system 154. It can be understood that the bus system 154 is configured to implement connection and communication between these components. In addition to the data bus, the bus system 154 also includes a power bus, a control bus, and a status signal bus.
- the memory 153 in this embodiment may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.
- The non-volatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a ferromagnetic random access memory (FRAM, Ferromagnetic Random Access Memory), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory.
- The volatile memory may be a random access memory (RAM, Random Access Memory), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as a static random access memory (SRAM, Static Random Access Memory), a synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), a dynamic random access memory (DRAM, Dynamic Random Access Memory), a synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), a double data rate synchronous dynamic random access memory (DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), an enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), a synchronous link dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and a direct Rambus random access memory (DRRAM, Direct Rambus Random Access Memory).
- the memories described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memories.
- the method disclosed in the foregoing embodiments of the present application may be applied to the processor 152 or implemented by the processor 152.
- the processor 152 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 152 or instructions in the form of software.
- the aforementioned processor 152 may be a general-purpose processor, a DSP, or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
- the processor 152 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor may be a microprocessor or any conventional processor or the like.
- The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a storage medium, and the storage medium is located in a memory.
- the processor 152 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
- The embodiment of the present application also provides a storage medium, which is specifically a computer storage medium, and more specifically, a computer-readable storage medium. Stored thereon are computer instructions, that is, a computer program; when the computer instructions are executed by a processor, they implement the method provided by one or more of the technical solutions on the data processing device side.
- the disclosed method and smart device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- The division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
- The coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- The functional units in the embodiments of the present application may all be integrated into a second processing unit, or each unit may be individually used as a unit, or two or more units may be integrated into one unit; the above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
- A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments can be implemented by a program instructing the relevant hardware. The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disk.
- If the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
- The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a data processing device, or a network device, etc.) to execute all or part of the methods described in the various embodiments of the present application.
- The aforementioned storage media include media that can store program code, such as removable storage devices, ROMs, RAMs, magnetic disks, or optical disks.
Description
This application relates to simultaneous interpretation technology, and in particular to a data processing method, device, and storage medium.
Machine simultaneous interpretation technology is a speech translation product that has emerged in recent years for conferences, reports, and other scenarios. It combines automatic speech recognition (ASR, Automatic Speech Recognition) technology and machine translation (MT, Machine Translation) technology to provide multilingual subtitle display for the speaker's speech content, replacing manual simultaneous interpretation services.
In related machine simultaneous interpretation technology, the speech content is usually translated and displayed as text, but the displayed content cannot enable users to fully and accurately understand the speech content.
Summary of the Invention
The embodiments of the present application provide a data processing method, device, and storage medium.
An embodiment of the present application provides a data processing method, including:
obtaining audio data;
recognizing the audio data to obtain recognized text;
extracting at least two feature data of the recognized text, and searching a knowledge graph database for additional information associated with each of the at least two feature data;
using the found additional information and the recognized text to generate a simultaneous interpretation result; and
outputting the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
An embodiment of the present application also provides a data processing device, including:
an obtaining unit configured to obtain audio data;
a first processing unit configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search a knowledge graph database for additional information associated with each of the at least two feature data;
a second processing unit configured to use the found additional information and the recognized text to generate a simultaneous interpretation result; and
an output unit configured to output the simultaneous interpretation result; the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played.
An embodiment of the present application further provides a data processing device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the steps of any one of the above methods when executing the program.
An embodiment of the present application also provides a storage medium on which computer instructions are stored; when the instructions are executed by a processor, the steps of any one of the foregoing methods are implemented.
The data processing method, device, and storage medium provided by the embodiments of the application obtain audio data; recognize the audio data to obtain recognized text; extract at least two feature data of the recognized text and search the knowledge graph database for additional information associated with each of the at least two feature data; use the found additional information and the recognized text to generate a simultaneous interpretation result; and output the simultaneous interpretation result, which is used for presentation on the first terminal when the audio data is played. The user is thus provided with additional information associated with the audio data, which can help the user fully and accurately understand the speech content of the producer of the audio data and reduce the difficulty of understanding that speech content.
Figure 1 is a schematic flowchart of simultaneous interpretation in the related art;
Figure 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Figure 3 is a first schematic flowchart of the server extracting at least two pieces of feature data from the recognized text according to an embodiment of the present application;
Figure 4 is a first schematic flowchart of the server extracting at least two pieces of feature data from the translated text according to an embodiment of the present application;
Figure 5 is a second schematic flowchart of the server extracting at least two pieces of feature data from the recognized text according to an embodiment of the present application;
Figure 6 is a second schematic flowchart of the server extracting at least two pieces of feature data from the translated text according to an embodiment of the present application;
Figure 7 is a schematic flowchart of the server searching for additional information corresponding to the recognized text according to an embodiment of the present application;
Figure 8 is a schematic flowchart of the server searching for additional information corresponding to the translated text according to an embodiment of the present application;
Figure 9 is another schematic flowchart of the server searching for additional information corresponding to the recognized text according to an embodiment of the present application;
Figure 10 is another schematic flowchart of the server searching for additional information corresponding to the translated text according to an embodiment of the present application;
Figure 11 is a schematic diagram of the server outputting a simultaneous interpretation result according to an embodiment of the present application;
Figure 12 is a schematic flowchart of one implementation in which the server generates and outputs a simultaneous interpretation result according to an embodiment of the present application;
Figure 13 is a schematic flowchart of another implementation in which the server generates and outputs a simultaneous interpretation result according to an embodiment of the present application;
Figure 14 is a schematic structural diagram of a data processing device according to an embodiment of the present application;
Figure 15 is another schematic structural diagram of a data processing device according to an embodiment of the present application.
Before describing the technical solutions of the embodiments of the present application in detail, the related art is briefly introduced.
In the related art, machine simultaneous interpretation is a speech translation product for conferences, reports, and similar scenarios that has emerged in recent years. It combines artificial intelligence (AI) technology, MT, ASR, and text-to-speech (TTS) technology to realize simultaneous interpretation (SI). Machine simultaneous interpretation may also be referred to as AI simultaneous interpretation and the like.
In practice, a lecturer can give a conference presentation through a client, projecting the presented content onto a display screen that is shown to the audience. Figure 1 is a schematic flowchart of simultaneous interpretation in the related art. As shown in Figure 1, during a conference speech, the client collects the speaker's audio through a microphone and sends the collected audio to the server. The server recognizes the audio data to obtain recognized text in the source language, then machine-translates the recognized text to obtain a translation result in the target language; finally, the translation result is shown to the user on a screen or broadcast as speech through headphones or other devices, so that the lecturer's speech is rendered in the language the user needs.
The simultaneous interpretation solutions in the related art can display interpreted content in different languages, but they interpret only what the speaker actually says and do not translate information related to that content (such as technical terms, referenced events, or biographical notes). If users lack or are unfamiliar with such information, they will find it hard to understand the speech correctly and fully, so the difficulty of understanding is not effectively reduced and the user experience suffers. To make the speech easier to understand, current machine simultaneous interpretation technology matches strings to be explained against a preset dictionary using string matching and displays the matching result. However, the targeted application scenario is not a simultaneous interpretation scenario, and building the preset dictionary requires substantial manual effort, which is time-consuming and laborious. Moreover, when the speech content cannot be determined in advance, it is even harder to prepare a dictionary beforehand, and it is difficult to provide information that helps the audience understand improvised speech; the approach lacks flexibility.
For these reasons, in the various embodiments of the present application: audio data is acquired; the audio data is recognized to obtain recognized text; at least two pieces of feature data are extracted from the recognized text, and a knowledge graph database is searched for additional information associated with each of the at least two pieces of feature data (that is, supplementary information that helps users fully and accurately understand the speech); a simultaneous interpretation result is generated from the found additional information and the recognized text; and the simultaneous interpretation result is output for presentation on a first terminal when the audio data is played.
The present application is further described in detail below with reference to the drawings and specific embodiments.
An embodiment of the present application provides a data processing method applied to a server. Figure 2 is a schematic flowchart of the data processing method according to an embodiment of the present application; as shown in Figure 2, the method includes:
Step 201: Acquire audio data, and recognize the audio data to obtain recognized text.
Here, the audio data is the audio produced when a user gives a speech in a scenario where simultaneous interpretation is applied.
The process by which the server obtains the audio data is described below.
The client may be provided with, or connected to, a voice collection module such as a microphone, through which the speech of the user in the simultaneous-interpretation scenario is captured to obtain the audio data. A communication connection is established between the client and the server, and the audio data is sent to the server through a wireless communication module. The wireless communication module may be a Bluetooth module, a Wireless Fidelity (WiFi) module, or the like.
The specific type of the client is not limited in this application; for example, it may be a smartphone, a personal computer, a laptop, a tablet, or a portable wearable device.
In an embodiment, after the server obtains the audio data, the method further includes:
performing speech recognition on the audio data using speech recognition technology to obtain recognized text.
The language of the recognized text is the same as the language of the user who produced the audio data in the simultaneous-interpretation scenario, i.e., the source language.
For example, in a conference scenario using simultaneous interpretation, a user gives a speech on how terminals in the communications field perform uplink transmission, and the speech contains two terms, "unlicensed spectrum" and "LBT type". After obtaining the speech content, the server can subsequently query the knowledge graph database for additional information matching the term "unlicensed spectrum", such as its definition: unlicensed spectrum refers to shared spectrum; in other words, communication devices in different communication systems may use the spectrum as long as they satisfy the regulatory requirements set by the country or region for that spectrum, without applying to the government for an exclusive spectrum license. The server can also query the knowledge graph database for additional information matching the term "LBT type", such as its definition, i.e., the LBT types include three categories. LBT Category 1 (Cat 1): the communication device performs no channel sensing on the unlicensed spectrum and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
LBT Category 2 (Cat 2): the communication device performs channel sensing on the unlicensed spectrum; if the channel is idle within a single sensing period, a signal may be sent, and if the channel is occupied, it may not; the single sensing period is 16 μs or 25 μs. LBT Category 3 (Cat 4): the communication device performs channel sensing on the unlicensed spectrum; the sensing duration is further determined according to the priority of the transmission service.
It should be noted that after the server obtains the recognized text, it can subsequently search the knowledge graph database, based on the recognized text, for additional information that corresponds to the source language of the producer of the audio data and that helps the receiver of the audio data understand the producer's speech. In this way, the server can provide the receiver not only with the speech content but also with additional information that aids its understanding, making the content richer.
In an embodiment, after the server obtains the audio data, the method further includes:
performing speech recognition on the audio data using speech recognition technology to obtain recognized text;
translating the recognized text using a preset translation model to obtain translated text.
The language of the translated text is the same as the language of the receiver of the audio data, i.e., the target language. The translation model is used to translate text in a first language into text in at least one second language; the first language is different from the second language.
For example, in a conference scenario using simultaneous interpretation, a user gives a speech on how terminals in the communications field perform uplink transmission, and the speech contains the two terms "NRU" and "LBT Category". After obtaining the speech content, the server translates it into the target language to obtain translated text, and can subsequently query the knowledge graph database for additional information matching "unlicensed spectrum" in the translated text, such as its definition: unlicensed spectrum refers to shared spectrum; in other words, communication devices in different communication systems may use the spectrum as long as they satisfy the regulatory requirements set by the country or region for that spectrum, without applying to the government for an exclusive spectrum license. The server can also query the knowledge graph database for additional information matching "LBT type" in the translated text, such as its definition, i.e., the LBT types include: LBT Category 1 (Cat 1): the communication device performs no channel sensing on the unlicensed spectrum and transmits immediately after the switching gap ends; the switching gap does not exceed 16 μs.
LBT Category 2 (Cat 2): the communication device performs channel sensing on the unlicensed spectrum; if the channel is idle within a single sensing period, a signal may be sent, and if the channel is occupied, it may not; the single sensing period is 16 μs or 25 μs. LBT Category 3 (Cat 4): the communication device performs channel sensing on the unlicensed spectrum; the sensing duration is further determined according to the priority of the transmission service.
It should be noted that after the server translates the recognized text to obtain the translated text, it can subsequently search the knowledge graph database, based on the translated text, for additional information that corresponds to the target language of the receiver of the audio data and that helps the receiver understand the producer's speech, thereby reducing the receiver's difficulty in understanding the speech content.
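The two-stage flow described above (speech recognition, then translation into the receiver's target language) can be sketched as follows. The `recognize` and `translate` callables are placeholders standing in for a real ASR engine and a preset translation model, neither of which is specified by this application; the toy implementations below exist only to make the data flow concrete:

```python
from typing import Callable

def simultaneous_pipeline(audio_data: bytes,
                          recognize: Callable[[bytes], str],
                          translate: Callable[[str, str], str],
                          target_lang: str) -> dict:
    """Recognize audio into source-language text, then translate it
    into the receiver's target language."""
    recognized_text = recognize(audio_data)            # source-language text
    translated_text = translate(recognized_text, target_lang)
    return {"recognized": recognized_text, "translated": translated_text}

# Toy stand-ins for a real ASR engine and translation model (hypothetical).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")                       # pretend audio is text

def fake_translate(text: str, target_lang: str) -> str:
    glossary = {"unlicensed spectrum": "spectre sans licence"}  # hypothetical
    return glossary.get(text, text) if target_lang == "fr" else text

result = simultaneous_pipeline(b"unlicensed spectrum",
                               fake_recognize, fake_translate, "fr")
```

Either the recognized text or the translated text can then drive the knowledge-graph search, depending on whether source-language or target-language additional information is wanted.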
Step 202: Extract at least two pieces of feature data from the recognized text, and search the knowledge graph database for additional information associated with each of the at least two pieces of feature data.
Here, the additional information may be supplementary explanatory information that helps the user fully and accurately understand the speech.
To broaden the search scope, the knowledge graph database supports searching, based on the recognized text in the source language of the producer of the audio data, for additional information that helps the audience understand the producer's speech.
In an embodiment, extracting at least two pieces of feature data from the recognized text includes:
extracting at least two keywords from the recognized text;
performing entity recognition on each of the at least two keywords to obtain at least two entity words;
using the at least two entity words as the at least two pieces of feature data of the recognized text.
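The keyword-then-entity extraction just described can be sketched as follows. The stopword list and the entity gazetteer are hypothetical stand-ins; a real system would use a trained keyword extractor and entity recognizer rather than these toy lookups:

```python
import re

STOPWORDS = {"the", "a", "of", "on", "in", "and", "about", "with", "by"}

# Hypothetical gazetteer of known entity words (person/company names).
ENTITY_GAZETTEER = ("Ren Zhengfei", "Huawei", "Meng Wanzhou")

def extract_keywords(text: str) -> list:
    """Crude keyword extraction: known multi-word entity names first,
    then remaining non-stopword tokens, in order of first appearance."""
    keywords = []
    for name in ENTITY_GAZETTEER:                 # catch multi-word names
        if name in text and name not in keywords:
            keywords.append(name)
    for token in re.findall(r"[A-Za-z0-9]+", text):
        if token.lower() not in STOPWORDS and token not in keywords:
            keywords.append(token)
    return keywords

def recognize_entities(keywords: list) -> list:
    """Entity recognition by gazetteer lookup: keep only keywords that
    are known entity words."""
    return [kw for kw in keywords if kw in ENTITY_GAZETTEER]

text = "Huawei president Ren Zhengfei interviewed about the Meng Wanzhou incident"
features = recognize_entities(extract_keywords(text))
```

The resulting entity words serve as the feature data that is later looked up in the knowledge graph database.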
Entity recognition refers to identifying entity words with specific meanings in text, such as person names, place names, times, organization names, event names, and technical terms.
At least two entity phrases with specific meanings may also be extracted; this is not limited here.
Specifically, the server may use keyword extraction technology to extract keywords from the recognized text to obtain at least two keywords. It may also convert the first text into multiple pieces of bit data, search the multiple pieces of bit data for bit data whose bit count satisfies a preset threshold, and use the found bit data satisfying the preset condition as the keywords.
The following describes specific ways of performing entity recognition on the at least two keywords.
In the first case, entity recognition is performed on the at least two keywords using preset rules to obtain at least two entity words.
For example, a regular expression is used to search the at least two keywords for keywords matching a preset character string, and the matched keywords are used as entity words.
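A minimal sketch of this first, rule-based case follows. The two patterns are illustrative assumptions (the application does not specify the preset character strings); any keyword matching a pattern is kept as an entity word:

```python
import re

# Hypothetical preset rules: each pattern describes one class of entity word.
ENTITY_PATTERNS = [
    re.compile(r"^\d{4}$"),                 # four-digit years, e.g. "2019"
    re.compile(r"^LBT\s?Cat(egory)?\d$"),   # terms like "LBT Category2"
]

def rule_based_entities(keywords: list) -> list:
    """Keep every keyword matching any preset pattern (first case above)."""
    return [kw for kw in keywords
            if any(p.match(kw) for p in ENTITY_PATTERNS)]

entities = rule_based_entities(["2019", "Beijing", "LBT Category2", "event"])
```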
In the second case, entity recognition is performed on the at least two keywords using a classification model to obtain at least two entity words with specific meanings.
For example, a neural network model maps the at least two keywords from input to output to obtain a recognition result; when the recognition result indicates that a keyword has a specific meaning, that keyword is used as an entity word.
In the third case, entity recognition is performed on the at least two keywords using a sequence labeling model to obtain at least two entity words.
For example, semantic analysis technology is used to analyze the semantics of the at least two keywords to obtain their semantic information, the parts of speech of the at least two keywords are labeled according to the semantic information, and keywords with specific meanings are selected from the labeled keywords as entity words.
In one example, taking entity words as an example, the flow by which the server extracts at least two pieces of feature data from the recognized text is described; as shown in Figure 3, it includes:
Step 1: Extract at least two keywords from the recognized text.
Suppose the recognized text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 2: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Here, the at least two entity words can be used as the at least two pieces of feature data of the recognized text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data quickly understand the speech.
In one example, taking entity words as an example, the flow by which the server extracts at least two pieces of feature data from the translated text is described; as shown in Figure 4, it includes:
Step 1: Translate the recognized text to obtain the translated text.
The recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
Step 2: Extract at least two keywords from the translated text.
Suppose the translated text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 3: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Here, the at least two entity words can be used as the at least two pieces of feature data of the translated text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data quickly understand the speech.
In an embodiment, when extracting the at least two pieces of feature data, the method further includes:
for each of the at least two entity words, extracting event-related information associated with the corresponding entity word from the recognized text based on preset rules and a preset neural network model, to obtain at least two pieces of event-related information;
using the at least two entity words and the at least two pieces of event-related information as the at least two pieces of feature data of the recognized text.
The event-related information may be associated with one of the at least two entity words, or with several of them.
In practice, if the number of the at least two entity words is below a preset threshold, the found entity words can be combined with preset rules, a classification model, or a sequence labeling model to extract event-related information associated with the corresponding entity words from the recognized text.
The following describes specific ways of extracting event-related information from the recognized text in combination with the at least two entity words.
In the first case, preset rules are used, in combination with the at least two entity words, to extract at least two pieces of event-related information from the recognized text.
For example, a regular expression is used to search the recognized text for a first text containing a preset character string, and it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a particular entity word, the text information corresponding to the first text is used as the event-related information associated with that entity word. The text corresponding to the preset character string may be "incident".
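This first, rule-based case can be sketched as below. The trigger word "incident" plays the role of the preset character string; since the application leaves the association test unspecified, co-occurrence of the entity word in the same text is used here as a simple stand-in:

```python
import re

def extract_event_info(text: str, entity_words: list,
                       trigger: str = "incident") -> dict:
    """Find a span ('first text') containing the preset trigger string,
    then associate it with every entity word that co-occurs in the text
    (co-occurrence approximates the unspecified association check)."""
    # Match one or more capitalized words followed by the trigger,
    # e.g. "Meng Wanzhou incident".
    match = re.search(r"((?:[A-Z][A-Za-z]*\s)+" + re.escape(trigger) + r")",
                      text)
    if not match:
        return {}
    event = match.group(1)
    return {entity: event for entity in entity_words if entity in text}

text = ("Huawei president Ren Zhengfei gave an interview about "
        "the Meng Wanzhou incident")
events = extract_event_info(text, ["Ren Zhengfei", "Huawei", "Meng Wanzhou"])
```

Here every entity word ends up associated with the same event span, matching the worked example that follows in Figure 5.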
In the second case, a classification model is used, in combination with the at least two entity words, to extract at least two pieces of event-related information from the recognized text.
For example, a neural network model maps the recognized text from input to output to obtain a recognition result; when the recognition result indicates that the recognized text contains a first text with a preset character string, it is determined whether the first text is associated with each of the at least two entity words; when the first text is associated with a particular entity word, the text information corresponding to the first text is used as the event-related information associated with that entity word.
In one example, taking entity words and event-related information as an example, the flow by which the server extracts at least two pieces of feature data from the recognized text is described; as shown in Figure 5, it includes:
Step 1: Extract at least two keywords from the recognized text.
Suppose the recognized text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 2: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Step 3: Using the at least two entity words, extract event-related information associated with the corresponding entity words from the recognized text to obtain at least two pieces of event-related information.
For the entity word "Ren Zhengfei", the event-related information extracted from the recognized text is "the Meng Wanzhou incident"; for the entity word "Huawei", it is "the Meng Wanzhou incident"; for the entity word "Meng Wanzhou", it is "the Meng Wanzhou incident".
Here, the at least two entity words and the at least two pieces of event-related information can be used as the at least two pieces of feature data of the recognized text, and the knowledge graph database can subsequently be searched for additional information that helps the audience understand the speech of the producer of the audio data, helping the receiver of the audio data understand the background of the speech.
In one example, taking entity words and event-related information as an example, the flow by which the server extracts at least two pieces of feature data from the translated text is described; as shown in Figure 6, it includes:
Step 1: Translate the recognized text to obtain the translated text.
The recognized text is translated according to the target language of the receiver of the audio data to obtain the translated text.
Step 2: Extract at least two keywords from the translated text.
Suppose the translated text is "On September 23, 2019, Huawei president Ren Zhengfei gave an interview in Beijing to a German TV host about the United States' actions in the Meng Wanzhou incident."
The at least two keywords may be "2019, Huawei, Ren Zhengfei, United States, Meng Wanzhou, incident, Beijing".
Step 3: Perform entity recognition on the at least two keywords to obtain at least two entity words.
The at least two entity words may be "Ren Zhengfei, Huawei, Meng Wanzhou".
Step 4: Using the at least two entity words, extract event-related information associated with the corresponding entity words from the translated text to obtain at least two pieces of event-related information.
In an embodiment, searching the knowledge graph database for additional information associated with each of the at least two pieces of feature data includes:
for each of the at least two pieces of feature data, determining a first index identifier corresponding to the piece of feature data;
searching the knowledge graph database for at least two knowledge nodes corresponding to the first index identifiers, where the knowledge graph database stores correspondences between index identifiers and knowledge nodes;
in combination with the context of the recognized text, excluding from the at least two knowledge nodes a first knowledge node that satisfies a preset condition, so as to determine at least two second knowledge nodes that match the context of the recognized text;
acquiring additional information corresponding to the at least two second knowledge nodes.
Here, the context refers to the surrounding text of the recognized text.
In an example, taking at least two entity words as an example, Fig. 7 is a schematic flowchart of the server searching for the additional information corresponding to the recognized text, including:
Step 1: Determine the first index identifier corresponding to each of the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively.
Step 2: Using the first index identifiers, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
The knowledge nodes corresponding to 01 have node identifiers A and B;
the knowledge node corresponding to 02 has node identifier C;
the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 3: In combination with the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the recognized text.
After disambiguating the knowledge information corresponding to the at least two knowledge nodes, the at least two second knowledge nodes obtained are:
the knowledge node corresponding to 01, with node identifier B;
the knowledge node corresponding to 02, with node identifier C;
the knowledge node corresponding to 03, with node identifier E.
Step 4: Acquire the knowledge information corresponding to the at least two second knowledge nodes.
The additional information may be the knowledge information corresponding to the at least two second knowledge nodes.
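The index lookup and context disambiguation in the steps above can be sketched as follows. The index table, the node store, and the keyword-overlap score are hypothetical placeholders for the knowledge graph database and the disambiguation model the embodiment assumes; the node identifiers mirror the example (01→A/B, 02→C, 03→D/E).

```python
# Hypothetical index: entity word -> first index identifier.
INDEX = {"Ren Zhengfei": "01", "Huawei": "02", "Meng Wanzhou": "03"}

# Hypothetical graph: index identifier -> candidate knowledge nodes as
# (node id, knowledge information, context keywords used for disambiguation).
GRAPH = {
    "01": [("A", "Ren Zhengfei (athlete)", {"sports"}),
           ("B", "Ren Zhengfei, founder and CEO of Huawei", {"Huawei", "CEO"})],
    "02": [("C", "Huawei, a Chinese telecom company", {"telecom"})],
    "03": [("D", "Meng Wanzhou (novelist)", {"fiction"}),
           ("E", "Meng Wanzhou, CFO of Huawei", {"Huawei", "CFO"})],
}

def disambiguate(entity, context_words):
    """Pick the candidate node whose keywords overlap the context the most."""
    candidates = GRAPH[INDEX[entity]]
    return max(candidates, key=lambda node: len(node[2] & context_words))

context = {"Huawei", "CEO", "CFO", "interview"}
selected = {entity: disambiguate(entity, context)[0]
            for entity in ["Ren Zhengfei", "Huawei", "Meng Wanzhou"]}
print(selected)  # {'Ren Zhengfei': 'B', 'Huawei': 'C', 'Meng Wanzhou': 'E'}
```

A production system would score candidates with a learned entity-linking model rather than raw keyword overlap, but the selection of the B, C, and E nodes follows the same pattern as the example.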
In an example, taking at least two entity words as an example, Fig. 8 is a schematic flowchart of the server searching for the additional information corresponding to the translated text, including:
Step 1: Determine the first index identifier corresponding to each of the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively.
Step 2: Using the first index identifiers, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
The knowledge nodes corresponding to 01 have node identifiers A and B;
the knowledge node corresponding to 02 has node identifier C;
the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 3: In combination with the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the translated text.
After disambiguating the knowledge information corresponding to the at least two knowledge nodes, the at least two second knowledge nodes obtained are:
the knowledge node corresponding to 01, with node identifier B;
the knowledge node corresponding to 02, with node identifier C;
the knowledge node corresponding to 03, with node identifier E.
Step 4: Acquire the knowledge information corresponding to the at least two second knowledge nodes.
The additional information may be the knowledge information corresponding to the at least two second knowledge nodes.
In an embodiment, using the at least two second knowledge nodes to obtain the additional information includes:
obtaining at least two pieces of knowledge information from the at least two second knowledge nodes;
sorting the at least two pieces of knowledge information by importance level to obtain a sorting result;
selecting, from the sorting result, the knowledge information whose importance level is greater than a preset level threshold;
using the selected knowledge information as the additional information.
Here, the importance level may be determined from the number of times users have historically accessed the knowledge information. For example, if the historical access count of a piece of knowledge information is between 0 and 1,000, its importance level is the third level, representing the lowest level; if the historical access count is between 1,000 and 2,000, its importance level is the second level; if the historical access count is between 2,000 and 5,000, its importance level is the first level, representing the highest level.
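The level mapping and the threshold selection described above can be sketched as follows. The access counts and knowledge texts are illustrative; the three-level mapping follows the ranges given in the text, with a lower level number meaning higher importance.

```python
def importance_level(access_count):
    """Map a historical access count to an importance level (1 is highest),
    following the 0-1000 / 1000-2000 / 2000-5000 ranges from the example."""
    if access_count >= 2000:
        return 1
    if access_count >= 1000:
        return 2
    return 3

def select_additional_info(knowledge, max_level=2):
    """Sort (text, access_count) pairs most-important-first and keep those
    at or above the threshold level (numerically <= max_level)."""
    ranked = sorted(knowledge, key=lambda item: importance_level(item[1]))
    return [text for text, count in ranked
            if importance_level(count) <= max_level]

knowledge = [("B: founder and CEO of Huawei", 1800),
             ("C: Chinese telecom company", 2500),
             ("E: CFO of Huawei", 600)]
print(select_additional_info(knowledge))
# ['C: Chinese telecom company', 'B: founder and CEO of Huawei']
```

With these counts, node C lands in the first level and node B in the second, so both pass the second-level threshold, while node E (third level) is dropped — matching the example given with Figs. 9 and 10.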
In an example, taking at least two entity words as an example, Fig. 9 is a schematic flowchart of the server searching for the additional information corresponding to the recognized text, including:
Step 1: Using the at least two entity words, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively. The knowledge nodes corresponding to 01 have node identifiers A and B; the knowledge node corresponding to 02 has node identifier C; the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 2: In combination with the context of the recognized text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the recognized text.
The at least two second knowledge nodes are: the knowledge node corresponding to 01, with node identifier B; the knowledge node corresponding to 02, with node identifier C; and the knowledge node corresponding to 03, with node identifier E.
Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is at or above the second level as the additional information.
Assume the importance level of the knowledge node with identifier B is the second level, that of the node with identifier C is the second level, and that of the node with identifier E is the third level. The knowledge information of nodes B and C, whose importance levels are at or above the second level, can then be used as the additional information.
In an example, taking at least two entity words as an example, Fig. 10 is a schematic flowchart of the server searching for the additional information corresponding to the translated text, including:
Step 1: Using the at least two entity words, search the knowledge graph database for the at least two knowledge nodes corresponding to the at least two entity words.
Assume the at least two entity words are "Ren Zhengfei, Huawei, Meng Wanzhou", whose first index identifiers are 01, 02, and 03, respectively. The knowledge nodes corresponding to 01 have node identifiers A and B; the knowledge node corresponding to 02 has node identifier C; the knowledge nodes corresponding to 03 have node identifiers D and E.
Step 2: In combination with the context of the translated text, disambiguate the knowledge information corresponding to the at least two knowledge nodes to obtain at least two second knowledge nodes matching the context of the translated text.
The at least two second knowledge nodes are: the knowledge node corresponding to 01, with node identifier B; the knowledge node corresponding to 02, with node identifier C; and the knowledge node corresponding to 03, with node identifier E.
Step 3: Sort the knowledge information corresponding to the at least two second knowledge nodes by importance level, and use the knowledge information of the knowledge nodes whose importance level is at or above the second level as the additional information.
Assume the importance level of the knowledge node with identifier B is the second level, that of the node with identifier C is the second level, and that of the node with identifier E is the third level. The knowledge information of nodes B and C, whose importance levels are at or above the second level, can then be used as the additional information.
It should be noted that removing the knowledge information with lower importance levels from the knowledge information corresponding to the retrieved knowledge nodes avoids providing redundant additional information to the recipient of the audio data.
The knowledge graph database supports searching for additional information that helps in understanding the speech content of the producer of the audio data not only based on the recognized text corresponding to the producer's source language, but also based on the translated text corresponding to the target language of the recipient of the audio data.
Here, the process of using the translated text to search the knowledge graph database for the additional information is similar to the process of using the recognized text.
Step 203: Generate a simultaneous interpretation result using the found additional information and the recognized text; output the simultaneous interpretation result, the simultaneous interpretation result being used for presentation on the first terminal when the audio data is played.
Here, that the simultaneous interpretation result is used for presentation on the first terminal when the audio data is played may mean that the simultaneous interpretation result is presented while the audio data is being played, i.e., the data processing method can be applied to simultaneous interpretation scenarios.
Here, when the language of the recognized text and the language of the recipient of the audio data are the same, the found additional information and the recognized text are used to generate the simultaneous interpretation result.
Here, when the language of the recognized text and the language of the recipient of the audio data are different, the recognition result is translated to obtain a translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
The additional information may be retrieved from the knowledge graph database based on either the recognized text or the translated text.
In an embodiment, when generating the simultaneous interpretation result, the method further includes:
excluding, based on a preset filtering rule, first information in the additional information that meets a preset condition;
generating the simultaneous interpretation result using the recognized text and the second information, i.e., the additional information other than the first information.
Here, the preset filtering rule may be that the number of words contained in the additional information is greater than a preset word-count threshold. For example, the number of words in the additional information is counted; when the count exceeds 100 words, the first information with lower importance levels is deleted from the additional information, ensuring that the additional information stays within 100 words.
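The word-count filter above can be sketched as follows: when the combined additional information exceeds the threshold, the least important pieces are dropped first until the total fits. The 60-word threshold and the item texts are illustrative (the text's own example uses 100 words).

```python
def filter_by_length(items, max_words=100):
    """items: list of (text, importance_level) pairs, level 1 being highest.
    Drops the lowest-importance items until the total word count fits."""
    kept = sorted(items, key=lambda item: item[1])  # most important first
    while kept and sum(len(text.split()) for text, _ in kept) > max_words:
        kept.pop()  # remove the least important remaining item
    return [text for text, _ in kept]

items = [("Ren Zhengfei is the founder and CEO of Huawei. " * 4, 1),
         ("Huawei is a Chinese telecommunications company. " * 4, 2),
         ("Meng Wanzhou is the CFO of Huawei. " * 4, 3)]
# 36 + 24 + 28 = 88 words in total; with a 60-word budget the third-level
# item is dropped and the remaining two (exactly 60 words) are kept.
result = filter_by_length(items, max_words=60)
print(len(result))  # 2
```

Counting words is a simplification for languages written with spaces; for Chinese subtitles a character count would play the same role.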
In practice, the server may use the additional information and the translated text to generate and output the simultaneous interpretation result in the form of an audio stream.
In an embodiment, outputting the simultaneous interpretation result includes:
performing speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data;
sending the simultaneous interpretation audio data to the first terminal, the simultaneous interpretation audio data being used for playback by the first terminal.
Here, the simultaneous interpretation audio data may be played through the headset of the first terminal, helping the user of the first terminal understand the speech content of the producer of the audio data.
In practice, the server may also generate the simultaneous interpretation result in a structured format such as a table, graphic, or web page, based on the additional information and the translated text.
In an embodiment, when the language of the recognized text and the language of the recipient of the audio data are the same, outputting the simultaneous interpretation result includes:
sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen.
In an embodiment, when the language of the recognized text and the language of the recipient of the audio data are different, outputting the simultaneous interpretation result includes:
sending the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the translated text corresponding to the recognized text in a second display frame of the display screen.
Here, the simultaneous interpretation result may be presented on the display screen of the first terminal, helping the user of the first terminal understand the speech content of the producer of the audio data.
Fig. 11 is a schematic diagram of the server outputting the simultaneous interpretation result. As shown in Fig. 11, the first terminal may display the additional information in the simultaneous interpretation result in a first display frame at an upper position of the display screen associated with the first terminal, where the upper position may be top center, top center-right, top center-left, etc. When the language of the recognized text and the language of the recipient of the audio data are the same, the first terminal may display the recognized text in the simultaneous interpretation result in a second display frame at a lower position of the display screen, where the lower position may be bottom center, bottom center-right, bottom center-left, etc. The display manner may include at least one of the following: pictures, multimedia, text boxes, and rich text boxes.
In an example, taking searching for additional information based on the recognized text as an example, Fig. 12 is a schematic flowchart of the server generating and outputting the simultaneous interpretation result, including:
Step 1: The client collects the speaker's audio data and sends it to the server.
In a conference scenario using simultaneous interpretation, the client captures the speaker's speech through a microphone, obtains audio data, and sends it to the server.
Step 2: The server performs speech recognition on the audio data to obtain recognized text corresponding to the source language.
Step 3: Based on the recognized text, the server searches the knowledge graph database for additional information matching the recognized text.
The server extracts at least two entity words from the recognized text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words. During the search, in combination with the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes so as to determine at least two second knowledge nodes matching the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is acquired.
Here, the additional information may be entity words/phrases, i.e., extra information that supplements hard-to-understand content such as technical terms, person names, and names of referenced events; it may also be explanatory information about entity words/phrases, such as definitions of terms, examples, introductions of persons, explanatory information, elements of events, and related impact information.
Step 4: The server generates a simultaneous interpretation result based on the additional information and the recognized text.
When the language of the recognized text and the language of the recipient of the audio data are the same, the found additional information and the recognized text are used to generate the simultaneous interpretation result.
Here, when the language of the recognized text and the language of the recipient of the audio data are different, the recognition result is translated to obtain a translated text, and the found additional information and the translated text are used to generate the simultaneous interpretation result.
Step 5: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
Here, the server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal broadcasts the simultaneous interpretation audio data through a headset.
Step 6: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
When the language of the recognized text and the language of the recipient of the audio data are the same, the first terminal displays the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen;
when the language of the recognized text and the language of the recipient of the audio data are different, the first terminal displays the additional information in the first display frame of the display screen and the translated text corresponding to the recognized text in the second display frame of the display screen.
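The server-side flow of Fig. 12 can be sketched end to end as follows. The recognizer and the knowledge base are hypothetical stubs standing in for the ASR engine and the knowledge graph database the method assumes; only the data flow between the steps is illustrated.

```python
# Hypothetical knowledge base: entity -> knowledge information.
KNOWLEDGE = {
    "Huawei": "Huawei: a Chinese telecommunications company.",
    "Ren Zhengfei": "Ren Zhengfei: founder and CEO of Huawei.",
}

def recognize(audio_chunks):
    """Step 2 (stub ASR): the 'audio' is already text in this sketch."""
    return " ".join(audio_chunks)

def search_additional_info(recognized_text):
    """Step 3 (sketch): look up additional info for entities found in the text."""
    return [info for entity, info in KNOWLEDGE.items()
            if entity in recognized_text]

def build_result(recognized_text, additional_info):
    """Step 4 (sketch): bundle the recognized text with its additional info
    so the terminal can render them in separate display frames."""
    return {"text": recognized_text, "additional": additional_info}

audio = ["Ren Zhengfei", "gave an interview"]
text = recognize(audio)
result = build_result(text, search_additional_info(text))
print(result["additional"])  # ['Ren Zhengfei: founder and CEO of Huawei.']
```

Steps 5 and 6 (speech synthesis and delivery to the display screen) are omitted because they depend on the TTS engine and transport the deployment actually uses.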
In an example, taking searching for additional information based on the translated text as an example, Fig. 13 is a schematic flowchart of the server generating and outputting the simultaneous interpretation result, including:
Step 1: The client collects the speaker's audio data and sends it to the server.
In a conference scenario using simultaneous interpretation, the client captures the speaker's speech through a microphone, obtains audio data, and sends it to the server.
Step 2: The server performs speech recognition on the audio data to obtain recognized text corresponding to the source language.
Step 3: The server translates the recognized text to obtain the translated text.
Step 4: Based on the translated text, the server searches the knowledge graph database for additional information matching the recognized text.
The server extracts at least two entity words from the translated text and searches the knowledge graph database for the knowledge nodes corresponding to the at least two entity words. During the search, in combination with the context of the recognized text, a first knowledge node that meets a preset condition is excluded from the at least two knowledge nodes so as to determine at least two second knowledge nodes matching the context of the recognized text, and the additional information corresponding to the at least two second knowledge nodes is acquired.
Here, the additional information may be entity words/phrases, i.e., extra information that supplements hard-to-understand content such as technical terms, person names, and names of referenced events; it may also be explanatory information about entity words/phrases, such as definitions of terms, examples, introductions of persons, explanatory information, elements of events, and related impact information.
Step 5: The server generates a simultaneous interpretation result based on the additional information and the translated text.
Step 6: The server performs speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data.
Here, the server sends the generated simultaneous interpretation audio data to the first terminal, and the first terminal broadcasts the simultaneous interpretation audio data through a headset.
Step 7: The server sends the simultaneous interpretation result to the display screen associated with the first terminal.
The first terminal displays the additional information in a first display frame of the display screen and the translated text in a second display frame of the display screen.
It should be understood that the order in which the steps are described in the above embodiments does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
With the data processing method, device, and storage medium provided by the embodiments of the present application, audio data is obtained; the audio data is recognized to obtain recognized text; at least two feature data of the recognized text are extracted, and additional information associated with each of the at least two feature data is searched for in the knowledge graph database; a simultaneous interpretation result is generated using the found additional information and the recognized text; and the simultaneous interpretation result is output for presentation on the first terminal when the audio data is played. Providing the user with additional information about the speech content of the producer of the audio data helps the user fully and accurately understand that speech content and reduces the difficulty of understanding it.
To implement the data processing method of the embodiments of the present application, an embodiment of the present application further provides a data processing device. Fig. 14 is a schematic structural diagram of the data processing device according to an embodiment of the present application; as shown in Fig. 14, the data processing device includes:
an obtaining unit 141, configured to obtain audio data;
a first processing unit 142, configured to recognize the audio data to obtain recognized text, extract at least two feature data of the recognized text, and search the knowledge graph database for additional information associated with each of the at least two feature data;
a second processing unit 143, configured to generate a simultaneous interpretation result using the found additional information and the recognized text;
an output unit 144, configured to output the simultaneous interpretation result, the simultaneous interpretation result being used for presentation on the first terminal when the audio data is played.
In an embodiment, the first processing unit 142 is configured to extract at least two keywords from the recognized text; perform entity recognition on each of the at least two keywords to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two entity words, extract the event-related information associated with that entity word from the recognized text based on preset rules and a preset neural network model, obtaining at least two pieces of event-related information; and use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to translate the recognized text to obtain the translated text.
In an embodiment, the first processing unit 142 is configured to extract at least two keywords from the translated text; perform entity recognition on each of the at least two keywords to obtain at least two entity words; and use the at least two entity words as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two entity words, extract the event-related information associated with that entity word from the translated text based on preset rules and a preset neural network model, obtaining at least two pieces of event-related information; and use the at least two entity words and the at least two pieces of event-related information as the at least two feature data of the recognized text.
In an embodiment, the first processing unit 142 is configured to, for each of the at least two feature data, determine the first index identifier corresponding to that feature data; search the knowledge graph database for at least two knowledge nodes corresponding to the first index identifier, the knowledge graph database storing correspondences between index identifiers and knowledge nodes; in combination with the context of the recognized text, exclude from the at least two knowledge nodes a first knowledge node that meets a preset condition so as to determine at least two second knowledge nodes matching the context of the recognized text; and acquire the additional information corresponding to the at least two second knowledge nodes.
In an embodiment, the first processing unit 142 is configured to obtain at least two pieces of knowledge information from the at least two second knowledge nodes; sort the at least two pieces of knowledge information by importance level to obtain a sorting result; select from the sorting result the knowledge information whose importance level is greater than a preset level threshold; and use the selected knowledge information as the additional information.
In an embodiment, the first processing unit 142 is configured to exclude, based on a preset filtering rule, first information in the additional information that meets a preset condition; and generate the simultaneous interpretation result using the recognized text and the second information, i.e., the additional information other than the first information.
In an embodiment, the output unit 144 is configured to perform speech synthesis on the simultaneous interpretation result to obtain simultaneous interpretation audio data, and send the simultaneous interpretation audio data to the first terminal, the simultaneous interpretation audio data being used for playback by the first terminal.
In an embodiment, the output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the recognized text in a second display frame of the display screen.
In an embodiment, the output unit 144 is configured to send the simultaneous interpretation result to the display screen associated with the first terminal; the simultaneous interpretation result is used by the first terminal to display the additional information in a first display frame of the display screen and the translated text corresponding to the recognized text in a second display frame of the display screen.
In practical applications, the obtaining unit 141 and the output unit 144 may be implemented through communication interfaces; the first processing unit 142 and the second processing unit 143 may each be implemented by a processor in the device.
It should be noted that, when the device provided by the above embodiment performs data processing, the division into the above program modules is merely illustrative; in practical applications, the above processing may be allocated to different program modules as needed, i.e., the internal structure of the terminal may be divided into different program modules to complete all or part of the processing described above. In addition, the device provided by the above embodiment and the data processing method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Based on the hardware implementation of the above device, an embodiment of the present application further provides a data processing apparatus. FIG. 15 is a schematic diagram of the hardware composition of the data processing apparatus according to an embodiment of the present application. As shown in FIG. 15, the data processing apparatus 150 includes a memory 153, a processor 152, and a computer program stored in the memory 153 and executable on the processor 152; when the processor 152 of the data processing apparatus executes the program, it implements the method provided by one or more of the above technical solutions on the data processing apparatus side.
Specifically, when the processor 152 of the data processing apparatus 150 executes the program, it implements: obtaining audio data, the audio data being collected by the first terminal; translating the audio data to obtain recognized text; extracting at least two pieces of feature data of the recognized text, and looking up, in a knowledge graph database, additional information associated with each of the at least two pieces of feature data; generating a simultaneous interpretation result using the found additional information and the recognized text; and outputting the simultaneous interpretation result, the simultaneous interpretation result being output synchronously as the audio data is collected.
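The overall flow executed by the processor can be sketched end to end as follows. Every stage here is a stub under stated assumptions: uppercasing stands in for ASR plus machine translation, word length stands in for feature extraction, and a plain dict stands in for the knowledge graph database. The generator yields results chunk by chunk, mirroring the synchronous output described above.

```python
def translate_audio(audio_chunk):
    # Stub for ASR + machine translation; here "audio" is already a string.
    return audio_chunk.upper()

def extract_features(text):
    # Stub feature extraction: keep words longer than three characters.
    return [w for w in text.split() if len(w) > 3]

def lookup_additional_info(features, graph):
    # Stub knowledge graph lookup keyed directly by feature.
    return [graph[f] for f in features if f in graph]

def interpret_stream(audio_chunks, graph):
    # Results are yielded as chunks arrive, mirroring the requirement that
    # the interpretation result is output synchronously with collection.
    for chunk in audio_chunks:
        text = translate_audio(chunk)
        features = extract_features(text)
        yield {"text": text, "extra": lookup_additional_info(features, graph)}

if __name__ == "__main__":
    graph = {"NEURAL": "relating to networks of artificial neurons"}
    for result in interpret_stream(["neural nets", "are fun"], graph):
        print(result)
```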
It should be noted that the specific steps implemented when the processor 152 of the data processing apparatus 150 executes the program have been described in detail above and will not be repeated here.
It can be understood that the data processing apparatus further includes a communication interface 151, and the components of the data processing apparatus are coupled together through a bus system 154. It can be understood that the bus system 154 is configured to implement connection and communication between these components. In addition to a data bus, the bus system 154 also includes a power bus, a control bus, and a status signal bus.
It can be understood that the memory 153 in this embodiment may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a ferromagnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
The method disclosed in the foregoing embodiments of the present application may be applied to the processor 152 or implemented by the processor 152. The processor 152 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 152 or by instructions in the form of software. The processor 152 may be a general-purpose processor, a DSP, or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 152 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium; the storage medium is located in the memory, and the processor 152 reads the information in the memory and completes the steps of the foregoing method in combination with its hardware.
An embodiment of the present application further provides a storage medium, specifically a computer storage medium, and more specifically a computer-readable storage medium, on which computer instructions, that is, a computer program, are stored; when the computer instructions are executed by a processor, the method provided by one or more of the above technical solutions on the data processing apparatus side is implemented.
In the several embodiments provided in this application, it should be understood that the disclosed method and smart device may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may all be integrated into one second processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person of ordinary skill in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.

Alternatively, if the above integrated unit of the present application is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a data processing apparatus, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a RAM, a magnetic disk, or an optical disc.
It should be noted that "first", "second", and the like are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence.

In addition, the technical solutions described in the embodiments of the present application may be combined arbitrarily, provided that there is no conflict.

The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application.
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201980100993.0A CN114556969A (en) | 2019-11-27 | 2019-11-27 | Data processing method, device and storage medium |
| PCT/CN2019/121331 WO2021102754A1 (en) | 2019-11-27 | 2019-11-27 | Data processing method and device and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021102754A1 true WO2021102754A1 (en) | 2021-06-03 |