
CN102262624A - System and method for realizing cross-language communication based on multi-mode assistance - Google Patents


Info

Publication number
CN102262624A
Authority
CN
China
Prior art keywords
conversation
content
text
chat
information
Legal status
Pending
Application number
CN201110225342XA
Other languages
Chinese (zh)
Inventor
徐常胜
程健
梁超
张歆明
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201110225342XA
Publication of CN102262624A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The present invention proposes a system and method for realizing cross-language communication based on multimodal assistance. The method uses the front-end interaction module, data management module and semantic association module of the system: it analyzes the conversation content and uses natural language processing tools to automatically extract the central topics and keywords of the dialogue; the semantic association module then uses the detected topics and keywords to automatically search for relevant pictures and video clips and presents them to both parties in a suitable way, promoting mutual understanding and communication. The pictures and videos that aid understanding can either be crawled automatically from the Internet by searching or be taken directly from a pre-annotated multimedia library. Finally, the system generates a multimodal conversation summary from the two parties' text chat and the corresponding pictures and video content.

Figure 201110225342

Description

System and method for realizing cross-language communication based on multimodal assistance

Technical field

The invention belongs to the fields of multimedia analysis and network communication, and relates to a method for realizing cross-language communication based on multimodal assistance.

Background art

With the rapid development of communication and Internet technology, network instant messaging systems such as MSN and QQ have emerged that are entirely different from traditional means of communication such as mail, telephone and telegram. Mail and telegrams are mainly text-based and telephone calls mainly voice-based, whereas instant messaging can use not only text and voice but also rich multimedia such as video and pictures. Through instant messaging, people separated by oceans can talk in real time as if face to face, and the whole earth has become a veritable global village.

For interlocutors who speak different languages, however, language remains a difficult barrier in instant messaging. In recent years, thanks to substantial progress in machine translation, the language problems in communication between users of different languages have been partly solved. Machine translation nevertheless has two obvious shortcomings. The first is accuracy: machine translation can still reliably translate only fairly simple conversations. Even between the two most widely spoken languages in the world, English and Chinese, the accuracy of automatic translation cannot yet fully meet the needs of daily use, and once the world's many minority languages are taken into account, accurate automatic translation between arbitrary languages is likely to remain a long-term challenge. The second is polysemy: words with multiple meanings pose another difficult problem for machine translation.

To enhance communication, a prior-art text-to-picture synthesis system represents the main content of the input text as pictures. It performs the conversion through three optimizations: (1) selecting the keywords that maximize their probability given the input text; (2) selecting the pictures that maximize their probability given the input text and the selected keywords; and (3) optimizing the spatial layout of text and pictures given the input text, the selected keywords and the corresponding pictures. However, this system has the following three shortcomings:

1). Slow processing. Because the system must solve these optimizations, the conversion from text to pictures is slow;

2). Unfriendly interface. The input text and the selected pictures are jointly optimized into a spatial layout before being presented to the user. Applying such a mixed text-and-picture layout to a dialogue between users would inevitably feel unfriendly.

3). Poor ease of use. Because the system is client software, users must download and install it themselves. This drawback could be avoided by delivering the system through a web page.

Summary of the invention

The purpose of the present invention is to overcome the slow processing and poor usability of the prior art and to use multimodal information to help people who speak different languages communicate smoothly online. Multimodal information such as images and videos reduces the ambiguity and polysemy produced by conventional automatic translation and aids the semantic understanding of the users' dialogue. To this end, the present invention provides a method for realizing cross-language communication based on multimodal assistance.

To achieve this purpose, a first aspect of the present invention provides a cross-language communication system based on multimodal assistance, comprising a front-end interaction module, a data management module and a semantic association module, wherein:

the input of the front-end interaction module receives the text chat content entered by the user and preprocesses it to obtain the user's chat text, which is passed on through the output of the module's front-end/back-end interaction submodule; the module's chat page displays to the user the text of the conversation between the two parties and the multimedia pictures the system recommends based on that conversation;

the input of the semantic association module is connected to the output of the front-end interaction module; it receives and analyzes the user's text chat, uses natural language processing tools to extract the main content of the conversation, outputs the text information together with its translation and the corresponding multimedia information, and generates a multimodal summary from the text chat, the translation and the corresponding multimedia information;

the input of the data management module is connected to the output of the semantic association module; the data management module stores the newly received text chat, translation and corresponding multimedia information, merges the historical user information with the new user information, and generates and displays the complete text of the conversation between the two parties together with the multimedia pictures recommended by the system based on that conversation.

In a preferred embodiment, after the back-end semantic association module receives the text sent by a user, it integrates the result of Google Translate so that chat users of different languages can understand what the other party means in their own language; in addition to the original chat message, each message is therefore accompanied by its Google Translate translation.

In a preferred embodiment, the semantic association module uses the extracted main content of the conversation as keywords and applies text-based image retrieval to retrieve a corresponding set of candidate pictures from the image database.
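As a minimal sketch of this text-based retrieval step, assuming the image database is a pre-annotated library in which each picture carries a set of text tags, candidate pictures can be ranked by how many conversation keywords their tags match. The `AnnotatedImage` schema and the `retrieve_candidates` helper are hypothetical illustrations, not part of the patent.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedImage:
    """One entry of a pre-annotated image library (hypothetical schema)."""
    path: str
    tags: set = field(default_factory=set)


def retrieve_candidates(keywords, library, top_n=5):
    """Rank annotated images by how many conversation keywords their tags match.

    Images matching no keyword are dropped; ties keep library order.
    """
    query = {k.lower() for k in keywords}
    scored = []
    for position, image in enumerate(library):
        overlap = len(query & {t.lower() for t in image.tags})
        if overlap:
            scored.append((overlap, position, image))
    # Highest overlap first, then earlier library position.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [image for _, _, image in scored[:top_n]]


library = [
    AnnotatedImage("img/pepperoni.jpg", {"pizza", "pepperoni"}),
    AnnotatedImage("img/cola.jpg", {"drink", "cola"}),
    AnnotatedImage("img/margherita.jpg", {"pizza", "cheese", "tomato"}),
]
print([img.path for img in retrieve_candidates(["pizza", "cheese"], library)])
# ['img/margherita.jpg', 'img/pepperoni.jpg']
```

Plain tag overlap is only one possible scoring rule; a larger library would more likely use an inverted index with weighted matching.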

To achieve this purpose, a second aspect of the present invention provides a method for realizing cross-language communication using the above system. Building on the users' chat dialogue and on the results of analyzing the conversation with text analysis techniques, the method provides users with multimedia elements that aid semantic understanding between users who face language barriers or come from different cultural backgrounds. The method comprises the following steps:

Step S1: the user first sends the text he or she wants to say to the other party through the front-end interface of the semantic chat; the front-end interface passes the chat text to the back-end semantic association module through the Ajax-based front-end/back-end interaction module, which analyzes the conversation with a topic-based cross-modal analysis method and uses natural language processing tools to automatically extract the central topics and keywords of the dialogue;

Step S2: based on the central topics and keywords of the dialogue, the semantic association module uses text-based image retrieval to automatically retrieve relevant picture sets and video clips on the conversation topic from the database or the Internet and provides them to both parties;

Step S3: the system generates a multimodal conversation summary from the two parties' text chat and the corresponding pictures and video clips, so that users of different languages can ultimately communicate smoothly through multimedia. Likewise, from the two parties' text chat history and the corresponding pictures and video content, the system can generate a multimodal summary of the whole conversation for both parties.
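The patent leaves the keyword extractor of step S1 unspecified ("natural language processing tools"). As one possible stand-in, the sketch below ranks the words of a chat message by TF-IDF against a small background corpus; the stop-word list, the smoothed IDF formula and the helper names are assumptions for illustration only. Its output is the kind of keyword list that steps S2 and S3 would consume.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the patent).
STOPWORDS = {"a", "an", "the", "to", "i", "you"}


def tokenize(text):
    """Lower-case word tokenizer standing in for a real NLP toolkit."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(message, background_corpus, top_k=3):
    """Rank the words of one chat message by TF-IDF against a background corpus."""
    doc_sets = [set(tokenize(doc)) for doc in background_corpus]
    n_docs = len(doc_sets)
    scores = {}
    for word, tf in Counter(tokenize(message)).items():
        df = sum(1 for doc in doc_sets if word in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed IDF, always >= 1
        scores[word] = tf * idf
    # Highest score first; alphabetical order breaks ties deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


corpus = ["I would like to order", "I would like a drink", "What would you like"]
print(extract_keywords("I would like to order a pizza, a pepperoni pizza", corpus))
# ['pizza', 'pepperoni', 'order']
```

Words that are common in everyday chat ("would", "like") score low, while rarer content words ("pizza") rise to the top, which is the behavior a topic-keyword extractor needs.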

In a preferred embodiment, the multimodal conversation summary contains text, audio, image and video information, providing users with multimedia elements that aid semantic understanding between users who face language barriers or come from different cultural backgrounds.

In a preferred embodiment, the pictures and video clips are either crawled automatically from the Internet by searching or taken directly from a pre-annotated multimedia library.

In a preferred embodiment, the multimodal conversation summary is a topic-based summary: topics are detected with a relation network built from the co-occurrence frequencies, in a predefined corpus, of the words that appeared in the previous conversation.
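The patent does not say how the relation network yields a topic. A minimal sketch, assuming that co-occurrence means "appears in the same corpus sentence" and that the central topic is the conversation word with the largest weighted degree in the network, could look like this (both choices and all names are hypothetical):

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(corpus_sentences, words):
    """Weighted edges: how often each pair of `words` shares a corpus sentence."""
    vocab = set(words)
    edges = Counter()
    for sentence in corpus_sentences:
        present = sorted(vocab & set(sentence.lower().split()))
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


def detect_topic(conversation_words, corpus_sentences):
    """Return the conversation word with the largest weighted degree, or None."""
    words = sorted({w.lower() for w in conversation_words})
    degree = Counter()
    for (a, b), weight in cooccurrence_network(corpus_sentences, words).items():
        degree[a] += weight
        degree[b] += weight
    if not degree:
        return None
    # Highest degree wins; alphabetical order breaks ties.
    return min(degree, key=lambda w: (-degree[w], w))


corpus = [
    "pizza with extra cheese",
    "cola goes well with pizza",
    "cheese pizza and cola",
    "the weather is nice today",
]
print(detect_topic(["Pizza", "cheese", "cola"], corpus))  # pizza
```

Weighted degree is the simplest centrality measure; other graph centralities would fit the same description.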

Beneficial effects: the core of the present invention is describing text information through multimedia information (images or videos). The proposed cross-language communication system provides a friendly and convenient environment for online instant messaging and has three main features. First, friendliness: topic-related image and video search aids understanding of the text and thus greatly reduces the polysemy and ambiguity of translation. Second, interactivity: the system can better meet users' individual needs. Third, ease of use: the system can automatically generate a multimedia summary from the conversation record.

To aid communication and understanding between users, the system adopts a topic-based cross-modal analysis method and generates a multimodal conversation summary from the two parties' text chat and the corresponding pictures and video content. Because this multimodal record contains rich, intuitive auxiliary information such as images, videos and text, it effectively removes the ambiguity that arises in automatic translation of plain text, improves the efficiency and quality of communication, and enables smooth semantic communication between users of different languages.

Description of drawings

Fig. 1 is a block diagram of the interface of the cross-language communication system based on multimodal assistance of the present invention;

Fig. 2 is a structural block diagram of the cross-language communication system based on multimodal assistance of the present invention;

Figs. 3a and 3b show an example result for ordering a pizza;

Fig. 4 shows an example multimedia summary of a conversation.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

The present invention proposes a cross-language communication system based on multimodal assistance and a method for realizing cross-language communication. Using a front-end interaction module 1, a data management module 2 and a semantic association module 3, the method analyzes the conversation content and uses natural language processing tools to automatically extract the central topics and keywords of the dialogue; semantic association module 3 then uses the detected topics and keywords to automatically search for relevant pictures and video clips and presents them to both parties in a suitable way, promoting mutual understanding and communication. The pictures and videos that aid understanding can either be crawled automatically from the Internet by searching or be taken directly from a pre-annotated multimedia library. Finally, the system generates a multimodal conversation summary from the two parties' text chat and the corresponding pictures and video content.

Fig. 1 shows the user interface of the multimedia chat system for assisting cross-language communication proposed by the present invention, which provides users of different languages with a friendly, interactive environment for real-time communication. It offers three main functions: text communication with instant translation, picture or video retrieval based on the conversation topic, and a multimedia summary of the conversation content (shown in Fig. 4). The top part of Fig. 1 displays the name of the system and the topic of the users' chat. Below it is the main display area of the interface, which shows the text dialogue and the auxiliary multimedia information for scenarios such as asking for directions, buying a car or booking a hotel. The right part of Fig. 1 is the user text chat area for instant-translation text communication: it presents each user's original chat messages together with the corresponding Google Translate text. The left part of Fig. 1 is the multimedia content display area for topic-based picture or video retrieval and the multimedia summary: based on the conversation content, it presents related multimedia information to aid the users' semantic understanding.

Fig. 2 shows a structural block diagram of the cross-language communication system based on multimodal assistance of the present invention. The framework is divided into three components: front-end interaction module 1, data management module 2 and semantic association module 3. The front-end design consists of the chat interface and the front-end/back-end interaction. Front-end interaction module 1 receives the text chat entered by the user and preprocesses it to obtain the user's chat text; the processed chat text is sent to semantic association module 3 through the output of the front-end/back-end interaction submodule of module 1, and the chat page of module 1 displays to the user the text of the conversation between the two parties and the multimedia pictures recommended by the system based on that conversation.

The input of semantic association module 3 is connected to the output of front-end interaction module 1. After receiving and analyzing the user's text chat, module 3 uses natural language processing tools to extract the main content of the conversation, outputs the text information together with its translation and the corresponding multimedia information, and generates a multimodal summary from the text chat, the translation and the corresponding multimedia information; semantic association module 3 then outputs the text chat, the translation and the corresponding multimedia information together to data management module 2.

The input of data management module 2 is connected to the output of semantic association module 3. Data management module 2 stores the newly received text chat, translation and corresponding multimedia information, merges the historical user information with the new user information, generates the complete text of the conversation between the two parties together with the multimedia pictures recommended by the system, and finally returns all of it to front-end interaction module 1, whose chat page then displays all of the information to the user. The workflow of the modules is described in detail below.
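The store-and-merge behavior of data management module 2 can be sketched in memory as follows. This assumes one history list per chat session and that the front end redraws the chat page from the full merged history; the `ChatRecord` and `DataManager` names are hypothetical, and a real deployment would persist records in a database.

```python
from dataclasses import dataclass, field


@dataclass
class ChatRecord:
    """One processed message as passed on by the semantic association module."""
    speaker: str
    text: str
    translation: str
    media: list = field(default_factory=list)


class DataManager:
    """In-memory stand-in for data management module 2 (hypothetical API)."""

    def __init__(self):
        self._history = {}  # session id -> list of ChatRecord, oldest first

    def store(self, session_id, record):
        """Store a new record and return the merged history for the chat page."""
        history = self._history.setdefault(session_id, [])
        history.append(record)
        return list(history)  # copy of the list, so callers cannot append to it


dm = DataManager()
dm.store("s1", ChatRecord("alice", "I want a pizza", "我想要一个披萨", ["img/pizza.jpg"]))
page = dm.store("s1", ChatRecord("bob", "什么口味?", "Which flavor?"))
print([(r.speaker, r.translation) for r in page])
```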

The user first sends chat content to front-end interaction module 1 through the chat interface. As shown in Fig. 1, the user's semantic chat interface has two main parts: one displays the text of the conversation between the two parties, as in a conventional chat, and the other displays the list of multimedia pictures the system recommends based on the conversation. The front-end interface then passes the user's chat text to the back end through the Ajax-based front-end/back-end interaction module. The back-end framework consists of two parts, data management module 2 and semantic association module 3. When the back end receives the text sent by a user, semantic association module 3 integrates the result of Google Translate so that chat users of different languages can understand what the other party means in their own language; in addition to the original chat message, each message is therefore accompanied by its Google Translate translation.

Semantic association module 3 then uses natural language processing tools to extract the main content of the conversation from the text. It uses this main content as keywords and applies text-based image retrieval to retrieve a corresponding set of candidate pictures from the image database. Finally, all of the users' dialogue and the corresponding multimedia information can be used to generate a multimodal summary. Fig. 4 illustrates such a summary for an example of ordering a pizza: it shows that, in conversation with the pizzeria clerk, the user chose the type of pizza, a drink and a payment method. The pictures of the pizzeria's pizzas returned through the chat system help the user choose according to his or her own wishes, and if the user wants to order pizza again later, the multimedia information in this summary helps the user review the previous order.

The semantic association mechanism of Fig. 2 is described below. It has three main parts: text communication with instant translation, topic-based picture and video retrieval, and a multimodal summary generated from the users' text chat and the corresponding multimedia information.

(1). Text communication with instant translation

Like most instant messaging systems, the proposed system supports basic text communication. However, the two parties may have different language backgrounds. For example, when an English-speaking American who does not understand Chinese chats online with a Chinese-speaking user who does not understand English, ordinary text chat cannot let them communicate without barriers. The system therefore integrates a simple machine translation function: during the chat, each message is automatically translated from the speaker's language into the recipient's language before it is displayed, so that both parties can roughly understand each other's intentions.
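The translate-before-display rule can be sketched with a pluggable translator. This deliberately does not call the Google Translate API: `Translator` is any function `(text, source_lang, target_lang) -> str`, and the phrase-table translator below is a hypothetical offline stand-in used only to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable

# Any translation backend: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


@dataclass
class DeliveredMessage:
    original: str
    translated: str
    source_lang: str
    target_lang: str


def deliver(text: str, sender_lang: str, recipient_lang: str,
            translate: Translator) -> DeliveredMessage:
    """Attach a translation into the recipient's language before display."""
    if sender_lang == recipient_lang:
        translated = text  # same language: nothing to translate
    else:
        translated = translate(text, sender_lang, recipient_lang)
    return DeliveredMessage(text, translated, sender_lang, recipient_lang)


def make_phrasebook_translator(phrasebook: dict) -> Translator:
    """Toy offline translator; unknown phrases pass through unchanged."""
    def translate(text: str, src: str, dst: str) -> str:
        return phrasebook.get((src, dst, text), text)
    return translate


translate = make_phrasebook_translator({("en", "zh", "I want a pizza"): "我想要一个披萨"})
msg = deliver("I want a pizza", "en", "zh", translate)
print(msg.original, "->", msg.translated)
```

Keeping the original next to the translation mirrors the chat area of Fig. 1, where both are shown.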

(2). Topic-based picture and video retrieval

Even with machine translation as a bridge, cross-language communication is still not fully satisfactory. The main reason is that the accuracy of machine translation (how intelligible the output is in the target language) remains low: even between major languages such as English and Chinese, translation quality has not yet reached a practical standard. In addition, polysemous words and sentences in everyday language make it difficult for machine translation to meet practical needs. For example, the word "apple" can denote either a fruit or Apple Inc. (Fig. 3a, in which the food category includes seafood, fruit and meat, and fruit includes bananas, apples and oranges). To create an easy-to-understand, immersive online communication environment, we designed a topic-based picture/video retrieval submodule to help users with different language backgrounds communicate. Its three main functions are topic detection, picture retrieval and relevance feedback.

Topic detection is implemented in two ways. In the first, the user selects a topic from a predefined topic list; different topics are associated with different annotated picture/video databases (annotated manually or by learning methods). In the second, topic keywords are extracted by text analysis. A conversation yields many entity words that represent its content. From these entity words we first build a WordNet-like semantic relation tree that describes the semantic inheritance (is-a) relations between words. As shown in Fig. 3a, "apple", "banana" and "orange" all belong to the fruit subcategory of food, while, as shown in Fig. 3b, "apple" may at the same time belong, together with "Dell" and "Lenovo", to the category of computer brands; Fig. 3b lists example Apple products: the Mac desktop computer, the iPad tablet and the iPhone smartphone. These semantic relations can be extracted from WordNet or derived from the term frequency-inverse document frequency (TF-IDF) weights of the words in a predefined corpus. Once keywords have been extracted from the conversation, the system can automatically infer the underlying topic by analyzing the semantic relations among them.
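To illustrate how a relation tree disambiguates "apple", here is a minimal sketch assuming a hand-written child-to-parent table modeled on Figs. 3a and 3b and a simple voting rule: the category shared by the most keywords wins, and ties go to the more specific (closer) category. The `HYPERNYMS` table and both helpers are hypothetical; a real system could derive the hierarchy from WordNet instead.

```python
from collections import Counter

# Toy WordNet-like hierarchy (child -> parent categories), modeled on Figs. 3a/3b.
HYPERNYMS = {
    "apple": ["fruit", "computer brand"],
    "banana": ["fruit"],
    "orange": ["fruit"],
    "dell": ["computer brand"],
    "lenovo": ["computer brand"],
    "fruit": ["food"],
    "seafood": ["food"],
    "meat": ["food"],
}


def ancestors(word, hypernyms):
    """Map every category above `word` to its distance from `word` (BFS)."""
    distance = {}
    frontier = [(parent, 1) for parent in hypernyms.get(word, [])]
    while frontier:
        node, d = frontier.pop(0)
        if node not in distance:
            distance[node] = d
            frontier.extend((p, d + 1) for p in hypernyms.get(node, []))
    return distance


def infer_topic(keywords, hypernyms=HYPERNYMS):
    """Pick the category shared by most keywords, preferring the most specific."""
    votes, depth = Counter(), {}
    for kw in keywords:
        for category, d in ancestors(kw.lower(), hypernyms).items():
            votes[category] += 1
            depth[category] = min(depth.get(category, d), d)
    if not votes:
        return None
    return max(votes, key=lambda c: (votes[c], -depth[c]))


print(infer_topic(["apple", "banana", "orange"]))  # fruit
print(infer_topic(["Apple", "Dell", "Lenovo"]))    # computer brand
```

On its own, "apple" ties between "fruit" and "computer brand" (the sketch then just returns whichever category it saw first); the co-occurring keywords are what break the tie, which is the point the text makes.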

Based on the topic extracted from the conversation, the system automatically retrieves matching images from the Web or from a back-end database. Text-based retrieval makes it easy to find annotated images that are relevant to the conversation topic. Most images on the Web, however, are unannotated. We therefore use the retrieved images associated with annotated text as a training set to learn a topic model, and then use this topic model to retrieve from the large pool of unannotated images. Topic-based image retrieval therefore starts with building a topic model. The goal of the model is to automatically discover a latent (hidden) semantic space in which document information can be modeled more accurately during retrieval. Here, the semantic structure of a document consists of a number of latent concepts or topics, which usually correspond to stable, characteristic co-occurrence patterns among words. A document can be expressed as a weighted combination of latent topics, so its vector of combination coefficients can serve as a feature representation of the document. This representation has several advantages. First, the semantic space usually has a lower dimensionality than the word space, which saves storage space and makes search faster. Second, mapping from word space to semantic space reduces noise in the word vectors. It also helps resolve the polysemy and ambiguity problems described above, which improves retrieval performance. For example, the word "apple" can refer either to a fruit or to a computer brand (Figure 3b). Its intended meaning can be inferred from the other keywords that belong to the same topic.
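
Below is a minimal sketch of retrieval in a latent topic space. It assumes the topic-word weights have already been learned; here they are a hand-written, hypothetical `TOPIC_WORDS` table rather than a trained topic model. It also assumes that each candidate image is described by a bag of words, for example text found near an unannotated Web image. Each document is folded into a two-dimensional topic vector, and images are ranked by cosine similarity to the query. This is a simplification for illustration, not the method used in the system.

```python
from __future__ import annotations

import math

# Hypothetical topic-word weights P(w | topic). In the described system these
# would be learned from images whose associated text is annotated.
TOPIC_WORDS = {
    "fruit": {"apple": 0.3, "banana": 0.3, "orange": 0.3, "juice": 0.1},
    "computer": {"apple": 0.2, "dell": 0.2, "lenovo": 0.2, "ipad": 0.2, "iphone": 0.2},
}
TOPICS = list(TOPIC_WORDS)


def to_topic_vector(words: list[str]) -> list[float]:
    """Fold a bag of words into the low-dimensional topic space (weights sum to 1)."""
    scores = [sum(TOPIC_WORDS[t].get(w, 0.0) for w in words) for t in TOPICS]
    total = sum(scores)
    return [s / total for s in scores] if total else [0.0] * len(TOPICS)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_images(query: list[str], images: dict[str, list[str]]) -> list[str]:
    """Rank images (each given as a bag of words) by topic-space similarity to the query."""
    q = to_topic_vector(query)
    return sorted(images, key=lambda name: cosine(q, to_topic_vector(images[name])), reverse=True)


images = {"img_fruit": ["apple", "banana"], "img_phone": ["apple", "iphone"]}
print(rank_images(["apple", "ipad"], images))    # ['img_phone', 'img_fruit']
print(rank_images(["apple", "orange"], images))  # ['img_fruit', 'img_phone']
```

With "ipad" beside it, "apple" pulls the query toward the computer topic, so the phone image ranks first. With "orange" beside it, the fruit image ranks first. The comparison runs in a 2-dimensional topic space instead of the full vocabulary.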

Feedback is a popular human-computer interaction technique that is widely used in the analysis of text and visual information. When the user evaluates the system's output, the system can correct itself adaptively, and supervision obtained from user feedback has proved effective in practice. In our system, the user can select the correct topic from a candidate list produced by the automatic topic extraction algorithm. The selected topic is then used in the next round of topic extraction by modeling the temporal relationship between the current topic and the next one. For image retrieval, our system lists a number of retrieved sample images and invites the user to rate how relevant each image is to the conversation topic.
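
The temporal use of topic feedback can be sketched as a first-order transition model. The sketch rests on two assumptions that the text does not state. First, the extractor outputs a score for each candidate topic. Second, confirmed selections are counted as topic-to-topic transitions with additive smoothing. The class and method names are hypothetical, and the image-rating feedback is not modeled.

```python
from __future__ import annotations

from collections import defaultdict


class TopicFeedback:
    """Hypothetical sketch: learn topic-to-topic transitions from user selections."""

    def __init__(self, topics: list[str], smoothing: float = 1.0) -> None:
        self.topics = list(topics)
        self.smoothing = smoothing
        # counts[prev][nxt] = how often the user confirmed `nxt` right after `prev`
        self.counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, prev: str | None, chosen: str) -> None:
        """Store the user's confirmed topic as a transition from the previous topic."""
        if prev is not None:
            self.counts[prev][chosen] += 1

    def transition_prob(self, prev: str | None, nxt: str) -> float:
        """Additively smoothed estimate of P(next topic | current topic)."""
        if prev is None:
            return 1.0 / len(self.topics)  # no history yet: uniform prior
        row = self.counts[prev]
        total = sum(row.values())
        return (row[nxt] + self.smoothing) / (total + self.smoothing * len(self.topics))

    def rerank(self, prev: str | None, candidates: dict[str, float]) -> list[str]:
        """Order extractor candidates by (extractor score x transition prior), best first."""
        return sorted(
            candidates,
            key=lambda t: candidates[t] * self.transition_prob(prev, t),
            reverse=True,
        )


fb = TopicFeedback(["fruit", "computer", "travel"])
fb.record("fruit", "computer")
fb.record("fruit", "computer")
print(fb.rerank("fruit", {"computer": 0.4, "travel": 0.5}))  # ['computer', 'travel']
```

The extractor alone slightly prefers "travel". The user's two earlier confirmations, however, make "computer" the more likely successor of "fruit" (0.6 versus 0.2 after smoothing).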

(3). Multimodal Summary

Traditional instant messaging usually keeps chat records as text only. In our system, users can express their intentions multimodally, using images, videos and text. Because chat information is saved in multimodal form rather than as text alone, the resulting record is more vivid than before.

Text, image and video summarization is an active research topic in natural language processing and multimedia. A summary conveys the information of the original text (image or video) in condensed form, through a shorter and more concise text (image or video). Most current techniques build the summary from salient features, repeated patterns, or keywords (key frames). Our system contains a large amount of image and video content in addition to text. We therefore adopt a topic-driven summarization method, which analyzes the conversation between users to generate a summary of a specific topic. This summary contains the text, images and video content related to that topic.
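
The text does not specify the summarization algorithm, so the sketch below assumes the simplest topic-driven scheme. Each chat turn already carries the topic detected for it. The summary for a topic gathers the original text and translation of every matching turn, plus the de-duplicated images and videos attached to those turns. The `Message` fields and the `summarize` function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    """One chat turn; the field names are hypothetical."""
    sender: str
    text: str
    translation: str
    topic: str  # assumed to be filled in by the topic detector
    images: list[str] = field(default_factory=list)
    videos: list[str] = field(default_factory=list)


def _unique(items) -> list[str]:
    """Drop duplicates while keeping first-seen order."""
    seen: set[str] = set()
    out: list[str] = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def summarize(messages: list[Message], topic: str) -> dict[str, list[str]]:
    """Collect the text (with translation), images and videos of all turns on `topic`."""
    turns = [m for m in messages if m.topic == topic]
    return {
        "text": [f"{m.sender}: {m.text} ({m.translation})" for m in turns],
        "images": _unique(img for m in turns for img in m.images),
        "videos": _unique(v for m in turns for v in m.videos),
    }


chat = [
    Message("A", "I love apples", "我喜欢苹果", "fruit", images=["apple.jpg"]),
    Message("B", "iPad很好用", "The iPad is easy to use", "computer brand", images=["ipad.jpg"]),
    Message("B", "苹果很甜", "Apples are sweet", "fruit",
            images=["apple.jpg", "orchard.jpg"], videos=["harvest.mp4"]),
]
print(summarize(chat, "fruit")["images"])  # ['apple.jpg', 'orchard.jpg']
```

Keeping both the original sentence and its translation in each summary line matches the cross-language setting. The images and videos then illustrate the topic for readers of either language.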

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any transformation or replacement that a person familiar with the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (7)

1. A cross-language communication system based on multimodal assistance, characterized in that the system comprises a front-end interaction module, a data management module and a semantic association module, wherein:

the input of the front-end interaction module receives the text chat content entered by the user and preprocesses it to obtain the text information of the user's chat, and the processed text chat content is transmitted through the output of the module's front-end/back-end interaction component; the chat page of the front-end interaction module displays to the user the text of the conversation between the two parties, together with the multimedia images that the system recommends based on the content of their conversation;

the input of the semantic association module is connected to the output of the front-end interaction module; the semantic association module receives and analyzes the user's text chat content, uses natural language processing tools to extract the main content of the conversation between the two parties, obtains and outputs the text information together with its associated translated text and corresponding multimedia information, and generates a multimodal summary from the text chat content, the translated content and the corresponding multimedia information;

the input of the data management module is connected to the output of the semantic association module; the data management module stores the newly input text chat content, translated content and corresponding multimedia information, integrates the historical user information with the new user information, and generates and displays the full text of the conversation between the two chatting parties together with the multimedia image information recommended by the system based on the content of their conversation.

2. The cross-language communication system based on multimodal assistance according to claim 1, characterized in that, after the back-end semantic association module receives the text information sent by a user, the semantic association module integrates the results of Google Translate so that chat users of different languages can understand the meaning of the other party's words from the perspective of their own language; in this way, in addition to the original chat message, a Google Translate translation of the chat content is attached.

3. The cross-language communication system based on multimodal assistance according to claim 1, characterized in that the semantic association module uses the extracted main content of the conversation between the two parties as keywords and applies text-based image retrieval to retrieve the corresponding candidate image set from an image database.

4. A method for realizing cross-language communication using the cross-language communication system based on multimodal assistance according to claim 1, characterized in that the method is based on the users' chat dialogue and, using the results of analyzing the conversation content with text analysis techniques, provides users with multimedia elements that assist semantic understanding between users who face language barriers or differ in cultural background; the method comprises the following steps:

Step S1: the user first sends the text he or she wants to say to the other party through the front-end interface of the semantic chat; the front-end interface passes the text information of the chat to the back-end semantic association module through a front-end/back-end interaction module built with Ajax; a topic-based cross-modal analysis method is used to analyze the conversation content, and natural language processing tools automatically extract the central topic and keywords of the conversation;

Step S2: based on the central topic and keyword information of the conversation, the semantic association module uses text-based image retrieval to automatically retrieve image sets and video clips relevant to the conversation topic from a database or the Internet and provides them to both parties;

Step S3: the system generates a multimodal conversation summary from the text chat information of both parties and the corresponding images and video clips, ultimately enabling smooth semantic communication in multimedia form between users of different languages; in addition, the system can generate a multimodal conversation summary for both parties from their text chat history and the corresponding image and video content.

5. The method for realizing cross-language communication according to claim 4, characterized in that the multimodal conversation summary contains text, audio, image and video information and provides users with multimedia elements that assist semantic understanding between users who face language barriers or differ in cultural background.

6. The method for realizing cross-language communication according to claim 4, characterized in that the image and video clip content is either crawled automatically from the network by searching or obtained directly from a pre-annotated multimedia library.

7. The method for realizing cross-language communication according to claim 4, characterized in that the multimodal conversation summary is a topic-based summary, and the topic is detected using a relation network together with statistics of the co-occurrence frequencies, in a predefined corpus, of the words appearing in the conversation.
CN201110225342XA 2011-08-08 2011-08-08 System and method for realizing cross-language communication based on multi-mode assistance Pending CN102262624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110225342XA CN102262624A (en) 2011-08-08 2011-08-08 System and method for realizing cross-language communication based on multi-mode assistance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110225342XA CN102262624A (en) 2011-08-08 2011-08-08 System and method for realizing cross-language communication based on multi-mode assistance

Publications (1)

Publication Number Publication Date
CN102262624A (en) 2011-11-30

Family

ID=45009255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110225342XA Pending CN102262624A (en) 2011-08-08 2011-08-08 System and method for realizing cross-language communication based on multi-mode assistance

Country Status (1)

Country Link
CN (1) CN102262624A (en)

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567509A (en) * 2011-12-26 2012-07-11 中国科学院自动化研究所 Method and system for instant messaging with visual messaging assistance
CN102750366A (en) * 2012-06-18 2012-10-24 海信集团有限公司 Video search system and method based on natural interactive import and video search server
WO2013080214A1 (en) * 2011-12-02 2013-06-06 Hewlett-Packard Development Company, L.P. Topic extraction and video association
CN104536570A (en) * 2014-12-29 2015-04-22 广东小天才科技有限公司 Information processing method and device of smart watch
CN104679733A (en) * 2013-11-26 2015-06-03 中国移动通信集团公司 Voice conversation translation method, device and system
CN105260396A (en) * 2015-09-16 2016-01-20 百度在线网络技术(北京)有限公司 Word retrieval method and apparatus
CN105335343A (en) * 2014-07-25 2016-02-17 北京三星通信技术研究有限公司 Text editing method and device
CN105898627A (en) * 2016-05-31 2016-08-24 北京奇艺世纪科技有限公司 Video playing method and device
WO2016150083A1 (en) * 2015-03-24 2016-09-29 北京搜狗科技发展有限公司 Information input method and apparatus
CN106295565A (en) * 2016-08-10 2017-01-04 中用环保科技有限公司 Monitor event identifications based on big data and in real time method of crime prediction
WO2016197767A3 (en) * 2016-02-16 2017-02-02 中兴通讯股份有限公司 Method and device for inputting expression, terminal, and computer readable storage medium
CN106682967A (en) * 2017-01-05 2017-05-17 胡开标 Online translation and chat system
CN107480766A (en) * 2017-07-18 2017-12-15 北京光年无限科技有限公司 The method and system of the content generation of multi-modal virtual robot
CN107798386A (en) * 2016-09-01 2018-03-13 微软技术许可有限责任公司 More process synergics training based on unlabeled data
CN108027812A (en) * 2015-09-18 2018-05-11 迈克菲有限责任公司 System and method for multipath language translation
CN108173747A (en) * 2017-12-27 2018-06-15 上海传英信息技术有限公司 Information interacting method and device
CN108255939A (en) * 2017-12-08 2018-07-06 北京搜狗科技发展有限公司 A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN108369585A (en) * 2015-11-30 2018-08-03 三星电子株式会社 Method for providing translation service and its electronic device
CN108664336A (en) * 2017-04-01 2018-10-16 北京搜狗科技发展有限公司 Recommend method and apparatus, the device for recommendation
CN108874787A (en) * 2018-06-12 2018-11-23 深圳市合言信息科技有限公司 A method of analysis speech intention simultaneously carries out depth translation explanation
CN109255130A (en) * 2018-07-17 2019-01-22 北京赛思美科技术有限公司 A kind of method, system and the equipment of language translation and study based on artificial intelligence
CN109726265A (en) * 2018-12-13 2019-05-07 深圳壹账通智能科技有限公司 Information processing method, device and computer-readable storage medium for assisting chat
CN109817351A (en) * 2019-01-31 2019-05-28 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, equipment and storage medium
CN110209772A (en) * 2019-06-17 2019-09-06 科大讯飞股份有限公司 A kind of text handling method, device, equipment and readable storage medium storing program for executing
CN110706771A (en) * 2019-10-10 2020-01-17 复旦大学附属中山医院 Method, device, server and storage medium for generating multimodal patient teaching content
CN111651674A (en) * 2020-06-03 2020-09-11 北京妙医佳健康科技集团有限公司 Two-way search method, device and electronic device
CN112307156A (en) * 2019-07-26 2021-02-02 北京宝捷拿科技发展有限公司 Cross-language intelligent auxiliary side inspection method and system
CN113055275A (en) * 2016-08-30 2021-06-29 谷歌有限责任公司 Conditional disclosure of individually controlled content in a group context
CN113656613A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Method for training image-text retrieval model, multi-mode image retrieval method and device
WO2021233112A1 (en) * 2020-05-20 2021-11-25 腾讯科技(深圳)有限公司 Multimodal machine learning-based translation method, device, equipment, and storage medium
CN114629863A (en) * 2016-09-27 2022-06-14 微软技术许可有限责任公司 Control system using scoped search and dialog interfaces
CN114663246A (en) * 2022-05-24 2022-06-24 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method
CN115345407A (en) * 2021-05-13 2022-11-15 八维智能股份有限公司 Virtual assistant system for emergency dispatchers and method of operation thereof
CN116319691A (en) * 2022-09-06 2023-06-23 阿里巴巴(中国)有限公司 Auxiliary interaction method, conference auxiliary interaction method and conference auxiliary interaction device
CN116508315A (en) * 2020-09-03 2023-07-28 索尼互动娱乐股份有限公司 Multimode Gameplay Video Summary
CN117874265A (en) * 2023-12-19 2024-04-12 北京滴普科技有限公司 A complex data processing system and method based on large model
US12505673B2 (en) 2020-09-03 2025-12-23 Sony Interactive Entertainment Inc. Multimodal game video summarization with metadata

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101834809A (en) * 2010-05-18 2010-09-15 华中科技大学 Internet instant message communication system
CN101251855B (en) * 2008-03-27 2010-12-22 腾讯科技(深圳)有限公司 Equipment, system and method for cleaning internet web page
US20110153752A1 (en) * 2009-12-21 2011-06-23 International Business Machines Corporation Processing of Email Based on Semantic Relationship of Sender to Recipient

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINMING ZHANG, ET AL: "A visualized Communication System Using Cross-Media Semantic Association", 17th International Multimedia Modeling Conference, 7 January 2011 (2011-01-07), pages 88-98, XP019159534 *

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9645987B2 (en) 2011-12-02 2017-05-09 Hewlett Packard Enterprise Development Lp Topic extraction and video association
WO2013080214A1 (en) * 2011-12-02 2013-06-06 Hewlett-Packard Development Company, L.P. Topic extraction and video association
CN102567509B (en) * 2011-12-26 2014-08-27 中国科学院自动化研究所 Method and system for instant messaging with visual messaging assistance
CN102567509A (en) * 2011-12-26 2012-07-11 中国科学院自动化研究所 Method and system for instant messaging with visual messaging assistance
CN102750366A (en) * 2012-06-18 2012-10-24 海信集团有限公司 Video search system and method based on natural interactive import and video search server
CN104679733A (en) * 2013-11-26 2015-06-03 中国移动通信集团公司 Voice conversation translation method, device and system
US10878180B2 (en) 2014-07-25 2020-12-29 Samsung Electronics Co., Ltd Text editing method and electronic device supporting same
CN105335343A (en) * 2014-07-25 2016-02-17 北京三星通信技术研究有限公司 Text editing method and device
US11790156B2 (en) 2014-07-25 2023-10-17 Samsung Electronics Co., Ltd. Text editing method and electronic device supporting same
CN104536570A (en) * 2014-12-29 2015-04-22 广东小天才科技有限公司 Information processing method and device of smart watch
WO2016150083A1 (en) * 2015-03-24 2016-09-29 北京搜狗科技发展有限公司 Information input method and apparatus
US10628524B2 (en) 2015-03-24 2020-04-21 Beijing Sogou Technology Development Co., Ltd. Information input method and device
CN105260396B (en) * 2015-09-16 2019-09-03 百度在线网络技术(北京)有限公司 Word retrieval method and device
CN105260396A (en) * 2015-09-16 2016-01-20 百度在线网络技术(北京)有限公司 Word retrieval method and apparatus
CN108027812A (en) * 2015-09-18 2018-05-11 迈克菲有限责任公司 System and method for multipath language translation
CN108369585A (en) * 2015-11-30 2018-08-03 三星电子株式会社 Method for providing translation service and its electronic device
CN108369585B (en) * 2015-11-30 2022-07-08 三星电子株式会社 Method for providing translation service and electronic device thereof
WO2016197767A3 (en) * 2016-02-16 2017-02-02 中兴通讯股份有限公司 Method and device for inputting expression, terminal, and computer readable storage medium
CN105898627B (en) * 2016-05-31 2019-04-12 北京奇艺世纪科技有限公司 A kind of video broadcasting method and device
CN105898627A (en) * 2016-05-31 2016-08-24 北京奇艺世纪科技有限公司 Video playing method and device
CN106295565A (en) * 2016-08-10 2017-01-04 中用环保科技有限公司 Monitor event identifications based on big data and in real time method of crime prediction
CN113055275B (en) * 2016-08-30 2022-08-02 谷歌有限责任公司 Conditional disclosure of individually controlled content in a group context
CN113055275A (en) * 2016-08-30 2021-06-29 谷歌有限责任公司 Conditional disclosure of individually controlled content in a group context
CN107798386A (en) * 2016-09-01 2018-03-13 微软技术许可有限责任公司 More process synergics training based on unlabeled data
CN114629863B (en) * 2016-09-27 2024-11-22 微软技术许可有限责任公司 Control systems using scoped search and dialog interfaces
CN114629863A (en) * 2016-09-27 2022-06-14 微软技术许可有限责任公司 Control system using scoped search and dialog interfaces
CN106682967A (en) * 2017-01-05 2017-05-17 胡开标 Online translation and chat system
CN108664336A (en) * 2017-04-01 2018-10-16 北京搜狗科技发展有限公司 Recommend method and apparatus, the device for recommendation
CN107480766A (en) * 2017-07-18 2017-12-15 北京光年无限科技有限公司 The method and system of the content generation of multi-modal virtual robot
WO2019109664A1 (en) * 2017-12-08 2019-06-13 北京搜狗科技发展有限公司 Cross-language search method and apparatus, and apparatus for cross-language search
CN108255939A (en) * 2017-12-08 2018-07-06 北京搜狗科技发展有限公司 A kind of cross-language search method and apparatus, a kind of device for cross-language search
CN108255939B (en) * 2017-12-08 2020-02-14 北京搜狗科技发展有限公司 Cross-language search method and device for cross-language search
CN108173747B (en) * 2017-12-27 2021-10-22 上海传英信息技术有限公司 Information interaction method and device
CN108173747A (en) * 2017-12-27 2018-06-15 上海传英信息技术有限公司 Information interacting method and device
CN108874787A (en) * 2018-06-12 2018-11-23 深圳市合言信息科技有限公司 A method of analysis speech intention simultaneously carries out depth translation explanation
CN109255130A (en) * 2018-07-17 2019-01-22 北京赛思美科技术有限公司 A kind of method, system and the equipment of language translation and study based on artificial intelligence
CN109726265A (en) * 2018-12-13 2019-05-07 深圳壹账通智能科技有限公司 Information processing method, device and computer-readable storage medium for assisting chat
CN109817351A (en) * 2019-01-31 2019-05-28 百度在线网络技术(北京)有限公司 A kind of information recommendation method, device, equipment and storage medium
CN110209772B (en) * 2019-06-17 2021-10-08 科大讯飞股份有限公司 Text processing method, device and equipment and readable storage medium
CN110209772A (en) * 2019-06-17 2019-09-06 科大讯飞股份有限公司 A kind of text handling method, device, equipment and readable storage medium storing program for executing
CN112307156A (en) * 2019-07-26 2021-02-02 北京宝捷拿科技发展有限公司 Cross-language intelligent auxiliary side inspection method and system
CN110706771A (en) * 2019-10-10 2020-01-17 复旦大学附属中山医院 Method, device, server and storage medium for generating multimodal patient teaching content
WO2021233112A1 (en) * 2020-05-20 2021-11-25 腾讯科技(深圳)有限公司 Multimodal machine learning-based translation method, device, equipment, and storage medium
CN111651674B (en) * 2020-06-03 2023-08-25 北京妙医佳健康科技集团有限公司 Bidirectional searching method and device and electronic equipment
CN111651674A (en) * 2020-06-03 2020-09-11 北京妙医佳健康科技集团有限公司 Two-way search method, device and electronic device
US12505673B2 (en) 2020-09-03 2025-12-23 Sony Interactive Entertainment Inc. Multimodal game video summarization with metadata
CN116508315A (en) * 2020-09-03 2023-07-28 索尼互动娱乐股份有限公司 Multimode Gameplay Video Summary
CN115345407A (en) * 2021-05-13 2022-11-15 八维智能股份有限公司 Virtual assistant system for emergency dispatchers and method of operation thereof
CN113656613A (en) * 2021-08-20 2021-11-16 北京百度网讯科技有限公司 Method for training image-text retrieval model, multi-mode image retrieval method and device
CN114663246A (en) * 2022-05-24 2022-06-24 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method
CN116319691A (en) * 2022-09-06 2023-06-23 阿里巴巴(中国)有限公司 Auxiliary interaction method, conference auxiliary interaction method and conference auxiliary interaction device
CN117874265A (en) * 2023-12-19 2024-04-12 北京滴普科技有限公司 A complex data processing system and method based on large model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20111130