CN115827881A - Multi-mode tourism information positioning type retrieval method based on tourism knowledge map - Google Patents
- Publication number
- CN115827881A (application CN202111088382.4A)
- Authority
- CN
- China
- Prior art keywords
- entity
- entities
- travel
- data
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A multi-modal tourism information positioning retrieval method based on a tourism knowledge graph. A weighted tourism knowledge graph is constructed from the multi-modal data in a mixed database of illustrated travel notes and travel videos, and during construction and updating, the semantic position indexes from entities and inter-entity relations to the data sources are saved. When a user performs a text search, search entities and the relations between them are extracted from the text and mapped to a subgraph of the knowledge graph; after the subgraph is enhanced and expanded, retrieval results are returned according to the corresponding indexes. The results returned for the query text are likewise multi-modal and point to the semantically corresponding positions: for travel-note data in the database, the text and pictures corresponding to the enhanced subgraph and the travel notes containing them are returned; for travel video data in the database, the video segments corresponding to the enhanced subgraph and the whole videos are returned. The invention solves the problems that multi-modal data is difficult to manage effectively and that tourism data retrieval is difficult to locate to the target semantic unit.
Description
Technical Field
The invention belongs to the field of multimedia computing, relates to the semantic analysis of text, pictures and videos, and specifically to a multi-modal tourism information positioning retrieval method based on a tourism knowledge graph.
Background
A knowledge graph can describe the concepts, entities and events of the objective world and the relations between them, thereby supporting applications such as information retrieval and intelligent question answering. Tourism big data has complex sources, huge volume and diverse modalities, and is difficult to acquire and manage effectively, so multi-modal tourism data is hard to retrieve accurately. Building a multi-modal knowledge graph from multi-modal tourism big data can effectively enhance the ability to manage and exploit tourism data.
Most current tourism information retrieval applications rely on text lookup and tag matching, and struggle to give accurate results when the retrieval requirements are complex. A tourism knowledge graph can support more complex retrieval requirements, but existing tourism knowledge graphs are single-modality, built mostly from text data with little image or video data. Meanwhile, with the development of mobile terminals, today's Internet is flooded with massive amounts of picture and video data: people take large numbers of pictures and videos while traveling, and illustrated travel notes and travel vlogs have become popular ways of sharing travel experiences. Moreover, text-based retrieval can hardly locate the semantically corresponding position in unlabeled pictures and videos, so users still have to screen and locate results manually a second time, making retrieval laborious. Therefore, traditional retrieval methods and single-modality tourism knowledge graphs cannot support positioning retrieval over today's multi-modal tourism big data.
Summary of the Invention
The problem to be solved by the invention is the retrieval and positioning of multi-modal tourism information: by constructing a multi-modal tourism knowledge graph, the retrieval of multi-modal tourism big data is given semantic positioning, yielding retrieval results that better match the query.
The technical solution of the invention is a multi-modal tourism information positioning retrieval method based on a tourism knowledge graph. From a multi-modal tourism database mixing illustrated travel-note data and travel video data, a weighted tourism knowledge graph is constructed, and during construction and updating the semantic position indexes from entities and inter-entity relations to the data sources are saved. When the user performs a text search, search entities and the relations between them are extracted from the text and mapped to a subgraph of the knowledge graph; after the subgraph is enhanced and expanded, retrieval results are returned according to the corresponding indexes, the returned results being the multi-modal data corresponding to the subgraph in the multi-modal tourism database.
As a preferred embodiment, the weighted tourism knowledge graph is constructed as follows:
1) Build an ontology from vertical tourism websites and define the entity types, including city, scenic spot, place, time, activity, and other entities;
2) Acquire multi-modal data from vertical tourism websites and video websites as the multi-modal tourism database, including semi-structured city and scenic-spot data and unstructured travel-note data from the vertical tourism websites, and unstructured travel videos from the video websites;
3) Preprocess the multi-modal data: perform word segmentation, part-of-speech analysis and dependency analysis on the text in the travel-note data; perform object recognition on the pictures in the travel-note data; perform object tracking and scene text recognition on the videos; and perform word segmentation, part-of-speech analysis and dependency analysis on the scene text;
4) Extract semantic entities, in combination with the semi-structured data, from the analyzed travel-note text, the scene text recognized in the videos, the objects recognized in the travel-note pictures, and the objects tracked in the videos;
5) Mine the relations between the extracted entities to form the knowledge graph, the weight of a relation between entities being computed as:
w(h, r, t) = P((r, t) | h),
where w(h, r, t) is the weight of the relation (h, r, t) between head entity h and tail entity t, and P((r, t) | h) is the probability of an event whose relation is r and whose tail entity is t occurring, given that head entity h occurs.
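The conditional probability P((r, t) | h) can be estimated directly from co-occurrence counts of the extracted triples. A minimal sketch of this estimation (the sample triples and relation names are illustrative, not taken from the patent):

```python
from collections import Counter

def relation_weights(triples):
    """Estimate w(h, r, t) = P((r, t) | h) from extracted (h, r, t) triples:
    the count of (h, r, t) divided by the number of triples headed by h."""
    head_counts = Counter(h for h, _, _ in triples)
    triple_counts = Counter(triples)
    return {
        (h, r, t): n / head_counts[h]
        for (h, r, t), n in triple_counts.items()
    }

triples = [
    ("West Lake", "belongs_to", "Hangzhou"),
    ("West Lake", "belongs_to", "Hangzhou"),
    ("West Lake", "near", "Leifeng Pagoda"),
    ("boating", "occurs_at", "West Lake"),
]
weights = relation_weights(triples)
# w("West Lake", "belongs_to", "Hangzhou") = 2/3, since two of the three
# triples headed by "West Lake" carry that (relation, tail) pair.
```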
As a preferred embodiment, the retrieval method is as follows:
1) Construct a weighted tourism knowledge graph from the multi-modal tourism data;
2) During construction of the tourism knowledge graph, save the semantic-unit position indexes of the data sources corresponding to the entities and entity relations in the graph. The semantic position of an entity in a travel-note text source is represented as <document id, chapter id, section id, paragraph id, sentence id, word id>; of an entity in a travel-note picture source as <document id, chapter id, section id, paragraph id, picture-sentence id, bounding box>; of an entity in the images of a video source as <video id, shot id, 0, 0, frame id, bounding box>; and of an entity in the text recognized from a video source as <video id, shot id, 0, 0, sentence id, word id>. The semantic position of an entity relation in a data source is represented as <head entity position, tail entity position>;
3) Extract entities and entity relations from the input query text;
4) Map the entities and entity relations obtained in step 3) onto the knowledge graph constructed in step 1) to obtain one of its subgraphs;
5) For the subgraph obtained in step 4), expand each entity along its entity relations to associated entities according to the configured expansion threshold, and add the expanded entities and entity relations to the subgraph to obtain the enhanced subgraph;
6) Return the retrieved data according to the semantic positions in the source data corresponding to the entities and entity relations of the enhanced subgraph of step 5).
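The six-field position tuples of step 2) share one shape across all four source types. A sketch of how they might be represented uniformly (the field names and sample values are illustrative assumptions, not from the patent):

```python
from dataclasses import dataclass
from typing import Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height) bounding box

@dataclass(frozen=True)
class EntityPosition:
    """Six-field semantic position of an entity in a data source.
    Travel-note text:  (doc id, chapter, section, paragraph, sentence, word)
    Travel-note image: (doc id, chapter, section, paragraph, pic-sentence, bbox)
    Video frame:       (video id, shot, 0, 0, frame, bbox)
    Video text:        (video id, shot, 0, 0, sentence, word)"""
    source_id: str
    level1: int
    level2: int
    level3: int
    level4: int
    locator: object  # a word index or a BBox, depending on the source type

@dataclass(frozen=True)
class RelationPosition:
    """Relation position = <head entity position, tail entity position>."""
    head: EntityPosition
    tail: EntityPosition

pos = EntityPosition("video_042", 3, 0, 0, 75, (120, 40, 200, 160))
rel = RelationPosition(pos, EntityPosition("video_042", 3, 0, 0, 80, 5))
```

A frozen dataclass keeps positions hashable, so they can serve directly as keys in the index from graph elements to source locations.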
The results the invention returns for a query text are likewise multi-modal and point to the semantically corresponding positions. For travel-note data in the database, the text and pictures corresponding to the enhanced subgraph and the travel notes containing them are returned; for travel video data in the database, the video segments corresponding to the enhanced subgraph and the whole videos are returned.
Further, the invention realizes the construction of a weighted multi-modal tourism knowledge graph and query-to-subgraph mapping retrieval. By building a weighted multi-modal knowledge graph over multi-modal tourism big data, the invention provides a solution for the positioning retrieval of multi-modal tourism information: a retrieval subgraph is built from the query text and mapped to a knowledge-graph subgraph, and based on the semantic position indexes from the entities and entity relations of that subgraph to the source data, the source data matching the retrieval requirements and their corresponding semantic positions are returned.
The invention first constructs a weighted multi-modal tourism knowledge graph using text analysis, picture object recognition, video scene text recognition and video object tracking. Unlike a single-text-modality tourism knowledge graph, this graph lets the knowledge extracted from images and videos and the knowledge extracted from text complement and constrain each other, providing richer and more accurate entities and entity relations. The constructed weighted multi-modal tourism knowledge graph supports the positioning retrieval of multi-modal tourism information, effectively solving the problem that traditional text and tag retrieval cannot support complex semantic retrieval requirements, as well as the problem that a single-text-modality knowledge graph cannot semantically retrieve pictures and videos. At the same time, the positioning retrieval helps users find more precisely located targets without manually searching through the returned data again; for long video sources in particular, the reduction in manual effort is even more pronounced.
The benefit of the invention is that it provides a solution for the positioning search of multi-modal tourism information. By constructing a weighted multi-modal tourism knowledge graph, it enhances the ability to retrieve multi-modal tourism big data under complex semantic requirements; through the knowledge graph's semantic position indexes into the data sources, it can return more precise retrieval results, reducing the cost of manual secondary search and comprehension, with good generality and practicality.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the retrieval principle of the invention.
Fig. 2 shows the construction process of the multi-modal tourism knowledge graph of the invention.
Detailed Description
The invention proposes a multi-modal tourism information positioning retrieval method based on a tourism knowledge graph, whose principle is shown in Fig. 1. A multi-modal tourism knowledge graph is constructed from the multi-modal data in a mixed database of illustrated travel-note data and travel video data, and during construction and updating the semantic position indexes from entities and inter-entity relations to the data sources are saved. When a user performs a text search, search entities and the relations between them are extracted from the text and mapped to a subgraph of the knowledge graph; after the subgraph is enhanced and expanded, retrieval results are returned according to the corresponding indexes. The results returned for the query text are likewise multi-modal and point to the semantically corresponding positions. For travel-note data in the database, the text and pictures corresponding to the enhanced subgraph and the travel notes containing them are returned; for travel video data in the database, the video segments corresponding to the enhanced subgraph and the whole videos are returned.
The implementation of the weighted multi-modal tourism knowledge graph construction and of the query-to-subgraph mapping retrieval comprises:
1) As shown in Fig. 2, construct a weighted tourism knowledge graph from the multi-modal tourism data;
1.1) Segment the videos into shots using the shot-segmentation tool ShotDetect;
1.2) Sample a frame every 0.5 seconds from each shot of step 1.1), and recognize the scene text in the frames using the text-recognition tool PaddleOCR;
1.3) Deduplicate the text recognized in step 1.2) within each shot, and save it per shot;
1.4) Perform multi-class multi-object tracking on the videos using the CenterTrack tracker;
1.5) From the tracking results of step 1.4), save the object class and object bounding box of every frame;
1.6) Perform object recognition on the travel-note pictures using Mask R-CNN;
1.7) From the recognition results of step 1.6), save the class and bounding box of every object;
1.8) Split the text of each section of each chapter of the travel notes into sentences;
1.9) Segment the sentence-splitting results of step 1.8) into words;
1.10) Perform part-of-speech analysis on the word-segmentation results of step 1.9);
1.11) Perform named entity recognition on the word-segmentation results of step 1.9);
1.12) Perform dependency parsing on the word-segmentation results of step 1.9);
1.13) Split the text of each video shot into sentences;
1.14) Segment the sentence-splitting results of step 1.13) into words;
1.15) Perform part-of-speech analysis on the word-segmentation results of step 1.14);
1.16) Perform named entity recognition on the word-segmentation results of step 1.14);
1.17) Perform dependency parsing on the word-segmentation results of step 1.14);
1.18) Build a mapping between cities and their corresponding scenic spots;
1.19) Take the pictures and the sentences of the travel-note text as semantic units, in the order in which they appear in the travel notes;
1.20) From the place-name named entities in each sentence of step 1.19), extract those that correspond to the city-to-scenic-spot mapping as city entities and scenic-spot entities, and record them as the nearest city or nearest scenic spot;
1.21) From the place-name named entities in each sentence of step 1.19), extract those that do not correspond to the city-to-scenic-spot mapping as place entities;
1.22) From each sentence of step 1.19), extract combinations of adjacent time words as time entities;
1.23) From each sentence of step 1.19), extract verbs as activity entities;
1.24) From each sentence of step 1.19), extract the non-place nouns that have dependency relations with verbs or prepositions, together with the objects in the pictures, as other entities;
1.25) Take the video shots and the sentences of the text recognized from the videos as semantic units, in shot time order;
1.26) From the place-name named entities in each sentence of step 1.25), extract those that correspond to the city-to-scenic-spot mapping as city entities and scenic-spot entities, and record them as the nearest city or nearest scenic spot;
1.27) From the place-name named entities in each sentence of step 1.25), extract those that do not correspond to the city-to-scenic-spot mapping as place entities;
1.28) From each sentence of step 1.25), extract combinations of adjacent time words as time entities;
1.29) From each sentence of step 1.25), extract verbs as activity entities;
1.30) From each sentence of step 1.25), extract the non-place nouns that have dependency relations with verbs or prepositions, together with the objects tracked in the shot, as other entities;
1.31) Compute the Levenshtein ratio over the entities extracted in steps 1.19) through 1.30) and merge entities of the same class.
1.32) Build belongs-to relations between the extracted scenic-spot entities and their nearest city entities;
1.33) Build belongs-to relations between the extracted place entities and their nearest city entities;
1.34) Build belongs-to relations between the extracted place entities and their nearest scenic-spot entities;
1.35) Build occurs-at relations between the extracted activity entities and their nearest scenic-spot entities;
1.36) Build occurs-at relations between the extracted activity entities and place entities according to the dependency relations;
1.37) Build occurs-when relations between the extracted activity entities and time entities according to the dependency relations;
1.38) Build occurs-when relations between the extracted activity entities and time entities according to the dependency relations;
1.39) Build location-proximity relations between the extracted scenic-spot entities and place entities according to keywords and dependency relations;
1.40) Build arrive-by and depart-by relations between the other extracted entities and the place, scenic-spot and city entities according to keywords and dependency relations;
1.41) Build belongs-to relations among the other extracted entities according to dependency relations or semantic order;
1.42) For the relations between entities extracted in steps 1.32) through 1.41), the relation weight is computed as:
w(h, r, t) = P((r, t) | h),
where w(h, r, t) is the weight of the relation (h, r, t) between head entity h and tail entity t, and P((r, t) | h) is the probability of an event whose relation is r and whose tail entity is t occurring, given that head entity h occurs.
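Step 1.31's merging by Levenshtein ratio can be sketched with the standard-library `difflib.SequenceMatcher`, whose `ratio()` gives a comparable 0-to-1 similarity score (the 0.85 threshold and the sample entity mentions are illustrative assumptions, not values from the patent):

```python
from difflib import SequenceMatcher

def merge_entities(entities, threshold=0.85):
    """Merge same-class entity mentions whose name similarity (a Levenshtein-
    ratio-style score from difflib) reaches the threshold. Each entity is a
    (name, category) pair; the first mention seen becomes the canonical name."""
    canonical = []  # (name, category) pairs kept as canonical entities
    mapping = {}    # (mention, category) -> canonical name
    for name, cat in entities:
        for canon_name, canon_cat in canonical:
            if cat == canon_cat and \
                    SequenceMatcher(None, name, canon_name).ratio() >= threshold:
                mapping[(name, cat)] = canon_name
                break
        else:
            canonical.append((name, cat))
            mapping[(name, cat)] = name
    return mapping

mentions = [("West Lake", "scenic_spot"), ("West  Lake", "scenic_spot"),
            ("Hangzhou", "city")]
merged = merge_entities(mentions)
# "West  Lake" (extraction noise) collapses onto "West Lake";
# "Hangzhou" stays, since merging only happens within the same class.
```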
2) During the construction of the tourism knowledge graph in step 1), save the semantic-unit position indexes of the source data corresponding to the entities and entity relations in the graph. The semantic position of an entity in a travel-note text source is represented as <document id, chapter id, section id, paragraph id, sentence id, word id>; of an entity in a travel-note picture source as <document id, chapter id, section id, paragraph id, picture-sentence id, bounding box>; of an entity in the images of a video source as <video id, shot id, 0, 0, frame id, bounding box>; and of an entity in the text recognized from a video source as <video id, shot id, 0, 0, sentence id, word id>. The semantic position of an entity relation in a data source is represented as <head entity position, tail entity position>;
3) Extract entities and entity relations from the given query text:
3.1) Perform text recognition and analysis on the query text;
3.2) From the analyzed data obtained in step 3.1), extract semantic entities according to part of speech and syntactic dependency relations;
3.3) From the analyzed data obtained in step 3.1), extract the relations between the entities according to the syntactic dependency relations.
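Steps 3.2) and 3.3) can be sketched over the output of any dependency parser. The sketch below assumes each token carries a part of speech, a dependency label, and the index of its head word; the token tuples are a hand-made illustration, not real parser output:

```python
def extract_query_graph(tokens):
    """tokens: list of (text, pos, dep, head_index) tuples.
    Entities are the nouns/proper nouns; a relation (subject, verb, object)
    is built for every verb that governs both a nominal subject and an object."""
    entities = [t for t, pos, _, _ in tokens if pos in ("NOUN", "PROPN")]
    relations = []
    for i, (text, pos, _, _) in enumerate(tokens):
        if pos != "VERB":
            continue
        subj = next((t for t, _, dep, h in tokens if dep == "nsubj" and h == i), None)
        obj = next((t for t, _, dep, h in tokens if dep == "dobj" and h == i), None)
        if subj and obj:
            relations.append((subj, text, obj))
    return entities, relations

# Query "tourists visit West-Lake", parsed by hand:
tokens = [("tourists", "NOUN", "nsubj", 1),
          ("visit", "VERB", "ROOT", 1),
          ("West-Lake", "PROPN", "dobj", 1)]
entities, relations = extract_query_graph(tokens)
# entities -> ["tourists", "West-Lake"]
# relations -> [("tourists", "visit", "West-Lake")]
```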
4) Map the query-text entities and entity relations obtained in step 3) to a subgraph of the knowledge graph constructed in step 1):
4.1) Build a retrieval subgraph from the entities and entity relations in the query text;
4.2) Map the retrieval subgraph built in step 4.1) onto a retrieval subgraph of the weighted tourism knowledge graph built in step 1).
5) For the mapped retrieval subgraph obtained in step 4), expand each entity along its entity relations to associated entities according to the threshold, and add the expanded entities and entity relations to the subgraph to obtain the enhanced subgraph:
5.1) For an edge entity h the subgraph is not extended for the moment; for an edge entity relation (h, r, t), entity t is taken as an edge entity, so that the subgraph now contains only edge entities and no edge entity relations;
5.2) For the subgraph obtained in step 5.1): for an edge entity h and a non-subgraph entity t1, if a non-subgraph relation (h, r1, t1) exists and w(h, r1, t1) is greater than or equal to the threshold α, add the non-subgraph entity t1 and the non-subgraph relation (h, r1, t1) as an extended-subgraph entity and an extended-subgraph entity relation; likewise, if a non-subgraph relation (t1, r1, h) exists and w(t1, r1, h) is greater than or equal to the threshold α, add the non-subgraph entity t1 and the non-subgraph relation (t1, r1, h) as an extended-subgraph entity and an extended-subgraph entity relation;
5.3) For the extended subgraph obtained in step 5.2): for an original edge entity h, an extended entity t1 and a non-subgraph entity t2, if a non-subgraph relation (t1, r2, t2) exists and the weight product w(h, r1, t1)·w(t1, r2, t2) is greater than or equal to the threshold α, add the non-subgraph entity t2 and the non-subgraph relation (t1, r2, t2) as an extended-subgraph entity and an extended-subgraph entity relation; likewise, if a non-subgraph relation (t2, r2, t1) exists and the corresponding weight product is greater than or equal to the threshold α, add t2 and (t2, r2, t1) as an extended-subgraph entity and an extended-subgraph entity relation. This continues until the weight products to all remaining non-subgraph entities are less than the threshold α, at which point the expansion of the original edge entity h ends;
5.4) When the expansion of all edge entities has ended, the expansion of the subgraph is complete.
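The expansion of steps 5.1) through 5.4) amounts to a walk from each edge entity that multiplies relation weights along the path and stops once the product drops below α. A minimal sketch over an adjacency-list graph (the graph data, weights, and α value are illustrative, not from the patent):

```python
def expand_subgraph(graph, edge_entities, alpha):
    """graph: dict mapping entity -> list of (relation, neighbor, weight).
    Starting from every edge entity, follow relations while the product of
    weights along the path stays >= alpha; collect the visited entities and
    relations as the enhancement of the subgraph."""
    entities, relations = set(edge_entities), set()
    stack = [(e, 1.0) for e in edge_entities]  # (entity, weight product so far)
    while stack:
        node, product = stack.pop()
        for rel, nbr, w in graph.get(node, []):
            p = product * w
            if p >= alpha and (node, rel, nbr) not in relations:
                relations.add((node, rel, nbr))
                entities.add(nbr)
                stack.append((nbr, p))
    return entities, relations

graph = {
    "West Lake": [("near", "Leifeng Pagoda", 0.8), ("belongs_to", "Hangzhou", 0.9)],
    "Leifeng Pagoda": [("near", "Broken Bridge", 0.5)],
}
ents, rels = expand_subgraph(graph, {"West Lake"}, alpha=0.6)
# "Leifeng Pagoda" and "Hangzhou" are added (weights 0.8 and 0.9 >= 0.6), but
# "Broken Bridge" is not: the path product 0.8 * 0.5 = 0.4 falls below alpha.
```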
6) According to the entities and entity relations in the enhanced subgraph of step 5), query the corresponding semantic positions of the source data in the multi-modal tourism database and return the retrieved data, specifically:
6.1) For the entities and entity relations in the enhanced subgraph, obtain the source-data mapping indexes;
6.2) Query the multi-modal tourism database according to the mapping indexes; for travel-note data in the database, return the text and pictures corresponding to the indexes and the travel notes containing them; for travel video data in the database, return the video segments corresponding to the indexes and the whole videos.
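For a video source, the stored <video id, shot id, ...> index must be resolved back to a playable time range before it can be returned as a clip. A sketch of this resolution, assuming a per-video table of shot boundaries in seconds (the boundary data and field layout are illustrative):

```python
def resolve_video_result(position, shot_bounds):
    """position: a (video_id, shot_id, _, _, frame_or_sentence, locator) tuple.
    shot_bounds: dict video_id -> list of (start_sec, end_sec) per shot.
    Returns the clip time range of the indexed shot plus the whole-video range,
    mirroring the 'video segment and whole video' result of step 6.2)."""
    video_id, shot_id = position[0], position[1]
    start, end = shot_bounds[video_id][shot_id]
    whole = (shot_bounds[video_id][0][0], shot_bounds[video_id][-1][1])
    return {"video": video_id, "clip": (start, end), "whole": whole}

shot_bounds = {"video_042": [(0.0, 12.5), (12.5, 30.0), (30.0, 55.0)]}
result = resolve_video_result(("video_042", 1, 0, 0, 75, None), shot_bounds)
# Shot 1 of video_042 spans 12.5-30.0 s; the whole video spans 0.0-55.0 s.
```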
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111088382.4A CN115827881A (en) | 2021-09-16 | 2021-09-16 | Multi-mode tourism information positioning type retrieval method based on tourism knowledge map |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115827881A true CN115827881A (en) | 2023-03-21 |
Family
ID=85515088
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111088382.4A Pending CN115827881A (en) | 2021-09-16 | 2021-09-16 | Multi-mode tourism information positioning type retrieval method based on tourism knowledge map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115827881A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116821692A (en) * | 2023-08-28 | 2023-09-29 | 北京化工大学 | Method, device and storage medium for constructing descriptive text and space scene sample set |
CN118278420A (en) * | 2024-04-18 | 2024-07-02 | 江苏微盛网络科技有限公司 | Cloud computing-based enterprise session data storage analysis method and system |
CN118569366A (en) * | 2024-05-28 | 2024-08-30 | 中国科学院地理科学与资源研究所 | Tourism resource monomer combination method, device, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||