CN118467851B

CN118467851B - Artificial intelligence data searching and distributing method and system

Info

Publication number: CN118467851B
Application number: CN202410939629.6A
Authority: CN
Inventors: 徐杭; 蒙婕; 陈钢; 任军
Original assignee: Beijing Honeycomb Technology Co ltd
Current assignee: Beijing Honeycomb Technology Co ltd
Priority date: 2024-07-15
Filing date: 2024-07-15
Publication date: 2024-10-25
Anticipated expiration: 2044-07-15
Also published as: CN118467851A

Abstract

The present invention provides an artificial intelligence data search and distribution method and system, which relates to the field of artificial intelligence technology, including obtaining a search request input by a user, semantically expanding the search request to obtain a target expanded search keyword, performing semantic search according to the target expanded search keyword and a pre-built multimodal semantic index to obtain a preliminary search result; extracting and fusing features of the preliminary search results through a multimodal fusion neural network to obtain a multimodal fusion feature vector, using an attention mechanism to adjust weights, calculating the correlation scores of each data in the preliminary search results and the user's search intent, sorting the preliminary search results according to the correlation scores to obtain ordered search results; inputting the ordered search results into a personalized recommendation system to generate a final recommendation result, presenting and adapting the final recommendation result based on a terminal device and a user portrait, determining the distribution content, and pushing it to the terminal device.

Description

Artificial intelligence data searching and distributing method and system

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an artificial intelligence data searching and distributing method and system.

Background

Internet users increasingly depend on search engines and recommendation systems. They want to obtain the information they need quickly and accurately, and they expect the system to recommend personalized content that matches their interests and behavior habits. However, with the explosive growth in the volume of information, efficiently extracting relevant information from massive data and distributing it accurately according to each user's specific needs and preferences has become a technical problem to be solved.

Traditional search engines rely primarily on keyword matching and rule-based ranking algorithms, which makes it difficult to fully understand users' semantic needs and interest preferences and results in poor search relevance and low user satisfaction. Traditional recommendation systems, meanwhile, usually depend on collaborative filtering or content-based recommendation. Although these methods improve recommendation quality to a certain extent, they still face problems such as data sparsity and cold start, and they struggle to support real-time optimization and dynamic adjustment. With its rapid development, artificial intelligence technology shows great potential in search and recommendation: it can improve the accuracy of search and recommendation, provide a richer semantic background, enhance the system's understanding capability, and dynamically adjust the recommendation strategy through continuous user interaction, thereby improving user satisfaction and the long-term benefit of the system.

In summary, many challenges remain in practical applications. Multi-modal data needs to be fused effectively to improve the diversity and coverage of search and recommendation results; semantic expansion using a knowledge graph is needed to improve the understanding of search requests and the recall of relevant results; and user portraits and recommendation models need to be updated dynamically through real-time feedback and incremental learning to provide more accurate personalized recommendations.

Disclosure of Invention

The embodiments of the invention provide an artificial intelligence data searching and distributing method and system, which can solve the problems in the prior art.

In a first aspect of an embodiment of the present invention,

Provided is an artificial intelligence data searching and distributing method, comprising:

Obtaining a search request input by a user, carrying out semantic expansion on the search request based on a pre-constructed comprehensive knowledge graph to obtain a target expanded search keyword, and carrying out semantic search on structured data and unstructured data according to the target expanded search keyword and a pre-constructed multi-mode semantic index to obtain a preliminary search result;

Performing feature extraction and fusion on the preliminary search results through a multi-modal fusion neural network to obtain multi-modal fusion feature vectors, performing weight adjustment on the multi-modal fusion feature vectors through an attention mechanism, calculating relevance scores of each data in the preliminary search results and search intentions of users according to the adjusted multi-modal fusion feature vectors, and sequencing the preliminary search results according to the relevance scores to obtain ordered search results;

The ordered search results are input into a personalized recommendation system, a final recommendation result is generated, presentation adaptation is carried out on the final recommendation result based on terminal equipment and user portraits, distribution content is determined, and the distribution content is pushed to the terminal equipment.

In an alternative embodiment of the present invention,

Obtaining a search request input by a user, carrying out semantic expansion on the search request based on a pre-constructed comprehensive knowledge graph to obtain a target expanded search keyword, carrying out semantic search on structured data and unstructured data according to the target expanded search keyword and a pre-constructed multi-mode semantic index, and obtaining a preliminary search result, wherein the obtaining of the preliminary search result comprises the following steps:

acquiring a search request input by a user, preprocessing the search request to obtain a search request text, and converting the search request text into semantic vector representation of the search request;

carrying out semantic expansion on the semantic vector representation by utilizing the comprehensive knowledge graph to obtain a target expanded search keyword;

searching in a multi-mode semantic index constructed in advance according to the target expanded search keyword, wherein the multi-mode semantic index comprises a structured data index and an unstructured data index;

Based on the structured data, matching the target expanded search keywords with entities and relations in the multi-mode knowledge graph by adopting a query language based on a graph to obtain a first search result;

Based on the unstructured data, a pre-trained multi-modal representation learning model is adopted, the target expanded search keywords are mapped into a multi-modal semantic space, and a second search result is obtained through vector similarity calculation;

And carrying out semantic fusion on the first search result and the second search result, and sorting based on semantic relevance and importance to obtain a preliminary search result.

In an alternative embodiment of the present invention,

Carrying out semantic expansion on the semantic vector representation by utilizing the comprehensive knowledge graph, and obtaining target expanded search keywords comprises the following steps:

Carrying out vectorization processing on the entities and the relations in the comprehensive knowledge graph by adopting a knowledge graph embedding model to obtain entity relation vector representation, and selecting an entity with the highest similarity as a candidate expansion keyword by calculating the similarity between semantic vector representation and entity relation vector representation;

Taking the candidate expanded keywords as the center, adopting a random walk algorithm to perform context sampling in the comprehensive knowledge graph, and calculating node centrality measurement and node importance score by determining node degree, centrality and clustering coefficient of the comprehensive knowledge graph to generate an expanded keyword sequence;

And screening the expanded keyword sequence based on the node centrality measurement and the node importance score to obtain a target expanded search keyword.

In an alternative embodiment of the present invention,

Performing feature extraction and fusion on the preliminary search result through a multi-modal fusion neural network to obtain a multi-modal fusion feature vector, performing weight adjustment on the multi-modal fusion feature vector by using an attention mechanism, calculating a relevance score of each data in the preliminary search result and the search intention of a user according to the adjusted multi-modal fusion feature vector, and sequencing the preliminary search result according to the relevance score, wherein the step of obtaining an ordered search result comprises the following steps:

Based on the structured data in the preliminary search result, extracting corresponding key attributes and corresponding attribute values, and determining dominant features; based on unstructured data in the preliminary search results, extracting semantic features by adopting a pre-trained deep learning model, and determining deep features;

Inputting the dominant features and the deep features into a multi-modal fusion neural network, and carrying out feature fusion through multi-layer nonlinear transformation and interactive operation to obtain multi-modal fusion feature vectors;

Based on an attention mechanism, carrying out interactive calculation on semantic vector representations corresponding to the user search intention and the multi-modal fusion feature vectors to obtain an attention weight matrix, and adjusting weight distribution of different dimensionalities in the multi-modal fusion feature vectors through the attention weight matrix to determine multi-modal weighted fusion feature vectors;

And calculating a correlation score between the multimodal weighted fusion feature vector and the semantic vector representation of the user search intention by adopting a similarity measurement method, sequencing the preliminary search results according to the correlation score from high to low, and determining the ordered search results.

In an alternative embodiment of the present invention,

Inputting the ordered search results into a personalized recommendation model to generate a final recommendation result, performing presentation adaptation on the final recommendation result based on the terminal equipment and the user portrait, determining the distribution content, and pushing the distribution content to the terminal equipment comprises:

Based on a pre-acquired user portrait, obtaining a recommendation candidate set from the ordered search results through a content recommendation algorithm, inferring potential information corresponding to the user's potential interests, and adding the potential information to the recommendation candidate set to synthesize a recommendation result;

Optimizing the recommendation result in real time through a reinforcement learning model according to the real-time feedback and behavior change of the user, dynamically adjusting the display and sequencing of the recommendation result by dynamically adjusting priority sequencing parameters and combining the explicit feedback and the implicit feedback of the user, and generating a final recommendation result;

Based on terminal equipment information and the user portrait, performing presentation adaptation on the final recommendation result, determining distribution content, and pushing the distribution content to corresponding terminal equipment;

After the terminal equipment receives the final recommendation result, acquiring user interaction data in real time to generate data feedback, wherein, through data reflow, the data feedback is used to update the user portrait based on incremental update calculation and, at the same time, to iteratively update the personalized recommendation model.

In an alternative embodiment of the present invention,

Obtaining, based on the pre-acquired user portrait, a recommendation candidate set from the ordered search results through a content recommendation algorithm, inferring potential information corresponding to the user's potential interests, adding the potential information to the recommendation candidate set, and synthesizing the recommendation result comprises:

In the content recommendation algorithm, the user's content preference is calculated by the following formula:

[formula not reproduced in the source text]

where u denotes the user, c denotes the content, p_{u,c} denotes the preference score of user u for content c, f denotes a feature of the content, F denotes the set of all features, w_{u,f} denotes the preference weight of user u for feature f, v_{c,f} denotes the value of content c on feature f, x_f denotes the importance weight of feature f, y_{u,c,f} denotes the interaction strength between user u and content c on feature f, α denotes a strength parameter controlling similarity, sim_{u,c} denotes the similarity between user u and content c, β denotes a strength parameter controlling popularity, and pop_c denotes the popularity of content c.
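The formula itself does not survive in the text, so its exact form is unknown. The sketch below only illustrates how the listed symbols could fit together, under the assumption (not taken from the patent) that the feature terms are multiplied and summed over F, and that the similarity and popularity terms are added with weights α and β. The function name `preference_score` and all example values are hypothetical.

```python
from typing import Dict


def preference_score(
    w_u: Dict[str, float],   # w_{u,f}: preference weight of user u per feature
    v_c: Dict[str, float],   # v_{c,f}: value of content c per feature
    x: Dict[str, float],     # x_f: importance weight per feature
    y_uc: Dict[str, float],  # y_{u,c,f}: interaction strength per feature
    sim_uc: float,           # sim_{u,c}: similarity of user u and content c
    pop_c: float,            # pop_c: popularity of content c
    alpha: float,            # strength parameter for similarity
    beta: float,             # strength parameter for popularity
) -> float:
    """ASSUMED form: sum over f of w*v*x*y, plus alpha*sim, plus beta*pop."""
    feature_term = sum(
        w_u.get(f, 0.0) * v_c.get(f, 0.0) * x[f] * y_uc.get(f, 0.0)
        for f in x  # iterate over the feature set F
    )
    return feature_term + alpha * sim_uc + beta * pop_c


score = preference_score(
    w_u={"topic": 0.5, "format": 1.0},
    v_c={"topic": 1.0, "format": 0.5},
    x={"topic": 1.0, "format": 2.0},
    y_uc={"topic": 2.0, "format": 1.0},
    sim_uc=0.8,
    pop_c=1.0,
    alpha=0.5,
    beta=0.1,
)
```

Any other combination of the same symbols (for example, a normalized sum) would be equally compatible with the definitions given.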

In a second aspect of an embodiment of the present invention,

There is provided an artificial intelligence data searching and distributing system comprising:

the first unit is used for acquiring a search request input by a user, carrying out semantic expansion on the search request based on a pre-built comprehensive knowledge graph to obtain a target expanded search keyword, and carrying out semantic search on structured data and unstructured data according to the target expanded search keyword and a pre-built multi-mode semantic index to obtain a preliminary search result;

The second unit is used for carrying out feature extraction and fusion on the preliminary search results through a multi-modal fusion neural network to obtain multi-modal fusion feature vectors, carrying out weight adjustment on the multi-modal fusion feature vectors by using an attention mechanism, calculating relevance scores of all data in the preliminary search results and search intentions of users according to the adjusted multi-modal fusion feature vectors, and sequencing the preliminary search results according to the relevance scores to obtain ordered search results;

and the third unit is used for inputting the ordered search results into the personalized recommendation system, generating a final recommendation result, performing presentation adaptation on the final recommendation result based on the terminal equipment and the user portrait, determining distribution content and pushing the distribution content to the terminal equipment.

In a third aspect of an embodiment of the present invention,

There is provided an electronic device including:

A processor;

A memory for storing processor-executable instructions;

wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.

In a fourth aspect of an embodiment of the present invention,

There is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.

In the embodiments of the invention, an accurate semantic vector representation of the search request is generated through preprocessing and a word embedding model. Semantic expansion based on the knowledge graph generates related expanded keywords, improving the coverage and accuracy of search results. Combining the multi-modal semantic indexes of structured and unstructured data improves the diversity and relevance of search results. Because features are extracted from both structured and unstructured data and then fused, the search result does not depend on data of a single modality and takes the user's search intention into account more comprehensively. Generating the multi-modal weighted fusion feature vector and applying a similarity measure makes the ranking of search results more accurate and personalized and improves user satisfaction. The final ordered search results meet general search needs and can also be used for personalized recommendation, improving the practicability and user stickiness of the search system. By combining the content recommendation algorithm with the inference algorithm, the user's historical behaviors and interests are considered when the recommendation candidate set is generated, and potential information corresponding to the user's potential interests is obtained through inference, which expands the diversity and coverage of recommendations and provides a more comprehensive recommendation result. The user's interaction data is transmitted back to the recommendation system through a data reflow mechanism, and new user feedback data is merged with historical data based on incremental update calculation to update the user portrait and the personalized recommendation model. The personalized recommendation model is continuously and iteratively updated and optimized using an incremental learning algorithm, so that the updated user portrait and personalized recommendation model reflect the user's real-time interests and needs more accurately and provide a more accurate and satisfactory recommendation service.

Drawings

FIG. 1 is a flow chart of an artificial intelligence data searching and distributing method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of an artificial intelligence data searching and distributing system according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

FIG. 1 is a schematic flow chart of an artificial intelligence data searching and distributing method according to an embodiment of the invention. As shown in FIG. 1, the method includes:

S101, acquiring a search request input by a user, carrying out semantic expansion on the search request based on a pre-constructed comprehensive knowledge graph to obtain a target expanded search keyword, and carrying out semantic search on structured data and unstructured data according to the target expanded search keyword and a pre-constructed multi-mode semantic index to obtain a preliminary search result;

The comprehensive knowledge graph is a tool that represents knowledge through a graph structure. It covers knowledge from multiple search-related domains and contains a large number of entities, such as people, places and events, together with the relationships among them, such as "is a friend of someone" or "is located somewhere". The entities and relationships are embedded in a graph and presented as nodes and edges. The knowledge graph aims to integrate multi-source heterogeneous data in a structured way and provide a global view that supports complex queries and reasoning.

Semantic expansion means performing semantic analysis on the input text, extracting the potential meanings behind it, and using those meanings to generate a group of expanded keywords. The entities and relationships in the knowledge graph are used to associate the original text with related context information and so widen its semantic range. The purpose of semantic expansion is to improve the accuracy and comprehensiveness of the search and to find content more relevant to the user's needs.

In the embodiment, the semantic expansion is performed on the search request through the comprehensive knowledge graph, potential meanings and contexts in user input can be captured, and more accurate expanded search keywords are generated, so that the relevance and accuracy of search results are improved; the semantic expansion can discover more related keywords, cover more aspects possibly concerned by a user, and enable search results to be more comprehensive and rich; the combination of the comprehensive knowledge graph and the multi-mode semantic index enables the search system to find related data more quickly, reduces waiting time of users and improves response speed of the system; through semantic expansion and multi-modal indexing, data of different types and different sources can be effectively associated and mined, potential links between the data are discovered, and more valuable information is provided.

In an alternative embodiment, a search request input by a user is acquired, semantic expansion is performed on the search request based on a pre-built comprehensive knowledge graph to obtain a target expanded search keyword, semantic search is performed on structured data and unstructured data according to the target expanded search keyword and a pre-built multi-modal semantic index, and the obtaining of a preliminary search result includes:

The first search result specifically refers to a search result generated based on structured data, and the data is usually stored in a database or a knowledge graph, and has a definite structure and relationship. By retrieving entities and relationships matching the target expanded search keywords based on the graph-based query language, structured information related to the user search request is obtained, the results typically including well-defined data such as specific entries in the database, associations, attribute values, and the like.

The second search result specifically refers to a search result generated based on unstructured data, and the data comprises text, images, videos and other contents without fixed structures. The target expanded search keywords are mapped to a multi-modal semantic space through a pre-trained multi-modal representation learning model, and content most relevant to the target expanded search keywords is retrieved from unstructured data indexes through vector similarity calculation, and the result generally comprises information such as documents, pictures, video clips and the like.

A search request input by a user is obtained and preprocessed, including word segmentation, stop-word removal, part-of-speech tagging and the like, to obtain a search request text. Each word in the search request text is mapped to a low-dimensional dense vector representation using a pre-trained word embedding model, preferably a GloVe model. Based on an attention mechanism, the word vectors in the search request text are aggregated to obtain the semantic vector representation of the search request.
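As a rough illustration of this step, the following sketch tokenizes a query, drops stop words, and aggregates word vectors with softmax attention. It is a simplification under stated assumptions: the stop-word list, the toy two-dimensional embeddings (standing in for GloVe vectors), and the `context` vector (which would be learned in practice) are all hypothetical, and part-of-speech tagging is omitted.

```python
import math
from typing import Dict, List

STOP_WORDS = {"the", "a", "of", "in", "for"}  # hypothetical stop-word list


def preprocess(query: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]


def attention_pool(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    context: List[float],
) -> List[float]:
    """Aggregate word vectors using softmax attention against a context vector."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * len(context)
    # Attention score of each word: dot product with the context vector.
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the word vectors, dimension by dimension.
    return [
        sum(w * v[i] for w, v in zip(weights, vectors))
        for i in range(len(context))
    ]


toy_embeddings = {"red": [1.0, 0.0], "shoes": [0.0, 1.0]}
tokens = preprocess("The red shoes")
query_vector = attention_pool(tokens, toy_embeddings, context=[0.0, 0.0])
```

With a zero context vector every word receives the same weight, so the result reduces to a plain average of the word vectors.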

Carrying out semantic expansion on semantic vector representation of a search request by utilizing a pre-constructed comprehensive knowledge graph, finding out entity nodes most relevant to the semantic vector representation of the search request in the comprehensive knowledge graph, and acquiring attribute information and associated entities of the corresponding entity nodes; and generating an expanded keyword related to the original search request according to the attribute information of the entity node and the associated entity to form a target expanded search keyword set.

Searching in a pre-constructed multi-mode semantic index according to the target expanded search keyword, adopting a query language based on a graph, preferably using a Cypher query language, for the structured data index, matching the target expanded search keyword with the entity and the relation in the multi-mode knowledge graph, and finding out related structured data to form a first search result; for unstructured data index, a pre-trained multi-modal representation learning model, preferably a ViLBERT model, is adopted to map the target expanded search keyword into a multi-modal semantic space to obtain semantic vector representation thereof, and the unstructured data most relevant to the target expanded search keyword is found by calculating cosine similarity between the semantic vector representation of the target expanded search keyword and semantic vector representations of various items in the unstructured data index to form a second search result.
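The unstructured branch of this search can be sketched as a cosine-similarity ranking over stored vectors. The sketch assumes the keyword and the indexed items have already been mapped into one shared vector space (the role the text assigns to the multi-modal representation model); the item identifiers, the toy vectors, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float],
    index: Dict[str, List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k indexed items most similar to the query vector."""
    scored = [(item_id, cosine(query_vec, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical unstructured index: a document, an image, and a video clip.
unstructured_index = {
    "doc1": [1.0, 0.0],
    "img7": [0.0, 1.0],
    "vid3": [1.0, 1.0],
}
second_search_result = top_k([1.0, 0.0], unstructured_index, k=2)
```

A production index would normally use an approximate nearest-neighbor structure instead of scanning every item.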

Carrying out semantic fusion on the first search result, namely the structured data, and the second search result, namely the unstructured data; for the first search result, calculating a semantic relevance score for each result item according to the relevance and importance of the matched entity and relationship; for the second search result, calculating a semantic relevance score of each result item according to the semantic similarity with the target expanded search keyword; comprehensively considering semantic relevance scores of the first search result and the second search result, and sequencing all result items to obtain a preliminary search result list; and further adjusting and optimizing the preliminary search result list according to the importance of the result items to obtain final search result ordering.
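One possible reading of this fusion step is sketched below. The text does not specify how the two score sets are combined or how importance is applied, so the linear source weighting (`structured_weight`) and the multiplicative importance factor are assumptions made only for illustration, as are the item identifiers.

```python
from typing import Dict, List


def fuse_results(
    first: Dict[str, float],       # structured items -> semantic relevance score
    second: Dict[str, float],      # unstructured items -> semantic relevance score
    importance: Dict[str, float],  # optional per-item importance factor
    structured_weight: float = 0.5,
) -> List[str]:
    """Merge both result sets and rank by weighted relevance times importance."""
    fused: Dict[str, float] = {}
    for item_id, score in first.items():
        fused[item_id] = structured_weight * score
    for item_id, score in second.items():
        fused[item_id] = fused.get(item_id, 0.0) + (1.0 - structured_weight) * score
    # Items without an explicit importance factor keep their fused score.
    return sorted(
        fused,
        key=lambda item_id: fused[item_id] * importance.get(item_id, 1.0),
        reverse=True,
    )


preliminary = fuse_results(
    first={"entity1": 0.9},
    second={"doc1": 0.8, "doc2": 0.4},
    importance={"doc1": 1.5},
)
```

Here `doc1` overtakes `entity1` only because of its importance factor, which mirrors the final adjustment pass described in the text.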

In the embodiment, accurate search request semantic vector representation is generated through preprocessing and word embedding models; based on semantic expansion of the knowledge graph, related expansion keywords are generated, and coverage and accuracy of search results are improved; the multi-mode semantic indexes of the structured and unstructured data are combined, so that the diversity and the relevance of search results are improved; through semantic fusion and correlation calculation, the search results are finely ordered and optimized, and the user search experience and satisfaction are improved.

In an alternative embodiment, using the integrated knowledge graph to semantically expand the semantic vector representation, obtaining the target expanded search keyword includes:

The node degree is the number of edges connected to a node in the knowledge graph, that is, the number of other nodes with which the node is directly associated. In a directed graph, the node degree can be divided into out-degree and in-degree, namely the number of outgoing edges and the number of incoming edges; in an undirected graph, the node degree is the total number of edges connected to the node.

Centrality is an index for measuring the relative importance of nodes in a graph, and several centrality measures exist. Betweenness centrality measures the number of shortest paths between other node pairs in the graph that pass through a node; nodes with high betweenness centrality have more control over the network. Closeness centrality measures the average shortest-path length from a node to all other nodes; nodes with high closeness centrality can propagate information to other parts of the network faster. Degree centrality is measured directly by the degree of the node; the higher the degree, the greater the importance of the node in the network.

The node centrality measurement specifically refers to an index for comprehensively evaluating the importance and influence of a node in a knowledge graph, and based on the centrality concept, the relative importance of the node in the overall graph structure is calculated by combining the structure position, the connection quantity and the connection quality of the node in the network.

The node importance score specifically refers to a comprehensive score calculated according to the node centrality measurement and other related indexes, the comprehensive score reflects the global importance and the local importance of the nodes in the knowledge graph, and the higher the score, the more critical and the more influencing the nodes in the graph.
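Two of these graph measures can be sketched directly on an adjacency-set representation. The sketch assumes an undirected, unweighted graph and computes closeness centrality (the measure based on average shortest-path length described above) as the reciprocal of that average, which is the usual convention; the toy graph and function names are hypothetical.

```python
from collections import deque
from typing import Dict, Set


def degree(graph: Dict[str, Set[str]], node: str) -> int:
    """Number of edges attached to the node in an undirected graph."""
    return len(graph[node])


def closeness_centrality(graph: Dict[str, Set[str]], node: str) -> float:
    """Reachable-node count divided by the sum of shortest-path lengths."""
    distances = {node: 0}
    queue = deque([node])
    while queue:  # breadth-first search gives shortest paths in hop counts
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0


# Toy path graph: a - b - c
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

In this toy graph the middle node `b` has both the highest degree and the highest closeness, matching the intuition that it reaches the rest of the graph fastest.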

A knowledge graph embedding model, preferably the ComplEx embedding model, is used to vectorize the entities and relationships in the comprehensive knowledge graph, mapping each entity and relationship into a low-dimensional dense vector space to obtain entity relation vector representations. For the semantic vector representation of the search request, its similarity to each entity relation vector representation in the knowledge graph is calculated, preferably using Euclidean distance, to find the entities most relevant to the semantics of the search request; the entities with the highest similarity are selected as candidate expansion keywords, forming a candidate expansion keyword set.
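The candidate selection can be illustrated as a nearest-neighbor lookup by Euclidean distance. The sketch assumes the entity vectors were produced beforehand by a trained embedding model and live in the same space as the query vector; it uses small real-valued toy vectors, whereas ComplEx embeddings are complex-valued. All names and values are hypothetical.

```python
import math
from typing import Dict, List


def nearest_entities(
    query_vec: List[float],
    entity_vecs: Dict[str, List[float]],
    k: int = 1,
) -> List[str]:
    """Return the k entities with the smallest Euclidean distance to the query."""
    distances = [
        (name, math.dist(query_vec, vec)) for name, vec in entity_vecs.items()
    ]
    distances.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return [name for name, _ in distances[:k]]


toy_entities = {
    "camera": [1.0, 0.0],
    "lens": [0.9, 0.1],
    "banana": [-1.0, 0.0],
}
candidates = nearest_entities([1.0, 0.05], toy_entities, k=2)
```

Because a smaller distance means a higher similarity, the list is sorted in ascending order.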

Taking each candidate expansion keyword as the center, a random walk algorithm is used to perform context sampling in the comprehensive knowledge graph: starting from the candidate expansion keyword, neighbor nodes are randomly selected to walk to according to the entity relationships in the knowledge graph, generating a node sequence that contains context information. The importance of nodes in the knowledge graph is evaluated by computing the node degree (the number of edges connected to the node), centrality measures such as betweenness centrality and closeness centrality, and the clustering coefficient (the proportion of possible triangles actually formed between the node and its neighbor nodes). From the node degree, centrality and clustering coefficient, a centrality measurement is calculated for each node to reflect its importance in the knowledge graph, and the importance score of each node is calculated by combining the centrality measurement with the node's visit frequency in the random walks, to obtain the expanded keyword sequence.
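A compact sketch of this sampling and scoring step follows. The text does not say how degree, centrality and clustering coefficient are combined, so the equal-weight mix of normalized degree and clustering coefficient, and its multiplication by how often the random walk visits each node, are assumptions chosen for illustration; betweenness and closeness are left out for brevity. The graph and all names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def random_walks(graph: Graph, start: str, walk_length: int,
                 num_walks: int, seed: int = 0) -> Counter:
    """Run uniform random walks from `start` and count node visits."""
    rng = random.Random(seed)
    visits: Counter = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_length):
            neighbors = sorted(graph[node])  # sorted for reproducibility
            if not neighbors:
                break
            node = rng.choice(neighbors)
            visits[node] += 1
    return visits


def clustering_coefficient(graph: Graph, node: str) -> float:
    """Fraction of neighbor pairs that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return 2.0 * links / (k * (k - 1))


def importance_scores(graph: Graph, start: str, walk_length: int = 4,
                      num_walks: int = 50) -> Dict[str, float]:
    """ASSUMED scoring: (0.5*normalized degree + 0.5*clustering) * visit share."""
    visits = random_walks(graph, start, walk_length, num_walks)
    total_visits = sum(visits.values()) or 1
    max_degree = max(len(nbrs) for nbrs in graph.values()) or 1
    scores = {}
    for node, count in visits.items():
        centrality = (0.5 * len(graph[node]) / max_degree
                      + 0.5 * clustering_coefficient(graph, node))
        scores[node] = centrality * count / total_visits
    return scores


# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy_graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
scores = importance_scores(toy_graph, "a")
```

Nodes never reached by the walks receive no score, which is how the walk restricts expansion to the neighborhood of the candidate keyword.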

Based on the node centrality measurement and the node importance score, the expanded keyword sequence is screened: thresholds are set for the centrality measurement and the importance score, nodes below the thresholds are filtered out, and nodes with high centrality and high importance are retained. The retained nodes are de-duplicated and sorted to obtain the final target expanded search keyword set. This set contains expanded keywords that are semantically related to the original search request and highly important in the knowledge graph, and it can effectively expand the semantic range of the search request.
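The screening step amounts to a filter, a de-duplication, and a sort, as in the sketch below. The threshold values, the choice to sort by importance score, and the example keywords are hypothetical; the text does not fix any of them.

```python
from typing import Dict, List


def select_keywords(
    sequence: List[str],
    centrality: Dict[str, float],
    importance: Dict[str, float],
    min_centrality: float,
    min_importance: float,
) -> List[str]:
    """Keep nodes meeting both thresholds, without duplicates, best first."""
    seen = set()
    kept = []
    for node in sequence:
        if node in seen:
            continue  # de-duplicate repeated visits from the random walk
        seen.add(node)
        if (centrality.get(node, 0.0) >= min_centrality
                and importance.get(node, 0.0) >= min_importance):
            kept.append(node)
    # Sort the survivors by importance score, highest first.
    kept.sort(key=lambda n: importance[n], reverse=True)
    return kept


target_keywords = select_keywords(
    sequence=["lens", "tripod", "lens", "strap"],
    centrality={"lens": 0.8, "tripod": 0.6, "strap": 0.1},
    importance={"lens": 0.5, "tripod": 0.7, "strap": 0.9},
    min_centrality=0.5,
    min_importance=0.4,
)
```

In the example, `strap` has the highest importance score but is dropped because its centrality falls below the threshold.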

In this embodiment, the embedding model captures the complex relationships and semantic information in the knowledge graph and maps entities and relationships to a low-dimensional space, making similarity calculation more efficient. Calculating the Euclidean distance finds the entities most relevant to the search request and ensures that the expanded keywords are semantically relevant. The random walk algorithm captures context information among entities in the knowledge graph and generates semantically rich node sequences. Computing node degree, centrality and clustering coefficient allows node importance to be evaluated accurately, ensuring the quality of the expanded keywords. Combining the centrality measurement with the node's visit frequency in the random walks makes the expanded keyword sequence more comprehensive and accurate. Setting thresholds filters out nodes with low centrality and low importance and retains high-quality expanded keywords. De-duplicating and sorting the retained nodes further optimizes the expanded keyword set and improves the coverage and precision of the search results. The final target expanded search keyword set effectively expands the semantic range of the search request and improves the comprehensiveness and accuracy of the search.

S102, carrying out feature extraction and fusion on the preliminary search results through a multi-modal fusion neural network to obtain multi-modal fusion feature vectors, carrying out weight adjustment on the multi-modal fusion feature vectors by using an attention mechanism, calculating relevance scores of all data in the preliminary search results and search intentions of users according to the adjusted multi-modal fusion feature vectors, and sequencing the preliminary search results according to the relevance scores to obtain ordered search results;

The user's search intention specifically refers to the information or problem that the user wants to find or solve when entering a search request. It reflects the user's actual needs and purpose, and includes not only the surface keywords but also the underlying semantics and context. Identifying and understanding the user's search intention is key to improving the performance of the search system and user satisfaction.

The relevance score specifically refers to an index for quantifying the matching degree of the search result and the search intention of the user, reflects the matching degree of each data item in the preliminary search result and the search intention of the user, and is generally calculated through methods such as feature extraction, semantic analysis and the like, wherein the higher the relevance score is, the more the data item meets the requirements and expectations of the user.

In this embodiment, the multi-modal fusion neural network extracts and fuses the features of the structured and unstructured data in the preliminary search results to generate a unified multi-modal fusion feature vector, which improves the richness and comprehensiveness of the feature representation; adjusting the weights of the multi-modal fusion feature vector through the attention mechanism highlights the features that are highly relevant to the user's search intention and reduces the influence of noisy features, which improves the accuracy of the relevance score and of the search results; the relevance score between each data item in the preliminary search results and the user's search intention is calculated from the adjusted multi-modal fusion feature vector, and the results are sorted by score so that the most relevant search results are ranked first and provide more valuable information to the user.

In an alternative embodiment, the feature extraction and fusion are performed on the preliminary search result through a multi-modal fusion neural network to obtain a multi-modal fusion feature vector, the weight of the multi-modal fusion feature vector is adjusted by using an attention mechanism, the relevance score of each data in the preliminary search result and the search intention of the user is calculated according to the adjusted multi-modal fusion feature vector, the preliminary search result is ordered according to the relevance score, and the obtaining of the ordered search result comprises:

Analyzing the structured data in the preliminary search results, and identifying key attributes thereof, such as title, category, timestamp, browsing amount, click amount and the like; extracting attribute values corresponding to each key attribute to form a structured feature vector; and taking the extracted structured feature vector as an explicit feature for subsequent feature fusion.
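As an illustration of how key attributes might be turned into a structured feature vector, the sketch below encodes a category, a timestamp, and view and click counts. The field names, the category vocabulary, and the chosen encodings (one-hot, age in days, `log1p` of counts) are assumptions for the example; the title attribute is left out because it would normally be handled as text.

```python
import math
from datetime import datetime, timezone

CATEGORIES = ["news", "video", "product"]  # hypothetical category vocabulary


def structured_features(record, now):
    """Map one structured record to a numeric explicit-feature vector."""
    # One-hot encode the category; unknown categories map to all zeros.
    one_hot = [1.0 if record["category"] == c else 0.0 for c in CATEGORIES]
    # Age in days, measured from the record timestamp to the reference time.
    age_days = (now - record["timestamp"]).total_seconds() / 86400.0
    # log1p compresses heavy-tailed counts such as views and clicks.
    return one_hot + [
        age_days,
        math.log1p(record["views"]),
        math.log1p(record["clicks"]),
    ]


# Hypothetical record from the preliminary search results.
example_record = {
    "category": "video",
    "timestamp": datetime(2024, 7, 14, tzinfo=timezone.utc),
    "views": 0,
    "clicks": 0,
}
example_vector = structured_features(
    example_record, now=datetime(2024, 7, 15, tzinfo=timezone.utc)
)
```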

The unstructured data in the preliminary search results, such as text, images and video, is preprocessed, for example by text segmentation and image normalization; features are extracted from the unstructured data using a pre-trained deep learning model, preferably ResNet; for text data, a pre-trained language model is used to extract semantic features and obtain a text semantic vector representation; for image and video data, a pre-trained convolutional neural network is used to extract visual features and obtain image feature vector and video feature vector representations; the extracted unstructured data features are used as deep features for subsequent feature fusion.

The explicit features and the deep features are input into the multi-modal fusion neural network, which fuses the features of the different modalities through multi-layer nonlinear transformation and interaction operations; during fusion, fully connected layers, an attention mechanism, a gating mechanism and the like are used to realize nonlinear combination and interaction of the features, and the multi-modal fusion feature vector is obtained through forward propagation of the network.

The semantic vector representation corresponding to the user's search intention is interactively calculated with the multi-modal fusion feature vector using the attention mechanism: an attention weight matrix is obtained by calculating the correlation between the semantic vector representation of the user's search intention and each dimension of the multi-modal fusion feature vector; the attention weight matrix is used to adjust the weights of the different dimensions of the multi-modal fusion feature vector so that the feature dimensions relevant to the user's search intention are highlighted; and the multi-modal weighted fusion feature vector, which incorporates the relevance information of the user's search intention, is obtained through weighted fusion.
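A minimal sketch of per-dimension attention reweighting is given below. It assumes that the intent vector and the fused vector share the same dimensionality, scores each dimension by an elementwise product, and normalizes the scores with a softmax. The text does not fix the scoring function or the shape of the attention weight matrix, so these are illustrative choices only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_reweight(intent, fused):
    """Reweight each dimension of `fused` by its agreement with `intent`."""
    if len(intent) != len(fused):
        raise ValueError("intent and fused vectors must have the same length")
    # Assumed per-dimension relevance: elementwise product of the two vectors.
    scores = [q * f for q, f in zip(intent, fused)]
    weights = softmax(scores)
    # Weighted fusion: scale every feature dimension by its attention weight.
    weighted = [w * f for w, f in zip(weights, fused)]
    return weights, weighted
```

For example, `attention_reweight([1.0, 0.0], [2.0, 2.0])` assigns the larger weight to the first dimension, the only one on which the intent vector is non-zero.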

A similarity measurement method is used to calculate the relevance score between the multi-modal weighted fusion feature vector and the semantic vector representation of the user's search intention; the preliminary search results are sorted from high to low according to the relevance score. The sorted search results take into account both the features of the structured and unstructured data and their relevance to the user's search intention, and can therefore better meet the user's search needs; they can be presented to the user as the final ordered search results or be further personalized.
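The scoring and sorting step might look like the sketch below, which uses cosine similarity as the similarity measure. The text only says "a similarity measurement method", so cosine similarity is an assumption, as are the function names and the `(item_id, vector)` input shape.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_results(intent, results):
    """Score (item_id, weighted_vector) pairs against `intent`, highest first."""
    scored = [(item_id, cosine_similarity(intent, vec)) for item_id, vec in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical intent vector and weighted fusion vectors for three results.
ranked = rank_results(
    [1.0, 0.0],
    [("a", [0.0, 1.0]), ("b", [1.0, 0.0]), ("c", [1.0, 1.0])],
)
```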

In this embodiment, through feature extraction and fusion of structured and unstructured data, the search results no longer depend on data of a single modality and also take the user's search intention into account; the combined processing of structured and unstructured data allows the search results to cover a wider range of content types, including text, images and video, and so meets users' diverse information needs; adjusting the feature weights according to the user's search intention highlights the strongly relevant feature dimensions and optimizes the ordering of the search results, so that users can find the required information more quickly; because the attention mechanism adjusts the feature weights dynamically according to the user's search intention, the search results can reflect changes in the user's needs in a timely manner; the ordering considers not only the features of the preliminary search results but also their relevance to the user's search intention, and can therefore better adapt to the user's personalized needs; generating the multi-modal weighted fusion feature vector and applying the similarity measurement method makes the ordering of the search results more accurate and personalized and improves user satisfaction; the final ordered search results meet general search needs and can also be used for personalized recommendation, which improves the practicality and user stickiness of the search system.

S103, inputting the ordered search results into a personalized recommendation system, generating final recommendation results, performing presentation adaptation on the final recommendation results based on terminal equipment and user portraits, determining distribution content, and pushing the distribution content to the terminal equipment.

The terminal device specifically refers to a hardware device used by a user to access, browse and interact with the recommendation system. Terminal devices include, but are not limited to, smartphones, tablet computers, personal computers, smartwatches, smarttelevisions, etc., each having different screen sizes, resolutions, operating systems and modes of interaction.

The user portrait specifically refers to a comprehensive description of the characteristics, behaviors and preferences of the user. It comprises demographic information such as age, gender and occupation; hobbies and interests; historical behavior data such as browsing, click and purchase records; and the user's real-time feedback and interaction data. The user portrait helps the recommendation system better understand and predict the needs and preferences of the user and provide personalized recommendations.

The distribution specifically refers to a process of pushing the recommendation result to the user. It includes preparation, transmission and presentation of content on a terminal device. The distribution process takes network conditions, device performance, user preferences, behaviors, and other factors into account, and ensures that recommended content can be efficiently and accurately delivered to users and presented in a suitable form on the user terminals.

In the embodiment, the ordered search results are input into the personalized recommendation system, and final recommendation results which more accord with the interests and the demands of the user are generated by combining the user portraits, so that the personalized degree and the accuracy of recommendation are improved; according to the characteristics of different terminal equipment, the final recommendation result is presented and adapted, so that the content can be displayed in an optimal form on various kinds of equipment, and the user experience is improved; the system can dynamically adjust recommended content and distribution strategies according to real-time feedback and user portraits, improves flexibility and intelligence of the system, and can timely respond to changes of user behaviors.

In an alternative embodiment, the inputting the ordered search result into the personalized recommendation model, generating a final recommendation result, performing presentation adaptation on the final recommendation result based on the terminal device and the user portrait, determining the distribution content, and pushing to the terminal device includes:

A recommendation candidate set is generated by a content recommendation algorithm based on pre-acquired user portrait information such as demographic characteristics, hobbies and historical behaviors; content-based recommendation finds items that match the user's interests by analyzing the similarity and relevance between the user and the items; when the recommendation candidate set is generated, potential information corresponding to the user's potential interests is obtained through an inference algorithm, such as knowledge graph inference, and the inferred potential information is added to the recommendation candidate set, which expands the diversity and coverage of the recommendations and forms a more comprehensive recommendation result.
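The candidate-generation step can be illustrated with the sketch below. It stands in for the content recommendation algorithm with Jaccard similarity over tags, and for knowledge graph inference with a one-hop neighbor lookup in a small dictionary. The tag-based representation, the similarity threshold, and the one-hop rule are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_candidates(interests, items, graph, min_similarity=0.3):
    """Return item ids from content matching plus one-hop graph inference."""
    # Content-based step: keep items whose tags overlap the user's interests.
    candidates = [
        item_id
        for item_id, tags in items.items()
        if jaccard(interests, tags) >= min_similarity
    ]
    # Inference step: one-hop neighbors of known interests are treated as
    # potential interests that the user has not expressed yet.
    latent = set()
    for interest in interests:
        latent |= graph.get(interest, set())
    latent -= interests
    # Add items that touch a potential interest and are not yet candidates.
    for item_id, tags in items.items():
        if item_id not in candidates and tags & latent:
            candidates.append(item_id)
    return candidates


# Hypothetical user interests, item tags, and knowledge graph edges.
interests = {"hiking", "camping"}
items = {
    "tent": {"camping", "outdoor"},
    "boots": {"hiking"},
    "kayak": {"paddling"},
    "novel": {"fiction"},
}
graph = {"camping": {"paddling"}}
candidate_set = build_candidates(interests, items, graph)
```

Here the kayak enters the candidate set only through the inferred interest in paddling, which is how the inference step widens coverage beyond direct content matches.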

According to the user's real-time feedback and behavior changes, the recommendation result is optimized in real time by a reinforcement learning model, which continuously learns and adjusts the recommendation strategy through interaction with the user so as to maximize user satisfaction and long-term benefit; priority ranking parameters, such as the user's click rate, dwell time and conversion rate on recommended items, are dynamically adjusted so that the ranking of the recommendation result is adjusted in real time; the display and ranking of the recommendation result are dynamically adjusted by combining the user's explicit feedback, such as ratings and likes, with implicit feedback, such as browsing time and click sequence; through continuous interaction and optimization, a final recommendation result that matches the user's real-time interests and preferences is generated.
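The text does not say which reinforcement learning model is used. As a deliberately simple stand-in, the sketch below uses an epsilon-greedy multi-armed bandit: each item keeps a running mean reward, and the ranking usually follows those means but occasionally explores a random order. The reward weights, the 60-second dwell cap, and all names are hypothetical.

```python
import random


def reward(clicked, dwell_seconds, converted):
    """Blend click, dwell time, and conversion into one reward in [0, 1]."""
    dwell = min(dwell_seconds / 60.0, 1.0)  # cap dwell time at one minute
    return 0.3 * float(clicked) + 0.3 * dwell + 0.4 * float(converted)


class EpsilonGreedyRanker:
    """Rank items by running mean reward, exploring with probability epsilon."""

    def __init__(self, item_ids, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {item_id: 0 for item_id in item_ids}
        self.values = {item_id: 0.0 for item_id in item_ids}

    def rank(self):
        items = list(self.counts)
        if self.rng.random() < self.epsilon:
            self.rng.shuffle(items)  # explore: random order
            return items
        # Exploit: highest mean reward first (ties keep insertion order).
        return sorted(items, key=lambda i: self.values[i], reverse=True)

    def update(self, item_id, observed_reward):
        # Incremental mean: new_mean = old_mean + (r - old_mean) / n.
        self.counts[item_id] += 1
        n = self.counts[item_id]
        self.values[item_id] += (observed_reward - self.values[item_id]) / n
```

A production system would more likely use a contextual or policy-based model; the bandit only shows the feedback loop of ranking, observing, and updating.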

Based on the terminal device information, such as device type, screen size and network status, and on the user portrait, presentation adaptation is performed on the final recommendation result; the display form, layout and interaction mode of the recommended content are determined according to the characteristics of the different terminal devices and the user's preferences, and the recommended content is rendered and typeset in a personalized way to provide a good user experience; the adapted recommended content is pushed to the corresponding terminal device, such as a mobile app, a web page or smart hardware; during pushing, network conditions and device performance are taken into account, and a suitable transmission protocol and compression algorithm are used to ensure efficient transmission and display of the recommended content.
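Presentation adaptation can be pictured as a small rule table keyed on device and network properties. The sketch below is illustrative only: the `Device` fields, the pixel and bandwidth thresholds, and the returned settings are assumptions, not values taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Device:
    kind: str            # e.g. "phone", "tablet", "desktop", "watch"
    screen_width: int    # logical pixels
    bandwidth_kbps: int  # measured or estimated network throughput


def adapt_presentation(device, prefers_video):
    """Choose layout and media settings; all thresholds are illustrative."""
    if device.kind == "watch" or device.screen_width < 400:
        layout, items_per_page = "single-card", 1
    elif device.screen_width < 900:
        layout, items_per_page = "list", 5
    else:
        layout, items_per_page = "grid", 12
    low_bandwidth = device.bandwidth_kbps < 1000
    return {
        "layout": layout,
        "items_per_page": items_per_page,
        # Fall back to images when the network cannot sustain video.
        "media": "video" if prefers_video and not low_bandwidth else "image",
        "image_quality": "low" if low_bandwidth else "high",
    }
```

The `prefers_video` flag stands in for whatever display preference the user portrait records.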

After the terminal device receives the final recommendation result, the user's interaction data, such as clicks, views, favorites and comments, is collected in real time; the interaction data is turned into data feedback and transmitted to the recommendation system through a data reflow mechanism; the recommendation system merges the new user feedback data with the historical data by incremental update calculation, and updates the user portrait through an incremental learning algorithm to capture changes in the user's interests and preferences; at the same time, the new user feedback data is used to iteratively update and optimize the personalized recommendation model; the updated user portrait and personalized recommendation model reflect the user's real-time interests and needs more accurately and provide a more accurate and satisfactory recommendation service.
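One simple way to merge new feedback with historical data is an exponential moving average over per-topic interest weights, sketched below. The text does not specify the incremental learning algorithm, so the moving average, the `learning_rate` value, and the topic-to-weight portrait format are assumptions.

```python
def update_portrait(portrait, feedback, learning_rate=0.2):
    """Blend new interaction signals into per-topic interest weights.

    portrait: dict mapping topic -> weight in [0, 1] (historical interests)
    feedback: list of (topic, signal) pairs, signal in [0, 1]
    """
    updated = dict(portrait)  # leave the historical portrait untouched
    for topic, signal in feedback:
        old = updated.get(topic, 0.0)
        # Exponential moving average: recent behavior shifts the weight
        # gradually, so one stray click does not overwrite long-term interests.
        updated[topic] = (1.0 - learning_rate) * old + learning_rate * signal
    return updated
```

For example, a portrait `{"sports": 0.5}` updated with feedback `[("sports", 1.0), ("music", 1.0)]` moves the sports weight toward 0.6 and introduces music at about 0.2.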

In this embodiment, by collecting user interaction data on the terminal device and updating the user portrait and the personalized recommendation model in real time, the recommendation result is continuously iterated and optimized, which realizes a real-time personalized recommendation service; the reinforcement learning model continuously adjusts the recommendation strategy through interactive learning with the user, maximizing user satisfaction and long-term benefit, so that the recommendation result better matches the user's real-time interests and needs; by combining the content recommendation algorithm with the inference algorithm, the user's historical behaviors and hobbies are considered when the recommendation candidate set is generated, and potential information corresponding to the user's potential interests is obtained through inference, which expands the diversity and coverage of the recommendations and provides a more comprehensive recommendation result; the display and ranking of the recommendation result are dynamically adjusted by combining the user's explicit and implicit feedback, so that the recommendation result better meets the user's needs and preferences and user satisfaction is improved; the user's interaction data is transmitted to the recommendation system through the data reflow mechanism, the new user feedback data is merged with the historical data by incremental update calculation to update the user portrait, and the personalized recommendation model is continuously and iteratively updated and optimized using the incremental learning algorithm, so that the updated user portrait and personalized recommendation model reflect the user's real-time interests and needs more accurately and provide a more accurate and satisfactory recommendation service.

In an alternative embodiment, obtaining, based on the pre-acquired user portrait, a recommendation candidate set from the ordered search results through a content recommendation algorithm, inferring potential information corresponding to the user's potential interests, adding the potential information to the recommendation candidate set, and synthesizing the recommendation result comprises:

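[Formula: preference score p_{u,c} of user u for content c; the formula is not reproduced in this text.]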

The computation multiplies the user's preference weight for each content feature by the content's value on that feature, while also taking into account the feature's importance weight and the interaction strength between the user and the content on that feature. The weighted scores of all features are then normalized; for this purpose, the sum of squares of the weighted feature scores and the square root of the sum of the user's weighted scores for the content on the features are calculated. On the basis of the normalized score, the similarity between the user and the content and the popularity of the content are applied as corrections through adjustment parameters, finally yielding the user's preference score for the content.

According to the formula, the interests and the demands of the user can be reflected more accurately by comprehensively considering the preference weights of the user on the characteristics, so that the individuation degree of the recommended content is improved; the interest of the user to the specific content is effectively measured by using the feature importance weight and the interaction strength of the user and the content, so that the recommendation result is more in line with the real preference of the user; by combining the similarity between the user and the content and the popularity adjustment parameter of the content, the recommendation system not only recommends the content similar to the user, but also recommends some popular or high-quality content with low similarity to the user, and the diversity of recommendation results is increased; the algorithm can dynamically capture and adapt to the change of the user interests according to the interaction strength of the user and the content on the characteristics, so that the recommendation result can reflect the latest interests and requirements of the user in real time; and the factors in multiple aspects are comprehensively considered, and the generated recommended content is more fit with the personalized requirements of the user, so that the satisfaction degree and the use experience of the user on the recommended result are improved.
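Because the formula itself is not reproduced in this text, the sketch below implements only one plausible reading of the verbal description, using the symbols defined in claim 5. The assumed form is p_{u,c} = S / (N_u * N_c) + α * sim_{u,c} + β * pop_c, where S is the sum over features f of w_{u,f} * v_{c,f} * x_f * y_{u,c,f}, N_u is the square root of the sum of (w_{u,f} * x_f)^2, and N_c is the square root of the sum of (v_{c,f} * y_{u,c,f})^2. The cosine-style normalization and the additive correction terms are assumptions and may differ from the patented formula.

```python
import math


def preference_score(w, v, x, y, sim, pop, alpha=0.5, beta=0.1):
    """Assumed reading of the preference score p_{u,c}; not the patent's formula.

    w: feature -> preference weight of the user (w_{u,f})
    v: feature -> value of the content on the feature (v_{c,f})
    x: feature -> importance weight of the feature (x_f)
    y: feature -> user/content interaction intensity (y_{u,c,f})
    sim, pop: user/content similarity and content popularity
    alpha, beta: intensity parameters (default values are placeholders)
    """
    features = sorted(w)  # all four dicts are expected to share these keys
    weighted_sum = sum(w[f] * v[f] * x[f] * y[f] for f in features)
    # Assumed cosine-style normalization over user-side and content-side terms.
    user_norm = math.sqrt(sum((w[f] * x[f]) ** 2 for f in features))
    content_norm = math.sqrt(sum((v[f] * y[f]) ** 2 for f in features))
    if user_norm == 0.0 or content_norm == 0.0:
        base = 0.0
    else:
        base = weighted_sum / (user_norm * content_norm)
    # Assumed additive corrections for similarity and popularity.
    return base + alpha * sim + beta * pop
```

Under this reading, the popularity term lets popular content with low similarity to the user still earn a non-zero score, which matches the diversity argument made above.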

FIG. 2 is a schematic structural diagram of an artificial intelligence data searching and distributing system according to an embodiment of the present invention, as shown in FIG. 2, the system includes:

In a third aspect of an embodiment of the present invention,

There is provided an electronic device including:

A processor;

A memory for storing processor-executable instructions;

In a fourth aspect of an embodiment of the present invention,

The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. An artificial intelligence data search and distribution method, characterized by comprising:

Obtaining a search request input by a user, semantically expanding the search request based on a pre-built comprehensive knowledge graph to obtain a target expanded search keyword, and performing a semantic search on structured data and unstructured data based on the target expanded search keyword and a pre-built multimodal semantic index to obtain preliminary search results;

Extracting and fusing features of the preliminary search results through a multimodal fusion neural network to obtain a multimodal fusion feature vector, using an attention mechanism to adjust the weight of the multimodal fusion feature vector, calculating the relevance score between each data in the preliminary search results and the user's search intent based on the adjusted multimodal fusion feature vector, and sorting the preliminary search results based on the relevance score to obtain an ordered search result;

The ordered search results are input into a personalized recommendation system to generate final recommendation results, and based on the terminal device and user portrait, the final recommendation results are presented and adapted, distribution content is determined, and pushed to the terminal device;

Wherein obtaining the search request input by the user, semantically expanding the search request based on the pre-built comprehensive knowledge graph to obtain the target expanded search keyword, and performing the semantic search on the structured data and unstructured data according to the target expanded search keyword and the pre-built multimodal semantic index to obtain the preliminary search results comprises:

Obtaining a search request input by a user, preprocessing the search request to obtain a search request text, and converting the search request text into a semantic vector representation of the search request;

Using a comprehensive knowledge graph to semantically expand the semantic vector representation to obtain a target expanded search keyword;

Searching in a pre-built multimodal semantic index according to the target expanded search keyword, wherein the multimodal semantic index includes a structured data index and an unstructured data index;

Based on the structured data, using a graph-based query language, matching the target expanded search keyword with entities and relationships in the multimodal semantic index to obtain a first search result;

Based on the unstructured data, a pre-trained multimodal representation learning model is used to map the target extended search keyword into a multimodal semantic space, and a second search result is obtained by vector similarity calculation;

The first search result and the second search result are semantically fused and sorted based on semantic relevance and importance to obtain preliminary search results.

2. The method according to claim 1, characterized in that semantically expanding the semantic vector representation using a comprehensive knowledge graph to obtain the target expanded search keyword comprises:

The entities and relations in the comprehensive knowledge graph are vectorized by using a knowledge graph embedding model to obtain entity relationship vector representations, and the entities with the highest similarity are selected as candidate expansion keywords by calculating the similarity between the semantic vector representations and the entity relationship vector representations;

Taking the candidate extended keyword as the center, a random walk algorithm is used to perform context sampling in the comprehensive knowledge graph, and by determining the node degree, centrality and clustering coefficient of the comprehensive knowledge graph, a node centrality measure and a node importance score are calculated to generate an extended keyword sequence;

Based on the node centrality measurement and the node importance score, the extended keyword sequence is screened to obtain a target extended search keyword.

3. The method according to claim 1, characterized in that extracting and fusing features of the preliminary search results through a multimodal fusion neural network to obtain a multimodal fusion feature vector, adjusting the weight of the multimodal fusion feature vector using an attention mechanism, calculating the relevance score between each data in the preliminary search results and the user's search intention according to the adjusted multimodal fusion feature vector, and sorting the preliminary search results according to the relevance score to obtain the ordered search results comprises:

Based on the structured data in the preliminary search results, corresponding key attributes and corresponding attribute values are extracted to determine explicit features; based on the unstructured data in the preliminary search results, semantic features are extracted using a pre-trained deep learning model to determine deep features;

Inputting the explicit features and the deep features into a multimodal fusion neural network, performing feature fusion through multi-layer nonlinear transformation and interactive operation, and obtaining a multimodal fusion feature vector;

Based on the attention mechanism, the semantic vector representation corresponding to the user's search intention is interactively calculated with the multimodal fusion feature vector to obtain an attention weight matrix, and the weight distribution of different dimensions in the multimodal fusion feature vector is adjusted through the attention weight matrix to determine the multimodal weighted fusion feature vector;

A similarity measurement method is used to calculate the correlation score between the multimodal weighted fusion feature vector and the semantic vector representation of the user's search intention, and the preliminary search results are sorted from high to low according to the correlation score to determine the ordered search results.

4. The method according to claim 1, characterized in that the ordered search results are input into a personalized recommendation model to generate a final recommendation result, and based on the terminal device and the user portrait, the final recommendation result is presented and adapted, distribution content is determined, and pushed to the terminal device, comprising:

Based on the pre-acquired user portrait, the ordered search results obtain a recommendation candidate set through a content recommendation algorithm, and infer the potential information corresponding to the user's potential interests, add the potential information to the recommendation candidate set, and synthesize the recommendation results;

According to the real-time feedback and behavior changes of users, the recommendation results are optimized in real time through the reinforcement learning model; by dynamically adjusting the priority sorting parameters and combining the explicit and implicit feedback of users, the display and sorting of the recommendation results are dynamically adjusted to generate the final recommendation results;

Based on the terminal device information and the user portrait, the final recommendation result is presented and adapted, the distribution content is determined, and pushed to the corresponding terminal device;

When the terminal device receives the final recommendation result, user interaction action data is collected in real time and data feedback is generated; the data feedback is returned through data reflow and processed by incremental update calculation to update the user portrait and iteratively update the personalized recommendation model.

5. The method according to claim 4, characterized in that obtaining, based on the pre-acquired user portrait, a recommendation candidate set from the ordered search results through a content recommendation algorithm, inferring the potential information corresponding to the user's potential interests, adding the potential information to the recommendation candidate set, and synthesizing the recommendation results comprises:

In the content recommendation algorithm, the user content preference is calculated as follows: [formula not reproduced in this text];

Wherein u represents the user, c represents the content, p_{u,c} represents the preference score of user u for content c, f represents a feature of the content, F represents the set of all features, w_{u,f} represents the preference weight of user u for feature f, v_{c,f} represents the value of content c on feature f, x_f represents the importance weight of feature f, y_{u,c,f} represents the interaction intensity between user u and content c on feature f, α represents the intensity parameter controlling similarity, sim_{u,c} represents the similarity between user u and content c, β represents the intensity parameter controlling popularity, and pop_c represents the popularity of content c.

6. An artificial intelligence data search and distribution system, used to implement the method of any one of claims 1 to 5, characterized in that it comprises:

The first unit is used to obtain a search request input by a user, perform semantic expansion on the search request based on a pre-built comprehensive knowledge graph, obtain a target expanded search keyword, perform semantic search on structured data and unstructured data based on the target expanded search keyword and a pre-built multimodal semantic index, and obtain preliminary search results;

The second unit is used to extract and fuse the features of the preliminary search results through a multimodal fusion neural network to obtain a multimodal fusion feature vector, use an attention mechanism to adjust the weight of the multimodal fusion feature vector, calculate the relevance score of each data in the preliminary search results and the user's search intention according to the adjusted multimodal fusion feature vector, sort the preliminary search results according to the relevance score, and obtain an ordered search result;

The third unit is used to input the ordered search results into a personalized recommendation system, generate a final recommendation result, present and adapt the final recommendation result based on the terminal device and the user portrait, determine the distribution content, and push it to the terminal device.

7. An electronic device, comprising:

a processor;

a memory for storing processor-executable instructions;

The processor is configured to call the instructions stored in the memory to execute the method described in any one of claims 1 to 5.

8. A computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method according to any one of claims 1 to 5.