CN119377361A

CN119377361A - A large model document question-answering method and system based on graph structure

Info

Publication number: CN119377361A
Application number: CN202411323514.0A
Authority: CN
Inventors: 王兵
Original assignee: Guangzhou Pole 3d Information Technology Co ltd
Current assignee: Guangzhou Pole 3d Information Technology Co ltd
Priority date: 2024-09-23
Filing date: 2024-09-23
Publication date: 2025-01-28
Anticipated expiration: 2044-09-23
Also published as: CN119377361B

Abstract

The invention discloses a large model document question-answering method and system based on a graph structure. The method comprises: obtaining a target product document, identifying the product information described in the target product document, and generating a preliminary knowledge graph of the graph structure of the target product; performing a product graph structural similarity search in an existing historical product knowledge base, according to the product information and the preliminary knowledge graph, to generate a more comprehensive target knowledge graph; encoding the user's question information and the information of the target knowledge graph, and calculating the similarity to extract all features related to the question information; and feeding back answer information for the extracted features through the summarization and reasoning capability of the large model. The embodiments of the invention can accurately answer questions in the home furnishing field with high accuracy, which helps improve user experience, and can be widely applied in the technical field of computers.

Description

Large model document question-answering method and system based on graph structure

Technical Field

The invention relates to the technical field of computers, in particular to a large model document question-answering method and system based on a graph structure.

Background

In the field of document question answering and understanding, quickly and accurately retrieving every piece of information related to a question or a topic of interest has long been a key business concern. In the era of large models and big data, document understanding usually relies on a large model to embed (vectorize) the whole document; the document is then split into blocks, manually or by other rules, and stored in a vector database. When a question about some information in the document arrives, the blocks related to the question must be extracted from all blocks in the vector database, and these document blocks are combined with the question and passed to a large model for reasoning and answering. A large model document understanding method based on a graph structure can extract all relevant parts of the vector database more accurately and quickly, reduce the resources consumed by document understanding, and improve its accuracy, thereby improving working efficiency.

Document question answering typically requires obtaining every piece of information related to the question. These pieces are obtained by traversing each document block in the vector database with an algorithm such as cosine similarity or Euclidean distance and keeping all blocks that meet the similarity requirement with respect to the question. However, the retrieved document blocks contain a large amount of irrelevant information, and the key information needed to answer the question is easily buried in useless text, which reduces output accuracy. Meanwhile, the useless information occupies much of the large model's input length, so GPU resource consumption is high and the resources cannot be used efficiently.
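For reference, the two block-level measures named above can be written as follows. This is a standard formulation, not one given in the source; the symbols are assumptions introduced here: q is the embedding vector of the question, d_i is the embedding vector of the i-th document block, and n is the embedding dimension.

```latex
\cos(\mathbf{q}, \mathbf{d}_i)
  = \frac{\mathbf{q} \cdot \mathbf{d}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d}_i \rVert},
\qquad
\operatorname{dist}(\mathbf{q}, \mathbf{d}_i)
  = \lVert \mathbf{q} - \mathbf{d}_i \rVert_2
  = \sqrt{\sum_{k=1}^{n} \left( q_k - d_{i,k} \right)^2}
```

A block is kept when its cosine similarity is above a chosen threshold, or when its Euclidean distance is below one; either way every block in the database has to be scored.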

In particular, large model document question-answering methods have not yet been applied to application scenarios in the home furnishing field.

Disclosure of Invention

The main purpose of the embodiments of the invention is to provide a large model document question-answering method and system based on a graph structure that can accurately answer questions in the home furnishing field with high accuracy and help improve user experience.

In order to achieve the above purpose, an aspect of the embodiments of the present invention provides a method for question answering of a large model document based on a graph structure, including the following steps:

acquiring a target product document, identifying product information described in the target product document and generating a preliminary knowledge graph of a graph structure of a target product;

according to the product information and the preliminary knowledge graph, searching the structural similarity of the product graph in the existing historical product knowledge base, and judging whether the target product document exists in the historical product knowledge base or not;

If the target product exists in the historical product knowledge base, fusing the preliminary knowledge graph with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph;

If the target product does not exist in the historical product knowledge base, the closest product in the historical product knowledge base is matched to guide the target product document, and a more comprehensive target knowledge graph is generated;

coding the questioning information of the user and the information of the target knowledge graph, and calculating the similarity to extract all the relevant characteristics of the questioning information;

Answer information is fed back for the extracted features through the summarization and reasoning capability of the large model.

In some embodiments, the obtaining the target product document, identifying product information described in the target product document, and generating a preliminary knowledge-graph of a graph structure of the target product includes the steps of:

converting a PDF product document input by a user into a target picture in a PNG format, and preprocessing the picture, wherein the preprocessing comprises image graying and binarization;

detecting a contour line segment of the target picture through a feature detection algorithm of deep learning, and screening out a target contour, wherein the target contour is used for representing a table structure and a CAD curve structure;

Based on the analysis of the angles and the lengths of the line segments, connecting and analyzing the identified structural contours, combining a plurality of line segments into a complete structural contour, and generating contour analysis results of various content types;

According to the structural contour and the contour analysis result, obtaining the product name contained in the document and the attribute information corresponding to the product by adjusting the prompt for the home furnishing field, wherein the attribute information comprises color system, style, material and space;

generating a preliminary graph structure knowledge graph according to the attribute information, wherein a main node of the graph structure knowledge graph is a product name, and a child node is the attribute information of the product;

storing the obtained node information of the graph structure knowledge graph in a Neo4j graph database.

In some embodiments, the analyzing based on the angles and lengths of the line segments, connecting and analyzing the identified structural contours, combining a plurality of line segments into a complete structural contour, and generating contour analysis results of each content type, includes the following steps:

analyzing each contour in a mode that depends on its type, based on the obtained contour information:

for a contour of the table type, performing row-column analysis;

for a contour of the text-cell type, analyzing the content of the text cell according to the original typesetting format of the table;

for a contour of the picture type, identifying the picture separately and storing it at the corresponding position of the document layout.

In some embodiments, the searching for structural similarity of product graphs in an existing historical product knowledge base according to the product information and the preliminary knowledge graph, and judging whether the target product document already exists in the historical product knowledge base, includes the following steps:

performing graph similarity matching: a traversal search is performed on the existing historical product knowledge base to judge whether the target product document exists in the base; if so, the graph structure of the corresponding product in the historical product knowledge base is extracted; if not, a path matching algorithm is used to calculate the similarity between the preliminary knowledge graph and the knowledge graphs of the existing historical product knowledge base, and the closest product graph structure in the historical product knowledge base is extracted;

performing graph fusion for the target product document: if the target product document exists in the historical product knowledge base, the attributes that do not appear in the target product document but belong to the target product are extracted from the matched product graph of the historical knowledge base and fused into the knowledge graph of the target product document for information supplementation and graph alignment; if the target product document does not exist in the historical product knowledge base, the attributes contained in the matched similar product graph of the knowledge base are summarized, the summarized attributes are passed to the large model as a prompt, and the product knowledge graph of the target product document is regenerated, so that the product knowledge base guides the generation of the new document graph.

In some embodiments, the encoding the question information of the user and the information of the target knowledge graph, and calculating the similarity to extract all the features related to the question information, includes the following steps:

encoding the user's question with an embedding model to obtain a user question embedding vector;

vectorizing the final product graph with the embedding model of the graph to obtain an embedding vector of the product graph;

calculating the similarity between the user question embedding vector and the product graph embedding vector, and extracting all nodes in the graph that are related to the user's question and meet the threshold, together with all attribute information of those nodes.

In some embodiments, the feedback of answer information for extracted features through summary and reasoning capabilities of the large model comprises the steps of:

splicing the user's question with all the question-related node information extracted from the graph, inputting the result into a natural language dialogue large model for reasoning and analysis, and returning the answer to the question from the large model.

Another aspect of the embodiment of the present invention further provides a large model document question-answering system based on a graph structure, including:

The first module is used for acquiring a target product document, identifying product information described in the target product document and generating a preliminary knowledge graph of a graph structure of a target product;

the second module is used for searching the structural similarity of the product graph in the existing historical product knowledge base according to the product information and the preliminary knowledge graph and judging whether the target product document exists in the historical product knowledge base or not;

a third module, configured to fuse the preliminary knowledge graph with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph if the target product exists in the historical product knowledge base;

A fourth module, configured to, if the target product does not exist in the historical product knowledge base, match a closest product in the historical product knowledge base to guide the target product document, and generate a more comprehensive target knowledge graph;

A fifth module, configured to encode the questioning information of the user and the information of the target knowledge graph, and calculate a similarity to extract all features related to the questioning information;

And a sixth module for feeding back answer information for the extracted features through the summarization and reasoning capability of the large model.

Another aspect of the embodiment of the invention also provides an electronic device, which includes a processor and a memory;

The memory is used for storing programs;

the processor executes the program to implement the method as described above.

Another aspect of the embodiments of the present invention also provides a computer-readable storage medium storing a program that is executed by a processor to implement a method as described above.

Another aspect of embodiments of the invention also provides a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.

The beneficial effects of the embodiments of the invention are as follows. A large model document question-answering method and system based on a graph structure are provided. A target product document is obtained, the product information described in the target product document is identified, and a preliminary knowledge graph of the graph structure of the target product is generated. According to the product information and the preliminary knowledge graph, a product graph structural similarity search is performed in the existing historical product knowledge base to judge whether the target product document exists in the historical product knowledge base. If the target product exists in the historical product knowledge base, the preliminary knowledge graph is fused with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph; if the target product does not exist in the historical product knowledge base, the closest product in the historical product knowledge base is matched to guide the target product document, and a more comprehensive target knowledge graph is generated. The user's question information and the information of the target knowledge graph are encoded, the similarity is calculated to extract all features related to the question information, and answer information is fed back for the extracted features through the summarization and reasoning capability of the large model. The embodiments of the invention can accurately answer questions in the home furnishing field with high accuracy, which helps improve user experience.

Drawings

FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present invention;

FIG. 2 is a flow chart of the overall steps provided by an embodiment of the present invention;

FIG. 3 is a question-answering flow chart based on the knowledge graph of the graph structure provided by the embodiment of the invention;

FIG. 4 is a diagram of a data model structure of an intra-domain knowledge graph provided by an embodiment of the present invention;

FIG. 5 is a schematic diagram of the product-library-guided graph generation method provided by an embodiment of the present invention;

FIG. 6 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with embodiments of the invention, but are merely examples of apparatuses and methods consistent with aspects of embodiments of the invention as detailed in the accompanying claims.

It is to be understood that the terms "first," "second," and the like, as used herein, may be used to describe various concepts, but the concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present invention. The word "if", as used herein, may be interpreted as "when" or "in response to a determination", depending on the context.

As used herein, "at least one" includes one, two or more; "a plurality" includes two or more; "each" refers to every one of the corresponding plurality; and "any" refers to any one of the plurality.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.

The embodiment of the invention provides a large model document question-answering method and system based on a graph structure, and relates to the technical field of computers. The method provided by the embodiment of the invention can be applied to a terminal, to a server, or to software running in the terminal or the server. In some embodiments, the terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, and the like. The server may be configured as an independent physical server, as a server cluster or a distributed system formed by a plurality of physical servers, or as a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms; the server may also be a node server in a blockchain network. The software may be an application implementing the large model document question-answering method based on a graph structure, but is not limited to the above forms.

The invention is operational with numerous general purpose or special purpose computer system environments or configurations, such as a personal computer, a server computer, a hand-held or portable device, a tablet device, a multiprocessor system, a microprocessor-based system, a set top box, programmable consumer electronics, a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

FIG. 1 is a schematic view of an implementation environment according to an embodiment of the present invention. Referring to FIG. 1, the implementation environment includes at least one terminal 102 and a server 101. The terminal 102 and the server 101 can be connected through a network in a wireless or wired mode to complete data transmission and exchange.

The server 101 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), and basic cloud computing services such as big data and artificial intelligence platforms.

In addition, server 101 may also be a node server in a blockchain network. The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like.

The terminal 102 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, or the like, but is not limited thereto. The terminal 102 and the server 101 may be directly or indirectly connected through wired or wireless communication, which is not limited in this embodiment of the present invention.

By way of example, based on the implementation environment shown in FIG. 1, an embodiment of the present invention provides a large model document question-answering method based on a graph structure. The method is described below taking its application to the server 101 as an example; it can be understood that the method can also be applied to the terminal 102.

Referring to FIG. 2, FIG. 2 is a flowchart of the large model document question-answering method based on a graph structure, applied to a server, according to an embodiment of the present invention. The execution subject of the method may be any one of the foregoing computer devices (including a server or a terminal).

Referring to fig. 2, the method may include the steps of:

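acquiring a target product document, identifying product information described in the target product document and generating a preliminary knowledge graph of a graph structure of a target product;

according to the product information and the preliminary knowledge graph, searching the structural similarity of the product graph in the existing historical product knowledge base, and judging whether the target product document exists in the historical product knowledge base or not;

if the target product exists in the historical product knowledge base, fusing the preliminary knowledge graph with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph;

if the target product does not exist in the historical product knowledge base, matching the closest product in the historical product knowledge base to guide the target product document, and generating a more comprehensive target knowledge graph;

encoding the question information of the user and the information of the target knowledge graph, and calculating the similarity to extract all features related to the question information;

feeding back answer information for the extracted features through the summarization and reasoning capability of the large model.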

The following describes the implementation process of the embodiment of the present invention in detail in a specific application scenario with reference to the accompanying drawings of the specification:

To answer a user's questions about document content quickly and accurately, the text understanding and summarization capability of the large model is used to generate graph-structured information for the document. The user's question is then matched against the graph-structured information by a similarity algorithm, all graph node information related to the question is collected accurately, and the answer is derived through the reasoning capability of the large model, which improves answer accuracy.

The embodiment of the invention provides a method for document question answering using a graph structure. The core of the method is to combine the existing graph structure information in the product library with the reasoning and analysis capability of the large model to aggregate and retrieve the information related to the user's question quickly and accurately, thereby meeting the need for deep understanding of complex documents.

The embodiment of the invention provides a large model home furnishing product document question-answering method based on a graph structure; the flow chart of the method is shown in FIG. 3. First, AI is used to parse the document input by the user, the product name and the corresponding attribute information such as style, color system and space described in the document are identified, and the product graph structure knowledge graph shown in FIG. 4 is preliminarily generated. Second, a product graph structural similarity search is performed in the existing product knowledge base to judge whether the product exists there. If the product exists in the product knowledge base, the knowledge graph of the document is fused with the knowledge graph in the product knowledge base to supplement and align features; if the product does not exist, the closest product in the knowledge base is matched to guide the document, so that a more comprehensive knowledge graph is generated. The user's question and the information of the knowledge graph are then encoded, and the similarity is calculated to extract all features related to the question. Finally, the answer is given through the summarization and reasoning capability of the large model, completing deep understanding of the document and question answering over it.

Specifically, the implementation process of the embodiment of the invention comprises the following steps:

1. Document analysis and knowledge graph construction

Referring to FIG. 5, the graph generation process of the embodiment of the present invention includes the following steps:

I. Document analysis

A. For a PDF product document input by the user, the PDF is first converted into pictures in PNG format, and operations such as image graying and binarization are then performed to facilitate subsequent content extraction and table parsing.

B. All possible contour line segments in the document picture are detected by a deep-learning feature detection algorithm, such as a CNN. From these contour line segments, the contours that may represent structures such as tables and CAD curves are screened out.

C. The identified structural contours are connected and analyzed. Multiple line segments are combined into a complete structural contour by methods such as analysis of segment angles and lengths. Based on the obtained contour information, each content type is parsed in its own mode: tables are parsed by rows and columns, text-cell content is parsed according to the original typesetting format of the table, and pictures are identified separately and stored at the corresponding position of the document layout.
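Steps A and C can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the page is already rasterized into rows of (r, g, b) tuples, that the line segments have already been produced by the detector of step B, and that only near-horizontal segments are merged. The function names, the fixed threshold of 128, and the tolerance values are all hypothetical.

```python
import math

def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 0-255 luminance (ITU-R BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]

def binarize(gray_image, threshold=128):
    """Map each pixel to 1 (ink) if darker than the threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_image]

def angle_deg(seg):
    """Orientation of a segment in degrees, folded into [0, 180)."""
    (x1, y1), (x2, y2) = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def length(seg):
    """Euclidean length of a segment."""
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)

def merge_horizontal(segments, angle_tol=2.0, max_gap=5.0, min_len=10.0):
    """Join near-horizontal segments on the same row whose ends nearly touch,
    then drop merged lines shorter than min_len."""
    rows = {}
    for seg in segments:
        a = angle_deg(seg)
        if min(a, 180.0 - a) > angle_tol:
            continue  # skip segments that are not close to horizontal
        (x1, y1), (x2, y2) = seg
        y = round((y1 + y2) / 2)
        rows.setdefault(y, []).append((min(x1, x2), max(x1, x2)))
    merged = []
    for y, spans in sorted(rows.items()):
        spans.sort()
        start, end = spans[0]
        for s, e in spans[1:]:
            if s - end <= max_gap:
                end = max(end, e)  # small gap: extend the current line
            else:
                merged.append(((start, y), (end, y)))
                start, end = s, e
        merged.append(((start, y), (end, y)))
    return [seg for seg in merged if length(seg) >= min_len]

segments = [((0, 10), (20, 10)), ((22, 10), (50, 10)),
            ((0, 0), (0, 40)), ((100, 10), (103, 10))]
table_rules = merge_horizontal(segments)
```

In this toy input the two touching segments on row 10 become one candidate table rule, the vertical segment is ignored, and the 3-pixel fragment is discarded as too short. Vertical rules would be handled symmetrically.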

II. Knowledge graph construction

D. The parsed document is passed to the large model for analysis, and the product name contained in the document and the corresponding attribute information, such as color system, style, material and space, are obtained by adjusting the prompt for the home furnishing field.

E. A graph structure knowledge graph is preliminarily generated from the obtained attribute information. The main node of the knowledge graph is the product name; the child nodes are the product's attributes, such as color system, style, material and space; and each attribute node in turn contains nodes holding the specific information.

F. The obtained node information of the graph structure knowledge graph is stored in a Neo4j graph database, which facilitates subsequent retrieval, addition and similarity calculation of the graph information.
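The node layout of steps E and F can be sketched as follows. This is an illustrative assumption rather than the schema used in the patent: the labels `Product`, `Attribute` and `Value`, the relationship types `HAS_ATTRIBUTE` and `HAS_VALUE`, the `id` property, and both function names are hypothetical. The sketch only renders parameterized Cypher `MERGE` statements as strings; actually sending them to a Neo4j database is left out.

```python
def build_product_graph(product_name, attributes):
    """Return (nodes, edges): a product main node, one child node per
    attribute, and one node per specific value under each attribute."""
    nodes = [("Product", product_name)]
    edges = []
    for attr, values in attributes.items():
        attr_id = f"{product_name}/{attr}"
        nodes.append(("Attribute", attr_id))
        edges.append((product_name, "HAS_ATTRIBUTE", attr_id))
        for value in values:
            value_id = f"{attr_id}/{value}"
            nodes.append(("Value", value_id))
            edges.append((attr_id, "HAS_VALUE", value_id))
    return nodes, edges

def to_cypher(nodes, edges):
    """Render the graph as (query, parameters) pairs using Cypher MERGE,
    so re-running the statements does not create duplicate nodes."""
    statements = [(f"MERGE (n:{label} {{id: $id}})", {"id": node_id})
                  for label, node_id in nodes]
    for src, rel, dst in edges:
        statements.append((
            f"MATCH (a {{id: $src}}), (b {{id: $dst}}) MERGE (a)-[:{rel}]->(b)",
            {"src": src, "dst": dst},
        ))
    return statements

nodes, edges = build_product_graph(
    "Oak Sofa", {"style": ["Nordic"], "material": ["oak", "linen"]})
statements = to_cypher(nodes, edges)
```

Keeping the values in a parameter map rather than interpolating them into the query text avoids quoting problems when product names contain special characters.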

2. Graph matching and fusion

I. Graph similarity matching

G. A traversal search is first performed on the existing product knowledge base to judge whether the document product exists in the base. If it does, the graph structure of that product in the knowledge base is extracted.

H. If the document product does not exist in the existing product knowledge base, a path matching algorithm is used to calculate the similarity between the obtained document product knowledge graph and the knowledge graphs of the existing product base, and the closest product graph structure in the base is extracted.
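Steps G and H can be sketched as below. The source does not specify the path matching algorithm, so this sketch substitutes a simple stand-in: each graph is flattened into its set of attribute-to-value paths and two graphs are compared by Jaccard overlap of those sets. The dictionary layout of the knowledge base and all function names are assumptions made for illustration.

```python
def graph_paths(attributes):
    """Flatten {attribute: [values]} into a set of (attribute, value) paths."""
    return {(attr, value) for attr, values in attributes.items() for value in values}

def path_similarity(a, b):
    """Jaccard overlap of the two path sets; 0.0 when both graphs are empty."""
    pa, pb = graph_paths(a), graph_paths(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0

def match_product(name, attributes, knowledge_base):
    """Return (matched_name, exists): an exact name hit if the product is
    already in the base (step G), else the most similar product (step H)."""
    if name in knowledge_base:
        return name, True
    if not knowledge_base:
        return None, False
    best = max(knowledge_base,
               key=lambda k: path_similarity(attributes, knowledge_base[k]))
    return best, False

knowledge_base = {
    "Oak Sofa": {"style": ["Nordic"], "material": ["oak"]},
    "Steel Lamp": {"style": ["Industrial"], "material": ["steel"]},
}
hit = match_product("Oak Sofa", {}, knowledge_base)
nearest = match_product(
    "Birch Sofa", {"style": ["Nordic"], "material": ["birch"]}, knowledge_base)
```

Here the unknown "Birch Sofa" shares one of three distinct paths with "Oak Sofa" and none with "Steel Lamp", so "Oak Sofa" is returned as the closest graph.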

II. Graph fusion

I. If the document product exists in the product knowledge base, the attributes that do not appear in the document but belong to the product are extracted from the matched product graph of the knowledge base and fused into the knowledge graph of the document for information supplementation and graph alignment.

J. If the document product does not exist in the product knowledge base, the attributes contained in the matched similar product graph of the knowledge base are summarized, the summarized attributes are passed to the large model as a prompt, and the product knowledge graph of the product document is regenerated, so that the product knowledge base guides the generation of the new document graph.
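The two branches of steps I and J can be sketched as follows, under stated assumptions: graphs are plain `{attribute: [values]}` dictionaries, fusion adds both missing attributes and missing values, and the guidance prompt wording is invented for illustration. Neither function name nor the prompt text comes from the source.

```python
def fuse_attributes(document_graph, library_graph):
    """Step I: return a copy of the document graph supplemented with the
    attributes and values that only the knowledge-base graph contains."""
    fused = {attr: list(values) for attr, values in document_graph.items()}
    for attr, values in library_graph.items():
        target = fused.setdefault(attr, [])
        for value in values:
            if value not in target:
                target.append(value)
    return fused

def guidance_prompt(library_graph, document_text):
    """Step J: summarize the attribute names of the closest product and
    turn them into a prompt that steers regeneration of the new graph."""
    attrs = ", ".join(sorted(library_graph))
    return (f"Extract the product name and the following attributes: {attrs}.\n"
            f"Document:\n{document_text}")

fused = fuse_attributes(
    {"style": ["Nordic"]},
    {"style": ["Nordic"], "color system": ["white"]})
prompt = guidance_prompt({"style": [], "color system": []}, "Birch sofa, 3 seats")
```

The fusion function leaves its inputs untouched, so the preliminary graph stays available for comparison after the target graph is produced.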

3. Question encoding and node extraction

I. Question encoding

K. To answer the user's question better and faster, the embedding model that matches the selected large model is used to encode the user's question, which facilitates storage and the subsequent similarity calculation.

II. Node extraction

L. The final product graph is vectorized with the embedding model of the graph, its similarity to the obtained user question embedding vector is calculated, and all nodes in the graph that are related to the question and meet the threshold are extracted together with all attribute information of those nodes.
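Steps K and L can be sketched as below. The embedding here is a deliberately crude stand-in (lowercase bag-of-words counts) so that the example runs without any model; a real system would call the embedding model that accompanies the chosen large model. Cosine similarity and the threshold of 0.3 are also assumptions, since the source names neither the similarity measure nor the threshold value.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: lowercase bag-of-words counts (not a real model)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def extract_nodes(question, nodes, threshold=0.3, embed=toy_embed):
    """Return the ids of all graph nodes whose similarity to the question
    meets the threshold, most similar first."""
    q = embed(question)
    scored = [(cosine(q, embed(text)), node_id) for node_id, text in nodes.items()]
    return [node_id for score, node_id in sorted(scored, reverse=True)
            if score >= threshold]

graph_nodes = {
    "material": "sofa material oak and linen",
    "color": "sofa color system white",
    "space": "living room",
}
relevant = extract_nodes("what material is the sofa", graph_nodes)
```

Only the node texts are scored, not whole document blocks, which is where the reduction in irrelevant context is expected to come from.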

4. Large model reasoning and answering

M. The user's question is spliced with all the question-related node information extracted from the graph and input into a natural language dialogue large model for reasoning and analysis, and the large model returns the answer to the question. Because the information is summarized in graph form, the limitation of traditional large model knowledge base question answering and understanding, in which the reference information must lie close together in reading position, is overcome; the redundancy of the reference information is reduced, and retrieval efficiency and answer speed are improved.
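Step M amounts to assembling one prompt from the question and the extracted nodes. A minimal sketch follows; the prompt wording and both function names are hypothetical, and `llm` is any callable that takes a prompt string and returns text, since the source does not name a specific model or API.

```python
def build_prompt(question, node_facts):
    """Splice the user question with the extracted graph node information."""
    lines = [f"- {node}: {', '.join(values)}" for node, values in node_facts.items()]
    context = "\n".join(lines) if lines else "- (no matching nodes)"
    return ("Answer the question using only the product facts below.\n"
            f"Product facts:\n{context}\n"
            f"Question: {question}")

def answer(question, node_facts, llm):
    """Send the spliced prompt to the dialogue model and return its reply."""
    return llm(build_prompt(question, node_facts))

prompt = build_prompt("What material is the sofa?", {"material": ["oak", "linen"]})
reply = answer("What material is the sofa?", {"material": ["oak", "linen"]},
               llm=lambda p: f"received {len(p.splitlines())} lines")
```

Passing the model as a parameter keeps the splicing logic testable with a stub, as the last line shows.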

It can be understood that the content in the above method embodiment is applicable to the system embodiment, and the functions specifically implemented by the system embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.

The embodiment of the invention also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor implements the large model document question-answering method based on the graph structure when executing the computer program. The electronic device can be any intelligent terminal, including a tablet computer, a vehicle-mounted computer and the like.

It can be understood that the content in the above method embodiment is applicable to the embodiment of the present apparatus, and the specific functions implemented by the embodiment of the present apparatus are the same as those of the embodiment of the above method, and the achieved beneficial effects are the same as those of the embodiment of the above method.

Referring to fig. 6, fig. 6 illustrates a hardware structure of an electronic device according to another embodiment, the electronic device includes:

a processor 601, which may be implemented by a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and which executes related programs so as to implement the technical solution provided by the embodiments of the present invention;

a memory 602, which may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 602 may store an operating system and other application programs; when the technical solution provided in the embodiments of the present disclosure is implemented by software or firmware, the relevant program code is stored in the memory 602 and is invoked by the processor 601 to execute the large model document question-answering method based on the graph structure of the embodiments of the present disclosure;

an input/output interface 603 for implementing information input and output;

a communication interface 604 for implementing communication between the device and other devices, either in a wired manner (e.g., USB, network cable) or in a wireless manner (e.g., mobile network, Wi-Fi, Bluetooth);

a bus 605 for transferring information between the various components of the device (e.g., the processor 601, memory 602, input/output interface 603, and communication interface 604);

wherein the processor 601, the memory 602, the input/output interface 603 and the communication interface 604 are communicatively coupled to each other within the device via a bus 605.

The embodiment of the invention also provides a computer readable storage medium which stores a computer program, and the computer program realizes the large model document question-answering method based on the graph structure when being executed by a processor.

It can be understood that the content of the above method embodiment is applicable to the present storage medium embodiment, and the functions of the present storage medium embodiment are the same as those of the above method embodiment, and the achieved beneficial effects are the same as those of the above method embodiment.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

It should be noted that, in each specific embodiment of the present invention, when related processing is required according to user information, user behavior data, user history data, user location information, and other data related to user identity or characteristics, permission or consent of the user is obtained first, and the collection, use, processing, and the like of the data comply with related laws and regulations and standards. In addition, when the embodiment of the invention needs to acquire the sensitive personal information of the user, the independent permission or independent consent of the user is acquired through popup or jump to a confirmation page and the like, and after the independent permission or independent consent of the user is definitely acquired, the necessary relevant data of the user for enabling the embodiment of the invention to normally operate is acquired.

The embodiments described in the embodiments of the present invention are for more clearly describing the technical solutions of the embodiments of the present invention, and do not constitute a limitation on the technical solutions provided by the embodiments of the present invention, and those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present invention are equally applicable to similar technical problems.

It will be appreciated by persons skilled in the art that the embodiments of the invention are not limited by the illustrations, and that more or fewer steps than those shown may be included, or certain steps may be combined, or different steps may be included.

The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.

The terms "first," "second," "third," "fourth," and the like in the description of the invention and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present invention, "at least one (item)" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.

In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and including multiple instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the various embodiments of the present invention. The storage medium includes various media capable of storing programs, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and they do not thereby limit the scope of the claims of the embodiments of the present invention. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present invention shall fall within the scope of the claims of the embodiments of the present invention.

Claims

1. A large model document question-answering method based on a graph structure, characterized by comprising the following steps:

Obtain a target product document, identify product information described in the target product document, and generate a preliminary knowledge graph of the graph structure of the target product;

Based on the product information and the preliminary knowledge graph, a product graph structure similarity search is performed in an existing historical product knowledge base to determine whether the target product document already exists in the historical product knowledge base;

If the target product exists in the historical product knowledge base, the preliminary knowledge graph is merged with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph;

If the target product does not exist in the historical product knowledge base, the most similar product in the historical product knowledge base is matched to guide the target product document to generate a more comprehensive target knowledge graph;

Encode the user's question information and the information of the target knowledge graph, and calculate the similarity to extract all features related to the question information;

Feed back answer information based on the extracted features through the summarization and reasoning capabilities of the large model.

2. The large model document question-answering method based on a graph structure according to claim 1, characterized in that the step of obtaining the target product document, identifying the product information described in the target product document, and generating a preliminary knowledge graph of the graph structure of the target product comprises the following steps:

For the PDF product document input by the user, the PDF product document is converted into a target image in PNG format, and the image is preprocessed, wherein the preprocessing operation includes image grayscale processing and binarization processing;

Through the feature detection algorithm of deep learning, the contour line segments of the target image are detected to screen out the target contour, and the target contour is used to represent the table structure and the CAD curve structure;

Based on the analysis of line segment angles and lengths, the identified structural contours are connected and analyzed, multiple line segments are combined into a complete structural contour, and contour analysis results for each content type are generated;

According to the structural contour and the contour analysis results, the product name contained in the document and the attribute information corresponding to the product are obtained by adjusting the prompt for the home furnishing field, wherein the attribute information includes color, style, material, and space;

Generate a preliminary graph structure knowledge graph based on the attribute information; wherein the main node of the graph structure knowledge graph is the product name, and the child node is the attribute information of the product;

The obtained node information of the graph-structured knowledge graph is stored in a Neo4j graph database.
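By way of illustration only, the graph structure recited above (product name as the main node, attribute information as child nodes) could be prepared for storage as parameterized Cypher statements. The node labels `Product` and `Attribute`, the relationship type `HAS_ATTRIBUTE`, and the function name are hypothetical and are not taken from the claims; the sketch only builds the statements and does not connect to a database.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: one Product node linked to one Attribute node per value.
# MERGE is used so that re-running the statements does not create duplicates.
CYPHER = (
    "MERGE (p:Product {name: $name}) "
    "MERGE (a:Attribute {type: $type, value: $value}) "
    "MERGE (p)-[:HAS_ATTRIBUTE]->(a)"
)


def graph_statements(
    product_name: str, attributes: Dict[str, List[str]]
) -> List[Tuple[str, dict]]:
    """Build one (query, parameters) pair per attribute value of the product."""
    statements = []
    for attr_type, values in attributes.items():
        for value in values:
            statements.append(
                (CYPHER, {"name": product_name, "type": attr_type, "value": value})
            )
    return statements
```

Each pair could then be executed through a Neo4j client session. Passing values as parameters rather than formatting them into the query text avoids quoting problems with product names.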

3. The large model document question-answering method based on a graph structure according to claim 2, characterized in that the step of connecting and analyzing the identified structural contours based on the analysis of line segment angles and lengths, combining multiple line segments into a complete structural contour, and generating contour analysis results for each content type comprises the following steps:

Based on the obtained contour information, different analysis methods are performed:

For contours of the table type, row and column analysis is performed;

For contours of the text cell type, the content of the text cell is parsed according to the original layout format of the table;

For contours of the picture type, the pictures are individually identified and saved to the corresponding positions on the document layout.

4. The large model document question-answering method based on a graph structure according to claim 1, characterized in that the step of performing a product graph structure similarity search in an existing historical product knowledge base based on the product information and the preliminary knowledge graph to determine whether the target product document already exists in the historical product knowledge base comprises the following steps:

Traverse and search the existing historical product knowledge base and perform graph similarity matching, specifically: determine whether the target product document already exists in the library, and if so, extract the graph structure of the corresponding product in the historical product knowledge base; if the target product document does not exist in the existing historical product knowledge base, use the path matching algorithm to calculate the similarity between the preliminary knowledge graph and the knowledge graph of the existing historical product knowledge base, and extract the most similar product graph structure in the historical product knowledge base;

The graph of the target product document is fused, specifically: if the target product document exists in the historical product knowledge base, the attributes that are absent from the target product document but belong to the target product are extracted from the matched product graph in the historical knowledge base and integrated into the knowledge graph of the target product document for information supplementation and graph alignment; if the target product document does not exist in the historical product knowledge base, the attributes contained in the matched similar product graph in the knowledge base are summarized, and these summaries are passed as prompt words to the large model to regenerate the product knowledge graph of the target product document, so that the product knowledge base guides the generation of graphs for new documents.
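The matching and fusion recited above may be sketched as follows. The claims do not define the path matching algorithm, so Jaccard similarity over (attribute type, value) paths is used here purely as an assumed stand-in; the `Graph` layout and all function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumed layout: a product graph maps each attribute type to its set of values.
Graph = Dict[str, Set[str]]


def paths(graph: Graph) -> Set[Tuple[str, str]]:
    """Flatten a product graph into its (attribute type, value) paths."""
    return {(attr_type, value) for attr_type, values in graph.items() for value in values}


def path_similarity(a: Graph, b: Graph) -> float:
    """Jaccard similarity of two path sets (an assumed stand-in for path matching)."""
    pa, pb = paths(a), paths(b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def most_similar(target: Graph, history: Dict[str, Graph]) -> str:
    """Name of the historical product whose graph is most similar to the target graph."""
    return max(history, key=lambda name: path_similarity(target, history[name]))


def fuse(target: Graph, historical: Graph) -> Graph:
    """Supplement the target graph with attributes present only in the historical graph."""
    merged = {attr_type: set(values) for attr_type, values in target.items()}
    for attr_type, values in historical.items():
        merged.setdefault(attr_type, set()).update(values)
    return merged
```

`most_similar` raises `ValueError` when the knowledge base is empty, so a caller would check for that case first. For a product that is not in the knowledge base, the attributes of the matched graph would be summarized into prompt words rather than merged directly.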

5. The large model document question-answering method based on a graph structure according to claim 1, characterized in that the step of encoding the user's question information and the information of the target knowledge graph and calculating the similarity to extract all features related to the question information comprises the following steps:

Encode the user's question through the embedding model to obtain the user's question embedding vector;

Use the graph embedding model to vectorize the final product graph and obtain the product graph embedding vector;

The similarity between the user question embedding vector and the product graph embedding vector is calculated to extract all nodes related to the user question and meeting the threshold and all attribute information of the nodes in the graph.

6. The large model document question-answering method based on a graph structure according to claim 1, characterized in that the step of feeding back answer information based on the extracted features through the summarization and reasoning capabilities of the large model comprises the following steps:

The user's question is concatenated with all the question-related node information extracted from the graph, the combined text is input into the natural language dialogue large model for reasoning and analysis, and the large model returns the answer to the question.

7. A large-model document question-answering system based on a graph structure, characterized by comprising:

The first module is used to obtain a target product document, identify the product information described in the target product document, and generate a preliminary knowledge graph of the graph structure of the target product;

The second module is used to perform a product graph structure similarity search in an existing historical product knowledge base based on the product information and the preliminary knowledge graph, and determine whether the target product document already exists in the historical product knowledge base;

The third module is used for fusing the preliminary knowledge graph with the knowledge graph in the historical product knowledge base to obtain a target knowledge graph if the target product exists in the historical product knowledge base;

The fourth module is used for matching the most similar product in the historical product knowledge base to guide the target product document if the target product does not exist in the historical product knowledge base, so as to generate a more comprehensive target knowledge graph;

The fifth module is used to encode the user's question information and the information of the target knowledge graph, and calculate the similarity to extract all features related to the question information;

The sixth module is used to feed back answer information based on the extracted features through the summarization and reasoning capabilities of the large model.

8. An electronic device, comprising a processor and a memory;

The memory is used to store programs;

The processor executes the program to implement the method according to any one of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the storage medium stores a program, and the program is executed by a processor to implement the method according to any one of claims 1 to 6.

10. A computer program product, comprising a computer program, wherein when the computer program is executed by a processor, the method according to any one of claims 1 to 6 is implemented.