Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a natural language query scheme based on a spatio-temporal knowledge cube. The spatio-temporal knowledge cube describes and analyzes data along different temporal and spatial dimensions, and semantic expansion enriches the meaning and relevance of the data. A virtual knowledge graph introduces an intermediate ontology layer that enriches the semantic relationships between data, from which the spatio-temporal knowledge cube is constructed. A large language model then handles complex data queries over the spatio-temporal knowledge cube, enabling natural language querying of multi-source spatio-temporal data and lowering the barrier to querying for users.
In order to achieve the above object, the present invention provides a natural language query method based on a spatio-temporal knowledge cube, comprising the steps of:
Step 1, constructing a spatio-temporal data cube based on relational data tables;
Step 2, constructing, from the spatio-temporal data cube, a virtual knowledge graph describing the spatio-temporal knowledge cube;
Step 2.1, constructing a domain ontology from the relational data tables;
Step 2.2, representing the ontology structure in the form of triples;
Step 2.3, defining the ontology concepts, relationships and attributes of the spatio-temporal knowledge cube from the table names, fields and foreign keys of the relational data tables of the spatio-temporal data cube, with reference to the OGC standard GeoSPARQL, and forming an RDF graph;
Step 2.4, constructing a mapping model from the RDF graph, which maps the data in the relational data tables onto the concepts and attributes defined by the ontology of the spatio-temporal knowledge cube, and expressing it in the W3C standard R2RML;
Step 2.5, taking the ontology structure of the spatio-temporal knowledge cube expressed as triples in step 2.2, together with the mapping model constructed from the RDF graph in step 2.4, as the virtual knowledge graph of the spatio-temporal knowledge cube;
Step 3, performing natural language data queries over the spatio-temporal knowledge cube using a large language model;
Step 3.1, performing entity recognition on the natural language input by the user using the large language model, and parsing out the required GeoSPARQL field information and calculation information;
Step 3.2, assembling the prompt required by the large language model using the domain ontology in the spatio-temporal knowledge cube;
Step 3.3, converting the natural language query into the corresponding GeoSPARQL query statement with the large language model, using the assembled prompt;
Step 3.4, executing the GeoSPARQL query statement in the OBDA system to obtain the data result.
Further, the spatio-temporal data cube in step 1 is represented by relational data tables: a product dimension table, a time dimension table, a space dimension table, a raster (grid) fact table and a vector fact table. The product dimension table has 3 columns, in order: product ID, product name and product type. The time dimension table has 3 columns, in order: time ID, observation time and update time. The space dimension table has 4 columns, in order: space ID, place name, bounding extent (the four-boundary range) and exact extent. The raster fact table has 5 columns, in order: raster fact ID, product ID, time ID, space ID and raster data address. The vector fact table has 5 columns, in order: vector fact ID, product ID, time ID, space ID and vector data address.
The raster fact table and the vector fact table are associated with the product, time and space dimension tables. The product ID is the primary key of the product dimension table and is referenced as a foreign key by the product ID column of the raster fact table and the vector fact table. The time ID is the primary key of the time dimension table and is referenced as a foreign key by the time ID column of the raster fact table and the vector fact table. The space ID is the primary key of the space dimension table and is referenced as a foreign key by the space ID column of the raster fact table and the vector fact table.
Further, the core concepts of the domain ontology in step 2.1 cover "time", "space", "measure" and "dimension": "time" and "space" form the basic dimensions of spatio-temporal data; "measure" refers to a value observed or calculated at a specific time and location; and "dimension" is not limited to time and space but can be extended to other analysis dimensions.
Further, in the tuple (O, C, R, P) used for the triple representation in step 2.2, O denotes the ontology, C an ontology concept, R an ontology relationship, and P an ontology attribute.
Further, in step 3.1, the natural language understanding capability of the large language model is used to extract the key information elements of the user's query intent, including entity names, attributes and the relationships among entities. An intermediate representation is built from these elements, from which the relationships between entities and attributes and the specific syntax elements required by the GeoSPARQL query are extracted.
Further, in step 3.2, prompt information suited to the large language model is designed and organized according to the ontology concepts and relationships of the spatio-temporal knowledge cube defined in step 2.3, and, drawing on the predictive capability of the large language model, this prompt information is integrated into the model input to form a complete prompt.
Further, in step 3.4, the virtual knowledge graph is connected through an API to an ontology-based data access (OBDA) system, and the data query is performed with the GeoSPARQL query statement.
The invention also provides a natural language query system based on the space-time knowledge cube, which is used for realizing the natural language query method based on the space-time knowledge cube.
Further, the system includes a processor and a memory, the memory for storing program instructions, the processor for invoking the stored instructions in the memory to perform a spatiotemporal knowledge cube based natural language query method as described above.
Alternatively, the system comprises a readable storage medium storing a computer program which, when executed, implements the natural language query method based on a spatio-temporal knowledge cube as described above.
Compared with the prior art, the invention has the following advantages:
1) Simplified querying: through natural language processing technology and a large language model, the user can query spatio-temporal data in natural language alone, without mastering a complex database query language.
2) Multi-source data integration: the data organization of the spatio-temporal data cube effectively organizes, stores and manages data with temporal and spatial dimensions, and virtual knowledge graph technology effectively integrates multi-source heterogeneous data.
3) Intelligent data query: with ontology prompts and a large language model, the natural language query entered by the user can be accurately parsed and converted into the corresponding structured query (GeoSPARQL) statement, enabling accurate query and analysis of multidimensional data.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and examples of the present invention, and it is apparent that the described examples are some, but not all, examples of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in fig. 1, an embodiment of the present invention provides a natural language query method based on a spatio-temporal knowledge cube, including the following steps:
Step 1, constructing a spatio-temporal data cube based on relational data tables.
The spatio-temporal data cube is represented by relational data tables: a product dimension table, a time dimension table, a space dimension table, a raster (grid) fact table and a vector fact table. The product dimension table has 3 columns, in order: product ID, product name and product type. The time dimension table has 3 columns, in order: time ID, observation time and update time. The space dimension table has 4 columns, in order: space ID, place name, bounding extent (the four-boundary range) and exact extent. The raster fact table has 5 columns, in order: raster fact ID, product ID, time ID, space ID and raster data address. The vector fact table has 5 columns, in order: vector fact ID, product ID, time ID, space ID and vector data address.
The raster fact table and the vector fact table are associated with the product, time and space dimension tables. The product ID is the primary key of the product dimension table and is referenced as a foreign key by the product ID column of the raster fact table and the vector fact table. The time ID is the primary key of the time dimension table and is referenced as a foreign key by the time ID column of the raster fact table and the vector fact table. The space ID is the primary key of the space dimension table and is referenced as a foreign key by the space ID column of the raster fact table and the vector fact table.
The five data table structures can be implemented in various relational database management systems, such as an Oracle Spatial database, a PostgreSQL database and the like. Taking PostgreSQL database schema as an example, the table information is as follows:
Table 1 spatiotemporal data cube metadata table
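As an illustration only, the five tables and their foreign keys described in step 1 can be sketched as follows. This is a minimal sketch, not the schema of Table 1: the table and column names are hypothetical, and SQLite (via Python's standard library) stands in for PostgreSQL so that the example is self-contained. Extents are stored as plain text here, whereas a spatial database would normally use a geometry type.

```python
import sqlite3

# Hypothetical names: the description fixes only the order and meaning of the columns.
SCHEMA = """
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    product_type TEXT
);
CREATE TABLE time_dim (
    time_id          INTEGER PRIMARY KEY,
    observation_time TEXT,
    update_time      TEXT
);
CREATE TABLE spatial (
    space_id     INTEGER PRIMARY KEY,
    place_name   TEXT,
    bounding_box TEXT,  -- bounding extent, stored as text in this sketch
    geom         TEXT   -- exact extent, stored as text in this sketch
);
CREATE TABLE raster_fact (
    raster_fact_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product(product_id),
    time_id        INTEGER REFERENCES time_dim(time_id),
    space_id       INTEGER REFERENCES spatial(space_id),
    raster_address TEXT
);
CREATE TABLE vector_fact (
    vector_fact_id INTEGER PRIMARY KEY,
    product_id     INTEGER REFERENCES product(product_id),
    time_id        INTEGER REFERENCES time_dim(time_id),
    space_id       INTEGER REFERENCES spatial(space_id),
    vector_address TEXT
);
"""


def create_cube(conn):
    """Create the three dimension tables and the two fact tables."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


def foreign_keys(conn, table):
    """Return the set of (column, referenced_table) pairs declared on `table`."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # In each row, index 2 is the referenced table and index 3 the local column.
    return {(row[3], row[2]) for row in rows}


conn = sqlite3.connect(":memory:")
create_cube(conn)
```

Calling `foreign_keys(conn, "raster_fact")` lists the three dimension tables that the fact table references, which is the information step 2.3 later draws on when defining ontology relationships.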
Step 2, constructing, from the spatio-temporal data cube, a virtual knowledge graph describing the spatio-temporal knowledge cube.
Step 2.1, constructing a domain ontology from the relational data tables of step 1.
The core concepts of the domain ontology cover "time", "space", "measure" and "dimension". "Time" and "space" form the basic dimensions of spatio-temporal data; "measure" refers to a value observed or calculated at a specific time and location; and "dimension" is not limited to time and space but can be extended to other analysis dimensions.
Step 2.2, representing the ontology structure in the form of triples.
In the tuple (O, C, R, P), O denotes the ontology, C an ontology concept, R an ontology relationship, and P an ontology attribute.
An example of a domain ontology built from the relational data tables of step 1 is as follows:
@prefix : <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product> rdf:type owl:Class .
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Productname> rdf:type owl:DatatypeProperty ;
    rdfs:domain :Product ;
    rdfs:range xsd:string .
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Observation> rdf:type owl:DatatypeProperty ;
    rdfs:domain :Product ;
    rdfs:range xsd:dateTime .
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Raster> rdf:type owl:Class .
<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#belongsToProduct> rdf:type owl:ObjectProperty ;
    rdfs:domain :Raster ;
    rdfs:range :Product .
Taking the statement <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product> rdf:type owl:Class . as an example: the ontology O is http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product, the ontology concept C is Product, the ontology relationship R is rdf:type, and the ontology attribute P is owl:Class. The statement as a whole means that Product is a class (Class).
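The decomposition in the example above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the class name `OntologyStatement` and the function `decompose` are hypothetical, and the parser handles only a single statement of the shape `<IRI> predicate object .`, not general Turtle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyStatement:
    ontology: str      # O: full IRI of the ontology term
    concept: str       # C: local name after the '#'
    relationship: str  # R: the predicate
    attribute: str     # P: the object of the statement


def decompose(statement: str) -> OntologyStatement:
    """Split one '<IRI> predicate object .' statement into (O, C, R, P)."""
    subject, predicate, obj = statement.rstrip(" .").split()
    iri = subject.strip("<>")
    concept = iri.rsplit("#", 1)[-1]
    return OntologyStatement(iri, concept, predicate, obj)


example = decompose(
    "<http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#Product> "
    "rdf:type owl:Class ."
)
```

Here `example.concept` is `Product`, `example.relationship` is `rdf:type` and `example.attribute` is `owl:Class`, matching the reading given in the text.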
Step 2.3, defining the ontology concepts, relationships and attributes of the spatio-temporal knowledge cube from the table names, fields and foreign keys of the relational data tables of the spatio-temporal data cube, with reference to the OGC (Open Geospatial Consortium) standard GeoSPARQL (a geographic query language for RDF data), and forming an RDF (Resource Description Framework) graph.
In this embodiment, the ontology relationships of the spatio-temporal knowledge cube defined in this way are shown in fig. 2, and the resulting RDF graph is shown in fig. 3.
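One plausible reading of step 2.3 is a rule-based derivation: each table name becomes a class, each field a datatype property, and each foreign key an object property. The sketch below illustrates that reading only. The function name `ontology_from_schema` and its input layout are assumptions, the naming rule (table name followed by field name, as in `Productname`) is inferred from the ontology example, and the GeoSPARQL geometry terms are left out for brevity.

```python
BASE = "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#"


def ontology_from_schema(tables, foreign_keys):
    """Emit Turtle-style statements from a relational schema description.

    tables:       {table name: {field name: XSD datatype local name}}
    foreign_keys: {relationship name: (source table, target table)}
    """
    lines = []
    for table, columns in tables.items():
        # A table name becomes an ontology concept (class).
        lines.append(f"<{BASE}{table}> rdf:type owl:Class .")
        for column, xsd_type in columns.items():
            # A field becomes an ontology attribute (datatype property).
            lines.append(
                f"<{BASE}{table}{column}> rdf:type owl:DatatypeProperty ; "
                f"rdfs:domain <{BASE}{table}> ; rdfs:range xsd:{xsd_type} ."
            )
    for name, (source, target) in foreign_keys.items():
        # A foreign key becomes an ontology relationship (object property).
        lines.append(
            f"<{BASE}{name}> rdf:type owl:ObjectProperty ; "
            f"rdfs:domain <{BASE}{source}> ; rdfs:range <{BASE}{target}> ."
        )
    return lines


statements = ontology_from_schema(
    tables={"Product": {"name": "string"}, "Raster": {}},
    foreign_keys={"belongsToProduct": ("Raster", "Product")},
)
```

For this input the function returns four statements: two classes, one datatype property and the `belongsToProduct` object property.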
Step 2.4, constructing a mapping model from the RDF graph, which maps the data in the relational data tables onto the concepts and attributes defined by the ontology of the spatio-temporal knowledge cube, and expressing it in the W3C (World Wide Web Consortium) standard R2RML (RDB to RDF Mapping Language).
An example of a mapping built from the RDF graph and expressed in the W3C standard R2RML is as follows:
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix stc: <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
# Maps each raster fact row to an stc:Raster resource linked to its product and geometry.
<#RasterMapping>
    rr:logicalTable [
        rr:sqlQuery """
            SELECT r.id AS id_rs, p.id AS id_pr, ST_AsText(s.geom) AS geom
            FROM "Raster" r, "Spatial" s, "Product" p
            WHERE r.spatial_key = s.key
              AND r.product_key = p.key
        """
    ] ;
    rr:subjectMap [
        rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/raster-{id_rs}" ;
        rr:class stc:Raster
    ] ;
    rr:predicateObjectMap [
        rr:predicate stc:belongsToProduct ;
        rr:objectMap [
            rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/product-{id_pr}"
        ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate geo:hasGeometry ;
        rr:objectMap [
            rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/geom-{id_rs}"
        ]
    ] .
# Maps the geometry of each raster to a geo:Geometry resource with a WKT literal.
<#GeometryMapping>
    rr:logicalTable [
        rr:sqlQuery """
            SELECT r.id AS id_rs, ST_AsText(s.geom) AS geom
            FROM "Raster" r, "Spatial" s
            WHERE r.spatial_key = s.key
        """
    ] ;
    rr:subjectMap [
        rr:template "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube/geom-{id_rs}" ;
        rr:class geo:Geometry
    ] ;
    rr:predicateObjectMap [
        rr:predicate geo:asWKT ;
        rr:objectMap [
            rr:column "geom" ;
            rr:datatype geo:wktLiteral
        ]
    ] .
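To make the effect of the mapping concrete, the following sketch shows which triples the two mappings above would expose for a single SQL result row. It is an illustration, not an R2RML processor: the function names are hypothetical, the sample row is made up, and the percent-encoding that R2RML applies to column values inside IRI templates is omitted.

```python
import re

BASE = "http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube"


def expand_template(template, row):
    """Replace each {column} placeholder with the value from the row."""
    return re.sub(r"\{(\w+)\}", lambda m: str(row[m.group(1)]), template)


def raster_triples(row):
    """List the triples exposed for one joined Raster/Spatial/Product row."""
    raster = expand_template(BASE + "/raster-{id_rs}", row)
    product = expand_template(BASE + "/product-{id_pr}", row)
    geom = expand_template(BASE + "/geom-{id_rs}", row)
    return [
        (raster, "rdf:type", "stc:Raster"),
        (raster, "stc:belongsToProduct", product),
        (raster, "geo:hasGeometry", geom),
        (geom, "rdf:type", "geo:Geometry"),
        (geom, "geo:asWKT", row["geom"]),
    ]


# Made-up sample row with the column aliases used in the SQL queries above.
row = {"id_rs": 7, "id_pr": 2, "geom": "POINT(114.3 30.6)"}
triples = raster_triples(row)
```

In a virtual knowledge graph these triples are not materialized; the OBDA system uses the mapping to rewrite a query over them into SQL against the relational tables.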
Step 2.5, taking the ontology structure of the spatio-temporal knowledge cube expressed as triples in step 2.2, together with the mapping model constructed from the RDF graph in step 2.4, as the virtual knowledge graph of the spatio-temporal knowledge cube.
Step 3, performing natural language data queries over the spatio-temporal knowledge cube using a large language model.
Step 3.1, performing entity recognition on the natural language input by the user using the large language model, and parsing out the required GeoSPARQL field information and calculation information.
The natural language understanding capability of the large language model is used to extract the key information elements of the user's query intent, including entity names, attributes and the relationships among entities. An intermediate representation is built from these elements, from which the GeoSPARQL field information (the relationships between entities and attributes) and the calculation information (the specific syntax elements required by the GeoSPARQL query) are extracted.
Step 3.2, assembling the prompt required by the large language model using the domain ontology in the spatio-temporal knowledge cube.
Prompt information suited to the large language model is designed and organized according to the ontology concepts and relationships of the spatio-temporal knowledge cube defined in step 2.3, and, drawing on the predictive capability of the large language model, this prompt information is integrated into the model input to form a complete prompt.
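The prompt assembly of step 3.2 amounts to filling a fixed template with the ontology text and the user's question. A minimal sketch follows, assuming the template wording of the prompt quoted later in this embodiment; the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, and the call to the large language model itself is deliberately left out.

```python
PROMPT_TEMPLATE = (
    "This is a text2sparql task. Please convert the user question into a "
    "GeoSPARQL query statement according to the following ontology information. "
    "The ontology is {ontology}, and the question is {question}."
)


def build_prompt(ontology: str, question: str) -> str:
    """Insert the ontology text and the natural language question into the template."""
    return PROMPT_TEMPLATE.format(ontology=ontology.strip(), question=question.strip())


prompt = build_prompt(
    ontology=":Raster rdf:type owl:Class .",
    question="How many raster images are there in Wuhan?",
)
```

Keeping the ontology in the prompt is what lets the model use the class and property names defined in step 2.3 instead of inventing its own vocabulary.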
Step 3.3, converting the natural language query into the corresponding GeoSPARQL query statement with the large language model, using the assembled prompt.
Step 3.4, executing the GeoSPARQL query statement in the OBDA (ontology-based data access) system to obtain the data result.
The virtual knowledge graph is connected to the OBDA system through an API, and the data query is performed with the GeoSPARQL statement.
Taking the user's natural language input "How many raster images are there in Wuhan?" as an example, the key information elements in the natural language query are parsed in turn according to rules:
(1) Query fields (SELECT): all fields, exact extent (geom)
(2) Related tables (WHERE): raster image (Raster), space dimension table (Spatial), product dimension table (Product)
(3) Filter condition (FILTER): intersection (st_intersects)
The GeoSPARQL field information is "Raster", and the calculation information is "SELECT", "WHERE", "FILTER" and "COUNT".
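The intermediate representation mentioned in step 3.1 could take the shape of a small record like the one below. This is only a sketch of one possible structure: the class `QueryIntent`, its fields and the helper `calculation_info` are hypothetical, and the values are filled in by hand here, whereas in the method they are extracted by the large language model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueryIntent:
    select_fields: List[str]               # (1) query fields
    tables: List[str]                      # (2) related tables
    spatial_filter: Optional[str] = None   # (3) filter condition, e.g. "st_intersects"
    aggregate: Optional[str] = None        # aggregation implied by "how many"


def calculation_info(intent: QueryIntent) -> List[str]:
    """Derive the query keywords implied by the parsed intent."""
    keywords = ["SELECT", "WHERE"]
    if intent.spatial_filter:
        keywords.append("FILTER")
    if intent.aggregate:
        keywords.append(intent.aggregate.upper())
    return keywords


# Hand-filled values for the example question about Wuhan.
wuhan_intent = QueryIntent(
    select_fields=["geom"],
    tables=["Raster", "Spatial", "Product"],
    spatial_filter="st_intersects",
    aggregate="COUNT",
)
```

For the example question, `calculation_info(wuhan_intent)` yields the four keywords listed in the text.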
This information is assembled into a prompt, as follows:
"This is a text2sparql task. Please convert the user question into a GeoSPARQL query statement according to the following ontology information. The ontology is {RDF ontology}, and the question is {natural language question}."
Based on the above prompt, the large language model generates the following GeoSPARQL query statement:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX stc: <http://www.semanticweb.org/geocube/ontologies/spatial-temporal-cube#>
SELECT (COUNT(?image) AS ?imageCount)
WHERE {
    ?image a stc:Raster ;
           geo:hasGeometry ?image_geom .
    ?image_geom geo:asWKT ?image_wkt .
    # ?wuhan_region is a geographic entity, generated by the large language model, that represents the geographic extent of Wuhan
    ?wuhan_region a geo:Geometry ;
                  geo:asWKT ?region_wkt .
    FILTER(geof:sfIntersects(?image_wkt, ?region_wkt))
}
Executing the GeoSPARQL query statement in the OBDA system returns the data the end user needs: the total number of raster images for Wuhan is 126.
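The source does not specify the API through which the query reaches the OBDA system. Assuming the system exposes a SPARQL endpoint that follows the SPARQL 1.1 Protocol, the call could be sketched as below using only the Python standard library. The endpoint URL and the function names are hypothetical, and the result parsing assumes the SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8080/sparql"  # hypothetical OBDA endpoint URL


def build_request(query, endpoint=ENDPOINT):
    """Build a POST request that carries the query as a URL-encoded form field."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def read_count(results_json, variable="imageCount"):
    """Extract one integer binding from a SPARQL JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return int(bindings[0][variable]["value"])


def run_count_query(query, endpoint=ENDPOINT):
    """Send the query to the endpoint and return the counted value."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return read_count(response.read().decode("utf-8"))
```

`run_count_query` needs a running endpoint, while `build_request` and `read_count` can be exercised offline, which keeps the network call separate from the parts that are easy to check.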
Example 2
Based on the same inventive concept, the invention also provides a natural language query system based on the space-time knowledge cube, which comprises a processor and a memory, wherein the memory is used for storing program instructions, and the processor is used for calling the program instructions in the memory to execute the natural language query method based on the space-time knowledge cube.
Example 3
Based on the same inventive concept, the invention also provides a natural language query system based on the space-time knowledge cube, which comprises a readable storage medium, wherein the readable storage medium is stored with a computer program, and the computer program realizes the natural language query method based on the space-time knowledge cube when being executed.
In particular, those skilled in the art may implement the method of the technical solution of the present invention as an automated workflow using computer software technology. System apparatus implementing the method, such as a computer-readable storage medium storing the corresponding computer program, or a computer device that runs the corresponding computer program, also falls within the protection scope of the present invention.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.