CN117112852A

CN117112852A - Large language model driven vector database retrieval method and device

Info

Publication number: CN117112852A
Application number: CN202311385079.XA
Authority: CN
Inventors: 屠静; 王亚; 赵策; 苏岳; 万晶晶; 李伟伟; 颉彬; 周勤民
Original assignee: Zhuoshi Future Beijing technology Co ltd
Current assignee: Zhuoshi Future Beijing technology Co ltd
Priority date: 2023-10-25
Filing date: 2023-10-25
Publication date: 2023-11-24
Anticipated expiration: 2043-10-25
Also published as: CN117112852B

Abstract

The invention provides a large language model driven vector database retrieval method and device. The method comprises: acquiring demand data to be retrieved; detecting whether the demand data conforms to an authorized material type supported by the vector database; if so, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors. The data conversion model adopts a large language model. The cross-modal data retrieval function of the vector database is thereby realized.

Description

Large language model driven vector database retrieval method and device

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a large language model driven vector database retrieval method and device.

Background

A vector database is a database used to store, analyze, and retrieve vectors. Traditional relational databases are often used as data archives, and queries against them are usually exact searches, so fuzzy matching is rarely involved. Queries in a vector database, by contrast, typically compute the distance between vectors in order to find the data rows most similar to the input conditions, which makes the vector database computationally intensive. A vector database must also be designed for high concurrency and low latency. Because of these requirements for large storage capacity, high concurrency, and low latency, distributed vector databases, which offer high scalability and high reliability, are indispensable.

Currently, vector database technology is still developing; Milvus, Vearch, Proxima, and ScaNN are common vector databases. In most cases, a neural network model is first used to convert pictures, text, and the like into feature vectors, and a distance measure (such as inner product distance or cosine distance) is then selected.

However, current vector database retrieval supports data retrieval only within a single modality. For example, a vector database for the image modality supports vector similarity analysis only on image input features; cross-modal data retrieval, such as similarity calculation on video or text input features, cannot be realized.

No satisfactory solution to the above problem has yet been proposed.

Disclosure of Invention

The invention provides a large language model driven vector database retrieval method and device, which are used for overcoming the defect in the prior art that a vector database cannot support cross-modal data retrieval.

The invention provides a large language model driven vector database retrieval method, which comprises the following steps: acquiring demand data to be retrieved; detecting whether the demand data conforms to an authorized material type supported by the vector database; if so, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.

According to the large language model driven vector database retrieval method provided by the invention, the data conversion model comprises a large language model unit and a material generation unit which are cascaded, wherein the large language model unit is used for identifying target intention corresponding to the requirement data, and the material generation unit is used for generating cross-modal requirement data corresponding to the target intention; model parameters of the large language model unit remain unchanged during training of the data conversion model with a set of data samples.

According to the large language model driven vector database retrieval method provided by the invention, the method further comprises training operation aiming at the data conversion model, wherein the training operation comprises the following steps: locking model parameters of each connecting layer in the large language model unit; based on the data sample set, training the large language model unit by adopting a forward propagation algorithm, wherein the forward propagation algorithm is as follows:

（1）

wherein Y represents the model output, M represents the model parameters of the large language model unit, x represents the input sample features, β represents a hyperparameter, λ represents an intermediate parameter, and G and P both represent weight adjustment matrices.

According to the large language model driven vector database retrieval method provided by the invention, performing vector similarity calculation between each resource data vector in the vector database and the demand data, so as to determine a first target resource data vector from the resource data vectors, comprises the following steps: carrying out characterization processing on the demand data to obtain a demand feature vector corresponding to the demand data;

determining a demand characteristic vector corresponding to the demand data;

determining matching similarity between the resource data vector and the demand feature vector based on a multi-level similarity matching algorithm aiming at each resource data vector in a vector database; the multi-level similarity matching algorithm comprises a first-level matching algorithm and a second-level matching algorithm which are cascaded, wherein the first-level matching algorithm adopts a clustering algorithm, and the second-level matching algorithm adopts a cosine similarity calculation algorithm;

and determining a first target resource data vector according to the resource data vector corresponding to the largest one of the matching similarity.

According to the large language model driven vector database retrieval method provided by the invention, determining, for each resource data vector in the vector database, the matching similarity between the resource data vector and the demand feature vector based on a multi-level similarity matching algorithm comprises the following steps: dividing the resource data vectors in the vector database into a plurality of clusters based on a density clustering algorithm, and acquiring the center point vector corresponding to the cluster center point of each cluster;

For each center point vector, calculating the Euclidean distance between the demand feature vector and the center point vector;

selecting a cluster with the minimum Euclidean distance from the clusters to serve as a target cluster;

and calculating cosine similarity between each resource data vector and the demand feature vector aiming at each resource data vector in the target cluster to serve as the corresponding matching similarity.

According to the large language model driven vector database retrieval method provided by the invention, the Euclidean distance is calculated by the following method, which comprises the following steps:

（2）

wherein θ_j represents the Euclidean distance between the center point vector V_j of a cluster and the demand feature vector A, and n represents the number of clusters.

According to the large language model driven vector database retrieval method provided by the invention, the cosine similarity N(d₁, d₂) is calculated as follows:

（3）

wherein Z represents the number of channels of the demand data, d₁ and d₂ are the Z-dimensional coding features corresponding to the demand feature vector and the center point vector, respectively, and N(d₁, d₂) ∈ [0, 1].

According to the large language model driven vector database retrieval method provided by the invention, the authorized material types comprise any one of the following: text material type, audio material type, image material type, and video material type.

According to the large language model driven vector database retrieval method provided by the invention, the authorized material type is a video material type or an image material type; before the vector similarity calculation is performed between each resource data vector in the vector database and the demand data, the method further includes: extracting an effective information block from the frame image corresponding to the demand data, and dividing the effective information block into a plurality of pixel regions; each pixel region in the frame image is feature-coded in the following way:

（4）

wherein C represents the frame image scale parameter; H, K and E represent the query matrix, key matrix and value matrix corresponding to the frame image, respectively; … represents the encoded output features; L represents the number of channels of the frame image; and … represents the region feature vector of dimension C×C×L corresponding to the frame image;

based on the multi-head attention mechanism and the self-attention mechanism in the Transformer model, updating the weight value of the region feature vector corresponding to each pixel region specifically comprises:

（5）

wherein B_T-RMAC represents the output features after the weights are updated, and … represents the feature dimension normalization operation at the C×C scale;

Based on a Gaussian filtering algorithm, updating the weight value of the region feature vector corresponding to the pixel region in the center of the frame image specifically comprises the following steps:

（6）

wherein K(x, y) represents the weight of the pixel point (x, y) on the feature map, δ_x and δ_y represent the Gaussian kernel parameters in the x-axis and y-axis directions, respectively, and … and … represent the width and height of the pixel region at the center of the image, respectively.

According to the present invention, there is provided a large language model driven vector database retrieval apparatus, the apparatus comprising: the acquisition module is used for acquiring the requirement data to be retrieved; the detection module is used for detecting whether the demand data accords with the type of the authorized materials supported by the vector database; the vector matching module is used for carrying out vector similarity calculation on each resource data vector in the vector database and the required data respectively if yes so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model to generate cross-modal demand data corresponding to the authorized material type by the data conversion model, and respectively carrying out vector similarity calculation on each resource data vector in the vector database and the cross-modal demand data to determine a second target resource data vector from each resource data vector; the data conversion model adopts a large language model.

The data conversion model comprises a large language model unit and a material generation unit which are connected in cascade, wherein the large language model unit is used for identifying target intention corresponding to the demand data, and the material generation unit is used for generating cross-modal demand data corresponding to the target intention; model parameters of the large language model unit remain unchanged during training of the data conversion model with a set of data samples.

Optionally, the vector matching module is further configured to:

training operations for the data conversion model, wherein the training operations include: locking model parameters of each connecting layer in the large language model unit; based on the data sample set, training the large language model unit by adopting a forward propagation algorithm, wherein the forward propagation algorithm is as follows:

（1）

Optionally, the vector matching module is further configured to:

carrying out characterization processing on the demand data to obtain a demand feature vector corresponding to the demand data;

Determining a demand characteristic vector corresponding to the demand data;

Optionally, the vector matching module is further configured to:

dividing each resource data vector in the vector database into a plurality of cluster clusters based on a density clustering algorithm, and acquiring a center point vector corresponding to a cluster center point of each cluster;

Optionally, calculating the euclidean distance comprises:

（2）

（3）

According to the large language model driven vector database retrieval method provided by the invention, the authorized material type is a video material type or an image material type;

Before the vector similarity calculation is performed on each resource data vector in the vector database and the required data, the method further includes: extracting an effective information block in a frame image corresponding to the demand data, and dividing the effective information block into a plurality of pixel areas; the pixel areas in the frame image are respectively subjected to feature coding in the following way:

（4）

（5）

（6）

The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the vector database retrieval method driven by the large language model when executing the program.

The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a large language model driven vector database retrieval method as described in any one of the above.

The present invention also provides a computer program product comprising a computer program which when executed by a processor implements a large language model driven vector database retrieval method as described in any one of the above.

The invention provides a large language model driven vector database retrieval method, a large language model driven vector database retrieval device, electronic equipment and a non-transitory computer readable storage medium. Therefore, the vector database does not need to be set for each data mode, the construction cost of the multi-mode database is reduced, one vector database can support data retrieval corresponding to a plurality of different modes, and the cross-mode data retrieval function of the vector database is realized.

Drawings

In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 illustrates a flowchart of one example of a large language model driven vector database retrieval method according to an embodiment of the present invention;

fig. 2 shows an operation flowchart according to an example of step S131 in fig. 1;

fig. 3 shows an operation flowchart according to an example of step S230 in fig. 2;

FIG. 4 shows a block diagram of an example of a large language model driven vector database retrieval apparatus according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of an electronic device provided by the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

FIG. 1 illustrates a flowchart of an example of a large language model driven vector database retrieval method according to an embodiment of the present invention.

The execution subject of the method of the embodiment of the invention can be any controller or processor with computing or processing capability to achieve the goal of providing data retrieval to a client user. In some examples, it may be configured integrally in a client or a server by means of software, hardware, or a combination of software and hardware, which should not be limited herein.

The details of the technical scheme of the present invention are described below, taking a large language model driven vector database retrieval device as an exemplary execution subject. It should be understood that one or more of the steps in the following flows may be implemented by one or more controllers or by software installed and deployed in the server.

As shown in fig. 1, in step S110, demand data to be retrieved is acquired.

The user performs an input operation on the client to determine corresponding demand data, and uploads the demand data to the server to request the server to invoke the vector database to perform a data retrieval operation.

In step S120, it is detected whether the demand data conforms to the authorized material types supported by the vector database.

Here, the authorized material type includes any one of the following: text material type, audio material type, image material type, and video material type.

It should be noted that the material types supported by current vector databases are generally text only. For example, PostgreSQL can perform some simple vector calculations through the pgvector extension to implement a basic text similarity measure. Current vector databases are even less able to support cross-modal or cross-type material retrieval, for example a vector database that responds to retrieval requests for both input text data and input image data.

In step S131, if yes, vector similarity calculation is performed on each resource data vector in the vector database and the demand data, so as to determine a first target resource data vector from each resource data vector.

In step S133, if not, the demand data is input to the data conversion model, so as to generate cross-modal demand data corresponding to the authorized material type by the data conversion model, and vector similarity calculation is performed on each resource data vector in the vector database and the cross-modal demand data respectively, so as to determine a second target resource data vector from each resource data vector.

Here, the data conversion model employs a large language model (LLM); various types of large language models, such as ChatGPT and RoBERTa, may be employed without limitation. The best-known LLM architecture is the Transformer architecture, and a typical Transformer model processes input data through four cascaded key stages: word embedding, position encoding, self-attention mechanism, and feedforward neural network.

Specifically, when the authorized material type supported by the vector database is the image material type and the demand data to be retrieved is of the text type, a target image corresponding to the demand data is automatically generated by the large language model and used as the cross-modal demand data. Similarity calculation is then performed between the cross-modal demand data and each resource data vector in the vector database, and the target resource data vector with the maximum similarity is determined. For example, when the demand data is "sister", the large language model automatically generates a target image corresponding to "sister", and vector similarity calculation is performed between the "sister" image and each resource data vector in the vector database to find the target resource data vector with the maximum similarity. In this way, through the natural language processing and generation capability of the large language model, the vector database can support retrieval for input data of several different modalities, which realizes the cross-modal data retrieval function of the vector database.
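The flow of steps S110 to S133 can be illustrated with the following minimal Python sketch. It is not the patented implementation: `retrieve`, `toy_embed`, `toy_convert` and `RESOURCES` are hypothetical names introduced here, the data conversion model is replaced by a lookup table, and a plain inner product stands in for the similarity calculation.

```python
from typing import Any, Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner-product similarity, used here only as a stand-in measure."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(
    demand: Any,
    demand_type: str,
    supported_type: str,
    resource_vectors: List[Vector],
    embed: Callable[[Any], Vector],
    convert: Callable[[Any, str], Any],
) -> Tuple[int, float]:
    """Return (index, score) of the best-matching resource data vector."""
    if demand_type != supported_type:
        # Step S133: the demand data is not of the authorized material type,
        # so it is first converted into cross-modal demand data.
        demand = convert(demand, supported_type)
    # Steps S131 / S133: compare the query with every resource data vector.
    query = embed(demand)
    scores = [dot(query, v) for v in resource_vectors]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# Toy stand-ins (assumptions): an "image" is already a 2-d feature vector,
# and the conversion model is a fixed text-to-image lookup table.
def toy_convert(text: str, target_type: str) -> List[float]:
    return {"sister": [0.0, 1.0]}.get(text, [1.0, 0.0])


def toy_embed(image: List[float]) -> List[float]:
    return image


RESOURCES = [[1.0, 0.0], [0.0, 1.0]]
```

With these stand-ins, a text query is routed through the conversion step, while an image query is compared against the resource data vectors directly.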

In some examples of the embodiment of the invention, the data conversion model comprises a large language model unit and a material generation unit which are cascaded. The large language model unit is used for identifying the target intention corresponding to the demand data, and the material generation unit is used for generating the cross-modal demand data corresponding to the target intention. While the data conversion model is trained with a data sample set, the model parameters of the large language model unit are kept unchanged and only the weight parameters of the material generation unit are adjusted. In this way, the natural-language understanding capability of the large language model unit is retained and its language characteristics are not damaged by training, while the material generation performance of the material generation unit is optimized, so that the data conversion model can accurately generate, from the demand data, high-quality cross-modal demand data corresponding to the authorized material type.

In some examples of embodiments of the present invention, the method further includes training operations for the data conversion model, specifically including: model parameters of each connection layer in the large language model unit are locked, and then the large language model unit is trained by adopting a forward propagation algorithm based on a data sample set, wherein the forward propagation algorithm is as follows:

（1）

Wherein Y represents the model output, M represents the model parameters of the large language model unit, x represents the input sample features, β represents a hyperparameter, λ represents an intermediate parameter, and G and P both represent weight adjustment matrices.

The derivation of formula (1) above is as follows:

Assume that the model output is Y and the input is x; the forward propagation algorithm is

（2）

where …, …; … denotes the incremental weights obtained by fine-tuning the model. Using a large-model fine-tuning method (for example, the LoRA method), … in the forward propagation algorithm is replaced by the weight adjustment matrices G and P, where …, and the hyperparameter β is introduced, so that the propagation formula becomes

（3）

It should be noted that training the data conversion model is the process of solving for the parameters of the matrices G and P. An optimizer such as Adam or SGD is selected, and the values of r and the hyperparameter β are adjusted repeatedly while the model is trained, so as to obtain the adjusted weight matrices G and P. The result of G and P is then added to the original parameter matrix S to obtain the weights of the data conversion model, and the model is then used to generate the cross-modal demand data.
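The low-rank update described above can be sketched as follows. Because the formulas themselves are not reproduced in this text, the sketch assumes the conventional LoRA form Y = Mx + (β/r)·G·P·x, in which the frozen matrix M is left untouched and only G (d×r) and P (r×k) would be trained; the β/r scaling and the function names `lora_forward` and `merge_weights` are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vec = List[float]


def matvec(m: Matrix, x: Vec) -> Vec:
    """Multiply matrix m by column vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(M: Matrix, G: Matrix, P: Matrix, x: Vec, beta: float) -> Vec:
    """Assumed LoRA-style forward pass: Y = M x + (beta / r) * G (P x)."""
    r = len(P)                           # rank r = number of rows of P
    base = matvec(M, x)                  # frozen pre-trained path
    low_rank = matvec(G, matvec(P, x))   # trainable low-rank path
    scale = beta / r
    return [b + scale * l for b, l in zip(base, low_rank)]


def merge_weights(M: Matrix, G: Matrix, P: Matrix, beta: float) -> Matrix:
    """Fold the scaled product G P into the original matrix after training."""
    r = len(P)
    scale = beta / r
    rows, cols = len(M), len(M[0])
    return [
        [M[i][j] + scale * sum(G[i][t] * P[t][j] for t in range(r))
         for j in range(cols)]
        for i in range(rows)
    ]
```

Under this assumption, applying the merged matrix to x gives the same result as the two-path forward pass, which is why the adapted weights can be added to the original parameter matrix once training is finished.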

Fig. 2 shows an operation flowchart according to an example of step S131 in fig. 1.

In step S210, the demand data is subjected to a characterization process to obtain a demand feature vector corresponding to the demand data.

In step S220, a demand feature vector corresponding to the demand data is determined.

In step S230, for each resource data vector in the vector database, a matching similarity between the resource data vector and the demand feature vector is determined based on a multi-level similarity matching algorithm.

The multi-level similarity matching algorithm comprises a first-level matching algorithm and a second-level matching algorithm which are cascaded, wherein the first-level matching algorithm adopts a clustering algorithm, and the second-level matching algorithm adopts a cosine similarity calculation algorithm.

In step S240, a first target resource data vector is determined from the resource data vector corresponding to the largest one of the respective matching similarities.

According to the embodiment of the invention, the clustering algorithm is used as a screening step to quickly locate the resource data vectors that are most similar to the demand feature vector. For example, resource data vectors that do not exceed a preset similarity threshold are filtered out so that unnecessary similarity calculations are skipped and only the key similarity calculations are performed, which speeds up retrieval while saving system resources.

Fig. 3 shows an operation flowchart according to an example of step S230 in fig. 2.

As shown in fig. 3, in step S310, each resource data vector in the vector database is divided into a plurality of clusters based on a density clustering algorithm, and a center point vector corresponding to a cluster center point of each cluster is obtained.

It should be noted that density clustering is a density-based clustering method that groups density-connected data points into one class and treats isolated points as noise. Illustratively, according to a preset neighborhood size or density threshold, the resource data vectors in the vector database are pre-partitioned to determine M clusters and obtain the M cluster center points.
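As an illustration of step S310, the following sketch uses DBSCAN, a widely used density clustering algorithm; the text does not name a specific algorithm, so this choice is an assumption. The parameters `eps` (neighborhood size) and `min_pts` (density threshold) are hypothetical names, and taking the mean of the member vectors as the cluster center point is likewise an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]
NOISE = -1


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(points: List[Vector], eps: float, min_pts: int) -> List[int]:
    """Label every point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if euclidean(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1                     # point i is a core point
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:     # expand only through core points
                queue.extend(nbrs)
    return [lab if lab is not None else NOISE for lab in labels]


def cluster_centers(points: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Mean vector of each cluster; noise points are ignored."""
    groups: Dict[int, List[Vector]] = {}
    for p, lab in zip(points, labels):
        if lab != NOISE:
            groups.setdefault(lab, []).append(p)
    return {lab: [sum(c) / len(ps) for c in zip(*ps)]
            for lab, ps in groups.items()}
```

The neighbor search here is a brute-force scan, which is adequate for a small example; a production vector database would normally use an index for this step.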

In step S320, the Euclidean distance between the demand feature vector and the center point vector is calculated for each center point vector.

In step S330, the cluster with the smallest Euclidean distance is selected from the clusters as the target cluster.

In step S340, for each resource data vector in the target cluster, a cosine similarity between each resource data vector and the demand feature vector is calculated as a corresponding matching similarity.

Specifically, the similarity calculation proceeds by category, using the M cluster center points obtained through preprocessing and the feature distribution of the data set. The spatial distance between the input vector and each of the M cluster centers is calculated to determine which of the M categories is closest to the input vector; the cosine similarity between the demand data and each resource data vector in that category is then calculated, so that the target resource data vector most similar to the demand data can be found.
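Steps S320 to S340 can be sketched as follows, using the standard definitions of Euclidean distance and cosine similarity. The function name `two_level_search` and the layout of the `clusters` argument (cluster id mapped to a center point vector and a list of member vectors) are assumptions made for this illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Vector, b: Vector) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def two_level_search(
    query: Vector,
    clusters: Dict[int, Tuple[Vector, List[Vector]]],
) -> Tuple[int, Vector, float]:
    """Return (target cluster id, best resource vector, its cosine similarity)."""
    # First level (S320, S330): pick the cluster whose center is nearest.
    target = min(clusters, key=lambda cid: euclidean(query, clusters[cid][0]))
    # Second level (S340): cosine similarity inside the target cluster only.
    members = clusters[target][1]
    scored = [(cosine_similarity(query, v), v) for v in members]
    best_score, best_vec = max(scored, key=lambda item: item[0])
    return target, best_vec, best_score
```

Only the members of one cluster are scored in the second level, which is the source of the efficiency gain described above; the trade-off is that a close match lying in a neighboring cluster is not considered.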

Specifically, the Euclidean distance is calculated by:

（4）

Thus, the Euclidean distance θ_j is used to estimate the similarity between the input vector and the resource data vectors in the target cluster.

Further, the cosine similarity N(d₁, d₂) is calculated as follows:

（5）

wherein Z represents the number of channels of the demand data, d₁ and d₂ are the Z-dimensional coding features corresponding to the demand feature vector and the center point vector, respectively, and N(d₁, d₂) ∈ [0, 1].

In some embodiments, the method searches the vector database and selects the frame image with the highest cosine similarity to the demand data as the target resource data vector.

It should be noted that, in practicing the present invention, it was found that for image-type materials, such as the image material type or the video material type, the content that determines frame similarity is usually concentrated in the middle of the frame, and differences at the edges of the frame can be ignored.

In view of this, in some examples of embodiments of the invention, the following operations are performed before similarity calculation between the demand data and the resource data vectors in the vector database: extracting an effective information block from the frame image corresponding to the demand data, and dividing the effective information block into a plurality of pixel regions. Illustratively, pixels at the edge positions of the frame image are cropped to reduce device resource consumption.

Further, each pixel region in the frame image is feature-coded in the following manner:

（6）

wherein C represents the frame image scale parameter; H, K and E represent the query matrix, key matrix and value matrix corresponding to the frame image, respectively; … represents the encoded output features; L represents the number of channels of the frame image; and … represents the region feature vector of dimension C×C×L corresponding to the frame image.

Then, based on the multi-head attention mechanism and the self-attention mechanism in the Transformer model, the weight value of the region feature vector corresponding to each pixel region is updated, specifically:

（7）

（8）

wherein K(x, y) represents the weight of the pixel point (x, y) on the feature map, δ_x and δ_y represent the Gaussian kernel parameters in the x-axis and y-axis directions, respectively, and … and … represent the width and height of the pixel region at the center of the image, respectively.

Therefore, before feature similarity calculation is performed on frame images of a video or on input image materials, Gaussian kernel preprocessing is applied to the image materials so that more spatial attention is given to the feature map of the central region, which helps ensure the accuracy of the determined target resource data vector.
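The center-weighting idea can be illustrated with the sketch below. The exact weighting formula is not reproduced in this text, so the sketch assumes a standard separable Gaussian, K(x, y) = exp(−((x − c_x)²/(2δ_x²) + (y − c_y)²/(2δ_y²))), centered on the middle of the feature map; the function names and the choice of center are assumptions.

```python
import math
from typing import List


def gaussian_weights(width: int, height: int,
                     delta_x: float, delta_y: float) -> List[List[float]]:
    """Weight map K[y][x] that peaks at the center of a width x height map."""
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    return [
        [
            math.exp(-(((x - cx) ** 2) / (2 * delta_x ** 2)
                       + ((y - cy) ** 2) / (2 * delta_y ** 2)))
            for x in range(width)
        ]
        for y in range(height)
    ]


def apply_weights(feature_map: List[List[float]],
                  weights: List[List[float]]) -> List[List[float]]:
    """Element-wise product of a single-channel feature map and the weights."""
    return [
        [f * w for f, w in zip(f_row, w_row)]
        for f_row, w_row in zip(feature_map, weights)
    ]
```

Smaller values of `delta_x` and `delta_y` concentrate the weight more tightly on the central region, so edge features contribute less to the subsequent similarity calculation.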

The large language model driven vector database retrieval device provided by the invention is described below; the device described below and the large language model driven vector database retrieval method described above may be referred to in correspondence with each other.

Fig. 4 shows a block diagram of a large language model driven vector database retrieval apparatus according to an embodiment of the present invention.

As shown in fig. 4, the large language model driven vector database retrieval apparatus 400 includes an acquisition module 410, a detection module 420, and a vector matching module 430.

The acquisition module 410 is configured to acquire the demand data to be retrieved.

The detection module 420 is configured to detect whether the demand data conforms to an authorized material type supported by the vector database.

The vector matching module 430 is configured to: if the demand data conforms to an authorized material type, perform vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, input the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and perform vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.
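By way of non-limiting illustration, the branching performed by the detection module and the vector matching module may be sketched as follows. The names `Demand`, `retrieve`, `toy_convert` and `toy_embed` are hypothetical; in particular, `toy_convert` is only a lookup-table stand-in for the large language model based data conversion model, and cosine similarity is assumed as the similarity measure.

```python
import math
from typing import Callable, Dict, List, NamedTuple, Set


class Demand(NamedTuple):
    material_type: str  # e.g. "text", "image", "audio", "video"
    payload: object     # raw demand content


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(demand: Demand,
             supported_types: Set[str],
             resources: Dict[str, List[float]],
             embed: Callable[[Demand], List[float]],
             convert: Callable[[Demand], Demand]) -> str:
    """Return the name of the most similar resource data vector."""
    # Detection step: is the demand of an authorized material type?
    if demand.material_type not in supported_types:
        # Conversion step: generate cross-modal demand data of a supported type.
        demand = convert(demand)
    query = embed(demand)
    # Matching step: pick the resource with the highest similarity.
    return max(resources, key=lambda name: cosine(resources[name], query))


def toy_convert(demand: Demand) -> Demand:
    # Hypothetical stand-in for the data conversion model (text -> image-type demand).
    lookup = {"a cat": [0.9, 0.1], "a dog": [0.1, 0.9]}
    return Demand("image", lookup[demand.payload])


def toy_embed(demand: Demand) -> List[float]:
    # In this toy setup the payload of an image-type demand is already a vector.
    return list(demand.payload)


resources = {"cat.png": [1.0, 0.0], "dog.png": [0.0, 1.0]}
first = retrieve(Demand("image", [0.2, 0.8]), {"image"}, resources, toy_embed, toy_convert)
second = retrieve(Demand("text", "a cat"), {"image"}, resources, toy_embed, toy_convert)
```

Here `first` corresponds to the direct path (the demand already matches a supported type), and `second` corresponds to the cross-modal path (a text demand converted before matching against an image-only database).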

The data conversion model comprises a large language model unit and a material generation unit connected in cascade, wherein the large language model unit is used for identifying the target intention corresponding to the demand data, and the material generation unit is used for generating the cross-modal demand data corresponding to the target intention; the model parameters of the large language model unit remain unchanged while the data conversion model is trained with the data sample set.

Optionally, the vector matching module 430 is further configured to:

perform a training operation for the data conversion model, wherein the training operation includes: locking the model parameters of each connection layer in the large language model unit; and, based on the data sample set, training the large language model unit with a forward propagation algorithm, the forward propagation algorithm being as follows:

（1）
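By way of non-limiting illustration, the idea of keeping the parameters of the large language model unit locked while the cascaded data conversion model is trained may be sketched with a deliberately tiny example. Each unit is reduced to a single scalar weight, and the names `Unit` and `train` are hypothetical; the sketch does not reproduce formula (1) and is not asserted to be the training procedure of the invention.

```python
class Unit:
    """A toy one-parameter unit: output = weight * input."""

    def __init__(self, weight, trainable):
        self.weight = weight
        self.trainable = trainable  # False means the parameter is locked

    def forward(self, x):
        return self.weight * x


def train(llm_unit, gen_unit, samples, lr=0.02, epochs=200):
    """Gradient descent on squared error; locked units are never updated."""
    for _ in range(epochs):
        for x, target in samples:
            intent = llm_unit.forward(x)      # forward pass through the locked unit
            pred = gen_unit.forward(intent)   # forward pass through the trainable unit
            error = pred - target
            # Gradients of 0.5 * error^2 with respect to each unit's weight.
            grads = (
                (llm_unit, error * gen_unit.weight * x),
                (gen_unit, error * intent),
            )
            for unit, grad in grads:
                if unit.trainable:
                    unit.weight -= lr * grad


llm_unit = Unit(weight=2.0, trainable=False)  # stands in for the large language model unit
gen_unit = Unit(weight=0.0, trainable=True)   # stands in for the material generation unit
samples = [(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)]  # toy data: target = 6 * x
train(llm_unit, gen_unit, samples)
```

After training in this toy setting, the weight of `llm_unit` is unchanged, and only `gen_unit` has adapted so that the cascade fits the samples.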

Optionally, the vector matching module 430 is further configured to:

performing feature processing on the demand data to obtain a demand feature vector corresponding to the demand data;

determining the demand feature vector corresponding to the demand data;

determining, for each resource data vector in the vector database, the matching similarity between the resource data vector and the demand feature vector based on a multi-level similarity matching algorithm; the multi-level similarity matching algorithm comprises a first-level matching algorithm and a second-level matching algorithm in cascade, wherein the first-level matching algorithm adopts a clustering algorithm, and the second-level matching algorithm adopts a cosine similarity calculation algorithm;

and determining a first target resource data vector according to the resource data vector corresponding to the largest one of the matching similarities.

Optionally, the vector matching module 430 is further configured to:

dividing the resource data vectors in the vector database into a plurality of clusters based on a density clustering algorithm, and acquiring the center point vector corresponding to the cluster center point of each cluster;

and, for each resource data vector in the target cluster, calculating the cosine similarity between that resource data vector and the demand feature vector as the corresponding matching similarity.
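By way of non-limiting illustration, the two-level matching may be sketched as follows. The sketch assumes that the density clustering has already been carried out and that its result is supplied as a mapping from cluster identifier to member vectors; it further assumes that the target cluster is the one whose center point vector has the smallest Euclidean distance to the demand feature vector, and that the center point vector is the mean of the cluster members. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def center_point(vectors):
    """Component-wise mean of the vectors (assumed definition of the center point)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def two_level_match(clusters, query):
    """clusters: dict mapping cluster id -> list of (name, vector) pairs."""
    # First level: choose the cluster whose center point is nearest to the query.
    centers = {
        cid: center_point([vec for _, vec in members])
        for cid, members in clusters.items()
    }
    target = min(centers, key=lambda cid: euclidean(centers[cid], query))
    # Second level: cosine similarity against members of the target cluster only.
    best_name, best_vec = max(clusters[target], key=lambda m: cosine(m[1], query))
    return target, best_name, cosine(best_vec, query)


clusters = {
    0: [("a", [1.0, 0.1]), ("b", [0.9, 0.3])],
    1: [("c", [0.1, 1.0]), ("d", [0.2, 0.8])],
}
target, name, similarity = two_level_match(clusters, [0.8, 0.3])
```

The first level restricts the cosine similarity calculations to a single cluster, so that only a fraction of the resource data vectors needs to be compared with the demand feature vector.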

Optionally, the Euclidean distance is calculated as follows:

（4）

According to the large language model driven vector database retrieval method provided by the invention, the cosine similarity N(d₁, d₂) is calculated as follows:

（5）
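For reference, the textbook definitions of the two measures are set out below for two n-dimensional vectors d₁ and d₂. They are given as an assumption about the general form of the measures and are not asserted to be identical to formulas (4) and (5); the symbol D and the component indices are introduced here for illustration only.

```latex
\[
D(d_1, d_2) = \sqrt{\sum_{i=1}^{n} \left(d_{1,i} - d_{2,i}\right)^2},
\qquad
N(d_1, d_2) = \frac{d_1 \cdot d_2}{\lVert d_1 \rVert \, \lVert d_2 \rVert}
= \frac{\sum_{i=1}^{n} d_{1,i}\, d_{2,i}}
       {\sqrt{\sum_{i=1}^{n} d_{1,i}^{2}}\;\sqrt{\sum_{i=1}^{n} d_{2,i}^{2}}}
\]
```

A smaller D indicates that two vectors are closer, whereas a larger N (at most 1) indicates that two vectors point in more similar directions.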

Before vector similarity calculation is performed between the demand data and each resource data vector in the vector database, the method further comprises: extracting an effective information block from the frame image corresponding to the demand data, and dividing the effective information block into a plurality of pixel regions; each pixel region in the frame image is feature-coded in the following manner:

（6）

Based on the multi-head attention mechanism and the self-attention mechanism in the Transformer model, updating the weight value of the region feature vector corresponding to each pixel region specifically comprises:

（7）

（8）

Fig. 5 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communication interface (Communications Interface) 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a large language model driven vector database retrieval method comprising: acquiring demand data to be retrieved; detecting whether the demand data conforms to an authorized material type supported by the vector database; if yes, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.

Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part thereof contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

In another aspect, the present invention also provides a computer program product comprising a computer program that can be stored on a non-transitory computer readable storage medium; when executed by a processor, the computer program is capable of executing the large language model driven vector database retrieval method provided above, the method comprising: acquiring demand data to be retrieved; detecting whether the demand data conforms to an authorized material type supported by the vector database; if yes, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.

In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the large language model driven vector database retrieval method provided above, the method comprising: acquiring demand data to be retrieved; detecting whether the demand data conforms to an authorized material type supported by the vector database; if yes, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.

The apparatus embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for large language model driven vector database retrieval, the method comprising:

acquiring demand data to be retrieved;

detecting whether the demand data accords with the type of the authorized materials supported by the vector database;

if yes, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors;

if not, inputting the demand data into a data conversion model to generate cross-modal demand data corresponding to the authorized material type by the data conversion model, and respectively carrying out vector similarity calculation on each resource data vector in the vector database and the cross-modal demand data to determine a second target resource data vector from each resource data vector; the data conversion model adopts a large language model.

2. The large language model driven vector database retrieval method according to claim 1, wherein the data conversion model comprises a large language model unit and a material generation unit connected in cascade, the large language model unit is used for identifying the target intention corresponding to the demand data, and the material generation unit is used for generating the cross-modal demand data corresponding to the target intention; the model parameters of the large language model unit remain unchanged while the data conversion model is trained with a data sample set.

3. The large language model driven vector database retrieval method of claim 2, further comprising a training operation for the data conversion model, wherein the training operation comprises:

locking model parameters of each connecting layer in the large language model unit;

based on the data sample set, training the large language model unit by adopting a forward propagation algorithm, wherein the forward propagation algorithm is as follows:

（1）

4. The large language model driven vector database retrieval method according to claim 1, wherein said performing vector similarity calculation on each of the resource data vectors in the vector database and the demand data to determine a first target resource data vector from the each of the resource data vectors, respectively, comprises:

determining a demand feature vector corresponding to the demand data;

5. The large language model driven vector database retrieval method according to claim 4, wherein said determining matching similarity between said resource data vector and said demand feature vector for each resource data vector in a vector database based on a multi-level similarity matching algorithm comprises:

6. The large language model driven vector database retrieval method according to claim 5, wherein the Euclidean distance is calculated by:

（2）

7. The large language model driven vector database retrieval method according to claim 5, wherein the cosine similarity N(d₁, d₂) is calculated as follows:

（3）

8. The large language model driven vector database retrieval method according to any one of claims 1-7, wherein the authorized material types include any one of: text material type, audio material type, image material type, and video material type.

9. The large language model driven vector database retrieval method according to claim 8, wherein the authorized material type is a video material type or an image material type;

before the vector similarity calculation is performed between the demand data and each resource data vector in the vector database, the method further includes:

extracting an effective information block in a frame image corresponding to the demand data, and dividing the effective information block into a plurality of pixel areas;

the pixel areas in the frame image are respectively subjected to feature coding in the following way:

（4）

（5）

（6）

10. A large language model driven vector database retrieval apparatus, the apparatus comprising:

the acquisition module is used for acquiring the demand data to be retrieved;

the detection module is used for detecting whether the demand data accords with the type of the authorized materials supported by the vector database;

the vector matching module is used for: if yes, performing vector similarity calculation between the demand data and each resource data vector in the vector database, so as to determine a first target resource data vector from the resource data vectors; if not, inputting the demand data into a data conversion model so that the data conversion model generates cross-modal demand data corresponding to the authorized material type, and performing vector similarity calculation between the cross-modal demand data and each resource data vector in the vector database, so as to determine a second target resource data vector from the resource data vectors; the data conversion model adopts a large language model.