Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a collusion-resistant multi-file privacy information retrieval method and a collusion-resistant multi-file privacy information retrieval system, which are based on marginal information inquiry and demand information inquiry and are efficiently realized on the basis of any oneResistance to arbitrary in a distributed storage system for codesAnd searching multi-file privacy information colluded by the servers.
To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
The first aspect of the invention provides a collusion-resistant multi-file privacy information retrieval method.
A collusion-resistant multi-file privacy information retrieval method comprises the following steps:
A plurality of files are stored on a plurality of servers in a distributed mode through file sub-packaging and coding storage modes, so that a distributed storage system is formed;
acquiring search information of a demand file, carrying out query round selection together with storage parameters, and constructing a plurality of alternative search schemes;
selecting an optimal retrieval scheme from a plurality of alternative retrieval schemes according to the PIR code rate of the retrieval scheme;
obtaining the content of the demand file by executing an optimal searching scheme;
the searching scheme comprises a plurality of rounds including the requirement information inquiry and the marginal information inquiry and a round including only the requirement information inquiry, and non-requirement files in the follow-up round requirement information inquiry results are eliminated by utilizing the marginal information inquiry results, so that the contents of the requirement files are obtained.
Further, the method for packetizing and encoding the files is to cut the files into piecesOriginal data packets are divided, and each data packet passes throughCode generationThe data packets after the coding are stored in the following parts respectivelyIn the server.
Further, the storage parameters include the number of divisionsNumber of serversTotal number of files;
The search information of the requirement files comprises the number of the requirement filesNumbering of each demand file and the number of collusion servers。
Further, the search scheme comprises a plurality of rounds, each round comprises a plurality of stages, in each stage, a user is required to send a query request to each server, and the server returns a query response;
Wherein the query request is as follows In the form of the sum of the search symbols of the individual documents, this is requested in batchesFiles are stored in file data packages on the server,Is the round sequence number;
The query response is a file data packet pointed by the server return retrieval symbol.
Further, the query round is selected fromIn the individual rounds, selectNumber of rounds, wherein, the frontThe round contains the demand information query and the marginal information query, the firstThe individual rounds only contain the demand information query;
the marginal information inquiry is only inquiry of the sum of retrieval symbols of the non-required files;
The demand information query is a query containing the sum of search symbols of the demand file.
Further, the construction of the alternative retrieval scheme further comprises calculating scheme parameters based on the storage parameters and the retrieval parametersAnd;
Utilization scheme parametersAndThe following calculations were performed:
(1) Calculate the selected The number of stages of each of the rounds;
(2) Utilization scheme parameters AndSetting search symbols of all files;
(3) Utilization scheme parameters AndThe PIR code rate for each scheme is calculated.
Further, the search symbol is set by a pre-selected methodAnd the coding matrix codes the data packet of each file to obtain the retrieval symbol of the file.
The second aspect of the invention provides a collusion-resistant multi-file privacy information retrieval system.
The collusion-resistant multi-file privacy information retrieval system comprises a file storage module, a scheme construction module, a scheme selection module and a retrieval execution module:
the file storage module is configured to store a plurality of files on a plurality of servers in a distributed manner in a file sub-packaging and coding storage mode to form a distributed storage system;
the scheme construction module is configured to acquire the retrieval information of the demand file, select query rounds together with the storage parameters and construct a plurality of alternative retrieval schemes;
The scheme optimizing module is configured to select an optimal searching scheme from a plurality of alternative searching schemes according to the PIR code rate of the searching scheme;
the searching and executing module is configured to obtain the content of the demand file by executing the optimal searching scheme;
the searching scheme comprises a plurality of rounds including the requirement information inquiry and the marginal information inquiry and a round including only the requirement information inquiry, and non-requirement files in the follow-up round requirement information inquiry results are eliminated by utilizing the marginal information inquiry results, so that the contents of the requirement files are obtained.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps in a collusion resistant multi-file privacy information retrieval method according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in a collusion resistant multi-file privacy information retrieval method according to the first aspect of the present invention when the program is executed.
The one or more of the above technical solutions have the following beneficial effects:
1) The invention constructs a search scheme consisting of a plurality of rounds including the demand information query and the marginal information query and a round including only the demand information query, eliminates non-demand files in the follow-up round demand information query results by utilizing the marginal information query results to obtain the content of the demand files, and realizes the search scheme based on any one round of demand information query results One of the distributed storage systems of codes is resistant to arbitraryMultiple file privacy information retrieval scheme for collusion of individual servers, may be directed to any suitable parametersAndI.e. allowing the system to be slavedThe copy storage is generalized to be arbitraryAndAnd allows collusion of servers.
2) Banawan and Ulukus, in the previous work, atTake turns toDefault selected in wheelThe wheel, not being aware that the selection here will have an impact on the retrieval efficiency. The invention considers the difference of the selection of each round, and can efficiently calculate and select the PIR code rate of the scheme obtained by each round, thereby aiming at given parameters、、、、The optimal round Q can be selected. For example, in,,,When Banawan and Ulukus default selections of the 5 th round, the PIR code rate of the scheme isThe invention selects the 4 th round, the PIR code rate of the scheme isTherefore, the search efficiency of the scheme is greatly improved.
3) The searching efficiency of the invention is far better than the efficiency of repeated single file searching scheme, thus the searching efficiency can be practically improved in the actual multi-file searching process.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one embodiment of the present disclosure, a collusion-resistant multi-file privacy information retrieval method is provided, as shown in fig. 1, including the following steps:
step S1, a plurality of files are stored on a plurality of servers in a distributed mode through file sub-packaging and coding storage to form a distributed storage system;
further, the method for packetizing and encoding the files is to cut the files into pieces Original data packets are divided, and each data packet passes throughCode generationThe data packets after the coding are stored in the following parts respectivelyIn the server.
Step S2, acquiring search information of a demand file, carrying out query round selection together with storage parameters, and constructing a plurality of alternative search schemes;
further, the storage parameters include the number of divisions Number of serversTotal number of files;
The search information of the requirement files comprises the number of the requirement filesNumbering of each demand file and the number of collusion servers。
Further, the search scheme comprises a plurality of rounds, each round comprises a plurality of stages, in each stage, a user is required to send a query request to each server, and the server returns a query response;
Wherein the query request is as follows In the form of the sum of the search symbols of the individual documents, this is requested in batchesFiles are stored in file data packages on the server,Is the round sequence number;
The query response is a file data packet pointed by the server return retrieval symbol.
Further, the query round is selected fromIn the individual rounds, selectNumber of rounds, wherein, the frontThe round contains the demand information query and the marginal information query, the firstThe individual rounds only contain the demand information query;
the marginal information inquiry is only inquiry of the sum of retrieval symbols of the non-required files;
The demand information query is a query containing the sum of search symbols of the demand file.
Further, the construction of the alternative retrieval scheme further comprises calculating scheme parameters based on the storage parameters and the retrieval parametersAnd;
Utilization scheme parametersAndThe following calculations were performed:
(1) Calculate the selected The number of stages of each of the rounds;
(2) Utilization scheme parameters AndSetting search symbols of all files;
(3) Utilization scheme parameters AndThe PIR code rate for each scheme is calculated.
Further, the search symbol is set by a pre-selected methodAnd the coding matrix codes the data packet of each file to obtain the retrieval symbol of the file.
Step S3, selecting an optimal retrieval scheme from a plurality of alternative retrieval schemes according to the PIR code rate of the retrieval scheme;
Step S4, obtaining the content of the demand file by executing an optimal searching scheme;
the searching scheme comprises a plurality of rounds including the requirement information inquiry and the marginal information inquiry and a round including only the requirement information inquiry, and non-requirement files in the follow-up round requirement information inquiry results are eliminated by utilizing the marginal information inquiry results, so that the contents of the requirement files are obtained.
In the above method, the sum isThe files are stored in the system via encoding, each file being cut intoBy the same packetCode generationThe coded data packets are stored in a distributed storage system respectivelyOn the server, there is no hybrid code between the files.
The number of files that the user needs to retrieve (i.e. the demand files) isAnd has As shown in fig. 2, the user generates a plurality of inquiry requests and sends the inquiry requests to each server, the server carries out feedback response on the inquiry of the user, and the user synthesizes all feedback response results and decodes the result to obtain the file content to be searched.
In this process, anyThe individual servers may collude, i.e., analyze the numbers of files retrieved by the user using the query responses they receive. PIR schemes require that the user not go to this if he can retrieve it correctlyThe collusion server leaks any information about the required file number, according to specific parametersAndThe user selects different inquiry request construction strategies to construct different retrieval schemes, and the method specifically comprises the following steps:
(1) At the position of When the user's query request construction strategy is as follows:
1) Setting the retrieval symbol of the file by Random permutations of eachThe coordinates of the individual files are scrambled, and the replaced coordinates are used as search symbols of the files.
2) The searching scheme is divided into a plurality of rounds, wherein each round comprises a plurality of stages, and in the first roundThe number of rounds is) In the form of a user search to each serverAny of the individual filesThe sum of certain coordinates of the files is totally thatAnd (5) searching.
3) The number of stages per round is set to follow specific rules-in the firstTake turns toOf the wheels, the user selects only one of the wheels (denoted as the first wheelWheel), will be the firstTake turns toThe number of stages of the remaining turns in the turn is set to 0, the turn selectedAnd front(s)The rounds together form a complete query scheme.
The number of the stages of the rounds is non-zero, the specific design follows a certain proportion, the essential purpose of the method is to fully utilize marginal information brought by non-requirement files to improve the PIR code rate, and the calculation details are shown in the definition and setting part of the rounds and the stages.
In general terms, the process is carried out,Has a certain influence on PIR code rate, and is given in parameters、、Then, for eachCalculating code rate one by one, selecting optimumDesign plan, especially inInteger divisionIn the time-course of which the first and second contact surfaces,The choice of (c) may be arbitrary.
(2) For general parameters,The user's query request construction strategy is as follows:
1) Selecting a particular one And the coding matrix carries out coding pretreatment on each file to generate a search symbol of the file.
2) In the first placeThe number of rounds is) In one stage of (a), the user retrieves from each server in the form ofAny of the individual filesSome of the files are summed together by the sign after the first encoding stepAnd (5) searching.
3) The specific design of the ratio of the number of stages of each round is subject to、、、、Is a function of (a) and (b).
In the concrete search, the coded symbols of the related files are required to meet the following settings that the user needs to search the linear independence among the symbols of the files, and the combined part of the non-searched files in the addition of each search is required to be calculated by the searched data on other servers, and meanwhile, in order to ensure the privacy of the scheme, the method is arbitrarily used for searching the filesOn the collusion server, each retrieval symbol of any file needs to be linearly irrelevant. For specific design details, see the "settings for each document retrieval symbol" section below.
The implementation process of the collusion-resistant multi-file privacy information retrieval method of the embodiment is described in detail below from the aspects of construction of a distributed storage system, definition and setting of 'rounds and phases', setting of retrieval symbols of each file, an overall scheme example and a PIR code rate calculation formula respectively.
1. Construction of distributed storage systems
Consider the followingDistributed storage system formed by individual servers, in which system storage is performedFiles, each via the sameThe code is stored in the specific form:
The following files: Wherein, the method comprises the steps of, wherein, I.e. each file is of lengthExpressed as a sizeIs a matrix of (a) in the matrix.
Code generator matrix:。
Document is put into effectIs the matrix of (a) represents the (b)The row, which is a cut data packet, is denoted as a row vectorBy usingCode generator matrix with length ofIs a row vector of (2)Coded as lengthVector of (3)First, theThe individual servers store the encoded vectorsAfter traversing all rows of the matrix representation of all files, as shown in FIG. 2, the firstContent stored by individual serversIs that all row vectors are inProjection onto, i.e.
The user needs to search in the systemPersonal files (i.e. demand files), wherein And ensure arbitraryThe collusion server cannot obtain any information about the index file number.
2. Definition and setting of "turns and phases
The structure of a search scheme comprises a plurality of rounds, each round comprises a plurality of stages, and in each stage, a user sends a query request to each server, wherein the specific form of the query request is the sum of search symbols of a plurality of files.
First, theThe number of rounds is) Representing the searchIn individual filesThe sum of the search symbols of different documentsA phase of a round is defined as a phase comprisingAny of the individual filesModule for adding search symbols of individual filesAnd (5) searching.
Wherein, in the frontIn the wheel, the firstThe number of rounds is) One stage of (2) isQueries consisting of the sum of search symbols of non-required documents, called marginal information queries, andA query containing the sum of the symbols of the required documents is called a requirement information query. First, theTo the firstIn the wheel, the firstThe number of rounds is) Is only one stage ofAnd (5) inquiring the demand information.
The round selection setting of the alternative retrieval scheme follows the following rules:
The user is at the first To the firstOne of the wheels is selected, the firstTo the firstSetting the rest turns in the turns to 0, i.e. not carrying out actual search or query, and comparing the selected turns with the previous turnsThe rounds together form a complete retrieval scheme, and the complete retrieval scheme is summed upAlternative search schemes.
The retrieval scheme eliminates the related interference in the subsequent round of demand information inquiry by adjusting the stage number of each round to fully utilize the marginal information inquiry of the non-demand file. First, theTo the point ofThe round does not generate marginal information inquiry, and optionally one round can fully utilize all marginal information inquiry by adjusting the stage number proportion, so that the calculation is carried out one by one on the premise of giving the retrieval information and the storage parametersAfter PIR code rate of each alternative scheme, only the first is selectedTo the point ofOne of the rounds for maximizing the PIR code rate of the scheme is associated with the frontThe rounds form the optimal retrieval scheme.
The number of stages of the search scheme is set to follow the following rules:
Order the AndIn order to satisfy the minimum positive integer of the following formula, in this way, the scheme parameters are setAndIs a value of (2);
in an alternative search scheme, assume that the user has selected the first Wheel) Participate in composing alternative search scheme, number of stages of each turn of schemeThe calculation is as follows:
When (when) When, i.e. scheme No.The number of the wheel stages is. When (when) When, i.e. scheme NoTake turns toIn wheel divide byThe number of stages outside the wheel is set to 0. When (when)When, i.e. scheme NoTake turns toThe number of stages of the wheel is set to。
First, theThe number of stages of the wheel is set toTo ensure that the number of stages of each round of the scheme is an integerThe number of stages of the wheel is defined byThe number of stages of the wheel is recursively obtained, the thOne stage of the wheel requires the firstWheel provisionMarginal information inquiry of each stage to eliminate interference in demand information inquiry, thusThe number of stages of the wheel is:
and so on, for the first Wheel) The number of stages is:
3. Arrangement of search symbols for each document
By pre-selection ofThe coding matrix codes the data packet of each file to obtain the retrieval symbol of the file, which is specifically:
provided that in a sufficiently large domain On is provided withCode whose transpose of the generator matrix is noted asOrder-making machineRepresenting a user slaveAll of the aboveSelected uniformly and independently in a full order matrixA random matrix.
In general parametersAndLower, the firstOne of the phases of the rounds relates toThe search symbol added query specific form of each file is set to be isomorphic with an auxiliary matrix, and the auxiliary matrix is constructed byThe size of each symbol is Each symbol appearing in the secondary matrixSecondary and arbitraryColumns share one and the same symbol, i.e. arbitraryThe servers have one and the same query.
The search symbol of the non-demand file is set as follows:
By a specific MDS coding matrix Will be Personal non-required filesSearch symbol precoding as WhereinFor a slave matrixSelected from (a) and (b) Line, ensure that the whole scheme is followedThe selected row is not repeated.
For the required fileIts search symbol is set as。
In particular forThe search symbol of the file is set as follows:
the user uses M random permutations to disturb the coordinates of each file, and the coordinates are directly used as search symbols.
4. Overall solution example
,,,When the 5 th round is selected asWheel, and frontThe search scheme composed of individual rounds is shown in FIG. 3, for the first search schemeThe user searches the total of the servers at each stageArbitrary seedQuery of the sum of the search symbols of different documents, shaped asRepresentation ofThe query form of each file is isomorphic with the auxiliary matrix, and represents one query request.
ConsiderA demand information query including a required file and a non-required fileA marginal information query including only corresponding non-required files, each non-required file The search symbols are obtained by the precoding mode in 3), and the search information inquiry only comprises a required fileSearch symbol fromThe required information inquiry comprising two or more required files, only taking the symbol of one required file as the symbol required to be searched for inquiry, setting the other required files as interference in inquiry by using the same precoding mode of non-required files, setting the required information inquiry as the search symbol which can be determined from the search of other servers, and simultaneously, at any timeThe collusion server appears to be linearly independent of the retrieval sign of any file.
5. PIR code rate calculation formula
The PIR code rate of the search scheme refers to the ratio between the size of the demand file and the total download amount, and is calculated by the following iterative method:
Record the first The total load of one stage of the wheel is,Is the firstThe sum of the download amount of the wheel itself and the download amount of the required front-end wheel, the user search file size is,Is the firstThe sum of the retrieval file size of the round itself and the retrieval file size downloaded from other servers for the required preamble round, the firstThe wheel code rate is expressed as In the first placeCode rate of wheelFor the front to be the initial valueThe round may write an iterative calculation formula:
The code rate of the seed search alternative scheme is To the point ofThe code rate of the stage of the wheel is calculated as follows:
Selection of To the point ofHighest stage code rate of wheel asAn optimal search scheme among the alternative search schemes.
Example two
An embodiment of the present disclosure provides a collusion-resistant multi-file privacy information retrieval system, which includes a file storage module, a scheme construction module, a scheme optimization module, and a retrieval execution module:
the file storage module is configured to store a plurality of files on a plurality of servers in a distributed manner in a file sub-packaging and coding storage mode to form a distributed storage system;
the scheme construction module is configured to acquire the retrieval information of the demand file, select query rounds together with the storage parameters and construct a plurality of alternative retrieval schemes;
The scheme optimizing module is configured to select an optimal searching scheme from a plurality of alternative searching schemes according to the PIR code rate of the searching scheme;
the searching and executing module is configured to obtain the content of the demand file by executing the optimal searching scheme;
the searching scheme comprises a plurality of rounds including the requirement information inquiry and the marginal information inquiry and a round including only the requirement information inquiry, and non-requirement files in the follow-up round requirement information inquiry results are eliminated by utilizing the marginal information inquiry results, so that the contents of the requirement files are obtained.
Example III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs steps in a collusion resistant multi-file privacy information retrieval method according to a first embodiment of the present disclosure.
Example IV
An object of the present embodiment is to provide an electronic apparatus.
An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps in a collusion-resistant multi-file privacy information retrieval method according to the first embodiment of the present disclosure when executing the program.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.