Track differential privacy protection method and system for resisting predictive attack
Technical Field
The invention relates to the technical field of privacy protection data processing, in particular to a track differential privacy protection method and system for resisting predictive attack.
Background
In recent years, with the wide application of the internet of things and the popularization of mobile terminals with positioning functions, various Location-based services (LBS) have been rapidly developed, and are an indispensable part of people's life. A location-based service is a service that surrounds a geographical location, obtains the current location of a device using various location technologies, and sends the current location to a server, which retrieves resources and information related to the location in a spatial database and feeds the information back to the device, thereby providing the device with information retrieval or other basic services related to its location, such as searching for nearby restaurants, querying routes, times, etc., to destinations, which greatly facilitates people's lives.
However, at the same time, the LBS server collects a large amount of track information, which may cause serious track privacy disclosure problems. Once the user's location privacy is exposed, an attacker can illegally obtain sensitive data of the user (such as occupation, health condition, personal relationship, etc. of the user) by analyzing the user's location information. Even the future track of the user can be predicted and tracked by utilizing technologies such as data mining and the like, and the privacy security of the user is greatly threatened. And as people's awareness of privacy protection increases, users begin to tend not to expose themselves to accurate location information, but to provide only ambiguous location information, which greatly limits the development of location-related applications. Thus, a location privacy protection scheme is needed, both from a user perspective and a facilitator perspective. How to protect the location privacy of users on the basis of ensuring that location services are available has become an increasingly popular topic.
The track privacy protection scheme in the current LBS roughly includes the following four types: generalization, mixing region, suppression, and perturbation. The traditional privacy protection scheme is seriously dependent on background knowledge owned by an attacker, when a new attack (such as anonymization attack and composition attack) appears, the model can not provide good protection effect, and the problems are effectively overcome by the appearance of a differential privacy technology. Differential privacy technology has a strict mathematical theory basis and a controllable privacy protection level, and becomes a research hotspot for privacy protection in recent years. Andres, and the like apply the idea of differential privacy to track data, a geographical indistinguishability position privacy protection model is provided, and the position privacy protection is realized by adding Laplace noise to generate a disturbance position to replace a real position acquisition service. This model has become the most common method in LBS location privacy protection today. However, the existing track differential privacy protection research has the following two problems:
1. Privacy budget consumption problem for tracks in continuous location queries. Most of the prior art focuses on privacy protection of single location points, so that the single location points can better meet epsilon-differential privacy. However, differential privacy has sequence combinability, and in a continuous position inquiry scene, a large number of position points form track information, if epsilon privacy budget is consumed by each position point, the privacy budget consumed by the finally formed track is extremely large. Therefore, the privacy of the track in the continuous position inquiry scene is difficult to ensure.
2. The trajectories have temporal-spatial correlation, and an attacker can often infer future position information from historical trajectory information of moving objects. Along with the development of artificial intelligence, the prediction technology is more advanced, the prediction accuracy is higher, and if the prediction accuracy is utilized by malicious attackers, related data are collected to predict future track information of the user to attack, so that the risk of privacy disclosure of the user is increased.
On the basis of guaranteeing the availability and quality of service of data, aiming at malicious prediction attack, the privacy information of a user is guaranteed, and the method and the device become a technical problem to be solved in the field.
Disclosure of Invention
The invention aims to provide a track differential privacy protection method and a track differential privacy protection system for resisting a predictive attack, which can effectively protect user privacy, ensure data availability and effectively resist track predictive attack.
In order to achieve the above object, the present invention provides the following solutions:
a track differential privacy protection method for resisting predictive attack comprises the following steps:
The track sequence is used as the input of a trained hidden Markov model to obtain a hidden state sequence;
obtaining current position point information according to the hidden state sequence;
Determining predictability and importance of current location point information;
Allocating a privacy budget based on said predictability and said importance using a w sliding window mechanism;
Adding Laplacian noise to the real location point information according to the privacy budget by using a planar Laplacian mechanism to generate a disturbance location set;
determining disturbance positions which meet preset conditions in the disturbance position set;
and replacing the actual position in the track with the disturbance position meeting the preset condition to form a disturbance track.
Preferably, the hidden state sequence is obtained by using the track position sequence as the input of the trained hidden markov model, and the method further comprises the following steps:
acquiring parameters of a historical track dataset estimation model, and constructing an initial hidden Markov model based on the parameters;
initializing and assigning the initial hidden Markov model so that the initial hidden Markov model after assignment meets a preset constraint condition;
calculating the forward probability of the observation sequence at the time t and the backward probability of the observation sequence at the time t+1 based on the initial hidden Markov model meeting the preset constraint condition;
determining the probability of being in a preset state at the time t according to the forward probability and the backward probability, and marking the probability as a first probability;
determining the probability that the moment t and the moment t+1 are in a preset state according to the forward probability and the backward probability, and marking the probability as a second probability;
Updating the initial hidden Markov model meeting preset constraint conditions according to the first probability and the second probability to obtain an updated hidden Markov model;
And returning to the step of calculating the forward probability of the observation sequence at the time t and the backward probability of the observation sequence at the time t+1 based on the initial hidden Markov model meeting the preset constraint condition until the updated hidden Markov model reaches convergence, so as to obtain the hidden Markov model.
Preferably, the obtaining the current location point information according to the hidden state sequence specifically includes:
Initializing a probability value of a start time state, wherein the initialized probability value delta 1 (i) is:
δ1(i)=πibi(o1),1≤i≤N
Determining that the probability maximum in all hidden state sequences < s 1,s2,...,st > with hidden states s t at time t is delta t (i):
determining that the hidden state of the (t-1) th node in the hidden state sequence with the maximum probability at the moment t is phi t (i):
According to the maximum probability and the hidden state of the T-1 node, recursing from the initial moment to the T moment, and then backtracking by using the previous state node recorded by the hidden state of the T-1 node until the optimal hidden state sequence is found; the optimal hidden state sequence is S *:
Predicting the hidden state of the current position based on the optimal hidden state sequence and combining to generate a probability matrix to obtain observation position point information generated by the hidden state of the current position; the observed position point information is the current position point information;
Where pi i is the probability of being in state s i at time t=1, b i is the probability of generating a corresponding observed output value for each state, a ji is the probability of entering state s j from state s i, i is the current position, and j is the next position.
Preferably, the determining predictability and importance of the current location point information specifically includes:
Determining the predictability of current location point information by manhattan distance between a real location and the current location point information; the predictability PP is:
Wherein p i is real position point information, o i is current position point information, and d (p i,oi) is Manhattan distance between the real position point information and the current position point information;
Judging whether the real position point information is a track characteristic point or not;
If the real position point information is a track characteristic point, the importance I= |cos (theta) | of the current position point information; if the real position point information is not the track characteristic point, the importance I=0 of the current position point information; wherein,
Preferably, the allocating a privacy budget according to the predictability and the importance by using a w sliding window mechanism specifically includes:
The w sliding window limits the maximum privacy budget, and calculates the sum of the privacy budget consumption of the w-1 positions before calculation;
Calculating the residual privacy budget of the current window [ i-w+1, i ] according to the privacy budget consumption of the previous w-1 positions; the residual privacy budget is the maximum privacy budget which can be allocated to the current position point information; the maximum privacy budget is ε max:
where ε is the privacy total budget,/> Epsilon k is the privacy budget for the kth location, w is the window size, and i is the current location, for the privacy budget consumption sum for the first w-1 locations.
Assigning a privacy budget to the current location point information based on the predictability and the importance; the privacy budget is ε i:
Where β 1 is the predictability weight, β 2 is the importance weight, and Δε is the privacy budget increment.
Preferably, the adding laplace noise to the real location point information according to the privacy budget by using a planar laplace mechanism to generate a disturbance location set specifically includes:
determining a noise radius from the privacy budget; the noise radius is r:
Wherein W -1 (·) is the interval (- ≡1) branch of the Lembert W function, ρ is a random number subject to [0,1] uniform distribution;
Randomly generating random numbers obeying the uniform distribution of [0,2 pi ];
Calculating a disturbance position through the generated random numbers subjected to [0,2 pi ] uniform distribution and the noise radius; the disturbance position is z:
z=pi+(r·cos(θ),r·sin(θ));
wherein θ is a generated random number subject to [0,2 pi ] uniform distribution;
Returning to the step of determining the noise radius according to the privacy budget until the number of generated interference positions meets a set threshold value, and generating the disturbance position set.
Preferably, the determining the disturbance location set meets the preset condition specifically includes:
judging whether the disturbance positions in the disturbance position set and the real positions are in the same cell or not;
if the disturbance position and the real position are in the same cell, determining the disturbance position with the maximum importance as the disturbance position meeting the preset condition;
and if the disturbance position and the real position are not in the same cell, selecting the disturbance position in the cell with the minimum transition probability of the cell to which the real position belongs as the disturbance position meeting the preset condition.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
According to the track differential privacy protection method for resisting the prediction attack, provided by the invention, the current position of the mobile object is predicted based on the hidden Markov model, and the predictability of the position is calculated to adjust the privacy parameters. And secondly, allocating corresponding privacy budget to the position points by utilizing a w sliding window mechanism, and ensuring that the track segments with the length of w meet epsilon-differential privacy. And finally, adding Laplacian noise to the original track data according to the set privacy calculation by combining a geographic indistinguishable mechanism to generate a disturbance position set, and releasing an optimal disturbance position point to improve the usability of the data. Therefore, the track data can be subjected to privacy protection and the track prediction attack can be effectively resisted.
The invention also provides a track differential privacy protection system for resisting the prediction attack, which comprises:
The hidden state sequence determining module is used for obtaining a hidden state sequence by taking the track sequence as the input of the trained hidden Markov model;
the current position point information determining module is used for obtaining current position point information according to the hidden state sequence;
a prediction importance determining module, configured to determine predictability and importance of current location point information;
a privacy budget allocation module for allocating a privacy budget according to the predictability and the importance using a w sliding window mechanism;
The disturbance position set generation module is used for adding Laplacian noise to the real position point information according to the privacy budget by using a planar Laplacian mechanism so as to generate a disturbance position set;
The disturbance position determining module is used for determining disturbance positions which meet preset conditions in the disturbance position set;
And the disturbance track forming module is used for replacing the real position in the track with the disturbance position meeting the preset condition to form a disturbance track.
The technical effects achieved by the track differential privacy protection system for resisting the predictive attack are the same as those achieved by the track differential privacy protection method for resisting the predictive attack, so that the technical effects are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a track differential privacy protection method for resisting predictive attack provided by the invention;
FIG. 2 is a general flow chart of a method for implementing trace differential privacy protection against predictive attack according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of meshing according to an embodiment of the present invention;
FIG. 4 is a diagram of an example of a trace feature point provided in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an interference location set according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a track differential privacy protection system for resisting a prediction attack according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a track differential privacy protection method and a track differential privacy protection system for resisting prediction attack, which fully utilize track data of a mobile object, solve the track prediction attack problem which is not considered by the traditional method, reasonably adjust the size of privacy budget through a hidden Markov model and a sliding window mechanism, effectively reduce the risk of track privacy disclosure of a user, ensure the usability of data, effectively resist track prediction attack and better promote the development of a position-based service industry.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in fig. 1, the track differential privacy protection method for resisting prediction attack provided by the invention comprises the following steps:
Step 100: and obtaining a hidden state sequence by taking the track sequence as the input of a trained hidden Markov model.
Step 101: and obtaining the current position point information according to the hidden state sequence.
Step 102: predictability and importance of current location point information is determined.
Step 103: privacy budgets are allocated according to predictability and importance using a w sliding window mechanism.
Step 104: the planar Laplace mechanism is utilized to add Laplace noise to the real location point information in accordance with the privacy budget to generate a disturbance location set.
Step 105: and determining disturbance positions which meet preset conditions in the disturbance position set.
Step 106: and replacing the actual position in the track with the disturbance position meeting the preset condition to form a disturbance track.
The following describes a specific implementation procedure of the track differential privacy protection method for resisting a prediction attack according to the present invention based on the implementation architecture shown in fig. 2, which is not limited to this in the practical application process.
Step one, building a hidden Markov prediction model: the track position sequence is used as an observable sequence of a hidden Markov model, the cell sequence is used as a hidden state sequence, the position points of the moving object are regarded as observable position points which are only relevant to the cells, and model training is carried out at an LBS server side.
The hidden Markov model is established by the following steps:
The hidden Markov model is migrated to the problem of predicting the track data, the track sequence is taken as an observable sequence of the hidden Markov model, the hidden state is each cell in the geographic area, and the position point of the moving object can be regarded as an observed value generated by a certain cell. Estimating parameters (a, B, pi) of the model from the historical trajectory dataset to obtain a hidden markov model μ= (a, B, pi), a being a state transition probability matrix, a= { a ij }, wherein a ij represents a probability of entering a current state s j from a previous state s i, B being a generation probability matrix, b= { B i (k) }, wherein B i (k) represents a probability of each state s i producing a corresponding observable output value o k, pi being an initial state probability vector, pi= { pi i }, wherein pi i represents a probability of being in state s i at time t=1. The specific processing steps are as follows:
(1-1) initializing and assigning mu (A, B, pi) to enable the mu (A, B, pi) to meet the following constraint (namely preset constraint conditions):
wherein N, M represents the number of hidden states and the number of observable values each hidden state may produce, respectively.
(1-2) Calculating a forward probability α t (i) of an observation sequence of (o 1,o2,...,ot) at a time T of s i and a backward probability β t (i) of an observation sequence of (o t+1,ot+2,...,oT) from a time t+1 to a final time T of s i:
The probability ζ t (i, j) that the user is in state s i at time t, state s j at time t+1, and the probability γ t (i) that the user is in state s i at time t are calculated from α t(i)、βt (i):
Where i represents the current position, j represents the next position, s i represents the state corresponding to the current position, and s j represents the state corresponding to the next possible position.
(1-3) Re-estimating parameters pi i、aij、bi (k) of the hidden Markov model according to the xi and gamma results obtained in the step (1-2), and obtaining an updated model hidden Markov model as follows:
πi=P(S1=si|O,μ)=γ1(i)
(1-4) cyclically performing the operations of steps (1-2) and (1-3) using the updated μ (a, B, pi) values until μ converges (the values of the parameters a, B, pi no longer change), resulting in a hidden markov model μ= (a, B, pi).
Step two, predicting the current position point: and finding out the transition probability among the calculated unit cells and the probability of each specific position point corresponding to the unit cells through a hidden Markov model, and solving a hidden state sequence by adopting a Viterbi algorithm. The specific implementation process of the steps is as follows:
According to the trained hidden Markov model mu= (A, B, pi) and the track sequence tr= (p 1,p2,...,pt) of the position point p next to be predicted, finding the hidden state sequence most likely to generate the position points, predicting the next hidden state through the hidden state sequence and the state transition matrix A, and calculating the position point o next most likely to generate the hidden state according to the generated probability matrix B, wherein the position point is the prediction result. The specific procedure is as follows.
(2-1) Initializing a state at a start time:
(2-2) calculating the probability maximum value delta t (i) in all hidden state sequences < s 1,s2,...,st > with hidden states s t at time t, namely the probability that delta t (i) is o= (O 1,o2,...,ot) the most likely corresponding hidden state sequence:
Calculating the hidden state of the t-1 th node in the hidden state sequence with the highest probability at the moment t as phi t (i), namely the most probable hidden state at the moment t-1:
(2-3) recursively starting from the initial time to the time T based on delta t (i) and phi t (i), and then tracing back with the most likely state node before recorded by phi t (i) until the optimal hidden state sequence is found
(2-4) Predicting the hidden state S t to which the current position belongs, and combining to generate a probability matrix B, and solving the most likely generated observation position point o next of the hidden state.
Step three, position point privacy budget allocation: the w sliding window mechanism is utilized to allocate corresponding privacy budgets to each position point in the track, wherein the meshing rule is shown in fig. 3. The privacy budget allocated to the current location is determined not only by the total budget allocated to the previous w-1 locations, but also by the predictability and importance of the current location. The privacy requirements and predictability of each location point are different, and the privacy budget needs to be adjusted based on the differences of each location point. The specific process is as follows.
(3-1) Calculating position predictability by manhattan distance d (p i,oi) between the real position p i and the predicted position o i:
(3-2) calculating the position importance, if p i is a track feature point (see fig. 4 for an example of a track feature point), i= |cos (θ) |, otherwise i=0, namely:
(3-3) w sliding window limiting maximum privacy budget, calculating the privacy budget consumption sum of the previous w-1 positions to calculate the residual privacy budget of the current window [ i-w+1, i ], namely the maximum privacy budget which can be allocated for the current position:
(3-4) assigning a privacy budget to the current location point based on the predictability PP obtained in step (3-1) and the importance I obtained in step (3-2):
Step four, generating disturbance positions: and adding Laplacian noise to the real position of the mobile object according to the set privacy budget epsilon i by using a planar Laplacian mechanism to generate a disturbance position set, and selecting the position point with the highest availability in the disturbance position set to replace the real position and uploading the position point to a server. The planar laplace mechanism is a mechanism that satisfies epsilon-geographically indistinguishable, which is to derive the interference location from a two-dimensional laplace distribution centered around the true location p. The interference positions are randomly generated through a planar Laplace mechanism to form an interference position set, and then the position point with the highest availability is selected to be issued as the interference position, and the specific process is as follows, and an example of the generated interference position set is shown in fig. 5.
(4-1) Generating a disturbance location: from the assigned privacy budget ε i, calculate the noise radius r:
Where W -1 (·) is the interval (- ≡1) branch of the Lembert W function and ρ is a random number subject to a uniform distribution of [0,1 ]. Randomly generating random numbers theta subject to [0,2 pi ] uniform distribution, and calculating disturbance positions z through theta and r: z=p+ (r·cos (θ), r·sin (θ)).
(4-2) Generating a disturbance data set: and (4) circularly executing the step (4-1) until the generated interference position number meets the set threshold value.
(4-3) Selecting the disturbance location with the highest availability: if the disturbance location is in the same cell as the real location, the disturbance location with the greatest availability is selected in consideration of the availability. If the two are not in the same cell, the predictability is considered, and the disturbance position in the cell with the minimum transition probability with the cell to which the true position belongs is selected.
Based on the above description, a software program for implementing the track differential privacy protection method for resisting the prediction attack provided by the invention is generally described as follows:
input: trace data tr, privacy budget ε, privacy budget delta Δε, window size w, hidden Markov model μ
And (3) outputting: disturbance track tr'
Initializing tr' ≡0
FOR pi in tr DO:
Each position point p in the/(traversal locus data tr) T
Initializing δ 1(i)=πibi(p1)、ψ1 (i) =0
FOR t=2to(T-1)DO:
Calculating the values of delta t (i) and psi t (i)
END FOR
P*←max(δT-1)
Record the most likely state node of the current location
FOR t=T-2to 1DO:
Backtracking from T time to initial time, finding out optimal state sequence of cause
END FOR
Hidden state to which the current position belongs is/is predicted
The combination generates a probability matrix B to find the most likely observation position o of the hidden state T
Privacy budgets for// computing current location
Initializing a set of interference locations
Determination of the cell m where p T is located
FOR k=1to k DO:
θ=rand()×2π
z=pT+(r·cos(θ),r·sin(θ))
Adding z to DS
END FOR
FOR zi in DS DO:
zi.f=d(pT,zi)
END FOR
Selecting the position point with highest availability in DS as disturbance position z T
Adding z T to tr
END FOR
RETURN tr'
Corresponding to the track differential privacy protection method for resisting the predictive attack, the invention also provides a track differential privacy protection system for resisting the predictive attack, as shown in fig. 6, the system comprises: the system comprises a hidden state sequence determining module 1, a current position point information determining module 2, a prediction importance determining module 3, a privacy budget allocation module 4, a disturbance position set generating module 5, a disturbance position determining module 6 and a disturbance track forming module 7.
The hidden state sequence determining module 1 is used for obtaining a hidden state sequence by taking the track sequence as the input of a trained hidden Markov model.
The current position point information determining module 2 is used for obtaining current position point information according to the hidden state sequence.
The prediction importance determination module 3 is used for determining the predictability and importance of the current location point information.
The privacy budget allocation module 4 is configured to allocate a privacy budget in terms of predictability and importance using a w-sliding window mechanism.
The disturbance location set generation module 5 is configured to add laplace noise to the real location point information according to the privacy budget using a planar laplace mechanism to generate a disturbance location set.
The disturbance location determination module 6 is configured to determine disturbance locations in the disturbance location set that satisfy a preset condition.
The disturbance track forming module 7 is used for replacing the real position in the track with the disturbance position meeting the preset condition to form a disturbance track.
In summary, the method and the device can effectively solve the problems that the existing track privacy protection model is difficult to resist track prediction attack and the total budget consumption of track privacy is overlarge. And evaluating the predictability of the position points through a hidden Markov model, and adjusting privacy parameters. The track privacy overall budget is controlled by a sliding window mechanism. Data availability is guaranteed by generating a set of interference locations. Therefore, the method and the system can effectively protect track privacy, ensure data availability, effectively resist track prediction attack, provide reference for the location-based service (LBS) industry and promote the development of the industry.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.