Social network privacy attack framework based on graph data, construction method and application thereof
Technical Field
The application relates to the field of network security, in particular to a social network privacy attack framework based on graph data, a construction method and application thereof.
Background
In recent years, social networks (such as microblogs, Facebook, and Twitter) have become convenient platforms for users to communicate with each other, share information, and make new friends. As a result, a large amount of personal data and information is spread and shared over the internet. For example, social networking homepages typically display various attributes of the user (e.g., profession, residence, hobbies, and circle of friends). In general, information displayed on a social network homepage is accessible not only to ordinary users but also to third parties, who use it for commercial purposes such as advertising, product and service recommendations, and even offline promotions. Social networks enrich our social circles and help us explore and discover information, but they also inadvertently pose a potential threat to our privacy.
However, most users are unwilling to disclose their sensitive information to the public. Clearly, third parties that infer user privacy from social networks and profit from it seriously jeopardize the privacy and security of users. To address information leakage in social networks, Yi et al. propose an information sharing scheme based on privacy conditions that controls information according to the privacy requirements of its owner and the information environment. Rehman et al. propose a conceptual model based on privacy computing theory to explain user privacy behavior in online social networks. In fact, most attackers attack the privacy of the target user primarily on the basis of homophily, that is, the tendency of connected users to share similar attributes. Research by Kayes et al. shows that in a public social network, personal attributes and surrounding relationships can be inferred accurately, especially when the target user is closely related to highly influential public figures.
In general, the data threatened by privacy attacks includes user attributes and user identities. From the perspective of privacy disclosure, privacy attack techniques in social networks can therefore be broadly divided into two categories: attribute privacy attacks and identity privacy attacks. In an attribute privacy attack, the attacker targets the attribute privacy of a user, including location privacy, age privacy, and professional privacy. The attacker uses the public information of the user (such as occupation, residence, hobbies, and interests) to infer sensitive attributes or missing user information, and such attacks are mostly based on machine learning techniques. For example, Chetan et al., starting from the traditional visual family problem, showed that family information is easily exposed to an attacker once a candid photo is linked to a social network. Jia et al. infer the location information of the user from the scoring metrics of applications in the social network. Weinberg et al. demonstrate that an attacker can use a machine-learned classifier to predict the location and gender of a user.
On the other hand, with the wide application and popularization of knowledge graphs, graph data has attracted increasing attention from researchers and is gradually being applied in social network research. Myers et al. studied the topological features of the Twitter follow graph and examined in depth whether Twitter is a social network or an information network. Wei et al. propose a bipartite-graph-based pairwise graph neural network (DSGNN) that predicts relationship links in location-based social networks (LBSNs) in order to perform relational reasoning in social networks; it models social proximity and movement similarity by extracting paired social and trajectory subgraphs. Wu et al. extended DiffNet by reformulating social recommendation as a heterogeneous graph containing social and trajectory subgraphs.
In summary, to explore the nature of the privacy attack problem in social networks and to analyze the privacy protection capability and attack efficiency of algorithms, a social network privacy attack framework based on graph data needs to be constructed. Such a framework builds a knowledge graph that makes comprehensive use of user identity and attribute information in the social network scenario, analyzes the topology information present in the network, classifies the attribute nodes of users, and divides node attributes at a fine granularity so as to maximize feature extraction. Node connections in the social attribute network are labeled with attribute labels based on the topology information, so that privacy attack, information protection, and attribute recommendation can be carried out as downstream tasks. This makes the algorithm extensible and helps to solve the security problem of privacy protection in the social network environment.
Disclosure of Invention
The application aims to provide a social network privacy attack framework based on graph data, a construction method and application thereof, so as to solve the technical problems.
In order to achieve the above object, the present application provides the following solutions:
The application first provides a social network privacy attack framework based on graph data. The attack framework is a curated knowledge base for tracking the cyber adversary tactics and techniques used by threat actors throughout the attack life cycle. The framework is not just a data set: it is intended to be used as a tool for strengthening the security posture of an organization. Because the attack framework takes the adversary's perspective, a security operations team can more easily infer the motivation behind an adversary's individual actions and understand how those actions relate to specific categories of defense.
The social network privacy attack framework based on graph data comprises user nodes, user attribute nodes, and similarity nodes, together with three categories of connection: user social relations, attribute relations, and similarity relations. To perform a privacy attack task, an attacker needs to build a privacy attack framework that represents the social information he has collected about the target user. Such a framework integrates privacy information fully into the network topology, thereby facilitating privacy attacks and inference. The network construction process is divided into three main phases: preprocessing, attribute classification, and construction of the social network privacy graph.
First, the data obtained from the social network is preprocessed into graph data. The graph $G_s = (V_s, E_s)$ of an ordinary social network contains only social user nodes. On this basis, we first determine the relationships between users in the social network and establish the connections $re(v_i, v_j)$ between user nodes. Then, according to the attribute information of each user in the social network, attribute nodes $a_{ij}$ are created, where $a_{ij} \in a_i$ and $a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]$; one attribute class may contain several attribute nodes. According to the relationship between users and attributes, the corresponding attribute connections $re(v_i, a_{ij})$ are established. The nodes of the social network are thereby extended to social nodes and attribute nodes, $V = V_a + V_s$.
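The preprocessing phase can be illustrated with a minimal sketch. It is not the implementation of the application: the function name `build_social_attribute_graph`, the edge labels "user" and "attribute", and the sample data are all assumptions introduced for illustration.

```python
def build_social_attribute_graph(friendships, user_attributes):
    """Return (V_s, V_a, E) for a social attribute graph.

    friendships: iterable of (user, user) pairs, i.e. re(v_i, v_j)
    user_attributes: dict mapping user -> list of attribute values a_ij
    """
    user_nodes = set()       # V_s
    attribute_nodes = set()  # V_a
    edges = set()            # E, stored as (node, node, kind)

    for u, v in friendships:
        user_nodes.update((u, v))
        # Store each undirected user edge once, in a canonical order.
        edges.add((min(u, v), max(u, v), "user"))

    for user, attrs in user_attributes.items():
        user_nodes.add(user)
        for attr in attrs:
            attribute_nodes.add(attr)
            edges.add((user, attr, "attribute"))

    return user_nodes, attribute_nodes, edges


# Hypothetical sample data.
friendships = [("alice", "bob"), ("bob", "carol")]
user_attributes = {
    "alice": ["city:Nanjing", "job:teacher"],
    "bob": ["city:Beijing"],
    "carol": ["city:Nanjing"],
}
V_s, V_a, E = build_social_attribute_graph(friendships, user_attributes)
V = V_s | V_a  # V = V_a + V_s
```

Keeping the edge kind next to each pair means the later labeling of connections can refine the "attribute" kind without rebuilding the graph.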
The collected user information is formatted as graph data to obtain $A = \{a_0, a_1, a_2, \ldots, a_n\}$, where $a_0, a_1, \ldots, a_n$ are single attribute values. The attribute values are converted into a 0/1 matrix $A = [\alpha_0, \alpha_1, \ldots, \alpha_n]$, where $n$ is the number of attribute nodes and $\alpha_i$ is a 0/1 vector whose entry is 1 if the user node $v$ has the attribute and 0 otherwise. The attribute values in the attribute set $A$ are then classified according to the attribute categories obtained from the clustering operation, that is, $a_{ij} = \{a_{ij} \mid a_{ij} \in \beta_i\}$ and $B = \{\beta_0, \beta_1, \ldots, \beta_n\}$, where $B$ is the attribute category matrix. From the attribute set $A$ and the attribute category matrix $B$, the formal representation $A' = \{a'_0, a'_1, \ldots, a'_n\}$ is obtained.
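As an illustration of the 0/1 encoding, the following sketch builds one row per user and one column per attribute value. The helper name `attribute_matrix` and the sample users are hypothetical, and the row-per-user layout is an assumption about how the matrix is arranged.

```python
def attribute_matrix(user_attributes, attribute_values):
    """Build a 0/1 matrix: one row per user, one column per attribute value."""
    users = sorted(user_attributes)
    matrix = [
        [1 if a in user_attributes[u] else 0 for a in attribute_values]
        for u in users
    ]
    return users, matrix


# Hypothetical sample data.
attribute_values = ["Beijing", "Nanjing", "teacher"]  # A = {a_0, ..., a_n}
user_attributes = {"alice": {"Nanjing", "teacher"}, "bob": {"Beijing"}}
users, M = attribute_matrix(user_attributes, attribute_values)
# Column i of M plays the role of the 0/1 vector alpha_i.
```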
Next, the similarity of attribute nodes and user nodes is calculated, and a similarity set $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$ is established, with one similarity $S_{sim}$ for each index $sim \in \{AA, PA, RA, CN\}$. Each similarity is converted into a similarity node and added to the social attribute framework, and attribute similarity connections $re(a_{ij}, s_m)$ and user similarity connections $re(v_i, s_m)$ are established, so that $V = V_a + V_s + V_{sim}$. The categories of attribute connection are then refined on the basis of the original attribute connections: according to the classification of the attributes, attribute connections can be further divided into gender attribute connections, age attribute connections, interest attribute connections, location attribute connections, and so on. After the connections are labeled, the connection attributes $W$ are added to the social attribute network to obtain the social network privacy graph $G = (V, E, W)$.
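The text does not spell out the four indices. Assuming that CN, AA, PA, and RA denote the common neighbors, Adamic-Adar, preferential attachment, and resource allocation indices of the link prediction literature, their usual definitions for two nodes $x$ and $y$, with neighbor sets $\Gamma(x)$ and $\Gamma(y)$ and node degree $k_z = |\Gamma(z)|$, are:

```latex
\begin{aligned}
S_{CN}(x, y) &= |\Gamma(x) \cap \Gamma(y)|, \\
S_{AA}(x, y) &= \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}, \\
S_{PA}(x, y) &= k_x \, k_y, \\
S_{RA}(x, y) &= \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}.
\end{aligned}
```

The symbols $\Gamma$ and $k_z$ are introduced here for illustration and do not appear in the original description.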
The beneficial effects of the application are as follows. The application provides a social network privacy attack framework based on graph data that aggregates information in a social network, so that large-scale graph data sets can be processed and the requirements of downstream knowledge graph tasks such as attack and recommendation can be met. Compared with prior frameworks based on social networks, the framework fully fuses the relationship information of users (that is, the topology), the attribute information in the social network, and the similarity indices, and therefore has stronger computing and learning capabilities. In terms of practicality, the privacy attack framework can be used not only for security analysis of identity privacy and attribute privacy but also for simulating machine learning attack methods such as similarity attacks and graph embedding.
Drawings
FIG. 1 is a flow chart of the calculation of the present application.
Detailed Description
The preferred embodiments of the present application will be described in detail below with reference to the attached drawings, so that the objects, features and advantages of the present application will be more clearly understood. It should be understood that the embodiments shown in the drawings are not intended to limit the scope of the application, but rather are merely illustrative of the true spirit of the application.
In the following description, for the purposes of explanation of various disclosed embodiments, certain specific details are set forth in order to provide a thorough understanding of the various disclosed embodiments. One skilled in the relevant art will recognize, however, that an embodiment may be practiced without one or more of the specific details. In other instances, well-known devices, structures, and techniques associated with the present application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Example 1:
The calculation flow of the application is shown in FIG. 1. In a social network, the attacker is interested in the privacy or user information of the user. The attacker may be a software operator or investor who wants to promote its services, or a third-party organization interested in user information. In this scenario, the attacker can acquire part of the public information, but not all of it. The attacker can acquire attribute information (i.e., residence, gender, and profession) and relationship information (i.e., friends and relatives) of the user. After obtaining the user information, the attacker first preprocesses the user data, that is, establishes a social attribute network $G_s = (V_s, E_s)$. The attribute nodes are divided by attribute set using machine learning models such as SVM and Random Forest. The degree index of each user node and the similarity indices (CN, AA, PA, and RA) between it and other nodes are calculated, user similarity nodes and attribute similarity nodes are established, and a similarity set $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$ is built. The connections in the knowledge graph are then classified and marked with different class labels according to attribute, identity, and similarity, where the attribute connections may be further divided according to their attribute categories. Finally, the construction of the social network privacy attack framework based on graph data is completed.
The workflow of the application can be divided into three stages: an initialization stage, an attribute classification stage, and a knowledge graph framework construction stage. In the present application, the attacker is considered to be a third party or a software developer who cannot acquire all information and who processes the acquired social network data using a knowledge graph. The method allows the machine learning classification method and the attack algorithm to be selected when the attack framework is constructed.
The method comprises the following steps:
(1) Initialization phase
The initialization phase is divided into three steps:
Step 1: First, user and attribute information in the social network is collected to obtain a social network $G_s = (V_s, E_s)$, where $V_s$ is the set of user nodes and $E_s$ is the set of user relationship connections. The user attribute set $A$ is obtained at the same time.
Step 2: The relationship degree $k_u$ and the attribute degree $k_a$ of each user are calculated from the attribute nodes and the user relationship nodes. The relationship degree is the number of user nodes connected to a node; the attribute degree is the number of attribute nodes connected to a node.
Step 3: The AA, CN, PA, and RA similarity indices are calculated for each existing relationship connection $re(v_i, v_j)$ and attribute connection $re(v_i, a_{ij})$. The similarity indices are converted into information that the attribute nodes can identify and added to the social attribute network, where the calculated similarity is denoted by $S$. On this basis, the similarity set $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$ of the knowledge graph is established.
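Steps 2 and 3 of the initialization stage can be sketched as follows. The sketch assumes that CN, AA, PA, and RA are the common neighbors, Adamic-Adar, preferential attachment, and resource allocation indices in their usual link prediction form, which the description does not state explicitly. The function names and the sample adjacency sets are hypothetical.

```python
import math


def degrees(adj, user_nodes, attribute_nodes):
    """Return (k_u, k_a): per-node counts of adjacent user and attribute nodes."""
    k_u = {n: len(nbrs & user_nodes) for n, nbrs in adj.items()}
    k_a = {n: len(nbrs & attribute_nodes) for n, nbrs in adj.items()}
    return k_u, k_a


def similarity_indices(adj, x, y):
    """Return the CN, AA, PA, and RA indices for nodes x and y."""
    common = adj[x] & adj[y]
    return {
        "CN": len(common),
        # Degree-1 neighbours are skipped to avoid dividing by log(1) = 0;
        # a common neighbour of two distinct nodes has degree >= 2 anyway.
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        "PA": len(adj[x]) * len(adj[y]),
        "RA": sum(1.0 / len(adj[z]) for z in common),
    }


# Hypothetical social attribute network as adjacency sets.
user_nodes = {"alice", "bob", "carol"}
attribute_nodes = {"Nanjing", "Beijing"}
adj = {
    "alice": {"bob", "Nanjing"},
    "bob": {"alice", "carol", "Beijing"},
    "carol": {"bob", "Nanjing"},
    "Nanjing": {"alice", "carol"},
    "Beijing": {"bob"},
}
k_u, k_a = degrees(adj, user_nodes, attribute_nodes)
S = similarity_indices(adj, "alice", "carol")
```

Because attribute nodes are ordinary neighbors in the adjacency sets, a shared attribute such as "Nanjing" contributes to the similarity of two users in the same way as a shared friend.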
(2) Attribute classification stage
The attribute classification stage is divided into three steps:
Step 1: First, the collected user information is formatted as graph data to obtain $A = \{a_0, a_1, a_2, \ldots, a_n\}$, where $a_0, a_1, \ldots, a_n$ are single attribute values. The attribute values are converted into a 0/1 matrix $A = [\alpha_0, \alpha_1, \ldots, \alpha_n]$. Briefly, "Nanjing" and "Beijing" are attribute values, while "City" is an attribute category. In matrix $A$, $n$ is the number of attribute nodes and $\alpha_i$ is a 0/1 vector whose entry is 1 if the user node $v$ has the attribute and 0 otherwise.
Step 2: The attribute values in the attribute set $A = \{a_0, a_1, a_2, \ldots, a_n\}$ are classified according to the attribute categories obtained from the clustering operation. User attributes are divided by attribute category using machine learning models such as SVM and Random Forest, that is, $a_{ij} = \{a_{ij} \mid a_{ij} \in \beta_i\}$ and $B = \{\beta_0, \beta_1, \ldots, \beta_n\}$, where $B$ is the attribute category matrix.
Step 3: From the attribute set $A$ and the attribute category matrix $B$, the formal representation $A' = \{a'_0, a'_1, \ldots, a'_n\}$ is obtained.
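The grouping of attribute values under attribute categories can be sketched as below. A fixed lookup table stands in for the trained SVM or Random Forest classifier, so this shows only how the representation $A'$ might be assembled, not how the categories are learned. The names `group_by_category` and `LABELS` are hypothetical.

```python
from collections import defaultdict


def group_by_category(attribute_values, predict_category):
    """Group attribute values a_ij under their predicted category beta_i."""
    categories = defaultdict(list)
    for value in attribute_values:
        categories[predict_category(value)].append(value)
    return dict(categories)


# Stand-in for a trained classifier: a fixed lookup table (hypothetical).
LABELS = {"Nanjing": "city", "Beijing": "city", "teacher": "occupation"}

A = ["Nanjing", "Beijing", "teacher"]
A_prime = group_by_category(A, LABELS.get)
```

Passing the classifier as a callable keeps the grouping step independent of the model chosen, which matches the description's point that the classification method is selectable.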
(3) Knowledge graph framework construction stage
The knowledge graph framework construction stage comprises four steps:
Step 1: The existing attributes are converted into attribute nodes and the calculated similarities into similarity nodes, and both are added to the social attribute knowledge graph $G_s = (V_s, E_s)$, so that $V = V_s + V_a + V_{sim}$.
Step 2: The nodes in the knowledge graph are divided by node characteristics into three main categories: attribute nodes, user nodes, and similarity nodes. The connection between an attribute node and a user node is called an attribute connection $re(v_i, a_{ij})$; the connection between two user nodes is called a user connection $re(v_i, v_j)$; the connection between a user node and a similarity node is called a similarity connection $re(v_i, s_m)$; and the connection between an attribute node and a similarity node is called an attribute similarity connection $re(a_{ij}, s_m)$, where $s_m \in S$ and $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$.
Step 3: Based on the attributes of the nodes and the connections between them, the connections in the social knowledge graph are classified and labeled with the corresponding attribute tags. For the different attribute categories, attribute connections are further refined into gender attribute connections, professional attribute connections, hobby attribute connections, and so on.
Step 4: After the connections are labeled, the connection attributes $W$ are added to the social attribute network to obtain the social network privacy attack framework $G = (V, E, W)$ based on graph data.
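The four construction steps can be sketched as a labeling pass over the edges. The label strings, the `label_edge` helper, and the sample nodes are assumptions made for illustration; the description does not fix a concrete encoding for $W$.

```python
def label_edge(x, y, node_type, attribute_category):
    """Return the connection label for edge (x, y) from its endpoint types."""
    types = {node_type[x], node_type[y]}
    if types == {"user"}:
        return "user"
    if types == {"user", "attribute"}:
        attr = x if node_type[x] == "attribute" else y
        # Refine attribute connections by category, e.g. "attribute:city".
        return "attribute:" + attribute_category[attr]
    if types == {"user", "similarity"}:
        return "user_similarity"
    if types == {"attribute", "similarity"}:
        return "attribute_similarity"
    raise ValueError(f"unsupported connection between {x!r} and {y!r}")


# Hypothetical nodes: two users, one attribute, one similarity node.
node_type = {
    "alice": "user",
    "bob": "user",
    "Nanjing": "attribute",
    "s_CN": "similarity",
}
attribute_category = {"Nanjing": "city"}
E = [("alice", "bob"), ("alice", "Nanjing"), ("alice", "s_CN"), ("Nanjing", "s_CN")]

V = set(node_type)  # V = V_s + V_a + V_sim
W = {edge: label_edge(*edge, node_type, attribute_category) for edge in E}
G = (V, E, W)       # social network privacy graph G = (V, E, W)
```

Storing $W$ as a mapping from edge to label lets a downstream task filter connections by type, for example selecting only the "attribute:city" edges when inferring location.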
The foregoing has shown and described the basic principles and main features of the present application and the advantages of the present application. It will be understood by those skilled in the art that the present application is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present application, and various changes and modifications may be made without departing from the spirit and scope of the application, which is defined in the appended claims.