CN116955849A

CN116955849A - Social network privacy attack framework based on graph data, construction method and application thereof

Info

Publication number: CN116955849A
Application number: CN202311011131.5A
Authority: CN
Inventors: 彭长根; 赵小然; 丁红发; 谭伟杰
Original assignee: Guizhou University
Current assignee: Guizhou University
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-10-27

Abstract

The invention discloses a social network privacy attack framework based on graph data, together with its construction method and application. The attack framework comprises user nodes, user attribute nodes, similarity nodes, and connection categories for user social relationships, attribute relationships and similarity relationships. By integrating user privacy and attribute information in a social network, the framework builds a social attribute network with a graph data topology. It creates user nodes from users' identity relationships, represents users' social relationships in a knowledge graph, builds attribute nodes in the knowledge graph from user attributes, and finally classifies the connections between nodes through a classification mechanism to obtain fine-grained user relationships. Because the framework fully integrates the user's relationship information (that is, the topological structure of the social network), attribute information and similarity indices, it has stronger computing and learning capabilities.

Description

Social network privacy attack framework based on graph data, construction method and application thereof

Technical Field

The application relates to the field of network security, in particular to a social network privacy attack framework based on graph data, a construction method and application thereof.

Background

In recent years, social networks (such as microblogs, Facebook and Twitter) have become convenient platforms for users to communicate, share information and make new friends, and as a result a large amount of personal data and information is spread and shared over the internet. For example, social network homepages typically display various attributes of the user (e.g., profession, residence and hobbies, as well as the user's circle of friends). Information displayed on a social network homepage is generally accessible not only to ordinary users but also to third parties for purposeful commercial activities such as advertising, product and service recommendation, and even offline promotion. While social networks enrich our social circles and help us explore and discover information, they also inadvertently pose a potential threat to our privacy.

However, most users are unwilling to disclose their sensitive information to the public. Third-party actions that infer user privacy from social networks and profit from it clearly jeopardize users' privacy and security. To address information leakage in social networks, Yi et al. propose an information sharing scheme based on privacy conditions, which controls information according to the owner's privacy requirements and the information environment. Rehman et al. propose a conceptual model based on privacy computing theory to explain user privacy behavior in online social networks. In practice, most attackers target a user's privacy primarily by exploiting homophily, the tendency of connected users to share similar attributes. Research by Kayes et al. shows that in an openly accessible social network, personal attributes and surrounding relationships can be inferred accurately, especially when the target user is closely related to highly influential public figures.

In general, the data threatened by privacy attacks includes user attributes and user identities. In attacks on attributes, the attacker uses the user's public information (such as occupation, residence, hobbies and interests) to infer sensitive attributes or missing user information, mostly by means of machine learning techniques. From the perspective of privacy disclosure, privacy attack techniques in social networks can be broadly divided into two categories: attribute privacy attacks and identity privacy attacks. In an attribute privacy attack, the attacker targets the user's attribute privacy, including location privacy, age privacy and occupational privacy. For example, Chetan et al., starting from the traditional problem of visual family information, showed that family information is easily exposed to an attacker when candid photos are linked to a social network. Jia et al. infer a user's location from the rating metrics of applications in the social network. Weinberg et al. demonstrate that an attacker can use a machine-learned classifier to predict a user's location and gender.

On the other hand, with the wide application and popularization of knowledge graphs, graph data has attracted increasing attention from researchers and is gradually being applied to social network research. Myers et al. studied the topological features of the Twitter follow graph and examined in depth whether Twitter is a social network or an information network. Wei et al. propose a bipartite-graph-based pairwise graph neural network (DSGNN) that predicts relational links in location-based social networks (LBSNs) to perform relational reasoning; it models social proximity and mobility similarity by extracting paired social and trajectory subgraphs. Wu et al. extended DiffNet by reformulating social recommendation on heterogeneous graphs containing social and trajectory subgraphs.

In summary, to explore the nature of the privacy attack problem in social networks and to analyze the privacy protection capability and attack efficiency of algorithms, a social network privacy attack framework based on graph data needs to be constructed. Such a framework builds a knowledge graph by jointly using user identity and attribute information in a social network scenario, analyzes the topological structure of the network, classifies the user's attribute nodes and partitions node attributes at fine granularity to maximize feature extraction, and labels node connections in the social attribute network with attribute tags based on the topological information. Downstream tasks such as privacy attacks, information protection and attribute recommendation can then be built on it, which makes the algorithm extensible and helps address the security problem of privacy protection in social network environments.

Disclosure of Invention

The application aims to provide a social network privacy attack framework based on graph data, a construction method and application thereof, so as to solve the technical problems.

In order to achieve the above object, the present application provides the following solutions:

The application first provides a social network privacy attack framework based on graph data. The attack framework is a curated knowledge base that tracks the tactics and techniques used by threat actors against networks throughout the attack lifecycle. The framework is not merely a data set: it is intended as a tool for strengthening an organization's security posture. Because the framework takes the adversary's perspective, security operations teams can more easily infer the motivation behind each adversary action and understand how these actions relate to specific categories of defense.

The social network privacy attack framework based on graph data comprises user nodes, user attribute nodes, similarity nodes, and connection categories for user social relations, attribute relations and similarity relations. To perform a privacy attack task, an attacker needs to build a privacy attack framework that represents the social information collected about the target user. Such a framework can integrate private information fully into the network topology, thereby facilitating privacy attacks and inference. The network construction process is divided into three main phases: preprocessing, attribute classification, and construction of the social network privacy graph.

First, the data obtained from the social network is preprocessed into graph data. In an ordinary social network, the graph $G_s = (V_s, E_s)$ contains only social user nodes. On this basis, connections $re(v_i, v_j)$ between user nodes are first established from the relationships between users in the social network. Next, attribute nodes $a_{ij}$ are created from the users' attribute information, where $a_{ij} \in a_i$ and $a_i = [a_{i1}, a_{i2}, \dots, a_{in}]$; one attribute class may contain several attribute nodes. According to the relation between a user and an attribute, a corresponding attribute connection $re(v_i, a_{ij})$ is established. The node set of the social network is thereby extended to social (user) nodes and attribute nodes, $V = V_a + V_s$.
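The preprocessing step can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `SocialAttributeNetwork`, its methods, and the choice to identify an attribute node by a `(category, value)` pair are all hypothetical, and user IDs are assumed to be strings so that they never collide with attribute nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SocialAttributeNetwork:
    """Hypothetical container for the preprocessed graph (V = V_a + V_s)."""
    user_nodes: set = field(default_factory=set)       # V_s
    attribute_nodes: set = field(default_factory=set)  # V_a
    user_edges: set = field(default_factory=set)       # re(v_i, v_j)
    attribute_edges: set = field(default_factory=set)  # re(v_i, a_ij)

    def add_relation(self, v_i, v_j):
        """Add an undirected user connection re(v_i, v_j)."""
        self.user_nodes.update((v_i, v_j))
        # Store the pair in sorted order so (u1, u2) and (u2, u1) are the same edge.
        self.user_edges.add(tuple(sorted((v_i, v_j))))

    def add_attribute(self, v_i, category, value):
        """Add attribute node a_ij = (category, value) and connection re(v_i, a_ij)."""
        a_ij = (category, value)
        self.user_nodes.add(v_i)
        self.attribute_nodes.add(a_ij)
        self.attribute_edges.add((v_i, a_ij))

    @property
    def nodes(self):
        """V = V_a + V_s; string user IDs and tuple attribute nodes never collide."""
        return self.user_nodes | self.attribute_nodes


net = SocialAttributeNetwork()
net.add_relation("u1", "u2")
net.add_relation("u2", "u3")
net.add_attribute("u1", "city", "Nanjing")
net.add_attribute("u3", "city", "Nanjing")
net.add_attribute("u2", "gender", "female")
print(len(net.nodes))  # 3 user nodes + 2 attribute nodes -> 5
```

Keeping $V_s$ and $V_a$ as separate sets makes per-type degrees easy to compute later; a graph library could be used instead, but plain sets keep the sketch dependency-free.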

The collected user data is formatted as graph data to obtain the attribute set $A = \{a_0, a_1, a_2, \dots, a_n\}$, where each of $a_0, a_1, \dots, a_n$ is a single attribute value. The attribute values are converted into a 0/1 matrix $A = [\alpha_0, \alpha_1, \dots, \alpha_n]$, in which $n$ denotes the number of attribute nodes and each $\alpha_i$ is a 0/1 vector whose entry is 1 if the corresponding user node has the attribute and 0 otherwise. The attribute values in $A$ are then grouped into the attribute categories obtained by a clustering operation, i.e., user attributes are classified by category as $a_{ij} = \{a_{ij} \mid a_{ij} \in \beta_i\}$ with $B = \{\beta_0, \beta_1, \dots, \beta_n\}$, where $B$ is the attribute category matrix. From the attribute set $A$ and the attribute category matrix $B$, the formal representation $A' = \{a'_0, a'_1, \dots, a'_n\}$ is obtained.
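As an illustration of the 0/1 encoding, the sketch below builds one column $\alpha_i$ per attribute value, indexed by user. It assumes, since the text does not say, that users are ordered by sorted ID and that attribute values are plain strings; `attribute_matrix` is a hypothetical helper name.

```python
def attribute_matrix(user_attrs):
    """Build the 0/1 matrix A = [alpha_0, ..., alpha_n].

    user_attrs maps a user ID to the set of attribute values that user has.
    Each alpha_i is a column over users: alpha_i[u] = 1 if user u has value a_i, else 0.
    """
    users = sorted(user_attrs)
    # Attribute set A = {a_0, ..., a_n}: every distinct value seen across users.
    values = sorted(set().union(*user_attrs.values()))
    columns = [[1 if v in user_attrs[u] else 0 for u in users] for v in values]
    return users, values, columns


users, values, A = attribute_matrix({
    "u1": {"Nanjing", "engineer"},
    "u2": {"Beijing"},
    "u3": {"Nanjing", "teacher"},
})
print(values)                       # ['Beijing', 'Nanjing', 'engineer', 'teacher']
print(A[values.index("Nanjing")])   # [1, 0, 1]
```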

Next, the similarities between attribute nodes and user nodes are calculated to establish the similarity set $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$, whose elements are $S_{sim}$ with $sim \in \{AA, PA, RA, CN\}$. The similarities are converted into similarity nodes and added to the social attribute framework, and attribute similarity connections $re(a_{ij}, s_m)$ and user similarity connections $re(v_i, s_m)$ are established, giving $V = V_a + V_s + V_{sim}$. The categories of the original attribute connections are then refined further: according to the classification of the attributes, attribute connections can be divided into gender attribute connections, age attribute connections, interest attribute connections, location attribute connections, and so on. After the connections are labeled, the connection attributes are combined with $W$ and added to the social attribute network to obtain the social network privacy graph $G = (V, E, W)$.
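One possible reading of "converting the similarity into a similarity node" is sketched below: each computed index value for a connection becomes its own node $s_m$ that links back to both endpoints. This interpretation, the node identifier format and the function name `add_similarity_nodes` are assumptions; the text does not specify how similarity values map to nodes.

```python
def add_similarity_nodes(scores, users):
    """Turn similarity values into similarity nodes s_m and their connections.

    scores: {(x, y): {index_name: value}} computed for existing connections.
    users:  the user node set V_s; any other endpoint is treated as an attribute node.
    Returns (V_sim, edges): V_sim maps each s_m to its value, and each edge is
    (endpoint, s_m, connection_type).
    """
    v_sim, edges = {}, []
    for (x, y), indices in scores.items():
        for name, value in indices.items():
            s_m = ("sim", name, x, y)  # hypothetical identifier for one similarity node
            v_sim[s_m] = value         # keep the similarity value as a node feature
            for endpoint in (x, y):
                # re(v_i, s_m) for users, re(a_ij, s_m) for attribute nodes
                kind = "user similarity" if endpoint in users else "attribute similarity"
                edges.append((endpoint, s_m, kind))
    return v_sim, edges


V_sim, sim_edges = add_similarity_nodes(
    {("u1", "u2"): {"CN": 1, "PA": 4}, ("u1", "city:Nanjing"): {"CN": 0}},
    users={"u1", "u2"},
)
print(len(V_sim), len(sim_edges))  # 3 similarity nodes, 6 connections
```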

The beneficial effects of the application are as follows. The application provides a social network privacy attack framework based on graph data that aggregates information in a social network, so that large-scale graph data sets can be processed and the requirements of downstream knowledge graph tasks such as attack and recommendation can be met. Compared with existing social-network-based frameworks, this framework fully fuses the user's relationship information, namely the topological structure, the attribute information in the social network and the similarity indices, and has stronger computing and learning capabilities. In terms of practicability, the privacy attack framework can be used not only for security analysis of identity privacy and attribute privacy but also for simulating machine learning attack methods such as similarity attacks and graph embedding.

Drawings

FIG. 1 is a flow chart of the calculation of the present application.

Detailed Description

The preferred embodiments of the present application will be described in detail below with reference to the attached drawings, so that the objects, features and advantages of the present application will be more clearly understood. It should be understood that the embodiments shown in the drawings are not intended to limit the scope of the application, but rather are merely illustrative of the true spirit of the application.

In the following description, for the purposes of explanation of various disclosed embodiments, certain specific details are set forth in order to provide a thorough understanding of the various disclosed embodiments. One skilled in the relevant art will recognize, however, that an embodiment may be practiced without one or more of the specific details. In other instances, well-known devices, structures, and techniques associated with the present application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.

Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

Example 1:

The calculation flow of the application is shown in FIG. 1. In a social network, the attacker is interested in users' private information. The attacker may be a software operator or investor who wants to promote its services, or a third-party organization interested in user information. In this scenario, the attacker can acquire part, but not all, of the public information, such as the user's attribute information (e.g., residence, gender and profession) and relationship information (e.g., friends and relatives). After obtaining the user information, the attacker first preprocesses the user data, that is, establishes the social attribute network $G_s = (V_s, E_s)$. The attribute nodes are then divided by attribute category using machine learning models such as SVM and Random Forest. Next, the attacker calculates the degree index of each user node and the similarity indices (CN, AA, PA, RA) between it and other nodes, creates user similarity nodes and attribute similarity nodes, and establishes the similarity set $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$. The connections in the knowledge graph are then classified and marked with different class labels according to attribute, identity and similarity, where attribute connections may be further divided according to their attribute categories. Finally, the construction of the graph-data-based social network privacy attack framework is complete.

The workflow of the application is divided into three stages: an initialization stage, an attribute classification stage and a knowledge graph framework connection construction stage. The attacker is assumed to be a third party or software developer who cannot acquire all of the information and who processes the acquired social network data with a knowledge graph. When the attack framework is constructed, the method leaves open the choice of the machine learning method used for classification and of the attack algorithm.

The method comprises the following steps:

(1) Initialization phase

The initialization phase is divided into three steps:

Step 1: First, user and attribute information in the social network is collected to obtain the social network $G_s = (V_s, E_s)$, where $V_s$ is the set of user nodes and $E_s$ is the set of user relationship connections. The user attribute set $A$ is obtained at the same time.

Step 2: From the attribute nodes and user relationship nodes, calculate each user's relationship degree $k_u$ and attribute degree $k_a$. The relationship degree is the number of user nodes connected to a node; the attribute degree is the number of attribute nodes connected to a node.
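The two degree measures in Step 2 reduce to simple counting over the two connection types. A minimal sketch, assuming undirected user-user edges without duplicates and attribute edges given as `(user, attribute)` pairs; the function name is hypothetical.

```python
from collections import Counter


def relation_and_attribute_degree(user_edges, attribute_edges):
    """Return (k_u, k_a) as Counters keyed by user ID.

    user_edges: iterable of undirected (v_i, v_j) user-user connections.
    attribute_edges: iterable of (v_i, a_ij) user-attribute connections.
    """
    k_u = Counter()
    for v_i, v_j in user_edges:
        k_u[v_i] += 1  # each undirected edge adds one user neighbour to both ends
        k_u[v_j] += 1
    k_a = Counter(v_i for v_i, _ in attribute_edges)
    return k_u, k_a


k_u, k_a = relation_and_attribute_degree(
    user_edges=[("u1", "u2"), ("u2", "u3")],
    attribute_edges=[("u1", "city:Nanjing"), ("u1", "job:engineer"), ("u3", "city:Nanjing")],
)
# k_u: u1 -> 1, u2 -> 2, u3 -> 1; k_a: u1 -> 2, u3 -> 1 (a missing key such as "u2" reads as 0)
```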

Step 3: Calculate the AA, CN, PA and RA similarity indices for each existing relationship connection $re(v_i, v_j)$ and attribute connection $re(v_i, a_{ij})$. The similarity indices, denoted by $S$, are converted into information that the attribute nodes can represent and added to the social attribute network. On this basis, the similarity set of the knowledge graph, $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$, is established.
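The four indices named in Step 3 can be computed from neighbor sets alone. The sketch below uses the common link-prediction definitions (CN and PA as given in claim 5; AA and RA as the usual Adamic-Adar and resource-allocation sums over common neighbors), which are assumed rather than quoted from the text. It treats all connections as one undirected graph without self-loops, and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def neighbours(edges):
    """Build neighbour sets N(.) for an undirected graph given as (x, y) pairs."""
    adj = defaultdict(set)
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    return adj


def local_similarity(adj, x, y):
    """Return the CN, AA, PA and RA indices for the node pair (x, y)."""
    common = adj[x] & adj[y]
    return {
        "CN": len(common),
        # A common neighbour of two distinct nodes (no self-loops) has degree >= 2,
        # so log(k_z) > 0 and the Adamic-Adar term is always defined.
        "AA": sum(1 / math.log(len(adj[z])) for z in common),
        "PA": len(adj[x]) * len(adj[y]),
        "RA": sum(1 / len(adj[z]) for z in common),
    }


# Score every existing connection, as Step 3 describes.
edges = [("u1", "u2"), ("u1", "u3"), ("u2", "u3"), ("u3", "u4")]
adj = neighbours(edges)
scores = {edge: local_similarity(adj, *edge) for edge in edges}
print(scores[("u1", "u2")])  # CN=1 (common neighbour u3), PA=2*2=4, RA=1/3, AA=1/log(3)
```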

(2) Attribute classification stage

The node classification stage is divided into three steps:

Step 1: First, the collected user data is formatted as graph data to obtain $A = \{a_0, a_1, a_2, \dots, a_n\}$, where each of $a_0, a_1, \dots, a_n$ is a single attribute value. The attribute values are converted into a 0/1 matrix $A = [\alpha_0, \alpha_1, \dots, \alpha_n]$. For example, "Nanjing" and "Beijing" are attribute values, while "City" is an attribute category. In matrix $A$, $n$ denotes the number of attribute nodes and each $\alpha_i$ is a 0/1 vector whose entry is 1 if the corresponding user node has the attribute and 0 otherwise.

Step 2: The attribute values in the attribute set $A = \{a_0, a_1, a_2, \dots, a_n\}$ are grouped into the attribute categories obtained by the clustering operation. User attributes are divided by attribute category using machine learning models such as SVM and Random Forest, i.e., $a_{ij} = \{a_{ij} \mid a_{ij} \in \beta_i\}$ and $B = \{\beta_0, \beta_1, \dots, \beta_n\}$, where $B$ is the attribute category matrix.
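The grouping of attribute values into categories can be pictured with the sketch below. The patent uses models such as SVM or Random Forest for this step; here a hand-written lookup table `CATEGORY_OF` stands in for the trained classifier, and the fallback category `"other"` is an assumption of the sketch, not something the text specifies.

```python
from collections import defaultdict

# Hypothetical value -> category lookup standing in for a trained classifier
# (the text names SVM and Random Forest; no model is trained here).
CATEGORY_OF = {
    "Nanjing": "city", "Beijing": "city",
    "engineer": "profession", "teacher": "profession",
    "male": "gender", "female": "gender",
}


def group_by_category(values, category_of=CATEGORY_OF):
    """Return B as a dict: category beta_i -> sorted list of its attribute values a_ij."""
    groups = defaultdict(set)
    for value in values:
        # Values the lookup does not know fall into an assumed "other" category.
        groups[category_of.get(value, "other")].add(value)
    return {beta: sorted(members) for beta, members in groups.items()}


B = group_by_category(["Nanjing", "Beijing", "engineer", "female", "chess"])
print(B["city"])  # ['Beijing', 'Nanjing']
```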

Step 3: From the attribute set $A$ and the attribute category matrix $B$, the formal representation $A' = \{a'_0, a'_1, \dots, a'_n\}$ is obtained.

(3) Knowledge graph framework construction stage

The knowledge graph framework construction stage comprises four steps:

Step 1: Convert the existing attributes into attribute nodes and the calculated similarities into similarity nodes, and add them to the social attribute knowledge graph $G_s = (V_s, E_s)$, so that $V = V_s + V_a + V_{sim}$.

Step 2: The nodes in the knowledge graph are divided by node characteristics into three main categories: attribute nodes, user nodes and similarity nodes. A connection between an attribute node and a user node is called an attribute connection $re(v_i, a_{ij})$; a connection between two user nodes is called a user connection $re(v_i, v_j)$; a connection between a user node and a similarity node is called a similarity connection $re(v_i, s_m)$; and a connection between an attribute node and a similarity node is called an attribute similarity connection $re(a_{ij}, s_m)$. Here $s_m \in S$, $S = \{S_{AA}, S_{PA}, S_{RA}, S_{CN}\}$.
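The four connection types in Step 2 are determined entirely by the categories of the two endpoints, which the following sketch encodes as a lookup. Treating any other pairing (for example attribute-attribute) as an error is an assumption, since the text does not define such connections; the function name is hypothetical.

```python
def connection_type(kind_x, kind_y):
    """Classify a connection by the categories of its two endpoints.

    Node kinds are "user", "attribute" or "similarity" (the three node classes).
    The order of the endpoints does not matter.
    """
    pair = frozenset((kind_x, kind_y))
    types = {
        frozenset({"user"}): "user connection",                                     # re(v_i, v_j)
        frozenset({"user", "attribute"}): "attribute connection",                   # re(v_i, a_ij)
        frozenset({"user", "similarity"}): "similarity connection",                 # re(v_i, s_m)
        frozenset({"attribute", "similarity"}): "attribute similarity connection",  # re(a_ij, s_m)
    }
    if pair not in types:
        raise ValueError(f"no connection type defined between {kind_x} and {kind_y}")
    return types[pair]


print(connection_type("attribute", "user"))  # attribute connection
```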

Step 3: Based on the attributes of the nodes and the connections between them, the connections in the social knowledge graph are classified and labeled with corresponding attribute tags. According to the category of the attribute, attribute connections are further refined into gender attribute connections, professional attribute connections, hobby attribute connections, and so on.
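A minimal sketch of the refinement in Step 3, assuming each attribute node carries its category as a `(category, value)` pair; the label strings and the helper name `refine_attribute_connection` are illustrative only.

```python
def refine_attribute_connection(user, attribute_node):
    """Label re(v_i, a_ij) by the category of a_ij, e.g. 'gender attribute connection'.

    attribute_node is assumed to be a (category, value) pair such as ("gender", "female").
    """
    category, _value = attribute_node
    return (user, attribute_node, f"{category} attribute connection")


attribute_edges = [
    ("u1", ("gender", "female")),
    ("u1", ("profession", "teacher")),
    ("u2", ("hobby", "chess")),
]
labelled = [refine_attribute_connection(u, a) for u, a in attribute_edges]
for edge in labelled:
    print(edge[2])  # gender / profession / hobby attribute connection
```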

Step 4: After the connections are labeled, the connection attributes are combined with $W$ and added to the social attribute network to obtain the graph-data-based social network privacy attack framework $G = (V, E, W)$.
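Step 4 leaves $W$ loosely defined. The sketch below assumes $W$ maps each connection to its label and assembles $G = (V, E, W)$ from labeled connections; the function name and this reading of $W$ are hypothetical.

```python
def build_privacy_graph(labelled_edges):
    """Assemble G = (V, E, W) from (x, y, label) triples.

    Assumption: W maps each connection (x, y) to its label; the text does not define W further.
    """
    V, E, W = set(), [], {}
    for x, y, label in labelled_edges:
        V.update((x, y))    # endpoints of every connection become nodes
        E.append((x, y))
        W[(x, y)] = label
    return V, E, W


V, E, W = build_privacy_graph([
    ("u1", "u2", "user connection"),
    ("u1", ("city", "Nanjing"), "city attribute connection"),
    ("u1", ("sim", "CN", "u1", "u2"), "similarity connection"),
])
print(len(V), len(E))  # 4 nodes, 3 connections
```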

The foregoing has shown and described the basic principles and main features of the present application and the advantages of the present application. It will be understood by those skilled in the art that the present application is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present application, and various changes and modifications may be made without departing from the spirit and scope of the application, which is defined in the appended claims.

Claims

1. A social network privacy attack framework based on graph data, characterized in that: the attack framework comprises user nodes, user attribute nodes, similarity nodes, and connection categories for user social relations, attribute relations and similarity relations; by integrating user privacy and attribute information in a social network, the framework constructs a social attribute network with a graph data topology, creates user nodes from the identity relations of users, represents the social relations of the users in the social network in a knowledge graph, constructs attribute nodes in the knowledge graph from user attributes, and finally classifies the connections between nodes through a classification mechanism to achieve fine-grained user relations.

2. A method of constructing the attack framework according to claim 1, comprising the steps of: first, counting the nodes and node relations in a social network; then quantifying user attributes in the social network through attribute collection, constructing user nodes and attribute nodes with a knowledge graph, and calculating the network complexity and degree characteristics of each node; and finally integrating the information to classify node connections, forming a graph data representation framework of the social network information.

3. The attack framework construction method according to claim 2, further comprising: constructing identity nodes and attribute nodes for the user relationships and user attributes in the social network, and calculating the degree index of each user node and the similarity indices between that user node and other nodes.

4. A method of constructing an attack framework according to claim 3, wherein: the similarity index includes one of CN, AA, PA, RA or a combination thereof.

5. The method for constructing an attack framework according to claim 4, wherein calculating the degree index of a user node and the similarity indices between that user node and other nodes comprises the following:

Let $X$ and $Y$ be user nodes, $N(X)$ the set of neighbors of node $X$, and $k$ the degree of a node;

$CN(X,Y) = |N(X) \cap N(Y)|$

$PA(X,Y) = |N(X)| \times |N(Y)|$

and constructing attribute nodes according to the four local similarity indexes, adding the attribute nodes into a set of the previous attribute nodes, and constructing an attribute set of the knowledge graph.
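For reference, the four local indices referred to in this claim can be written together as follows. CN and PA are as given above; the AA (Adamic-Adar) and RA (resource allocation) expressions are the standard link-prediction definitions and are assumed, not quoted, since the claim does not reproduce them. Here $k_z = |N(z)|$.

```latex
\begin{aligned}
\mathrm{CN}(X,Y) &= |N(X) \cap N(Y)| \\
\mathrm{AA}(X,Y) &= \sum_{z \in N(X) \cap N(Y)} \frac{1}{\log k_z} \\
\mathrm{PA}(X,Y) &= |N(X)| \times |N(Y)| = k_X \, k_Y \\
\mathrm{RA}(X,Y) &= \sum_{z \in N(X) \cap N(Y)} \frac{1}{k_z}
\end{aligned}
```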

6. A method of constructing an attack framework according to claim 3, wherein: for the constructed and screened user identity nodes and attribute nodes, the attribute nodes are first classified at fine granularity into several major categories including residence, gender and occupation; the corresponding attribute connections are then marked according to the classified attribute nodes and given the same attribute characteristics as those attribute nodes; and a connection between two user nodes is given the connection type user connection.

7. A method of constructing an attack framework according to claim 3, comprising the steps of:

1) First, user and attribute information in the social network is collected to obtain the social network $G_s = (V_s, E_s)$, where $V_s$ is the set of user nodes and $E_s$ is the set of user relationship connections; the user attribute set $A$ is obtained at the same time;

2) Fine-grained classification is performed on the user attribute set to obtain $A = \{a_0, a_1, a_2, \dots, a_n\}$, where $a_0, a_1, \dots, a_n$ are the attribute category matrices; the attribute matrix is converted into a 0/1 matrix $A = [\alpha_0, \alpha_1, \dots, \alpha_n]$, where $n$ denotes the number of attribute nodes and each $\alpha_i$ is a 0/1 vector whose entry is 1 if the user node has the attribute and 0 otherwise;

3) From the attribute set $A$, its formal representation $A' = \{a'_0, a'_1, \dots, a'_n\}$ is obtained;

4) From the attribute nodes and user relationship nodes, the relationship degree $k_u$ and attribute degree $k_a$ of each user are calculated; the relationship degree is the number of user nodes connected to a node, and the attribute degree is the number of attribute nodes connected to a node;

5) The AA, CN, PA and RA similarity indices of each existing relationship connection $re(v_i, v_j)$ and attribute connection $re(v_i, a_{ij})$ are calculated; the similarity indices are converted into information that the attribute nodes can represent and added to the social attribute network, the calculated similarity being denoted by $s$.

8. The attack framework construction method according to claim 2, further comprising: creating user nodes and user attribute nodes from the user and attribute information in the social network to form a social attribute network; and, on this basis, creating similarity nodes from the computed similarity indices and adding them to the social attribute network.

9. The attack framework construction method according to claim 3 or 8, characterized in that: the attribute node classification and the connection classification comprise the following steps:

Step 1: First, a social attribute knowledge graph $G_s$ is constructed from the existing social network information, from which the attribute set $A = \{a_0, a_1, a_2, \dots, a_n\}$, the 0/1 attribute matrix $A = [\alpha_0, \alpha_1, \dots, \alpha_n]$ and the formal representation of the attribute set $A' = \{a'_0, a'_1, \dots, a'_n\}$ are obtained;

Step 2: The attribute nodes are classified according to the attribute set $A = \{a_0, a_1, a_2, \dots, a_n\}$ and its formal representation $A' = \{a'_0, a'_1, \dots, a'_n\}$; the attribute nodes are divided by attribute category using machine learning models such as SVM and Random Forest;

Step 3: The nodes in the knowledge graph are divided by node characteristics into three main categories: attribute nodes, user nodes and similarity nodes; a connection between an attribute node and a user node is called an attribute connection $re(v_i, a_{ij})$, a connection between two user nodes is called a user connection $re(v_i, v_j)$, a connection between a user node and a similarity node is called a similarity connection $re(v_i, s_m)$, and a connection between an attribute node and a similarity node is called an attribute similarity connection $re(a_{ij}, s_m)$;

Step 4: Based on the attributes of the nodes and the connections between them, the connections in the social knowledge graph are classified and labeled with corresponding attribute tags; attribute connections are further refined by category into connections including gender attribute connections, professional attribute connections and hobby attribute connections;

Step 5: After the connections are labeled, the connection attributes are combined with $W$ and added to the social attribute network to obtain the graph-data-based social network privacy attack framework $G = (V, E, W)$.

10. Use of the graph-data-based social network privacy attack framework according to claim 1 in security analysis of identity privacy and attribute privacy, or in simulating machine learning attack methods including similarity attacks and graph embedding.