CN118964962B - A social network user identification method and device based on multi-dimensional social fingerprint - Google Patents

Info

Publication number
CN118964962B
Authority
CN
China
Prior art keywords
social
user
fingerprint
users
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411441385.5A
Other languages
Chinese (zh)
Other versions
CN118964962A (en)
Inventor
贾伟雯
冉义军
肖婧
许小可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Campus Of Beijing Normal University
Original Assignee
Zhuhai Campus Of Beijing Normal University
Application filed by Zhuhai Campus Of Beijing Normal University
Priority to CN202411441385.5A
Publication of CN118964962A
Application granted
Publication of CN118964962B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a social network user identification method and device based on multi-dimensional social fingerprints, belonging to the technical field of user identification. Social network data are collected and preprocessed, the multi-dimensional social fingerprint features of each user are extracted, and a multi-dimensional social fingerprint is constructed. The multi-dimensional social fingerprint distances are then input as a data set into a model for training and testing, to determine whether two users are the same person and thereby identify an unknown user. The method and device can improve the accuracy and effectiveness of user identification.

Description

Social network user identification method and device based on multidimensional social fingerprint
Technical Field
The invention relates to the technical field of user identification, in particular to a social network user identification method and device based on multidimensional social fingerprints.
Background
With the rapid development of social networks, social network platforms have accumulated a great deal of user interaction data. These data include not only the personal information of users, but also the interactions between users, posted content, comments, likes, and so on. This rich data gives social network platforms deep insight into their users and brings new opportunities for information dissemination and sharing. However, it also brings a series of problems, such as forged user identities, a flood of fake accounts, and an increase in malicious activity. These problems not only affect the healthy development of social networks, but also threaten the privacy and security of users.
To cope with these problems, user identification is a key technology. An effective user identification method can distinguish real users from fake accounts and identify social bot users, thereby improving the security and credibility of a social network. Existing social network user identification methods cover a variety of approaches, including graph-theory-based algorithms, machine learning techniques, and crowdsourcing strategies. Although each of these methods has its own strengths, each also has limitations. For example, graph-theory algorithms are good at discovering the social relationships of users in a social network, but an attacker may evade recognition by forging a user's social relationship graph; in addition, this approach relies on a static relationship graph and cannot capture the dynamic evolution of user relationships. Machine learning techniques can extract the behavior patterns of social network users from massive user behavior data in order to train a recognition model and improve recognition efficiency and accuracy. However, this approach can be affected by data quality, the complexity of user behavior, and dynamic changes in the social network. Crowdsourcing strategies have advantages in data accuracy and reliability, but deploying them in large-scale networks can raise cost and privacy concerns.
Across these recognition algorithms, extracting effective features is the key to social network user identification. Effective features must accurately capture the behavior patterns of social network users; however, because the behaviors and strategies of social network users evolve continuously, feature engineering becomes more difficult.
At present, graph-theory-based methods show a certain effectiveness in social network user identification, but most of them still need to collect many kinds of user feature data. This not only increases the complexity of data collection; most of the collected features also struggle to capture the dynamic evolution of user relationships in a social network. Therefore, when extracting features that can accurately identify social network users, the applicability of those features across different social networks needs to be explored to address the current challenges.
For example, patent 202010814842.6 discloses a method, apparatus, device, and storage medium for user identification in a social network that rely on graph structure data for analysis and identification. The method uses propagation data in a social network to represent the connections among users, generates graph structure data from the propagation data, then uses the graph structure data to determine attribute data of each user, such as node degree, closeness centrality, and betweenness centrality, and finally identifies key users and sets corresponding labels according to the attribute data. However, this approach relies primarily on static graph structure data and does not adequately capture dynamic changes in user relationships. Because the behavior and relationships of users in social networks are constantly changing, a static graph structure may not reflect these changes, which affects the accuracy of recognition. As another example, patent 202210429087.9 discloses an associated-user identification method based on social network topology. It collects data sets of two social networks, including the users and the friend relationships between users, and constructs topological graphs of the social networks G-1 and G-2 from those friend relationships. A set of node sequences is generated by random walks and mapped into a feature space by a skip-gram model, which learns a low-dimensional vector representation of each node. A target feature mapping matrix is obtained through training, the two feature spaces are mapped into the same space, the similarity between nodes is calculated, and the associated users are identified.
However, that method mainly relies on a static topological graph of the social network and calculates node features and user relevance from the graph structure, so it also cannot fully capture dynamic changes in user behaviors and relationships. Furthermore, mapping node sequences into a feature space with a skip-gram model and training a target feature mapping matrix typically requires a significant amount of computational resources and training time. This demand for computing resources may limit the scalability of the method, especially for real-time applications in a large-scale social network.
Therefore, how to provide a social network user identification method and device based on multi-dimensional social fingerprints, which can solve the above problems, is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a social network user identification method and device based on multidimensional social fingerprints, which construct more unique and tamper-resistant characteristics to cope with dynamically-changed social network environments, thereby improving the accuracy and effectiveness of user identification.
In order to achieve the above object, the present invention provides the following technical solutions:
a social network user identification method based on multi-dimensional social fingerprints comprises the following steps:
S100, collecting social network data;
S200, constructing a social interaction network based on the social network data;
S300, extracting multidimensional social fingerprint characteristics of a user based on a social interaction network;
S400, constructing a multi-dimensional social fingerprint based on the multi-dimensional social fingerprint features;
S500, calculating multi-dimensional social fingerprint distances among users, taking the social fingerprint distance between each unknown user and the known user as a multi-dimensional social fingerprint distance characteristic, classifying and labeling the affiliated user pairs of the multi-dimensional social fingerprint distance characteristic, and generating label sample data of the same user and different users;
S600, constructing an XGBoost recognition model, and inputting the multi-dimensional social fingerprint distance features into the XGBoost recognition model for training and verification;
S700, performing user identification with the trained XGBoost recognition model, using the social fingerprint distance features of the unknown user and the known users.
Preferably, S200 includes:
Extracting, from the social network data, a one-hop individual interaction graph consisting of a user and the one-hop neighbors directly connected to the user, and a two-hop individual interaction graph consisting of the user, the one-hop neighbors of the user, and the two-hop neighbors indirectly connected to the user through the one-hop neighbors;
And constructing the social interaction network based on the two-hop individual interaction graph.
Preferably, S300 includes:
and extracting multidimensional social fingerprint features between the user and contacts corresponding to the user based on the social interaction network.
Preferably, the multi-dimensional social fingerprint features comprise the interaction times, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, continuous call proportion within a preset time, and user activity diversity features.
Preferably, S400 includes:
Arranging the extracted values of each multi-dimensional social fingerprint feature in descending order, calculating and returning the proportion of each contact's value under each feature to the total value of that feature, and thereby constructing the multi-dimensional social fingerprint of the user in a preset time interval.
Preferably, S500 includes:
dividing the social network data evenly in time into different periods, wherein the users in the first period are known users and the users in the other periods are unknown users;
Sequentially calculating social fingerprint distances between each unknown user in other periods and all known users in the first period, and taking the social fingerprint distances as multidimensional social fingerprint distance characteristics;
If the true labels of the known user and the unknown user belong to the same user, marking the multi-dimensional social fingerprint distance characteristic between the two users as 1, and if the two users are not the same user, marking the multi-dimensional social fingerprint distance characteristic as 0, so as to generate label sample data of the same user and different users.
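As a minimal sketch of the period division in S500, the following assumes that each interaction record is a `(sender, receiver, timestamp)` tuple with a numeric timestamp and that "evenly in time" means windows of equal length. The function name `split_into_periods` and the record layout are illustrative assumptions, not taken from the patent.

```python
from collections import defaultdict

def split_into_periods(records, num_periods):
    """Divide (sender, receiver, timestamp) records into equal-length time periods."""
    times = [t for _, _, t in records]
    start, end = min(times), max(times)
    span = (end - start) / num_periods
    periods = defaultdict(list)
    for sender, receiver, t in records:
        # Clamp so the latest record falls in the last period, not one past it.
        index = min(int((t - start) / span), num_periods - 1) if span > 0 else 0
        periods[index].append((sender, receiver, t))
    return [periods[i] for i in range(num_periods)]

records = [("a", "b", 0), ("b", "a", 10), ("a", "c", 55), ("c", "a", 100)]
periods = split_into_periods(records, 2)
# Users observed in the first period are treated as known; later periods hold unknown users.
known_period, unknown_periods = periods[0], periods[1:]
```

In this sketch the first window supplies the known users and every later window supplies unknown users to be matched against them.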
Preferably, S600 includes:
dividing the label sample data into a training set, a verification set and a test set;
Inputting the multi-dimensional social fingerprint distance features between users in the training set, together with the preset labels in the training set, into the XGBoost identification model for training;
Determining the optimal parameters of the XGBoost identification model using the validation set, resulting in the final XGBoost identification model.
An electronic device includes one or more processors and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a social network user identification method based on multi-dimensional social fingerprints.
Compared with the prior art, the invention discloses a social network user identification method and device based on multi-dimensional social fingerprints, which process the user data on a social network to capture the dynamic multi-dimensional social fingerprint features of users. Multi-dimensional social fingerprints are constructed in the network, the social fingerprint distances between users are calculated, and the resulting distance feature data set is used for classification so as to identify users. The method can accurately identify social network users by extracting only the interaction information in the user network, and the constructed multi-dimensional social fingerprint effectively reflects the long-term, unique interaction behavior of a user in the social network. The method overcomes the difficulty that existing social network user identification techniques based on static information have in adapting to a dynamic environment, and significantly improves the temporal stability of the algorithm. The method is therefore more convenient and efficient to apply in a practical system.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a social network user identification method based on multi-dimensional social fingerprints;
FIG. 2 is a flowchart of the multi-dimensional social fingerprint construction provided by the invention;
FIG. 3 is a flowchart of the multi-dimensional social fingerprint feature calculation provided by the invention;
FIG. 4 is a flow chart for constructing social fingerprints based on feature values provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a social network user identification method and equipment based on multidimensional social fingerprints, wherein the method comprises the following steps: the method comprises the steps of collecting social network data and preprocessing, extracting multi-dimensional social fingerprint characteristics of users, constructing multi-dimensional social fingerprints, calculating social fingerprint distances among the users, inputting the multi-dimensional social fingerprint distance characteristics as a data set into a model for training test, and determining whether the users belong to the same person, so that the identity of an unknown user is identified, and a solution is provided for the problem of poor accuracy and effectiveness of the existing user identification.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to fig. 1, on one hand, the embodiment of the invention discloses a social network user identification method based on multidimensional social fingerprints, which comprises the following steps:
S100, collecting social network data, namely manually collecting the interaction data of some accounts on the social platform.
S200, constructing a social interaction network based on the social network data: if a user node u interacts with another user node v, a directed edge from u to v is established. In this network, each directed edge is given a weight whose value is determined by the number of interactions between node u and node v.
S300, extracting the multi-dimensional social fingerprint features of a user based on the social interaction network; specifically, based on the social interaction network of the user and the user's contacts, extracting the interaction times, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, continuous call proportion within 5 minutes, and user activity diversity features between the user and each contact.
S400, constructing the multi-dimensional social fingerprint based on the multi-dimensional social fingerprint features.
S500, calculating the distances between the multi-dimensional social fingerprints of users, taking the social fingerprint distance between each unknown user and each known user as a multi-dimensional social fingerprint distance feature, classifying and labeling the user pairs to which the distance features belong, and generating label sample data for same-user and different-user pairs.
S600, constructing an XGBoost identification model, and inputting the multi-dimensional social fingerprint distance features into the XGBoost identification model for training and verification;
S700, performing user identification with the trained XGBoost identification model, using the social fingerprint distance features of the unknown user and the known users.
Specifically, in one embodiment, the gathering of social network data in step S100 includes:
The Email data set of a large research institution covers all incoming and outgoing Email communication records between members and their contacts. The data set is reduced to members, contacts, and time stamps, and spans from October 2003 to March 2005, for a total of 17 months.
In a specific embodiment, the constructing a social interaction network in step S200 includes:
The constructed social interaction network is a directed weighted network of the form $G = (V, E, W)$, where $V$ is the set of nodes in the network; $E$ is the set of edges in the network, typically represented as ordered pairs (e.g., $(u, v)$ represents an edge from node u to node v); and $W$ is the set of weights in the network, representing the weight of each edge, typically expressed as a function $W: E \to \mathbb{R}$ (the set of real numbers), i.e., each edge $(u, v) \in E$ has a corresponding weight $w_{uv}$.
Specifically, the rule for constructing the social interaction network in this step is as follows: based on the social network interaction data collected in step S100, each user is abstracted into a node u in the graph, and the interaction relationships between user nodes are converted into edges in the network. If a user node u interacts with another user node v, a directed edge from u to v is established. In this way, all the users collected in step S100 are converted into nodes of the network, and the corresponding directed edges are formed according to the interaction relationships between them. The strength of the interaction relationship also needs to be considered when constructing the directed weighted network. If a user node u has multiple interactions with another user node v within a certain time, this directed edge can be given a higher weight. The weight $w_{uv}$ then represents the interaction strength from user node u to user node v. Finally, a complete social interaction network is constructed, which accurately reflects the interaction relationships between users.
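The construction rule described here can be sketched in a few lines. This is an illustrative sketch only: it assumes records of the form `(sender, receiver, timestamp)` and takes the weight of a directed edge to be the count of interactions sent along it; the function name `build_interaction_network` is hypothetical.

```python
from collections import Counter

def build_interaction_network(records):
    """Return {(u, v): weight}, where weight counts interactions sent from u to v."""
    weights = Counter()
    for sender, receiver, _timestamp in records:
        # Each interaction adds one to the weight of the directed edge sender -> receiver.
        weights[(sender, receiver)] += 1
    return dict(weights)

records = [
    ("alice", "bob", 1),
    ("alice", "bob", 5),
    ("bob", "alice", 7),
    ("alice", "carol", 9),
]
network = build_interaction_network(records)
```

Storing the network as a dictionary keyed by ordered pairs keeps the direction of each edge explicit, so the edge from alice to bob and the edge from bob to alice carry separate weights.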
In a specific embodiment, extracting the multi-dimensional social fingerprint construction feature of the user in step S300 includes:
The definition of the interaction times is as follows:
$$a + b;$$
Wherein $a$ represents the number of interactions initiated by user u to user v, and $b$ represents the number of interactions initiated by user v to user u.
The definition of the interactive reciprocity features is as follows:
;
The definition of the topological overlap feature is:
$$O_{uv} = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|};$$
Wherein $N(u)$ represents the set of neighbors of user u, $N(v)$ represents the set of neighbors of user v, $|N(u) \cap N(v)|$ represents the number of common neighbors of user u and user v, and $|N(u) \cup N(v)|$ represents the total number of neighbors of user u and user v.
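As an illustration of the topological overlap feature, the sketch below assumes it is the ratio of the number of common neighbors to the total number of distinct neighbors of the two users. The `neighbors` mapping and the function name are hypothetical, and whether u and v are themselves counted as neighbors of each other is left exactly as the input provides it.

```python
def topological_overlap(neighbors, u, v):
    """Ratio of common neighbors of u and v to all distinct neighbors of u and v."""
    n_u = neighbors.get(u, set())
    n_v = neighbors.get(v, set())
    union = n_u | n_v
    if not union:
        return 0.0  # Neither user has any neighbors, so there is nothing to overlap.
    return len(n_u & n_v) / len(union)

neighbors = {"u": {"v", "a", "b"}, "v": {"u", "a", "c"}}
overlap = topological_overlap(neighbors, "u", "v")
```

Here the only common neighbor is `a`, and the union contains five distinct nodes, so the overlap is one fifth.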
The definition of the connection diversity feature is:
;
Wherein the two quantities in the formula represent the number of neighbors of user u and the number of neighbors of user v, respectively.
Specifically, the total number of motifs is defined from the number of the i-th fourth-order motif of user u, summed over the motif types i.
The definition of the contact age feature is:
$$A_{uv} = T_{end} - t_{uv}^{first};$$
Wherein $T_{end}$ represents the last moment of the current period, and $t_{uv}^{first}$ represents the moment of the first interaction of user u with user v.
The definition of the relative freshness profile is:
;
Wherein the first quantity is the time of the last interaction between user u and user v before the end of the current period, and the second quantity is the average inter-event time between interactions.
The continuous call proportion feature within 5 minutes is defined as:
;
Wherein $t_n$ represents the time of the $n$-th interaction of user u with user v in the current period.
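One plausible reading of the continuous call proportion within 5 minutes is the share of gaps between successive interactions that are no longer than 300 seconds. The sketch below implements that reading as an assumption: the exact formula is not recoverable from the text, and the function name, the use of seconds, and the choice of the number of gaps as the denominator are all illustrative.

```python
def consecutive_interaction_proportion(timestamps, window_seconds=300):
    """Fraction of gaps between successive interactions no longer than the window."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return 0.0  # A single interaction has no successor to compare against.
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return sum(gap <= window_seconds for gap in gaps) / len(gaps)

# Interaction times (in seconds) between one user and one contact in the current period.
times = [0, 120, 400, 5000, 5100]
proportion = consecutive_interaction_proportion(times)
```

With these timestamps the gaps are 120, 280, 4600, and 100 seconds, so three of the four gaps fall within the window.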
The definition of the user activity diversity features is:
;
Wherein the two quantities in the formula represent the number of connections created by user u and the number of connections created by user v, respectively.
The multi-dimensional social fingerprint features are calculated according to the above formulas, and the calculation process is shown in FIG. 2. In FIG. 2, (a) shows the structure of the social network data; (b) extracts from the social network data a one-hop individual interaction graph (1-IIG), consisting of the user (ego) and the one-hop neighbors directly connected to it, and a two-hop individual interaction graph (2-IIG), consisting of the user (ego), its one-hop neighbors, and the two-hop neighbors indirectly connected through the one-hop neighbors; (c) calculates, from the 1-IIG and the 2-IIG, the multi-dimensional social fingerprint features of the user and the contacts along the strength, time, and structure dimensions, namely the interaction times, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, continuous call proportion within 5 minutes, and user activity diversity features; and (d) arranges the feature values under each feature in descending order of their proportion, thereby constructing the multi-dimensional social fingerprint of the user. The detailed calculation flow of sub-figure (c) in FIG. 2 is shown in FIG. 3, and the detailed construction flow of sub-figure (d) is shown in FIG. 4.
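The extraction of the 1-IIG and 2-IIG node sets can be sketched as follows. This is a simplified illustration that assumes an edge list of `(u, v)` pairs and ignores edge direction when deciding who is a neighbor; both the function name and that undirected treatment are assumptions rather than details stated in the text.

```python
def individual_interaction_graphs(edges, ego):
    """Return (one_hop_nodes, two_hop_nodes) around ego, treating edges as undirected."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    # One-hop neighbors are directly connected to the ego.
    one_hop = adjacency.get(ego, set()) - {ego}
    # Two-hop neighbors are reached only through a one-hop neighbor.
    two_hop = set()
    for neighbor in one_hop:
        two_hop |= adjacency[neighbor]
    two_hop -= one_hop | {ego}
    return one_hop, two_hop

edges = [("ego", "a"), ("ego", "b"), ("a", "c"), ("b", "a"), ("c", "d")]
one_hop, two_hop = individual_interaction_graphs(edges, "ego")
```

The 1-IIG is then the ego together with `one_hop`, and the 2-IIG additionally includes `two_hop`; node `d` is three hops away and belongs to neither.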
In a specific embodiment, constructing the multi-dimensional social fingerprint in step S400 includes:
Let the user set in the social interaction network be $U$, containing $n$ users, i.e., $U = \{u_1, u_2, \ldots, u_n\}$. Likewise, assume a set of time intervals $T$ containing $m$ time intervals, i.e., $T = \{t_1, t_2, \ldots, t_m\}$. For each user $u_i$ and each time interval $t_j$, a feature vector $x_{ij}$ can be extracted from the data, representing the behavior characteristics of user $u_i$ in time interval $t_j$. The contacts of user $u_i$ are arranged in descending order of their values in the feature vector $x_{ij}$, and the proportion of each contact's value to the total value is returned, thereby constructing the social fingerprint $F_{ij}$ of user $u_i$ in time interval $t_j$.
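A minimal sketch of this construction for a single feature, a single user, and a single time interval is shown below. It assumes the feature values are given as a mapping from contact to value; the function name `social_fingerprint` and the sample numbers are illustrative.

```python
def social_fingerprint(contact_values):
    """Rank contacts by feature value (descending) and return each one's share of the total."""
    total = sum(contact_values.values())
    if total == 0:
        return []  # No interactions recorded, so no fingerprint can be formed.
    ranked = sorted(contact_values.values(), reverse=True)
    return [value / total for value in ranked]

# Hypothetical interaction counts between one user and three contacts in one interval.
fingerprint = social_fingerprint({"bob": 6, "carol": 3, "dave": 1})
```

The result is a rank-ordered distribution that sums to one. Because it keeps only the ranks and shares, not the identities of the contacts, two fingerprints can be compared even when the underlying contacts differ.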
In a specific embodiment, calculating the social fingerprint distance between users in step S500 includes:
The Jensen-Shannon divergence (JSD) is used to measure the social fingerprint distance. The self-distance $d_{self}$ between the social fingerprints of the same user in different periods, and the reference distance $d_{ref}$ between each user and all other users in different periods, are calculated.
Wherein, the social fingerprint distance is defined as:
$$JSD(P_1, P_2) = H\left(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\right) - \tfrac{1}{2}\left[H(P_1) + H(P_2)\right], \qquad H(P) = -\sum_{r=1}^{k} p(r)\log p(r), \qquad p(r) = \frac{x_r}{X};$$
Wherein $P_1$ and $P_2$ are the social fingerprints of two users, $H$ is the Shannon entropy, $k$ is the rank of the last contact after sorting and is equal in value to the total number of contacts, $x_r$ is the feature value of the contact ranked $r$, and $X$ is the sum of the feature values of all contacts.
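The Jensen-Shannon divergence between two rank-ordered fingerprints can be sketched as below. Two choices here are assumptions not stated in the text: the natural logarithm is used for the entropy, and a shorter fingerprint is padded with zeros so that both have the same number of ranks.

```python
import math

def shannon_entropy(p):
    """Shannon entropy with the natural logarithm; zero entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jensen_shannon_divergence(p1, p2):
    """JSD between two rank-ordered fingerprints; the shorter one is padded with zeros."""
    k = max(len(p1), len(p2))
    a = list(p1) + [0.0] * (k - len(p1))
    b = list(p2) + [0.0] * (k - len(p2))
    mixture = [(x + y) / 2 for x, y in zip(a, b)]
    return shannon_entropy(mixture) - (shannon_entropy(a) + shannon_entropy(b)) / 2

same = jensen_shannon_divergence([0.6, 0.3, 0.1], [0.6, 0.3, 0.1])
disjoint = jensen_shannon_divergence([1.0], [0.0, 1.0])
```

Identical fingerprints give a distance of zero, while two fingerprints that place all their weight on different ranks give the maximum value, which is log 2 under the natural logarithm.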
Specifically, the self-distance $d_{self}$ between the social fingerprints of a user $u_i$ in different time intervals is calculated first. Second, the reference distance $d_{ref}$ between the social fingerprint of user $u_i$ in a given time interval and the social fingerprints of other users $u_j$ is calculated, where $i \neq j$. In this embodiment, $d_{self}$ is labeled 1, representing the same user, and $d_{ref}$ is labeled 0, representing different users.
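The labeling of self-distances and reference distances can be sketched as follows. The sketch assumes that fingerprints are stored per user as a mapping from feature name to fingerprint, and that two entries with the same user key are the same person. The `l1_distance` helper is a stand-in used only to keep the example short; the text uses the Jensen-Shannon divergence.

```python
def label_pairs(known, unknown, distance):
    """known/unknown map user -> {feature: fingerprint}; returns (vector, label) samples."""
    samples = []
    for unknown_user, unknown_fp in unknown.items():
        for known_user, known_fp in known.items():
            # One distance per feature gives the multi-dimensional distance vector.
            vector = [distance(known_fp[f], unknown_fp[f]) for f in sorted(known_fp)]
            label = 1 if known_user == unknown_user else 0
            samples.append((vector, label))
    return samples

def l1_distance(p, q):
    # Stand-in distance for illustration only; not the measure used in the text.
    return sum(abs(x - y) for x, y in zip(p, q))

known = {"u1": {"count": [0.7, 0.3]}, "u2": {"count": [0.5, 0.5]}}
unknown = {"u1": {"count": [0.6, 0.4]}}
samples = label_pairs(known, unknown, l1_distance)
```

Each unknown user contributes one sample labeled 1 (its self-distance) and one sample labeled 0 for every other known user, so the two classes are typically imbalanced.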
In a specific embodiment, the step S600 of inputting the multi-dimensional social fingerprint distance characteristic data into the XGBoost model for training and testing includes:
Multi-dimensional social fingerprint distance features are extracted from the sample data to form a feature data set, and the data set is divided into a training set, a validation set, and a test set, wherein the ratio of the training set to the validation set is 9:1. The feature data in the training set and the preset labels in the training set are input into the XGBoost model for training, and the validation set is used to determine the optimal parameters. The test set is used to determine the accuracy of the resulting XGBoost recognition model. Finally, the user identity is identified according to the prediction result of the model.
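The data split described here can be sketched with the standard library alone. The 9:1 ratio between the training and validation sets comes from the text; the size of the held-out test set is not stated, so the `test_fraction` default of 0.2 is an assumption, as are the function name and the fixed random seed.

```python
import random

def split_samples(samples, test_fraction=0.2, seed=0):
    """Hold out a test set, then split the remainder 9:1 into training and validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    # One tenth of the remaining samples goes to validation, nine tenths to training.
    n_val = len(rest) // 10
    validation, training = rest[:n_val], rest[n_val:]
    return training, validation, test

samples = list(range(100))
training, validation, test = split_samples(samples)
```

The training and validation sets would then be passed to the XGBoost model for fitting and parameter selection, with the test set kept aside for the final accuracy measurement.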
Specifically, in this embodiment, the social fingerprint distance calculated for each user pair under examination is used as the feature value. For an unknown user, the social fingerprint distance between the unknown user and each known user in the known user set is calculated, so as to determine the identity of the unknown user.
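One way to turn these pairwise distances into an identity is to score every known user and keep the best match. The sketch below assumes that selection rule, which the text does not spell out. The `same_user_score` callable stands in for the trained model's same-user score, and both it and the stand-in `l1_distance` are hypothetical.

```python
def identify(unknown_fp, known_fps, distance, same_user_score):
    """Return the known user whose distance vector to the unknown user scores highest."""
    best_user, best_score = None, float("-inf")
    for user, known_fp in known_fps.items():
        vector = [distance(known_fp[f], unknown_fp[f]) for f in sorted(known_fp)]
        score = same_user_score(vector)
        if score > best_score:
            best_user, best_score = user, score
    return best_user, best_score

def l1_distance(p, q):
    # Stand-in distance for illustration only; the text uses Jensen-Shannon divergence.
    return sum(abs(x - y) for x, y in zip(p, q))

def toy_score(vector):
    # Hypothetical scorer: smaller total distance means a more likely match.
    return -sum(vector)

known_fps = {"u1": {"count": [0.7, 0.3]}, "u2": {"count": [0.5, 0.5]}}
unknown_fp = {"count": [0.68, 0.32]}
best_user, best_score = identify(unknown_fp, known_fps, l1_distance, toy_score)
```

In a full pipeline the scorer would be the predicted same-user probability from the trained classifier rather than a negated distance.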
More specifically, the method extracts multi-dimensional social fingerprint features from the constructed social interaction network, calculates the social fingerprint distance between an unknown user and each known user, and inputs the social fingerprint distances as features into an XGBoost model for user identification. The multi-dimensional social fingerprint is formed over a certain period of time, covers the strength, structure, and time characteristics of the network, and provides clues for understanding the evolution of the overall network structure. Analyzing the network through multi-dimensional social fingerprints yields finer granularity and a more accurate grasp of the structure and dynamics of the network, which addresses the difficulty that existing social network user identification techniques based on static information have in handling a dynamic network environment. In addition, the selection and extraction of multi-dimensional social fingerprint features are generally closely tied to the specific task and data set, so they can be optimized and adjusted for different application scenarios. This flexibility allows the algorithm to adapt well to different data and tasks, thereby improving its scalability.
In another aspect, an embodiment of the invention further discloses an electronic device comprising one or more processors and a storage device, wherein the storage device stores one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the social network user identification method based on multi-dimensional social fingerprints.
Specifically, with this technical scheme, in a social interaction network constructed from email data, the extracted multi-dimensional social fingerprint features accurately capture the interaction information in the network. This information reflects the interactions and relationships among nodes, so that the unique interaction pattern of each social network user can be identified and the accuracy of social network user identification is improved. Because the multi-dimensional social fingerprint features are flexible and general, users can still be identified well when the method is transferred to data from other social platforms. With improved identification accuracy, the method can also be used to identify social bots in the network, helping to prevent malicious bots from launching attacks, spreading false information or committing fraud on social network platforms, and thereby improving the security of the social network.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by referring to one another. Because the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for identifying social network users based on multi-dimensional social fingerprints, characterized by comprising the following steps:
S100: collecting social network data;
S200: constructing a social interaction network based on the social network data;
S300: extracting multi-dimensional social fingerprint features of users based on the social interaction network;
S400: constructing a multi-dimensional social fingerprint based on the multi-dimensional social fingerprint features, including:
arranging the values of each extracted multi-dimensional social fingerprint feature in descending order, calculating and returning, for each feature, the ratio of each contact's value to the total feature value under that feature, and thereby constructing the multi-dimensional social fingerprint of the user within a preset time interval;
wherein constructing the multi-dimensional social fingerprint in step S400 includes:
the social interaction network has a user set U containing n users, that is, U = {u_1, u_2, …, u_n}; there is also a set T of time intervals containing m time intervals, that is, T = {t_1, t_2, …, t_m}; for each user u_i and each time interval t_j, a feature vector x(u_i, t_j) is extracted from the data to represent the behavioral characteristics of user u_i in time interval t_j; the contacts of user u_i are arranged in descending order of their values in the feature vector x(u_i, t_j), and the proportion of each contact's value to the total value is returned, thereby constructing the social fingerprint P(u_i, t_j) of user u_i in time interval t_j;
S500: calculating the multi-dimensional social fingerprint distance between users, using the social fingerprint distance between each user to be identified and the known users as a multi-dimensional social fingerprint distance feature, and classifying and labeling the user pairs to which the multi-dimensional social fingerprint distance features belong, to generate labeled sample data for same-user and different-user pairs, including:
dividing the social network evenly by time into different periods, wherein users in the first period are treated as known users and users in the other periods are treated as unknown users;
calculating in turn the social fingerprint distance between each unknown user in the other periods and all known users in the first period, and using it as the multi-dimensional social fingerprint distance feature;
wherein the true label of each unknown user is known: if the known user and the unknown user are in fact the same user, the multi-dimensional social fingerprint distance feature between the two users is labeled 1; if they are not the same user, the feature is labeled 0, thereby generating labeled sample data for same-user and different-user pairs;
wherein calculating the social fingerprint distance between users in step S500 includes:
using the Jensen-Shannon divergence (JSD) to measure the distance between social fingerprints; calculating the self-distance d_self between a user's social fingerprints in different periods, and the reference distance d_ref between every user and the other users in different periods;
wherein the social fingerprint distance is defined as:
$$d(P_1, P_2) = H\left(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\right) - \tfrac{1}{2}\left[H(P_1) + H(P_2)\right], \qquad H(P) = -\sum_{r=1}^{k} p(r)\log p(r), \qquad p(r) = \frac{x(r)}{X}$$
where P_1 and P_2 are the social fingerprints of two users, H(P) is the Shannon entropy, k is the rank of the last contact after sorting, numerically equal to the total number of contacts; x(r) is the feature value of the contact ranked r, and X is the sum of the feature values of all contacts;
the specific process is: first, calculating the self-distance d_self(u_i) between the social fingerprints of user u_i in different time intervals; second, calculating the reference distance d_ref(u_i, u_j) between the social fingerprint of user u_i in a given time interval and the social fingerprints of other users, where u_j ∈ U − {u_i};
S600: building an XGBoost recognition model, and inputting the multi-dimensional social fingerprint distance features into the XGBoost recognition model for training and validation;
S700: using the trained XGBoost recognition model to identify users from the social fingerprint distance features between the user to be identified and the known users.

2. The method for identifying social network users based on multi-dimensional social fingerprints according to claim 1, characterized in that S200 comprises:
extracting, from the social network data, a one-hop individual interaction graph in which a user is directly connected to its one-hop neighbors, and an individual interaction graph of the user, its one-hop neighbors, and the two-hop neighbors indirectly connected through the one-hop neighbors;
constructing the social interaction network based on the two-hop individual interaction graph.

3. The method for identifying social network users based on multi-dimensional social fingerprints according to claim 1, characterized in that S300 comprises:
extracting, based on the social interaction network, multi-dimensional social fingerprint features between a user and the user's corresponding contacts.

4. The method for identifying social network users based on multi-dimensional social fingerprints according to claim 1, characterized in that the multi-dimensional social fingerprint features include: number of interactions, interaction reciprocity, topological overlap, connection diversity, total number of motifs, contact age, relative freshness, proportion of consecutive calls within 5 minutes, and user activity diversity.

5. The method for identifying social network users based on multi-dimensional social fingerprints according to claim 1, characterized in that S600 comprises:
dividing the labeled sample data into a training set, a validation set and a test set;
inputting the multi-dimensional social fingerprint distance features between users in the training set, together with the preset labels of the training set, into the XGBoost recognition model for training;
determining the optimal parameters of the XGBoost recognition model using the validation set, and generating the final XGBoost recognition model.

6. An electronic device, characterized by comprising: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the method according to any one of claims 1 to 5.
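For illustration only, the two-hop individual interaction graph referred to in the claims might be extracted along the following lines. The sketch treats interactions as an undirected edge list and keeps every edge whose endpoints both lie within two hops of the user; both choices are assumptions, and the function name `two_hop_ego_graph` and the sample edges are hypothetical.

```python
from collections import defaultdict

def two_hop_ego_graph(edges, user):
    """Return (one_hop, two_hop, ego_edges) for `user` in an undirected edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            adj[a].add(b)
            adj[b].add(a)

    one_hop = set(adj[user])
    two_hop = set()
    for nb in one_hop:
        two_hop |= adj[nb]
    two_hop -= one_hop | {user}  # keep only nodes first reached at distance two

    # Edges induced on the user, its one-hop and its two-hop neighbors.
    nodes = one_hop | two_hop | {user}
    ego_edges = {tuple(sorted((a, b))) for a, b in edges
                 if a != b and a in nodes and b in nodes}
    return one_hop, two_hop, ego_edges

# Hypothetical interaction records (sender, receiver).
edges = [("u", "a"), ("a", "b"), ("b", "c"), ("u", "d"), ("c", "e")]
one_hop, two_hop, ego_edges = two_hop_ego_graph(edges, "u")
```

A directed or weighted variant would be needed if features such as interaction reciprocity or interaction counts are to be computed from the same structure.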
CN202411441385.5A 2024-10-16 2024-10-16 A social network user identification method and device based on multi-dimensional social fingerprint Active CN118964962B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411441385.5A CN118964962B (en) 2024-10-16 2024-10-16 A social network user identification method and device based on multi-dimensional social fingerprint

Publications (2)

Publication Number Publication Date
CN118964962A CN118964962A (en) 2024-11-15
CN118964962B true CN118964962B (en) 2025-03-18

Family

ID=93388966

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411441385.5A Active CN118964962B (en) 2024-10-16 2024-10-16 A social network user identification method and device based on multi-dimensional social fingerprint

Country Status (1)

Country Link
CN (1) CN118964962B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259962A (en) * 2020-01-17 2020-06-09 中南大学 A Sybil Account Detection Method for Time Series Social Data
CN117708711A (en) * 2023-12-09 2024-03-15 西安交通大学 Method and system for identifying social robot users on Weibo platform based on multi-dimensional features

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505307B (en) * 2021-09-06 2021-12-07 南京航空航天大学 Social network user region identification method based on weak supervision enhancement
CN114782209B (en) * 2022-04-22 2024-06-11 武汉大学 Social network topological graph-based associated user identity recognition method
CN118606630A (en) * 2024-06-19 2024-09-06 北京师范大学珠海校区 A method, device and storage medium for identifying gender of social network users based on heterogeneous motif features

Also Published As

Publication number Publication date
CN118964962A (en) 2024-11-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant