CN118964962B - A social network user identification method and device based on multi-dimensional social fingerprint - Google Patents
- Publication number
- CN118964962B (application number CN202411441385.5A)
- Authority
- CN
- China
- Prior art keywords
- social
- user
- fingerprint
- users
- dimensional
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a social network user identification method and device based on multi-dimensional social fingerprints, belonging to the technical field of user identification. The method collects and preprocesses social network data, extracts the multi-dimensional social fingerprint features of users, and constructs multi-dimensional social fingerprints. It then calculates the social fingerprint distances between users and inputs the multi-dimensional social fingerprint distances as a data set into a model for training and testing, to determine whether two users are the same person and thereby identify an unknown user. The method and device can improve the accuracy and effectiveness of user identification.
Description
Technical Field
The invention relates to the technical field of user identification, in particular to a social network user identification method and device based on multidimensional social fingerprints.
Background
With the rapid development of social networks, social network platforms have accumulated a great deal of user interaction data. These data include not only users' personal information, but also the interactions between users, posted content, comments, likes, and so on. This rich data gives platforms deep insight into their users and creates new opportunities for information dissemination and sharing. However, it also brings a series of problems, such as forged user identities, a flood of fake accounts, and an increase in malicious activity. These problems not only hinder the healthy development of social networks, but also threaten the privacy and security of users.
To cope with these problems, user identification is a key technology. An effective user identification method can distinguish real users from fake accounts and identify social bot users, thereby improving the security and credibility of a social network. Existing social network user identification methods cover a variety of approaches, including graph-theory-based algorithms, machine learning techniques, and crowdsourcing strategies. Each has its own strengths, but each also has limitations. Graph-theory algorithms are good at discovering the social relationships of users in a social network, but an attacker may evade identification by forging a user's social relationship graph; in addition, this approach relies on a static relationship graph and cannot capture the dynamic evolution of user relationships. Machine learning techniques can extract the behavior patterns of social network users from massive user behavior data in order to train an identification model and improve identification efficiency and accuracy. However, this approach is affected by data quality, the complexity of user behavior, and dynamic changes in the social network. Crowdsourcing strategies have advantages in data accuracy and reliability, but applying them in large-scale networks raises cost and privacy concerns.
In all of these identification algorithms, extracting effective features is the key to social network user identification. Effective features must accurately capture the behavior patterns of social network users; however, because the behaviors and strategies of social network users evolve continuously, feature engineering becomes more difficult.
At present, graph-theory-based methods show a certain effectiveness in social network user identification, but most of them still need to collect many kinds of user feature data. This increases the complexity of data collection, and most of the collected features fail to capture the dynamic evolution of user relationships in a social network accurately. Therefore, when extracting features that can accurately identify social network users, the applicability of those features across different social networks needs to be explored in order to address the current challenges.
For example, patent 202010814842.6 discloses a method, apparatus, device, and storage medium for user identification in a social network that rely on graph structure data for analysis and identification. The method uses propagation data in a social network to represent the connection relationships among users, generates graph structure data from the propagation data, uses the graph structure data to determine attribute data for each user (such as node degree, closeness centrality, and betweenness centrality), and finally identifies key users and assigns corresponding labels according to the attribute data. However, this approach relies primarily on static graph structure data and does not adequately capture dynamic changes in user relationships. Because the behaviors and relationships of users in a social network change constantly, a static graph structure may not reflect these changes, which affects identification accuracy. As another example, patent 202210429087.9 discloses an associated-user identification method based on social network topology. It collects data sets from two social networks, each containing users and the friend relationships between users, and constructs topological graphs of the social networks G-1 and G-2 from those relationships. It generates a set of node sequences using a random walk method, maps the node sequences into a feature space using a skip-gram model, and learns a low-dimensional vector representation of each node. A target feature mapping matrix is obtained through training, the two feature spaces are mapped into the same space, the similarity between nodes is calculated, and the associated users are identified.
However, this method relies mainly on a static topological graph of the social network and calculates node features and user relevance from the graph structure, so it also cannot fully capture dynamic changes in user behaviors and relationships. Furthermore, mapping node sequences into a feature space with a skip-gram model and training a target feature mapping matrix typically requires significant computational resources and training time. This demand for computing resources may limit the scalability of the method, especially for real-time applications in large-scale social networks.
Therefore, how to provide a social network user identification method and device based on multi-dimensional social fingerprints, which can solve the above problems, is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a social network user identification method and device based on multi-dimensional social fingerprints, which construct more distinctive and tamper-resistant features to cope with a dynamically changing social network environment, thereby improving the accuracy and effectiveness of user identification.
In order to achieve the above object, the present invention provides the following technical solutions:
A social network user identification method based on multi-dimensional social fingerprints comprises the following steps:
S100, collecting social network data;
S200, constructing a social interaction network based on the social network data;
S300, extracting multi-dimensional social fingerprint features of a user based on the social interaction network;
S400, constructing a multi-dimensional social fingerprint based on the multi-dimensional social fingerprint features;
S500, calculating the multi-dimensional social fingerprint distances among users, taking the social fingerprint distance between each unknown user and each known user as a multi-dimensional social fingerprint distance feature, classifying and labeling the user pair to which each multi-dimensional social fingerprint distance feature belongs, and generating labeled sample data for same-user and different-user pairs;
S600, constructing an XGBoost identification model, and inputting the multi-dimensional social fingerprint distance features into the XGBoost identification model for training and validation;
S700, performing user identification with the trained XGBoost identification model, using the social fingerprint distance features between the unknown user and the known users.
Preferably, S200 includes:
extracting, from the social network data, a one-hop individual interaction graph consisting of the user and the one-hop neighbors directly connected to the user, and a two-hop individual interaction graph consisting of the user, the user's one-hop neighbors, and the two-hop neighbors indirectly connected to the user through the one-hop neighbors;
and constructing the social interaction network based on the two-hop individual interaction graph.
Preferably, S300 includes:
extracting, based on the social interaction network, the multi-dimensional social fingerprint features between the user and each of the user's contacts.
Preferably, the multi-dimensional social fingerprint features comprise the interaction count, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, proportion of continuous calls within a preset time, and user activity diversity features.
Preferably, S400 includes:
for each multi-dimensional social fingerprint feature, arranging the extracted values of the user's contacts in descending order, calculating the proportion of each contact's value to the total value of that feature, and thereby constructing the multi-dimensional social fingerprint of the user within a preset time interval.
Preferably, S500 includes:
dividing the social network data evenly by time into periods, wherein the users in the first period are known users and the users in the other periods are unknown users;
sequentially calculating the social fingerprint distances between each unknown user in the other periods and all known users in the first period, and taking these distances as the multi-dimensional social fingerprint distance features;
if the true labels of a known user and an unknown user indicate the same user, labeling the multi-dimensional social fingerprint distance feature between the two as 1, and otherwise labeling it as 0, so as to generate labeled sample data for same-user and different-user pairs.
Preferably, S600 includes:
dividing the labeled sample data into a training set, a validation set, and a test set;
inputting the multi-dimensional social fingerprint distance features between users in the training set, together with the preset labels of the training set, into the XGBoost identification model for training;
determining the optimal parameters of the XGBoost identification model using the validation set, resulting in the final XGBoost identification model.
An electronic device includes one or more processors and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the social network user identification method based on multi-dimensional social fingerprints described above.
Compared with the prior art, the invention discloses a social network user identification method and device based on multi-dimensional social fingerprints that process user data on the social network to capture users' dynamic multi-dimensional social fingerprint features. Multi-dimensional social fingerprints are constructed in the network, the social fingerprint distances between users are calculated, and the resulting distance feature data set is used for classification so as to identify users. The method identifies social network users accurately by extracting only the interaction information in the user network, and the constructed multi-dimensional social fingerprint effectively reflects the long-term, unique interaction behavior of a user in the social network. The method overcomes the difficulty that existing identification techniques based on static information have in adapting to dynamic environments, and significantly improves the temporal stability of the algorithm. It is therefore more convenient and efficient to apply in practical systems.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from the provided drawings without inventive effort.
FIG. 1 is a schematic flow chart of a social network user identification method based on multi-dimensional social fingerprints;
FIG. 2 is a flowchart of the multi-dimensional social fingerprint construction provided by the invention;
FIG. 3 is a flowchart of the multi-dimensional social fingerprint feature calculation provided by the invention;
FIG. 4 is a flow chart for constructing social fingerprints based on feature values provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a social network user identification method and device based on multi-dimensional social fingerprints. The method comprises: collecting and preprocessing social network data; extracting the multi-dimensional social fingerprint features of users; constructing multi-dimensional social fingerprints; calculating the social fingerprint distances among users; and inputting the multi-dimensional social fingerprint distance features as a data set into a model for training and testing, to determine whether two users are the same person and thereby identify an unknown user. This addresses the poor accuracy and effectiveness of existing user identification methods.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Referring to FIG. 1, in one aspect, an embodiment of the invention discloses a social network user identification method based on multi-dimensional social fingerprints, which comprises the following steps:
S100, collecting social network data, namely manually collecting the interaction data of a subset of accounts on the social platform.
S200, constructing a social interaction network based on the social network data: if a user node u interacts with another user node v, a directed edge from u to v is established. In this network, each directed edge is given a weight determined by the number of interactions between node u and node v.
S300, extracting the multi-dimensional social fingerprint features of a user based on the social interaction network; specifically, based on the social interaction network of the user and the user's contacts, extracting the interaction count, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, proportion of continuous calls within 5 minutes, and user activity diversity features between the user and each contact.
S400, constructing the multi-dimensional social fingerprint based on the multi-dimensional social fingerprint features.
S500, calculating the distances between the multi-dimensional social fingerprints constructed from these features for different users, taking the social fingerprint distance between each unknown user and each known user as a multi-dimensional social fingerprint distance feature, classifying and labeling the user pair to which each distance feature belongs, and generating labeled sample data for same-user and different-user pairs.
S600, constructing an XGBoost identification model, and inputting the multi-dimensional social fingerprint distance features into the XGBoost identification model for training and validation.
S700, performing user identification with the trained XGBoost identification model, using the social fingerprint distance features between the unknown user and the known users.
Specifically, in one embodiment, the collecting of social network data in step S100 includes:
using an email data set from a large research facility, which covers all incoming and outgoing email communication records between members and their contacts. The data set is reduced to members, contacts, and timestamps, and spans October 2003 to March 2005, a total of 17 months.
In a specific embodiment, the constructing of the social interaction network in step S200 includes:
The constructed social interaction network is a directed weighted network of the form $G = (V, E, W)$, where $V$ is the set of nodes in the network; $E$ is the set of edges in the network, typically represented as ordered pairs (e.g., $(u, v)$ represents an edge from node u to node v); and $W$ is the set of weights in the network, representing the weight of each edge, typically expressed as a function $w: E \to \mathbb{R}$ (the set of real numbers), i.e., each edge $(u, v) \in E$ has a corresponding weight $w(u, v)$.
Specifically, the social interaction network is constructed as follows. Based on the social network interaction data collected in step S100, each user is abstracted into a node in the graph, and the interaction relationships between user nodes are converted into edges in the network. If a user node u interacts with another user node v, a directed edge from u to v is created. In this way, all the users collected in step S100 are converted into nodes of the network, and the corresponding directed edges are formed according to the interaction relationships between them. The strength of the interaction relationship also needs to be considered when constructing the directed weighted network: if a user node u interacts with another user node v multiple times within a given period, the directed edge is given a higher weight. The weight $w(u, v)$ then represents the interaction strength from user node u to user node v. Finally, a complete social interaction network is constructed that reflects the interaction relationships between users.
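The construction rule above can be sketched in a few lines of Python. This is a minimal illustration only: the record layout `(sender, receiver, timestamp)`, the function name `build_interaction_network`, and the decision to skip self-interactions are assumptions made for the example, not details given in the patent.

```python
from collections import defaultdict


def build_interaction_network(records):
    """Build a directed weighted network from interaction records.

    records: iterable of (sender, receiver, timestamp) tuples (assumed layout).
    Returns a dict mapping each directed edge (u, v) to its weight, taken here
    to be the number of interactions initiated by u toward v.
    """
    weights = defaultdict(int)
    for sender, receiver, _timestamp in records:
        if sender == receiver:
            continue  # assumption: self-interactions are ignored
        weights[(sender, receiver)] += 1
    return dict(weights)


# Toy records; names and timestamps are made up for illustration.
records = [
    ("alice", "bob", 1),
    ("alice", "bob", 5),
    ("bob", "alice", 7),
    ("alice", "carol", 9),
]
network = build_interaction_network(records)
nodes = {node for edge in network for node in edge}
```

Storing the network as an edge-to-weight dictionary keeps the sketch dependency-free; a graph library could be used instead without changing the idea.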
In a specific embodiment, extracting the features used to construct the user's multi-dimensional social fingerprint in step S300 includes:
The interaction count feature is defined as:
[formula not reproduced in the source text]
where a denotes the number of interactions initiated by user u toward user v, and b denotes the number of interactions initiated by user v toward user u.
The interaction reciprocity feature is defined as:
[formula not reproduced in the source text]
The topological overlap feature is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the neighbor set of user u, the neighbor set of user v, the number of common neighbors of user u and user v, and the total number of neighbors of user u and user v.
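As an illustration of the quantities just described, the sketch below computes a topological overlap as the ratio of common neighbors to total distinct neighbors (a Jaccard-style ratio). The exact formula is not legible in this text, so this form is an assumption; treating edges as undirected when collecting neighbors and excluding u and v from each other's neighbor sets are further assumptions, and the function names are hypothetical.

```python
def neighbors(edges, node):
    """Neighbors of `node`, ignoring edge direction (an assumption)."""
    result = set()
    for u, v in edges:
        if u == node and v != node:
            result.add(v)
        elif v == node and u != node:
            result.add(u)
    return result


def topological_overlap(edges, u, v):
    """Common neighbors of u and v divided by their total distinct neighbors.

    Assumed Jaccard-style form; u and v themselves are excluded from the sets.
    """
    n_u = neighbors(edges, u) - {v}
    n_v = neighbors(edges, v) - {u}
    union = n_u | n_v
    if not union:
        return 0.0
    return len(n_u & n_v) / len(union)


# Toy edge list: u and v share neighbor "a"; "b" and "c" are exclusive.
edges = [("u", "v"), ("u", "a"), ("v", "a"), ("u", "b"), ("c", "v")]
overlap = topological_overlap(edges, "u", "v")
```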
The connection diversity feature is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the number of neighbors of user u and the number of neighbors of user v.
Specifically, the total number of motifs is defined in terms of the number of the i-th fourth-order motif for user u [the formula and the range of the index i are not reproduced in the source text].
The connection age feature is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the last moment of the current period and the moment of the first interaction between user u and user v.
The relative freshness feature is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the time of the last interaction between user u and user v before the end of the current period, and the average inter-event time between interactions.
The feature for the proportion of continuous calls within 5 minutes is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the time of the n-th interaction between user u and user v in the current period.
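The three time-based features above (connection age, relative freshness, and the proportion of continuous calls within 5 minutes) all derive from the timestamps of the interactions between u and v. Because the formulas are not legible here, the sketch below uses assumed forms: age as period end minus first interaction, freshness as time since the last interaction divided by the mean inter-event time, and the continuous-call proportion as the share of consecutive interaction gaps no longer than 5 minutes. The function name and the use of seconds are also assumptions.

```python
def temporal_features(timestamps, period_end, window=300.0):
    """Time-based features for one (u, v) pair; timestamps are in seconds.

    All three formulas are assumed forms, not taken from the patent text.
    """
    times = sorted(t for t in timestamps if t <= period_end)
    if not times:
        raise ValueError("no interactions before the end of the period")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return {
        # Time elapsed since the first interaction.
        "connection_age": period_end - times[0],
        # Time since the last interaction, in units of the mean gap
        # (None when fewer than two interactions or a zero mean gap).
        "relative_freshness": (period_end - times[-1]) / mean_gap if mean_gap else None,
        # Share of consecutive gaps that fit inside the window (5 minutes).
        "continuous_ratio": sum(1 for g in gaps if g <= window) / len(gaps) if gaps else 0.0,
    }


features = temporal_features([0, 100, 1000, 1200], period_end=2000)
```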
The user activity diversity feature is defined as:
[formula not reproduced in the source text]
where the formula is expressed in terms of the number of connections created by user u and the number of connections created by user v.
The features used to construct the multi-dimensional social fingerprint are calculated according to the formulas above, and the calculation process is shown in FIG. 2. In FIG. 2, (a) shows the structure of the social network data. (b) From the social network data, a one-hop individual interaction graph (1-IIG) is extracted, consisting of the user (ego) and the one-hop neighbors directly connected to it, together with a two-hop individual interaction graph (2-IIG), consisting of the user (ego), its one-hop neighbors, and the two-hop neighbors indirectly connected through the one-hop neighbors. (c) From the 1-IIG and 2-IIG, the strength, temporal, and structural multi-dimensional social fingerprint features between the user and each contact are calculated; these comprise the interaction count, interaction reciprocity, topological overlap, connection diversity, total number of motifs, connection age, relative freshness, proportion of continuous calls within 5 minutes, and user activity diversity features. (d) For each feature, the feature values are arranged in descending order of their proportion, thereby constructing the multi-dimensional social fingerprint of the user. The detailed calculation flow of sub-graph (c) in FIG. 2 is shown in FIG. 3, and the detailed construction flow of sub-graph (d) is shown in FIG. 4.
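Step (b) of FIG. 2 can be illustrated with a short sketch that extracts a 1-IIG and a 2-IIG for a given ego from a directed edge list. The function name is hypothetical, and two choices are assumptions: neighbors are found ignoring edge direction, and the 2-IIG keeps only edges that touch the ego or one of its one-hop neighbors (edges between two two-hop neighbors are left out).

```python
def individual_interaction_graphs(edges, ego):
    """Return (one_hop_edges, two_hop_edges) for `ego`.

    edges: iterable of directed (u, v) pairs; self-loops are dropped.
    """
    edges = {(u, v) for u, v in edges if u != v}
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    one_hop = adjacency.get(ego, set())
    inner = one_hop | {ego}
    # 1-IIG: edges directly between the ego and its one-hop neighbors.
    iig1 = {(u, v) for u, v in edges if ego in (u, v)}
    # 2-IIG: every edge with at least one endpoint in {ego} + one-hop
    # neighbors, which reaches the two-hop neighbors through the one-hop ones.
    iig2 = {(u, v) for u, v in edges if u in inner or v in inner}
    return iig1, iig2


edges = [("e", "a"), ("b", "e"), ("a", "c"), ("c", "d"), ("d", "f")]
iig1, iig2 = individual_interaction_graphs(edges, "e")
```

In this toy graph, node c is a two-hop neighbor of e (through a), so the edge (a, c) appears in the 2-IIG while (c, d) and (d, f) do not.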
In a specific embodiment, constructing the multi-dimensional social fingerprint in step S400 includes:
Let the user set U in the social interaction network contain n users, i.e., $U = \{u_1, u_2, \ldots, u_n\}$, and let T be a set of m time intervals, i.e., $T = \{t_1, t_2, \ldots, t_m\}$. For each user $u_i$ and each time interval $t_j$, a feature vector $x_{ij}$ can be extracted from the data, representing the behavior features of user $u_i$ in time interval $t_j$. The contacts of user $u_i$ are arranged in descending order of their values in the feature vector $x_{ij}$, and the proportion of each contact's value to the total value is returned, thereby constructing the social fingerprint $F_{ij}$ of user $u_i$ in time interval $t_j$.
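The fingerprint construction described above (sort a user's contacts by feature value in descending order, then divide each value by the total) can be sketched as follows. The function names and the dictionary-based input layout are assumptions for illustration; returning an empty fingerprint when the total is zero is also an assumption.

```python
def social_fingerprint(contact_values):
    """Fingerprint for one user, one time interval, and one feature.

    contact_values: dict mapping contact -> non-negative feature value.
    Returns the proportions of the total, ordered from largest to smallest.
    """
    ranked = sorted(contact_values.values(), reverse=True)
    total = sum(ranked)
    if total <= 0:
        return []  # assumption: no fingerprint when there is no activity
    return [value / total for value in ranked]


def multidimensional_fingerprint(features):
    """features: dict mapping feature name -> {contact: value}."""
    return {name: social_fingerprint(values) for name, values in features.items()}


fingerprint = multidimensional_fingerprint(
    {"interaction_count": {"a": 1, "b": 3, "c": 4}}
)
```

Because only the ranked proportions are kept, the fingerprint does not depend on who the contacts are, only on how the user distributes interaction across them.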
In a specific embodiment, calculating the social fingerprint distance between users in step S500 includes:
The Jensen-Shannon divergence (JSD) is used to measure the social fingerprint distance. Two kinds of distance are calculated: the self-distance between the social fingerprints of the same user in different periods, and the reference distance between the social fingerprints of each user and those of all other users in different periods.
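The social fingerprint distance is defined by the Jensen-Shannon divergence:
$$\mathrm{JSD}(P_1, P_2) = H\left(\tfrac{1}{2}P_1 + \tfrac{1}{2}P_2\right) - \tfrac{1}{2}\left[H(P_1) + H(P_2)\right], \qquad H(P) = -\sum_{r=1}^{k} p(r)\log p(r), \qquad p(r) = \frac{x_r}{X};$$
where $P_1$ and $P_2$ are the social fingerprints of two users, $H$ is the Shannon entropy, k is the rank of the last contact after sorting (equal in value to the total number of contacts), $x_r$ is the feature value of the contact ranked r, and X is the sum of the feature values of all contacts.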
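A minimal sketch of the Jensen-Shannon divergence between two fingerprints is given below, using the standard definition (entropy of the mixture minus the mean of the two entropies). Two details are assumptions, since the text does not state them: the logarithm is taken base 2, which bounds the result between 0 and 1, and a shorter fingerprint is padded with zeros so both have the same number of ranks.

```python
import math


def shannon_entropy(p):
    """Shannon entropy of a list of proportions, base 2 (assumed base)."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_divergence(p1, p2):
    """JSD between two fingerprints given as ranked proportion lists."""
    k = max(len(p1), len(p2))
    a = list(p1) + [0.0] * (k - len(p1))  # assumption: zero-pad shorter list
    b = list(p2) + [0.0] * (k - len(p2))
    mixture = [(x + y) / 2 for x, y in zip(a, b)]
    return shannon_entropy(mixture) - (shannon_entropy(a) + shannon_entropy(b)) / 2


same = jensen_shannon_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])
```

Identical fingerprints give a distance of 0, and fingerprints with no shared mass give the maximum of 1 under the base-2 logarithm.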
Specifically, the self-distance between the social fingerprints of user $u_i$ in different time intervals is calculated first. Second, the reference distance between the social fingerprint of user $u_i$ within a given time interval and the social fingerprints of the other users $u_j$, where $j \neq i$, is calculated. In this embodiment, a self-distance is labeled 1, representing the same user, and a reference distance is labeled 0, representing different users.
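The labeling scheme can be sketched as follows: users in the first period are treated as known, users in later periods as unknown, and each known/unknown pair yields one distance vector labeled 1 (same user) or 0 (different users). The nested-dictionary layout, the function names, and the `toy_distance` stand-in (a simple sum of absolute differences used here in place of the Jensen-Shannon divergence) are assumptions for illustration.

```python
def toy_distance(p, q):
    """Stand-in distance (sum of absolute differences); not the JSD."""
    return sum(abs(x - y) for x, y in zip(p, q))


def build_labeled_samples(fingerprints, feature_names, distance):
    """fingerprints: {period: {user: {feature: [proportions]}}}.

    The smallest period key holds the known users. Returns a list of
    (distance_vector, label) pairs, one per known/unknown user pair.
    """
    periods = sorted(fingerprints)
    known = fingerprints[periods[0]]
    samples = []
    for period in periods[1:]:
        for unknown_id, unknown_fp in fingerprints[period].items():
            for known_id, known_fp in known.items():
                vector = [distance(known_fp[f], unknown_fp[f]) for f in feature_names]
                label = 1 if known_id == unknown_id else 0  # self vs reference
                samples.append((vector, label))
    return samples


fingerprints = {
    0: {"u1": {"count": [0.7, 0.3]}, "u2": {"count": [0.5, 0.5]}},
    1: {"u1": {"count": [0.6, 0.4]}, "u2": {"count": [0.5, 0.5]}},
}
samples = build_labeled_samples(fingerprints, ["count"], toy_distance)
```

With n known users, each unknown user contributes one positive sample and n - 1 negative samples, so the resulting data set is imbalanced toward label 0.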
In a specific embodiment, inputting the multi-dimensional social fingerprint distance feature data into the XGBoost model for training and testing in step S600 includes:
The multi-dimensional social fingerprint distance features are extracted from the sample data to form a feature data set, and the data set is divided into a training set, a validation set, and a test set, where the ratio of the training set to the validation set is 9:1. The feature data and the preset labels of the training set are input into the XGBoost model for training, and the validation set is used to determine the optimal parameters. The test set is used to measure the accuracy of the resulting XGBoost identification model. Finally, the user identity is identified according to the prediction result of the model.
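The data split described above can be sketched with the standard library alone. The text gives only the 9:1 ratio between training and validation sets; the held-out test fraction of 0.2, the fixed random seed, and the function name are assumptions made for this example.

```python
import random


def split_samples(samples, test_fraction=0.2, seed=0):
    """Split samples into (train, validation, test).

    A test set is held out first (fraction is an assumption); the remainder
    is divided so that training : validation is 9 : 1, as in the embodiment.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, rest = shuffled[:n_test], shuffled[n_test:]
    n_validation = len(rest) // 10
    return rest[n_validation:], rest[:n_validation], test


train, validation, test = split_samples(range(100))
```

The training and validation parts would then be passed to the XGBoost classifier for fitting and parameter selection, with the test part reserved for the final accuracy estimate.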
Specifically, in this embodiment, the social fingerprint distance calculated for each user pair under test is used as the feature value. For an unknown user, the social fingerprint distance between the unknown user and each known user in the known user set is calculated, and the identity of the unknown user is determined from these distances.
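The matching step for an unknown user might look like the sketch below. It is illustrative only: the trained model is represented by a hypothetical callable `same_user_score` that maps a distance to a same-user score, and the stand-in `stand_in_score` used in the example is not the XGBoost model. A single distance dimension is used for brevity, the 0.5 threshold is an assumption, and the divergence uses base-2 logarithms with zero-padding as further assumptions.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def jsd(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (base 2) with zero-padding to equal length."""
    k = max(len(p), len(q))
    a = list(p) + [0.0] * (k - len(p))
    b = list(q) + [0.0] * (k - len(q))

    def entropy(dist: Sequence[float]) -> float:
        return -sum(x * math.log2(x) for x in dist if x > 0)

    mix = [(x + y) / 2 for x, y in zip(a, b)]
    return entropy(mix) - 0.5 * (entropy(a) + entropy(b))


def identify(
    unknown: Sequence[float],
    known: Dict[str, Sequence[float]],
    same_user_score: Callable[[float], float],
    threshold: float = 0.5,
) -> Optional[str]:
    """Return the known user with the highest same-user score above threshold."""
    best_user, best_score = None, threshold
    for user, fingerprint in known.items():
        score = same_user_score(jsd(unknown, fingerprint))
        if score > best_score:
            best_user, best_score = user, score
    return best_user


def stand_in_score(distance: float) -> float:
    """Placeholder for a trained classifier: smaller distance, higher score."""
    return 1.0 - distance


# Hypothetical known users and one unknown fingerprint.
known_users = {"u1": [0.6, 0.3, 0.1], "u2": [0.9, 0.05, 0.05]}
match = identify([0.58, 0.32, 0.10], known_users, stand_in_score)
print(match)
```

Returning `None` when no score clears the threshold allows for the case where the unknown user is not in the known set at all.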
More specifically, the method extracts multi-dimensional social fingerprint features from the constructed social interaction network, calculates the social fingerprint distance between an unknown user and each known user, and inputs the social fingerprint distances as features into an XGBoost model for user identification. The multi-dimensional social fingerprint is formed over a given period of time, covers the strength, structure and time characteristics of the network, and provides clues for understanding the evolution of the overall network structure. Analyzing the network through the multi-dimensional social fingerprint yields finer granularity and a more accurate grasp of the structure and dynamics of the network, which addresses the difficulty that existing social network user identification technology based on static information has in coping with a dynamic network environment. In addition, the selection and extraction of multi-dimensional social fingerprint features are generally closely tied to the specific task and data set, so they can be optimized and adjusted for different application scenarios. This flexibility allows the algorithm to adapt to different data and tasks, thereby improving its scalability.
In another aspect, the embodiment of the invention also discloses an electronic device comprising one or more processors and a storage device, wherein the storage device stores one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the social network user identification method based on the multi-dimensional social fingerprint.
Specifically, with this technical scheme, in a social interaction network constructed from Email data, the extracted multi-dimensional social fingerprint features can capture the interaction information in the network. This information reflects the interactions and relationships among the nodes, so that the distinctive interaction pattern of each social network user can be identified and the accuracy of social network user identification is improved. Owing to the flexibility and generality of the multi-dimensional social fingerprint features, users can also be identified well when the method is migrated to data from other social platforms. With improved identification accuracy, the method can be used to identify social robots in the network, so that malicious robots can be more effectively prevented from attacking, spreading false information or committing fraud on social network platforms, which improves the security of the social network.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Since the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411441385.5A CN118964962B (en) | 2024-10-16 | 2024-10-16 | A social network user identification method and device based on multi-dimensional social fingerprint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118964962A (en) | 2024-11-15 |
CN118964962B (en) | 2025-03-18 |
Family
ID=93388966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202411441385.5A Active CN118964962B (en) | 2024-10-16 | 2024-10-16 | A social network user identification method and device based on multi-dimensional social fingerprint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118964962B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||