CN108304482A - Broker identification method and device, electronic equipment and readable storage medium - Google Patents
- Publication number
- CN108304482A
- CN201711478995.2A
- Authority
- CN
- China
- Prior art keywords
- community
- user
- relationship network
- vertex
- broker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2237—Vectors, bitmaps or matrices
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The embodiment of the present invention provides a broker identification method and device, electronic equipment and a readable storage medium. The method includes: obtaining user posting logs within a preset time and the entity information of each log; constructing a user relationship network according to the entity information of all logs, where the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and another piece of entity information; dividing the user relationship network by using a community discovery algorithm to obtain a community discovery result, where the community discovery result is the community identifier of each vertex in the user relationship network; and identifying the brokers among the publishing users of all logs according to a preset rule and the community discovery result. The identification accuracy and identification efficiency are thereby improved.
Description
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a broker identification method and device, electronic equipment and a readable storage medium.
Background
At present, house renting information, second-hand house information, second-hand car information and the like are published on various webpages or related application programs (APPs), and the house renting information, the second-hand house information and the like can be published privately or by brokers (namely agents).
Taking house rental information as an example, the problem is how to determine, from published house rental information, whether the publishing user is a broker. An existing broker identification method is as follows: if the number of house sources published by a user is greater than a preset threshold, and the user publishes house rental information in more than N areas, where N is a preset value, the user is judged to be a broker. However, users are required to fill in a user identity when they publish house rental information; some private users fill in the identity at random, which makes the information inaccurate, and some brokers intentionally hide their broker identity to attract traffic.
Therefore, this identification method misses some brokers who publish house sources using multiple accounts. In addition, the value N for the house source distribution areas is difficult to set reasonably, so the broker identification accuracy is not high.
Disclosure of Invention
The embodiment of the invention provides a broker identification method and device, electronic equipment and a readable storage medium, so as to improve the accuracy of broker identification.
In a first aspect, an embodiment of the present invention provides an identification method for a broker, including:
acquiring user posting logs in preset time and entity information of each log;
constructing a user relationship network according to the entity information of all logs, wherein the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and another piece of entity information;
dividing a user relationship network by using a community discovery algorithm to obtain community discovery results, wherein the community discovery results are community identifications of each vertex in the user relationship network;
and identifying brokers in the publishing users of all logs according to preset rules and community discovery results.
Optionally, the constructing a user relationship network according to the entity information of all logs includes:
determining a vertex table and an edge table according to the entity information of all logs, and storing the vertex table and the edge table into the HDFS;
and taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into a memory in a graph form.
Optionally, the dividing the user relationship network by using a community discovery algorithm to obtain a community discovery result includes:
and taking the user relationship network as input, and running a community discovery algorithm on Spark GraphX to obtain a community discovery result.
Optionally, the identifying brokers among the publishing users of all logs according to the preset rule and the community discovery result includes:
determining a target community meeting preset conditions according to a community discovery result;
if the number of the users publishing the property information in the target community is larger than N, the users publishing the property information in the target community are judged to be the brokers, and N is a preset positive integer.
Optionally, the determining a target community that meets a preset condition includes:
counting the number of users belonging to the same community, wherein each user has a community identifier, and the users with the same community identifier belong to the same community;
determining the community with the number of users belonging to the same community being larger than a first preset threshold value as the target community;
or,
counting the sum of the number of house sources released by all users belonging to the same community;
and determining the community, of which the sum of the house source numbers issued by all users belonging to the same community is greater than a second preset threshold value, as the target community.
In a second aspect, an embodiment of the present invention provides an identification apparatus for a broker, including:
the acquisition module is used for acquiring user posting logs in preset time and entity information of each log;
the construction module is used for constructing a user relationship network according to the entity information of all logs, wherein the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and another piece of entity information;
the dividing module is used for dividing the user relationship network by using a community discovery algorithm to obtain a community discovery result, wherein the community discovery result is the community identifier of each vertex in the user relationship network;
and the identification module is used for identifying brokers in the publishing users of all logs according to the preset rules and the community discovery results.
Optionally, the construction module is configured to:
determining a vertex table and an edge table according to the entity information of all logs, and storing the vertex table and the edge table into the HDFS;
and taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into a memory in a graph form.
Optionally, the dividing module is configured to:
and taking the user relationship network as input, and running a community discovery algorithm on Spark GraphX to obtain a community discovery result.
Optionally, the identification module includes:
the determining unit is used for determining a target community meeting preset conditions according to a community discovery result;
and the judging unit is used for judging that the user publishing the house property information in the target community is a broker when the number of the users publishing the house property information in the target community is greater than N, wherein N is a preset positive integer.
Optionally, the determining unit is configured to:
counting the number of users belonging to the same community, wherein each user has a community identifier, and the users with the same community identifier belong to the same community;
determining the community with the number of users belonging to the same community being larger than a first preset threshold value as the target community;
or,
counting the sum of the number of house sources released by all users belonging to the same community;
and determining the community, of which the sum of the house source numbers issued by all users belonging to the same community is greater than a second preset threshold value, as the target community.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a memory for storing program instructions;
a processor for invoking and executing the program instructions in the memory to implement the broker identification method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a readable storage medium having stored therein a computer program which, when executed by at least one processor of a broker identification apparatus, causes the broker identification apparatus to perform the broker identification method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a program product, which includes a computer program stored in a readable storage medium. At least one processor of a broker identification apparatus can read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the broker identification apparatus to implement the broker identification method of the first aspect.
According to the broker identification method and apparatus, the electronic device, and the readable storage medium provided by the embodiments of the present invention, user posting logs within a preset time and the entity information of each log are obtained, and user entity information of multiple dimensions, such as the user identifier, the telephone number, and the electronic device used for posting, is integrated. The user relationship network is modeled with the entity information (the vertex table) and the association relationships between the user identifier and the other entity information (the edge table), so that data of multiple dimensions can be used for analysis. Finally, the user relationship network is divided by using a community discovery algorithm to obtain a community discovery result, and the brokers among the publishing users of all logs are identified according to a preset rule and the community discovery result. In this way, a broker who publishes house sources through multiple accounts can be accurately found, and the identification accuracy and identification efficiency are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a broker identification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another broker identification method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another broker identification method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a broker identification apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another broker identification apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another broker identification apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without any creative efforts shall fall within the protection scope of the embodiments of the present invention.
The existing broker identification method judges that a user is a broker when the number of house sources published by the user is greater than a preset threshold and the user publishes house rental information in more than N areas. This method misses some brokers who publish house sources using multiple accounts, and the value N for the house source distribution areas is difficult to set reasonably, so the identification accuracy is not high. To improve the accuracy of broker identification, the embodiments of the present invention obtain user posting logs within a preset time and the entity information of each log, construct a user relationship network according to the entity information of each log, divide the user relationship network by using Spark GraphX and a community discovery algorithm to obtain a community discovery result, and identify the brokers among the publishing users of all logs according to a preset rule and the community discovery result. A broker who uses multiple accounts can thus be accurately discovered, and the identification accuracy and identification efficiency are improved. The technical solution of the present application is described in detail below with reference to the accompanying drawings.
First, some terms in the embodiments of the present invention are explained below to facilitate understanding by those skilled in the art.
1. Community discovery: a community reflects the local characteristics of individual behaviors in a network and the correlations among those behaviors. Studying the communities in a network plays a crucial role in understanding the structure and function of the whole network, and helps analyze and predict the interactions among the elements of the network.
2. Fast-unfolding algorithm: a community discovery algorithm based on modularity optimization. It divides the input graph data into a number of communities such that users in the same community are closely related to each other. Each node in the graph is assigned an identifier (ID), and nodes with the same ID are in the same community.
3. Spark GraphX: Spark GraphX is a distributed graph processing framework. Built on the Spark platform, it provides rich, easy-to-use interfaces for graph computation and graph mining, which greatly eases distributed graph processing. In social networks such as Twitter, Facebook, microblog and WeChat there are many relationship chains between people; these networks generate big data and require graph computation. Current graph processing is basically distributed rather than stand-alone, and because Spark GraphX is built on Spark at the bottom layer, it is inherently a distributed graph processing system. Distributed or parallel processing of a graph is realized by splitting the graph into multiple sub-graphs and computing on the sub-graphs separately, with the computation carried out in iterative stages, that is, the graph is computed in parallel.
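To make the "modularity" objective mentioned under the Fast-unfolding term more concrete, the following is a minimal Python sketch that scores a given division of a graph into communities. The patent text does not state the formula; the sketch assumes the standard definition for an undirected, unweighted graph, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees of its vertices. The function name and the example vertex names are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every vertex to its community ID.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: sum of vertex degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge (hypothetical vertex names).
example_edges = [("a", "b"), ("b", "c"), ("a", "c"),
                 ("d", "e"), ("e", "f"), ("d", "f"),
                 ("c", "d")]
split = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
merged = {v: 1 for v in split}
q_split = modularity(example_edges, split)
q_merged = modularity(example_edges, merged)
```

Under this definition the two-triangle split scores higher than placing every vertex in one community, which is the kind of division a modularity-optimizing algorithm is expected to prefer.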
Fig. 1 is a flowchart of an identification method of a broker according to an embodiment of the present invention, where an execution subject of this embodiment may be any device having a function of executing the identification method of the broker, and optionally, the device may be a processor, as shown in fig. 1, the method of this embodiment may include:
S101, obtaining user posting logs within a preset time and the entity information of each log.
Specifically, the preset time is, for example, one month, three months, or half a year. The entity information includes, for example, a user identifier, a telephone number, and an identifier of the electronic device used for posting; the entity information may also be other information related to the user, such as payment information, for example whether payment is made with WeChat, Alipay, or a bank card. The user identifier represents the identity of the user, and the electronic device used for posting includes mobile phones, computers, handheld computers and other electronic devices.
S102, constructing a user relationship network according to the entity information of all logs, wherein the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between the user identifier and another piece of entity information.
Specifically, each log corresponds to entity information such as a user identifier, a telephone number, and the electronic device used for posting. Each piece of entity information is a vertex, and each association relationship between the user identifier and another piece of entity information is an edge. The set of vertices (the vertex table) and the set of edges (the edge table) are obtained from the entity information of all logs, and the user relationship network is constructed from the vertex table and the edge table.
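The construction of the vertex table and the edge table in S102 can be sketched in Python as follows. This is only an illustration under stated assumptions: the log record layout and the field names `user_id`, `phone` and `device_id` are hypothetical and do not come from the patent, and each vertex is represented as a `(kind, value)` pair so that a telephone number and a device identifier with the same text do not collide.

```python
def build_tables(logs):
    """Return (vertex_table, edge_table) built from posting-log records.

    logs: iterable of dicts with the hypothetical keys
          "user_id", "phone" and "device_id" (string values).
    Each piece of entity information becomes one vertex; each edge links
    a user identifier to another piece of entity information seen in the
    same log.
    """
    vertices = set()
    edges = set()
    for log in logs:
        user = ("user", log["user_id"])
        vertices.add(user)
        for kind in ("phone", "device_id"):
            value = log.get(kind)
            if value is None:
                continue  # this log carries no such entity
            entity = (kind, value)
            vertices.add(entity)
            edges.add((user, entity))
    # Sets remove duplicates across logs; sorting gives a stable order.
    return sorted(vertices), sorted(edges)


# Two accounts that share one telephone number (hypothetical data).
example_logs = [
    {"user_id": "u1", "phone": "13800000000", "device_id": "d1"},
    {"user_id": "u2", "phone": "13800000000", "device_id": "d2"},
    {"user_id": "u1", "phone": "13800000000", "device_id": "d1"},
]
vertex_table, edge_table = build_tables(example_logs)
```

In the example, the shared telephone number becomes a single vertex connected to both user vertices, which is what later allows a community discovery algorithm to place the two accounts in the same community.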
S103, dividing the user relationship network by using a community discovery algorithm to obtain community discovery results, wherein the community discovery results are community identifications of each vertex in the user relationship network.
In this embodiment, the user relationship network is the input graph data, and the output is the divided communities, that is, the community discovery result. Users in the same community are closely related to each other. The Fast-unfolding algorithm assigns a community identifier (ID) to each vertex in the user relationship network; the community discovery result is the community identifier of each vertex in the user relationship network, and users with the same ID are in the same community.
Optionally, after the community discovery result is obtained, it is stored in the Hadoop Distributed File System (HDFS). Storing the result in the HDFS persists it, which facilitates subsequent analysis and processing.
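The shape of the community discovery result, one community identifier per vertex, can be illustrated with a deliberately simplified Python stand-in. The sketch below is not the Fast-unfolding algorithm and does not use Spark GraphX: it assigns communities by connected components with a union-find structure, which merges any accounts linked through a shared entity but, unlike modularity optimization, never splits a connected group. The function name and vertex names are hypothetical.

```python
def connected_communities(vertices, edges):
    """Assign each vertex a community ID using connected components.

    Simplified stand-in for a community discovery algorithm: the ID of a
    community is the smallest vertex name it contains.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller root so that IDs are deterministic.
            if rv < ru:
                ru, rv = rv, ru
            parent[rv] = ru
    return {v: find(v) for v in vertices}


# u1 and u2 share phone p1; u3 uses its own phone p2 and device d1.
example_vertices = ["u1", "u2", "u3", "p1", "p2", "d1"]
example_edges = [("u1", "p1"), ("u2", "p1"), ("u3", "p2"), ("u3", "d1")]
community_of = connected_communities(example_vertices, example_edges)
```

The returned dictionary plays the role of the community discovery result described above: vertices with the same value belong to the same community.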
S104, identifying brokers among the publishing users of all logs according to preset rules and the community discovery result.
Specifically, the preset rule may be set according to actual needs, and in this embodiment, S104 may specifically include:
S1041, determining a target community meeting preset conditions according to the community discovery result.
Optionally, two implementable modes are provided for determining the target community meeting the preset condition according to the community discovery result:
In one implementable manner, the number of users belonging to the same community is counted, where each user has a community identifier and users with the same community identifier belong to the same community, and a community whose number of users is greater than a first preset threshold is determined as the target community. For example, if the first preset threshold is 5 and the counted number of users belonging to a community is 7, that community is a target community.
In another implementable manner, the sum of the numbers of house sources published by all users belonging to the same community is counted, and a community for which this sum is greater than a second preset threshold is determined as the target community. For example, user A, user B, and user C belong to the same community; user A has published 3 house sources, user B has published 5, and user C has published 5, so the sum is 3 + 5 + 5 = 13. If the second preset threshold is 10, the community is a target community.
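The two implementable manners for selecting target communities can be sketched in Python as follows. This is a minimal sketch: the function names, the dictionary-based inputs and the community IDs are assumptions made for illustration, while the thresholds and counts reuse the numbers from the two examples above.

```python
from collections import defaultdict


def target_communities_by_user_count(user_community, first_threshold):
    """Manner 1: communities whose user count exceeds first_threshold."""
    counts = defaultdict(int)
    for _user, cid in user_community.items():
        counts[cid] += 1
    return {cid for cid, n in counts.items() if n > first_threshold}


def target_communities_by_listing_sum(user_community, listings_per_user,
                                      second_threshold):
    """Manner 2: communities whose total published house sources
    exceed second_threshold."""
    totals = defaultdict(int)
    for user, cid in user_community.items():
        totals[cid] += listings_per_user.get(user, 0)
    return {cid for cid, total in totals.items() if total > second_threshold}


# Manner 1: seven users in community 1, first preset threshold 5.
seven_users = {"u%d" % i: 1 for i in range(7)}
by_count = target_communities_by_user_count(seven_users, 5)

# Manner 2: users A, B, C in community 1 publish 3 + 5 + 5 = 13 house
# sources, second preset threshold 10; user D is alone in community 2.
user_community = {"A": 1, "B": 1, "C": 1, "D": 2}
listings_per_user = {"A": 3, "B": 5, "C": 5, "D": 2}
by_sum = target_communities_by_listing_sum(user_community,
                                           listings_per_user, 10)
```

Both functions return a set of community IDs, so either one can feed the subsequent broker judgment step.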
S1042, if the number of the users publishing the property information in the target community is larger than N, determining that the users publishing the property information in the target community are brokers, and N is a preset positive integer.
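The judgment in S1042 can be sketched in Python as follows, assuming the target communities have already been determined. The function name, argument layout and example data are hypothetical; the rule itself follows the text: within a target community, if the number of users publishing property information is greater than N, those users are judged to be brokers.

```python
def identify_brokers(user_community, publishing_users, target_communities, n):
    """Return the set of users judged to be brokers.

    user_community: dict mapping user ID to community ID.
    publishing_users: iterable of users who published property information.
    target_communities: set of community IDs that met the preset conditions.
    n: preset positive integer N.
    """
    members = {}
    for user in publishing_users:
        cid = user_community.get(user)
        if cid in target_communities:
            members.setdefault(cid, set()).add(user)
    brokers = set()
    for _cid, users in members.items():
        if len(users) > n:  # strictly greater than N, as in S1042
            brokers |= users
    return brokers


user_community = {"A": 1, "B": 1, "C": 1, "D": 2}
publishing_users = ["A", "B", "C", "D"]
brokers = identify_brokers(user_community, publishing_users, {1}, 2)
```

With N = 2 the three publishing users of community 1 are flagged, while user D is not, because community 2 is not a target community.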
The broker identification method provided in this embodiment obtains user posting logs within a preset time and the entity information of each log; constructs a user relationship network according to the entity information of all logs, where the user relationship network is composed of a vertex table and an edge table, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and another piece of entity information; divides the user relationship network by using a community discovery algorithm to obtain a community discovery result; and finally identifies the brokers among the publishing users of all logs according to preset rules and the community discovery result. User entity information of multiple dimensions, such as the user identifier, the telephone number, and the electronic device used for posting, is integrated, and the user relationship network is modeled with the entity information (the vertex table) and the association relationships between the user identifier and the other entity information (the edge table), so that data of multiple dimensions can be used for analysis. Dividing the user relationship network with a community discovery algorithm and identifying brokers from the community discovery result according to preset rules makes it possible to accurately discover a broker who publishes house sources through multiple accounts, which improves the identification accuracy and identification efficiency.
Fig. 2 is a flowchart of another broker identification method according to an embodiment of the present invention, and as shown in fig. 2, the method according to this embodiment may include:
S201, obtaining user posting logs within a preset time and the entity information of each log.
Specifically, the preset time is, for example, one month, three months, or half a year. The user identification, the telephone number and the electronic equipment identification used for sending the log are all entity information, the entity information can also be other information related to the user, the user identification is used for representing the identity of the user, and the electronic equipment used for sending the log comprises electronic equipment such as a mobile phone, a computer, a handheld computer and the like.
S202, determining a vertex table and an edge table according to the entity information of all logs, and storing the vertex table and the edge table into the HDFS.
The vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between the user identifier and another piece of entity information. Because the vertex table and the edge table occupy a large amount of memory, they are stored in the HDFS for subsequent processing.
S203, taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into memory in the form of a graph.
S204, taking the user relationship network as input, running a community discovery algorithm on Spark GraphX to obtain a community discovery result, wherein the community discovery result is the community identifier of each vertex in the user relationship network.
S205, determining a target community meeting preset conditions according to a community discovery result.
Specifically, there are two implementable manners:
First, counting the number of users belonging to the same community, where each user has a community identifier and users with the same community identifier belong to the same community, and determining a community whose number of users is greater than a first preset threshold as the target community.
Second, counting the sum of the numbers of house sources published by all users belonging to the same community, and determining a community for which this sum is greater than a second preset threshold as the target community.
S206, if the number of the users publishing the property information in the target community is larger than N, determining that the users publishing the property information in the target community are brokers, and N is a preset positive integer.
In the broker identification method provided by this embodiment, user posting logs within a preset time and the entity information of each log are obtained, and a vertex table and an edge table are determined according to the entity information of all logs. With the vertex table and the edge table as input, a user relationship network is constructed through Spark GraphX, and a community discovery algorithm is run on Spark GraphX to divide the user relationship network and obtain a community discovery result. A target community meeting preset conditions is then determined, and if the number of users publishing property information in the target community is greater than N, the users publishing property information in the target community are determined to be brokers. User entity information of multiple dimensions, such as the user identifier, the telephone number, and the electronic device used for posting, is integrated, and the user relationship network is modeled with the vertex table and the edge table, so that data of multiple dimensions can be used for analysis. Because broker users are identified from target communities determined from the community discovery result, a broker who publishes house sources through multiple accounts can be accurately discovered, and the identification accuracy and identification efficiency are improved.
The following describes the technical solution of the embodiment of the method shown in fig. 1 and 2 in detail by using a specific embodiment.
Fig. 3 is a flowchart of another broker identification method according to an embodiment of the present invention, and as shown in fig. 3, the method according to this embodiment may include:
S301, obtaining user posting logs within a preset time and entity information of each log, wherein the entity information comprises a user identifier, a telephone number, and an identifier of the electronic device used for posting.
S302, determining a vertex table and an edge table according to the entity information of all logs.
S303, storing the vertex table and the edge table into the HDFS.
S304, taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into a memory in a graph form.
S305, taking the user relationship network as input, and running a community discovery algorithm on Spark GraphX to obtain a community discovery result, wherein the community discovery result is a community identifier of each vertex in the user relationship network.
S306, storing the community discovery result into the HDFS.
S307, determining a target community meeting preset conditions according to the community discovery result.
Specifically, there are two possible implementations:
First, counting the number of users belonging to the same community, wherein each user has a community identifier and users with the same community identifier belong to the same community, and determining a community whose number of users is greater than a first preset threshold value as a target community.
Second, counting the sum of the number of house listings published by all users belonging to the same community, and determining a community for which this sum is greater than a second preset threshold value as the target community.
S308, if the number of the users publishing the property information in the target community is larger than N, the users publishing the property information in the target community are judged to be the brokers, and N is a preset positive integer.
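As a minimal sketch of step S302, the snippet below derives a vertex table and an edge table from a few posting logs. The log field names (`user_id`, `phone`, `device_id`), the sample records, the function name `build_tables`, and the type-prefixed vertex keys are illustrative assumptions; the source does not specify a table schema, and plain Python stands in for the HDFS storage and Spark GraphX loading described in S303 and S304.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical posting logs; field names and values are assumptions for illustration.
LOGS: List[Dict[str, str]] = [
    {"user_id": "u1", "phone": "13800000001", "device_id": "d1"},
    {"user_id": "u2", "phone": "13800000001", "device_id": "d2"},
    {"user_id": "u3", "phone": "13800000003", "device_id": "d2"},
    {"user_id": "u4", "phone": "13800000004", "device_id": "d4"},
]


def build_tables(logs: List[Dict[str, str]]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (vertex_table, edge_table) for the given logs.

    Every piece of entity information becomes a vertex, and every edge links
    a user identifier to another piece of entity information from the same log.
    """
    vertices: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for log in logs:
        user = "user:" + log["user_id"]
        vertices.add(user)
        for kind in ("phone", "device_id"):
            # Prefix the key with its kind so a phone number and a device
            # identifier with the same text never collapse into one vertex.
            other = f"{kind}:{log[kind]}"
            vertices.add(other)
            edges.add((user, other))
    return sorted(vertices), sorted(edges)


VERTEX_TABLE, EDGE_TABLE = build_tables(LOGS)
```

In this sample, users `u1` and `u2` share a telephone number and users `u2` and `u3` share a device, so their vertices end up linked through common entity vertices, which is the structure the later community discovery step relies on.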
Fig. 4 is a schematic structural diagram of an identification apparatus for a broker according to an embodiment of the present invention. As shown in fig. 4, the apparatus according to this embodiment may include: an acquisition module 11, a construction module 12, a dividing module 13 and an identification module 14, wherein
the acquisition module 11 is configured to obtain user posting logs within a preset time and entity information of each log.
The construction module 12 is configured to construct a user relationship network according to the entity information of all logs, where the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and other entity information.
The dividing module 13 is configured to divide the user relationship network by using a community discovery algorithm to obtain a community discovery result, where the community discovery result is a community identifier of each vertex in the user relationship network.
The identification module 14 is configured to identify brokers among publishing users of all logs according to preset rules and community discovery results.
Optionally, the construction module 12 is configured to: determine a vertex table and an edge table according to the entity information of all logs, and store the vertex table and the edge table into the HDFS; and, taking the vertex table and the edge table as input, construct a user relationship network through Spark GraphX and load the user relationship network into a memory in a graph form.
Optionally, the dividing module 13 is configured to: take the user relationship network as input, and run a community discovery algorithm on Spark GraphX to obtain a community discovery result.
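The division step can be sketched in plain Python as follows. This passage does not name the community discovery algorithm, so connected components (computed with a union-find structure) is swapped in here as a simple stand-in, and Spark GraphX itself is not used. The edge list, the vertex keys, and the function name `discover_communities` are hypothetical. The output has the form described above: a community identifier for each vertex.

```python
from typing import Dict, Iterable, List, Tuple


def discover_communities(vertices: Iterable[str],
                         edges: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each vertex to a community identifier.

    Stand-in algorithm: connected components. The identifier of a community
    is the smallest vertex key that the community contains.
    """
    parent: Dict[str, str] = {v: v for v in vertices}

    def find(v: str) -> str:
        # Path halving keeps the trees shallow without changing any root.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        # Tolerate edges whose endpoints are missing from the vertex table.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root under the smaller one, so the root of
            # every tree is always the minimum key of its component.
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb
    return {v: find(v) for v in parent}


# Hypothetical edge table: each edge links a user to a phone or a device.
EDGES: List[Tuple[str, str]] = [
    ("user:u1", "phone:p1"),
    ("user:u2", "phone:p1"),
    ("user:u2", "device:d2"),
    ("user:u3", "device:d2"),
    ("user:u4", "phone:p4"),
]
VERTICES = sorted({v for edge in EDGES for v in edge})
COMMUNITY_OF = discover_communities(VERTICES, EDGES)
```

Connected components is the coarsest reasonable choice: any shared telephone number or device merges two accounts into one community. A production system would likely prefer an algorithm that tolerates incidental sharing, but that choice is outside what this passage specifies.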
The apparatus of this embodiment may be configured to execute the technical solutions of the method embodiments shown in fig. 1 or fig. 2, and the implementation principles thereof are similar and will not be described herein again.
The broker identification apparatus provided in this embodiment obtains user posting logs within a preset time and the entity information of each log, and integrates user entity information of multiple dimensions, such as user identifiers, telephone numbers, and the electronic devices used for posting. It models a user relationship network by using the entity information (vertex table) and the association relationships between user identifiers and other entity information (edge table), so that data of multiple dimensions can be analyzed together. Finally, it divides the user relationship network by using a community discovery algorithm to obtain a community discovery result, and identifies brokers among the publishing users of all logs according to preset rules and the community discovery result. In this way, brokers who publish house listings through multiple accounts held by one person can be accurately discovered, which improves identification accuracy and identification efficiency.
Fig. 5 is a schematic structural diagram of another identification apparatus for a broker according to an embodiment of the present invention. As shown in fig. 5, on the basis of the apparatus shown in fig. 4, the identification module 14 of the apparatus according to this embodiment further includes a determining unit 141 and a judging unit 142. The determining unit 141 is configured to determine a target community meeting preset conditions according to the community discovery result. The judging unit 142 is configured to determine that the users publishing property information in the target community are brokers when the number of users publishing property information in the target community is greater than N, where N is a preset positive integer.
Further, the determining unit 141 is configured to: count the number of users belonging to the same community, wherein each user has a community identifier and users with the same community identifier belong to the same community, and determine a community whose number of users is greater than a first preset threshold value as a target community.
Alternatively, the determining unit 141 is configured to: count the sum of the number of house listings published by all users belonging to the same community, and determine a community for which this sum is greater than a second preset threshold value as the target community.
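A minimal sketch of the two selection criteria and of the subsequent comparison against N is given below. The per-user community identifiers, the listing counts, the threshold values, and all function names are hypothetical; the source only states that the two thresholds and N are preset. Treating "a user publishing property information" as "a user with at least one listing" is also an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_users(community_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Group user identifiers by their community identifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for user, community in community_of.items():
        groups[community].append(user)
    return dict(groups)


def targets_by_user_count(community_of: Dict[str, str],
                          first_threshold: int) -> Set[str]:
    """First criterion: communities with more users than the first threshold."""
    return {c for c, users in group_users(community_of).items()
            if len(users) > first_threshold}


def targets_by_listing_sum(community_of: Dict[str, str],
                           listings: Dict[str, int],
                           second_threshold: int) -> Set[str]:
    """Second criterion: communities whose total listing count exceeds the second threshold."""
    return {c for c, users in group_users(community_of).items()
            if sum(listings.get(u, 0) for u in users) > second_threshold}


def identify_brokers(community_of: Dict[str, str],
                     listings: Dict[str, int],
                     targets: Set[str],
                     n: int) -> Set[str]:
    """Return the publishing users of each target community that has more than n publishers."""
    brokers: Set[str] = set()
    for community, users in group_users(community_of).items():
        if community not in targets:
            continue
        publishers = [u for u in users if listings.get(u, 0) > 0]
        if len(publishers) > n:
            brokers.update(publishers)
    return brokers


# Hypothetical inputs: community identifier and listing count per user.
COMMUNITY_OF = {"u1": "c1", "u2": "c1", "u3": "c1", "u4": "c2", "u5": "c2"}
LISTINGS = {"u1": 5, "u2": 7, "u3": 0, "u4": 1, "u5": 1}

TARGETS = targets_by_user_count(COMMUNITY_OF, first_threshold=2)
BROKERS = identify_brokers(COMMUNITY_OF, LISTINGS, TARGETS, n=1)
```

With these sample values both criteria select community `c1`, and because two of its users have published listings (more than N = 1), those two users are flagged while the non-publishing user `u3` is not.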
The apparatus of this embodiment may be configured to execute the technical solutions of the method embodiments shown in fig. 1 or fig. 2, and the implementation principles thereof are similar and will not be described herein again.
The broker identification apparatus provided by this embodiment obtains user posting logs within a preset time and the entity information of each log, and determines a vertex table and an edge table according to the entity information of all logs. With the vertex table and the edge table as input, it constructs a user relationship network through Spark GraphX, and divides the user relationship network by running a community discovery algorithm on Spark GraphX to obtain a community discovery result. Finally, it determines a target community meeting a preset condition, and if the number of users publishing property information in the target community is greater than N, the users publishing property information in the target community are determined to be brokers. Because user entity information of multiple dimensions, such as user identifiers, telephone numbers, and the electronic devices used for posting, is integrated and the user relationship network is modeled with the vertex table and the edge table, data of multiple dimensions can be analyzed together. Dividing the user relationship network with the community discovery algorithm yields the community discovery result, from which the target community meeting the preset condition is determined and broker users are identified. In this way, brokers who publish house listings through multiple accounts held by one person can be accurately discovered, which improves identification accuracy and identification efficiency.
In the embodiment of the present invention, the identification apparatus of the broker may be divided into function modules according to the above method, for example, each function module may be divided according to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiments of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device according to this embodiment may include: a memory 21 and a processor 22, wherein
the memory 21 is configured to store program instructions, and may be a flash memory.
The processor 22 is configured to invoke and execute the program instructions in the memory to implement the steps in the broker identification method shown in fig. 1 or fig. 2. Reference may be made to the description of the preceding method embodiments.
Optionally, the memory 21 may be independent, or the memory 21 may be integrated with the processor 22.
Embodiments of the present invention further provide a readable storage medium, in which a computer program is stored, and when the computer program is executed by at least one processor of an identification device of a broker, the identification device of the broker performs the identification method of the broker in the above-described method embodiments.
An embodiment of the present invention also provides a program product including a computer program, where the computer program is stored in a readable storage medium. At least one processor of the identification device of the broker may read the computer program from the readable storage medium, and execution of the computer program by the at least one processor causes the identification device of the broker to implement the identification method of the broker in the above-described method embodiments.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present invention, and are not intended to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (12)
1. A broker identification method, comprising:
acquiring user posting logs within a preset time and entity information of each log;
constructing a user relationship network according to the entity information of all logs, wherein the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and other entity information;
dividing the user relationship network by using a community discovery algorithm to obtain a community discovery result, wherein the community discovery result is a community identifier of each vertex in the user relationship network; and
identifying brokers among the publishing users of all logs according to preset rules and the community discovery result.
2. The method of claim 1, wherein constructing the user relationship network according to the entity information of all logs comprises:
determining a vertex table and an edge table according to the entity information of all logs, and storing the vertex table and the edge table into the HDFS;
and taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into a memory in a graph form.
3. The method according to claim 1 or 2, wherein the dividing the user relationship network by using the community discovery algorithm to obtain the community discovery result comprises:
and taking the user relationship network as input, and running a community discovery algorithm on Spark GraphX to obtain a community discovery result.
4. The method according to claim 1 or 2, wherein the identifying brokers among the publishing users of all logs according to preset rules and the community discovery result comprises:
determining a target community meeting preset conditions according to a community discovery result;
if the number of the users publishing the property information in the target community is larger than N, the users publishing the property information in the target community are judged to be the brokers, and N is a preset positive integer.
5. The method according to claim 4, wherein the determining a target community meeting a preset condition according to the community discovery result comprises:
counting the number of users belonging to the same community, wherein each user has a community identifier, and users with the same community identifier belong to the same community;
determining a community whose number of users is greater than a first preset threshold value as the target community;
or,
counting the sum of the number of house listings published by all users belonging to the same community;
determining a community for which the sum of the number of house listings published by all of its users is greater than a second preset threshold value as the target community.
6. An identification apparatus for a broker, comprising:
an acquisition module, configured to acquire user posting logs within a preset time and entity information of each log;
a construction module, configured to construct a user relationship network according to the entity information of all logs, wherein the user relationship network is composed of a vertex table and an edge table, the vertex table is a set of vertices, the edge table is a set of edges, each piece of entity information is a vertex, and each edge is an association relationship between a user identifier and other entity information;
a dividing module, configured to divide the user relationship network by using a community discovery algorithm to obtain a community discovery result, wherein the community discovery result is a community identifier of each vertex in the user relationship network; and
an identification module, configured to identify brokers among the publishing users of all logs according to preset rules and the community discovery result.
7. The apparatus of claim 6, wherein the construction module is configured to:
determining a vertex table and an edge table according to the entity information of all logs, and storing the vertex table and the edge table into the HDFS;
and taking the vertex table and the edge table as input, constructing a user relationship network through Spark GraphX, and loading the user relationship network into a memory in a graph form.
8. The apparatus of claim 6 or 7, wherein the dividing module is configured to:
and taking the user relationship network as input, and running a community discovery algorithm on Spark GraphX to obtain a community discovery result.
9. The apparatus of claim 6 or 7, wherein the identification module comprises:
a determining unit, configured to determine a target community meeting preset conditions according to the community discovery result; and
a judging unit, configured to determine that the users publishing property information in the target community are brokers when the number of users publishing property information in the target community is greater than N, wherein N is a preset positive integer.
10. The apparatus of claim 9, wherein the determining unit is configured to:
counting the number of users belonging to the same community, wherein each user has a community identifier, and users with the same community identifier belong to the same community;
determining a community whose number of users is greater than a first preset threshold value as the target community;
or,
counting the sum of the number of house listings published by all users belonging to the same community;
determining a community for which the sum of the number of house listings published by all of its users is greater than a second preset threshold value as the target community.
11. An electronic device, comprising:
a memory for storing program instructions;
a processor for invoking and executing program instructions in the memory to implement the broker identification method of any of claims 1-5.
12. A readable storage medium having stored therein a computer program which, when executed by at least one processor of an identification device of a broker, causes the identification device of the broker to perform the broker identification method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711478995.2A CN108304482A (en) | 2017-12-29 | 2017-12-29 | The recognition methods and device of broker, electronic equipment and readable storage medium storing program for executing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108304482A (en) | 2018-07-20 |
Family
ID=62868233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711478995.2A (Pending; published as CN108304482A) | The recognition methods and device of broker, electronic equipment and readable storage medium storing program for executing | 2017-12-29 | 2017-12-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108304482A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103731284A (en) * | 2012-10-11 | 2014-04-16 | 腾讯科技(深圳)有限公司 | Method and system for correlating a plurality of network accounts |
CN106874931A (en) * | 2016-12-30 | 2017-06-20 | 东软集团股份有限公司 | User portrait grouping method and device |
CN106960143A (en) * | 2017-03-23 | 2017-07-18 | 网易(杭州)网络有限公司 | The recognition methods of user account and device, storage medium, electronic equipment |
CN107404408A (en) * | 2017-08-30 | 2017-11-28 | 北京邮电大学 | A kind of virtual identity association recognition methods and device |
CN107438050A (en) * | 2016-05-26 | 2017-12-05 | 北京京东尚科信息技术有限公司 | Identify the method and system of the potential malicious user of website |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633381A (en) * | 2018-12-25 | 2019-12-31 | 北京时光荏苒科技有限公司 | Method and device for identifying false house source, storage medium and electronic equipment |
CN110633381B (en) * | 2018-12-25 | 2023-04-07 | 北京时光荏苒科技有限公司 | Method and device for identifying false house source, storage medium and electronic equipment |
CN110222484A (en) * | 2019-04-28 | 2019-09-10 | 五八有限公司 | A kind of method for identifying ID, device, electronic equipment and storage medium |
CN110222484B (en) * | 2019-04-28 | 2023-05-23 | 五八有限公司 | User identity recognition method and device, electronic equipment and storage medium |
CN112101390A (en) * | 2019-05-29 | 2020-12-18 | 腾讯科技(深圳)有限公司 | Attribute information determination method, attribute information determination device and electronic equipment |
CN110990727A (en) * | 2019-11-01 | 2020-04-10 | 贝壳技术有限公司 | Broker information display method, device, storage medium and equipment |
CN110990727B (en) * | 2019-11-01 | 2024-05-10 | 贝壳技术有限公司 | Broker information display method, device, storage medium and equipment |
CN111209512A (en) * | 2020-01-03 | 2020-05-29 | 北京同邦卓益科技有限公司 | User identification method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180720 |