CN111737320B - Group user behavior baseline establishment method and device and computer equipment - Google Patents
- Publication number
- CN111737320B (application CN202010621812.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- group
- behavior
- establishing
- baseline
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Databases & Information Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a method, a device and computer equipment for establishing a group user behavior baseline. With the method provided by the embodiments of the application, multiple group user behavior baselines for different types of users can be established quickly.
Description
Technical Field
The present application relates to the field of data mining, and in particular, to a method, an apparatus, and a computer device for establishing a group user behavior baseline.
Background
With the rapid development of network application technology, users' network behaviors have become increasingly diverse, and identifying network user behaviors and discovering abnormal behavior events has become increasingly important for network security. At present, whether a user has an abnormal behavior event is mainly judged by first establishing an individual behavior baseline for that user and then comparing the user's behavior against it. However, for ease of management, a single group user behavior baseline is generally applied to everyone in the same department or group, even though each person's working mode and habits differ, so a mismatch between the group user behavior baseline and an individual may lead to mismanagement. Solving this problem requires a method for establishing behavior baselines for different types of people, but there is not yet a good way to establish such baselines quickly.
Disclosure of Invention
The main purpose of the application is to provide a method, a device, computer equipment and a storage medium for establishing a group user behavior baseline, so as to solve the problem in the prior art that behavior baselines for different types of groups cannot be established quickly.
In order to achieve the above object, the present application provides a method for establishing a group user behavior baseline, including:
acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is constructed based on specified information of the user and log history data of the user within a specified time period;
performing clustering calculation on all the user portraits to obtain user groups of different categories;
establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category.
Further, the method for acquiring the individual behavior baseline corresponding to the user portrait comprises the following steps:
acquiring log history data of the user and specified information of the user;
obtaining the date corresponding to each piece of data in the log history data;
classifying the data whose date is a working day to obtain working-day log history data, and classifying the data whose date is a holiday to obtain holiday log history data;
establishing a working-day individual behavior baseline of the user according to the working-day log history data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log history data and the specified information of the user.
Further, the step of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category further comprises:
removing abnormal data from the individual behavior baselines of different users in the user group of the same category by using an isolation forest algorithm;
establishing the group user behavior baseline by using each individual behavior baseline after the abnormal data are removed.
Further, after the step of establishing the corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category, the method further comprises:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified characteristic value from the current behavior log, wherein the specified characteristic value is a characteristic value that needs to be reflected in the group user behavior baseline, and determining the user group category of the first user according to the user portrait of the first user;
comparing the specified characteristic value with the reference characteristic value corresponding to the specified characteristic in a first group user behavior baseline, wherein the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
if the comparison result meets the condition for triggering a risk early warning, sending out alarm information.
Further, after the step of sending out alarm information if the comparison result meets the condition for triggering a risk early warning, the method further comprises:
judging whether the specified characteristic value reaches a preset abnormal data threshold;
if not, labeling the specified characteristic on the individual behavior baseline corresponding to the first user.
Further, after the step of labeling the specified characteristic on the individual behavior baseline corresponding to the first user, the method further comprises:
judging whether the number of times features have been labeled on the individual behavior baseline corresponding to the first user reaches a preset number;
if so, reconstructing the individual behavior baseline corresponding to the first user.
Further, in one embodiment, after the step of establishing the group user behavior baseline based on the individual behavior baselines of the different users in the same class of user group, the method further includes:
And associating the users with the categories by using association rules.
The application also provides a device for establishing a group user behavior baseline, comprising:
an acquisition unit, configured to acquire a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is constructed based on specified information of the user and log history data of the user within a specified time period;
a clustering unit, configured to perform clustering calculation on all the user portraits to obtain user groups of different categories;
an establishing unit, configured to establish a corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category.
The application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any of claims 1 to 7 when the computer program is executed.
The application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of claims 1 to 7.
When the establishment method is implemented, user portraits are built first, the users are then classified by their user portraits, and finally a group user behavior baseline is established for each class based on the individual behavior baselines of its users. With the method provided by the embodiments of the application, multiple group user behavior baselines for different types of users can be established quickly.
Drawings
FIG. 1 is a flow chart of a method for establishing a group user behavior baseline according to an embodiment of the application;
FIG. 2 is a block diagram schematically illustrating a device for establishing a group user behavior baseline according to an embodiment of the present application;
Fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, an embodiment of the present application provides a method for establishing a group user behavior baseline, including:
S1, obtaining a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on the specified information of the user and log history data corresponding to the user in a specified time period;
S2, carrying out clustering calculation on all the user portraits to obtain user groups of different categories;
S3, establishing corresponding group user behavior baselines based on individual behavior baselines of different users in the user group of the same category.
In this embodiment, the server acquires personal information of each user and establishes a user portrait from each user's log history data and specified information, tagging the log history data so that the server can quickly read the information in the user portrait through the tags. The server establishes each user's individual behavior baseline, performs cluster analysis on the user portraits to obtain groups of different categories, and then establishes a group user behavior baseline for each category.
As described in step S1, the server obtains the specified information of each user, which mainly includes the user's gender, age, department, post, education background, etc. A user portrait is then established from each user's log history data combined with the specified information. The labels of the user portrait include: ① user type (four types: salespeople, internal staff, car accounts, and others); ② workday activity = (number of days on which the specified system was accessed in the past 90 workdays) / total workdays; ③ holiday activity = (number of holiday days in the past 90 days, including weekends and statutory holidays, on which the specified system was accessed) / total holiday days; ④ diligence index = total overtime / total days; ⑤ whether there is abnormal behavior, matched against the results of other anomaly detection models. The specified system may be, for example, PNBS (safe production insurance business new core system). The user type, workday activity, holiday activity, diligence index and similar labels are obtained from the log history data.
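The numeric portrait labels ② to ④ are simple ratios over daily records. The Python sketch below computes them under assumptions of our own: each day is summarized by a hypothetical `DayRecord` carrying a holiday flag, a flag for whether the specified system was accessed, and an overtime figure assumed to be in hours; selecting the 90-day window is left to the caller.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DayRecord:
    """Hypothetical per-day summary derived from a user's log history data."""
    day: date
    is_holiday: bool       # weekend or statutory holiday
    accessed_system: bool  # whether the specified system (e.g. PNBS) was accessed
    overtime_hours: float  # assumed unit for overtime


def portrait_labels(records: list[DayRecord]) -> dict[str, float]:
    """Compute workday activity, holiday activity and diligence index."""
    workdays = [r for r in records if not r.is_holiday]
    holidays = [r for r in records if r.is_holiday]

    def activity(days: list[DayRecord]) -> float:
        # Share of days on which the specified system was accessed.
        return sum(r.accessed_system for r in days) / len(days) if days else 0.0

    total_days = len(records)
    diligence = (sum(r.overtime_hours for r in records) / total_days
                 if total_days else 0.0)
    return {
        "workday_activity": activity(workdays),
        "holiday_activity": activity(holidays),
        "diligence_index": diligence,  # total overtime / total days
    }
```

Categorical labels such as user type and the abnormal-behavior flag would sit alongside these numbers in the portrait; they are omitted here to keep the sketch small.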
The individual behavior baseline of a user is obtained by extracting specified features from the log history data. For example, when an individual behavior baseline is established from the log history data of the past 90 days, the extracted specified features include: ① total accesses per day; ② number of SESSION_IDs per day; ③ number of IPs per day; ④ number of price inquiries per day; ⑤ number of searches per day; ⑥ number of insurance tracking operations per day; ⑦ number of HTTP access failures per day. For each feature, the mean, standard deviation, Q1, Q3, maximum and minimum are recorded. When the mean and standard deviation are calculated, the interquartile range method is used to remove noise data so that it does not distort the result. Q1 and Q3 are quartiles: the first quartile (Q1), also called the "lower quartile", is the value at the 25% position after all values in the sample are sorted in ascending order; the second quartile (Q2), also called the "median", is the value at the 50% position; and the third quartile (Q3), also called the "upper quartile", is the value at the 75% position. The difference between the third quartile and the first quartile is called the interquartile range (InterQuartile Range, IQR).
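A minimal Python sketch of one feature's summary, assuming the daily counts have already been extracted from the logs. It drops values outside the conventional Tukey fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR] before computing the mean and the population standard deviation; the 1.5 factor and the choice of population (rather than sample) deviation are our assumptions, since the text only names the interquartile range method. Q1, Q3, maximum and minimum are reported on the raw counts, which is one reading of the text. `statistics.quantiles` needs at least two data points.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(values) < 4:
        return list(values)  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def feature_summary(daily_counts):
    """Summarize one daily-count feature, e.g. total accesses per day."""
    q1, _, q3 = statistics.quantiles(daily_counts, n=4, method="inclusive")
    clean = iqr_filter(daily_counts)  # noise removed only for mean and std
    return {
        "mean": statistics.mean(clean),
        "std": statistics.pstdev(clean),
        "q1": q1,
        "q3": q3,
        "max": max(daily_counts),
        "min": min(daily_counts),
    }
```

With daily counts `[3, 4, 4, 5, 5, 5, 6, 6, 40]`, Q1 is 4 and Q3 is 6, so the fences are [1, 9] and the single day with 40 accesses is excluded from the mean.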
After the user portraits of all users are obtained, they are clustered. First, the optimal number of clusters is determined by the elbow method; then the user portraits are clustered with the K-means algorithm: K points are selected as initial cluster centers, each object is assigned to its nearest center to form K clusters, the center of each cluster is recalculated, and these steps are repeated until the clusters no longer change or a specified number of iterations is reached. Finally, multiple user groups of different categories are obtained. The elbow method is a common way of selecting the optimal number of clusters for K-means and is not described in detail here.
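The clustering step can be sketched in plain Python as below. This is an illustration, not the patent's implementation: portrait labels are assumed to be already encoded as numeric vectors, the K initial centers are picked by deterministic farthest-point selection (one of many ways to choose them), and the hypothetical `pick_elbow` uses a largest-bend heuristic on the SSE curve as a stand-in for reading the elbow off a plot.

```python
import math


def _init_centers(points, k):
    # Farthest-point selection: start from the first point, then repeatedly
    # take the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Return (labels, centers, sse) for k clusters."""
    centers = _init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # clusters no longer change
        labels = new_labels
        # Recompute each center as the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse


def elbow_curve(points, k_max):
    """SSE for k = 1..k_max; the elbow is where the drop in SSE levels off."""
    return {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}


def pick_elbow(sse):
    """Heuristic: the k with the largest bend (second difference) in SSE."""
    ks = sorted(sse)
    best_k, best_bend = ks[0], float("-inf")
    for k in ks[1:-1]:
        bend = (sse[k - 1] - sse[k]) - (sse[k] - sse[k + 1])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

For three well-separated groups of portrait vectors, `pick_elbow(elbow_curve(points, 5))` returns 3. Real portraits mix scales (ratios, counts, categories), so features would normally be standardized before clustering.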
As described in step S3, the group user behavior baseline of a user group is established from the individual behavior baselines of the users in that group. This yields a behavior baseline suited to the group, so the judgment baseline is more moderate in subsequent use and easier to roll out. In the application, the group user behavior baseline has the same features as the individual behavior baselines; only the corresponding values differ. In a specific embodiment, each feature value in the group user behavior baseline may be, for example, the average of that feature's values across the individual behavior baselines in the group.
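Averaging, the option the text gives, can be sketched as follows. The layout is an assumption: each individual baseline is a dict from feature name to value, and cluster labels map each user to a group id.

```python
from collections import defaultdict
from statistics import mean


def group_baselines(individual, labels):
    """individual: {user_id: {feature: value}}; labels: {user_id: group_id}."""
    by_group = defaultdict(list)
    for user, baseline in individual.items():
        by_group[labels[user]].append(baseline)
    result = {}
    for group, baselines in by_group.items():
        # The group baseline keeps the same features as the individual ones.
        features = baselines[0].keys()
        result[group] = {f: mean(b[f] for b in baselines) for f in features}
    return result
```

Swapping `mean` for `statistics.median` would make the group baseline less sensitive to a single unusual member; the text leaves the aggregation open ("or the like").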
In one embodiment, the method for obtaining the individual behavior baseline corresponding to the user portrait includes:
acquiring log history data of the user;
obtaining the date corresponding to each piece of data in the log history data;
classifying the data whose date is a working day to obtain working-day log history data, and classifying the data whose date is a holiday to obtain holiday log history data;
establishing a working-day individual behavior baseline of the user according to the working-day log history data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log history data and the specified information of the user.
In this embodiment, because workday and holiday behavior baselines differ, they need to be analyzed separately. Whether a given date is a workday or a holiday can be determined by calling the Baidu interface http://www.easybots.cn/api/holiday.php from Java. Further, when group user behavior baselines are established, a workday group user behavior baseline, a holiday group user behavior baseline and so on can be established as needed: the workday individual behavior baselines are selected when establishing the workday group user behavior baseline, and the holiday individual behavior baselines are selected when establishing the holiday group user behavior baseline.
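As an offline alternative to the HTTP interface mentioned above, the workday/holiday split can be sketched with a locally maintained calendar. In this Python sketch the statutory-holiday and make-up-workday sets are inputs the caller must supply (Chinese calendars designate some weekend days as make-up workdays); the function names are hypothetical.

```python
from datetime import date


def is_holiday(d: date, statutory_holidays: set, make_up_workdays: set) -> bool:
    """Weekends and statutory holidays are holidays, except make-up workdays."""
    if d in make_up_workdays:
        return False
    return d.weekday() >= 5 or d in statutory_holidays


def split_logs(logs, statutory_holidays, make_up_workdays=frozenset()):
    """logs: iterable of (date, record); returns (workday_logs, holiday_logs)."""
    workday, holiday = [], []
    for d, record in logs:
        target = holiday if is_holiday(d, statutory_holidays, make_up_workdays) else workday
        target.append((d, record))
    return workday, holiday
```

Each half can then feed its own individual behavior baseline, and the workday and holiday group baselines are built from the matching halves.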
In one embodiment, step S3 of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same user group further includes:
S301, removing abnormal data from the individual behavior baselines of different users in the user group of the same category by using an isolation forest algorithm;
S302, establishing the group user behavior baseline by using each individual behavior baseline after the abnormal data are removed.
In this embodiment, the isolation forest algorithm (iForest) is commonly used to mine abnormal data, for example for attack detection and traffic anomaly analysis in network security and for fraud detection in financial institutions. The algorithm has low memory requirements, high processing speed and linear time complexity; it handles high-dimensional data and big data well and can also be used for online anomaly detection. Abnormal data here means interference data: for example, the number of operations a user performs on a certain day may be unusually large or unusually small, and such obviously abnormal data would distort the analysis, so the isolation forest algorithm can be used to remove it when the mean and standard deviation are calculated. For example, a user normally logs in to webpage A 1-2 times a day, but on one day has to log in repeatedly for some reason, 50 times in total; those 50 logins are abnormal data. A group user behavior baseline established from individual behavior baselines with abnormal data removed is more accurate and more practical.
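To make the removal step concrete, here is a self-contained, one-dimensional isolation forest in plain Python, applied per feature to a user's daily counts. It is a sketch, not the patent's code: the 0.6 score threshold is a hypothetical choice, and the defaults of 100 trees and a subsample size of 256 follow values commonly used with iForest. A production system would more likely use a library implementation such as scikit-learn's `IsolationForest`.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def _c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def _build(data, depth, limit, rng):
    # Split at a random value between min and max until a point is isolated,
    # all remaining values are equal, or the height limit is reached.
    if depth >= limit or len(data) <= 1 or min(data) == max(data):
        return ("leaf", len(data))
    split = rng.uniform(min(data), max(data))
    left = [v for v in data if v < split]
    right = [v for v in data if v >= split]
    return ("node", split, _build(left, depth + 1, limit, rng),
            _build(right, depth + 1, limit, rng))


def _path(x, node, depth=0):
    if node[0] == "leaf":
        return depth + _c(node[1])  # add expected depth of the unsplit remainder
    _, split, left, right = node
    return _path(x, left if x < split else right, depth + 1)


def isolation_scores(values, n_trees=100, sample_size=256, seed=0):
    """Anomaly score per value; close to 1 means the value is easily isolated."""
    if len(values) < 2:
        return [0.0] * len(values)
    rng = random.Random(seed)
    psi = min(sample_size, len(values))
    limit = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(values, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = _c(psi)
    return [2 ** (-(sum(_path(v, t) for t in trees) / n_trees) / norm) for v in values]


def remove_anomalies(values, threshold=0.6):
    """Keep only values whose anomaly score does not exceed the threshold."""
    scores = isolation_scores(values)
    return [v for v, s in zip(values, scores) if s <= threshold]
```

With thirty days of 1 or 2 logins plus one day of 50, the day with 50 logins is isolated after about one split and scores near 0.9, while the ordinary days score below 0.5, so only the 50 is removed before the mean and standard deviation are computed.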
In one embodiment, after step S3 of establishing the corresponding group user behavior baseline based on the individual behavior baselines of different users in the same user group, the method further includes:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified characteristic value from the current behavior log, wherein the specified characteristic value is a characteristic value that needs to be reflected in the group user behavior baseline, and determining the user group category of the first user according to the user portrait of the first user;
comparing the specified characteristic value with the reference characteristic value corresponding to the specified characteristic in a first group user behavior baseline, wherein the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
if the comparison result meets the condition for triggering a risk early warning, sending out alarm information.
In this embodiment, the individual behavior baseline and the group user behavior baseline use the same period, such as a daily, weekly or quarterly behavior baseline; the current period is the period in progress, which is generally not yet complete. The comparison works as follows: suppose the number of logins to website A within a period is the specified characteristic and the corresponding reference characteristic value is 5. If no risk early warning is raised while the specified characteristic value is at most 7, the condition for triggering the risk early warning is that the specified characteristic value is greater than 7, that is, 8 or more. In another embodiment, Q1 + 1.5 × (Q3 - Q1) is used as the trigger threshold: when the specified characteristic value > Q1 + 1.5 × (Q3 - Q1), the feature is considered to deviate from the individual behavior baseline, a risk alarm is triggered automatically, and an instruction is sent to the server, which then identifies and judges the feature. Q1 and Q3 are the first and third quartiles described above.
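Both trigger rules can be sketched as small predicates. This Python sketch implements the threshold exactly as stated in the text, Q1 + k·(Q3 - Q1); note that the conventional Tukey upper fence is Q3 + 1.5·IQR, which would be obtained by replacing `q1` with `q3` as the anchor. The baseline layout (`{"q1": ..., "q3": ...}` per feature) and the function names are hypothetical.

```python
def fixed_threshold_alert(value: float, max_normal: float = 7) -> bool:
    """Fixed rule from the example: no warning while value <= max_normal."""
    return value > max_normal


def quartile_alert(value: float, q1: float, q3: float, k: float = 1.5) -> bool:
    """Trigger when value > Q1 + k * (Q3 - Q1), as stated in this embodiment."""
    return value > q1 + k * (q3 - q1)


def deviating_features(current: dict, baseline: dict, k: float = 1.5) -> list:
    """Return the current-period features that exceed their quartile threshold.

    current: {feature: value}; baseline: {feature: {"q1": ..., "q3": ...}}.
    """
    return [f for f, v in current.items()
            if f in baseline and quartile_alert(v, baseline[f]["q1"], baseline[f]["q3"], k)]
```

For a feature with Q1 = 1 and Q3 = 4 the threshold is 5.5, so 8 logins in the current period trigger an alert while 5 do not.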
In one embodiment, after the step of sending out alarm information if the comparison result meets the condition for triggering a risk early warning, the method further includes:
judging whether the specified characteristic value reaches a preset abnormal data threshold;
if not, labeling the specified characteristic on the individual behavior baseline corresponding to the first user.
In this embodiment, after the risk alarm is triggered, the server judges whether the specified characteristic value reaches the preset abnormal data threshold. If it does, the value is treated as abnormal data and removed from the individual behavior baseline; if not, the feature is labeled, because the first user's behavior habits may have changed. For example, the first user previously logged in to webpage A 1-4 times a day but logged in 8 times today; the warning is triggered but the abnormal data threshold is not reached, so the feature is labeled to facilitate follow-up tracking of the data.
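The post-alert decision can be sketched as a single triage function. In this Python sketch "reaches" is read as greater than or equal to, labels are kept in a `collections.Counter` keyed by feature name, and the function name and threshold value in the usage line are hypothetical.

```python
from collections import Counter


def handle_alert(feature: str, value: float, abnormal_threshold: float,
                 labels: Counter) -> str:
    """Post-alert triage for one feature of the first user's current period."""
    if value >= abnormal_threshold:
        # Reaches the abnormal data threshold: treat the value as abnormal data
        # and keep it out of the individual behavior baseline.
        return "discard"
    # Alert fired but not abnormal: the user's habits may be changing, so label it.
    labels[feature] += 1
    return "label"
```

With an abnormal data threshold of 20, eight logins to webpage A are labeled, while fifty are discarded without touching the label count.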
In one embodiment, after the step of labeling the specified characteristic on the individual behavior baseline corresponding to the first user, the method further includes:
judging whether the number of times features have been labeled on the individual behavior baseline corresponding to the first user reaches a preset number;
if so, reconstructing the individual behavior baseline corresponding to the first user.
In this embodiment, when the number of feature labels reaches a preset threshold, the personal behavior of the first user has changed; the labeled count is the sum of the label counts of all labeled features. For example, the first user previously logged in to webpage A 1-4 times a day but logged in 7 times today; the warning is triggered but the abnormal data threshold is not reached, so the feature is labeled once. If the first user then logs in to webpage A 7-10 times a day on each of the next N days, the webpage-A login feature is labeled N+1 times in total; other features may also be labeled M times during the same period, so the labeled count is N+1+M (M and N are positive integers). When the labeled count reaches the preset threshold, the user's personal behavior is determined to have changed, and the individual behavior baseline of the first user needs to be re-established.
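The counting rule can be sketched as a small tracker. This Python sketch is hypothetical: it sums label counts over all features (N + 1 + M in the example above), reports when the preset count is reached, and is cleared once the individual behavior baseline has been rebuilt.

```python
from collections import Counter


class LabelTracker:
    """Hypothetical helper: label counts for one user's individual baseline."""

    def __init__(self, preset_count: int):
        self.preset_count = preset_count
        self.counts = Counter()

    def label(self, feature: str) -> bool:
        """Record one label; return True when the baseline should be rebuilt."""
        self.counts[feature] += 1
        return self.total() >= self.preset_count

    def total(self) -> int:
        # Labeled count is the sum over all labeled features.
        return sum(self.counts.values())

    def reset(self) -> None:
        # Call after the individual behavior baseline has been rebuilt.
        self.counts.clear()
```

With a preset count of 5, three labels on the webpage-A feature (N = 2) plus one on another feature (M = 1) give a total of 4 and no rebuild; the next label brings the total to 5 and signals that the baseline should be reconstructed.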
In one embodiment, after the step S3 of establishing the corresponding group user behavior baseline based on the individual behavior baselines of the different users in the same user group, the method further includes:
And associating the users with the categories by using association rules.
In this embodiment, association rules are an important topic in data mining, used to mine valuable correlations between data items from large amounts of data. Typical questions answered by association rules are "If a consumer buys product A, how likely is he to buy product B?" and "If he buys products C and D, what other products will he buy?" The same data can be observed along different dimensions, such as date, region, channel, product and user, and a 3D model is built from these dimensions. More than 80 classes are obtained by the clustering in step S2, and the main feature of each class is roughly known. For example, after an examination: class A's main feature is "poor" in subject a; class B's main feature is "excellent" in subject b; class C is "subject a taught by teacher c"; and class D is "subject b taught by teacher d". Previously, these classes had to be related manually before the relation between class A and class C could be found, and manual work is prone to repeated deviations. Through association rules, it can be found that most students taught subject a by teacher c "did not pass", so mining analysis reveals that teacher c has an obvious problem in teaching subject a that needs correcting, which provides a strong technical basis for urging teacher c to improve the teaching effect. Because users are related to one another, the reasons for high or low performance, efficiency and the like can be analyzed through these relations, providing a reference for solving problems. For example, two colleagues in an industry may be upstream and downstream of each other in business logic; the association rules relate them, and if the downstream work is inefficient, the upstream progress may be affected, and so on.
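The support and confidence behind a rule such as "taught subject a by teacher c → did not pass" can be sketched in a few lines of Python. The records below are hypothetical toy data encoding the teacher example, and `mine_rules` is a brute-force enumeration of rules with a single-item consequent, not the Apriori algorithm with its candidate pruning.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    with_a = [t for t in transactions if a <= t]
    with_both = [t for t in with_a if c <= t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def mine_rules(transactions, min_support=0.3, min_confidence=0.6, max_len=3):
    """Brute-force search for rules X -> y with a single-item consequent."""
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, max_len + 1):
        for itemset in combinations(items, size):
            for y in itemset:
                x = tuple(i for i in itemset if i != y)
                support, confidence = rule_metrics(transactions, x, [y])
                if support >= min_support and confidence >= min_confidence:
                    rules.append((x, y, support, confidence))
    return rules


# Hypothetical student records: one set of attributes per student.
records = [
    {"teacher_c", "subject_a", "fail"},
    {"teacher_c", "subject_a", "fail"},
    {"teacher_c", "subject_a", "pass"},
    {"teacher_d", "subject_b", "pass"},
    {"teacher_d", "subject_b", "pass"},
]
```

On these records the rule {teacher_c, subject_a} → fail has support 0.4 and confidence 2/3, so it is reported at the default thresholds; brute force is only practical for a handful of items, which is why Apriori-style pruning is used on real data.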
Further, in the application, association relations among the groups of group users can be established through association rules and the associations between user groups analyzed, so that how different user groups should be matched can be mined based on each group's group user behavior baseline. The specific analysis method varies with the industry, the analysis purpose and so on, and is not described in detail here. In the application, the user categories are visualized as graphics and the association relations are connected by colored arrows and the like, which makes them convenient for users to view, analyze and use.
With the method for establishing a group user behavior baseline described above, user portraits are built first, the users are then classified by their user portraits, and a group user behavior baseline is established for each class based on the individual behavior baselines of its users. With the method provided by the embodiments of the application, multiple group user behavior baselines for different types of users can be established quickly.
Referring to fig. 2, the embodiment of the present application further provides a device for establishing a group user behavior baseline, including:
An acquisition unit 10, used for acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is constructed based on the specified information of the user and log history data of the user within a specified time period;
A clustering unit 20, configured to perform a clustering calculation on all the user portraits to obtain user groups of different categories;
The establishing unit 30 is configured to establish a corresponding group user behavior baseline based on individual behavior baselines of different users in the same class of user group.
In one embodiment, the device for establishing a group user behavior baseline further includes:
a log obtaining unit, configured to obtain log history data of the user;
a date acquisition unit, used for acquiring the date corresponding to each piece of data in the log history data;
The classification unit is used for classifying the data whose date is a working day to obtain working-day log history data, and classifying the data whose date is a holiday to obtain holiday log history data;
The individual behavior baseline establishing unit is used for establishing a working-day individual behavior baseline of the user according to the working-day log history data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log history data and the specified information of the user.
In one embodiment, the establishing unit 30 further includes:
an anomaly removal module, configured to remove abnormal data from the individual behavior baselines of the different users in the user group of the same category by using an isolation forest algorithm;
an establishing module, configured to establish the group user behavior baseline from the individual behavior baselines remaining after the abnormal data are removed.
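The application names the isolation forest algorithm but none of its parameters. Below is a minimal from-scratch sketch of the standard isolation forest score, s = 2^(-E[h(x)] / c(psi)), applied to the individual baselines of one user group. The tree count, sub-sample size, the 0.6 score cut-off, the per-feature-mean group baseline, and all function names are assumptions for illustration; a maintained library implementation (for example scikit-learn's `IsolationForest`) would normally be preferred in practice.

```python
import math
import random
from statistics import mean

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points, used to normalise depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random value until points are isolated or max_depth is hit."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:  # identical values cannot be separated on this feature
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c_factor for points left unisolated in that leaf."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Isolation forest scores; values near 1 suggest anomalies."""
    if not points:
        return []
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = max(1, math.ceil(math.log2(psi)))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi) or 1.0
    return [2 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm) for x in points]


def group_baseline_without_anomalies(baselines, features, cutoff=0.6):
    """Drop individual baselines scored at or above cutoff, then average the rest per feature."""
    points = [[b[f] for f in features] for b in baselines]
    kept = [b for b, s in zip(baselines, anomaly_scores(points)) if s < cutoff]
    return kept, {f: mean(b[f] for b in kept) for f in features}


baselines = [{"logins": v} for v in (10, 10, 10, 11, 11, 11, 12, 12, 12, 1000)]
kept, group = group_baseline_without_anomalies(baselines, ["logins"])
print(len(kept), group)
```

Here the baseline with 1000 logins is isolated by the first random split in almost every tree, so its score is high and it is excluded before the group baseline is averaged.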
In one embodiment, the device for establishing a group user behavior baseline further includes:
a first acquisition unit, configured to acquire the current behavior log of a first user for the current period and the user portrait of the first user;
an extraction unit, configured to extract a specified feature value from the current behavior log, where the specified feature value is a feature value that needs to be reflected in the group user behavior baseline, and to determine the user group category of the first user according to the user portrait of the first user;
a comparison unit, configured to compare the specified feature value with the reference feature value corresponding to the specified feature in a first group user behavior baseline, where the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
an alarm unit, configured to send alarm information if the comparison result meets the condition for triggering a risk early warning.
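The application does not define the condition for triggering a risk early warning. The sketch below assumes that the user's group category is the cluster whose centroid is nearest to the current portrait, and that an alarm fires when a specified feature value deviates from the group reference value by more than a relative tolerance (50% by default). `Alert`, `assign_group`, `check_current_log`, and the feature names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Alert:
    user_id: str
    group: str
    feature: str
    observed: float
    reference: float


def assign_group(portrait, centroids):
    """Pick the user group category whose centroid is closest to the user's portrait vector."""
    return min(centroids, key=lambda g: math.dist(portrait, centroids[g]))


def relative_deviation(observed, reference):
    if reference == 0:
        return 0.0 if observed == 0 else math.inf
    return abs(observed - reference) / abs(reference)


def check_current_log(user_id, portrait, feature_values, centroids, group_baselines, tolerance=0.5):
    """Compare current-period feature values with the baseline of the user's group (assumed rule)."""
    group = assign_group(portrait, centroids)
    baseline = group_baselines[group]
    alerts = []
    for feature, observed in feature_values.items():
        if feature not in baseline:
            continue  # only features reflected in the group baseline are compared
        if relative_deviation(observed, baseline[feature]) > tolerance:
            alerts.append(Alert(user_id, group, feature, observed, baseline[feature]))
    return alerts


centroids = {"ops": [0.0, 0.0], "sales": [5.0, 5.0]}
group_baselines = {
    "ops": {"logins": 10.0, "download_mb": 200.0},
    "sales": {"logins": 40.0, "download_mb": 50.0},
}
for alert in check_current_log("u1", [0.1, 0.2], {"logins": 11, "download_mb": 900},
                               centroids, group_baselines):
    print("ALARM:", alert)
```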
In one embodiment, the device for establishing a group user behavior baseline further includes:
a first judging unit, configured to judge whether the specified feature value reaches a preset abnormal data threshold;
a labeling unit, configured to label the specified feature on the individual behavior baseline corresponding to the first user if it does not.
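A minimal sketch of the follow-up check, assuming the preset abnormal data threshold is one absolute value per feature and that the labels placed on an individual baseline are kept in a `Counter`. The function name, the threshold value, and the feature name are hypothetical. As in the text above, a value that reaches the threshold is not labeled, while one that stays below it is.

```python
from collections import Counter


def mark_if_below_threshold(feature, observed, abnormal_threshold, marks):
    """Label the feature on the user's individual baseline unless the value reaches the threshold."""
    if observed >= abnormal_threshold:
        return False  # value reaches the preset abnormal data threshold: no label
    marks[feature] += 1  # value stays below it: label the feature on the individual baseline
    return True


marks = Counter()
mark_if_below_threshold("download_mb", 900, 5000, marks)   # labeled
mark_if_below_threshold("download_mb", 8000, 5000, marks)  # reaches threshold, not labeled
print(marks)  # Counter({'download_mb': 1})
```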
In one embodiment, the device for establishing a group user behavior baseline further includes:
a second judging unit, configured to judge whether the number of times features have been labeled on the individual behavior baseline corresponding to the first user reaches a preset quantity value;
a reconstruction unit, configured to reconstruct the individual behavior baseline corresponding to the first user if it does.
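The application does not say how the individual baseline is reconstructed. The sketch below assumes that labels are counted across all features, that the baseline is rebuilt as the per-feature mean of recent daily values supplied by the caller, and that the label count is reset afterwards. `IndividualBaseline`, its methods, and the rebuild rule are hypothetical.

```python
from collections import Counter
from statistics import mean


class IndividualBaseline:
    """Hypothetical container for one user's baseline values and the labels placed on it."""

    def __init__(self, values, preset_count=3):
        self.values = dict(values)
        self.marks = Counter()
        self.preset_count = preset_count

    def mark(self, feature):
        self.marks[feature] += 1

    def maybe_rebuild(self, recent_days):
        """Rebuild from recent daily feature values once the label count reaches the preset value."""
        if sum(self.marks.values()) < self.preset_count:
            return False
        features = list(self.values)
        self.values = {f: mean(day[f] for day in recent_days) for f in features}
        self.marks.clear()  # assumption: labels are reset after reconstruction
        return True


baseline = IndividualBaseline({"download_mb": 200.0}, preset_count=2)
baseline.mark("download_mb")
print(baseline.maybe_rebuild([]))  # False: only one label so far
baseline.mark("download_mb")
print(baseline.maybe_rebuild([{"download_mb": 800.0}, {"download_mb": 900.0}]))  # True
print(baseline.values)  # {'download_mb': 850.0}
```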
In one embodiment, the device for establishing a group user behavior baseline further includes:
an association unit, configured to associate the users with the categories by using association rules.
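The application does not say which association rule algorithm is used. As one possibility, the sketch below enumerates rules of the form {portrait attributes} → category with at most two attributes on the left, keeps those that meet minimum support and confidence thresholds (a simplified Apriori-style enumeration, not full Apriori), and associates a user with the category of the highest-confidence matching rule. The attribute strings, thresholds, and function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_category_rules(transactions, min_support=0.3, min_confidence=0.8, max_len=2):
    """transactions: one (attribute_set, category) pair per user.
    Returns (antecedent, category, support, confidence) tuples."""
    n = len(transactions)
    antecedent_counts, rule_counts = Counter(), Counter()
    for attrs, category in transactions:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(attrs), size):
                antecedent_counts[combo] += 1
                rule_counts[(combo, category)] += 1
    rules = []
    for (combo, category), count in rule_counts.items():
        support, confidence = count / n, count / antecedent_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(combo), category, support, confidence))
    return rules


def associate(attrs, rules):
    """Category of the best matching rule (highest confidence, then support), or None."""
    matching = [r for r in rules if r[0] <= attrs]
    return max(matching, key=lambda r: (r[3], r[2]))[1] if matching else None


users = [
    ({"dept:finance", "shift:day"}, "A"),
    ({"dept:finance", "shift:day"}, "A"),
    ({"dept:finance", "shift:night"}, "A"),
    ({"dept:it", "shift:night"}, "B"),
    ({"dept:it", "shift:night"}, "B"),
]
rules = mine_category_rules(users)
print(associate({"dept:finance", "shift:night"}, rules))  # A
```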
The units, modules, and the like in the above embodiments are the devices that perform the corresponding steps of the methods in the above embodiments.
The device for establishing a group user behavior baseline first establishes user portraits, classifies the users by their portraits, and then establishes a group user behavior baseline for each class from the individual behavior baselines of the users in that class. With the device provided by the embodiments of the present application, multiple group user behavior baselines for different types of users can be established quickly.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server; its internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for running the operating system and the computer programs stored in the non-volatile storage medium. The database of the computer device is used to store log data, user portraits, behavior baselines, and the like. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements the method for establishing a group user behavior baseline of any one of the embodiments described above.
It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present inventive arrangements and is not intended to limit the computer devices to which the present inventive arrangements are applicable.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method for establishing the group user behavior baseline of any one of the above embodiments.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided by the present application may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that comprises the element.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, or any direct or indirect application thereof in other related technical fields, likewise falls within the scope of protection of the application.
Claims (7)
1. A method for establishing a group user behavior baseline, characterized by comprising the following steps:
Acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on the specified information of the user and log history data corresponding to the user in a specified time period;
performing cluster analysis on all the user portraits to obtain user groups of different categories;
establishing corresponding group user behavior baselines based on individual behavior baselines of different users in the user group of the same category;
the step of acquiring the individual behavior baseline corresponding to the user portrait comprises:
acquiring the log history data of the user and the specified information of the user;
obtaining the date corresponding to each piece of data in the log history data;
classifying the data whose date is a working day to obtain working-day log history data, and classifying the data whose date is a holiday to obtain holiday log history data;
establishing a working-day individual behavior baseline of the user according to the working-day log history data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log history data and the specified information of the user;
the step of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category further comprises:
removing abnormal data from the individual behavior baselines of the different users in the user group of the same category by using an isolation forest algorithm;
establishing the group user behavior baseline from the individual behavior baselines remaining after the abnormal data are removed;
after the step of establishing the corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category, the method further comprises:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified feature value from the current behavior log, wherein the specified feature value is a feature value that needs to be reflected in the group user behavior baseline; and determining the user group category of the first user according to the user portrait of the first user;
comparing the specified feature value with the reference feature value corresponding to the specified feature in a first group user behavior baseline, wherein the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
And if the comparison result meets the condition of triggering risk early warning, sending out alarm information.
2. The method for establishing a group user behavior baseline according to claim 1, further comprising, after the step of sending out the alarm information if the comparison result meets the condition for triggering the risk early warning:
judging whether the specified feature value reaches a preset abnormal data threshold;
if not, labeling the specified feature on the individual behavior baseline corresponding to the first user.
3. The method for establishing a group user behavior baseline according to claim 2, further comprising, after the step of labeling the specified feature on the individual behavior baseline corresponding to the first user:
judging whether the number of times features have been labeled on the individual behavior baseline corresponding to the first user reaches a preset quantity value;
If yes, reconstructing an individual behavior baseline corresponding to the first user.
4. The method for establishing a group user behavior baseline according to claim 1, further comprising, after the step of establishing the corresponding group user behavior baseline based on the individual behavior baselines of different users in the user group of the same category:
associating the users with the categories by using association rules.
5. A group user behavior baseline establishing apparatus for implementing a group user behavior baseline establishing method as defined in any one of claims 1 to 4, comprising:
An acquisition unit, configured to acquire a user portrait of each user, and an individual behavior baseline corresponding to the user portrait, where the user portrait is a portrait constructed based on specified information of the user and log history data corresponding to the user in a specified period of time;
the clustering unit is used for carrying out clustering calculation on all the user portraits to obtain user groups of different categories;
The establishing unit is used for establishing corresponding group user behavior baselines based on individual behavior baselines of different users in the user group of the same category.
6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 4 when the computer program is executed.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010621812.3A CN111737320B (en) | 2020-06-30 | 2020-06-30 | Group user behavior baseline establishment method and device and computer equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010621812.3A CN111737320B (en) | 2020-06-30 | 2020-06-30 | Group user behavior baseline establishment method and device and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111737320A (en) | 2020-10-02 |
CN111737320B (en) | 2024-08-02 |
Family
ID=72652224
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010621812.3A Active CN111737320B (en) | 2020-06-30 | 2020-06-30 | Group user behavior baseline establishment method and device and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111737320B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112579581B (en) * | 2020-11-30 | 2023-04-14 | 贵州力创科技发展有限公司 | Data access method and system of data analysis engine |
CN114283917B (en) * | 2021-11-25 | 2025-05-30 | 皖南医学院 | A warning analysis method and system based on big data of chronic disease medication |
CN114398966A (en) * | 2021-12-31 | 2022-04-26 | 北京久安世纪科技有限公司 | Early warning method for user portraits based on a bastion host |
CN114925265B (en) * | 2022-03-25 | 2025-01-28 | 上海聚均科技有限公司 | Method, system, device and computer-readable storage medium for acquiring user portrait groups based on group behavior |
CN114817377B (en) * | 2022-06-29 | 2022-09-20 | 深圳红途科技有限公司 | User portrait based data risk detection method, device, equipment and medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108021929A (en) * | 2017-11-16 | 2018-05-11 | 华南理工大学 | Big-data-based portrait establishment and analysis method and system for mobile-terminal e-commerce users |
CN108133390A (en) * | 2017-12-22 | 2018-06-08 | 北京三快在线科技有限公司 | Method and apparatus for predicting user behavior, and computing device |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014172380A1 (en) * | 2013-04-15 | 2014-10-23 | Flextronics Ap, Llc | Altered map routes based on user profile information |
CN109086787B (en) * | 2018-06-06 | 2023-07-25 | 平安科技(深圳)有限公司 | User portrait acquisition method, device, computer equipment and storage medium |
CN109740620B (en) * | 2018-11-12 | 2023-09-26 | 平安科技(深圳)有限公司 | Method, device, equipment and storage medium for establishing crowd figure classification model |
- 2020-06-30: application CN202010621812.3A filed in CN, granted as CN111737320B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN111737320A (en) | 2020-10-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||