Disclosure of Invention
To solve the above technical problems, the present application provides a method and a system for online monitoring of abnormal data in financial service applications. The technical solution adopted is as follows:
In a first aspect, an embodiment of the present application provides a method for online monitoring of abnormal data in a financial service application, the method comprising the following steps:
collecting the daily login failure frequency, unauthorized access application frequency and access module frequency of a user in a financial service application system, and forming a daily security log vector of the user; and acquiring the user's daily login failure time set, access time set and unauthorized application time set;
analyzing the fluctuation and change trend of the user's login failure frequency, unauthorized access application frequency and access module frequency, respectively, to obtain a daily data consistency score of the user;
determining a daily operation sequence correlation coefficient of the user based on the differences between the elements of the user's daily login failure time set and daily access time set;
analyzing the correlation between the login failure frequency and the unauthorized access application frequency of the user to obtain the daily data change trend coefficient of the user;
combining the operation sequence correlation coefficient, the data change trend coefficient, and the difference between the unauthorized application time set and the login failure time set to obtain a daily data association change index of the user;
and combining the data consistency score, the data association change index and the security log vector, and using an isolation forest algorithm to perform online monitoring of abnormal data in the financial service application.
In one embodiment, the access module frequency is the sum of the number of times the user accesses authorized modules and the unauthorized access application frequency.
In one embodiment, the determination of the data consistency score comprises:
the user's login failure frequencies on a given day and on all preceding days are arranged into the login failure sequence of that day; a moving average method is applied to the login failure sequence to obtain the moving averages, which form the moving average sequence of the login failure sequence; and the data comprehensive fluctuation coefficient of the login failure frequency on that day is determined based on the differences between adjacent elements in this moving average sequence;
the data comprehensive fluctuation coefficients of the unauthorized access application frequency and of the access module frequency on that day are obtained by the same calculation method as the data comprehensive fluctuation coefficient of the login failure frequency;
the difference between the data comprehensive fluctuation coefficient of the access module frequency and that of the unauthorized access application frequency is calculated and recorded as a first difference, and the difference between the unauthorized access application frequency on that day and the mean unauthorized access application frequency over all days before that day is calculated and recorded as a second difference;
the data comprehensive fluctuation coefficient of the login failure frequency, the first difference and the second difference are combined to determine the data consistency score of that day;
the data consistency score of that day is negatively correlated with the data comprehensive fluctuation coefficient of the login failure frequency and with the second difference, and positively correlated with the first difference.
In one embodiment, the data comprehensive fluctuation coefficient of the login failure frequency is determined as follows:
calculating the difference between each element in the moving average sequence of the login failure sequence and its preceding element, recorded as an adjacent difference; and performing forward fusion on all adjacent differences in the moving average sequence, the forward fusion result being taken as the data comprehensive fluctuation coefficient of the login failure frequency.
In one embodiment, the operation sequence correlation coefficient is determined as follows:
if all elements in the daily access time set are less than or equal to the minimum value in the daily login failure time set, the daily operation sequence correlation coefficient is a first preset value; otherwise, it is a second preset value, the first preset value being smaller than the second preset value.
In one embodiment, determining the data change trend coefficient includes:
calculating the correlation between the daily login failure sequence and the daily unauthorized access application sequence; calculating the growth rate of each element in the daily login failure sequence relative to its preceding element, recorded as a first growth rate; calculating the growth rate of each element in the daily unauthorized access application sequence relative to its preceding element, recorded as a second growth rate; and performing forward fusion of the differences between the first and second growth rates at the same positions with the correlation, the forward fusion result being taken as the daily data change trend coefficient.
In one embodiment, determining the data association change index includes:
calculating the difference between the mean of all elements in the daily unauthorized application time set and the mean of all elements in the daily login failure time set, recorded as a third difference; and combining the third difference, the daily data change trend coefficient and the daily operation sequence correlation coefficient to determine the daily data association change index;
the daily data association change index is positively correlated with the daily data change trend coefficient, the third difference and the operation sequence correlation coefficient.
In one embodiment, performing online monitoring of abnormal data in the financial service application includes:
taking the daily security log vectors as the input of the isolation forest algorithm, and determining the weight of each isolation tree based on the distribution of the data consistency scores and data association change indexes of the days corresponding to the security log vectors contained in that tree, and on the depths of its left and right subtrees;
the output of the isolation forest algorithm is the anomaly score of each daily security log vector; when the anomaly score is smaller than a preset threshold, the corresponding security log vector is judged to be abnormal data; otherwise, it is judged to be normal data.
In one embodiment, the weight of each isolation tree is determined as follows:
calculating the degree of dispersion of the data consistency scores of the days corresponding to all security log vectors contained in each isolation tree; calculating the mean of the data association change indexes of those days; and calculating the difference between the maximum and the minimum of the depths of the left and right subtrees of each isolation tree, recorded as a fourth difference;
determining the weight of each isolation tree based on the degree of dispersion, the mean and the fourth difference;
the weight of each isolation tree is positively correlated with the degree of dispersion, the mean and the fourth difference.
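The tree-weighting rule above can be illustrated with a small isolation forest written in plain Python, so every step of the weighting is visible. This is a minimal sketch under stated assumptions, not the application's implementation: the degree of dispersion is taken as the population standard deviation, the three quantities are combined additively (one of the positively correlated forms the text allows), the fourth difference is the absolute difference between the heights of the root's left and right subtrees, and the returned value is the negated anomaly score of the standard isolation forest (the convention of scikit-learn's `score_samples`), so that a smaller value means more abnormal, matching the threshold rule above. The function names and the weight floor `1e-6` are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649


def avg_path(n):
    """c(n): average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, limit, rng):
    """Standard isolation tree: random non-constant feature, random split value."""
    if depth >= limit or len(points) <= 1:
        return {"size": len(points)}
    features = [q for q in range(len(points[0]))
                if min(p[q] for p in points) < max(p[q] for p in points)]
    if not features:
        return {"size": len(points)}
    q = rng.choice(features)
    split = rng.uniform(min(p[q] for p in points), max(p[q] for p in points))
    return {"q": q, "split": split,
            "left": build_tree([p for p in points if p[q] < split], depth + 1, limit, rng),
            "right": build_tree([p for p in points if p[q] >= split], depth + 1, limit, rng)}


def height(node):
    """Height of a (sub)tree; a leaf has height 0."""
    if "size" in node:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))


def path_length(node, x, depth=0):
    """Depth at which x lands, plus the usual c(size) correction at the leaf."""
    if "size" in node:
        return depth + avg_path(node["size"])
    child = node["left"] if x[node["q"]] < node["split"] else node["right"]
    return path_length(child, x, depth + 1)


def weighted_iforest_scores(vectors, consistency, association,
                            n_trees=100, psi=50, seed=0):
    """One score per day (index-aligned inputs); smaller means more abnormal."""
    rng = random.Random(seed)
    psi = min(psi, len(vectors))
    limit = math.ceil(math.log2(psi)) if psi > 1 else 1
    trees, weights = [], []
    for _ in range(n_trees):
        idx = rng.sample(range(len(vectors)), psi)  # days contained in this tree
        root = build_tree([vectors[i] for i in idx], 0, limit, rng)
        dispersion = statistics.pstdev([consistency[i] for i in idx])
        mean_assoc = statistics.fmean([association[i] for i in idx])
        gap = 0 if "size" in root else abs(height(root["left"]) - height(root["right"]))
        # Hypothetical additive weight, floored so every tree keeps a positive weight.
        weights.append(max(dispersion + mean_assoc + gap, 1e-6))
        trees.append(root)
    total = sum(weights)
    norm = avg_path(psi) if psi > 1 else 1.0
    scores = []
    for x in vectors:
        expected = sum(w * path_length(t, x) for t, w in zip(trees, weights)) / total
        scores.append(-(2.0 ** (-expected / norm)))
    return scores


def flag_abnormal(scores, threshold):
    """A score smaller than the preset threshold marks the day as abnormal."""
    return [s < threshold for s in scores]
```

The defaults of 100 trees and 50 samples per tree follow the embodiment values given later in the description. In practice the three weight terms live on very different scales (the association index in particular can be large), so some normalization before combining them would likely be needed.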
In a second aspect, an embodiment of the present application further provides a system for online monitoring of abnormal data in a financial service application, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the methods described above.
The application has at least the following beneficial effects:
The present application acquires the user's daily login failure frequency, unauthorized access application frequency and access module frequency in the financial service application system to form a daily security log vector, and acquires the user's daily login failure time set, access time set and unauthorized application time set. The fluctuation and change trend of the three frequencies are analyzed to obtain a daily data consistency score, which reflects how similar the frequencies on each day are to those on the preceding days; this helps judge the security of the user's financial service application data on each day and improves the reliability of abnormal data monitoring. The daily operation sequence correlation coefficient, determined from the differences between elements of the daily login failure time set and access time set, reflects the temporal order of login failures and module accesses and hence the possibility that the user account has been compromised by an attacker; it adds a causal feature to the monitoring process and improves monitoring accuracy. The correlation between the login failure frequency and the unauthorized access application frequency is analyzed to obtain a daily data change trend coefficient, which is combined with the operation sequence correlation coefficient and the difference between the unauthorized application time set and the login failure time set to obtain a daily data association change index.
This index reflects the possibility that the user's financial service application system was attacked and abnormal data appeared on each day. Finally, the data consistency score, the data association change index and the security log vector are combined, and an isolation forest algorithm is used to complete the online monitoring of abnormal data in the financial service application. This reflects the value and timeliness of each isolation tree in monitoring the security log vectors for anomalies, improves the sensitivity of anomaly monitoring while maintaining its accuracy, and effectively addresses risks in the financial service application system.
Detailed Description
To further describe the technical means adopted by the present application to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the method and system for online monitoring of abnormal data in financial service applications according to the present application are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, particular features, structures or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The present application provides a method and a system for online monitoring of abnormal data in financial service applications, which are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flowchart of a method for online monitoring of abnormal data in a financial service application according to an embodiment of the present application. The method includes the following steps:
S1, acquiring the user's daily login failure frequency, unauthorized access application frequency and access module frequency in the financial service application system, and forming a daily security log vector of the user; and acquiring the user's daily login failure time set, access time set and unauthorized application time set.
The user's security operation logs are downloaded from the security operation log module of the financial service application system, and statistics are compiled with a period of one day. This yields the user's daily login failure frequency, unauthorized access application frequency and access module frequency, where the login failure frequency is the number of failed logins by the user on that day, the unauthorized access application frequency is the number of access applications the user makes on that day to modules for which the user has no authority, and the access module frequency is the sum of the number of times the user accesses authorized modules and the number of access applications to unauthorized modules on that day. These three frequencies form the user's daily security log vector. For example, if on day a the user's login failure frequency is 4, the unauthorized access application frequency is 1 and the access module frequency is 2, the security log vector of day a is (4, 1, 2).
The time offset, in seconds, between each login failure of the user and midnight (00:00:00) of that day forms the daily login failure time set; the offset, in seconds, between the start of each access to any module and midnight forms the daily access time set; and the offset, in seconds, between the start of each unauthorized access application and midnight forms the daily unauthorized application time set. Taking the login failure time set of day a as an example, if a login failure occurs at 00:02:20 on day a, the corresponding element of the login failure time set is 140, i.e., the number of seconds in 2 minutes and 20 seconds.
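Step S1 can be sketched as a grouping of raw log events by calendar day. This is a minimal illustration assuming the logs are available as `(timestamp, kind)` pairs; the event labels and function names are hypothetical. Following the definition of the access module frequency, an unauthorized access application is also counted as a module access, and (as an assumption) its start time is also placed in the access time set.

```python
from collections import defaultdict
from datetime import datetime


def seconds_since_midnight(ts: datetime) -> int:
    """Offset from 00:00:00 of the same day in whole seconds (00:02:20 -> 140)."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def build_daily_records(events):
    """Group (timestamp, kind) log events by calendar day.

    `kind` uses hypothetical labels: 'login_failure', 'authorized_access'
    and 'unauthorized_request'.
    """
    days = defaultdict(lambda: {"fail": [], "access": [], "unauth": []})
    for ts, kind in events:
        day, t = days[ts.date()], seconds_since_midnight(ts)
        if kind == "login_failure":
            day["fail"].append(t)
        elif kind == "authorized_access":
            day["access"].append(t)
        elif kind == "unauthorized_request":
            day["unauth"].append(t)
            day["access"].append(t)  # assumption: also counted as a module access
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    records = {}
    for date, d in sorted(days.items()):
        records[date] = {
            # security log vector: (login failures, unauthorized requests, module accesses)
            "vector": (len(d["fail"]), len(d["unauth"]), len(d["access"])),
            "fail_times": sorted(d["fail"]),
            "access_times": sorted(d["access"]),
            "unauth_times": sorted(d["unauth"]),
        }
    return records
```

With four login failures (the first at 00:02:20), one authorized access and one unauthorized request on the same day, this yields the vector (4, 1, 2) and a first login failure time of 140, matching the examples above. Days with no events do not appear in the result; those are the dormant days handled next.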
If the user's account has no operations on a given day, that day is a dormant day, and each component of the security log vector of a dormant day is set to the mean of the corresponding component over all non-dormant days of the user.
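The dormant-day rule amounts to a component-wise mean imputation. A minimal sketch, assuming the daily vectors are held in chronological order with `None` marking a dormant day (a hypothetical convention):

```python
def fill_dormant_days(vectors):
    """Replace dormant days (None) with the component-wise mean over all non-dormant days."""
    active = [v for v in vectors if v is not None]
    if not active:
        raise ValueError("no non-dormant days to average over")
    means = tuple(sum(component) / len(active) for component in zip(*active))
    return [means if v is None else v for v in vectors]
```

For example, a dormant day between (4, 1, 2) and (2, 1, 4) is filled with (3.0, 1.0, 3.0).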
S2, analyzing the fluctuation and change trend of the user's login failure frequency, unauthorized access application frequency and access module frequency, respectively, to obtain the user's daily data consistency score.
Further, as an example of the present application, the process of constructing the data consistency score for each day of the user is:
First, because users develop habitual usage patterns when operating a financial service application system, the collected security operation log data are highly similar from day to day in most cases: login failures rarely or never occur, and the modules operated are usually ones the user is authorized to access, so unauthorized access applications are infrequent.
Taking day a as an example, the login failure frequencies of day a and all preceding days are arranged in chronological order to form the login failure sequence of day a. This sequence and the subsequence length δ used for moving averaging are taken as input, and the moving averages are obtained by a moving average method. In this example δ = 5; the implementer may set it according to the actual situation, and the present application is not limited in this respect. The moving average method is a known technique and is not described in detail here. All the moving averages obtained, in chronological order, form the moving average sequence L_a of the login failure frequency on day a; the moving average sequence P_a of the unauthorized access application frequency and the moving average sequence Q_a of the access module frequency on day a are calculated in the same way.
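The text leaves the moving average variant open. The sketch below assumes a simple (unweighted) trailing moving average computed over every complete window of length δ = 5, and returns an empty sequence when fewer than δ days of history exist; both choices are assumptions, and the function name is hypothetical.

```python
def moving_average_sequence(values, delta=5):
    """Trailing simple moving averages over every complete window of length delta."""
    if delta < 1:
        raise ValueError("delta must be positive")
    if len(values) < delta:
        return []  # assumption: not enough history yet, so no moving averages
    window = sum(values[:delta])
    averages = [window / delta]
    for i in range(delta, len(values)):
        window += values[i] - values[i - delta]  # slide the window one day forward
        averages.append(window / delta)
    return averages
```

Applied to the login failure sequence of day a it yields L_a; applied to the unauthorized access application and access module sequences it yields P_a and Q_a.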
Taking the moving average sequence L_a of the login failure frequency on day a as an example, the difference between each element in L_a and its preceding element is calculated and recorded as an adjacent difference; all adjacent differences in L_a are fused in the forward direction, and the forward fusion result is taken as the data comprehensive fluctuation coefficient of the login failure frequency on day a.
Here, a difference expresses the degree of difference between two variables and may be computed as, for example, a subtraction or a ratio; forward fusion combines multiple variables in a way that strengthens their overall effect and may be computed as, for example, addition or multiplication.
In one embodiment of the present application, the difference between each element in L_a and its preceding element is calculated, and the sum of all these differences in L_a is used as the data comprehensive fluctuation coefficient of the login failure frequency on day a.
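The embodiment's fluctuation coefficient (subtraction as the difference, addition as the forward fusion) is a one-liner. A sketch with a hypothetical function name:

```python
def fluctuation_coefficient(ma_seq):
    """Sum of adjacent differences (element minus preceding element).

    The sum telescopes to ma_seq[-1] - ma_seq[0], so small up-and-down
    fluctuations cancel each other out.
    """
    return sum(cur - prev for prev, cur in zip(ma_seq, ma_seq[1:]))
```

Because the sum telescopes, it depends only on the first and last moving averages. This is why, for stable behaviour, positive and negative adjacent differences cancel and the coefficient is close to 0, as discussed below for the anomaly-free case.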
The data comprehensive fluctuation coefficients of the unauthorized access application frequency and of the access module frequency on day a are calculated by the same method as that of the login failure frequency on day a, from P_a and Q_a respectively.
The difference between the data comprehensive fluctuation coefficient of the access module frequency and that of the unauthorized access application frequency is calculated and recorded as the first difference, and the difference between the unauthorized access application frequency on a given day and the mean unauthorized access application frequency over all days before that day is calculated and recorded as the second difference.
The data comprehensive fluctuation coefficient of the login failure frequency, the first difference and the second difference are combined to determine the data consistency score of that day.
The data consistency score of that day is negatively correlated with the data comprehensive fluctuation coefficient of the login failure frequency and with the second difference, and positively correlated with the first difference.
It should be understood that a positive correlation means the dependent variable increases as the independent variable increases and decreases as it decreases; it may specifically be an additive relationship, a multiplicative relationship, a power function, an exponential function, and the like. A negative correlation means the dependent variable decreases as the independent variable increases and increases as it decreases; it may specifically be a subtractive relationship, a division relationship, and the like.
In one embodiment of the present application, taking day a as an example: the sum of the data comprehensive fluctuation coefficient of the unauthorized access application frequency on day a and a preset adjustment factor greater than 0 is calculated, and the ratio of the data comprehensive fluctuation coefficient of the access module frequency on day a to this sum is taken as the first difference. The unauthorized access application frequency on day a minus its mean over the days before day a is taken as the second difference. The second difference is used as the argument of a ramp function (referred to as a sign function in this embodiment), whose value is 0 when the argument is less than or equal to 0 and equals the argument otherwise. The value of this function is used as the exponent of an exponential function with the natural constant e as its base, and the result is multiplied by the data comprehensive fluctuation coefficient of the login failure frequency on day a. The reciprocal of this product plus a preset adjustment coefficient greater than 0 is then calculated, and the product of this reciprocal and the ratio is taken as the data consistency score of day a, i.e., S_a = [F_Q / (F_P + ε)] × 1 / (exp(max(X_a − X̄_a, 0)) × F_L + μ), where F_L, F_P and F_Q are the data comprehensive fluctuation coefficients of the login failure frequency, the unauthorized access application frequency and the access module frequency on day a, X_a is the unauthorized access application frequency on day a, X̄_a is its mean over the days before day a, ε is the adjustment factor and μ is the adjustment coefficient.
It should be noted that, in this embodiment, both the adjustment factor and the adjustment coefficient are set empirically to 0.01; the implementer may set them according to the actual situation, and the present application is not limited in this respect. The second difference, i.e., the unauthorized access application frequency on day a minus its mean over the preceding days, is passed through the ramp function and used as the exponent of the natural exponential function in order to measure whether the unauthorized access application frequency on day a is normal. If the frequency on day a is smaller than the mean, it is probably normal; the exponent is then 0, the exponential term equals 1, and the influence of the unauthorized access application frequency on the data consistency score of day a is suppressed. Otherwise, the frequency on day a may be abnormal, and its influence on the data consistency score of day a is amplified.
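The embodiment's score can be sketched directly from the verbal description. The reading of the final step as "ratio multiplied by the reciprocal" is an interpretation of the text, the parameter names are hypothetical, and an empty history is assumed to give a second difference of 0. The sketch applies the formula literally and does not guard against a negative login failure fluctuation coefficient, a case the text does not discuss.

```python
import math


def data_consistency_score(f_login, f_unauth, f_access, unauth_today, unauth_history,
                           eps=0.01, mu=0.01):
    """Data consistency score of day a.

    f_login, f_unauth, f_access: data comprehensive fluctuation coefficients of the
    login failure, unauthorized access application and access module frequencies.
    unauth_today: unauthorized access application frequency on day a.
    unauth_history: the same frequency on every day before day a.
    eps: adjustment factor; mu: adjustment coefficient (both 0.01 in the embodiment).
    """
    first_difference = f_access / (f_unauth + eps)
    # Assumption: with no history, the second difference is taken as 0.
    mean_before = (sum(unauth_history) / len(unauth_history)
                   if unauth_history else unauth_today)
    second_difference = unauth_today - mean_before
    # Ramp function max(x, 0): frequencies at or below the mean give e^0 = 1.
    amplifier = math.exp(max(second_difference, 0.0))
    return first_difference / (amplifier * f_login + mu)
```

A frequency below the historical mean leaves the score unchanged relative to a frequency exactly at the mean, while a frequency above it, or a larger login failure fluctuation coefficient, lowers the score, matching the correlations stated above.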
When the security operation log data of day a contain no anomaly, the account risk is low and the daily log data are highly similar, so after moving averaging each moving average fluctuates slightly around a fixed value. Because positive and negative adjacent differences cancel out, the data comprehensive fluctuation coefficients are small or close to 0; consequently, the data comprehensive fluctuation coefficient of the login failure frequency on day a is small and the first difference is relatively large. In addition, because the daily log data are highly similar, the unauthorized access application frequency on day a is close to its mean over the preceding days, so the exponential term is small. Hence, the data consistency score of day a is larger when its security operation log data contain no anomaly.
Conversely, if the login failure frequency is abnormal, its moving averages increase and the data comprehensive fluctuation coefficient of the login failure frequency on day a becomes larger, which lowers the data consistency score; likewise, if the unauthorized access application frequency increases, the exponential term grows, which also lowers the data consistency score.
S3, determining the user's daily operation sequence correlation coefficient based on the differences between the elements of the user's daily login failure time set and access time set.
The time points in the user's daily login failure time set may stem from an attacker who, being unfamiliar with the financial service application system, repeatedly tries to crack the user's password and thereby causes login failures, while the time points in the access time set are the start times of accesses to modules of the financial service application system after a successful login. Analyzing the order of the time points in the login failure time set relative to those in the access time set, i.e., determining the user's daily operation sequence correlation coefficient, therefore helps assess the security of the user's account.
If all elements in the daily access time set are less than or equal to the minimum value in the daily login failure time set, the daily operation sequence correlation coefficient takes a first preset value; otherwise, it takes a second preset value, the first preset value being smaller than the second preset value.
In this embodiment, the first preset value is set empirically to 1 and the second preset value to 2; the implementer may set them according to the actual situation, and the present application is not limited in this respect.
When all elements in the daily access time set are less than or equal to the minimum value in the daily login failure time set, every access occurs no later than the first login failure, so there is no sign that the account was successfully logged into after an attack. The account can be considered safe, and the operation sequence correlation coefficient is small. Otherwise, modules of the financial service application system were accessed after a login failure, which suggests that an attacker may have cracked the account and that those accesses may have been made by the attacker; the account is at risk, and the operation sequence correlation coefficient is large.
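The rule reduces to comparing every access time with the earliest login failure. A minimal sketch using the embodiment's preset values 1 and 2; returning the first value when there is no login failure at all is an assumption (no access can then follow a failure), and the function name is hypothetical.

```python
def operation_sequence_coefficient(access_times, fail_times, first_value=1, second_value=2):
    """First preset value if every access happens no later than the earliest login failure."""
    if not fail_times:
        return first_value  # assumption: no login failure, so no access can follow one
    earliest_failure = min(fail_times)
    if all(t <= earliest_failure for t in access_times):
        return first_value
    return second_value
```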
S4, analyzing the correlation between the user's login failure frequency and unauthorized access application frequency to obtain the user's daily data change trend coefficient.
When a user's account is attacked, the attacker must repeatedly try to crack the password, which increases the login failure frequency. After logging in successfully, the attacker is unfamiliar with the financial service application system and does not know which permissions the user holds, so the attacker keeps accessing modules to discover the user's authorized modules and information, which increases both the access module frequency and the unauthorized access application frequency. The attacker may also try to exploit the account's existing permissions and search for vulnerabilities in the system's permission management to gain unauthorized access, so during this process the unauthorized access application frequency grows faster than the access module frequency. By contrast, a legitimate user is familiar with the financial service application system and knows his or her own permissions, performs purposeful operations, and does not frequently click through modules or request access to modules without authority.
Therefore, based on this analysis, the correlation between the login failure sequence and the unauthorized access application sequence is calculated; the growth rate of each element in the login failure sequence relative to its preceding element is calculated and recorded as a first growth rate; the growth rate of each element in the unauthorized access application sequence relative to its preceding element is calculated and recorded as a second growth rate; the differences between the first and second growth rates at all corresponding positions of the two sequences are fused in the forward direction with the correlation, and the forward fusion result is taken as the daily data change trend coefficient.
It should be noted that the correlation expresses the association between sequences and may be calculated using, for example, the Pearson correlation coefficient, cosine similarity or the Spearman rank correlation coefficient.
In one embodiment of the present application, the Pearson correlation coefficient between the daily login failure sequence and the daily unauthorized access application sequence is calculated. For each pair of corresponding positions in the two sequences, the first growth rate minus the second growth rate is recorded as a trend difference and used as the argument of a ramp function (referred to as a sign function in this embodiment), whose value is 0 when the argument is less than or equal to 0 and equals the argument otherwise. The sum of the function values over all corresponding positions plus the Pearson correlation coefficient is taken as the user's daily data change trend coefficient.
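A minimal sketch of this embodiment follows. Two points are assumptions not covered by the text and are marked in the code: a small `eps` is added to the denominator of each growth rate because frequencies are often 0, and the Pearson coefficient is taken as 0 when either sequence is constant (where it is otherwise undefined). Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either sequence is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def growth_rates(seq, eps=0.01):
    """Growth of each element relative to the previous one; eps avoids division by zero (assumption)."""
    return [(cur - prev) / (prev + eps) for prev, cur in zip(seq, seq[1:])]


def data_change_trend_coefficient(fail_seq, unauth_seq, eps=0.01):
    """Sum of ramp(first growth rate - second growth rate) plus the Pearson coefficient."""
    if len(fail_seq) != len(unauth_seq) or len(fail_seq) < 2:
        raise ValueError("need two aligned sequences with at least two days")
    first, second = growth_rates(fail_seq, eps), growth_rates(unauth_seq, eps)
    # Ramp function on each trend difference: negative differences contribute 0.
    ramp_sum = sum(max(g1 - g2, 0.0) for g1, g2 in zip(first, second))
    return ramp_sum + pearson(fail_seq, unauth_seq)
```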
The larger the user's data change trend coefficient on day a, the more likely it is that the account was attacked on day a, that the account is at risk, and that abnormal data exist in the financial service application.
S5, combining the operation sequence correlation coefficient, the data change trend coefficient, and the difference between the unauthorized application time set and the login failure time set to obtain the user's daily data association change index.
The difference between the mean of all elements in the daily unauthorized application time set and the mean of all elements in the daily login failure time set is calculated and recorded as the third difference, and the third difference, the daily data change trend coefficient and the daily operation sequence correlation coefficient are combined to determine the daily data association change index.
The daily data association change index is positively correlated with the daily data change trend coefficient, the third difference and the operation sequence correlation coefficient.
In one embodiment of the present application, the mean of all elements in the daily unauthorized application time set minus the mean of all elements in the daily login failure time set is calculated as the third difference and recorded as the time difference, and the product of the daily data change trend coefficient, the time difference and the daily operation sequence correlation coefficient is taken as the daily data association change index.
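The product form of this embodiment can be sketched as follows. The text does not say what happens when either time set is empty (e.g., a day with no login failures); this sketch assumes a time difference of 0 in that case. Function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def data_association_change_index(trend_coef, unauth_times, fail_times, seq_coef):
    """Product of the trend coefficient, the time difference and the sequence coefficient."""
    if not unauth_times or not fail_times:
        time_difference = 0.0  # assumption: undefined mean treated as no time difference
    else:
        time_difference = mean(unauth_times) - mean(fail_times)
    return trend_coef * time_difference * seq_coef
```

For instance, unauthorized applications at 3600 s and 7200 s after a single login failure at 600 s give a time difference of 4800 s; with a trend coefficient of 1.5 and a sequence coefficient of 2, the index is 14400.0.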
The larger the difference between the mean of the unauthorized application time set and the mean of the login failure time set, the more the user's unauthorized application time points tend to lie after the login failure time points; the user's system is then more likely to be at risk and the data more likely to be abnormal, so the user's data association change index is larger. Likewise, the larger the data change trend coefficient and the operation sequence correlation coefficient, the more likely the data are abnormal and the larger the user's data association change index.
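A minimal Python sketch of the product form above, assuming that time points are stored as numbers (for example, seconds since midnight) and that both time sets of the day are non-empty; the name `association_change_index` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def association_change_index(unauth_times: Sequence[float],
                             fail_times: Sequence[float],
                             trend_coef: float,
                             op_seq_corr: float) -> float:
    """Daily data association change index (hypothetical helper).

    unauth_times: the day's unauthorized application time set.
    fail_times:   the day's login failure time set.
    Both are assumed non-empty; statistics.mean raises on empty input.
    """
    # Third difference ("time difference"): positive when unauthorized
    # applications tend to occur after the login failures.
    time_diff = mean(unauth_times) - mean(fail_times)
    # Product of the three factors, as in the embodiment.
    return trend_coef * time_diff * op_seq_corr
```

Because the index is a product, it increases with each factor only while the other two factors are positive; a negative time difference (unauthorized applications preceding the login failures on average) yields a negative index.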
S6, combining the data consistency score, the data association change index, and the security log vector, and completing online monitoring of abnormal data of the financial service application by using the isolation forest algorithm.
In one embodiment of the present application, the number of samples drawn each time is manually set to 50, the number of isolation trees in the isolation forest algorithm is manually set to 100, and N = 400; implementers may set the number of samples, the number of isolation trees, and the value of N according to the actual situation.
In the original isolation forest algorithm, all isolation trees have the same weight. However, given how an isolation tree is constructed, if the sample data selected to build a tree are highly similar, the tree is of little value for subsequent anomaly detection; and if the depths of its left and right subtrees are close, the tree incurs a relatively high time cost when detecting anomalies, which hinders online monitoring of abnormal data. Therefore, the present application adjusts the weight of each isolation tree by combining the user's daily data consistency scores and data association change indexes, specifically:
Calculating the degree of dispersion of the data consistency scores, on the corresponding days, of all security log vectors contained in each isolation tree; calculating the mean of the data association change indexes, on the corresponding days, of all security log vectors contained in each isolation tree; and calculating the difference between the maximum and the minimum of the depths of the left and right subtrees of each isolation tree, recorded as the fourth difference;
determining the weight of each isolation tree based on the degree of dispersion, the mean, and the fourth difference;
The weight of each isolation tree is positively correlated with the degree of dispersion, the mean, and the fourth difference.
The degree of dispersion characterizes the distribution of the data, i.e., how widely the values are spread; it may be calculated using, for example, the variance or the standard deviation.
In one embodiment of the present application, taking the i-th isolation tree as an example: the variance of the data consistency scores, on the corresponding days, of all security log vectors in the i-th isolation tree is calculated and used as the exponent of an exponential function with the natural constant e as its base, recorded as the first exponential function; the mean of the data association change indexes, on the corresponding days, of all security log vectors in the i-th isolation tree is calculated and used as the independent variable of a sign function, recorded as the first sign function, whose value is 0 when its independent variable is less than or equal to 0 and equals the independent variable otherwise; the difference between the maximum and the minimum of the depths of the left and right subtrees of the i-th isolation tree, i.e., the fourth difference, is calculated; and the product of the first exponential function, the value of the first sign function, and the fourth difference is used as the weight of the i-th isolation tree.
If all samples of the i-th isolation tree, i.e., all of its security log vectors, are highly similar, the variance of their data consistency scores is small, the data consistency scores of the i-th tree are unlikely to be abnormal, and the weight of the i-th tree is small. The first exponential function amplifies the variance of the data consistency scores, so that a larger variance yields a larger weight for the i-th tree. If the mean data association change index of all security log vectors in the i-th tree is larger, those vectors are more likely to contain abnormal data, and the weight of the i-th tree is larger; the first sign function ensures that the weight of the i-th tree is non-negative. The larger the difference between the maximum and minimum depths of the left and right subtrees of the i-th tree, the more quickly the tree can separate abnormal data, and the greater its value for monitoring data anomalies.
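The weight of one isolation tree can be sketched in Python as below. This is an illustrative sketch under stated assumptions: the population variance is used (the text does not specify population versus sample variance), the subtree depths are supplied as integers, and the name `tree_weight` is hypothetical.

```python
from math import exp
from statistics import mean, pvariance
from typing import Sequence


def tree_weight(consistency_scores: Sequence[float],
                association_indexes: Sequence[float],
                left_depth: int,
                right_depth: int) -> float:
    """Weight of one isolation tree (hypothetical helper).

    consistency_scores:  data consistency scores, on the corresponding days,
                         of the security log vectors in this tree.
    association_indexes: their data association change indexes.
    """
    # First exponential function: e raised to the variance (assumed to be
    # the population variance) of the data consistency scores.
    first_exp = exp(pvariance(consistency_scores))
    # First sign function: mean association index, clipped at 0 so the
    # weight is never negative.
    first_sign = max(0.0, mean(association_indexes))
    # Fourth difference: max minus min of the left/right subtree depths.
    fourth_diff = max(left_depth, right_depth) - min(left_depth, right_depth)
    return first_exp * first_sign * fourth_diff
```

Note that, under this product form, a tree whose left and right subtrees have equal depth, or whose mean association index is non-positive, receives a weight of 0.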
The weight of every isolation tree in the isolation forest is calculated in the same way as that of the i-th isolation tree, which completes the training of the isolation forest; a flow chart of the isolation tree weight construction is shown in Figure 2.
The security log vector monitored each day is input into the trained isolation forest, which outputs an anomaly score for that security log vector; when the anomaly score is smaller than a preset threshold, the security log vector is judged to be abnormal data, and otherwise it is judged to be normal data. The isolation forest algorithm is well-known prior art and is not described in detail here. In one embodiment of the present application, the preset threshold is manually set to 0.5; implementers may set the preset threshold according to the actual situation, and it is not limited here.
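The text does not state how the per-tree weights enter the anomaly score, so the following Python sketch is only one plausible reading. It assumes (1) the weights scale each tree's path length in a weighted mean, (2) path lengths are normalised by the standard isolation forest term c(n) with the sample size of 50 from the embodiment, and (3) the standard score 2^(-E(h)/c(n)), which is close to 1 for anomalies, is flipped to 1 minus that value so that, as in the text, a score below 0.5 means abnormal. The names `c`, `weighted_score` and `is_abnormal` are hypothetical.

```python
from math import log
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Average path length c(n) used to normalise isolation-tree path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def weighted_score(path_lengths: Sequence[float],
                   weights: Sequence[float],
                   sample_size: int = 50) -> float:
    """Anomaly score of one security log vector; lower means more anomalous.

    path_lengths: path length of the vector in each isolation tree.
    weights:      per-tree weights (assumed aggregation rule: weighted mean).
    sample_size:  must be at least 2 so that c(sample_size) > 0.
    """
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("tree weights must not all be zero")
    # Weighted mean path length over all trees.
    e_h = sum(w * h for w, h in zip(weights, path_lengths)) / total_w
    # Standard isolation forest score: close to 1 for anomalies.
    s = 2.0 ** (-e_h / c(sample_size))
    # Flipped (assumption) so that a smaller score means "more abnormal".
    return 1.0 - s


def is_abnormal(score: float, threshold: float = 0.5) -> bool:
    """Decision rule from the text: abnormal when the score is below the threshold."""
    return score < threshold
```

With this convention, a vector whose weighted mean path length equals c(n) scores exactly 0.5 and is treated as normal, while vectors isolated near the roots of heavily weighted trees score close to 0.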
Based on the same inventive concept as the above method, an embodiment of the present application further provides an online monitoring system for abnormal data of a financial service application, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of any one of the above online monitoring methods for abnormal data of a financial service application.
In summary, the present application collects the login failure frequency, the unauthorized access application frequency, and the access module frequency of a user in the financial service application system to form the user's daily security log vector, and obtains the user's daily login failure time set, access time set, and unauthorized application time set. It separately analyzes the fluctuation and change trend of the login failure frequency, the unauthorized access application frequency, and the access module frequency to obtain the user's daily data consistency score, which reflects how similar these three frequencies on any given day are to those on all preceding days; this supports judging the security of the user's financial service application data on that day and improves the reliability of abnormal data monitoring. Based on the differences between elements in the daily login failure time set and access time set, it determines the user's daily operation sequence correlation coefficient, which reflects the order of the time points in the two sets and hence the possibility that the user's account has been attacked; this adds a causal feature to the abnormal data monitoring process and improves its accuracy. It analyzes the correlation between the login failure frequency and the unauthorized access application frequency to obtain the user's daily data change trend coefficient, and combines the operation sequence correlation coefficient, the data change trend coefficient, and the difference between the unauthorized application time set and the login failure time set to obtain the user's daily data association change index, which reflects the possibility that the user's financial service application system was attacked and abnormal data appeared on each day. Finally, it combines the data consistency score, the data association change index, and the security log vector, and uses the isolation forest algorithm to complete online monitoring of abnormal data of the financial service application, reflecting the value and timeliness of anomaly monitoring of the security log vectors in the isolation forest algorithm; this improves the sensitivity of anomaly monitoring while maintaining its accuracy, and effectively addresses risks in the financial service application system.
It should be noted that the order of the embodiments of the present application is for description only and does not indicate their relative merits. The foregoing description is directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, the embodiments are described progressively; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others.
The foregoing describes only preferred embodiments of the present application and is not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the principles of the present application shall fall within the scope of protection of the present application.