CN118569871B - A method and system for online monitoring of abnormal data in financial service applications

Info

Publication number
CN118569871B
CN118569871B CN202410740166.0A CN202410740166A CN118569871B CN 118569871 B CN118569871 B CN 118569871B CN 202410740166 A CN202410740166 A CN 202410740166A CN 118569871 B CN118569871 B CN 118569871B
Authority
CN
China
Prior art keywords
data
day
frequency
difference
login failure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410740166.0A
Other languages
Chinese (zh)
Other versions
CN118569871A (en)
Inventor
李彦乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhou Rongxin Cloud Technology Co ltd
Original Assignee
Shenzhou Rongxin Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhou Rongxin Cloud Technology Co ltd
Priority to CN202410740166.0A
Publication of CN118569871A
Application granted
Publication of CN118569871B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4016Transaction verification involving fraud or risk level assessment in transaction processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/906Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Accounting & Taxation (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application relates to the field of data processing technology, and specifically to a method and system for online monitoring of abnormal data in financial service applications. The method comprises: analyzing the fluctuation and change trend of a user's login failure frequency, unauthorized access application frequency and access module frequency respectively to obtain the user's daily data consistency score; determining the user's daily operation sequence correlation coefficient based on the difference between the elements of the user's daily login failure time set and access time set; analyzing the correlation between the user's login failure frequency and unauthorized access application frequency to obtain the user's daily data change trend coefficient; combining this with the difference between the unauthorized application time set and the login failure time set to obtain the user's daily data association change index; and using the isolation forest algorithm to monitor abnormal data in financial service applications online. The accuracy of abnormal data monitoring in financial service applications is thereby improved.

Description

Financial service application abnormal data online monitoring method and system
Technical Field
The application relates to the technical field of data processing, in particular to a financial service application abnormal data online monitoring method and system.
Background
With the rapid development of financial technology, financial service applications have become an integral part of modern life. However, as the number of users grows and service types diversify, these applications face increasingly serious security challenges. Hacking, account theft and fraudulent transactions occur frequently and seriously affect the asset security and privacy of financial institutions and their customers. To keep the financial system running stably and to protect users' interests, it is important to establish an efficient and accurate method for online monitoring of abnormal data, so that potential security risks can be found and prevented in time and the economic losses and loss of trust suffered by users are reduced.
At present, many financial service applications adopt basic security measures such as login verification, permission management and firewalls. However, existing anomaly monitoring schemes do not make full use of the time-series characteristics of the data or the causal relationships among events, so the accuracy of online monitoring of abnormal data is low.
Disclosure of Invention
To solve the above technical problems, the application aims to provide a method and a system for online monitoring of abnormal data of a financial service application. The adopted technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a method for online monitoring of abnormal data of a financial service application, the method including the following steps:
collecting the login failure frequency, unauthorized access application frequency and access module frequency of a user in a financial service application system every day to form a security log vector of the user for each day; acquiring a login failure time set, an access time set and an unauthorized application time set of the user for each day;
analyzing the fluctuation and change trend of the login failure frequency, the unauthorized access application frequency and the access module frequency of the user respectively to obtain a data consistency score of the user for each day;
determining an operation sequence correlation coefficient of the user for each day based on the difference between the elements of the user's daily login failure time set and access time set;
analyzing the correlation between the login failure frequency and the unauthorized access application frequency of the user to obtain a data change trend coefficient of the user for each day;
combining the operation sequence correlation coefficient, the data change trend coefficient, and the difference between the unauthorized application time set and the login failure time set to obtain a data association change index of the user for each day;
and combining the data consistency score, the data association change index and the security log vector, and using an isolation forest algorithm to monitor abnormal data of the financial service application online.
In one embodiment, the access module frequency is a sum of the number of times the user accesses the authorized module and the unauthorized access application frequency.
In one embodiment, the determination of the data consistency score comprises:
forming the login failure frequencies of any day and of all days before it into the login failure sequence of that day, applying a moving average method to the login failure sequence to obtain moving averages that form the moving average sequence of the login failure sequence, and determining the data comprehensive fluctuation coefficient of the login failure frequency of that day based on the differences between adjacent elements in the moving average sequence;
obtaining the data comprehensive fluctuation coefficients of the unauthorized access application frequency and of the access module frequency of that day by the same calculation method as for the login failure frequency;
calculating the difference between the data comprehensive fluctuation coefficient of the access module frequency and that of the unauthorized access application frequency, recorded as a first difference, and calculating the difference between the unauthorized access application frequency of that day and the mean of the unauthorized access application frequency over all days before that day, recorded as a second difference;
determining the data consistency score of that day by combining the data comprehensive fluctuation coefficient of the login failure frequency, the first difference and the second difference;
wherein the data consistency score of that day is negatively correlated with the data comprehensive fluctuation coefficient of the login failure frequency and with the second difference, and positively correlated with the first difference.
In one embodiment, the determining process of the comprehensive fluctuation coefficient of the login failure frequency data is as follows:
calculating the difference between each element in the moving average sequence of the login failure sequence and the adjacent element, marking the difference as the adjacent difference, carrying out forward fusion on all the adjacent differences in the moving average sequence of the login failure sequence, and taking the forward fusion result as the data comprehensive fluctuation coefficient of the login failure frequency.
In one embodiment, the determining process of the operation sequence correlation coefficient is:
If all elements in the daily access time set are smaller than or equal to the minimum value in the daily login failure time set, the daily operation sequence correlation coefficient is a first preset value, otherwise, the daily operation sequence correlation coefficient is a second preset value, and the first preset value is smaller than the second preset value.
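The rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the two preset values (0.5 and 1.0) are hypothetical, since the text only requires the first preset value to be smaller than the second, and returning the first value when either set is empty is an assumption made here.

```python
def operation_order_coefficient(access_times, login_fail_times,
                                first_value=0.5, second_value=1.0):
    """Return the daily operation sequence correlation coefficient.

    Times are seconds since midnight. first_value < second_value are
    hypothetical presets; the source does not give concrete numbers.
    """
    # Assumption: with no failures or no accesses there is no ordering to
    # compare, so the smaller (less suspicious) value is returned.
    if not access_times or not login_fail_times:
        return first_value
    earliest_failure = min(login_fail_times)
    # All accesses happened no later than the first login failure.
    if all(t <= earliest_failure for t in access_times):
        return first_value
    # At least one access follows a login failure.
    return second_value
```

For instance, accesses at 100 s and 120 s with a single failure at 140 s give the first value, while an access at 200 s after failures at 140 s and 150 s gives the second.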
In one embodiment, determining the data change trend coefficient includes:
calculating the correlation between the login failure sequence and the unauthorized access application sequence of each day, calculating the growth rate of each element in the login failure sequence of each day relative to the previous element, recorded as a first growth rate, calculating the growth rate of each element in the unauthorized access application sequence of each day relative to the previous element, recorded as a second growth rate, performing forward fusion on the correlation, and taking the forward fusion result as the data change trend coefficient of each day.
In one embodiment, determining the data association change index includes:
calculating the difference between the mean of all elements in the unauthorized application time set and the mean of all elements in the login failure time set of each day, recorded as a third difference, and combining the third difference, the data change trend coefficient and the operation sequence correlation coefficient of each day to determine the data association change index of each day;
wherein the daily data association change index is positively correlated with the daily data change trend coefficient, the third difference and the operation sequence correlation coefficient.
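A minimal sketch of this combination follows. The text only fixes the positive correlations, so the choices here are assumptions: the third difference is taken as the absolute difference of the two means, the three quantities are fused by multiplication (which is positively correlated only while all factors are non-negative), and an empty time set yields a third difference of 0. The function name is hypothetical.

```python
from statistics import mean

def association_change_index(unauth_times, login_fail_times,
                             trend_coefficient, order_coefficient):
    """Daily data association change index (illustrative fusion by product)."""
    if unauth_times and login_fail_times:
        # Third difference: gap between the mean unauthorized-application time
        # and the mean login-failure time, in seconds (absolute value assumed).
        third_difference = abs(mean(unauth_times) - mean(login_fail_times))
    else:
        # Assumption: no gap can be measured when a set is empty.
        third_difference = 0.0
    return trend_coefficient * third_difference * order_coefficient
```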
In one embodiment, the online monitoring of abnormal data of the financial service application includes:
taking the daily security log vectors as the input of the isolation forest algorithm, and determining the weight of each isolation tree based on the distribution of the data consistency scores and the data association change indexes of the days corresponding to the security log vectors contained in that isolation tree, and on the depths of its left and right subtrees;
the output of the isolation forest algorithm is the anomaly score of each day's security log vector; when the anomaly score is smaller than a preset threshold, the corresponding security log vector is judged to be abnormal data, and otherwise it is judged to be normal data.
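To make the thresholding step concrete, here is a small pure-Python isolation forest over daily security log vectors. It is a simplified sketch, not the weighted variant described in this application: all trees are averaged with equal weight, every tree is built on the full data set, and the classical score is negated so that lower values mean more abnormal, matching the "smaller than a preset threshold" rule. All names, the tree count, the seed and the threshold are hypothetical.

```python
import math
import random

def average_path_length(n):
    """Expected path length of an unsuccessful search in a BST of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop when the chosen component is constant
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(tree, point, depth=0):
    if tree[0] == "leaf":
        return depth + average_path_length(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return path_length(child, point, depth + 1)

def anomaly_scores(vectors, n_trees=100, seed=0):
    """Negated isolation-forest score per vector (lower = more abnormal)."""
    if len(vectors) < 2:
        raise ValueError("need at least two security log vectors")
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(vectors)))
    trees = [build_tree(vectors, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(vectors))
    scores = []
    for v in vectors:
        mean_depth = sum(path_length(t, v) for t in trees) / n_trees
        scores.append(-(2.0 ** (-mean_depth / norm)))
    return scores

def flag_abnormal(scores, threshold):
    """A vector is abnormal when its score is below the preset threshold."""
    return [s < threshold for s in scores]
```

A day whose vector lies far from the others is isolated after very few random splits, so its mean path length is short and its negated score falls below the threshold.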
In one embodiment, the weight of each isolation tree is determined as follows:
calculating the degree of dispersion of the data consistency scores of the days corresponding to all security log vectors contained in each isolation tree, calculating the mean of the data association change indexes of those days, and calculating the difference between the maximum and the minimum of the depths of the left subtree and the right subtree of each isolation tree, recorded as a fourth difference;
determining the weight of each isolation tree based on the degree of dispersion, the mean and the fourth difference;
wherein the weight of each isolation tree is positively correlated with the degree of dispersion, the mean and the fourth difference.
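One possible reading of this weight is sketched below. The population standard deviation as the measure of dispersion and the plain sum as the positively correlated combination are both assumptions, as is the function name; the text only fixes the three inputs and the direction of the correlation.

```python
from statistics import mean, pstdev

def tree_weight(consistency_scores, association_indices, left_depth, right_depth):
    """Illustrative weight of one isolation tree.

    consistency_scores / association_indices: values for the days whose
    security log vectors are contained in the tree.
    left_depth / right_depth: depths of the tree's left and right subtrees.
    """
    dispersion = pstdev(consistency_scores)       # assumed dispersion measure
    average_index = mean(association_indices)
    # Fourth difference: max depth minus min depth of the two subtrees.
    fourth_difference = max(left_depth, right_depth) - min(left_depth, right_depth)
    # Assumed fusion: a sum grows with each of the three terms.
    return dispersion + average_index + fourth_difference
```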
In a second aspect, an embodiment of the present application further provides an online monitoring system for abnormal data of a financial service application, including a memory, a processor, and a computer program stored in the memory and running on the processor, where the processor implements the steps of any one of the methods described above when executing the computer program.
The application has at least the following beneficial effects:
The application collects the login failure frequency, unauthorized access application frequency and access module frequency of a user in the financial service application system every day to form the user's daily security log vector, and acquires the user's daily login failure time set, access time set and unauthorized application time set. By analyzing the fluctuation and change trend of the three frequencies, it obtains the user's daily data consistency score, which reflects how similar each day's login failure frequency, unauthorized access application frequency and access module frequency are to those of the preceding days; this helps judge the security of the user's financial service application data on each day and improves the reliability of abnormal data monitoring. Based on the difference between the elements of the user's daily login failure time set and access time set, it determines the user's daily operation sequence correlation coefficient, which reflects the order of the time points in the two sets and hence the likelihood that the user account is under attack; this adds a causality feature to the monitoring process and improves its accuracy. By analyzing the correlation between the login failure frequency and the unauthorized access application frequency, it obtains the user's daily data change trend coefficient, and by combining the operation sequence correlation coefficient, the data change trend coefficient and the difference between the unauthorized application time set and the login failure time set, it obtains the user's daily data association change index, which reflects the likelihood that the financial service application system is attacked and abnormal data appears on each day. Finally, it combines the data consistency score, the data association change index and the security log vector and uses an isolation forest algorithm to complete online monitoring of abnormal data of the financial service application. This reflects the value and timeliness of each security log vector for anomaly monitoring within the isolation forest algorithm, improves the sensitivity of anomaly monitoring while maintaining its accuracy, and effectively addresses risks in the financial service application system.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart illustrating a method for online monitoring abnormal data of a financial service application according to an embodiment of the present application;
FIG. 2 is a flowchart of the weight construction of an isolation tree.
Detailed Description
In order to further describe the technical means and effects adopted by the present application to achieve the preset purpose, the following detailed description refers to the specific implementation, structure, characteristics and effects of a method and a system for online monitoring abnormal data of financial service according to the present application, which are described in detail below with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The application provides a method and a system for online monitoring of abnormal data of financial service application, which are specifically described below with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for online monitoring abnormal data of a financial service application according to an embodiment of the application is shown, the method includes the following steps:
S1, acquiring login failure frequency, unauthorized access application frequency and access module frequency of a user in a financial service application system every day, and forming a security log vector of the user every day; and acquiring a login failure time set, an access time set and a non-authority application time set of the user every day.
The user's security operation log is downloaded from the security operation log module of the financial service application system, and the data are counted with a period of one day. This yields the user's daily login failure frequency, unauthorized access application frequency and access module frequency. The login failure frequency is the number of login failures of the user on that day; the unauthorized access application frequency is the number of access applications the user submits for modules without permission on that day; and the access module frequency is the sum of the number of times the user accesses modules with permission and the number of access applications for modules without permission on that day. These three frequencies form the user's daily security log vector. For example, if on day a the user's login failure frequency is 4, the unauthorized access application frequency is 1 and the access module frequency is 2, the user's security log vector for day a is (4, 1, 2).
The time intervals, in seconds, between each login failure of the user and midnight (00:00:00) of that day form the user's daily login failure time set. The time intervals, in seconds, between the starting time of each access to any module and midnight of that day form the user's daily access time set. The time intervals, in seconds, between the starting time of each unauthorized access application and midnight of that day form the user's daily unauthorized application time set. Taking the login failure time set of day a as an example, if a login failure occurs at 00:02:20 on day a, the corresponding element of the login failure time set is 140, the number of seconds in 2 minutes and 20 seconds.
If the user's account has no operation within a day, that day is a dormant day, and each component of the security log vector for a dormant day takes the mean of that component over the security log vectors of all the user's non-dormant days.
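Step S1 can be illustrated with a short Python sketch. The event labels, function names and the representation of a dormant day as `None` are hypothetical, since the source does not describe the log format. The sketch also assumes that the access time set covers both authorized accesses and unauthorized access applications, and it keeps the time "sets" as sorted lists so that repeated timestamps are preserved.

```python
from datetime import datetime
from statistics import mean

# Hypothetical event labels; the source does not name log event types.
LOGIN_FAIL, UNAUTH_REQUEST, AUTH_ACCESS = "login_fail", "unauth_request", "auth_access"

def seconds_since_midnight(ts):
    return ts.hour * 3600 + ts.minute * 60 + ts.second

def daily_vector(events):
    """events: list of (datetime, label) for one user on one day."""
    kinds = [kind for _, kind in events]
    fails = kinds.count(LOGIN_FAIL)
    unauth = kinds.count(UNAUTH_REQUEST)
    authorized = kinds.count(AUTH_ACCESS)
    # Access module frequency = authorized accesses + unauthorized applications.
    return (fails, unauth, authorized + unauth)

def time_sets(events):
    """Return (login failure times, access times, unauthorized application times)."""
    def times(*wanted):
        return sorted(seconds_since_midnight(ts) for ts, kind in events if kind in wanted)
    return times(LOGIN_FAIL), times(AUTH_ACCESS, UNAUTH_REQUEST), times(UNAUTH_REQUEST)

def fill_dormant(vectors):
    """Replace dormant days (None) with the component-wise mean of active days."""
    active = [v for v in vectors if v is not None]
    if not active:
        return list(vectors)
    average = tuple(mean(component) for component in zip(*active))
    return [average if v is None else v for v in vectors]
```

With four login failures at 00:02:20, one unauthorized access application and one authorized access, `daily_vector` reproduces the (4, 1, 2) example and each failure maps to 140 seconds.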
S2, analyzing the fluctuation and change trend of the user's login failure frequency, unauthorized access application frequency and access module frequency respectively to obtain the user's daily data consistency score.
Further, as an example of the present application, the process of constructing the data consistency score for each day of the user is:
First, when a user operates the financial service application system, the user's habits mean that the collected security operation log data are highly similar from day to day in most cases. For example, login failures rarely or never occur, and the modules operated are usually ones the user has permission for, so unauthorized access applications are infrequent.
Taking day a as an example, the login failure frequencies of the first a days are arranged in chronological order to form the login failure sequence of day a. The login failure sequence and the subsequence length δ used for the moving average are then taken as input, and the moving averages are obtained by the moving average method. In this example the subsequence length is δ=5; the implementer can set it according to the actual situation, and the application is not limited in this respect. The moving average method is a known technique and is not described in detail here. All the moving averages obtained are arranged in chronological order to form the moving average sequence L_a of the login failure frequency of day a, and the moving average sequence P_a of the unauthorized access application frequency and the moving average sequence Q_a of the access module frequency of day a are calculated in the same way.
Taking the moving average sequence L_a of the login failure frequency of day a as an example, the difference between each element of L_a and its previous element is calculated and recorded as an adjacent difference; forward fusion is performed on all adjacent differences in L_a, and the forward fusion result is taken as the data comprehensive fluctuation coefficient of the login failure frequency of day a.
The difference represents the degree of difference between two variables, and specifically may be calculated by using a difference value, a ratio value, or the like, and the forward fusion represents the combination of a plurality of variables in such a manner as to enhance the overall effect, and specifically may be calculated by using addition, multiplication, or the like.
In one embodiment of the present application, the difference between each element of the moving average sequence L_a and its previous element is calculated, and the sum of all these differences is taken as the data comprehensive fluctuation coefficient of the login failure frequency of day a.
The data comprehensive fluctuation coefficient of the unauthorized access application frequency of day a and that of the access module frequency of day a are calculated by the same method.
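The moving average and the summed adjacent differences can be sketched as follows. The function names are hypothetical, and the handling of sequences shorter than δ (a single average over the available days) is an assumption, since the text does not cover that case.

```python
def moving_average(sequence, delta=5):
    """Simple moving averages over windows of length delta."""
    if not sequence:
        return []
    if len(sequence) < delta:
        # Assumption: with fewer than delta days, average what is available.
        return [sum(sequence) / len(sequence)]
    return [sum(sequence[i:i + delta]) / delta
            for i in range(len(sequence) - delta + 1)]

def fluctuation_coefficient(sequence, delta=5):
    """Sum of signed differences between adjacent moving averages."""
    averages = moving_average(sequence, delta)
    return sum(later - earlier for earlier, later in zip(averages, averages[1:]))
```

Because the differences are signed, the sum telescopes to the last moving average minus the first one. A steady sequence therefore gives a coefficient of 0, while a recent spike, for example `[0, 0, 0, 0, 0, 0, 10]`, gives a clearly positive value.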
The difference between the data comprehensive fluctuation coefficient of the access module frequency and that of the unauthorized access application frequency is calculated and recorded as a first difference, and the difference between the unauthorized access application frequency of any day and the mean of the unauthorized access application frequency over all days before that day is calculated and recorded as a second difference;
The data comprehensive fluctuation coefficient of the login failure frequency, the first difference and the second difference are synthesized, and the data consistency score of any day is determined;
and the data consistency score of any day is in negative correlation with the comprehensive fluctuation coefficient of the login failure frequency data and the second difference, and is in positive correlation with the first difference.
It should be understood that a positive correlation indicates that the dependent variable increases as the independent variable increases and decreases as it decreases; the relationship may specifically be additive, multiplicative, a power of an exponential function, or the like. A negative correlation indicates that the dependent variable decreases as the independent variable increases and increases as it decreases; the relationship may specifically be subtractive, divisive, or the like.
In one embodiment of the present application, taking day a as an example, the data consistency score is calculated as follows. The sum of the data comprehensive fluctuation coefficient of the unauthorized access application frequency of day a and an adjustment factor preset to be greater than 0 is calculated, and the ratio of the data comprehensive fluctuation coefficient of the access module frequency of day a to this sum is calculated; this ratio is the first difference. The mean of the unauthorized access application frequency over the previous a days is subtracted from the unauthorized access application frequency of day a; this subtraction result is the second difference. The subtraction result is used as the argument of the sign function: when the argument is less than or equal to 0, the function value is 0; otherwise, the function value is the value of the argument. The function value of the sign function is used as the exponent of the exponential function with the natural constant as base, and the product of the result of the exponential function and the data comprehensive fluctuation coefficient of the login failure frequency of day a is calculated. The reciprocal of the sum of this product and an adjustment coefficient preset to be greater than 0 is calculated, and the product of this reciprocal and the ratio is taken as the data consistency score of day a.
It should be noted that, in this embodiment, the adjustment factor preset to be greater than 0 and the adjustment coefficient preset to be greater than 0 are both set manually to 0.01; the implementer can set them according to the actual situation, and the application is not limited in this respect. The function value of the sign function is used as the exponent of the exponential function with the natural constant as base in order to measure whether the unauthorized access application frequency of day a is normal. The argument of the sign function is the result of subtracting the mean of the unauthorized access application frequency over the previous a days from the unauthorized access application frequency of day a. If the frequency of day a is smaller than that mean, the frequency of day a may be normal, and its influence on the data consistency score of day a is reduced; otherwise, the frequency of day a may be abnormal, and its influence on the data consistency score of day a is amplified.
When there is no abnormality in the security operation log data of day a, the account risk is low and the security operation log data of each day are highly similar. After the moving average is taken, the moving averages fluctuate only slightly around a fixed value, so the positive and negative differences cancel and the data comprehensive fluctuation coefficients are small or approximately 0. The first difference is therefore larger, and the data comprehensive fluctuation coefficient of the login failure frequency on day a is smaller. Because the daily security operation log data are highly similar, the unauthorized access application frequency on day a is also close to its mean over the preceding days, so the result of the exponential function is smaller. Consequently, when there is no abnormality in the security operation log data of day a, the data consistency score of day a is larger.
Conversely, if the login failure frequency is abnormal, the moving averages of the login failure frequency are larger and the data comprehensive fluctuation coefficient of the login failure frequency on day a is larger, which lowers the data consistency score; likewise, if the unauthorized access application frequency increases, the result of the exponential function is larger, which also lowers the data consistency score.
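The embodiment above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the applicant's implementation: the function and parameter names are hypothetical, the three fluctuation coefficients are assumed to be non-negative values computed beforehand, and the second difference is assumed to be the day-a frequency minus the historical mean, which is the direction implied by the explanation of when the influence is reduced or amplified.

```python
import math

ADJUSTMENT_FACTOR = 0.01  # the embodiment sets the adjustment factor to 0.01


def sign_function(x: float) -> float:
    """Piecewise function described in the text: 0 when x <= 0, otherwise x."""
    return x if x > 0 else 0.0


def data_consistency_score(
    fluct_access_module: float,   # fluctuation coefficient, access module frequency
    fluct_unauthorized: float,    # fluctuation coefficient, unauthorized access frequency
    fluct_login_failure: float,   # fluctuation coefficient, login failure frequency
    unauthorized_today: float,    # unauthorized access application frequency on day a
    unauthorized_mean: float,     # mean of that frequency over the days before day a
    eps: float = ADJUSTMENT_FACTOR,
) -> float:
    # First difference: ratio of the access module coefficient to the
    # unauthorized access coefficient plus the adjustment factor.
    first_difference = fluct_access_module / (fluct_unauthorized + eps)
    # Second difference (assumed direction): today's frequency minus the mean.
    second_difference = unauthorized_today - unauthorized_mean
    # exp() equals 1 when today's frequency is not above the mean,
    # and grows when it exceeds the mean.
    penalty = math.exp(sign_function(second_difference)) * fluct_login_failure
    return first_difference * (1.0 / (penalty + eps))
```

With these assumptions, a day with small fluctuation and an unauthorized access frequency at or below its mean receives a much larger score than a day with a large login failure fluctuation coefficient and a frequency well above the mean.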
S3, determining an operation sequence correlation coefficient of each day of the user based on the difference between the login failure time set and the elements in the access time set of each day of the user.
The time points in the user's daily login failure time set may arise because an attacker who is unfamiliar with the financial service application system repeatedly tries to crack the user's password and fails to log in. Each time point in the access time set is the starting time at which any module of the financial service application system is accessed after a successful login. Analyzing the order of the time points in the login failure time set relative to those in the access time set, that is, determining the user's daily operation sequence correlation coefficient, therefore helps to judge the security of the user's account.
If all elements in the daily access time set are smaller than or equal to the minimum value in the daily login failure time set, the daily operation sequence correlation coefficient is a first preset value, otherwise, the daily operation sequence correlation coefficient is a second preset value, and the first preset value is smaller than the second preset value.
The first preset value is manually set to 1 and the second preset value is manually set to 2; the implementer may set them according to the actual situation, and the present application is not limited in this respect.
When all elements in the daily access time set are smaller than or equal to the minimum value in the daily login failure time set, every time point in the access time set precedes every time point in the login failure time set. This means that no successful login followed an attack on the user account, so the account is considered safe and the operation sequence correlation coefficient is smaller. Otherwise, a module of the financial service application system was accessed after a login failure. In that case an attacker may have cracked the password, and the module access may have been performed by the attacker, so the account is at risk and the operation sequence correlation coefficient is larger.
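The rule of step S3 can be illustrated with a short Python sketch. The function name is hypothetical, time points are assumed to be numbers such as seconds since midnight, and returning the first preset value when either set is empty is an assumption, since the text does not address empty sets.

```python
from typing import Sequence


def operation_sequence_coefficient(
    access_times: Sequence[float],
    login_failure_times: Sequence[float],
    first_value: float = 1,   # first preset value in the embodiment
    second_value: float = 2,  # second preset value in the embodiment
) -> float:
    # Assumption: with no login failures or no accesses there is nothing
    # to order, so the day is treated as the low-risk case.
    if not access_times or not login_failure_times:
        return first_value
    earliest_failure = min(login_failure_times)
    # All accesses happened no later than the earliest login failure.
    if all(t <= earliest_failure for t in access_times):
        return first_value
    # At least one module access happened after a login failure.
    return second_value
```

For example, accesses at 100 and 200 with failures at 300 and 400 give the first preset value, while an access at 500 after a failure at 300 gives the second preset value.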
S4, analyzing the correlation between the login failure frequency and the unauthorized access application frequency of the user to obtain the daily data change trend coefficient of the user.
When a user's account is attacked, the attacker has to try repeatedly to crack the user's password, which raises the user's login failure frequency. After a successful login, the attacker is unfamiliar with the financial service application system and does not know which permissions the user holds, so the attacker keeps accessing individual modules to inspect the user's authorized modules and information, which raises both the access module frequency and the unauthorized access application frequency. The attacker may also try to exploit the account's existing permissions and to gain unauthorized access by finding and exploiting vulnerabilities in the permission management of the financial service application system, so that in this process the unauthorized access application frequency grows faster than the access module frequency. By contrast, a legitimate user is familiar with the financial service application system and knows the permissions held, so the user operates purposefully and does not frequently click through modules or access modules without permission.
Therefore, based on the above analysis, the correlation between the daily login failure sequence and the daily unauthorized access application sequence is calculated. The growth rate of each element in the login failure sequence relative to the previous element is calculated and recorded as a first growth rate, and the growth rate of each element in the unauthorized access application sequence relative to the previous element is calculated and recorded as a second growth rate. The differences between the first growth rate and the second growth rate at all corresponding positions of the two sequences are fused with the correlation in a forward direction, and the forward fusion result is used as the daily data change trend coefficient.
It should be noted that the correlation represents the degree of association between the sequences and may be calculated using, for example, the Pearson correlation coefficient, cosine similarity, or the Spearman rank correlation coefficient.
In one embodiment of the present application, the Pearson correlation coefficient of the daily login failure sequence and the daily unauthorized access application sequence is calculated. For these two sequences, the difference between the first growth rate and the second growth rate at the same position is recorded as a trend difference and used as the argument of a sign function: when the argument is less than or equal to 0, the function value is 0; otherwise, the function value equals the argument. The sum of the function values over all positions, added to the Pearson correlation coefficient, is used as the user's daily data change trend coefficient.
The larger the data change trend coefficient of the user on day a, the more likely it is that the user's account was attacked on day a, that the account is at risk, and that the financial service application data are abnormal.
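A minimal Python sketch of the step S4 embodiment follows. Several details are assumptions rather than statements of the source: the growth rate is taken as the change relative to the previous element, a small `eps` is added to the denominator to avoid division by zero, the trend difference is taken as the first growth rate minus the second, and a constant sequence is given a Pearson coefficient of 0. All names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: correlation is undefined, treated as 0
    return cov / (norm_x * norm_y)


def growth_rates(seq: Sequence[float], eps: float = 0.01) -> List[float]:
    """Growth of each element relative to the previous one (eps avoids /0)."""
    return [(cur - prev) / (prev + eps) for prev, cur in zip(seq, seq[1:])]


def data_change_trend_coefficient(
    login_failure_seq: Sequence[float],
    unauthorized_seq: Sequence[float],
) -> float:
    if len(login_failure_seq) != len(unauthorized_seq) or len(login_failure_seq) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    first = growth_rates(login_failure_seq)
    second = growth_rates(unauthorized_seq)
    # Keep only the positive trend differences (the sign function of the text).
    positive_part = sum(max(0.0, g1 - g2) for g1, g2 in zip(first, second))
    return positive_part + pearson(login_failure_seq, unauthorized_seq)
```

Two identical rising sequences give a coefficient close to 1 (perfect correlation, no trend difference), whereas a login failure sequence that doubles every day against a flat unauthorized access sequence gives a larger value driven by the growth-rate term.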
S5, combining the operation sequence correlation coefficient with the data change trend coefficient and the difference between the unauthorized application time set and the login failure time set to obtain a data association change index of the user every day.
The difference between the mean of all elements in the daily unauthorized application time set and the mean of all elements in the daily login failure time set is calculated and recorded as a third difference, and the third difference, the daily data change trend coefficient and the operation sequence correlation coefficient are integrated to determine the daily data association change index.
The daily data association change index is positively correlated with the daily data change trend coefficient, the third difference and the operation sequence correlation coefficient.
In one embodiment of the present application, the difference between the mean of all elements in the daily unauthorized application time set and the mean of all elements in the daily login failure time set, namely the third difference, is calculated and recorded as a time difference, and the product of the daily data change trend coefficient, the time difference and the operation sequence correlation coefficient is used as the daily data association change index.
The larger the difference between the mean of all elements in the unauthorized application time set and the mean of all elements in the login failure time set, the more of the time points in the user's unauthorized application time set fall after the time points in the login failure time set, the more likely it is that the user's system is at risk and the data are abnormal, and the larger the user's data association change index. Similarly, the larger the data change trend coefficient and the operation sequence correlation coefficient, the more likely the data are abnormal and the larger the data association change index.
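The product described in this embodiment is simple enough to sketch directly. In the following Python illustration the function name is hypothetical, time points are assumed to be numeric, and returning 0 when either time set is empty is an assumption that the text does not state.

```python
from statistics import mean
from typing import Sequence


def data_association_change_index(
    unauthorized_times: Sequence[float],
    login_failure_times: Sequence[float],
    trend_coefficient: float,
    sequence_coefficient: float,
) -> float:
    # Assumption: without both kinds of events no time difference exists.
    if not unauthorized_times or not login_failure_times:
        return 0.0
    # Third difference: mean unauthorized application time minus mean
    # login failure time; larger when unauthorized applications come later.
    time_difference = mean(unauthorized_times) - mean(login_failure_times)
    return trend_coefficient * time_difference * sequence_coefficient
```

For instance, unauthorized applications at 600 and 800, login failures at 100 and 300, a trend coefficient of 1.5 and a sequence coefficient of 2 give 1.5 × 500 × 2 = 1500.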
S6, combining the data consistency score, the data association change index and the security log vector, and completing online monitoring of abnormal data of the financial service application by using an isolated forest algorithm.
In one embodiment of the present application, the number of samples extracted each time is manually set to 50, the number of isolated trees in the isolated forest algorithm is manually set to 100, and N = 400; the implementer may set the number of samples, the number of isolated trees and the value of N according to the actual situation.
In the original isolated forest algorithm, every isolated tree has the same weight. However, given how an isolated tree is constructed, if the sample data selected to build a tree are highly similar, the tree is of little value for subsequent data anomaly detection; and if the depths of the left and right subtrees of a tree are close, the tree incurs a higher time cost during anomaly detection, which also hinders online monitoring of abnormal data. Therefore, in the present application, the weights of the isolated trees in the isolated forest algorithm are adjusted by combining the user's daily data consistency scores and data association change indexes, specifically:
Calculating the degree of dispersion of the data consistency scores of the days corresponding to all security log vectors contained in each isolated tree, calculating the average value of the data association change indexes of the days corresponding to all security log vectors contained in each isolated tree, and calculating the difference between the maximum value and the minimum value of the depths of the left subtree and the right subtree of each isolated tree, recorded as a fourth difference;
determining a weight for each isolated tree based on the degree of dispersion, the average value, and the fourth difference;
The weight of each isolated tree is positively correlated with the degree of dispersion, the average value and the fourth difference.
The degree of dispersion characterizes how widely the data are spread and may be calculated using, for example, the variance or the standard deviation.
In one embodiment of the present application, taking the i-th isolated tree as an example, the variance of the data consistency scores of the days corresponding to all security log vectors in the i-th isolated tree is calculated and used as the exponent of an exponential function with the natural constant as its base, recorded as a first exponential function. The mean of the data association change indexes of the days corresponding to all security log vectors in the i-th isolated tree is calculated and used as the argument of a sign function, recorded as a first sign function: when the argument is less than or equal to 0, the function value is 0; otherwise, the function value equals the argument. The ratio of the maximum value to the minimum value of the depths of the left and right subtrees of the i-th isolated tree is calculated as the fourth difference. The product of the result of the first exponential function, the result of the first sign function and this ratio is used as the weight of the i-th isolated tree.
If all samples of the i-th isolated tree, that is, all of its security log vectors, are highly similar, the variance of their data consistency scores is smaller, the tree is less likely to contain abnormal data consistency scores, and the weight of the i-th isolated tree is smaller. The first exponential function amplifies the variance of the data consistency scores, so that a larger variance leads to a larger weight. If the mean of the data association change indexes of all security log vectors in the i-th isolated tree is larger, those security log vectors are more likely to contain abnormal data, and the weight of the i-th isolated tree is larger; the first sign function ensures that the weight is non-negative. The larger the gap between the maximum value and the minimum value of the depths of the left and right subtrees of the i-th isolated tree, the more readily the tree separates abnormal data, the higher its value for abnormal data monitoring, and the larger its weight.
The weight of every isolated tree in the isolated forest algorithm is calculated in the same way as the weight of the i-th isolated tree, which completes the training of the isolated forest. A flow chart of the isolated tree weight construction is shown in Figure 2.
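The weight of a single isolated tree, as described in the embodiment, might be computed as in the following Python sketch. The names are hypothetical, the population variance is an assumed choice (the text only says "variance"), and clamping the shallower subtree depth to at least 1 is an added guard against division by zero that the source does not mention.

```python
import math
from statistics import mean, pvariance
from typing import Sequence


def isolated_tree_weight(
    consistency_scores: Sequence[float],   # scores of the days sampled into the tree
    association_indexes: Sequence[float],  # association change indexes of those days
    left_depth: int,                       # depth of the left subtree
    right_depth: int,                      # depth of the right subtree
) -> float:
    # First exponential function: exp of the variance of the consistency scores.
    dispersion_term = math.exp(pvariance(consistency_scores))
    # First sign function: keep the mean only when it is positive.
    association_term = max(0.0, mean(association_indexes))
    # Fourth difference in the embodiment: ratio of the deeper to the
    # shallower subtree depth (shallower depth clamped to 1, an assumption).
    deeper = max(left_depth, right_depth)
    shallower = max(min(left_depth, right_depth), 1)
    depth_ratio = deeper / shallower
    return dispersion_term * association_term * depth_ratio
```

A balanced tree built from identical consistency scores receives a weight equal to the mean association change index, while a tree with more varied scores and more unbalanced subtrees receives a larger weight.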
The security log vector monitored each day is used as the input of the trained isolated forest algorithm, and the output is the anomaly score of that security log vector. When the anomaly score is smaller than a preset threshold, the corresponding security log vector is judged to be abnormal data; otherwise, it is judged to be normal data. The isolated forest algorithm is known prior art and is not described in detail here. In one embodiment of the present application, the preset threshold is manually set to 0.5; the implementer may set it according to the actual situation, and the present application is not limited in this respect.
Based on the same inventive concept as the above method, the embodiment of the application further provides an online monitoring system for abnormal data of financial service application, which comprises a memory, a processor and a computer program stored in the memory and running on the processor, wherein the processor realizes the steps of any one of the online monitoring methods for abnormal data of financial service application when executing the computer program.
In summary, the present application collects the user's daily login failure frequency, unauthorized access application frequency and access module frequency in the financial service application system to form the user's daily security log vector, and obtains the user's daily login failure time set, access time set and unauthorized application time set. The fluctuation and change trend of the login failure frequency, the unauthorized access application frequency and the access module frequency are analyzed separately to obtain the user's daily data consistency score, which reflects how similar these frequencies on any given day are to those on all preceding days; this allows the security of the user's financial service application data on that day to be judged and improves the reliability of abnormal data monitoring. Based on the differences between the elements of the daily login failure time set and the daily access time set, the user's daily operation sequence correlation coefficient is determined; it reflects the order of the time points in the two sets and the likelihood that the user account has been attacked, adds a causality feature to the abnormal data monitoring process, and improves the accuracy of abnormal data monitoring. The correlation between the user's login failure frequency and unauthorized access application frequency is analyzed to obtain the user's daily data change trend coefficient, and the operation sequence correlation coefficient is combined with the data change trend coefficient and the difference between the unauthorized application time set and the login failure time set to obtain the user's daily data association change index, which reflects the likelihood that the user's financial service application system was attacked and produced abnormal data on each day. Finally, the data consistency score, the data association change index and the security log vector are combined, and an isolated forest algorithm is used to complete online monitoring of abnormal data of the financial service application; this reflects the value and timeliness of each security log vector for anomaly monitoring within the isolated forest algorithm, improves the sensitivity of anomaly monitoring while preserving its accuracy, and helps to cope effectively with risks in the financial service application system.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.
The foregoing description of the preferred embodiments of the present application is not intended to be limiting, but rather, any modifications, equivalents, improvements, etc. that fall within the principles of the present application are intended to be included within the scope of the present application.

Claims (5)

1. An online monitoring method for abnormal data of financial service application is characterized by comprising the following steps:
Collecting login failure frequency, unauthorized access application frequency and access module frequency of a user in a financial service application system every day, and forming a security log vector of the user every day; acquiring a login failure time set, an access time set and an unauthorized application time set of the user every day;
Respectively analyzing login failure frequency, unauthorized access application frequency and fluctuation and change trend of access module frequency of a user to obtain data consistency scores of the user every day;
determining an operation sequence correlation coefficient of a user every day based on the difference of elements in the login failure time set and the access time set of the user every day;
analyzing the correlation between the login failure frequency and the unauthorized access application frequency of the user to obtain the daily data change trend coefficient of the user;
Combining the operation sequence correlation coefficient with the data change trend coefficient and the difference between the unauthorized application time set and the login failure time set to obtain a data association change index of a user every day;
combining the data consistency score, the data association change index and the security log vector, and completing online monitoring of abnormal data of the financial service application by using an isolated forest algorithm;
the login failure frequency of any day and all days before the any day of the user is formed into a login failure sequence of the any day;
the determining process of the operation sequence correlation coefficient comprises the following steps:
If all elements in the daily access time set are smaller than or equal to the minimum value in the daily login failure time set, the daily operation sequence correlation coefficient is a first preset value, otherwise, the daily operation sequence correlation coefficient is a second preset value, wherein the first preset value is smaller than the second preset value;
The data change trend coefficient determining process comprises the following steps:
calculating the correlation between a login failure sequence and an unauthorized access application sequence every day, calculating the growth rate of each element in the login failure sequence every day relative to the previous element, marking the growth rate as a first growth rate, calculating the growth rate of each element in the unauthorized access application sequence every day relative to the previous element, marking the growth rate as a second growth rate, carrying out forward fusion on the correlation, and taking the forward fusion result as a data change trend coefficient every day;
the determination of the data consistency score includes:
obtaining moving averages of login failure sequences by using a moving average method, forming a moving average sequence of the login failure sequences, and determining a data comprehensive fluctuation coefficient of the login failure frequency of any day based on the difference of adjacent elements in the moving average sequence of the login failure sequences;
acquiring the data comprehensive fluctuation coefficients of the unauthorized access application frequency and of the access module frequency of any day by adopting the same calculation method as for the data comprehensive fluctuation coefficient of the login failure frequency of any day;
Calculating the difference between the comprehensive fluctuation coefficient of the data of the access module frequency and the comprehensive fluctuation coefficient of the data of the non-authority access application frequency, which is marked as a first difference, and calculating the difference between the average value of the non-authority access application frequency of any day and the non-authority access application frequency of all days before any day, which is marked as a second difference;
the data comprehensive fluctuation coefficient of the login failure frequency, the first difference and the second difference are integrated, and the data consistency score of any day is determined;
The data consistency score of any day, the comprehensive fluctuation coefficient of the login failure frequency data and the second difference form a negative correlation relationship, and the data consistency score of any day and the first difference form a positive correlation relationship;
the determining process of the comprehensive fluctuation coefficient of the login failure frequency data comprises the following steps:
Calculating the difference between each element in the moving average sequence of the login failure sequence and the adjacent element, marking the difference as the adjacent difference, carrying out forward fusion on all the adjacent differences in the moving average sequence of the login failure sequence, and taking the forward fusion result as the data comprehensive fluctuation coefficient of the login failure frequency;
the process for determining the data association change index comprises the following steps:
Calculating the difference between the mean of all elements in the unauthorized application time set and the mean of all elements in the login failure time set of each day, marking the difference as a third difference, and integrating the third difference, the data change trend coefficient and the operation sequence correlation coefficient of each day to determine the data association change index of each day;
The daily data association change index is positively correlated with the daily data change trend coefficient, the third difference and the operation sequence correlation coefficient.
2. The method for online monitoring abnormal data of financial service application according to claim 1, wherein the access module frequency is a sum of the number of times the user accesses the authorized module and the unauthorized access application frequency.
3. The method as claimed in claim 1, wherein the online monitoring of the abnormal data of the financial service application comprises:
Taking the daily security log vector as input of an isolated forest algorithm, and determining the weight of each isolated tree based on the data consistency score and the distribution of the data association change indexes of the security log vector corresponding to the days contained in each isolated tree in the isolated forest algorithm and the depths of the left subtree and the right subtree of each isolated tree;
The output of the isolated forest algorithm is the anomaly score of the security log vector of each day; when the anomaly score is smaller than a preset threshold value, the corresponding security log vector is judged to be abnormal data, and otherwise, the corresponding security log vector is judged to be normal data.
4. The online monitoring method of abnormal data for financial service application as claimed in claim 3, wherein the weight of each isolated tree is determined by:
Calculating the degree of dispersion of the data consistency scores of the days corresponding to all security log vectors contained in each isolated tree, calculating the average value of the data association change indexes of the days corresponding to all security log vectors contained in each isolated tree, and calculating the difference between the maximum value and the minimum value of the depths of the left subtree and the right subtree of each isolated tree, marked as a fourth difference;
determining a weight for each isolated tree based on the degree of dispersion, the average value, and the fourth difference;
The weight of each isolated tree is positively correlated with the degree of dispersion, the average value and the fourth difference.
5. An online monitoring system for abnormal data of a financial service application, comprising a memory, a processor and a computer program stored in the memory and running on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-4 when executing the computer program.
CN202410740166.0A 2024-06-07 2024-06-07 A method and system for online monitoring of abnormal data in financial service applications Active CN118569871B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410740166.0A CN118569871B (en) 2024-06-07 2024-06-07 A method and system for online monitoring of abnormal data in financial service applications

Publications (2)

Publication Number Publication Date
CN118569871A CN118569871A (en) 2024-08-30
CN118569871B true CN118569871B (en) 2024-11-26

Family

ID=92468103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410740166.0A Active CN118569871B (en) 2024-06-07 2024-06-07 A method and system for online monitoring of abnormal data in financial service applications

Country Status (1)

Country Link
CN (1) CN118569871B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119299194B (en) * 2024-10-17 2025-04-11 山东唯选康科技创新有限公司 A network security protection method for cloud service system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117692216A (en) * 2023-12-13 2024-03-12 航天信息股份有限公司 Abnormal login behavior management method and device, storage medium and electronic equipment
CN117978461A (en) * 2024-01-15 2024-05-03 兵器装备集团财务有限责任公司 Abnormal login detection method and system based on isolated forest

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11757906B2 (en) * 2019-04-18 2023-09-12 Oracle International Corporation Detecting behavior anomalies of cloud users for outlier actions
CN114090402A (en) * 2021-11-03 2022-02-25 中国电子科技集团公司第三十研究所 User abnormal access behavior detection method based on isolated forest
CN117852003B (en) * 2024-02-04 2024-06-21 杭州全顺科技有限公司 Account monitoring early warning management method based on data analysis

Similar Documents

Publication Publication Date Title
CN113542279B (en) Network security risk assessment method, system and device
CN110442712B (en) Risk determination method, risk determination device, server and text examination system
US20210400066A1 (en) Identifying data processing timeouts in live risk analysis systems
CN118569871B (en) A method and system for online monitoring of abnormal data in financial service applications
CN118134634B (en) Internet credit integrated management system
Abushark et al. Cyber security analysis and evaluation for intrusion detection systems
CN114553456B (en) Digital identity network alarm
CN119513919A (en) Privacy data protection method and system based on homomorphic encryption and federated learning
CN111783073A (en) Black product identification method and device and readable storage medium
CN106951776A (en) Host anomaly detection method and system
CN117955863A (en) Data security detection method and system based on artificial intelligence
CN116232745A (en) Knowledge-graph-based network security monitoring and early warning management method and system
CN115174205A (en) Network space safety real-time monitoring method, system and computer storage medium
CN119766522A (en) A network security situation awareness prediction method based on knowledge graph
RU2659736C1 (en) System and method of detecting new devices under user interaction with banking services
CN118965328A (en) Magnetic induction smart card secure access authentication method and system based on deep learning
CN118247054A (en) Transaction data processing method, device and server
CN118278730A (en) Mine external factor fire early warning method based on game theory combined weighting
Mendes et al. Benchmarking the security of web serving systems based on known vulnerabilities
CN112966732B (en) Multi-factor interactive behavior anomaly detection method with periodic attribute
Wu et al. Vulnerability time series prediction based on multivariable LSTM
CN114553587A (en) Big data analysis method and server for dealing with cloud service threat
KR101725450B1 (en) Reputation management system provides safety in html5 and method of the same
Cheng et al. Defending on-line web application security with user-behavior surveillance
CN115426186B (en) Network behavior detection method and related device based on network security policy

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Country or region after: China

Address after: 100193 619, Floor 6, Building 1, Yard 138, Malianwa North Road, Haidian District, Beijing

Applicant after: Shenzhou Rongxin Cloud Technology Co.,Ltd.

Address before: 100193 619, Floor 6, Building 1, Yard 138, Malianwa North Road, Haidian District, Beijing

Applicant before: Digital China Rongxin cloud Technology Service Co.,Ltd.

Country or region before: China

SE01 Entry into force of request for substantive examination
GR01 Patent grant