
CN119130623B - A testing method and system based on financial environment stability - Google Patents

A testing method and system based on financial environment stability

Info

Publication number
CN119130623B
CN119130623B (application CN202411588020.5A)
Authority
CN
China
Prior art keywords
data
risk
transaction
performance
generate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411588020.5A
Other languages
Chinese (zh)
Other versions
CN119130623A (en)
Inventor
才春海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Farben Information Technology Co ltd
Original Assignee
Shenzhen Farben Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Farben Information Technology Co ltd
Priority to CN202411588020.5A
Publication of CN119130623A
Application granted
Publication of CN119130623B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 - Error detection; Error correction; Monitoring
    • G06F11/36 - Prevention of errors by analysis, debugging or testing of software
    • G06F11/3668 - Testing of software
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2433 - Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G06Q10/00 - Administration; Management
    • G06Q10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 - Operations research, analysis or management
    • G06Q10/0635 - Risk analysis of enterprise or organisation activities

Abstract

The present invention relates to the field of financial service technology, and in particular to a testing method and system based on the stability of a financial environment. The method comprises the following steps: real-time status monitoring of a financial transaction system to generate basic system status data; constructing a virtual test environment based on the basic system status data, and performing adaptive configuration of test scenarios to generate test environment parameter data; performing multi-dimensional stress test analysis on the test environment parameter data, and calculating system stability indicators to generate stability benchmark data; performing dynamic load gradient processing based on the stability benchmark data to obtain system pressure gradient data; performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data. By constructing a virtual test environment and performing adaptive configuration, the present invention can simulate real transaction scenarios, improve the relevance and effectiveness of the test, and ensure the stable operation of the system under different stress conditions.

Description

Financial environment stability-based test method and system
Technical Field
The invention relates to the technical field of financial services, and in particular to a testing method and system based on financial environment stability.
Background
Financial environment stability refers to the ability of a financial system to remain in normal operation, resist risk shocks, and ensure that financial institutions and markets can smoothly conduct transactions and clearing under the influence of various internal and external environmental factors. It involves multiple layers, including the robustness of financial institutions, the liquidity of financial markets, the prevention and control of systemic risk, the effectiveness of macroeconomic policy, the reliability of financial infrastructure, and risk-management and early-warning mechanisms. Testing financial environment stability is a critical process for evaluating and ensuring that a financial system can operate stably under varying pressure conditions. Such testing helps financial institutions, regulators, and other market participants identify potential risks in advance and take measures to prevent outbreaks of systemic financial risk. However, traditional testing methods struggle to comprehensively cover multidimensional parameters such as the number of concurrent users, transaction frequency, and transaction complexity; they generally rely only on static stress-test results and lack the ability to respond to real-time changes in the system. In particular, in large-scale transaction scenarios, traditional system testing makes it difficult to quickly locate and identify system bottlenecks.
Disclosure of Invention
Accordingly, the present invention is directed to a testing method and system based on financial environment stability that solve at least one of the above-mentioned problems.
To achieve the above purpose, the testing method based on financial environment stability comprises the following steps:
Step S1, monitoring a real-time state of a financial transaction system to generate system state basic data, constructing a virtual test environment according to the system state basic data, and performing test scene self-adaptive configuration to generate test environment parameter data;
S2, performing multidimensional pressure test analysis on the test environment parameter data, calculating a system stability index to generate stability reference data, performing dynamic load gradient processing according to the stability reference data to obtain system pressure gradient data, performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data;
Step S3, acquiring abnormal behavior characteristics of the financial transaction system by using the distributed monitoring system to generate abnormal behavior characteristic data, carrying out fault mode identification on the abnormal behavior characteristic data, establishing a fault early warning threshold value and generating early warning trigger condition data;
Step S4, performing multi-dimensional performance analysis based on system monitoring data acquired in real time to obtain performance index data, wherein the multiple dimensions comprise a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension;
Step S5, performing risk level assessment on the performance bottleneck positioning data through early warning triggering condition data to obtain risk assessment data, constructing a system recovery strategy according to the risk assessment data to generate fault recovery scheme data, performing simulation verification on the fault recovery scheme data, and performing recovery strategy generation based on simulation results to obtain recovery strategy data;
and S6, performing stability evaluation on the financial transaction system by using the risk evaluation mapping data and the recovery strategy data, so as to obtain environmental stability evaluation data.
The invention also provides a testing system based on financial environment stability, used for executing the above testing method based on financial environment stability, and comprising the following modules:
An environment construction module, used for performing real-time state monitoring on the financial transaction system to generate system state basic data, constructing a virtual test environment according to the system state basic data, and performing adaptive test-scenario configuration to generate test environment parameter data;
A pressure evaluation module, used for performing multidimensional stress-test analysis on the test environment parameter data and calculating system stability indices to generate stability reference data, performing dynamic load gradient processing according to the stability reference data to obtain system pressure gradient data, and performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data;
An abnormality monitoring module, used for acquiring abnormal behavior characteristics of the financial transaction system by using the distributed monitoring system to generate abnormal behavior characteristic data, performing failure mode recognition on the abnormal behavior characteristic data, and establishing fault early-warning thresholds to generate early-warning trigger condition data;
A performance diagnosis module, used for performing multi-dimensional performance analysis based on system monitoring data acquired in real time to obtain performance index data, wherein the multiple dimensions comprise a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension;
the recovery strategy module is used for carrying out risk level assessment on the performance bottleneck positioning data through the early warning triggering condition data to obtain risk assessment data, constructing a system recovery strategy according to the risk assessment data to generate fault recovery scheme data, carrying out simulation verification on the fault recovery scheme data, and carrying out recovery strategy generation based on a simulation result to obtain recovery strategy data;
and the stability evaluation module is used for evaluating the stability of the financial transaction system by utilizing the risk evaluation mapping data and the recovery strategy data so as to obtain environmental stability evaluation data.
Through real-time state monitoring, the method and system can quickly identify the running state and potential problems of the system, ensure high availability, and provide a reliable data basis for subsequent analysis, helping decision makers make more scientific choices when optimizing system configuration. The constructed virtual test environment can simulate real transaction scenarios, providing a safe and reliable platform for subsequent stress testing and performance evaluation while reducing risk to the actual production system. Adaptive configuration of test scenarios allows environment parameters to be adjusted dynamically according to actual conditions, improving the relevance and effectiveness of the tests and better reflecting system performance under real-world conditions. Multi-dimensional stress-test analysis comprehensively evaluates system performance under different pressure conditions and identifies performance bottlenecks; the generated stability reference data provides an important reference for subsequent performance optimization and helps delineate the boundary between normal operation and abnormal states. Dynamic load gradient processing reflects the system's carrying capacity more accurately and helps ensure stability under high load, while risk level mapping of the pressure gradient data reveals potential high-risk areas in time and provides data support for risk management and control. By collecting abnormal behavior characteristics, anomalies during system operation can be discovered promptly, improving the system's sensitivity and response speed; recognition of abnormal behavior helps administrators locate the source of a problem more quickly, reducing the impact of faults on the business.
Establishing fault early-warning thresholds and trigger conditions facilitates timely intervention before a problem escalates, reducing the risk of system crashes or severe performance degradation. Multi-dimensional performance analysis provides a comprehensive understanding of system performance and identifies the key factors affecting business processing efficiency; combining pressure gradient data with performance indicators makes it possible to pinpoint system bottlenecks and target optimization, and the performance bottleneck localization data gives a clear direction for system improvement, enhancing overall efficiency. Risk level assessment of performance bottlenecks provides data support for formulating recovery strategies, making recovery measures more scientific and effective; a recovery strategy built on the assessment data ensures rapid response when problems occur, shortening service interruptions. Simulation testing of the recovery scheme can expose its defects in advance, reducing risk and wasted resources in actual operation and improving the scheme's effectiveness. By comprehensively analyzing the risk assessment and recovery strategies, system stability can be evaluated as a whole, ensuring stable operation under different conditions; the environment stability evaluation data provides key decision support for management, helping them account for potential risks when adjusting and optimizing the system. Periodic stability assessment improves system reliability, strengthens business continuity, and helps ensure high availability and consistency throughout financial transactions.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of the steps of the testing method based on the stability of the financial environment;
FIG. 2 is a detailed step flow chart of step S1 in FIG. 1;
FIG. 3 is a detailed step flow chart of step S2 in FIG. 1.
Detailed Description
The following is a clear and complete description of the technical solutions of the present invention, taken in conjunction with the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repetitive descriptions are omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In order to achieve the above objective, referring to fig. 1 to 3, the present invention provides a method for testing stability based on a financial environment, the method comprises the following steps:
Step S1, monitoring a real-time state of a financial transaction system to generate system state basic data, constructing a virtual test environment according to the system state basic data, and performing test scene self-adaptive configuration to generate test environment parameter data;
In a certain financial transaction system, by installing a distributed monitoring agent, each module of the system is monitored in real time, system layer data including CPU utilization rate, memory occupation, disk I/O, network transmission rate and the like are acquired, database state data including database connection number, active session number, SQL execution time and the like are acquired, and middleware state data including message queue depth, processing delay and the like are acquired. Then, a virtual test environment is constructed based on these monitoring data. For example, hardware resources such as the number of CPU cores, the memory capacity, the network bandwidth and the like configured in the virtual environment are adjusted according to the state of the actual financial transaction system. In addition, the system can carry out self-adaptive configuration according to scene characteristics, such as parameters of the number of concurrent users, the transaction frequency and the like in a transaction scene, and finally test environment parameter data is generated, so that the simulation environment is ensured to truly reflect the state of an actual system.
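The adaptive derivation of test-environment parameters from the monitored production state can be sketched minimally as follows. All field names and the `headroom` scaling factor are illustrative assumptions, not part of the patent:

```python
def build_test_env(snapshot, headroom=1.25):
    """Derive virtual test-environment parameters from a monitored
    production snapshot (hypothetical field names, illustrative scaling)."""
    return {
        "cpu_cores": max(1, round(snapshot["cpu_cores"] * headroom)),
        "memory_gb": round(snapshot["memory_gb"] * headroom, 1),
        "bandwidth_mbps": snapshot["bandwidth_mbps"],
        # adaptive scenario configuration: mirror observed peaks with headroom
        "concurrent_users": int(snapshot["peak_concurrent_users"] * headroom),
        "tx_per_second": int(snapshot["peak_tps"] * headroom),
    }

prod = {"cpu_cores": 16, "memory_gb": 64, "bandwidth_mbps": 1000,
        "peak_concurrent_users": 4000, "peak_tps": 1200}
env = build_test_env(prod)
print(env["cpu_cores"], env["concurrent_users"])  # 20 5000
```

The point of the sketch is that the virtual environment is sized from the real system's observed state rather than from fixed defaults, which is what keeps the simulation representative.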
S2, performing multidimensional pressure test analysis on the test environment parameter data, calculating a system stability index to generate stability reference data, performing dynamic load gradient processing according to the stability reference data to obtain system pressure gradient data, performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data;
According to the test environment parameter data generated in the step S1, the pressure test of the financial system comprises simulating thousands of concurrent transaction requests, and the number of concurrent users and the transaction complexity are gradually increased. In the process, multidimensional pressure test is carried out aiming at transaction capability, database read-write capability, network bandwidth and the like of the system. For example, starting with 1000 concurrent users, 1000 users are added each time until the system crashes or the response time significantly exceeds a preset threshold. And generating stability reference data of the system according to the test results, and recording the stability performance of the system under different loads. Then, dynamic load gradient processing is performed by using the reference data, system pressure gradient data is generated, the gradient data is mapped into risk levels based on a risk level model, and risk assessment mapping data is output.
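The ramp-up loop described above (start at 1000 users, add 1000 per step, stop when the response threshold is breached) can be sketched with a toy latency model standing in for the real system under test:

```python
def ramp_load(simulate_response_ms, start=1000, step=1000,
              threshold_ms=500, max_users=20000):
    """Step up concurrent users until simulated response time exceeds
    the threshold; the per-step records are the stability baseline."""
    baseline = []
    users = start
    while users <= max_users:
        rt = simulate_response_ms(users)
        baseline.append({"users": users, "response_ms": rt,
                         "stable": rt <= threshold_ms})
        if rt > threshold_ms:
            break
        users += step
    return baseline

# toy latency model (assumption): latency grows quadratically with load
model = lambda u: 50 + (u / 1000) ** 2 * 20
records = ramp_load(model)
print(records[-1]["users"])  # 5000
```

With this toy model the system stays stable through 4000 users and breaches the 500 ms threshold at 5000, so the last record marks the load boundary that the stability reference data would capture.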
Step S3, acquiring abnormal behavior characteristics of the financial transaction system by using the distributed monitoring system to generate abnormal behavior characteristic data, carrying out fault mode identification on the abnormal behavior characteristic data, establishing a fault early warning threshold value and generating early warning trigger condition data;
The embodiment of the invention collects abnormal behavior data of the financial transaction system in real time through the distributed monitoring system, wherein the abnormal behavior data comprise abnormal phenomena such as memory leakage, abnormal occupation of a CPU, excessive connection of a database and the like. These outlier data are preprocessed using a time series analysis tool, such as outlier detection on the resource usage curves of the different modules. Then, using a failure mode recognition model based on machine learning, common failure types of the system are recognized, such as database connection pool saturation, message queue accumulation and the like, and a failure early warning threshold value is generated. These thresholds will be used as early warning trigger conditions to ensure that the system will send out early warning before the critical point of failure in order to take recovery measures in time.
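A minimal sketch of the early-warning-threshold idea, using a simple mean-plus-k-sigma rule on a historical resource-usage series instead of the machine-learning failure-mode model the patent names (the rule, the `k` value, and the data are illustrative assumptions):

```python
from statistics import mean, stdev

def early_warning_threshold(series, k=3.0):
    """Derive a fault early-warning threshold as mean + k * stddev
    of a historical resource-usage series (simple statistical sketch)."""
    return mean(series) + k * stdev(series)

def check(sample, threshold):
    """Early-warning trigger condition: fire before the failure point."""
    return "WARN" if sample > threshold else "OK"

history = [40, 42, 41, 43, 39, 44, 42, 40]  # e.g. CPU % samples
t = early_warning_threshold(history)
print(check(85, t), check(43, t))  # WARN OK
```

A real deployment would derive one threshold per indicator (memory growth for leaks, connection count for pool saturation, queue depth for accumulation), but the trigger logic is the same shape.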
Step S4, performing multi-dimensional performance analysis based on system monitoring data acquired in real time to obtain performance index data, wherein the multiple dimensions comprise a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension;
The embodiment of the invention constructs a multi-dimensional performance analysis model based on the system monitoring data acquired in real time, and performs performance analysis from a system layer, a resource layer and a service layer respectively. For example, in the resource layer, the dimensions such as CPU, memory, disk and network utilization are analyzed with emphasis, and in the business layer, the indexes such as transaction success rate, transaction response time and clearing efficiency are analyzed. By correlating these performance data with the system pressure gradient data generated in step S2, the system can identify performance bottlenecks under high load conditions, such as too slow a database query speed, insufficient network bandwidth, and the like. And then, positioning the performance bottleneck point through a link tracking technology to generate performance bottleneck positioning data.
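The bottleneck-locating idea of correlating performance metrics with the pressure gradient can be sketched as follows, using a plain Pearson correlation on toy data rather than the link-tracing technique named above (all numbers are illustrative):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# pressure gradient (concurrent users) vs. two candidate metrics
users       = [1000, 2000, 3000, 4000, 5000]
db_query_ms = [12, 25, 60, 140, 320]            # rises sharply with load
net_util    = [0.20, 0.22, 0.21, 0.23, 0.22]    # essentially flat
# the metric that tracks the load gradient most closely is the
# likelier bottleneck candidate for closer inspection
print(round(pearson(users, db_query_ms), 2), round(pearson(users, net_util), 2))
```

Here database query time tracks the load far more closely than network utilization, which is exactly the "too slow a database query speed" pattern the text gives as an example.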
Step S5, performing risk level assessment on the performance bottleneck positioning data through early warning triggering condition data to obtain risk assessment data, constructing a system recovery strategy according to the risk assessment data to generate fault recovery scheme data, performing simulation verification on the fault recovery scheme data, and performing recovery strategy generation based on simulation results to obtain recovery strategy data;
According to the performance bottleneck positioning data, the risk level of the current system is estimated by utilizing the early warning trigger condition data generated in the step S3. For example, if it is monitored that the response time of a database instance continuously exceeds the early warning threshold, the system will automatically trigger a corresponding fault early warning. Next, based on the risk assessment data, the system will consult a historical failure recovery scheme library to build a recovery strategy appropriate for the current scenario. The recovery policy may include measures such as resource expansion, load balancing, service degradation, etc. And performing simulation test on the generated fault recovery scheme, evaluating the effectiveness of the scheme by testing parameters such as the recovery time, the fault resolution, the resource consumption and the like of the system, and optimizing based on the test result to generate final recovery strategy data.
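The risk-level mapping plus recovery-scheme lookup can be sketched as a small table-driven function. The band boundaries, level names, and the scheme library contents are illustrative assumptions; the patent's measures (resource expansion, load balancing, service degradation) are used as the library entries:

```python
# map a monitored utilization ratio to a risk level, then pick a
# recovery action from a (hypothetical) historical scheme library
RISK_BANDS = [(0.60, "low"), (0.80, "medium"), (0.95, "high"), (1.01, "critical")]
RECOVERY = {"low": "monitor", "medium": "load_balancing",
            "high": "resource_expansion", "critical": "service_degradation"}

def assess(utilization):
    """Return (risk level, recovery action) for a 0..1 utilization ratio."""
    for upper, level in RISK_BANDS:
        if utilization < upper:
            return level, RECOVERY[level]
    return "critical", RECOVERY["critical"]

print(assess(0.85))  # ('high', 'resource_expansion')
```

In the full method the chosen scheme would then go through the simulation-verification step before being adopted as recovery strategy data.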
And S6, performing stability evaluation on the financial transaction system by using the risk evaluation mapping data and the recovery strategy data, so as to obtain environmental stability evaluation data.
The embodiment of the invention utilizes the risk assessment mapping data in the step S2 and the recovery strategy data in the step S5 to carry out overall stability assessment on the financial transaction system. In practical applications, the system may evaluate the stability of the financial transaction system according to the current load situation, in combination with historical reference data, such as predicting the risk level and potential failure point of the system for a period of time in the future. Through the real-time evaluation, the system can discover and solve potential performance bottlenecks or resource bottlenecks in advance, and ensure the long-term stability of the financial environment.
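One simple way to combine the two inputs of step S6 into a single figure is a weighted score; the penalties and weights below are illustrative assumptions, not values from the patent:

```python
def stability_score(risk_levels, recovery_success_rate, weights=(0.6, 0.4)):
    """Combine mapped risk levels (from S2) with recovery-drill success
    (from S5) into a single 0-100 stability score (illustrative weighting)."""
    penalty = {"low": 0, "medium": 10, "high": 25, "critical": 50}
    risk_part = 100 - sum(penalty[r] for r in risk_levels) / len(risk_levels)
    recovery_part = 100 * recovery_success_rate
    return weights[0] * risk_part + weights[1] * recovery_part

# three mapped risk areas, 90% of recovery drills succeeded
print(round(stability_score(["low", "medium", "high"], 0.9), 1))  # 89.0
```

A periodic evaluation would track this score over time, so a downward trend flags emerging bottlenecks before they become failures.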
As an embodiment of the present invention, referring to fig. 2, a detailed step flow diagram of step S1 in fig. 1 is shown, and in the embodiment of the present invention, step S1 includes the following steps:
Step S11, collecting system operation basic parameters of a financial transaction system to generate basic performance data, wherein the system operation basic parameters comprise CPU (central processing unit) utilization, memory occupancy, disk I/O (input/output) and network transmission rate;
In the financial transaction system, a system performance monitoring tool such as Prometheus or Zabbix collects the basic operating parameters in real time. CPU utilization monitoring records CPU occupancy at different transaction peaks; a memory monitoring module tracks dynamic changes in memory occupancy to ensure stable memory use; disk I/O monitoring tracks read/write speed and wait time to evaluate disk performance; and network transmission rate monitoring captures network traffic packets to measure bandwidth usage and transmission efficiency. Finally, basic performance data containing these parameters is generated as a baseline reference for the system's running state.
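As a minimal sketch of one timestamped sample in this collection step, with collector callbacks standing in for the Prometheus/Zabbix exporters the text names (all field names are illustrative):

```python
import time
import random

def sample_system_metrics(read_cpu, read_mem, read_disk_io, read_net):
    """One timestamped sample of the four base running parameters.
    The read_* arguments are any callables returning the current value."""
    return {"ts": time.time(), "cpu_pct": read_cpu(), "mem_pct": read_mem(),
            "disk_iops": read_disk_io(), "net_mbps": read_net()}

# stub collectors with random values for illustration
random.seed(0)
s = sample_system_metrics(lambda: random.uniform(0, 100),
                          lambda: random.uniform(0, 100),
                          lambda: random.randint(100, 5000),
                          lambda: random.uniform(0, 1000))
print(sorted(s))
```

Sampling on a fixed interval and appending each dict to a store yields exactly the time-stamped series that step S15 later processes.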
Step S12, acquiring the database connection number, the active session number, SQL execution time and lock waiting condition of the financial transaction system, so as to generate database state data;
According to the embodiment of the invention, the number of database connections and active sessions in the financial transaction system is monitored through a database monitoring tool such as Oracle Enterprise Manager or MySQL Workbench. The collected SQL execution time data includes the time consumed by each query, and lock wait conditions are judged by analyzing the database's lock logs and wait queues for resource contention. After these monitoring tools analyze and process the data, the system generates database state data for further evaluating the database's operating efficiency and health.
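For the MySQL case, the four indicators can be backed by standard status and schema queries; the mapping of query to indicator below is illustrative, and `run_query` is a stand-in for any real database executor:

```python
# Standard MySQL probes for the four database-state indicators
# (the query -> indicator mapping is illustrative)
DB_PROBES = {
    "connections": "SHOW GLOBAL STATUS LIKE 'Threads_connected'",
    "active_sessions": ("SELECT COUNT(*) FROM information_schema.processlist "
                        "WHERE command != 'Sleep'"),
    "slow_sql": ("SELECT digest_text, avg_timer_wait "
                 "FROM performance_schema.events_statements_summary_by_digest "
                 "ORDER BY avg_timer_wait DESC LIMIT 10"),
    "lock_waits": "SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits'",
}

def collect_db_state(run_query):
    """run_query: any callable that executes SQL and returns a value."""
    return {name: run_query(sql) for name, sql in DB_PROBES.items()}

# stubbed executor for illustration only
state = collect_db_state(lambda sql: 0)
print(sorted(state))
```

Swapping the stub for a real cursor (e.g. one from a connection pool) turns this into the database-state collection the step describes.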
Step S13, monitoring the message queue depth, message processing delay and message accumulation condition of the financial transaction system in real time to generate middleware state data;
The embodiment of the invention monitors the depth of the message queues in the financial transaction system in real time through a message queue monitoring tool such as Kafka Monitor or the RabbitMQ Management Plugin, and tracks message accumulation in high-concurrency transaction scenarios. Meanwhile, message processing delay data is collected to estimate the delay from when a message is taken off the queue to when its processing actually completes. Monitoring of key process and thread states is integrated with Prometheus through a visualization tool such as Grafana, collecting process CPU and memory usage and thread concurrency states to ensure that core system processes run normally under high load, thereby generating middleware state data and process state data.
Step S14, collecting service running states of a financial transaction system so as to obtain transaction performance data, clearing state data and wind control state data, wherein the transaction performance data comprises transaction processing time, transaction success rate and transaction concurrency, the clearing state data comprises clearing processing efficiency, fund posting time and reconciliation consistency, and the wind control state data comprises wind control rule response time, wind control interception rate and wind control accuracy;
In the business operation process of the financial transaction system, the embodiment of the invention collects transaction performance data in real time through a business monitoring system, such as a log-analysis ELK (Elasticsearch, Logstash, Kibana) stack. The transaction performance data includes the processing time, success rate and concurrency of each transaction. The clearing state data is acquired through the clearing system interface and includes clearing efficiency, fund posting time and end-of-day reconciliation consistency. The wind control state data is collected through the wind control system interface, monitoring the response time, interception rate and accuracy of the wind control rules. The business running state of the financial transaction system is thereby comprehensively analyzed, generating the corresponding transaction, clearing and wind control state data.
Step S15, performing time series processing on basic performance data, database state data, middleware state data and process state data to generate system layer state sequence data;
The embodiment of the invention stores and processes the basic performance data, database state data, middleware state data and process state data acquired in steps S11 to S13 in a time-series database such as InfluxDB. Specifically, by time-stamping each monitoring index, the system state at each moment is recorded as time-series data. For example, CPU utilization, memory occupancy and similar data are recorded at second-level granularity, while the database connection count and SQL execution time are collected at minute-level granularity; the system layer state sequence data are then generated through time-series processing, facilitating subsequent trend analysis and state prediction.
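The second-level-to-minute-level aggregation mentioned above can be sketched as follows. The bucketing helper and sample values are illustrative; they are not part of the InfluxDB API, which would receive the resulting points over its write protocol.

```python
# Minimal sketch of time-stamping metrics into series data, assuming
# second-level CPU samples are aggregated into minute-level averages
# before being written to a time-series store such as InfluxDB.
from collections import defaultdict

def to_minute_series(samples):
    """samples: list of (epoch_seconds, value) -> {minute_index: mean value}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts // 60].append(value)
    return {m: sum(v) / len(v) for m, v in buckets.items()}

series = to_minute_series([(0, 40.0), (30, 60.0), (60, 80.0)])
# minute 0 averages 50.0, minute 1 holds 80.0
```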
Step S16, carrying out business association analysis on transaction performance data, clearing state data and wind control state data to generate business layer state association data;
Aiming at the transaction performance data, the clearing state data and the wind control state data generated in the step S14, the embodiment of the invention utilizes a correlation analysis tool such as Tableau or Power BI to perform correlation analysis of business layer data. For example, the system analyzes the correlation between the transaction performance and the clearing efficiency, evaluates the influence of the transaction processing time on the clearing efficiency, and simultaneously determines the relationship between the wind control interception rate and the transaction concurrency through the correlation analysis of the wind control state data to generate business layer state correlation data so as to help system operation and maintenance personnel identify business bottlenecks and potential risk factors.
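The business-layer correlation that such analysis tools compute internally can be illustrated with a plain Pearson correlation coefficient; the metric values below are hypothetical.

```python
# Illustrative sketch: Pearson correlation between business metrics,
# e.g. transaction processing time (ms) vs. clearing efficiency.
# The sample values are hypothetical, not real system measurements.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson([100, 120, 140, 160], [0.98, 0.95, 0.91, 0.88])
# strongly negative: longer processing times coincide with lower efficiency
```

A coefficient near -1 here is the kind of evidence the analysis uses to conclude that transaction processing time degrades clearing efficiency.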
Step S17, merging the system layer state sequence data and the business layer state association data into system state basic data;
The embodiment of the invention combines the system layer state sequence data generated in step S15 with the business layer state association data generated in step S16. Data from different sources are aligned in time sequence through a data fusion tool such as Apache NiFi, ensuring synchronism between the system layer data and the business layer data. In actual operation, the system layer data and business layer data are first converted into a unified data format, and unified system state basic data are then generated through comparison and association analysis of the time-series data, providing a basis for construction of the subsequent test environment.
Step S18, constructing a virtual test environment according to the system state basic data, performing self-adaptive configuration of the test scene, and generating test environment parameter data.
According to the system state basic data generated in step S17, a virtual test environment similar to the real financial transaction system is constructed by using a virtualization technology such as VMware or Docker. In the construction process, the system adaptively configures hardware resources in the virtual environment, such as virtual CPU, memory, storage and network parameters, according to the transaction volume, concurrency, clearing efficiency and the like of the actual scene. For example, if the system state basic data indicate that high-concurrency transactions occur frequently, high network bandwidth and fast-I/O disks will be provisioned in the virtual environment. Finally, test environment parameter data are generated to ensure that the virtual test scene truly reflects the running state of the actual system.
The basic performance data reflect the hardware resource consumption and overall health of the system, providing a basis for judging system load and performance bottlenecks. This step ensures dynamic monitoring of system resources, so that abnormal or overload conditions are discovered in time, corrective measures can be taken quickly, and performance degradation is prevented. The acquisition of database state data reflects the operational load, efficiency and potential bottlenecks of the database. By monitoring key database performance indexes, a system administrator can understand the processing capacity of the database, avoid performance degradation caused by excessive connection counts or lock-wait problems, and ensure the response speed and stability of the database during transactions. Real-time monitoring of message queue depth, message processing delay and accumulation generates middleware state data, from which the message processing efficiency of the transaction system can be assessed. Especially in a highly concurrent transaction environment, the health of the message queue directly affects the response speed and reliability of the system. In addition, monitoring the states of key processes and threads generates process state data, ensuring normal operation of the system's core function modules and avoiding system faults or delays caused by abnormal processes or threads. The collection of service running state data, including transaction performance, clearing status and wind control status, generates specific service performance data. These data not only reflect the operating efficiency of the transaction system, but also help a system administrator master key business indexes such as transaction success rate, wind control response time and clearing processing efficiency.
Through deep insight into these data, the system can better optimize the business process and improve overall efficiency and accuracy, which plays a particularly important role in optimizing the wind control and clearing links. Time-serialization of the basic performance data, database state data, middleware state data and process state data generates system layer state sequence data. Time serialization reflects the trend of system state changes, helps identify potential performance degradation or abnormal conditions, and enables a system administrator to anticipate load trends by comparing historical data and take preventive measures in advance. Business association analysis of the transaction performance data, clearing state data and wind control state data generates business layer state association data. In this step, the mutual influence among different service functions can be identified through association analysis of business layer data: for example, the influence of wind control rules on transaction processing, or the effect of clearing efficiency on overall funds flow, helping to optimize business logic and system flow and thereby improve overall operating efficiency. Merging the system layer state sequence data and the business layer state association data into system state basic data gives the system a global view of its running state, covering both the system resource layer and the health of the business process. This provides an important data basis for subsequent virtual testing and optimization, and helps improve the pertinence and effectiveness of testing.
On the basis of the real running state, the system performs scene simulation and pressure testing, ensuring that the test environment accurately reflects the actual running condition of the system, and improving test precision and comprehensiveness through self-adaptive configuration for different scenes.
Preferably, step S18 comprises the steps of:
Step S181, carrying out hardware resource configuration based on CPU core count, memory capacity, storage space and network bandwidth according to the system state basic data, carrying out database instance configuration based on connection pool size, cache size and execution plan optimization parameters, and carrying out middleware parameter configuration based on message queue size, thread pool parameters and cache strategy, thereby obtaining environment parameter configuration data;
According to the system state basic data, an automatic configuration tool such as Ansible or Terraform is used for configuring hardware resources of the virtual environment. Specifically, the system dynamically allocates corresponding virtual machine resources according to the actual CPU core number, memory capacity, storage space and network bandwidth requirements. For example, when higher concurrent transaction loads are displayed in the system state base data, the system may configure more CPU cores and more memory to handle the transaction requests. Meanwhile, based on configuration requirements of the database instance, the size of a connection pool, the size of a cache and optimization parameters of an execution plan of the database are adjusted so as to improve query efficiency. For the parameter configuration of the middleware, the system optimizes the performance of the middleware according to the size of the message queue, the concurrency parameter of the thread pool and the requirement of the caching strategy, and finally generates environment parameter configuration data.
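A hypothetical sizing rule of the kind such automation might apply is sketched below; the scaling factors (users per core, memory per user, connections per TPS) are illustrative assumptions, not values from the invention, and a real setup would pass the result to a tool such as Ansible or Terraform.

```python
# Hypothetical sizing sketch: deriving virtual-machine and database
# parameters from observed load figures. All scaling factors are
# illustrative assumptions.

def size_environment(peak_concurrency, avg_tx_per_sec):
    cpu_cores = max(2, peak_concurrency // 500)       # ~500 users per core
    memory_gb = max(4, peak_concurrency // 250)       # ~250 users per GB
    db_pool = min(200, max(20, avg_tx_per_sec * 2))   # ~2 connections per TPS
    return {"cpu_cores": cpu_cores, "memory_gb": memory_gb,
            "db_connection_pool": db_pool}

cfg = size_environment(peak_concurrency=2000, avg_tx_per_sec=50)
# → 4 cores, 8 GB memory, database pool of 100 connections
```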
Step S182, constructing a transaction scene model according to the transaction performance data, and generating transaction scene data, wherein the transaction scene data comprises transaction type distribution, transaction frequency mode and transaction scale distribution;
The embodiment of the invention builds a transaction scene model by analyzing transaction performance data and using a simulation tool such as LoadRunner or JMeter. The model reflects the transaction type distribution, transaction frequency, and transaction size in the actual transaction scenario. For example, for a transaction scenario in a retail bank, the system simulates different types of transactions, such as funds transfers, bill payments, and credit card payments, based on historical data, and generates a distribution of transaction peak and valley periods through a pattern of transaction frequencies. Meanwhile, the transaction scale distribution is generated according to historical transaction amount statistics, such as the distribution range of the amount of a single transaction and the concentration of high-frequency transaction, and finally transaction scene data is generated for use in subsequent test scenes.
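The transaction-mix sampling underlying such a scenario model can be sketched as follows; the type names and weights are hypothetical, and the generator merely stands in for the scenario input a load tool such as JMeter would replay.

```python
# Sketch of sampling a transaction mix from an assumed historical
# distribution. The transaction types and their weights are illustrative.
import random

TX_MIX = [("funds_transfer", 0.5), ("bill_payment", 0.3),
          ("credit_card", 0.2)]  # hypothetical type distribution

def sample_transactions(n, seed=42):
    rng = random.Random(seed)          # fixed seed for reproducible scenarios
    types = [t for t, _ in TX_MIX]
    weights = [w for _, w in TX_MIX]
    return [rng.choices(types, weights)[0] for _ in range(n)]

batch = sample_transactions(1000)
share = batch.count("funds_transfer") / len(batch)
# the sampled share hovers near the configured 0.5 weight
```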
Step S183, a clearing scene model is built based on the clearing state data, and clearing scene data is generated, wherein the clearing scene data comprises a clearing batch, a clearing scale and a clearing time window;
The embodiment of the invention builds a clearing scene model by using a clearing simulation system based on clearing state data. The system first simulates daily or periodic clearing batches and scales based on historical clearing data. For example, if a financial institution has multiple clearing batches per day, each batch having different amounts of transactions and funds, the system will simulate the clearing batches and generate corresponding clearing scale data. The clearing time window is set according to actual operation time, for example, the end-of-day clearing usually occurs at night, and the clearing time schedule is optimized through a model so as to ensure that the clearing scene can truly reflect the clearing operation of the financial system and generate complete clearing scene data.
Step S184, constructing a wind control scene model according to wind control state data, and generating wind control scene data, wherein the wind control scene data comprises a wind control rule set, a wind control threshold value and a wind control response strategy;
According to the embodiment of the invention, a rule engine tool such as Drools is used for constructing a wind control scene model according to wind control state data. The method comprises the specific operations of establishing a wind control rule set, such as configuring different risk identification rules for different transaction types, setting a wind control threshold, such as setting stricter wind control standards when large-amount transfer or high-risk area transactions are conducted, and optimizing a wind control response strategy by analyzing historical data to ensure that the wind control rule can rapidly respond in a high-load environment. For example, for an anti-fraud scenario, the system may simulate an abnormally high frequency transaction triggering a wind-controlled response mechanism in a short time, thereby generating wind-controlled scenario data.
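A minimal rule-set sketch in the spirit of such a rule engine is shown below; the rule names, thresholds and actions are illustrative assumptions rather than the system's actual wind control rules, and a production deployment would express them in an engine such as Drools.

```python
# Minimal wind-control rule-set sketch: each rule is a name, a predicate
# and an action. All thresholds are hypothetical examples.

RULES = [
    ("large_transfer", lambda tx: tx["amount"] > 500_000, "review"),
    ("high_risk_region", lambda tx: tx.get("region") == "high_risk", "block"),
    ("burst_frequency", lambda tx: tx.get("tx_last_minute", 0) > 30, "block"),
]

def evaluate(tx):
    """Return the actions of all rules the transaction triggers, in order."""
    return [action for name, pred, action in RULES if pred(tx)]

actions = evaluate({"amount": 600_000, "region": "domestic",
                    "tx_last_minute": 45})
# triggers large_transfer (review) and burst_frequency (block)
```

The anti-fraud scenario mentioned above corresponds to the burst_frequency rule: an abnormally high per-minute transaction count trips the wind-control response.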
Step S185, performing scene combination optimization on transaction scene data, clearing scene data and wind control scene data to generate comprehensive test scene data;
In the embodiment of the invention, in the construction of the comprehensive test scene, transaction scene data, clearing scene data and wind control scene data are optimally combined through a multi-scenario optimization technique such as a genetic algorithm. By simulating concurrent operation of different business scenes, the system evaluates the interaction among transaction, clearing and wind control. For example, during peak clearing hours, the transaction frequency and the execution of wind control rules may stress the overall performance of the system; the system optimizes the combined effect of each scenario by adjusting the transaction and clearing batches and the execution frequency of the wind control rules, and generates comprehensive test scenario data for further testing the performance of the system in complex environments.
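As a simplified stand-in for the genetic-algorithm search, the sketch below exhaustively scores combinations of scenario intensity levels against an assumed shared capacity budget; a real genetic algorithm would search much larger combination spaces where enumeration is infeasible.

```python
# Simplified stand-in for the scenario-combination search: enumerate
# intensity levels for (transaction, clearing, wind-control) scenarios
# and keep the heaviest mix that fits an assumed capacity budget.
from itertools import product

LEVELS = (1, 2, 3)   # low / medium / high intensity, per scenario
CAPACITY = 7         # hypothetical total load budget

def best_combination():
    """Pick the highest combined intensity that stays within capacity."""
    feasible = [c for c in product(LEVELS, repeat=3) if sum(c) <= CAPACITY]
    return max(feasible, key=sum)

combo = best_combination()
# one maximal feasible mix, with total intensity equal to the budget of 7
```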
Step S186, carrying out load parameter self-adaptive adjustment on the comprehensive test scene data based on the system state basic data to generate scene load data;
The embodiment of the invention carries out self-adaptive adjustment of load parameters on the comprehensive test scene data based on the system state basic data. Through load balancing tools such as HAProxy or Nginx, the system dynamically adjusts load distribution in different scenarios according to real-time status data. For example, during peak traffic hours, the system may automatically add virtual machine instances or extend network bandwidth to accommodate load demands. Meanwhile, the load of the clearing and wind control scenes is evaluated through a load pressure testing tool such as stress-ng, and the execution times of the clearing batches and wind control rules are adjusted to ensure that the system remains stable under high load, finally generating scene load data.
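The kind of proportional scaling this adaptive adjustment resembles is what Kubernetes' Horizontal Pod Autoscaler uses: desired replicas = ceil(current replicas × current metric / target metric). A direct sketch:

```python
# HPA-style proportional scaling rule: replicas scale with the ratio of
# observed to target utilization, floored at one replica.
import math

def desired_replicas(current_replicas, current_util, target_util):
    """ceil(current * current_util / target_util), at least 1."""
    return max(1, math.ceil(current_replicas * current_util / target_util))

n = desired_replicas(current_replicas=4, current_util=0.90, target_util=0.60)
# → 6 replicas needed to bring utilization back toward the 60% target
```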
Step S187, integrating and matching the environment parameter configuration data with the scene load data to generate test environment parameter data.
When the embodiment of the invention integrates the environment parameter configuration data and the scene load data, a system automation management platform such as Kubernetes is used for combining hardware resource configuration with a service scene, so that the test environment can accurately simulate the running state of an actual system. For example, the system dynamically allocates computing resources according to the environmental parameter configuration data, and simultaneously matches the scenario load data with the environmental parameter configuration data, so that reasonable resource configuration of a CPU, a memory, a network bandwidth and the like is ensured under each test scenario, and load test pressure can be accurately applied to the system. Through the integration operation, the system finally generates complete test environment parameter data, and the authenticity and stability of a test scene are ensured.
The invention generates environment parameter configuration data through configuration of hardware resources such as CPU core number, memory capacity, storage space, network bandwidth and the like, and optimization of database examples and middleware parameters (such as connection pool size, cache strategy, thread pool parameters and the like). The testing environment can accurately reflect the use condition of hardware and software resources of the production environment, so that the testing result has more practical reference value. Through reasonable resource allocation, resource waste or deficiency can be avoided, and stability and efficiency of the system in an actual scene are improved. And constructing a transaction scene model according to the transaction performance data, generating transaction scene data (comprising transaction type distribution, transaction frequency mode and transaction scale distribution), and accurately simulating transaction activities in a financial system. This helps to test the performance of the system under different transaction loads, identifying potential performance bottlenecks, such as response speed, system stability, etc. in the case of high frequency transactions or large scale transactions. By simulating the real transaction pattern, the reliability of the system can be better assessed. And constructing a clearing scene model according to the clearing state data, and generating clearing scene data, including clearing batches, scales and time windows. Clearing is an important link of a financial transaction system, and the processing efficiency and consistency of the system in the clearing process can be evaluated by simulating the actual clearing scene. Meanwhile, the step can help to find the pressure bearing capacity of the system when the peak value is cleared, so that the clearing flow is optimized, and the overall operation capacity of the system is improved. 
And constructing a wind control scene model based on the wind control state data, generating wind control scene data (such as a wind control rule set, a wind control threshold value and a wind control response strategy), and deeply testing the wind control capability of the system. The method can help the system to identify the rationality and the effectiveness of the wind control rules by simulating different wind control strategies and rule responses, and prevent the risk event from being intercepted or not reported by mistake, thereby ensuring the accuracy and the response speed of the system in the aspect of risk management and control. The comprehensive test scene data is generated by performing scene combination optimization on the transaction, clearing and wind control scene data, so that the performance of the system in the transaction, clearing and wind control processes can be evaluated simultaneously in a unified test environment. According to the method, through optimizing different scene combinations, multidimensional scene interaction in an actual production environment can be simulated maximally, and the system can be ensured to run stably in a complex service scene. This plays a key role in the overall testing and optimization of the overall performance of the system. And carrying out load parameter self-adaptive adjustment on the comprehensive test scene data based on the system state basic data to generate scene load data. The step ensures that the test can be carried out under different load conditions by dynamically adjusting the load parameters, helps to find out the performance of the system under the extreme load condition and finds out the links of performance bottleneck or insufficient resources. The self-adaptive adjustment can enable the test result to have more practical guiding significance, so that the robustness of the system under high load is improved. 
And integrating and matching the environment parameter configuration data with the scene load data to generate test environment parameter data. This step ensures that the configuration of the test environment matches the load model, making the test more closely to the actual operating environment. Meanwhile, through the integration and matching of parameters, the resource consumption condition in the production environment can be better simulated, and the reliability and accuracy of the test result are ensured. Finally, the system configuration is optimized, so that the system is more stable and efficient in actual operation.
As an embodiment of the present invention, referring to fig. 3, a detailed step flow chart of step S2 in fig. 1 is shown, and in the embodiment of the present invention, step S2 includes the following steps:
Step S21, carrying out gradient incremental test on transaction parameters in the test environment parameter data to generate transaction pressure test data, wherein the transaction parameters comprise the number of concurrent users, the transaction frequency and the transaction complexity;
When the embodiment of the invention carries out a gradient incremental test on the transaction parameters in the test environment parameter data, a pressure test tool such as JMeter or LoadRunner is used. The system first increases the number of concurrent users step by step, starting from 100 users and gradually ramping up to 5,000, and monitors the response time and transaction throughput of the system under the different concurrency levels. Meanwhile, the system varies the transaction frequency, gradually increasing from 10 to 100 transactions per second, and analyzes the performance of the system under high-frequency transactions. Transaction complexity is tested by adjusting the transaction type, such as simple funds transfers versus complex cross-border transactions, recording the latency and success rate of complex transaction processing, and finally generating the transaction pressure test data.
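The gradient ramp can be sketched as follows; the doubling step schedule and the saturation-style latency model are illustrative assumptions, not measurements of the system or features of JMeter.

```python
# Sketch of a gradient incremental ramp: concurrency steps from 100 to
# 5,000 users, sampled against a hypothetical latency model. The latency
# function is an illustrative stand-in, not a real measurement.

def ramp_steps(start=100, stop=5000, factor=2):
    steps, n = [], start
    while n < stop:
        steps.append(n)
        n *= factor
    steps.append(stop)          # always finish at the target concurrency
    return steps

def model_latency_ms(users):    # hypothetical saturation curve
    return 50 + max(0, users - 2000) * 0.05

profile = [(u, model_latency_ms(u)) for u in ramp_steps()]
# latency stays flat (~50 ms) until ~2,000 users, then climbs linearly
```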
Step S22, carrying out high-volume transaction and batch transaction pressure tests according to the test environment parameter data, and generating transaction capacity data comprising the upper limit of the amount of a single transaction, the concurrency of batch transactions and the cross-system transaction processing capacity;
In the embodiment of the invention, the system performs pressure tests on large-amount transactions and batch transactions according to the test environment parameter data. First, by simulating large-amount transactions, the upper limit of the amount of a single transaction is gradually increased, for example from 10,000 yuan to 1,000,000 yuan, and the response time and success rate of the system in processing large-amount transactions are recorded. Second, for the batch transaction pressure test, the system simulates multiple concurrent batch transaction scenarios, initiating 100 batch transaction requests simultaneously through a tool such as Gatling, and records the upper limit of batch transaction concurrency. Meanwhile, the system simulates cross-system transactions and analyzes their processing capacity, testing the response speed and accuracy of high-frequency cross-system transactions with a third-party payment platform, thereby generating transaction capacity data.
Step S23, carrying out correlation analysis on the transaction pressure test data and the transaction capacity data to generate transaction bearing capacity data;
In the embodiment of the invention, when associating the transaction pressure test data with the transaction capacity data, the system performs a correlation analysis using a data analysis tool such as the Elastic Stack. In particular, the system analyzes whether an increase in the number of concurrent users significantly affects transaction capacity, for example whether, once concurrent users exceed 2,000, the system's transaction response time increases significantly or the error rate rises. Through this correlation analysis, the system can determine the influence of different transaction parameters on its bearing capacity and finally generates transaction bearing capacity data, used to judge the bearing performance of the system in real transaction scenarios.
Step S24, performing a system capacity pressure test according to the test environment parameter data to generate system capacity limit data, wherein the system capacity pressure test comprises a database pressure test and a system storage capacity test;
In the system capacity pressure test, the system performs dedicated tests on the database and the system storage capacity according to the test environment parameter data. The database pressure test simulates high-concurrency database read-write operations through a tool such as SysBench, gradually increasing the database transaction throughput until the system response is significantly delayed or errors occur, and records the maximum transaction volume the database can process. The storage capacity test simulates large-volume data write and read operations through a tool such as Iometer, gradually increasing the data volume toward the storage limit of the system, monitors system performance as storage capacity approaches saturation, and generates the system capacity limit data as the system's limit performance index.
Step S25, performing network pressure test based on network bandwidth pressure and network connection number limit according to the test environment parameter data to generate network bearing capacity data;
In the network pressure test, the system performs limit tests of network bandwidth and connection count according to the test environment parameter data. Tools such as iPerf are used to simulate high-traffic network transmission, gradually increasing network bandwidth utilization, for example from 100 Mbps to 1 Gbps, while recording the response speed and delay of the system under high bandwidth. For the connection count test, the system simulates connection requests from a large number of users, gradually increasing the number of simultaneous connections, for example from 1,000 to 50,000, analyzes the processing capacity and stability of the system as the connection count reaches its limit, and finally generates network bearing capacity data.
Step S26, calculating a system stability index according to the system capacity limit data, the transaction bearing capacity data and the network bearing capacity data, and comparing and analyzing the system stability index with a pre-acquired historical benchmark so as to obtain stability benchmark data;
In the embodiment of the invention, in the calculation of the system stability index, the system firstly comprehensively analyzes the system capacity limit data, the transaction bearing capacity data and the network bearing capacity data. Using an algorithmic model such as a linear regression or random forest model, the system compares these data with pre-acquired historical stability baseline data. For example, when the current test results indicate that the system's transaction bearing capacity is below a historical reference, the system may evaluate its potential risk and generate an alert. Through the analysis, the system obtains system stability reference data in the current test environment, and the system stability reference data is used for evaluating the overall stability of the system.
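One possible form of such a stability index is a weighted score over the three normalized capacity measurements, compared against the historical baseline; the weights and the alert margin below are assumptions for illustration, not values from the invention.

```python
# Illustrative stability index: weighted score of the three capacity
# measurements, compared against a historical baseline. The weights and
# alert margin are hypothetical assumptions.

WEIGHTS = {"capacity": 0.4, "transaction": 0.4, "network": 0.2}

def stability_index(metrics):
    """metrics are normalized to [0, 1]; higher means more stable."""
    return sum(WEIGHTS[k] * metrics[k] for k in WEIGHTS)

def compare_to_baseline(index, baseline, margin=0.05):
    return "alert" if index < baseline - margin else "ok"

idx = stability_index({"capacity": 0.8, "transaction": 0.7, "network": 0.9})
status = compare_to_baseline(idx, baseline=0.85)
# idx = 0.78, more than 0.05 below the 0.85 baseline, so status is "alert"
```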
And step S27, carrying out dynamic load gradient processing according to the stability reference data to generate system pressure gradient data, and carrying out risk level mapping on the system pressure gradient data to generate risk assessment mapping data.
According to the embodiment of the invention, the system can carry out dynamic gradient processing on the load according to the stability reference data. By automating load management tools such as Kubernetes HPA (Horizontal Pod Autoscaler), the system adjusts the distribution of load based on real-time stability data. For example, when system pressure exceeds a certain threshold, the system may automatically increase computing resources or decrease concurrent transaction load, generating system pressure gradient data. The system then maps these pressure gradient data to a risk level model, for example by setting thresholds for different risk levels to determine the risk level of the current system, ultimately generating risk assessment mapping data for further risk management and coping strategy formulation.
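The risk level mapping can be sketched as simple threshold binning over the pressure gradient; the level boundaries are hypothetical examples.

```python
# Threshold-based risk mapping sketch: pressure-gradient values are
# binned into risk levels. The boundaries are hypothetical examples.

RISK_THRESHOLDS = [(0.5, "low"), (0.75, "medium"), (0.9, "high")]

def risk_level(pressure):
    """pressure in [0, 1], relative to the system's measured limit."""
    for bound, level in RISK_THRESHOLDS:
        if pressure < bound:
            return level
    return "critical"

levels = [risk_level(p) for p in (0.3, 0.6, 0.8, 0.95)]
# → ["low", "medium", "high", "critical"]
```

Each resulting level can then carry its own response strategy (e.g. shed load or add resources), which is the risk assessment mapping data the step produces.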
The invention enables testing to run under ever-increasing loads, thereby identifying the performance of the system under different trading conditions. Through fine transaction parameter adjustment, the bearing capacity of the system can be clarified, providing data support for subsequent performance optimization. This test verifies the processing power of the system in the face of large-amount and high-concurrency transactions, and identifies bottlenecks and problems that may occur in actual transaction scenarios. Assessment of transaction capacity is critical to ensuring that the system can handle the expected transaction volume, reducing risk in actual operation. Cross-verifying the relationships between the different data sets supports a thorough understanding of system behavior under stress; for example, the response time and processing efficiency of the system under high-concurrency trading conditions may be evaluated. Such analysis provides an important basis for system design and architecture, ensuring that business requirements can be met. Performing the system capacity pressure test according to the test environment parameter data generates system capacity limit data. This includes the database pressure test and the system storage capacity test, which evaluate the operation of the system under extreme conditions. This step can reveal potential problems such as insufficient storage or slow database response, providing an important reference for subsequent capacity planning and expansion. The network tests ensure system reliability under high network load, identify network bottlenecks and their influence on transaction processing, provide a quantitative standard for system performance, and allow the system's stability and reliability to be evaluated under different load conditions.
The improvement effect of the system and the possible problems can be better understood by comparison with the historical data. The dynamic load gradient processing not only helps to identify the risk level of the system under different loads, but also provides a basis for dynamically adjusting the load of the system. Through the risk mapping, the system can take corresponding precautions against different pressure conditions, so that potential risks are reduced, and the safety of transactions and the stability of the system are ensured.
Preferably, step S27 comprises the steps of:
Step S271, performing multi-dimensional data layering processing on the stability reference data to generate layering reference data including a transaction layer pressure reference, a system layer pressure reference and a network layer pressure reference;
In the embodiment of the invention, the system firstly carries out multi-dimensional data layering processing on the stability reference data, and uses a data layering algorithm such as decision trees or hierarchical clustering to classify different hierarchy data of the system. At the transaction layer, the system generates transaction layer pressure reference data according to the factors of the concurrent user number, the transaction frequency, the transaction type and the like, at the system layer, generates system layer pressure reference data according to the system performance parameters of CPU occupancy rate, memory utilization rate, disk IO and the like, and at the network layer, generates network layer pressure reference data according to the network bandwidth utilization rate, network delay and connection number. Through the layering processing, the system can identify the pressure performance of each level, so that a foundation is laid for subsequent load calculation.
Step S272, constructing a dynamic load calculation model based on the layered reference data, and carrying out load parameter standardization processing to generate load standardization data;
When the dynamic load calculation model is built based on the layered reference data, the system uses a mathematical model such as a multiple linear regression model and combines the reference data of the transaction layer, the system layer and the network layer to construct the load calculation model. The system then standardizes the load parameters, converting the load parameters of each level into dimensionless data by a normalization or Z-score method, so that data from different levels share a uniform scale, which facilitates subsequent model calculation and analysis. Finally, load standardized data are generated, ensuring parameter consistency and comparability in the load calculation process.
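The Z-score standardization mentioned above can be sketched in a few lines; the sample values are hypothetical.

```python
import statistics

def z_score(values):
    """Convert raw load parameters to dimensionless Z-scores
    (zero mean, unit population standard deviation)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:                      # constant series: no spread to scale
        return [0.0] * len(values)
    return [(v - mean) / stdev for v in values]

cpu_samples = [55.0, 60.0, 75.0, 90.0]  # hypothetical CPU utilisation samples
print(z_score(cpu_samples))
```

After this step, transaction-, system- and network-layer parameters can be compared and combined on the same scale.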
Step S273, carrying out gradient feature extraction based on machine learning on the load standardized data so as to generate load feature vector data;
Based on the load standardized data, the system of the embodiment of the invention applies a gradient feature extraction algorithm based on machine learning, such as Principal Component Analysis (PCA) or a Convolutional Neural Network (CNN), to perform feature extraction on the standardized load data. Specific operations include analyzing the correlation among the various load parameters to extract the gradient features most sensitive to system load, such as sudden increases in CPU utilization or limit use of network bandwidth, and generating load feature vector data. These data contain the features that most strongly influence the load and can provide accurate information for the subsequent load gradient classification.
Step S274, a multi-level load gradient model is established according to the load characteristic vector data, and load gradient classification data comprising a light load level, a medium load level, a heavy load level and an overload level are generated;
According to the embodiment of the invention, a system constructs a multi-level load gradient model according to the load characteristic vector data. And classifying the load characteristic vector data by using a K-means clustering algorithm or a fuzzy C-means algorithm to generate load gradient classification data comprising light load, medium load, heavy load and overload levels. The light load level may represent a system in an idle or light load condition, the medium load level reflecting the load condition during normal operation, the heavy load level indicating that the system load is approaching an upper limit, and the overload level indicating that the system is under extreme pressure, a performance bottleneck or fault may be imminent. By these classifications, the system is able to monitor and predict load conditions more accurately.
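As a minimal stand-in for the K-means clustering described above, a tiny one-dimensional K-means over scalar load scores can derive the four gradient levels; the load values here are hypothetical and a real embodiment would cluster the full feature vectors.

```python
LEVELS = ("light", "medium", "heavy", "overload")

def kmeans_1d(values, k=4, iters=50):
    """Tiny 1-D k-means: cluster scalar load scores into k gradient levels."""
    # Seed centroids with evenly spaced sorted samples.
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:            # converged
            break
        centroids = new
    return sorted(centroids)

def classify(value, centroids):
    """Map a load score to its gradient level via the nearest centroid."""
    idx = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
    return LEVELS[idx]

loads = [5, 8, 35, 40, 62, 70, 88, 97]  # hypothetical load scores
cents = kmeans_1d(loads)
print(cents, classify(90, cents))
```

The fuzzy C-means alternative named in the text would additionally assign membership degrees rather than a single level.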
Step S275, performing dynamic threshold calculation on the load gradient classification data, performing threshold self-adaptive optimization according to preset historical pressure measurement data, and generating gradient threshold data;
When the embodiment of the invention carries out dynamic threshold calculation on the load gradient classification data, the system combines the historical pressure measurement data and adopts a self-adaptive threshold optimization algorithm, such as a genetic algorithm or Bayesian optimization, to adjust the threshold of each load level. For example, according to the system performance under different past pressure measurement scenarios, the threshold between the medium load and the heavy load is dynamically adjusted so that the threshold setting better matches the actual load condition. Finally, gradient threshold data are generated; these data not only reflect the current running state of the system, but also have self-adaptive adjustment capability to cope with future dynamic changes of the system.
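As a far simpler stand-in for the genetic or Bayesian optimization named above, the level boundaries can be derived from quantiles of historical pressure measurement scores; this is an illustrative sketch only.

```python
import statistics

def adaptive_thresholds(history):
    """Derive light/medium/heavy/overload boundaries from historical
    load scores using quartiles (a simple stand-in for genetic or
    Bayesian threshold optimisation)."""
    q1, q2, q3 = statistics.quantiles(sorted(history), n=4)
    return {"light_max": q1, "medium_max": q2, "heavy_max": q3}

# Hypothetical historical load scores from past pressure measurements.
print(adaptive_thresholds(list(range(1, 101))))
```

Re-running this on each new batch of pressure measurement data gives the thresholds their self-adaptive character.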
Step S276, carrying out real-time evaluation on the current load state of the system based on gradient threshold data, and carrying out load prediction analysis to generate system pressure gradient data;
Based on gradient threshold data, the system evaluates the current load state in real time. The system predicts the load change trend of the system in the future time period, particularly the system pressure change condition of the peak transaction time by monitoring various parameters of the transaction layer, the system layer and the network layer in real time and combining gradient threshold data and using a sliding window algorithm or a Recurrent Neural Network (RNN) model to carry out load prediction analysis. Through these analyses, the system can pre-warn of pressure peaks that may occur in advance, thereby generating system pressure gradient data to support self-regulation of the system at high loads.
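The sliding-window variant of the load prediction can be sketched as a moving-average forecaster; a real embodiment might substitute an RNN as noted above, and the sample values are hypothetical.

```python
from collections import deque

class SlidingWindowForecaster:
    """Predict the next load sample as the moving average of the last
    `size` observations -- a minimal stand-in for the RNN variant."""

    def __init__(self, size=5):
        self.window = deque(maxlen=size)

    def observe(self, value):
        self.window.append(value)

    def predict(self):
        return sum(self.window) / len(self.window)

f = SlidingWindowForecaster(size=3)
for v in (60, 70, 80):                  # hypothetical load samples
    f.observe(v)
print(f.predict())                      # 70.0
```

Comparing the prediction against the gradient threshold data allows a pressure peak to be pre-warned before it occurs.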
Step S277, performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data.
The system of the embodiment of the invention carries out risk level mapping on the generated pressure gradient data, applying a risk assessment model such as one based on fuzzy logic or decision trees, and maps different pressure gradients to different risk levels; for example, a light load state corresponds to low risk, a medium load state corresponds to medium risk, and an overload state corresponds to high risk. Finally, the system generates risk assessment mapping data according to the mapping result, provides corresponding early warning prompts and processing suggestions, helps operation and maintenance personnel take measures in time, and reduces the fault risk of the system in a high-pressure state.
According to the invention, the complex reference data is split into standards at different levels, so that the analysis of each layer is more accurate, facilitating targeted optimization and monitoring. The construction of the layered datum provides a clear reference frame for subsequent analysis and decision making, and ensures that the pressure conditions of all layers are effectively monitored. Through standardization, the differences between data sources can be eliminated, so that the analysis model is comparable and consistent, ensuring the accuracy and reliability of load calculation. Using a machine learning algorithm to extract the feature vectors allows potential patterns and relationships in the data to be mined in depth, improving the intelligence of the load model. The generated load feature vectors can provide data support for dynamically adjusting the load and optimizing resource allocation, enhancing the self-adaptive capability of the system. Establishing a multi-level load gradient model helps to achieve fine-grained management and monitoring by dividing the load into multiple levels. The load gradient classification data of the different levels provide clear standards for real-time load assessment, so that the system can take corresponding response measures for different load conditions, improving the flexibility and stability of the system. Load management becomes more accurate, and the dynamic threshold can be set according to actual running conditions, so that sudden load changes can be handled more effectively. Through self-adaptive optimization, the system can adjust its load tolerance in real time, improving its responsiveness. Real-time assessment and load prediction analysis enable the system to quickly identify the current operating conditions and predict potential load problems ahead of time.
The process can provide timely early warning for the system, help the operation and maintenance team take necessary measures, reduce the risk of system breakdown and ensure the continuity and reliability of financial transactions. Risk level mapping is performed on the system pressure gradient data to generate risk assessment mapping data. Through the risk assessment mapping of the pressure gradient data, the load condition of the system can be converted into an intuitive risk level, and the operation and maintenance team can quickly understand the safety condition of the system and react. This mapping data can help formulate a corresponding emergency plan, ensuring that resources can be adjusted or other remedial action taken in time when the system is under excessive pressure.
Preferably, step S277 includes the steps of:
Step S2771, establishing a multi-dimensional risk assessment index system according to system pressure gradient data to generate risk index data, wherein the risk index comprises system performance risks, service processing risks, data consistency risks and network transmission risks;
In the embodiment of the invention, the system first establishes a multidimensional risk assessment index system according to the system pressure gradient data, using an Analytic Hierarchy Process (AHP) or a weighted scoring method, with system performance, business processing, data consistency and network transmission as the main dimensions. The system performance risk is determined by analyzing key indexes such as CPU utilization rate, memory occupancy rate, disk IO and system response time; the business processing risk is evaluated according to parameters such as transaction processing success rate, transaction delay and concurrency; the data consistency risk is determined according to data such as reconciliation consistency and fund arrival time in the clearing process; and the network transmission risk is evaluated through the network bandwidth utilization rate, network delay and packet loss rate. The risk indexes of all dimensions are combined according to different weights to generate risk index data that comprehensively reflect the running risk condition of the system.
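The weighted combination of the four dimension risks can be sketched as follows; the weights are hypothetical placeholders for AHP-derived values, and the per-dimension scores are assumed to be pre-normalized to [0, 1].

```python
# Hypothetical weights; the embodiment derives them via AHP or weighted scoring.
WEIGHTS = {"performance": 0.35, "business": 0.30,
           "consistency": 0.20, "network": 0.15}

def composite_risk(scores: dict) -> float:
    """Weighted sum of per-dimension risk scores, each in [0, 1]."""
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

print(composite_risk({"performance": 0.8, "business": 0.4,
                      "consistency": 0.2, "network": 0.1}))
```

The resulting scalar is one possible form of the risk index data that feeds the risk level classification in the next step.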
Step S2772, classifying risk levels of the risk index data according to a preset risk index threshold value to obtain risk level classification data;
In the embodiment of the invention, the system classifies the risk level of the generated risk index data according to the preset risk index threshold values. Using logistic regression or fuzzy logic methods, risk indicators of different dimensions are classified into different risk classes, such as low risk, medium risk and high risk. For example, the system performance risk may be judged as medium risk when CPU usage exceeds 70%, and as high risk when it exceeds 90%; the business processing risk likewise triggers a high risk alert when the transaction delay exceeds a preset threshold. Through these classifications, the system generates risk level classification data, so that different risk scenarios can be assigned to the appropriate risk levels according to their respective metrics.
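The CPU-utilization bands quoted above (over 70% medium risk, over 90% high risk) translate directly into a threshold classifier; this sketch covers only that single indicator.

```python
def cpu_risk_level(cpu_pct: float) -> str:
    """Map CPU utilisation to the risk bands named in the text:
    >70% medium risk, >90% high risk, otherwise low risk."""
    if cpu_pct > 90:
        return "high"
    if cpu_pct > 70:
        return "medium"
    return "low"

print(cpu_risk_level(85))   # medium
```

Analogous classifiers for transaction delay, reconciliation consistency and packet loss would together produce the full risk level classification data.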
Step S2773, constructing an early warning rule model comprising early warning trigger conditions, early warning level judgment and early warning message pushing strategies based on risk level division data;
Based on the risk level classification data, the system builds an early warning rule model comprising early warning trigger conditions, early warning level judgment and early warning message pushing strategies. The trigger conditions can include triggering an early warning when the CPU utilization rate exceeds 90% or the network delay exceeds a designated time; the early warning level judgment is determined according to the risk level classification data, with low risk triggering a low-level early warning, medium risk triggering a medium-level early warning and high risk triggering an emergency early warning. The early warning message pushing strategy is based on the level and scene of the risk; for example, a low-level early warning can be notified by mail, while an emergency early warning is sent to the relevant operation and maintenance personnel by instant message. Through strictly defined conditions and policies, this model ensures that the early warning can trigger and notify the relevant personnel at the proper time.
Step S2774, performing reliability verification on the early warning rule model, and performing rule optimization adjustment based on a verification result to generate early warning strategy data;
After the early warning rule model is constructed, the embodiment of the invention performs reliability verification, using historical data for test verification or testing with data generated by simulated stress tests. The system analyzes the accuracy and timeliness of early warning triggering and judges whether the early warning level accords with the actual condition. Based on the verification results, the system optimizes the early warning rules using a machine learning model such as a random forest or a Gradient Boosting Decision Tree (GBDT). For example, if the early warning level of a certain type of risk is triggered too frequently and the false alarm rate is high, the system can appropriately adjust the triggering threshold or condition of the early warning, finally generating optimized early warning strategy data so that the early warning model is more accurate and reliable.
Step S2775, mapping and associating the risk classification data with the early warning strategy data, and constructing a risk response mechanism so as to obtain risk assessment mapping data.
According to the embodiment of the invention, the risk classification data and the optimized early warning strategy data are mapped and associated, and a complete risk response mechanism is constructed. In the process, the system matches the pre-warning strategies corresponding to each risk level, so that the optimal response measures under different risk scenes are determined. For example, for high risk level scenarios, the system not only sends emergency pre-warning, but also triggers automated emergency treatment procedures, such as automatic expansion of resources or transaction limiting measures. Through the mapping association, the system finally generates risk assessment mapping data, and provides a comprehensive risk prevention and control and response mechanism for the financial transaction system.
The method has the beneficial effect that potential risk factors of the system can be more comprehensively identified and evaluated by classifying risks along different dimensions. The multidimensional risk assessment indexes provide a richer information basis for subsequent risk analysis and decision making, and improve the sensitivity and responsiveness of the system to various risks. Through the classification process, the system can identify indexes of different risk levels in real time and rapidly judge the current risk state. The division into risk levels helps the operation and maintenance team quickly acquire key information and take appropriate response measures in an emergency, improving the efficiency and effectiveness of risk management. The construction of the models makes risk management more systematic and standardized. By setting clear early warning trigger conditions and message pushing strategies, relevant personnel can be notified in time when the risk level reaches a certain threshold, ensuring rapid response and reducing the influence of potential risks on services. The reliability verification and optimization process ensures the effectiveness and adaptability of the early warning rule model. Through continuous testing and adjustment, the system can better adapt to a continuously changing service environment and risk factors, improving the accuracy and reliability of the early warning system. Through mapping association, corresponding response measures can be implemented under different risk levels, forming a complete closed loop of risk management. The risk response mechanism can help organizations act quickly, prevent risk events from occurring, and reduce interference with and losses to business operation.
Preferably, step S3 comprises the steps of:
Step S31, acquiring abnormal behavior data of a financial transaction system through a distributed monitoring system, so as to obtain original abnormal data comprising system layer abnormal data, application layer abnormal data and network layer abnormal data;
According to the embodiment of the invention, the distributed monitoring system monitors all levels of the financial transaction system in real time, and the acquired original abnormal data comprise various abnormal behaviors of the system layer, the application layer and the network layer. For example, the system layer abnormal data comprise resource anomalies such as CPU overload and memory leaks, the application layer abnormal data comprise problems such as application program crashes and request timeouts, and the network layer abnormal data comprise phenomena such as packet loss and excessive delay. The distributed monitoring system aggregates the original abnormal data and integrates them into an original abnormal data set according to the data sources of the different levels for subsequent processing and analysis.
Step S32, carrying out data cleaning and time sequence alignment on the original abnormal data to generate standardized abnormal data, and carrying out feature extraction based on performance abnormality, resource abnormality and business abnormality on the standardized abnormal data to obtain abnormal behavior feature data;
The original abnormal data collected by the embodiment of the invention often contain noise and redundant data, so the system first cleans the data, removing irrelevant or repeated records and handling missing values. The system then adopts a time series alignment algorithm to align the abnormal data from different sources onto a unified timeline, generating standardized abnormal data. Next, the system performs feature extraction on the standardized abnormal data along dimensions such as performance anomalies, resource anomalies and business anomalies; the extracted features may include CPU overload duration, memory usage peak, transaction request failure rate and the like, finally generating abnormal behavior feature data and providing a basis for subsequent fault mode identification.
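The cleaning and time-series alignment step can be sketched as deduplicating events and bucketing their timestamps onto a shared timeline; the timestamps, event names and bucket size below are hypothetical.

```python
def align_events(streams, bucket_s=60):
    """Deduplicate anomaly events from several monitoring sources and
    align them onto a shared timeline by bucketing epoch-second
    timestamps into `bucket_s`-wide windows."""
    timeline = {}
    seen = set()
    for source, events in streams.items():
        for ts, kind in events:
            key = (source, ts, kind)
            if key in seen:                 # drop exact duplicates (cleaning)
                continue
            seen.add(key)
            bucket = ts - ts % bucket_s     # align to the window start
            timeline.setdefault(bucket, []).append((source, kind))
    return dict(sorted(timeline.items()))

streams = {
    "system": [(1000, "cpu_overload"), (1000, "cpu_overload")],  # duplicate
    "network": [(1030, "packet_loss")],
}
print(align_events(streams))
```

Events that land in the same bucket can then be treated as co-occurring when extracting cross-layer anomaly features.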
Step S33, constructing a fault mode classification model for distinguishing the system fault type, the service fault type and the network fault type according to the abnormal pattern sequence data, and carrying out fault type identification by utilizing the fault mode classification model to obtain fault classification data;
According to the embodiment of the invention, the abnormal behavior characteristic data is analyzed, the fault mode classification model is constructed, and the model can distinguish different types such as system faults, service faults, network faults and the like. For example, a system failure may be associated with a hardware resource or operating system, a business failure may be associated with an anomaly in business logic processing, and a network failure may be associated with network connectivity or bandwidth limitations. By training the classification model based on the data of the abnormal pattern sequence, the system is able to identify a specific type of fault and output fault classification data. For example, the model may determine a system layer failure when the system detects that a CPU overload and a database connection timeout occur simultaneously, and may be identified as a network failure when the transaction success rate is suddenly reduced and the network packet loss rate is increased.
Step S34, training an intelligent early warning model according to the pre-acquired historical fault data to generate early warning model parameter data;
According to the embodiment of the invention, the intelligent early warning model is trained using the historical fault data, and the early warning model parameter data, such as weights and threshold settings, are generated by a supervised learning or reinforcement learning method. The system can carry out dynamic probability evaluation according to the fault classification data, assessing the likelihood of occurrence of different fault types and generating fault probability data. The fault probability evaluation usually adopts a Bayesian probability or Markov chain algorithm, taking into consideration the frequency of historical faults, the current system state and the temporal correlation of faults. For example, when the system detects multiple business anomalies concurrent with a system-layer anomaly, it may calculate a higher business failure probability.
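The Bayesian update underlying the fault probability evaluation can be sketched directly from Bayes' rule; the prior and likelihood values below are hypothetical numbers, not taken from the embodiment.

```python
def posterior_fault_probability(prior, p_evidence_given_fault, p_evidence_given_ok):
    """Bayes' rule: update the fault probability after observing an
    anomaly pattern (e.g. business anomalies concurrent with a
    system-layer anomaly)."""
    num = p_evidence_given_fault * prior
    return num / (num + p_evidence_given_ok * (1 - prior))

# Hypothetical numbers: 5% base fault rate; the observed pattern appears
# in 80% of fault periods but only 5% of healthy periods.
print(posterior_fault_probability(0.05, 0.80, 0.05))
```

Even a low base fault rate yields a sharply elevated posterior once the correlated anomaly pattern is observed, which is what drives the early warning.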
Step S35, carrying out multi-level early warning threshold calculation according to the early warning model parameter data and the fault probability data so as to obtain threshold grading data;
According to the embodiment of the invention, the system carries out multi-level early warning threshold calculation for various faults according to the early warning model parameter data and the fault probability data. Using algorithms based on dynamic threshold adjustment, the system is able to set different threshold levels based on the current fault data and historical data. For example, a system-level fault may be set to a low-level early warning when the CPU utilization exceeds 80% and to a high-level early warning when it exceeds 95%; a network failure may trigger a mid-level early warning when the packet loss rate exceeds 2%. Finally, the results of these calculations generate threshold grading data for distinguishing between different levels of failure risk.
Step S36, constructing an early warning rule model based on the threshold grading data, and generating early warning trigger condition data, wherein the early warning rule model comprises early warning condition definition, early warning level judgment and an early warning trigger mechanism.
The embodiment of the invention constructs an early warning rule model based on the generated threshold grading data, and defines the conditions for triggering early warning, the judgment standard of the early warning level and the triggering mechanism of early warning. For example, the early warning condition can be defined as that the CPU utilization rate of the system layer exceeds 90% and the network delay exceeds 100ms, the early warning level is judged to be classified into low-level early warning, medium-level early warning and high-level early warning according to the severity and the influence range of the fault, and the early warning triggering mechanism controls the sending mode of the early warning, such as that the low-level early warning is notified through mail, and the medium-level early warning and the high-level early warning are sent to related personnel through instant messages. The finally generated early warning trigger condition data ensures that the system can respond to the fault risk in real time and timely inform related personnel to take countermeasures.
The invention collects the abnormal behavior data of the financial transaction system through the distributed monitoring system, and can comprehensively obtain the original abnormal information of the system layer, the application layer and the network layer. This multidimensional data acquisition not only provides a more complete perspective, but also helps identify the root cause of potential problems, improves the accuracy and timeliness of fault detection, further enhances the stability and safety of the system, and reduces the risk caused by abnormal behaviors. Cleaning and time series alignment of the original abnormal data eliminate noise and inconsistent data, making analysis more reliable. Through feature extraction based on performance, resource and business anomalies, the data can be converted into more interpretable feature sets. This standardized processing reduces the complexity of subsequent analysis and improves the detection capability for abnormal behaviors, thereby reducing the possibility of false alarms and missed alarms and helping to locate problems more quickly. By constructing a fault mode classification model for distinguishing system faults, service faults and network faults, automatic fault diagnosis can be realized. Using the model to identify the fault type allows the fault source to be located rapidly and accurately, reducing the time required for manual intervention and judgment. Automatic fault classification not only improves the reaction speed but also optimizes the allocation of resources, ensuring that key problems can be solved in time. Training the intelligent early warning model and carrying out dynamic probability evaluation helps accurately identify and predict potential fault risks from historical data.
The generated fault probability data can help a team to develop a more effective emergency response strategy, take precautionary measures in advance and reduce the possibility of fault occurrence. This predictive capability promotes overall system robustness and enables active intervention before failure occurs, thereby reducing losses. The multi-level early warning threshold calculation provides clear fault triggering conditions for the system. By setting different levels of thresholds, the system can alert when a problem is initially present, thereby enabling the team to take action before the problem is exacerbated. The early warning mechanism remarkably improves the flexibility and timeliness of emergency response, and further reduces potential risks and losses. The constructed early warning rule model ensures that early warning can be automatically triggered when abnormal conditions occur. The mechanism comprises definition of early warning conditions, level judgment and a trigger mechanism, so that the system can rapidly respond when the system faces an emergency. Through the systematic early warning mechanism, the safety and reliability of the financial transaction system are obviously improved, the dependence on manual monitoring is reduced, and the possibility of human errors is reduced.
Preferably, step S4 comprises the steps of:
Step S41, constructing a multi-dimensional performance evaluation model comprising a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension according to system monitoring data acquired in real time to obtain performance index data;
According to the embodiment of the invention, the multidimensional performance evaluation model is constructed from the monitoring data collected in real time. The model includes four dimensions, namely a system performance dimension (such as response time and throughput), a resource utilization dimension (such as the usage of CPU, memory and disk IO), a business processing dimension (such as transaction success rate and request processing time) and a reliability dimension (such as failure rate and recovery time). After integrating the data of each dimension, the system generates performance index data using data normalization and weighting algorithms. For example, in a high concurrency scenario, the response time in the system performance dimension may increase significantly, while the CPU and memory usage in the resource utilization dimension also climb; the system aggregates these data in real time and generates overall performance assessment data.
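The normalization-and-weighting aggregation can be sketched as min-max scaling each metric to a "goodness" score and taking a weighted sum; the dimensions, value ranges and weights below are hypothetical.

```python
# Hypothetical dimension weights and value ranges.
DIMS = {
    "response_ms": {"weight": 0.4, "lo": 0, "hi": 2000, "higher_is_worse": True},
    "cpu_pct":     {"weight": 0.3, "lo": 0, "hi": 100,  "higher_is_worse": True},
    "tx_success":  {"weight": 0.3, "lo": 0.0, "hi": 1.0, "higher_is_worse": False},
}

def performance_index(sample: dict) -> float:
    """Min-max normalise each metric to a 0..1 'goodness' score and
    return the weighted sum (1.0 = best)."""
    total = 0.0
    for name, cfg in DIMS.items():
        norm = (sample[name] - cfg["lo"]) / (cfg["hi"] - cfg["lo"])
        if cfg["higher_is_worse"]:
            norm = 1.0 - norm
        total += cfg["weight"] * norm
    return round(total, 4)

print(performance_index({"response_ms": 400, "cpu_pct": 50, "tx_success": 0.99}))
```

A real embodiment would cover all four dimensions and refresh the index on each monitoring interval.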
Step S42, extracting pressure characteristic vectors from the system pressure gradient data to generate pressure characteristic data, and carrying out multidimensional characteristic decomposition on the performance index data to obtain the performance characteristic data;
In the embodiment of the invention, from the system pressure gradient data, the system firstly extracts key pressure characteristic vectors, such as the change rate of the pressure gradient, the response curve of the system under different load conditions and the like, and generates the pressure characteristic data. Meanwhile, the system performs multidimensional feature decomposition on the generated performance index data, and the decomposed performance feature data can comprise specific performance performances under different dimensions, such as service response delay, throughput reduction and the like when the CPU utilization rate exceeds 80%. Through feature decomposition, the system can describe the feature behaviors under each performance dimension more accurately, and subsequent performance analysis is facilitated.
Step S43, performing performance characteristic data identification based on a machine learning algorithm on the pressure characteristic data and the performance characteristic data to generate performance abnormal characteristic data;
The embodiment of the invention utilizes a machine learning algorithm, such as a Support Vector Machine (SVM) or a Random Forest, with which the system identifies and classifies the extracted pressure feature data and performance feature data and screens out abnormal performance feature data. For example, when the system exhibits abnormal response times and a throughput degradation trend under certain specific pressure conditions, the system can identify such abnormal behavior and flag it as performance abnormality feature data. The machine learning model can improve its sensitivity to abnormal conditions by learning from historical data, so that potential performance problems can be accurately detected.
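As a lightweight statistical stand-in for the SVM or Random Forest classifiers named above, outliers can be flagged by Z-score; the latency samples are hypothetical.

```python
import statistics

def flag_anomalies(samples, threshold=2.0):
    """Return indices of samples whose Z-score exceeds `threshold` --
    a simple statistical stand-in for SVM / random-forest detection."""
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    if stdev == 0:                      # constant series: nothing stands out
        return []
    return [i for i, v in enumerate(samples)
            if abs(v - mean) / stdev > threshold]

latencies = [100, 105, 98, 102, 101, 99, 400]  # hypothetical response times (ms)
print(flag_anomalies(latencies))
```

The flagged indices would then be handed to the root cause analysis of step S44.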
Step S44, performing root cause analysis on the performance abnormal characteristic data, and utilizing a link tracking technology to locate performance bottleneck points to generate preliminary bottleneck location data;
According to the embodiment of the invention, the system analyzes the root cause of the identified performance abnormal characteristic data, combining historical data and the current system environment to trace the source of the anomaly. To accurately locate performance bottlenecks, the system may use a link tracking technique, such as a distributed link tracking tool (e.g., Zipkin or Jaeger), to analyze the path along which the anomaly occurs and track the delay of each link from user request to database access, thereby generating preliminary bottleneck location data. For example, when a system response time anomaly is found, link tracking may reveal that the bottleneck lies in excessively long lock waits in database queries or exhaustion of the application-layer thread pool.
Step S45, analyzing, for the preliminary bottleneck positioning data, the propagation paths and influence ranges of performance bottlenecks among the system components to generate performance influence evaluation data;
After determining the preliminary bottleneck position, the system further analyzes the propagation path and influence range of the bottleneck among the different system components. For example, if a database query is the bottleneck, the system may analyze whether it could exhaust the resources of other components, such as an application server or a load balancer, thereby expanding the scope of impact of the failure. Through modeling or inference based on historical data, the system generates performance impact assessment data that specify the direct and indirect impact ranges of the bottleneck, helping system administrators assess the bottleneck's effect on overall system performance.
And step S46, performing performance optimization strategy formulation based on system configuration optimization, resource capacity expansion suggestion and code optimization direction on the preliminary bottleneck positioning data according to the performance influence evaluation data, so as to obtain the performance bottleneck positioning data.
The embodiment of the invention establishes a corresponding performance optimization strategy based on the performance influence evaluation data. For example, if the bottleneck is due to a performance problem with a database query, the system may suggest optimizing the database query statement, or increasing the cache size of the database. In addition, if the system resources are insufficient, the system may suggest to perform resource expansion, such as increasing the CPU core number or memory capacity. At the same time, for a performance bottleneck at the code level, the system may suggest optimizing the execution logic of the code or improving concurrency handling capability. Finally, the system generates performance bottleneck location data and provides explicit optimization suggestions to system administrators, helping to improve overall system performance.
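The strategy formulation of step S46 can be illustrated as a small rule table; the bottleneck types, thresholds and suggestions below are hypothetical placeholders, not the patent's actual rules.

```python
def recommend(bottleneck_type, cpu_util, mem_util):
    """Map a located bottleneck plus resource utilisation readings to
    configuration, capacity and code suggestions. Rules and the 0.85
    utilisation threshold are illustrative only."""
    advice = []
    if bottleneck_type == "db_query":
        advice.append("optimize slow SQL / add index or query cache")
    if bottleneck_type == "thread_pool":
        advice.append("raise pool size or improve concurrency handling in code")
    if cpu_util > 0.85:
        advice.append("scale out: add CPU cores or instances")
    if mem_util > 0.85:
        advice.append("scale up: add memory capacity")
    return advice

print(recommend("db_query", cpu_util=0.9, mem_util=0.4))
```

A CPU-saturated database bottleneck thus yields both a code-level suggestion (query optimization) and a capacity suggestion (more cores), matching the three optimization directions named in the step.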
By constructing a performance evaluation model spanning multiple dimensions, including system performance, resource utilization, business processing and reliability, the invention can comprehensively evaluate the overall health of the system. The generated performance index data lets administrators monitor the multidimensional performance of the system in real time, so that potential problems are found promptly and decisions rest on solid ground. This all-round monitoring and evaluation keeps the system stable under high load and improves its responsiveness to service demands. Extracting pressure characteristic vectors from the system pressure gradient data helps quantify system performance under different loads. The resulting pressure characteristic data, combined with the multi-dimensional feature decomposition of the performance index data, provides a deeper understanding of performance: the analysis reveals how system performance changes under specific loads, helps identify potential weak points, and lays the foundation for subsequent optimization. Identifying the pressure characteristic data and performance characteristic data with a machine learning algorithm allows performance anomalies to be detected and recognized automatically. Such intelligent analysis greatly improves the accuracy and efficiency of anomaly detection and reduces the need for manual intervention; it not only accelerates problem discovery but also improves the system's adaptability, ensuring the continuity of key business. Performing root cause analysis on the performance abnormal characteristic data and locating performance bottlenecks with link tracing enables deep exploration of the problem.
By accurately identifying performance bottlenecks, the team can take targeted measures to solve problems, avoiding the service interruption or performance degradation caused by bottlenecks that go undetected. This improves the efficiency and accuracy of troubleshooting and makes system maintenance more scientific and effective. Analyzing the propagation paths and influence ranges of the preliminary bottleneck positioning data across all system components gives deep insight into how a bottleneck affects and spreads through the system. The performance influence evaluation data provides an important basis for formulating an optimization strategy, making the optimization scheme more targeted and effective; this analysis helps administrators identify the key points of the system and ensures that resources are concentrated where optimization is really needed, thereby improving overall system performance. The optimization strategy formulated from the performance influence evaluation data, covering system configuration optimization, resource capacity expansion suggestions and code optimization directions, provides clear action guidance for improving system performance. This systematic optimization method not only resolves the identified performance bottlenecks effectively but also lays a foundation for future performance improvement. By implementing these strategies, the response speed, processing capacity and reliability of the system can be improved significantly, better meeting continuously changing business requirements.
Preferably, step S5 comprises the steps of:
step S51, constructing a multi-dimensional risk assessment model according to early warning triggering condition data, extracting risk characteristics from performance bottleneck positioning data, and generating risk characteristic data, wherein the risk characteristics comprise performance degradation risk characteristics, system collapse risk characteristics, data consistency risk characteristics and service interruption risk characteristics;
According to the embodiment of the invention, a multidimensional risk assessment model is constructed according to the early warning trigger condition data, and the analysis and feature extraction are carried out on different dimensional risks of the system by combining the performance bottleneck positioning data. For example, performance degradation risk features may result from a gradual increase in system response time, system crash risk features may result from CPU or memory usage approaching 100%, data consistency risk features may result from excessive database lock latency or data write delays, and business disruption risk features may result from significantly reduced transaction success rates. By extracting these risk features, the system generates risk feature data that facilitates subsequent risk processing.
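A minimal sketch of the four-dimensional risk-feature extraction described above, with hypothetical metric names and thresholds standing in for a calibrated model:

```python
def extract_risk_features(m):
    """Translate raw monitoring metrics into the four risk-feature
    dimensions of step S51. All thresholds are illustrative:
    a 20% upward response-time drift, 95% CPU/memory saturation,
    500 ms lock wait, and a 98% transaction success floor."""
    return {
        "performance_degradation": m["resp_time_trend"] > 0.2,
        "system_crash":            max(m["cpu"], m["mem"]) > 0.95,
        "data_consistency":        m["lock_wait_ms"] > 500,
        "business_interruption":   m["txn_success_rate"] < 0.98,
    }

metrics = {"resp_time_trend": 0.35, "cpu": 0.97, "mem": 0.72,
           "lock_wait_ms": 120, "txn_success_rate": 0.995}
features = extract_risk_features(metrics)
print(features)
```

Here a drifting response time and near-saturated CPU raise the degradation and crash features, while lock waits and transaction success remain within their illustrative bounds.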
Step S52, risk propagation path analysis is carried out on the risk characteristic data, so that risk propagation path data are obtained;
the embodiment of the invention analyzes the extracted risk characteristic data, and mainly analyzes the propagation path of the risk. By tracking the impact propagation paths of performance bottlenecks in different system components, for example, from database query delay to thread pool resource exhaustion of an application layer, to increase of request response time of a user side, the system can clearly describe the propagation modes of risks and generate risk propagation path data. This helps the system administrator quickly determine the source of the problem and the extent of potential risk spread.
Step S53, probability evaluation is carried out on the risk propagation path data, a risk probability matrix is established, and risk probability distribution data are generated;
According to the risk propagation path data, the risk occurrence probability of each node is evaluated and a risk probability matrix is generated. The matrix calculates the risk probability distribution of each node through comprehensive analysis of historical data and the current system state. For example, under high concurrency the probability of database lock wait times lengthening is greater, which in turn increases the risk of transaction failure. Finally, the system generates risk probability distribution data that makes the occurrence probability and impact of each risk point explicit.
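The probability propagation along a risk path can be sketched as below. It assumes independent failure events combined with a noisy-OR rule, and the base probabilities and edge weights are hypothetical.

```python
def propagate_probabilities(base_prob, edges):
    """Each node's risk is the chance it fails on its own OR is knocked
    over by an upstream node: for each edge (src, dst, weight), applied
    in propagation order, P(dst) = 1 - (1 - P(dst)) * (1 - P(src)*weight).
    This noisy-OR combination assumes independent events."""
    prob = dict(base_prob)
    for src, dst, weight in edges:
        carried = prob[src] * weight
        prob[dst] = 1 - (1 - prob[dst]) * (1 - carried)
    return prob

# db lock contention cascades into the thread pool, then user latency.
base = {"db_lock": 0.30, "thread_pool": 0.05, "user_latency": 0.02}
paths = [("db_lock", "thread_pool", 0.8),
         ("thread_pool", "user_latency", 0.9)]
result = propagate_probabilities(base, paths)
print(result)
```

The thread pool's risk rises from 5% to about 27.8% once the lock-contention path is accounted for, which is the kind of entry a risk probability matrix would record.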
Step S54, carrying out multi-level risk classification according to the risk probability distribution data to generate risk assessment data;
According to the embodiment of the invention, the system grades the risk level of each risk point according to the risk probability distribution data, using a multi-level scheme such as low, medium, high and severe. The system determines the level of each risk point by comparing historical benchmark data with the real-time state of the current system. For example, if a risk point has a high probability of causing a crash that may affect the whole system, it is rated as high risk; the resulting risk assessment data helps prioritize the handling of high risks.
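A minimal sketch of the multi-level grading, using a probability-times-impact score with illustrative band edges; real systems would calibrate the bands against historical benchmarks.

```python
def classify(probability, impact):
    """Grade a risk point by probability x impact, both scaled 0..1.
    The band edges (0.5 / 0.25 / 0.1) are illustrative placeholders."""
    score = probability * impact
    if score >= 0.5:
        return "severe"
    if score >= 0.25:
        return "high"
    if score >= 0.1:
        return "medium"
    return "low"

print(classify(0.8, 0.9), classify(0.4, 0.7),
      classify(0.3, 0.4), classify(0.1, 0.2))
# -> severe high medium low
```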
Step S55, carrying out strategy matching analysis on the risk assessment data according to a preset system recovery strategy library to generate strategy matching data, wherein the system recovery strategy library comprises a resource capacity expansion strategy, a load balancing strategy, a service degradation strategy and a fault isolation strategy;
According to the risk assessment data, the embodiment of the invention performs matching against a preset system recovery strategy library. The strategy library includes resource expansion strategies (e.g., adding CPU and memory resources), load balancing strategies (e.g., dynamically distributing the task load), service degradation strategies (e.g., reducing the execution frequency of non-critical tasks), and fault isolation strategies (e.g., isolating the failed service). Through strategy matching analysis, the system finds a coping strategy suited to the current risk situation and generates strategy matching data, ensuring that the subsequent recovery scheme can effectively handle different risk scenarios.
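The strategy matching of step S55 can be sketched as a lookup against a small library, ordered so the most severe risks are handled first; the mapping below is hypothetical, not the patent's actual strategy library.

```python
# Hypothetical risk-type -> recovery-strategy mapping.
STRATEGY_LIBRARY = {
    "system_crash":            "resource_expansion",   # add CPU / memory
    "performance_degradation": "load_balancing",       # redistribute load
    "business_interruption":   "service_degradation",  # shed non-critical work
    "data_consistency":        "fault_isolation",      # quarantine failed service
}

def match_strategies(risk_assessment):
    """Pick a recovery strategy for each assessed risk, highest
    grade first, yielding (risk, strategy) pairs."""
    order = {"severe": 0, "high": 1, "medium": 2, "low": 3}
    ranked = sorted(risk_assessment.items(), key=lambda kv: order[kv[1]])
    return [(risk, STRATEGY_LIBRARY[risk]) for risk, grade in ranked]

assessment = {"performance_degradation": "high", "system_crash": "severe"}
matched = match_strategies(assessment)
print(matched)
# -> [('system_crash', 'resource_expansion'), ('performance_degradation', 'load_balancing')]
```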
Step S56, constructing a multi-dimensional recovery scheme based on the strategy matching data, and generating fault recovery scheme data comprising an instant recovery scheme, a progressive recovery scheme, an emergency recovery scheme and a disaster recovery switching scheme;
Based on the strategy matching data, the system of the embodiment constructs a multi-dimensional fault recovery scheme. According to the needs of different scenarios, these schemes are classified into an immediate recovery scheme (e.g., immediately reallocating resources), a progressive recovery scheme (e.g., gradually restoring the load capacity of the system), an emergency recovery scheme (e.g., temporarily shutting down some non-critical functions to keep core services running), and a disaster recovery switching scheme (e.g., starting a backup system when the primary system fails). The system generates fault recovery scheme data according to the actual situation, providing guidance for subsequent execution.
S57, performing scene simulation test on the fault recovery scheme data, and performing validity evaluation based on the scheme success rate, recovery time and resource consumption to generate simulation result data;
The embodiment of the invention performs scenario simulation testing on the generated fault recovery schemes, testing how each scheme performs under different conditions by simulating real fault scenarios. The simulation test focuses on the success rate of the recovery scheme (its validity), the recovery time (its execution efficiency), and the resource consumption (the system resources consumed during recovery). Through statistical analysis of the test results, the system generates simulation result data to verify the effectiveness of each recovery scheme.
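The effectiveness evaluation of step S57 can be sketched as a weighted score over the three criteria named above; the weights, the normalisation ceiling, and the per-scheme figures are hypothetical.

```python
def score_scheme(success_rate, recovery_s, resource_cost,
                 max_recovery_s=600.0, weights=(0.5, 0.3, 0.2)):
    """Weighted effectiveness score in [0, 1]: higher success rate,
    faster recovery and lower resource cost all score better.
    `resource_cost` is a normalised 0..1 consumption figure;
    weights and the 600 s ceiling are illustrative placeholders."""
    w_success, w_time, w_resource = weights
    time_score = max(0.0, 1.0 - recovery_s / max_recovery_s)
    return (w_success * success_rate
            + w_time * time_score
            + w_resource * (1.0 - resource_cost))

# Simulated results for two candidate recovery schemes.
schemes = {
    "immediate":   score_scheme(0.99, recovery_s=30,  resource_cost=0.7),
    "progressive": score_scheme(0.95, recovery_s=300, resource_cost=0.3),
}
best = max(schemes, key=schemes.get)
print(best)  # -> immediate
```

With these weights the fast, resource-hungry immediate scheme edges out the cheaper progressive one; shifting weight toward resource consumption would reverse the ranking, which is exactly the kind of adjustment step S58 makes from simulation result data.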
And S58, performing recovery strategy generation according to the simulation result data, so as to obtain recovery strategy data.
According to the embodiment of the invention, the fault recovery strategy is further optimized and adjusted according to the simulation result data. For example, if a scheme is too long to recover in simulation test or the resource consumption is too high, the system can re-optimize the strategy according to the test data, so as to ensure that the system can be recovered in a more efficient manner in practical application. Finally, the system generates recovery strategy data to ensure that the system operation can be quickly and effectively recovered under the real fault scene.
According to the invention, various potential risks can be systematically identified and quantified by constructing a multidimensional risk assessment model and extracting risk features from the performance bottleneck positioning data. These risk features (performance degradation, system crash, data consistency and business interruption risks) provide insight into the health of the system, enabling administrators to identify problems earlier, optimize the system design, reduce the likelihood of failure and improve overall system stability. Analyzing the risk propagation paths of the risk characteristic data helps explain the transmission mechanism and influence range of risks among system components. Such analysis reveals potential system weaknesses and helps administrators take proactive risk control measures; by making the risk propagation path explicit, organizations can formulate better countermeasures and reduce the impact of risks on overall system performance. Probability evaluation of the risk propagation path data and construction of a risk probability matrix quantify the probability of different risk events occurring. The risk probability distribution data enables administrators to make more scientific, data-driven decisions and to handle high-probability risk events first, effectively reducing the risk of service interruption and system crash and improving the continuity and reliability of the overall business. Multi-level risk classification based on the risk probability distribution data ranks different risks systematically by priority; such risk assessment data lets teams focus resources and attention on the most important risks, optimizing the effect of risk management, ensuring that critical risks are handled in time, and reducing the negative impact on the business.
Performing strategy matching analysis between the risk assessment data and a preset system recovery strategy library determines the most suitable countermeasure for each type of risk. This matching process improves recovery efficiency and makes the recovery scheme more flexible and targeted, supporting a quick response when risks occur and reducing potential losses. The multidimensional recovery scheme built on the strategy matching data offers several recovery options, including immediate recovery, progressive recovery, emergency recovery and disaster recovery switching schemes. Such diversified schemes ensure that the system can be restored quickly and effectively under different conditions, improving the flexibility and resilience of the service and guaranteeing the availability of key services. Scenario simulation testing of the fault recovery scheme data and evaluation of its effectiveness identify potential problems and defects before actual implementation. The effectiveness evaluation confirms the rationality of the selected recovery scheme with respect to resource consumption, recovery time and success rate, and provides data support for subsequent optimization of the recovery scheme, improving the feasibility and reliability of the recovery strategy. The recovery strategy data generated from the simulation result data enables continuous optimization and adjustment of the schemes; this data-driven, dynamic strategy generation keeps the recovery strategy suited to actual business demands and environmental changes, improves the overall risk resistance and recovery capability of the system, and strengthens the enterprise's ability to handle emergencies.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The testing method based on the stability of the financial environment is characterized by comprising the following steps of:
Step S1, monitoring a real-time state of a financial transaction system to generate system state basic data, constructing a virtual test environment according to the system state basic data, and performing test scene self-adaptive configuration to generate test environment parameter data;
S2, performing multidimensional pressure test analysis on the test environment parameter data, calculating a system stability index to generate stability reference data, performing dynamic load gradient processing according to the stability reference data to obtain system pressure gradient data, performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data;
Step S3, acquiring abnormal behavior characteristics of the financial transaction system by using the distributed monitoring system to generate abnormal behavior characteristic data, carrying out fault mode identification on the abnormal behavior characteristic data, establishing a fault early warning threshold value and generating early warning trigger condition data;
Step S4, performing multi-dimensional performance analysis based on system monitoring data acquired in real time to obtain performance index data, wherein the multi-dimensional performance comprises a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension, performing association analysis on system pressure gradient data and the performance index data, and performing bottleneck identification to generate performance bottleneck positioning data, wherein the step S4 specifically comprises the following steps:
Step S41, constructing a multi-dimensional performance evaluation model comprising a system performance dimension, a resource utilization dimension, a service processing dimension and a reliability dimension according to system monitoring data acquired in real time to obtain performance index data;
step S42, extracting pressure characteristic vectors from the system pressure gradient data to generate pressure characteristic data, and carrying out multidimensional characteristic decomposition on the performance index data to obtain the performance characteristic data;
Step S43, performing performance characteristic data identification based on a machine learning algorithm on the pressure characteristic data and the performance characteristic data to generate performance abnormal characteristic data;
S44, performing root cause analysis on the performance abnormal characteristic data, and utilizing a link tracking technology to locate performance bottleneck points to generate preliminary bottleneck location data;
S45, analyzing propagation paths and influence ranges of performance bottlenecks among all system components to the preliminary bottleneck positioning data to generate performance influence evaluation data;
Step S46, performing performance optimization strategy formulation based on system configuration optimization, resource capacity expansion suggestion and code optimization direction on the preliminary bottleneck positioning data according to the performance influence evaluation data so as to obtain the performance bottleneck positioning data;
Step S5, performing risk level assessment on the performance bottleneck positioning data through early warning triggering condition data to obtain risk assessment data, constructing a system recovery strategy according to the risk assessment data to generate fault recovery scheme data, performing simulation verification on the fault recovery scheme data, and performing recovery strategy generation based on simulation results to obtain recovery strategy data;
and S6, performing stability evaluation on the financial transaction system by using the risk evaluation mapping data and the recovery strategy data, so as to obtain environmental stability evaluation data.
2. The method of claim 1, wherein step S1 includes the steps of:
Step S11, collecting system operation basic parameters of a financial transaction system to generate basic performance data, wherein the system operation basic parameters comprise CPU (Central processing Unit) utilization rate, memory occupancy rate, disk IO (input/output) and network transmission rate;
Step S12, acquiring the database connection number, the active session number, SQL execution time and lock waiting condition of the financial transaction system, so as to generate database state data;
Step S13, monitoring the message queue depth, message processing delay and message accumulation condition of the financial transaction system in real time to generate middleware state data;
Step S14, collecting service running states of the financial transaction system so as to obtain transaction performance data, clearing state data and wind control state data, wherein the transaction performance data comprises transaction processing time, transaction success rate and transaction concurrency, the clearing state data comprises clearing processing efficiency, fund arrival time and account reconciliation consistency, and the wind control state data comprises wind control rule response time, wind control interception rate and wind control accuracy rate;
Step S15, performing time series processing on basic performance data, database state data, middleware state data and process state data to generate system layer state sequence data;
S16, carrying out business association analysis on transaction performance data, clearing state data and wind control state data to generate business layer state association data;
Step S17, merging the system layer state sequence data and the business layer state association data into system state basic data;
And S18, constructing a virtual test environment according to the system state basic data, performing self-adaptive configuration of the test scene, and generating test environment parameter data.
3. The method of claim 2, wherein step S18 includes the steps of:
Step S181, carrying out hardware resource configuration based on CPU core number, memory capacity, storage space and network bandwidth according to system state basic data, carrying out configuration based on a connection pool size, a cache size and an execution plan optimization parameter database instance, and carrying out middleware parameter configuration based on a message queue size, thread pool parameters and a cache strategy, thereby obtaining environment parameter configuration data;
Step S182, constructing a transaction scene model according to the transaction performance data, and generating transaction scene data, wherein the transaction scene data comprises transaction type distribution, transaction frequency mode and transaction scale distribution;
Step S183, a clearing scene model is built based on the clearing state data, and clearing scene data is generated, wherein the clearing scene data comprises a clearing batch, a clearing scale and a clearing time window;
Step S184, constructing a wind control scene model according to wind control state data, and generating wind control scene data, wherein the wind control scene data comprises a wind control rule set, a wind control threshold value and a wind control response strategy;
step S185, performing scene combination optimization on transaction scene data, clearing scene data and wind control scene data to generate comprehensive test scene data;
step S186, carrying out load parameter self-adaptive adjustment on the comprehensive test scene data based on the system state basic data to generate scene load data;
Step S187, integrating and matching the environment parameter configuration data with the scene load data to generate test environment parameter data.
4. The method of claim 3, wherein the step S2 comprises the steps of:
Step S21, carrying out gradient incremental test on transaction parameters in the test environment parameter data to generate transaction pressure test data, wherein the transaction parameters comprise the number of concurrent users, the transaction frequency and the transaction complexity;
S22, carrying out high-volume transaction and batch transaction pressure test according to the test environment parameter data, and generating transaction capacity data comprising the upper limit of the amount of a single transaction, the concurrency of batch transaction and the cross-system transaction processing capacity;
Step S23, carrying out correlation analysis on the transaction pressure test data and the transaction capacity data to generate transaction bearing capacity data;
s24, performing system capacity pressure test according to the test environment parameter data to generate system capacity limit data, wherein the system capacity pressure test comprises database pressure test and system storage capacity test;
step S25, performing network pressure test based on network bandwidth pressure and network connection number limit according to the test environment parameter data to generate network bearing capacity data;
S26, calculating a system stability index according to the system capacity limit data, the transaction bearing capacity data and the network bearing capacity data, and comparing and analyzing the system stability index with a pre-acquired historical benchmark so as to obtain stability benchmark data;
And step S27, carrying out dynamic load gradient processing according to the stability reference data to generate system pressure gradient data, and carrying out risk level mapping on the system pressure gradient data to generate risk assessment mapping data.
5. The method of claim 4, wherein the step S27 includes the steps of:
step S271, performing multi-dimensional data layering processing on the stability reference data to generate layering reference data including a transaction layer pressure reference, a system layer pressure reference and a network layer pressure reference;
Step S272, constructing a dynamic load calculation model based on the layered reference data, and carrying out load parameter standardization processing to generate load standardization data;
step S273, carrying out gradient feature extraction based on machine learning on the load standardized data so as to generate load feature vector data;
Step S274, a multi-level load gradient model is established according to the load characteristic vector data, and load gradient classification data comprising a light load level, a medium load level, a heavy load level and an overload level are generated;
Step S275, performing dynamic threshold calculation on the load gradient classification data, performing threshold self-adaptive optimization according to preset historical pressure measurement data, and generating gradient threshold data;
Step S276, carrying out real-time evaluation on the current load state of the system based on gradient threshold data, and carrying out load prediction analysis to generate system pressure gradient data;
Step S277, performing risk level mapping on the system pressure gradient data to generate risk assessment mapping data.
6. The method of claim 5, wherein the step S277 includes the steps of:
Step S2771, establishing a multi-dimensional risk assessment index system according to system pressure gradient data to generate risk index data, wherein the risk index comprises system performance risks, service processing risks, data consistency risks and network transmission risks;
step S2772, classifying risk levels of the risk index data according to a preset risk index threshold value to obtain risk level classification data;
Step S2773, constructing an early warning rule model comprising early warning trigger conditions, early warning level judgment and early warning message pushing strategies based on risk level division data;
Step S2774, performing reliability verification on the early warning rule model, and performing rule optimization adjustment based on a verification result to generate early warning strategy data;
And S2775, mapping and associating the risk classification data with the early warning strategy data, and constructing a risk response mechanism so as to obtain risk assessment mapping data.
7. The method of claim 6, wherein the step S3 includes the steps of:
S31, acquiring abnormal behavior data of a financial transaction system through a distributed monitoring system, so as to obtain original abnormal data comprising system layer abnormal data, application layer abnormal data and network layer abnormal data;
Step S32, carrying out data cleaning and time sequence alignment on the original abnormal data to generate standardized abnormal data, and carrying out feature extraction based on performance abnormality, resource abnormality and business abnormality on the standardized abnormal data to obtain abnormal behavior feature data;
s33, constructing a fault mode classification model for distinguishing the system fault type, the service fault type and the network fault type according to the abnormal mode sequence data, and carrying out fault type identification by utilizing the fault mode classification model to obtain fault classification data;
step S34, training an intelligent early warning model according to the pre-acquired historical fault data to generate early warning model parameter data;
Step S35, carrying out multi-level early warning threshold calculation according to the early warning model parameter data and the fault probability data so as to obtain threshold grading data;
and step S36, constructing an early warning rule model based on the threshold grading data, and generating early warning trigger condition data, wherein the early warning rule model comprises early warning condition definition, early warning level judgment and an early warning trigger mechanism.
8. The method of claim 1, wherein step S5 includes the steps of:
Step S51, constructing a multi-dimensional risk assessment model according to the early warning trigger condition data, and extracting risk characteristics from the performance bottleneck positioning data to generate risk characteristic data, wherein the risk characteristics comprise performance degradation risk characteristics, system crash risk characteristics, data consistency risk characteristics and service interruption risk characteristics;
Step S52, performing risk propagation path analysis on the risk characteristic data to obtain risk propagation path data;
Step S53, performing probability evaluation on the risk propagation path data, and establishing a risk probability matrix to generate risk probability distribution data;
Step S54, performing multi-level risk classification according to the risk probability distribution data to generate risk assessment data;
Step S55, performing strategy matching analysis on the risk assessment data according to a preset system recovery strategy library to generate strategy matching data, wherein the system recovery strategy library comprises a resource expansion strategy, a load balancing strategy, a service degradation strategy and a fault isolation strategy;
Step S56, constructing a multi-dimensional recovery scheme based on the strategy matching data to generate fault recovery scheme data comprising an immediate recovery scheme, a progressive recovery scheme, an emergency recovery scheme and a disaster recovery switching scheme;
Step S57, performing scenario simulation testing on the fault recovery scheme data, and performing effectiveness evaluation based on scheme success rate, recovery time and resource consumption to generate simulation result data;
Step S58, performing recovery strategy generation according to the simulation result data to obtain recovery strategy data.
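The risk grading, strategy matching and effectiveness scoring of steps S54, S55 and S57 can be illustrated with a minimal sketch. Only the four strategy categories come from the claim; the grade cut-offs, the feature-to-strategy mapping and the scoring weights are assumptions chosen for demonstration.

```python
# Illustrative sketch of steps S54, S55 and S57. The strategy names mirror
# the claimed strategy library; thresholds and weights are assumed values.

RISK_GRADES = [(0.7, "high"), (0.4, "medium"), (0.0, "low")]  # descending floors

STRATEGY_LIBRARY = {  # hypothetical risk-feature -> recovery-strategy mapping
    "performance_degradation": "resource_expansion",
    "system_crash": "fault_isolation",
    "data_inconsistency": "service_degradation",
    "service_interruption": "load_balancing",
}

def grade_risk(probability):
    """Map a risk probability onto a multi-level grade (step S54)."""
    for floor, grade in RISK_GRADES:
        if probability >= floor:
            return grade
    return "low"

def match_strategy(risk_feature):
    """Look up a recovery strategy for a risk feature (step S55)."""
    return STRATEGY_LIBRARY.get(risk_feature, "fault_isolation")

def score_scheme(success_rate, recovery_time_s, resource_cost,
                 w=(0.5, 0.3, 0.2), max_time_s=600.0, max_cost=1.0):
    """Weighted effectiveness score over success rate, recovery time and
    resource consumption (step S57); higher is better."""
    time_term = max(0.0, 1.0 - recovery_time_s / max_time_s)
    cost_term = max(0.0, 1.0 - resource_cost / max_cost)
    return w[0] * success_rate + w[1] * time_term + w[2] * cost_term
```

For example, a scheme with a 90% success rate, 120 s recovery time and 40% resource consumption scores 0.81 under these assumed weights, so it would be preferred over a slower or costlier alternative when generating the final recovery strategy (step S58).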
9. A financial environmental stability-based testing system for performing the financial environmental stability-based testing method of claim 1, the financial environmental stability-based testing system comprising:
the environment construction module is used for carrying out real-time state monitoring on a financial transaction system to generate system state basic data, constructing a virtual test environment, and carrying out test scene self-adaptive configuration to generate test environment parameter data;
the pressure evaluation module is used for carrying out dynamic load gradient processing and multi-dimensional pressure test analysis on the test environment parameter data, calculating a system stability index to generate stability reference data, and carrying out risk level mapping and risk level analysis to generate risk evaluation mapping data;
the abnormality monitoring module is used for acquiring abnormal behavior characteristics of the financial transaction system by using a distributed monitoring system to generate abnormal behavior characteristic data;
the performance diagnosis module is used for carrying out multi-dimensional performance analysis based on system monitoring data acquired in real time to obtain performance index data, wherein the dimensions comprise a resource utilization dimension, a service processing dimension and a reliability dimension;
the recovery strategy module is used for carrying out risk level assessment on the performance bottleneck positioning data through the early warning trigger condition data to obtain risk assessment data, constructing a system recovery strategy according to the risk assessment data to generate fault recovery scheme data, carrying out simulation verification on the fault recovery scheme data, and carrying out recovery strategy generation based on the simulation result to obtain recovery strategy data;
and the stability evaluation module is used for evaluating the stability of the financial transaction system by using the risk evaluation mapping data and the recovery strategy data to obtain environmental stability evaluation data.
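The module chain of claim 9 can be modeled as a simple data-flow pipeline in which each module's output feeds the next. The module internals below are placeholder assumptions purely for illustration; only the data names (system state, test environment parameters, stability reference) follow the claim.

```python
# A minimal data-flow sketch of the claimed system. Each module is a function;
# the formulas inside are placeholder assumptions, not the patented method.

def environment_module(system_state):
    """Environment construction: derive test environment parameters
    from the monitored system state (assumed 2x-load scaling)."""
    return {"test_env_params": {"load": system_state["tps"] * 2}}

def pressure_module(env):
    """Pressure evaluation: compute a stability index from the test load
    (assumed saturating formula in (0, 1]; higher means more stable)."""
    stability_index = 1.0 / (1.0 + env["test_env_params"]["load"] / 1000.0)
    return {"stability_reference": stability_index}

def stability_pipeline(system_state):
    """Chain the first two modules; anomaly monitoring, performance
    diagnosis, recovery strategy and stability evaluation modules
    would continue the chain in the same style."""
    env = environment_module(system_state)
    pressure = pressure_module(env)
    return {**env, **pressure}

result = stability_pipeline({"tps": 500})
```

The point of the sketch is the pipeline shape: every module consumes the named data of its predecessor, which is what lets the claimed system be tested and replaced module by module.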
CN202411588020.5A 2024-11-08 2024-11-08 A testing method and system based on financial environment stability Active CN119130623B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411588020.5A CN119130623B (en) 2024-11-08 2024-11-08 A testing method and system based on financial environment stability

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411588020.5A CN119130623B (en) 2024-11-08 2024-11-08 A testing method and system based on financial environment stability

Publications (2)

Publication Number Publication Date
CN119130623A CN119130623A (en) 2024-12-13
CN119130623B true CN119130623B (en) 2025-02-28

Family

ID=93762493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411588020.5A Active CN119130623B (en) 2024-11-08 2024-11-08 A testing method and system based on financial environment stability

Country Status (1)

Country Link
CN (1) CN119130623B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117076270A (en) * 2023-08-03 2023-11-17 浪潮通用软件有限公司 System stability evaluation method, device and medium
CN117632718A (en) * 2023-11-28 2024-03-01 中国建设银行股份有限公司 Bank counter channel system transaction service testing method and device
CN118193346A (en) * 2024-04-17 2024-06-14 京东科技信息技术有限公司 Stability testing method and testing system for business system, and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11204861B2 (en) * 2019-03-05 2021-12-21 Honeywell International Inc. Systems and methods for fault injection and ensuring failsafe FMS SaaS platforms
CN115562978A (en) * 2022-09-26 2023-01-03 四川启睿克科技有限公司 Performance test system and method based on service scene
CN118897809B (en) * 2024-10-09 2025-01-24 浙江安防职业技术学院 A method and system for monitoring the testing process of computer network application programs


Also Published As

Publication number Publication date
CN119130623A (en) 2024-12-13

Similar Documents

Publication Publication Date Title
US12130699B2 (en) Using an event graph schema for root cause identification and event classification in system monitoring
CN113946499B (en) A microservice link tracking and performance analysis method, system, device and application
US9489379B1 (en) Predicting data unavailability and data loss events in large database systems
US11704186B2 (en) Analysis of deep-level cause of fault of storage management
Wu et al. Invalid bug reports complicate the software aging situation
CN117640350A (en) Autonomous real-time fault isolation method based on event log
Gupta et al. A supervised deep learning framework for proactive anomaly detection in cloud workloads
Bommala et al. Machine learning job failure analysis and prediction model for the cloud environment
CN118939562B (en) Method and system for non-functional testing of distributed financial systems
Ali et al. [Retracted] Classification and Prediction of Software Incidents Using Machine Learning Techniques
CN118761745B (en) OA collaborative workflow optimization method applied to enterprise
US20240370177A1 (en) Hard Disk Drive Failure Prediction Method
CN119130623B (en) A testing method and system based on financial environment stability
US20230214739A1 (en) Recommendation system for improving support for a service
US20230214693A1 (en) Sequence prediction explanation using causal chain extraction based on neural network attributions
Jha et al. Holistic measurement-driven system assessment
Nehme Database, heal thyself
Rojas et al. Understanding failures through the lifetime of a top-level supercomputer
US20250111150A1 (en) Narrative generation for situation event graphs
Gupta et al. Diagnosing heterogeneous hadoop clusters
CN118409974B (en) Optimization method of reverse hotel Ai intelligent robbery list platform based on big data analysis
KR102763990B1 (en) Artificial intelligence hybrid fake deposit bank account detection system and method
CN118467178B (en) Implementation method of self-service settlement system based on digital RMB
Wang et al. A Two‐Layer Architecture for Failure Prediction Based on High‐Dimension Monitoring Sequences
CN118689639A (en) A cloud resource intelligent recovery method, system and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant