
CN116843456A - Financial big data processing method and system based on artificial intelligence - Google Patents

Financial big data processing method and system based on artificial intelligence

Info

Publication number
CN116843456A
Authority
CN
China
Prior art keywords
feature vector
vector set
financial
value
structured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311092782.1A
Other languages
Chinese (zh)
Other versions
CN116843456B (en)
Inventor
张一超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhang Yichao
Original Assignee
Beijing Yanzhixin Technology Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yanzhixin Technology Service Co ltd filed Critical Beijing Yanzhixin Technology Service Co ltd
Priority to CN202311092782.1A priority Critical patent/CN116843456B/en
Publication of CN116843456A publication Critical patent/CN116843456A/en
Application granted granted Critical
Publication of CN116843456B publication Critical patent/CN116843456B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Technology Law (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides an artificial-intelligence-based financial big data processing method and system, relating to the field of artificial intelligence. The method comprises: performing data preprocessing on the obtained structured financial data of a target object, converting it into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm, and determining a first feature vector set; performing tag encoding on the obtained unstructured financial data of the target object, and mapping the unstructured financial data into a second feature vector set through vector mapping; and inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing them into a comprehensive feature vector set, assigning hidden weights to the comprehensive feature vector set, and outputting a risk prediction value.

Description

Financial big data processing method and system based on artificial intelligence
Technical Field
The application relates to an artificial intelligence technology, in particular to a financial big data processing method and system based on artificial intelligence.
Background
Financial risk identification capability is an important factor in the healthy and rapid development of financial enterprises. Because the financial industry is affected by factors such as the environment, policy, and the overall development trend of the industry, it fluctuates periodically. Monitoring financial risk indicators in real time, forecasting the risk situation from those indicators, and addressing potential risks early is the main way to prevent financial enterprises from incurring systemic risk.
At present, financial risk indicators mainly reflect historical risk conditions, so risk identification lags: the indicators are calculated from historical data, and risk in the short and medium term is predicted by relying on expert experience.
Disclosure of Invention
The embodiment of the application provides a financial big data processing method and system based on artificial intelligence, which can at least solve part of problems in the prior art.
In a first aspect of the embodiment of the present application, there is provided an artificial intelligence-based financial big data processing method, including:
performing data preprocessing on the obtained structured financial data of the target object, converting the structured financial data into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm, and determining a first feature vector set;
performing tag encoding on the obtained unstructured financial data of the target object, and mapping the unstructured financial data into a second feature vector set through vector mapping;
inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing the first feature vector set and the second feature vector set into a comprehensive feature vector set, distributing hidden weights for the comprehensive feature vector set, and outputting risk prediction values, wherein the financial risk assessment model is constructed based on an extreme learning machine and a particle swarm algorithm.
In an optional implementation manner, the filtering the structured feature vector set through a feature selection algorithm, and determining the first feature vector set includes:
randomly selecting any feature vector in the structured feature vector set as a splitting point, and determining a neighboring feature vector set of the splitting point and a first weight value corresponding to the splitting point;
respectively determining a second weight value of a left node of the splitting point and a third weight value of a right node of the splitting point in the adjacent feature vector set;
determining a splitting gain value of the splitting point according to the first weight value, the second weight value, the third weight value and the characteristic value of the splitting point;
traversing the structured feature vector set, determining a split gain value corresponding to each feature vector in the structured feature vector set, sorting according to the size of the split gain values, and reserving the feature vectors with the split gain values larger than a preset screening threshold as a first feature vector set.
In an alternative embodiment, the method further comprises training a financial risk assessment model:
randomly initializing a population, wherein the initialized population comprises a plurality of particles, and the positions and the speeds of the particles respectively correspond to the weight values from an input layer to a hidden layer and the weight values from the hidden layer to an output layer of the financial risk assessment model;
setting a fitness function according to a loss function of the financial risk assessment model, determining a fitness value corresponding to each particle, dynamically setting a crossing rate and a variation rate according to the fitness value, performing crossing and variation operations on individuals in the initialized population based on the crossing rate and the variation rate, and taking the particle with the highest fitness value after the crossing and variation operations as an initial optimal individual;
iteratively generating a random factor and a convergence factor at random until a preset iteration condition is met; if the random factor is greater than or equal to a random threshold, updating the position of the initial optimal individual according to a first position updating mode;
if the random factor is smaller than the random threshold, further judging the relation between the convergence factor and the convergence threshold:
if the convergence factor is greater than or equal to the convergence threshold, updating the position of the initial optimal individual according to a second position updating mode;
if the convergence factor is smaller than the convergence threshold, updating the position of the initial optimal individual according to a third position updating mode;
and respectively taking the position and the speed of the optimal individual meeting the preset iteration condition as the weight value from the input layer to the hidden layer and the weight value from the hidden layer to the output layer.
In an alternative embodiment, setting the fitness function according to the loss function of the financial risk assessment model comprises:
where $FIT$ denotes the fitness function, $N$ the number of nodes, $G(\cdot)$ the activation function, $W_i$ the weight value of the $i$-th node from the input layer to the hidden layer, $X$ the comprehensive feature vector set, $b_i$ the $i$-th bias parameter, and $h_i$ the weight value of the $i$-th node from the hidden layer to the output layer.
In an alternative embodiment, the crossover rate and the mutation rate are dynamically set according to the fitness value as shown in the following formula:
where the terms of the formula denote, in order: the $j$-th crossover rate; $L$, the number of fitness values; the $j$-th population diversity index; the standard deviation and the variance of the fitness values; and the maximum, minimum, and average of the fitness values;
and where the terms of the mutation-rate formula denote the $v$-th mutation rate and the variation adjustment coefficient, which controls the speed and amplitude of the adjustment.
In an alternative embodiment, the first location updating means is used for indicating that the location of the initial optimal individual is updated in a spiral ascending manner;
the second position updating mode is used for indicating that the current position of the initial optimal individual is taken as a reference, and position adjustment is randomly carried out;
the third position updating mode is used for indicating that the current position of the initial optimal individual is used as a reference, and position adjustment is conducted towards a preset global optimal position.
In an alternative embodiment, the first location update mode is represented by the following formula:
where $pos(t+1)$ denotes the position at time $t+1$, $pos(t)$ the position at time $t$, $D$ the initial position of the initial optimal individual, and $l$ the spiral rise coefficient;
the second location update mode is represented by the following formula:
where $RAND$ denotes a random factor;
the third location update mode is represented by the following formula:
where the terms denote the global optimal position at time $t$; $r$, the spatial distance between the initial optimal individual and the global optimal position; and $a$, the wobble factor.
In a second aspect of the embodiments of the present application, there is provided an artificial intelligence based financial big data processing system, comprising:
the first unit is used for preprocessing the data of the obtained structured financial data of the target object, converting the data into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm and determining a first feature vector set;
a second unit, configured to perform tag encoding on the obtained unstructured financial data of the target object and to map the unstructured financial data into a second feature vector set through vector mapping;
and the third unit is used for inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing the first feature vector set and the second feature vector set into a comprehensive feature vector set, distributing hidden weights for the comprehensive feature vector set and outputting a risk prediction value, wherein the financial risk assessment model is constructed based on an extreme learning machine and a particle swarm algorithm.
In a third aspect of an embodiment of the present application, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
According to the application, the weight values from the input layer to the hidden layer and the weight values from the hidden layer to the output layer are mapped to the position and the speed of each individual in the population, the process of training the weight values of the financial risk assessment model is converted into the process of searching the optimal individual, and the global searching and local searching capabilities of the particle swarm algorithm are fully utilized.
Performing crossover and mutation operations on individuals in the initialized population, and taking the particles with the highest fitness value after the crossover and mutation operations as initial optimal individuals; through dynamic crossover and mutation operation, the whole algorithm can be effectively prevented from falling into local optimum.
When the random factor is greater than or equal to the random threshold, the first location update mode may be more random or exploratory, which helps the algorithm escape local optima and increases the diversity of the search, giving it a greater chance of finding the global optimum. When the random factor is less than the random threshold and the convergence factor is greater than or equal to the convergence threshold, the second location update mode may be more deterministic or exploitative, which helps the algorithm search more finely around the current optimal solution and thus perform local optimization. When the random factor is less than the random threshold and the convergence factor is less than the convergence threshold, the third location update mode may be a strategy between exploration and exploitation, which lets the algorithm adjust its strategy dynamically according to the current search situation and thus adapt better.
Drawings
FIG. 1 is a flow chart of a financial big data processing method based on artificial intelligence according to an embodiment of the application;
FIG. 2 is a schematic diagram of an artificial intelligence based financial big data processing system according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The technical scheme of the application is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
FIG. 1 is a schematic flow chart of an artificial intelligence-based financial big data processing method according to an embodiment of the application, as shown in FIG. 1, the method includes:
S101, performing data preprocessing on the obtained structured financial data of the target object, converting it into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm, and determining a first feature vector set;
Illustratively, the structured financial data of the target object may include: credit history, such as credit reports, debit card accounts, and loan records, which give the borrower's credit history (including overdue payments and repayment records) and help assess repayment capability; income and employment information, such as personal income, employer, and occupation, which can be used to determine the borrower's repayment capability; financial indicators, including financial statements, personal assets, and liabilities, which can be used to evaluate the borrower's financial status; and borrowing history, namely the borrower's past borrowing and repayment records, which reflect borrowing behavior and debt management capability.
In practical applications, the structured financial data of the target object is often large in volume and contains considerable noise and abnormal data, which interferes with subsequent analysis and increases the computational load. Data preprocessing of the structured financial data is therefore necessary. The preprocessing may include data cleaning, such as missing value processing and outlier processing. Missing value processing may include filling in missing values (using the mean, median, mode, and so on) or directly deleting records that contain missing values; outlier processing may include detecting and handling outliers by statistical methods (e.g., IQR, Z-score). The data preprocessing may follow the prior art, and the application is not limited in this regard.
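The cleaning steps named above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the choice of median filling, the 1.5 multiplier on the IQR, clipping (rather than deleting) outliers, and the sample values are all assumptions made for the example.

```python
import statistics

def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_bounds(values, k=1.5):
    """Return (low, high) outlier fences derived from the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip_outliers(values, k=1.5):
    """Clip values lying outside the IQR fences to the nearest fence."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]

# Hypothetical monthly incomes with one missing value and one extreme value.
incomes = [4200, 3900, None, 4100, 4000, 98000, 4300]
cleaned = clip_outliers(fill_missing(incomes))
```

Deleting the affected records instead of filling or clipping them is the other option the text mentions; which one is appropriate depends on how much data can be spared.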
After the structured financial data is subjected to data preprocessing, the structured financial data is converted into a structured feature vector set for subsequent feature extraction and feature screening.
In an alternative embodiment, the screening the structured feature vector set by the feature selection algorithm, determining the first feature vector set includes:
randomly selecting any feature vector in the structured feature vector set as a splitting point, and determining a neighboring feature vector set of the splitting point and a first weight value corresponding to the splitting point;
respectively determining, in the neighboring feature vector set, a second weight value of a left node of the splitting point and a third weight value of a right node of the splitting point;
determining a splitting gain value of the splitting point according to the first weight value, the second weight value, the third weight value and the characteristic value of the splitting point;
traversing the structured feature vector set, determining a split gain value corresponding to each feature vector in the structured feature vector set, sorting according to the size of the split gain values, and reserving the feature vectors with the split gain values larger than a preset screening threshold as a first feature vector set.
In practice, as the number of features increases, the amount of data required increases exponentially, which can lead to model training becoming very difficult, especially in situations where the amount of data is limited; not all features are related to the target variable, some features may be noise, while some features may be highly correlated (redundant) with other features, these uncorrelated or redundant features may degrade the performance of the model; reducing the number of features can reduce the time for model training and prediction, improving computational efficiency.
Illustratively, the present application performs feature screening based on a decision tree. Any feature vector is randomly selected from the structured feature vector set as a splitting point. A splitting point is a specific feature value used to divide the data set into two or more subsets; when constructing the decision tree, the algorithm selects one or more splitting points for each feature so as to optimize some metric (such as information gain, Gini impurity, or mean square error), and the choice of splitting points determines how the data is split at each node of the tree. The neighboring feature vector set of the splitting point is determined according to a distance measure (such as Euclidean distance): for example, the spatial distance between the splitting point and each other feature vector is calculated, and the feature vectors whose distance is smaller than a given value are taken as neighboring feature vectors.
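The neighbor selection described above can be sketched in a few lines. The function name, the radius value, and the sample vectors are hypothetical; only the use of Euclidean distance with a cutoff comes from the text.

```python
import math

def neighbour_set(split_point, vectors, radius):
    """Return the vectors whose Euclidean distance to split_point is below radius.

    The splitting point itself is excluded from its own neighbor set.
    """
    return [
        v for v in vectors
        if v is not split_point and math.dist(split_point, v) < radius
    ]

# Hypothetical two-dimensional feature vectors.
vectors = [(0.0, 0.0), (0.3, 0.4), (3.0, 4.0), (0.1, 0.0)]
split = vectors[0]
near = neighbour_set(split, vectors, radius=1.0)
```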
The method for determining the weight value may include determining according to an average gain value of the splitting point in the overall tree, and may refer to the existing method specifically, but the application is not limited thereto.
Further, determining the splitting gain value of the splitting point according to the first weight value, the second weight value, the third weight value, and the feature value of the splitting point includes:
where $SG$ denotes the splitting gain value; $w_1$, $w_2$, and $w_3$ denote the first, second, and third weight values, respectively; and $F$, $F_{left}$, and $F_{right}$ denote the feature value of the splitting point, of the left node of the splitting point, and of the right node of the splitting point, respectively.
Illustratively, the split gain value is the measure a decision tree algorithm uses to evaluate how well a particular feature value or splitting point divides a data set into two or more subsets; it measures the change in purity before and after the split. The split gain gives the decision tree an objective criterion for selecting the best features and splitting points, ensuring that each split maximizes the gain in purity. By preferentially selecting features with a high split gain, the tree can be constructed more efficiently, because unnecessary computation on low-gain features is avoided.
Traversing the structured feature vector set, determining a split gain value corresponding to each feature vector in the structured feature vector set, sorting according to the size of the split gain values, and reserving the feature vectors with the split gain values larger than a preset screening threshold as a first feature vector set.
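The sort-and-threshold step can be sketched as below. The split gain values are taken as given inputs, because the gain formula itself is not reproduced in this text; the feature names, gain values, and threshold are hypothetical.

```python
def screen_features(gains, threshold):
    """Rank features by split gain (descending) and keep those above threshold."""
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return [name for name, gain in ranked if gain > threshold]

# Hypothetical split gain values, one per structured feature.
gains = {"income": 0.42, "overdue_count": 0.61, "postcode": 0.03, "loan_amount": 0.27}
first_feature_set = screen_features(gains, threshold=0.1)
```

The comparison is strict (`>`), matching the wording "larger than a preset screening threshold".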
The application screens the structured feature vector set through the feature selection algorithm to eliminate noise and redundant features, and the feature selection can improve the accuracy and generalization capability of the model; reducing the number of features can reduce the complexity of the model, thereby reducing the risk of overfitting; feature selection may reduce the time and computational resources required for model training; the selected feature subset may provide a clearer view of the decision.
S102, performing tag encoding on the obtained unstructured financial data of the target object, and mapping the unstructured financial data into a second feature vector set through vector mapping;
Illustratively, the unstructured financial data of the target object may include non-numeric information such as the target object's address, academic information, and consumption habits. Tag encoding of the unstructured financial data may include assigning a unique integer to each unique tag; for example, address A, academic information B, and consumption habit C may be encoded as A-1, B-2, C-3. The tag-encoded unstructured financial data may then be mapped to the second feature vector set using one-hot encoding. Specifically, for each category a vector whose length equals the number of categories is created; in the example above there are three categories, so each vector has length three. In the vector of each category, the position representing that category is set to 1 and the remaining positions are set to 0; that is, address A may be expressed as [1,0,0], academic information B as [0,1,0], and consumption habit C as [0,0,1]. In this way the unstructured financial data is mapped to the second feature vector set.
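The A, B, C example can be reproduced with a short sketch. The helper names are hypothetical, and assigning integers in first-seen order starting from 1 is an assumption chosen to match the A-1, B-2, C-3 encoding in the text.

```python
def label_encode(categories):
    """Assign each distinct category an integer code, starting at 1, in first-seen order."""
    codes = {}
    for category in categories:
        codes.setdefault(category, len(codes) + 1)
    return codes

def one_hot(code, n_classes):
    """Return a vector of length n_classes with a 1 at the position of the code."""
    vector = [0] * n_classes
    vector[code - 1] = 1
    return vector

codes = label_encode(["A", "B", "C"])
second_feature_set = {c: one_hot(k, len(codes)) for c, k in codes.items()}
```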
S103, inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing the first feature vector set and the second feature vector set into a comprehensive feature vector set, distributing hidden weights for the comprehensive feature vector set, and outputting a risk prediction value.
For example, because the two types of feature vectors differ in meaning and in how they are calculated, fusing them directly may distort the relative proportions of the features. The feature vector sets are therefore normalized before fusion. The normalized feature transformation of the first feature vector set is shown in the following formula:
where the terms denote the first feature vector set after the normalized feature transformation, any vector in the first feature vector set, and the vectors with the largest and the smallest feature values in the first feature vector set, respectively;
the normalized feature transformation of the second feature vector set is shown in the following formula:
where the terms denote the second feature vector set after the normalized feature transformation, any vector in the second feature vector set, and the vectors with the largest and the smallest feature values in the second feature vector set, respectively.
The first feature vector set and the second feature vector set may be fused into a comprehensive feature vector set by vector concatenation or by weighted averaging; the application is not limited in this regard.
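A minimal sketch of normalization followed by fusion is given below. It assumes that the normalization is ordinary min-max scaling applied per component, which is consistent with the references to the largest and smallest feature values but is not confirmed by the text, and it uses concatenation as the fusion method. The function names and sample data are hypothetical.

```python
def min_max(vectors):
    """Scale each component to [0, 1] using the column-wise minimum and maximum.

    A constant column (maximum equal to minimum) is mapped to 0.0.
    """
    columns = list(zip(*vectors))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(v, lows, highs)]
        for v in vectors
    ]

def fuse(first, second):
    """Concatenate the per-sample vectors of the two sets."""
    return [a + b for a, b in zip(first, second)]

# Hypothetical data: three samples, two structured features, three one-hot components.
first = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
second = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
fused = fuse(min_max(first), min_max(second))
```

Weighted averaging, the alternative the text mentions, would require the two sets to have the same dimensionality, which is one reason concatenation is the simpler choice here.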
The financial risk assessment model of the embodiment of the application can be improved and constructed based on an extreme learning machine and combined with a particle swarm algorithm.
In an alternative embodiment, the method further comprises training a financial risk assessment model:
randomly initializing a population, wherein the initialized population comprises a plurality of chromosomes, and the chromosomes comprise weight values from an input layer to a hidden layer and weight values from the hidden layer to an output layer of a financial risk assessment model;
setting fitness functions according to the loss functions of the financial risk assessment model, determining fitness values corresponding to each chromosome, dynamically setting crossing rate and mutation rate according to the fitness values, performing crossing and mutation operations on the chromosomes based on the crossing rate and the mutation rate, and taking the chromosome with the highest fitness value after the crossing and mutation operations as an initial optimal individual;
iteratively generating a random factor and a convergence factor at random until a preset iteration condition is met; if the random factor is greater than or equal to a random threshold, updating the position of the initial optimal individual according to a first position updating mode;
if the random factor is smaller than the random threshold, further judging the relation between the convergence factor and the convergence threshold:
if the convergence factor is greater than or equal to the convergence threshold, updating the position of the initial optimal individual according to a second position updating mode;
if the convergence factor is smaller than the convergence threshold, updating the position of the initial optimal individual according to a third position updating mode;
and taking the position of the optimal individual that meets the preset iteration condition as the output weight of the output layer and the hidden weight of the hidden layer.
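The branching among the three position updating modes in the steps above can be sketched as a small dispatcher. Only the comparison logic comes from the text; the function name, the threshold values of 0.5, and drawing both factors uniformly from [0, 1) are assumptions for illustration, and the update formulas themselves are not implemented here.

```python
import random

def choose_update_mode(rand_factor, conv_factor, rand_threshold, conv_threshold):
    """Return 1, 2, or 3 for the first, second, or third position updating mode."""
    if rand_factor >= rand_threshold:
        return 1  # first mode: spiral update
    if conv_factor >= conv_threshold:
        return 2  # second mode: random adjustment around the current position
    return 3      # third mode: adjustment toward the global optimal position

# Draw fresh factors on every iteration, as the steps describe.
rng = random.Random(0)
modes = [
    choose_update_mode(rng.random(), rng.random(), rand_threshold=0.5, conv_threshold=0.5)
    for _ in range(1000)
]
```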
According to the application, the weight values from the input layer to the hidden layer and the weight values from the hidden layer to the output layer are mapped to the position and the speed of each individual in the population, the process of training the weight values of the financial risk assessment model is converted into the process of searching the optimal individual, and the global searching and local searching capabilities of the particle swarm algorithm are fully utilized.
Specifically, randomly initializing a population, wherein the initialized population comprises a plurality of particles, and the positions and the speeds of the particles respectively correspond to the weight values from an input layer to a hidden layer and the weight values from the hidden layer to an output layer of the financial risk assessment model; setting the fitness function according to the loss function of the financial risk assessment model, wherein the loss function of the financial risk assessment model may include a loss function of an extreme learning machine, and setting the fitness function according to the loss function of the financial risk assessment model includes:
where $FIT$ denotes the fitness function, $N$ the number of nodes, $G(\cdot)$ the activation function, $W_i$ the weight value of the $i$-th node from the input layer to the hidden layer, $X$ the comprehensive feature vector set, $b_i$ the $i$-th bias parameter, and $h_i$ the weight value of the $i$-th node from the hidden layer to the output layer.
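As an illustration of how these symbols fit together, the sketch below computes the output of a single-hidden-layer network in the usual extreme learning machine form, the sum over $i$ of $h_i G(W_i \cdot X + b_i)$, and derives a fitness value from it. The exact fitness formula is not reproduced in this text, so the sigmoid activation, the negative mean squared error, and the `targets` argument are assumptions, not the application's definition.

```python
import math

def sigmoid(z):
    """Logistic activation, used here as an assumed choice of G."""
    return 1.0 / (1.0 + math.exp(-z))

def elm_output(x, W, b, h):
    """Compute sum_i h[i] * G(W[i] . x + b[i]) over the hidden nodes."""
    total = 0.0
    for w_i, b_i, h_i in zip(W, b, h):
        z = sum(w * xj for w, xj in zip(w_i, x)) + b_i
        total += h_i * sigmoid(z)
    return total

def fitness(samples, targets, W, b, h):
    """Negative mean squared error, so a higher value means a better fit (assumed form)."""
    errors = [(elm_output(x, W, b, h) - t) ** 2 for x, t in zip(samples, targets)]
    return -sum(errors) / len(errors)

# Hypothetical network with two inputs and two hidden nodes.
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
h = [1.0, 1.0]
score = fitness([[1.0, 2.0]], [1.0], W, b, h)
```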
For example, in a conventional particle swarm algorithm, the selection of the optimal solution depends heavily on the optimal individuals of the initial population, so the quality of those individuals strongly affects the whole procedure, and other algorithms must be combined with it to speed up iterative convergence. Such improved methods have complex steps and flows: the optimal solution is obtained by decoding after the codes of all individuals in the population have been aggregated, which is difficult to implement and converges slowly.
By dynamically setting the crossover rate and the mutation rate so that they adapt to changes in fitness during iteration, the present application enhances the global search capability of the improved particle swarm algorithm and avoids falling into local optima.
Optionally, dynamically setting the crossover rate and the mutation rate corresponding to the fitness value includes:
where the symbols in the formula denote, in order: the j-th crossover rate; L, the number of fitness values; the j-th population diversity index; the standard deviation and the variance of the fitness values; and the maximum, minimum, and average of the fitness values;
and where the symbols in the formula denote, in order: the v-th mutation rate; and the mutation adjustment coefficient, which controls the speed and amplitude of the adjustment.
Crossover and mutation operations are performed on the individuals in the initialized population, and the particle with the highest fitness value after these operations is taken as the initial optimal individual. The dynamic crossover and mutation operations effectively prevent the algorithm from falling into a local optimum.
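The idea of rates that adapt to fitness can be sketched as follows. This is a simplified stand-in, not the formula of the application: it uses only the maximum and average fitness, and the function name `adaptive_rates` and the bounds `pc_min`, `pc_max`, `pm_min`, `pm_max` are hypothetical. Individuals close to the best fitness receive low rates (so good solutions are preserved), while average or worse individuals receive high rates (so diversity is restored).

```python
import numpy as np


def adaptive_rates(fitness_values, pc_min=0.5, pc_max=0.9,
                   pm_min=0.01, pm_max=0.1):
    """Return per-individual (crossover_rate, mutation_rate) arrays.

    Illustrative rule only: rates shrink linearly as an individual's
    fitness approaches the population maximum.
    """
    f = np.asarray(fitness_values, dtype=float)
    f_max, f_avg = f.max(), f.mean()
    spread = f_max - f_avg
    if spread == 0.0:
        # All fitness values are equal: use the largest rates to add diversity.
        return np.full(f.shape, pc_max), np.full(f.shape, pm_max)
    # 0 for the best individual, 1 for average-or-worse individuals.
    scale = np.clip((f_max - f) / spread, 0.0, 1.0)
    pc = pc_min + (pc_max - pc_min) * scale
    pm = pm_min + (pm_max - pm_min) * scale
    return pc, pm
```

A fuller version could also fold in the standard deviation, variance, and minimum of the fitness values, which the text lists as inputs to the rates.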
Further, the velocity of a particle represents how fast it moves, and its position represents the direction of movement. In the search space, each particle searches for an optimal solution. Each time a particle finds a new solution, it compares the new solution with its current individual optimal solution and keeps the better of the two as its new individual optimal solution. The individual optimal solutions of all particles in the swarm are then compared, and the best of them becomes the current global optimal solution of the swarm. All particles adjust their velocity and position according to their own current optimal solution and the current global optimal solution of the swarm.
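The bookkeeping described above corresponds to the textbook particle swarm update, sketched below. The inertia weight `w` and the acceleration coefficients `c1` and `c2` are conventional parameters of that textbook form and are assumptions here; the text does not specify them. Higher fitness is treated as better.

```python
import numpy as np


def pso_step(pos, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update for all particles.

    pos, vel, pbest: (particles, dims); gbest: (dims,).
    """
    r1 = rng.random(pos.shape)
    r2 = rng.random(pos.shape)
    # Pull each particle toward its own best and toward the swarm's best.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    return pos + vel, vel


def update_bests(pos, fit, pbest, pbest_fit):
    """Keep the better of the new and stored solution for each particle,
    then pick the best individual optimum as the global optimum."""
    improved = fit > pbest_fit
    pbest = np.where(improved[:, None], pos, pbest)
    pbest_fit = np.where(improved, fit, pbest_fit)
    gbest = pbest[np.argmax(pbest_fit)]
    return pbest, pbest_fit, gbest
```

In a full loop, `pso_step` and `update_bests` would alternate with a fitness evaluation of the new positions until the iteration condition is met.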
The random factor lets the search jump out of local optimal solutions and so increases the diversity of the search, while the convergence factor is used to determine whether the current search strategy is effective. Specifically, if the random factor is greater than or equal to a random threshold, updating the position of the initial optimal individual according to a first position updating mode;
if the random factor is smaller than the random threshold, further judging the relation between the convergence factor and the convergence threshold:
if the convergence factor is greater than or equal to the convergence threshold, updating the position of the initial optimal individual according to a second position updating mode;
if the convergence factor is smaller than the convergence threshold, updating the position of the initial optimal individual according to a third position updating mode;
The random factor is a random number in the range [0, 1] that is used to decide which strategy to adopt. Comparing the random factor with the random threshold decides whether the position of the initial optimal individual is updated in the first position updating mode: this strategy is adopted if the random factor is greater than or equal to the given random threshold. The first position updating mode instructs the particles to adjust their positions along a rising spiral. The first position updating mode is shown in the following formula:
where pos(t+1) denotes the position information at time t+1, pos(t) denotes the position information at time t, D denotes the initial position of the initial optimal individual, and l denotes the spiral rise coefficient, which is a random number in [-1, 1].
If the random factor is smaller than the random threshold, the relation between the convergence factor and the convergence threshold is further judged. The convergence factor is a measure of the speed or trend with which the algorithm converges to the global optimal solution or to a given target solution; it may be a value in the range [0, 1] or another measure related to the convergence performance of the algorithm. The convergence factor helps the algorithm determine whether its current search strategy is effective: if the algorithm converges too fast, it may fall into a local optimum; if it converges too slowly, it may waste computing resources.
Further, if the convergence factor is greater than or equal to the convergence threshold, updating the position of the initial optimal individual according to a second position updating mode;
if the convergence factor is smaller than the convergence threshold, updating the position of the initial optimal individual according to a third position updating mode;
the second position updating mode is used for indicating that the current position of the initial optimal individual is taken as a reference, and position adjustment is randomly carried out;
the second location update mode is shown in the following formula:
where RAND denotes the random factor;
the third position updating mode is used for indicating that the current position of the initial optimal individual is used as a reference, and position adjustment is conducted towards the preset global optimal position.
The third location update is shown in the following formula:
where the first term denotes the global optimal position at time t, r denotes the spatial distance between the initial optimal individual and the global optimal position, and a denotes the swing factor. The swing factor indicates the direction guiding value for adjusting the position of the initial optimal individual toward the preset global optimal position, and may be a random value in [0, 1].
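The three position updating modes can be pictured with the sketch below. The concrete expressions are assumptions chosen only to show the described behaviours (a spiral move, a random move around the current position, and a move toward the global optimal position); they are not the formulas of the application. The function names and the `step` parameter are hypothetical.

```python
import numpy as np


def spiral_update(pos, D, l):
    """First mode (assumed form): spiral move scaled by the distance to D.

    l is the spiral rise coefficient, a random number in [-1, 1].
    """
    return np.abs(D - pos) * np.exp(l) * np.cos(2.0 * np.pi * l) + pos


def random_update(pos, rand_factor, rng, step=1.0):
    """Second mode (assumed form): random adjustment around the current position."""
    return pos + rand_factor * step * rng.uniform(-1.0, 1.0, size=pos.shape)


def toward_global_update(pos, gbest, a):
    """Third mode (assumed form): move a fraction a of the distance r
    toward the global optimal position; a is the swing factor in [0, 1]."""
    r = np.linalg.norm(gbest - pos)
    if r == 0.0:
        return pos.copy()  # already at the global optimal position
    direction = (gbest - pos) / r
    return pos + a * r * direction
```

With a swing factor of 1 the third mode lands exactly on the global optimal position, and with 0 the position is unchanged, so intermediate values control how aggressively the individual is pulled toward it.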
Illustratively, when the random factor is greater than or equal to the random threshold, the first position updating mode is used; it may be more random or exploratory, which helps the algorithm jump out of locally optimal solutions and increases the diversity of the search, giving a greater chance of finding the globally optimal solution. When the random factor is less than the random threshold and the convergence factor is greater than or equal to the convergence threshold, the second position updating mode is used; it may be more deterministic or exploitative, which helps the algorithm search more finely around the current optimal solution and thereby perform local optimization. When the random factor is less than the random threshold and the convergence factor is less than the convergence threshold, the third position updating mode is used; it may be a strategy between exploration and exploitation, which allows the algorithm to adjust its strategy dynamically according to the current search situation and thus adapt better.
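The selection among the three position updating modes follows directly from the two comparisons described above and can be sketched as a small function. The default thresholds of 0.5 are placeholder values assumed for illustration; the text does not give concrete thresholds, and the function name is hypothetical.

```python
def choose_update_mode(random_factor, convergence_factor,
                       random_threshold=0.5, convergence_threshold=0.5):
    """Return which position updating mode applies: 'first', 'second', or 'third'."""
    if random_factor >= random_threshold:
        return "first"   # spiral, exploratory update
    if convergence_factor >= convergence_threshold:
        return "second"  # random adjustment around the current position
    return "third"       # adjustment toward the global optimal position
```

Because the convergence factor is only consulted when the random factor is below its threshold, the first mode always takes precedence when both conditions could apply.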
In a second aspect of the embodiment of the present application, there is provided an artificial intelligence-based financial big data processing system. Fig. 2 is a schematic structural diagram of the artificial intelligence-based financial big data processing system according to the embodiment of the present application. The system includes:
a first unit, configured to perform data preprocessing on the obtained structured financial data of the target object, convert the structured financial data into a structured feature vector set, screen the structured feature vector set through a feature selection algorithm, and determine a first feature vector set;
a second unit, configured to perform label encoding on the obtained unstructured financial data of the target object, and map the unstructured financial data into a second feature vector set through vector mapping;
and a third unit, configured to input the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fuse the first feature vector set and the second feature vector set into a comprehensive feature vector set, assign hidden weights to the comprehensive feature vector set, and output a risk prediction value, wherein the financial risk assessment model is constructed based on an extreme learning machine combined with a particle swarm algorithm.
In a third aspect of an embodiment of the present application, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method as described above.
In a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the foregoing method.
The present application may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. The financial big data processing method based on artificial intelligence is characterized by comprising the following steps:
performing data preprocessing on the obtained structured financial data of the target object, converting the structured financial data into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm, and determining a first feature vector set;
performing label encoding on the obtained unstructured financial data of the target object, and mapping the unstructured financial data into a second feature vector set through vector mapping;
inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing the first feature vector set and the second feature vector set into a comprehensive feature vector set, distributing hidden weights for the comprehensive feature vector set, and outputting risk prediction values, wherein the financial risk assessment model is constructed based on an extreme learning machine and a particle swarm algorithm.
2. The method of claim 1, wherein the screening the set of structured feature vectors by a feature selection algorithm to determine a first set of feature vectors comprises:
randomly selecting any one of the structured feature vector sets as a splitting point, and respectively determining a neighboring feature vector set of the splitting point and a first weight value corresponding to the splitting point;
respectively determining a second weight value of a left node of the splitting point and a third weight value of a right node of the splitting point in the adjacent feature vector set;
determining a splitting gain value of the splitting point according to the first weight value, the second weight value, the third weight value and the characteristic value of the splitting point;
traversing the structured feature vector set, determining a split gain value corresponding to each feature vector in the structured feature vector set, sorting according to the size of the split gain values, and reserving the feature vectors with the split gain values larger than a preset screening threshold as a first feature vector set.
3. The method of claim 1, further comprising training a financial risk assessment model:
randomly initializing a population, wherein the initialized population comprises a plurality of particles, and the positions and the speeds of the particles respectively correspond to the weight values from an input layer to a hidden layer and the weight values from the hidden layer to an output layer of the financial risk assessment model;
setting a fitness function according to a loss function of the financial risk assessment model, determining a fitness value corresponding to each particle, dynamically setting a crossover rate and a mutation rate according to the fitness value, performing crossover and mutation operations on individuals in the initialized population based on the crossover rate and the mutation rate, and taking the particle with the highest fitness value after the crossover and mutation operations as an initial optimal individual;
iteratively generating a random factor and a convergence factor at random until a preset iteration condition is met, and if the random factor is greater than or equal to a random threshold, updating the position of the initial optimal individual according to a first position updating mode;
if the random factor is smaller than the random threshold, further judging the relation between the convergence factor and the convergence threshold:
if the convergence factor is greater than or equal to the convergence threshold, updating the position of the initial optimal individual according to a second position updating mode;
if the convergence factor is smaller than the convergence threshold, updating the position of the initial optimal individual according to a third position updating mode;
and respectively taking the position and the speed of the optimal individual meeting the preset iteration condition as the weight value from the input layer to the hidden layer and the weight value from the hidden layer to the output layer.
4. A method according to claim 3, wherein setting a fitness function according to a loss function of the financial risk assessment model comprises:
where FIT denotes the fitness function, N denotes the number of nodes, G() denotes the excitation function, W_i denotes the weight value of the i-th node from the input layer to the hidden layer, X denotes the comprehensive feature vector set, b_i denotes the i-th bias parameter, and h_i denotes the weight value of the i-th node from the hidden layer to the output layer.
5. A method according to claim 3, wherein the crossover rate and the mutation rate are dynamically set according to the fitness value as shown in the following formula:
where the symbols in the formula denote, in order: the j-th crossover rate; L, the number of fitness values; the j-th population diversity index; the standard deviation and the variance of the fitness values; and the maximum, minimum, and average of the fitness values;
and where the symbols in the formula denote, in order: the v-th mutation rate; and the mutation adjustment coefficient, which controls the speed and amplitude of the adjustment.
6. The method of claim 3, wherein:
the first position updating mode is used for indicating to update the position of the initial optimal individual in a spiral ascending mode;
the second position updating mode is used for indicating that the current position of the initial optimal individual is taken as a reference, and position adjustment is randomly carried out;
the third position updating mode is used for indicating that the current position of the initial optimal individual is used as a reference, and position adjustment is conducted towards a preset global optimal position.
7. A method according to claim 3, wherein the first location update means is represented by the formula:
where pos(t+1) denotes the position information at time t+1, pos(t) denotes the position information at time t, D denotes the initial position of the initial optimal individual, and l denotes the spiral rise coefficient;
the second location updating mode is shown as the following formula:
where RAND denotes the random factor;
the third location updating mode is shown as the following formula:
where the first term denotes the global optimal position at time t, r denotes the spatial distance between the initial optimal individual and the global optimal position, and a denotes the swing factor.
8. An artificial intelligence based financial big data processing system, comprising:
the first unit is used for preprocessing the data of the obtained structured financial data of the target object, converting the data into a structured feature vector set, screening the structured feature vector set through a feature selection algorithm and determining a first feature vector set;
a second unit, configured to perform label encoding on the obtained unstructured financial data of the target object, and map the unstructured financial data into a second feature vector set through vector mapping;
and the third unit is used for inputting the first feature vector set and the second feature vector set into a pre-constructed financial risk assessment model, fusing the first feature vector set and the second feature vector set into a comprehensive feature vector set, distributing hidden weights for the comprehensive feature vector set and outputting a risk prediction value, wherein the financial risk assessment model is constructed based on an extreme learning machine and a particle swarm algorithm.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
CN202311092782.1A 2023-08-29 2023-08-29 Financial big data processing method and system based on artificial intelligence Active CN116843456B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311092782.1A CN116843456B (en) 2023-08-29 2023-08-29 Financial big data processing method and system based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311092782.1A CN116843456B (en) 2023-08-29 2023-08-29 Financial big data processing method and system based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN116843456A true CN116843456A (en) 2023-10-03
CN116843456B CN116843456B (en) 2023-11-07

Family

ID=88163799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311092782.1A Active CN116843456B (en) 2023-08-29 2023-08-29 Financial big data processing method and system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN116843456B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097459A (en) * 2019-05-08 2019-08-06 重庆斐耐科技有限公司 A kind of financial risks appraisal procedure and system based on big data technology
CN112037012A (en) * 2020-08-14 2020-12-04 百维金科(上海)信息科技有限公司 Internet financial credit evaluation method based on PSO-BP neural network
CN112329906A (en) * 2020-11-06 2021-02-05 汉唐智华(深圳)科技发展有限公司 Artificial intelligence financial risk measurement method based on particle swarm algorithm
US20220215467A1 (en) * 2021-01-06 2022-07-07 Capital One Services, Llc Systems and methods for determining financial security risks using self-supervised natural language extraction
CN113538125A (en) * 2021-06-29 2021-10-22 百维金科(上海)信息科技有限公司 Risk rating method for optimizing Hopfield neural network based on firefly algorithm
CN113657028A (en) * 2021-08-05 2021-11-16 长春理工大学 Multi-source information-based aerosol optical thickness online prediction method
CN115147208A (en) * 2022-07-21 2022-10-04 上海安能聚创物流科技有限公司 Supply chain financial credit risk evaluation method and system based on artificial intelligence
CN115393108A (en) * 2022-08-31 2022-11-25 中国银行股份有限公司 Financial risk control method, device and equipment
CN115795035A (en) * 2022-12-01 2023-03-14 上海大学 Science and technology service resource classification method and system based on evolutionary neural network and computer readable storage medium thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117932245A (en) * 2024-03-21 2024-04-26 华南理工大学 Financial data missing value completion method, device and storage medium
CN117932245B (en) * 2024-03-21 2024-06-11 华南理工大学 A method, device and storage medium for completing missing values of financial data

Also Published As

Publication number Publication date
CN116843456B (en) 2023-11-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231108

Address after: 1701, Building 7, Courtyard 1, Yuetan South Street, Xicheng District, Beijing, 100045

Patentee after: Zhang Yichao

Address before: 100000 Commercial 5-040, 2nd Floor, Building 1, No. 66 Zhongguancun East Road, Haidian District, Beijing

Patentee before: Beijing Yanzhixin Technology Service Co.,Ltd.

Patentee before: Zhang Yichao