CN118364261A - Object evaluation method, device, computer equipment and storage medium - Google Patents
- Publication number
- CN118364261A CN118364261A CN202410372803.3A CN202410372803A CN118364261A CN 118364261 A CN118364261 A CN 118364261A CN 202410372803 A CN202410372803 A CN 202410372803A CN 118364261 A CN118364261 A CN 118364261A
- Authority
- CN
- China
- Prior art keywords
- feature
- characteristic variable
- characteristic
- variable group
- machine learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F18/211: Selection of the most significant subset of features (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/21 Design or setup of recognition systems or techniques)
- G06F18/22: Matching criteria, e.g. proximity measures
- G06N20/00: Machine learning (G06N Computing arrangements based on specific computational models)
- G06N3/126: Evolutionary algorithms, e.g. genetic algorithms or genetic programming (G06N3/00 Computing arrangements based on biological models; G06N3/12 using genetic models)
Abstract
The present application relates to an object evaluation method, apparatus, computer device, storage medium and computer program product, which can be applied in the field of big data. The method comprises the following steps: generating a plurality of feature variable groups based on candidate features, each feature variable group including a number of features selected from the candidate features; training machine learning models corresponding to the plurality of feature variable groups, obtaining evaluation parameters corresponding to each feature variable group based on the training results, and screening, based on the evaluation parameters, parent feature variable groups that satisfy the conditions from the feature variable groups; when the iteration-end condition is not met, performing genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups, and returning to the step of training the machine learning models of the feature variable groups; and when the iteration-end condition is met, obtaining a plurality of selected feature groups based on the parent feature variable groups. By adopting this method, object evaluation can be performed accurately.
Description
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to an object evaluation method, an object evaluation apparatus, a computer device, a storage medium, and a computer program product.
Background
With the development of artificial intelligence technology, training a machine learning model and then performing object evaluation with the trained model has gradually become a mainstream technical means in various fields.
Currently, the model training process is typically combined with feature selection. First, a machine learning model is established, and the model corresponding to each feature variable combination is trained. Based on the performance of the trained models, the correlation, stability and discriminative power of the feature variables are determined, and the feature variables are screened accordingly. An optimal feature variable combination is then obtained from the screened feature variables, the machine learning model corresponding to this combination is trained to determine the optimal trained model, and object evaluation is carried out based on that model.
However, current object evaluation methods lack accuracy, so an accurate feature selection method is needed.
Disclosure of Invention
In view of the above technical problems, it is necessary to provide an accurate object evaluation method, apparatus, computer device, computer-readable storage medium and computer program product.
In a first aspect, the present application provides an object assessment method, including:
generating a plurality of feature variable groups based on candidate features, each feature variable group comprising a number of features selected from the candidate features, the candidate features being a number of features for object evaluation;
training the machine learning models corresponding to the feature variable groups, obtaining evaluation parameters corresponding to each feature variable group based on the training results, and screening, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition from the feature variable groups;
when the iteration-end condition is not met, performing genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups, and returning to the step of training the machine learning models of the feature variable groups;
when the iteration-end condition is met, obtaining a plurality of selected feature groups based on the parent feature variable groups;
and training the machine learning models corresponding to the selected feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
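The iterative loop claimed in the first aspect can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `train_and_evaluate` and `genetic_update` callables, the population size, and the use of a fixed iteration budget as the iteration-end condition are all assumptions made for the sketch.

```python
import random

def select_feature_groups(candidate_features, group_size, population_size,
                          train_and_evaluate, genetic_update, max_iterations):
    """Sketch of the claimed loop: generate feature variable groups, train
    and evaluate a model per group, screen parent groups, apply genetic
    processing, and repeat until the iteration-end condition is met
    (here simply a fixed iteration budget)."""
    # Generate the initial feature variable groups from the candidate features.
    groups = [random.sample(candidate_features, group_size)
              for _ in range(population_size)]
    for iteration in range(max_iterations):
        # Train a model per group and obtain its evaluation parameters.
        scored = [(group, train_and_evaluate(group)) for group in groups]
        # Screen the parent groups that satisfy the object-evaluation condition.
        parents = [group for group, params in scored if params["qualifies"]]
        if iteration == max_iterations - 1:  # iteration-end condition met
            return parents                   # the selected feature groups
        # Genetic processing enriches the parents into an updated population.
        groups = genetic_update(parents, candidate_features, population_size)
```

The selected groups returned here would then each be used to train a candidate target model, from which the final target machine learning model is chosen.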
In some embodiments, the generating a plurality of feature variable groups based on the candidate features includes:
acquiring the candidate features and a target feature count;
and randomly generating a plurality of feature variable groups based on the candidate features, wherein the number of features in each feature variable group equals the target feature count.
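The random generation step above can be sketched as a sample-without-replacement draw per group; the function name and the example feature names are illustrative only.

```python
import random

def generate_feature_groups(candidate_features, target_count, num_groups):
    """Randomly generate feature variable groups; each group holds exactly
    `target_count` distinct features drawn from the candidate features."""
    return [random.sample(candidate_features, target_count)
            for _ in range(num_groups)]
```

For example, `generate_feature_groups(["gender", "age", "amount", "tenure"], 2, 3)` yields three groups of two distinct candidate features each.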
In some embodiments, the obtaining, based on the training results, the evaluation parameters corresponding to each feature variable group includes:
processing a preset verification set with each trained machine learning model to obtain a verification result, and obtaining a first evaluation parameter of each trained model based on the verification result and the training result;
processing a preset test set with each trained machine learning model to obtain a test result, and obtaining a second evaluation parameter of each trained model based on the test result and the training result;
and combining the first and second evaluation parameters to obtain the evaluation parameters of the feature variable group corresponding to each trained model.
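A possible shape for this two-holdout evaluation is sketched below. The concrete indices are assumptions, since the patent does not define them at this point: stability is taken here as the absolute train/holdout score gap, and importance is read from the trained model if it exposes one (as tree-based models commonly do).

```python
def evaluation_parameters(model, train_score, val_data, test_data, score_fn):
    """Hedged sketch: the first evaluation parameter is derived from the
    verification (validation) set, the second from the test set; both are
    combined into one parameter dictionary for the feature variable group."""
    val_score = score_fn(model, *val_data)
    test_score = score_fn(model, *test_data)
    first = {"stability": abs(train_score - val_score),
             "importance": getattr(model, "feature_importances_", None)}
    second = {"stability": abs(train_score - test_score),
              "importance": getattr(model, "feature_importances_", None)}
    # Combining both parameter sets gives the group's evaluation parameters.
    return {"first": first, "second": second}
```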
In some embodiments, the first evaluation parameter includes a first stability index and a first feature importance index, and the second evaluation parameter includes a second stability index and a second feature importance index.
The screening, based on the evaluation parameters, of parent feature variable groups that satisfy the object evaluation condition includes:
screening from the feature variable groups the parent feature variable groups whose first stability index is smaller than a preset first stability threshold, whose absolute difference between the first and second stability indexes is smaller than a preset second stability threshold, whose absolute difference between the first and second feature importance indexes is smaller than a preset first feature importance threshold, and whose first feature importance index is greater than or equal to a preset second feature importance threshold.
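One plausible reading of the four threshold conditions in this embodiment is sketched below; the machine translation of the claim is ambiguous, so the exact pairing of indices and thresholds, as well as the parameter names, are assumptions.

```python
def screen_parents(scored_groups, stab1_max, stab2_max, imp_gap_max, imp_min):
    """Filter feature variable groups by four threshold conditions:
    low validation stability index, small stability gap between the
    validation and test holdouts, small importance gap between them,
    and a sufficiently high importance index."""
    parents = []
    for group, params in scored_groups:
        first, second = params["first"], params["second"]
        if (first["stability"] < stab1_max
                and abs(first["stability"] - second["stability"]) < stab2_max
                and abs(first["importance"] - second["importance"]) < imp_gap_max
                and first["importance"] >= imp_min):
            parents.append(group)
    return parents
```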
In some embodiments, the genetic processing of the parent feature variable groups includes one or more of the following:
First kind (crossover):
cross-replacing a first target feature in a first feature variable group with a second target feature matching it in a second feature variable group; the first and second feature variable groups are any two distinct groups among the parent feature variable groups, and the first target feature is any feature in the first feature variable group.
Second kind (mutation):
updating a third target feature in the parent feature variable groups to a fourth target feature; the third target feature is any feature in the parent feature variable groups, and the fourth target feature is any feature among the candidate features.
Third kind (replacement):
acquiring candidate feature variable groups and using them to replace third feature variable groups; the third feature variable groups are at least one of the parent feature variable groups, and the number of candidate feature variable groups equals the number of third feature variable groups.
Fourth kind (retention):
keeping the parent feature variable groups unchanged.
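The four kinds of genetic processing above can be sketched as standard genetic-algorithm operators over lists of feature names. Function names and the random choice of positions are illustrative assumptions, not the patent's specification.

```python
import random

def crossover(group_a, group_b):
    """First kind: swap a matched pair of features between two parent groups."""
    i = random.randrange(len(group_a))
    a, b = list(group_a), list(group_b)
    a[i], b[i] = b[i], a[i]
    return a, b

def mutate(group, candidate_features):
    """Second kind: replace one feature in a group with a random candidate."""
    g = list(group)
    g[random.randrange(len(g))] = random.choice(candidate_features)
    return g

def replace_groups(parents, candidate_features, k):
    """Third kind: substitute k parent groups with freshly generated groups."""
    size = len(parents[0])
    fresh = [random.sample(candidate_features, size) for _ in range(k)]
    return fresh + parents[k:]

def retain(parents):
    """Fourth kind: keep the parent feature variable groups unchanged."""
    return parents
```

In a full run these operators would be applied, singly or in combination, to the screened parents to produce the updated population for the next training iteration.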
In some embodiments, the replacing the third feature variable groups with the candidate feature variable groups includes:
randomly generating at least one candidate feature variable group based on the candidate features;
and sequentially replacing the third feature variable groups in the parent feature variable groups with the candidate feature variable groups.
In a second aspect, the present application also provides an object assessment apparatus, the apparatus comprising:
the feature variable group generation module, configured to generate a plurality of feature variable groups based on candidate features, each feature variable group comprising a number of features selected from the candidate features, the candidate features being a number of features for object evaluation;
the first screening module, configured to train the machine learning models corresponding to the feature variable groups, obtain evaluation parameters corresponding to each feature variable group based on the training results, and screen, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition from the feature variable groups;
the iteration module, configured to perform genetic processing on the parent feature variable groups when the iteration-end condition is not met, obtain a plurality of updated feature variable groups, and return to the step of training the machine learning models of the feature variable groups;
the second screening module, configured to obtain a plurality of selected feature groups based on the parent feature variable groups when the iteration-end condition is met;
and the application module, configured to train the machine learning models corresponding to the selected feature groups to obtain a target machine learning model, and evaluate the object to be evaluated based on the target machine learning model.
In some embodiments, the feature variable group generation module is further configured to: acquire the candidate features and a target feature count; and randomly generate a plurality of feature variable groups based on the candidate features, wherein the number of features in each feature variable group equals the target feature count.
In some embodiments, the first screening module is further configured to: process a preset verification set with each trained machine learning model to obtain a verification result, and obtain a first evaluation parameter of each trained model based on the verification result and the training result; process a preset test set with each trained machine learning model to obtain a test result, and obtain a second evaluation parameter of each trained model based on the test result and the training result; and combine the first and second evaluation parameters to obtain the evaluation parameters of the feature variable group corresponding to each trained model.
In some embodiments, the first evaluation parameter includes a first stability index and a first feature importance index, and the second evaluation parameter includes a second stability index and a second feature importance index. The first screening module is further configured to screen from the feature variable groups the parent feature variable groups whose first stability index is smaller than a preset first stability threshold, whose absolute difference between the first and second stability indexes is smaller than a preset second stability threshold, whose absolute difference between the first and second feature importance indexes is smaller than a preset first feature importance threshold, and whose first feature importance index is greater than or equal to a preset second feature importance threshold.
In some embodiments, the iteration module is further configured to perform one or more of the following: first kind: cross-replacing a first target feature in a first feature variable group with a second target feature matching it in a second feature variable group, the first and second feature variable groups being any two distinct groups among the parent feature variable groups, and the first target feature being any feature in the first feature variable group; second kind: updating a third target feature in the parent feature variable groups to a fourth target feature, the third target feature being any feature in the parent feature variable groups and the fourth target feature being any feature among the candidate features; third kind: acquiring candidate feature variable groups and using them to replace third feature variable groups, the third feature variable groups being at least one of the parent feature variable groups, with the number of candidate feature variable groups equal to the number of third feature variable groups; fourth kind: keeping the parent feature variable groups unchanged.
In some embodiments, the iteration module is further configured to: randomly generate at least one candidate feature variable group based on the candidate features; and sequentially replace the third feature variable groups in the parent feature variable groups with the candidate feature variable groups.
In a third aspect, the present application also provides a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
generating a plurality of feature variable groups based on candidate features, each feature variable group comprising a number of features selected from the candidate features, the candidate features being a number of features for object evaluation;
training the machine learning models corresponding to the feature variable groups, obtaining evaluation parameters corresponding to each feature variable group based on the training results, and screening, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition from the feature variable groups;
when the iteration-end condition is not met, performing genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups, and returning to the step of training the machine learning models of the feature variable groups;
when the iteration-end condition is met, obtaining a plurality of selected feature groups based on the parent feature variable groups;
and training the machine learning models corresponding to the selected feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
generating a plurality of feature variable groups based on candidate features, each feature variable group comprising a number of features selected from the candidate features, the candidate features being a number of features for object evaluation;
training the machine learning models corresponding to the feature variable groups, obtaining evaluation parameters corresponding to each feature variable group based on the training results, and screening, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition from the feature variable groups;
when the iteration-end condition is not met, performing genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups, and returning to the step of training the machine learning models of the feature variable groups;
when the iteration-end condition is met, obtaining a plurality of selected feature groups based on the parent feature variable groups;
and training the machine learning models corresponding to the selected feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
In a fifth aspect, the application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of:
generating a plurality of feature variable groups based on candidate features, each feature variable group comprising a number of features selected from the candidate features, the candidate features being a number of features for object evaluation;
training the machine learning models corresponding to the feature variable groups, obtaining evaluation parameters corresponding to each feature variable group based on the training results, and screening, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition from the feature variable groups;
when the iteration-end condition is not met, performing genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups, and returning to the step of training the machine learning models of the feature variable groups;
when the iteration-end condition is met, obtaining a plurality of selected feature groups based on the parent feature variable groups;
and training the machine learning models corresponding to the selected feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
With the object evaluation method, apparatus, computer device, storage medium and computer program product described above, a plurality of feature variable groups are generated based on candidate features, each containing a number of features selected from the candidates for object evaluation; machine learning models corresponding to the feature variable groups are trained, evaluation parameters are obtained from the training results, and parent feature variable groups satisfying the object evaluation condition are screened out based on these parameters; when the iteration-end condition is not met, genetic processing is applied to the parent groups to obtain updated feature variable groups and training is repeated; when the condition is met, a plurality of selected feature groups are obtained from the parent groups; finally, the models corresponding to the selected feature groups are trained to obtain a target machine learning model, with which the object to be evaluated is evaluated.
Throughout this process, the evaluation parameters obtained by training the models corresponding to the feature variable groups allow the parent feature variable groups to be screened accurately. When the iteration-end condition is not met, genetic processing enriches the parent groups, enlarging the scope of subsequent screening; iterative screening of the updated groups then accurately yields the finally selected feature groups, and the target machine learning model trained on them achieves accurate object evaluation.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the related art more clearly, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present application; other drawings may be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a diagram of an application environment for an object assessment method in one embodiment;
FIG. 2 is a flow diagram of a method of object assessment in one embodiment;
FIG. 3 is a flow chart of an object assessment method according to another embodiment;
FIG. 4 is a block diagram of an object assessment apparatus in one embodiment;
FIG. 5 is an internal block diagram of a computer device in one embodiment;
FIG. 6 is an internal structure diagram of a computer device in another embodiment.
Detailed Description
The present application will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions and advantages more apparent. It should be understood that the specific embodiments described herein serve only to illustrate the application and are not intended to limit its scope.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are both information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to meet the related regulations.
The object evaluation method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet of Things device, or portable wearable device; Internet of Things devices include smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like, and portable wearable devices include smart watches, smart bracelets, headsets, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
Specifically, the user interacts with the terminal 102 and issues an object evaluation request carrying a plurality of candidate features, which the terminal 102 forwards to the server 104. The server 104 receives the request, selects a number of features from the candidate features it carries, and generates a plurality of feature variable groups; the candidate features are the features used for object evaluation, and each feature variable group includes a number of features selected from them. The server 104 trains the machine learning models corresponding to the feature variable groups, obtains evaluation parameters corresponding to each feature variable group based on the training results, and screens, based on the evaluation parameters, parent feature variable groups that satisfy the object evaluation condition. When the iteration-end condition is not met, it performs genetic processing on the parent feature variable groups to obtain a plurality of updated feature variable groups and returns to the training step; when the iteration-end condition is met, it obtains a plurality of selected feature groups based on the parent feature variable groups. It then trains the machine learning models corresponding to the selected feature groups to obtain a target machine learning model and evaluates the object to be evaluated with it. Finally, the server 104 pushes the resulting evaluation to the terminal 102, which displays it to the user.
The object evaluation method of the application can also be applied to application scenes of a single terminal or server.
For example, the user interacts with the terminal 102 to issue an object evaluation request carrying a plurality of candidate features. The terminal 102 selects a number of features from the candidate features carried in the request and generates a plurality of feature variable groups, each including a number of features selected from the candidates. It trains the machine learning models corresponding to the feature variable groups, obtains evaluation parameters based on the training results, and screens parent feature variable groups satisfying the object evaluation condition; when the iteration-end condition is not met, it performs genetic processing on the parent groups to obtain updated feature variable groups and returns to the training step; when the condition is met, it obtains a plurality of selected feature groups from the parent groups, trains the corresponding models to obtain a target machine learning model, and evaluates the object to be evaluated with it. The terminal 102 then displays the evaluation result to the user.
For another example, the server 104 obtains candidate features, selects a number of features from the candidate features, and generates a plurality of feature variable groups; the candidate features are the features available for object evaluation, and each feature variable group includes a number of features selected from the candidate features. The server trains the machine learning models corresponding to the feature variable groups, obtains the evaluation parameters of each feature variable group based on the training results, and screens out, based on the evaluation parameters, the parent feature variable groups satisfying the object evaluation conditions; if the iteration end condition is not met, genetic processing is performed on the parent feature variable groups to obtain a plurality of updated feature variable groups, and the method returns to the step of training the machine learning models of the feature variable groups; if the iteration end condition is met, a plurality of selected feature groups are obtained based on the parent feature variable groups; the machine learning models corresponding to the selected feature groups are trained to obtain a target machine learning model, and the object to be evaluated is evaluated based on the target machine learning model.
In some embodiments, as shown in fig. 2, an object evaluation method is provided, and the method is described by taking its application to the computer device in fig. 1 as an example, where the computer device includes a terminal or a server. Wherein:
S100, generating a plurality of feature variable groups based on the candidate features.
Wherein the candidate features are a number of features for object assessment, each feature variable set comprising a number of features selected from the candidate features. The candidate features may vary from application scenario to application scenario, for example, the candidate features may be features in the user dimension, such as gender, age, and transaction amount. The number of candidate features is m, m > 1.
Specifically, the computer device obtains a plurality of candidate features (x1, x2, …, xm) for object evaluation, randomly selects the required feature variables from the candidate features, and combines them to obtain a feature variable group. In a feature variable group, the position of a selected feature variable is recorded as 1 and that of an unselected feature variable as 0; for example, a certain feature variable group may be (1, 0, …, 1), representing that feature variables x1 and xm are selected while x2 is not. Further, the computer device may obtain a plurality of feature variable groups, obtain the machine learning model corresponding to each feature variable group, and determine the selection effect of each feature variable group based on its machine learning model.
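The 0/1 encoding of feature variable groups described above can be sketched in Python; `random_mask` and `decode` are hypothetical helper names introduced for illustration, not part of the application.

```python
import random

def random_mask(m: int) -> list[int]:
    """Randomly select features from m candidates; 1 = selected, 0 = not selected."""
    mask = [random.randint(0, 1) for _ in range(m)]
    if not any(mask):
        # Ensure at least one feature is selected.
        mask[random.randrange(m)] = 1
    return mask

def decode(mask: list[int], candidates: list[str]) -> list[str]:
    """Map a 0/1 mask back to the names of the selected candidate features."""
    return [name for bit, name in zip(mask, candidates) if bit == 1]

candidates = ["x1", "x2", "x3", "x4"]
print(decode([1, 0, 0, 1], candidates))  # ['x1', 'x4']
```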
S200, training machine learning models corresponding to a plurality of feature variable groups, obtaining evaluation parameters corresponding to the feature variable groups based on training results, and screening parent feature variable groups meeting object evaluation conditions from the feature variable groups based on the evaluation parameters.
The machine learning model includes, but is not limited to, a logistic regression model such as a scoring card model.
Specifically, the different machine learning models corresponding to the different feature variable groups are obtained, training data is obtained, and the machine learning model corresponding to each feature variable group is trained to obtain trained machine learning models. Based on the training results obtained after training, each trained machine learning model is evaluated to obtain its evaluation parameters, and the evaluation parameters of each trained machine learning model are taken as the evaluation parameters of the feature variable group of that model.
The evaluation parameters characterize the quality of the feature variable group used by the machine learning model. If the evaluation parameters of a feature variable group do not meet the preset object evaluation conditions, the group performs poorly and is removed; if they do meet the conditions, the group performs well and is taken as a parent feature variable group, which increases its representation in the next generation. The parent feature variable groups are used in the subsequent iterations so that they are continuously and iteratively optimized.
And S300, under the condition that the iteration ending condition is not met, carrying out genetic processing on the parent characteristic variable group, obtaining a plurality of updated characteristic variable groups, and returning to the step of training the machine learning model of the plurality of characteristic variable groups.
The application combines model training with a genetic algorithm to evaluate objects accurately. A genetic algorithm simulates replication, crossover, and mutation as they occur in natural selection and inheritance: starting from an arbitrary initial population, it generates individuals better adapted to the environment through random selection, crossover, and mutation operations, so that the population evolves toward better and better regions of the search space. Generation after generation, the population reproduces and evolves, finally converging to a group of individuals best adapted to the environment, thereby obtaining a high-quality solution to the problem.
Specifically, when the iteration end condition is not met, the computer device continues to iteratively optimize the parent feature variable groups to obtain a plurality of updated feature variable groups, and uses the updated feature variable groups obtained in the current iteration round as the feature variable groups whose machine learning models are to be trained in the next iteration. That is, in the first iteration, after the qualifying parent feature variable groups are screened out of the feature variable groups based on the evaluation parameters, genetic processing is performed on them to obtain a plurality of updated feature variable groups. In the second iteration, the machine learning models corresponding to the updated feature variable groups are trained, their evaluation parameters are obtained from the training results, and the qualifying parent feature variable groups are again screened out of the updated groups; the parent feature variable groups of the first iteration are discarded, and the parent feature variable groups updated in the second iteration serve as the latest parents.
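The train–screen–breed cycle of S200–S300 can be sketched as a generic loop. Here `train_and_evaluate`, `meets_condition`, and `breed` are placeholder callables standing in for the steps of the application, and the fixed generation count is only one possible form of the iteration end condition.

```python
def run_feature_selection(population, train_and_evaluate, meets_condition,
                          breed, max_generations=50):
    """Each round: evaluate every feature group, keep the qualifying parents,
    and breed a new population from them; stop after max_generations."""
    parents = []
    for generation in range(max_generations):
        scored = [(group, train_and_evaluate(group)) for group in population]
        parents = [g for g, params in scored if meets_condition(params)] or \
                  [max(scored, key=lambda s: s[1])[0]]  # keep the best if none qualify
        # breed applies crossover / mutation / replacement / retention.
        population = breed(parents)
    return parents  # the parent feature groups of the last iteration
```

A trivial usage with integers in place of feature groups: `run_feature_selection([1, 2, 3, 4], lambda g: g, lambda p: p >= 3, lambda ps: ps, max_generations=3)` converges to `[3, 4]`.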
S400, under the condition that the iteration ending condition is met, based on the parent characteristic variable group, obtaining a plurality of selected characteristic groups.
Specifically, when the iteration end condition is satisfied, the iteration ends, and a plurality of selected feature groups are obtained based on the parent feature variable groups of the last iteration. The parent feature variable groups of the last iteration may be used directly as the selected feature groups; alternatively, higher-quality feature variable groups may be further screened out of those parent feature variable groups and used as the selected feature groups.
S500, training the machine learning models corresponding to the selected multiple groups of feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
Specifically, since the selected plurality of feature groups are high-quality feature variable groups, the machine learning model corresponding to the selected plurality of feature groups can be regarded as a high-quality machine learning model, and the machine learning model corresponding to the selected plurality of feature groups is trained to obtain a trained target machine learning model, which is a high-quality trained machine learning model. Therefore, the object to be evaluated can be evaluated more accurately based on the target machine learning model.
In the object evaluation method, a plurality of feature variable groups are generated based on candidate features, and each feature variable group comprises a plurality of features selected from the candidate features; candidate features are several features for object assessment; training machine learning models corresponding to a plurality of characteristic variable groups, obtaining evaluation parameters corresponding to the characteristic variable groups based on training results, and screening parent characteristic variable groups meeting object evaluation conditions from the characteristic variable groups based on the evaluation parameters; under the condition that iteration ending conditions are not met, carrying out genetic processing on the parent characteristic variable group to obtain a plurality of updated characteristic variable groups, and returning to the step of training the machine learning model of the characteristic variable groups; under the condition that the iteration ending condition is met, based on the parent characteristic variable group, obtaining a plurality of selected characteristic groups; training the machine learning models corresponding to the selected multiple groups of characteristic groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model. 
In the whole process, the parent feature variable groups are first screened accurately using the evaluation parameters obtained after training the machine learning models corresponding to the feature variable groups. When the iteration end condition is not met, genetic processing enriches the parent feature variable groups and thus enlarges the scope of subsequent screening. Iterative screening of the updated parent feature variable groups then accurately yields the finally selected feature groups, and the target machine learning model obtained by training the models corresponding to those selected feature groups realizes accurate object evaluation.
In some embodiments, generating a plurality of feature variable sets based on the candidate features includes:
acquiring the number of candidate features and target features; based on the candidate features, a plurality of feature variable sets are randomly generated.
The target feature quantity is the feature quantity in each feature variable group, that is, the feature quantity in each feature variable group is the same as the target feature quantity.
Specifically, let the number of candidate features be m and the number of target features be n. Given the candidate feature list (x1, x2, …, xm), n feature variables are randomly selected from the list each time, and each selection yields a feature variable group. A feature variable group may be characterized as a solution; for example, a certain feature variable group may be (1, 0, 1, …, 1), where the positions of the selected features are 1, the others are 0, and the total number of 1s is n.
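Generating an initial population of such n-of-m masks might look as follows; the function names are hypothetical, and `random.sample` guarantees n distinct positions so each mask sums to exactly n.

```python
import random

def random_fixed_mask(m: int, n: int) -> list[int]:
    """Build a 0/1 mask over m candidates with exactly n ones (n selected features)."""
    positions = random.sample(range(m), n)  # n distinct indices, without replacement
    mask = [0] * m
    for p in positions:
        mask[p] = 1
    return mask

def initial_population(size: int, m: int, n: int) -> list[list[int]]:
    """Each individual of the initial population is one n-of-m feature mask."""
    return [random_fixed_mask(m, n) for _ in range(size)]

pop = initial_population(size=10, m=8, n=3)
assert all(sum(ind) == 3 for ind in pop)
```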
It will be appreciated that each set of feature variables may be considered as an individual based on model training combined with genetic algorithms for feature selection, that is, randomly generating a plurality of sets of feature variables, that is, randomly generating a plurality of individuals, and generating an initial population based on the plurality of individuals.
In this embodiment, fixing the number of target features reduces the interference that differing feature counts would cause in training the machine learning models; that is, it avoids large differences in the evaluation parameters of the feature variable groups that stem merely from different amounts of feature data, so the evaluation results are accurate and a more accurate feature selection process can be executed.
In some embodiments, as shown in fig. 3, S200 includes:
S210, training machine learning models corresponding to the characteristic variable groups.
Specifically, a training set for training the machine learning models is obtained, together with a validation set and a test set for verifying the performance of the trained models. There is at least one machine learning model, and each machine learning model corresponds to a different feature variable group. The machine learning model corresponding to each feature variable group is trained on the training set to obtain a trained machine learning model. Further, after the machine learning models corresponding to the feature variable groups are trained, training results can be obtained, where the training results include the prediction results of the feature variable groups on the training data.
S220, processing a preset verification set by adopting each trained machine learning model to obtain a verification result, and obtaining first evaluation parameters of each trained machine learning model based on the verification result and the training result.
Specifically, on the basis of each trained machine learning model, the preset verification set is processed by each trained model to obtain its predicted output on the verification set, namely the verification result. By combining the training result and the verification result, the performance of each trained model during training and verification is compared to obtain its first evaluation parameter. In practical applications, the first evaluation parameters include, but are not limited to, AUC (Area Under the ROC Curve), PSI (Population Stability Index), etc.
S230, processing a preset test set by adopting each trained machine learning model to obtain a test result, and obtaining second evaluation parameters of each trained machine learning model based on the test result and the training result.
Specifically, on the basis of obtaining each trained machine learning model, processing a preset test set by adopting each trained machine learning model to obtain a predicted output result of the preset test set, namely a test result, and comparing the performance of each trained machine learning model in a training process and a testing process by integrating the training result and the test result to obtain a second evaluation parameter of each trained machine learning model. In practical applications, the second evaluation parameters include, but are not limited to, AUC, PSI, etc.
S240, combining the first evaluation parameters and the second evaluation parameters to obtain evaluation parameters of each characteristic variable group corresponding to each trained machine learning model.
Specifically, all the evaluation parameters are summarized to obtain the evaluation parameters of each trained machine learning model, and because each trained machine learning model corresponds to different characteristic variable groups, the evaluation parameters of each trained machine learning model can be used as the evaluation parameters of each corresponding characteristic variable group.
S250, based on the evaluation parameters, the parent characteristic variable group meeting the object evaluation conditions is screened out from the characteristic variable group.
Specifically, a characteristic variable group with the evaluation parameters meeting the evaluation conditions of the preset object is selected from the characteristic variable groups and used as a parent characteristic variable group.
In this embodiment, each trained machine learning model is verified with the preset verification set and the preset test set to obtain output results on different data. Comparing these outputs with the training results obtained during training evaluates the performance of the trained models, so that accurate evaluation parameters of each trained model are obtained and taken as the evaluation parameters of the corresponding feature variable groups.
In some embodiments, the first evaluation parameter includes a first stability indicator and a first feature importance indicator; the second evaluation parameter includes a second stability index and a second feature importance index.
The feature importance index in the application is the AUC index; the larger the AUC, the better the model corresponding to the feature variable group performs. AUC is the area under the ROC (Receiver Operating Characteristic) curve. Colloquially, AUC is the probability that, given a randomly chosen positive sample and a randomly chosen negative sample, the classifier assigns the positive sample a higher score than the negative sample.
The abscissa of the ROC curve is the false positive rate (also called the false positive class rate) and the ordinate is the true positive rate (true class rate); correspondingly there are also the true negative rate and the false negative rate. The four indexes are calculated as follows:
1) False positive rate: the probability that a negative example is determined to be a positive example.
2) True positive rate: the probability that a positive example is determined to be a positive example.
3) False negative rate: the probability that a positive example is determined to be a negative example.
4) True negative rate: the probability that a negative example is determined to be a negative example.
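The four rates follow directly from the confusion-matrix counts; a minimal sketch (the function name is hypothetical):

```python
def confusion_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """The four ROC-related rates from confusion-matrix counts:
    FPR = FP/(FP+TN), TPR = TP/(TP+FN), FNR = FN/(TP+FN), TNR = TN/(FP+TN)."""
    return {
        "false_positive_rate": fp / (fp + tn),
        "true_positive_rate": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
        "true_negative_rate": tn / (fp + tn),
    }

rates = confusion_rates(tp=40, fp=10, tn=40, fn=10)
# TPR and FNR sum to 1, as do FPR and TNR.
assert rates["true_positive_rate"] + rates["false_negative_rate"] == 1.0
```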
The stability index reflects how consistent the distribution of the verification/test set over each bin is with the distribution of the modeling sample, where the training result is taken as the expected distribution and the output results on the test set and the verification set are taken as the actual distributions.
Specifically, there are many methods for screening the parent feature variable group satisfying the condition from the feature variable group, that is, there are many types of evaluation parameters. The first evaluation parameter comprises a first stability index and a first characteristic importance index; the second evaluation parameter includes a second stability index and a second feature importance index, and the method for determining the first evaluation parameter and the second evaluation parameter includes:
In one aspect, the computer device bins the training results based on the training results and obtains a first distribution duty cycle of the data in each bin.
The preset verification set is processed by each trained machine learning model to obtain a first predicted output for the verification set, which is binned in the same way as the training result to obtain the second distribution ratio of the data in each bin. The first distribution ratio is compared with the second distribution ratio to determine the first stability index: when the two ratios differ little, the first stability index is good (small); when they differ greatly, the first stability index is poor (large).
Likewise, the preset test set is processed by each trained machine learning model to obtain a second predicted output for the test set, which is binned in the same way as the training result to obtain the third distribution ratio of the data in each bin. The first distribution ratio is compared with the third distribution ratio to determine the second stability index: when the two ratios differ little, the second stability index is good; when they differ greatly, the second stability index is poor.
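The binning comparison just described corresponds to the standard PSI computation. A sketch under the assumption of equal-width bins over the training scores (the application does not fix a particular binning scheme):

```python
import math

def psi(expected_scores, actual_scores, n_bins=10):
    """Population Stability Index: bin the expected (training) scores into
    equal-width bins, reuse the same bin edges for the actual
    (validation/test) scores, and compare the per-bin distribution ratios."""
    lo, hi = min(expected_scores), max(expected_scores)
    edges = [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]

    def ratios(scores):
        counts = [0] * n_bins
        for s in scores:
            b = sum(s > e for e in edges)  # index of the bin containing s
            counts[b] += 1
        eps = 1e-6  # floor empty bins to avoid log(0)
        return [max(c / len(scores), eps) for c in counts]

    exp_r, act_r = ratios(expected_scores), ratios(actual_scores)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp_r, act_r))

# Identical distributions give PSI ~= 0; PSI < 0.1 is commonly read as stable.
assert psi([i / 100 for i in range(100)], [i / 100 for i in range(100)]) < 1e-9
```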
On the other hand, each trained machine learning model processes the preset verification set to obtain the verification result. Based on the verification result, the score P1 that the model assigns to a positive sample of the verification set and the score P2 that it assigns to a negative sample are obtained, and the probability p1 that P1 is greater than P2 is taken as the first feature importance index, i.e., the AUC on the verification set.
Similarly, each trained machine learning model processes the preset test set to obtain the test result. Based on the test result, the score P3 assigned to a positive sample of the test set and the score P4 assigned to a negative sample are obtained, and the probability p2 that P3 is greater than P4 is taken as the second feature importance index, i.e., the AUC on the test set.
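This probabilistic reading of AUC can be computed directly by comparing every positive score against every negative score (a sketch; ties are counted as half, a common convention the application does not specify):

```python
def pairwise_auc(pos_scores, neg_scores):
    """AUC as the probability that a randomly drawn positive sample is scored
    higher than a randomly drawn negative one (ties count half)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# A model that scores every positive above every negative has AUC 1.0.
assert pairwise_auc([0.9, 0.8], [0.3, 0.1]) == 1.0
```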
Based on the evaluation parameters, selecting a parent characteristic variable group meeting object evaluation conditions from the characteristic variable groups, wherein the method comprises the following steps:
And screening out of the feature variable groups the parent feature variable groups whose first stability index is smaller than a preset first stability threshold, whose second stability index is smaller than a preset second stability threshold, whose absolute difference between the first feature importance index and the second feature importance index is smaller than a preset first feature importance threshold, and whose first feature importance index is greater than or equal to a preset second feature importance threshold.
The preset first stability threshold, preset second stability threshold, preset first feature importance threshold, and preset second feature importance threshold are all user-defined. The first and second stability thresholds may be the same or different, and the first and second feature importance thresholds may likewise be the same or different.
Specifically, the object evaluation conditions include:
1) The first stability index is smaller than a preset first stability threshold;
2) The second stability index is smaller than a preset second stability threshold;
3) The absolute value of the difference between the first characteristic importance index and the second characteristic importance index is smaller than a preset first characteristic importance threshold;
4) The first characteristic importance index is greater than or equal to a preset second characteristic importance index threshold.
Furthermore, the object evaluation condition may further include: the features in the feature variable group corresponding to the machine learning model are positive.
And screening the characteristic variable group meeting the object evaluation condition from the characteristic variable groups, and taking the characteristic variable group meeting the object evaluation condition as a parent characteristic variable group.
For example, a set of feature variables having a first stability index of less than 0.1, a second stability index of less than 0.1, an absolute value of a difference between the first feature importance index and the second feature importance index of less than 0.02, the first feature importance index being greater than or equal to 0.8, and positive features in the set of feature variables may be selected as the parent set of feature variables.
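The screening rule with the example thresholds above can be sketched as a single predicate; the dictionary keys and function name are hypothetical, and the sign-of-features check is omitted since it depends on the model's coefficients.

```python
def meets_object_evaluation_condition(params,
                                      stability_threshold_1=0.1,
                                      stability_threshold_2=0.1,
                                      importance_gap_threshold=0.02,
                                      importance_threshold=0.8):
    """Both stability indexes below their thresholds, a small gap between the
    verification and test feature importance indexes, and a sufficiently
    high first feature importance index."""
    return (params["first_stability"] < stability_threshold_1
            and params["second_stability"] < stability_threshold_2
            and abs(params["first_importance"] - params["second_importance"])
                < importance_gap_threshold
            and params["first_importance"] >= importance_threshold)

ok = {"first_stability": 0.05, "second_stability": 0.04,
      "first_importance": 0.85, "second_importance": 0.84}
assert meets_object_evaluation_condition(ok)
```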
In this embodiment, whether the feature variable group is a high-quality feature variable group is determined based on the first stability index, the second stability index, the first feature importance index, and the second feature importance index, so as to accurately select the feature variable group.
In some embodiments, after the parent feature variable groups are determined, they may further be sorted by the size of the first feature importance index, the parent feature variable groups ranked earlier being of higher quality.
In some embodiments, the set of parent feature variables is genetically processed, including one or more of the following:
First kind:
and performing cross substitution on the first target feature in the first feature variable group and the second target feature matched with the first target feature in the second feature variable group.
The first processing method is the crossover pairing method of a genetic algorithm: a crossover operation is applied to the selected parent individuals. Two different individuals are compared, and for one individual, distinct features from the other individual are introduced to replace its existing features, generating new individuals; this simulates the crossing and recombination of genes in nature. The first feature variable group and the second feature variable group are each any one of the parent feature variable groups, with the first differing from the second; the first target feature is any one feature of the first feature variable group.
Specifically, a pair of feature variable groups is randomly selected from the parent feature variable groups as the first feature variable group and the second feature variable group, and, since each feature variable group consists of a plurality of features, any feature of the first feature variable group is cross-substituted with the corresponding feature of the second. For example, when the first feature variable group is (1, 0, 1, 0, 1) and the second feature variable group is (0, 0, 1, 1, 1), cross-substituting the feature variable at the fourth position gives an updated first feature variable group (1, 0, 1, 1, 1) and an updated second feature variable group (0, 0, 1, 0, 1); the updated parent feature variable groups are obtained from the two updated groups. In practical applications, the first and second feature variable groups may be replaced simultaneously, or only one of them may be replaced.
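Single-position crossover on two mask-encoded parents can be sketched as follows (hypothetical function name; the position is chosen randomly when not given):

```python
import random

def crossover(parent_a, parent_b, position=None):
    """Swap the bit at one position between two parent masks,
    producing two children (single-point gene exchange)."""
    if position is None:
        position = random.randrange(len(parent_a))
    child_a, child_b = list(parent_a), list(parent_b)
    child_a[position], child_b[position] = parent_b[position], parent_a[position]
    return child_a, child_b

# Exchanging the fourth position (index 3) of two example parents.
a, b = crossover([1, 0, 1, 0, 1], [0, 0, 1, 1, 1], position=3)
assert a == [1, 0, 1, 1, 1] and b == [0, 0, 1, 0, 1]
```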
Further, the first target feature need not be a single feature; it may be a plurality of features, in which case the plurality of target features of the first feature variable group are cross-substituted with the matching second target features of the second feature variable group.
Second kind:
and updating the third target feature in the parent feature variable group to be the fourth target feature.
The second processing method is the mutation method of a genetic algorithm. Mutation refers to applying a mutation operation to a selected parent individual, that is, randomly selecting a new feature to replace an existing one; by introducing new gene combinations, mutation helps maintain the diversity of the population. The third target feature is any feature of the parent feature variable group and may be one feature or several features of that group. The fourth target feature may be any one feature, or several features, of the candidate features; the fourth target features equal the third target features in number and correspond to them one to one. Specifically, at least one feature of the parent feature variable group is randomly mutated; during mutation the remaining features of the group are kept unchanged and only the third target feature is updated, the concrete means of mutation being to update the third target feature with features randomly selected from the candidate features. For example, given a parent feature variable group (1, 0, …, 1, 0), two features are randomly obtained from the candidate features and two features of the parent group are updated, yielding the child feature variable group (1, 0, …, 0, 1); it can be seen that the features at the last two positions of the parent group have mutated.
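In the mask encoding, replacing a gene at a position can be modeled as flipping the bit at randomly chosen positions; this is one possible reading of the mutation described above (hypothetical function name), and it leaves all other positions of the parent untouched.

```python
import random

def mutate(parent, n_mutations=1):
    """Flip n randomly chosen positions of the mask; all other
    positions of the parent are left unchanged."""
    child = list(parent)
    for pos in random.sample(range(len(child)), n_mutations):
        child[pos] = 1 - child[pos]  # replace the gene at this position
    return child

child = mutate([1, 0, 1, 0], n_mutations=1)
# Exactly one position differs from the parent.
assert sum(x != y for x, y in zip(child, [1, 0, 1, 0])) == 1
```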
Third kind:
acquiring a candidate characteristic variable group; and replacing the third characteristic variable group by using the candidate characteristic variable group.
The third processing method is the replacement method of a genetic algorithm; the third feature variable group is at least one of the parent feature variable groups. The candidate feature variable group is a newly and randomly generated feature variable group, and the number of candidate feature variable groups equals the number of third feature variable groups. Specifically, since the feature variable groups generated at the outset do not exhaust all possible combinations, further feature variable groups can be randomly generated later, and during genetic replacement a newly generated feature variable group is introduced to replace any one feature variable group of the original parents. For example, if the third feature variable group is (1, 0, …, 1, 0) and the randomly generated candidate feature variable group is (1, 0, …, 1), replacing the third group with the candidate group gives the updated third feature variable group (1, 0, …, 1).
Fourth kind:
the parent set of feature variables is maintained.
The fourth processing method is a retention method in a genetic algorithm.
Specifically, when retaining parent feature variable groups, all of the parent feature variable groups may be retained, or some of them may be randomly selected for retention. That is, retention means that the features in the parent feature variable group are left unchanged.
In this embodiment, performing cross pairing, replacement, mutation, retention and other processing on the parent feature variable groups fully simulates natural inheritance and introduces new feature variable combinations, thereby enriching the parent feature variable groups. Because genetic processing is applied only to parent feature variable groups that satisfy the conditions, the updated parent feature variable groups also consist of better-performing features, so the feature variable groups are continuously optimized during iteration and accurate feature selection is achieved. Moreover, the genetic algorithm replaces manual decision-making about the model by algorithm engineers, saving resources and improving the efficiency of feature variable selection.
In some embodiments, replacing the third set of feature variables with the candidate set of feature variables includes:
Randomly generating at least one candidate feature variable set based on the candidate features; and sequentially replacing a third characteristic variable group in the parent characteristic variable group by adopting the candidate characteristic variable group.
Specifically, during replacement, candidate feature variable groups for replacement are generated based on the candidate features; from the generated candidate feature variable groups, those that initially fail to satisfy the object evaluation condition are filtered out, yielding the updated candidate feature variable groups. The candidate feature variable groups are then used, in turn, to replace the third feature variable groups among the parent feature variable groups. For example, when there is only one candidate feature variable group, it directly replaces a randomly chosen third feature variable group among the parent feature variable groups; when there is more than one, third feature variable groups matching the number of candidate feature variable groups are randomly determined, and the candidate feature variable groups replace them in sequence.
In this embodiment, the candidate feature variable groups are randomly generated based on the candidate features, so the feature variable groups obtained after replacement are still drawn from the candidate features and do not exceed the preset feature range; the feature variable groups are therefore more accurate, enabling accurate feature selection.
In some embodiments, cross pairing, replacement, mutation, retention and other genetic processing are applied to the parent feature variable groups to obtain a plurality of updated feature variable groups, which include parent feature variable groups after cross pairing, parent feature variable groups after replacement, parent feature variable groups after mutation, and the original (retained) parent feature variable groups. The composition of the updated feature variable groups can be controlled by defining the proportion contributed by each of the four genetic processing methods.
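The proportion control mentioned above can be sketched as follows (the ratios are illustrative assumptions, not values from the text): the population size is divided among the four genetic operations, with rounding leftovers assigned to retention.

```python
def plan_operations(pop_size, ratios):
    """Split the population among the genetic operations according to the
    given proportions; rounding leftovers go to the retained parents."""
    counts = {op: round(pop_size * r) for op, r in ratios.items()}
    counts["keep"] += pop_size - sum(counts.values())  # absorb remainder
    return counts
```

For example, `plan_operations(10, {"cross": 0.4, "mutate": 0.3, "replace": 0.2, "keep": 0.1})` assigns 4 groups to crossover, 3 to mutation, 2 to replacement and retains 1.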
In some embodiments, the iteration end condition includes the number of iterations reaching a preset count threshold, or the third feature importance index of each parent feature variable group satisfying a preset index condition. Specifically, the third feature importance index is the AUC of the machine learning model corresponding to a parent feature variable group obtained during iteration; iteration can be terminated when the highest AUC remains unchanged over several consecutive iterations, or when the iteration count reaches the preset threshold.
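The end condition described above can be sketched as a simple predicate (illustrative; `patience` is an assumed parameter name for "several consecutive iterations"):

```python
def should_stop(best_auc_history, max_iters, patience=5):
    """Return True when the iteration count reaches max_iters, or the
    highest AUC has not improved for `patience` consecutive iterations."""
    if len(best_auc_history) >= max_iters:
        return True
    if len(best_auc_history) > patience:
        recent = best_auc_history[-(patience + 1):]
        # no AUC in the last `patience` iterations beats the earlier best
        return max(recent[1:]) <= recent[0]
    return False
```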
In some embodiments, obtaining the selected plurality of feature groups based on the parent feature variable groups when the iteration end condition is satisfied includes: when the iteration end condition is satisfied, judging whether the parent feature variable groups from the last iteration satisfy the object evaluation condition, screening out the feature variable groups that satisfy it, sorting those feature variable groups by their AUC indexes, and taking the top-ranked N feature variable groups as the finally selected plurality of feature groups.
In a specific embodiment, taking the machine learning model to be a scorecard model as an example, assume the number of candidate variables is m and the number of target features actually required is n, where m ≥ n. A training set, a test set, and an out-of-time verification set are obtained.
1. Initializing a population phase.
Given a candidate variable list (x1, x2, x3, …, xm), each combination of n features is encoded as a feature variable group, i.e., an m-dimensional 0/1 vector such as (1, 0, 1, …, 1), where selected feature positions are 1, unselected positions are 0, and the number of 1s is n.
P characteristic variable groups are randomly generated, and are used as a population of a genetic algorithm, wherein each characteristic variable group is an individual.
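The initialization stage can be sketched as follows (illustrative; P individuals, each an m-dimensional 0/1 vector with exactly n ones):

```python
import random

def init_population(m, n, p):
    """Randomly generate p feature variable groups: m-dimensional
    0/1 vectors, each with exactly n selected features."""
    population = []
    for _ in range(p):
        vec = [0] * m
        for i in random.sample(range(m), n):  # choose n of the m positions
            vec[i] = 1
        population.append(vec)
    return population
```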
2. And (5) an evaluation stage.
Training the machine learning models corresponding to the characteristic variable groups, obtaining evaluation parameters corresponding to the characteristic variable groups based on training results, and screening parent characteristic variable groups with evaluation parameters meeting conditions from the characteristic variable groups.
The evaluation parameters include: the AUC and PSI corresponding to the verification set, the AUC and PSI corresponding to the test set, and the signs (positive or negative) of the features in the feature variable group.
The conditions the evaluation parameters must satisfy include: the AUC corresponding to the verification set is ≥ 0.8; the PSI corresponding to the test set and the PSI corresponding to the verification set are both smaller than 0.1; the absolute value of the difference between the AUC of the verification set and the AUC of the test set is smaller than 0.02; and the features in the feature variable group are all positive.
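These screening conditions can be written directly as a predicate (a sketch using the example thresholds above; function and argument names are assumptions):

```python
def meets_conditions(auc_valid, auc_test, psi_valid, psi_test, feature_signs):
    """Check the evaluation-parameter conditions: verification AUC >= 0.8,
    both PSIs < 0.1, |AUC difference| < 0.02, and all features positive."""
    return (auc_valid >= 0.8
            and psi_valid < 0.1
            and psi_test < 0.1
            and abs(auc_valid - auc_test) < 0.02
            and all(s > 0 for s in feature_signs))
```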
3. Parent feature variable group determination stage.
If feature variable groups whose evaluation parameters do not satisfy the conditions exist, those groups are filtered out and replaced with newly, randomly generated feature variable groups, after which the evaluation is performed again. If none of the initial random feature variable groups satisfies the evaluation conditions, all feature variable groups are directly regenerated at random.
And if the characteristic variable group with the evaluation parameters meeting the conditions exists, taking the characteristic variable group as a parent characteristic variable group.
4. And (5) an iteration stage.
Genetic processing means such as retention, crossover, replacement and mutation are applied to the parent feature variable groups to update them.
When the preset iteration termination condition is not met, genetic processing means such as selection, retention, crossover, replacement and mutation are executed repeatedly; individuals in the population are gradually optimized, and more individuals (feature variable groups) satisfying the evaluation parameters emerge.
Iteration stops when the iteration count reaches the preset number of iterations, or when the highest AUC no longer changes over several consecutive iterations.
5. And a feature selection stage.
The parent feature variable groups from the last iteration are obtained; from them, the feature variable groups whose evaluation parameters satisfy the object evaluation condition are selected, and from those, several feature variable groups with the largest AUC are chosen.
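The feature selection stage above reduces to filtering and sorting (an illustrative sketch; `passes_condition` stands in for the object evaluation condition):

```python
def select_top_groups(last_generation, aucs, passes_condition, top_n):
    """Keep the groups from the last iteration that satisfy the object
    evaluation condition, sort by AUC (descending), return the top N."""
    qualified = [(auc, grp) for grp, auc in zip(last_generation, aucs)
                 if passes_condition(grp)]
    qualified.sort(key=lambda t: t[0], reverse=True)
    return [grp for _, grp in qualified[:top_n]]
```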
It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by arrows, these steps are not necessarily performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times; the order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an object evaluation device for realizing the above-mentioned object evaluation method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the object assessment device or devices provided below may refer to the limitation of the object assessment method hereinabove, and will not be repeated here.
In some embodiments, as shown in fig. 4, there is provided an object evaluation apparatus including: a feature variable group generation module 100, a first screening module 200, an iteration module 300, a second screening module 400, and an application module 500, wherein:
A feature variable group generating module 100, configured to generate a plurality of feature variable groups based on candidate features, where each feature variable group includes a number of features selected from the candidate features; candidate features are several features for object assessment;
The first screening module 200 is configured to train machine learning models corresponding to a plurality of feature variable groups, obtain evaluation parameters corresponding to each feature variable group based on training results, and screen parent feature variable groups satisfying object evaluation conditions from the feature variable groups based on the evaluation parameters;
The iteration module 300 is configured to perform genetic processing on the parent feature variable set under the condition that the iteration end condition is not satisfied, obtain a plurality of updated feature variable sets, and return to the step of training the machine learning model of the plurality of feature variable sets;
A second screening module 400, configured to obtain, based on the parent feature variable set, a selected plurality of feature sets if the iteration end condition is satisfied;
the application module 500 is configured to train the machine learning models corresponding to the selected multiple feature sets to obtain a target machine learning model, and evaluate the object to be evaluated based on the target machine learning model.
In some embodiments, the feature variable group generation module is further configured to: acquiring the number of candidate features and target features; based on the candidate features, a plurality of feature variable groups are randomly generated, wherein the feature quantity in each feature variable group is the same as the target feature quantity.
In some embodiments, the first screening module 200 is further configured to: processing a preset verification set by adopting each trained machine learning model to obtain a verification result, and obtaining first evaluation parameters of each trained machine learning model based on the verification result and the training result; processing a preset test set by adopting each trained machine learning model to obtain a test result, and obtaining second evaluation parameters of each trained machine learning model based on the test result and the training result; and combining the first evaluation parameters with the second evaluation parameters to obtain evaluation parameters of each characteristic variable group corresponding to each trained machine learning model.
In some embodiments, the first evaluation parameter includes a first stability indicator and a first feature importance indicator; the second evaluation parameter includes a second stability indicator and a second feature importance indicator; the first screening module 200 is further configured to: screen, from the feature variable groups, parent feature variable groups whose first stability index is smaller than a preset first stability threshold, whose second stability index is smaller than a preset second stability threshold, for which the absolute value of the difference between the first feature importance index and the second feature importance index is smaller than a preset first feature importance threshold, and whose first feature importance index is greater than or equal to a preset second feature importance index threshold.
In some embodiments, the iteration module 300 is further configured to: first kind: cross-replacing a first target feature in the first feature variable group with a second target feature matched with the first target feature in the second feature variable group; the first characteristic variable group and the second characteristic variable group are any one characteristic variable group in the parent characteristic variable group, and the first characteristic variable group is different from the second characteristic variable group; the first target feature is any one feature in the first feature variable group; second kind: updating the third target feature in the parent feature variable group to be a fourth target feature; the third target feature is any one feature in the parent feature variable group; the fourth target feature is any one feature in the candidate variables; third kind: acquiring a candidate characteristic variable group; replacing a third characteristic variable group by adopting a candidate characteristic variable group; the third characteristic variable group is at least one characteristic variable group in the parent characteristic variable group, and the number of candidate characteristic variable groups is the same as that of the third characteristic variable group; fourth kind: the parent set of feature variables is maintained.
In some embodiments, the iteration module 300 is further configured to: randomly generating at least one candidate feature variable set based on the candidate features; and sequentially replacing a third characteristic variable group in the parent characteristic variable group by adopting the candidate characteristic variable group.
The respective modules in the above-described object evaluation apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing candidate characteristics and other data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement an object assessment method.
In some embodiments, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an object assessment method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 5 or 6 are merely block diagrams of portions of structures associated with aspects of the application and are not intended to limit the computer device to which aspects of the application may be applied, and that a particular computer device may include more or fewer components than those shown, or may combine certain components, or may have a different arrangement of components.
In some embodiments, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in various forms such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), etc. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.
Claims (15)
1. An object assessment method, the method comprising:
generating a plurality of feature variable sets based on candidate features, each feature variable set comprising a number of features selected from the candidate features; the candidate features are a number of features for object assessment;
Training the machine learning models corresponding to the characteristic variable groups, obtaining evaluation parameters corresponding to the characteristic variable groups based on training results, and screening parent characteristic variable groups meeting object evaluation conditions from the characteristic variable groups based on the evaluation parameters;
Under the condition that iteration ending conditions are not met, carrying out genetic processing on the parent characteristic variable group to obtain a plurality of updated characteristic variable groups, and returning to the step of training machine learning models of the characteristic variable groups;
Under the condition that the iteration ending condition is met, based on the parent characteristic variable group, obtaining a plurality of selected characteristic groups;
Training the machine learning models corresponding to the selected multiple groups of feature groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
2. The method of claim 1, wherein generating a plurality of feature variable sets based on candidate features comprises:
acquiring the number of candidate features and target features;
and randomly generating a plurality of feature variable groups based on the candidate features, wherein the feature quantity in each feature variable group is the same as the target feature quantity.
3. The method according to claim 1, wherein the obtaining, based on the training result, the evaluation parameter corresponding to each of the feature variable groups includes:
Processing a preset verification set by adopting each trained machine learning model to obtain a verification result, and obtaining a first evaluation parameter of each trained machine learning model based on the verification result and the training result;
processing a preset test set by adopting each trained machine learning model to obtain a test result, and obtaining a second evaluation parameter of each trained machine learning model based on the test result and the training result;
and combining the first evaluation parameters with the second evaluation parameters to obtain evaluation parameters of the feature variable groups corresponding to the trained machine learning models.
4. A method according to claim 3, wherein the first evaluation parameter comprises a first stability indicator and a first characteristic importance indicator; the second evaluation parameter comprises a second stability index and a second characteristic importance index;
The step of screening the parent characteristic variable group meeting the object evaluation condition from the characteristic variable group based on the evaluation parameters comprises the following steps:
screening, from the feature variable groups, parent feature variable groups for which the first stability index is smaller than a preset first stability threshold, the second stability index is smaller than a preset second stability threshold, the absolute value of the difference between the first feature importance index and the second feature importance index is smaller than a preset first feature importance threshold, and the first feature importance index is greater than or equal to a preset second feature importance index threshold.
5. The method of claim 1, wherein said genetically manipulating said set of parent feature variables comprises one or more of the following:
First kind:
Cross-replacing a first target feature in a first feature variable group and a second target feature matched with the first target feature in a second feature variable group; the first characteristic variable group and the second characteristic variable group are any one characteristic variable group in the parent characteristic variable group, and the first characteristic variable group is different from the second characteristic variable group; the first target feature is any feature in the first feature variable group;
Second kind:
Updating the third target feature in the parent feature variable group to be a fourth target feature; the third target feature is any one feature in the parent feature variable group; the fourth target feature is any one feature in the candidate variables;
third kind:
Acquiring a candidate characteristic variable group;
replacing a third characteristic variable group by adopting the candidate characteristic variable group; the third characteristic variable group is at least one characteristic variable group in the parent characteristic variable group, and the number of the candidate characteristic variable groups is the same as that of the third characteristic variable group;
fourth kind:
The set of parent feature variables is maintained.
6. The method of claim 5, wherein replacing the third set of feature variables with the set of candidate feature variables comprises:
Randomly generating at least one candidate feature variable set based on the candidate features;
And sequentially replacing a third characteristic variable group in the parent characteristic variable group by adopting the candidate characteristic variable group.
7. An object assessment apparatus, the apparatus comprising:
the characteristic variable group generation module is used for generating a plurality of characteristic variable groups based on candidate characteristics, wherein each characteristic variable group comprises a plurality of characteristics selected from the candidate characteristics; the candidate features are a number of features for object assessment;
The first screening module is used for training the machine learning models corresponding to the characteristic variable groups, obtaining evaluation parameters corresponding to the characteristic variable groups based on training results, and screening parent characteristic variable groups meeting object evaluation conditions from the characteristic variable groups based on the evaluation parameters;
The iteration module is used for carrying out genetic processing on the parent characteristic variable group under the condition that the iteration ending condition is not met, obtaining a plurality of updated characteristic variable groups, and returning to the step of training the machine learning models of the characteristic variable groups;
The second screening module is used for obtaining a plurality of selected characteristic groups based on the parent characteristic variable groups under the condition that the iteration ending condition is met;
and the application module is used for training the machine learning models corresponding to the selected multiple groups of characteristic groups to obtain a target machine learning model, and evaluating the object to be evaluated based on the target machine learning model.
8. The apparatus of claim 7, wherein the feature variable group generation module is further configured to: acquiring the number of candidate features and target features; and randomly generating a plurality of feature variable groups based on the candidate features, wherein the feature quantity in each feature variable group is the same as the target feature quantity.
9. The apparatus of claim 7, wherein the first screening module is further configured to: processing a preset verification set by adopting each trained machine learning model to obtain a verification result, and obtaining a first evaluation parameter of each trained machine learning model based on the verification result and the training result; processing a preset test set by adopting each trained machine learning model to obtain a test result, and obtaining a second evaluation parameter of each trained machine learning model based on the test result and the training result; and combining the first evaluation parameters with the second evaluation parameters to obtain evaluation parameters of the feature variable groups corresponding to the trained machine learning models.
10. The apparatus of claim 9, wherein the first evaluation parameter comprises a first stability index and a first feature importance index, and the second evaluation parameter comprises a second stability index and a second feature importance index; the first screening module is further configured to: screen, from the feature variable groups, parent feature variable groups for which the first stability index is smaller than a preset first stability threshold, the absolute value of the difference between the first and second stability indexes is smaller than a preset second stability threshold, the absolute value of the difference between the first and second feature importance indexes is smaller than a preset first feature importance threshold, and the first feature importance index is greater than or equal to a preset second feature importance threshold.
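Claim 10's four conditions, written out under assumed names (the field names and threshold parameters are illustrative, not from the claim):

```python
def is_parent(group, s1_thr, s2_thr, imp_diff_thr, imp_floor):
    """Keep a feature variable group only if all four claim-10 conditions hold."""
    return (group["stab1"] < s1_thr                            # first stability index small
            and abs(group["stab1"] - group["stab2"]) < s2_thr  # stability indexes agree
            and abs(group["imp1"] - group["imp2"]) < imp_diff_thr  # importances agree
            and group["imp1"] >= imp_floor)                    # importance floor

def screen_parents(groups, **thresholds):
    return [g for g in groups if is_parent(g, **thresholds)]
```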
11. The apparatus of claim 7, wherein the iteration module is further configured to perform at least one of the following kinds of genetic processing: first kind: cross-exchange a first target feature in a first feature variable group with a second target feature matched with it in a second feature variable group, wherein the first feature variable group and the second feature variable group are any two different groups among the parent feature variable groups, and the first target feature is any feature in the first feature variable group; second kind: update a third target feature in a parent feature variable group to a fourth target feature, wherein the third target feature is any feature in that parent feature variable group and the fourth target feature is any one of the candidate features; third kind: acquire candidate feature variable groups and use them to replace third feature variable groups, wherein the third feature variable groups are at least one group among the parent feature variable groups and the number of candidate feature variable groups equals the number of third feature variable groups; fourth kind: keep the parent feature variable groups unchanged.
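The four kinds of genetic processing in claim 11 map onto the classic crossover, mutation, replacement, and retention operators. A sketch with illustrative signatures — how features are "matched" across groups and how an operator is chosen per round are assumptions, not specified by the claim:

```python
import random

def crossover(group_a, group_b, rng):
    """First kind: swap a feature of group_a with the matched feature of group_b."""
    a, b = list(group_a), list(group_b)
    i = rng.randrange(len(a))
    if b[i] not in a and a[i] not in b:  # avoid duplicate features after the swap
        a[i], b[i] = b[i], a[i]
    return a, b

def mutate(group, candidate_features, rng):
    """Second kind: replace one feature with a candidate feature not already used."""
    out = list(group)
    unused = [f for f in candidate_features if f not in out]
    if unused:
        out[rng.randrange(len(out))] = rng.choice(unused)
    return out

def replace_groups(parents, candidate_features, group_size, k, rng):
    """Third kind: substitute k parent groups with freshly generated candidate groups."""
    fresh = [sorted(rng.sample(candidate_features, group_size)) for _ in range(k)]
    return fresh + parents[k:]

# Fourth kind, retention, is the identity: the parent groups are kept unchanged.
```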
12. The apparatus of claim 11, wherein the iteration module is further configured to: randomly generate at least one candidate feature variable group based on the candidate features; and sequentially replace the third feature variable groups among the parent feature variable groups with the candidate feature variable groups.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.
14. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410372803.3A CN118364261A (en) | 2024-03-29 | 2024-03-29 | Object evaluation method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118364261A true CN118364261A (en) | 2024-07-19 |
Family
ID=91886190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410372803.3A Pending CN118364261A (en) | 2024-03-29 | 2024-03-29 | Object evaluation method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118364261A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||