CN115374874B

CN115374874B - A feature selection integration method and system for large-scale wind power grid connection

Info

Publication number: CN115374874B
Application number: CN202211055817.XA
Authority: CN
Inventors: 石访; 杜宗展; 赵昱臣; 张恒旭; 王谱宇; 郭全; 刘晓宁; 董振风; 田硕硕; 刘尊龙
Original assignee: Shandong University
Current assignee: Shandong University
Priority date: 2022-08-31
Filing date: 2022-08-31
Publication date: 2025-11-14
Anticipated expiration: 2042-08-31
Also published as: CN115374874A

Abstract

The invention belongs to the technical field of power systems and relates in particular to a feature selection integration method and system for large-scale wind power grid connection. The method comprises: obtaining multidimensional feature data containing wind power grid-connection response features; performing an incremental search on the obtained multidimensional feature data based on preset optimal correlation and redundancy evaluation criteria to construct multiple groups of nested candidate feature subsets; calculating the classification accuracy of the constructed nested candidate feature subsets; recording the candidate feature subset with the highest classification accuracy; and verifying the dimensionality of the recorded candidate feature subset to obtain the optimal candidate feature subset, thereby completing feature selection integration.

Description

Feature selection integration method and system containing large-scale wind power grid connection

Technical Field

The disclosure belongs to the technical field of power systems, and particularly relates to a feature selection integration method and system containing large-scale wind power grid connection.

Background

The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.

After wind power and other new energy units are connected, the power system exhibits characteristics that traditional synchronous units do not have: weak stability, low inertia response, weak disturbance rejection, low overload capacity, and strongly fluctuating output. These characteristics bring new risks and challenges to classical stability problems and markedly increase the difficulty of assessing and controlling transient power angle stability with traditional methods. At present, when a large amount of new energy is connected, traditional model-based analysis methods face complex and difficult modeling of new energy units and novel power electronic equipment, struggle to meet real-time requirements, and cannot easily construct a Lyapunov energy function for novel power electronic components on their own. A fast, accurate, online transient stability assessment method is therefore needed.

Big data technology offers strong learning capability and high computation speed. Using it to fully mine potentially useful information avoids problems such as the difficulty of modeling new energy units and provides a new approach to transient power angle stability assessment of power systems with a high proportion of new energy. From a data-driven point of view, the model structure and internal control logic need not be considered in detail; the measurement data are learned offline so that assessment can be carried out online. Meanwhile, the complex control strategies and numerous component parameters of new energy units, together with continuously improving grid measurement capabilities, objectively provide practical conditions for applying data-driven methods.

Selecting appropriate input features is the foundation of, and critical to, subsequent transient stability assessment using a machine learning model. If the original feature set constructed from all information such as generators and buses is used directly as input, two problems arise. On the one hand, the structural complexity and parameter count of the subsequent model increase greatly, which places a heavy burden on training; compared with the optimal feature subset, the training time may increase by tens of times, which hinders real-time application in the power grid and may even cause the curse of dimensionality. On the other hand, the constructed feature set is usually obtained by combining and calculating feature quantities according to manual experience and physical mechanisms, and deep implicit associations in the data cannot be identified manually, so a large amount of redundant information may exist among the features. This redundant information not only wastes valuable computing resources but can also reduce model accuracy. It is therefore important to select a representative feature subset from the original feature set, remove redundant information, and achieve optimal dimension reduction, so as to further improve the performance of subsequent algorithms.

The inventors are aware that current dimension reduction methods fall into two main types, feature extraction and feature selection. Feature extraction compresses the original data by mapping it into a new dimensional space, but the physical meaning of the features is lost in the process. Feature selection does not change the original expression of the features and reduces complexity only by screening, which makes it easier to explain subsequent classification results in terms of physical meaning.

Disclosure of Invention

To solve the above problems, the disclosure provides a feature selection integration method and system containing large-scale wind power grid connection, which processes the original feature set by means of feature selection to meet the requirement of rapid assessment.

According to some embodiments, a first scheme of the present disclosure provides a feature selection integration method including large-scale wind power grid connection, which adopts the following technical scheme:

a feature selection integration method containing large-scale wind power grid connection comprises the following steps:

Acquiring multidimensional characteristic data containing wind power grid-connected response characteristics;

Performing incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and constructing a plurality of groups of nested candidate feature subsets;

Calculating classification precision of the constructed multiple groups of nested candidate feature subsets;

And recording the candidate feature subset with the maximum classification precision, verifying the dimensionality of the recorded candidate feature subset to obtain an optimal candidate feature subset, and completing feature selection integration.

As a further technical limitation, the determining process of the preset optimal correlation and redundancy evaluation criterion is as follows:

Carrying out normalization pretreatment on the obtained multidimensional feature data containing wind power grid-connected response features, and dividing the treated multidimensional feature data into an experimental data set and a test data set;

respectively carrying out correlation criterion calculation and redundancy evaluation on the obtained experimental data set to obtain the most preferred feature subset;

Based on the obtained optimal feature subset, dimension reduction and classification precision are comprehensively considered, optimal correlation and redundancy evaluation are obtained, and preset optimal correlation and redundancy evaluation criteria are determined.
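As an illustration of the preprocessing step, the following Python sketch applies per-feature min-max scaling and then splits the samples into an experimental set and a test set. The disclosure does not name the normalization or the split ratio, so min-max scaling, the 30% test fraction, the fixed seed, and the function names are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1] (assumed normalization)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column -> 0
            for value, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


def split_experiment_test(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    test_fraction: float = 0.3,  # hypothetical ratio, not from the source
    seed: int = 0,
) -> Tuple[list, list, list, list]:
    """Shuffle sample indices and split them into experimental and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, exp_idx = indices[:n_test], indices[n_test:]
    return (
        [rows[i] for i in exp_idx],
        [labels[i] for i in exp_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )
```

The sketch normalizes before splitting because that is the order stated above; in practice the scaling statistics are often taken from the experimental set alone.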

Further, based on integrated fusion and comprehensive analysis, the T-test, the χ² test, the feature score based on the Relief algorithm, the filter feature selection algorithm Fisher-Score, the information gain ratio, the Kruskal-Wallis test, and the maximal information coefficient are selected as measurement indexes for the experimental data set. The T-test uses t-distribution theory to compare whether the difference between two groups of data is significant. The χ² test calculates the correlation between a feature and the class label to score its importance. The Relief algorithm scores features by calculating intra-class and inter-class sample distances. Fisher-Score selects features carrying more discriminative information according to the principle that the intra-class distance should be small and the inter-class distance large; the larger the Fisher-Score value, the more important the feature and the stronger its correlation with the class. The information gain ratio scores a feature by the rate of change of the information entropy before and after the feature is used; the larger the value, the more important the feature and the stronger its correlation with the class. The Kruskal-Wallis test scores features by testing whether differences exist between the feature distributions. The maximal information coefficient detects nonlinear correlation between features by searching for an optimal discretization and converting the mutual information value into a new measure.

Further, regarding redundancy among features, in order to balance correlation and redundancy and to overcome the bias of mutual information toward attributes that take more values, mutual information is improved by normalization, giving the normalized mutual information (NMI). In the NMI expression, H(x) and H(y) are the entropies of x and y, respectively; entropy measures the amount of information over the multiple states that an event may have, i.e., the expected value of the information content with respect to the event's probability distribution.

Further, in constructing the multiple groups of nested candidate feature subsets, the search is performed starting from different initial features; a quantile α is introduced to divide the weight between the correlation measure and the redundancy measure, and α is assigned iteratively to obtain the optimal feature sequences under different weights. These optimal feature sequences under different weights are the multiple groups of nested candidate feature subsets.

As a further technical limitation, a support vector machine is used to calculate the classification accuracy of the multiple groups of nested candidate feature subsets; the quality of each group is verified based on this classification accuracy, the nested candidate feature subset with the best classification performance is selected, and the candidate feature subset with the highest classification accuracy is thereby recorded.

As a further technical limitation, let X be a feature set formed by N features in total, S be a selected feature set, and F be a feature set to be selected, the specific steps for obtaining the optimal candidate feature subset are as follows:

(1) Set the selected feature set S to the empty set and the candidate feature set F to the full feature set, i.e., S → ∅, F → X;

(2) Calculate the correlation measure W(x_i; y) between each feature x_i ∈ F and the label y, and sort all features by W to form a new candidate feature set F′. Take the first k features of F′ in turn, each one serving as the initial feature of a separate search;

(3) Set the quantile α_t ∈ {0.1, 0.25, 0.5, 0.75, 0.9} (1 ≤ t ≤ T) that weights the correlation and redundancy measures. With x_i ∈ S_{m-1} and x_j ∈ X - S_{m-1} (m = 2, …, N), find in F′_{m-1} the feature x_vt^** that maximizes Φ(W(x_j; y), G(x_j; x_i), α_t), and let F′_{m-1} - {x_vt^**} → F′_m and S_{m-1} + {x_vt^**} → S_m;

(4) Repeat steps (2) and (3) until F′ is empty, obtaining for each initial feature a T×n matrix of mutually nested candidate feature subsets. The overall feature set combination is S = [S_1 S_2 … S_v … S_k]^T (1 ≤ v ≤ k);

(5) For the nested candidate feature subset group corresponding to S_v, verify the classification accuracy of each feature subset in turn using the SVM, and record the candidate feature subset whose accuracy has stabilized and is close to the global maximum, i.e., whose index j satisfies the corresponding threshold condition, together with its associated weight α_t^*;

(6) Obtain the k results in S in turn, recording for each the result obtained in step (5). Among the recorded candidate feature subsets that satisfy the threshold conditions, the one with the smaller dimension is the optimal candidate feature subset, where δ and δ′ both denote preset thresholds.
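Steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: Φ is taken as the weighted difference αW - (1-α)G given in the detailed description, the redundancy of a candidate is assumed to be its mean G against the already selected features, and the relevance vector, redundancy matrix, and function names are hypothetical inputs chosen for the example.

```python
from typing import Dict, List, Sequence, Tuple

ALPHAS = (0.1, 0.25, 0.5, 0.75, 0.9)  # quantile values listed in step (3)


def phi(w: float, g: float, alpha: float) -> float:
    """Weighted relevance minus weighted redundancy."""
    return alpha * w - (1.0 - alpha) * g


def incremental_search(
    relevance: Sequence[float],
    redundancy: Sequence[Sequence[float]],
    start: int,
    alpha: float,
) -> List[int]:
    """Greedily order all features, beginning with the feature `start`."""
    selected = [start]
    remaining = [j for j in range(len(relevance)) if j != start]
    while remaining:
        def score(j: int) -> float:
            # Assumption: redundancy is averaged over the selected set.
            mean_g = sum(redundancy[j][i] for i in selected) / len(selected)
            return phi(relevance[j], mean_g, alpha)

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def nested_candidates(
    relevance: Sequence[float],
    redundancy: Sequence[Sequence[float]],
    k: int,
    alphas: Sequence[float] = ALPHAS,
) -> Dict[Tuple[int, float], List[List[int]]]:
    """One nested family of subsets per (initial feature, alpha) pair."""
    ranked = sorted(range(len(relevance)), key=lambda j: relevance[j], reverse=True)
    families = {}
    for start in ranked[:k]:  # top-k features each act as the initial feature
        for alpha in alphas:
            order = incremental_search(relevance, redundancy, start, alpha)
            families[(start, alpha)] = [order[:m] for m in range(1, len(order) + 1)]
    return families


# Invented toy data: 4 features, feature 1 is highly redundant with feature 0.
relevance_demo = [0.9, 0.8, 0.3, 0.6]
redundancy_demo = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.5],
    [0.1, 0.1, 1.0, 0.1],
    [0.2, 0.5, 0.1, 1.0],
]
candidates = nested_candidates(relevance_demo, redundancy_demo, k=2)
print(candidates[(0, 0.5)][-1])  # full ordering when redundancy is weighted equally
print(candidates[(0, 0.9)][-1])  # full ordering when relevance dominates
```

Each value in `candidates` is a list of prefixes of one ordering, so every subset contains the previous one; steps (5) and (6) would then score these prefixes with a classifier.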

According to some embodiments, a second aspect of the present disclosure provides a feature selection integration system including large-scale wind power grid connection, which adopts the following technical scheme:

A feature selection integration system including large scale wind integration, comprising:

an acquisition module configured to acquire multidimensional feature data including wind power grid-connected response features;

the construction module is configured to perform incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and construct a plurality of groups of nested candidate feature subsets;

a computing module configured to compute classification accuracy of the constructed plurality of sets of nested candidate feature subsets;

And the selection integration module is configured to record the candidate feature subset with the largest classification precision, verify the dimension of the recorded candidate feature subset, obtain the optimal candidate feature subset and finish feature selection integration.

According to some embodiments, a third aspect of the present disclosure provides a computer-readable storage medium, which adopts the following technical solutions:

A computer readable storage medium having stored thereon a program which when executed by a processor performs the steps in a feature selection integration method with large scale wind integration according to the first aspect of the disclosure.

According to some embodiments, a fourth aspect of the present disclosure provides an electronic device, which adopts the following technical solutions:

an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the feature selection integration method comprising large scale wind integration according to the first aspect of the disclosure when the program is executed.

Compared with the prior art, the beneficial effects of the present disclosure are:

The disclosure adopts an improved MMRMR integrated selection method. Multiple MMRMR algorithms are derived by introducing multiple correlation and redundancy evaluation criteria, and integrated fusion is achieved through corresponding strategies. At the same time, weight factors are introduced into the evaluation function and the initial feature search space is expanded to obtain a group of nested candidate feature subsets. A learning algorithm then verifies these subsets one by one to obtain the optimal feature subset, the superiority of the selected optimal feature subset is verified, and the requirement of rapid assessment is met.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.

FIG. 1 is a flow chart of a feature selection integration method including large scale wind integration in accordance with an embodiment of the present disclosure;

FIG. 2 is a flowchart of the algorithm by which the integrated MMRMR selection framework determines the optimal evaluation criteria in one embodiment of the present disclosure;

FIG. 3 is a flow chart of an improved incremental search algorithm in accordance with an embodiment of the present disclosure;

FIG. 4 is a schematic diagram showing a comparison of different MMRMR feature selection effects in an integrated framework according to one embodiment of the present disclosure;

Fig. 5 (a) is a comparison diagram of the feature selection effect of mRMR-R-NMI at different quantiles α when k=3 in the first embodiment of the present disclosure;

Fig. 5 (b) is a comparison diagram of the feature selection effect of mRMR-R-NMI at different quantiles α when k=4 in the first embodiment of the present disclosure;

FIG. 6 is a comparison of time complexity of a feature selection process in accordance with one embodiment of the present disclosure;

FIG. 7 is a flow chart of a feature selection integration system including large scale wind integration in a second embodiment of the disclosure.

Detailed Description

The disclosure is further described below with reference to the drawings and examples.

It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.

Example 1

The embodiment of the disclosure first introduces a feature selection integration method containing large-scale wind power grid connection.

The feature selection integration method with the large-scale wind power grid connection shown in fig. 1 comprises the following steps:

The acquired multidimensional feature data containing wind power grid-connection response features include: the voltage drop at each wind farm's grid-connection point at the moment of the fault; the jump in output current at each wind farm's grid-connection port at the moment of the fault; the instantaneous change ratio of the output current at each wind farm's grid-connection port at the moment of the fault relative to that before fault clearing; the active power output of each wind farm at the moment of the fault relative to that before fault clearing; and the reactive power output of each wind farm at the moment of fault clearing relative to that before the fault.

Specifically, the feature selection integration method including large-scale wind power grid connection introduced in the embodiment mainly includes two stages, namely selecting a suitable mRMR evaluation criterion and selecting an optimal feature set based on an improved mRMR search strategy.

In selecting the appropriate mRMR evaluation criteria, various correlation measure criteria are introduced.

The traditional mRMR algorithm generally uses mutual information as the index for measuring the correlation between features and categories, which adapts poorly and to some extent limits improvement in feature search performance. Many criteria exist for evaluating feature importance, but there is no theoretical support indicating which method suits which data type.

In this embodiment, integrated fusion is adopted. Through comprehensive analysis, the T-test, the χ² test, the feature score based on the Relief algorithm, the filter feature selection algorithm Fisher-Score, the information gain ratio (IGR), the Kruskal-Wallis test, the maximal information coefficient (MIC), and others are selected as indexes for measuring feature importance, which indirectly reflect the association between features and the category. The T-test uses t-distribution theory to compare whether the difference between two groups of data is significant. The χ² test calculates the correlation between a feature and the class label to score its importance. The Relief algorithm scores features by calculating intra-class and inter-class sample distances. Fisher-Score selects features carrying more discriminative information according to the principle that the intra-class distance should be small and the inter-class distance large; the larger the value, the more important the feature and the stronger its correlation with the class. IGR scores a feature by the rate of change of the information entropy before and after the feature is used; the larger the value, the more important the feature and the stronger its correlation with the class. The Kruskal-Wallis test scores features by testing whether differences exist between the feature distributions. MIC is a newer method for detecting nonlinear correlation between features; it searches for an optimal discretization and converts the mutual information value into a new measure.
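As a concrete instance of one of the relevance criteria listed above, the following Python sketch computes a Fisher score for a single feature: the class-size-weighted spread of class means around the overall mean, divided by the class-size-weighted within-class variance. This is the common textbook form, used here as an assumption; the disclosure does not spell out its exact Fisher-Score formula, and the sample values are invented.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Between-class scatter over within-class scatter for one feature."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    overall_mean = sum(values) / len(values)
    between = 0.0  # large when class means are far apart
    within = 0.0   # small when samples of one class cluster tightly
    for members in groups.values():
        n_c = len(members)
        mean_c = sum(members) / n_c
        var_c = sum((v - mean_c) ** 2 for v in members) / n_c
        between += n_c * (mean_c - overall_mean) ** 2
        within += n_c * var_c
    return between / within if within > 0 else float("inf")


labels = [0, 0, 0, 1, 1, 1]
separating = [1.0, 1.2, 0.8, 3.0, 3.2, 2.8]   # classes well separated
overlapping = [1.0, 3.0, 2.0, 1.1, 2.9, 2.1]  # classes overlap almost fully
print(fisher_score(separating, labels) > fisher_score(overlapping, labels))
```

A feature whose classes are tight and far apart receives a high score, which matches the stated principle of small intra-class and large inter-class distance.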

Regarding redundancy among features, in order to better balance correlation and redundancy and to overcome the bias of mutual information toward attributes that take more values, mutual information is improved and a normalized form, NMI, is proposed, as shown in formula (1).

Wherein H (x) and H (y) are the entropy of x and y, respectively.
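The effect of normalizing mutual information can be illustrated with a short Python sketch for discrete variables. Formula (1) is not reproduced in this text, so the normalization used below, division by the geometric mean of H(x) and H(y), is an assumption chosen from several common variants and may differ from the patented expression; the function names and sample data are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of an empirical discrete distribution."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(x; y) in bits, estimated from joint and marginal counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def nmi(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Assumed normalization: I(x; y) / sqrt(H(x) * H(y))."""
    denom = math.sqrt(entropy(xs) * entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


target = [0, 0, 1, 1]
two_valued = [0, 0, 1, 1]   # same partition as the target
four_valued = [0, 1, 2, 3]  # takes more values, also determines the target
print(mutual_information(two_valued, target), mutual_information(four_valued, target))
print(nmi(two_valued, target), nmi(four_valued, target))
```

Both attributes have the same raw mutual information with the target, but the four-valued attribute receives a lower normalized score, which is the kind of bias correction the normalization is meant to provide.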

In this example, a conventional MMRMR algorithm and seven modified MMRMR algorithms were used, and the integration framework is shown in table 1 with conventional mRMR as a control.

Table 1 MMRMR algorithm integration framework

After the eight selection algorithms are constructed, each is trained on the training set, multiple groups of nested candidate subsets are obtained according to the corresponding evaluation function max Φ(D, R), and the maximum classification accuracy and the corresponding optimal feature subset under each algorithm are obtained by comparing classifier performance. Considering both dimension reduction and classification accuracy, the optimal correlation and redundancy estimation methods under the whole MMRMR framework are determined and verified on the test set. The algorithm flowchart is shown in FIG. 2.

In the incremental search process, the feature that satisfies max[I(x_i; y)] is always selected as the initial feature, which may overly restrict the subsequent feature search. This embodiment therefore makes an improvement: instead of selecting only the feature most relevant to the category as the initial feature, the first k features ranked by the determined optimal correlation measure W(x_i; y) are each used in turn as the initial feature for an mRMR incremental search, yielding multiple groups of nested candidate subsets.

In addition, during the search under different initial features, a quantile α is introduced to divide the weight between the correlation measure and the redundancy measure more finely. The optimal feature sequences under different weights are obtained by iteratively assigning α, and a support vector machine is used to verify the quality of all nested feature subsets so that the feature subset with the best classification performance is selected. The specific modified criterion is as follows:

F=maxΦ(W,G)

Φ=αW-(1-α)G (2)

where W and G denote the optimal correlation measure and redundancy measure obtained in the preceding step, respectively. When a new feature is introduced, the search follows the strategy below:

The flow of the improved incremental search algorithm is shown in FIG. 3, where b is the variable step size of the quantile α.
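To make the role of the quantile α in Φ = αW - (1-α)G concrete, the short Python sketch below scores two hypothetical candidate features at several α values. The relevance and redundancy numbers are invented for illustration only and are not results from this embodiment.

```python
from typing import Dict, Tuple


def phi(w: float, g: float, alpha: float) -> float:
    """Criterion (2): alpha * W - (1 - alpha) * G."""
    return alpha * w - (1.0 - alpha) * g


# Hypothetical candidates as (W, G): "A" is more relevant but more redundant.
candidates: Dict[str, Tuple[float, float]] = {
    "A": (0.8, 0.6),
    "B": (0.5, 0.1),
}


def best_candidate(alpha: float) -> str:
    """Name of the candidate with the highest criterion value at this alpha."""
    return max(candidates, key=lambda name: phi(*candidates[name], alpha))


winners = {alpha: best_candidate(alpha) for alpha in (0.1, 0.25, 0.5, 0.75, 0.9)}
for alpha, name in winners.items():
    print(f"alpha={alpha}: pick {name}")
```

With these numbers the two scores cross at α = 0.625, so the low-redundancy candidate is preferred for small α and the high-relevance candidate for large α. This is how a change in α can reorder features that are otherwise close in value.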

Let X be the feature set formed by N features in total, S be the selected feature set, F be the feature set to be selected, then the specific steps for obtaining the optimal candidate feature subset are as follows:

A numerical case analysis of the method proposed in this embodiment is carried out as follows:

The proposed feature selection integration framework based on the MMRMR strategy is verified on an IEEE 39-bus power system containing wind power. The original feature set is a 63-dimensional feature set containing wind power response features. Using different correlation and redundancy evaluation criteria changes the "value" of each feature throughout the incremental search and thus affects its priority in the candidate subsets.

The feature selection algorithms are compared in order to select the optimal correlation and redundancy evaluation criteria, so that the dimension is reduced as far as possible while classification accuracy is preserved as far as possible, which shortens the training time of the subsequent model and improves model performance. The eight groups of nested candidate subsets are classified using the SVM, with classification accuracy as the evaluation standard, which indirectly reflects the feature selection effect. The classification results are shown in FIG. 4.

As features are added one by one in ranked order, the assessment accuracy rises gradually from some initial value and, after reaching a certain peak, levels off or fluctuates slightly. This shows that redundancy does exist among the features and confirms the need to select a suitable number of features: doing so reduces information redundancy, whereas too many features burden model training and degrade assessment performance. The different MMRMR algorithms also behave differently before reaching the peak. As the number of features increases, the accuracy of the mRMR-R-NMI algorithm is the first to approach its stable value, which indicates that, compared with the other MMRMR algorithms, mRMR-R-NMI reaches relatively high classification accuracy with the fewest features. The evaluation criterion based on the Relief score is better suited to the grid data types under various penetration levels, and because NMI adopts a normalized form, features with fewer attribute values can also be taken into account, which effectively overcomes the bias toward attributes with more values during feature selection. Therefore, mRMR-R-NMI is selected as the feature selection means, i.e., the Relief score is selected as the correlation measure and NMI as the redundancy measure.
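The plateau behaviour described above suggests a simple selection rule, sketched here in Python: take the smallest nested subset whose accuracy lies within a tolerance of the peak. The tolerance `delta`, the function name, and the accuracy values are assumptions for illustration; they are not the thresholds or results of this embodiment, and the sketch does not model the additional stability check.

```python
from typing import Sequence, Tuple


def smallest_near_peak(accuracies: Sequence[float], delta: float = 0.005) -> Tuple[int, float]:
    """Return (dimension, accuracy) of the first subset within delta of the best.

    accuracies[m - 1] is the accuracy of the nested subset with m features.
    """
    peak = max(accuracies)
    for dimension, acc in enumerate(accuracies, start=1):
        if peak - acc <= delta:
            return dimension, acc
    raise ValueError("accuracies must be non-empty")  # unreachable for valid input


# Invented accuracy curve: rises, then fluctuates slightly around a plateau.
curve = [0.80, 0.88, 0.93, 0.961, 0.964, 0.963, 0.965, 0.964]
print(smallest_near_peak(curve))             # tolerant rule favours fewer features
print(smallest_near_peak(curve, delta=0.0))  # strict rule returns the exact peak
```

A small positive tolerance trades a negligible loss in accuracy for a lower dimension, which is the stated goal of preferring the smaller subset among near-optimal candidates.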

By improving the incremental search algorithm, the set of candidate initial features is enlarged and quantiles are introduced to refine the search space, further improving the screening level. The classification results for different values of k are shown in FIG. 5(a) and FIG. 5(b), respectively:

By refining the search space with different initial features and weights, multiple groups of nested candidate subsets can be obtained. Taking k=3 as an example, different feature sequences are generated under different quantile factors α. Comparison shows that the features at the head and tail of the sequence change little, while the feature ordering near the optimal feature dimension changes considerably. This means that the intrinsic differences between these features are small, so fine weight adjustment can change their order and thereby affect the final choice of the optimal subset. Overall, when the quantile α is 0.25, performance in the low and middle dimensions is best: with a feature dimension of only 14, the accuracy reaches 96.48%. Compared with the original feature set, this achieves an efficient, compact representation of the information, avoids the possible drop in accuracy caused by excessive redundant information, and improves model performance; the dimension is compressed to 22% of the original (14 of 63 dimensions).

To further verify the superiority of the method proposed in this embodiment and of the selected feature subset, the selected feature subset A1 is compared with existing dimension reduction methods: classical mRMR, principal component analysis (PCA), SVM-based recursive feature elimination with correlation bias reduction (SVM-RFE-CBR), ReliefF (an improved version of Relief), Fisher, LASSO, regularized discriminative feature selection for unsupervised learning (UDFS), the InfFS algorithm, and others. For PCA, two compression schemes are used: one retains 99% of the variance of the original data set, and the other compresses to the same dimension as A1, constructing feature subsets A2 and A3, respectively. All other algorithms use the same dimension as A1 for comparison. The recursive feature elimination method uses an SVM to score individual features, with C and g set to 2^0 and 2^-6, respectively.

A least squares support vector machine (LSSVM) with an RBF kernel is adopted as the assessment model. Model parameters are optimized by combining grid search with ten-fold cross-validation, with the initial values of γ and σ² set to 10 and 0.5, respectively, followed by testing and verification. To evaluate the results, the accuracy (ACC), the Kappa statistic, and the area under the ROC curve (AUC) are chosen as indicators. The average η of the three is taken as a comprehensive evaluation index, as shown in formula (4):

η = (ACC + Kappa + AUC) / 3 (4)
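A minimal Python sketch of the composite index follows, assuming η is the plain arithmetic mean of ACC, Kappa, and AUC as the text states. The three metrics are computed with their standard binary-classification definitions (Cohen's kappa, and AUC as the fraction of positive/negative pairs ranked correctly); the function names and the sample labels and scores are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n)
        for c in set(y_true) | set(y_pred)
    )
    return (observed - expected) / (1.0 - expected) if expected < 1.0 else 1.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a positive sample outscores a negative one (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def composite_eta(y_true, y_pred, scores) -> float:
    """Mean of ACC, Kappa and AUC."""
    return (accuracy(y_true, y_pred) + cohen_kappa(y_true, y_pred) + auc(y_true, scores)) / 3.0


# Invented example: label 1 and label 0 stand for the two stability classes.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
print(round(composite_eta(y_true, y_pred, scores), 4))
```

Averaging the three indicators rewards a subset only if it does well on raw accuracy, chance-corrected agreement, and ranking quality together.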

The final comparison of the different feature selection algorithms is shown in Table 2, and the time complexity of the feature selection process is shown in FIG. 6.

Table 2 comparison of the best subset effects selected by different feature selection algorithms

Table 2 shows that, at the same dimension, the optimal subset selected by the method of this embodiment performs best, outperforming conventional mRMR and the other algorithms. PCA loses the physical meaning of the features while reducing dimensionality, which makes the compressed subset less interpretable. In terms of time complexity, FIG. 6 shows that the SVM-RFE-CBR algorithm consumes the most time during feature selection, and the Filter methods take far less time than this Wrapper algorithm. The main reason is that, while recursively eliminating features, SVM-RFE-CBR must repeatedly train the SVM to determine the optimal model parameters and score each feature according to the classification result, which greatly increases computation time. The time complexity of the method of this embodiment is at a medium level among the Filter methods and lower than that of the LASSO algorithm, so the requirement of rapid assessment is met.

This embodiment adopts an improved MMRMR integrated selection method. Multiple MMRMR algorithms are derived by introducing multiple correlation and redundancy evaluation criteria, and intensive fusion is achieved through corresponding strategies. At the same time, weight factors are introduced into the evaluation function to enlarge the initial feature search space and obtain a group of nested candidate feature subsets. The optimal feature subset is then obtained by verifying the candidates one by one with a learning algorithm. The superiority of the selected optimal feature subset is verified, and the requirement of rapid evaluation is met.
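The weighted incremental search summarized above can be sketched as follows. This is only an illustration under stated assumptions: the relevance vector and the pairwise redundancy matrix are taken as precomputed, and the evaluation function `alpha * relevance - (1 - alpha) * mean redundancy` is an assumed form of a weighted relevance and redundancy trade-off, not the exact evaluation function of this embodiment. The names `incremental_search` and `nested_candidates` and all numbers are hypothetical.

```python
def incremental_search(relevance, redundancy, first, alpha):
    """Greedy search starting from feature `first`.

    Returns the order in which features are selected; every prefix of the
    returned list is one nested candidate feature subset.
    """
    selected = [first]
    remaining = [f for f in range(len(relevance)) if f != first]
    while remaining:
        def score(f):
            # Mean redundancy of candidate f with the already selected features.
            mean_red = sum(redundancy[f][s] for s in selected) / len(selected)
            # Assumed weighted trade-off between relevance and redundancy.
            return alpha * relevance[f] - (1.0 - alpha) * mean_red

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def nested_candidates(relevance, redundancy, k, alphas):
    """Run the search for the k most relevant initial features and each weight."""
    ranked = sorted(range(len(relevance)), key=lambda f: relevance[f], reverse=True)
    return {
        (first, alpha): incremental_search(relevance, redundancy, first, alpha)
        for first in ranked[:k]
        for alpha in alphas
    }


# Made-up scores for four features; features 0 and 1 are highly redundant.
relevance = [0.9, 0.85, 0.3, 0.5]
redundancy = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.2],
    [0.2, 0.3, 0.2, 1.0],
]
candidates = nested_candidates(relevance, redundancy, k=2, alphas=[0.5, 1.0])
```

In this toy case, a weight of 1.0 ranks purely by relevance and selects the redundant feature 1 immediately after feature 0, whereas a weight of 0.5 defers it in favor of the less redundant feature 3. Varying the initial feature and the weight is what enlarges the search space.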

Example II

The second embodiment of the disclosure introduces a feature selection integration system for large-scale wind power grid connection.

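A feature selection integration system for large-scale wind power grid connection, as shown in Fig. 7, comprising:

an acquisition module configured to acquire multidimensional feature data containing wind power grid connection response characteristics;

a construction module configured to perform incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and to construct multiple nested candidate feature subsets;

a computation module configured to calculate the classification accuracy of the constructed nested candidate feature subsets; and

a selection integration module configured to record the candidate feature subset with the highest classification accuracy, verify the dimensionality of the recorded candidate feature subset, obtain the optimal candidate feature subset, and complete the feature selection integration.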

The detailed steps are the same as those of the feature selection integration method including large-scale wind power grid connection provided in the first embodiment, and are not described herein again.

Example III

A third embodiment of the present disclosure provides a computer-readable storage medium.

A computer-readable storage medium having stored thereon a program which, when executed by a processor, performs the steps of the feature selection integration method for large-scale wind power grid connection according to the first embodiment of the disclosure.

Example IV

The fourth embodiment of the disclosure provides an electronic device.

An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the feature selection integration method for large-scale wind power grid connection according to the first embodiment of the disclosure.

The foregoing is merely a description of preferred embodiments of the present disclosure and is not intended to limit the disclosure; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims

1. A feature selection integration method for large-scale wind power grid connection, characterized in that it includes:

Obtain multidimensional feature data containing wind power grid connection response characteristics;

Based on the preset optimal correlation and redundancy evaluation criteria, incremental search is performed on the acquired multidimensional data features to construct multiple nested candidate feature subsets;

Calculate the classification accuracy of the constructed multiple nested candidate feature subsets;

Record the candidate feature subset with the highest classification accuracy, verify the dimensionality of the recorded candidate feature subset, obtain the optimal candidate feature subset, and complete the feature selection integration;

Let X be a feature set consisting of N features, S be the selected feature set, and F be the candidate feature set. The specific steps to obtain the optimal candidate feature subset are as follows:

(1) Initialization: set the selected feature set S to the empty set and the candidate feature set F to the full feature set, i.e., S = ∅ and F = X;

(2) Calculate the relevance measure W between each feature and the label y, sort all features according to W to form a new candidate feature set F′, and take the first k features of F′. Each of these k features is taken in turn as the initial feature, that is, it is added to S and removed from F′;

(3) Next, set the quantile of the correlation and redundancy measures together with the accompanying initial values; then search F′ for the feature that satisfies the evaluation criterion, add this feature to S, and remove it from F′;

(4) Repeat steps (2) to (3) until F′ is empty, which yields, for the different initial features, a T×N matrix of nested candidate feature subsets and the corresponding total combination of feature sets;

(5) Verify each feature subset of the corresponding nested candidate feature sets in turn using an SVM to obtain its classification accuracy, and record the candidate feature subsets whose accuracy is stable and close to the global highest accuracy, together with their associated weights; that is, record each index j that satisfies the corresponding threshold condition;

(6) Perform this calculation in sequence for the k initial features to obtain k results of the kind recorded in (5); among the recorded subsets that satisfy the threshold conditions, the candidate feature subset with the smaller dimensionality is the optimal candidate feature subset, where both thresholds are preset values.
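The final selection in steps (5) and (6) can be illustrated with a minimal sketch. It assumes that the accuracies of one sequence of nested subsets are already available and that "close to the global highest accuracy" is tested with a single tolerance; the function name `pick_optimal_subset`, the tolerance value, and the accuracy figures are hypothetical, and the stability check on the accuracy is not modeled here.

```python
def pick_optimal_subset(accuracies, tolerance=0.005):
    """Pick the smallest nested subset whose accuracy is near the best.

    `accuracies[d - 1]` is the classification accuracy of the nested subset
    made of the first d features. Returns (dimension, accuracy).
    """
    best = max(accuracies)
    for dim, acc in enumerate(accuracies, start=1):
        # The first hit is the lowest-dimensional subset within the tolerance.
        if best - acc <= tolerance:
            return dim, acc
    raise ValueError("accuracies must not be empty")


# Made-up accuracies for nested subsets of dimension 1 to 6.
example_accuracies = [0.80, 0.91, 0.958, 0.96, 0.959, 0.96]
chosen_dim, chosen_acc = pick_optimal_subset(example_accuracies, tolerance=0.005)
```

Here the three-feature subset is chosen even though the four-feature subset is marginally more accurate, which reflects the preference for smaller dimensionality once the accuracy threshold is met.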

2. The feature selection integration method for large-scale wind power grid connection as described in claim 1, characterized in that the process of determining the preset optimal correlation and redundancy evaluation criteria is as follows:

The acquired multidimensional feature data containing wind power grid connection response characteristics are normalized and preprocessed, and the processed multidimensional feature data are divided into an experimental dataset and a test dataset;

The correlation and redundancy criteria are calculated and evaluated on the obtained experimental dataset to obtain the optimal feature subset;

Based on the obtained optimal feature subset, taking into account both dimensionality reduction and classification accuracy, the optimal relevance and redundancy assessments are obtained, and the preset optimal relevance and redundancy evaluation criteria are determined.

3. The feature selection integration method for large-scale wind power grid connection as described in claim 2, characterized in that, based on intensive fusion comprehensive analysis, the T-test, the χ² test, the feature score based on the Relief algorithm, the filtering feature selection algorithm, the information gain ratio, the Kruskal-Wallis test, and the maximum correlation coefficient are selected as evaluation indicators for the experimental dataset; wherein the T-test uses t-distribution theory to test whether the differences between the features of two experimental datasets are significant; the χ² test scores feature importance by examining the correlation between features and class labels; the Relief algorithm scores features by calculating intra-class and inter-class sample distances; the filtering feature selection algorithm selects features containing more discriminative information based on the principle of small intra-class distance and large inter-class distance, and the larger its value, the more important the feature and the greater its correlation with the class; the information gain ratio scores features by calculating the rate of change of information entropy before and after the feature is used, and the larger the value, the more important the feature and the greater its correlation with the class; the Kruskal-Wallis test scores features by testing whether the distributions of different features differ; and the maximum correlation coefficient is a newer method for detecting nonlinear correlation between features, which transforms the mutual information value into a new metric by finding the optimal discretization.
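As an illustration of one of the indicators listed above, the information gain ratio can be sketched for a discrete feature. The sketch assumes the common definition in which the information gain is divided by the entropy of the feature itself, and it assumes the feature has already been discretized; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the
    entropy of the feature (assumed definition of the gain ratio)."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [l for f, l in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    gain = entropy(labels) - conditional  # entropy drop after using the feature
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: the first feature determines the label, the second carries no information.
toy_labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
```

A larger value indicates a feature that removes more label uncertainty per unit of its own entropy, which matches the "larger is more important" reading in the claim.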

4. The feature selection integration method for large-scale wind power grid connection as described in claim 3, characterized in that, regarding feature redundancy, in order to balance correlation and redundancy and to overcome the bias of mutual information toward attributes with more values, mutual information is improved by normalization, i.e., the normalized form NMI is..., where the normalizing quantities are the entropies of x and y, respectively; entropy measures the amount of information in an event with multiple states, that is, the expected value of the amount of information with respect to the probability distribution of the event.
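A normalized mutual information can be sketched as follows for discrete variables. Since the exact normalization is not reproduced in this text, the sketch assumes one common choice, dividing the mutual information by the geometric mean of the two entropies; this choice, the function names, and the toy data are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2, sqrt


def entropy(values):
    """Shannon entropy (in bits): expected information content of the values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(x; y) = H(x) + H(y) - H(x, y) for discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def normalized_mi(x, y):
    """Mutual information scaled to [0, 1].

    Assumed normalization: division by sqrt(H(x) * H(y)).
    """
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:  # a constant variable shares no information
        return 0.0
    return mutual_information(x, y) / sqrt(hx * hy)


# Toy discrete features for illustration only.
x = [0, 0, 1, 1]
identical = [0, 0, 1, 1]
independent = [0, 1, 0, 1]
```

Normalizing by the entropies keeps the redundancy score comparable across features with different numbers of distinct values, which is the bias the claim sets out to remove.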

5. The feature selection integration method for large-scale wind power grid connection as described in claim 2, characterized in that, in the process of constructing the multiple nested candidate feature subsets, a search is performed starting from different initial multidimensional data features; quantiles are introduced to divide the weight between the correlation measure and the redundancy measure; by iteratively assigning values to the introduced quantiles, the optimal feature sequences under different weights are obtained; and the optimal feature sequences under different weights are the multiple nested candidate feature subsets.

6. The feature selection integration method for large-scale wind power grid connection as described in claim 1, characterized in that the classification accuracy of the multiple nested candidate feature subsets is calculated by a support vector machine, the merits of each nested candidate feature subset are verified based on the classification accuracy, the nested candidate feature subset with the best classification performance is selected, and the candidate feature subset with the highest classification accuracy is recorded.

7. A feature selection integration system for large-scale wind power grid connection, employing a feature selection integration method for large-scale wind power grid connection as described in any one of claims 1-6, characterized in that it comprises:

The acquisition module is configured to acquire multidimensional feature data containing wind power grid connection response characteristics;

The construction module is configured to perform incremental search on the acquired multidimensional data features based on preset optimal relevance and redundancy evaluation criteria, and construct multiple nested candidate feature subsets;

The computation module is configured to calculate the classification accuracy of the constructed multiple nested subsets of candidate features;

The selection integration module is configured to record the candidate feature subset with the highest classification accuracy, verify the dimensionality of the recorded candidate feature subset, obtain the optimal candidate feature subset, and complete the feature selection integration.

8. A computer-readable storage medium having a program stored thereon, characterized in that, when executed by a processor, the program implements the steps of the feature selection integration method for large-scale wind power grid connection as described in any one of claims 1-6.

9. An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that, when the processor executes the program, it implements the steps of the feature selection integration method for large-scale wind power grid connection as described in any one of claims 1-6.