
CN115374874B - A feature selection integration method and system for large-scale wind power grid connection - Google Patents

A feature selection integration method and system for large-scale wind power grid connection

Info

Publication number
CN115374874B
CN115374874B CN202211055817.XA CN202211055817A CN115374874B CN 115374874 B CN115374874 B CN 115374874B CN 202211055817 A CN202211055817 A CN 202211055817A CN 115374874 B CN115374874 B CN 115374874B
Authority
CN
China
Prior art keywords
feature
features
candidate
wind power
power grid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211055817.XA
Other languages
Chinese (zh)
Other versions
CN115374874A (en
Inventor
石访
杜宗展
赵昱臣
张恒旭
王谱宇
郭全
刘晓宁
董振风
田硕硕
刘尊龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202211055817.XA
Publication of CN115374874A
Application granted
Publication of CN115374874B
Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Supply And Distribution Of Alternating Current (AREA)

Abstract

The invention belongs to the technical field of power systems, and particularly relates to a feature selection integration method and system for large-scale wind power grid connection. The method comprises: obtaining multidimensional feature data containing wind power grid-connected response features; performing an incremental search on the obtained multidimensional feature data based on preset optimal correlation and redundancy evaluation criteria to construct multiple groups of nested candidate feature subsets; calculating the classification accuracy of the constructed nested candidate feature subsets; recording the candidate feature subset with the highest classification accuracy; and verifying the dimensionality of the recorded candidate feature subset to obtain the optimal candidate feature subset, thereby completing feature selection integration.

Description

Feature selection integration method and system containing large-scale wind power grid connection
Technical Field
The disclosure belongs to the technical field of power systems, and particularly relates to a feature selection integration method and system containing large-scale wind power grid connection.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
After wind power and other new energy units are connected, the system exhibits characteristics that traditional synchronous units do not have: weak stability, low inertia response, weak disturbance rejection, low overload capacity, and strong output fluctuation. These bring new risks and challenges to classical stability problems and markedly increase the difficulty of assessing and controlling transient power angle stability with traditional methods. At present, when a large amount of new energy is connected, traditional model-based analysis methods face complex and difficult modeling of new energy units and novel power electronic equipment, struggle to meet real-time requirements, and cannot easily construct an independent Lyapunov energy function for novel power electronic components. A fast, accurate, online transient stability assessment method is therefore needed.
Big data technology offers strong learning capability and high computation speed, which can be used to fully mine potentially useful information while avoiding problems such as the difficulty of modeling new energy units. This provides a new approach to transient power angle stability assessment for power systems with a high proportion of new energy. From a data-driven point of view, the model structure and internal control logic need not be considered in detail; the measurement data are learned offline, after which assessment can be carried out online. Meanwhile, new energy units have complex control strategies and numerous component parameters, and power grid measurement capabilities are continuously improving, which objectively provides practical conditions for applying data-driven methods.
Selecting appropriate input features is the foundation of, and essential to, subsequent transient stability assessment using a machine learning model. If the original feature set constructed from all information such as generators and buses is used directly as input, then, on the one hand, the structural complexity and parameter count of the subsequent model increase greatly and place a heavy burden on training: compared with the optimal feature subset, training time can increase by a factor of ten or even several tens, which hinders real-time application in the power grid and can even cause the curse of dimensionality. On the other hand, the constructed feature set is usually obtained by combining and computing feature quantities from physical mechanisms according to human experience, and deep implicit associations in the data cannot be identified manually, so a large amount of redundant information may exist among the features. This redundant information not only wastes valuable computing resources but can also reduce model accuracy. It is therefore important to select a representative feature subset from the original feature set, remove redundant information, and achieve optimal dimensionality reduction, so as to further improve the performance of subsequent algorithms.
The inventors are aware that current dimensionality reduction methods fall into two main types, feature extraction and feature selection. Feature extraction compresses the original data by mapping it into a new dimensional space, but the physical meaning of the features is lost in the process. Feature selection does not change the original representation of the features and only reduces complexity by screening, which makes it easier to explain subsequent classification results in terms of physical meaning.
Disclosure of Invention
In order to solve the problems, the disclosure provides a feature selection integration method and system containing large-scale wind power grid connection, and the original feature set is processed in a feature selection mode to meet the requirement of rapid evaluation.
According to some embodiments, a first scheme of the present disclosure provides a feature selection integration method including large-scale wind power grid connection, which adopts the following technical scheme:
a feature selection integration method containing large-scale wind power grid connection comprises the following steps:
Acquiring multidimensional characteristic data containing wind power grid-connected response characteristics;
Performing incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and constructing a plurality of groups of nested candidate feature subsets;
Calculating classification precision of the constructed multiple groups of nested candidate feature subsets;
And recording the candidate feature subset with the maximum classification precision, verifying the dimensionality of the recorded candidate feature subset to obtain an optimal candidate feature subset, and completing feature selection integration.
As a further technical limitation, the preset optimal correlation and redundancy evaluation criteria are determined as follows:
Performing normalization preprocessing on the acquired multidimensional feature data containing wind power grid-connected response features, and dividing the processed data into an experimental data set and a test data set;
Performing correlation criterion calculation and redundancy evaluation respectively on the experimental data set to obtain the optimal feature subset;
Based on the obtained optimal feature subset, comprehensively considering dimensionality reduction and classification accuracy to obtain the optimal correlation and redundancy evaluation, thereby determining the preset optimal correlation and redundancy evaluation criteria.
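The normalization and data set division in the first step above can be sketched as follows. This is a minimal illustration only: the disclosure does not state which normalization or split ratio is used, so min-max scaling, the 25% test ratio, and the names `min_max_normalize` and `split_dataset` are assumptions.

```python
import random

def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]

def split_dataset(rows, labels, test_ratio=0.3, seed=0):
    """Shuffle sample indices, then split into experimental and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(round(len(rows) * test_ratio))
    test_idx, exp_idx = idx[:n_test], idx[n_test:]

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(exp_idx), pick(test_idx)

# Hypothetical samples: two features per sample, label 1 = stable, 0 = unstable.
X = [[0.9, 120.0], [1.0, 80.0], [0.4, 200.0], [0.7, 160.0]]
y = [1, 1, 0, 0]
X_norm = min_max_normalize(X)
(exp_X, exp_y), (test_X, test_y) = split_dataset(X_norm, y, test_ratio=0.25)
```

The sketch follows the order given in the text (normalize first, then divide); the feature values and labels are made up for illustration.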
Further, based on intensive fusion and comprehensive analysis, the T-test, the χ² test, the feature score based on the Relief algorithm, the filter-type feature selection algorithm Fisher-Score, the information gain ratio, the Kruskal-Wallis test, and the maximal information coefficient (MIC) are selected as measurement indexes for the experimental data set. The T-test uses t-distribution theory to compare whether the difference between two groups is significant. The χ² test calculates the correlation between features and class labels to score importance. The Relief algorithm scores features by calculating intra-class and inter-class sample distances. Fisher-Score selects features carrying more discriminative information according to the principle that the intra-class distance should be small and the inter-class distance large; the larger the Fisher-Score value, the more important the feature and the stronger its correlation with the class. The information gain ratio scores a feature by the rate of change of information entropy before and after the feature is used; the larger the value, the more important the feature. The Kruskal-Wallis test scores features by testing whether differences exist between feature distributions. MIC detects nonlinear correlation between features by searching for an optimal discretization and converting the mutual information value into a new measure.
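As an illustration of one of the correlation measures listed above, the following sketch computes a Fisher score for a single feature. It uses the common textbook form (between-class scatter divided by within-class scatter); the disclosure does not give its exact formula, so this form and the function name `fisher_score` are assumptions.

```python
def fisher_score(values, labels):
    """Fisher score of one feature: between-class over within-class scatter."""
    n = len(values)
    mean_all = sum(values) / n
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        mean_c = sum(group) / len(group)
        var_c = sum((v - mean_c) ** 2 for v in group) / len(group)
        between += len(group) * (mean_c - mean_all) ** 2
        within += len(group) * var_c
    return between / within if within > 0 else float("inf")

# Hypothetical data: one feature separates the classes, the other does not.
labels = [0, 0, 0, 1, 1, 1]
separating = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
overlapping = [0.1, 0.9, 0.5, 0.2, 0.8, 0.5]
score_sep = fisher_score(separating, labels)
score_ovl = fisher_score(overlapping, labels)
```

A feature whose class means are far apart relative to its within-class spread receives the larger score, which matches the "small intra-class distance, large inter-class distance" principle described in the text.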
Further, regarding redundancy among features, in order to balance correlation and redundancy and to overcome the bias of mutual information toward attributes that take more values, mutual information is improved by normalization to give the normalized form NMI, where H (x) and H (y) are the entropies of x and y, respectively. Entropy measures the amount of information over the multiple states that an event may have, i.e., the expected value of the information content with respect to the probability distribution of the event.
Further, in constructing the multiple groups of nested candidate feature subsets, the search is carried out from different initial features; a quantile is introduced to divide the weight between the correlation measure and the redundancy measure, and the quantile is assigned iteratively to obtain the optimal feature sequences under different weights, which constitute the multiple groups of nested candidate feature subsets.
As a further technical limitation, a support vector machine is used to calculate the classification accuracy of the multiple groups of nested candidate feature subsets; the quality of each group is verified based on classification accuracy, the nested candidate feature subset with the best classification performance is selected, and the candidate feature subset with the highest classification accuracy is thereby recorded.
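The verification of nested subsets by classification accuracy can be sketched as below. To keep the example self-contained, a nearest-centroid classifier is swapped in as a stand-in for the support vector machine named in the text; it is not the classifier of the disclosure, and the function names and data are hypothetical.

```python
def nearest_centroid_accuracy(train_X, train_y, test_X, test_y, cols):
    """Stand-in classifier (NOT an SVM): nearest-centroid accuracy on `cols`."""
    centroids = {}
    for c in set(train_y):
        rows = [x for x, l in zip(train_X, train_y) if l == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]

    def predict(x):
        def sq_dist(c):
            return sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c]))
        return min(centroids, key=sq_dist)

    hits = sum(predict(x) == l for x, l in zip(test_X, test_y))
    return hits / len(test_y)

def accuracy_curve(ranking, score_fn):
    """Accuracy of each nested subset ranking[:1], ranking[:2], ..."""
    return [score_fn(ranking[:m]) for m in range(1, len(ranking) + 1)]

# Hypothetical data: column 0 is informative, column 1 is mostly noise.
train_X = [[0.1, 0.5], [0.2, 0.4], [0.9, 0.5], [0.8, 0.6]]
train_y = [0, 0, 1, 1]
test_X = [[0.15, 0.9], [0.85, 0.1]]
test_y = [0, 1]
ranking = [0, 1]  # a feature ordering produced by the incremental search
curve = accuracy_curve(
    ranking,
    lambda cols: nearest_centroid_accuracy(train_X, train_y, test_X, test_y, cols),
)
```

Because the subsets are nested, one feature ordering yields a whole accuracy curve, and the subset with the best accuracy can be read off that curve.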
As a further technical limitation, let X be a feature set formed by N features in total, S be a selected feature set, and F be a feature set to be selected, the specific steps for obtaining the optimal candidate feature subset are as follows:
(1) Set the selected feature set $S$ to the empty set and the candidate feature set $F$ to the full feature set, i.e., $S \to \varnothing$, $F \to X$;
(2) Calculate the correlation measure $W(x_i; y)$ between each feature $x_i$ and the label $y$, and sort all features by $W$ to form a new candidate feature set $F'$; record the first $k$ features and take each of them in turn as the initial feature;
(3) Set the quantile of the correlation and redundancy measures $\alpha_t \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$ $(1 \le t \le T)$; let $x_i \in S_{m-1}$, $x_j \in X - S_{m-1}$ $(m = 2, \ldots, N)$; find in $F'_{m-1}$ the feature $x_{vt}^{**}$ that attains $\max \Phi(W(x_j; y), G(x_j; x_i), \alpha_t)$, and let $F'_{m-1} - \{x_{vt}^{**}\} \to F'_m$, $S_{m-1} + \{x_{vt}^{**}\} \to S_m$;
(4) Repeat steps (2)-(3) until $F'$ is empty, obtaining, for each initial feature, a matrix of $T \times n$ mutually nested candidate feature subsets; the overall feature set combination is $S = [S_1\ S_2\ \ldots\ S_v\ \ldots\ S_k]^{\mathrm{T}}$ $(1 \le v \le k)$;
(5) For the group of nested candidate feature subsets in $S_v$, use the SVM to verify the classification accuracy of each feature subset in turn, and record the candidate feature subset whose accuracy has stabilized and approaches the global maximum, together with its weight $\alpha_t^{*}$;
(6) Obtain the $k$ results in $S$ in turn as recorded in step (5); among the results that satisfy the threshold conditions, the candidate feature subset with the smaller dimension is the optimal candidate feature subset, where $\delta$ and $\delta'$ both denote preset thresholds.
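Steps (5) and (6) above can be illustrated with the sketch below. The exact threshold conditions are not legible in this text, so the rule used here is an assumption: within each accuracy curve, record the smallest subset whose accuracy is within `delta` of that curve's maximum; then, among recorded results within `delta_prime` of the best recorded accuracy, prefer the smallest dimension. All names and numbers are hypothetical.

```python
def first_stable_index(curve, delta):
    """Smallest subset size whose accuracy is within `delta` of the curve maximum."""
    best = max(curve)
    for size, acc in enumerate(curve, start=1):
        if acc >= best - delta:
            return size, acc

def pick_optimal(curves, delta=0.005, delta_prime=0.005):
    """curves: {(v, alpha): [accuracy of nested subsets 1..n]} -> (key, size, acc)."""
    recorded = {key: first_stable_index(c, delta) for key, c in curves.items()}
    top = max(acc for _, acc in recorded.values())
    eligible = [
        (size, -acc, key)
        for key, (size, acc) in recorded.items()
        if acc >= top - delta_prime
    ]
    size, neg_acc, key = min(eligible)  # smallest dimension, then highest accuracy
    return key, size, -neg_acc

# Hypothetical accuracy curves keyed by (initial feature index v, quantile alpha).
curves = {
    (1, 0.25): [0.80, 0.93, 0.965, 0.964],
    (1, 0.5): [0.80, 0.90, 0.95, 0.966],
    (2, 0.25): [0.70, 0.85, 0.90, 0.91],
}
best_key, best_size, best_acc = pick_optimal(curves)
```

In this made-up example the curve for `(1, 0.25)` is chosen: it is within the tolerance of the best accuracy while needing one feature fewer, which reflects the preference for the smaller dimension stated in step (6).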
According to some embodiments, a second aspect of the present disclosure provides a feature selection integration system including large-scale wind power grid connection, which adopts the following technical scheme:
A feature selection integration system including large scale wind integration, comprising:
an acquisition module configured to acquire multidimensional feature data including wind power grid-connected response features;
the construction module is configured to perform incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and construct a plurality of groups of nested candidate feature subsets;
a computing module configured to compute classification accuracy of the constructed plurality of sets of nested candidate feature subsets;
And the selection integration module is configured to record the candidate feature subset with the largest classification precision, verify the dimension of the recorded candidate feature subset, obtain the optimal candidate feature subset and finish feature selection integration.
According to some embodiments, a third aspect of the present disclosure provides a computer-readable storage medium, which adopts the following technical solutions:
A computer readable storage medium having stored thereon a program which when executed by a processor performs the steps in a feature selection integration method with large scale wind integration according to the first aspect of the disclosure.
According to some embodiments, a fourth aspect of the present disclosure provides an electronic device, which adopts the following technical solutions:
an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the feature selection integration method comprising large scale wind integration according to the first aspect of the disclosure when the program is executed.
Compared with the prior art, the beneficial effects of the present disclosure are:
An improved MMRMR integrated selection method is adopted: multiple MMRMR algorithms are derived by introducing a variety of correlation and redundancy evaluation criteria, and intensive fusion is realized through corresponding strategies. At the same time, a weight factor is introduced into the evaluation function and the initial feature search space is expanded to obtain groups of nested candidate feature subsets, which are verified one by one with a learning algorithm to obtain the optimal feature subset. The superiority of the selected optimal feature subset is verified, meeting the requirement of rapid evaluation.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate and explain the exemplary embodiments of the disclosure and together with the description serve to explain the disclosure, and do not constitute an undue limitation on the disclosure.
FIG. 1 is a flow chart of a feature selection integration method including large scale wind integration in accordance with an embodiment of the present disclosure;
FIG. 2 is a flowchart of the algorithm by which the integrated MMRMR selection framework determines the optimal evaluation criteria in one embodiment of the present disclosure;
FIG. 3 is a flow chart of an improved incremental search algorithm in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram showing a comparison of different MMRMR feature selection effects in an integrated framework according to one embodiment of the present disclosure;
Fig. 5 (a) is a comparison diagram of the feature selection effect of mRMR-R-NMI at different quantiles α when k=3 in the first embodiment of the present disclosure;
Fig. 5 (b) is a comparison diagram of the feature selection effect of mRMR-R-NMI at different quantiles α when k=4 in the first embodiment of the present disclosure;
FIG. 6 is a comparison of time complexity of a feature selection process in accordance with one embodiment of the present disclosure;
FIG. 7 is a flow chart of a feature selection integration system including large scale wind integration in a second embodiment of the disclosure.
Detailed Description
The disclosure is further described below with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the present disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments in accordance with the present disclosure. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
Example 1
The embodiment of the disclosure first introduces a feature selection integration method containing large-scale wind power grid connection.
The feature selection integration method with the large-scale wind power grid connection shown in fig. 1 comprises the following steps:
Acquiring multidimensional characteristic data containing wind power grid-connected response characteristics;
Performing incremental search on the acquired multidimensional data features based on preset optimal correlation and redundancy evaluation criteria, and constructing a plurality of groups of nested candidate feature subsets;
Calculating classification precision of the constructed multiple groups of nested candidate feature subsets;
And recording the candidate feature subset with the maximum classification precision, verifying the dimensionality of the recorded candidate feature subset to obtain an optimal candidate feature subset, and completing feature selection integration.
The acquired multidimensional feature data containing wind power grid-connected response features include: the voltage drop at the grid connection point of each wind farm at the moment of the fault; the jump in output current at the grid-connected port of each wind farm at the moment of the fault; the instantaneous ratio of change of the output current at the grid-connected port of each wind farm at the moment of the fault relative to that before fault clearing; the active power output of each wind farm at the moment of the fault relative to that before fault clearing; and the reactive power output of each wind farm at the moment of fault clearing relative to that before the fault.
Specifically, the feature selection integration method including large-scale wind power grid connection introduced in the embodiment mainly includes two stages, namely selecting a suitable mRMR evaluation criterion and selecting an optimal feature set based on an improved mRMR search strategy.
In selecting the appropriate mRMR evaluation criteria, various correlation measure criteria are introduced.
The traditional mRMR algorithm generally adopts mutual information as an index for measuring the correlation between the features and the categories, has poor adaptability and limits the improvement of the feature searching performance to a certain extent. There are a number of criteria for evaluating the importance of features, but there is no corresponding theoretical support for which method is more appropriate for which data type.
This embodiment adopts intensive fusion. Through comprehensive analysis, the T-test, the χ² test, the feature score based on the Relief algorithm, the filter-type feature selection algorithm Fisher-Score, the information gain ratio (IGR), the Kruskal-Wallis test, the maximal information coefficient (MIC), and others are selected as indexes of feature importance, each of which indirectly reflects the association with the class. The T-test uses t-distribution theory to compare whether the difference between two groups is significant. The χ² test (ChiSquare) calculates the correlation between features and class labels to score importance. The Relief algorithm scores features by calculating intra-class and inter-class sample distances. Fisher-Score selects features carrying more discriminative information according to the principle that the intra-class distance should be small and the inter-class distance large; the larger its value, the more important the feature and the stronger its correlation with the class. IGR scores a feature by the rate of change of information entropy before and after the feature is used; the larger its value, the more important the feature and the stronger its correlation with the class. The Kruskal-Wallis test scores features by testing whether differences exist between feature distributions. MIC is a newer method for detecting nonlinear correlation between features: it searches for an optimal discretization and converts the mutual information value into a new measure.
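Since the Relief score is the correlation measure that this embodiment ultimately selects, a minimal sketch of the basic two-class Relief weighting is given here. It assumes features already scaled to [0, 1], Manhattan distance, and one pass over every sample; these choices and the name `relief_scores` are assumptions, not details taken from the disclosure.

```python
def relief_scores(X, y):
    """Basic two-class Relief: reward features that differ on the nearest miss
    and penalize features that differ on the nearest hit."""
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def distance(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))

    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            w[f] += (abs(X[i][f] - X[miss][f]) - abs(X[i][f] - X[hit][f])) / n
    return w

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 0.3], [0.2, 0.8], [0.9, 0.4], [0.8, 0.7]]
y = [0, 0, 1, 1]
weights = relief_scores(X, y)
```

Features with larger weights are ranked as more relevant to the class label; in this made-up example feature 0 receives the higher weight.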
Regarding redundancy among features, in order to better balance correlation and redundancy and to overcome the bias of mutual information toward attributes that take more values, mutual information is improved and a normalized form, NMI, is proposed, as shown in formula (1).
Wherein H (x) and H (y) are the entropy of x and y, respectively.
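A normalized mutual information of this kind can be sketched for discrete variables as follows. Formula (1) is not reproduced in this text, so the normalizer used here, min(H(x), H(y)), is an assumption (other common choices are the geometric or arithmetic mean of the two entropies). Continuous features would first need discretization, which the sketch omits.

```python
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy H(x) of a discrete sequence, in bits."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(x; y) = H(x) + H(y) - H(x, y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def nmi(xs, ys):
    """Normalized MI; dividing by min(H(x), H(y)) is an assumed normalizer."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0

# Hypothetical discrete features.
a = [0, 0, 1, 1]
b = [0, 0, 1, 1]  # identical to a: fully redundant
c = [0, 1, 0, 1]  # independent of a: no redundancy
nmi_ab = nmi(a, b)
nmi_ac = nmi(a, c)
```

Normalizing by entropy keeps the redundancy score in [0, 1] regardless of how many distinct values each feature takes, which is the property the text relies on.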
In this example, a conventional MMRMR algorithm and seven modified MMRMR algorithms were used, and the integration framework is shown in table 1 with conventional mRMR as a control.
Table 1 MMRMR algorithm integration framework
These eight selection algorithms are constructed and each is run on the training set; multiple groups of nested candidate subsets are obtained from the corresponding evaluation function max Φ(D, R), and the maximum classification accuracy and corresponding optimal feature subset of each algorithm are obtained by comparing classifier performance. Considering both dimensionality reduction and classification accuracy, the optimal correlation and redundancy estimation methods under the overall MMRMR framework are determined and then verified on the test set. The algorithm flow chart is shown in fig. 2.
Because the incremental search always selects the feature that satisfies max [ I (x i; y) ] as the initial feature, the subsequent feature search may be overly constrained. This embodiment therefore makes an improvement: instead of using only the feature most relevant to the class as the initial feature, the first k features ranked by the determined optimal correlation measure W (x i; y) are each used in turn as the initial feature of an mRMR incremental search, yielding multiple groups of nested candidate subsets.
In addition, during the search under different initial features, a quantile α is introduced to divide the weight between the correlation measure and the redundancy measure more finely. Assigning α iteratively yields the optimal feature sequences under different weights, and a support vector machine is used to verify the quality of all nested feature subsets so as to select the feature subset with the best classification performance. The modified criterion is as follows:
F=maxΦ(W,G)
Φ=αW-(1-α)G (2)
Here W and G denote the optimal correlation measure and redundancy measure obtained in the preceding steps, respectively. When a new feature is introduced, the search is carried out according to this weighted criterion.
the flow of the improved incremental search algorithm is shown in fig. 3, where b is the variable step size of the quantile α.
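The weighted criterion of formula (2) and a single selection step can be sketched as below. How the redundancy G is aggregated over the already selected features is not legible in this text; averaging over the selected set, as in classical mRMR, is an assumption, and the function names and numbers are hypothetical.

```python
def phi(w, g, alpha):
    """Weighted criterion of formula (2): alpha * W - (1 - alpha) * G."""
    return alpha * w - (1 - alpha) * g

def next_feature(candidates, selected, relevance, redundancy, alpha):
    """Pick the candidate that maximizes phi, with G averaged over `selected`."""
    def score(j):
        g = sum(redundancy[j][i] for i in selected) / len(selected)
        return phi(relevance[j], g, alpha)
    return max(candidates, key=score)

# Hypothetical precomputed measures for three features.
relevance = [0.9, 0.8, 0.5]        # W(x_j; y)
redundancy = [[1.0, 0.9, 0.1],     # G(x_j; x_i), symmetric
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
pick_relevance_heavy = next_feature([1, 2], [0], relevance, redundancy, alpha=0.9)
pick_redundancy_heavy = next_feature([1, 2], [0], relevance, redundancy, alpha=0.25)
```

With a large α the highly relevant but redundant feature 1 is chosen; with a small α the less redundant feature 2 is chosen, which shows how the quantile shifts the feature ordering.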
Let X be the feature set formed by N features in total, S be the selected feature set, F be the feature set to be selected, then the specific steps for obtaining the optimal candidate feature subset are as follows:
(1) Set the selected feature set $S$ to the empty set and the candidate feature set $F$ to the full feature set, i.e., $S \to \varnothing$, $F \to X$;
(2) Calculate the correlation measure $W(x_i; y)$ between each feature $x_i$ and the label $y$, and sort all features by $W$ to form a new candidate feature set $F'$; record the first $k$ features and take each of them in turn as the initial feature;
(3) Set the quantile of the correlation and redundancy measures $\alpha_t \in \{0.1, 0.25, 0.5, 0.75, 0.9\}$ $(1 \le t \le T)$; let $x_i \in S_{m-1}$, $x_j \in X - S_{m-1}$ $(m = 2, \ldots, N)$; find in $F'_{m-1}$ the feature $x_{vt}^{**}$ that attains $\max \Phi(W(x_j; y), G(x_j; x_i), \alpha_t)$, and let $F'_{m-1} - \{x_{vt}^{**}\} \to F'_m$, $S_{m-1} + \{x_{vt}^{**}\} \to S_m$;
(4) Repeat steps (2)-(3) until $F'$ is empty, obtaining, for each initial feature, a matrix of $T \times n$ mutually nested candidate feature subsets; the overall feature set combination is $S = [S_1\ S_2\ \ldots\ S_v\ \ldots\ S_k]^{\mathrm{T}}$ $(1 \le v \le k)$;
(5) For the group of nested candidate feature subsets in $S_v$, use the SVM to verify the classification accuracy of each feature subset in turn, and record the candidate feature subset whose accuracy has stabilized and approaches the global maximum, together with its weight $\alpha_t^{*}$;
(6) Obtain the $k$ results in $S$ in turn as recorded in step (5); among the results that satisfy the threshold conditions, the candidate feature subset with the smaller dimension is the optimal candidate feature subset, where $\delta$ and $\delta'$ both denote preset thresholds.
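Steps (1) to (4) above can be sketched as a single search routine that returns one feature ordering per pair of initial feature and quantile; the prefixes of each ordering are the nested candidate subsets. The relevance and redundancy values are assumed to be precomputed, redundancy is averaged over the selected set (an assumption, as in classical mRMR), and all names and numbers are hypothetical.

```python
def improved_incremental_search(relevance, redundancy, k, alphas):
    """Return {(v, alpha): ordering}; each ordering's prefixes are nested subsets."""
    n = len(relevance)
    # Step (2): sort features by relevance to form F'.
    ranked = sorted(range(n), key=lambda j: relevance[j], reverse=True)
    orderings = {}
    for v in ranked[:k]:              # each of the top-k features as initial feature
        for alpha in alphas:          # step (3): each quantile alpha_t
            selected = [v]
            remaining = [j for j in ranked if j != v]
            while remaining:          # step (4): continue until F' is used up
                def score(j):
                    g = sum(redundancy[j][i] for i in selected) / len(selected)
                    return alpha * relevance[j] - (1 - alpha) * g
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            orderings[(v, alpha)] = selected
    return orderings

# Hypothetical precomputed measures for three features.
relevance = [0.9, 0.8, 0.5]
redundancy = [[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.2],
              [0.1, 0.2, 1.0]]
alphas = [0.1, 0.25, 0.5, 0.75, 0.9]
orderings = improved_incremental_search(relevance, redundancy, k=2, alphas=alphas)
```

With k initial features and T quantiles the routine produces k × T orderings, each of which would then be evaluated subset by subset with the classifier as in steps (5) and (6).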
A numerical case study of the method proposed in this embodiment is as follows:
The proposed MMRMR-based feature selection integration framework is verified on an IEEE 39-node power system containing wind power. The original feature set is a 63-dimensional feature set containing wind power response features. Using different correlation and redundancy evaluation criteria changes the "value" of each feature throughout the incremental search and thus its priority in the candidate subset.
In order to compare the feature selection algorithms and select the optimal correlation and redundancy evaluation criteria, the dimension should be reduced as far as possible while preserving classification accuracy, so as to shorten the training time of the subsequent model and improve model performance. The eight groups of nested candidate subsets are classified with the SVM, and classification accuracy is used as the evaluation standard, which indirectly reflects the feature selection effect; the classification results are shown in fig. 4.
As features are added one by one in order, the assessment accuracy first rises from an initial value and, after reaching an extreme value, levels off or fluctuates slightly. This shows that redundancy does exist among the features and confirms the need to select a suitable number of features: doing so reduces information redundancy, whereas too many features place a heavy burden on model training and degrade assessment performance. The MMRMR algorithms also differ in performance before reaching the peak. As the number of features increases, the accuracy of the mRMR-R-NMI algorithm is the first to reach the vicinity of its stable value, which indicates that, compared with the other MMRMR algorithms, mRMR-R-NMI reaches relatively high classification accuracy with the fewest features. The evaluation criterion based on the Relief score is better suited to power grid data under various penetration levels, and because NMI is normalized, features with fewer attribute values are also taken into account, which effectively overcomes the bias toward attributes with more values during feature selection. Therefore, mRMR-R-NMI is selected as the feature selection method, i.e., the Relief score is used as the correlation measure and NMI as the redundancy measure.
The incremental search algorithm is improved by enlarging the set of alternative initial features and introducing quantiles to refine the search space, which further improves the screening. The classification results for different values of k are shown in Fig. 5(a) and Fig. 5(b), respectively.
Refining the search space with different initial features and weights yields multiple groups of nested candidate subsets. Taking k=3 as an example, different feature sequences are generated under different quantiles α. Comparison shows that the features at the head and tail of the sequence change little, whereas the ordering near the optimal feature dimension changes considerably. This means that the intrinsic differences among those features are small, so a fine adjustment of the weights can change their order and thereby affect the final choice of the optimal subset. Overall, the quantile α = 0.25 performs best in the low and medium dimensions: with only 14 features the accuracy reaches 96.48%. Compared with the original feature set, this gives an efficient, compact representation of the information, avoids the accuracy loss that excessive redundant information can cause, improves model performance, and compresses the dimension to 22% of the original.
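A minimal sketch of such a weighted incremental search follows. The evaluation function used here, (1 - α) · relevance - α · mean redundancy to the already selected features, is an assumption made for illustration, not the criterion of the embodiment; `nested_subsets` and the sample scores are hypothetical.

```python
def nested_subsets(relevance, redundancy, k=3, alphas=(0.25, 0.5, 0.75)):
    """Return {(initial_feature, alpha): ordered list of feature indices}.

    relevance:  one relevance score per feature (e.g. a Relief score)
    redundancy: square matrix, redundancy[i][j] between features i and j (e.g. NMI)
    ASSUMED criterion: (1 - alpha) * relevance - alpha * mean redundancy to selected.
    """
    n = len(relevance)
    ranked = sorted(range(n), key=lambda i: relevance[i], reverse=True)
    results = {}
    for start in ranked[:k]:          # each of the top-k features as initial feature
        for alpha in alphas:          # each weight between relevance and redundancy
            selected = [start]
            remaining = [i for i in ranked if i != start]
            while remaining:
                def score(f):
                    red = sum(redundancy[f][s] for s in selected) / len(selected)
                    return (1 - alpha) * relevance[f] - alpha * red
                best = max(remaining, key=score)
                selected.append(best)
                remaining.remove(best)
            results[(start, alpha)] = selected
    return results

# Hypothetical scores: features 0 and 1 are both relevant but highly redundant.
relevance = [0.9, 0.85, 0.5, 0.3]
redundancy = [
    [1.0, 0.95, 0.1, 0.2],
    [0.95, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
results = nested_subsets(relevance, redundancy, k=2, alphas=(0.25, 0.75))
for key, order in results.items():
    print(key, order)
```

Each ordering defines a group of nested candidate subsets through its prefixes (the first feature, the first two features, and so on). With these toy scores a larger α pushes the redundant feature 1 to the end of the sequence, which mirrors the observation that small weight changes reorder features in the middle of the ranking.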
To further verify the superiority of the method proposed in this embodiment and of the selected feature subset, the selected subset A1 is compared with existing dimension reduction methods: classical mRMR, principal component analysis (PCA), the SVM-based recursive feature elimination method with correlation bias reduction (SVM-RFE-CBR), ReliefF (an improved version of Relief), Fisher, LASSO, and the regularized discriminative feature selection (UDFS) and InfFS algorithms for unsupervised learning. For PCA, two compression schemes are used: one retains 99% of the variance of the original dataset and the other compresses to the same dimension as A1, giving feature subsets A2 and A3, respectively. All other algorithms use the same dimension as A1 for the comparison. In the recursive feature elimination method, an SVM is used to score the individual features, with C and g set to $2^{0}$ and $2^{-6}$, respectively.
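The first PCA scheme, retaining 99% of the variance, amounts to choosing the smallest number of principal components whose cumulative explained variance reaches the target. A minimal sketch, assuming the eigenvalues of the covariance matrix are already available; the function name and the eigenvalues are hypothetical:

```python
def components_for_variance(eigenvalues, target=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `target`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for m, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= target:
            return m
    return len(eigenvalues)

# Hypothetical eigenvalues of a covariance matrix
eigs = [5.0, 3.0, 1.5, 0.3, 0.15, 0.05]
print(components_for_variance(eigs))        # dimension of a 99%-variance subset
print(components_for_variance(eigs, 0.90))  # a looser target needs fewer components
```

The second scheme needs no such search, since the number of components is simply fixed to the dimension of A1.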
A least squares support vector machine (LSSVM) with an RBF kernel is used as the evaluation model. The model parameters are optimized by combining grid search with ten-fold cross-validation, with the initial values of γ and σ² set to 10 and 0.5, respectively, followed by testing and validation. Accuracy (ACC), the Kappa statistic, and the area under the ROC curve (AUC) are chosen as the evaluation indicators, and their mean η is taken as the comprehensive evaluation index, as given in formula (4):
$$\eta = \frac{\mathrm{ACC} + \mathrm{Kappa} + \mathrm{AUC}}{3} \qquad (4)$$
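The comprehensive index can be sketched for a binary classifier as follows. This is an illustration only: it assumes Cohen's kappa for the Kappa statistic and the rank-based definition of AUC, and all function names and sample values are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    pe = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

def auc(y_true, scores):
    """AUC as the probability that a positive sample outranks a negative one
    (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def eta(y_true, y_pred, scores):
    """Mean of ACC, Kappa and AUC."""
    return (accuracy(y_true, y_pred) + cohen_kappa(y_true, y_pred)
            + auc(y_true, scores)) / 3

# Hypothetical labels and classifier scores
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred), auc(y_true, scores))
print(eta(y_true, y_pred, scores))
```

Because Kappa can be much lower than ACC on imbalanced data, averaging the three penalizes a classifier that scores well on accuracy alone.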
The final comparison of the different feature selection algorithms is shown in Table 2, and the time complexity of the feature selection process is shown in Fig. 6.
Table 2. Comparison of the optimal subsets selected by different feature selection algorithms
According to Table 2, at the same dimension the optimal subset selected by the method of this embodiment performs best, outperforming conventional mRMR and the other algorithms. PCA loses the physical meaning of the features while reducing the dimension, which makes the compressed subset less interpretable. In terms of time complexity, Fig. 6 shows that the SVM-RFE-CBR algorithm takes the longest during feature selection, and the Filter methods take far less time than this Wrapper algorithm. The main reason is that, while recursively eliminating features, SVM-RFE-CBR must repeatedly train the SVM to determine the optimal model parameters and score each feature according to the classification result, which greatly increases the computation time. The time complexity of the method of this embodiment is at a medium level among the Filter methods and lower than that of the LASSO algorithm, so it meets the requirement of rapid evaluation.
This embodiment adopts an improved MMRMR integrated selection method. Multiple MMRMR algorithms are derived by introducing several relevance and redundancy evaluation criteria, and they are fused intensively through corresponding strategies. Weight factors are introduced into the evaluation function and the initial feature search space is enlarged, yielding a group of nested candidate feature subsets. The optimal feature subset is then obtained by verifying the candidates one by one with a learning algorithm. The superiority of the selected optimal feature subset has been verified, and the requirement of rapid evaluation is met.
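The one-by-one verification can be sketched as follows. The sketch assumes that the chosen subset is the smallest prefix of a feature ordering whose accuracy lies within a tolerance of the best accuracy observed; the tolerance value, the `evaluate` callable (which in practice would train and score a classifier such as an SVM), and the accuracy figures are all hypothetical.

```python
def pick_subset(ordering, evaluate, tol=0.005):
    """Evaluate every nested prefix of `ordering` and return the smallest one
    whose accuracy is within `tol` of the best accuracy observed.

    ordering: ranked list of feature indices
    evaluate: callable(list_of_features) -> classification accuracy
    """
    accs = [evaluate(ordering[:d]) for d in range(1, len(ordering) + 1)]
    best = max(accs)
    for d, acc in enumerate(accs, start=1):
        if acc >= best - tol:
            return ordering[:d], acc

# Hypothetical accuracies by subset dimension, standing in for a trained classifier
acc_by_dim = {1: 0.80, 2: 0.90, 3: 0.960, 4: 0.963, 5: 0.961}
ordering = [7, 2, 9, 4, 1]
subset, acc = pick_subset(ordering, lambda feats: acc_by_dim[len(feats)])
print(subset, acc)
```

Here the three-feature prefix is preferred over the four-feature one even though the latter is marginally more accurate, reflecting the trade-off between dimension reduction and classification accuracy.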
Example II
The second embodiment of the present disclosure provides a feature selection integration system for large-scale wind power grid connection.
As shown in Fig. 7, the feature selection integration system for large-scale wind power grid connection comprises:
an acquisition module configured to acquire multidimensional feature data containing wind power grid-connection response features;
a construction module configured to perform an incremental search on the acquired multidimensional data features based on preset optimal relevance and redundancy evaluation criteria, and to construct multiple groups of nested candidate feature subsets;
a computing module configured to calculate the classification accuracy of the constructed groups of nested candidate feature subsets;
a selection integration module configured to record the candidate feature subset with the highest classification accuracy, verify the dimension of the recorded candidate feature subset, obtain the optimal candidate feature subset, and complete the feature selection integration.
The detailed steps are the same as those of the feature selection integration method including large-scale wind power grid connection provided in the first embodiment, and are not described herein again.
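The way the four modules hand data to one another can be sketched as a small pipeline. Everything here is illustrative: the class name, the callable signatures, and the tie-breaking rule (prefer the smaller subset at equal accuracy) are assumptions, and the stub modules stand in for real data acquisition, incremental search, and classifier evaluation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Subset = List[int]
Data = Tuple[Sequence[Sequence[float]], Sequence[int]]

@dataclass
class FeatureSelectionSystem:
    acquire: Callable[[], Data]                        # acquisition module
    construct: Callable[..., List[List[Subset]]]       # construction module
    compute: Callable[..., float]                      # computing module

    def run(self) -> Subset:
        """Selection integration module: keep the most accurate subset,
        preferring the smaller dimension when accuracies are equal (assumed rule)."""
        X, y = self.acquire()
        best_key, best_subset = None, None
        for group in self.construct(X, y):
            for subset in group:
                key = (self.compute(X, y, subset), -len(subset))
                if best_key is None or key > best_key:
                    best_key, best_subset = key, subset
        return best_subset

# Stub modules with hypothetical data and accuracies
system = FeatureSelectionSystem(
    acquire=lambda: ([[0.0, 1.0, 2.0]], [0]),
    construct=lambda X, y: [[[0], [0, 1], [0, 1, 2]], [[1], [1, 2], [1, 2, 0]]],
    compute=lambda X, y, s: {1: 0.80, 2: 0.95, 3: 0.95}[len(s)],
)
print(system.run())
```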
Example III
A third embodiment of the present disclosure provides a computer-readable storage medium.
A computer-readable storage medium having stored thereon a program which, when executed by a processor, performs the steps of the feature selection integration method for large-scale wind power grid connection according to the first embodiment of the present disclosure.
The detailed steps are the same as those of the feature selection integration method including large-scale wind power grid connection provided in the first embodiment, and are not described herein again.
Example IV
The fourth embodiment of the disclosure provides an electronic device.
An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the feature selection integration method for large-scale wind power grid connection according to the first embodiment of the present disclosure.
The detailed steps are the same as those of the feature selection integration method including large-scale wind power grid connection provided in the first embodiment, and are not described herein again.
The foregoing is merely a description of the preferred embodiments of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.
While the specific embodiments of the present disclosure have been described above with reference to the drawings, it should be understood that the present disclosure is not limited to the embodiments, and that various modifications and changes can be made by one skilled in the art without inventive effort on the basis of the technical solutions of the present disclosure while remaining within the scope of the present disclosure.

Claims (9)

1. A feature selection integration method for large-scale wind power grid connection, characterized in that it comprises:
acquiring multidimensional feature data containing wind power grid-connection response features;
performing an incremental search on the acquired multidimensional data features based on preset optimal relevance and redundancy evaluation criteria, and constructing multiple groups of nested candidate feature subsets;
calculating the classification accuracy of the constructed groups of nested candidate feature subsets;
recording the candidate feature subset with the highest classification accuracy, verifying the dimension of the recorded candidate feature subset, obtaining the optimal candidate feature subset, and completing the feature selection integration;
wherein, letting X be the feature set consisting of N features in total, S the selected feature set, and F the candidate feature set, the optimal candidate feature subset is obtained by the following steps:
(1) initialization: the selected feature set S is set to the empty set and the candidate feature set F is set to the full feature set, i.e. S = ∅ and F = X;
(2) the relevance measure W between each feature and the label y is calculated, all features are sorted by W to form a new candidate feature set, and the first k features are denoted in order; each of these features is taken in turn as the initial feature, that is, it is added to S and removed from F;
(3) the quantile α of the relevance and redundancy measures is then set, and the feature in the candidate set for which [formula] holds is sought; that feature is added to S and removed from F;
(4) steps (2) to (3) are repeated until [formula], yielding, for the different initial features, a T×N matrix of mutually nested candidate feature subsets; the total feature set combination is [formula];
(5) for the corresponding groups of nested candidate feature sets, an SVM is used to verify the classification accuracy of each feature subset in turn, and the candidate feature subset whose accuracy has stabilized and is close to the global maximum is recorded together with its associated weight, i.e. j satisfies [formula];
(6) the k results are obtained in turn and the results of step (5) are recorded; the candidate feature subset that satisfies [formula] and has the smaller dimension is the optimal candidate feature subset, where the two threshold symbols in these conditions both denote set thresholds.

2. The feature selection integration method for large-scale wind power grid connection according to claim 1, characterized in that the preset optimal relevance and redundancy evaluation criteria are determined as follows:
the acquired multidimensional feature data containing wind power grid-connection response features are normalized as preprocessing, and the processed multidimensional feature data are divided into an experimental dataset and a test dataset;
relevance criterion calculation and redundancy evaluation are performed on the experimental dataset to obtain the optimal feature subset;
based on the optimal feature subset, and considering both dimension reduction and classification accuracy, the optimal relevance and redundancy evaluations are obtained and the preset optimal relevance and redundancy evaluation criteria are determined.

3. The feature selection integration method for large-scale wind power grid connection according to claim 2, characterized in that, based on a comprehensive analysis of intensive fusion, the T-test, the χ² test, the feature score based on the Relief algorithm, the filter-type feature selection algorithm, the information gain ratio, the Kruskal-Wallis test, and the maximum correlation coefficient are selected as evaluation indicators for the experimental dataset; wherein the T-test uses t-distribution theory to compare whether the difference between the features of two experimental datasets is significant; the χ² test calculates the correlation between features and class labels to score their importance; the Relief algorithm scores features by calculating intra-class and inter-class sample distances; the filter-type feature selection algorithm selects features containing more discriminative information on the principle of small intra-class distance and large inter-class distance, and the larger its value, the more important the feature and the greater its correlation with the class; the information gain ratio scores features by calculating the rate of change of information entropy before and after the feature is used, and the larger its value, the more important the feature and the greater its correlation with the class; the Kruskal-Wallis test scores features by calculating whether the distributions of different features differ; and the maximum correlation coefficient is a new method for detecting nonlinear correlation between features, which converts the mutual information value into a new measure by finding the optimal discretization.

4. The feature selection integration method for large-scale wind power grid connection according to claim 3, characterized in that, with regard to redundancy between features, in order to balance relevance and redundancy and to overcome the bias of mutual information toward attributes with many values, mutual information is improved by normalization, the normalized form NMI being [formula], where the two entropy terms are the entropies of x and y, respectively; entropy measures the amount of information of an event that has multiple states, that is, the expected value of the amount of information with respect to the probability distribution of the event.

5. The feature selection integration method for large-scale wind power grid connection according to claim 2, characterized in that, in constructing the multiple groups of nested candidate feature subsets, the search is performed from different initial multidimensional data features; quantiles are introduced to divide the weight between the relevance measure and the redundancy; by iteratively assigning values to the introduced quantiles, the optimal feature sequences under different weights are obtained, and these sequences are the multiple groups of nested candidate feature subsets.

6. The feature selection integration method for large-scale wind power grid connection according to claim 1, characterized in that a support vector machine is used to calculate the classification accuracy of the obtained groups of nested candidate feature subsets, the quality of each group is verified on the basis of the classification accuracy, the nested candidate feature subset with the best classification performance is selected, and the candidate feature subset with the highest classification accuracy is thereby recorded.

7. A feature selection integration system for large-scale wind power grid connection, employing the feature selection integration method for large-scale wind power grid connection according to any one of claims 1-6, characterized in that it comprises:
an acquisition module configured to acquire multidimensional feature data containing wind power grid-connection response features;
a construction module configured to perform an incremental search on the acquired multidimensional data features based on preset optimal relevance and redundancy evaluation criteria, and to construct multiple groups of nested candidate feature subsets;
a computing module configured to calculate the classification accuracy of the constructed groups of nested candidate feature subsets;
a selection integration module configured to record the candidate feature subset with the highest classification accuracy, verify the dimension of the recorded candidate feature subset, obtain the optimal candidate feature subset, and complete the feature selection integration.

8. A computer-readable storage medium having a program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the feature selection integration method for large-scale wind power grid connection according to any one of claims 1-6.

9. An electronic device comprising a memory, a processor, and a program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the feature selection integration method for large-scale wind power grid connection according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211055817.XA CN115374874B (en) 2022-08-31 2022-08-31 A feature selection integration method and system for large-scale wind power grid connection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211055817.XA CN115374874B (en) 2022-08-31 2022-08-31 A feature selection integration method and system for large-scale wind power grid connection

Publications (2)

Publication Number Publication Date
CN115374874A CN115374874A (en) 2022-11-22
CN115374874B true CN115374874B (en) 2025-11-14

Family

ID=84069147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211055817.XA Active CN115374874B (en) 2022-08-31 2022-08-31 A feature selection integration method and system for large-scale wind power grid connection

Country Status (1)

Country Link
CN (1) CN115374874B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116421457B (en) * 2023-02-28 2025-09-16 山东大学 Cardiopulmonary resuscitation system and method based on noninvasive prediction of coronary perfusion pressure
CN118940973A (en) * 2024-10-12 2024-11-12 山东国研自动化有限公司 A big data-based energy operation command and management system and method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103278326A (en) * 2013-06-14 2013-09-04 上海电机学院 Method for diagnosing faults of wind generating set gear case
CN108875795A (en) * 2018-05-28 2018-11-23 哈尔滨工程大学 A kind of feature selecting algorithm based on Relief and mutual information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AUPQ131399A0 (en) * 1999-06-30 1999-07-22 Silverbrook Research Pty Ltd A method and apparatus (NPAGE02)
CN103699523B (en) * 2013-12-16 2016-06-29 深圳先进技术研究院 Product classification method and apparatus
CN108694470B (en) * 2018-06-12 2022-02-22 天津大学 Data prediction method and device based on artificial intelligence

Also Published As

Publication number Publication date
CN115374874A (en) 2022-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant