CN117874532A

CN117874532A - Short-term wind power output scene generation method for data-missing wind power plant

Info

Publication number: CN117874532A
Application number: CN202311543456.8A
Authority: CN
Inventors: 谭玉华; 张迁; 陈奇林; 余诺
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2023-11-17
Filing date: 2023-11-17
Publication date: 2024-04-12

Abstract

The invention relates to a short-term wind power output scene generation method for a data-missing wind power plant, comprising the following steps: acquiring source domain data from a reference wind power plant and a small amount of sample data from a target wind power plant, and screening and purifying the source domain data with a similar data domain matching method, so as to improve the accuracy of data migration from the reference wind power plant to the target wind power plant; pre-training a C-DCGAN model on the purified source domain data, and transferring the trained model, which captures the common characteristics, to the target wind power plant through transfer learning; and fine-tuning the transferred model with the small number of samples from the target wind power plant to obtain a new scene generation model. The method addresses the problem that wind power output scene generation models have difficulty accurately describing the actual wind power output characteristics and probability distribution, and improves the effect of transfer learning, the training of the C-DCGAN model, and the accuracy of wind power output scene generation for data-missing wind power plants.

Description

Short-term wind power output scene generation method for data-missing wind power plant

Technical Field

The invention relates to the technical field of renewable energy power generation, in particular to a method for generating a short-term wind power output scene of a data-missing wind power plant by considering space-time correlation.

Background

Against the background of the low-carbon transformation of the energy structure, the share of wind power generation in the power system keeps rising, and accurate modeling of wind power output is important for the rational planning and operation of the power system. Existing wind power output scene generation methods fall mainly into two groups: methods based on probability models and methods based on deep learning. Deep-learning-based methods do not require statistical assumptions about the data; they train and update a deep learning model on wind power output data, and can therefore describe the unknown distribution and complex characteristics of wind power output accurately, flexibly and effectively.

However, for some newly built or expanded wind power plants, historical data are missing or insufficient. Scenes generated for such plants by existing deep-learning-based methods do not reflect the uncertainty of wind power output well, and the accuracy of the scene data is reduced. In addition, some studies propose generating scenes for the data-missing wind power plant by exploiting the spatio-temporal correlation of the wind power output of adjacent wind power plants. However, not all adjacent wind power plants have output characteristics similar to those of the data-missing plant, and a low degree of similarity degrades both the data migration and the accuracy of scene generation.

Therefore, when generating output scenes for a data-missing wind power plant, how to screen effective adjacent wind power plants and their similar output data, and thereby generate accurate short-term wind power output scenes, is an important problem to be studied.

Disclosure of Invention

To address the problem that wind power output scene generation models have difficulty accurately describing the actual wind power output characteristics and probability distribution, a short-term wind power output scene generation method for a data-missing wind power plant is provided. The method combines and improves a similar data domain matching method, transfer learning (TL), a conditional deep convolutional generative adversarial network (C-DCGAN) and a parameter optimization method. It screens and purifies the source domain data of geographically adjacent wind power plants with sufficient data, extracts their similar wind power output characteristics, and transfers them to the data-missing wind power plant through learning and training, thereby obtaining an accurate wind power scene generation model. The invention provides the principle and flow framework of the method, and also gives a detailed mathematical description of the data sequences and principles involved. To verify the effectiveness and advantages of the method, simulation experiments and case comparisons were carried out on actual wind power data from the National Renewable Energy Laboratory (NREL); the results indicate that the method is more accurate and effective.

The technical scheme of the invention is as follows:

A short-term wind power output scene generation method for a data-missing wind power plant comprises the following steps:

Step one: acquiring source domain data from a reference wind power plant and a small amount of sample data from a target wind power plant, and screening and purifying the source domain data with a similar data domain matching method, so as to improve the accuracy of data migration from the reference wind power plant to the target wind power plant;

Step two: pre-training a C-DCGAN model on the purified source domain data, and transferring the trained model, which captures the common characteristics, to the target wind power plant through transfer learning;

Step three: fine-tuning the transferred model with the small number of samples from the target wind power plant to obtain a new scene generation model, which can accurately generate short-term wind power output scenes for the data-missing wind power plant.

Further, the similar data domain matching method in step one comprises the following steps:

Step 1.1: automatically segmenting the source domain data sequences of a plurality of reference wind power plants with the Toeplitz inverse covariance-based clustering (TICC) algorithm, so as to obtain, while retaining the power generation characteristics, source domain subsequences whose time length is similar to that of the target domain sequence of the target wind power plant. These data sequences are expressed mathematically as follows:

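The source domain data sequence is given by formula (1):

$$P_{ref} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,t} \\ \vdots & \vdots & \ddots & \vdots \\ x_{w,1} & x_{w,2} & \cdots & x_{w,t} \end{bmatrix} \tag{1}$$

The segmented source domain subsequences are given by formula (2):

$$P_{ref} = [X_1, X_2, \ldots, X_k] \tag{2}$$

The target domain sequence is given by formula (3):

$$P_{tgt} = [y_1, y_2, \ldots, y_T]^{\mathsf{T}} \tag{3}$$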

In formulas (1) to (3), $P_{ref}$ is the source domain data matrix formed by the output data of the reference wind power plants, a known multidimensional time-series input variable; the number of reference wind power plants is $w$, each row holds the output data of one reference wind power plant, each column holds the output data at one moment, and $t$ is the time length, i.e. the matrix contains output data at $t$ moments; $X_1, \ldots, X_k$ are the $k$ source domain output subsequences obtained after segmentation by the TICC algorithm and clustering according to the correlation between dimensions, where $X_i$ ($i = 1, \ldots, k$) is a matrix of dimension $w \times \Delta T_i$ in which each row holds the output data of one reference wind power plant and each column holds the output data at one moment; $\Delta T_1, \ldots, \Delta T_k$ are the time lengths of the subsequences and satisfy $\Delta T_1 + \Delta T_2 + \cdots + \Delta T_k = t$; $P_{tgt}$ is the target domain output sequence of the target wind power plant, $y$ denotes the wind power output of the target wind power plant at each moment, and $T$ is the time length of that output data;

Step 1.2: quantitatively calculating the similarity between each source domain subsequence and the target domain sequence with the dynamic time warping (DTW) algorithm. A warping function is first used to construct a nonlinear warping path between the source domain subsequence and the target domain sequence, which realizes a nonlinear mapping between sequences of different time lengths. For the $j$-th subsequence of the $i$-th reference wind power plant, a warping path to the target wind power plant output sequence $P_{tgt}$ is defined. To calculate the distance, a cumulative distance matrix $L$ of dimension $\Delta T_j \times T$ is constructed from the data of the two sequences, and each element of $L$ is calculated by formula (4):

where $x_{i,m}$ is the output data of the $i$-th reference wind power plant at the $m$-th moment, $y_n$ is the output data of the target wind power plant at the $n$-th moment, and $l_{m,n}$ is the element in the $m$-th row and $n$-th column of the cumulative distance matrix $L$;

The matrix element $l_{\Delta T_j, T}$ is the minimum warping path distance between the two sequences. On this basis, the similarity contribution coefficient between the two sequences is calculated by formula (5). The larger the coefficient, the higher the similarity between the $j$-th subsequence of the $i$-th reference wind power plant and the output sequence $P_{tgt}$ of the target wind power plant, i.e. the more similar their output characteristics;

where the quantities in formula (5) are the similarity contribution coefficient, the minimum warping path distance between the $j$-th subsequence of the $i$-th reference wind power plant and the target wind power plant output sequence, and the maximum of all minimum warping path distances;
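As a non-limiting illustration of step 1.2, the following Python sketch computes a cumulative distance matrix and a similarity contribution coefficient. Formulas (4) and (5) are not reproduced in this text, so two assumptions are made: the local distance between two samples is taken as their absolute difference, and the coefficient is taken as one minus the ratio of a distance to the largest distance. The function names `dtw_distance` and `similarity_coefficients` are hypothetical.

```python
def dtw_distance(subsequence, target):
    """Minimum warping-path distance between two 1-D sequences of any lengths."""
    inf = float("inf")
    rows, cols = len(subsequence), len(target)
    # acc[m][n] plays the role of l_{m,n}; row 0 and column 0 are padding.
    acc = [[inf] * (cols + 1) for _ in range(rows + 1)]
    acc[0][0] = 0.0
    for m in range(1, rows + 1):
        for n in range(1, cols + 1):
            # Assumed local distance: absolute difference of the two samples.
            cost = abs(subsequence[m - 1] - target[n - 1])
            acc[m][n] = cost + min(acc[m - 1][n], acc[m][n - 1], acc[m - 1][n - 1])
    return acc[rows][cols]


def similarity_coefficients(distances):
    """Assumed mapping 1 - d / d_max: a smaller distance gives a larger coefficient."""
    d_max = max(distances)
    if d_max == 0:
        return [1.0 for _ in distances]
    return [1.0 - d / d_max for d in distances]


# Sequences of different lengths can still match exactly after warping.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
print(similarity_coefficients([0.0, 2.0, 4.0]))
```

With this assumed mapping, the subsequence with the largest warping distance receives a coefficient of 0 and an exactly matching subsequence receives 1.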

Step 1.3: calculating the similarity contribution coefficient by formula (5) for the source domain subsequences of every reference wind power plant against the target domain sequence of the target wind power plant, which yields a similarity contribution coefficient matrix $C$ of dimension $w \times k$, as shown in formula (6):

A reasonable threshold for the similarity contribution coefficient is then set, and each element of $C$ is compared with it. When the elements of a column of $C$ are larger than the threshold, the reference wind power plant source domain subsequence corresponding to that column is taken as new source domain data. For example, if all elements of columns $i, \ldots, j, \ldots, k$ are greater than the threshold, the new source domain data are given by formula (7):
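The threshold comparison of step 1.3 can be sketched in Python as follows. The sketch keeps a column only when every element in it exceeds the threshold, following the example given in the text; the function name `screen_subsequences`, the coefficient values and the threshold of 0.6 are hypothetical.

```python
def screen_subsequences(coeff_matrix, threshold):
    """Return the indices of columns of C whose elements all exceed the threshold."""
    n_cols = len(coeff_matrix[0])
    return [
        j for j in range(n_cols)
        if all(row[j] > threshold for row in coeff_matrix)
    ]


# Hypothetical 3 x 3 matrix C: 3 reference plants (rows), 3 subsequences (columns).
C = [
    [0.9, 0.4, 0.80],
    [0.7, 0.9, 0.75],
    [0.8, 0.6, 0.90],
]
kept = screen_subsequences(C, threshold=0.6)
print(kept)  # columns retained as new source domain data
```

Here the second column is dropped because one reference plant falls below the threshold, so only the first and third subsequences would be carried forward.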

Further, the specific operation of step two is as follows:

Step 2.1: establishing the basic C-DCGAN model and, based on it, a wind power plant output scene generation model. The loss functions of the generator and the discriminator are constructed; combining formulas (8) and (9), the Wasserstein distance, its Kantorovich-Rubinstein dual form and a gradient penalty function are further introduced as improvements, and the objective function of the C-DCGAN training process is then given by formula (10):

where $L_G$ and $L_D$ are the objective functions of the generator and the discriminator, $E[\cdot]$ denotes the expected value over the corresponding data distribution, $D(\cdot)$ is the discriminator function, $G(\cdot)$ is the generator function, $Z$ is the input noise data, $c$ is the conditional input, $P$ is the historical wind power data, the interpolated sample is drawn at random on the line connecting generated data and historical wind power data, $\varepsilon$ is a random number in the range 0 to 1, the weight coefficient is $\lambda = 10$, and $P_Z$, $P_P$ and $P_G$ are the probability distributions of the noise data, the historical wind power data and the generated data, respectively;

Over multiple rounds of adversarial training between the generator and the discriminator, the probability distribution of the generated data $G(Z|c)$ and the network structure parameters $\theta^{(G)}$ and $\theta^{(D)}$ of the generator and the discriminator are continuously adjusted, optimized and updated, and a trained model that can accurately generate wind power scenes is finally obtained;
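For orientation, one common way to write a conditional Wasserstein objective with gradient penalty, using the symbols defined above, is sketched below. This is the standard WGAN-GP form and is given as an assumption; it is not claimed to be the exact content of formulas (8) to (10). The symbol $\hat{P}$ for the interpolated sample is likewise assumed.

```latex
\hat{P} = \varepsilon P + (1-\varepsilon)\, G(Z \mid c), \qquad \varepsilon \sim U[0,1]

L_G = -\,\mathbb{E}_{Z \sim P_Z}\big[ D(G(Z \mid c) \mid c) \big]

L_D = \mathbb{E}_{Z \sim P_Z}\big[ D(G(Z \mid c) \mid c) \big]
      - \mathbb{E}_{P \sim P_P}\big[ D(P \mid c) \big]
      + \lambda\, \mathbb{E}_{\hat{P}}\Big[ \big( \lVert \nabla_{\hat{P}} D(\hat{P} \mid c) \rVert_2 - 1 \big)^2 \Big]
```

In this form the discriminator is trained to separate real from generated samples while the penalty term, weighted by $\lambda = 10$, keeps the norm of its gradient close to 1 on the interpolated samples.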

Step 2.2: determining the parameter optimization algorithm of the C-DCGAN model. The network structure parameters are updated with RMSProp, a gradient descent algorithm with an adaptive learning rate, and the parameters are trained in mini-batches: the gradients of the parameters are calculated from the loss functions of the generator and the discriminator, and the network structure parameters are then updated in combination with the learning rate, as given by formulas (11) to (14):

where the two gradient terms are the gradients of the generator and the discriminator, $N_{bat}$ is the batch size, $\alpha$ is the learning rate, $Z^{(i)}$, $P^{(i)}$ and $c^{(i)}$ are the $i$-th group of noise input, historical wind power data and condition input, the $i$-th interpolated sample is drawn at random on the line connecting generated data and historical wind power data, $\theta'^{(G)}$ and $\theta'^{(D)}$ are the updated generator and discriminator network parameters, $r_{(\sim)}$ is the cumulative squared gradient with initial value 0, which is replaced by the updated cumulative squared gradient after each step, $\delta$ is a small constant set to $1 \times 10^{-6}$, and $\rho$ is the decay rate, set to 0.9;

an improved C-DCGAN model is obtained after optimization;
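The RMSProp update of step 2.2 can be sketched as follows for a flat list of parameters. The values $\delta = 1 \times 10^{-6}$ and $\rho = 0.9$ come from the text; the learning rate value, the placement of $\delta$ inside the square root and the function name `rmsprop_step` are assumptions, since formulas (11) to (14) are not reproduced in this text. In practice the gradients would be averages over a batch of $N_{bat}$ samples.

```python
import math


def rmsprop_step(params, grads, cache, lr=1e-3, rho=0.9, delta=1e-6):
    """One RMSProp update; returns the new parameters and the new squared-gradient cache."""
    new_params, new_cache = [], []
    for theta, g, r in zip(params, grads, cache):
        # Exponentially decayed accumulation of the squared gradient.
        r_new = rho * r + (1.0 - rho) * g * g
        # Per-parameter step scaled by the root of the accumulated squared gradient.
        theta_new = theta - lr * g / math.sqrt(delta + r_new)
        new_params.append(theta_new)
        new_cache.append(r_new)
    return new_params, new_cache


# The cumulative squared gradient starts at 0, as stated in the text.
params, cache = rmsprop_step([1.0], [2.0], [0.0], lr=0.1)
print(params, cache)
```

Because the step is divided by the root of the accumulated squared gradient, parameters with persistently large gradients take smaller effective steps, which is the adaptive learning rate behaviour referred to above.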

Step 2.3: transferring the model trained on the reference wind power plant to the target wind power plant, and further training the scene generation model with target wind power plant data:

When the improved C-DCGAN model is used to model the wind power output scenes of a wind power plant, if the screened and purified reference wind power plant source domain data $P'_{ref}$ are used as the real historical wind power data, i.e. $P = P'_{ref}$, then multiple rounds of adversarial training of the generator and the discriminator yield a trained model that carries the output characteristics of the reference wind power plant, together with its generator and discriminator network structure parameters.

This reference model is then transferred to the target wind power plant through transfer learning for generating its output scenes. To obtain a trained model that fully and accurately reflects the output characteristics of the target wind power plant, the transferred model must be further adjusted and trained with the historical wind power data of the target wind power plant. That is, the network structure parameters of the improved C-DCGAN model are inherited from the reference model, the real data input of the discriminator is the historical wind power data of the target wind power plant, i.e. $P = P_{tgt}$, and the generator and the discriminator are trained adversarially on this basis. When training ends, a trained model with the output characteristics of the target wind power plant is obtained, with its own generator and discriminator network structure parameters.
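The pre-train, transfer and fine-tune sequence of step 2.3 can be illustrated with a deliberately small example. The sketch below swaps the C-DCGAN for a one-parameter linear model fitted by gradient descent, purely to show the effect of starting the target training from transferred parameters; the function `train`, the sample values and the learning settings are all hypothetical.

```python
def train(weight, samples, lr=0.05, epochs=200):
    """Gradient descent on the mean squared error of the toy model y = weight * x."""
    for _ in range(epochs):
        grad = sum(2.0 * (weight * x - y) * x for x, y in samples) / len(samples)
        weight -= lr * grad
    return weight


# Plentiful "reference" samples (slope 2.0) and only two "target" samples (slope 2.2).
source_samples = [(0.1 * i, 0.2 * i) for i in range(1, 21)]
target_samples = [(0.5, 1.1), (1.0, 2.2)]

pretrained = train(0.0, source_samples)                     # pre-training on source data
fine_tuned = train(pretrained, target_samples, epochs=20)   # start from transferred weight
from_scratch = train(0.0, target_samples, epochs=20)        # same budget, no transfer

print(pretrained, fine_tuned, from_scratch)
```

With the same short training budget on the target samples, the model that starts from the transferred weight ends closer to the target slope than the one trained from zero, which mirrors the motivation for inheriting the reference model parameters.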

Further, the specific operation of step three is as follows:

The short-term wind power output scene of the data-missing wind power plant is generated on the basis of transfer learning, the improved C-DCGAN model and parameter optimization. To obtain the short-term wind power output scene of the target wind power plant on a specific day, constraints are imposed on the randomly generated noise input $Z$, and the specific value of $Z$ is obtained by optimization. Given the complete historical wind power data of the reference wind power plant and a small amount of historical wind power data of the target wind power plant, if the wind power output data of the target wind power plant on day $i$ are missing, the output scene of the target wind power plant can be generated by the following steps:

Step 3.1: defining random noise data $Z$ of dimension $1 \times 100$ and inputting it into the trained model $M_{des}$. The generator produces 576 wind power output values, which are divided in time order into two parts, $G_1(Z)$ and $G_2(Z)$, corresponding to the generated scenes of the target wind power plant on day $i-1$ and day $i$, respectively;
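The split in step 3.1 can be sketched as follows. With the 5-minute sampling interval of the data set used in the experiments, one day corresponds to 288 values, so the 576 generated values are assumed here to divide into two equal halves; the function name `split_two_days` is hypothetical and a plain list stands in for the generator output.

```python
POINTS_PER_DAY = 24 * 60 // 5  # 288 samples per day at a 5-minute interval


def split_two_days(generated):
    """Split a 576-value generated sequence into day i-1 and day i, in time order."""
    if len(generated) != 2 * POINTS_PER_DAY:
        raise ValueError("expected exactly two days of 5-minute data")
    return generated[:POINTS_PER_DAY], generated[POINTS_PER_DAY:]


# Stand-in for the generator output G(Z): 576 placeholder values.
g1, g2 = split_two_days(list(range(576)))
print(len(g1), len(g2))
```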

Step 3.2: establishing a parameter optimization model for short-term wind power output scene generation of the data-missing wind power plant, whose objective function and constraints are given by formula (15). The model requires the data of $G_1(Z)$ to match the historical wind power data of the target wind power plant on day $i-1$ as closely as possible, and requires the data of $G_2(Z)$ to fluctuate within a certain interval around the historical wind power data of the reference wind power plant on day $i$, i.e. to satisfy the spatio-temporal correlation of adjacent wind power plants;

where the two data terms in formula (15) are the historical wind power data of the target wind power plant on day $i-1$ and the historical wind power data of the reference wind power plant on day $i$ after data screening and purification, and $\sigma$ is the output interval control parameter;

Step 3.3: converting formula (15) into the unconstrained optimization problem of formula (16) by introducing a logarithmic barrier function, and solving it with the momentum gradient descent method to obtain $Z$;

where $\tau$ and $\nu$ are weight parameters;
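One plausible way to write the constrained problem and its logarithmic-barrier counterpart is sketched below. The squared-error fit term, the symmetric interval of half-width $\sigma$, the symbols $P_{tgt}^{\,i-1}$ and $P_{ref}^{\prime\,i}$, and the way $\tau$ and $\nu$ weight the two barrier terms are all assumptions, since formulas (15) and (16) are not reproduced in this text.

```latex
\min_{Z}\ \big\lVert G_1(Z) - P_{tgt}^{\,i-1} \big\rVert_2^2
\quad \text{s.t.} \quad
P_{ref}^{\prime\, i} - \sigma \;\le\; G_2(Z) \;\le\; P_{ref}^{\prime\, i} + \sigma

\min_{Z}\ \big\lVert G_1(Z) - P_{tgt}^{\,i-1} \big\rVert_2^2
 \;-\; \tau \sum_{n} \log\!\big( G_2(Z)_n - P_{ref,n}^{\prime\, i} + \sigma \big)
 \;-\; \nu \sum_{n} \log\!\big( P_{ref,n}^{\prime\, i} + \sigma - G_2(Z)_n \big)
```

Each logarithm tends to minus infinity as $G_2(Z)$ approaches the corresponding bound, so the barrier terms grow without limit near the edges of the interval and keep the solution strictly inside it.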

Step 3.4: inputting the optimized $Z$ into the trained model, which then generates the short-term output scene of the target wind power plant on day $i$.
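The optimization of $Z$ by momentum gradient descent on a log-barrier objective can be sketched as follows. This is a toy illustration under stated assumptions: the trained generator is replaced by a hypothetical linear function `toy_generator`, the objective uses a squared-error fit term with barrier weights `tau` and `nu`, gradients are taken by finite differences, and a step-halving guard is added so that the iterate never leaves the allowed interval. None of these details are taken from formulas (15) and (16).

```python
import math


def barrier_objective(z, generator, prev_day, ref_day, sigma, tau, nu):
    """Fit term for day i-1 plus log barriers keeping day i inside ref +/- sigma."""
    g1, g2 = generator(z)
    value = sum((a - b) * (a - b) for a, b in zip(g1, prev_day))
    for g, r in zip(g2, ref_day):
        lower_gap = g - (r - sigma)
        upper_gap = (r + sigma) - g
        if not (lower_gap > 0 and upper_gap > 0):
            return float("inf")  # outside the allowed interval
        value -= tau * math.log(lower_gap) + nu * math.log(upper_gap)
    return value


def numeric_gradient(f, z, h=1e-6):
    """Central-difference gradient, used to avoid assuming a generator Jacobian."""
    grad = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + h] + z[i + 1:]
        down = z[:i] + [z[i] - h] + z[i + 1:]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def momentum_descent(f, z0, lr=0.01, beta=0.9, steps=300):
    """Gradient descent with momentum; returns the best feasible point visited."""
    z, velocity = list(z0), [0.0] * len(z0)
    best, best_value = list(z0), f(z0)
    for _ in range(steps):
        grad = numeric_gradient(f, z)
        velocity = [beta * v - lr * g for v, g in zip(velocity, grad)]
        candidate = [zi + vi for zi, vi in zip(z, velocity)]
        halvings = 0
        while not math.isfinite(f(candidate)) and halvings < 60:
            velocity = [0.5 * v for v in velocity]  # shrink the step back into the interval
            candidate = [zi + vi for zi, vi in zip(z, velocity)]
            halvings += 1
        if not math.isfinite(f(candidate)):
            velocity = [0.0] * len(z)  # give up on this step and stay in place
            continue
        z = candidate
        value = f(z)
        if value < best_value:
            best, best_value = list(z), value
    return best


def toy_generator(z):
    # Hypothetical stand-in for the trained generator: two day i-1 values, one day i value.
    return [z[0], z[1]], [z[0] + z[1]]


def objective(z):
    return barrier_objective(z, toy_generator, [1.0, 2.0], [2.5], sigma=0.2, tau=0.01, nu=0.01)


start = [1.0, 1.5]  # feasible starting point: day i value 2.5 lies inside (2.3, 2.7)
z_opt = momentum_descent(objective, start)
print(z_opt, toy_generator(z_opt))
```

In this toy problem the unconstrained best fit would push the day $i$ value to 3.0, outside the interval, so the barrier holds the result just below the upper bound of 2.7 while the fit to day $i-1$ improves relative to the starting point.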

Further, in step one, the source domain data of the reference wind power plant and the small amount of sample data of the target wind power plant are acquired as follows: the public wind power data set released by the U.S. National Renewable Energy Laboratory is selected as the experimental data source for calculation and analysis; in the experiment, 4 geographically adjacent wind power plants are selected for testing, of which 3 are set as reference wind power plants with sufficient data, whose data serve as source domain data, and the remaining 1 is set as the target wind power plant, which is assumed to have missing wind power data.

The invention has the beneficial effects that:

(1) The proposed similar data domain matching method screens and purifies the source domain data of adjacent reference wind power plants in advance, which improves, to a certain extent, the effect of transfer learning, the training of the C-DCGAN model, and the accuracy of wind power output scene generation for data-missing wind power plants.

(2) According to the invention, through transfer learning and C-DCGAN model training, similar wind power output characteristics of adjacent reference wind power plants can be accurately extracted and transferred to a target wind power plant with data loss, so that the problem of too few training samples caused by data loss or deficiency of newly built or expanded wind power plants is effectively solved.

(3) The method provided by the invention can accurately generate a short-term wind power scene of a specific day of the data-missing wind power plant, and compared with other methods, the result can better show the actual wind power output characteristics and distribution rules, and the error is smaller.

(4) The method provided by the invention can be popularized and applied to scene generation and prediction of other new energy stations, and can provide theoretical technical support for engineering practical problems such as operation planning of a novel power system or a comprehensive energy system.

Drawings

FIG. 1 is a schematic diagram of the overall concept of the method of the present invention;

FIG. 2 is a diagram of known data and missing data of a reference wind farm and a target wind farm according to the present invention;

FIG. 3 is a diagram of a short-term wind power output scenario generation process for a data loss wind farm of the present invention;

FIG. 4 is a graph comparing the wind power output scenes generated by the method of the present invention with measured historical wind power output data, wherein (a) is April 5, 2012 and (b) is November 12, 2012;

FIG. 5 is a graph comparing the autocorrelation coefficients, at different time lags, of the wind power output scenes generated by the method and of the measured historical wind power output data, wherein (a) is April 5, 2012 and (b) is November 12, 2012;

FIG. 6 is a graph comparing the probability density functions and cumulative distribution functions of the wind power output scenes generated by the method and of the measured historical wind power output data, wherein (a) is April 5, 2012 and (b) is November 12, 2012.

Detailed Description

The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.

Some technical term concepts within the present invention are as follows:

Target wind farm: a newly built or expanded wind power plant whose wind power output data are missing.

Reference wind farm: a wind power plant with sufficient wind power output data that is geographically adjacent to the target wind farm.

Target domain data: historical wind power output data of the target wind farm.

Source domain data: historical wind power output data of the reference wind farms.

Conditional deep convolutional generative adversarial network (C-DCGAN): a generative model that learns through two neural networks competing against each other. The basic idea comes from the two-player zero-sum game in game theory; the two players are a generator and a discriminator, each built from a deep convolutional neural network. The C-DCGAN also combines supervised and unsupervised learning and introduces a condition variable as input, so that it can learn the sample probability distribution that satisfies the given condition and generalizes well to data samples of a specified type.

Generator and discriminator: both are deep learning networks composed of several convolution layers, batch normalization layers and activation functions. The generator learns the latent characteristics of the real data and tries to generate fake data that are as close as possible to the real data; the discriminator tries to distinguish the real data from the fake data produced by the generator. The discrimination result is fed back to both networks for further training and parameter updates, and training stops when the discriminator can no longer tell generated data from real data, i.e. when the generator and the discriminator reach a Nash equilibrium.

Transfer learning: an important machine learning method with strong data fitting and generalization capability. Its core idea is to start from similarity and apply the samples, features, models or relations of one domain to another domain through learning and transfer.

Noise data Z: a set of randomly generated data following a Gaussian distribution, used as the noise input of the conditional deep convolutional generative adversarial network.

Regarding deep-learning-based wind power output scene generation, document [1] proposes using a generative adversarial network (GAN) to learn the spatio-temporal correlation of renewable energy output, uses the optimal transport distance as the loss function of the discriminator to improve training quality, and establishes a GAN-based renewable energy output scene generation model. Building on this, document [2] uses a CGAN model, which strengthens the learning objective by incorporating supervised learning and introducing condition variable constraints into the adversarial learning structure of the GAN. That method uses the optimal transport distance as the loss function of the discriminator and designs a network structure suited to day-ahead renewable energy scene generation; through adversarial training, the generator learns the mapping between the noise distribution and the set of day-ahead scenes under the forecast condition, and the results show that the model describes day-ahead wind power uncertainty more accurately. However, deep-learning-based scene generation methods presuppose a large amount of historical data. For some newly built or expanded new energy stations, historical data are missing or insufficient, which makes it hard for conventional scene generation methods to extract the data distribution; the generated scenes then fail to reflect the uncertainty of new energy output well, and the accuracy of the scene data is reduced.

Regarding wind power output scene generation for data-missing wind power plants, document [3] uses a generative adversarial network to learn the data mapping between two wind power plants, and thereby migrates data from a data-sufficient wind power plant to a data-missing one. However, that document selects only one adjacent wind power plant for training and data migration, so the migrated data features tend to be one-sided and can hardly represent the actual wind power output characteristics of the data-missing plant completely. Document [4] overcomes this problem: it learns the wind speed correlation between several data-sufficient wind power plants and a newly built wind power plant with a joint distribution adaptation method, and proposes a transfer-learning-based wind speed scene generation model for the newly built plant. It should be noted, however, that not all adjacent wind power plants have output characteristics similar to those of the data-missing plant, and a low degree of similarity degrades both the data migration and the accuracy of scene generation. Therefore, when generating output scenes for a data-missing wind power plant, how to screen effective adjacent wind power plants and their similar output data, and thereby generate accurate short-term wind power output scenes, is an important problem to be studied.

The references are as follows:

[1] Chen Y, Wang Y, Kirschen D, et al. Model-free renewable scenario generation using generative adversarial networks[J]. IEEE Transactions on Power Systems, 2018, 33(3): 3265-3275.

[2] Dong Xiao, Sun Yingyun, Pu Tianjiao. Day-ahead scene generation method for renewable energy based on conditional generative adversarial networks[J]. Proceedings of the CSEE, 2020, 40(17): 5527-5536. DOI: 10.13334/j.0258-8013.pcsee.190633.

[3] Zhang Chengsheng, Shao Zhenguo, Chen Feixiong, et al. A new energy power generation scene data migration method based on conditional deep convolutional generative adversarial networks[J]. Power System Technology, 2022, 46(06): 2182-2190. DOI: 10.13335/j.1000-3673.pst.2021.1008.

[4] Hu J, Li H. A transfer learning-based scenario generation method for stochastic optimal scheduling of microgrid with newly-built wind farm[J]. Renewable Energy, 2022, 185: 1139-1151.

Considering that adjacent wind power plants are spatio-temporally correlated and have similar wind power output characteristics, the invention introduces transfer learning and a generative adversarial network model to generate accurate short-term wind power output scenes for the data-missing wind power plant. The approach comprises three steps:

1. Acquire a large amount of data from the reference wind power plants as source domain data and a small amount of data from the target wind power plant as target domain data, and screen and purify the source domain data with the proposed similar data domain matching method, so as to improve the accuracy of data migration from the reference wind power plants to the target wind power plant.

2. Pre-train a C-DCGAN model on the purified source domain data, and transfer the trained model, which captures the common characteristics, to the target wind power plant through transfer learning.

3. Fine-tune the transferred model with the small number of samples from the target wind power plant to obtain a new scene generation model that can accurately generate the output scenes of the data-missing wind power plant.

For data acquisition, the invention selects the public wind power data set released by the U.S. National Renewable Energy Laboratory as the experimental data source for calculation and analysis. The data set records a variety of wind power data monitored at a plurality of wind power plants over 7 years (January 1, 2007 to December 31, 2013) at a time interval of 5 minutes. In the experiment, 4 geographically adjacent wind power plants are selected for testing, which ensures that their wind power outputs are spatio-temporally correlated. Three of them are set as reference wind power plants with sufficient data, whose data serve as source domain data; one year of complete wind power data, from January 1, 2012 to December 31, 2012 (105120 groups), is selected. The remaining one is set as the target wind power plant; 2 months of its wind power data (from March to April 4 and from October 11 to November) are selected as target domain data, and its wind power data on April 5, 2012 and November 12, 2012 are assumed to be missing.

(1) Similar data domain matching method

When transfer learning is used to build an output scene generation model for the data-missing target wind farm, sufficient similarity or correlation between the target wind farm and the reference wind farms must be ensured in advance. The invention therefore proposes a similar data domain matching method that screens and purifies the source domain data of the reference wind farms as a preprocessing step, so as to improve the accuracy of transfer learning. The method comprises the following steps:

Step1: The source domain data of the multiple reference wind farms (the source domain data sequence) are automatically segmented with the Toeplitz inverse covariance-based clustering (TICC) algorithm. While preserving the power generation characteristics, this yields a large number of subsequences (the segmented source domain subsequences) whose time lengths are similar to that of the target domain data (the target domain sequence) of the target wind farm. These data sequences are expressed mathematically as follows:

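Source domain data sequence, formula (1):

$$P_{ref} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,t} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,t} \\ \vdots & \vdots & & \vdots \\ x_{w,1} & x_{w,2} & \cdots & x_{w,t} \end{bmatrix} \quad (1)$$

Segmented source domain subsequences, formula (2):

$$P_{ref} = [X_1, X_2, \ldots, X_k] \quad (2)$$

Target domain sequence, formula (3):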

$$P_{tgt} = [y_1, y_2, \ldots, y_T]^{\mathrm{T}} \quad (3)$$

In formulas (1) to (3), $P_{ref}$ denotes the source domain data matrix formed by the output data of the reference wind farms and is a known multidimensional time-series input variable; the number of reference wind farms is $w$, each row holds the output data of one reference wind farm, each column holds the output data at one time step, and $t$ is the time length, i.e. the matrix contains output data at $t$ time steps. $X_1, \ldots, X_k$ denote the $k$ source domain output subsequences obtained after segmentation by the TICC algorithm and clustering according to the correlation between dimensions; each subsequence is a matrix of dimension $w \times \Delta T_i$ ($i = 1, \ldots, k$), in which each row holds the output data of one reference wind farm and each column the output data at one time step. $\Delta T_1, \ldots, \Delta T_k$ denote the time lengths of the subsequences and satisfy $\Delta T_1 + \Delta T_2 + \cdots + \Delta T_k = t$. $P_{tgt}$ denotes the target domain output sequence of the target wind farm, in which $y$ is the wind power output of the target wind farm at each time step and $T$ is the time length of that output data.
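The data layout described above can be illustrated with a small NumPy sketch. It only shows how a $w \times t$ matrix is cut along the time axis into $k$ blocks of widths $\Delta T_i$; it does not implement TICC. The segment lengths in `delta_T` are hypothetical values standing in for the boundaries that TICC would produce.

```python
import numpy as np

w, t = 3, 12                                   # toy sizes: 3 reference farms, 12 time steps
P_ref = np.arange(w * t, dtype=float).reshape(w, t)

# Hypothetical segment lengths Delta T_1..Delta T_k (in the method they come from TICC).
delta_T = [5, 3, 4]
assert sum(delta_T) == t                       # the lengths must add up to t

boundaries = np.cumsum(delta_T)[:-1]           # split points along the time axis
subsequences = np.split(P_ref, boundaries, axis=1)

for i, X in enumerate(subsequences, start=1):
    print(f"X_{i} has shape {X.shape}")        # each block is w x Delta T_i
```

Concatenating the blocks again along the time axis recovers the original matrix, which is the sense in which the subsequence lengths sum to $t$.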

Step2: The time lengths $\Delta T_1, \ldots, \Delta T_k$ of the source domain subsequences produced by TICC segmentation are not necessarily equal to the time length $T$ of the target domain sequence, so similarity measures that directly compute a point-wise distance between sequences (such as the Euclidean, Manhattan or Chebyshev distance) are no longer applicable. The similarity between each source domain subsequence and the target domain sequence is therefore quantified with the dynamic time warping (DTW) algorithm. DTW first uses a warping function to construct a nonlinear warping path between the source domain subsequence and the target domain sequence, which realizes a nonlinear mapping between sequences of different time lengths. To calculate the distance between the $j$-th subsequence of the $i$-th reference wind farm and the target wind farm output sequence $P_{tgt}$, a cumulative distance matrix $L$ of dimension $\Delta T_j \times T$ is constructed from the data of the two sequences, and each element of $L$ is calculated by formula (4):

where $x_{i,m}$ denotes the output of the $i$-th reference wind farm at the $m$-th time step, $y_n$ denotes the output of the target wind farm at the $n$-th time step, and $l_{m,n}$ denotes the element in the $m$-th row and $n$-th column of the cumulative distance matrix $L$.

The last element of the matrix, $l_{\Delta T_j, T}$, gives the minimum warping path distance between the two sequences. On this basis, the similarity contribution coefficient between the two sequences is calculated by formula (5). The larger this coefficient, the higher the similarity between the $j$-th subsequence of the $i$-th reference wind farm and the target wind farm output sequence $P_{tgt}$, i.e. the more similar their output characteristics.
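A minimal Python sketch of the cumulative distance matrix used in Step2 is given below. It assumes the standard DTW recursion with the absolute difference as the local distance; formula (4) is not reproduced in this text, so both choices are assumptions, and the function name `dtw_distance` and the toy sequences are illustrative.

```python
import numpy as np

def dtw_distance(x, y):
    """Return the cumulative distance matrix L and the DTW distance L[-1, -1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    # Pad with a row and column of +inf so the recursion needs no special cases.
    acc = np.full((n_x + 1, n_y + 1), np.inf)
    acc[0, 0] = 0.0
    for m in range(1, n_x + 1):
        for n in range(1, n_y + 1):
            local = abs(x[m - 1] - y[n - 1])          # assumed local distance
            acc[m, n] = local + min(acc[m - 1, n],      # step in x only
                                    acc[m, n - 1],      # step in y only
                                    acc[m - 1, n - 1])  # step in both
    L = acc[1:, 1:]
    return L, L[-1, -1]

x_sub = [1.0, 2.0, 3.0, 3.0, 2.0]   # toy source domain subsequence (length 5)
y_tgt = [1.0, 3.0, 2.0]             # toy target domain sequence (length 3)
L, dist = dtw_distance(x_sub, y_tgt)
print(L.shape, dist)
```

Because the recursion allows one sequence to advance while the other stays in place, the two inputs may have different lengths, which is exactly why DTW is used here instead of a point-wise distance.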

Step3: The similarity contribution coefficients between the source domain subsequences of all reference wind farms and the target domain sequence of the target wind farm are calculated in turn according to formula (5), giving a similarity contribution coefficient matrix $C$ of dimension $w \times k$, as shown in formula (6).

To screen out the highly similar source domain subsequences as new source domain data, and thereby reduce the negative influence of irrelevant data on transfer learning and improve the accuracy of the scene generation model, a reasonable threshold for the similarity contribution coefficient must be set. Each element of matrix $C$ is compared with the threshold, and when all elements in a column exceed the threshold, the reference wind farm source domain subsequences corresponding to that column are taken as new source domain data. For example, if all elements of columns $i, \ldots, j, \ldots, k$ exceed the threshold, the new source domain data are given by formula (7). Because the selected reference wind farms and the target wind farm usually have strong spatio-temporal correlation, most source domain subsequences meet the high-similarity requirement, so the screened and purified new source domain data are both highly similar and sufficient in quantity.
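The column-wise screening rule can be sketched as follows. The coefficient values, the threshold of 0.6 and the placeholder subsequences are all hypothetical; only the rule "keep a column when every element exceeds the threshold" comes from the text.

```python
import numpy as np

# Toy similarity contribution coefficient matrix C (w = 3 farms, k = 4 subsequences).
C = np.array([
    [0.9, 0.4, 0.8, 0.7],
    [0.8, 0.9, 0.7, 0.9],
    [0.7, 0.8, 0.9, 0.2],
])
threshold = 0.6  # hypothetical threshold value

# A column is kept only if all of its elements exceed the threshold.
keep = np.all(C > threshold, axis=0)
selected_columns = np.flatnonzero(keep)
print(selected_columns)  # [0 2]

# Placeholder subsequences X_1..X_4 of different lengths (w x Delta T_i each).
subsequences = [np.full((3, n), float(j)) for j, n in enumerate([5, 3, 4, 6])]

# New source domain data: the kept subsequences joined along the time axis.
P_ref_new = np.hstack([subsequences[j] for j in selected_columns])
print(P_ref_new.shape)  # (3, 9)
```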

(2) C-DCGAN-based wind power output scene modeling

Following the idea of transfer learning, to obtain an output scene generation model for a target wind farm with large-scale data loss, output scene modeling can first be performed on adjacent reference wind farms that have sufficient data and are spatio-temporally correlated with it, and the resulting scene generation model can then be migrated to the target wind farm. In this process, a C-DCGAN model is trained on the available historical wind power data to obtain the wind power output scene generation model.

Step1: Establish the basic C-DCGAN model.

C-DCGAN is a deep learning model improved from the GAN. Its basic idea is the two-player zero-sum game of game theory: the two players are a generator and a discriminator, each built from a deep convolutional neural network. The wind farm output scene generation model is built on the C-DCGAN model. To describe the difference between generated data and real data during training, loss functions are constructed for the generator and the discriminator. The optimization objective of the generator is to minimize its loss function $L_G$, and the optimization objective of the discriminator is to maximize its loss function $L_D$; the two objectives are optimized jointly during adversarial training. In addition, to avoid problems such as difficult training, mode collapse, unstable training and inaccurate calculation, the Wasserstein distance, the Kantorovich-Rubinstein dual form and a gradient penalty function are introduced on the basis of formulas (8) and (9), which gives the objective function of the C-DCGAN training process shown in formula (10).

where $L_G$ and $L_D$ denote the objective functions of the generator and the discriminator, $E[\cdot]$ denotes the expected value over the corresponding data distribution, $D(\cdot)$ denotes the discriminator function, $G(\cdot)$ denotes the generator function, $Z$ denotes the input noise data, $c$ denotes the conditional input, and $P$ denotes the historical wind power data. Random samples are drawn on the line connecting the generated data and the historical wind power data, with $\varepsilon$ a random number in the range 0 to 1, and the weight coefficient is $\lambda = 10$. $P_Z$, $P_P$ and $P_G$ denote the probability distributions of the noise data, the historical wind power data and the generated data, respectively.

After multiple rounds of adversarial training, the probability distribution of the generated data $G(Z \mid c)$ is almost identical to that of the real wind power data $P$, the discriminator can hardly distinguish real from generated data, and training ends. During this process the network parameters $\theta^{(G)}$ and $\theta^{(D)}$ of the generator and the discriminator are continuously adjusted and updated, finally yielding a training model that can accurately generate wind power scenes.
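For orientation, the widely used Wasserstein GAN objective with gradient penalty is sketched below in this document's symbols, with the conditional input $c$ added. This is the standard published form, not a reconstruction of formulas (8) to (10), whose exact sign conventions may differ; the symbol $\hat{P}$ for the interpolated sample is an assumption introduced here.

```latex
% Interpolated sample on the line between real and generated data
\hat{P} = \varepsilon P + (1 - \varepsilon)\, G(Z \mid c), \qquad \varepsilon \sim U[0, 1]

% Standard conditional WGAN-GP min-max objective (assumed form)
\min_{G} \max_{D} \;
  E_{P \sim P_P}\big[ D(P \mid c) \big]
  - E_{Z \sim P_Z}\big[ D(G(Z \mid c) \mid c) \big]
  - \lambda\, E_{\hat{P}}\Big[ \big( \lVert \nabla_{\hat{P}} D(\hat{P} \mid c) \rVert_2 - 1 \big)^2 \Big]
```

With $\lambda = 10$, the penalty term pushes the gradient norm of the discriminator toward 1 at the interpolated samples, which is how the Lipschitz condition required by the Kantorovich-Rubinstein dual form is enforced in practice.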

Step2: Determine the parameter optimization algorithm for the C-DCGAN model.

When updating the network parameters, the invention uses RMSProp, a gradient descent algorithm with an adaptive learning rate, and trains the parameters in mini-batches. The core of the algorithm is to compute the gradients from the loss functions of the generator and the discriminator and then update the network parameters in combination with the learning rate, as given by formulas (11) to (14):

where the two gradient terms denote the gradients of the generator and the discriminator, $N_{bat}$ denotes the batch size, $\alpha$ denotes the learning rate, $Z^{(i)}$, $P^{(i)}$ and $c^{(i)}$ denote the $i$-th group of noise input, historical wind power data and conditional input, which are used together with the $i$-th random sample on the line connecting the generated data and the historical wind power data, $\theta'^{(G)}$ and $\theta'^{(D)}$ denote the updated generator and discriminator network parameters, $r$ denotes the accumulated squared gradient, which has an initial value of 0 and is replaced by its updated value at each step, $\delta$ is a small constant set to $1 \times 10^{-6}$, and $\rho$ is the decay rate, set to 0.9.

The improved C-DCGAN model is obtained after this optimization.
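A minimal sketch of one RMSProp update is shown below, using the decay rate $\rho = 0.9$ and constant $\delta = 10^{-6}$ stated above. Formulas (11) to (14) are not reproduced in this text, so the common textbook form of the update is assumed; the toy quadratic loss, the learning rate of 0.01 and the function name `rmsprop_step` are illustrative.

```python
import numpy as np

def rmsprop_step(theta, grad, r, alpha=1e-3, rho=0.9, delta=1e-6):
    """One RMSProp update; returns the new parameters and accumulated squared gradient."""
    r_new = rho * r + (1.0 - rho) * grad ** 2              # running average of squared gradients
    theta_new = theta - alpha * grad / np.sqrt(delta + r_new)  # per-parameter scaled step
    return theta_new, r_new

# Toy problem: minimize f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
r = np.zeros_like(theta)            # accumulated squared gradient starts at 0
for _ in range(1000):
    grad = 2.0 * theta
    theta, r = rmsprop_step(theta, grad, r, alpha=0.01)

print(theta)
```

Dividing by the root of the accumulated squared gradient makes the effective step size roughly independent of the gradient scale, which is what "adaptive learning rate" refers to. In the actual method the gradient would be the mini-batch average of the generator or discriminator loss gradient rather than this toy gradient.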

Step3: Migrate the model trained on the reference wind farms to the target wind farm, and further train the scene generation model with the target wind farm's data.

Because the historical wind power data of the target wind farm are missing on a large scale, if the target wind farm's historical data $P_{tgt}$ were fed directly to the discriminator in place of $P$ for adversarial training, the resulting training model could hardly reflect the output characteristics of the target wind farm accurately. Since the data-sufficient reference wind farms are spatio-temporally correlated with the target wind farm, the training model of the reference wind farms can be extrapolated to the target wind farm through transfer learning and used for its output scene generation. Note, however, that this reference training model does not fully capture the output characteristics of the target wind farm: it has been trained only on the historical wind power data of the reference wind farms and therefore contains only the characteristics they share with the target wind farm. To obtain a training model that completely and accurately reflects the output characteristics of the target wind farm, the migrated training model must be further adjusted and trained with the historical wind power data of the target wind farm. That is, the network parameters of the improved C-DCGAN model are carried over from the migrated model, but the real-data input of the discriminator is the historical wind power data of the target wind farm, i.e. $P = P_{tgt}$. The generator and the discriminator are then trained adversarially on this basis, and after training a training model with the output characteristics of the target wind farm is obtained, with its own updated generator and discriminator network parameters.
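The pre-train and fine-tune pattern of Step3 can be illustrated on a deliberately simple stand-in. The sketch below replaces the C-DCGAN with a two-parameter linear model so that it runs with NumPy alone; all data, parameter values and the helper `fit_linear` are hypothetical. What it shares with the method is the warm start: training on the small target set begins from the parameters learned on the large source set rather than from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(X, y, theta_init, lr=0.05, steps=200):
    """Plain gradient descent on mean squared error, starting from theta_init."""
    theta = theta_init.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def mse(X, y, theta):
    return float(np.mean((X @ theta - y) ** 2))

# Source domain: plenty of data. Target domain: few samples, similar parameters.
theta_src_true = np.array([2.0, -1.0])
theta_tgt_true = np.array([2.2, -0.9])
X_src = rng.normal(size=(1000, 2))
y_src = X_src @ theta_src_true
X_tgt = rng.normal(size=(5, 2))
y_tgt = X_tgt @ theta_tgt_true

theta_pre = fit_linear(X_src, y_src, np.zeros(2))           # pre-training on source data
theta_ft = fit_linear(X_tgt, y_tgt, theta_pre, steps=20)    # fine-tuning from theta_pre

print(mse(X_tgt, y_tgt, theta_pre), mse(X_tgt, y_tgt, theta_ft))
```

The pre-trained parameters already fit the target data reasonably well because the two domains are similar, and a short fine-tuning run reduces the remaining error, which mirrors the role of the few target wind farm samples in the method.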

(3) Short-term wind power output scene generation of data missing wind power plant based on transfer learning and C-DCGAN and parameter optimization

Because the training model $M_{des}$ is trained on a large amount of historical wind power data spanning a long time horizon, if randomly generated noise data $Z$ of dimension $1 \times 100$ are simply fed to the generator, the date of the generated wind power scene is unknown. Therefore, to obtain a short-term wind power output scene of the target wind farm on a specific day, constraints are imposed on the randomly generated noise input $Z$, and the specific value of $Z$ is obtained through optimization. As shown in fig. 3, given the complete historical wind power data of the reference wind farms and a small amount of historical wind power data of the target wind farm, if the wind power output data of the target wind farm on day $i$ are missing, its output scene can be generated accurately by the following steps:

Step1: Define random noise data $Z$ of dimension $1 \times 100$ and input them into the generator of training model $M_{des}$. The generator produces 576 wind power output values, which are split in time order into two parts, $G_1(Z)$ and $G_2(Z)$, corresponding to the generated scenes of the target wind farm on day $i-1$ and day $i$, respectively.

Step2: Establish the parameter optimization model for short-term wind power output scene generation of the data-missing wind farm. Its objective function and constraints are given by formula (15): the model requires the data of $G_1(Z)$ to match the historical wind power data of the target wind farm on day $i-1$ as closely as possible, and requires the data of $G_2(Z)$ to fluctuate within a certain interval around the historical wind power data of the reference wind farms on day $i$, so that the spatio-temporal correlation between adjacent wind farms is satisfied.

In formula (15), the historical wind power data of the target wind farm on day $i-1$ and the screened and purified historical wind power data of the reference wind farms on day $i$ appear as known inputs, and $\sigma$ denotes the output interval control parameter.

Step3: By introducing a logarithmic barrier function, optimization model (15) is converted into the unconstrained optimization problem shown in formula (16), which is then solved with the momentum gradient descent method to obtain $Z$.

Where τ and v represent weight parameters.

Step4: Input the optimized $Z$ into training model $M_{des}$; the short-term output scene of the target wind farm on day $i$ can then be generated accurately.
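Steps 2 and 3 can be illustrated with a small self-contained sketch. Since formulas (15) and (16) are not reproduced in this text, the sketch assumes a squared-error fit for $G_1(Z)$, an interval constraint $|G_2(Z) - \text{reference}| < \sigma$ and a single barrier weight `tau`; the linear "generator" (`W1`, `W2`), the toy sizes and every numeric value are hypothetical stand-ins for the trained C-DCGAN generator and the real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_z, n_half = 4, 6                       # toy sizes (the document uses 100 and 288)
W1 = rng.normal(size=(n_half, n_z))      # toy linear generator: G1(Z) = W1 @ Z
W2 = rng.normal(size=(n_half, n_z))      #                       G2(Z) = W2 @ Z

z_true = rng.normal(size=n_z)
p_tgt_prev = W1 @ z_true                 # stands in for target farm data on day i-1
p_ref_day = W2 @ z_true                  # stands in for reference farm data on day i
sigma, tau = 1.0, 0.1                    # interval half-width and barrier weight (assumed)

def feasible(z):
    """True when G2(z) lies strictly inside the allowed interval."""
    return bool(np.all(np.abs(W2 @ z - p_ref_day) < sigma))

def objective(z):
    """Squared-error fit of G1 plus a log barrier keeping G2 inside the interval."""
    u = W2 @ z - p_ref_day
    fit = np.sum((W1 @ z - p_tgt_prev) ** 2)
    barrier = -tau * np.sum(np.log(sigma - u) + np.log(sigma + u))
    return fit + barrier

def gradient(z):
    u = W2 @ z - p_ref_day
    g_fit = 2.0 * W1.T @ (W1 @ z - p_tgt_prev)
    g_bar = tau * W2.T @ (1.0 / (sigma - u) - 1.0 / (sigma + u))
    return g_fit + g_bar

z = z_true + 0.1                         # starting point, required to be strictly feasible
assert feasible(z)
vel = np.zeros(n_z)                      # momentum (velocity) term
alpha, beta = 0.01, 0.9                  # step size and momentum coefficient (assumed)
f_start = objective(z)
best_z, best_f = z.copy(), f_start

for _ in range(300):
    vel = beta * vel - alpha * gradient(z)
    step = vel
    while not feasible(z + step):        # shrink the step to stay inside the barrier
        step = 0.5 * step
    z = z + step
    f = objective(z)
    if f < best_f:
        best_z, best_f = z.copy(), f

print(f_start, best_f)
```

The step-halving loop is a safeguard added for this sketch: the log barrier is only defined strictly inside the interval, so any step that would leave it is shortened before being applied.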

The overall short-term wind power output scene generation process for the data-missing wind farm is shown in fig. 3.

To verify the accuracy of the proposed short-term wind power output scene generation method for data-missing wind farms, the following experiment was carried out, and its results are analyzed below:

Figs. 4 to 6 respectively compare the wind power output scenes generated by the proposed method with the measured historical wind power output, compare their autocorrelation coefficients at different time lags, and compare their probability density functions and cumulative distribution functions. Table 1 compares the results of the proposed method with those of a scene generation method based on the time series method and a scene generation method that uses only the C-DCGAN pre-training model.

As can be seen from fig. 4 (a) and (b), the proposed method can accurately generate scenes for the wind power data missing from the target wind farm on specific days (for example, April 5 and November 12, 2012). The trend of the generated wind power scene data is basically consistent with that of the measured data, and the interval formed by the generated scenes envelops the measured data well. The generated data and the real data also share similar wind power output characteristics (such as peak-valley characteristics, ramp characteristics and fluctuation trend). This shows that the method learns the mapping between data and data characteristics well, extracts those characteristics accurately, and migrates the source domain data characteristics of the reference wind farms to the target wind farm for model training, thereby achieving accurate scene generation for the data-missing target wind farm.

As can be seen from fig. 5 (a) and (b), the median of the autocorrelation coefficients of the generated scene data of the target wind farm (representing the average level) decreases continuously as the time lag increases. This indicates that the generated wind power scene data conform to the behavior of real wind power output in the time dimension: outputs at nearby times are strongly correlated, while outputs separated by longer intervals are weakly correlated. Since the distance between the upper and lower quartiles of a box reflects the dispersion or fluctuation of the data, the generated wind power scenes of the target wind farm for April 5 are concentrated with small fluctuation, whereas those for November 12 are more scattered with larger fluctuation. Comparing the box plots with the autocorrelation coefficients of the measured data of the target wind farm, shown as red dots, the measured autocorrelation coefficients lie between the upper and lower extremes of those of the generated scenes, which means that the temporal correlation of the generated scenes is similar to that of the measured data, although some error remains. The short-term output scenes of the target wind farm generated by the proposed method are therefore basically consistent with the real situation in terms of temporal correlation.
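The autocorrelation coefficient at a given lag can be computed as in the sketch below. The text does not state which estimator was used, so the common sample autocorrelation (mean removed, normalized by the lag-0 value) is assumed, and a smooth synthetic daily profile of 288 points stands in for real wind power data.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                                  # remove the mean
    denom = np.sum(x * x)                             # lag-0 sum of squares
    return np.array([np.sum(x[: len(x) - k] * x[k:]) / denom
                     for k in range(max_lag + 1)])

t = np.arange(288)                                    # one day at 5-minute resolution
series = np.sin(2 * np.pi * t / 288)                  # synthetic smooth daily profile
acf = autocorrelation(series, max_lag=12)             # lags up to one hour
print(acf)
```

For a smooth series such as this one, the coefficient starts at 1 for lag 0 and decreases as the lag grows, which is the qualitative behavior described for the generated scenes.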

In fig. 6, PDF and CDF denote the probability density function and the cumulative distribution function, respectively. As can be seen from (a) and (b), the two probability distribution curves of the generated scene data for April 5, 2012 are substantially the same as those of the measured data within a certain error range. The probability density curve of the generated scene data is steeper and more concentrated, showing slightly less volatility than the actual situation, and the cumulative distribution curve deviates slightly from the actual one when the wind power output is below 1.2 MW. This is because the accuracy of scene generation for that day depends to some extent on the target domain data of the target wind farm on April 4 and the source domain data of the reference wind farms on April 5. To further verify the accuracy of the method, the probability distribution curves for November 12, 2012 are also analyzed: the two probability distribution curves of the scene data generated for the target wind farm on that day almost completely coincide with those of the measured data. This demonstrates that the distribution characteristics of the generated scene data are consistent with the actual situation, and that the proposed method can accurately generate the short-term wind power data missing from the target wind farm on a specific day.
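An empirical PDF and CDF of the kind compared in fig. 6 can be estimated from one day of samples with a histogram. The sketch below uses synthetic uniformly distributed data, 15 bins and a 0 to 3 MW range, all of which are illustrative assumptions rather than settings taken from the experiment.

```python
import numpy as np

rng = np.random.default_rng(0)
power_mw = rng.uniform(0.0, 3.0, size=288)      # synthetic stand-in for one day of output

# density=True scales the histogram so that it integrates to 1 (an empirical PDF).
pdf, edges = np.histogram(power_mw, bins=15, range=(0.0, 3.0), density=True)
widths = np.diff(edges)

# Accumulating probability mass bin by bin gives the empirical CDF at the right bin edges.
cdf = np.cumsum(pdf * widths)

print(pdf.round(3))
print(cdf.round(3))
```

Computing the same two curves for the generated scenes and for the measured data, on a shared set of bin edges, allows them to be overlaid as in fig. 6.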

To further verify the accuracy of the proposed method, the experiment also uses the time series method (TSM) and a method based on the C-DCGAN pre-training model to generate short-term wind power output scenes for the data-missing target wind farm. The root mean squared error (RMSE), mean absolute error (MAE) and coefficient of determination ($R^2$) are calculated from the generated wind power scene data, and the comparison is shown in table 1.

Table 1 Comparison of results for different scene generation methods

The data of the target wind farm are severely missing, with less than 10% of a year's data available. This amount of data cannot support parameter fitting in the traditional scene generation method based on the time series method, which is therefore very prone to underfitting and produces results with larger errors and lower accuracy than the proposed method. If only the C-DCGAN model trained on the source domain data of the reference wind farms is used for scene generation (the C-DCGAN-based pre-training model method, in which the model is neither migrated to the target wind farm nor fine-tuned with its target domain data), fairly accurate scenes can still be generated for the wind power data missing from the target wind farm on a given day. However, because this approach uses only the similar output characteristics of the adjacent reference wind farms and does not learn the output characteristics specific to the target wind farm, the scene data it generates for the target wind farm still lack accuracy. On this basis, the invention proposes a short-term wind power scene generation method that combines the similar data domain matching method, transfer learning, C-DCGAN and parameter optimization. According to the data in the table, the root mean squared error and mean absolute error of the wind power scene data generated by the proposed method for April 5, 2012 are 0.96 and 0.70, which are 63.5% and 55.4% lower than those of the time series method and 33.8% and 32.7% lower than those of the C-DCGAN-only method, and the coefficient of determination is also closer to 1. Further analysis of the data for November 12, 2012 shows that the root mean squared error and mean absolute error of the proposed method are 1.43 and 1.01, which are 44.4% and 36.1% lower than those of the time series method and 25.1% and 34.0% lower than those of the C-DCGAN-only method, while the coefficient of determination rises from 0.66 and 0.87 to 0.94. In conclusion, the proposed short-term wind power output scene generation method for data-missing wind farms is more accurate than the other methods, and the scene data it generates are closer to the historical data of the target wind farm.
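The three error metrics and the relative reductions quoted above follow the usual definitions, sketched below on toy numbers. The arrays and the helper names are illustrative; none of the values are taken from table 1.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def relative_reduction(baseline, proposed):
    """Fractional reduction of an error metric relative to a baseline method."""
    return (baseline - proposed) / baseline

y_true = np.array([1.0, 2.0, 3.0, 4.0])     # toy measured values
y_pred = np.array([1.5, 2.0, 2.5, 4.0])     # toy generated values

print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
print(relative_reduction(2.0, 1.0))          # 0.5, i.e. a 50% reduction
```

Lower RMSE and MAE and an $R^2$ closer to 1 indicate that the generated scene follows the measured data more closely.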

The above examples represent only one embodiment of the invention. Although it is described specifically and in detail, it should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the concept of the invention, all of which fall within its scope of protection. Accordingly, the scope of protection of the invention shall be determined by the appended claims.

Claims

1. The short-term wind power output scene generation method for the data-missing wind power plant is characterized by comprising the following steps of:

2. The method for generating the short-term wind power output scene of the data loss wind power plant according to claim 1, wherein the similar data domain matching method in the step one specifically comprises the following steps:

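the source domain data sequence is given by formula (1):

$$P_{ref} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,t} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,t} \\ \vdots & \vdots & & \vdots \\ x_{w,1} & x_{w,2} & \cdots & x_{w,t} \end{bmatrix} \quad (1)$$

the segmented source domain subsequences are given by formula (2):

$$P_{ref} = [X_1, X_2, \ldots, X_k] \quad (2)$$

the target domain sequence is given by formula (3):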

$$P_{tgt} = [y_1, y_2, \ldots, y_T]^{\mathrm{T}} \quad (3)$$

in formulas (1) to (3), $P_{ref}$ denotes the source domain data matrix formed by the output data of the plurality of reference wind farms and is a known multidimensional time-series input variable, wherein the number of reference wind farms is $w$, each row represents the output data of one reference wind farm, each column represents the output data at one time step, and $t$ represents the time length, i.e. the matrix contains output data at $t$ time steps; $X_1, \ldots, X_k$ represent the $k$ source domain output subsequences obtained after segmentation by the TICC algorithm and clustering according to the correlation between dimensions, each subsequence being a matrix of dimension $w \times \Delta T_i$ ($i = 1, \ldots, k$) in which each row represents the output data of one reference wind farm and each column represents the output data at one time step; $\Delta T_1, \ldots, \Delta T_k$ represent the time lengths of the subsequences and satisfy $\Delta T_1 + \Delta T_2 + \cdots + \Delta T_k = t$; $P_{tgt}$ represents the target domain output sequence of the target wind farm, $y$ represents the wind power output of the target wind farm at each time step, and $T$ represents the time length of that output data;

Step1.2: quantitatively calculating the similarity between the source domain subsequence and the target domain sequence by using the dynamic time warping algorithm, wherein a warping function is first used to construct a nonlinear warping path between the source domain subsequence and the target domain sequence, thereby realizing nonlinear mapping between sequences of different time lengths; to calculate the distance between the $j$-th subsequence of the $i$-th reference wind farm and the target wind farm output sequence $P_{tgt}$, a cumulative distance matrix $L$ of dimension $\Delta T_j \times T$ is constructed from the data of the two sequences, and each element of $L$ is calculated by formula (4):

3. The method for generating the short-term wind power output scene of the data-missing wind power plant according to claim 2, wherein the specific operation of the second step is as follows:

after multiple rounds of adversarial training of the generator and the discriminator, the probability distribution of the generated data $G(Z \mid c)$ is almost identical to that of the real wind power data $P$; during this process the network parameters $\theta^{(G)}$ and $\theta^{(D)}$ of the generator and the discriminator are continuously adjusted, optimized and updated, and a training model capable of accurately generating wind power scenes is finally obtained;

wherein the two gradient terms represent the gradients of the generator and the discriminator, $N_{bat}$ represents the batch size, $\alpha$ represents the learning rate, $Z^{(i)}$, $P^{(i)}$ and $c^{(i)}$ represent the $i$-th group of noise input, historical wind power data and conditional input, which are used together with the $i$-th random sample on the line connecting the generated data and the historical wind power data, $\theta'^{(G)}$ and $\theta'^{(D)}$ represent the updated generator and discriminator network parameters, $r$ represents the accumulated squared gradient, which has an initial value of 0 and is replaced by its updated value at each step, $\delta$ is a small constant set to $1 \times 10^{-6}$, and $\rho$ is the decay rate, set to 0.9;

an improved C-DCGAN model is obtained after optimization;

in the scene modeling of wind farm wind power output with the improved C-DCGAN model, if the real historical wind power data $P$ are taken to be the screened and purified reference wind farm source domain data $P'_{ref}$, i.e. $P = P'_{ref}$, the generator and the discriminator undergo multiple rounds of adversarial training to obtain a training model with the output characteristics of the reference wind farms, together with the network parameters of its generator and discriminator;

the training model of the reference wind farms is extrapolated to the target wind farm through transfer learning and used for its output scene generation; to obtain a training model that completely and accurately reflects the output characteristics of the target wind farm, the migrated training model is further adjusted and trained with the historical wind power data of the target wind farm; that is, the network parameters of the improved C-DCGAN model are carried over from the migrated model, the real-data input of the discriminator is the historical wind power data of the target wind farm, i.e. $P = P_{tgt}$, the generator and the discriminator are trained adversarially on this basis, and after training a training model with the output characteristics of the target wind farm is obtained, with its own updated generator and discriminator network parameters.

4. The method for generating a short-term wind power output scene of a data loss wind power plant according to claim 3, wherein the specific operation of the third step is as follows:

Step3.2: establishing the parameter optimization model for short-term wind power output scene generation of the data-missing wind farm, wherein the objective function and constraints of the model are given by formula (15), the model requires the data of $G_1(Z)$ to match the historical wind power data of the target wind farm on day $i-1$ as closely as possible, and requires the data of $G_2(Z)$ to fluctuate within a certain interval around the historical wind power data of the reference wind farms on day $i$, so that the spatio-temporal correlation between adjacent wind farms is satisfied,

wherein τ and v represent weight parameters;

5. The method for generating a short-term wind power output scene of a data loss wind power plant according to claim 1, wherein in the first step the source domain data of the reference wind farms and the small amount of sample data of the target wind farm are obtained as follows: the public wind power data set released by the U.S. National Renewable Energy Laboratory is selected as the experimental data source for calculation and analysis; in the experiment, 4 geographically adjacent wind farms are selected for testing, of which 3 are set as reference wind farms with sufficient data, whose data are used as source domain data, and the remaining 1 is set as the target wind farm, which is assumed to have missing wind power data.