CN115034492B

CN115034492B - Non-deterministic energy consumption prediction method and related device under the condition of missing input variables

Info

Publication number: CN115034492B
Application number: CN202210720027.2A
Authority: CN
Inventors: 陈斐然; 周克楠; 招婉媚; 朱迪; 何德卫; 戚建平; 梁永权; 郭子科
Original assignee: Guangdong Power Grid Co Ltd; Foshan Power Supply Bureau of Guangdong Power Grid Corp
Current assignee: Guangdong Power Grid Co Ltd; Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date: 2022-06-23
Filing date: 2022-06-23
Publication date: 2025-01-17
Anticipated expiration: 2042-06-23
Also published as: CN115034492A

Abstract

The present application discloses a non-deterministic energy consumption prediction method and related devices for the case of missing input variables. The method includes: analyzing, by means of the Apriori algorithm and according to user-defined association rules, whether an association rule relationship exists between each known input variable and an unknown input variable in a database; when such a relationship exists, predicting the distribution of the unknown input variable according to the association rule between the two, and otherwise establishing a uniform distribution over the value range of the unknown input variable, so as to obtain the distribution interval of the unknown input variable; sampling the unknown input variable based on the distribution interval to obtain a set of unknown input variables; and inputting this set into an energy consumption model for batch calculation to obtain the distribution range of the predicted energy consumption. This solves the technical problem of low energy consumption prediction accuracy in the prior art when input variables are missing.

Description

Non-deterministic energy consumption prediction method and related device under the condition of missing input variables

Technical Field

The application relates to the technical field of energy consumption prediction, and in particular to a non-deterministic energy consumption prediction method and related device for the case of missing input variables.

Background

Energy consumption simulation helps users analyze the energy consumption of a building, and simulation models fall mainly into two types: forward models and data-driven models. A forward model, also called a white-box model, is built from the physical characteristics of the target system and the physical laws governing it (e.g., heat and mass balance, conservation of mass and momentum). Building a forward model requires in-depth knowledge of the object's features, starting from simple relationships and building up to a complex model that describes the whole system. A data-driven model, also called a black-box model or inverse model, uses statistical methods to learn the mapping between inputs and outputs from large amounts of test data generated while the target system operates, without requiring knowledge of the underlying physical relationships.

In actual energy consumption prediction, however, the values of all input variables often cannot be obtained accurately. For example, building a white-box model requires input parameters such as convergence tolerance, solar radiation intensity and material roughness, which are difficult to obtain completely from drawings and design documents even at the design stage, let alone for existing buildings, whose construction and design records have often been lost over time. Such data are therefore hard to obtain or determine, input variable information goes missing, and the energy consumption prediction becomes inaccurate. At present, missing feature values in energy simulation software are mostly filled in from literature or human experience, and more systematic methods are lacking. However, literature values and human experience have limited applicability and cannot cover all cases, so the filled-in data are often inaccurate, which reduces prediction accuracy.

Disclosure of Invention

The application provides a non-deterministic energy consumption prediction method and related device for the case of missing input variables, which are used to solve the technical problem of low energy consumption prediction accuracy in the prior art when input variables are missing.

In view of this, a first aspect of the present application provides a non-deterministic energy consumption prediction method for the case of missing input variables, the method comprising:

analyzing, by means of the Apriori algorithm and according to association rules defined by a user, whether an association rule relationship exists between each known input variable and each unknown input variable in a database;

when the association rule relation exists between the known input variable and the unknown input variable, carrying out distribution prediction of the unknown input variable according to the association rule of the known input variable and the unknown input variable, otherwise, establishing uniform distribution according to the value range of the unknown input variable, so as to obtain a distribution interval of the unknown input variable;

sampling the unknown input variable based on the distribution interval to obtain an unknown input variable set;

and inputting the unknown input variable set into an energy consumption model to perform batch calculation to obtain the distribution range of the energy consumption predicted value.

Optionally, before analyzing, by means of the Apriori algorithm and according to the user-defined association rules, whether an association rule relationship exists between each known input variable and each unknown input variable in the database, the method further comprises: discretizing continuous variables in the database.

Optionally, analyzing, by means of the Apriori algorithm and according to the user-defined association rules, whether an association rule relationship exists between each known input variable and each unknown input variable in the database specifically comprises:

iteratively finding all frequent itemsets in the database that meet a preset threshold, and using each frequent itemset to construct association rules that meet the user's minimum confidence, so as to analyze whether an association rule relationship exists between each known input variable and each unknown input variable in the database.

Optionally, the sampling the unknown input variable based on the distribution interval to obtain an unknown input variable set specifically includes:

and sampling the unknown input variable by a Morris analysis method based on the distribution interval to obtain an unknown input variable set.

A second aspect of the present application provides a non-deterministic energy consumption prediction system for the case of missing input variables, the system comprising:

The analysis unit is used for analyzing whether an association rule relation exists between each known input variable and each unknown input variable in the database according to the association rule defined by the user through an apriori algorithm;

The establishing unit is used for carrying out distribution prediction on the unknown input variable according to the association rule when the association rule relation exists between the known input variable and the unknown input variable, otherwise, establishing uniform distribution according to the value range of the unknown input variable so as to obtain a distribution interval of the unknown input variable;

The sampling unit is used for sampling the unknown input variable based on the distribution interval to obtain an unknown input variable set;

The prediction unit is used for inputting the unknown input variable set into the energy consumption model to perform batch calculation, and obtaining the distribution range of the energy consumption predicted value.

Optionally, the system further comprises a preprocessing unit;

The preprocessing unit is used for discretizing continuous variables in the database.

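Optionally, the analysis unit is specifically configured to: iteratively find all frequent itemsets in the database that meet a preset threshold, and use each frequent itemset to construct association rules that meet the user's minimum confidence, so as to analyze whether an association rule relationship exists between each known input variable and each unknown input variable in the database.

Optionally, the sampling unit is specifically configured to: sample the unknown input variable by the Morris method based on the distribution interval to obtain the unknown input variable set.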

A third aspect of the present application provides a non-deterministic energy consumption prediction apparatus for the case of missing input variables, the apparatus comprising a processor and a memory:

the memory is used for storing program codes and transmitting the program codes to the processor;

The processor is configured to execute, according to instructions in the program code, the steps of the non-deterministic energy consumption prediction method for the case of missing input variables according to the first aspect.

A fourth aspect of the present application provides a computer-readable storage medium storing program code for executing the non-deterministic energy consumption prediction method for the case of missing input variables described in the first aspect.

From the above technical scheme, the application has the following advantages:

In the energy consumption prediction method of the present application, continuous variables are discretized, association rules are extracted by the algorithm steps to link known and unknown variables, the itemsets of the unknown features are inferred from the known variables, and the continuous value ranges of the unknown variables are then restored, so that the originally missing data of the unknown variables are recovered through data mining. Unlike conventional energy consumption simulation, the method determines a distribution interval for each unknown variable, so the final energy consumption prediction is an interval rather than a single value. This is of considerable practical value for energy consumption simulation, especially when some input features are unknown: the method applies more widely, and its interval result carries more information than a point estimate, which better supports users' decisions. The technical problem of low energy consumption prediction accuracy in the prior art when input variables are missing is thereby solved.

Drawings

FIG. 1 is a schematic flow chart of an embodiment of a method for predicting uncertain energy consumption in the absence of an input variable according to the present application;

Fig. 2 is a schematic structural diagram of an embodiment of an uncertain energy consumption prediction system in the absence of an input variable according to an embodiment of the present application.

Detailed Description

In order to make the present application better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

The following is an illustration of the inventive principles of this patent disclosure:

To obtain a non-deterministic interval for the predicted energy consumption, the method supplements the information on unknown input variables mainly by means of an association rule mining algorithm (Apriori). The design rests on the premise that association rules exist among building features. For example, buildings constructed earlier tend to have a higher lighting power density and lower equipment efficiency due to aging. Energy-saving retrofits improve building performance, so if a building is known to have been retrofitted, its performance can, under certain conditions, be inferred to lie in a better range. Likewise, low efficiency of equipment such as chillers and pumps indicates a low level of operation and maintenance management, from which the efficiency of the building's other equipment can be inferred to be low as well.

Referring to fig. 1, a method for predicting uncertain energy consumption under the condition of missing input variables provided in an embodiment of the present application includes:

Step 101, analyzing, by means of the Apriori algorithm and according to user-defined association rules, whether an association rule relationship exists between each known input variable and each unknown input variable in the database;

In this step, the Apriori algorithm is used, according to association rules defined by the owner or management staff, to screen out strong association rules between the complete input variable data already in the database and the unknown input variables.

It should be noted that the Apriori algorithm, proposed by R. Agrawal and R. Srikant, is a method for finding frequent itemsets for Boolean association rules in a data set. It uses prior knowledge of the properties of frequent itemsets (every subset of a frequent itemset is also frequent) to mine association rules from the data, which is where the name Apriori comes from.

First, several basic concepts of association analysis need to be defined:

(1) k-itemset: an itemset containing k items. A k-itemset whose support meets a given minimum support threshold is called a frequent k-itemset.

(2) Support: the support of the association rule A→B is the probability that itemsets A and B (with A ∩ B = ∅) occur together, support(A→B) = P(AB);

(3) Confidence: the probability that B occurs given that A has occurred (again with A ∩ B = ∅), confidence(A→B) = P(B|A) = P(AB)/P(A);
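The two measures above can be computed directly from a list of records. The following is a minimal Python sketch, assuming each database record has already been discretized into a set of items of the hypothetical form "variable=value"; the item labels and the four records are illustrative only, not data from the application.

```python
from typing import FrozenSet, List

# One transaction per building record, encoded as a set of discretized items
# (hypothetical labels such as "age=old" or "lpd=high").
Transaction = FrozenSet[str]

def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions containing every item of `itemset`, i.e. P(itemset)."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent: FrozenSet[str], consequent: FrozenSet[str],
               transactions: List[Transaction]) -> float:
    """confidence(A -> B) = P(AB) / P(A); A and B must be disjoint."""
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    p_a = support(antecedent, transactions)
    if p_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / p_a

transactions = [
    frozenset({"age=old", "lpd=high"}),
    frozenset({"age=old", "lpd=high"}),
    frozenset({"age=old", "lpd=low"}),
    frozenset({"age=new", "lpd=low"}),
]
A, B = frozenset({"age=old"}), frozenset({"lpd=high"})
print(support(A | B, transactions))    # 0.5  (2 of 4 records)
print(confidence(A, B, transactions))  # 2/3, about 0.667  (0.5 / 0.75)
```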

Briefly, the Apriori algorithm consists of two steps:

1. Iteratively find all frequent itemsets in the database; the support threshold that defines a frequent itemset is set by the user;

2. Use the frequent itemsets to construct rules that meet the user's minimum confidence.
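Step 2 can be sketched as follows, assuming the frequent itemsets from step 1 are already available and items use the hypothetical "variable=value" encoding. The helper `rules_for_unknown`, which keeps only rules whose consequent concerns a given unknown variable, is an illustrative addition reflecting how the rules are used later in step 102; it is not a function defined in the application.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (antecedent A, consequent B, confidence)

def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def generate_rules(frequent_itemsets: List[FrozenSet[str]],
                   transactions: List[FrozenSet[str]],
                   min_conf: float) -> List[Rule]:
    """For each frequent itemset F, emit every rule A -> F - A (A a non-empty proper
    subset of F) whose confidence P(F) / P(A) reaches min_conf."""
    rules: List[Rule] = []
    for itemset in frequent_itemsets:
        sup_f = support(itemset, transactions)
        if len(itemset) < 2 or sup_f == 0:
            continue
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                a = frozenset(ante)
                conf = sup_f / support(a, transactions)  # safe: P(A) >= P(F) > 0
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules

def rules_for_unknown(rules: List[Rule], unknown_var: str) -> List[Rule]:
    """Keep rules whose consequent items all refer to `unknown_var`."""
    return [r for r in rules if all(item.split("=")[0] == unknown_var for item in r[1])]

transactions = [
    frozenset({"age=old", "lpd=high"}),
    frozenset({"age=old", "lpd=high"}),
    frozenset({"age=old", "lpd=low"}),
    frozenset({"age=new", "lpd=low"}),
]
rules = generate_rules([frozenset({"age=old", "lpd=high"})], transactions, min_conf=0.6)
for a, b, c in rules_for_unknown(rules, "lpd"):
    print(sorted(a), "->", sorted(b), round(c, 3))  # ['age=old'] -> ['lpd=high'] 0.667
```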

The specific procedure is as follows. First, the data set is scanned once to generate the candidate 1-itemsets C1. The Scan function is then called on C1 to filter out itemsets that do not meet the minimum support; the itemsets that remain form the frequent 1-itemsets L1. In the second and later iterations, only the frequent itemsets produced in the previous iteration are combined into new candidates, and Scan is called again to check whether each new combination meets the minimum support requirement; combinations that do not are filtered out. This loop continues until no new combinations can be generated. Here C1, C2, ..., Ck denote the candidate 1-itemsets, 2-itemsets, ..., k-itemsets; L1, L2, ..., Lk denote the frequent itemsets obtained by filtering the corresponding candidates; and Scan is the data-scanning function that filters out itemsets not meeting the minimum support. The join step has two cases: generating C1 from the data set, and generating Ck from Lk-1; in short, the join step is the process of generating candidate itemsets. The pruning step eliminates itemsets that do not meet the minimum support. This procedure is valid because P(A) >= P(AB): if P(A) < t, then necessarily P(AB) < t, where t is the support threshold.
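The level-wise procedure above can be sketched in Python as below. This is a minimal sketch under stated assumptions: items are plain strings, support is a fraction of transactions, and besides the Scan filter described above, the join step also applies the standard Apriori subset check (a candidate is dropped if any of its (k-1)-subsets is not in L(k-1)), which follows from the same P(A) >= P(AB) argument. `scan` and `apriori` are illustrative names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

Itemset = FrozenSet[str]

def scan(transactions: List[Itemset], candidates: Iterable[Itemset],
         min_support: float) -> Dict[Itemset, float]:
    """Scan: keep only candidates whose support reaches min_support."""
    n = len(transactions)
    kept = {}
    for c in candidates:
        s = sum(1 for t in transactions if c <= t) / n
        if s >= min_support:
            kept[c] = s
    return kept

def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    # Join, case 1: C1 is generated directly from the data set.
    c1 = {frozenset([item]) for t in transactions for item in t}
    lk = scan(transactions, c1, min_support)  # L1
    frequent = dict(lk)
    k = 2
    while lk:  # stop once no new frequent combination is produced
        prev = list(lk)
        ck = set()
        # Join, case 2: Ck is generated from L(k-1) by merging pairs of itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Subset check: if any (k-1)-subset is infrequent, so is the union.
                if len(union) == k and all(
                        frozenset(s) in lk for s in combinations(union, k - 1)):
                    ck.add(union)
        lk = scan(transactions, ck, min_support)  # Lk: prune by minimum support
        frequent.update(lk)
        k += 1
    return frequent

data = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("bc")]
result = apriori(data, 0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a threshold of 0.5, all single items and pairs are frequent, while {a, b, c} (support 0.25) is generated as a candidate and then filtered out by Scan.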

Step 102, when an association rule relationship exists between the known input variable and the unknown input variable, carrying out distribution prediction on the unknown input variable according to the association rule between them; otherwise, establishing a uniform distribution according to the value range of the unknown input variable, so as to obtain a distribution interval of the unknown input variable;

It can be understood that if the association rule base contains a rule relating the unknown variable to a known variable, the distribution of the unknown feature is predicted according to that rule; otherwise, a uniform distribution is established over the value range of the unknown variable in the database, so as to obtain the distribution interval of the unknown input variable.
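The branching in step 102 can be sketched as below. The flat rule encoding (a known item mapped to an unknown variable and its restored continuous interval), the variable names and the numbers are hypothetical simplifications for illustration; in practice the rules would come from the mining step.

```python
from typing import Dict, Set, Tuple

Interval = Tuple[float, float]

def distribution_interval(unknown_var: str,
                          known_items: Set[str],
                          rules: Dict[str, Tuple[str, Interval]],
                          value_range: Interval) -> Tuple[Interval, str]:
    """Return the interval to sample `unknown_var` from and which branch produced it.

    `rules` maps a known item such as "age=old" to (unknown variable, restored interval).
    """
    for item in sorted(known_items):
        target = rules.get(item)
        if target is not None and target[0] == unknown_var:
            return target[1], "rule"      # distribution predicted from the rule
    return value_range, "uniform"         # no rule: uniform over the full value range

# Hypothetical example: lighting power density (W/m2) of an older building.
rules = {"age=old": ("lpd", (12.0, 18.0))}
print(distribution_interval("lpd", {"age=old"}, rules, (5.0, 20.0)))  # ((12.0, 18.0), 'rule')
print(distribution_interval("lpd", {"age=new"}, rules, (5.0, 20.0)))  # ((5.0, 20.0), 'uniform')
```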

Step 103, sampling the unknown input variable based on the distribution interval to obtain an unknown input variable set;

In this embodiment, the unknown input variable is sampled by the Morris method based on the distribution interval to obtain the unknown input variable set.

It should be noted that the Morris method, also called the elementary effects method, is a simple and effective method that can screen out the important input factors from among the many input factors of a model. The elementary effect is defined as follows:

Let a model have k independent inputs X_i, i = 1, ..., k, which vary in the k-dimensional unit hypercube and are each discretized into p levels, so that the input space is the p-level grid Ω. For a given input vector X, the elementary effect of the i-th input is defined as:

$$EE_i(X) = \frac{y(X + e_i\Delta) - y(X)}{\Delta}$$

where y is the model output, Δ takes a value in {1/(p-1), ..., 1 - 1/(p-1)} such that X + e_iΔ still lies in Ω, and e_i is the vector whose i-th component is 1 and whose other components are 0.
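As a worked illustration of the definition, the sketch below evaluates EE_i for a toy model y = 2·x[0] + x[1]·x[2] (an assumption for illustration, with 0-based indices) on a p = 4 grid, using Δ = p/(2(p-1)) = 2/3, a common choice that lies in the admissible set {1/3, 2/3}.

```python
from typing import Callable, List, Sequence

def elementary_effect(f: Callable[[List[float]], float], x: Sequence[float],
                      i: int, delta: float) -> float:
    """EE_i(x) = (f(x + e_i * delta) - f(x)) / delta on the unit hypercube."""
    stepped = list(x)
    stepped[i] += delta
    if not -1e-12 <= stepped[i] <= 1 + 1e-12:
        raise ValueError("x + e_i * delta leaves the grid")
    return (f(stepped) - f(list(x))) / delta

def toy_model(x: List[float]) -> float:
    # Hypothetical model: linear in x[0], interaction between x[1] and x[2].
    return 2 * x[0] + x[1] * x[2]

p = 4
delta = p / (2 * (p - 1))   # 2/3
x = [0.0, 1 / 3, 0.0]       # a point of the 4-level grid {0, 1/3, 2/3, 1}
print(elementary_effect(toy_model, x, 0, delta))  # 2.0: the linear coefficient
print(elementary_effect(toy_model, x, 2, delta))  # about 1/3: equals x[1], an interaction
```

Because EE_2 depends on where x[1] happens to be, repeating this at many grid points gives a spread of values, which is exactly what σ in the next paragraph measures.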

Randomly sampling X in Ω yields the distribution of elementary effects of the i-th input, denoted F_i, i.e., EE_i ~ F_i. Morris proposed two sensitivity measures, μ and σ, the mean and standard deviation of F_i. μ estimates the overall effect of the input on the output, while σ estimates its higher-order effects: a larger σ indicates that the input's influence on the output depends strongly on the values of the other inputs (through nonlinearity or interactions). Because positive and negative elementary effects can cancel when F_i contains both signs, μ may underestimate the sensitivity of an important factor. μ is therefore now commonly replaced by μ*, proposed by Campolongo et al., which is the mean of the distribution G_i of the absolute elementary effects, |EE_i| ~ G_i.

Step 104, inputting the unknown input variable set into the energy consumption model for batch calculation to obtain the distribution range of the predicted energy consumption.
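The sampling used in step 103 can be sketched with a simplified Morris one-at-a-time design. This is a minimal sketch under stated assumptions: each trajectory starts at a random grid point and moves every input once by +Δ in random order (Morris's original design also allows -Δ steps), Δ = p/(2(p-1)), and the unit-grid points are rescaled to each variable's distribution interval by the hypothetical helper `to_interval`. All function names are illustrative.

```python
import random
import statistics
from typing import Callable, List, Optional, Tuple

Point = List[float]

def morris_trajectories(k: int, p: int, r: int, rng: random.Random):
    """r trajectories of k+1 grid points; consecutive points differ in one input by +delta."""
    delta = p / (2 * (p - 1))
    levels = [j / (p - 1) for j in range(p)]
    starts = [v for v in levels if v + delta <= 1 + 1e-12]  # keep x + delta inside the grid
    trajectories = []
    for _ in range(r):
        x = [rng.choice(starts) for _ in range(k)]
        order = list(range(k))
        rng.shuffle(order)
        traj: List[Tuple[Point, Optional[int]]] = [(x, None)]
        for i in order:
            x = list(x)
            x[i] += delta
            traj.append((x, i))  # (point, index of the input that just moved)
        trajectories.append(traj)
    return trajectories, delta

def morris_indices(f: Callable[[Point], float], k: int, p: int = 4, r: int = 20, seed: int = 0):
    """Return [(mu, mu_star, sigma)] per input, plus the sampled trajectories."""
    trajectories, delta = morris_trajectories(k, p, r, random.Random(seed))
    effects: List[List[float]] = [[] for _ in range(k)]
    for traj in trajectories:
        for (x_prev, _), (x_next, i) in zip(traj, traj[1:]):
            effects[i].append((f(x_next) - f(x_prev)) / delta)
    stats = [(statistics.mean(ee), statistics.mean(abs(e) for e in ee), statistics.stdev(ee))
             for ee in effects]
    return stats, trajectories

def to_interval(u: float, interval: Tuple[float, float]) -> float:
    """Rescale a unit-grid coordinate to a variable's distribution interval."""
    low, high = interval
    return low + u * (high - low)

stats, trajs = morris_indices(lambda x: 3 * x[0], k=2)
print(stats)  # input 0: mu and mu* close to 3, sigma close to 0; input 1: all zero
# Unknown input variable set: every trajectory point mapped to hypothetical intervals.
samples = [[to_interval(x[0], (12.0, 18.0)), to_interval(x[1], (5.0, 20.0))]
           for traj in trajs for x, _ in traj]
```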

And finally substituting all sampling results into the energy consumption model to perform batch calculation to obtain the distribution range of the energy consumption predicted value.
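A minimal sketch of the batch calculation and of one way to summarize the resulting distribution range. The energy model here is a hypothetical placeholder (annual lighting energy = lighting power density × floor area × operating hours), not the model used in the application; in practice it would be a call to a simulation engine. Reporting the 5th and 95th percentiles alongside the minimum and maximum is likewise an illustrative choice.

```python
import random
import statistics
from typing import Callable, Dict, List

Sample = Dict[str, float]

def batch_predict(model: Callable[[Sample], float], samples: List[Sample]) -> List[float]:
    """Run the energy model once per sampled set of unknown inputs."""
    return [model(s) for s in samples]

def distribution_range(predictions: List[float]) -> Dict[str, float]:
    cuts = statistics.quantiles(predictions, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"min": min(predictions), "p5": cuts[0],
            "median": statistics.median(predictions),
            "p95": cuts[-1], "max": max(predictions)}

def toy_energy_model(s: Sample) -> float:
    """Hypothetical: annual lighting energy in kWh for 5000 m2 and 2500 h/yr."""
    return s["lpd"] * 5000 * 2500 / 1000

rng = random.Random(1)
samples = [{"lpd": rng.uniform(8.0, 12.0)} for _ in range(200)]
summary = distribution_range(batch_predict(toy_energy_model, samples))
print(summary)  # every value lies between 100000 and 150000 kWh
```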

Further, in an optional embodiment, since the existing Apriori algorithm is generally applicable only to discrete variables, the application first discretizes the continuous variables so that the method also applies to them. Association rules are then extracted by the algorithm steps above to link the known and unknown variables; once the itemsets of the unknown features have been inferred from the known variables, the continuous value ranges of the unknown variables are restored, so that the originally missing data of the unknown variables are recovered through data mining. When no association rule relationship exists between the known and unknown input variables, no information about the distribution of the unknown features is available, so a uniform distribution is assumed; sampling points of the unknown variables are then drawn at random from this distribution, and all sampling results are substituted into the energy consumption model for calculation and prediction, which yields the distribution range of the predicted energy consumption.
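The discretize-then-restore round trip described above can be sketched as follows. The application does not state how continuous variables are discretized; equal-width binning and the "variable=binN" item labels are assumptions made here for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[float, float]

def make_bins(low: float, high: float, n_bins: int) -> List[Interval]:
    """Split [low, high] into n_bins equal-width, left-closed intervals."""
    width = (high - low) / n_bins
    return [(low + j * width, low + (j + 1) * width) for j in range(n_bins)]

def discretize(value: float, var: str, bins: List[Interval]) -> str:
    """Continuous value -> Apriori item label; values outside the range go to the end bins."""
    inner_edges = [b[1] for b in bins[:-1]]
    return f"{var}=bin{bisect_right(inner_edges, value)}"

def restore_interval(item: str, bins: List[Interval]) -> Interval:
    """Inferred item such as "lpd=bin2" -> continuous value range of that bin."""
    return bins[int(item.rsplit("=bin", 1)[1])]

bins = make_bins(0.0, 20.0, 4)             # (0,5), (5,10), (10,15), (15,20)
item = discretize(12.0, "lpd", bins)
print(item, restore_interval(item, bins))  # lpd=bin2 (10.0, 15.0)
```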

For example, association rule mining may reveal an association between material thickness and thermal conductivity: when the material thickness is a, the thermal conductivity is b with probability p and is not b with probability 1 - p.

In the non-deterministic energy consumption prediction method for the case of missing input variables described above, continuous variables are discretized, association rules are extracted by the algorithm steps to link the known and unknown variables, the itemsets of the unknown features are inferred from the known variables, and the continuous value ranges of the unknown variables are restored, so that the originally missing data are recovered through data mining. When no association rule links the known and unknown input variables, the unknown variables are assumed to be uniformly distributed; sampling points are drawn at random from this distribution, and all sampling results are substituted into the energy consumption model for calculation and prediction to obtain the distribution range of the predicted energy consumption. This solves the technical problem of low energy consumption prediction accuracy in the prior art when input variables are missing.
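One way to turn such a rule into samples is a two-part mixture: with probability p draw from the interval b, otherwise from the rest of the value range. The sketch below assumes uniform sampling within each part and uses made-up numbers for the value range and for b; the application does not specify these details.

```python
import random
from typing import Tuple

Interval = Tuple[float, float]

def sample_given_rule(rng: random.Random, p: float, b: Interval, value_range: Interval) -> float:
    """With probability p sample uniformly from b; otherwise sample uniformly
    from the part of value_range that lies outside b ("not b")."""
    if rng.random() < p:
        return rng.uniform(*b)
    lo, hi = value_range
    b_lo, b_hi = b
    left, right = b_lo - lo, hi - b_hi      # lengths of the two "not b" pieces
    u = rng.uniform(0.0, left + right)
    return lo + u if u < left else b_hi + (u - left)

# Hypothetical rule: thickness = a  =>  conductivity in b = [0.03, 0.05] W/(m*K) with p = 0.8.
rng = random.Random(42)
draws = [sample_given_rule(rng, 0.8, (0.03, 0.05), (0.02, 0.20)) for _ in range(5000)]
share_in_b = sum(0.03 <= d <= 0.05 for d in draws) / len(draws)
print(round(share_in_b, 2))  # close to 0.8
```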

The above is a method for predicting uncertain energy consumption under the condition of missing input variables provided in the embodiment of the present application, and the following is a system for predicting uncertain energy consumption under the condition of missing input variables provided in the embodiment of the present application.

Referring to fig. 2, an uncertain energy consumption prediction system in the absence of an input variable according to an embodiment of the present application includes:

an analysis unit 201, configured to analyze whether an association rule relationship exists between each known input variable and each unknown input variable in the database according to an association rule defined by a user through an apriori algorithm;

The establishing unit 202 is configured to predict the distribution of the unknown input variable according to the association rule when the association rule relationship exists between the known input variable and the unknown input variable, otherwise, establish uniform distribution according to the value range of the unknown input variable, so as to obtain a distribution interval of the unknown input variable;

a sampling unit 203, configured to sample an unknown input variable based on a distribution interval, to obtain an unknown input variable set;

and the prediction unit 204 is used for inputting the unknown input variable set into the energy consumption model to perform batch calculation, so as to obtain the distribution range of the energy consumption predicted value.
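The four units can be mirrored by a small class. This is a hypothetical structure for illustration only: the analysis unit is passed in as a callable that already returns restored intervals for the unknown variables, the sampling unit draws plain uniform random samples rather than a Morris design to keep the sketch short, and the energy model is a placeholder.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]
Sample = Dict[str, float]

@dataclass
class PredictionSystem:
    analysis_unit: Callable[[Dict[str, str]], Dict[str, Interval]]  # unit 201 (rule mining)
    value_ranges: Dict[str, Interval]      # database value range of each unknown variable
    energy_model: Callable[[Sample], float]
    n_samples: int = 100
    seed: int = 0

    def establish(self, known: Dict[str, str]) -> Dict[str, Interval]:
        """Unit 202: rule-based interval if one was found, else the full (uniform) range."""
        from_rules = self.analysis_unit(known)
        return {var: from_rules.get(var, full) for var, full in self.value_ranges.items()}

    def sample(self, intervals: Dict[str, Interval]) -> List[Sample]:
        """Unit 203: uniform random sampling inside each interval (a simplification)."""
        rng = random.Random(self.seed)
        return [{var: rng.uniform(lo, hi) for var, (lo, hi) in intervals.items()}
                for _ in range(self.n_samples)]

    def predict(self, known: Dict[str, str]) -> Interval:
        """Unit 204: batch evaluation; returns the (min, max) of the predictions."""
        predictions = [self.energy_model(s) for s in self.sample(self.establish(known))]
        return min(predictions), max(predictions)

system = PredictionSystem(
    analysis_unit=lambda known: {"lpd": (12.0, 18.0)} if known.get("age") == "old" else {},
    value_ranges={"lpd": (5.0, 20.0)},
    energy_model=lambda s: 100.0 * s["lpd"],
)
print(system.predict({"age": "old"}))  # both bounds lie within [1200, 1800]
print(system.predict({"age": "new"}))  # both bounds lie within [500, 2000]
```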

Further, an embodiment of the present application also provides a non-deterministic energy consumption prediction apparatus for the case of missing input variables, the apparatus comprising a processor and a memory:

the memory is used to store program code and transmit the program code to the processor;

the processor is configured to execute, according to instructions in the program code, the non-deterministic energy consumption prediction method for the case of missing input variables described in the above method embodiment.

Further, in an embodiment of the present application, there is further provided a computer readable storage medium, where the computer readable storage medium is configured to store program code, where the program code is configured to execute the method for predicting uncertain energy consumption in the case of missing an input variable according to the embodiment of the method described above.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and unit may refer to the corresponding procedures in the foregoing method embodiments, which are not repeated here.

The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above-described figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions means any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may be single or plural.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division into units is merely a logical functional division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

While the application has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the application.

Claims

1. A method for predicting non-deterministic energy consumption when input variables are missing, characterized by comprising:

All frequent itemsets in the database that meet a preset threshold are found through iteration, and association rules that meet the user's minimum confidence are constructed using the frequent itemsets, so as to analyze whether there is an association rule relationship between each known input variable and the unknown input variable in the database;

When there is an association rule relationship between the known input variables and the unknown input variables, the distribution prediction of the unknown input variables is performed according to the association rule between the two. Otherwise, a uniform distribution is established according to the value range of the unknown input variables to obtain the distribution interval of the unknown input variables.

Based on the distribution interval, sampling the unknown input variables by Morris analysis method to obtain a set of unknown input variables;

The unknown input variable set is input into the energy consumption model for batch calculation to obtain the distribution range of the energy consumption prediction value.

2. The non-deterministic energy consumption prediction method in the case of missing input variables according to claim 1, characterized in that, before analyzing, by means of the Apriori algorithm and according to the association rules defined by the user, whether there is an association rule relationship between each known input variable and the unknown input variable in the database, the method further comprises: discretizing the continuous variables in the database.

3. A non-deterministic energy consumption prediction system in the case of missing input variables, characterized by comprising:

An analysis unit, used to find, through iteration, all frequent itemsets in the database that meet a preset threshold, and to use each of the frequent itemsets to construct association rules that meet the user's minimum confidence, so as to analyze whether there is an association rule relationship between each known input variable and an unknown input variable in the database;

An establishing unit, used to predict the distribution of the unknown input variable according to the association rule between the known input variable and the unknown input variable when there is an association rule relationship between the two, and otherwise to establish a uniform distribution according to the value range of the unknown input variable, thereby obtaining the distribution interval of the unknown input variable;

A sampling unit, used for sampling unknown input variables by Morris analysis method based on the distribution interval to obtain a set of unknown input variables;

The prediction unit is used to input the unknown input variable set into the energy consumption model for batch calculation to obtain the distribution range of the energy consumption prediction value.

4. The non-deterministic energy consumption prediction system in the case of missing input variables according to claim 3, characterized in that it also includes: a pre-processing unit;

The preprocessing unit is used to discretize the continuous variables in the database.

5. A non-deterministic energy consumption prediction device in the case of missing input variables, characterized in that the device includes a processor and a memory:

The memory is used to store program code and transmit the program code to the processor;

The processor is used to execute, according to the instructions in the program code, the non-deterministic energy consumption prediction method in the case of missing input variables as described in any one of claims 1-2.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium is used to store program code, and the program code is used to execute the non-deterministic energy consumption prediction method in the case of missing input variables as described in any one of claims 1-2.