CN117494877A - Forecasting method of electric meter installation based on cluster analysis - Google Patents

Info

Publication number
CN117494877A
Authority
CN
China
Prior art keywords
clustering
prediction
cluster
center
time series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311425732.0A
Other languages
Chinese (zh)
Inventor
赖国书
张荔鹃
叶强
周厚源
王姣
洪巧文
曹舒
曾清娟
杨涵脂
胡敏贤
林素存
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Fujian Electric Power Co Ltd
Marketing Service Center of State Grid Fujian Electric Power Co Ltd
Original Assignee
State Grid Fujian Electric Power Co Ltd
Marketing Service Center of State Grid Fujian Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Fujian Electric Power Co Ltd, Marketing Service Center of State Grid Fujian Electric Power Co Ltd
Priority to CN202311425732.0A
Publication of CN117494877A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Operations Research (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for predicting electric meter installation quantities based on cluster analysis, comprising the following steps: preprocessing raw data comprising a plurality of time series, dividing the data into a training set and a prediction set according to the prediction requirement, and giving the time series in the training set the same length; determining the optimal number of clusters using the elbow method; selecting initial cluster centers with the K-means++ algorithm to improve the quality of the clustering result; iterating K-means clustering from the determined number of clusters and the initial cluster centers to obtain the cluster centers and the classification result; selecting, on the basis of the classification result, the optimal ARIMA model order for the cluster center of each class; for the time series of each class, building an ARIMA model for each time series with the ARIMA order selected for that class to obtain a prediction for each time series; evaluating the prediction performance of the model; and then predicting the electric meter installation quantity with a model whose prediction performance meets the standard. The method helps predict future electric meter installation quantities efficiently and accurately.

Description

Electric meter installation quantity prediction method based on cluster analysis
Technical Field
The invention relates to the technical field of power data processing, and in particular to a method for predicting electric meter installation quantities based on cluster analysis.
Background
Predicting electric meter installation quantities is important for planning, operation, policy making and business decisions in the power industry. Such predictions provide key information about future meter demand, help all parties make sound decisions, and promote the sustainable development and efficiency of the power system. In particular, they can guide procurement and deployment strategies for smart meters. Smart meters provide data acquisition, remote monitoring, and regulation and control functions, and can improve the monitoring and management of the power grid. Accurate prediction of installation quantities helps determine procurement plans and deployment strategies and ensures that the coverage and number of smart meters meet demand.
Electric meter installation quantities are predicted from historical data, and many time series prediction methods exist in both industry and academia. Traditional time series methods include ARIMA and GARCH models, machine learning methods include random forests and GBDT, and deep neural networks include CNN and LSTM. Although all of these methods are applicable to predicting meter installation quantities, the raw data are divided by many indicators, such as region, meter type and installation process, so they contain a large number of time series. Building a model for each time series is computationally expensive, and when the models later need to be corrected it is difficult to correct each time series individually.
Disclosure of Invention
The invention aims to provide a method for predicting electric meter installation quantities based on cluster analysis, which helps predict future electric meter installation quantities efficiently and accurately.
To achieve the above purpose, the invention adopts the following technical scheme: a method for predicting electric meter installation quantities based on cluster analysis, comprising the following steps:
S1, preprocessing raw data comprising a plurality of time series, dividing the data into a training set and a prediction set according to the prediction requirement, and giving the time series in the training set the same length;
S2, determining the optimal number of clusters using the elbow method;
S3, selecting initial cluster centers with the K-means++ algorithm to improve the quality of the clustering result;
S4, iterating K-means clustering from the determined number of clusters and the initial cluster centers to obtain the cluster centers and the classification result;
S5, selecting, on the basis of the classification result, the optimal ARIMA model order for the cluster center of each class;
S6, for the time series of each class, building an ARIMA model for each time series with the ARIMA order selected for that class to obtain a prediction for each time series;
S7, evaluating the prediction performance of the model, and then predicting the electric meter installation quantity with a model whose prediction performance meets the standard.
Further, in step S1, the training set and the prediction set are divided according to the prediction requirement as follows: if the electric meter installation quantity of each time series is to be predicted for the last n months, the data before the last n months are taken as the training set and the data of the last n months are taken as the prediction set.
Further, in step S1, if the time series differ in length, they are given the same length by filling in values or by truncation, specifically: if a time series only records data from the first to the last installation of the meter, the installation quantity at earlier unrecorded time points can be recorded as 0; if it cannot be determined whether the installation quantity at earlier unrecorded time points is 0, all time series can be truncated to the length of the shortest time series, with the truncation removing the initial part of each series.
Further, in step S2, the elbow method is used to determine the optimal number of clusters as follows: the sum of squared errors (SSE) of the clustering is calculated for different numbers of clusters; once the number of clusters increases past a certain point, the decrease in SSE slows abruptly and forms an elbow, and the number of clusters at the elbow is the optimal number of clusters.
Further, in step S3, the initial cluster centers are selected with the K-means++ algorithm as follows:
A1, selecting the first cluster center: randomly select a time series from the training set as the first cluster center;
A2, calculating distance-weighted probabilities: for each time series, calculate its distance to the selected cluster centers and take the square of the distance as its weight; then calculate, from the weights, the probability of each time series being chosen as the next cluster center; a time series farther from the selected cluster centers has a higher probability of becoming the next cluster center;
A3, selecting the next cluster center: select the next cluster center according to the calculated distance-weighted probability distribution;
A4, repeating steps A2 and A3 until K cluster centers have been selected.
Further, in step S4, K-means clustering is iterated as follows:
B1, assigning data points to the nearest cluster center: calculate the distance between each time series and the cluster centers selected in step S3, and assign each time series to the nearest cluster center;
B2, updating the cluster centers: for each cluster, calculate the mean of all time series in the cluster and take it as the new cluster center;
B3, repeating steps B1 and B2, i.e. the assignment of time series and the update of cluster centers, until a stopping condition is reached; the stopping condition is that the maximum number of iterations is reached or the cluster centers no longer change.
Further, in step S5, the optimal ARIMA model order is selected for the cluster center of each class and used as the optimal ARIMA model order for the time series of that class, as follows:
determining the order of an ARIMA model means selecting a suitable autoregressive order p, differencing order d and moving average order q; for the differencing order d, a stationarity test is performed on the cluster center, and if the test is not passed, d is increased until the differenced series passes the test; the orders p and q are determined from the autocorrelation function and the partial autocorrelation function: p is determined by the cutoff point of the partial autocorrelation function plot and q by the cutoff point of the autocorrelation function plot.
Further, in step S5, if the autocorrelation and partial autocorrelation function plots cannot determine the orders p and q, a subset selection algorithm is used to select suitable p and q, specifically: the Bayesian information criterion (BIC) is used as the evaluation criterion, different combinations of p, d and q are tried, and the model with the smallest BIC value is selected as the best model.
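For reference, the Bayesian information criterion used to compare candidate orders is commonly written as follows. The symbols k (number of estimated model parameters), N (number of observations in the series being fitted) and \hat{L} (maximized likelihood of the fitted model) are notation introduced here for illustration and do not appear in the original text.

```latex
\[
\mathrm{BIC} = k \ln N - 2 \ln \hat{L}
\]
```

The first term penalizes model complexity and the second rewards goodness of fit, so among the candidate (p, d, q) combinations the one with the smallest BIC trades the two off.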
Further, in step S6, a corresponding ARIMA model is built for each time series using the ARIMA model order determined for it in step S5; the parameter values of the ARIMA model are estimated by maximum likelihood estimation, which maximizes the fit of the model to the observed data, and the predicted data for the last n months of each time series are thereby obtained.
Further, in step S7, the average relative error error_j is used to estimate the prediction error of the model:
\[
\mathrm{error}_j = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_{\mathrm{act}}(i,j) - y_{\mathrm{pred}}(i,j) \right|}{y_{\mathrm{act}}(i,j)}, \qquad j = 1, \dots, J
\]
where y_act(i, j) is the actual electric meter installation quantity of the i-th month of the j-th time series in the prediction set, y_pred(i, j) is the prediction for the i-th month to be predicted of the j-th time series obtained by the cluster-analysis-based prediction method, n is the total number of predicted months, and J is the total number of time series; error_j reflects the prediction performance on the j-th time series.
Compared with the prior art, the invention has the following beneficial effects: the invention provides a method for predicting electric meter installation quantities based on cluster analysis. Compared with the traditional approach of directly building an ARIMA model for each time series, when there are many time series the cluster centers obtained by K-means clustering reflect the data characteristics of their classes, so a suitable ARIMA model order only needs to be selected for each cluster center, and that order then suits the time series of the same class. If the ARIMA model order were selected directly for each time series without classification, the selection would be difficult whatever method was adopted. The R language does provide functions that select the ARIMA model order automatically, but the orders they select predict poorly. The method can therefore efficiently and accurately predict the future installation quantities of various electric meters from their historical installation data, and has strong practicability and broad application prospects.
Drawings
FIG. 1 is a flow chart of a method implementation of an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
As shown in FIG. 1, this embodiment provides a method for predicting electric meter installation quantities based on cluster analysis, comprising:
S1, preprocessing raw data comprising a plurality of time series, dividing the data into a training set and a prediction set according to the prediction requirement, and giving the time series in the training set the same length.
S2, determining the optimal number of clusters using the elbow method.
S3, selecting initial cluster centers with the K-means++ algorithm to improve the quality of the clustering result.
S4, iterating K-means clustering from the determined number of clusters and the initial cluster centers to obtain the cluster centers and the classification result.
S5, selecting, on the basis of the classification result, the optimal ARIMA model order for the cluster center of each class.
S6, for the time series of each class, building an ARIMA model for each time series with the ARIMA order selected for that class to obtain a prediction for each time series.
S7, evaluating the prediction performance of the model, and then predicting the electric meter installation quantity with a model whose prediction performance meets the standard.
In step S1, the training set and the prediction set are first divided according to the prediction requirement: if the electric meter installation quantity of each time series is to be predicted for the last n months, the data before the last n months are taken as the training set and the data of the last n months as the prediction set. The training set contains many time series whose lengths differ, but K-means clustering requires time series of uniform length. So that the time series in the training set can subsequently be classified by K-means clustering, they are given the same length by filling in values or by truncation. If a time series only records data from the first to the last installation of the meter, the installation quantity at earlier unrecorded time points can be recorded as 0. If it cannot be determined whether the installation quantity at earlier unrecorded time points is 0, all time series can be truncated to the length of the shortest time series, with the truncation removing the initial part of each series. In addition, if only a few time series have inconsistent lengths, an ARIMA model can be built for each of them individually, and the remaining time series of consistent length can be predicted with the cluster-analysis-based method.
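The preprocessing in step S1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names and the `zeros_are_known` flag are hypothetical, and the sketch assumes that the series are aligned at their most recent month and that truncation drops the earliest observations.

```python
from typing import Dict, List, Tuple

Series = List[float]


def split_train_prediction(series: Series, n: int) -> Tuple[Series, Series]:
    """Hold out the last n months as the prediction set; the rest is training data."""
    if not 0 < n < len(series):
        raise ValueError("n must be between 1 and len(series) - 1")
    return series[:-n], series[-n:]


def pad_with_leading_zeros(series: Series, length: int) -> Series:
    """Treat months before the first recorded installation as zero installations."""
    return [0.0] * (length - len(series)) + list(series)


def truncate_from_start(series: Series, length: int) -> Series:
    """Drop the earliest observations, keeping the most recent `length` months."""
    return list(series[len(series) - length:])


def equalize_lengths(train: Dict[str, Series], zeros_are_known: bool) -> Dict[str, Series]:
    """Pad to the longest series when missing months are known zeros,
    otherwise truncate every series to the length of the shortest one."""
    lengths = [len(s) for s in train.values()]
    if zeros_are_known:
        target = max(lengths)
        return {k: pad_with_leading_zeros(s, target) for k, s in train.items()}
    target = min(lengths)
    return {k: truncate_from_start(s, target) for k, s in train.items()}
```

Splitting before equalizing keeps the held-out months untouched, so padding or truncation only ever changes the training data.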
In step S2, before K-means clustering can be used, the number of classes into which the training-set time series are divided, i.e. the number of clusters K, must be determined. In this embodiment the elbow method is used: the optimal number of clusters is determined by calculating the sum of squared errors (SSE) of the clustering for different numbers of clusters. SSE is the sum of the squared distances between each data point and the center of the cluster to which it belongs. As the number of clusters increases, SSE gradually decreases, but the rate of decrease gradually slows. Once the number of clusters increases past a certain point, the decrease in SSE slows sharply and forms an "elbow". The elbow is the key to the elbow method and marks the optimal number of clusters. In practice, the number of clusters at the elbow can be read off a plot of SSE against the number of clusters K. The elbow method is simple to use, requires no prior knowledge, and allows the number of clusters to be chosen quickly.
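The SSE computation and the elbow selection can be sketched as follows. The text reads the elbow off a plot; as an assumption for illustration only, the hypothetical `elbow_k` helper below substitutes a simple numeric heuristic (the largest second difference of the SSE curve). The cluster centers and labels passed to `sse` are assumed to come from a K-means run for the given K.

```python
from typing import Dict, Sequence


def sse(points: Sequence[Sequence[float]],
        centers: Sequence[Sequence[float]],
        labels: Sequence[int]) -> float:
    """Sum of squared Euclidean distances from each series to its cluster center."""
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((x - c) ** 2 for x, c in zip(point, centers[label]))
    return total


def elbow_k(sse_by_k: Dict[int, float]) -> int:
    """Pick the K with the largest second difference of the SSE curve.

    Assumption: this discrete-curvature heuristic stands in for reading the
    elbow off a plot; it needs SSE values for at least three cluster counts.
    """
    ks = sorted(sse_by_k)
    if len(ks) < 3:
        raise ValueError("need SSE for at least three cluster counts")
    best_k, best_bend = ks[1], float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        # A large positive value means the curve flattens sharply after k.
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

A numeric heuristic like this can disagree with a visual reading when the curve bends gradually, so plotting SSE against K remains a useful check.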
The traditional K-means clustering algorithm selects the initial cluster centers at random, which makes the algorithm sensitive to the initial values and to outliers. To address this, in step S3 the initial cluster centers are selected with the K-means++ algorithm, which changes the way the initial centers are chosen so that they tend to lie far apart from one another. The specific method is as follows:
A1, selecting the first cluster center: randomly select a time series from the training set as the first cluster center.
A2, calculating distance-weighted probabilities: for each time series, calculate its distance to the selected cluster centers and take the square of the distance as its weight; then calculate, from the weights, the probability of each time series being chosen as the next cluster center; a time series farther from the selected cluster centers has a higher probability of becoming the next cluster center.
A3, selecting the next cluster center: select the next cluster center according to the calculated distance-weighted probability distribution.
A4, repeating steps A2 and A3 until K cluster centers have been selected.
Initial cluster centers selected in this way represent the distribution of the data set better and improve the quality of the clustering result. The method reduces the risk of the K-means algorithm becoming trapped in a local optimum and in most cases outperforms the traditional random selection of initial cluster centers.
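Steps A1 to A4 can be sketched as below. This is an illustrative sketch using only the Python standard library; the function names are hypothetical. It assumes, as in the usual formulation of K-means++, that the distance in step A2 is the distance to the nearest cluster center already selected.

```python
import random
from typing import List, Optional, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(series: Sequence[Sequence[float]], k: int,
                   rng: Optional[random.Random] = None) -> List[List[float]]:
    """Select k initial cluster centers from the series (steps A1 to A4)."""
    rng = rng or random.Random()
    if not 0 < k <= len(series):
        raise ValueError("k must be between 1 and the number of series")
    # A1: the first center is drawn uniformly at random.
    centers = [list(rng.choice(series))]
    while len(centers) < k:
        # A2: weight each series by its squared distance to the nearest chosen center.
        weights = [min(squared_distance(s, c) for c in centers) for s in series]
        if sum(weights) == 0:
            # Every series coincides with a chosen center; fall back to a uniform draw.
            centers.append(list(rng.choice(series)))
            continue
        # A3: draw the next center with probability proportional to its weight.
        centers.append(list(rng.choices(series, weights=weights, k=1)[0]))
    # A4: the loop ends once k centers have been selected.
    return centers
```

A series that is already a center has weight 0 and therefore cannot be drawn again, unless all remaining weights are 0.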
Iterative K-means clustering is a distance-based clustering algorithm that optimizes the clustering result by repeatedly updating the cluster centers and reassigning the data points. The distance used here is the Euclidean distance. In step S4, K-means clustering is iterated as follows:
B1, assigning data points to the nearest cluster center: calculate the distance between each time series and the cluster centers selected in step S3, and assign each time series to the nearest cluster center.
B2, updating the cluster centers: for each cluster, calculate the mean of all time series in the cluster and take it as the new cluster center.
B3, repeating steps B1 and B2, i.e. the assignment of time series and the update of cluster centers, until a stopping condition is reached; the stopping condition is that the maximum number of iterations is reached or the cluster centers no longer change.
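Steps B1 to B3 can be sketched as follows. This is a minimal standard-library illustration with hypothetical names; keeping the previous center when a cluster receives no members is an assumption, since the text does not say how empty clusters are handled.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance; it ranks neighbors the same way as the distance itself."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series: Sequence[Sequence[float]],
           initial_centers: Sequence[Sequence[float]],
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Iterate assignment and center update until the centers stop changing."""
    centers = [list(c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # B1: assign every series to its nearest cluster center.
        labels = [min(range(len(centers)),
                      key=lambda j: squared_distance(s, centers[j]))
                  for s in series]
        # B2: recompute each center as the pointwise mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [s for s, lab in zip(series, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
        # B3: stop when the centers no longer change (or max_iter is exhausted).
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The returned centers are the per-class representatives for which the ARIMA order is later selected, and the labels give the class of each time series.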
Steps S1 to S4 divide the training set into K classes. Because the cluster centers can be used to explain and describe the features of their clusters, in step S5 the optimal ARIMA model order is selected for each cluster center and used as the optimal ARIMA model order for the time series of the corresponding class. The specific method is as follows:
Determining the order of an ARIMA model means selecting a suitable autoregressive order p, differencing order d and moving average order q. For the differencing order d, a stationarity test is performed on the cluster center, and if the test is not passed, d is increased until the differenced series passes the test. In practice, however, d generally does not exceed 2. The orders p and q are determined from the autocorrelation function and the partial autocorrelation function by inspecting their plots for the data: p is determined by the cutoff point of the partial autocorrelation function plot and q by the cutoff point of the autocorrelation function plot. If the autocorrelation and partial autocorrelation function plots cannot determine the orders p and q, a subset selection algorithm is used to select suitable p and q. The algorithm uses the Bayesian information criterion (BIC) as the evaluation criterion, tries different combinations of p, d and q, and selects the model with the smallest BIC value as the best model. It tries the combinations of p, d and q automatically, without manual experiments and evaluation. Because BIC weighs model complexity against goodness of fit, the algorithm helps avoid over-fitting and tends to select a simpler model that still fits well. Furthermore, the results of the subset selection algorithm can usually be visualized, which makes the selection process and its outcome easier to understand.
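The ingredients of the plot-based order selection can be sketched numerically. The code below computes d-th differences, the sample autocorrelation function, and the partial autocorrelation function via the Durbin-Levinson recursion. The `cutoff_lag` helper is a hypothetical stand-in for reading the cutoff point off a plot: it assumes the common approximate 95% band of ±1.96/sqrt(N). The stationarity test itself is not shown; in practice a unit-root test from a statistics library would decide d.

```python
import math
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply first differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def acf(series: Sequence[float], max_lag: int) -> List[float]:
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(v * v for v in dev)
    if var == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / var
            for k in range(max_lag + 1)]


def pacf(series: Sequence[float], max_lag: int) -> List[float]:
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    rho = acf(series, max_lag)
    result = [1.0]
    prev: List[float] = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def cutoff_lag(values: Sequence[float], n_obs: int) -> int:
    """Count the leading lags (from lag 1) whose value lies outside ±1.96/sqrt(n_obs)."""
    bound = 1.96 / math.sqrt(n_obs)
    lag = 0
    for k in range(1, len(values)):
        if abs(values[k]) <= bound:
            break
        lag = k
    return lag
```

Under these assumptions, a candidate p would be `cutoff_lag(pacf(x, m), len(x))` and a candidate q would be `cutoff_lag(acf(x, m), len(x))`, where x is the differenced cluster center and m is a chosen maximum lag.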
In step S6, a corresponding ARIMA model is built for each time series using the ARIMA model order determined for it in step S5 (i.e., the same order as the ARIMA model of the cluster center of its class). The parameter values of the ARIMA model are estimated by maximum likelihood estimation, which maximizes the fit of the model to the observed data, and the predicted data for the last n months of each time series are thereby obtained.
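For reference, an ARIMA(p, d, q) model is commonly written in the following form. The notation is introduced here for illustration and is not taken from the original text: y_t is the installation quantity in month t, B is the backshift operator (B y_t = y_{t-1}), c is a constant, \phi_a and \theta_b are the autoregressive and moving average coefficients, and \varepsilon_t is white noise.

```latex
\[
\Bigl(1 - \sum_{a=1}^{p} \phi_a B^{a}\Bigr)\,(1 - B)^{d}\, y_t
  = c + \Bigl(1 + \sum_{b=1}^{q} \theta_b B^{b}\Bigr)\,\varepsilon_t
\]
```

Under this form, all series in a class share the same (p, d, q), while the coefficients c, \phi_a and \theta_b are estimated separately for each series by maximum likelihood.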
In step S7, the average relative error error_j is used to estimate the prediction error of the model:
\[
\mathrm{error}_j = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| y_{\mathrm{act}}(i,j) - y_{\mathrm{pred}}(i,j) \right|}{y_{\mathrm{act}}(i,j)}, \qquad j = 1, \dots, J
\]
where y_act(i, j) is the actual electric meter installation quantity of the i-th month of the j-th time series in the prediction set, y_pred(i, j) is the prediction for the i-th month to be predicted of the j-th time series obtained by the cluster-analysis-based prediction method, n is the total number of predicted months, and J is the total number of time series; error_j reflects the prediction performance on the j-th time series.
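The evaluation in step S7 can be sketched as below, assuming that the average relative error of one time series is the mean over the n predicted months of |actual − predicted| / actual. The function name is hypothetical, and raising an error for months with zero actual installations is an assumption, since the text does not say how such months are handled.

```python
from typing import Sequence


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average relative error of one time series over its predicted months."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Assumption: a month with zero installations needs a separate policy.
        raise ValueError("relative error is undefined when an actual value is 0")
    # For non-negative installation counts, abs(a) equals a.
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
```

Calling the function once per time series gives one error value per series, which can then be compared against whatever accuracy threshold the application requires.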
The raw data contain many time series, so the invention first clusters them with the K-means algorithm and then builds a corresponding ARIMA model for each class according to that class's data characteristics. K-means clustering has the following advantages:
1. Simplicity and efficiency: it is easy to understand and implement and has low computational complexity, which makes it suitable for large data sets.
2. Scalability: the K-means algorithm adapts to data sets of different sizes and dimensions. It can process data with many samples and high-dimensional features, and it can be parallelized to improve computational efficiency. A parallel K-means can be implemented on Spark: the data are partitioned across computing nodes, the coordinates of the cluster centers are shared among the nodes through the SparkContext broadcast mechanism, a Map function assigns the data points to clusters, and a Reduce function then updates the mean centers. With this parallel design all data blocks can be clustered in parallel, and because updating a cluster center is a mean calculation, the update can also be carried out in parallel.
3. Intuitive results that are easy to visualize: K-means divides the data points into K clusters such that the time series within a cluster have similar characteristics, so the result is easy to explain and to plot.
4. Interpretability: the clustering result is generally easy to interpret. Each cluster center represents the central point of its cluster and can be used to explain and describe the features of the cluster.
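The map and reduce structure of the parallel K-means update described in the scalability discussion above can be illustrated without Spark. The sketch below is plain Python and does not use any Spark API: each call to the hypothetical `map_partition` stands for the work of one computing node that has received the broadcast centers, and `functools.reduce` stands in for the Reduce step.

```python
from functools import reduce
from typing import Dict, List, Sequence, Tuple

# cluster index -> (elementwise sum of member series, member count)
Partial = Dict[int, Tuple[List[float], int]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition: Sequence[Sequence[float]],
                  centers: Sequence[Sequence[float]]) -> Partial:
    """Map step: assign each series in one partition and accumulate partial sums."""
    partial: Partial = {}
    for s in partition:
        j = min(range(len(centers)), key=lambda c: squared_distance(s, centers[c]))
        total, count = partial.get(j, ([0.0] * len(s), 0))
        partial[j] = ([t + x for t, x in zip(total, s)], count + 1)
    return partial


def merge(a: Partial, b: Partial) -> Partial:
    """Reduce step: combine the partial sums and counts of two partitions."""
    merged = dict(a)
    for j, (total, count) in b.items():
        if j in merged:
            old_total, old_count = merged[j]
            merged[j] = ([x + y for x, y in zip(old_total, total)], old_count + count)
        else:
            merged[j] = (total, count)
    return merged


def parallel_style_update(partitions: Sequence[Sequence[Sequence[float]]],
                          centers: Sequence[Sequence[float]]) -> List[List[float]]:
    """One K-means center update computed from per-partition partial results."""
    partials = [map_partition(p, centers) for p in partitions]  # independent per node
    combined = reduce(merge, partials, {})
    return [[x / combined[j][1] for x in combined[j][0]] if j in combined else list(c)
            for j, c in enumerate(centers)]
```

Because each partition only contributes sums and counts, the update gives the same centers as computing the means over the whole data set at once.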
By contrast, if an ARIMA model is built separately for each time series, then on the one hand, because the series are so numerous, there is no guarantee that the optimal ARIMA model is selected for each of them; on the other hand, if the ARIMA models need to be improved later, it is difficult to improve each one individually.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way. Any person skilled in the art may use the technical content disclosed above to make changes or modifications that result in equivalent embodiments. However, any simple modification, equivalent change or variation made to the above embodiments according to the technical substance of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method for predicting electric meter installation quantity based on cluster analysis, characterized by comprising the following steps:
S1, performing data preprocessing on original data comprising a plurality of time series, dividing the data into a training set and a prediction set according to the prediction requirement, and making the time series in the training set the same length;
S2, determining the optimal number of clusters using the elbow method;
S3, selecting the initial cluster centers using the K-means++ algorithm, so as to improve the quality of the clustering result;
S4, iterating K-means clustering with the determined number of clusters and the initial cluster centers to obtain the cluster centers and the classification result;
S5, according to the obtained classification result, selecting the optimal ARIMA model order for the cluster center of each class;
S6, for the time series of the same class, establishing an ARIMA model for each time series using the corresponding selected ARIMA order, to obtain a prediction result for each time series;
S7, evaluating the prediction effect of the model, and then predicting the electric meter installation quantity using a model whose prediction effect meets the standard.
2. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S1 the training set and the prediction set are divided according to the prediction requirement, specifically: if the electric meter installation quantity of each time series is to be predicted for the last n months, the data other than the last n months are taken as the training set, and the data of the last n months are taken as the prediction set.
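The split in claim 2 amounts to slicing each monthly series. A minimal sketch follows; the function name `split_series` and the sample values are hypothetical and only illustrate the slicing.

```python
def split_series(series, n):
    """Return (training part, prediction part), holding out the last n months."""
    if not 0 < n < len(series):
        raise ValueError("n must be between 1 and len(series) - 1")
    return series[:-n], series[-n:]


monthly_installs = [120, 135, 128, 140, 150, 160, 155, 170]  # made-up values
train, holdout = split_series(monthly_installs, n=3)
print(train, holdout)
```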
3. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S1, if the time series differ in length, they are made the same length by interpolation or truncation, specifically: if a time series only records the data from the first installation to the last installation of the electric meters, the installation quantity at the earlier unrecorded time points may be recorded as 0; if it cannot be determined whether the installation quantity at the earlier unrecorded time points is 0, all the time series may be truncated to the length of the shortest time series, the truncation being applied to the initial part of each time series.
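Both options in claim 3 can be sketched in a few lines. The sketch assumes that all series end in the same month, so that missing or discarded points are always the earliest ones; the helper names are hypothetical.

```python
def pad_front_with_zeros(series_list):
    """Option 1: record earlier unrecorded months as 0 so that every series
    reaches the length of the longest one."""
    target = max(len(s) for s in series_list)
    return [[0] * (target - len(s)) + list(s) for s in series_list]


def truncate_to_shortest(series_list):
    """Option 2: cut the initial part of longer series so that every series
    has the length of the shortest one."""
    target = min(len(s) for s in series_list)
    return [list(s[len(s) - target:]) for s in series_list]


raw = [[5, 7, 9, 4], [3, 6]]  # made-up monthly counts
padded = pad_front_with_zeros(raw)
truncated = truncate_to_shortest(raw)
print(padded, truncated)
```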
4. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S2 the optimal number of clusters is determined using the elbow method, specifically: the sum of squared errors (SSE) of the clustering is calculated for different numbers of clusters; when the number of clusters increases to a certain point, the rate at which the SSE decreases slows abruptly, forming an elbow, and the number of clusters at the elbow is the optimal number of clusters.
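The elbow search can be sketched as follows. Two choices here are assumptions of the sketch rather than part of the claim: the elbow is located as the cluster number with the largest second difference of the SSE curve, and a deterministic farthest-first initialization replaces K-means++ so that the output is reproducible. The sample series are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centers(series, k):
    """Deterministic initialization used only to keep this sketch
    reproducible (an assumption, not the K-means++ step of the method)."""
    centers = [list(series[0])]
    while len(centers) < k:
        nxt = max(series, key=lambda s: min(sq_dist(s, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans_sse(series, k, max_iter=100):
    """Run K-means with k clusters and return the sum of squared errors."""
    centers = farthest_first_centers(series, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(s, centers[c]))
                      for s in series]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(s, centers[lab]) for s, lab in zip(series, labels))


def find_elbow(sse_by_k):
    """Return the k at which the SSE decrease slows down most sharply,
    measured by the largest second difference (a heuristic)."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (sse_by_k[prev_k] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Nine made-up series of length 3 forming three well-separated groups.
series = [
    [10, 0, 0], [11, 0, 0], [10, 1, 0],
    [0, 10, 0], [0, 11, 0], [0, 10, 1],
    [0, 0, 10], [1, 0, 10], [0, 0, 11],
]
sse_by_k = {k: kmeans_sse(series, k) for k in range(1, 7)}
best_k = find_elbow(sse_by_k)
print(best_k)
```

On real data the SSE curve is rarely this clean, so the second-difference rule is best treated as a starting point to be checked against a plot of the curve.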
5. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S3 the initial cluster centers are selected using the K-means++ algorithm, specifically:
A1, selecting the first cluster center: randomly selecting one time series from the training set as the first cluster center;
A2, calculating the distance-weighted probability: for each time series, calculating its distance to the already selected cluster centers and taking the square of the distance as a weight; then calculating, from the weights, the probability of each time series being chosen as the next cluster center, so that a time series farther from the selected cluster centers has a higher probability of becoming the next cluster center;
A3, selecting the next cluster center: selecting the next cluster center according to the calculated distance-weighted probability distribution;
A4, repeating steps A2 and A3 until K cluster centers have been selected.
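Steps A1 to A4 can be sketched as below. The sketch follows the usual K-means++ convention of weighting each series by its squared distance to the nearest already selected center, which is an assumption about how the distance in step A2 is measured once several centers exist. The function name and sample data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(series, k, seed=0):
    """Select k initial cluster centers following steps A1 to A4."""
    rng = random.Random(seed)
    centers = [list(rng.choice(series))]          # A1: random first center
    while len(centers) < k:                       # A4: repeat until k centers
        # A2: weight = squared distance to the nearest selected center
        weights = [min(sq_dist(s, c) for c in centers) for s in series]
        # A3: draw the next center with probability proportional to weight
        nxt = rng.choices(series, weights=weights, k=1)[0]
        centers.append(list(nxt))
    return centers


series = [  # made-up training series
    [10, 0, 0], [11, 0, 0],
    [0, 10, 0], [0, 11, 0],
    [0, 0, 10], [0, 0, 11],
]
centers = kmeans_pp_init(series, k=3, seed=42)
print(centers)
```

A series that is already a center has weight 0 and is never drawn again; if every series coincides with a selected center, `random.choices` raises `ValueError` because the weights sum to zero.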
6. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S4 the K-means clustering is iterated, specifically:
B1, assigning data points to the nearest cluster center: calculating the distance between each time series and the cluster centers selected in step S3, and assigning each time series to its nearest cluster center;
B2, updating the cluster centers: for each cluster, calculating the mean of all time series in the cluster and taking the mean as the new cluster center;
B3, repeating steps B1 and B2, that is, the time series assignment and the cluster center update, until a stopping condition is reached; the stopping condition is that the maximum number of iterations is reached or the cluster centers no longer change.
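Steps B1 to B3 can be sketched as follows, using Euclidean distance between equal-length series (an assumption, since the claim does not name the distance measure). The initial centers are passed in as if they came from step S3; the values are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(series, init_centers, max_iter=100):
    """Iterate assignment (B1) and center update (B2) until the centers
    stop changing or max_iter is reached (B3)."""
    centers = [list(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # B1: assign every series to its nearest cluster center
        labels = [min(range(len(centers)), key=lambda c: sq_dist(s, centers[c]))
                  for s in series]
        # B2: the new center of each cluster is the pointwise mean of its members
        new_centers = []
        for c, old in enumerate(centers):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        # B3: stop once the centers no longer change
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


series = [[1, 2, 3], [2, 3, 4], [10, 12, 14], [11, 13, 15]]  # made-up data
init_centers = [[1, 2, 3], [10, 12, 14]]  # hypothetical output of step S3
centers, labels = kmeans(series, init_centers)
print(centers, labels)
```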
7. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S5 the optimal ARIMA model order is selected for the cluster center of each class and is used as the optimal ARIMA model order for the time series of that class, specifically:
determining the order of the ARIMA model is the process of selecting an appropriate autoregressive order p, differencing order d and moving average order q; for the differencing order d, a stationarity test is performed on the cluster center, and if the test is not passed, d is increased until the differenced series passes the stationarity test; the orders p and q are determined using the autocorrelation function and the partial autocorrelation function: p is determined by the cut-off point of the partial autocorrelation function plot, and q is determined by the cut-off point of the autocorrelation function plot.
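The differencing step and the cut-off reading of the two plots can be sketched in plain Python. Several points are assumptions of the sketch: the stationarity test itself (for example an augmented Dickey-Fuller test) is not implemented, the partial autocorrelation is computed with the Durbin-Levinson recursion, and the cut-off is read as the number of leading lags outside the approximate 95% band of ±1.96/√N. The simulated AR(1) series merely stands in for a cluster center.

```python
import math
import random


def difference(series, d=1):
    """Apply first-order differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return list(series)


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    out, phi_prev = [1.0], []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def cutoff_lag(values, n_obs):
    """Count the leading lags (from lag 1) whose value lies outside the
    approximate 95% band; the plot 'cuts off' after that lag."""
    bound = 1.96 / math.sqrt(n_obs)
    lag = 0
    for k in range(1, len(values)):
        if abs(values[k]) <= bound:
            break
        lag = k
    return lag


# Simulated AR(1) series with coefficient 0.8, standing in for a cluster center.
rng = random.Random(0)
center = [0.0]
for _ in range(499):
    center.append(0.8 * center[-1] + rng.gauss(0.0, 1.0))

pacf_values = pacf(center, max_lag=10)
acf_values = acf(center, max_lag=10)
p = cutoff_lag(pacf_values, len(center))  # candidate autoregressive order
q = cutoff_lag(acf_values, len(center))   # candidate moving average order
print(p, q)
```

For an autoregressive series like this one the autocorrelation decays gradually instead of cutting off, so the value read for q is not meaningful here; in such cases the plots alone do not settle the orders.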
8. The method for predicting electric meter installation quantity based on cluster analysis according to claim 7, wherein in step S5, if the orders p and q cannot be determined from the autocorrelation and partial autocorrelation function plots, a subset selection algorithm is used to select appropriate p and q, specifically: the Bayesian information criterion (BIC) is used as the evaluation criterion, different combinations of p, d and q are tried, and the model with the smallest BIC value is selected as the best model.
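For reference, the Bayesian information criterion in its standard form is given below. The symbols are introduced here for illustration and do not appear in the claims: k is the number of estimated parameters, N the number of observations used for fitting, and L̂ the maximized likelihood of the candidate model. Exactly which parameters are counted in k (for example a constant term or the noise variance) depends on convention.

```latex
\mathrm{BIC}(p, d, q) = k \,\ln N - 2 \ln \hat{L},
\qquad
(p^{*}, d^{*}, q^{*}) = \arg\min_{(p,\, d,\, q)} \mathrm{BIC}(p, d, q)
```

The ln N factor penalizes each extra parameter more heavily as the series gets longer, so the criterion tends to favor low orders unless the additional terms clearly improve the likelihood.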
9. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, wherein in step S6 a corresponding ARIMA model is established for each time series using the ARIMA model order determined for it in step S5; the parameter values of the ARIMA model are estimated by maximum likelihood estimation so as to maximize the fit of the model to the observed data, thereby obtaining the predicted data for the last n months of each time series.
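The claims do not write out the model being fitted. In the usual backshift notation, an ARIMA(p, d, q) model and its maximum likelihood estimate can be written as follows; the symbols B, c, φ, θ, ε and σ² are standard notation introduced here, not taken from the source.

```latex
\Bigl(1 - \sum_{m=1}^{p} \phi_m B^{m}\Bigr)(1 - B)^{d}\, y_t
  = c + \Bigl(1 + \sum_{m=1}^{q} \theta_m B^{m}\Bigr)\varepsilon_t,
\qquad
\varepsilon_t \sim \mathcal{N}(0, \sigma^{2})

(\hat{c}, \hat{\phi}, \hat{\theta}, \hat{\sigma}^{2})
  = \arg\max_{c,\, \phi,\, \theta,\, \sigma^{2}}
    L\bigl(c, \phi, \theta, \sigma^{2} \mid y_1, \dots, y_T\bigr)
```

Here B y_t = y_{t-1}, and T is the length of the training part of the series. Under this reading every series in a class shares the orders (p, d, q) chosen for its cluster center, while the coefficients are estimated separately for each series.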
10. The method for predicting electric meter installation quantity based on cluster analysis according to claim 1, characterized in that in step S7 the average relative error error_j is used to evaluate the prediction error of the model, where y_act(i, j) is the actual electric meter installation quantity in the i-th month of the j-th time series in the prediction set, y_pred(i, j) is the predicted value for the i-th month to be predicted of the j-th time series obtained by the method for predicting electric meter installation quantity based on cluster analysis, n is the total number of predictions, and J is the total number of time series; error_j reflects the prediction effect on the j-th time series.
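Assuming the average relative error takes its common form, error_j = (1/n) · Σ_{i=1..n} |y_act(i, j) − y_pred(i, j)| / y_act(i, j) for j = 1, ..., J, it can be computed per series as in the sketch below. The exact form (for instance whether it is scaled to a percentage) is an assumption, and the function name and sample values are hypothetical.

```python
def average_relative_error(actual, predicted):
    """Mean of |actual - predicted| / actual over the n predicted months
    of one time series (assumed form of error_j)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("relative error is undefined when an actual value is 0")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


y_act = [100, 200, 400]   # made-up actual installations for n = 3 months
y_pred = [110, 180, 400]  # made-up predictions for the same months
error_j = average_relative_error(y_act, y_pred)
print(error_j)
```

Months with zero actual installations make the ratio undefined, so they need separate handling (for example exclusion from the average) before this measure is applied.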
CN202311425732.0A 2023-10-31 2023-10-31 Forecasting method of electric meter installation based on cluster analysis Pending CN117494877A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311425732.0A CN117494877A (en) 2023-10-31 2023-10-31 Forecasting method of electric meter installation based on cluster analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311425732.0A CN117494877A (en) 2023-10-31 2023-10-31 Forecasting method of electric meter installation based on cluster analysis

Publications (1)

Publication Number Publication Date
CN117494877A true CN117494877A (en) 2024-02-02

Family

ID=89668316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311425732.0A Pending CN117494877A (en) 2023-10-31 2023-10-31 Forecasting method of electric meter installation based on cluster analysis

Country Status (1)

Country Link
CN (1) CN117494877A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118505246A (en) * 2024-04-25 2024-08-16 北京春风药业有限公司 Traditional Chinese medicine production traceability method and system based on big data

Similar Documents

Publication Publication Date Title
CN110717535B (en) Automatic modeling method and system based on data analysis processing system
CN111967696B (en) Method, system and device for forecasting electric vehicle charging demand based on neural network
CN118628161B (en) Supply chain demand prediction method and system
US12271962B2 (en) Cross-bore risk assessment and risk management tool
CN118472981B (en) Operation strategy optimization method for energy storage EMS system
CN112734135A (en) Power load prediction method, intelligent terminal and computer readable storage medium
CN110414627A (en) A kind of training method and relevant device of model
CN112819208A (en) Spatial similarity geological disaster prediction method based on feature subset coupling model
CN115115119A (en) OA-GRU short-term power load prediction method based on grey correlation
CN117494877A (en) Forecasting method of electric meter installation based on cluster analysis
May et al. Multi-variate time-series for time constraint adherence prediction in complex job shops
CN114240102A (en) Line loss abnormal data identification method and device, electronic equipment and storage medium
Kim et al. Extracting baseline electricity usage using gradient tree boosting
CN106022959A (en) Peak clipping and valley filling-oriented electricity utilization behavior analysis method and system
CN106611381A (en) Algorithm for analyzing influence of material purchase to production scheduling of manufacturing shop based on cloud manufacturing
CN109784362A (en) A kind of DGA shortage of data value interpolating method based on iteration KNN and interpolation priority
CN116911414A (en) Electricity consumption prediction method, device, equipment and computer storage medium
CN116720662A (en) Distributed energy system applicability assessment method based on set pair analysis
CN117131382A (en) Typhoon "space-time-shape-quantity" similarity prediction method and device, electronic equipment
Gagin et al. Training size matters: Impact of training data size on electrical load forecasting
CN109543930B (en) Method and system for dispatching workers based on multi-level steady-state production rate of machines
CN119030150B (en) A multi-energy complementary industrial park microgrid autonomous control method and system
CN118820473B (en) A consumer protection hotspot data analysis system and method based on clustering algorithm
Tikhonov et al. Development of Forecasting Theory and Methods for Developing Forecasts in Electroenergetics
Safarudin et al. The Number of Nodes Effect to Predict the Electrical Consumption in Seven Distinct Countries

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination