CN118194074B

CN118194074B - Load curve clustering method based on improved rough C-means

Info

Publication number: CN118194074B
Application number: CN202410622100.1A
Authority: CN
Inventors: 张腾飞; 程奕凌; 饶玉凡; 马福民; 刘建; 陈舒; 于洋; 姚金明; 吕思
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2024-05-20
Filing date: 2024-05-20
Publication date: 2024-09-10
Anticipated expiration: 2044-05-20
Also published as: CN118194074A

Abstract

The invention belongs to the technical field of power system load classification and discloses a load curve clustering method based on improved rough C-means. First, the raw daily load data are normalized, and the target number of clusters, the initial cluster centers, the lower approximation weight, the upper approximation weight, and the distance judgment threshold are determined. Next, the distance from each load curve to each cluster center is calculated, and each load curve is assigned to the upper or lower approximation set of the corresponding cluster. Finally, taking into account both the distance between a load curve and the cluster center and the data distribution density in the curve's neighborhood, a hybrid imbalance metric is introduced into the iterative formula for the cluster centers to quantify how strongly the imbalanced spatial distribution of load curves within a cluster affects the cluster center iteration. The invention can effectively handle unevenly distributed load curves, improve the clustering result, and provide better technical support for demand response, load forecasting, and similar applications.

Description

A load curve clustering method based on improved rough C-means

Technical Field

The invention belongs to the technical field of power system load classification, and in particular relates to a load curve clustering method based on improved rough C-means.

Background Art

With the large-scale deployment of data acquisition devices such as smart meters and the development of information and communication technology, historical electricity consumption data on the user side have grown explosively. Mining the latent information in massive load data of low value density, and understanding the consumption characteristics of different user types and the relationships among them, is of great significance for helping grid dispatchers carry out demand response, load forecasting, and electricity pricing. Managing and controlling diverse, flexible loads is an inevitable trend in the development of new-type power systems.

Cluster analysis is an effective way to mine the latent information in massive load data. Its basic idea is to divide a large number of load curves into several clusters such that curves within a cluster are highly similar while curves in different clusters are highly dissimilar. Current research on load curve clustering mostly focuses on data dimensionality reduction, performance optimization of clustering algorithms, the optimal number of clusters, and the selection of initial cluster centers. However, the opening of the electricity market has given users more freedom in their behavior, so the distribution of load curves has become increasingly complex, with overlapping curve shapes and imbalanced distributions across clusters. For example, industrial users and administrative users both exhibit single-peak loads, yet the two groups differ greatly in the number of users and in adjustable capacity. Existing load curve clustering methods produce a "uniform effect" on such data, in which samples from a large class are wrongly assigned to a small class. Therefore, to achieve fine-grained perception of flexible load resources, the analysis of load curves in boundary regions must be strengthened to reduce the likelihood of incorrect clustering.

The rough C-means clustering algorithm integrates rough set theory into the C-means clustering algorithm and treats each cluster as a rough set: every data object either belongs definitely to the lower approximation set of one cluster, or belongs simultaneously to the upper approximation sets of several clusters. Existing rough clustering algorithms and their derivatives apply different weighting coefficients to different boundary regions, which improves the clustering accuracy of load curves in the electricity market to some extent. However, they still assign the same weight to all data objects within the same boundary region and ignore the imbalanced distributions inside that region, so load curves are still clustered incorrectly and accurate technical support for demand response, load forecasting, and similar applications cannot be provided.

Summary of the Invention

To solve the above technical problem, the invention provides a load curve clustering method based on improved rough C-means. The method uses a hybrid imbalance metric that jointly considers the distance between each load curve in a boundary region and the cluster center and the data distribution density in the curve's neighborhood. It can effectively quantify how strongly the imbalanced spatial distribution of load curves within a cluster affects the cluster center iteration, can substantially improve the clustering accuracy for load curves in boundary regions, and opens a new direction for load curve clustering methods.

The load curve clustering method based on improved rough C-means according to the invention comprises the following steps:

Step 1: Collect historical daily load data, obtain N raw daily load curves for the same day, and normalize the raw daily load curves by the maximum-value normalization method to obtain the N load clustering object curves to be clustered;

Step 2: Determine the target number of clusters c, the c initial cluster centers, the lower approximation weight W_low, the upper approximation weight W_up, and the distance judgment threshold Δ;

Step 3: For each load clustering object curve X_k, calculate its Euclidean distance to every cluster center, and assign X_k to the upper approximation set of the cluster U_i corresponding to the nearest cluster center C_i;

Step 4: If there is another cluster center C_j such that the difference between the distance from X_k to C_j and the distance from X_k to C_i is less than the threshold Δ, also assign X_k to the upper approximation set of the cluster U_j corresponding to C_j; otherwise, assign X_k to the lower approximation set of cluster U_i;

Step 5: Update the cluster centers according to the following iterative formula:

(equation not reproduced in this text)

where h_ik is the hybrid imbalance metric, and the difference between the upper approximation set and the lower approximation set is the boundary set of cluster U_i;

Step 6: Repeat steps 2 to 5 until the clustering converges.
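The assignment rule of steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `assign_rough_sets`, the array layout (one curve per row), and the reading of "difference" as the distance to C_j minus the distance to the nearest center C_i are assumptions made here.

```python
import numpy as np

def assign_rough_sets(X, centers, delta):
    """Steps 3-4 (sketch): build lower and upper approximation sets.

    X       : (N, d) array, one normalized load curve per row
    centers : (c, d) array of current cluster centers
    delta   : distance judgment threshold
    Returns two lists of index lists, `lower` and `upper`, one entry per cluster.
    """
    # dist[k, i] = Euclidean distance from curve k to center i
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    c = centers.shape[0]
    lower = [[] for _ in range(c)]
    upper = [[] for _ in range(c)]
    for k, row in enumerate(dist):
        i = int(np.argmin(row))          # nearest center
        upper[i].append(k)               # step 3
        # other centers almost as close as the nearest one (assumed reading)
        close = [j for j in range(c) if j != i and row[j] - row[i] < delta]
        if close:
            for j in close:              # step 4: boundary curve
                upper[j].append(k)
        else:
            lower[i].append(k)           # step 4: curve certainly in cluster i
    return lower, upper
```

A curve that is nearly equidistant from two centers ends up in both upper approximation sets and in no lower approximation set, which is what makes it a boundary object.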

Further, in step 1, the raw daily load curves are sampled at 1-hour intervals, so that each of the N raw daily load curves for the same day is a sequence of hourly values, with one value of the k-th raw daily load curve for each hour i.

Further, in step 1, the raw daily load data are normalized by the maximum-value normalization method, as follows:

(equation not reproduced in this text)

This yields the N normalized load clustering object curves, where X_k denotes the k-th load clustering object curve and X_ki denotes the value of the k-th load clustering object curve at the i-th hour.
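As an illustration of maximum-value normalization, the sketch below divides each daily curve by its own largest hourly value, so every normalized curve peaks at 1. Taking the maximum per curve (rather than over the whole data set) is an assumption, since the formula itself is not reproduced here; the function name `max_normalize` is hypothetical.

```python
import numpy as np

def max_normalize(raw):
    """Divide each raw daily load curve (one row) by its own maximum value."""
    raw = np.asarray(raw, dtype=float)
    peak = raw.max(axis=1, keepdims=True)   # per-curve maximum (assumed)
    if np.any(peak <= 0):
        raise ValueError("each curve needs a positive maximum")
    return raw / peak
```

Normalizing this way removes differences in absolute load level, so that curves are compared by shape.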

Further, in step 3, the Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i = (C_i1, C_i2, …, C_i24) is calculated as follows:

$$d_{ik} = \sqrt{\sum_{t=1}^{24} \left(X_{kt} - C_{it}\right)^{2}}$$

where d_ik is the Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i.

Further, in step 5, the hybrid imbalance metric h_ik is calculated as follows:

(equation not reproduced in this text)

The formula involves four quantities: the number of load curves within a neighborhood of given radius around X_k; the number of load curves in the upper approximation set of the cluster U_i corresponding to cluster center C_i; a distance measure for the k-th load clustering object curve X_k when it participates in the iteration of cluster center C_i, such that the farther X_k is from C_i, the less the distance d_ik influences the update of C_i, and the closer it is, the greater the influence; and a local density measure for X_k when it participates in the iteration of C_i, such that the lower the density of load curves in the neighborhood of X_k, the less it influences the update of C_i, and the higher the density, the greater the influence.
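The qualitative behavior described above can be sketched in Python. Because the formula for h_ik and the center update formula are not reproduced in this text, every functional form below is an assumption chosen only to match the description: the distance term 1 / (1 + d_ik), the density term (neighbors within radius `eps`, counted inside the upper approximation set, divided by the size of that set), their product as h_ik, and an h-weighted version of the usual rough C-means update. The names `hybrid_weight` and `update_center` are hypothetical.

```python
import numpy as np

def hybrid_weight(X, members, center, eps):
    """ASSUMED form of h_ik for the curves in one cluster's upper approximation."""
    pts = X[members]
    d = np.linalg.norm(pts - center, axis=1)                  # d_ik
    pair = np.linalg.norm(pts[:, None] - pts[None], axis=2)   # pairwise distances
    n_near = (pair <= eps).sum(axis=1)        # neighbors within eps (incl. itself)
    dist_term = 1.0 / (1.0 + d)               # ASSUMED: decreases with distance
    dens_term = n_near / len(members)         # ASSUMED: share of set in neighborhood
    return dist_term * dens_term              # ASSUMED: combined as a product

def update_center(X, lower, upper, center, w_low, w_up, eps):
    """ASSUMED h-weighted rough C-means update for a single cluster."""
    if not upper:
        return np.asarray(center, dtype=float)   # empty cluster: keep old center
    h = dict(zip(upper, hybrid_weight(X, upper, center, eps)))
    lower_set = set(lower)
    boundary = [k for k in upper if k not in lower_set]

    def wmean(idx):
        w = np.array([h[k] for k in idx])
        return (w[:, None] * X[idx]).sum(axis=0) / w.sum()

    if lower and boundary:
        return w_low * wmean(lower) + w_up * wmean(boundary)
    return wmean(lower) if lower else wmean(boundary)
```

With these choices, a curve that is close to the center and surrounded by many neighbors receives a larger weight than an isolated curve far from the center, which is the ordering the description requires; other decreasing distance terms or additive combinations would satisfy the same description.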

The beneficial effects of the invention are as follows. The method combines a distance measure with a neighborhood density measure and fully accounts for the overlapping shapes and imbalanced distribution of load curves across clusters. It assigns a different weighting coefficient to each load curve in the cluster center iteration formula, thereby quantifying how strongly the imbalanced spatial distribution of load curves within a cluster affects the cluster center iteration. The closer a load curve is to the center and the higher the data distribution density in its neighborhood, the larger its weighting coefficient and the greater its contribution to the center iteration; conversely, the farther a load curve is from the center and the lower the density in its neighborhood, the smaller its weighting coefficient and its contribution. By strengthening the analysis of load curves in overlapping boundary regions, the method effectively handles imbalanced load curve distributions, improves the clustering accuracy for load curves in those regions, and provides better technical support for demand response, load forecasting, and similar applications.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method of the invention;

FIG. 2 is a schematic diagram of the load clustering results obtained with the method of the invention;

FIG. 3 is a schematic diagram of the cluster centers obtained with the method of the invention;

FIG. 4 is a schematic diagram of the load clustering results obtained with the conventional improved C-means clustering method.

Detailed Description

To make the content of the invention easier to understand clearly, the invention is described in further detail below on the basis of specific embodiments and with reference to the accompanying drawings.

The load curve clustering method based on improved rough C-means according to the invention, as shown in FIG. 1, comprises the following steps:

In step 1, the raw daily load curves are sampled at 1-hour intervals, so that each of the N raw daily load curves for the same day is a sequence of hourly values, with one value of the k-th raw daily load curve for each hour i;

In step 1, the raw daily load data are normalized by the maximum-value normalization method, as follows:

(equation not reproduced in this text)

This yields the N normalized load clustering object curves, where X_k denotes the k-th load clustering object curve and X_ki denotes the value of the k-th load clustering object curve at the i-th hour;

The Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i is calculated as follows:

$$d_{ik} = \sqrt{\sum_{t=1}^{24} \left(X_{kt} - C_{it}\right)^{2}}$$

where d_ik is the Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i;

(equation not reproduced in this text)

The hybrid imbalance metric h_ik is calculated as follows:

(equation not reproduced in this text)

The formula involves four quantities: the number of load curves within a neighborhood of given radius around X_k; the number of load curves in the upper approximation set of the cluster U_i corresponding to cluster center C_i; a distance measure for the k-th load clustering object curve X_k when it participates in the iteration of cluster center C_i, such that the farther X_k is from C_i, the less the distance d_ik influences the update of C_i, and the closer it is, the greater the influence; and a local density measure for X_k when it participates in the iteration of C_i, such that the lower the density of load curves in the neighborhood of X_k, the less it influences the update of C_i, and the higher the density, the greater the influence.
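For orientation, the classical (unweighted) rough C-means center update, on which the method builds, is commonly written as below. This is the standard textbook form and not the patent's own formula, which is not reproduced in this text; the underline and overline notation for the lower and upper approximation sets of cluster U_i is an assumption made here. According to the description, the patent's variant additionally weights each curve X_k by its hybrid imbalance metric h_ik.

```latex
C_i =
\begin{cases}
W_{low}\,\dfrac{\sum_{X_k \in \underline{U_i}} X_k}{\lvert \underline{U_i} \rvert}
 + W_{up}\,\dfrac{\sum_{X_k \in \overline{U_i}\setminus\underline{U_i}} X_k}
                 {\lvert \overline{U_i}\setminus\underline{U_i} \rvert},
 & \underline{U_i} \neq \emptyset \ \text{and}\ \overline{U_i}\setminus\underline{U_i} \neq \emptyset, \\[2ex]
\dfrac{\sum_{X_k \in \overline{U_i}\setminus\underline{U_i}} X_k}
      {\lvert \overline{U_i}\setminus\underline{U_i} \rvert},
 & \underline{U_i} = \emptyset, \\[2ex]
\dfrac{\sum_{X_k \in \underline{U_i}} X_k}{\lvert \underline{U_i} \rvert},
 & \text{otherwise.}
\end{cases}
```

In this form every curve inside the boundary set contributes equally, which is the limitation the background section attributes to existing rough clustering algorithms.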

The daily load curves of 1910 users measured on a working day in August 2015 in a certain region were taken as the study object: 200 industrial users, 300 administrative users, 400 commercial users, 350 residential users, 360 catering users, and 300 street lighting and high-energy-consumption enterprise users. The daily load was sampled at 1 h intervals, giving 24 sampling points per day. The target number of clusters c was set to 6, the lower approximation weight W_low to 0.9, the upper approximation weight W_up to 0.1, the distance judgment threshold Δ to 0.1, and the neighborhood range to 0.2.

The result of the load curve clustering method based on improved rough C-means is shown in FIG. 2. The user load curves are divided into six categories: category 1 contains 202 load curves, category 2 contains 304, category 3 contains 401, category 4 contains 350, category 5 contains 363, and category 6 contains 300. Categories 1 and 2 are both single-peak loads, but electricity use in category 1 is concentrated in the period 06:00-21:00 whereas in category 2 it is concentrated in the period 08:00-19:00, and the average load level of category 1 is higher than that of category 2. Category 1 consists mainly of industrial users, and category 2 mainly of administrative users such as schools, hospitals, and government bodies. Categories 3 and 4 are both double-peak loads. The two peaks of category 3 occur at around 12:00 and 18:00, and the category consists mainly of commercial users such as hotels and supermarkets. The two peaks of category 4 occur at around 07:00 and 21:00, and the category consists mainly of residential users such as apartments. Category 5 is a triple-peak load consisting mainly of catering users, with peak consumption in the morning, at noon, and in the evening, and reduced consumption during the morning working period, the midday rest period, and the afternoon working period. Category 6 is a peak-avoiding load consisting mainly of public street lighting and high-energy-consumption enterprise users. The cluster centers of the six load categories are shown in FIG. 3.

The proposed method is compared with a conventional improved C-means load clustering method, which only improves algorithm performance and does not consider the effect of the imbalanced distribution of load curves. Its clustering result is shown in FIG. 4. The user load curves are again divided into six categories: category 1 contains 226 load curves, category 2 contains 276, category 3 contains 373, category 4 contains 352, category 5 contains 383, and category 6 contains 300. Under the conventional improved C-means method, a considerable share of the category 2 loads is wrongly assigned to category 1, and a considerable share of the category 3 loads is wrongly assigned to category 5. This analysis shows that the proposed load curve clustering method based on improved rough C-means clusters fewer load curves incorrectly and achieves a better classification result.
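The reported cluster sizes can be compared with the known numbers of users per type with a few lines of arithmetic. The sketch below is only a coarse proxy: it measures how far each cluster size deviates from the true class size and does not count individual misassigned curves. The pairing of categories 1 to 6 with industrial, administrative, commercial, residential, catering, and street lighting/high-energy users follows the description; the helper name `total_deviation` is hypothetical.

```python
# True class sizes from the data set description (categories 1..6)
true_sizes = [200, 300, 400, 350, 360, 300]
# Cluster sizes reported for the proposed method and for the conventional method
proposed = [202, 304, 401, 350, 363, 300]
baseline = [226, 276, 373, 352, 383, 300]

def total_deviation(found, truth):
    """Sum of absolute differences between cluster sizes and true class sizes."""
    return sum(abs(f - t) for f, t in zip(found, truth))

print(total_deviation(proposed, true_sizes))  # 10
print(total_deviation(baseline, true_sizes))  # 102
```

By this measure the proposed method's cluster sizes differ from the true class sizes by 10 curves in total, against 102 for the conventional method. Note that the sizes reported for the proposed method sum to 1920 rather than 1910; the text does not explain the difference.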

To further verify that the proposed hybrid imbalance metric, which considers both distance and density, improves clustering performance, the C-means algorithm and the rough C-means algorithm are selected for comparison, and the clustering results are evaluated in the following four respects:

1) DBI index: this index reflects the separation between clusters and the compactness within clusters; the smaller its value, the higher the clustering quality. It is calculated as follows:

(equation not reproduced in this text)

In the formula, the intra-cluster term is the average distance from all samples of class i to its cluster center, and the inter-cluster term is the distance between the cluster centers of class i and class j; the smaller the index, the better the clustering;

2) SSE index: this index reflects the intra-cluster cohesion of all clusters; the smaller its value, the higher the clustering quality. It is calculated as follows:

(equation not reproduced in this text)

3) SC index: this index reflects how well separated the clusters are; the closer its value is to 1, the farther apart the clusters are:

(equation not reproduced in this text)

where a(X_k) is the average distance from X_k to the other samples in the same cluster, and b(X_k) is the minimum average distance from X_k to the samples of any other cluster;

4) Clustering accuracy AC: this index is the percentage of data objects in the data set that are clustered correctly when compared against the decision attribute values of the original data set. It is calculated as follows:

(equation not reproduced in this text)
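The four indices can be sketched in Python using their standard textbook definitions (Davies-Bouldin index, sum of squared errors, silhouette coefficient, and fraction of correctly labeled objects). The patent's own formulas are not reproduced in this text, so these definitions are assumptions, as are the simplifications used here: each curve carries a single hard label, every cluster is non-empty, there are at least two clusters, and cluster labels have already been matched to the true class labels.

```python
import numpy as np

def sse(X, labels, centers):
    """Sum of squared distances from each sample to its own cluster center."""
    return float(((X - centers[labels]) ** 2).sum())

def dbi(X, labels, centers):
    """Davies-Bouldin index: mean over clusters of the worst (S_i + S_j) / M_ij."""
    c = len(centers)
    # s[i]: average distance of the samples of cluster i to its center
    s = np.array([np.linalg.norm(X[labels == i] - centers[i], axis=1).mean()
                  for i in range(c)])
    # m[i, j]: distance between the centers of clusters i and j
    m = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    worst = [max((s[i] + s[j]) / m[i, j] for j in range(c) if j != i)
             for i in range(c)]
    return float(np.mean(worst))

def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all samples."""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    ids = set(labels.tolist())
    scores = []
    for k in range(len(X)):
        same = labels == labels[k]
        same[k] = False                      # exclude the sample itself
        if not same.any():
            scores.append(0.0)               # usual convention for singletons
            continue
        a = d[k, same].mean()
        b = min(d[k, labels == j].mean() for j in ids if j != labels[k])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

def accuracy(labels, truth):
    """Fraction of samples whose cluster label equals the true class label."""
    return float(np.mean(np.asarray(labels) == np.asarray(truth)))
```

Multiplying the value returned by `accuracy` by 100 gives the percentage form used for AC. For rough clustering results, a hard label per curve has to be derived first, for example from the nearest center; that choice is not specified in the text.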

Table 1 Comparison of clustering results

Table 1 compares the clustering results. In terms of the clustering indices, the proposed improved clustering method has the smallest DBI value, indicating that its clusters are more compact internally and more separated from one another; it has the smallest SSE value, indicating the best cohesion of the individual clusters; it has the largest SC value, indicating that its clusters are better separated; and it has the highest clustering accuracy, indicating that its clustering is more accurate.

The above is only a preferred embodiment of the invention and does not further limit the invention. All equivalent changes made using the contents of the specification and drawings of the invention fall within the scope of protection of the invention.

Claims

1. A load curve clustering method based on improved rough C-means, characterized by comprising the following steps:

Step 1: acquiring historical daily load data, obtaining N raw daily load curves for the same day, and normalizing the raw daily load curves by the maximum-value normalization method to obtain the N load clustering object curves to be clustered;

Step 2: determining the target number of clusters c, the c initial cluster centers, the lower approximation weight W_low, the upper approximation weight W_up, and the distance judgment threshold Δ;

Step 3: for each load clustering object curve X_k, calculating its Euclidean distance to every cluster center, and assigning X_k to the upper approximation set of the cluster U_i corresponding to the nearest cluster center C_i;

Step 4: if there is another cluster center C_j such that the difference between the distance from X_k to C_j and the distance from X_k to C_i is less than the threshold Δ, also assigning X_k to the upper approximation set of the cluster U_j corresponding to C_j; otherwise, assigning X_k to the lower approximation set of cluster U_i;

Step 5: updating the cluster centers according to the following iterative formula:

(equation not reproduced in this text)

where h_ik is the hybrid imbalance metric, and the difference between the upper approximation set and the lower approximation set is the boundary set of cluster U_i;

Step 6: repeating steps 2 to 5 until the clustering converges.

2. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 1 the raw daily load curves are sampled at 1-hour intervals, so that each of the N raw daily load curves for the same day is a sequence of hourly values, with one value of the k-th raw daily load curve for each hour i.

3. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 1 the raw daily load data are normalized by the maximum-value normalization method as follows:

(equation not reproduced in this text)

thereby obtaining the N normalized load clustering object curves, where X_k denotes the k-th load clustering object curve and X_ki denotes the value of the k-th load clustering object curve at the i-th hour.

4. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 3 the Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i = (C_i1, C_i2, …, C_i24) is calculated as follows:

$$d_{ik} = \sqrt{\sum_{t=1}^{24} \left(X_{kt} - C_{it}\right)^{2}}$$

where d_ik is the Euclidean distance from the k-th load clustering object curve X_k to the initial cluster center C_i.

5. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 5 the hybrid imbalance metric h_ik is calculated as follows:

(equation not reproduced in this text)

wherein the formula involves four quantities: the number of load curves within a neighborhood of given radius around X_k; the number of load curves in the upper approximation set of the cluster U_i corresponding to cluster center C_i; a distance measure for the k-th load clustering object curve X_k when it participates in the iteration of cluster center C_i, such that the farther X_k is from C_i, the less the distance d_ik influences the update of C_i, and the closer it is, the greater the influence; and a local density measure for X_k when it participates in the iteration of C_i, such that the lower the density of load curves in the neighborhood of X_k, the less it influences the update of C_i, and the higher the density, the greater the influence.