
CN118194074B - Load curve clustering method based on improved rough C-means - Google Patents


Info

Publication number
CN118194074B
Authority
CN
China
Prior art keywords
load
clustering
cluster
curve
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410622100.1A
Other languages
Chinese (zh)
Other versions
CN118194074A (en
Inventor
张腾飞
程奕凌
饶玉凡
马福民
刘建
陈舒
于洋
姚金明
吕思
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202410622100.1A
Publication of CN118194074A
Application granted
Publication of CN118194074B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00 Circuit arrangements for AC mains or AC distribution networks
    • H02J3/003 Load forecast, e.g. methods or systems for forecasting future load demand
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of power system load classification and discloses a load curve clustering method based on improved rough C-means. First, the original daily load data are normalized, and the target number of clusters, the initial cluster centers, the lower approximation weight, the upper approximation weight, and the distance judgment threshold are determined. Next, the distance from each load curve to each cluster center is calculated, and each load curve is assigned to the upper or lower approximation set of the corresponding cluster. Finally, taking into account both the distance between a load curve and the cluster center and the data distribution density in its neighborhood, a mixed imbalance metric is introduced into the iterative formula for the cluster centers to quantify how strongly the imbalanced spatial distribution of load curves within a cluster affects the center iteration. The invention effectively handles unevenly distributed load curves, improves the clustering result, and provides better technical support for demand response, load forecasting, and similar tasks.

Description

A load curve clustering method based on improved rough C-means

Technical Field

The invention belongs to the technical field of power system load classification, and in particular relates to a load curve clustering method based on improved rough C-means.

Background Art

With the large-scale deployment of data acquisition devices such as smart meters and the development of information and communication technology, historical electricity consumption data on the user side has grown explosively. Mining latent information from massive load data of low value density, and understanding the consumption characteristics of different user types and the relationships between them, is of great significance for assisting grid dispatchers with demand response, load forecasting, and electricity pricing. Diversified and flexible load management and control is an inevitable trend in the development of new power systems.

Cluster analysis is an effective way to mine the latent information in massive load data. Its basic idea is to divide a large number of load curves into several clusters such that curves within a cluster are highly similar and curves in different clusters are highly dissimilar. Current research on load curve clustering mostly focuses on data dimensionality reduction, optimization of clustering algorithm performance, the optimal number of clusters, and the selection of initial cluster centers. However, the opening of the electricity market has made user behavior less constrained, so the distribution of load curves has become increasingly complex, with overlapping curve shapes between clusters and unbalanced cluster sizes. For example, industrial users and administrative users both have single-peak load shapes, but the two groups differ greatly in number of users and in adjustable capacity. Existing load curve clustering methods produce a "uniform effect" on such data: samples from a large class are wrongly assigned to a small class. Therefore, to achieve fine-grained awareness of flexible load resources, the analysis of load curves in boundary regions must be strengthened to reduce the likelihood of incorrect clustering.

The rough C-means clustering algorithm integrates rough set theory into the C-means clustering algorithm. Each cluster is treated as a rough set, and each data object either belongs definitely to the lower approximation set of one cluster or belongs to the upper approximation sets of several clusters at the same time. Existing rough clustering algorithms and their derivatives apply different weighting coefficients to different boundary regions, which improves the clustering accuracy of load curves in the electricity market to some extent. However, they still give the same weight to all data objects within one boundary region and ignore the various imbalanced distributions inside that region. This still leads to load curve clustering errors and prevents accurate technical support for demand response, load forecasting, and similar tasks.

Summary of the Invention

To solve the above technical problem, the invention provides a load curve clustering method based on improved rough C-means. It uses a mixed imbalance metric that jointly considers the distance between each load curve in a boundary region and the cluster center and the data distribution density in the curve's neighborhood. The metric quantifies how strongly the imbalanced spatial distribution of load curves within a cluster affects the iteration of the cluster center, which can substantially improve the clustering accuracy of load curves in boundary regions and opens a new direction for load curve clustering methods.

The load curve clustering method based on improved rough C-means according to the invention comprises the following steps:

Step 1: Collect historical daily load data, obtain the N original daily load curves of the same day, and normalize them with the maximum-value normalization method to obtain the N load clustering object curves $X_k$, $k = 1, 2, \dots, N$, that are to be clustered;

Step 2: Determine the target number of clusters $c$, the $c$ initial cluster centers $C_i$, $i = 1, 2, \dots, c$, the lower approximation weight $W_{low}$, the upper approximation weight $W_{up}$, and the distance judgment threshold $\Delta$;

Step 3: For each load clustering object curve $X_k$, calculate its Euclidean distance to every cluster center, and assign $X_k$ to the upper approximation set $\overline{U_i}$ of the cluster $U_i$ corresponding to the nearest cluster center $C_i$;

Step 4: If there is another cluster center $C_j$ such that the difference between the distance from $X_k$ to $C_j$ and the distance from $X_k$ to $C_i$ is less than the threshold $\Delta$, that is, $d_{jk} - d_{ik} < \Delta$, also assign $X_k$ to the upper approximation set $\overline{U_j}$ of the cluster $U_j$ corresponding to $C_j$; otherwise, assign $X_k$ to the lower approximation set $\underline{U_i}$ of cluster $U_i$;

Step 5: Update the cluster centers according to the iterative formula (the formula appears as an image in the original publication and is not reproduced in this text),

where $h_{ik}$ is the mixed imbalance metric, and the difference between the upper approximation set and the lower approximation set, $\overline{U_i} \setminus \underline{U_i}$, is the boundary set of cluster $U_i$;

Step 6: Repeat steps 2 to 5 until the clustering converges.
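The assignment rule of steps 3 and 4 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `euclidean` and `assign_rough` are hypothetical, curves and centers are plain lists of numbers, and ties for the nearest center are broken by taking the first one, which the text does not specify.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_rough(curves, centers, delta):
    """Assign each curve to lower/upper approximation sets (steps 3 and 4).

    Returns (lower, upper): two lists of index sets, one set per cluster.
    """
    c = len(centers)
    lower = [set() for _ in range(c)]
    upper = [set() for _ in range(c)]
    for k, curve in enumerate(curves):
        dists = [euclidean(curve, center) for center in centers]
        i = min(range(c), key=dists.__getitem__)  # nearest center (first on ties)
        upper[i].add(k)  # step 3: always in the upper approximation of U_i
        # Step 4: other centers whose distance exceeds the minimum by less than delta.
        close = [j for j in range(c) if j != i and dists[j] - dists[i] < delta]
        if close:
            for j in close:
                upper[j].add(k)  # boundary curve shared between clusters
        else:
            lower[i].add(k)  # unambiguous curve
    return lower, upper


curves = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
centers = [[0.0, 0.0], [1.0, 1.0]]
lower, upper = assign_rough(curves, centers, delta=0.1)
print(lower, upper)
```

In this toy input the third curve is equidistant from both centers, so it lands in both upper approximation sets and in neither lower approximation set.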

Further, in step 1, the sampling interval of the original daily load curves is 1 hour, so each curve has 24 sampling points. The N original daily load curves of the same day are $\{x_1, x_2, \dots, x_N\}$, where $x_k = (x_{k1}, x_{k2}, \dots, x_{k24})$ denotes the k-th original daily load curve and $x_{ki}$ denotes the value of the k-th original daily load curve at the i-th hour.

Further, in step 1, the original daily load data are normalized with the maximum-value normalization method, as follows:

$$X_{ki} = \frac{x_{ki}}{\max\left(x_{k1}, x_{k2}, \dots, x_{k24}\right)},$$

which gives the N normalized load clustering object curves $\{X_1, X_2, \dots, X_N\}$, where $X_k = (X_{k1}, X_{k2}, \dots, X_{k24})$ denotes the k-th load clustering object curve and $X_{ki}$ denotes the value of the k-th load clustering object curve at the i-th hour.
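As a small illustration of maximum-value normalization, the sketch below divides each curve by its own peak so that every normalized curve has a maximum of 1. The function name `max_normalize` is hypothetical, and rejecting curves whose peak is not positive is an assumption added here for safety, not something stated in the text.

```python
def max_normalize(raw_curves):
    """Divide every value of each curve by that curve's own maximum."""
    normalized = []
    for curve in raw_curves:
        peak = max(curve)
        if peak <= 0:
            # Assumption: an all-zero or negative curve cannot be normalized this way.
            raise ValueError("curve maximum must be positive")
        normalized.append([value / peak for value in curve])
    return normalized


print(max_normalize([[2.0, 4.0, 1.0]]))  # [[0.5, 1.0, 0.25]]
```

Normalizing per curve removes differences in absolute load level, so the later distance computation compares curve shapes rather than magnitudes.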

Further, in step 3, the Euclidean distance from the k-th load clustering object curve $X_k$ to the initial cluster center $C_i = (C_{i1}, C_{i2}, \dots, C_{i24})$ is calculated as follows:

$$d_{ik} = \sqrt{\sum_{j=1}^{24} \left(X_{kj} - C_{ij}\right)^2},$$

where $d_{ik}$ is the Euclidean distance from the k-th load clustering object curve $X_k$ to the initial cluster center $C_i$.
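The distance $d_{ik}$ can be computed for all curves and centers at once. The sketch below is only an illustration: the name `distance_matrix` is hypothetical, and the result is indexed as `d[i][k]` to match the subscript order of $d_{ik}$.

```python
import math


def distance_matrix(curves, centers):
    """Return d where d[i][k] is the Euclidean distance from curve k to center i."""
    return [
        [math.sqrt(sum((x - c) ** 2 for x, c in zip(curve, center))) for curve in curves]
        for center in centers
    ]


d = distance_matrix([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])
print(d)  # [[0.0, 5.0]]
```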

Further, in step 5, the mixed imbalance metric $h_{ik}$ is calculated by a formula that appears as an image in the original publication and is not reproduced in this text. Its terms are described as follows:

the number of load curves within a neighborhood of given radius around $X_k$;

the number of load curves in the upper approximation set of the cluster $U_i$ corresponding to cluster center $C_i$;

a distance measure for the k-th load clustering object curve $X_k$ when it participates in the iteration of cluster center $C_i$: the larger the distance $d_{ik}$ from $X_k$ to $C_i$, the smaller its influence on the update of $C_i$, and the smaller the distance, the larger the influence;

a local density measure for the k-th load clustering object curve $X_k$ when it participates in the iteration of cluster center $C_i$: the lower the density of load curves in the neighborhood of $X_k$, the smaller its influence on the update of $C_i$, and the higher the density, the larger the influence.
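The neighborhood count that feeds the local density measure can be sketched as below. This covers only the counting step, not the full metric $h_{ik}$, whose formula is not available in this text. The name `neighbor_counts` is hypothetical, and two details are assumptions: the curve itself is excluded from its own count, and the neighborhood boundary is inclusive.

```python
import math


def neighbor_counts(curves, radius):
    """For each curve, count the other curves within `radius` (Euclidean distance)."""
    counts = []
    for k, xk in enumerate(curves):
        count = sum(
            1
            for m, xm in enumerate(curves)
            if m != k and math.dist(xk, xm) <= radius  # assumption: inclusive bound
        )
        counts.append(count)
    return counts


print(neighbor_counts([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]], radius=0.2))  # [1, 1, 0]
```

The pairwise loop is quadratic in the number of curves, which is acceptable for a data set of a few thousand curves but would call for a spatial index on much larger ones.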

The beneficial effects of the invention are as follows. The method combines a distance measure with a neighborhood density measure and fully accounts for overlapping load curve shapes between clusters and for unbalanced distributions. Each load curve receives its own weighting coefficient in the cluster center iteration formula, which quantifies how strongly the imbalanced spatial distribution of load curves within a cluster affects the iteration of the cluster center. The closer a load curve is to the center and the higher the data density in its neighborhood, the larger its weighting coefficient and the greater its contribution to the center iteration. Conversely, the farther a load curve is from the center and the lower the data density in its neighborhood, the smaller its weighting coefficient and the smaller its contribution. By strengthening the analysis of load curves in overlapping boundary regions, the method effectively handles unevenly distributed load curves, improves the clustering accuracy of load curves in those regions, and provides better technical support for demand response, load forecasting, and similar tasks.

Brief Description of the Drawings

FIG. 1 is a flow chart of the method of the invention;

FIG. 2 is a schematic diagram of the load clustering results obtained with the method of the invention;

FIG. 3 is a schematic diagram of the cluster centers obtained with the method of the invention;

FIG. 4 is a schematic diagram of the load clustering results obtained with the traditional improved C-means clustering method.

Detailed Description

To make the content of the invention easier to understand, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

As shown in FIG. 1, the load curve clustering method based on improved rough C-means according to the invention comprises the following steps:

Step 1: Collect historical daily load data, obtain the N original daily load curves of the same day, and normalize them with the maximum-value normalization method to obtain the N load clustering object curves $X_k$, $k = 1, 2, \dots, N$, that are to be clustered;

The sampling interval of the original daily load curves in step 1 is 1 hour, so the N original daily load curves of the same day are $\{x_1, x_2, \dots, x_N\}$, where $x_k = (x_{k1}, x_{k2}, \dots, x_{k24})$ denotes the k-th original daily load curve and $x_{ki}$ denotes the value of the k-th original daily load curve at the i-th hour;

In step 1, the original daily load data are normalized with the maximum-value normalization method, as follows:

$$X_{ki} = \frac{x_{ki}}{\max\left(x_{k1}, x_{k2}, \dots, x_{k24}\right)},$$

which gives the N normalized load clustering object curves $\{X_1, X_2, \dots, X_N\}$, where $X_k = (X_{k1}, X_{k2}, \dots, X_{k24})$ denotes the k-th load clustering object curve and $X_{ki}$ denotes the value of the k-th load clustering object curve at the i-th hour;

Step 2: Determine the target number of clusters $c$, the $c$ initial cluster centers $C_i$, $i = 1, 2, \dots, c$, the lower approximation weight $W_{low}$, the upper approximation weight $W_{up}$, and the distance judgment threshold $\Delta$;

Step 3: For each load clustering object curve $X_k$, calculate its Euclidean distance to every cluster center, and assign $X_k$ to the upper approximation set $\overline{U_i}$ of the cluster $U_i$ corresponding to the nearest cluster center $C_i$;

The Euclidean distance from the k-th load clustering object curve $X_k$ to the initial cluster center $C_i = (C_{i1}, C_{i2}, \dots, C_{i24})$ is calculated as follows:

$$d_{ik} = \sqrt{\sum_{j=1}^{24} \left(X_{kj} - C_{ij}\right)^2},$$

where $d_{ik}$ is the Euclidean distance from the k-th load clustering object curve $X_k$ to the initial cluster center $C_i$;

Step 4: If there is another cluster center $C_j$ such that the difference between the distance from $X_k$ to $C_j$ and the distance from $X_k$ to $C_i$ is less than the threshold $\Delta$, that is, $d_{jk} - d_{ik} < \Delta$, also assign $X_k$ to the upper approximation set $\overline{U_j}$ of the cluster $U_j$ corresponding to $C_j$; otherwise, assign $X_k$ to the lower approximation set $\underline{U_i}$ of cluster $U_i$;

Step 5: Update the cluster centers according to the iterative formula (the formula appears as an image in the original publication and is not reproduced in this text),

where $h_{ik}$ is the mixed imbalance metric, and the difference between the upper approximation set and the lower approximation set, $\overline{U_i} \setminus \underline{U_i}$, is the boundary set of cluster $U_i$;

The mixed imbalance metric $h_{ik}$ is calculated by a formula that is likewise not reproduced in this text. Its terms are described as follows: the number of load curves within a neighborhood of given radius around $X_k$; the number of load curves in the upper approximation set of the cluster $U_i$ corresponding to cluster center $C_i$; a distance measure for the k-th load clustering object curve $X_k$ when it participates in the iteration of $C_i$ (the larger the distance $d_{ik}$ from $X_k$ to $C_i$, the smaller its influence on the update of $C_i$, and the smaller the distance, the larger the influence); and a local density measure for $X_k$ when it participates in the iteration of $C_i$ (the lower the density of load curves in the neighborhood of $X_k$, the smaller its influence on the update of $C_i$, and the higher the density, the larger the influence);

Step 6: Repeat steps 2 to 5 until the clustering converges.
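Because the iterative formula of step 5 is not available in this text, the following is only a hedged sketch of how per-curve weights could enter a rough C-means center update. It assumes the standard rough C-means form (a weighted combination of the lower approximation mean and the boundary mean, falling back to whichever set is non-empty) and replaces the plain means with means weighted by caller-supplied values `h`. The names `weighted_mean` and `update_center` are hypothetical, and the patent's actual formula may differ.

```python
def weighted_mean(curves, indices, weights):
    """Weighted mean of the curves selected by `indices`, using weights[k] per curve."""
    total = sum(weights[k] for k in indices)
    dim = len(curves[0])
    return [sum(weights[k] * curves[k][t] for k in indices) / total for t in range(dim)]


def update_center(curves, lower, upper, h, w_low, w_up):
    """One center update, assuming the standard rough C-means structure.

    lower, upper: index sets of the cluster's lower/upper approximation.
    h: per-curve weights for this cluster (stand-in for the imbalance metric).
    """
    if not upper:
        raise ValueError("cluster has no members")
    boundary = upper - lower  # boundary set = upper minus lower approximation
    if lower and boundary:
        low_mean = weighted_mean(curves, lower, h)
        bnd_mean = weighted_mean(curves, boundary, h)
        return [w_low * a + w_up * b for a, b in zip(low_mean, bnd_mean)]
    if lower:
        return weighted_mean(curves, lower, h)
    return weighted_mean(curves, boundary, h)


curves = [[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]]
center = update_center(curves, {0, 1}, {0, 1, 2}, h=[1.0, 1.0, 1.0], w_low=0.9, w_up=0.1)
print(center)  # approximately [1.9, 1.9]
```

With all weights equal to 1 this reduces to the unweighted rough C-means update; unequal weights let curves that are closer to the center or sit in denser neighborhoods pull the center more strongly, which is the qualitative behavior the text describes.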

The daily load curves of 1910 users, measured on a working day in August 2015 in a certain region, are taken as the research object. The users comprise 200 industrial users, 300 administrative users, 400 commercial users, 350 residential users, 360 catering users, and 300 street lighting and high-energy-consumption enterprise users. The daily load sampling interval is 1 h, giving 24 sampling points per day. The target number of clusters c is set to 6, the lower approximation weight $W_{low}$ to 0.9, the upper approximation weight $W_{up}$ to 0.1, the distance judgment threshold $\Delta$ to 0.1, and the neighborhood radius to 0.2.

The results of the load curve clustering method based on improved rough C-means are shown in FIG. 2. The user load curves are divided into 6 categories: 202 load curves in category 1, 304 in category 2, 401 in category 3, 350 in category 4, 363 in category 5, and 300 in category 6. Categories 1 and 2 are both single-peak loads, but electricity use in category 1 is concentrated in the 06:00-21:00 period and in category 2 in the 08:00-19:00 period, and the average load level of category 1 is higher than that of category 2. Category 1 consists mainly of industrial users, and category 2 mainly of administrative users such as schools, hospitals, and government offices. Categories 3 and 4 are both double-peak loads. The two peaks of category 3 occur at around 12:00 and 18:00, and the category consists mainly of commercial users such as hotels and supermarkets. The two peaks of category 4 occur at around 07:00 and 21:00, and the category consists mainly of residential users such as apartments. Category 5 is a triple-peak load consisting mainly of catering users, with consumption peaks in the morning, at midday, and in the evening, and lower consumption during the morning working period, the midday break, and the afternoon working period. Category 6 is a peak-avoiding load consisting mainly of public street lighting and high-energy-consumption enterprise users. The cluster centers of the six load categories are shown in FIG. 3.

The proposed method is compared with the traditional improved C-means load clustering method, which only improves algorithm performance and does not consider the effect of an unbalanced load curve distribution. The clustering result of the traditional improved C-means method is shown in FIG. 4. The user load curves are divided into 6 categories: 226 load curves in category 1, 276 in category 2, 373 in category 3, 352 in category 4, 383 in category 5, and 300 in category 6. Under the traditional improved C-means method, a considerable share of the loads in category 2 is wrongly assigned to category 1, and a considerable share of the loads in category 3 is wrongly assigned to category 5. This analysis shows that the proposed load curve clustering method based on improved rough C-means misclusters fewer load curves and classifies better.
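The cluster sizes quoted above can be compared with the known user counts by simple arithmetic. The sketch below uses only numbers stated in the text and assumes that categories 1 to 6 correspond, in order, to industrial, administrative, commercial, residential, catering, and street lighting and high-energy-consumption users, as the category descriptions indicate. A size deviation is not the same as a count of misclustered curves, so it is only a rough indicator.

```python
# Cluster sizes quoted in the text, in category order 1 to 6.
actual = [200, 300, 400, 350, 360, 300]       # user counts by type
proposed = [202, 304, 401, 350, 363, 300]     # improved rough C-means
traditional = [226, 276, 373, 352, 383, 300]  # traditional improved C-means


def size_deviation(found, reference):
    """Sum of absolute differences between found and reference cluster sizes."""
    return sum(abs(f - r) for f, r in zip(found, reference))


totals = {
    "actual": sum(actual),
    "proposed": sum(proposed),
    "traditional": sum(traditional),
}
dev_proposed = size_deviation(proposed, actual)
dev_traditional = size_deviation(traditional, actual)
print(totals)
print(dev_proposed, dev_traditional)  # 10 102
```

The proposed method's cluster sizes deviate from the user counts by 10 in total, against 102 for the traditional method. The quoted sizes for the proposed method sum to 1920 rather than 1910; the text does not explain the difference. One possible reading, offered here only as an assumption, is that curves in a boundary region are counted in more than one cluster.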

To further verify that the proposed mixed imbalance metric, which considers both distance and density, improves clustering performance, the C-means algorithm and the rough C-means algorithm are selected for comparison, and the clustering results are evaluated in the following four respects:

1) DBI index: this index reflects the dispersion between clusters and the compactness within clusters. The smaller its value, the higher the clustering quality. It is calculated as follows:

$$DBI = \frac{1}{c}\sum_{i=1}^{c}\max_{j \neq i}\frac{\bar{d}_i + \bar{d}_j}{d(C_i, C_j)},$$

where $\bar{d}_i$ is the average distance from all samples in the i-th class to its cluster center, and $d(C_i, C_j)$ is the distance between the cluster centers of the i-th and j-th classes. The smaller the DBI, the better the clustering;
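A direct computation of the DBI from these definitions might look like the sketch below. It assumes a hard assignment (one label per curve), at least two clusters, no empty cluster, and distinct centers; how boundary curves of a rough clustering are counted is not stated in the text. The name `davies_bouldin` is hypothetical.

```python
import math


def davies_bouldin(curves, labels, centers):
    """Davies-Bouldin index for a hard clustering with known centers."""
    c = len(centers)
    # Average distance from each cluster's members to its center.
    scatter = []
    for i in range(c):
        members = [x for x, lab in zip(curves, labels) if lab == i]
        scatter.append(sum(math.dist(x, centers[i]) for x in members) / len(members))
    # For each cluster, take the worst (largest) similarity ratio to any other cluster.
    total = 0.0
    for i in range(c):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(c)
            if j != i
        )
    return total / c


curves = [[0.0], [2.0], [10.0], [12.0]]
dbi = davies_bouldin(curves, [0, 0, 1, 1], [[1.0], [11.0]])
print(dbi)
```

In this toy case each cluster has an average member distance of 1 and the centers are 10 apart, so the index is (1 + 1) / 10 = 0.2.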

2) SSE index: this index reflects the within-cluster cohesion of all clusters. The smaller its value, the higher the clustering quality. It is calculated as follows:

$$SSE = \sum_{i=1}^{c}\sum_{X_k \in U_i}\left\lVert X_k - C_i\right\rVert^2;$$
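The sum of squared errors follows directly from its definition. The sketch below again assumes one hard label per curve, and the name `sse` is hypothetical.

```python
def sse(curves, labels, centers):
    """Sum of squared Euclidean distances from each curve to its cluster center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(curve, centers[label]))
        for curve, label in zip(curves, labels)
    )


print(sse([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1], [[1.0], [11.0]]))  # 4.0
```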

3) SC index: this index reflects how well separated the clusters are. The closer its value is to 1, the farther apart the clusters are. It is calculated as follows:

$$SC = \frac{1}{N}\sum_{k=1}^{N}\frac{b(X_k) - a(X_k)}{\max\{a(X_k),\, b(X_k)\}},$$

where $a(X_k)$ is the average distance from $X_k$ to the other samples in the same cluster, and $b(X_k)$ is the minimum, over the other clusters, of the average distance from $X_k$ to the samples in that cluster;
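The silhouette coefficient can be computed from $a(X_k)$ and $b(X_k)$ as sketched below. The sketch assumes hard labels and at least two clusters; scoring a curve that is alone in its cluster as 0 is a common convention adopted here as an assumption. The name `silhouette` is hypothetical.

```python
import math


def silhouette(curves, labels):
    """Mean silhouette coefficient for a hard clustering (needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    scores = []
    for k, xk in enumerate(curves):
        same = [
            math.dist(xk, x)
            for m, x in enumerate(curves)
            if m != k and labels[m] == labels[k]
        ]
        if not same:
            scores.append(0.0)  # assumption: singleton clusters score 0
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # smallest mean distance to another cluster
            sum(math.dist(xk, x) for x, lab in zip(curves, labels) if lab == other)
            / labels.count(other)
            for other in clusters
            if other != labels[k]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


sc = silhouette([[0.0], [2.0], [10.0], [12.0]], [0, 0, 1, 1])
print(sc)
```

For this toy input the two outer points score 9/11 and the two inner points score 7/9, so the mean is about 0.798.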

4) Clustering accuracy AC: this index is the percentage of data objects in the data set that are clustered correctly when compared with the decision attribute values of the original data set. It is calculated as follows:

$$AC = \frac{\text{number of correctly clustered data objects}}{N} \times 100\%.$$
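Clustering accuracy needs a correspondence between cluster indices and the known user classes. The sketch below takes that correspondence as an explicit argument, which is an assumption: the text does not say how clusters are matched to classes. The name `clustering_accuracy` and the class names in the example are hypothetical.

```python
def clustering_accuracy(true_classes, cluster_labels, cluster_to_class):
    """Percentage of objects whose cluster maps to their true class."""
    correct = sum(
        1
        for truth, cluster in zip(true_classes, cluster_labels)
        if cluster_to_class[cluster] == truth
    )
    return 100.0 * correct / len(true_classes)


true_classes = ["industrial", "industrial", "administrative", "administrative"]
cluster_labels = [0, 0, 1, 0]
mapping = {0: "industrial", 1: "administrative"}
print(clustering_accuracy(true_classes, cluster_labels, mapping))  # 75.0
```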

Table 1 Comparison of clustering effects

Table 1 compares the clustering effects. Judging by the clustering indices, the proposed improved clustering method has the smallest DBI value, indicating tighter clusters and greater dispersion between clusters. It has the smallest SSE value, indicating the best cohesion within each cluster. It has the largest SC value, indicating that the clusters are farther apart. It also has the highest clustering accuracy, indicating more accurate clustering.

The above is only a preferred embodiment of the invention and is not intended to further limit it. All equivalent changes made using the contents of the specification and drawings of the invention fall within the scope of protection of the invention.

Claims (5)

1. The load curve clustering method based on the improved rough C-means is characterized by comprising the following steps of:
step 1, acquiring historical daily load data, and acquiring N original daily load curves of the same day Normalizing the original daily load curve by adopting a maximum normalization method to obtain N load clustering object curves needing to be clustered
Step 2, determining the number c of target clusters and c initial cluster centersA lower approximation weight W low, an upper approximation weight W up, and a distance determination threshold delta,
Step 3, for the load clustering object curve X k,Calculating Euclidean distance from the cluster center to each cluster center, and classifying X k into the upper approximation set of the cluster U i corresponding to the nearest cluster center C i
Step 4, if another cluster center C j exists so that the difference between the distance from X k to C j and the distance from X k to C i is smaller than the threshold delta, classifying X k into the upper approximate set of the cluster U j corresponding to C j at the same time; Otherwise, X k is classified into the lower approximation set of cluster U i
Step 5, updating the clustering center according to the following iterative formula:
Wherein h ik is the mixed imbalance metric, the difference between the upper approximation set and the lower approximation set A boundary set representing a cluster-like U i;
and 6, repeating the steps 2-5 until the clustering converges.
2. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 1 the sampling interval of the original daily load curves is 1 hour, so that each of the N original daily load curves of the same day is a vector of hourly values, the ith component of the kth original daily load curve being the value of that curve at the ith hour.
3. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 1 the original daily load data are normalized by the maximum normalization method as follows:
thereby obtaining N normalized load clustering object curves, wherein X_k = (X_k1, X_k2, …, X_k24) and X_ki represents the value of the kth load clustering object curve at the ith hour.
4. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 3 the Euclidean distance from the kth load clustering object curve X_k to the initial cluster center C_i = (C_i1, C_i2, …, C_i24) is calculated as follows:
$$d_{ik} = \sqrt{\sum_{j=1}^{24} \left( X_{kj} - C_{ij} \right)^{2}}$$
wherein d_ik is the Euclidean distance from the kth load clustering object curve X_k to the initial cluster center C_i.
5. The load curve clustering method based on improved rough C-means according to claim 1, wherein in step 5 the mixed imbalance measure h_ik is calculated as follows:
wherein the formula uses the following quantities: the number of load curves within the neighborhood range of X_k; the number of load curves in the upper approximation set of the cluster U_i corresponding to the cluster center C_i; a distance measure for the kth load clustering object curve X_k when it takes part in the iteration of the cluster center C_i, such that the larger the distance d_ik from X_k to C_i, the smaller its influence on the update of C_i, and the smaller the distance, the greater the influence; and a local density measure for the kth load clustering object curve X_k when it takes part in the iteration of the cluster center C_i, such that the lower the density of load curves in the neighborhood of X_k, the smaller its influence on the update of C_i, and the higher the density, the greater the influence.
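The steps recited in the claims can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the formulas for the center update and for h_ik are not legible in this text, so the code assumes the classic rough C-means update (a W_low / W_up blend of the lower approximation mean and the boundary mean) and treats the weights h_ik as values supplied by the caller. The normalization is assumed to divide each curve by its own maximum. All function names and the toy two-point curves are hypothetical; real daily curves would have 24 hourly values.

```python
import math

def max_normalize(curve):
    """Assumed form of maximum normalization: divide by the curve's own peak (peak > 0)."""
    peak = max(curve)
    return [v / peak for v in curve]

def euclidean(x, c):
    """Euclidean distance d_ik between a curve and a cluster center."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def rough_assign(curves, centers, delta):
    """Steps 3-4: return, per cluster, index lists for the lower approximation
    and for the boundary (upper approximation minus lower approximation)."""
    lower = [[] for _ in centers]
    boundary = [[] for _ in centers]
    for k, x in enumerate(curves):
        d = [euclidean(x, c) for c in centers]
        i = min(range(len(centers)), key=d.__getitem__)  # nearest center
        close = [j for j in range(len(centers))
                 if j != i and d[j] - d[i] < delta]
        if close:  # ambiguous curve: boundary of every close cluster
            for j in [i] + close:
                boundary[j].append(k)
        else:      # unambiguous curve: lower approximation of cluster i
            lower[i].append(k)
    return lower, boundary

def weighted_mean(curves, idx, h):
    """Mean of the selected curves, weighted by h[k] (assumed role of h_ik)."""
    total = sum(h[k] for k in idx)
    return [sum(h[k] * curves[k][t] for k in idx) / total
            for t in range(len(curves[0]))]

def update_center(curves, lower, boundary, h, w_low, w_up):
    """Assumed step 5 for one non-empty cluster: classic rough C-means blend."""
    if lower and boundary:
        lo = weighted_mean(curves, lower, h)
        bd = weighted_mean(curves, boundary, h)
        return [w_low * a + w_up * b for a, b in zip(lo, bd)]
    return weighted_mean(curves, lower or boundary, h)

raw = [[1.0, 2.0], [2.0, 4.0], [4.0, 2.0], [3.0, 3.0]]  # toy curves
curves = [max_normalize(r) for r in raw]
centers = [[0.5, 1.0], [1.0, 0.5]]
lower, boundary = rough_assign(curves, centers, delta=0.1)
h = [1.0] * len(curves)  # uniform weights reduce to plain rough C-means
new_c0 = update_center(curves, lower[0], boundary[0], h, w_low=0.7, w_up=0.3)
```

In the toy data the fourth curve is equidistant from both centers, so it falls into the boundary of both clusters and pulls each center only with the upper approximation weight. Replacing the uniform `h` with distance- and density-based weights is where the mixed imbalance measure described in claim 5 would enter.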
CN202410622100.1A 2024-05-20 2024-05-20 Load curve clustering method based on improved rough C-means Active CN118194074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410622100.1A CN118194074B (en) 2024-05-20 2024-05-20 Load curve clustering method based on improved rough C-means

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410622100.1A CN118194074B (en) 2024-05-20 2024-05-20 Load curve clustering method based on improved rough C-means

Publications (2)

Publication Number Publication Date
CN118194074A (en) 2024-06-14
CN118194074B (en) 2024-09-10

Family

ID=91404707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410622100.1A Active CN118194074B (en) 2024-05-20 2024-05-20 Load curve clustering method based on improved rough C-means

Country Status (1)

Country Link
CN (1) CN118194074B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199016A (en) * 2019-09-29 2020-05-26 国网湖南省电力有限公司 DTW-based improved K-means daily load curve clustering method
CN111539657A (en) * 2020-05-30 2020-08-14 国网湖南省电力有限公司 Classification and synthesis method of load characteristics of typical electricity industry combined with daily electricity consumption curve of users

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110571793B (en) * 2019-08-23 2021-01-12 华北电力大学 A method for multi-dimensional identification of demand response effects of flexible loads
CN112819299A (en) * 2021-01-21 2021-05-18 上海电力大学 Differential K-means load clustering method based on center optimization

Also Published As

Publication number Publication date
CN118194074A (en) 2024-06-14

Similar Documents

Publication Publication Date Title
CN110991786B (en) Parameter identification method of 10kV static load model based on similar daily load curve
CN108280479B (en) Power grid user classification method based on load characteristic index weighted clustering algorithm
CN111177650B (en) Power quality monitoring and comprehensive evaluation system and method for power distribution network
CN106022509B (en) Consider the Spatial Load Forecasting For Distribution method of region and load character double differences
CN103135009B (en) Electric appliance detection method and system based on user feedback information
CN110781332A (en) Clustering method of daily load curve of electric residential users based on compound clustering algorithm
CN111242161B (en) Non-invasive non-resident user load identification method based on intelligent learning
CN105389636A (en) Low-voltage area KFCM-SVR reasonable line loss prediction method
CN110111024A (en) Scientific and technological achievement market value evaluation method based on AHP fuzzy comprehensive evaluation model
CN112819299A (en) Differential K-means load clustering method based on center optimization
CN105160416A (en) Transformer area reasonable line loss prediction method based on principal component analysis and neural network
CN111695807A (en) Regional power grid energy efficiency evaluation method and system considering power generation and power utilization side energy efficiency
CN111949939B (en) Evaluation method of smart meter operating state based on improved TOPSIS and cluster analysis
CN108376262A (en) A kind of analysis model construction method of wind power output typical characteristics
CN110197345A (en) It is a kind of using route as the power distribution network synthesis evaluation method of unit
CN110147871A (en) A kind of stealing detection method and system based on SOM neural network Yu K- mean cluster
CN111754091A (en) A power user demand side control system
CN115238167A (en) Power consumer refined portrait and management method considering load and social information
CN113392877B (en) Daily load curve clustering method based on ant colony algorithm and C-K algorithm
CN111784379B (en) Estimation method and device for electric charge after-payment and screening method and device for abnormal cases
CN111914900A (en) User power consumption mode classification method
CN118194074B (en) Load curve clustering method based on improved rough C-means
CN111259965A (en) A method and system for mean clustering of electrical feature data based on dimension reduction
CN110298603B (en) Distributed photovoltaic system capacity estimation method
CN109064353B (en) Large building user behavior analysis method based on improved cluster fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant