CN115587130A

CN115587130A - Intelligent power utilization platform system

Info

Publication number: CN115587130A
Application number: CN202211278110.5A
Authority: CN
Inventors: 陈前; 夏桃芳; 高琛; 林华; 丁忠安; 王雅平; 鄢盛腾; 林峰; 陈宇颖; 董良彬; 付晓曦; 谢静怡; 陈伟寅; 龚林燕
Original assignee: Fujian Power Supply Service Co ltd; State Grid Fujian Electric Power Co Ltd; Marketing Service Center of State Grid Fujian Electric Power Co Ltd
Current assignee: Fujian Power Supply Service Co ltd; State Grid Fujian Electric Power Co Ltd; Marketing Service Center of State Grid Fujian Electric Power Co Ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-01-10

Abstract

The invention relates to an intelligent power utilization platform system comprising: a communication module for large-scale concurrent user access; an intelligent analysis module for intelligently analyzing massive power consumption data; a data parallel computing module for big data mining based on a data parallel mining mechanism; and a data storage module for storing the data processed by the intelligent analysis module and the data parallel computing module. The system facilitates efficient data mining of massive power consumption data.

Description

An intelligent power consumption platform system

Technical field

The invention belongs to the field of big data computing, and in particular relates to an intelligent power utilization platform system.

Background art

Current electricity consumption information platforms rely on a traditional Oracle database system, which efficiently supports data entry, query, and statistics; their communication access layer uses traditional blocking I/O. However, such databases cannot discover the relationships and rules hidden in massive power consumption data, nor can they predict future trends from existing data, and the traditional system architecture cannot meet today's core demands for high throughput and high concurrency.

Summary of the invention

The purpose of the present invention is to provide an intelligent power utilization platform system that facilitates efficient data mining of massive power consumption data.

To achieve this purpose, the technical solution adopted by the present invention is an intelligent power utilization platform system comprising:

a communication module for large-scale concurrent user access;

an intelligent analysis module for intelligent analysis of massive power consumption data;

a data parallel computing module for big data mining based on a data parallel mining mechanism; and

a data storage module for storing the data processed by the intelligent analysis module and the data parallel computing module.

Further, the intelligent data analysis process of the intelligent analysis module includes data preprocessing, data classification analysis, data cluster analysis, data correlation analysis, and data integration. For actual business scenarios, the intelligent analysis module extracts valuable data according to the characteristics of the power consumption data, so as to improve the management and service level of the whole system.

Further, the applications of the intelligent analysis module's data analysis in actual business scenarios include:

1) Establishing a holiday power allocation scheme. A holiday load forecasting model is built from massive user power consumption data. By establishing an automation coefficient and a prediction function for the probability that an enterprise operates on holidays, the electricity shifted to holiday operation is predicted, and further iteration yields the preferential electricity price the power company should set to reach its target valley-filling value. Setting this preferential price encourages some industrial enterprises to operate on holidays, so as to achieve valley filling;

2) Establishing a prediction model for the monthly number of grid faults. The number of distribution transformer faults is divided into grades, and a high-order Markov prediction model with a computed state transition matrix is used to predict the number of distribution transformer faults; the finer the grading, the higher the prediction efficiency;

3) Analyzing the power consumption behavior of big data users. Feature points are extracted from daily load characteristic curves, a clustering SOM (self-organizing map) neural network is used to run a clustering simulation on the data in the Matlab environment, visual clustering results are output, and the results are analyzed to provide reference and guidance for grid companies optimizing their power marketing services;

4) Analyzing electricity theft based on collected data. After fluctuation factors are removed with the ensemble empirical mode decomposition method, the residual load trend terms of a user and of the class to which the user belongs are obtained over identical time periods, and the correlation coefficient between the two is calculated to judge whether the user's power consumption behavior is abnormal.
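Item 2) above can be illustrated with a small sketch. It is not the patented model: the grade boundaries, the sample fault counts, and the function names `grade`, `fit_transitions`, and `predict_next` are all hypothetical, and the Markov order is simply taken as the number of past months used as the state. The sketch grades monthly fault counts, estimates transition probabilities from each length-`order` history to the next grade, and predicts the most likely next grade.

```python
from collections import Counter, defaultdict


def grade(fault_count, boundaries):
    """Map a monthly fault count to a grade index using ascending upper bounds."""
    for g, upper in enumerate(boundaries):
        if fault_count <= upper:
            return g
    return len(boundaries)


def fit_transitions(grades, order):
    """Estimate P(next grade | previous `order` grades) from a grade sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(grades)):
        history = tuple(grades[i - order:i])
        counts[history][grades[i]] += 1
    # Normalise each row of counts into conditional probabilities.
    return {
        h: {g: n / sum(c.values()) for g, n in c.items()}
        for h, c in counts.items()
    }


def predict_next(grades, transitions, order):
    """Return the most probable next grade, or None for an unseen history."""
    row = transitions.get(tuple(grades[-order:]))
    if row is None:
        return None
    return max(sorted(row), key=row.get)


# Hypothetical monthly fault counts and grade boundaries (0-4, 5-8, 9+).
monthly_faults = [2, 3, 9, 2, 4, 10, 1, 3, 12, 2, 3]
boundaries = [4, 8]
grades = [grade(n, boundaries) for n in monthly_faults]
transitions = fit_transitions(grades, order=2)
next_grade = predict_next(grades, transitions, order=2)
```

Finer grade boundaries enlarge the state space, so more history is needed before every row of the transition table is populated.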

Further, the data parallel computing module is implemented with a data parallel mining mechanism based on the MapReduce model, comprising a data extraction component, a data distribution component, a data classification component, a data aggregation component, and a data persistence component. The data classification component includes a Map component and the data aggregation component includes a Reduce component; together they process power consumption data quickly and comprehensively so that data processing is performed in real time.

Further, the data parallel mining mechanism queries the power consumption data, classifies and aggregates it according to the specific business, and finally summarizes the result data and persists it to a file or database. The data extraction component follows the loading rules configured in the data loading manager and, partitioning by conditions such as terminal type and power supply unit, preloads the raw data to be analyzed from the database in parallel blocks into an in-memory cache. The data loading manager monitors the state of each data set in real time and, if a data set is found to be invalid, issues a notification to reload the corresponding raw data. The data distribution component sorts the extracted raw data and distributes it to the Map component with balanced scheduling. The Map component splits the raw data at a finer granularity according to the configured business rules, iterating over the same data set multiple times, and finally produces intermediate results. The Reduce component performs multiple iterations of calculation and merging on the intermediate results according to the configured business analysis rules and produces the final results, which are likewise kept in the in-memory cache. The data persistence component writes the final results to the database.

Further, the communication module adopts a high-concurrency communication technique, implemented as follows: when a connection sends a request to the server, the server treats the request as an "event" and assigns the event to the corresponding handler function; the handler is then executed on a thread, and the thread is returned once execution finishes, so that one thread can process multiple events asynchronously.

Further, the communication module adopts a ZooKeeper-based cooperative architecture of multiple service nodes, implemented as follows: client read and write requests are generally scheduled by a general leader, which learns through the scheduling mechanism whether each server is active and schedules work according to load information so that load is balanced across the servers. The general leader is chosen by the ZooKeeper election algorithm, so even if it fails, the system elects another general leader to take over scheduling.

Further, the communication module adopts multithreading within a single service node, implemented as follows: a thread pool consists of four parts, namely a thread pool manager, worker threads, a task interface, and a task queue. A program can create multiple service nodes; each service node is a process; each process can create multiple thread pools; and each thread pool consists of multiple independent threads. A thread is the smallest unit of system resources in a running program. Communication with large numbers of concurrent users requires handling a large number of client connections, each of which is processed by a thread.

Compared with the prior art, the present invention has the following beneficial effects. By mining massive power consumption data, the invention enables intelligent query, supervision, statistics, analysis, and prediction for marketing management, so that management can keep a timely and comprehensive view of each unit's marketing and service indicators and business development, providing a basis for the company's operating decisions. In actual business scenarios, the load forecasting model yields the load of enterprises and residents at different points in time, so that differentiated electricity prices can be set and economic leverage used for peak shaving and valley filling. Accurate prediction of distribution transformer line faults lets the power company know in advance how many faults may occur, so that it can take measures to minimize outages, devote more effort to eliminating equipment defects, schedule maintenance systematically, and inform the public and customers in advance to minimize adverse social impact. Power-consuming enterprises can likewise plan production according to the forecasts provided by the power company and reduce losses caused by outages, so accurate prediction of the number of distribution transformer faults also brings economic benefits. Extracting users' daily load characteristic curves from massive load data and mining them in depth captures users' power consumption behavior accurately and promptly and improves the level of power marketing services. The system used by the data parallel mining mechanism is scalable and stable; with the MapReduce model, data processing is highly parallel, which greatly improves program efficiency and, to some extent, completes computations that serial algorithms cannot, providing strong technical support for processing massive power data.

Description of drawings

Fig. 1 is a schematic diagram of the system architecture of an embodiment of the present invention;

Fig. 2 is an architecture diagram of the ZooKeeper multi-service-node communication module in an embodiment of the present invention;

Fig. 3 is a model diagram of the thread pool technique in an embodiment of the present invention;

Fig. 4 is an architecture diagram of the large-scale concurrent user access scheme in an embodiment of the present invention;

Fig. 5 is a data flow diagram of the intelligent analysis of massive power consumption data in an embodiment of the present invention;

Fig. 6 is a MapReduce workflow diagram in an embodiment of the present invention;

Fig. 7 is a functional diagram of the data parallel mining mechanism in an embodiment of the present invention;

Fig. 8 is a flowchart of the data parallel mining mechanism in an embodiment of the present invention.

Detailed description

The present invention is further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It should also be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the present application. As used herein, unless the context clearly indicates otherwise, the singular form is intended to include the plural form. Furthermore, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.

As shown in Fig. 1, this embodiment provides an intelligent power utilization platform system comprising a communication module, an intelligent analysis module, a data parallel computing module, and a data storage module. The communication module is mainly responsible for the access of large numbers of concurrent users. The intelligent analysis module performs intelligent analysis of massive power consumption data. The data parallel computing module mainly implements big data mining based on the data parallel mining mechanism. The data storage module mainly stores the data processed by the intelligent analysis module and the data parallel computing module; it uses the general-purpose relational database Oracle, which benefits system compatibility and programmability.

The communication module applies a high-concurrency communication technique. The bottleneck of traditional blocking I/O is that it cannot handle a large number of connections; the present invention addresses this with high-concurrency communication based on non-blocking I/O. It is implemented as follows: when a connection sends a request to the server, the server treats the request as an "event" and assigns the event to the corresponding handler function; the handler is then executed on a thread, and the thread is returned once execution finishes, so that one thread can process multiple events asynchronously. This asynchronous non-blocking communication improves communication efficiency.
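The event-to-handler dispatch described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class `EventDispatcher`, the event name `meter_reading`, and the handler are hypothetical, and real socket handling is left out so that only the dispatch pattern is shown. Each request is treated as an event, routed to its registered handler, run on a pooled thread, and the thread goes back to the pool afterwards.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class EventDispatcher:
    """Routes request events to handlers that run on a shared thread pool."""

    def __init__(self, workers):
        self.handlers = {}
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def register(self, event_type, handler):
        self.handlers[event_type] = handler

    def dispatch(self, event_type, payload):
        # Returns immediately with a future; the caller is never blocked
        # while the handler runs on a pooled thread.
        return self.pool.submit(self.handlers[event_type], payload)

    def close(self):
        self.pool.shutdown(wait=True)


def handle_meter_reading(payload):
    # Hypothetical handler: report which pooled thread served the event.
    return ("stored", payload["meter"], threading.current_thread().name)


dispatcher = EventDispatcher(workers=2)
dispatcher.register("meter_reading", handle_meter_reading)
futures = [dispatcher.dispatch("meter_reading", {"meter": i}) for i in range(6)]
results = [f.result() for f in futures]
dispatcher.close()

# Six events were served by at most two threads.
threads_used = {name for _, _, name in results}
```

Because threads return to the pool after each handler finishes, the number of threads stays bounded by `workers` regardless of how many events arrive.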

The communication module adopts the ZooKeeper-based cooperative architecture of multiple service nodes shown in Fig. 2. It is implemented as follows: client read and write requests are generally scheduled by a general leader, which learns through the scheduling mechanism whether each server is active and schedules work according to load information so that load is balanced across the servers. The general leader is chosen by the ZooKeeper election algorithm, so even if it fails, the system elects another general leader to provide the scheduling service.
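The leader-plus-load-balancing behaviour described above can be simulated in memory as a rough sketch. It does not use the ZooKeeper API. It assumes the commonly documented election recipe in which each server registers with an increasing sequence number and the live server with the lowest number is the leader; the patent does not say which election scheme is used. The class `Cluster` and the server names are hypothetical.

```python
class Cluster:
    """In-memory stand-in for a coordinated group of service nodes."""

    def __init__(self):
        self._next_seq = 0
        self.servers = {}  # name -> {"seq": registration order, "load": clients}

    def join(self, name):
        self.servers[name] = {"seq": self._next_seq, "load": 0}
        self._next_seq += 1

    def fail(self, name):
        # A failed server disappears from the live set.
        self.servers.pop(name, None)

    def leader(self):
        # Assumed rule: the live server with the lowest sequence number leads.
        return min(self.servers, key=lambda n: self.servers[n]["seq"])

    def dispatch(self):
        """The leader assigns a new client to the least-loaded live server."""
        target = min(sorted(self.servers), key=lambda n: self.servers[n]["load"])
        self.servers[target]["load"] += 1
        return target


cluster = Cluster()
for name in ("s1", "s2", "s3"):
    cluster.join(name)

first_leader = cluster.leader()
assignments = [cluster.dispatch() for _ in range(6)]

cluster.fail("s1")                 # the current leader goes down
new_leader = cluster.leader()      # another server takes over scheduling
after_failover = cluster.dispatch()
```

In a real deployment the liveness information would come from the coordination service rather than from explicit `fail` calls.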

The communication module applies the multithreading technique for a single service node shown in Fig. 3. It is implemented as follows: a thread pool consists of four parts, namely a thread pool manager, worker threads, a task interface, and a task queue. A program can create multiple service nodes; each service node is a process; each process can create multiple thread pools; and each thread pool consists of multiple independent threads. A thread is the smallest unit of system resources in a running program. Communication with large numbers of concurrent users requires handling a large number of client connections, and processing each connection on a thread improves the efficiency of the program.
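The four parts of the thread pool named above (manager, worker threads, task interface, task queue) can be sketched as follows. The class names `Task`, `ThreadPoolManager`, and `HandleConnection` are hypothetical, and the connection handling is reduced to recording an id; this is an illustration of the structure, not the patent's code.

```python
import queue
import threading


class Task:
    """Task interface: every unit of work implements run()."""

    def run(self):
        raise NotImplementedError


class ThreadPoolManager:
    """Creates the worker threads and owns the task queue."""

    def __init__(self, size):
        self.tasks = queue.Queue()  # task queue shared by all workers
        self.workers = [
            threading.Thread(target=self._work, daemon=True) for _ in range(size)
        ]
        for worker in self.workers:
            worker.start()

    def _work(self):
        while True:
            task = self.tasks.get()
            if task is None:  # sentinel: stop this worker
                break
            try:
                task.run()
            except Exception:
                # A failing task must not take the worker thread down.
                pass

    def submit(self, task):
        self.tasks.put(task)

    def shutdown(self):
        # One sentinel per worker, queued behind all pending tasks.
        for _ in self.workers:
            self.tasks.put(None)
        for worker in self.workers:
            worker.join()


class HandleConnection(Task):
    """Hypothetical task: process one client connection."""

    def __init__(self, conn_id, done, lock):
        self.conn_id, self.done, self.lock = conn_id, done, lock

    def run(self):
        if self.conn_id == 2:
            raise RuntimeError("simulated failure on one connection")
        with self.lock:
            self.done.append(self.conn_id)


done, lock = [], threading.Lock()
pool = ThreadPoolManager(size=3)
for conn_id in range(5):
    pool.submit(HandleConnection(conn_id, done, lock))
pool.shutdown()
handled = sorted(done)
```

The failure on connection 2 is swallowed inside the worker loop, so the remaining connections are still served by the same three threads.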

The large-scale concurrent user access scheme uses a thread pool and non-blocking asynchronous communication, and designs system functions as components, greatly improving the concurrent user access capacity of a single business node. ZooKeeper provides the distributed mechanism and supports linear scaling of business nodes. The scheme uses a component-based architecture, as shown in Fig. 4: each server includes a thread pool and a communication management module. The thread pool provides communication threads and business threads; communication threads handle only communication connections and business threads handle only business processing. The two kinds of thread are kept separate and all threads are managed by the thread pool, so an exception in one thread does not affect the others, which improves the stability of a single node. Non-blocking asynchronous communication separates the receiving thread from the responding thread and substantially increases the throughput of the communication channel. In a multi-node cluster deployment, every server has a communication management module, and ZooKeeper enables load balancing across nodes: when a server fails, its client terminals are automatically rescheduled to other working servers, which effectively removes the single point of failure. Servers can also be added linearly so that clients are redistributed and the number of client connections per service node is reduced.

The specific working principle of the present invention is as follows:

By studying massive information mining for intelligent power utilization and large-scale concurrent user access, this intelligent power utilization platform provides an intelligent analysis method and a parallel data mining mechanism suited to power consumption data, and greatly improves the communication efficiency of the platform.

As shown in Fig. 5, the intelligent data analysis process of the intelligent analysis module includes data preprocessing, data classification analysis, data cluster analysis, data correlation analysis, and data integration. For actual business scenarios, the intelligent analysis module extracts valuable data according to the characteristics of the power consumption data, so as to improve the management and service level of the whole system. The applications of the intelligent analysis module's data analysis in actual business scenarios include:

1) Establishing a holiday power allocation scheme. A holiday load forecasting model is built from massive user power consumption data. By establishing an automation coefficient and a prediction function for the probability that an enterprise operates on holidays, the electricity shifted to holiday operation is predicted, and further iteration yields the preferential electricity price the power company should set to reach its target valley-filling value. Setting this preferential price encourages some industrial enterprises to operate on holidays, so as to achieve valley filling.

2) Establishing a prediction model for the monthly number of grid faults. The number of distribution transformer faults is divided into grades, and a high-order Markov prediction model with a computed state transition matrix is used to predict the number of distribution transformer faults; the finer the grading, the higher the prediction efficiency.

3) Analyzing the power consumption behavior of big data users. Feature points are extracted from daily load characteristic curves, a clustering SOM (self-organizing map) neural network is used to run a clustering simulation on the data in the Matlab environment, visual clustering results are output, and the results are analyzed. The analysis verifies that the daily load characteristic curves of public transformer users extracted with the SOM neural network clustering algorithm clearly show the differences in power consumption behavior among different types of public transformer users and cluster well, and it provides reference and guidance for grid companies optimizing their power marketing services.

4) Analyzing electricity theft based on collected data. After fluctuation factors are removed with the EEMD (ensemble empirical mode decomposition) method, the residual load trend terms of a user and of the class to which the user belongs are obtained over identical time periods, and the correlation coefficient between the two is calculated to judge whether the user's power consumption behavior is abnormal.
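The correlation check in item 4) can be sketched as follows. EEMD itself is not implemented here: a centred moving average is swapped in as a simple stand-in for extracting the trend term, which is a deliberate simplification. The load series, the window size, and the threshold of 0.5 are hypothetical values chosen only for illustration.

```python
from math import sqrt


def moving_average(series, window):
    """Stand-in for the EEMD residual trend: a centred moving average."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread = sqrt(sum((a - mean_x) ** 2 for a in x)) * sqrt(
        sum((b - mean_y) ** 2 for b in y)
    )
    return cov / spread if spread else 0.0


def check_user(user_load, class_load, window=3, threshold=0.5):
    """Flag a user whose load trend correlates poorly with its class trend."""
    r = pearson(moving_average(user_load, window), moving_average(class_load, window))
    return r < threshold, r


# Hypothetical loads over the same time period.
class_load = [10, 12, 11, 14, 13, 16, 15, 18]
normal_user = [5, 6, 5.5, 7, 6.5, 8, 7.5, 9]      # follows the class trend
suspect_user = [9, 8.5, 8, 7, 7.5, 6, 5.5, 5]     # falls while the class rises

normal_flag, normal_r = check_user(normal_user, class_load)
suspect_flag, suspect_r = check_user(suspect_user, class_load)
```

A low or negative coefficient only marks the user for further inspection; the threshold would need to be calibrated against known cases.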

The data parallel computing module is implemented with a data parallel mining mechanism based on the MapReduce model. Fig. 6 shows the MapReduce workflow; MapReduce is a distributed programming model for processing massive data on large clusters. From the perspective of the MapReduce framework, a MapReduce program has two components: one implements the Mapper and the other implements the Reducer, which together turn large-scale data into smaller summary data.

As shown in Figs. 7 and 8, the data parallel mining mechanism comprises a data extraction component, a data distribution component, a data classification component, a data aggregation component, and a data persistence component. The data classification component includes a Map component and the data aggregation component includes a Reduce component; together they process power consumption data quickly and comprehensively so that data processing is performed in real time.

The data parallel mining mechanism queries the power consumption data, classifies and aggregates it according to the specific business, and finally summarizes the result data and persists it to a file or database. The data extraction component follows the loading rules configured in the data loading manager and, partitioning by conditions such as terminal type and power supply unit, preloads the raw data to be analyzed from the Oracle database in parallel blocks into an in-memory cache. The data loading manager monitors the state of each data set in real time and, if a data set is found to be invalid, issues a notification to reload the corresponding raw data. The data distribution component sorts the extracted raw data and distributes it to the Map component with balanced scheduling. The Map component splits the raw data at a finer granularity according to the configured business rules, iterating over the same data set multiple times, and finally produces intermediate results. The Reduce component performs multiple iterations of calculation and merging on the intermediate results according to the configured business analysis rules and produces the final results, which are likewise kept in the in-memory cache. The data persistence component writes the final results to the Oracle database.
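The five stages described above (extract, distribute, Map, Reduce, persist) can be sketched end to end in a few lines. This is an in-memory illustration only: the records, field names, and function names are hypothetical, a plain dictionary stands in for the database, and a thread pool stands in for the cluster of Map workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw rows standing in for data preloaded from the database.
RAW = [
    {"terminal_type": "A", "supply_unit": "north", "kwh": 12.0},
    {"terminal_type": "A", "supply_unit": "south", "kwh": 8.0},
    {"terminal_type": "B", "supply_unit": "north", "kwh": 5.0},
    {"terminal_type": "B", "supply_unit": "south", "kwh": 7.5},
    {"terminal_type": "C", "supply_unit": "north", "kwh": 3.0},
]


def extract(rows, key):
    """Data extraction: partition rows into blocks by a loading-rule key."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[row[key]].append(row)
    return dict(blocks)


def distribute(blocks, n_workers):
    """Data distribution: largest blocks first, each to the lightest shard."""
    shards = [[] for _ in range(n_workers)]
    for _, rows in sorted(blocks.items(), key=lambda kv: -len(kv[1])):
        min(shards, key=len).extend(rows)
    return shards


def map_shard(rows):
    """Map: classify each row by supply unit and emit (key, value) pairs."""
    return [(row["supply_unit"], row["kwh"]) for row in rows]


def reduce_pairs(pairs):
    """Reduce: merge intermediate pairs into a total per key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def persist(result, store):
    """Data persistence: write the final result to the store."""
    store.update(result)


database = {}  # stand-in for the result table
shards = distribute(extract(RAW, "terminal_type"), n_workers=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    mapped = list(pool.map(map_shard, shards))
persist(reduce_pairs(pair for part in mapped for pair in part), database)
```

The monitoring and reload behaviour of the data loading manager is omitted; it would sit in front of `extract` and refresh invalid blocks.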

In practical business applications, the present invention designs a MapReduce-based parallel computing method for anomaly detection in massive power data. First, the initTableMapperJob method is used to read the collection point data of each block stored in HBase. The Map task on each computing node then analyzes its input: it first applies an anomaly discovery algorithm to the data of the collection points assigned to that node and monitors for abnormal data. When the data of a collection point shows one or more basic characteristics of abnormal data, that collection point is confirmed as abnormal. Then, according to the basic characteristics exhibited, the corresponding category of feature analysis algorithm is executed to further analyze all anomaly features of the abnormal data. Because MapReduce runs on a multi-node cluster, different Map tasks compute in parallel at the same time. Finally, the Reduce task is executed, and in the Reduce stage the detected abnormal data and its feature set are written to an HBase table, which completes the anomaly discovery process.
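The Map and Reduce roles in this anomaly detection flow can be sketched in plain Python. HBase and Hadoop are not used here: lists stand in for the data blocks and a dictionary stands in for the output table. The two anomaly features and the jump threshold of 50 are hypothetical examples, since the patent does not list the actual features.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def basic_features(readings, jump_limit=50):
    """Return the set of basic anomaly features a reading series exhibits."""
    features = set()
    if any(value < 0 for value in readings):
        features.add("negative_reading")
    if any(abs(b - a) > jump_limit for a, b in zip(readings, readings[1:])):
        features.add("sudden_jump")
    return features


def map_block(block):
    """Map task: scan one block and emit (point_id, features) for anomalies."""
    emitted = []
    for point_id, readings in block:
        features = basic_features(readings)
        if features:  # at least one basic feature confirms the point as abnormal
            emitted.append((point_id, features))
    return emitted


def reduce_anomalies(mapped_blocks):
    """Reduce task: merge the feature sets of each abnormal collection point."""
    table = defaultdict(set)
    for emitted in mapped_blocks:
        for point_id, features in emitted:
            table[point_id] |= features
    return dict(table)


# Hypothetical blocks of (collection point id, readings).
blocks = [
    [("p1", [10, 11, 12]), ("p2", [10, -3, 12])],
    [("p3", [10, 90, 11]), ("p4", [5, 5, 5])],
]

with ThreadPoolExecutor(max_workers=2) as pool:  # Map tasks run in parallel
    mapped = list(pool.map(map_block, blocks))
anomaly_table = reduce_anomalies(mapped)
```

The second stage described in the text, running a feature analysis algorithm matched to each basic feature, would be applied to the emitted points before the Reduce step.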

The above are only preferred embodiments of the present invention and do not limit the present invention to any other form. Any person skilled in the art may use the technical content disclosed above to change or modify it into equivalent embodiments. However, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims

1. An intelligent power utilization platform system, comprising:

a communication module for large-scale concurrent user access;

an intelligent analysis module for intelligently analyzing massive power consumption data;

a data parallel computing module for big data mining based on a data parallel mining mechanism; and

a data storage module for storing the data processed by the intelligent analysis module and the data parallel computing module.

2. The intelligent power utilization platform system according to claim 1, wherein the intelligent analysis module performs intelligent data analysis including data preprocessing, data classification analysis, data clustering analysis, data association analysis and data integration, and extracts valuable data according to power utilization data characteristics in view of actual service scenarios to improve the management and service level of the whole system.

3. The intelligent power platform system according to claim 2, wherein the application of the intelligent analysis module to intelligent analysis of data in an actual service scenario comprises:

1) Establishing a holiday power utilization allocation scheme: a holiday load prediction model is established based on massive user power consumption data; the electric quantity transferred by holiday operation is predicted by establishing an automation number and a prediction function for the probability that an enterprise operates on holidays; further iteration yields the preferential electricity price that the power enterprise needs to set to reach a target valley-filling value; and setting this preferential price encourages some industrial enterprises to operate on holidays, thereby achieving valley filling;

2) Establishing a monthly power grid fault frequency prediction model: distribution transformer fault frequencies are divided into grades, a state transition matrix is calculated, and a high-order Markov prediction model is used to predict the distribution transformer fault frequency; the finer the grade division, the higher the prediction efficiency;

3) Analyzing user power utilization behavior based on big data: characteristic points are extracted from daily load characteristic curves, the data are clustered in a Matlab environment using a self-organizing map (SOM) clustering neural network, a visual clustering result is output, and the clustering result is analyzed to provide reference and guidance for power grid enterprises in optimizing power marketing services;

4) Analyzing electricity stealing behavior based on the collected data: after fluctuation factors are eliminated using ensemble empirical mode decomposition, the load residual trend terms of a given user and of the class to which the user belongs are obtained over any identical time period, and the correlation coefficient of the two terms is calculated to judge whether the user's power utilization behavior is abnormal.
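The correlation check in item 4) can be illustrated with a minimal Python sketch. It assumes the residual trend terms have already been extracted (the decomposition step itself is not shown), uses the Pearson coefficient as one possible correlation measure, and applies a hypothetical threshold of 0.5; the function names, the threshold, and the sample values are illustrative assumptions and are not taken from the disclosure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a flat trend carries no correlation information
    return cov / sqrt(vx * vy)


def is_abnormal(user_trend, class_trend, threshold=0.5):
    """Flag a user whose trend correlates weakly with the class trend.

    The threshold is a hypothetical value chosen for illustration.
    """
    return pearson(user_trend, class_trend) < threshold


# Hypothetical residual trend terms over the same time period.
class_trend = [10.0, 11.0, 12.5, 13.0, 14.2, 15.0]
normal_user = [5.1, 5.4, 6.2, 6.6, 7.0, 7.6]    # follows the class trend
suspect_user = [5.1, 5.0, 4.2, 3.9, 3.1, 2.8]   # moves against the class trend

if __name__ == "__main__":
    print(is_abnormal(normal_user, class_trend))
    print(is_abnormal(suspect_user, class_trend))
```

A user whose trend rises with its class is not flagged, while a user whose recorded load falls as the class load rises is flagged for further inspection.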

4. The intelligent power utilization platform system according to claim 1, wherein the data parallel computing module implements a data parallel mining mechanism based on the MapReduce model and comprises a data extraction component, a data distribution component, a data classification component, a data aggregation component and a data persistence component, wherein the data classification component comprises a Map component and the data aggregation component comprises a Reduce component, so that power utilization data are processed comprehensively and rapidly to achieve real-time data processing.

5. The intelligent power utilization platform system according to claim 4, wherein the data parallel mining mechanism classifies and aggregates the queried power utilization data in combination with specific services, and finally summarizes the result data and persists it to a file or a database; the data extraction component preloads the original data to be analyzed from the database, in blocks and in parallel, according to the loading rules configured by a data loading manager and according to terminal type and power supply unit conditions, and places the original data in a memory cache; the data loading manager monitors the state of each data set in real time and, if a data set is found to be invalid, gives notice that the corresponding original data are to be reloaded; the data distribution component sorts the extracted original data and distributes the sorted data to the Map component by balanced dispatching; the Map component segments the original data at a finer granularity according to the configured business rules, iterating over the same data set multiple times to finally generate an intermediate result; the Reduce component performs repeated iterative computation and merge operations on the intermediate result according to the configured service analysis rules to generate a final result, which is also stored in the memory cache; and the data persistence component is responsible for writing the final result to the database.
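The component chain of claim 5 can be sketched in Python as follows. This is a single-process illustration under stated assumptions, not the claimed implementation: the record layout, the function names, the block size, and the table name `usage_totals` are hypothetical; a standard-library thread pool stands in for balanced dispatching to Map workers; an in-memory SQLite database stands in for the target database; and the data loading manager and repeated iteration are omitted.

```python
import sqlite3
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical raw records: (power_supply_unit, terminal_type, kwh)
RAW = [
    ("unit_a", "residential", 3.0),
    ("unit_a", "industrial", 40.0),
    ("unit_b", "residential", 2.5),
    ("unit_b", "industrial", 55.0),
    ("unit_a", "residential", 4.0),
]


def extract(records, block_size):
    """Data extraction: split the raw data into blocks held in memory."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map: classify each record by a business key and emit (key, value)."""
    return [((unit, kind), kwh) for unit, kind, kwh in block]


def reduce_pairs(pairs):
    """Reduce: merge the intermediate pairs into one total per key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def persist(totals, conn):
    """Data persistence: write the final result to the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage_totals (unit TEXT, kind TEXT, kwh REAL)"
    )
    rows = [(unit, kind, kwh) for (unit, kind), kwh in sorted(totals.items())]
    conn.executemany("INSERT INTO usage_totals VALUES (?, ?, ?)", rows)
    conn.commit()


def run(records, conn, block_size=2, workers=2):
    blocks = extract(records, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Data distribution: the executor hands blocks to idle Map workers.
        mapped = pool.map(map_block, blocks)
        pairs = [pair for part in mapped for pair in part]
    totals = reduce_pairs(pairs)
    persist(totals, conn)
    return totals


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run(RAW, connection))
```

Keeping each stage as a separate function mirrors the component boundaries in the claim, so that any one stage (for example, the persistence target) could be swapped without touching the others.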

6. The intelligent power utilization platform system according to claim 1, wherein the communication module employs a high-concurrency communication technology implemented as follows: when a connection sends a request to a server, the server treats the request as an 'event' and dispatches the 'event' to the corresponding processing function; the processing function is then placed in a thread for execution, and the thread is returned once execution is completed, so that one thread can asynchronously process a plurality of events.
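The event dispatching described in claim 6 can be illustrated with the following Python sketch. It assumes a registry that maps event types to processing functions and uses the standard-library `ThreadPoolExecutor` so that a thread goes back to the pool after each handler finishes; the class name `EventDispatcher`, the event types, and the handlers are hypothetical, and network I/O is left out.

```python
from concurrent.futures import ThreadPoolExecutor


class EventDispatcher:
    """Routes each request 'event' to its registered processing function."""

    def __init__(self, max_workers=4):
        self._handlers = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def register(self, event_type, handler):
        self._handlers[event_type] = handler

    def dispatch(self, event_type, payload):
        """Run the handler in a pooled thread and return a Future.

        The thread returns to the pool when the handler completes, so the
        same thread can go on to process later events.
        """
        handler = self._handlers.get(event_type)
        if handler is None:
            raise KeyError(f"no handler for event type {event_type!r}")
        return self._pool.submit(handler, payload)

    def shutdown(self):
        self._pool.shutdown(wait=True)


# Hypothetical processing functions.
def handle_login(payload):
    return f"login ok: {payload['meter_id']}"


def handle_reading(payload):
    return f"stored {payload['kwh']} kWh"


if __name__ == "__main__":
    dispatcher = EventDispatcher(max_workers=2)
    dispatcher.register("login", handle_login)
    dispatcher.register("reading", handle_reading)
    futures = [
        dispatcher.dispatch("login", {"meter_id": "M1"}),
        dispatcher.dispatch("reading", {"kwh": 1.5}),
    ]
    for future in futures:
        print(future.result())
    dispatcher.shutdown()
```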

7. The intelligent power utilization platform system according to claim 1, wherein the communication module adopts a Zookeeper multi-service-node cooperative architecture implemented as follows: in general, read and write requests from clients are scheduled by a master leader; through a scheduling mechanism the master leader knows whether each server is alive and performs overall scheduling based on load information, thereby balancing the load across servers; the master leader is chosen by the Zookeeper election algorithm, so that even if the master leader fails, the system can elect another master leader to provide the scheduling service.
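The leader failover and load balancing behavior of claim 7 can be simulated in plain Python as follows. The sketch does not call Zookeeper and does not reproduce its internal election protocol; as a simplified stand-in it elects the live node with the lowest join sequence number, in the spirit of the well-known leader election recipe built on sequential nodes. The class `Cluster`, its methods, and the least-loaded scheduling rule are illustrative assumptions.

```python
class Cluster:
    """In-memory simulation of service nodes coordinated by one leader."""

    def __init__(self):
        self._next_seq = 0
        self.nodes = {}  # name -> {"seq": int, "alive": bool, "load": int}

    def join(self, name):
        """Register a service node with the next sequence number."""
        self.nodes[name] = {"seq": self._next_seq, "alive": True, "load": 0}
        self._next_seq += 1

    def fail(self, name):
        """Mark a node as failed (for example, its session expired)."""
        self.nodes[name]["alive"] = False

    def leader(self):
        """Elect the live node with the lowest sequence number."""
        alive = [(info["seq"], name)
                 for name, info in self.nodes.items() if info["alive"]]
        if not alive:
            raise RuntimeError("no live nodes")
        return min(alive)[1]

    def schedule(self, request):
        """Leader-side scheduling: send the request to the least-loaded live node.

        The request itself is not inspected in this sketch.
        """
        live = [(info["load"], info["seq"], name)
                for name, info in self.nodes.items() if info["alive"]]
        if not live:
            raise RuntimeError("no live nodes")
        _, _, target = min(live)
        self.nodes[target]["load"] += 1
        return target


if __name__ == "__main__":
    cluster = Cluster()
    for node in ("node_a", "node_b", "node_c"):
        cluster.join(node)
    print(cluster.leader())
    print([cluster.schedule(f"req{i}") for i in range(3)])
    cluster.fail("node_a")
    print(cluster.leader())
```

After `node_a` fails, the next live node in sequence order takes over as leader, so scheduling continues without manual intervention.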

8. The intelligent power utilization platform system according to claim 1, wherein the communication module adopts a multithreading technology for a single service node, implemented as follows: a thread pool comprises a thread pool manager, worker threads, a task interface and a task queue; one program can create a plurality of service nodes, each service node being a process; each process can create a plurality of thread pools, and each thread pool is composed of a plurality of independent threads; a thread is the smallest unit of system resources in program execution; in large-scale concurrent user communication, the communication connections of a large number of clients need to be processed, and each connection is processed by one thread.
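The four thread pool parts named in claim 8 can be sketched in Python using only the standard `threading` and `queue` modules. The class names `Task`, `ConnectionTask`, and `ThreadPoolManager` are hypothetical, real socket handling is replaced by a placeholder result, and error handling inside workers is omitted for brevity.

```python
import queue
import threading


class Task:
    """Task interface: concrete tasks implement run()."""

    def run(self):
        raise NotImplementedError


class ConnectionTask(Task):
    """Hypothetical task that handles one client connection."""

    def __init__(self, conn_id, results):
        self.conn_id = conn_id
        self.results = results

    def run(self):
        # Placeholder for reading from and replying to the connection.
        self.results.put((self.conn_id, f"handled {self.conn_id}"))


class ThreadPoolManager:
    """Thread pool manager owning the task queue and the worker threads."""

    def __init__(self, size):
        self._tasks = queue.Queue()  # task queue
        self._workers = [
            threading.Thread(target=self._work, daemon=True) for _ in range(size)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        """Worker thread loop: take tasks until a None sentinel arrives."""
        while True:
            task = self._tasks.get()
            try:
                if task is None:
                    return
                task.run()
            finally:
                self._tasks.task_done()

    def submit(self, task):
        self._tasks.put(task)

    def shutdown(self):
        """Wait for queued tasks, then stop every worker."""
        self._tasks.join()
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    results = queue.Queue()
    pool = ThreadPoolManager(size=3)
    for conn_id in range(10):
        pool.submit(ConnectionTask(conn_id, results))
    pool.shutdown()
    print(sorted(results.get() for _ in range(10)))
```

Each connection's task is executed by exactly one worker thread, while the fixed pool size bounds the number of threads regardless of how many connections are queued.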