CN114064997B - An artificial intelligence power dispatch decision-making system based on big data - Google Patents

Info

Publication number
CN114064997B
Authority
CN
China
Prior art keywords
data
module
information
professional
crawling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111310992.4A
Other languages
Chinese (zh)
Other versions
CN114064997A (en
Inventor
杨斌
周航
邓星
潘小辉
孙佳炜
胡健
宋冰倩
马楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202111310992.4A
Publication of CN114064997A
Application granted
Publication of CN114064997B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an artificial intelligence power dispatching decision system based on big data. The system includes a data crawling module, developed using Focused Crawler software, that searches domestic and foreign industry websites and multimedia for data sources related to the power industry. Through the cooperation of the data crawling module and the database storage module, the system collects professional information about the power industry from the Internet. The database management module and the data processing and analysis module then manage and classify the collected data to build a professional data information database. In addition, through the cooperation of the information crawling time identification module and the latest information push module, valid information related to the power industry can be retrieved from the professional data information database through a Web page, and the latest valid information is pushed intelligently, so as to support intelligent power dispatching decisions.

Description

An artificial intelligence power dispatching decision system based on big data

Technical Field

The present invention relates to the technical field of power dispatching, and in particular to an artificial intelligence power dispatching decision system based on big data.

Background Art

Power dispatching is a management measure for ensuring the safe and stable operation of the power grid, reliable power supply, and orderly power production. The replacement, improvement, and development of professional power dispatching support systems are closely tied to national policy and industry trends. At present, a large body of policy documents, technical innovation papers, and the latest industry standards serves as the basis for power dispatching decisions.

With the rapid development of the Internet era, modern information technologies such as big data and artificial intelligence can be used to integrate and analyze big data on power technology and economics, expert big data, regional power industry big data, and Internet power information resources. However, in daily work, existing power dispatching technical support systems lack the means to effectively retrieve, intelligently classify and push, and extract key information from documents issued by higher-level authorities and from the latest industry developments, which impairs effective power dispatching decisions.

Summary of the Invention

To achieve the above purpose, the present invention is implemented through the following technical solution: an artificial intelligence power dispatching decision system based on big data, comprising:

Data crawling module: developed using Focused Crawler software; searches domestic and foreign industry websites and multimedia for data sources related to the power industry;

Database storage module: stores the data sources obtained by the data crawling module;

Database management module: uses the intelligent management of a MySQL database to process the data sources in the database storage module; the processing includes filtering useless information, deduplication and error correction, tag extraction, and data security management;

Data processing and analysis module: analyzes and processes the data in the database storage module using Python, and applies the Sharksearch algorithm to analyze the relevance of the data to the topic, thereby classifying the data sources in the database storage module;

Professional data information database: built from the data classified by the data processing and analysis module, with information data matched through regular expressions; a professional data information database related to the power industry that stores and manages the classified data sources;

Index generation unit: generates an index from the data sources classified in the professional data information database;

Web page: used to enter the keywords to be searched for in the professional data information database;

Retrieval optimization module: optimizes the information retrieved through the Web page and presents recommendations;

Data statistics module: performs statistical analysis on the data sources collected in the professional data information database;

Power dispatching information visualization platform: established from the statistics that the data statistics module compiles on the data sources in the professional data module; the platform includes a regional distribution module, a data classification proportion module, and a data download record module;

Regional distribution module: compiles statistics on the data sources of each region;

Data classification proportion module: compiles per-category statistics on the data sources according to the categories into which they are divided;

Data download record module: records and counts the data sources downloaded from the professional data information database;

Information crawling time identification module: identifies the time at which each crawled data source went online;

Latest information push module: automatically pushes newly published information related to the power industry to the Web page.

Further, in the data crawling module, the data crawling process includes the following steps:

S1. Initialize the sequence of URLs to be crawled, using the Focused Crawler software;

S2. Use the sequence of URLs to be crawled to search power industry websites and multimedia pages;

S3. When the search returns no page, data crawling ends; when a page related to the power industry is found, download the page content corresponding to the URL;

S4. When the page downloads successfully, add the URL to the crawled list and continue to analyze and process the document. When the download fails, decide whether to retry: if no retry is needed, add the URL to the crawled list and add the relevant exception handling and log records; if a retry is needed, repeat S2-S3;

S5. When adding the URL to the crawled list, decide whether to collect the page; if the page is to be collected, treat it as valid data and store it in the database storage module;

S6. When the page is not to be collected, determine whether the URL is a valuable URL, and continue to repeat S2-S5.

Further, the data sources obtained by the data crawling module include power industry policies, papers, achievements, technologies, research and development directions, products, and planning information.

Further, the data categories in the data classification proportion module include standards, literature, software copyrights, and patents.

Further, the regional distribution module, the data classification proportion module, and the data download record module are connected to one another in pairs.

Further, there are multiple Web pages, and the Web pages are connected to one another over a network.

Compared with the prior art, the present invention has the following beneficial effects:

1. Through the cooperation of the data crawling module and the database storage module, the system collects professional information about the power industry from the Internet. The database management module and the data processing and analysis module manage and classify the collected data to obtain more specialized data sources and thereby build a professional data information database. In addition, through the cooperation of the information crawling time identification module and the latest information push module, valid information related to the power industry can be retrieved from the professional data information database through the Web page, and the latest valid information is pushed intelligently, so as to support intelligent power dispatching decisions.

2. Through the data collection method of the data crawling module, the system automatically collects professional information related to the power industry from the Internet in real time, which helps in obtaining the latest power industry policies, industry standards, and related information. The data statistics module works with the professional data information database to perform statistical analysis on the information it contains, thereby establishing a power dispatching information visualization analysis platform. Through the cooperation of the regional distribution module, the data classification proportion module, and the data download record module within that platform, a professional database prediction system can be formed, supporting predictive analysis and intelligent decision-making for power dispatching.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the establishment of the professional data information database and the power dispatching information visualization platform of the present invention;

FIG. 2 is a flow chart of the data crawling method of the data crawling module of the present invention;

FIG. 3 is a schematic diagram of the power dispatching information visualization analysis platform of the present invention;

FIG. 4 is a schematic diagram of the professional data information database and the pushing of the latest information to the Web page of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

An embodiment of the artificial intelligence power dispatching decision system based on big data is as follows:

Referring to FIGS. 1 to 4, an artificial intelligence power dispatching decision system based on big data includes:

Data crawling module: developed using Focused Crawler software; searches domestic and foreign industry websites and multimedia for data sources related to the power industry. Through network channels such as the State Grid Library, China National Knowledge Infrastructure (CNKI), universities, government sites, and self-media public accounts, it comprehensively collects valid information on power dispatching policies, methods, and technologies, as well as on power network security, grid development, technological innovation, and products;

Database storage module: stores the data sources obtained by the data crawling module;

Database management module: uses the intelligent management of a MySQL database to process the data sources in the database storage module; the processing includes filtering useless information, deduplication and error correction, tag extraction, and data security management, ensuring that the data sources are stored safely and used reasonably and effectively;

Data processing and analysis module: uses Python to apply the large volume of professional information crawled by the data crawling module to data analysis and big data research, and applies the Sharksearch algorithm to analyze the relevance of the data to the topic, thereby classifying the data sources in the database storage module;

Professional data information database: built from the data classified by the data processing and analysis module, with information data matched through regular expressions; a professional data information database related to the power industry that stores and manages the classified data sources, so that the collected data sources are categorized and stored systematically;

Index generation unit: generates an index from the data sources classified in the professional data information database, so that specific information in the database can be retrieved quickly;

Web page: used to enter the keywords to be searched for in the professional data information database. There are multiple Web pages, connected to one another over a network, through which valid information on the advantages of the power industry can be pushed and shared;

Retrieval optimization module: optimizes the information retrieved through the Web page and presents recommendations;

Data statistics module: performs statistical analysis on the data sources collected in the professional data information database; the power dispatching information visualization platform is established through real-time monitoring and statistical analysis of those data sources;

Power dispatching information visualization platform: established from the statistics that the data statistics module compiles on the data sources in the professional data module; the platform includes a regional distribution module, a data classification proportion module, and a data download record module;

Regional distribution module: compiles statistics on the data sources of each region;

Data classification proportion module: compiles per-category statistics on the data sources according to the categories into which they are divided;

Data download record module: records and counts the data sources downloaded from the professional data information database;

Using the regional distribution module together with the data classification proportion module, the number of power industry data sources in a given region and the share of each category among them can be analyzed. Using the regional distribution module together with the data download record module and the data classification proportion module, the number of power industry data sources downloaded in a given region and the share of each category among those downloads can be analyzed. From the statistics compiled by the regional distribution module, the data classification proportion module, and the data download record module, a professional database prediction system can be formed, supporting predictive analysis and intelligent decision-making for power dispatching.

Information crawling time identification module: identifies the time at which each crawled data source went online, in order to determine when the data source crawled by the data crawling module first became valid;

Latest information push module: automatically pushes newly published information related to the power industry to the Web page. Working with the professional data information database, the information crawling time identification module allows the latest power industry information to be pushed intelligently to the Web page through the latest information push module.
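The cooperation between the time identification module and the push module can be illustrated with a small selection routine. This is a minimal sketch under stated assumptions, not the patented implementation: the function name latest_items, the record fields "title" and "published", the seven-day window, and the item limit are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta


def latest_items(items, now, window=timedelta(days=7), limit=5):
    """Return the most recently published items, newest first.

    `items` is an iterable of dicts with a "published" datetime
    (a hypothetical stand-in for the identified online time).
    """
    # Keep only items published within the window and not in the future.
    fresh = [
        item for item in items
        if item["published"] <= now and now - item["published"] <= window
    ]
    # Newest first; sorted() leaves the caller's list untouched.
    return sorted(fresh, key=lambda item: item["published"], reverse=True)[:limit]


sample = [
    {"title": "a", "published": datetime(2021, 11, 7)},
    {"title": "b", "published": datetime(2021, 10, 1)},
    {"title": "c", "published": datetime(2021, 11, 5)},
]
pushed = latest_items(sample, now=datetime(2021, 11, 8))
```

Passing the current time in as a parameter keeps the selection deterministic, which makes the push rule easy to check against known data.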

In the data crawling module, the data crawling process includes the following steps:

S1. Initialize the sequence of URLs to be crawled, using the Focused Crawler software;

S2. Use the sequence of URLs to be crawled to search power industry websites and multimedia pages;

S3. When the search returns no page, data crawling ends; when a page related to the power industry is found, download the page content corresponding to the URL;

S4. When the page downloads successfully, add the URL to the crawled list and continue to analyze and process the document. When the download fails, decide whether to retry: if no retry is needed, add the URL to the crawled list and add the relevant exception handling and log records; if a retry is needed, repeat S2-S3;

S5. When adding the URL to the crawled list, decide whether to collect the page; if the page is to be collected, treat it as valid data and store it in the database storage module;

S6. When the page is not to be collected, determine whether the URL is a valuable URL, and continue to repeat S2-S5.
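Steps S1 to S6 can be sketched as a small frontier loop. This is an illustration under stated assumptions rather than the patented implementation: the names crawl, fetch, is_relevant, and extract_links, the retry limit, and the choice to follow the links of every downloaded page are all hypothetical. The download function is injected so that the loop can be exercised without network access.

```python
import logging
from collections import deque

logger = logging.getLogger("crawler_sketch")


def crawl(seed_urls, fetch, is_relevant, extract_links, max_retries=1):
    """Breadth-first crawl loop mirroring steps S1-S6 (assumed mapping)."""
    frontier = deque(seed_urls)      # S1: initial sequence of URLs to crawl
    crawled = set()                  # URLs already handled
    collected = {}                   # URL -> page kept as valid data
    failed = []                      # URLs given up on after retries

    while frontier:                  # S3: crawling ends when nothing is left
        url = frontier.popleft()     # S2: take the next URL to visit
        if url in crawled:
            continue
        page = None
        for attempt in range(max_retries + 1):   # S4: download, with retries
            try:
                page = fetch(url)
                break
            except Exception as exc:             # S4: exception handling + log
                logger.warning("download failed for %s (attempt %d): %s",
                               url, attempt + 1, exc)
        crawled.add(url)             # S4: add to the crawled list either way
        if page is None:
            failed.append(url)
            continue
        if is_relevant(page):        # S5: collect the page as valid data
            collected[url] = page
        for link in extract_links(page):         # S6: keep following links
            if link not in crawled:
                frontier.append(link)
    return collected, crawled, failed
```

In a real deployment `fetch` would wrap an HTTP client and `is_relevant` would apply the topic relevance check; here they are plain callables so the control flow stays visible.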

The data sources obtained by the data crawling module include power industry policies, papers, achievements, technologies, research and development directions, products, and planning information.

The data categories in the data classification proportion module include standards, literature, software copyrights, and patents.

The regional distribution module, the data classification proportion module, and the data download record module are connected to one another in pairs.

There are multiple Web pages, and the Web pages are connected to one another over a network.

In use, the data crawling module, developed using the Focused Crawler software, first obtains and collects professional information related to the power industry, including pictures, text, and links, through network channels such as power industry websites and multimedia platforms. It gathers valid information on power dispatching policies, methods, and technologies, as well as on power network security, grid development, technological innovation, and products.

The information collected by the data crawling module is stored and managed in the database storage module. Through the intelligent management of the MySQL database, the database management module processes the data sources in the database storage module: it filters out invalid information, removes duplicates and corrects errors, extracts tags, and manages the safe storage and the reasonable and effective use of the data sources.
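The filtering, deduplication, and tag extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions: SQLite from the Python standard library stands in for the MySQL database named in the text so that the example is self-contained, and the table layout, the minimum-length filter, the content hash used for deduplication, and the TAG_KEYWORDS map are all hypothetical.

```python
import hashlib
import sqlite3

# Hypothetical keyword map used for tag extraction.
TAG_KEYWORDS = {
    "policy": ("policy", "regulation"),
    "standard": ("standard",),
    "technology": ("technology", "algorithm"),
}


def clean_and_store(conn, records, min_length=20):
    """Filter, deduplicate, tag, and store (url, body) records.

    Returns the number of rows actually added to the table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source ("
        "digest TEXT PRIMARY KEY, url TEXT, body TEXT, tags TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
    for url, body in records:
        text = " ".join(body.split())          # normalise whitespace
        if len(text) < min_length:             # filter useless fragments
            continue
        lowered = text.lower()
        # Identical normalised content yields the same primary key,
        # so duplicates are dropped by INSERT OR IGNORE.
        digest = hashlib.sha256(lowered.encode("utf-8")).hexdigest()
        tags = sorted(
            tag for tag, words in TAG_KEYWORDS.items()
            if any(word in lowered for word in words)
        )
        conn.execute(
            "INSERT OR IGNORE INTO source VALUES (?, ?, ?, ?)",
            (digest, url, text, ",".join(tags)),
        )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM source").fetchone()[0]
    return after - before
```

With MySQL, the same idea would typically rely on a unique key over the digest column; the parameterised queries shown here also avoid building SQL from crawled text.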

Using Python, the large volume of professional information in the database storage module is applied to data analysis and big data research. The Sharksearch algorithm is used to strengthen the relevance of the collected information to the topic, and the data sources in the database storage module are classified on that basis.
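The relevance analysis can be illustrated with a simplified scoring routine in the spirit of the Shark-Search heuristic, which combines a score inherited from the parent page with a score from the link's anchor text and surrounding context. This is a sketch under stated assumptions, not the patent's implementation: term-frequency cosine similarity is used as the similarity measure, and the parameter names delta, beta, and gamma, their default values, and the classify helper are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def potential_score(topic, parent_text, parent_inherited,
                    anchor_text, anchor_context,
                    delta=0.5, beta=0.8, gamma=0.5):
    """Estimate how promising a link is for the given topic."""
    query = term_vector(topic)
    parent_sim = cosine(query, term_vector(parent_text))
    # A relevant parent passes on its own similarity; otherwise the
    # parent's inherited score is passed on, decayed by delta.
    inherited = delta * (parent_sim if parent_sim > 0 else parent_inherited)
    anchor = cosine(query, term_vector(anchor_text))
    # If the anchor text itself matches, the context counts as fully relevant.
    context = 1.0 if anchor > 0 else cosine(query, term_vector(anchor_context))
    neighbourhood = beta * anchor + (1 - beta) * context
    return gamma * inherited + (1 - gamma) * neighbourhood


def classify(text, topics):
    """Assign text to the topic description it is most similar to."""
    vector = term_vector(text)
    return max(topics, key=lambda name: cosine(vector, term_vector(topics[name])))
```

The same similarity function serves both purposes named in the text: ranking links during crawling and assigning stored documents to a category.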

From the classified data sources in the database storage module, and by matching information data with regular expressions, a robust internal professional information database, namely the professional data information database, is established, and the index generation unit generates an index for it.
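Regular-expression matching and index generation can be sketched together. This is a minimal illustration under stated assumptions: the pattern targets standard numbers written in the style "GB/T 14285-2006" or "DL/T 5003-2017" and is only one example of a field that might be matched, and the function names and the simple inverted index are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern for standard numbers such as "GB/T 14285-2006".
STANDARD_NUMBER = re.compile(r"\b(?:GB/T|DL/T)\s?\d+(?:\.\d+)?-\d{4}\b")


def extract_standard_numbers(text):
    """Return every standard number found in the text, in order."""
    return STANDARD_NUMBER.findall(text)


def build_index(documents):
    """Build an inverted index: term -> set of document ids.

    `documents` maps a document id to its text.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return dict(index)


docs = {
    "d1": "Grid dispatch policy",
    "d2": "Dispatch standard GB/T 14285-2006",
}
index = build_index(docs)
```

An inverted index of this kind lets a keyword be resolved to matching documents with a single dictionary lookup instead of a scan over every stored text.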

The data sources in the professional data information database can be searched through the Web page, and the retrieval optimization module optimizes the presentation of the retrieved information, making it easy to look up, retrieve, and download.
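Keyword retrieval with a simple ranking step can be sketched as follows. This is an illustration under stated assumptions: the index is taken to be a mapping from terms to sets of document ids, and ranking by the number of matched query terms is a hypothetical stand-in for whatever optimization the retrieval optimization module applies.

```python
import re
from collections import Counter


def search(index, query, limit=10):
    """Return document ids ranked by how many query terms they match."""
    hits = Counter()
    # Deduplicate query terms so a repeated word is not counted twice.
    for term in set(re.findall(r"[a-z0-9]+", query.lower())):
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Most matched terms first; ties broken by document id for stability.
    ranked = sorted(hits.items(), key=lambda pair: (-pair[1], pair[0]))
    return [doc_id for doc_id, _ in ranked[:limit]]


sample_index = {
    "dispatch": {"d1", "d2"},
    "policy": {"d2"},
    "grid": {"d3"},
}
results = search(sample_index, "dispatch policy")
```

The deterministic tie-break keeps result order stable between identical queries, which matters when results are shown as recommendations.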

The data statistics module establishes the power dispatching information visualization analysis platform from the data sources in the professional data information database. The platform includes a regional distribution module, a data classification proportion module, and a data download record module. Using the regional distribution module together with the data classification proportion module, the number of power industry data sources in a given region and the share of each category among them can be analyzed. Using the regional distribution module together with the data download record module and the data classification proportion module, the number of power industry data sources downloaded in a given region and the share of each category among those downloads can be analyzed.
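The per-region counts and per-category shares described above reduce to simple aggregations. This is a minimal sketch under stated assumptions: each data source is represented as a dict with hypothetical "region" and "category" fields, and the region names in the sample are placeholders.

```python
from collections import Counter


def region_counts(records):
    """Number of data sources per region."""
    return Counter(record["region"] for record in records)


def category_share(records, region=None):
    """Share of each category, optionally restricted to one region."""
    selected = [
        record for record in records
        if region is None or record["region"] == region
    ]
    counts = Counter(record["category"] for record in selected)
    total = sum(counts.values())
    # An empty selection yields an empty result instead of dividing by zero.
    return {category: n / total for category, n in counts.items()} if total else {}


sample_records = [
    {"region": "Jiangsu", "category": "patent"},
    {"region": "Jiangsu", "category": "standard"},
    {"region": "Jiangsu", "category": "patent"},
    {"region": "Zhejiang", "category": "literature"},
]
per_region = region_counts(sample_records)
jiangsu_share = category_share(sample_records, region="Jiangsu")
```

Applying the same two functions to download records instead of stored records would give the download-side statistics.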

根据地区分布模块、数据分类占比模块和数据下载记录模块中对数据源的统计,可形成专业的数据库预测体系,从而对电力调度做出预测分析和智能决策。According to the statistics of data sources in the regional distribution module, data classification proportion module and data download record module, a professional database prediction system can be formed to make predictive analysis and intelligent decision-making for power dispatching.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the appended claims and their equivalents.

Claims (6)

1. An artificial intelligence power dispatch decision-making system based on big data, characterized by comprising:
a data crawling module: developed with Focused Crawler software and used to search for and obtain data sources related to the power industry from domestic and foreign industry websites and multimedia;
a database storage module: used to store the data sources acquired by the data crawling module;
a database management module: uses the intelligent management of a MySQL database to process the data sources in the database storage module, the processing comprising filtering of useless information, deduplication and error correction, tag extraction, and data security management;
a data processing and analysis module: analyzes and processes the data in the database storage module based on Python and, in combination with the Shark-Search algorithm, analyzes the correlation between the data information and the topic, thereby classifying the data sources in the database storage module;
a professional data information base: a power-industry information base, built from the data classified by the data processing and analysis module by matching the information data with regular expressions, and used to store and manage the classified data sources;
an index generation unit: used to generate an index from the classified data sources in the professional data information base;
a Web page: used to enter the keyword information to be searched for in the professional data information base;
a retrieval optimization module: used to optimize and recommend the information retrieved through the Web page;
a data statistics module: used to perform statistical analysis on the data sources collected in the professional data information base;
a power dispatch information visualization platform: built from the data statistics module's statistics on the data sources in the professional data information base, and internally comprising a region distribution module, a data classification proportion module and a data download record module;
a region distribution module: used to compile statistics on the data sources of each region;
a data classification proportion module: used to classify and count the data sources by category;
a data download record module: used to record and count the data sources downloaded from the professional data information base;
an information crawling time identification module: used to identify the time at which a crawled data source went online;
a latest information push module: used to automatically push newly released power-industry information to the Web page.
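The topic-correlation analysis attributed to the Shark-Search algorithm in claim 1 can be sketched in simplified form. The Python code below follows the general shape of Shark-Search link scoring (an inherited score from the parent page combined with a neighbourhood score from the anchor text and its context), using a plain term-frequency cosine similarity. It is a simplification under stated assumptions, not the patent's implementation: the parameter values `delta`, `beta` and `gamma`, the function names and the sample texts are all chosen for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase term-frequency vector of a text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = term_vector(a), term_vector(b)
    dot = sum(va[term] * vb[term] for term in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def link_score(topic, parent_text, parent_inherited, anchor_text, anchor_context,
               delta=0.5, beta=0.8, gamma=0.5):
    """Simplified Shark-Search style score for a link found on a parent page."""
    parent_relevance = cosine_similarity(topic, parent_text)
    # A relevant parent passes on its own relevance, decayed by delta;
    # otherwise the parent's inherited score is decayed further.
    if parent_relevance > 0:
        inherited = delta * parent_relevance
    else:
        inherited = delta * parent_inherited
    anchor = cosine_similarity(topic, anchor_text)
    # If the anchor text itself is relevant, its context is treated as fully relevant.
    context = 1.0 if anchor > 0 else cosine_similarity(topic, anchor_context)
    neighbourhood = beta * anchor + (1 - beta) * context
    return gamma * inherited + (1 - gamma) * neighbourhood


topic = "power dispatch"
relevant = link_score(topic, "power dispatch centre news", 0.0,
                      "dispatch rules", "read the dispatch rules")
irrelevant = link_score(topic, "football results", 0.0,
                        "match report", "see the match report")
```

A score like this can serve both to prioritize links during crawling and, with a threshold, to decide whether a data source belongs to a given topic category.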
2. The artificial intelligence power dispatch decision-making system based on big data of claim 1, wherein, in the data crawling module, the data crawling process comprises the following steps:
S1: initializing the sequence of URLs to be crawled through the Focused Crawler software;
S2: searching power industry websites and multimedia pages using the sequence of URLs to be crawled;
S3: when the search returns no page, ending the data crawl; when a page related to the power industry is found, downloading the page content corresponding to the URL;
S4: when the page download succeeds, adding the URL to the crawled list and continuing to parse and process the document; when the page download fails, choosing whether to retry; when no retry is needed, adding the URL to the crawled list and adding the related exception handling and log records; when a retry is needed, repeating S2-S3;
S5: once the URL has been added to the crawled list, choosing whether to collect the page; when the page needs to be collected, treating it as valid data and storing it in the database storage module;
S6: when collection is not needed, judging the URL to be a valuable URL and continuing to repeat steps S2-S5.
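Steps S1 to S6 can be sketched as a small crawl loop. The Python code below is an illustration under assumptions rather than the claimed implementation: the `crawl` function, its injected callbacks (`fetch`, `is_relevant`, `extract_links`), the `max_retries` parameter and the in-memory `pages` dictionary standing in for the web are all hypothetical. As a simplification, the sketch follows links from every page that downloads successfully.

```python
from collections import deque


def crawl(seed_urls, fetch, is_relevant, extract_links, max_retries=1):
    """Crawl from the seed URLs; return (collected pages, crawled URLs, failures)."""
    to_crawl = deque(seed_urls)          # S1: initialise the URL sequence
    crawled = set()
    collected = {}
    failures = []
    while to_crawl:                      # S3: stop when nothing is left to visit
        url = to_crawl.popleft()         # S2: take the next URL
        if url in crawled:
            continue
        page, last_error = None, None
        for _ in range(max_retries + 1):  # S4: download, retrying on failure
            try:
                page = fetch(url)
                break
            except OSError as error:
                last_error = error
        crawled.add(url)                 # S4: record the URL either way
        if page is None:
            failures.append((url, str(last_error)))  # S4: log the failure
            continue
        if is_relevant(page):            # S5: collect pages judged to be valid data
            collected[url] = page
        for link in extract_links(page):  # S6: keep following discovered URLs
            if link not in crawled:
                to_crawl.append(link)
    return collected, crawled, failures


# In-memory stand-in for the web: url -> (page text, outgoing links).
pages = {
    "a": ("power grid dispatch", ["b", "c"]),
    "b": ("cooking recipes", []),
    "c": ("power market policy", ["a", "d"]),
}


def fetch(url):
    """Return the stored page or raise OSError, mimicking a failed download."""
    if url not in pages:
        raise OSError("not found: " + url)
    return pages[url]


collected, crawled, failures = crawl(
    ["a"],
    fetch,
    is_relevant=lambda page: "power" in page[0],
    extract_links=lambda page: page[1],
)
```

Passing `fetch` and the relevance test in as callbacks keeps the loop independent of any particular HTTP library or relevance measure, so the same control flow can be exercised without network access.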
3. The artificial intelligence power dispatch decision-making system based on big data of claim 1, wherein the data sources acquired by the data crawling module comprise policies, papers, achievements, technologies, research and development directions, products and planning information of the power industry.
4. The artificial intelligence power dispatch decision-making system based on big data of claim 1, wherein the categories used by the data classification proportion module comprise standards, literature, books and patents.
5. The artificial intelligence power dispatch decision-making system based on big data of claim 1, wherein the region distribution module, the data classification proportion module and the data download record module are connected to one another in pairs.
6. The artificial intelligence power dispatch decision-making system based on big data of claim 1, wherein there are multiple Web pages, and the Web pages are connected to one another over a network.
CN202111310992.4A 2021-11-08 2021-11-08 An artificial intelligence power dispatch decision-making system based on big data Active CN114064997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111310992.4A CN114064997B (en) 2021-11-08 2021-11-08 An artificial intelligence power dispatch decision-making system based on big data

Publications (2)

Publication Number Publication Date
CN114064997A CN114064997A (en) 2022-02-18
CN114064997B true CN114064997B (en) 2024-11-05

Family

ID=80274432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111310992.4A Active CN114064997B (en) 2021-11-08 2021-11-08 An artificial intelligence power dispatch decision-making system based on big data

Country Status (1)

Country Link
CN (1) CN114064997B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117277566B (en) * 2023-09-20 2024-05-03 国网河南省电力公司濮阳供电公司 Power grid data analysis and power dispatching system and method based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820670A (en) * 2015-03-13 2015-08-05 国家电网公司 Method for acquiring and storing big data of power information
CN104881424A (en) * 2015-03-13 2015-09-02 国家电网公司 Regular expression-based acquisition, storage and analysis method of power big data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100835307B1 (en) * 2006-12-08 2008-06-04 부산대학교 산학협력단 Agent system and method for collecting schedule information on the web
CN110046294A (en) * 2019-03-04 2019-07-23 国网浙江省电力有限公司经济技术研究院 A kind of energy information system based on electric power big data

Also Published As

Publication number Publication date
CN114064997A (en) 2022-02-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant