CN114064997B - An artificial intelligence power dispatch decision-making system based on big data - Google Patents
- Publication number
- CN114064997B (application number CN202111310992.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- module
- information
- professional
- crawling
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Marketing (AREA)
- General Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an artificial intelligence power dispatching decision-making system based on big data. The system includes a data crawling module, developed using Focused Crawler software, that searches domestic and foreign industry websites and multimedia sources for data related to the power industry. Through the cooperation of the data crawling module and the database storage module, the system collects professional power-industry information from the Internet; the database management module and the data processing and analysis module then manage and classify the collected data to build a professional data information database. In addition, through the cooperation of the information crawling time identification module and the latest information push module, valid power-industry information can be retrieved from the professional data information database through a Web page and the latest valid information can be pushed intelligently, supporting intelligent decision-making in power dispatching.
Description
Technical Field
The present invention relates to the technical field of power dispatching, and in particular to an artificial intelligence power dispatching decision-making system based on big data.
Background Art
Power dispatching is a management method used to ensure the safe and stable operation of the power grid, a reliable power supply, and the orderly conduct of power production. The replacement, improvement, and development of professional support systems for power dispatching are closely tied to national policy and industry trends. At present, a large number of policy guidance documents, technical innovation papers, and the latest industry standards serve as the basis for power dispatching decisions.
With the rapid development of the Internet era, modern information technologies such as big data and artificial intelligence can be used to integrate, analyze, and exploit big data on power technology and economics, expert big data, regional power industry big data, and Internet power information resources. However, in daily work, existing power dispatching technical support systems lack the means to effectively retrieve documents from higher-level authorities and information on the latest industry developments, to classify and push that information intelligently, and to extract the key valid information from it. This impairs effective power dispatching decisions.
Summary of the Invention
To achieve the above purpose, the present invention adopts the following technical solution: an artificial intelligence power dispatching decision-making system based on big data, comprising:
Data crawling module: developed using Focused Crawler software; searches domestic and foreign industry websites and multimedia sources for data sources related to the power industry;
Database storage module: stores the data sources obtained by the data crawling module;
Database management module: uses the intelligent management of a MySQL database to process the data sources in the database storage module; the processing includes filtering out useless information, deduplication and error correction, tag extraction, and data security management;
Data processing and analysis module: analyzes and processes the data in the database storage module using Python-based network tooling and, in combination with the Sharksearch algorithm, analyzes the relevance of the data to the topic in order to classify the data sources in the database storage module;
Professional data information database: built from the data classified by the data processing and analysis module, with regular expressions used to match the information data; it is a professional database related to the power industry that stores and manages the classified data sources;
Index generation unit: generates an index from the classified data sources in the professional data information database;
Web page: used to enter the keywords to be searched for in the professional data information database;
Retrieval optimization module: optimizes the information retrieved through the Web page and presents it as recommendations;
Data statistics module: performs statistical analysis on the data sources collected in the professional data information database;
Power dispatching information visualization platform: built from the data statistics module's statistics on the data sources in the professional data information database; it contains a regional distribution module, a data classification proportion module, and a data download record module;
Regional distribution module: compiles statistics on the data sources of each region;
Data classification proportion module: compiles statistics on the data sources by category;
Data download record module: records and counts the data sources downloaded from the professional data information database;
Information crawling time identification module: identifies the time at which a crawled data source went online;
Latest information push module: automatically pushes newly published power-industry information to the Web page.
Further, in the data crawling module, data crawling proceeds as follows:
S1. Using the Focused Crawler software, build and initialize the sequence of URLs to be crawled;
S2. Use the sequence of URLs to be crawled to search power industry websites and multimedia pages;
S3. When the search returns no page, data crawling ends; when a page related to the power industry is found, download the page content corresponding to the URL;
S4. When the page downloads successfully, add the URL to the crawled list and continue analyzing and processing the document. When the download fails, decide whether to retry: if no retry is needed, add the URL to the crawled list and perform the relevant exception handling and logging; if a retry is needed, repeat S2-S3;
S5. When the URL is added to the crawled list, decide whether to collect the page; if the page is to be collected, it is treated as valid data and stored in the database storage module;
S6. When the page does not need to be collected, determine whether the URL is a valuable URL, and continue repeating S2-S5.
Further, the data sources obtained by the data crawling module include power industry policies, papers, achievements, technologies, research and development directions, products, and planning information.
Further, the data categories in the data classification proportion module include standards, literature, software copyrights, and patents.
Further, the regional distribution module, the data classification proportion module, and the data download record module are each connected to one another.
Further, there are multiple Web pages, and the Web pages are connected to one another over a network.
Compared with the prior art, the present invention has the following beneficial effects:
1. Through the cooperation of the data crawling module and the database storage module, the system collects professional power-industry information from the Internet. The database management module and the data processing and analysis module manage and classify the collected data to obtain more specialized data sources, from which the professional data information database is built. In addition, through the cooperation of the information crawling time identification module and the latest information push module, valid power-industry information can be retrieved from the professional data information database through the Web page and the latest valid information can be pushed intelligently, supporting intelligent decision-making in power dispatching.
2. Through the data collection method of the data crawling module, the system automatically collects professional power-industry information from the Internet in real time, which helps in obtaining the latest power-industry policies, industry standards, and related information. The data statistics module, working with the professional data information database, performs statistical analysis on the information in the database and thereby establishes the power dispatching information visualization analysis platform. Through the cooperation of the regional distribution module, the data classification proportion module, and the data download record module within that platform, a professional database prediction system can be formed for predictive analysis and intelligent decision-making in power dispatching.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of how the professional data information database and the power dispatching information visualization platform of the present invention are established;
FIG. 2 is a flow chart of the data crawling method of the data crawling module of the present invention;
FIG. 3 is a schematic diagram of the power dispatching information visualization analysis platform of the present invention;
FIG. 4 is a schematic diagram of the professional data information database and the pushing of the latest information to the Web page in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
An embodiment of the artificial intelligence power dispatching decision-making system based on big data is as follows:
Referring to FIGS. 1 to 4, an artificial intelligence power dispatching decision-making system based on big data comprises:
Data crawling module: developed using Focused Crawler software; searches domestic and foreign industry websites and multimedia sources for data sources related to the power industry. Through network channels such as the State Grid Library, China National Knowledge Infrastructure (CNKI), universities, government bodies, and self-media public accounts, it comprehensively collects valid information on power dispatching policies, methods, and technologies, as well as on power network security, power grid development, technological innovation, and products;
Database storage module: stores the data sources obtained by the data crawling module;
Database management module: uses the intelligent management of a MySQL database to process the data sources in the database storage module; the processing includes filtering out useless information, deduplication and error correction, tag extraction, and data security management, ensuring that the data sources are stored safely and used reasonably and effectively;
Data processing and analysis module: uses Python-based network tooling to apply the large volume of professional information crawled by the data crawling module to data analysis and big data research and, in combination with the Sharksearch algorithm, analyzes the relevance of the data to the topic in order to classify the data sources in the database storage module;
Professional data information database: built from the data classified by the data processing and analysis module, with regular expressions used to match the information data; it is a professional database related to the power industry that stores and manages the classified data sources, so that the collected data sources are categorized and stored systematically;
Index generation unit: generates an index from the classified data sources in the professional data information database, so that specific information in the database can be retrieved quickly;
Web page: used to enter the keywords to be searched for in the professional data information database. There are multiple Web pages, connected to one another over a network, so that useful power-industry information can be pushed and shared between them;
Retrieval optimization module: optimizes the information retrieved through the Web page and presents it as recommendations;
Data statistics module: performs statistical analysis on the data sources collected in the professional data information database; the power dispatching information visualization platform is established through real-time monitoring and statistical analysis of these data sources;
Power dispatching information visualization platform: built from the data statistics module's statistics on the data sources in the professional data information database; it contains a regional distribution module, a data classification proportion module, and a data download record module;
Regional distribution module: compiles statistics on the data sources of each region;
Data classification proportion module: compiles statistics on the data sources by category;
Data download record module: records and counts the data sources downloaded from the professional data information database;
Together, the regional distribution module and the data classification proportion module make it possible to analyze the number of power-industry data sources in a given region and the share of each category among them. Together, the regional distribution module, the data download record module, and the data classification proportion module make it possible to analyze the number of power-industry data sources downloaded in a given region and the share of each category among those downloads. From the statistics in these three modules, a professional database prediction system can be formed for predictive analysis and intelligent decision-making in power dispatching.
Information crawling time identification module: identifies the time at which a crawled data source went online, in order to determine when the data source first became valid;
Latest information push module: automatically pushes newly published power-industry information to the Web page. Working with the professional data information database, the information crawling time identification module allows the latest power-industry information to be pushed intelligently to the Web page through the latest information push module.
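As a rough illustration of how the information crawling time identification module and the latest information push module might cooperate, the Python sketch below selects the records that first appeared online after the previous push. The `Record` class, its `first_seen` field, and the `select_for_push` function are hypothetical names introduced for this example; the patent does not specify a data model or a push mechanism.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    title: str
    first_seen: datetime  # hypothetical field: when the source went online


def select_for_push(records, last_push, limit=5):
    """Return records first seen after `last_push`, newest first."""
    fresh = [r for r in records if r.first_seen > last_push]
    fresh.sort(key=lambda r: r.first_seen, reverse=True)
    return fresh[:limit]
```

A caller would store the time of each push and pass it as `last_push` on the next run, so that only newly appeared items reach the Web page.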
In the data crawling module, data crawling proceeds as follows:
S1. Using the Focused Crawler software, build and initialize the sequence of URLs to be crawled;
S2. Use the sequence of URLs to be crawled to search power industry websites and multimedia pages;
S3. When the search returns no page, data crawling ends; when a page related to the power industry is found, download the page content corresponding to the URL;
S4. When the page downloads successfully, add the URL to the crawled list and continue analyzing and processing the document. When the download fails, decide whether to retry: if no retry is needed, add the URL to the crawled list and perform the relevant exception handling and logging; if a retry is needed, repeat S2-S3;
S5. When the URL is added to the crawled list, decide whether to collect the page; if the page is to be collected, it is treated as valid data and stored in the database storage module;
S6. When the page does not need to be collected, determine whether the URL is a valuable URL, and continue repeating S2-S5.
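Steps S1 to S6 can be sketched as a queue-driven loop. The Python below is a minimal illustration and not the patented implementation: `crawl`, `fetch`, `is_relevant`, `extract_links`, and `max_retries` are hypothetical names, downloading is injected as a callable so the sketch runs without network access, and the wording of S6 is terse, so this sketch simply assumes that links found on every downloaded page are queued whether or not the page itself is collected.

```python
from collections import deque


def crawl(seed_urls, fetch, is_relevant, extract_links, max_retries=2):
    """Run a simplified S1-S6 loop and return what was gathered.

    fetch(url) returns the page text, or None when the download fails.
    """
    queue = deque(seed_urls)   # S1: initial sequence of URLs to crawl
    crawled = set()            # the "crawled list"
    collected = {}             # pages kept as valid data (S5)
    failures = []              # stand-in for exception handling and logging (S4)

    while queue:               # S3: crawling ends when nothing is left
        url = queue.popleft()  # S2: take the next URL to search
        if url in crawled:
            continue
        page = None
        for _attempt in range(max_retries + 1):  # S4: retry failed downloads
            page = fetch(url)
            if page is not None:
                break
        crawled.add(url)       # S4: recorded whether or not it succeeded
        if page is None:
            failures.append(url)
            continue
        if is_relevant(page):  # S5: collect the page as valid data
            collected[url] = page
        for link in extract_links(page):  # S6 (assumed): keep following links
            if link not in crawled:
                queue.append(link)
    return {"collected": collected, "failures": failures, "crawled": crawled}
```

Passing the download and relevance checks in as functions keeps the control flow of S1 to S6 visible and lets each piece be replaced independently, for example by a real HTTP client or a topic classifier.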
The data sources obtained by the data crawling module include power industry policies, papers, achievements, technologies, research and development directions, products, and planning information.
The data categories in the data classification proportion module include standards, literature, software copyrights, and patents.
The regional distribution module, the data classification proportion module, and the data download record module are each connected to one another.
There are multiple Web pages, and the Web pages are connected to one another over a network.
In use, the data crawling module, developed using Focused Crawler software, first obtains and collects professional power-industry information, including pictures, text, and links, through network channels such as power industry websites and multimedia platforms. It gathers valid information on power dispatching policies, methods, and technologies, as well as on power network security, power grid development, technological innovation, and products.
The information collected by the data crawling module is stored and managed in the database storage module. Using the intelligent management of the MySQL database, the database management module processes the data sources in the database storage module: it filters out invalid information, deduplicates and corrects errors, extracts tags, and manages the safe storage and reasonable, effective use of the data sources.
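The filtering, deduplication, and tag extraction described here could look roughly like the Python sketch below. It is an assumption-laden illustration: it works on an in-memory list rather than a MySQL table, treats "useless" content simply as text shorter than a threshold, detects only exact duplicates after whitespace and case normalization, and uses a hypothetical keyword list `POWER_TERMS` for tags. Error correction and security management are not covered.

```python
import hashlib
import re

# Hypothetical tag vocabulary; a real system would maintain its own list.
POWER_TERMS = ("dispatch", "grid", "power", "substation")


def normalize(text):
    """Collapse whitespace and lowercase so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_records(raw_pages, min_length=20):
    """Filter short pages, drop exact duplicates, and attach keyword tags."""
    seen = set()
    cleaned = []
    for text in raw_pages:
        norm = normalize(text)
        if len(norm) < min_length:   # assumed definition of "useless"
            continue
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:           # duplicate of an earlier page
            continue
        seen.add(digest)
        tags = [term for term in POWER_TERMS if term in norm]
        cleaned.append({"text": norm, "tags": tags, "digest": digest})
    return cleaned
```

Storing the digest alongside each record would also allow a database to enforce uniqueness on that column, although the patent does not describe the schema.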
Using Python-based network tooling, the large volume of professional information in the database storage module is applied to data analysis and big data research. The Sharksearch algorithm is used to strengthen the relevance of the collected information to the topic, and the data sources in the database storage module are classified on that basis.
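The patent names the Sharksearch algorithm but gives no formulas. The Python sketch below shows one simplified scoring rule in the spirit of shark-search as it is commonly described: a link inherits a decayed share of its parent page's relevance and combines it with the relevance of its anchor text and the text around the anchor. The function names, the bag-of-words cosine similarity, and the parameter values `delta`, `beta`, and `gamma` are illustrative assumptions, not values taken from the patent.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two whitespace-tokenized texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def child_score(topic, parent_text, parent_inherited, anchor_text,
                anchor_context, delta=0.5, beta=0.8, gamma=0.8):
    """Estimate how promising a link is before downloading its target."""
    parent_sim = cosine(topic, parent_text)
    # A relevant parent passes on its own similarity, decayed by delta;
    # an irrelevant parent passes on only what it inherited itself.
    inherited = delta * parent_sim if parent_sim > 0 else delta * parent_inherited
    anchor_sim = cosine(topic, anchor_text)
    # If the anchor text itself is relevant, the context counts as fully relevant.
    context_sim = 1.0 if anchor_sim > 0 else cosine(topic, anchor_context)
    neighborhood = beta * anchor_sim + (1 - beta) * context_sim
    return gamma * inherited + (1 - gamma) * neighborhood
```

Links with higher scores would be crawled first, and the same similarity function could be reused to assign downloaded pages to topics.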
From the classified data sources in the database storage module, with regular expressions used to match the information data, a powerful internal professional information database, namely the professional data information database, is established, and the index generation unit generates an index for it.
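A small Python sketch of regular-expression matching followed by index generation is given below. The patterns in `CATEGORY_PATTERNS` are hypothetical examples (a Chinese patent publication number and two common standard prefixes), the fallback category `"literature"` is an assumption, and `build_index` produces a plain inverted index from words to document ids; the patent does not state which patterns or index structure are actually used.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only.
CATEGORY_PATTERNS = {
    "patent": re.compile(r"\bCN\d{9}[A-Z]\b"),
    "standard": re.compile(r"\b(?:GB/T|DL/T)\s?\d+"),
}


def classify(text):
    """Return the first category whose pattern matches, else 'literature'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(text):
            return category
    return "literature"


def build_index(documents):
    """Map each lowercase word to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}
```

With such an index, a keyword lookup becomes a dictionary access instead of a scan over every stored document.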
The data sources in the professional data information database can be searched through the Web page, and the retrieval optimization module optimizes the presentation of the retrieved information so that it is easy to find, retrieve, and download.
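One possible reading of "optimized presentation" is a simple ranking of search hits. The Python sketch below ranks documents by how many query terms they contain and breaks ties in favor of newer documents. The `search` function and the document fields `title`, `text`, and `published` are hypothetical; the patent does not describe the ranking criteria.

```python
from datetime import date


def search(documents, query, limit=10):
    """Return titles of matching documents, best match first.

    documents: list of dicts with 'title', 'text', and 'published' (a date).
    """
    terms = query.lower().split()
    scored = []
    for doc in documents:
        haystack = (doc["title"] + " " + doc["text"]).lower()
        hits = sum(1 for term in terms if term in haystack)
        if hits:
            scored.append((hits, doc["published"], doc["title"]))
    # More matching terms first; among equals, the newer document first.
    scored.sort(reverse=True)
    return [title for _hits, _published, title in scored[:limit]]
```

For instance, a query such as `search(docs, "power dispatch")` would place a document containing both words ahead of one that contains only "power".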
The data statistics module establishes the power dispatching information visualization analysis platform from the data sources in the professional data information database. The platform contains a regional distribution module, a data classification proportion module, and a data download record module. Together, the regional distribution module and the data classification proportion module make it possible to analyze the number of power-industry data sources in a given region and the share of each category among them. Together, the regional distribution module, the data download record module, and the data classification proportion module make it possible to analyze the number of power-industry data sources downloaded in a given region and the share of each category among those downloads.
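The per-region counts, category shares, and download statistics described above reduce to simple aggregations. The Python sketch below uses `collections.Counter` for this; the record fields `region` and `category` and the three function names are assumptions made for the example, and a visualization layer would sit on top of these numbers.

```python
from collections import Counter


def region_counts(records):
    """Number of data sources per region."""
    return Counter(r["region"] for r in records)


def category_share(records):
    """Fraction of data sources in each category (empty dict if no records)."""
    counts = Counter(r["category"] for r in records)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def downloads_by_region_and_category(download_log):
    """Number of downloads per (region, category) pair."""
    return Counter((d["region"], d["category"]) for d in download_log)
```

Filtering the records to one region before calling `category_share` would give the category breakdown for that region alone.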
From the statistics on the data sources in the regional distribution module, the data classification proportion module, and the data download record module, a professional database prediction system can be formed for predictive analysis and intelligent decision-making in power dispatching.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations may be made to these embodiments without departing from the principles and spirit of the present invention. The scope of the present invention is defined by the appended claims and their equivalents.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111310992.4A CN114064997B (en) | 2021-11-08 | 2021-11-08 | An artificial intelligence power dispatch decision-making system based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111310992.4A CN114064997B (en) | 2021-11-08 | 2021-11-08 | An artificial intelligence power dispatch decision-making system based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114064997A CN114064997A (en) | 2022-02-18 |
CN114064997B true CN114064997B (en) | 2024-11-05 |
Family
ID=80274432
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111310992.4A Active CN114064997B (en) | 2021-11-08 | 2021-11-08 | An artificial intelligence power dispatch decision-making system based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114064997B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117277566B (en) * | 2023-09-20 | 2024-05-03 | 国网河南省电力公司濮阳供电公司 | Power grid data analysis and power dispatching system and method based on big data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104820670A (en) * | 2015-03-13 | 2015-08-05 | 国家电网公司 | Method for acquiring and storing big data of power information |
CN104881424A (en) * | 2015-03-13 | 2015-09-02 | 国家电网公司 | Regular expression-based acquisition, storage and analysis method of power big data |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100835307B1 (en) * | 2006-12-08 | 2008-06-04 | 부산대학교 산학협력단 | Agent system and method for collecting schedule information on the web |
CN110046294A (en) * | 2019-03-04 | 2019-07-23 | 国网浙江省电力有限公司经济技术研究院 | A kind of energy information system based on electric power big data |
Also Published As
Publication number | Publication date |
---|---|
CN114064997A (en) | 2022-02-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111460252B (en) | Automatic search engine method and system based on network public opinion analysis | |
CN109446344B (en) | Intelligent analysis report automatic generation system based on big data | |
WO2019214245A1 (en) | Information pushing method and apparatus, and terminal device and storage medium | |
EP3270303A1 (en) | An automated monitoring and archiving system and method | |
CN117726166A (en) | Artificial intelligence enterprise customer risk information analysis and evaluation method and system based on large language model | |
CN117909440B (en) | Intelligent archive index and retrieval system | |
TWI793432B (en) | Document management method and system for engineering project | |
CN112182248A (en) | A Statistical Method for the Key Policy of Electricity Price | |
CN111382184A (en) | Method for verifying drug document and drug document verification system | |
CN111400369A (en) | Big data analysis-based policy information service system and method | |
CN111666499A (en) | Public opinion monitoring cloud service platform based on big data | |
CN115132366A (en) | Multi-source data processing method and system based on health and medical big data standard library | |
CN118410405A (en) | Intelligent identification system for hierarchical relationship of data assets | |
CN114064997B (en) | An artificial intelligence power dispatch decision-making system based on big data | |
CN119249094A (en) | A multi-source heterogeneous data fusion and analysis method and system | |
CN119597929A (en) | Novel intelligent data architecture knowledge retrieval method and system for power system construction | |
CN113469704A (en) | System and method for automatically recommending customer service strategy | |
CN117763076B (en) | File retrieval method and system based on cloud computing | |
CN115982429B (en) | Knowledge management method and system based on flow control | |
CN112800219B (en) | Method and system for feeding back customer service log to return database | |
CN116401338A (en) | Design feature extraction and attention mechanism and method based on data asset intelligent retrieval input and output requirements | |
CN114722801B (en) | Government data classification storage method and related device | |
CN114841155A (en) | Intelligent theme content aggregation method and device, electronic equipment and storage medium | |
CN117668313A (en) | Industrial search systems, methods, electronic devices and storage media | |
Hu et al. | A faq finding process in open source project forums |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||