
CN119988157A - A data collection method and system based on big data of intelligent operation and maintenance platform - Google Patents


Info

Publication number
CN119988157A
CN119988157A (application CN202411819939.0A)
Authority
CN
China
Prior art keywords
data
data source
clustering
characteristic
source set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411819939.0A
Other languages
Chinese (zh)
Other versions
CN119988157B (en)
Inventor
董惠玉
毛董东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lianxin Information Technology Co ltd
Original Assignee
Zhejiang Lianxin Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lianxin Information Technology Co ltd filed Critical Zhejiang Lianxin Information Technology Co ltd
Priority to CN202411819939.0A
Publication of CN119988157A
Application granted
Publication of CN119988157B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a data collection method and system based on big data of an intelligent operation and maintenance platform. The method includes: obtaining the configuration files of multiple data sources and, based on those files, using a multi-level rule engine to screen target data sources from the multiple data sources to form a data source set; predicting the optimal data collection mode for the data source set with a target prediction model that incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network; using the distributed computing framework of the intelligent operation and maintenance platform to assign the data collection tasks of the data source set to multiple devices in the platform according to the optimal mode, so that the devices collect the raw data of all data sources in the set in parallel; and processing the raw data and determining operation and maintenance results from the processed data and the operation and maintenance business requirement information. The solution provided by the present application improves the accuracy and flexibility of data collection.

Description

Data acquisition method and system based on big data of intelligent operation and maintenance platform
Technical Field
The embodiment of the application relates to the technical field of data acquisition, in particular to a data acquisition method and system based on big data of an intelligent operation and maintenance platform.
Background
With the rapid development of information technology, the amount of data generated by enterprises and organizations has grown explosively. Intelligent operation and maintenance platforms are important tools for managing and maintaining large-scale systems, and they require efficient collection and processing of data from multiple data sources. Data acquisition is one of the core functions of an intelligent operation and maintenance platform, and it directly affects the performance, reliability and security of a large-scale system.
Existing data acquisition methods are mostly based on fixed rules or simple algorithms. Because different data sources and application scenarios place different requirements on data acquisition, existing methods struggle to adapt to diversified data sources and complex operation and maintenance requirements, which results in low data acquisition accuracy and low flexibility.
Disclosure of Invention
The embodiment of the application provides a data acquisition method and system based on big data of an intelligent operation and maintenance platform, which are used for solving the problems of low data acquisition accuracy and low flexibility in the prior art.
In a first aspect, an embodiment of the present application provides a data acquisition method based on big data of an intelligent operation and maintenance platform, including:
acquiring configuration files of a plurality of data sources connected with an intelligent operation and maintenance platform, and screening target data sources from the plurality of data sources by using a multi-level rule engine based on the configuration files of the plurality of data sources to form a data source set;
predicting an optimal data acquisition mode corresponding to the data source set by adopting a target prediction model, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network;
distributing, in combination with a distributed computing framework of the intelligent operation and maintenance platform, the data acquisition tasks of the data source set to a plurality of devices in the platform according to the optimal data acquisition mode, so that the plurality of devices collect, in parallel, the raw data from all data sources in the data source set;
And processing the original data, and determining an operation and maintenance result according to the processed data and the operation and maintenance service requirement information.
Optionally, predicting the optimal data acquisition mode corresponding to the data source set by adopting the target prediction model, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network, includes the following steps:
Analyzing the characteristics of each data source in the data source set by using a support vector machine and a target clustering algorithm based on the configuration file of each data source in the data source set to obtain an analysis result;
and according to the analysis result, predicting the optimal data acquisition mode corresponding to the data source set by adopting a long short-term memory network, in combination with enterprise operation and maintenance personalized requirement information and historical data acquisition modes.
Optionally, the analyzing, based on the configuration file of each data source in the data source set, the characteristics of each data source in the data source set by using a support vector machine and a target clustering algorithm to obtain an analysis result includes:
extracting first characteristic information of each data source in the data source set by adopting a metadata extraction technology based on configuration files of each data source in the data source set;
Based on the first characteristic information of each data source in the data source set, classifying the class attribute of all the data sources in the data source set by adopting a support vector machine to obtain a data source classification result;
Based on the data source classification result, clustering all data sources in each category by adopting a target clustering algorithm to obtain a data source clustering result;
Based on the data source clustering result, adopting a principal component analysis technology to select partial characteristic information from the first characteristic information and taking the partial characteristic information as second characteristic information;
Optimizing the selection of an initial center point of a target clustering algorithm in the clustering process by adopting a genetic algorithm based on the association relation between the data sources so as to obtain an optimized data source clustering result;
Based on the optimized data source clustering result, constructing a characteristic probability model of the data source by adopting a Bayesian network so as to form a data source behavior prediction model;
And generating an analysis result based on the configuration file of each data source in the data source set, the first characteristic information of each data source in the data source set, the data source classification result, the data source clustering result, the second characteristic information, the association relationship among the data sources, the optimized data source clustering result and the data source behavior prediction model.
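As a rough sketch of the dimensionality-reduction step above, the snippet below keeps the highest-variance feature columns as the "second characteristic information". This is a deliberately simplified stand-in for principal component analysis (PCA projects onto leading eigenvectors rather than selecting raw columns), and the feature names and values are invented for illustration only:

```python
from statistics import pvariance

def top_variance_features(rows, names, k=2):
    """Rank feature columns by variance and keep the top k.

    A lightweight stand-in for the PCA step: columns with near-zero
    variance (here 'security', which is constant) carry no information
    and are dropped first.
    """
    cols = list(zip(*rows))
    ranked = sorted(range(len(names)), key=lambda i: pvariance(cols[i]), reverse=True)
    keep = sorted(ranked[:k])
    return [names[i] for i in keep], [[r[i] for i in keep] for r in rows]

# Illustrative characteristic-quantization values for four data sources.
names = ["size_gb", "latency_ms", "update_hz", "security"]
rows = [[120, 5, 0.1, 3], [2, 80, 4.0, 3], [300, 7, 0.2, 3], [1, 95, 5.0, 3]]
kept, reduced = top_variance_features(rows, names, k=2)
```

In this toy data, `size_gb` and `latency_ms` dominate the variance, so they survive as the second characteristic information.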
Optionally, the target clustering algorithm comprises a K-means clustering algorithm, and clustering all data sources in each category by adopting the target clustering algorithm based on the data source classification result, to obtain a data source clustering result, includes the following steps:
defining a characteristic quantization index system for the data sources, wherein the characteristic quantization indexes include data size, access delay, update frequency and security;
Based on the data source classification result and the characteristic quantization index system, carrying out standard quantization processing on the first characteristic information of all the data sources in each category by adopting a standardized technology to obtain characteristic values of all the data sources in each category;
for each category, a hierarchical clustering algorithm is introduced to determine the grouping number of the data source, and the grouping number is used as the value of the initial input parameter of the K-means clustering algorithm;
Selecting an initial center point in a probability weighting mode, executing a K-means clustering process by adopting a K-means clustering algorithm based on the value of an initial input parameter and the initial center point, and introducing a dynamic adjustment mechanism of distance measurement in the clustering process to obtain a data source clustering result;
And evaluating the data source clustering result by using a contour coefficient method to obtain a contour coefficient value, and adjusting the value of the initial input parameter or optimizing the distance metric standard under the condition that the contour coefficient value is smaller than a preset threshold value, and repeatedly executing the K-means clustering process and the evaluation operation until the contour coefficient value is larger than or equal to the preset threshold value.
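The K-means/silhouette loop described above can be sketched in pure Python on toy 2-D data. In this sketch the hierarchical-clustering step for choosing the group count is replaced by simply incrementing k, the silhouette threshold 0.7 is an assumed value, and the initial centres are plain random samples rather than the probability-weighted choice the patent describes:

```python
import math
import random

def kmeans(points, k, seed=0, iters=20):
    """Plain K-means; `seed` fixes the random initial centres."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: math.dist(p, centers[i]))].append(p)
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]

def silhouette(points, labels):
    """Mean silhouette coefficient over all points."""
    def mean_dist(p, group):
        return sum(math.dist(p, q) for q in group) / len(group)
    scores = []
    for j, p in enumerate(points):
        own = [q for m, q in enumerate(points) if labels[m] == labels[j] and m != j]
        if not own:
            continue
        a = mean_dist(p, own)
        b = min(mean_dist(p, [q for m, q in enumerate(points) if labels[m] == lab])
                for lab in set(labels) if lab != labels[j])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two well-separated blobs of data-source feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, score, labels = 2, -1.0, []
while score < 0.7 and k <= len(points) - 1:   # 0.7: assumed preset threshold
    labels = kmeans(points, k)
    score = silhouette(points, labels)
    k += 1
```

For this data the loop terminates at k = 2 with a high silhouette score, since the two blobs are cleanly separated.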
Optionally, a dynamic adjustment mechanism of distance measurement is introduced in the clustering process to obtain a data source clustering result, wherein the dynamic adjustment mechanism of distance measurement indicates that different distance measurement standards are adopted for data sources with different characteristic types, and the method comprises the following steps:
constructing a distance metric library by adopting various distance metrics, wherein the distance metric library comprises Euclidean distance, manhattan distance, cosine similarity distance and dynamic time warping distance;
Based on the initial characteristic classification result, selecting a distance metric corresponding to each classification in the initial characteristic classification result from the distance metric library by adopting a self-adaptive distance metric selection algorithm;
Assigning corresponding weights to all characteristic quantization indexes in each category according to the characteristic importance, and dynamically adjusting the distance measurement standard corresponding to each category according to the weights corresponding to all characteristic quantization indexes in each category to obtain the adjusted distance measurement standard corresponding to each category;
and determining a data source clustering result based on the adjusted distance measurement standards corresponding to all the classifications.
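A minimal form of the distance-metric library and weighted, per-category metric selection might look as follows. The category names and weights are illustrative assumptions, and the dynamic time warping metric is omitted for brevity:

```python
import math

def weighted_euclidean(a, b, w):
    return math.sqrt(sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b)))

def weighted_manhattan(a, b, w):
    return sum(wi * abs(x - y) for wi, x, y in zip(w, a, b))

def weighted_cosine(a, b, w):
    # Cosine *distance*: 1 - weighted cosine similarity.
    num = sum(wi * x * y for wi, x, y in zip(w, a, b))
    na = math.sqrt(sum(wi * x * x for wi, x in zip(w, a)))
    nb = math.sqrt(sum(wi * y * y for wi, y in zip(w, b)))
    return 1 - num / (na * nb)

# The distance metric library, keyed by (assumed) characteristic category.
METRIC_LIBRARY = {
    "dense_numeric": weighted_euclidean,
    "sparse_numeric": weighted_manhattan,
    "directional": weighted_cosine,
}

def distance(category, a, b, weights):
    """Pick the metric registered for a category and apply index weights."""
    return METRIC_LIBRARY[category](a, b, weights)

d = distance("dense_numeric", (0.0, 0.0), (3.0, 4.0), (1.0, 1.0))
```

With unit weights the dense-numeric metric reduces to the ordinary Euclidean distance, so `d` is 5.0 here; re-weighting the characteristic quantization indexes changes the metric without changing the library's interface.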
Optionally, optimizing the selection of the initial center point of the target clustering algorithm in the clustering process by adopting a genetic algorithm based on the association relationship between the data sources to obtain an optimized data source clustering result, including:
initializing a population to obtain an initial population, wherein each individual in the initial population represents the position of a center point to be determined of a group of data sources;
Calculating the fitness value of each individual in the population based on the fitness function, wherein the population is an initial population in the first iteration process and is an updated population in other times of iteration processes;
Selecting based on the fitness value of each individual to obtain an optimized population, and performing cross operation and mutation operation on the individuals in the optimized population to obtain an updated population;
Judging whether a preset iteration stopping condition is reached, if not, re-executing the calculation step, the selection operation, the cross operation, the mutation operation and the judgment step until the preset iteration stopping condition is reached, wherein the preset iteration stopping condition is that the maximum iteration times or the genetic algorithm convergence is reached.
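The genetic-algorithm loop above (population, fitness, selection, crossover, mutation, stop condition) can be sketched for initial-centre selection as follows. Individuals are sets of point indices, fitness is the negative within-cluster squared error, and all numeric parameters (population size, mutation rate, generations) are illustrative choices, not values from the patent:

```python
import math
import random

def sse(points, centers):
    """Within-cluster sum of squared distances for a candidate centre set."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)

def ga_initial_centers(points, k=2, pop_size=12, generations=30, seed=1):
    rng = random.Random(seed)

    def fitness(ind):
        return -sse(points, [points[i] for i in ind])

    # Each individual is a candidate set of k centre indices.
    population = [rng.sample(range(len(points)), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[:pop_size // 2]          # selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, k) if k > 1 else 0   # one-point crossover
            child = (a[:cut] + [g for g in b if g not in a[:cut]])[:k]
            while len(child) < k:                       # repair short children
                g = rng.randrange(len(points))
                if g not in child:
                    child.append(g)
            if rng.random() < 0.2:                      # mutation
                child[rng.randrange(k)] = rng.randrange(len(points))
            children.append(child)
        population = survivors + children
    best = max(population, key=fitness)
    return [points[i] for i in best]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = ga_initial_centers(points)
```

Because survivors are carried over unmodified, the best individual is never lost; for these two well-separated blobs the search settles on one centre per blob, giving a far lower squared error than any same-blob pair.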
Optionally, predicting the optimal data acquisition mode corresponding to the data source set by adopting a long short-term memory network, according to the analysis result and in combination with the enterprise operation and maintenance personalized requirement information and the historical data acquisition modes, includes the following steps:
based on the analysis result, generating a comprehensive demand feature matrix by combining the enterprise operation and maintenance personalized demand information;
based on the comprehensive demand feature matrix, generating time-series features of the historical acquisition modes by adopting a time-series analysis method;
and based on the comprehensive demand feature matrix and the time-series features of the historical acquisition modes, predicting the optimal data acquisition mode corresponding to the data source set by adopting a long short-term memory network.
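To make the LSTM step concrete, the snippet below runs a single scalar LSTM cell forward over a short history. This shows only the gate recurrence; the weights are arbitrary illustrative numbers, and a real deployment would train such a network with an ML framework on vectors of historical acquisition statistics and feed the final hidden state to a classifier over acquisition modes:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def lstm_cell(x, h, c, W):
    """One LSTM step over scalar input/state (weights W are illustrative)."""
    i = sigmoid(W["wi"] * x + W["ui"] * h + W["bi"])   # input gate
    f = sigmoid(W["wf"] * x + W["uf"] * h + W["bf"])   # forget gate
    o = sigmoid(W["wo"] * x + W["uo"] * h + W["bo"])   # output gate
    g = math.tanh(W["wg"] * x + W["ug"] * h + W["bg"]) # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new

W = dict(zip(
    ["wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg"],
    [0.5, 0.1, 0.0, 0.4, 0.2, 1.0, 0.6, 0.1, 0.0, 0.8, 0.3, 0.0]))
h = c = 0.0
for x in [0.2, 0.5, 0.9]:   # e.g. a normalised history of collection loads
    h, c = lstm_cell(x, h, c, W)
# h is the hidden state that a prediction head would map to an acquisition mode
```

With these positive weights and inputs, the cell state grows monotonically and the hidden state stays in (0, 1).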
In a second aspect, an embodiment of the present application provides a data acquisition system based on big data of an intelligent operation and maintenance platform, including:
The acquisition and screening module is used for acquiring configuration files of a plurality of data sources connected with the intelligent operation and maintenance platform, and for screening target data sources from the plurality of data sources by using a multi-level rule engine based on the configuration files of the plurality of data sources, to form the data source set;
The prediction module is used for predicting an optimal data acquisition mode corresponding to the data source set by adopting a target prediction model, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network;
The classification acquisition module is used for combining a distributed computing framework of the intelligent operation and maintenance platform, distributing data acquisition tasks of a data source set to a plurality of devices in the intelligent operation and maintenance platform according to an optimal data acquisition mode, and enabling the plurality of devices to acquire original data acquired by all data sources in the data source set in parallel;
And the processing determining module is used for processing the original data and determining an operation and maintenance result according to the processed data and the operation and maintenance service requirement information.
In a third aspect, an embodiment of the present application provides a computing device, including a processing component and a storage component, where the storage component stores one or more computer instructions, and the one or more computer instructions are used to be invoked and executed by the processing component to implement a data collection method based on big data of an intelligent operation and maintenance platform according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program, where the computer program when executed by a computer implements a data collection method based on big data of an intelligent operation and maintenance platform according to any one of the first aspects.
The embodiment of the application provides a data acquisition method based on big data of an intelligent operation and maintenance platform. The method acquires configuration files of a plurality of data sources connected with the platform and, based on those files, screens target data sources from the plurality of data sources by using a multi-level rule engine to form a data source set. An optimal data acquisition mode corresponding to the data source set is predicted by a target prediction model that incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network. In combination with the platform's distributed computing framework, the data acquisition tasks of the data source set are distributed to a plurality of devices in the platform according to the optimal mode, so that the devices collect the raw data from all data sources in the set in parallel. The raw data are then processed, and operation and maintenance results are determined according to the processed data and the operation and maintenance service requirement information.
In this embodiment, the multi-level rule engine can flexibly set multi-layer rules according to different operation and maintenance requirements and scenarios, ensuring the accuracy and applicability of the screening results. The target prediction model combines a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network, so that the optimal data acquisition mode can be predicted intelligently, improving the accuracy and flexibility of data acquisition. Specifically, the support vector machine performs classification analysis on the characteristics of the data sources, accurately distinguishing different types of data sources, such as structured data, unstructured data, static data and dynamic data. The target clustering algorithm groups the data sources according to their characteristics (such as data size and access delay), which facilitates finer-grained characteristic analysis. Using the time-series modelling capability of the LSTM network, the optimal data acquisition mode is predicted in combination with historical data acquisition modes and enterprise operation and maintenance personalized requirements, providing a personalized acquisition mode for different data sources and application scenarios.
These and other aspects of the application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data acquisition method based on big data of an intelligent operation and maintenance platform provided by an embodiment of the application;
Fig. 2 is a schematic structural diagram of a data acquisition system based on big data of an intelligent operation and maintenance platform according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the present application, the following description will make clear and complete descriptions of the technical solutions according to the embodiments of the present application with reference to the accompanying drawings.
Some of the flows described in the specification, claims and figures of the present application include a plurality of operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed in a different order than they appear herein, or in parallel. Sequence numbers such as S11 and S12 merely distinguish the operations and do not themselves represent any execution order. In addition, the flows may include more or fewer operations, which may be executed sequentially or in parallel. The terms "first" and "second" herein distinguish different messages, devices, modules, etc.; they do not represent a sequence, and "first" and "second" are not limited to different types.
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
Fig. 1 is a flowchart of a data collection method based on big data of an intelligent operation and maintenance platform according to an embodiment of the present application, as shown in fig. 1, the method includes:
S11, acquiring configuration files of a plurality of data sources connected with the intelligent operation and maintenance platform, and screening target data sources from the plurality of data sources by using a multi-level rule engine based on the configuration files of the plurality of data sources to form a data source set.
It should be appreciated that the present embodiment utilizes an automation tool (e.g., a network scanning tool) of the intelligent operation and maintenance platform to automatically identify and configure various data sources by scanning the network environment, reading configuration files, and the like. The data sources may include databases, log files, network interfaces, and the like.
Specifically, step S11 may include the following sub-steps. Step S111: scan the network environment of the intelligent operation and maintenance platform with a network scanning tool to obtain a scanning result containing a data source list and its detailed information, where the detailed information includes the Internet Protocol (IP) address, port number, user name, password, and the like. Step S112: read and parse existing configuration files with a configuration file parser (such as a regular expression or a target parser), and dynamically generate new configuration files with a template engine and a programming language (such as Python or Java) according to the parsed configuration information and the scanning result, introducing a genetic algorithm in the generation process. A configuration file includes the data source type, IP address, port number, authentication information, and so on. The target parser may be a JavaScript Object Notation (JSON) parser, an eXtensible Markup Language (XML) parser, or the like. Step S113: perform syntax and logic verification on the new configuration files with a configuration file verification tool to obtain verified configuration files.
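As a minimal illustration of the parse-and-verify sub-steps, the sketch below parses a JSON data-source configuration and performs a basic validity check. The field names (`source_type`, `ip`, `port`, `auth`) are illustrative assumptions, not a schema taken from the patent:

```python
import json

# Assumed required fields for a data-source configuration.
REQUIRED_KEYS = {"source_type", "ip", "port", "auth"}

def parse_and_validate(raw: str) -> dict:
    """Parse a JSON data-source config and check required fields and ranges."""
    cfg = json.loads(raw)                      # syntax verification
    missing = REQUIRED_KEYS - cfg.keys()       # logic verification
    if missing:
        raise ValueError(f"config missing fields: {sorted(missing)}")
    if not 0 < int(cfg["port"]) < 65536:
        raise ValueError("port out of range")
    return cfg

sample = '{"source_type": "mysql", "ip": "10.0.0.5", "port": 3306, "auth": {"user": "ops"}}'
cfg = parse_and_validate(sample)
```

A malformed file fails at `json.loads` (syntax), while a structurally valid file with missing or out-of-range fields fails the explicit checks (logic), mirroring the two verification stages of step S113.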
It should also be appreciated that the multi-level rules engine may be designed with preset, dynamic, and comprehensive rules layers. The preset rule layer may be built with a plurality of preset rules, including screening conditions of common data sources. The dynamic rule layer can intelligently and dynamically generate proper screening rules by utilizing a machine learning algorithm according to historical data and user behavior data so as to improve the accuracy and efficiency of screening. The comprehensive rule layer can combine preset rules and screening rules to form comprehensive screening rules, so that screening accuracy and screening efficiency are improved. For the dynamic rule layer, the intelligent operation and maintenance platform can intelligently recommend the most suitable screening rule according to historical data and user behaviors by using a machine learning algorithm (such as a decision tree, a random forest and the like), so that the screening accuracy and efficiency are improved. Wherein, the historical data comprises screening records, screening results, user feedback and the like in the past time period. The user behavior data includes screening habits, common rules, preference settings, etc. of the user.
Optionally, a powerful multi-level rule engine is built in the intelligent operation and maintenance platform, and the data sources can be screened according to preset rules, comprehensive screening rules and the like to obtain a data source set. These rules may be based on various conditions of data type, data format, time stamp, keywords, etc. Illustratively, the data type is a specified database or a log file of a particular type. The data format is a log file with a preset format. The time stamp is the data source updated in the last week. Keywords are numerical values within a particular word or symbol or interval.
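The layered screening described above can be sketched as follows, with hand-written predicates standing in for both the preset rules and the machine-learned dynamic rules; all source attributes, rule choices and the reference date are illustrative:

```python
from datetime import datetime, timedelta

# Preset rule layer: static predicates over a source's metadata.
def is_log_or_db(src):
    return src["type"] in {"database", "log"}

def updated_last_week(src, now=datetime(2024, 6, 8)):   # fixed "now" for the demo
    return now - src["updated"] <= timedelta(days=7)

# Dynamic rule layer: a trivial keyword rule standing in for a learned one.
def keyword_rule(src):
    return "error" in src.get("tags", [])

def composite_filter(sources, preset, dynamic):
    """Comprehensive layer: a source passes if it satisfies every preset
    rule and at least one dynamic rule."""
    return [s for s in sources
            if all(r(s) for r in preset) and any(r(s) for r in dynamic)]

sources = [
    {"name": "app-log", "type": "log", "updated": datetime(2024, 6, 7), "tags": ["error"]},
    {"name": "cdn-cache", "type": "cache", "updated": datetime(2024, 6, 7), "tags": ["error"]},
    {"name": "old-db", "type": "database", "updated": datetime(2024, 5, 1), "tags": ["error"]},
]
selected = composite_filter(sources, [is_log_or_db, updated_last_week], [keyword_rule])
```

Only `app-log` passes every preset rule (type and freshness) plus a dynamic rule; the comprehensive layer is just the conjunction/disjunction policy, so swapping in learned dynamic rules leaves the structure unchanged.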
S12, predicting an optimal data acquisition mode corresponding to the data source set by using a target prediction model, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm and a long short-term memory (LSTM) network.
Specifically, this embodiment can use a support vector machine and a K-means clustering algorithm to perform a deep analysis of the data source characteristics (such as data type, data volume and network environment) based on the data source set, so as to obtain a detailed analysis report. Illustratively, the support vector machine classifies the data source types, ensuring that different types of data are handled in different ways; the K-means clustering algorithm groups the data sources to obtain a data source clustering result; and a long short-term memory network predicts the optimal data acquisition mode according to the clustering result.
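The type-classification step can be illustrated with a linear separator trained by the perceptron rule. This is a deliberately minimal stand-in for the SVM the text names (an SVM would additionally maximise the margin, e.g. via scikit-learn's `SVC`); the two features and all sample values are invented for the demonstration:

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train a linear separator w·x + b with the classic perceptron update."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in samples:          # y: +1 structured, -1 unstructured
            if y * (w[0] * x1 + w[1] * x2 + b) <= 0:   # misclassified -> update
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y
    return w, b

# Features: (schema_ratio, avg_field_count) -- illustrative, not from the patent.
samples = [((0.9, 8), 1), ((0.8, 6), 1), ((0.1, 1), -1), ((0.2, 2), -1)]
w, b = train_perceptron(samples)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1
```

The data is linearly separable, so the perceptron converges and classifies both training points and a nearby unseen source correctly.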
S13, combining the distributed computing framework of the intelligent operation and maintenance platform, distributing the data acquisition tasks of the data source set to a plurality of devices in the intelligent operation and maintenance platform according to an optimal data acquisition mode, so that the plurality of devices can acquire the original data acquired by all the data sources in the data source set in parallel.
It should be appreciated that the intelligent operation and maintenance platform supports a variety of data access modes, including Application Programming Interface (API) calls, file reads, message queues, and the like. The platform can automatically select the optimal data acquisition mode, ensuring efficient and reliable data acquisition. Its distributed computing framework may be Apache Spark, Apache Hadoop, or another similar framework; this embodiment does not specifically limit the framework. For example, this step can use an Apache Spark distributed computing framework to assign the data collection tasks of the data sources to multiple devices of the platform in parallel, according to the optimal data acquisition mode predicted in the previous step. By using a distributed computing framework, the platform can process the acquisition tasks of multiple data sources in parallel, improving the speed and concurrency of data acquisition.
Optionally, the embodiment can dynamically adjust the task allocation strategy according to the load condition and the network condition of each device by combining the self-adaptive task scheduling algorithm and the reinforcement learning technology aiming at the data acquisition task of the data source set to obtain the task scheduling result, and the task scheduling result can ensure that each data acquisition task can be efficiently executed on the most suitable device. In addition, the embodiment can further optimize the allocation of computing resources based on the task scheduling result by combining a resource management algorithm and a linear programming method, and obtain an optimized task scheduling result. The embodiment reduces resource waste while improving the resource utilization rate, and ensures that the optimal data acquisition effect is achieved under the limited resources.
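A minimal sketch of load-aware task assignment is shown below: each collection task goes to the currently least-loaded device (a greedy longest-processing-time heuristic). This stands in for the adaptive/reinforcement-learning scheduler and linear-programming optimizer the text describes; task names and costs are illustrative:

```python
import heapq

def assign_tasks(tasks, devices):
    """Greedily assign each task (name, cost) to the least-loaded device.

    Tasks are handled in descending cost order, which tends to balance the
    final per-device load.
    """
    heap = [(0.0, d) for d in devices]        # (current load, device name)
    heapq.heapify(heap)
    plan = {d: [] for d in devices}
    for name, cost in sorted(tasks, key=lambda t: -t[1]):
        load, dev = heapq.heappop(heap)       # least-loaded device
        plan[dev].append(name)
        heapq.heappush(heap, (load + cost, dev))
    return plan

tasks = [("db-A", 4), ("log-B", 3), ("api-C", 3), ("db-D", 2)]
plan = assign_tasks(tasks, ["node-1", "node-2"])
```

Here both nodes end with a load of 6, illustrating the balancing effect; a real scheduler would additionally feed back measured load and network conditions rather than use fixed costs.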
S14, processing the original data, and determining an operation and maintenance result according to the processed data and the operation and maintenance service requirement information.
It should be appreciated that the purpose of data collection is to support the various functions of intelligent operation and maintenance, such as fault detection, performance optimization and resource management. Therefore, after the data acquisition tasks have been executed, this embodiment can transmit the collected raw data to the data processing module. The platform can adopt efficient data transmission protocols (such as Kafka or Flume) to ensure the stability and speed of data transmission, and it supports data compression and resumable transfer to guarantee the integrity and reliability of transmission. The data processing module in the platform can process the raw data in real time; illustratively, a stream processing framework (such as Apache Flink or Storm) is used for data cleaning, formatting and preliminary analysis, ensuring that the data is immediately usable. As a further example, this embodiment may apply a data cleansing technique to remove invalid or erroneous records from the collected raw data, then use a feature selection algorithm to pick out the most valuable information, and finally identify potential problems or risk points through an anomaly detection algorithm, providing accurate data support for operation and maintenance decisions.
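The cleaning and anomaly-detection steps can be sketched as a two-pass filter: drop invalid records, then flag statistical outliers with a z-score test. The record schema, the `latency_ms` field and the threshold are illustrative assumptions, and a real pipeline would run this inside the stream-processing framework:

```python
from statistics import mean, pstdev

def clean(records):
    """Drop invalid records (missing or negative latency) -- minimal cleansing."""
    return [r for r in records
            if r.get("latency_ms") is not None and r["latency_ms"] >= 0]

def zscore_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

raw = [{"latency_ms": 10}, {"latency_ms": None}, {"latency_ms": 12},
       {"latency_ms": -1}, {"latency_ms": 11}, {"latency_ms": 500}]
ok = clean(raw)
outliers = zscore_anomalies([r["latency_ms"] for r in ok], threshold=1.5)
```

Two invalid records are dropped, and the 500 ms spike is flagged as a potential risk point for the operation and maintenance decision layer.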
Alternatively, this embodiment may store the processed data in a designated data warehouse or database. Specifically, the intelligent operation and maintenance platform supports various data storage modes, including relational databases (such as MySQL and PostgreSQL), NoSQL databases (such as MongoDB and Cassandra), and data warehouses (such as Hive and Amazon Redshift). The platform can automatically select the most suitable storage mode according to the data characteristics and the application scenario. It can also automatically create data indexes and partitions to improve the performance of data query and analysis. Meanwhile, the platform supports data life-cycle management, automatically archiving and deleting expired data to save storage space.
According to this embodiment, the multi-level rule engine can flexibly set multi-level rules according to different operation and maintenance requirements and scenarios, ensuring the accuracy and applicability of the screening results. The target prediction model in this embodiment combines a support vector machine, a target clustering algorithm and a long-term and short-term memory network, so that the optimal data acquisition mode can be intelligently predicted, improving the accuracy and flexibility of data acquisition. Specifically, the embodiment uses the support vector machine to perform classification analysis on the characteristics of the data sources, so that different types of data sources, such as structured data, unstructured data, static data and dynamic data, can be accurately distinguished.
In some possible embodiments, S12, predicting an optimal data acquisition mode corresponding to the data source set by using a target prediction model, wherein the target prediction model is introduced with a support vector machine, a target clustering algorithm and a long-term and short-term memory network, and the method comprises the following steps:
And 121, analyzing the characteristics of each data source in the data source set by using a support vector machine and a target clustering algorithm based on the configuration file of each data source in the data source set to obtain an analysis result.
And step 122, according to the analysis result, combining the enterprise operation and maintenance personalized demand information and the historical data acquisition mode, and adopting an optimal data acquisition mode corresponding to the long-term and short-term memory network prediction data source set.
This embodiment classifies data sources into different groups (or clusters) according to their characteristics (such as data size and access delay) through the target clustering algorithm, so as to facilitate finer-grained characteristic analysis. The method utilizes the time-series analysis capability of the long-term and short-term memory network to predict the optimal data acquisition mode by combining the historical data acquisition mode with the enterprise operation and maintenance personalized requirements, and provides a personalized data acquisition mode according to the demands of different data sources and application scenarios.
In the foregoing embodiment, as a possible implementation manner, step 121, based on the configuration file of each data source in the data source set, uses a support vector machine and a target clustering algorithm to analyze the characteristics of each data source in the data source set to obtain an analysis result, where the analysis result includes:
And a1, extracting first characteristic information of each data source in the data source set by adopting a metadata extraction technology based on configuration files of each data source in the data source set. The first characteristic information includes, but is not limited to, data type, data size, update frequency, network environment parameters, etc., and provides data support for subsequent characteristic analysis.
And a2, based on the first characteristic information of each data source in the data source set, classifying all the data sources in the data source set by category attribute using a support vector machine to obtain a data source classification result. The classification process is used for evaluating similarity and difference among different data sources, and the data source classification result is used for distinguishing a structured data source from an unstructured data source, or a static data source from a dynamic data source, and the like.
And a3, based on the data source classification result, clustering all the data sources in each category by adopting a target clustering algorithm to obtain a data source clustering result. Target clustering algorithms include, but are not limited to, the K-means clustering algorithm. In the clustering process, the data sources can be classified into different groups according to factors such as data size, access delay and the like, so that finer characteristic analysis is facilitated.
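A plain K-means pass over two of the characteristics named above (e.g. data size and access delay, each point a numeric tuple) might look as follows; this is a didactic sketch, not the embodiment's tuned implementation:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means over numeric tuples: returns (centers, labels)."""
    rnd = random.Random(seed)
    centers = rnd.sample(points, k)  # naive random initialization
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                new_centers.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

The later refinement steps of the embodiment replace the naive random initialization and the fixed Euclidean metric used here.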
And a step a4 of selecting partial characteristic information from the first characteristic information by adopting a principal component analysis technology based on the clustering result of the data sources, taking the partial characteristic information as second characteristic information, and generating the association relation between the data sources in each category by adopting an association rule learning algorithm based on the second characteristic information. It should be understood that the principal component analysis technique eliminates redundant and irrelevant characteristic information, retains the key characteristic with the greatest influence on the clustering result, can reduce characteristic dimension, and simultaneously keeps the key characteristic of the data source unchanged, so as to improve analysis efficiency and optimize performance of the support vector machine and the target clustering algorithm. Association rule learning algorithms may explore implicit relationships that may exist between data sources, such as where some data sources are updated simultaneously under certain conditions, or where some types of data tend to occur in certain network environments, which findings facilitate a deep understanding of the data source's mode of operation.
And a5, optimizing the selection of initial center points of a target clustering algorithm in the clustering process by adopting a genetic algorithm based on the association relation between the data sources so as to obtain an optimized data source clustering result. The selection of the initial center in the K-means clustering process is optimized by adopting a genetic algorithm, so that the clustering effect can be enhanced, the data sources in each group are ensured to have high similarity, and the data source differences among different groups are obvious.
And a step a6 of constructing a characteristic probability model of the data source by adopting a Bayesian network based on the optimized data source clustering result so as to form a data source behavior prediction model. And constructing a characteristic probability model of the data sources by adopting a Bayesian network, and predicting the behavior mode of a certain data source under a specific condition by analyzing the condition dependency relationship among the data sources so as to provide a basis for the selection and the priority ordering of the data sources.
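A full Bayesian network is beyond a short sketch, but its core ingredient — a conditional probability table estimated from observed (condition, behavior) pairs — can be illustrated as follows; the condition and behavior labels are invented for illustration:

```python
from collections import Counter, defaultdict

def fit_cpt(observations):
    """Estimate P(behavior | condition) from (condition, behavior) pairs."""
    counts = defaultdict(Counter)
    for cond, behav in observations:
        counts[cond][behav] += 1
    cpt = {}
    for cond, c in counts.items():
        total = sum(c.values())
        cpt[cond] = {b: n / total for b, n in c.items()}
    return cpt

def predict_behavior(cpt, condition):
    """Most probable behavior of a data source under the given condition."""
    dist = cpt.get(condition)
    if not dist:
        return None  # unseen condition: no prediction
    return max(dist, key=dist.get)
```

A real Bayesian network would chain such tables over several conditionally dependent variables; this single-table version conveys the probabilistic prediction idea.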
And a step a7 of generating an analysis result based on the configuration file of each data source in the data source set, the first characteristic information of each data source in the data source set, the data source classification result, the data source clustering result, the second characteristic information, the association relation among the data sources, the optimized data source clustering result and the data source behavior prediction model. It should be appreciated that the analysis results may be indicative of the results of a characteristic analysis of the data source. The analysis result not only comprises the characteristic quantitative analysis result of the data source, but also comprises characteristic qualitative evaluation and suggestion, such as which data source is most suitable for real-time processing, which data source is more suitable for batch processing, and the like, so that comprehensive data support is provided for a decision maker.
Through the refinement steps, the embodiment not only deepens the understanding of the characteristic analysis of the data source, but also ensures the scientificity and rationality of the analysis process by introducing various algorithms and technical means, and provides more accurate characteristic analysis results of the data source.
In the embodiment, the target clustering algorithm comprises a K-means clustering algorithm, and correspondingly, step a3, based on the data source classification result, adopts the target clustering algorithm to perform clustering processing on all the data sources in each category to obtain the data source clustering result, and comprises the following steps:
Step a31, defining a characteristic quantization index system of the data source, wherein the characteristic quantization index system comprises the following characteristic quantization indexes of data size, access delay, update frequency and safety. According to the embodiment, each characteristic quantization index can be ensured to accurately reflect the characteristics of the data source, and accurate data input is provided for subsequent cluster analysis.
And a32, carrying out standard quantization processing on the first characteristic information of all the data sources in each category by adopting a standardized technology based on the data source classification result and the characteristic quantization index system, so as to obtain the characteristic values of all the data sources in each category. And processing each item of first characteristic information by adopting a standardized technology, and eliminating deviation caused by different dimensions so as to ensure that the K-means clustering algorithm can fairly consider the influence of each characteristic quantization index and improve the accuracy of a clustering result.
Step a33, introducing a hierarchical clustering algorithm to determine the number of packets of the data source according to each category, and taking the number of packets as the value of the initial input parameter of the K-means clustering algorithm. The number of packets is alternatively referred to as the number of groups. Specifically, the embodiment can introduce a hierarchical clustering algorithm as a preprocessing step of the K-means clustering algorithm, firstly preliminarily determine the grouping number of the data sources through hierarchical clustering, and then use the grouping number as an input parameter K of the K-means clustering algorithm so as to solve the problem that the K-means clustering algorithm needs to be pre-assigned with a K value, and improve the effectiveness and rationality of clustering.
Step a34, selecting an initial center point in a probability weighting mode, executing a K-means clustering process by adopting a K-means clustering algorithm based on the value of an initial input parameter and the initial center point, and introducing a dynamic adjustment mechanism of distance measurement in the clustering process to obtain a data source clustering result, wherein the dynamic adjustment mechanism of the distance measurement indicates that different distance measurement standards are adopted for data sources of different characteristic types. It should be appreciated that by selecting the initial center point in a probability weighted manner, the bias caused by random selection can be reduced, and the quality and stability of clustering can be improved. The dynamic adjustment mechanism for introducing the distance measurement means that the distance measurement standard is dynamically adjusted according to the actual difference between the data source characteristics, for example, euclidean distance can be adopted for high-dimensional characteristics, and dynamic time warping distance can be adopted for time sequence characteristics, so that the requirements of different types of data source characteristics can be met, and the clustering precision can be further improved.
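The probability-weighted selection of initial center points described in step a34 is essentially k-means++-style seeding: each new center is drawn with probability proportional to its squared distance from the nearest center already chosen. A minimal sketch, assuming points are numeric tuples:

```python
import random

def weighted_initial_centers(points, k, seed=0):
    """k-means++-style seeding: draw each new center with probability
    proportional to its squared distance to the nearest chosen center."""
    rnd = random.Random(seed)
    centers = [rnd.choice(points)]
    while len(centers) < k:
        # Squared distance of every point to its nearest chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:  # all points coincide with chosen centers
            centers.append(rnd.choice(points))
            continue
        r = rnd.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc > r:   # roulette-wheel selection lands here
                centers.append(p)
                break
    return centers
```

Compared with uniform random selection, this spreads the initial centers out, which is the bias-reduction effect the step aims for.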
Specifically, in step a34, a dynamic adjustment mechanism of distance measurement is introduced in the clustering process to obtain a data source clustering result, wherein the dynamic adjustment mechanism of distance measurement indicates that different distance measurement standards are adopted for data sources with different characteristic types, and the method comprises the following steps:
Step a341, constructing a distance metric library by adopting various distance metrics, wherein the distance metric library comprises Euclidean distance, manhattan distance, cosine similarity distance and dynamic time warping distance.
And a342, classifying all characteristic quantization indexes in a characteristic quantization index system by adopting a characteristic engineering method to obtain an initial characteristic classification result, and selecting a distance metric corresponding to each classification in the initial characteristic classification result from a distance metric library by adopting a self-adaptive distance metric selection algorithm based on the initial characteristic classification result. In the embodiment, various distance measurement standards are introduced, and a self-adaptive distance measurement selection algorithm is designed, so that the most suitable distance measurement standard can be dynamically selected according to the actual condition of the characteristics of the data source in the data source clustering process, and the accuracy and the reliability of the data source clustering result are improved.
And a343, endowing corresponding weights to all characteristic quantization indexes in each category according to the characteristic importance, and dynamically adjusting the distance measurement standard corresponding to each category according to the weights corresponding to all characteristic quantization indexes in each category to obtain the adjusted distance measurement standard corresponding to each category.
Step a344, determining the clustering result of the data source based on the adjusted distance measurement standards corresponding to all the classifications.
By introducing a dynamic adjustment mechanism of distance measurement, the embodiment can dynamically select the most suitable distance measurement standard according to the actual condition of the characteristics of the data source in the data source clustering process. Specifically, the mechanism selects the most suitable distance metric from the standard library by constructing a plurality of distance metric libraries (comprising Euclidean distance, manhattan distance, cosine similarity distance and dynamic time warping distance), classifying the characteristic quantization indexes by adopting a characteristic engineering method and combining an adaptive distance metric selection algorithm. In addition, a weight is given to the characteristic quantization index in each category according to the characteristic importance, and the distance measurement standard corresponding to each category is dynamically adjusted, so that the accuracy and the reliability of the clustering result of the data source are ensured. The embodiment not only improves the accuracy of data source clustering, but also enhances the robustness and generalization capability of the model, so that the data source clustering result is more in line with the characteristics of actual data, and a reliable basis is provided for subsequent data analysis and application.
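The distance metric library of step a341 and the type-based dispatch of step a342 can be sketched directly; the feature-type keys below are illustrative stand-ins for the adaptive selection algorithm:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def dtw(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Dispatch table: which metric to use for which feature type (keys assumed).
METRICS = {"numeric": euclidean, "sparse": manhattan,
           "directional": cosine_distance, "time_series": dtw}

def distance(kind, a, b):
    """Pick the metric appropriate to the declared feature type."""
    return METRICS[kind](a, b)
```

The weighting of step a343 would enter by scaling each coordinate before the metric is applied.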
And a35, evaluating the data source clustering result by using the silhouette coefficient method to obtain a silhouette coefficient value; if the silhouette coefficient value is smaller than a preset threshold, adjusting the value of the initial input parameter or optimizing the distance metric, and repeatedly executing the K-means clustering process and the evaluation operation until the silhouette coefficient value is greater than or equal to the preset threshold.
It should be understood that after clustering is completed, the clustering result is evaluated by the silhouette coefficient method, which measures clustering quality by comparing, for each data source, its mean distance to its own cluster with its mean distance to the other clusters. If the silhouette coefficient is low, the clustering process is re-executed after adjusting the K value or optimizing the distance metric, until a satisfactory data source clustering result is obtained.
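The silhouette evaluation of step a35 can be sketched in a few lines; a mean silhouette near 1 indicates tight, well-separated clusters, while a low or negative value suggests adjusting K or the distance metric (Euclidean distance is assumed here, and at least two non-trivial clusters):

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient: s(i) = (b - a) / max(a, b), where a is the
    mean distance to the point's own cluster and b the mean distance to the
    nearest other cluster."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = sum(dist(p, q) for q in own) / len(own)
        b = min(sum(dist(p, q) for q in members) / len(members)
                for lab, members in clusters.items() if lab != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```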
Optionally, based on the optimized distance metric, the stability and robustness of the data source clustering result are tested by means of multiple runs, noise testing and parameter-sensitivity analysis, so as to obtain a stable and robust data source clustering result.
Through the steps, the applicability and the accuracy of the K-means clustering algorithm can be improved through the application of the standardized technology, the hierarchical clustering algorithm, the K-means clustering algorithm and the dynamic adjustment mechanism introducing the distance measurement, and meanwhile the rationality of the clustering analysis process is ensured.
Specifically, step a5, optimizing the selection of an initial center point of a target clustering algorithm in the clustering process by adopting a genetic algorithm based on the association relation between data sources to obtain an optimized data source clustering result, wherein the method comprises the following steps:
Step a51, initializing a population to obtain an initial population, wherein each individual in the initial population represents the position of a center point to be determined of a group of data sources. Population size may be determined based on the number of data sources in a category and the characteristic dimension.
And a52, calculating the fitness value of each individual in the population based on the fitness function, wherein the population is an initial population in the first iteration process and is an updated population in other times of iteration processes. The fitness function may select the individual with the highest fitness as the current optimal solution based on the compactness and the degree of separation of the clusters, etc.
Step a53, performing a selection operation based on the fitness value of each individual to obtain an optimized population, and performing a crossover operation and a mutation operation on the individuals in the optimized population to obtain an updated population. Specifically, the embodiment adopts the crossover operation of a genetic algorithm, generates new individuals by selecting two individuals with higher adaptability to carry out gene exchange, increases the diversity of the population to obtain the crossed population, adopts the mutation operation of the genetic algorithm based on the crossed population to randomly change part of genes of some individuals, prevents the algorithm from converging prematurely, maintains the exploratory capacity of the population, and obtains the updated population.
And step a54, judging whether a preset iteration-stopping condition is reached; if not, re-executing the calculation step, selection operation, crossover operation, mutation operation and judgment step in steps a52 to a54 until the preset iteration-stopping condition is reached, wherein the preset iteration-stopping condition is that the maximum number of iterations is reached or the genetic algorithm converges.
Correspondingly, the quality and stability of the data source clustering result can be improved by optimizing the selection of the initial center point of the target clustering algorithm in the clustering process through the genetic algorithm. Specifically, in this embodiment, by initializing a population and calculating an fitness value of each individual based on a fitness function, an individual with the highest fitness is selected as a current optimal solution. Then, the population is continuously optimized through selection, crossing and mutation operations, so that the diversity and exploratory capacity of the population are increased, and premature convergence of the algorithm is prevented. The embodiment not only improves the rationality of the initial center point selection, but also enhances the robustness and accuracy of the clustering algorithm, and finally obtains the optimized data source clustering result. The clustering result of the data sources is more stable and reliable, the actual association relation between the data sources can be reflected better, and a solid foundation is provided for subsequent data analysis and application.
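The genetic optimization of steps a51 to a54 can be sketched as follows, with two simplifications: the fitness function here is plain K-means inertia (lower is better) rather than the fuller compactness-and-separation criterion, and truncation selection stands in for the selection operator; all parameter values are illustrative:

```python
import random

def ga_initial_centers(points, k, pop_size=20, generations=30, seed=0):
    """Evolve candidate sets of k initial centers; fitness is the negative
    K-means inertia (sum of squared distances to the nearest candidate)."""
    rnd = random.Random(seed)

    def inertia(centers):
        return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
                   for p in points)

    # Each individual is a list of k candidate centers drawn from the data.
    population = [rnd.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=inertia)             # lower inertia = fitter
        survivors = population[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rnd.sample(survivors, 2)
            cut = rnd.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]          # one-point crossover
            if rnd.random() < 0.2:               # mutation: replace one center
                child[rnd.randrange(k)] = rnd.choice(points)
            children.append(child)
        population = survivors + children
    return min(population, key=inertia)
```

Because the best individual always survives truncation selection, the returned center set is at least as good as any candidate seen during the run.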
In the foregoing embodiment, as a possible implementation manner, step 122, according to the analysis result, combines the personalized requirement information of the enterprise operation and maintenance and the historical data collection manner, adopts the optimal data collection manner corresponding to the long-term memory network prediction data source set, and includes:
and b1, generating a comprehensive demand feature matrix by combining the enterprise operation and maintenance personalized demand information based on the analysis result.
And b2, generating time sequence features of a historical acquisition mode by adopting a time sequence analysis method based on the comprehensive demand feature matrix.
And b3, based on the comprehensive demand feature matrix and the time sequence features of the historical acquisition mode, adopting an optimal data acquisition mode corresponding to the long-term and short-term memory network prediction data source set.
Based on analysis results and enterprise operation and maintenance personalized demand information, the comprehensive demand feature matrix is generated, so that the model can comprehensively consider specific demands and actual conditions of enterprises, and the pertinence and practicability of prediction are improved. Based on the comprehensive demand feature matrix, a time sequence analysis method is adopted to generate time sequence features of a historical acquisition mode, so that the time dependence and the change trend of historical data are fully utilized, and the prediction capability of a model is enhanced. The long-term and short-term memory network improves the accuracy of the predicted data acquisition mode and provides more efficient and reliable operation and maintenance support for enterprises.
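The long-term and short-term memory network itself would normally come from a deep-learning framework; as a self-contained illustration of the mechanism it relies on, here is a single scalar LSTM cell step (input, forget and output gates plus a candidate state), with weights passed in explicitly — a didactic sketch, not a trainable model:

```python
import math

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step with scalar input and scalar state.

    W, U, b are dicts with keys 'i', 'f', 'o', 'g' (input, forget and output
    gates, and the candidate state). Returns the new (h, c)."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    i = sigmoid(W["i"] * x + U["i"] * h_prev + b["i"])    # input gate
    f = sigmoid(W["f"] * x + U["f"] * h_prev + b["f"])    # forget gate
    o = sigmoid(W["o"] * x + U["o"] * h_prev + b["o"])    # output gate
    g = math.tanh(W["g"] * x + U["g"] * h_prev + b["g"])  # candidate state
    c = f * c_prev + i * g      # cell state: gated mix of memory and input
    h = o * math.tanh(c)        # hidden state exposed to the next layer
    return h, c

def run_sequence(xs, W, U, b):
    """Feed a sequence through the cell; the final h summarizes the history."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h
```

In the embodiment, the sequence would be the time-series features of the historical acquisition modes, and the final hidden state would feed a prediction layer over the candidate acquisition modes.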
In summary, the present embodiment has the following advantages:
(1) Most existing data acquisition methods rely on manual configuration and selection of acquisition modes, and lack an intelligent automatic selection mechanism. The intelligent operation and maintenance platform in the embodiment automatically selects the optimal data acquisition mode according to the characteristics of the data source (such as data type, data volume, network environment and the like) by an automatic tool, a support vector machine and a K-means clustering algorithm and combining the enterprise operation and maintenance personalized demand information and the historical data acquisition mode, so that manual intervention is reduced, and acquisition efficiency and reliability are improved.
(2) The existing data acquisition method generally adopts single-equipment or multi-equipment serial processing, and cannot fully utilize the computing resources of a plurality of equipment, so that the data acquisition speed is limited. The intelligent operation and maintenance platform in the embodiment utilizes the distributed computing framework to distribute the data acquisition tasks to a plurality of devices in parallel for execution, so that the speed and concurrency of data acquisition are greatly improved. The distributed processing mode not only can process large-scale data, but also can ensure the real-time performance and high efficiency of data acquisition.
(3) The existing data acquisition method often lacks an intelligent recommendation mechanism, and a user needs to manually configure and adjust acquisition parameters, so that errors are easy to occur, and the efficiency is low. The intelligent operation and maintenance platform in the embodiment is internally provided with a multi-level rule engine, and a dynamic rule layer in the multi-level rule engine can intelligently recommend the most suitable screening rule according to historical data and user behavior data by utilizing a machine learning algorithm, so that the accuracy and the efficiency of data acquisition are further improved.
Fig. 2 is a schematic structural diagram of a data acquisition system based on big data of an intelligent operation and maintenance platform according to an embodiment of the present application, as shown in fig. 2, the system includes:
The acquiring and filtering module 21 is configured to acquire configuration files of a plurality of data sources connected to the intelligent operation and maintenance platform, and screen a target data source from the plurality of data sources by using a multi-level rule engine based on the configuration files of the plurality of data sources, so as to form a data source set.
The prediction module 22 is configured to predict an optimal data acquisition mode corresponding to the data source set by using a target prediction model, where the target prediction model is introduced with a support vector machine, a target clustering algorithm and a long-term and short-term memory network.
The classification acquisition module 23 is configured to combine with the distributed computing framework of the intelligent operation and maintenance platform, and distribute the data acquisition task of the data source set to multiple devices in the intelligent operation and maintenance platform according to an optimal data acquisition mode, so that the multiple devices can acquire the original data acquired by all the data sources in the data source set in parallel.
The processing determining module 24 is configured to process the raw data, and determine an operation and maintenance result according to the processed data and the operation and maintenance service requirement information.
The data collection system based on the big data of the intelligent operation and maintenance platform described in fig. 2 may execute the data collection method based on the big data of the intelligent operation and maintenance platform described in the embodiment shown in fig. 1, and its implementation principle and technical effects are not described again. The specific manner in which the modules and units perform the operations in the data acquisition system based on big data of the intelligent operation and maintenance platform in the foregoing embodiments has been described in detail in the embodiments related to the method, and will not be described in detail herein.
In one possible design, the smart operation and maintenance platform big data based data acquisition system of the embodiment shown in fig. 2 may be implemented as a computing device, which may include a storage component 31 and a processing component 32, as shown in fig. 3.
The storage component 31 stores one or more computer instructions for execution by the processing component 32.
The processing component 32 is configured to obtain configuration files of a plurality of data sources connected to the intelligent operation and maintenance platform, screen target data sources from the plurality of data sources by using a multi-level rule engine based on the configuration files of the plurality of data sources to form a data source set, predict an optimal data acquisition mode corresponding to the data source set by using a target prediction model, wherein the target prediction model is introduced with a support vector machine, a target clustering algorithm and a long-short-term memory network, combine a distributed computing framework of the intelligent operation and maintenance platform, distribute data acquisition tasks of the data source set to a plurality of devices in the intelligent operation and maintenance platform according to the optimal data acquisition mode, so that the plurality of devices can acquire original data acquired by all the data sources in the data source set in parallel, process the original data, and determine operation and maintenance results according to the processed data and operation and maintenance service requirement information.
Wherein the processing component 32 may include one or more processors to execute computer instructions to perform all or part of the steps of the methods described above. Of course, the processing component may also be implemented as one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for performing the above method.
The storage component 31 is configured to store various types of data to support operations at the terminal. The storage component may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as random-access memory (RAM), static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
Of course, the computing device may necessarily include other components as well, such as input/output interfaces, display components, communication components, and the like.
The input/output interface provides an interface between the processing component and a peripheral interface module, which may be an output device, an input device, etc.
The communication component is configured to facilitate wired or wireless communication between the computing device and other devices, and the like.
The computing device may be a physical device or an elastic computing host provided by the cloud computing platform, and at this time, the computing device may be a cloud server, and the processing component, the storage component, and the like may be a base server resource rented or purchased from the cloud computing platform.
The embodiment of the application also provides a computer storage medium which stores a computer program, and the computer program can realize the data acquisition method based on the big data of the intelligent operation and maintenance platform in the embodiment shown in the figure 1 when being executed by a computer.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and unit may refer to the corresponding procedures in the foregoing method embodiments, which are not repeated here.
The system embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the part of the foregoing technical solution that in essence contributes to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment or in some parts of the embodiments.
It should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solution of the present application. Although the present application has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the above embodiments may still be modified, or some technical features thereof may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A data collection method based on big data of an intelligent operation and maintenance platform, comprising:
obtaining configuration files of a plurality of data sources connected to the intelligent operation and maintenance platform, and screening target data sources out of the plurality of data sources using a multi-level rule engine based on the configuration files of the plurality of data sources, to form a data source set;
predicting, by a target prediction model, an optimal data collection mode corresponding to the data source set, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm, and a long short-term memory network;
in combination with a distributed computing framework of the intelligent operation and maintenance platform, allocating data collection tasks of the data source set to a plurality of devices in the intelligent operation and maintenance platform according to the optimal data collection mode, so that the plurality of devices collect, in parallel, raw data from all data sources in the data source set;
processing the raw data, and determining an operation and maintenance result according to the processed data and operation and maintenance business requirement information.
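The multi-level rule engine of claim 1 is not specified further in this excerpt; the sketch below is a hypothetical illustration of such a screening step, with rule names, configuration keys, and thresholds invented for the example.

```python
# Illustrative sketch (not from the patent): a minimal multi-level rule
# engine that screens data sources by their configuration files.
# Rule names, config keys, and thresholds below are hypothetical.

def level_1_reachable(cfg):
    """Level 1: keep only sources whose config marks them as online."""
    return cfg.get("status") == "online"

def level_2_supported_protocol(cfg):
    """Level 2: keep only sources using a protocol the platform can ingest."""
    return cfg.get("protocol") in {"jdbc", "http", "kafka"}

def level_3_fresh_enough(cfg):
    """Level 3: keep only sources updated within the last 24 hours."""
    return cfg.get("hours_since_update", float("inf")) <= 24

def filter_data_sources(configs, rule_levels):
    """Apply each rule level in order; a source must pass every level."""
    selected = list(configs)
    for rule in rule_levels:
        selected = [cfg for cfg in selected if rule(cfg)]
    return selected

rules = [level_1_reachable, level_2_supported_protocol, level_3_fresh_enough]
configs = [
    {"name": "db1", "status": "online", "protocol": "jdbc", "hours_since_update": 2},
    {"name": "log2", "status": "offline", "protocol": "http", "hours_since_update": 1},
    {"name": "mq3", "status": "online", "protocol": "kafka", "hours_since_update": 48},
]
data_source_set = filter_data_sources(configs, rules)
print([cfg["name"] for cfg in data_source_set])  # -> ['db1']
```

Each level only narrows the candidate set, so rule ordering affects cost but not the final result.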
2. The method according to claim 1, wherein predicting, by the target prediction model, the optimal data collection mode corresponding to the data source set comprises:
analyzing, based on the configuration files of the data sources in the data source set, the characteristics of each data source in the data source set using the support vector machine and the target clustering algorithm, to obtain an analysis result;
predicting, according to the analysis result and in combination with enterprise operation and maintenance personalized requirement information and historical data collection modes, the optimal data collection mode corresponding to the data source set using the long short-term memory network.
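Claim 2's first step classifies data sources by their characteristics. The patent names a support vector machine for this; as a dependency-free stand-in, the sketch below uses a nearest-centroid classifier purely to show the shape of that step. The features, labels, and training data are hypothetical.

```python
# Illustrative sketch: a nearest-centroid classifier standing in for the
# SVM-based category-attribute classification. All data is hypothetical.

def centroid(rows):
    """Component-wise mean of a list of feature vectors."""
    return tuple(sum(c) / len(rows) for c in zip(*rows))

def train(samples):
    """samples: {label: [feature vectors]} -> {label: centroid}."""
    return {label: centroid(rows) for label, rows in samples.items()}

def classify(model, x):
    """Assign x to the label whose centroid is nearest (squared distance)."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)),
    )

# Hypothetical training data: (data_volume, access_latency) per source type.
model = train({
    "database": [(100, 10), (120, 12)],
    "log_stream": [(5, 1), (8, 2)],
})
label = classify(model, (90, 9))  # closest to the "database" centroid
```

A real implementation would substitute an SVM (e.g. from a machine-learning library) for the `classify` step without changing the surrounding pipeline.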
3. The method according to claim 2, wherein analyzing, based on the configuration files of the data sources in the data source set, the characteristics of each data source in the data source set using the support vector machine and the target clustering algorithm, to obtain the analysis result, comprises:
extracting, based on the configuration files of the data sources in the data source set, first characteristic information of each data source in the data source set using a metadata extraction technique;
classifying, based on the first characteristic information of each data source in the data source set, category attributes of all data sources in the data source set using the support vector machine, to obtain a data source classification result;
clustering, based on the data source classification result, all data sources in each category using the target clustering algorithm, to obtain a data source clustering result;
selecting, based on the data source clustering result, part of the characteristic information from the first characteristic information using principal component analysis, and taking the selected part of the characteristic information as second characteristic information; generating, based on the second characteristic information, association relationships between the data sources in each category using an association rule learning algorithm;
optimizing, based on the association relationships between the data sources, the selection of initial center points of the target clustering algorithm during the clustering process using a genetic algorithm, to obtain an optimized data source clustering result;
constructing, based on the optimized data source clustering result, a characteristic probability model of the data sources using a Bayesian network, to form a data source behavior prediction model;
generating the analysis result based on the configuration files of the data sources in the data source set, the first characteristic information of each data source in the data source set, the data source classification result, the data source clustering result, the second characteristic information, the association relationships between the data sources, the optimized data source clustering result, and the data source behavior prediction model.
4. The method according to claim 3, wherein the target clustering algorithm comprises a K-means clustering algorithm, and clustering, based on the data source classification result, all data sources in each category using the target clustering algorithm, to obtain the data source clustering result, comprises:
defining a characteristic quantification indicator system of the data sources, the characteristic quantification indicator system comprising the following characteristic quantification indicators: data volume, access latency, update frequency, and security;
performing, based on the data source classification result and the characteristic quantification indicator system, standard quantification processing on the first characteristic information of all data sources in each category using a standardization technique, to obtain characteristic values of all data sources in each category;
for each category, introducing a hierarchical clustering algorithm to determine the number of groups of the data sources, and using the number of groups as the value of an initial input parameter of the K-means clustering algorithm;
selecting initial center points in a probability-weighted manner, executing a K-means clustering process using the K-means clustering algorithm based on the value of the initial input parameter and the initial center points, and introducing a dynamic adjustment mechanism of distance metrics during the clustering process, to obtain the data source clustering result, wherein the dynamic adjustment mechanism of distance metrics means that different distance metric standards are used for data sources of different characteristic types;
evaluating the data source clustering result using the silhouette coefficient method to obtain a silhouette coefficient value, and when the silhouette coefficient value is less than a preset threshold, adjusting the value of the initial input parameter or optimizing the distance metric standards, and repeating the K-means clustering process and the evaluation operation until the silhouette coefficient value is greater than or equal to the preset threshold.
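Claim 4 combines probability-weighted seeding (in the style of k-means++) with a silhouette-based acceptance check. The sketch below is an illustrative, dependency-free rendering of those two pieces; the sample points, the fixed k, and the loop counts are assumptions — in the claim, k comes from a preceding hierarchical clustering step, and a production system would more likely use a library such as scikit-learn.

```python
# Illustrative sketch (assumed): probability-weighted seeding plus a
# silhouette evaluation, as claim 4 describes. Data and k are hypothetical.
import math
import random

def dist(a, b):
    return math.dist(a, b)

def weighted_seeds(points, k, rng):
    """Pick seeds with probability proportional to squared distance
    from the nearest already-chosen seed (k-means++-style)."""
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        w = [min(dist(p, s) ** 2 for s in seeds) for p in points]
        seeds.append(rng.choices(points, weights=w)[0])
    return seeds

def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations from probability-weighted seeds."""
    centers = weighted_seeds(points, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return groups

def silhouette(groups):
    """Mean silhouette value over all points (plain O(n^2) version)."""
    scores = []
    for gi, g in enumerate(groups):
        for i, p in enumerate(g):
            if len(g) > 1:
                a = sum(dist(p, q) for j, q in enumerate(g) if j != i) / (len(g) - 1)
            else:
                a = 0.0
            b = min(
                sum(dist(p, q) for q in h) / len(h)
                for hi, h in enumerate(groups)
                if hi != gi and h
            )
            scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

rng = random.Random(0)
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k = 2  # in the claim, k comes from the hierarchical clustering step
groups = kmeans(points, k, rng)
score = silhouette(groups)  # well-separated blobs score near 1
```

The claimed retry loop would wrap this: if `score` falls below a preset threshold, adjust `k` or the distance metric and re-run `kmeans` until the threshold is met.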
5. The method according to claim 4, wherein introducing the dynamic adjustment mechanism of distance metrics during the clustering process, to obtain the data source clustering result, comprises:
constructing a distance metric library from a plurality of distance metric standards, the distance metric library comprising the following distance metric standards: Euclidean distance, Manhattan distance, cosine similarity distance, and dynamic time warping distance;
classifying all characteristic quantification indicators in the characteristic quantification indicator system using a feature engineering method, to obtain an initial characteristic classification result; selecting, based on the initial characteristic classification result, a distance metric standard corresponding to each class in the initial characteristic classification result from the distance metric library using an adaptive distance metric selection algorithm;
assigning corresponding weights to all characteristic quantification indicators in each class according to characteristic importance, and dynamically adjusting the distance metric standard corresponding to each class according to the weights corresponding to all characteristic quantification indicators in that class, to obtain an adjusted distance metric standard corresponding to each class;
determining the data source clustering result based on the adjusted distance metric standards corresponding to all classes.
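The four metrics named in claim 5 can be collected into a small library keyed by characteristic type; the type-to-metric mapping below is a hypothetical example, not taken from the patent.

```python
# Illustrative sketch (assumed): the four distance metric standards
# listed in claim 5, plus a hypothetical characteristic-type mapping.
import math

def euclidean(a, b):
    return math.dist(a, b)

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity; 0 for parallel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def dtw(a, b):
    """Dynamic time warping for sequences of possibly different length."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            d[i][j] = abs(x - y) + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

# Hypothetical mapping from characteristic type to metric standard:
METRIC_LIBRARY = {
    "numeric": euclidean,
    "sparse": manhattan,
    "directional": cosine_distance,
    "time_series": dtw,
}

d = METRIC_LIBRARY["time_series"]([1, 2, 3], [1, 2, 2, 3])  # 0.0: same shape
```

The per-class weight adjustment of claim 5 would then scale each coordinate before one of these metrics is applied.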
6. The method according to claim 3, wherein optimizing, based on the association relationships between the data sources, the selection of the initial center points of the target clustering algorithm during the clustering process using the genetic algorithm, to obtain the optimized data source clustering result, comprises:
initializing a population to obtain an initial population, each individual in the initial population representing positions of a group of candidate center points of the data sources;
calculating a fitness value of each individual in the population based on a fitness function, the population being the initial population in the first iteration and an updated population in subsequent iterations;
performing a selection operation based on the fitness value of each individual to obtain a population under optimization, and performing a crossover operation and a mutation operation on the individuals in the population under optimization to obtain an updated population;
determining whether a preset iteration stop condition is reached, and if not, re-executing the calculation step, the selection operation, the crossover operation, the mutation operation, and the determination step until the preset iteration stop condition is reached, the preset iteration stop condition being that a maximum number of iterations is reached or the genetic algorithm converges.
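A minimal sketch of the genetic-algorithm loop in claim 6, assuming a fitness function equal to the negated total distance from each point to its nearest candidate center; the fitness choice and all hyperparameters (population size, mutation rate, generation count) are invented for illustration.

```python
# Illustrative sketch (assumed): a genetic algorithm over candidate
# sets of initial cluster centers, as claim 6 outlines.
import math
import random

POINTS = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
K = 2  # number of centers per individual (hypothetical)

def fitness(individual):
    """Higher is better: negated within-cluster distance total."""
    return -sum(min(math.dist(p, c) for c in individual) for p in POINTS)

def crossover(a, b, rng):
    """One-point crossover between two center sets."""
    cut = rng.randrange(1, K)
    return a[:cut] + b[cut:]

def mutate(ind, rng, rate=0.2):
    """Jitter each center with probability `rate`."""
    return [
        (c[0] + rng.uniform(-1, 1), c[1] + rng.uniform(-1, 1))
        if rng.random() < rate else c
        for c in ind
    ]

def evolve(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.choice(POINTS) for _ in range(K)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # selection (keep the fitter half)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        pop = parents + children  # updated population
    return max(pop, key=fitness)

best_centers = evolve()
```

The stopping rule here is a fixed generation count; the claim's alternative (stopping on convergence) would compare best-fitness improvement between generations against a tolerance.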
7. The method according to claim 2, wherein predicting, according to the analysis result and in combination with the enterprise operation and maintenance personalized requirement information and the historical data collection modes, the optimal data collection mode corresponding to the data source set using the long short-term memory network comprises:
generating a comprehensive requirement feature matrix based on the analysis result in combination with the enterprise operation and maintenance personalized requirement information;
generating time-series features of the historical collection modes based on the comprehensive requirement feature matrix using a time-series analysis method;
predicting, based on the comprehensive requirement feature matrix and the time-series features of the historical collection modes, the optimal data collection mode corresponding to the data source set using the long short-term memory network.
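Before an LSTM can predict the next collection mode, the history has to be shaped into fixed-length sequences. The sliding-window sketch below shows one assumed encoding; the window length and the integer mode codes are hypothetical, and the LSTM itself would come from a framework such as PyTorch or Keras.

```python
# Illustrative sketch (assumed): turning a history of collection modes
# into (past-window -> next-mode) samples, the usual LSTM input shape.

def sliding_windows(history, window=3):
    """Each sample: `window` past modes -> the mode that followed them."""
    samples = []
    for i in range(len(history) - window):
        samples.append((history[i : i + window], history[i + window]))
    return samples

# Hypothetical integer-encoded history of past collection modes
# (e.g. 0 = batch pull, 1 = streaming, 2 = incremental sync).
mode_history = [0, 0, 1, 1, 2, 1, 1, 2]
dataset = sliding_windows(mode_history)
# First sample: past [0, 0, 1] -> next mode 1
```

Each sample pairs a window of past modes with the mode that followed it, which is exactly the supervised format a sequence model trains on.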
8. A data collection system based on big data of an intelligent operation and maintenance platform, comprising:
an acquisition and screening module, configured to obtain configuration files of a plurality of data sources connected to the intelligent operation and maintenance platform, and to screen target data sources out of the plurality of data sources using a multi-level rule engine based on the configuration files of the plurality of data sources, to form a data source set;
a prediction module, configured to predict, by a target prediction model, an optimal data collection mode corresponding to the data source set, wherein the target prediction model incorporates a support vector machine, a target clustering algorithm, and a long short-term memory network;
a classification and collection module, configured to allocate, in combination with a distributed computing framework of the intelligent operation and maintenance platform, data collection tasks of the data source set to a plurality of devices in the intelligent operation and maintenance platform according to the optimal data collection mode, so that the plurality of devices collect, in parallel, raw data from all data sources in the data source set;
a processing and determination module, configured to process the raw data and to determine an operation and maintenance result according to the processed data and operation and maintenance business requirement information.
9. A computing device, comprising a processing component and a storage component, the storage component storing one or more computer instructions, and the one or more computer instructions being invoked and executed by the processing component to implement the data collection method based on big data of an intelligent operation and maintenance platform according to any one of claims 1 to 7.
10. A computer storage medium storing a computer program, wherein the computer program, when executed by a computer, implements the data collection method based on big data of an intelligent operation and maintenance platform according to any one of claims 1 to 7.
CN202411819939.0A 2024-12-11 2024-12-11 A data collection method and system based on big data of intelligent operation and maintenance platform Active CN119988157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411819939.0A CN119988157B (en) 2024-12-11 2024-12-11 A data collection method and system based on big data of intelligent operation and maintenance platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411819939.0A CN119988157B (en) 2024-12-11 2024-12-11 A data collection method and system based on big data of intelligent operation and maintenance platform

Publications (2)

Publication Number Publication Date
CN119988157A true CN119988157A (en) 2025-05-13
CN119988157B CN119988157B (en) 2025-08-29

Family

ID=95623351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411819939.0A Active CN119988157B (en) 2024-12-11 2024-12-11 A data collection method and system based on big data of intelligent operation and maintenance platform

Country Status (1)

Country Link
CN (1) CN119988157B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120124594A (en) * 2025-05-14 2025-06-10 广东亚齐信息技术股份有限公司 File digitalized storage management method
CN120744584A (en) * 2025-07-02 2025-10-03 中国标准化研究院 Multi-dimensional data correlation analysis-based algorithm discrimination identification method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197737A (en) * 2017-12-29 2018-06-22 山大地纬软件股份有限公司 A kind of method and system for establishing medical insurance hospitalization cost prediction model
US10838836B1 (en) * 2020-05-21 2020-11-17 AlteroSmart Solutions LTD Data acquisition and processing platform for internet of things analysis and control
CN112884452A (en) * 2021-03-17 2021-06-01 北京幂数科技有限公司 Intelligent operation and maintenance multi-source data acquisition visualization analysis system
US20240193598A1 (en) * 2022-12-13 2024-06-13 Truist Bank Implementing and displaying digital transfer instruments
CN119003849A (en) * 2024-06-28 2024-11-22 深圳朗道智通科技有限公司 Deep learning-based big data intelligent acquisition method

Also Published As

Publication number Publication date
CN119988157B (en) 2025-08-29

Similar Documents

Publication Publication Date Title
CN119988157B (en) A data collection method and system based on big data of intelligent operation and maintenance platform
US11068789B2 (en) Dynamic model data facility and automated operational model building and usage
CN118761745B (en) OA collaborative workflow optimization method applied to enterprise
CN107168995B (en) Data processing method and server
CN119620957A (en) Erasure code compatible reading and writing method and system based on bidirectional data access agent
CN119168075A (en) A method and system for real-time processing and analysis of AI big data
CN119051996B (en) Training method and device for abnormal flow detection model, monitoring method and equipment
CN114510405A (en) Index data evaluation method, index data evaluation device, index data evaluation apparatus, storage medium, and program product
WO2023110059A1 (en) Method and system trace controller for a microservice system
CN118982347A (en) IT operation and maintenance service management method based on big data
CN113191540A (en) Construction method and device of industrial link manufacturing resources
CN120073667A (en) Ultra-short-term wind power prediction method and system for complex climate environment
CN119202979A (en) Isolation forest-based indicator data early warning method, device, computer equipment, storage medium and program product
CN115099356B (en) Industrial imbalance data classification method, device, electronic equipment and storage medium
CN117574181A (en) Consumption habit analysis method and device
US20240104436A1 (en) Chained feature synthesis and dimensional reduction
CN119046824A (en) Abnormal state identification method, device, electronic equipment, chip and storage medium
CN117971337A (en) A hybrid cloud automatic configuration method based on LSTM model
CN117312912A (en) Method, device and computer equipment for generating business data classification prediction model
CN116795908A (en) Inspection log collection methods, devices, equipment and storage media
Morichetta et al. Formal and Empirical Study of Metadata-Based Profiling for Resource Management in the Computing Continuum
CN120896908B (en) A method and apparatus for intelligent maintenance of Kubernetes nodes
CN119883551B (en) A task scheduling method and system based on distributed computing
CN120909557A (en) Service processing method, device, medium and product
CN121145036A (en) Online methods, devices, and computer equipment for big data anti-fraud scenario rule mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant