CN111339175B - Data processing method, device, electronic equipment and readable storage medium - Google Patents
Data processing method, device, electronic equipment and readable storage medium Download PDFInfo
- Publication number
- CN111339175B (application number CN202010133602.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- monitoring
- integrated
- integrated data
- monitoring data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a data processing method, a device, an electronic device and a readable storage medium, comprising: obtaining a plurality of pieces of original data; sending the plurality of pieces of original data to a distributed message queue Kafka; obtaining the plurality of pieces of original data from Kafka by using Flink and performing ETL processing to obtain a plurality of pieces of integrated data, wherein each piece of integrated data carries a tag; writing each piece of integrated data into a corresponding database according to its tag; and collecting monitoring data from Kafka, Flink and the API interfaces corresponding to the databases, and monitoring the monitoring data, wherein the monitoring data comprises the original data and the integrated data. The data processing method provided by the embodiment of the application can use Flink to perform ETL processing on the original data and monitor the data generated at each stage of the entire data processing flow. Monitoring the data generated at each stage makes it easier to grasp how the data changes as it is processed, unifies the data calibers, and helps ensure data consistency.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to a data processing method, a device, an electronic apparatus, and a readable storage medium.
Background
In the prior art, the demand for real-time data keeps growing, and many independent real-time tasks cause a great waste of cluster resources and incur high development and operation costs, so a unified real-time data warehouse is needed to improve task extensibility and save cluster resources.
The existing real-time data warehouse is usually developed based on a chimney (siloed) architecture, in which each requirement usually has its own task. Individual requirements are computed separately, data calibers are often inconsistent, and data consistency is difficult to guarantee.
Disclosure of Invention
An embodiment of the application aims to provide a data processing method, a device, electronic equipment and a readable storage medium, which are used to solve the prior-art problems that data calibers are not unified and data consistency is difficult to guarantee.
In a first aspect, an embodiment of the present application provides a data processing method, where the method includes: obtaining a plurality of original data; transmitting the plurality of raw data to a distributed message queue Kafka; obtaining the plurality of original data from Kafka by utilizing a Flink and carrying out ETL processing to obtain a plurality of integrated data, wherein the integrated data is provided with a label; writing each integrated data into a corresponding database according to the label of each integrated data; and collecting monitoring data from the Kafka, the Flink and the API interfaces corresponding to the databases, and monitoring the monitoring data, wherein the monitoring data comprises original data and integrated data.
In the above embodiment, each piece of integrated data carries a tag, according to which it can be written into the corresponding database. The data processing method provided by the embodiment of the application can use Flink to perform ETL processing on the original data and monitor the data generated at each stage of the entire data processing flow. Monitoring the data generated at each stage makes it easier to grasp how the data changes as it is processed, unifies the data calibers, and guarantees data consistency.
In one possible design, the monitoring of the monitoring data includes: determining whether a specific value of the monitoring data exceeds its corresponding numerical range; if so, issuing an alarm signal.
In the above embodiment, when the monitoring data is monitored, a specific value of the monitoring data may be obtained and compared against the numerical range corresponding to that monitoring data; if the value exceeds the range, the data is abnormal, and an alarm signal is issued to alert the user.
In one possible design, the monitoring of the monitoring data includes: determining whether a heartbeat signal of the monitoring data is received within a preset time period; if not, issuing an alarm signal.
In the above embodiment, when the monitoring data is monitored, a timer may be started each time a heartbeat signal corresponding to the monitoring data is received; if the elapsed time since the heartbeat was last received exceeds the preset time period, the corresponding monitoring data is abnormal, and an alarm signal is issued to alert the user.
In one possible design, the databases include Redis; writing each piece of integrated data into the corresponding database according to its tag includes the following step: if the tag of the integrated data indicates that frequent query and read-write actions need to be performed on the integrated data, writing the integrated data into Redis.
In this embodiment, data stored in Redis can be queried and read or written more conveniently and quickly; storing the integrated data that meets this requirement in Redis improves the running speed and efficiency of the overall data processing method.
In one possible design, the databases include MySQL; writing each piece of integrated data into the corresponding database according to its tag includes the following step: if the tag of the integrated data indicates that the data volume of the integrated data is small or that the integrated data needs to be provided to a fixed report, writing the integrated data into MySQL.
In this embodiment, data stored in MySQL can be more conveniently provided to fixed reports; storing the integrated data that meets this requirement in MySQL improves the running speed and efficiency of the overall data processing method.
In one possible design, the databases include GreenPlum; writing each piece of integrated data into the corresponding database according to its tag includes the following step: if the tag of the integrated data indicates that multi-dimensional analysis needs to be performed on the integrated data, writing the integrated data into GreenPlum.
In this embodiment, data stored in GreenPlum can be more conveniently subjected to multi-dimensional analysis; storing the integrated data that meets this requirement in GreenPlum improves the running speed and efficiency of the overall data processing method.
In one possible design, after each piece of integrated data is written into the corresponding database according to its tag, the method further includes: using a unified service platform to acquire and process data from at least one of the plurality of databases according to the service type.
In the above embodiment, the unified service platform supports multiple service types; it can acquire data from the corresponding database according to the service type, and each request may draw from any of the multiple databases, thereby multiplexing the databases and reducing cost.
In one possible design, the method further comprises: analyzing the operation rules of the monitoring data by using analysis code to obtain the source and destination of the monitoring data and the operations the monitoring data has undergone; and determining the topological relationship among the monitoring data according to that source, destination and operation history.
In the above embodiment, the analysis code can be used to analyze the real-time processing logic of the monitoring data, recording where each piece of data is obtained from and where it is sent after processing, thereby constructing the lineage (blood) relationships between data and improving metadata management.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including: the original data acquisition module is used for acquiring a plurality of original data; the data storage module is used for sending the plurality of original data to a distributed message queue Kafka; the integrated data acquisition module is used for acquiring the plurality of original data from Kafka by utilizing the Flink and carrying out ETL processing to acquire a plurality of integrated data, wherein the integrated data is provided with a label; the data writing module is used for writing each integrated data into a corresponding database according to the label of each integrated data; and the data monitoring module is used for collecting monitoring data from the Kafka, the Flink and the API interfaces corresponding to the databases and monitoring the monitoring data, wherein the monitoring data comprises original data and integrated data.
In one possible design, the data monitoring module is specifically configured to determine whether a specific value of the monitored data exceeds a corresponding numerical range; if yes, an alarm signal is sent out.
In one possible design, the data monitoring module is specifically configured to determine whether a heartbeat signal of the monitoring data is received within a preset time period; if not, an alarm signal is sent out.
In one possible design, the databases include Redis; the data writing module is specifically configured to write the integrated data into Redis when the tag of the integrated data indicates that frequent query and read-write actions need to be performed on it.
In one possible design, the databases include MySQL; the data writing module is specifically configured to write the integrated data into MySQL when the tag of the integrated data indicates that the data volume of the integrated data is small or that the integrated data needs to be provided to a fixed report.
In one possible design, the databases include GreenPlum; the data writing module is specifically configured to write the integrated data into GreenPlum when the tag of the integrated data indicates that multi-dimensional analysis needs to be performed on it.
In one possible design, the apparatus further comprises: and the platform processing module is used for acquiring and processing data from at least one database in the plurality of databases according to the service type by utilizing the unified service platform.
In one possible design, the apparatus further comprises: the rule analysis module is used for analyzing the operation rule of the monitoring data by utilizing the analysis code to obtain the source and the destination of the monitoring data and the operation process of the monitoring data; and the topological relation determining module is used for determining the topological relation among the monitoring data according to the source and the destination of the monitoring data and the operation process of the monitoring data.
In a third aspect, the present application provides an electronic device comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the first aspect or any alternative implementation of the first aspect.
In a fourth aspect, the present application provides a readable storage medium having stored thereon a computer program which when executed by a processor performs the method of the first aspect or any alternative implementation of the first aspect.
In a fifth aspect, the application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any of the possible implementations of the first aspect.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating steps of step S140 in FIG. 1;
FIG. 3 is a flowchart illustrating a portion of steps of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
In a comparative embodiment, the existing real-time data warehouse is developed based on a chimney (siloed) architecture, in which individual requirements are computed separately, data calibers are often inconsistent, and data consistency is difficult to guarantee. The data processing method provided by the embodiment of the application can use Flink to perform ETL processing on the original data and monitor the data generated at each stage of the entire data processing flow; this makes it easier to grasp how the data changes as it is processed, unifies the data calibers, and guarantees data consistency.
Fig. 1 is a schematic diagram of a data processing method according to an embodiment of the present application, where the data processing method may be executed by an electronic device, and the electronic device may be a server or a terminal device, and the data processing method according to the embodiment of the present application includes steps S110 to S150 as follows:
step S110, obtaining a plurality of original data.
The raw data includes data generated by the business system and event data. Data generated by the business system includes user transaction information, complaint information, and the like; event data includes user browsing logs, gateway logs, and the like.
The raw data may be obtained as follows: it is collected with data collection tools, typically by accessing the data source; such tools include Canal, Flume, OGG for Big Data, and the like.
Step S120, transmitting the plurality of raw data to the distributed message queue Kafka.
Data generated by the business system, usually in the form of a MySQL binlog, is fed into Kafka; event data, which typically exists in JSON format, is also fed into Kafka.
Step S130, obtaining the plurality of original data from Kafka by utilizing Flink and performing ETL processing to obtain a plurality of integrated data, wherein each piece of integrated data carries a tag.
The raw data is obtained from Kafka using the Flink real-time computing framework, and Extract-Transform-Load (ETL) processing is performed on it to obtain tagged integrated data; the tag reflects an attribute of the integrated data.
A real-time data warehouse built on the Flink real-time computing framework is highly extensible, and each module of the system has high cohesion and low coupling, which facilitates subsequent architectural evolution. The data model has clear layers; when the business changes, fast iteration can be achieved by changing only the bottom-layer logic and assembling by module, so the change is responded to quickly. Construction and maintenance costs are low, modeling is unified, repeated computation is reduced, and data consistency is guaranteed.
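The ETL step of S130 can be sketched as a plain transformation function. This is a minimal illustration, not the patent's implementation: the field names (`user_id`, `amount`, `hot`, `for_report`) and the tag vocabulary are hypothetical, since the patent does not specify them.

```python
def etl_transform(raw_record: dict) -> dict:
    """Clean one raw record and attach a tag reflecting its attributes (S130)."""
    integrated = {
        "user_id": raw_record.get("user_id"),
        "amount": float(raw_record.get("amount", 0)),  # normalize string amounts
    }
    # The tag later decides which database the record is written to (S140).
    if raw_record.get("hot"):            # needs frequent query and read-write
        integrated["tag"] = "frequent_rw"
    elif raw_record.get("for_report"):   # small volume, used by fixed reports
        integrated["tag"] = "fixed_report"
    else:                                # needs multi-dimensional analysis
        integrated["tag"] = "multi_dim"
    return integrated

record = etl_transform({"user_id": "u1", "amount": "3.5", "hot": True})
```

In a real deployment this logic would run inside a Flink job reading from Kafka; here it is reduced to a pure function for clarity.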
And step S140, writing each integrated data into a corresponding database according to the label of each integrated data.
Optionally, referring to fig. 2, in a specific embodiment, the databases may include Redis, MySQL, and GreenPlum, and step S140 may include the following steps S141 to S143:
in step S141, if the tag of the integrated data indicates that the integrated data needs to be frequently queried and read/written, the integrated data is written into the Redis.
In step S142, if the tag of the integrated data indicates that the data volume of the integrated data is small or that the integrated data needs to be provided to a fixed report, the integrated data is written into MySQL.
In step S143, if the tag of the integrated data indicates that multi-dimensional analysis needs to be performed on the integrated data, the integrated data is written into GreenPlum.
Storing the integrated data that meets each requirement in the corresponding database improves the running speed and efficiency of the overall data processing method.
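Steps S141 to S143 amount to a tag-to-database routing table, which can be sketched as follows. The tag values and the in-memory `sinks` stand-in for real database writes are illustrative assumptions.

```python
def route_integrated_data(record: dict, sinks: dict) -> str:
    """Write one tagged record to the sink matching its tag (steps S141-S143)."""
    tag_to_db = {
        "frequent_rw": "redis",      # S141: frequent query and read-write actions
        "fixed_report": "mysql",     # S142: small volume / fixed reports
        "multi_dim": "greenplum",    # S143: multi-dimensional analysis
    }
    target = tag_to_db[record["tag"]]
    sinks[target].append(record)     # stand-in for the real database write
    return target

sinks = {"redis": [], "mysql": [], "greenplum": []}
chosen = route_integrated_data({"tag": "fixed_report", "v": 1}, sinks)
```

Keeping the routing in one table means a new database or tag only extends the mapping, without touching the ETL logic upstream.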
After step S140, step S150 is performed: monitoring data is collected from Kafka, Flink and the API interfaces corresponding to the databases, and the monitoring data is monitored.
The monitoring data comprises the original data and the integrated data, and the API interfaces may be interface services provided by Elasticsearch and HBase. Through the API interfaces and programs provided by Kafka, Flink, the multiple databases and other components, staff can monitor the state of each task, thereby realizing full-link monitoring from data access and data processing to data serving, as well as monitoring of data quality. For data-quality monitoring, the monitoring data is typically computed in real time according to monitoring rules, which include data-volume fluctuation, primary-key uniqueness, and the like. The monitoring results of the monitoring data and of the data quality can be displayed through the open-source tool Grafana.
After the plurality of original data is obtained, it is stored in Kafka; Flink then obtains the original data from Kafka and performs ETL processing to obtain integrated data (i.e., the processed original data), each piece of which carries a tag. Each piece of integrated data can be written into the corresponding database according to its tag. The original data and the integrated data may be collected from Kafka, Flink and the API interfaces corresponding to the multiple databases and monitored. The data processing method provided by the embodiment of the application can use Flink to perform ETL processing on the original data and monitor the data generated at each stage of the entire data processing flow, so that the data calibers are unified and data consistency is guaranteed.
Optionally, in a specific embodiment, step S150 includes: determining whether a specific value of the monitoring data exceeds its corresponding numerical range; if so, issuing an alarm signal.
When the monitoring data is monitored, a specific value of the monitoring data can be obtained and compared against the numerical range corresponding to that monitoring data; if the value exceeds the range, the data is abnormal, and an alarm signal is issued to alert the user.
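The range check just described can be sketched as a small function; the metric name and bounds are hypothetical examples, and the returned dict stands in for whatever alarm signal the system actually emits.

```python
def check_value_range(name: str, value: float, low: float, high: float):
    """Return an alarm signal if a monitored value leaves its [low, high] range."""
    if low <= value <= high:
        return None  # value within the corresponding numerical range: no alarm
    return {"alarm": name, "value": value, "expected": (low, high)}

# e.g. row-count fluctuation monitoring: 900..1100 rows expected per window
ok = check_value_range("row_count", 950, 900, 1100)
alarm = check_value_range("row_count", 120, 900, 1100)
```

A rule engine would evaluate many such (metric, range) pairs in real time over the collected monitoring data.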
Step S150 may further include: determining whether a heartbeat signal of the monitoring data is received within a preset time period; if not, issuing an alarm signal.
When the monitoring data is monitored, a timer can be started each time a heartbeat signal corresponding to the monitoring data is received. If the elapsed time since the heartbeat was last received exceeds the preset time period, the corresponding monitoring data is abnormal; for example, if the heartbeat signal of Kafka is not received within the corresponding preset time period, the Kafka service can be judged unavailable, and an alarm signal is issued to alert the user.
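The heartbeat timeout can be expressed as a comparison between the time since the last heartbeat and the preset period. Timestamps here are plain floats (seconds) and the alarm dict is a placeholder for the real signal.

```python
def heartbeat_alarm(last_seen: float, now: float, timeout: float):
    """Raise an alarm when no heartbeat has arrived within the preset period."""
    silent_for = now - last_seen
    if silent_for <= timeout:
        return None  # heartbeat is fresh enough: the component is considered alive
    return {"alarm": "heartbeat missing", "silent_for": silent_for}

# e.g. Kafka heartbeat expected at least once per 60 seconds
fresh = heartbeat_alarm(last_seen=100.0, now=130.0, timeout=60.0)
stale = heartbeat_alarm(last_seen=100.0, now=200.0, timeout=60.0)
```

In practice `last_seen` is updated on every received heartbeat and the check runs periodically for each monitored component.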
The alarm signal may be an audible and visual alert, such as an intermittently flashing light or a continuous alarm sound; it may also be a communication message, such as a push message from a communication application, an SMS notification, a telephone notification, and the like. The specific type of alarm signal should not be construed as limiting the application.
Optionally, when the alarm signal is a communication information signal, the format may be as follows:
alarm event ID:1
Alarm type:
job task: task name
Data time: 20200217
Alarm details: task operation failure
Alarm time: 00:00:0
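The text format above can be rendered by a small helper; the argument values below are the placeholders from the example, and the exact field order simply mirrors the format shown.

```python
def format_alarm_message(event_id, alarm_type, task, data_time, detail, alarm_time):
    """Render an alarm as the six-line text format shown above."""
    return "\n".join([
        f"Alarm event ID: {event_id}",
        f"Alarm type: {alarm_type}",
        f"Job task: {task}",
        f"Data time: {data_time}",
        f"Alarm details: {detail}",
        f"Alarm time: {alarm_time}",
    ])

msg = format_alarm_message(1, "job", "task name", "20200217",
                           "Task operation failure", "00:00:00")
```

The rendered string can then be sent through any of the communication channels mentioned above (push message, SMS, telephone notification).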
Optionally, after step S140, the method may further include: and acquiring and processing the data from at least one database in the plurality of databases according to the service type by utilizing the unified service platform.
There are multiple service types. For example, if a certain service type acquires one piece of data per execution, the unified service platform correspondingly acquires one piece of data from the interface of one of the multiple databases (the Redis database, the MySQL database, the GreenPlum database).
For another example, if a certain service type queries data according to a query condition, the unified service platform correspondingly obtains a plurality of pieces of data from one of the multiple databases (the Redis database, the MySQL database, the GreenPlum database) and screens them according to the query condition.
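The two service types just described can be sketched as methods on a platform object. The class name, method names and the in-memory lists standing in for the Redis/MySQL/GreenPlum interfaces are all illustrative assumptions.

```python
class UnifiedServicePlatform:
    """Dispatches requests to one of several databases by service type (sketch)."""

    def __init__(self, databases: dict):
        # In reality these would be connections; here, plain lists of records.
        self.databases = databases

    def get_one(self, db: str):
        """Service type 1: fetch a single piece of data from the chosen database."""
        return self.databases[db][0]

    def query(self, db: str, predicate):
        """Service type 2: screen records from the chosen database by a condition."""
        return [r for r in self.databases[db] if predicate(r)]

platform = UnifiedServicePlatform({
    "mysql": [{"id": 1}, {"id": 2}],
    "redis": [{"id": 9}],
})
```

Because every service type goes through the same platform, each underlying database is reused across services rather than duplicated per requirement.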
Referring to fig. 3, in a specific implementation, the data processing method provided by the embodiment of the present application may further include steps S210 to S220, where step S210 may be performed after step S140, or after the step of using the unified service platform to acquire and process data from at least one of the plurality of databases according to the service type.
Step S210, analyzing the operation rule of the monitoring data by utilizing the analysis code to obtain the source and the destination of the monitoring data and the operation process of the monitoring data.
Step S220, determining a topological relation between the monitoring data according to the source and destination of the monitoring data and the operation process undergone by the monitoring data.
The analysis code can be stored in the distributed version control system Git. The operation rules of the monitoring data may specifically be Flink SQL execution plans and data dependency relationships; analyzing these with the analysis code yields the topological relationship between the monitoring data. The source and destination of the monitoring data can be represented as tables connected through nodes, with the specific operations marked on the nodes.
The analysis code can be used to analyze the real-time processing logic of the monitoring data, recording where each piece of data is obtained from and where it is sent after processing, thereby constructing the lineage (blood) relationships between data and improving metadata management.
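The topology in S220 can be sketched as an adjacency structure built from (source, operation, destination) records. The tuple format and the table names are assumptions; a real implementation would extract them from the Flink SQL execution plans rather than receive them directly.

```python
def build_lineage(ops):
    """Build a lineage topology from (source, operation, destination) records."""
    graph = {}
    for src, op, dst in ops:
        # Each edge records which operation moved data from src to dst.
        graph.setdefault(src, []).append({"op": op, "to": dst})
    return graph

lineage = build_lineage([
    ("kafka.raw_orders", "flink ETL", "redis.hot_orders"),
    ("kafka.raw_orders", "flink ETL", "greenplum.orders_cube"),
])
```

Walking this graph from any table answers "where does this data come from and where does it go", which is exactly the lineage question metadata management needs.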
Referring to fig. 4, fig. 4 shows a data processing apparatus provided by an embodiment of the present application, where the apparatus 400 includes:
the raw data acquisition module 410 is configured to acquire a plurality of raw data.
The data storing module 420 is configured to send the plurality of raw data to the distributed message queue Kafka.
And the integrated data obtaining module 430 is configured to obtain the plurality of raw data from Kafka by using Flink and perform ETL processing to obtain a plurality of integrated data, where each piece of integrated data carries a tag.
The data writing module 440 is configured to write each piece of integrated data into a corresponding database according to the tag of each piece of integrated data.
And the data monitoring module 450 is configured to collect monitoring data from the Kafka, the Flink and the API interfaces corresponding to the databases, and monitor the monitoring data, where the monitoring data includes original data and integrated data.
The data writing module 440 is specifically configured to write the integrated data into Redis when the tag of the integrated data indicates that the integrated data needs to be frequently queried and read or written.
The data writing module 440 is specifically configured to write the integrated data into MySQL when the tag of the integrated data indicates that the data volume of the integrated data is small or that the integrated data needs to be provided to a fixed report.
The data writing module 440 is specifically configured to write the integrated data into GreenPlum when the tag of the integrated data indicates that multi-dimensional analysis needs to be performed on the integrated data.
The data monitoring module 450 is specifically configured to determine whether a specific value of the monitored data exceeds a corresponding numerical range; if yes, an alarm signal is sent out.
The data monitoring module 450 is specifically configured to determine whether a heartbeat signal of the monitoring data is received within a preset time period; if not, an alarm signal is sent out.
The apparatus further comprises:
The platform processing module is configured to obtain and process data from at least one of the plurality of databases according to the service type by using the unified service platform.
The rule analysis module is configured to analyze the operation rules of the monitoring data by using the analysis code to obtain the source and destination of the monitoring data and the operation process of the monitoring data.
The topological relation determining module is configured to determine the topological relations among the monitoring data according to the source and destination of the monitoring data and the operation process of the monitoring data.
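A topological relation of this kind can be represented as a directed graph built from (source, destination) pairs recovered by the rule analysis. The lineage-record shape below is an assumption for illustration, not the format the embodiments actually parse out of the Flink SQL execution plan:

```python
def build_topology(lineage_records):
    """Build adjacency sets mapping each data source to the
    destinations its data flows to, from parsed lineage records."""
    topology = {}
    for rec in lineage_records:
        topology.setdefault(rec["source"], set()).add(rec["destination"])
    return topology

# Hypothetical lineage records, as the rule analysis module
# might emit them after parsing the execution plan.
lineage = [
    {"source": "kafka:raw", "destination": "flink:etl", "op": "consume"},
    {"source": "flink:etl", "destination": "Redis", "op": "write"},
    {"source": "flink:etl", "destination": "GreenPlum", "op": "write"},
]
graph = build_topology(lineage)
```

Such a graph makes the "source and destination" relations queryable, e.g. for tracing which downstream stores are affected when one upstream source is abnormal.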
The working principle of the data processing apparatus provided in the embodiments of the present application is the same as that of the data processing method described above and will not be repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via some communication interfaces, devices or units, in electrical, mechanical, or other forms.
Further, the units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Furthermore, functional modules in various embodiments of the present application may be integrated together to form a single portion, or each module may exist alone, or two or more modules may be integrated to form a single portion.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (10)
1. A method of data processing, the method comprising:
obtaining a plurality of original data;
transmitting the plurality of raw data to a distributed message queue Kafka;
obtaining the plurality of original data from Kafka by using Flink and performing ETL processing to obtain a plurality of integrated data, wherein each piece of integrated data carries a tag;
writing each integrated data into a corresponding database according to the label of each integrated data;
collecting monitoring data from the Kafka, the Flink and API interfaces corresponding to a plurality of databases, and monitoring the monitoring data, wherein the monitoring data comprises original data and integrated data;
the method further comprises the steps of:
analyzing the operation rules of the monitoring data by using analysis code to obtain the source and destination of the monitoring data and the operation process of the monitoring data, wherein the operation rules are a Flink SQL execution plan and data dependency relationships;
determining the topological relation among the monitoring data according to the source and the destination of the monitoring data and the operation process of the monitoring data;
and analyzing the Flink SQL execution plan and the data dependency relationships by using the analysis code to obtain the topological relations between the monitoring data.
2. The method of claim 1, wherein monitoring the monitoring data comprises:
judging whether the specific value of the monitoring data exceeds the corresponding numerical range;
if yes, an alarm signal is sent out.
3. The method of claim 1, wherein monitoring the monitoring data comprises:
judging whether a heartbeat signal of the monitoring data is received within a preset time period;
if not, an alarm signal is sent out.
4. The method of claim 1, wherein the database comprises a Redis;
the writing each integrated data into the corresponding database according to the label of the integrated data comprises the following steps:
if the tag of the integrated data indicates that frequent query and read-write actions need to be performed on the integrated data, writing the integrated data into Redis.
5. The method of claim 1, wherein the database comprises MySQL;
the writing each integrated data into the corresponding database according to the label of the integrated data comprises the following steps:
if the tag of the integrated data indicates that the data volume corresponding to the integrated data is small or that the integrated data needs to be provided for a fixed report, writing the integrated data into MySQL.
6. The method of claim 1, wherein the database comprises GreenPlum;
the writing each integrated data into the corresponding database according to the label of the integrated data comprises the following steps:
if the tag of the integrated data indicates that multi-dimensional analysis needs to be performed on the integrated data, writing the integrated data into GreenPlum.
7. The method of claim 1, wherein after said writing each piece of the integrated data into a corresponding database according to the tag of the integrated data, the method further comprises:
and acquiring and processing the data from at least one database in the plurality of databases according to the service type by utilizing the unified service platform.
8. A data processing apparatus, the apparatus comprising:
the original data acquisition module is used for acquiring a plurality of original data;
the data storage module is used for sending the plurality of original data to a distributed message queue Kafka;
the integrated data acquisition module is used for acquiring the plurality of original data from Kafka by utilizing the Flink and carrying out ETL processing to acquire a plurality of integrated data, wherein the integrated data is provided with a label;
the data writing module is used for writing each integrated data into a corresponding database according to the label of each integrated data;
the data monitoring module is used for collecting monitoring data from the Kafka, the Flink and the API interfaces corresponding to the databases and monitoring the monitoring data, wherein the monitoring data comprises original data and integrated data;
the device also comprises an analysis module, wherein the analysis module is used for: analyzing the operation rule of the monitoring data by utilizing the analysis code to obtain the source and the destination of the monitoring data and the operation process of the monitoring data; the operation rule is a flinksql execution plan and a data dependency relationship; determining the topological relation among the monitoring data according to the source and the destination of the monitoring data and the operation process of the monitoring data; and analyzing the flinksql execution plan and the data dependency relationship by using the analysis code to obtain the topological relationship between the monitoring data.
9. An electronic device, comprising: a processor, a storage medium, and a bus, the storage medium storing machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the storage medium via the bus, and the processor executes the machine-readable instructions to perform the method of any one of claims 1-7.
10. A readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, performs the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010133602.XA CN111339175B (en) | 2020-02-28 | 2020-02-28 | Data processing method, device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111339175A (en) | 2020-06-26
CN111339175B (en) | 2023-08-11
Family
ID=71184041
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010133602.XA Active CN111339175B (en) | 2020-02-28 | 2020-02-28 | Data processing method, device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111339175B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111970195B (en) * | 2020-08-13 | 2022-04-19 | 上海哔哩哔哩科技有限公司 | Data transmission method and streaming data transmission system |
CN112181678A (en) * | 2020-09-10 | 2021-01-05 | 珠海格力电器股份有限公司 | Service data processing method, device and system, storage medium and electronic device |
CN112288907A (en) * | 2020-10-28 | 2021-01-29 | 山东超越数控电子股份有限公司 | Vehicle real-time monitoring method |
CN112287007B (en) * | 2020-10-30 | 2022-02-11 | 常州微亿智造科技有限公司 | Industrial production data real-time processing method and system based on Flink SQL engine |
CN112597203A (en) * | 2020-12-28 | 2021-04-02 | 恩亿科(北京)数据科技有限公司 | General data monitoring method and system based on big data platform |
CN112765130A (en) * | 2021-01-20 | 2021-05-07 | 银盛支付服务股份有限公司 | Data warehouse construction method and system, computer equipment and storage medium |
CN112527945A (en) * | 2021-02-10 | 2021-03-19 | 中关村科学城城市大脑股份有限公司 | Method and device for processing geographic space big data |
CN113111107B (en) * | 2021-04-06 | 2023-10-13 | 创意信息技术股份有限公司 | Data comprehensive access system and method |
CN114036183A (en) * | 2021-11-24 | 2022-02-11 | 杭州安恒信息技术股份有限公司 | A data ETL processing method, device, equipment and medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105939393A (en) * | 2016-06-30 | 2016-09-14 | 北京奇虎科技有限公司 | Task operating state monitoring method and system |
CN106202324A (en) * | 2016-06-30 | 2016-12-07 | 北京奇虎科技有限公司 | The data processing method of a kind of real-time calculating platform and device |
CN106209482A (en) * | 2016-09-13 | 2016-12-07 | 郑州云海信息技术有限公司 | A kind of data center monitoring method and system |
CN109684156A (en) * | 2018-08-27 | 2019-04-26 | 平安科技(深圳)有限公司 | Monitoring method, device, terminal and storage medium based on mixed mode applications |
CN109726074A (en) * | 2018-08-31 | 2019-05-07 | 网联清算有限公司 | Log processing method, device, computer equipment and storage medium |
CN109753531A (en) * | 2018-12-26 | 2019-05-14 | 深圳市麦谷科技有限公司 | A kind of big data statistical method, system, computer equipment and storage medium |
CN109829765A (en) * | 2019-03-05 | 2019-05-31 | 北京博明信德科技有限公司 | Method, system and device based on Flink and Kafka real time monitoring sales data |
CN110413701A (en) * | 2019-08-08 | 2019-11-05 | 江苏满运软件科技有限公司 | Distributed data base storage method, system, equipment and storage medium |
CN110442628A (en) * | 2019-07-09 | 2019-11-12 | 恩亿科(北京)数据科技有限公司 | A kind of data monitoring method, system and computer equipment |
CN110750562A (en) * | 2018-07-20 | 2020-02-04 | 武汉烽火众智智慧之星科技有限公司 | Storm-based real-time data comparison early warning method and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10652174B2 (en) * | 2016-01-07 | 2020-05-12 | Wipro Limited | Real-time message-based information generation |
- 2020-02-28 CN CN202010133602.XA patent/CN111339175B/en active Active
Non-Patent Citations (1)
Title |
---|
Design and Implementation of a High-Performance Stream-Oriented Big Data Processing System; Meng Wang; 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC); pp. 1-4 *
Also Published As
Publication number | Publication date |
---|---|
CN111339175A (en) | 2020-06-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111339175B (en) | Data processing method, device, electronic equipment and readable storage medium | |
CN115809183A (en) | Method for discovering and disposing information-creating terminal fault based on knowledge graph | |
CN106487574A (en) | Automatic operating safeguards monitoring system | |
CN111339073A (en) | Real-time data processing method and device, electronic equipment and readable storage medium | |
CN113179173B (en) | Operation and maintenance monitoring system for expressway system | |
CN110309130A (en) | A kind of method and device for host performance monitor | |
CN105718351A (en) | Hadoop cluster-oriented distributed monitoring and management system | |
CN112800061B (en) | Data storage method, device, server and storage medium | |
CN117971606B (en) | Log management system and method based on elastic search | |
CN101989931A (en) | Operation alarm processing method and device | |
CN113067717A (en) | Network request log chain tracking method, full link call monitoring system and medium | |
CN112579552A (en) | Log storage and calling method, device and system | |
CN114095333A (en) | Network troubleshooting method, device, equipment and readable storage medium | |
CN113032252A (en) | Method and device for collecting buried point data, client device and storage medium | |
CN114090529A (en) | A log management method, device, system and storage medium | |
CN114780335A (en) | Correlation method, apparatus, computer equipment and storage medium for monitoring data | |
CN113342619A (en) | Log monitoring method and system, electronic device and readable medium | |
CN110677271B (en) | Big data alarm method, device, equipment and storage medium based on ELK | |
CN118627023B (en) | An analysis system for tracking calls between microservices | |
CN109522349B (en) | Cross-type data calculation and sharing method, system and equipment | |
CN116701525A (en) | Early warning method and system based on real-time data analysis and electronic equipment | |
KR102787233B1 (en) | System and method for log monitoring processing based on latent space | |
CN116795631A (en) | Service system monitoring alarm method, device, equipment and medium | |
CN112596974A (en) | Full link monitoring method, device, equipment and storage medium | |
CN113900898B (en) | Data processing system, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||