CN113849505B - Data compression method and device - Google Patents

Info

Publication number
CN113849505B
Authority
CN
China
Prior art keywords
data
compressed
processed
piece
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111076416.8A
Other languages
Chinese (zh)
Other versions
CN113849505A (en)
Inventor
陆明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202111076416.8A
Publication of CN113849505A
Application granted
Publication of CN113849505B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the application provide a data compression method and device. The method comprises: acquiring at least two pieces of data to be processed; determining at least one piece of data to be compressed among the at least two pieces of data to be processed by performing data analysis on each piece of data to be processed; performing feature analysis on each piece of data to be compressed to determine a sampling step length of the service corresponding to that piece of data; and resampling the data of the service corresponding to the at least one piece of data to be compressed using the sampling step length, to obtain at least one piece of compressed data.

Description

Data compression method and device
Technical Field
Embodiments of the application relate to the field of big data, and in particular to a data compression method and device.
Background
In the information age, most information exists in the form of time series data, and when time series data is processed and analyzed, a large number of data curves may be presented in a single graph at the same time. When many curves are drawn in one graph, the volume of data transmitted over the network is large, and the front-end page consumes more resources to present the data.
The related art reduces the amount of transmitted data by compressing all time series data, but this causes data distortion. How to provide a data compression method that retains the characteristics of the original data is therefore a problem in urgent need of a solution.
Disclosure of Invention
In view of the problems in the related art, embodiments of the application provide a data compression method and device.
The technical solutions of the embodiments of the application are implemented as follows:
An embodiment of the application provides a data compression method, comprising:
acquiring at least two pieces of data to be processed;
determining at least one piece of data to be compressed among the at least two pieces of data to be processed by performing data analysis on each piece of data to be processed;
performing feature analysis on each piece of data to be compressed, and determining a sampling step length of the service corresponding to each piece of data to be compressed;
and resampling the data of the service corresponding to the at least one piece of data to be compressed using the sampling step length, to obtain at least one piece of compressed data.
An embodiment of the application provides a data compression apparatus, comprising:
an acquisition module for acquiring at least two pieces of data to be processed;
a data analysis module for determining at least one piece of data to be compressed among the at least two pieces of data to be processed by performing data analysis on each piece of data to be processed;
a feature analysis module for performing feature analysis on each piece of data to be compressed and determining the sampling step length of the service corresponding to each piece of data to be compressed;
and a resampling module for resampling the data of the service corresponding to the at least one piece of data to be compressed using the sampling step length, to obtain at least one piece of compressed data.
An embodiment of the application provides a data compression apparatus, comprising:
a memory for storing executable instructions; and a processor for implementing the data compression method described above when executing the executable instructions stored in the memory.
An embodiment of the application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to implement the data compression method described above.
According to the data compression method and device provided by the embodiments of the application, at least one piece of data to be compressed is determined by performing data analysis on the at least two acquired pieces of data to be processed; feature analysis is performed on each piece of data to be compressed to determine the sampling step length to be used when compressing it; and the data of the service corresponding to the at least one piece of data to be compressed is resampled using the determined sampling step length to obtain compressed data. The embodiments of the application thus select the data to be compressed according to the characteristics of each piece of data to be processed, and resample different data to be compressed with different step lengths. As a result, the data is compressed without reducing the quality of data analysis, the amount of transmitted data is reduced, and the consumption of resources and memory is reduced.
Drawings
Fig. 1 is a schematic diagram of an application scenario of a data compression method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 4 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 6 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 7 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 8 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 9 is a schematic flow chart of a data compression method according to an embodiment of the present application;
Fig. 10 is a schematic diagram of a composition structure of a data compression device according to an embodiment of the present application;
Fig. 11 is a schematic diagram of a composition structure of a data compression apparatus according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings, in order to make its objects, technical solutions and advantages clearer. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which embodiments of this application belong. The terminology used in the embodiments of the application is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
When analyzing different time series data, the related art can qualitatively judge the quality of data capture, and the time intervals in which data is missing, by observing the density of the time series data. A histogram of the standard deviations of a data center's monitoring indexes shows that most monitoring curves fluctuate very little, and only a small amount of monitoring data for a few indexes fluctuates appreciably. Observation of the time series visualization components associated with monitoring systems (such as the Zabbix or Grafana visualization components) shows that many tools can generate a resampling step size dynamically, but that setting applies to all transmitted curves rather than being chosen for each curve according to its data characteristics. The compression methods of the related art can therefore distort the compressed data, so that it no longer faithfully reflects the characteristics of the original data.
In view of the problems in the related art, an embodiment of the application provides a data compression method in which data analysis is performed on at least two acquired pieces of data to be processed to determine at least one piece of data to be compressed, feature analysis is performed on each piece of data to be compressed to determine the sampling step length to be used when compressing it, and the data of the service corresponding to the at least one piece of data to be compressed is resampled using the determined sampling step length, thereby compressing the data. The embodiments of the application thus select the data to be compressed according to the characteristics of each piece of data to be processed, and resample different data to be compressed with different step lengths. As a result, the data is compressed without reducing the quality of data analysis, the amount of transmitted data is reduced, and the consumption of resources and memory is reduced.
Fig. 1 is a schematic diagram of an application scenario of a data compression method according to an embodiment of the present application. As shown in Fig. 1, a data compression system for implementing the data compression method includes a terminal 10, a network 20 and a server 30. At least two pieces of time series data may be displayed on the terminal 10. The server 30 receives, through the network 20, a data compression request sent by the terminal 10. The server 30 performs data analysis on at least two pieces of data to be processed to determine at least one piece of data to be compressed, performs feature analysis on each piece of data to be compressed to determine the sampling step length to be used when compressing it, and resamples the data of the service corresponding to the at least one piece of data to be compressed using the determined sampling step length to obtain compressed data. The server 30 then sends the compressed data to the terminal 10 through the network 20. The terminal 10 visualizes the compressed data, and the received compressed data can be displayed directly to the user on the current interface 10-1.
The data compression method provided by the embodiment of the present application will be described below in connection with exemplary applications and implementations of the server provided by the embodiment of the present application. Referring to fig. 2, fig. 2 is a flow chart of a data compression method according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 2.
Step S201, acquiring at least two pieces of data to be processed.
In some embodiments, the data to be processed may be data to be analyzed, and the data to be analyzed may be a data sequence recorded in chronological order, such as time series data.
Step S202, determining at least one piece of data to be compressed among the at least two pieces of data to be processed by performing data analysis on each piece of data to be processed.
In the embodiments of the application, monitoring of the data to be processed shows that most monitoring curves fluctuate very little, and only a small amount of monitoring data for a few indexes fluctuates appreciably. Data analysis of the data to be processed may therefore refer to fluctuation analysis of the data, for example analyzing the difference between extreme points of the data or calculating the standard deviation of each piece of data to be processed, and whether a piece of data to be processed needs to be compressed is determined from the result of the fluctuation analysis. Through data analysis, the data that needs to be compressed can be identified quickly within a large amount of data to be processed.
Here, whether a piece of data to be processed needs to be compressed can be determined by checking, from its extremum difference or standard deviation, whether its fluctuation reaches a preset fluctuation threshold. When the extremum difference of a piece of data is less than or equal to the preset fluctuation threshold, the data fluctuates little, compressing it does not distort it, and the compressed data still faithfully reflects the data characteristics. When the extremum difference of a piece of data is greater than the preset fluctuation threshold, the data fluctuates more and the compressed data might not present the real data characteristics, so the degree of dispersion of the data must be analyzed, for example by standard deviation analysis, to determine whether the data is compressed.
Step S203, performing feature analysis on each piece of data to be compressed, and determining a sampling step length of the service corresponding to each piece of data to be compressed.
In some embodiments, after the data to be compressed is determined, the sampling step length to be used when compressing it is determined by performing feature analysis on the data to be compressed. Here, feature analysis may refer to analyzing the sampling interval of the data to be compressed according to its features, to obtain the sampling step length of each piece of data for compression.
Step S204, resampling the data of the service corresponding to the at least one piece of data to be compressed using the sampling step length, to obtain at least one piece of compressed data.
In the embodiments of the application, each piece of data to be processed corresponds to a service. The service corresponding to each piece of data to be compressed is resampled using the sampling step length corresponding to that piece of data, to obtain the compressed data.
In the embodiments of the application, data analysis is performed on the at least two acquired pieces of data to be processed to determine at least one piece of data to be compressed, feature analysis is performed on each piece of data to be compressed to determine the sampling step length to be used when compressing it, and the data of the service corresponding to the at least one piece of data to be compressed is resampled using the determined sampling step length, thereby compressing the data. The embodiments of the application thus select the data to be compressed according to the characteristics of each piece of data to be processed, and resample different data to be compressed with different step lengths, so that the data compression method compresses the data, reduces the amount of transmitted data, saves resources, and allows the compressed data to faithfully reflect the characteristics of the original data.
In some embodiments, the data to be compressed may be determined by performing fluctuation analysis on the data to be processed. Fig. 3 is a schematic flow chart of a data compression method according to an embodiment of the present application. As shown in Fig. 3, in some embodiments, step S202 may be implemented by the following steps:
Step S301, performing fluctuation analysis on each piece of data to be processed to obtain a fluctuation difference value corresponding to each piece of data to be processed.
In some embodiments, the fluctuation analysis may be any analysis method that can determine the fluctuation of data, such as extremum analysis. Extremum analysis yields the extremum difference of each piece of data to be processed, that is, its fluctuation difference value, and the fluctuation of each piece of data to be processed can be judged from this extremum difference.
Step S302, classifying the data to be processed according to the fluctuation difference value, and dividing the data to be processed into first data to be compressed or second data to be compressed.
In some embodiments, step S302 may be implemented by:
Step S3021, if the fluctuation difference is less than or equal to a preset fluctuation threshold, determining the data to be processed as first data to be compressed.
Here, the fluctuation threshold may be preset by a technician. If the fluctuation difference of a piece of data to be processed is less than or equal to the preset fluctuation threshold, the data fluctuates little and can be compressed, and it is determined to be first data to be compressed. The first data to be compressed therefore refers to data to be processed that fluctuates little.
It should be noted that the extremum analysis of the data to be processed may compute the difference between each adjacent pair of maximum and minimum values in the data, or may divide the data into intervals and compute the difference between the maximum and minimum values of each interval, in order to decide whether the data to be processed is determined as first data to be compressed.
Step S3022, if the fluctuation difference is greater than the fluctuation threshold, determining a discrete value of the data to be processed.
In some embodiments, when the extremum difference of the data to be processed is greater than the preset fluctuation threshold, the data fluctuates more, and whether it should be compressed is determined from its degree of dispersion. Here, determining the discrete value of the data to be processed may mean calculating the standard deviation or variance of the data; the resulting standard deviation or variance is the discrete value of the data to be processed. The discrete value of each piece of data to be processed is used to judge the degree of dispersion of the data.
Step S3023, if the discrete value is less than or equal to a preset discrete threshold, determining the data to be processed as second data to be compressed.
In some embodiments, the discrete threshold may be preset by a technician, for example with the aid of a tool such as a histogram. When the standard deviation or variance of the data to be processed is less than or equal to the preset discrete threshold, most of the values in the data differ only slightly from the mean of all the values, that is, the degree of dispersion of the data is small. The data can then be expected to reflect the characteristics of the original data after compression, and is determined as second data to be compressed. The second data to be compressed refers to data to be processed that fluctuates more but has a small degree of dispersion.
In some embodiments, when the standard deviation or variance of the data to be processed is greater than the preset discrete threshold, most of the values in the data differ considerably from the mean of all the values, that is, the degree of dispersion of the data is large. If such data were resampled and compressed, most of its feature points could not be captured, and the compressed data would not reflect the characteristics of the original data. Therefore, when the standard deviation or variance of the data to be processed is greater than the preset discrete threshold, the data is not resampled; it can be stored and displayed on the terminal interface together with the other, compressed data.
Step S303, determining the first data to be compressed and the second data to be compressed as the data to be compressed.
In the embodiments of the application, fluctuation analysis and dispersion analysis are performed on the data to be processed, so that the data that needs to be compressed is identified accurately and quickly. Compression reduces the amount of data transmitted to the front end, while data with many features is left uncompressed, which avoids the problem of compressed data failing to reflect the characteristics of the original data.
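Steps S301 to S303 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `classify_series`, the labels it returns, and the use of the global maximum minus the global minimum as the fluctuation difference are all hypothetical choices, and the population standard deviation stands in for the discrete value.

```python
from statistics import pstdev
from typing import Sequence


def classify_series(values: Sequence[float],
                    fluctuation_threshold: float,
                    dispersion_threshold: float) -> str:
    """Return 'first', 'second', or 'keep' for one piece of data to be processed.

    'first'  -> first data to be compressed (small fluctuation, step S3021)
    'second' -> second data to be compressed (larger fluctuation, small dispersion, step S3023)
    'keep'   -> not resampled (large dispersion)
    """
    # Assumption: the fluctuation difference is the global max minus the global min.
    fluctuation_difference = max(values) - min(values)
    if fluctuation_difference <= fluctuation_threshold:
        return "first"

    # Step S3022: the discrete value is taken here as the population standard deviation.
    discrete_value = pstdev(values)
    if discrete_value <= dispersion_threshold:
        return "second"
    return "keep"
```

With a fluctuation threshold of 1.0 and a discrete threshold of 2.0, a nearly flat series is labeled `first`, a flat series with a single spike is labeled `second`, and a series that alternates between two distant values is labeled `keep`.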
In some embodiments, since the first data to be compressed and the second data to be compressed characterize different types of data, different methods are employed for determining the corresponding sampling step size for the first data to be compressed and the second data to be compressed. Based on the foregoing embodiments, fig. 4 is a schematic flow chart of a data compression method according to an embodiment of the present application, as shown in fig. 4, in some embodiments, step S203 may be implemented by:
Step S401, analyzing the sampling interval of the first data to be compressed to obtain a visual sampling interval of the first data to be compressed.
In some embodiments, the first data to be compressed refers to data to be processed that fluctuates little. Analyzing the sampling interval of the first data to be compressed may mean analyzing the interval at which the first data to be compressed can be queried during visualization, and taking the interval covered by a single visualization of the first data to be compressed as its visual sampling interval.
In some embodiments, since the first data to be compressed fluctuates little, the data at its start time point and end time point may also be used directly as the compressed data; that is, only the data at the start time point and the end time point is collected when the first data to be compressed is resampled.
Step S402, determining the duration corresponding to the visual sampling interval as the sampling step length of the service corresponding to the first data to be compressed.
Here, each visual sampling interval has a duration, and this duration can be used as the sampling step length of the service corresponding to the first data to be compressed when the data is compressed. For example, if the total duration of the first data to be compressed is 60 minutes and sampling interval analysis gives a visual sampling interval of 1 minute, then when the service corresponding to the first data to be compressed is resampled, data is collected every 1 minute from the start of data collection until data collection is complete, and the collected data forms the compressed data.
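The 60-minute example above can be sketched as a fixed-step resampling routine. This is an illustrative sketch only: the function name `resample`, the `(timestamp, value)` tuple representation, and the 10-second original sampling interval used in the usage note are assumptions that do not come from the source.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, time series value)


def resample(points: List[Point], step_seconds: int) -> List[Point]:
    """Keep one point per sampling step, counted from the first timestamp."""
    if not points:
        return []
    kept: List[Point] = []
    next_due = points[0][0]  # the first point is always kept
    for timestamp, value in points:
        if timestamp >= next_due:
            kept.append((timestamp, value))
            # Advance to the next step boundary after this timestamp so that
            # boundaries stay aligned to the start of data collection.
            while next_due <= timestamp:
                next_due += step_seconds
    return kept
```

For 60 minutes of data originally sampled every 10 seconds (360 points, an assumed rate), `resample(points, 60)` keeps one point per minute, that is, 60 points.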
In some embodiments, for the second data to be compressed, step S203 may be further implemented by:
Step S403, obtaining an information table, where the information table includes at least a resampling step length corresponding to each service.
Here, the information table may record the resampling step length that a technician has set for each service according to the characteristics of that service. The characteristic of a service may be its function; for example, the attribute may be an information transfer function or a storage function.
Step S404, performing feature analysis on each piece of second data to be compressed to obtain the attribute of each piece of second data to be compressed.
In some embodiments, the second data to be compressed refers to data to be processed that fluctuates more but has a small degree of dispersion. Feature analysis of the data yields its attribute, and the service corresponding to each piece of second data to be compressed is determined from that attribute.
Step S405, determining the service corresponding to each piece of second data to be compressed according to the attribute of that piece of second data to be compressed.
Step S406, determining the sampling step length of the service corresponding to each piece of second data to be compressed according to the information table and the service corresponding to that piece of second data to be compressed.
In some embodiments, the information table includes the resampling step length of each service. After the service corresponding to the data is determined from the service attribute, the sampling step length of the service corresponding to the second data to be compressed can be determined by table lookup.
In the embodiments of the application, the sampling step length for each piece of data to be compressed is determined according to the characteristics of that data, so that the data compression method can set different resampling step lengths for data with different characteristics, and compression does not noticeably reduce the quality of data analysis.
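The table lookup of steps S403 to S406 can be sketched as two dictionary lookups. Everything concrete in this sketch is hypothetical: the service names, the attribute names, the step lengths, and the function name `sampling_step_for` are placeholders chosen for illustration, since the source does not specify the contents of the information table.

```python
from typing import Dict

# Hypothetical information table (step S403): service name -> resampling step in seconds.
INFO_TABLE: Dict[str, int] = {
    "message_transfer": 30,
    "storage": 300,
}

# Hypothetical mapping from a data attribute (step S404) to a service (step S405).
ATTRIBUTE_TO_SERVICE: Dict[str, str] = {
    "transfer": "message_transfer",
    "store": "storage",
}


def sampling_step_for(attribute: str) -> int:
    """Look up the sampling step length for second data to be compressed."""
    service = ATTRIBUTE_TO_SERVICE[attribute]  # step S405: attribute -> service
    return INFO_TABLE[service]                 # step S406: service -> sampling step
```

A lookup for an unknown attribute raises `KeyError` in this sketch; a real system would need a policy for services that are missing from the table, which the source does not describe.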
In some embodiments, before determining whether the data to be processed needs to be compressed, it is necessary to determine whether any data is missing from it. Based on the foregoing embodiments, Fig. 5 is a schematic flow chart of a data compression method according to an embodiment of the present application. As shown in Fig. 5, in some embodiments, the data compression method may further include the following steps:
Step S501, performing integrity detection on each piece of data to be processed.
Here, integrity detection may check whether the curve of the data to be processed is continuous. If some time points on the curve have no value, the data to be processed has missing data.
Step S502, if it is determined that any piece of data to be processed has missing data, storing that piece of data to be processed in a preset database.
Here, the preset database is used to store data that does not need to be compressed. When data is transmitted or presented, the compressed data is transmitted or presented together with the data in the preset database.
Step S503, if it is determined that the data to be processed has no missing data, performing the data analysis on the data to be processed.
In the embodiments of the application, from an operations and maintenance perspective, missing data may be caused by system downtime, a process restart, loss of monitoring data, or the like. Such data is not resampled or compressed, and the data with missing values is stored in the preset database. When no data is missing from the data to be processed, step S202 is executed, and data analysis is performed on the data to be processed to determine at least one piece of data to be compressed.
The data compression method provided by the embodiments of the application avoids resampling data with missing values, which reduces resource consumption.
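The integrity detection of steps S501 to S503 can be sketched as a gap check on timestamps. This is one possible reading, offered as an assumption: the source only says that some time points have no value, so the sketch treats any gap between consecutive timestamps that is larger than an expected collection interval as missing data. The names `has_missing_data`, `route_by_integrity`, and `expected_interval` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def has_missing_data(timestamps: Sequence[int], expected_interval: int) -> bool:
    """Step S501: report a gap when two consecutive timestamps are too far apart."""
    return any(later - earlier > expected_interval
               for earlier, later in zip(timestamps, timestamps[1:]))


def route_by_integrity(series: Dict[str, List[int]],
                       expected_interval: int) -> Tuple[List[str], List[str]]:
    """Split series names into (store uncompressed, continue to data analysis)."""
    to_store: List[str] = []    # step S502: saved in the preset database as-is
    to_analyze: List[str] = []  # step S503: passed on to step S202
    for name, timestamps in series.items():
        if has_missing_data(timestamps, expected_interval):
            to_store.append(name)
        else:
            to_analyze.append(name)
    return to_store, to_analyze
```

For example, with an expected interval of 10 seconds, timestamps `[0, 10, 40, 50]` contain a gap and are routed to storage, while `[0, 10, 20, 30]` proceed to data analysis.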
In some embodiments, each data has a static threshold above which anomalies in the data can be determined. Based on the foregoing embodiments, fig. 6 is a schematic flow chart of a data compression method provided in the embodiment of the present application, as shown in fig. 6, in some embodiments, the embodiment of the present application may further include the following steps:
and step S601, performing anomaly monitoring on each piece of data to be processed.
Step S602, determining whether the time sequence value in any data to be processed is greater than the static threshold.
In some embodiments, the data to be processed is time-series data, at least including a time-series value, and performing anomaly monitoring on the data to be processed means monitoring whether the value of each service data exceeds a preset static threshold.
Step S603, if it is determined that at least one time sequence value in any data to be processed is greater than the static threshold, the data to be processed is saved in a preset database.
Step S604, if it is determined that each time sequence value in any data to be processed is less than or equal to the static threshold, performing the data analysis on the data to be processed.
In some embodiments, for data to be processed that contains a time sequence value greater than the static threshold, either only the data interval whose time sequence values exceed the static threshold is excluded from resampling, or the entire data to be processed is saved in the preset database without being resampled or compressed. If it is determined that all the time sequence values in any data to be processed are smaller than or equal to the static threshold, step S202 is executed: the data analysis is performed on the data to be processed to determine at least one data to be compressed.
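The two handling options described above (storing the whole series, or excluding only the exceeding interval from resampling) can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical, and the series is assumed to be a plain list of time sequence values.

```python
from typing import List, Tuple


def exceeds_static_threshold(values: List[float], static_threshold: float) -> bool:
    """True if at least one time sequence value is greater than the static threshold (S602)."""
    return any(v > static_threshold for v in values)


def threshold_segments(
    values: List[float], static_threshold: float
) -> List[Tuple[int, int, bool]]:
    """Split the series into consecutive (start, end_exclusive, exceeds) index segments.

    Segments with exceeds=True would keep their original data; the remaining
    segments are candidates for resampling.
    """
    segments: List[Tuple[int, int, bool]] = []
    start = 0
    for i in range(1, len(values) + 1):
        at_end = i == len(values)
        if at_end or (values[i] > static_threshold) != (values[start] > static_threshold):
            segments.append((start, i, values[start] > static_threshold))
            start = i
    return segments
```

With the first function, a series for which the result is `True` is saved to the preset database in full (S603). With the second, only the segments flagged `True` are kept uncompressed.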
In some embodiments, after the data to be compressed is resampled to form compressed data, the compressed data may be saved, transmitted, or visualized. Based on the foregoing embodiments, fig. 7 is a schematic flow chart of a data compression method provided in the embodiment of the present application. As shown in fig. 7, in some embodiments, the method may further include the following steps:
Step S701, storing at least one compressed data in the preset database.
Step S702, transmitting at least one compressed data and the data to be processed in the preset database to a data processing device, so that the data processing device processes the compressed data.
In some embodiments, the data processing device may be a data analysis device, through which a technician receives the compressed data and the uncompressed data to be processed and performs data analysis on the entire data.
Step S703, aggregating the compressed data and the data to be processed in the preset database to form at least one data to be displayed; wherein the compressed data comprises at least: a start timestamp, an end timestamp, a sampling step, and the time sequence value corresponding to each sampling step counted from the start timestamp.
Here, the data to be displayed includes all of the compressed data and the data to be processed in the preset database.
Step S704, performing display feature processing on each piece of data to be displayed, so that each piece of data to be displayed is displayed with a corresponding display feature.
In some embodiments, performing display feature processing on each data to be displayed means adding a display feature to each data to be displayed. For example, each data to be displayed is assigned a different visualization color, so that each is displayed in a different color.
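As an illustration of the compressed data layout named in step S703 (start timestamp, end timestamp, sampling step, and one value per step), the following sketch shows how timestamps can be recovered from that metadata and how compressed and uncompressed series might be aggregated with a display color. The class `CompressedSeries`, the function `aggregate_for_display`, and the color palette are assumptions introduced for illustration; they do not appear in the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CompressedSeries:
    start_ts: int        # start timestamp (seconds)
    end_ts: int          # end timestamp (seconds)
    step: int            # sampling step (seconds)
    values: List[float]  # one value per sampling step, counted from start_ts

    def points(self) -> List[Tuple[int, float]]:
        """Rebuild (timestamp, value) pairs from the start timestamp and the step."""
        return [(self.start_ts + i * self.step, v) for i, v in enumerate(self.values)]


def aggregate_for_display(
    compressed: Dict[str, CompressedSeries],
    raw: Dict[str, List[Tuple[int, float]]],
    palette: List[str],
) -> List[dict]:
    """Merge compressed and uncompressed series and give each one a display color."""
    merged = {name: series.points() for name, series in compressed.items()}
    merged.update(raw)
    return [
        {"name": name, "points": pts, "color": palette[i % len(palette)]}
        for i, (name, pts) in enumerate(sorted(merged.items()))
    ]
```

Because the timestamps are implied by the start timestamp and the step, only the values need to be stored or transmitted for a compressed series.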
An exemplary application of the embodiments of the present application in a practical application scenario will be described below.
Fig. 8 is a flow chart of a data compression method provided by an embodiment of the present application, and as shown in fig. 8, the data compression method provided by the embodiment of the present application may be implemented by the following steps:
Step S801, acquiring time series data.
Step S802, determining whether missing data exists in the time sequence data.
In some embodiments, after a time series data curve to be analyzed (i.e., the data to be processed) is obtained, it is identified whether data is missing from the curve. In an operation and maintenance scenario, missing data may indicate system downtime, a process restart, or the loss of monitoring data; based on operation and maintenance experience, such a curve is generally not resampled or compressed.
In some embodiments, step S804 is performed when missing data is present in the time series data, and step S803 is performed when missing data is not present in the time series data.
Step S803, calculating extreme value difference of the time sequence data.
Step S804, storing the time sequence data into a database.
Step S805, determining whether the extremum difference is less than the extremum threshold.
In some embodiments, some monitoring data exists in the form of a counter, and the slope of the real-time data is the same at each time point. If the data acquisition time intervals are the same, the data compression method provided by the embodiment of the application can still be applied.
In some embodiments, if the extremum difference is less than the extremum threshold, step S806 is performed, and if the extremum difference is greater than the extremum threshold, step S807 is performed.
Step S806, resampling the time sequence according to the visual query interval.
In some embodiments, if the extremum difference is 0 or less than a certain threshold, the interval of the visual query is used as the resampling step, or only the data at the start time point and the end time point is used.
Step S807, calculating a standard deviation of the time series data.
Step S808, if the standard deviation of the time sequence data is larger than the standard deviation threshold, storing the time sequence data into a database.
In some embodiments, the standard deviation threshold may be obtained by manual labeling (e.g., by inspecting the data with a tool such as a histogram) or by an anomaly detection algorithm such as Isolation Forest (iForest).
Step S809, if the standard deviation of the time series data is smaller than the standard deviation threshold, determining the sampling step length of each time series data according to the table look-up method, and resampling according to the sampling step length.
In some embodiments, the fluctuation range of the data can be obtained by a table look-up method. If the standard deviation exceeds the standard deviation threshold, the original data is used. If the standard deviation is below the threshold, a resampling step is obtained by table lookup, and the data is resampled at that step to obtain the compressed data.
Step S810, storing, transmitting and visualizing the data in the database and the resampled data.
In some embodiments, the transmitted data includes metadata such as the start time point, the end time point, and the resampling frequency. This metadata helps restore the data for further processing, and allows correct resampling or time-series processing to be performed on the restored data. The data in the database and the resampled data may also be sent to a visualization program for visual presentation.
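The decision flow of steps S801 to S809 can be sketched as below. Several details are assumptions made for illustration and are not stated in the source: resampling is implemented as simple decimation (keeping every n-th point), the lookup table `STEP_TABLE` is keyed on standard deviation ranges with made-up entries, and a series whose standard deviation equals the threshold is compressed (following the "smaller than or equal to" wording of the claims). A return value of `None` stands for "store the original data in the database".

```python
import statistics
from typing import List, Optional, Tuple

# Hypothetical lookup table: (maximum standard deviation, keep every n-th point).
STEP_TABLE: List[Tuple[float, int]] = [(0.5, 8), (2.0, 4), (5.0, 2)]


def lookup_step(std_dev: float) -> int:
    """Return the resampling step for a standard deviation, or 1 if no entry matches."""
    for max_std, step in STEP_TABLE:
        if std_dev <= max_std:
            return step
    return 1


def compress(
    values: List[Optional[float]],
    range_threshold: float,
    std_threshold: float,
    query_step: int,
) -> Optional[List[float]]:
    """Return the resampled values, or None if the series is stored uncompressed."""
    if not values or any(v is None for v in values):
        return None                                   # S802 -> S804: missing data
    if max(values) - min(values) < range_threshold:   # S803, S805: extremum difference
        return values[::query_step]                   # S806: visual query interval
    std_dev = statistics.pstdev(values)               # S807
    if std_dev > std_threshold:
        return None                                   # S808: fluctuation too large
    return values[::lookup_step(std_dev)]             # S809: table lookup and resample
```

For example, a constant series of eight points with `query_step=4` is reduced to two points, while a series that alternates between 0 and 100 is stored unchanged when `std_threshold` is 5.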
In some embodiments, many time series curves fluctuate strongly during part of the time and only slightly during the rest. For such data, different resampling steps are used for different time windows. Fig. 9 is a schematic flow chart of a data compression method provided by an embodiment of the present application. As shown in fig. 9, the embodiment of the present application may further include the following steps:
Step S901, performing outlier analysis on the time sequence data.
Step S902, adding fixed intervals before and after the outliers to form an outlier range interval.
In some embodiments, after the outlier range intervals are determined, the time series data is divided into the outlier range intervals and the intervals outside the outlier range intervals.
Step S903, acquiring data of an outlier range section.
Step S904, merging the data of the outlier range section and the resampled time series data.
Step S905, calculating the standard deviation of the intervals except for each outlier range interval.
Step S906, performing normalization on each interval and resampling the different intervals.
Step S907, resampled time series data is generated.
Step S908, transmitting, storing and visualizing the combined data.
In some embodiments, anomaly detection is first performed on the time series data to obtain the time range in which outliers occur. Typical anomaly detection methods include: level fluctuation anomaly detection, anomaly detection on the difference between adjacent data points, anomaly detection based on the residual between the predicted value and the actual value, or an aggregation of the results of several anomaly detection algorithms. Secondly, the intervals before and after the abnormal data are marked as abnormal intervals, and the data in the abnormal intervals uses the original data without resampling. For data outside the abnormal intervals, the standard deviation is calculated again for each time period, the corresponding resampling step is obtained by the table look-up method, and the data is resampled. Finally, the data of the different intervals, each resampled with its own step, is assembled into one sequence and returned to the relevant transmission, storage, or application program.
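A minimal sketch of this interval-based procedure follows. It uses adjacent data point difference detection, one of the methods listed above, adds a fixed margin before and after each outlier, and merges overlapping intervals. As a simplification that is not in the source, the data outside the outlier intervals is decimated with a single fixed `step` instead of a per-interval step obtained from the standard deviation by table lookup. All function and parameter names are hypothetical.

```python
from typing import List, Tuple


def outlier_intervals(
    values: List[float], diff_threshold: float, margin: int
) -> List[Tuple[int, int]]:
    """Half-open index intervals around points that jump by more than diff_threshold."""
    intervals: List[Tuple[int, int]] = []
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > diff_threshold:
            lo = max(0, i - margin)
            hi = min(len(values), i + margin + 1)
            if intervals and lo <= intervals[-1][1]:
                # Overlaps or touches the previous interval: extend it.
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], hi))
            else:
                intervals.append((lo, hi))
    return intervals


def compress_outside_outliers(
    values: List[float], diff_threshold: float, margin: int, step: int
) -> List[Tuple[int, float]]:
    """Keep every point inside outlier intervals and every step-th point elsewhere."""
    kept: List[Tuple[int, float]] = []
    cursor = 0
    # A sentinel empty interval at the end flushes the trailing normal data.
    sentinel = [(len(values), len(values))]
    for lo, hi in outlier_intervals(values, diff_threshold, margin) + sentinel:
        kept.extend((i, values[i]) for i in range(cursor, lo, step))  # resampled
        kept.extend((i, values[i]) for i in range(lo, hi))            # original data
        cursor = hi
    return kept
```

The result is returned as (index, value) pairs in time order, so the merged sequence of S904 can be transmitted, stored, or visualized directly.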
In some embodiments, a static threshold is set for data used for monitoring and alarming, and data exceeding the static threshold is important for engineers analyzing anomalies. In an embodiment of the present application, the foregoing compression method is performed on data within the static threshold range. For an index or data interval exceeding the static threshold, the compression method is not performed and the original data is used.
The data compression method provided by the embodiment of the application significantly reduces the volume of time series data to be transmitted and stored, preserves the data intervals that are relatively sensitive for the operation and maintenance service, and does not impair the operation and maintenance service in the compression process.
Fig. 10 is a schematic diagram of a composition structure of a data compression device according to an embodiment of the present application, and as shown in fig. 10, the data compression device 100 includes:
An acquisition module 101, configured to acquire at least two data to be processed; the data analysis module 102 is configured to determine at least one data to be compressed from the at least two data to be processed by performing data analysis on each data to be processed; the feature analysis module 103 is configured to perform feature analysis on each piece of data to be compressed, and determine a sampling step length of a service corresponding to each piece of data to be compressed; and the resampling module 104 is configured to resample data of the service corresponding to the at least one data to be compressed by using the sampling step length, so as to obtain at least one compressed data.
In some embodiments, the data analysis module 102 is further configured to perform a fluctuation analysis on each piece of the data to be processed, so as to obtain a fluctuation difference value corresponding to each piece of the data to be processed; classifying the data to be processed according to the fluctuation difference value, and dividing the data to be processed into first data to be compressed or second data to be compressed; and determining the first data to be compressed and the second data to be compressed as the data to be compressed.
In some embodiments, the data analysis module 102 is further configured to determine the data to be processed as first data to be compressed if the fluctuation difference is less than or equal to a preset fluctuation threshold; if the fluctuation difference value is larger than the fluctuation threshold value, determining a discrete value of the data to be processed; and determining the data to be processed as second data to be compressed if the discrete value is smaller than or equal to a preset discrete threshold value.
In some embodiments, the feature analysis module 103 is further configured to perform a sampling interval analysis on the first data to be compressed to obtain a visual sampling interval of the first data to be compressed; and determining the duration corresponding to the visual sampling interval as the sampling step length of the service corresponding to the first data to be compressed.
In some embodiments, the feature analysis module 103 is further configured to obtain an information table, where the information table includes at least: resampling step length corresponding to each service; performing feature analysis on each piece of second data to be compressed to correspondingly obtain the attribute of each piece of second data to be compressed; according to the attribute of each piece of second data to be compressed, correspondingly determining the service corresponding to each piece of second data to be compressed; and determining the sampling step length of the service corresponding to each second data to be compressed according to the information table and the service corresponding to each second data to be compressed.
In some embodiments, the data compression apparatus 100 further comprises: the detection module is used for carrying out integrity detection on each piece of data to be processed; the first determining module is used for determining that data loss exists in any data to be processed, and storing the corresponding data to be processed into a preset database; and the second determining module is used for determining that no data loss exists in the data to be processed, and carrying out data analysis on the data to be processed.
In some embodiments, the data compression apparatus 100 further comprises: the anomaly monitoring module is used for carrying out anomaly monitoring on each piece of data to be processed; a third determining module, configured to determine whether a time sequence value in any data to be processed is greater than the static threshold; the first storage module is used for storing the data to be processed into a preset database if at least one time sequence value in any data to be processed is determined to be greater than the static threshold value; and the analysis module is used for carrying out data analysis on the data to be processed if each time sequence value in any data to be processed is determined to be smaller than or equal to the static threshold value.
In some embodiments, the data compression apparatus 100 further comprises: the second storage module is used for storing at least one compressed data into the preset database; and the transmission module is used for transmitting at least one compressed data and the data to be processed in the preset database to data processing equipment so that the data processing equipment processes the compressed data.
In some embodiments, the data compression apparatus 100 further comprises: the aggregation module is used for aggregating the compressed data and the data to be processed in the preset database to form at least one data to be displayed; wherein the compressed data comprises at least: a start timestamp, an end timestamp, a sampling step, and the time sequence value corresponding to each sampling step counted from the start timestamp; and the display characteristic processing module is used for respectively carrying out display characteristic processing on each piece of data to be displayed so as to enable each piece of data to be displayed with the corresponding display characteristic.
It should be noted that, the description of the apparatus according to the embodiment of the present application is similar to the description of the embodiment of the method described above, and has similar beneficial effects as the embodiment of the method, so that a detailed description is omitted. For technical details not disclosed in the present apparatus embodiment, please refer to the description of the method embodiment of the present application for understanding.
In the embodiment of the present application, if the data compression method is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a terminal to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program code. Thus, embodiments of the application are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present application provides a data compression device, and fig. 11 is a schematic structural diagram of the data compression device provided in the embodiment of the present application, as shown in fig. 11, where the data compression device 110 at least includes: a processor 111 and a computer readable storage medium 112 configured to store executable instructions, wherein the processor 111 generally controls the overall operation of the data compression device. The computer readable storage medium 112 is configured to store instructions and applications executable by the processor 111, and may also cache data to be processed or processed by various modules in the processor 111 and the data compression device 110, and may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM).
Embodiments of the present application provide a storage medium having stored therein executable instructions which, when executed by a processor, cause the processor to perform a method provided by embodiments of the present application, for example, as shown in fig. 2.
In some embodiments, the storage medium may be a computer readable storage medium, such as a ferroelectric random access memory (FRAM, Ferroelectric Random Access Memory), a read only memory (ROM, Read Only Memory), a programmable read only memory (PROM, Programmable Read Only Memory), an erasable programmable read only memory (EPROM, Erasable Programmable Read Only Memory), an electrically erasable programmable read only memory (EEPROM, Electrically Erasable Programmable Read Only Memory), a flash memory, a magnetic surface memory, an optical disk, or a compact disk read only memory (CD-ROM, Compact Disk-Read Only Memory); or it may be any of a variety of devices including one or any combination of the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above described device embodiments are only illustrative, e.g. the division of the units is only one logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method of data compression, comprising:
acquiring at least two data to be processed;
Carrying out fluctuation analysis on each piece of data to be processed to obtain a fluctuation difference value corresponding to each piece of data to be processed;
if the fluctuation difference value is smaller than or equal to a preset fluctuation threshold value, determining the data to be processed as first data to be compressed;
if the fluctuation difference value is larger than the fluctuation threshold value, determining a discrete value of the data to be processed;
Determining the data to be processed as second data to be compressed if the discrete value is smaller than or equal to a preset discrete threshold value;
Performing feature analysis on each piece of data to be compressed, and determining a sampling step length of a service corresponding to each piece of data to be compressed; the data to be compressed comprises the first data to be compressed and the second data to be compressed;
Resampling data of the service corresponding to the at least one data to be compressed by adopting the sampling step length to obtain at least one compressed data; and resampling the different data to be compressed by adopting different sampling step sizes.
2. The method of claim 1, wherein the performing the feature analysis on each of the data to be compressed to determine the sampling step size of the service corresponding to each of the data to be compressed comprises:
Analyzing the first data to be compressed in a sampling interval to obtain a visual sampling interval of the first data to be compressed;
and determining the duration corresponding to the visual sampling interval as the sampling step length of the service corresponding to the first data to be compressed.
3. The method of claim 1, wherein the performing the feature analysis on each of the data to be compressed to determine the sampling step size of the service corresponding to each of the data to be compressed comprises:
obtaining an information table, wherein the information table at least comprises: resampling step length corresponding to each service;
performing feature analysis on each piece of second data to be compressed to correspondingly obtain the attribute of each piece of second data to be compressed;
According to the attribute of each piece of second data to be compressed, correspondingly determining the service corresponding to each piece of second data to be compressed;
And determining the sampling step length of the service corresponding to each second data to be compressed according to the information table and the service corresponding to each second data to be compressed.
4. The method according to claim 1, wherein the method further comprises:
carrying out integrity detection on each piece of data to be processed;
If any data to be processed is determined to have data missing, the corresponding data to be processed is stored in a preset database;
and if it is determined that no data is missing from the data to be processed, carrying out data analysis on the data to be processed.
5. The method of claim 1, wherein each of the data to be processed has a static threshold; the data to be processed comprises: at least one timing value;
the method further comprises the steps of:
performing anomaly monitoring on each piece of data to be processed;
determining whether a time sequence value in any data to be processed is greater than the static threshold value;
If at least one time sequence value in any data to be processed is determined to be greater than the static threshold value, storing the data to be processed into a preset database;
And if each time sequence value in any data to be processed is determined to be smaller than or equal to the static threshold value, carrying out data analysis on the data to be processed.
6. The method according to any one of claims 1 to 5, further comprising:
Storing at least one compressed data into a preset database; or,
Transmitting at least one compressed data and data to be processed in the preset database to data processing equipment, so that the data processing equipment processes the compressed data.
7. The method according to any one of claims 1 to 5, further comprising:
Aggregating the compressed data and the data to be processed in a preset database to form at least one data to be displayed; wherein the compressed data comprises at least: a start timestamp, an end timestamp, a sampling step, and the time sequence value corresponding to each sampling step counted from the start timestamp;
And respectively carrying out display characteristic processing on each piece of data to be displayed so that each piece of data to be displayed is displayed with the corresponding display characteristic.
8. A data compression apparatus, comprising:
the acquisition module is used for acquiring at least two data to be processed;
The data analysis module is used for carrying out fluctuation analysis on each piece of data to be processed to obtain a fluctuation difference value corresponding to each piece of data to be processed; if the fluctuation difference value is smaller than or equal to a preset fluctuation threshold value, determining the data to be processed as first data to be compressed; if the fluctuation difference value is larger than the fluctuation threshold value, determining a discrete value of the data to be processed; determining the data to be processed as second data to be compressed if the discrete value is smaller than or equal to a preset discrete threshold value;
The characteristic analysis module is used for carrying out characteristic analysis on each piece of data to be compressed and determining the sampling step length of the service corresponding to each piece of data to be compressed; the data to be compressed comprises the first data to be compressed and the second data to be compressed;
the resampling module is used for resampling data of the business corresponding to at least one piece of data to be compressed by adopting the sampling step length to obtain at least one piece of compressed data; and resampling the different data to be compressed by adopting different sampling step sizes.
CN202111076416.8A 2021-09-14 2021-09-14 Data compression method and device Active CN113849505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111076416.8A CN113849505B (en) 2021-09-14 2021-09-14 Data compression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111076416.8A CN113849505B (en) 2021-09-14 2021-09-14 Data compression method and device

Publications (2)

Publication Number Publication Date
CN113849505A CN113849505A (en) 2021-12-28
CN113849505B true CN113849505B (en) 2024-11-26

Family

ID=78973865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111076416.8A Active CN113849505B (en) 2021-09-14 2021-09-14 Data compression method and device

Country Status (1)

Country Link
CN (1) CN113849505B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078755A (en) * 2019-12-19 2020-04-28 远景智能国际私人投资有限公司 Time sequence data storage query method and device, server and storage medium
CN112332853A (en) * 2020-11-02 2021-02-05 重庆邮电大学 Time sequence data compression and recovery method based on power system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04299479A (en) * 1991-03-28 1992-10-22 Toshiba Corp Data compression preservation device and preservation method
CN100435136C (en) * 2006-06-21 2008-11-19 浙江中控软件技术有限公司 Real-time data compression method based on least square linear fit
JP5149872B2 (en) * 2009-06-19 2013-02-20 日本電信電話株式会社 Acoustic signal transmitting apparatus, acoustic signal receiving apparatus, acoustic signal transmitting method, acoustic signal receiving method, and program thereof
CN106611342B (en) * 2015-10-21 2020-05-01 北京国双科技有限公司 Information processing method and device
JP6865617B2 (en) * 2017-03-31 2021-04-28 株式会社クボタ Terminal equipment

Also Published As

Publication number Publication date
CN113849505A (en) 2021-12-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant