
CN119621855A - Industrial equipment time series data storage and preprocessing method - Google Patents

Industrial equipment time series data storage and preprocessing method

Info

Publication number
CN119621855A
Authority
CN
China
Prior art keywords
data
time
industrial equipment
standard
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510156909.4A
Other languages
Chinese (zh)
Inventor
许晋瑞
来健强
王永宗
商广勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Industrial Internet Co Ltd
Original Assignee
Inspur Industrial Internet Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Industrial Internet Co Ltd
Priority to CN202510156909.4A
Publication of CN119621855A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2123/00: Data types
    • G06F2123/02: Data types in the time domain, e.g. time-series data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for storing and preprocessing time-series data of industrial equipment, belonging to the field of data storage and data mining. The method comprises: step 1, configuring a sensor to collect raw data of the industrial equipment in real time and sending the raw data to a message queue; step 2, setting up a data import module, parsing the topic of the message queue with a data synchronization tool to obtain first data, and storing the first data in table A; step 3, designing a table B corresponding to table A, and preprocessing and normalizing the first data in table A to obtain standard data; step 4, extracting time features from the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain partition data lists; step 5, obtaining the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data lists in table C in the storage logical order. The method realizes the acquisition and storage of industrial data and improves data processing efficiency and manageability.

Description

Industrial equipment time sequence data storage and preprocessing method
Technical Field
The invention relates to the field of data storage and data mining, in particular to a time sequence data storage and preprocessing method for industrial equipment.
Background
In intelligent manufacturing and the industrial Internet, the log data and measurement data generated by operating industrial equipment are time-series in nature, for example the vibration amplitude of a fan sampled at a fixed frequency over a period of time. Data mining extracts frequency-domain characteristic values such as peak value, mean, variance and waveform from such data, which can then be used for vibration signal classification, fault diagnosis and fault prediction, and in turn for predicting equipment service life and scheduling periodic maintenance. Traditionally, the raw data files are stored on a file server or in cloud object storage, collected into a message queue, or stored in a time-series database, and are then mined with Python or Spark. This approach occupies a large amount of storage space and, when the data is read by Python code, a large amount of memory. Alternatively, the SQL capability of a time-series database such as TDengine is used to analyze the time-series data through preset functions.
Therefore, the invention provides a time sequence data storage and preprocessing method for industrial equipment.
Disclosure of Invention
The invention provides a method for storing and preprocessing time-series data of industrial equipment. Industrial equipment data is collected in real time and sent to a message queue, and a data synchronization tool stores it in table A of HBase. A corresponding table B is designed, and standard data is obtained through preprocessing and normalization. Time features are then extracted from the standard data, and the data is partitioned to generate partition data lists. Finally, the partition data is stored in table C according to its dimension attributes and the storage logical order. This optimizes data processing and storage efficiency and supports efficient management and analysis of the data.
In one aspect, the present invention provides a method for storing and preprocessing time-series data of industrial equipment, comprising:
Step 1, configuring a sensor to collect raw data of the industrial equipment in real time, and sending the raw data to a message queue;
Step 2, setting up a data import module, parsing the topic of the message queue with a data synchronization tool, acquiring first data, and storing the first data in table A of an HBase database;
Step 3, designing a table B corresponding to table A, and preprocessing and normalizing the first data of table A to obtain standard data;
Step 4, extracting time features from the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain partition data lists;
Step 5, acquiring the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data lists in table C in the storage logical order.
In another aspect, configuring a sensor to collect raw data of the industrial equipment in real time includes:
acquiring the working environment and monitoring requirements of the industrial equipment, selecting the sensor type, and assigning a unique first number to the sensor;
determining the installation position of the sensor according to the original design drawing of the industrial equipment and its surroundings, and assigning a unique second number to the installation position;
configuring and installing the sensor according to the correspondence between the first number and the second number, and initializing and starting the sensor at a preset time-series sampling frequency, the sensor being used to collect the raw data of the industrial equipment in real time.
In another aspect, sending the raw data to the message queue includes:
creating a message queue, serializing the raw data into a byte stream, and iteratively inserting the byte stream into the message queue byte by byte;
stopping the iteration once the byte stream of the raw data has been completely inserted into the message queue.
In another aspect, setting up a data import module and parsing the topic of the message queue with a data synchronization tool includes:
constructing the data import module, configuring and installing the data synchronization tool, and generating a group of row key-value pairs for the raw data based on the queue identifier of the message queue parsed by the data synchronization tool;
creating and registering a consumer in the data synchronization tool, and creating a consumption record data table;
having the consumer consume the topic of the message queue and insert the data into the raw data table A, generating a row key value.
In another aspect, acquiring the raw data and storing it in table A of the HBase database includes:
constructing, in the HBase database, a table named table A for storing the raw time-series data according to standard preset time-series fields;
acquiring and parsing the first raw data according to the result of the consumption;
if the first raw data are measured values at the same time point, concatenating them into a single string value, with a special symbol separating the individual measured values;
if the first raw data are measured values at different time points, storing the data for each measurement time in a separate row.
In another aspect, designing a table B corresponding to table A, and preprocessing and normalizing the first data of table A to obtain standard data, includes:
acquiring the first data of table A, and converting all of it into second data in a preset format;
selecting a preset neighbor number K, and calculating the KNN distance between any two measured values in the second data as follows:
[formula not available in the extracted text]
wherein the formula gives the distance between the i-th measured value and the j-th measured value in the second data; n denotes the total number of measured values in the second data; ln() denotes a logarithmic function; the variance is taken over all measured values in the second data; and min and max denote the minimum value and the maximum value, respectively;
selecting any measured value as an intermediate value based on the preset neighbor number K, screening the K measured values nearest the intermediate value to form a sample group, and acquiring the average distance of the sample group; if the distance between a measured value in the sample group and the intermediate value is larger than the average distance, judging that measured value to be an abnormal value, and otherwise judging it to be normal;
removing the abnormal values from the second data to obtain third data, and normalizing the third data to obtain standard data.
In another aspect, extracting time features from the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain partition data lists, includes:
acquiring the standard data of table B, and obtaining the preset time-series sampling frequency of the standard data from its time-series field;
extracting time features from the standard data by converting the timestamp information of the standard data into specific time features, and taking the specific time features as the time domain features;
defining a time interval based on the time domain features, specifically:
[formula not available in the extracted text]
wherein the quantities in the formula are the time interval, the starting time point, the time-origin mapping coefficient, the time-endpoint mapping coefficient, the time domain features, the mean time interval of the standard data, and the maximum time interval of the standard data, and T() denotes an event handling function;
partitioning the time portion of the standard data according to the time interval, wherein each time partition has the size of the time interval, and each time partition together with its corresponding measured values constitutes the unit data of a partition data list.
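The time-feature extraction and partitioning described above can be sketched in Python. Because the interval formula is not available in this text, the sketch simply assumes a fixed interval width in seconds; `time_features`, `partition` and `interval_s` are hypothetical names, and the year/month/day/hour features follow the examples the description gives for time domain features.

```python
from collections import defaultdict
from datetime import datetime, timezone


def time_features(ts: int) -> dict:
    """Convert a Unix timestamp (seconds) into calendar-based time features."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {"year": dt.year, "month": dt.month, "day": dt.day, "hour": dt.hour}


def partition(records, interval_s: int):
    """Group (timestamp, value) pairs into fixed-width time partitions.

    Assumption: the interval width is given directly instead of being
    derived from the patent's interval formula.
    """
    parts = defaultdict(list)
    for ts, value in records:
        start = ts - ts % interval_s  # left edge of the partition containing ts
        parts[start].append(value)
    # Each entry plays the role of one "unit data" of the partition data list.
    return [
        {"start": start, "end": start + interval_s, "values": values}
        for start, values in sorted(parts.items())
    ]


records = [(0, 1.0), (1800, 2.0), (3600, 3.0)]
units = partition(records, interval_s=3600)
features = time_features(0)
```

With a one-hour interval, the first two samples fall into the same partition and the third starts a new one.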
In another aspect, acquiring the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data lists in table C in the storage logical order, includes:
traversing the partition data list to obtain the time range of each unit data;
for any unit data in any time interval, looking up the fields of the corresponding standard data in a field name-dimension attribute mapping table to obtain the dimension attributes of all fields, these being the dimension attributes of the unit data;
creating a record for each unit data, and storing its time interval, unit data values and dimension attributes in table C in the storage logical order.
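As a rough illustration of this step, the Python sketch below looks up dimension attributes through a field name-dimension attribute mapping and writes one record per unit data. The mapping contents, the field names and the choice of ascending start time as the storage logical order are all assumptions made for the example; the text does not specify them.

```python
# Hypothetical field name -> dimension attribute mapping table.
FIELD_TO_DIMENSION = {
    "device_id": "equipment",
    "sensor_type": "measurement_kind",
}


def dimension_attributes(fields: dict) -> dict:
    """Keep only the fields that appear in the mapping table, renamed to their dimension."""
    return {
        FIELD_TO_DIMENSION[name]: value
        for name, value in fields.items()
        if name in FIELD_TO_DIMENSION
    }


def build_table_c(units: list, fields: dict) -> list:
    """Create one record per unit data, ordered by start time (assumed storage order)."""
    dims = dimension_attributes(fields)
    table_c = []
    for unit in sorted(units, key=lambda u: u["start"]):
        table_c.append(
            {
                "start": unit["start"],
                "end": unit["end"],
                "values": unit["values"],
                "dimensions": dims,
            }
        )
    return table_c


units = [
    {"start": 3600, "end": 7200, "values": [3.0]},
    {"start": 0, "end": 3600, "values": [1.0, 2.0]},
]
fields = {"device_id": "fan-01", "sensor_type": "vibration", "value": 0.42}
table_c = build_table_c(units, fields)
```

A plain list stands in for table C here; in the method described, the records would be written to an HBase table.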
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a method for storing and preprocessing time-series data of industrial equipment. Industrial equipment data is collected in real time and sent to a message queue, and a data synchronization tool stores it in table A of HBase. A corresponding table B is designed, and standard data is obtained through preprocessing and normalization. Time features are then extracted from the standard data, and the data is partitioned to generate partition data lists. Finally, the partition data is stored in table C according to its dimension attributes and the storage logical order. This optimizes data processing and storage efficiency and supports efficient management and analysis of the data.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of a method for storing and preprocessing time-series data of industrial equipment according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
As shown in fig. 1, the method for storing and preprocessing time-series data of industrial equipment provided by the embodiment of the invention includes:
Step 1, configuring a sensor to collect raw data of the industrial equipment in real time, and sending the raw data to a message queue;
Step 2, setting up a data import module, parsing the topic of the message queue with a data synchronization tool, acquiring first data, and storing the first data in table A of an HBase database;
Step 3, designing a table B corresponding to table A, and preprocessing and normalizing the first data of table A to obtain standard data;
Step 4, extracting time features from the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain partition data lists;
Step 5, acquiring the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data lists in table C in the storage logical order.
In this embodiment, the sensor is a device for monitoring and collecting industrial equipment status or environmental data in real time, including types of temperature, pressure, humidity, vibration, and the like.
In this embodiment, industrial equipment refers to machinery, instruments, tools, and other equipment used in industrial processes for production, processing, inspection, or control.
In this embodiment, the raw data refers to information collected in real time by sensors, meters and the like during the operation of the industrial equipment that has not yet been processed or analyzed.
In this embodiment, a message queue is a mechanism that lets different services communicate by sending and receiving messages without requiring a direct synchronous connection.
In this embodiment, the data import module refers to a component that obtains data from the message queue and stores it in a database (e.g., HBase).
In this embodiment, the data synchronization tool is a software tool mainly used to synchronize data between different data sources or systems, such as Kafka.
In this embodiment, the topic refers to a category of messages or a data stream in the message queue.
In this embodiment, the HBase database is an open-source, distributed, columnar storage database, which is part of the Apache Hadoop ecosystem, and is designed to handle large-scale, distributed data storage requirements.
In this embodiment, table A is an HBase table that stores the raw data retrieved from the message queue.
In this embodiment, table B is an HBase table for storing the standard data obtained after preprocessing, normalization and partitioning; it includes a column of dimension attributes and a column of preprocessed time-series data.
In this embodiment, the coprocessor function of HBase places the data preprocessing logic on the server side, so that large amounts of data need not be pulled to the client for processing, where they would occupy excessive memory. The distributed storage and computing capability of HBase yields higher preprocessing efficiency, and data storage and preprocessing are kept in step. Subsequent data mining only needs to query the preprocessed table and does not need to repeat processing steps such as data cleaning, deduplication and normalization.
In this embodiment, the data processing of table A and table B is performed synchronously: the HBase coprocessor is attached to the raw data table A, and each time a new row is inserted into table A, the coprocessor is triggered to run and writes the preprocessed data into table B.
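The trigger behaviour described here can be mimicked with a small observer pattern. This is only an in-memory Python simulation: a real HBase coprocessor is written in Java and runs on the region server, and the class `TableA`, the callback `preprocess_and_store`, the row key format and the `|` separator are all hypothetical choices for the example.

```python
class TableA:
    """In-memory stand-in for the raw data table, with insert-time observers."""

    def __init__(self):
        self.rows = {}
        self._observers = []

    def register_observer(self, callback):
        # Plays the role of attaching a coprocessor to the table.
        self._observers.append(callback)

    def put(self, row_key, value):
        self.rows[row_key] = value
        # Every insert triggers the registered preprocessing logic.
        for callback in self._observers:
            callback(row_key, value)


table_b = {}


def preprocess_and_store(row_key, value):
    """Placeholder preprocessing: split the concatenated string into floats."""
    table_b[row_key] = [float(v) for v in value.split("|")]


table_a = TableA()
table_a.register_observer(preprocess_and_store)
table_a.put("fan-01:0000000001", "0.41|0.43")
```

The point of the design is that the caller only writes to table A; table B is filled as a side effect on the storage side, so no client has to read the raw rows back.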
In this embodiment, preprocessing and normalization are used to convert the raw data into a standard form suitable for subsequent analysis.
In this embodiment, the standard data refers to data that has been preprocessed and normalized and has a uniform format and structure.
In this embodiment, the time domain features refer to time-related data features extracted from timestamps or time fields, which reflect characteristics of the data such as its timing, periodicity and trend.
In this embodiment, partitioning refers to dividing data into different blocks according to certain specific characteristics during data storage, querying, and processing.
In this embodiment, the partition data list refers to a series of data units obtained by partitioning standard data according to the extracted time domain features (such as year, month, day, hour, etc.) in step 4.
In this embodiment, the unit data refers to the smallest data unit obtained after the data in table A has been preprocessed, normalized and partitioned by time features.
In this embodiment, dimension attributes refer to various feature fields that can be used to describe or classify data in data processing and storage.
In this embodiment, the storage logic order refers to a manner of storing the data in the partition data list into the target table (table C) according to a certain rule.
In this embodiment, table C is a table for storing dimension attribute data corresponding to unit data of each partition data list.
The working principle and beneficial effects of this technical scheme are as follows: industrial equipment data is collected and processed in real time and stored in HBase by means of the message queue and the data synchronization tool, and preprocessing, feature extraction and partitioned storage improve data storage efficiency and the accuracy of subsequent analysis, thereby supporting efficient data processing and management.
Example 2:
On the basis of embodiment 1 above, configuring the sensor to collect raw data of the industrial equipment in real time includes:
acquiring the working environment and monitoring requirements of the industrial equipment, selecting the sensor type, and assigning a unique first number to the sensor;
determining the installation position of the sensor according to the original design drawing of the industrial equipment and its surroundings, and assigning a unique second number to the installation position;
configuring and installing the sensor according to the correspondence between the first number and the second number, and initializing and starting the sensor at a preset time-series sampling frequency, the sensor being used to collect the raw data of the industrial equipment in real time.
In this embodiment, the working environment refers to the physical and environmental conditions in which the device is in actual operation, including temperature, humidity, pressure, vibration, gas composition, electromagnetic interference, and many other factors.
In this embodiment, the monitoring requirements refer to the requirements for real-time monitoring and data acquisition of the operating state, environmental conditions and equipment performance of the industrial equipment.
In this embodiment, the first number is a unique identifier for identifying each sensor.
In this embodiment, the original design drawing refers to a detailed technical drawing drawn by an engineer during the design and construction stages of an industrial plant or system.
In this embodiment, the mounting location refers to a specific location or area where the sensor is actually placed in the device or work environment.
In this embodiment, the second number is a unique identifier for identifying each sensor location.
In this embodiment, the preset time sequence sampling frequency refers to the frequency of data acquisition of the industrial equipment by the sensor in a specified time interval.
The beneficial effects of this technical scheme are as follows: the sensor type is selected and a unique number assigned according to the equipment's working environment and monitoring requirements, and the installation position is determined and numbered, which enables accurate sensor installation and real-time data acquisition, optimizes equipment monitoring efficiency, and ensures the accuracy and reliability of data acquisition.
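The numbering scheme in this embodiment can be sketched as a small registry that pairs each sensor number with an installation-position number and rejects duplicates. The class and field names (`SensorConfig`, `sensor_id`, `location_id`, `sampling_hz`) and the identifier formats are hypothetical; the text only requires that both numbers be unique and that a sampling frequency be preset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorConfig:
    sensor_id: str      # "first number": unique per sensor
    location_id: str    # "second number": unique per installation position
    sensor_type: str    # e.g. temperature, pressure, humidity, vibration
    sampling_hz: float  # preset time-series sampling frequency


def build_registry(configs):
    """Index configs by sensor number, enforcing uniqueness of both numbers."""
    registry = {}
    used_locations = set()
    for cfg in configs:
        if cfg.sensor_id in registry:
            raise ValueError(f"duplicate sensor number: {cfg.sensor_id}")
        if cfg.location_id in used_locations:
            raise ValueError(f"duplicate location number: {cfg.location_id}")
        if cfg.sampling_hz <= 0:
            raise ValueError("sampling frequency must be positive")
        registry[cfg.sensor_id] = cfg
        used_locations.add(cfg.location_id)
    return registry


registry = build_registry(
    [
        SensorConfig("S-001", "P-017", "vibration", 1000.0),
        SensorConfig("S-002", "P-018", "temperature", 1.0),
    ]
)
```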
Example 3:
On the basis of embodiment 2 above, sending the raw data to the message queue includes:
creating a message queue, serializing the raw data into a byte stream, and iteratively inserting the byte stream into the message queue byte by byte;
stopping the iteration once the byte stream of the raw data has been completely inserted into the message queue.
In this embodiment, serialization refers to the process of converting the state of a data structure into a format that can be stored or transmitted.
In this embodiment, a byte stream refers to a way of processing and transferring data in a computer system in units of bytes, in binary representation.
In this embodiment, iterative insertion refers to the process of inserting bytes into the message queue one after another until all bytes have been inserted.
The beneficial effects of this technical scheme are as follows: the raw data is serialized and iteratively inserted into the message queue, and the insertion process is controlled through the queue identifier and a pointer value, which preserves the insertion order, avoids data collisions and duplication, and improves the reliability and efficiency of data transmission.
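A minimal Python sketch of this serialize-then-insert loop follows, using the standard library's `queue.Queue` as a stand-in for the message queue and JSON as an assumed serialization format (the text names neither). The helper names `send_raw_data` and `drain` and the record fields are hypothetical.

```python
import json
import queue


def send_raw_data(record: dict, mq: queue.Queue) -> int:
    """Serialize a record to bytes and insert it into the queue byte by byte."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    for byte in payload:  # iterating over bytes yields ints in 0..255
        mq.put(byte)
    # The loop ends once the whole byte stream has been inserted.
    return len(payload)


def drain(mq: queue.Queue, count: int) -> bytes:
    """Read `count` bytes back from the queue in insertion order."""
    return bytes(mq.get() for _ in range(count))


mq = queue.Queue()
record = {"sensor_id": "S-001", "ts": 1700000000, "value": 0.42}
n_bytes = send_raw_data(record, mq)
restored = json.loads(drain(mq, n_bytes).decode("utf-8"))
```

Because the queue is first-in, first-out, reading the same number of bytes back reproduces the original record. Message brokers usually transfer whole messages rather than single bytes; the byte-wise loop is kept only to mirror the wording of the embodiment.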
Example 4:
On the basis of embodiment 3 above, setting up a data import module and parsing the topic of the message queue with a data synchronization tool includes:
constructing the data import module, configuring and installing the data synchronization tool, and generating a group of row key-value pairs for the raw data based on the queue identifier of the message queue parsed by the data synchronization tool;
creating and registering a consumer in the data synchronization tool, and creating a consumption record data table;
having the consumer consume the topic of the message queue and insert the data into the raw data table A, generating a row key value.
In this embodiment, parsing is the process of converting a byte stream into the original data.
In this embodiment, one row key (Rowkey) in the row key value pair corresponds to a plurality of columns, each column corresponds to storing a value of a dimension attribute or a value after splicing the time series data, and the preprocessing only processes the columns of the time series data.
In this embodiment, consumer refers to a component that processes data or consumes data.
In this embodiment, the consumption record data table is a table storing time series data after consumption (i.e., data processing).
In this embodiment, traversing refers to analyzing each data record one by one as the original time series data is processed.
The beneficial effects of this technical scheme are as follows: the data synchronization tool parses the message queue and generates the group of row key-value pairs, and the consumer traverses the row key values and checks whether each has already been consumed. This ensures that each piece of data is consumed exactly once, records the consumption state, improves the accuracy and efficiency of data processing, and avoids repeated consumption.
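The consume-once behaviour can be illustrated with a short Python sketch in which a set plays the role of the consumption record data table and a dictionary plays the role of table A. The row key format (queue identifier plus zero-padded offset) and the names `make_row_key` and `Consumer` are assumptions; the text states only that row keys are derived from the queue identifier.

```python
def make_row_key(queue_id: str, offset: int) -> str:
    """Hypothetical row key: queue identifier plus zero-padded message offset."""
    return f"{queue_id}:{offset:010d}"


class Consumer:
    def __init__(self, table_a: dict, consumed: set):
        self.table_a = table_a    # stand-in for raw data table A
        self.consumed = consumed  # stand-in for the consumption record data table

    def consume(self, queue_id: str, messages: list) -> int:
        """Insert unseen messages into table A; return how many were inserted."""
        inserted = 0
        for offset, value in enumerate(messages):
            row_key = make_row_key(queue_id, offset)
            if row_key in self.consumed:
                continue  # already consumed, skip to avoid duplicates
            self.table_a[row_key] = value
            self.consumed.add(row_key)
            inserted += 1
        return inserted


table_a, consumed = {}, set()
consumer = Consumer(table_a, consumed)
first_pass = consumer.consume("fan-01", ["0.41|0.43", "0.39"])
second_pass = consumer.consume("fan-01", ["0.41|0.43", "0.39"])
```

Replaying the same messages leaves table A unchanged, which is the property the consumption record is meant to guarantee.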
Example 5:
On the basis of embodiment 4 above, acquiring the raw data and storing it in table A of the HBase database includes:
constructing, in the HBase database, a table named table A for storing the raw time-series data according to standard preset time-series fields;
acquiring and parsing the first raw data according to the result of the consumption;
if the first raw data are measured values at the same time point, concatenating them into a single string value, with a special symbol separating the individual measured values;
if the first raw data are measured values at different time points, storing the data for each measurement time in a separate row.
In this embodiment, the standard preset timing field refers to a basic field for defining and identifying time series data, such as a time stamp, a device identification, a data type, and the like.
In this embodiment, the HBase database is an open-source, distributed, columnar-store NoSQL database system for handling large-scale data sets, particularly suited for storing and managing non-relational data.
In this embodiment, the raw time-series data refers to data, acquired in time order, that represents a physical or virtual phenomenon.
In this embodiment, the first raw data refers to raw data that is initially acquired during the time series data acquisition process.
In this embodiment, the measured value refers to data representing a certain physical quantity or state, such as temperature, humidity, voltage, air pressure, speed, flow rate, etc., collected by a sensor, device or system.
In this embodiment, the string concatenation means that a plurality of measured values are connected together through specific symbols to form a complete string.
The working principle and beneficial effects of this technical scheme are as follows: the time-series data is stored in HBase table A and organized by measurement time, with measured values at the same time point concatenated into one value and data at different time points stored in separate rows. This optimizes the storage and querying of time-series data and improves the flexibility and efficiency of data processing.
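The row layout described in this embodiment can be sketched in a few lines of Python: values sharing a timestamp are joined into one string, and each distinct timestamp becomes its own row. The separator `|` and the function name `to_rows` are assumptions, since the text only says a special symbol is used.

```python
from collections import defaultdict

SEPARATOR = "|"  # assumed "special symbol" between individual measured values


def to_rows(measurements):
    """Map (timestamp, value) pairs to one row per timestamp.

    Values measured at the same time point are concatenated into a single
    string; different time points end up in different rows.
    """
    grouped = defaultdict(list)
    for ts, value in measurements:
        grouped[ts].append(str(value))
    return {ts: SEPARATOR.join(values) for ts, values in sorted(grouped.items())}


rows = to_rows(
    [
        (1700000000, 0.41),
        (1700000000, 0.43),
        (1700000001, 0.39),
    ]
)
```

Here the dictionary key stands in for the time component of an HBase row key.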
Example 6:
On the basis of embodiment 5 above, designing a table B corresponding to table A, and preprocessing and normalizing the first data of table A to obtain standard data, includes:
acquiring the first data of table A, and converting all of it into second data in a preset format;
selecting a preset neighbor number K, and calculating the KNN distance between any two measured values in the second data as follows:
[formula not available in the extracted text]
wherein the formula gives the distance between the i-th measured value and the j-th measured value in the second data; n denotes the total number of measured values in the second data; ln() denotes a logarithmic function; the variance is taken over all measured values in the second data; and min and max denote the minimum value and the maximum value, respectively;
selecting any measured value as an intermediate value based on the preset neighbor number K, screening the K measured values nearest the intermediate value to form a sample group, and acquiring the average distance of the sample group; if the distance between a measured value in the sample group and the intermediate value is larger than the average distance, judging that measured value to be an abnormal value, and otherwise judging it to be normal;
removing the abnormal values from the second data to obtain third data, and normalizing the third data to obtain standard data.
In this embodiment, the second data is the data obtained by converting the first data of table A into the preset format.
In this embodiment, the preset number of neighbors refers to the number of neighbors selected when calculating the KNN (K-nearest neighbor) distance.
In this embodiment, the KNN distance is the core quantity of the K-nearest neighbor (KNN) algorithm and measures the distance between two data points.
In this embodiment, the intermediate value is the measured value used as the reference point around which the K nearest neighboring data points are screened during the KNN calculation.
In this embodiment, the sample set refers to K adjacent measured values screened from around the selected intermediate value based on a preset number of neighbors (K) according to KNN algorithm.
In this embodiment, outliers refer to measurements in the dataset that deviate significantly from the overall data trend.
In this embodiment, the third data is the data set obtained by removing the abnormal values from the second data; normalizing the third data yields the standard data.
In this embodiment, the standard data is data after normalization processing.
The beneficial effects of this technical solution are that the distances between measured values are calculated with the KNN algorithm, abnormal values are screened out and removed, and standard data is generated through standardization. This improves the accuracy and quality of the data and makes the results of subsequent data analysis more reliable and stable.
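The screening rule of this embodiment can be sketched as below. Because the distance formula is not available in this text, the sketch substitutes the plain absolute difference between two measured values, and it uses z-score scaling for the final normalization step; both choices, and all function names, are assumptions rather than the method as claimed.

```python
from statistics import mean, pstdev


def knn_sample_group(values, center, k):
    """Indices of the k measured values closest to values[center]."""
    others = [i for i in range(len(values)) if i != center]
    # Stand-in distance: absolute difference (the source formula is unavailable).
    others.sort(key=lambda i: abs(values[i] - values[center]))
    return others[:k]


def flag_outliers(values, center, k):
    """Flag sample-group members farther from the center than the group average."""
    group = knn_sample_group(values, center, k)
    distances = {i: abs(values[i] - values[center]) for i in group}
    average = mean(distances.values())
    return sorted(i for i, d in distances.items() if d > average)


def standardize(values):
    """Z-score scaling, assumed here as the normalization step."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


second_data = [10.0, 11.0, 9.0, 10.0, 50.0]
outliers = flag_outliers(second_data, center=0, k=4)
third_data = [v for i, v in enumerate(second_data) if i not in outliers]
standard_data = standardize(third_data)
```

With these sample values only the reading of 50.0 lies farther from the chosen intermediate value than the group's average distance, so it is the only one removed before normalization. Note that the rule as written flags every neighbor above the average distance, so in practice the choice of K and of the intermediate value strongly affects how many values are discarded.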
Example 7:
On the basis of embodiment 1 above, performing time feature extraction on the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain a partition data list, includes:
obtaining the standard data of table B, and obtaining the preset time sequence sampling frequency of the standard data according to the time sequence field of the standard data;
Extracting time features of the standard data, converting timestamp information of the standard data into specific time features, and taking the specific time features as time domain features;
defining a time interval based on the time domain features, specifically:
[Formula not reproduced in the source text.]
where the terms of the formula denote: the time interval; the starting time point; the time-start-point mapping coefficient; the time-end-point mapping coefficient; the time domain features; the mean time interval of the standard data; the maximum time interval of the standard data; and T(), the event handling function;
partitioning the time portion of the standard data according to the time interval, where each time partition corresponds to a time interval of the defined size, and each time partition together with its corresponding measured values constitutes one unit data of the partition data list.
In this embodiment, the timing field refers to a data field related to time, and is used to indicate a point in time when the data recording occurs.
In this embodiment, the time stamp information refers to a specific time point at which each piece of data is recorded, and exists in the form of a time stamp.
In this embodiment, the time rule matching degree is a degree of matching between the extracted time feature and the preset time sequence sampling frequency.
In this embodiment, the specific time feature refers to a specific data attribute extracted from the time stamp information in the standard data, for example, year, month, day, minute, hour, etc., which can accurately describe the time dimension.
The working principle and beneficial effects of this technical solution are as follows: time features are extracted from the standard data and compared with the preset sampling frequency, the best-matching time domain features are selected, and a time interval is defined for partitioning the data. This streamlines the time-based processing and partitioning of the data and improves the efficiency of time series analysis.
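A minimal sketch of the two steps in this embodiment follows: turning a timestamp into calendar features, and cutting the data into time partitions. The interval formula is not available in this text, so the sketch assumes a fixed window width supplied by the caller; `time_features`, `partition_by_interval`, `start`, and `width` are illustrative names, and timestamps are assumed to be integer epoch seconds in UTC.

```python
from datetime import datetime, timezone


def time_features(epoch_seconds):
    """Convert a timestamp into calendar features (year, month, day, hour, minute)."""
    t = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return {"year": t.year, "month": t.month, "day": t.day,
            "hour": t.hour, "minute": t.minute}


def partition_by_interval(records, start, width):
    """Cut (epoch_seconds, value) records into consecutive windows of `width` seconds.

    Each returned item pairs a half-open interval [lo, hi) with the measured
    values that fall inside it, mirroring the "unit data" of the partition list.
    """
    buckets = {}
    for ts, value in records:
        index = (ts - start) // width
        lo = start + index * width
        buckets.setdefault((lo, lo + width), []).append(value)
    return [{"interval": key, "values": buckets[key]} for key in sorted(buckets)]


partitions = partition_by_interval(
    [(0, 1.0), (30, 2.0), (60, 3.0), (125, 4.0)], start=0, width=60
)
```

In this example the four readings fall into three one-minute partitions. A variable-width scheme derived from the mean and maximum sampling gaps, as the embodiment suggests, would replace the fixed `width` argument.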
Example 8:
On the basis of the above embodiment 1, acquiring the dimension attribute corresponding to the unit data of each partition data list, and storing the partition data list in the table C according to the storage logic order, including:
traversing the partition data list to obtain the time range of each unit data;
obtaining, for any unit data in any time interval, the fields of the corresponding standard data, and looking up the dimension attributes of all those fields in a field name-dimension attribute mapping table, where these dimension attributes are the dimension attributes of the unit data;
creating a record for each unit data, and storing its time interval, unit data value, and dimension attributes in table C in the storage logic order.
In this embodiment, the field name-dimension attribute mapping table is a mapping structure that associates data fields with their corresponding dimension attributes.
In this embodiment, the dimension attribute refers to descriptive information associated with a data field, such as time, region, product, or sales.
The technical scheme has the advantages that the partition data list is traversed, the time interval and the dimension attribute mapping table are combined, records are created for each unit data, and the records are stored in the table C according to storage logic. The method improves the structured storage efficiency of the data and is convenient for subsequent data query and analysis.
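The traversal and record creation described above might look like the following sketch. The mapping contents, the record layout, and the choice of ascending interval start as the "storage logic order" are all assumptions made for illustration; a plain list stands in for table C.

```python
# Hypothetical field name -> dimension attribute mapping table.
FIELD_DIMENSIONS = {"temperature": "thermal", "vibration": "mechanical"}


def build_table_c_records(partition_list, field_dimensions):
    """Create one record per unit data with its interval, values, and dimensions."""
    records = []
    for unit in partition_list:
        start, end = unit["interval"]
        # Look up the dimension attribute of every field present in this unit.
        dimensions = {
            field: field_dimensions.get(field, "unknown")
            for field in unit["values"]
        }
        records.append({
            "interval_start": start,
            "interval_end": end,
            "values": unit["values"],
            "dimensions": dimensions,
        })
    # Assumed storage order: ascending by the start of the time interval.
    records.sort(key=lambda record: record["interval_start"])
    return records


units = [
    {"interval": (60, 120), "values": {"temperature": 71.2}},
    {"interval": (0, 60), "values": {"temperature": 70.9, "vibration": 0.02}},
]
table_c = build_table_c_records(units, FIELD_DIMENSIONS)
```

Fields missing from the mapping table are labeled "unknown" here rather than dropped, which keeps every unit data traceable even when the mapping is incomplete.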
It should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention, and not for limiting the same, and although the present invention has been described in detail with reference to the above-mentioned embodiments, it should be understood by those skilled in the art that the technical solution described in the above-mentioned embodiments may be modified or some technical features may be equivalently replaced, and these modifications or substitutions do not make the essence of the corresponding technical solution deviate from the spirit and scope of the technical solution of the embodiments of the present invention.

Claims (8)

1. A method for storing and preprocessing time series data of industrial equipment, characterized by comprising:
Step 1: configuring sensors to collect raw data of the industrial equipment in real time, and sending the raw data to a message queue;
Step 2: setting up a data import module, parsing the topic of the message queue with a data synchronization tool, obtaining first data, and storing the first data in table A of an HBase database;
Step 3: designing a table B corresponding to table A, and preprocessing and standardizing the first data of table A to obtain standard data;
Step 4: performing time feature extraction on the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain a partition data list;
Step 5: obtaining the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data list in table C in the storage logic order.

2. The method for storing and preprocessing time series data of industrial equipment according to claim 1, characterized in that configuring sensors to collect raw data of the industrial equipment in real time comprises:
obtaining the working environment and monitoring requirements of the industrial equipment in order to select the sensor type, and assigning a unique first number to each sensor;
determining the installation position of each sensor from the original design drawings of the industrial equipment together with the surrounding environment, and assigning a unique second number to each installation position;
configuring and installing the sensors according to the correspondence between the first numbers and the second numbers, and initializing and starting the sensors at a preset time sequence sampling frequency, wherein the sensors are used to collect the raw data of the industrial equipment in real time.

3. The method for storing and preprocessing time series data of industrial equipment according to claim 2, characterized in that sending the raw data to the message queue comprises:
creating a message queue, serializing the raw data into a byte stream, and iteratively inserting the byte stream byte by byte;
stopping the iteration once the byte stream of the raw data has been completely inserted into the message queue.

4. The method for storing and preprocessing time series data of industrial equipment according to claim 3, characterized in that setting up a data import module and parsing the topic of the message queue with a data synchronization tool comprises:
building the data import module, configuring and installing the data synchronization tool, parsing the queue identifier of the message queue with the data synchronization tool, and generating a group of row key-value pairs for the raw data;
creating and registering a consumer in the data synchronization tool, and creating a consumption record data table;
the consumer consuming the topic of the message queue and inserting the data into raw data table A to generate row key values.

5. The method for storing and preprocessing time series data of industrial equipment according to claim 4, characterized in that obtaining the raw data and storing it in table A of the HBase database comprises:
building, according to standard preset time sequence fields, a table in the HBase database for storing the raw time series data, and naming it table A;
obtaining and analyzing first raw data according to the result of the consumption;
if the first raw data are measured values at the same time point, concatenating them into a single value by string concatenation, wherein the individual measured values are separated by special symbols;
if the first raw data are measured values at different time points, storing the data for different measurement times in different rows.

6. The method for storing and preprocessing time series data of industrial equipment according to claim 5, characterized in that designing a table B corresponding to table A, and preprocessing and standardizing the first data of table A to obtain standard data, comprises:
obtaining the first data of table A, and converting all the data into second data in a preset format;
selecting a preset neighbor number K, and calculating the KNN distance between any two measured values in the second data as follows:
[Formula not reproduced in the source text.]
where the terms of the formula denote: the distance between the i-th measured value and the j-th measured value in the second data; n, the total number of measured values in the second data; the mean of all measured values in the second data; ln(), the logarithm function; the variance of all measured values in the second data; the weights of the i-th and j-th measured values, respectively; and min and max, the minimum and maximum, respectively;
based on the preset neighbor number K, selecting any measured value as an intermediate value, screening the K measured values nearest to the intermediate value to form a sample group, and obtaining the average distance of the sample group; if the distance between a measured value in the sample group and the intermediate value is greater than the average distance, judging that measured value to be an abnormal value; otherwise, judging it to be normal;
removing the abnormal values from the second data to obtain third data, and standardizing the third data to obtain the standard data.

7. The method for storing and preprocessing time series data of industrial equipment according to claim 1, characterized in that performing time feature extraction on the standard data to obtain time domain features, and partitioning the standard data according to the time domain features to obtain a partition data list, comprises:
obtaining the standard data of table B, and obtaining the preset time sequence sampling frequency of the standard data according to the time sequence field of the standard data;
performing time feature extraction on the standard data, converting the timestamp information of the standard data into specific time features, and taking the specific time features as the time domain features;
defining a time interval based on the time domain features, specifically:
[Formula not reproduced in the source text.]
where the terms of the formula denote: the time interval; the starting time point; the time-start-point mapping coefficient; the time-end-point mapping coefficient; the time domain features; the mean time interval of the standard data; the maximum time interval of the standard data; and T(), the event handling function;
partitioning the time portion of the standard data according to the time interval, wherein each time partition corresponds to a time interval of the defined size, and each time partition together with its corresponding measured values constitutes unit data of the partition data list.

8. The method for storing and preprocessing time series data of industrial equipment according to claim 1, characterized in that obtaining the dimension attributes corresponding to the unit data of each partition data list, and storing the partition data list in table C in the storage logic order, comprises:
traversing the partition data list to obtain the time range of each unit data;
obtaining, for any unit data in any time interval, the fields of the corresponding standard data, and obtaining the dimension attributes of all those fields from a field name-dimension attribute mapping table, wherein the dimension attributes are the dimension attributes of the unit data;
creating a record for each unit data, and storing its time interval, unit data value, and dimension attributes in table C in the storage logic order.
CN202510156909.4A 2025-02-13 2025-02-13 Industrial equipment time series data storage and preprocessing method Pending CN119621855A (en)

Publications (1)

Publication Number Publication Date
CN119621855A true CN119621855A (en) 2025-03-14

Family

ID=94894753

Country Status (1)

Country Link
CN (1) CN119621855A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination