CN112215307B - Method for automatically detecting signal abnormality of earthquake instrument by machine learning - Google Patents
- Publication number: CN112215307B
- Application number: CN202011300744.7A
- Authority: CN (China)
- Prior art keywords: data, value, sample, probability density, density function
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Geophysics And Detection Of Objects (AREA)
Abstract
The invention discloses a method for automatically detecting signal anomalies of a seismic instrument by applying machine learning, comprising the following steps: S1, collecting historical data sets of the same type; S2, taking the continuous record of each channel of each station in the data set over a fixed period of time as one sample; S3, extracting from each sample multiple feature values that characterize the signal state; S4, normalizing each feature value; S5, building a training set, a cross-validation set and a test set; S6, constructing a probability density function model and selecting a threshold ε for the decision boundary; S7, testing the probability density function model with the data in the test set; S8, examining the samples that the model misclassified and adding new feature values that capture their abnormal characteristics, then repeating steps S4 to S7 to train an optimized model; and S9, processing the real-time data of seismic stations according to steps S2 to S4 and running the optimized model on it to detect anomalies.
Description
Technical Field
The invention relates to the field of seismic monitoring, in particular to a method for automatically detecting signal anomalies of a seismic instrument by using machine learning.
Background
At present, in the field of earthquake monitoring, the seismometers in a station network deliver data that can be viewed in real time, and abnormal station signals can be identified manually from the waveforms returned in real time. However, as the construction of seismic stations accelerates, the number of stations in a single province has grown from tens to hundreds or even thousands. Once all of this station data reaches the system, it is difficult to pick out abnormal signals from the huge volume of waveform data by manual inspection alone, which hampers earthquake monitoring work.
Disclosure of Invention
The invention aims to solve the above problem by providing a method for automatically detecting signal anomalies of a seismic instrument using machine learning, which is simple to operate and improves efficiency.
In order to achieve the above object, the technical scheme of the present invention is as follows:
A method for automatically detecting seismic instrument signal anomalies using machine learning, comprising the following steps:
S1, collecting historical data sets of the same type;
S2, inspecting and analyzing the data set, taking the continuous record of each channel of each station over a period of time as one sample, manually screening the samples and deleting those with obvious errors or gaps, then manually labeling the samples and dividing the data set into a normal subset and an abnormal subset;
S3, extracting from each sample multiple feature values that characterize the signal state;
S4, normalizing each feature value;
S5, selecting 60% of the data in the normal subset as the training set; selecting 20% of the data in the normal subset and 50% of the data in the abnormal subset as the cross-validation set; and using the remaining data as the test set;
S6, constructing a probability density function model from the mean and variance of each feature value in the training set, and selecting a threshold ε for the decision boundary using the data in the cross-validation set;
S7, for the selected threshold ε, testing the probability density function model with the data in the test set;
S8, after the test, inspecting and analyzing the samples that the model misclassified and adding new feature values that capture their abnormal characteristics; then repeating steps S4 to S7 to train an optimized model;
S9, processing the real-time data of the seismic stations according to steps S2 to S4 and detecting anomalies in the real-time data with the optimized model.
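The split described in step S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `split_dataset`, the random shuffling and the fixed seed are hypothetical, and "the remaining data" is read as the other 20% of the normal subset plus the other 50% of the abnormal subset.

```python
import numpy as np

def split_dataset(normal, abnormal, seed=0):
    """Split labeled samples 60/20/20 (normal) and 50/50 (abnormal)."""
    rng = np.random.default_rng(seed)
    # Shuffling before splitting is an assumption; the source does not say how data is selected.
    normal = normal[rng.permutation(len(normal))]
    abnormal = abnormal[rng.permutation(len(abnormal))]

    n_train = 6 * len(normal) // 10        # 60% of normal data
    n_cv_normal = 2 * len(normal) // 10    # 20% of normal data
    n_cv_abnormal = len(abnormal) // 2     # 50% of abnormal data

    # The training set holds normal samples only.
    train = normal[:n_train]

    cv_normal = normal[n_train:n_train + n_cv_normal]
    cv_x = np.vstack([cv_normal, abnormal[:n_cv_abnormal]])
    cv_y = np.r_[np.zeros(len(cv_normal), dtype=int),
                 np.ones(n_cv_abnormal, dtype=int)]

    # Everything left over goes to the test set; label 1 marks abnormal.
    rest_normal = normal[n_train + n_cv_normal:]
    rest_abnormal = abnormal[n_cv_abnormal:]
    test_x = np.vstack([rest_normal, rest_abnormal])
    test_y = np.r_[np.zeros(len(rest_normal), dtype=int),
                   np.ones(len(rest_abnormal), dtype=int)]
    return train, (cv_x, cv_y), (test_x, test_y)

# Toy data: 100 normal and 10 abnormal samples with 3 features each.
normal = np.zeros((100, 3))
abnormal = np.ones((10, 3))
train, (cv_x, cv_y), (test_x, test_y) = split_dataset(normal, abnormal)
```

Keeping abnormal samples out of the training set matches the description: the density model is fitted on normal data only, while the labeled abnormal samples are reserved for choosing and checking the threshold.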
Further, the feature values in step S3 include the mean, median, maximum, minimum and amplitude.
Further, when feature values are extracted from a sample in step S3, a sliding time window is set first, and the differences in the maximum, minimum, median, mean and amplitude between adjacent time windows are then used as feature values.
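The sliding-window feature extraction can be sketched as below. Several details are assumptions made for illustration: the sample is a one-dimensional NumPy array at a hypothetical sampling rate `fs`, the 10 s window and 1 s step are the example values given later in the description, "amplitude" is taken as maximum minus minimum, and the difference series are summarized by the mean absolute difference and the standard deviation. The function names are hypothetical.

```python
import numpy as np

def window_stats(signal, fs, win_s=10, step_s=1):
    """Per-window mean, median, max, min and amplitude (max - min, an assumption)."""
    win, step = int(win_s * fs), int(step_s * fs)
    rows = []
    # Slide the window one step at a time until it reaches the end of the record.
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        rows.append([w.mean(), np.median(w), w.max(), w.min(), w.max() - w.min()])
    return np.array(rows)

def difference_features(signal, fs, win_s=10, step_s=1):
    """Summaries of how each statistic changes between adjacent windows."""
    stats = window_stats(signal, fs, win_s, step_s)
    diffs = np.diff(stats, axis=0)  # adjacent-window differences, one column per statistic
    # Which summaries to keep is an assumption; here: mean absolute change and spread.
    return np.concatenate([np.abs(diffs).mean(axis=0), diffs.std(axis=0)])

fs = 100                               # hypothetical sampling rate in Hz
t = np.arange(60 * fs) / fs            # 60 s of data
trace = np.sin(2 * np.pi * 1.0 * t)    # synthetic 1 Hz trace standing in for a channel record

stats = window_stats(trace, fs)
features = difference_features(trace, fs)
```

For a stationary trace like this one the window statistics barely change, so the difference features stay close to zero; a step, spike or dropout in the record would show up as a large adjacent-window difference.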
Further, constructing the probability density function model in step S6 comprises the following steps:
S61, for a given training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, calculating the mean and variance of each feature value as follows:

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$

where $m$ is the number of samples, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set;
S62, establishing the probability density function model from the mean and variance as follows:

$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$

where $p(x)$ is the probability density function, $n$ is the number of feature values, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature value $j$, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set;
S63, setting a threshold ε for the decision boundary and predicting whether data is abnormal using $p(x) = \varepsilon$ as the decision boundary: a sample is normal when $p(x) > \varepsilon$ and abnormal otherwise;
S64, substituting the data in the cross-validation set into the probability density function model and selecting the threshold ε of the decision boundary according to precision and recall.
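The model construction and threshold selection described above can be sketched as follows. This is an illustration on synthetic data, not the inventors' code: all function names are hypothetical, the variance uses 1/m normalization, the threshold is chosen by maximizing the F1 value over a linear grid of candidates, and label 1 is taken to mean "abnormal".

```python
import numpy as np

def fit_gaussian(train):
    """Per-feature mean and variance of the training set."""
    mu = train.mean(axis=0)
    var = train.var(axis=0)  # np.var defaults to ddof=0, i.e. division by m
    return mu, var

def density(x, mu, var):
    """p(x) as the product of independent univariate Gaussian densities."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * var)
    return np.prod(coef * np.exp(-(x - mu) ** 2 / (2.0 * var)), axis=1)

def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall), with 1 = abnormal."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_threshold(p_cv, y_cv, n_candidates=1000):
    """Scan candidate thresholds on the cross-validation set; keep the best F1."""
    best_eps, best_f1 = 0.0, -1.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), n_candidates):
        y_pred = (p_cv <= eps).astype(int)  # normal only when p(x) > eps
        score = f1_score(y_cv, y_pred)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1

# Synthetic stand-in data: normal samples near the origin, abnormal ones far away.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(600, 2))
x_cv = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                  rng.normal(8.0, 1.0, size=(20, 2))])
y_cv = np.r_[np.zeros(200, dtype=int), np.ones(20, dtype=int)]

mu, var = fit_gaussian(train)
eps, f1 = select_threshold(density(x_cv, mu, var), y_cv)
```

Because abnormal samples are rare, F1 is a more informative selection criterion than plain classification accuracy, which a model could maximize by labeling everything normal.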
Compared with the prior art, the invention has the following advantages and positive effects:
By collecting historical data to prepare samples and extracting feature values from those samples to build a probability density function model, the invention allows the real-time data of a large number of seismic stations to be fed into the model and classified automatically. Normal and abnormal waveforms are distinguished without manual intervention, so stations with abnormal signals can be screened out. This reduces labor cost, improves monitoring efficiency, and makes earthquake monitoring work more convenient.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the anomaly monitoring principle;
FIG. 2 is a framework diagram of the present invention.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments, modifications, equivalents and improvements that a person skilled in the art can obtain without inventive effort fall within the scope of this invention.
As shown in FIGS. 1 and 2, the invention uses a machine learning method to examine and classify the real-time data of seismic stations, and can rapidly identify stations whose seismic signals are abnormal.
Principle: real-time data can be regarded as a set in which "normal" data points generally resemble one another, while "abnormal" data points differ significantly from the rest and are therefore called outliers. Anomaly detection in machine learning is the process of finding such outliers, that is, data points that differ significantly from most of the data.
As shown in FIG. 1, assuming the data set has two features, x1 and x2, points in the graph that lie far from the main cluster can be regarded as outliers, because they behave quite differently from the other data.
For a given data set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, assuming that each feature follows a Gaussian distribution, the values of $\mu$ and $\sigma^2$ can be calculated for each feature:

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$

where $m$ is the number of samples, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set.
Once $\mu$ and $\sigma^2$ have been obtained, $p(x)$ can be calculated from the model for any new example:

$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$

where $p(x)$ is the probability density function, $n$ is the number of feature values, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature value $j$, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set.
When $p(x)$ is smaller than the threshold ε, the example is judged to be abnormal.
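One practical point the text does not address: with many features, the product of per-feature densities can underflow to zero in floating-point arithmetic. A common workaround, shown here as an assumption and not as part of the patented method, is to sum log-densities and compare log p(x) with log ε. The function names and the toy values are hypothetical.

```python
import numpy as np

def log_density(x, mu, var):
    """log p(x): sum of per-feature Gaussian log-densities, no product formed."""
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mu) ** 2 / (2.0 * var), axis=1)

def is_abnormal(x, mu, var, eps):
    """Abnormal when p(x) < eps, tested equivalently as log p(x) < log(eps)."""
    return log_density(x, mu, var) < np.log(eps)

# Toy model with 200 features, each with zero mean and unit variance.
mu = np.zeros(200)
var = np.ones(200)
x = np.vstack([np.zeros(200),          # a typical point at the mean
               np.full(200, 3.0)])     # a point 3 standard deviations out in every feature

# Direct product of densities: the second value underflows to exactly 0.0.
direct = np.prod(np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var), axis=1)

# Log-space evaluation stays finite for both points.
logp = log_density(x, mu, var)
flags = is_abnormal(x, mu, var, eps=1e-100)
```

Since the logarithm is monotonic, the decision is the same as comparing p(x) with ε directly; only the numerical range changes.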
As shown in FIG. 2, the present invention is implemented as follows:
1. Data collection: collect historical data sets of the same type (including data with abnormal signals);
2. Data cleaning and organization: inspect and analyze the data set, take the continuous record of each channel of each station over a period of time as one sample, screen the samples manually, and delete samples with obvious errors or gaps in format, content and so on. Then label the data set manually: plot the time history of each sample, inspect it by eye, and divide the data set into a normal subset and an abnormal subset;
3. Feature engineering: extract from each sample multiple features that characterize the signal state, for example statistical features of the whole record (mean, median, maximum, minimum, amplitude and so on). Because seismic data varies over time, a sliding time window is needed to capture its temporal behavior: for example, with a 10 s window that advances 1 s at a time, the statistical features inside the window are extracted at each position until the end of the record is reached. To capture how the data changes, the differences in the maximum, minimum, median, mean and amplitude between adjacent time windows are computed, and statistical features of those differences are extracted;
4. Feature processing: to make the algorithm more effective, normalize each feature value; inspect the distribution of each feature and, where needed, transform the feature so that it approximates a normal distribution;
5. Data splitting: select 60% of the normal data as the training set; use 20% of the normal data and 50% of the abnormal data as the cross-validation set and the remaining data as the test set, and label the data;
6. Model construction: estimate the mean and variance of each feature from the training set and construct the probability density function p(x). On the cross-validation set, try different thresholds ε, using p(x) = ε as the decision boundary to predict whether each sample is abnormal: a sample is normal when p(x) > ε and abnormal otherwise. Finally, select the threshold ε according to precision and recall (or the F1 value: F1 = 2 × precision × recall / (precision + recall));
7. Model testing: for the selected threshold ε, run the model on the test set and calculate the precision and recall (and F1 value) of the anomaly detection system;
8. Model optimization: examine the test results. If the algorithm mistakes an abnormal sample for a normal one, that sample has a high p(x) value. In that case, inspect and analyze the sample and add new features that capture its abnormal characteristics. Then repeat steps 4 to 7 to train an optimized model, until all abnormal samples in the test set are identified;
9. Practical application: process the real-time data of the seismic stations according to steps 2 to 4, then perform anomaly detection on the real-time seismic waveforms with the optimized model obtained in step 8.
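Steps 4 and 9 together imply that real-time features must be scaled in the same way as the training features. The sketch below assumes z-score normalization, which the source does not specify, and stores the training statistics so they can be reused at detection time. The class name `ZScoreNormalizer` and the feature values are hypothetical.

```python
import numpy as np

class ZScoreNormalizer:
    """Z-score normalization fitted on the training set only (an assumed choice)."""

    def fit(self, features):
        self.mean_ = features.mean(axis=0)
        self.std_ = features.std(axis=0)
        self.std_[self.std_ == 0] = 1.0  # avoid division by zero for constant features
        return self

    def transform(self, features):
        return (features - self.mean_) / self.std_

# Synthetic training features on two very different scales.
rng = np.random.default_rng(1)
train_raw = rng.normal(loc=[5.0, 200.0], scale=[2.0, 50.0], size=(500, 2))

normalizer = ZScoreNormalizer().fit(train_raw)
train = normalizer.transform(train_raw)

# Real-time feature vectors are scaled with the stored training statistics,
# never re-fitted on the incoming data.
realtime_raw = np.array([[5.5, 210.0],     # close to typical training values
                         [40.0, 900.0]])   # far outside the training range
realtime = normalizer.transform(realtime_raw)
```

The normalized real-time vectors would then be passed to the probability density function model and compared with the selected threshold ε. Re-fitting the normalizer on live data would shift the scale and make p(x) incomparable with the threshold chosen during cross-validation.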
By collecting historical data to prepare samples and extracting feature values from those samples to build a probability density function model, the invention allows the real-time data of a large number of seismic stations to be fed into the model and classified automatically. Normal and abnormal waveforms are distinguished without manual intervention, so stations with abnormal signals can be screened out. This reduces labor cost, improves monitoring efficiency, and makes earthquake monitoring work more convenient.
Claims (3)
1. A method for automatically detecting signal anomalies of a seismic instrument using machine learning, characterized in that the method comprises the following steps:
S1, collecting historical data sets of the same type;
S2, inspecting and analyzing the data set, taking the continuous record of each channel of each station over a period of time as one sample, manually screening the samples and deleting those with obvious errors or gaps, then manually labeling the samples and dividing the data set into a normal subset and an abnormal subset;
S3, extracting from each sample multiple feature values that characterize the signal state;
S4, normalizing each feature value;
S5, selecting 60% of the data in the normal subset as the training set; selecting 20% of the data in the normal subset and 50% of the data in the abnormal subset as the cross-validation set; and using the remaining data as the test set;
S6, constructing a probability density function model from the mean and variance of each feature value in the training set, and selecting a threshold ε for the decision boundary using the data in the cross-validation set;
S7, for the selected threshold ε, testing the probability density function model with the data in the test set;
S8, after the test, inspecting and analyzing the samples that the model misclassified and adding new feature values that capture their abnormal characteristics; then repeating steps S4 to S7 to train an optimized model;
S9, processing the real-time data of the seismic stations according to steps S2 to S4 and detecting anomalies in the real-time data with the optimized model;
wherein constructing the probability density function model in step S6 comprises the following steps:
S61, for a given training set $x^{(1)}, x^{(2)}, \ldots, x^{(m)}$, calculating the mean and variance of each feature value as follows:

$$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)}, \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2$$

where $m$ is the number of samples, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set;
S62, establishing the probability density function model from the mean and variance as follows:

$$p(x) = \prod_{j=1}^{n} p\left(x_j; \mu_j, \sigma_j^2\right) = \prod_{j=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left(-\frac{\left(x_j - \mu_j\right)^2}{2\sigma_j^2}\right)$$

where $p(x)$ is the probability density function, $n$ is the number of feature values, $p(x_j; \mu_j, \sigma_j^2)$ is the probability density function of feature value $j$, $\mu_j$ is the mean of feature value $j$ in the training set, and $\sigma_j^2$ is the variance of feature value $j$ in the training set;
S63, setting a threshold ε for the decision boundary and predicting whether data is abnormal using $p(x) = \varepsilon$ as the decision boundary: a sample is normal when $p(x) > \varepsilon$ and abnormal otherwise;
S64, substituting the data in the cross-validation set into the probability density function model and selecting the threshold ε of the decision boundary according to precision and recall.
2. The method for automatically detecting seismic instrument signal anomalies using machine learning of claim 1, wherein the feature values in step S3 include the mean, median, maximum, minimum and amplitude.
3. The method for automatically detecting seismic instrument signal anomalies using machine learning of claim 2, wherein, when feature values are extracted from a sample in step S3, a sliding time window is set first, and the differences in the maximum, minimum, median, mean and amplitude between adjacent time windows are then used as feature values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011300744.7A CN112215307B (en) | 2020-11-19 | 2020-11-19 | Method for automatically detecting signal abnormality of earthquake instrument by machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112215307A (en) | 2021-01-12 |
CN112215307B (en) | 2024-03-19 |
Family
ID=74067857
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011300744.7A (Active, CN112215307B) | 2020-11-19 | 2020-11-19 | Method for automatically detecting signal abnormality of earthquake instrument by machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112215307B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113325824B (en) * | 2021-06-02 | 2022-10-25 | 三门核电有限公司 | Regulating valve abnormity identification method and system based on threshold monitoring |
CN115240428B (en) * | 2022-07-29 | 2024-05-14 | 浙江数智交院科技股份有限公司 | Tunnel operation abnormality detection method and device, electronic equipment and storage medium |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109311478A (en) * | 2016-12-30 | 2019-02-05 | 同济大学 | A kind of automatic Pilot method for controlling driving speed based on comfort level |
CN108647891A (en) * | 2018-05-14 | 2018-10-12 | 口口相传(北京)网络技术有限公司 | Data exception classification, Reasons method and device |
CN109738939A (en) * | 2019-03-21 | 2019-05-10 | 蔡寅 | A kind of Precursory Observational Data method for detecting abnormality |
CN110389264A (en) * | 2019-07-01 | 2019-10-29 | 浙江大学 | A detection method for abnormal power consumption metering |
CN111666187A (en) * | 2020-05-20 | 2020-09-15 | 北京百度网讯科技有限公司 | Method and apparatus for detecting abnormal response time |
Non-Patent Citations (2)
Title |
---|
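Anomaly recognition of ultra low frequency electric data based on artificial neutral network; Jianqin An et al.; 2016 9th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics; pp. 1-2 * |
Research on key technologies for anomaly recognition in earthquake precursor data (地震前兆数据异常识别关键技术研究); Liu Ziwei et al.; China Doctoral Dissertations Full-text Database, Basic Sciences; p. 133 * |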
Also Published As
Publication number | Publication date |
---|---|
CN112215307A (en) | 2021-01-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||