
CN116955444B - Method and system for mining collected noise points based on big data analysis - Google Patents


Info

Publication number
CN116955444B
Authority
CN
China
Prior art keywords
data
signal
noise
feature
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310717597.0A
Other languages
Chinese (zh)
Other versions
CN116955444A (en)
Inventor
凌晓华
王晓宇
章熠辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liu Fu
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310717597.0A
Publication of CN116955444A
Application granted
Publication of CN116955444B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2136 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of data noise point mining, and in particular to a method and system for mining collected noise points based on big data analysis. The method comprises: performing data standardization on the cleaned big data to obtain standardized big data; performing feature extraction on the standardized big data to obtain big data features, calculating the matrix variance value of each matrix in the feature matrix, and determining abnormal matrices in the feature matrix; constructing a data scatter diagram corresponding to the standardized big data, calculating the distance values between the data points in the scatter diagram to obtain a total data point distance value, and determining discrete data points in the scatter diagram; and determining the noise data in the standardized big data, extracting the data signal corresponding to the noise data, calculating the signal sparse value of the data signal, and performing noise optimization on the data noise points to obtain a noise optimization processing result. The invention aims to improve the accuracy of mining collected noise points based on big data analysis.

Description

Method and system for mining collected noise points based on big data analysis
Technical Field
The invention relates to the technical field of data noise point mining, in particular to a method and a system for mining collected noise points based on big data analysis.
Background
Big data analysis refers to the methods, tools, and applications used to collect, process, and derive insight from large, high-velocity, and varied data sets. Such data may come from many sources, such as the Web, mobile applications, email, social media, and networked smart devices, and typically arrives at high speed in forms ranging from structured (database tables, Excel spreadsheets) through semi-structured (XML files, web pages) to unstructured (images, audio files). Ideally the collected data would be complete, but during collection, device malfunctions, human operating errors, and other acoustic disturbances interfere with the data and create noise data, which in turn degrades subsequent analysis. Noise points therefore need to be mined from the data to improve the accuracy of subsequent data analysis.
However, existing methods for mining collected noise points based on big data analysis mainly extract the redundant data in the big data, mark the redundant data nodes corresponding to that data, identify the redundant node fields of those nodes, and obtain the noise points of the big data from the redundant node fields. Such methods only handle redundant data, whereas noise in big data arises from many causes and takes many forms, so relying on them reduces the efficiency of mining noise points in big data. A method that improves the efficiency of mining collected noise points based on big data analysis is therefore needed.
Disclosure of Invention
The invention provides a method and a system for mining collected noise points based on big data analysis, to solve at least one of the technical problems described above.
A method for mining collected noise points based on big data analysis comprises the following steps:
Step S1: acquiring the original big data to be mined and its data field, performing data cleaning on the original big data to obtain cleaned big data, and performing data standardization on the cleaned big data according to the data field to obtain standardized big data;
Step S2: performing feature extraction on the standardized big data to obtain big data features, constructing the feature matrices corresponding to the big data features, calculating the matrix variance value of each matrix in the feature matrices, and determining abnormal matrices according to the matrix variance values, where calculating the matrix variance value of each matrix comprises:
Calculating a matrix variance value corresponding to each matrix in the feature matrix through the following formula:
Where $E$ denotes the matrix variance value of each matrix in the feature matrix, $a$ the serial number of a feature matrix, $y$ the total number of feature matrices, $G_a$ the matrix expected value of the $a$-th feature matrix, $G_a$ the matrix value of the $a$-th feature matrix, and $\bar{G}$ the average value of the feature matrices;
Step S3: performing linear conversion on the standardized big data to obtain a big data linear value, constructing a data scatter diagram corresponding to the standardized big data according to the big data linear value, calculating a distance value between each data point in the data scatter diagram to obtain a total data point distance value, and determining discrete data points in the data scatter diagram according to the total data point distance value;
Step S4: combining the abnormal matrices and the discrete data points to determine the noise data in the standardized big data; extracting the data signal corresponding to the noise data; calculating the signal sparse value and the signal-to-noise ratio of the data signal; identifying data noise points in the noise data according to the signal-to-noise ratio and the signal sparse value; extracting the noise characteristics of the data noise points; constructing a noise optimization scheme for the data noise points according to the noise characteristics; and performing noise optimization on the data noise points according to that scheme to obtain a noise optimization processing result.
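Step S4 relies on a signal-to-noise ratio but this section does not give its formula. As a minimal sketch, assuming the conventional power-ratio definition in decibels and assuming a clean reference estimate of the signal is available (both assumptions, not taken from the patent), the ratio could be computed as below; `snr_db` is a hypothetical helper name.

```python
import numpy as np


def snr_db(observed: np.ndarray, reference: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels (assumed power-ratio definition).

    `reference` is taken to be a clean estimate of the signal; the residual
    `observed - reference` is treated as the noise component.
    """
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    noise = observed - reference
    signal_power = np.mean(reference ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        return float("inf")  # no residual noise at all
    return float(10.0 * np.log10(signal_power / noise_power))


# Example: a square wave with a small additive disturbance
clean = np.array([1.0, -1.0, 1.0, -1.0])
noisy = clean + np.array([0.1, -0.1, 0.1, -0.1])
print(round(snr_db(noisy, clean), 2))  # 20.0
```

A low ratio would then be one signal, alongside the sparse value, that a sample is a candidate noise point; how the two are combined is not specified here.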
In an embodiment of the present disclosure, performing data standardization on the cleaned big data according to the data field to obtain the standardized big data includes:
Retrieving historical big data in the data field, parsing the data architecture of each datum in the historical big data, and determining the data format of each datum according to that architecture;
Counting the frequency of each data format, and determining the standard format of the historical big data according to the format frequencies;
Looking up the format source code corresponding to the standard format, and writing a format converter for the cleaned big data according to that source code;
Applying the format converter to the cleaned big data for format standardization to obtain the standardized big data.
In one embodiment of the present disclosure, the constructing a feature matrix corresponding to the big data feature includes:
Performing dimension reduction on the big data features to obtain dimension reduction features, and converting the dimension reduction features into feature vectors;
Calculating the feature vector value of each feature vector, and taking it as the feature value of the corresponding big data feature;
Calculating the vector similarity coefficients between the feature vectors, and constructing the feature matrix corresponding to the big data features from the vector similarity coefficients and the feature values.
In one embodiment of the present specification, the calculating the vector similarity coefficient between the feature vectors includes:
vector similarity coefficients between the feature vectors are calculated by the following formula:
Where $D$ denotes the vector similarity coefficient between feature vectors, $b$ the serial number of a feature vector, $B$ the number of feature vectors, $A_b$ the length of the $b$-th feature vector, $\bar{A}$ the average length of all feature vectors, and $A_{b+1}$ the length of the $(b+1)$-th feature vector.
In one embodiment of the present disclosure, the constructing a data scatter diagram corresponding to the normalized big data according to the big data linear value includes:
Acquiring the data sequence of each datum in the standardized big data, and extracting the variable data of each datum, where the variable data comprises independent-variable data and dependent-variable data;
Analyzing the relationship between the independent-variable data and the dependent-variable data, and calculating their variable values according to that relationship and the big data linear value to obtain a first variable value and a second variable value;
Constructing the data scatter diagram corresponding to the standardized big data from the first variable value, the second variable value, and the data sequence.
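The patent does not state which linear conversion it uses. A minimal sketch of the scatter-diagram construction, assuming min-max scaling to [0, 1] as the linear conversion and assuming the independent and dependent variables have already been extracted as equal-length sequences; `linear_convert` and `build_scatter_points` are hypothetical names.

```python
import numpy as np


def linear_convert(values) -> np.ndarray:
    """Min-max linear scaling to [0, 1] (assumed form of the linear conversion)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)  # constant input: map everything to 0
    return (v - v.min()) / span


def build_scatter_points(independent, dependent) -> np.ndarray:
    """Return one row per datum: (sequence index, first variable value, second variable value)."""
    if len(independent) != len(dependent):
        raise ValueError("independent and dependent data must have the same length")
    x = linear_convert(independent)  # first variable value
    y = linear_convert(dependent)    # second variable value
    seq = np.arange(len(x))          # data sequence, in original order
    return np.column_stack([seq, x, y])


points = build_scatter_points([10, 20, 30], [5, 5, 15])
print(points)
```

Keeping the sequence index as a column lets later steps, such as the consecutive-point distances, follow the original data order.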
In one embodiment of the present disclosure, the calculating a distance value between each data point in the data scatter plot, to obtain a total distance value between the data points, includes:
calculating a total value of distances between each data point in the data scatter plot by the following formula:
Where $H$ denotes the total distance value between the data points in the data scatter diagram, $i$ the serial number of a data point, $q$ the number of data points, $K_{i-1}$ the coordinate value of the $(i-1)$-th data point, $K_i$ the coordinate value of the $i$-th data point, and $K_q$ the coordinate value of the $q$-th data point.
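The formula for $H$ is not reproduced here, and the rule for calling a point "discrete" is not stated. The sketch below makes two labeled assumptions: $H$ is read as the summed Euclidean distance between consecutive points $K_{i-1}$ and $K_i$, and a point is flagged as discrete when its summed distance to all other points exceeds a hypothetical `factor` times the median of those sums.

```python
import numpy as np


def total_path_distance(points: np.ndarray) -> float:
    """Sum of distances between consecutive points K_{i-1} -> K_i (assumed reading of H)."""
    steps = np.diff(points, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


def discrete_points(points: np.ndarray, factor: float = 2.0) -> list:
    """Indices of points whose summed distance to all other points exceeds
    `factor` times the median of those sums (hypothetical decision rule)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    totals = dist.sum(axis=1)                  # per-point total distance
    return np.flatnonzero(totals > factor * np.median(totals)).tolist()


pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
print(total_path_distance(pts))
print(discrete_points(pts))  # [3]
```

A median-based threshold is used because, with few points, a mean-plus-standard-deviation rule is itself dragged upward by the outlier it is trying to detect.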
In one embodiment of the present disclosure, the calculating a signal sparseness value corresponding to the data signal includes:
Identifying the data time-domain signal and the data frequency-domain signal in the data signal, and applying a Fourier transform to the data time-domain signal to obtain a transformed data signal;
Reconstructing the data signal from the transformed data signal and the data frequency-domain signal to obtain a target data signal;
Calculating the signal information entropy of the target data signal, and taking it as the signal sparse value of the data signal.
In one embodiment of the present disclosure, the calculating the signal information entropy corresponding to the target data signal includes:
Calculating signal information entropy corresponding to the target data signal according to the following formula:
Where $P$ denotes the signal information entropy of the target data signal, $j$ the serial number of a time period in the target data signal, $t$ the number of groups (time periods) in the target data signal, $N_j$ the data signal value in the $j$-th time period, and $M(N_j)$ the probability of occurrence of that value in the target data signal.
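The symbol list matches the standard Shannon entropy $P = -\sum_{j=1}^{t} M(N_j)\log M(N_j)$. A minimal sketch under stated assumptions: the logarithm is taken base 2 (the base is not given here), $M(N_j)$ is estimated from the empirical frequency of each distinct signal value, and values are rounded to a hypothetical `decimals` so that near-equal samples count as one value. The Fourier-transform reconstruction step is not shown; the function takes the already reconstructed target signal.

```python
import numpy as np


def signal_entropy(values, decimals: int = 3) -> float:
    """Shannon entropy, in bits, of the empirical distribution of signal values."""
    v = np.round(np.asarray(values, dtype=float), decimals)
    _, counts = np.unique(v, return_counts=True)
    p = counts / counts.sum()                 # M(N_j): occurrence probability of each value
    return float(-(p * np.log2(p)).sum())     # P = -sum M(N_j) log2 M(N_j)


print(signal_entropy([1, 2, 3, 4]))  # 2.0
print(signal_entropy([0, 0, 1, 1]))  # 1.0
```

A signal concentrated on few values yields a low entropy, and one spread over many distinct values yields a high entropy.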
In an embodiment of the present disclosure, the constructing, according to the noise characteristic, a noise optimization scheme corresponding to the data noise point includes:
Looking up the characteristic indices corresponding to the noise characteristic, and extracting the index parameter of each characteristic index;
Calculating the parameter difference between each index parameter and its standard index parameter, and determining the indices to be optimized according to the parameter differences;
Looking up the optimization rule for each index to be optimized, and constructing the noise optimization scheme for the data noise point according to those rules.
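The three steps above can be sketched as a comparison against standard parameters followed by a rule lookup. The index names, the `tolerance` threshold, and the fallback rule text are all hypothetical; the patent does not specify how large a parameter difference must be before an index is optimized.

```python
from typing import Dict, List, Tuple


def build_optimization_scheme(
    index_params: Dict[str, float],
    standard_params: Dict[str, float],
    rules: Dict[str, str],
    tolerance: float = 0.1,
) -> List[Tuple[str, float, str]]:
    """Return (index, parameter difference, optimization rule) for each index whose
    parameter deviates from its standard by more than `tolerance`, largest first."""
    scheme = []
    for name, value in index_params.items():
        if name not in standard_params:
            continue  # no standard parameter to compare against
        diff = abs(value - standard_params[name])
        if diff > tolerance:
            # Indices without a stored rule are flagged for manual handling
            scheme.append((name, diff, rules.get(name, "manual review")))
    scheme.sort(key=lambda item: item[1], reverse=True)
    return scheme


scheme = build_optimization_scheme(
    index_params={"amplitude": 1.5, "frequency": 50.05, "duration": 2.0},
    standard_params={"amplitude": 1.0, "frequency": 50.0, "duration": 3.0},
    rules={"amplitude": "clip to standard range"},
)
for name, diff, rule in scheme:
    print(name, round(diff, 2), rule)
# duration 1.0 manual review
# amplitude 0.5 clip to standard range
```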
A system for mining collected noise points based on big data analysis, the system comprising:
The data processing module, configured to acquire the original big data to be mined and its data field, perform data cleaning on the original big data to obtain cleaned big data, and perform data standardization on the cleaned big data according to the data field to obtain standardized big data;
The matrix variance calculation module, configured to perform feature extraction on the standardized big data to obtain big data features, construct the feature matrices corresponding to the big data features, calculate the matrix variance value of each matrix in the feature matrices, and determine abnormal matrices according to the matrix variance values, where calculating the matrix variance value of each matrix comprises:
Calculating a matrix variance value corresponding to each matrix in the feature matrix through the following formula:
Where $E$ denotes the matrix variance value of each matrix in the feature matrix, $a$ the serial number of a feature matrix, $y$ the total number of feature matrices, $G_a$ the matrix expected value of the $a$-th feature matrix, $G_a$ the matrix value of the $a$-th feature matrix, and $\bar{G}$ the average value of the feature matrices;
The discrete point determining module is used for carrying out linear conversion on the standardized big data to obtain a big data linear value, constructing a data scatter diagram corresponding to the standardized big data according to the big data linear value, calculating a distance value between each data point in the data scatter diagram to obtain a total data point distance value, and determining discrete data points in the data scatter diagram according to the total data point distance value;
The noise point optimizing module is used for combining the abnormal matrix and the data discrete points, determining noise data in the standardized big data, extracting data signals corresponding to the noise data, calculating signal sparse values corresponding to the data signals, calculating signal to noise ratios corresponding to the data signals, inquiring data noise points in the noise data according to the signal to noise ratios and the signal sparse values, extracting noise characteristics corresponding to the data noise points, constructing a noise optimizing scheme corresponding to the data noise points according to the noise characteristics, executing noise optimizing processing of the data noise points according to the noise optimizing scheme, and obtaining noise optimizing processing results.
In the present invention, the original big data to be mined and its data field are obtained, and cleaning the original big data removes invalid and duplicate data and improves data quality, while the data field indicates which processing techniques apply to the original big data. Linearly converting the standardized big data turns it into numerical form, so that it can be displayed visually and understood more intuitively. Combining the abnormal matrices with the discrete data points to determine the noise data allows the noise data in the standardized big data to be extracted completely, which facilitates subsequent noise point analysis and improves the accuracy of noise point mining. The method and system therefore improve the accuracy of mining collected noise points based on big data analysis.
Drawings
Other features, objects and advantages of the application will become more apparent upon reading of the detailed description of a non-limiting implementation, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic flow chart of a method for mining collected noise points based on big data analysis according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a system for mining collected noise points based on big data analysis according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following is a clear and complete description of the technical method of the present patent in conjunction with the accompanying drawings, and it is evident that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated descriptions of them are omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
Referring to FIG. 1, the method for mining collected noise points based on big data analysis comprises the following steps:
Step S1: acquiring the original big data to be mined and its data field, performing data cleaning on the original big data to obtain cleaned big data, and performing data standardization on the cleaned big data according to the data field to obtain standardized big data;
In the present invention, the original big data to be mined and its data field are acquired, and the original big data is cleaned to remove invalid and duplicate data and improve data quality; the data field indicates the processing techniques relevant to the original big data. The original big data is the required data obtained from a scheduling database or related devices; it contains noise data and may include image, video, audio, and text data. The data field is the type of the original big data; for example, the data field of image data is the image field. The cleaned big data is the data obtained after duplicate, invalid, and abnormal data have been removed from the original big data. Optionally, data cleaning may be performed with a data cleaning tool written in a scripting language.
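As an illustration of the cleaning step, the sketch below removes records with invalid required fields and exact duplicates, keeping the first copy. It assumes records are flat dictionaries with hashable values; the helper names and the definition of "invalid" (None, empty string, or NaN) are assumptions, and removal of abnormal values is left to the later variance and scatter-diagram steps.

```python
import math
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def _is_invalid(value: Any) -> bool:
    """Treat None, empty strings and float NaN as invalid."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def clean_records(records: Iterable[Record], required: List[str]) -> List[Record]:
    """Drop records with invalid required fields and exact duplicates (first copy kept)."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(_is_invalid(rec.get(field)) for field in required):
            continue
        key = tuple(sorted(rec.items()))  # assumes hashable field values
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


raw = [
    {"id": 1, "value": 2.0},
    {"id": 1, "value": 2.0},           # duplicate
    {"id": 2, "value": None},          # missing value
    {"id": 3, "value": float("nan")},  # invalid value
    {"id": 4, "value": 5.0},
]
print([r["id"] for r in clean_records(raw, required=["id", "value"])])  # [1, 4]
```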
Standardizing the cleaned big data according to the data field unifies its data formats, which improves the efficiency of subsequent data processing; the standardized big data is the data obtained after the format of the cleaned big data has been standardized.
As an embodiment of the present invention, performing data standardization on the cleaned big data according to the data field to obtain the standardized big data includes: retrieving historical big data in the data field; parsing the data architecture of each datum in the historical big data and determining its data format according to that architecture; counting the frequency of each data format and determining the standard format of the historical big data according to the format frequencies; looking up the format source code corresponding to the standard format and writing a format converter for the cleaned big data according to that source code; and applying the format converter to the cleaned big data to obtain the standardized big data.
The data architecture is formed by the data structures of the historical big data; the data format is the format of each datum in the historical big data as processed by a computer; the format frequency is how often each format occurs among the data formats; the format source code is the program code corresponding to the standard format; and the format converter performs format standardization on the cleaned big data.
Optionally, the historical big data in the data field may be retrieved with a data scheduler, the data architecture of each datum may be parsed with a structural analysis method, the frequency of each format may be counted with a counting method, the format source code corresponding to the standard format may be looked up with a code lookup tool, and the format converter for the cleaned big data may be written in code according to the format source code.
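The standard-format procedure can be illustrated with date strings: detect each historical value's format, take the most frequent format as the standard, and build a converter that rewrites cleaned values into it. The candidate format list and all function names are hypothetical, and dates stand in for whatever data formats a real data field uses.

```python
from collections import Counter
from datetime import datetime
from typing import Callable, List, Optional

# Hypothetical candidate formats; a real system would derive these from its data field
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]


def detect_format(value: str) -> Optional[str]:
    """Return the first candidate format that parses `value`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def standard_format(history: List[str]) -> str:
    """Most frequent recognizable format in the historical data."""
    counts = Counter(f for f in map(detect_format, history) if f is not None)
    if not counts:
        raise ValueError("no recognizable format in historical data")
    return counts.most_common(1)[0][0]


def make_converter(target_fmt: str) -> Callable[[str], Optional[str]]:
    """Build a converter that rewrites any recognizable value into `target_fmt`."""
    def convert(value: str) -> Optional[str]:
        src = detect_format(value)
        if src is None:
            return None  # unrecognized values are left for later inspection
        return datetime.strptime(value, src).strftime(target_fmt)
    return convert


history = ["2023-01-01", "2023-02-03", "05/06/2023"]
convert = make_converter(standard_format(history))
print(convert("15/06/2023"), convert("20230615"))  # 2023-06-15 2023-06-15
```

Because detection stops at the first matching candidate, the order of `CANDIDATE_FORMATS` decides how ambiguous strings are interpreted.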
Step S2: performing feature extraction on the standardized big data to obtain big data features, constructing the feature matrices corresponding to the big data features, calculating the matrix variance value of each matrix in the feature matrices, and determining abnormal matrices according to the matrix variance values;
Performing feature extraction on the standardized big data captures the characteristic properties of the data, expresses them more intuitively, and facilitates the subsequent construction of the feature matrix. The big data features are the features of each datum in the standardized big data; optionally, the extraction may be implemented by principal component analysis.
Constructing the feature matrix corresponding to the big data features converts those features from an abstract to a numerical representation, which facilitates the subsequent calculation of the variance values; the feature matrix is the aggregate square matrix corresponding to the big data features.
As an embodiment of the present invention, constructing the feature matrix corresponding to the big data features includes: performing dimension reduction on the big data features to obtain dimension reduction features; converting the dimension reduction features into feature vectors; calculating the feature vector value of each feature vector and taking it as the feature value of the corresponding big data feature; calculating the vector similarity coefficients between the feature vectors; and constructing the feature matrix corresponding to the big data features from the vector similarity coefficients and the feature values.
The dimension reduction features are obtained by reducing the features in the big data features from high to low dimensionality, which brings them into the same dimension for subsequent processing; the feature vectors are the vector representations of the dimension reduction features; the feature vector value is the numerical value corresponding to a feature vector; and the vector similarity coefficient indicates the degree of similarity between feature vectors.
Optionally, the dimension reduction may be implemented with PCA, a linear dimension reduction method; the vector conversion of the dimension reduction features with the Word2vec algorithm; and the calculation of the feature vector values with vector operations such as vector addition and subtraction. The feature values are grouped according to the magnitudes of the vector similarity coefficients, and the feature matrix corresponding to the big data features is then constructed with a matrix construction function, where the matrix construction functions include a zero function.
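The PCA step can be sketched with NumPy's SVD. How the similarity coefficients and feature values are assembled into one matrix is not specified, so the `feature_matrix` layout below is a hypothetical choice: pairwise cosine similarities off the diagonal and each reduced vector's norm, standing in for its feature vector value, on the diagonal. The Word2vec conversion step is not shown.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int) -> np.ndarray:
    """Project centered features onto their top principal axes (PCA via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def feature_matrix(reduced: np.ndarray) -> np.ndarray:
    """Hypothetical assembly: pairwise cosine similarities off the diagonal,
    each vector's norm (its 'feature vector value') on the diagonal."""
    norms = np.linalg.norm(reduced, axis=1)
    safe = np.where(norms == 0, 1.0, norms)   # avoid division by zero
    unit = reduced / safe[:, None]
    matrix = unit @ unit.T                     # cosine similarity coefficients
    np.fill_diagonal(matrix, norms)
    return matrix


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Z = pca_reduce(X, n_components=2)
M = feature_matrix(Z)
print(Z.shape, M.shape)  # (6, 2) (6, 6)
```

The result is square and symmetric, in line with the description of the feature matrix as an aggregate square matrix.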
Further, as an optional embodiment of the present invention, calculating the vector similarity coefficients between the feature vectors includes:
calculating the vector similarity coefficients between the feature vectors by the following formula:
where D denotes the vector similarity coefficient between the feature vectors, b denotes the sequence number of a feature vector, B denotes the number of feature vectors, A_b denotes the vector length of the b-th feature vector, Ā denotes the average length of all the feature vectors, and A_{b+1} denotes the vector length of the (b+1)-th feature vector.
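The formula for D does not survive in this text. The sketch below is a hypothetical stand-in, assuming D compares consecutive vector lengths A_b and A_{b+1} about their mean length in the manner of a lag-one autocorrelation; the function name `length_similarity` is illustrative, and the code should not be read as the patented formula.

```python
import numpy as np


def length_similarity(vectors: np.ndarray) -> float:
    """Hypothetical similarity coefficient D over consecutive feature vector lengths.

    ASSUMPTION: lag-one autocorrelation of the lengths A_b about their mean;
    this is not the patent's formula, which is not reproduced in the text.
    """
    lengths = np.linalg.norm(vectors, axis=1)   # A_b for b = 1..B
    dev = lengths - lengths.mean()              # A_b minus the mean length
    denom = float(np.sum(dev ** 2))
    if denom == 0.0:
        return 1.0  # all lengths equal: treated as fully similar (a design choice)
    # Sum over consecutive pairs (A_b, A_{b+1}), normalized by the total spread.
    return float(np.sum(dev[:-1] * dev[1:]) / denom)
```

For lengths 1, 2, 3, 4 this gives 0.25, while alternating lengths 1, 3, 1, 3 give a negative value, so the sign indicates whether neighbouring vectors tend to have similar or contrasting lengths.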
By calculating the matrix variance value of each matrix in the feature matrix, the method reveals how far each matrix deviates, which makes it easier to identify the abnormal matrices; the matrix variance value expresses the degree of deviation between the matrices in the feature matrix.
As one embodiment of the present invention, calculating the matrix variance value corresponding to each matrix in the feature matrix includes:
calculating the matrix variance value corresponding to each matrix in the feature matrix by the following formula:
where E denotes the matrix variance value corresponding to each matrix in the feature matrix, a denotes the serial number of a feature matrix, y denotes the total number of feature matrices, G_a denotes the matrix expected value of the a-th feature matrix, G_a denotes the matrix value corresponding to the a-th feature matrix, and Ḡ denotes the average value of the feature matrices.
Determining the abnormal matrices from the matrix variance values identifies the matrices that deviate strongly within the feature matrix, which improves the accuracy of the subsequent determination of noise data. An abnormal matrix is a matrix with a high degree of deviation in the feature matrix. Optionally, each matrix variance value is compared with a preset variance value, and if the matrix variance value is greater than the preset variance value, the corresponding feature matrix is taken as an abnormal matrix. The preset variance value is the decision threshold for the matrix variance value; it may be 0.8 and can be set according to the actual service scenario.
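The threshold comparison can be sketched as below. Because the variance formula above is not reproduced in this text, the sketch assumes, hypothetically, that a matrix's variance value is the population variance of its entries; the 0.8 default comes from the text, and the function name `find_abnormal_matrices` is illustrative.

```python
import numpy as np

PRESET_VARIANCE = 0.8  # example threshold from the text; tune per service scenario


def find_abnormal_matrices(matrices, threshold: float = PRESET_VARIANCE) -> list:
    """Return indices of matrices whose variance value exceeds the threshold.

    ASSUMPTION: the variance value of a matrix is the population variance of
    its entries; the patent's own formula is not reproduced here.
    """
    abnormal = []
    for idx, m in enumerate(matrices):
        variance = float(np.var(m))  # mean squared deviation from the matrix mean
        if variance > threshold:     # strictly greater, as described in the text
            abnormal.append(idx)
    return abnormal
```

For example, a 2 x 2 matrix with entries 0, 2, 0, 2 has variance 1.0 and is flagged, while constant matrices have variance 0 and are not.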
Step S3: performing linear conversion on the standardized big data to obtain big data linear values, constructing a data scatter diagram corresponding to the standardized big data from the big data linear values, calculating the distances between the data points in the data scatter diagram to obtain a total data point distance value, and determining the discrete data points in the data scatter diagram according to the total data point distance value;
By linearly converting the standardized big data, the invention expresses it in numerical form, so that it can be displayed visually and understood more intuitively. The big data linear value is the numerical value corresponding to the standardized big data. Optionally, the linear conversion of the standardized big data may be implemented with a linear function.
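One possible linear function is min-max scaling, y = k * x + c, chosen so that the values land in a fixed interval. The sketch below assumes that choice and a numeric input; the function name `linear_convert` and the default interval [0, 1] are illustrative rather than taken from the patent.

```python
import numpy as np


def linear_convert(values, lo: float = 0.0, hi: float = 1.0) -> np.ndarray:
    """Map values onto [lo, hi] with the linear function y = k * x + c."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, lo)  # constant input: map everything to the lower bound
    k = (hi - lo) / span            # slope
    c = lo - k * x.min()            # intercept, so that min(x) maps to lo
    return k * x + c
```

Any slope and intercept would satisfy "linear conversion"; min-max scaling is used here only because it puts all linear values on a common scale before plotting.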
Constructing the data scatter diagram from the big data linear values shows how the standardized big data is distributed, which facilitates the subsequent determination of discrete data points; the data scatter diagram is a visual plot of the standardized big data.
As one embodiment of the present invention, constructing the data scatter diagram corresponding to the standardized big data from the big data linear values includes: obtaining the data sequence of each datum in the standardized big data; extracting the variable data of each datum, namely independent variable data and dependent variable data; analyzing the variable relationship between the independent variable data and the dependent variable data; calculating the variable values of the independent variable data and the dependent variable data from the variable relationship and the big data linear values to obtain a first variable value and a second variable value; and constructing the data scatter diagram corresponding to the standardized big data from the first variable value, the second variable value and the data sequence.
The data sequence is the sequence number of each datum in the standardized big data; the independent variable data and the dependent variable data are the independent and dependent variables in the standardized big data, the dependent variable depending on the independent variable; and the first and second variable values are the linear values of the independent variable data and the dependent variable data, respectively.
Optionally, the data sequence of each datum may be obtained with an SQL query statement, the variable data may be extracted by stepwise regression, and the variable relationship between the independent and dependent variable data may be analyzed by regression analysis to determine a variable relationship coefficient. The variable values are then calculated from the big data linear values and the variable relationship coefficient, and the data scatter diagram corresponding to the standardized big data may be drawn with the EdrawMax tool.
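A minimal sketch of the regression and point-assembly steps, assuming a single independent variable and a least-squares straight line as the variable relationship. The function name `build_scatter` is hypothetical, the observed linear values are kept as the scatter coordinates, and the drawing itself (EdrawMax or any charting tool) is left out.

```python
import numpy as np


def build_scatter(x_linear, y_linear):
    """Return (points, slope, intercept) for a scatter diagram.

    ASSUMPTION: the variable relationship is a least-squares line y = slope * x + intercept.
    Each point is (data sequence number, first variable value, second variable value).
    """
    x = np.asarray(x_linear, dtype=float)
    y = np.asarray(y_linear, dtype=float)
    # Variable relationship coefficients; np.polyfit returns the highest power first.
    slope, intercept = np.polyfit(x, y, deg=1)
    sequence = np.arange(1, len(x) + 1)          # 1-based data sequence numbers
    points = np.column_stack([sequence, x, y])   # one row per scatter point
    return points, float(slope), float(intercept)
```

Returning the fitted coefficients alongside the points lets later steps measure how far each point strays from the fitted relationship.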
Calculating the distances between the data points in the data scatter diagram to obtain the total data point distance value shows how far apart the points are, and therefore how dispersed they are; the total data point distance value expresses the distances between the data points in the data scatter diagram.
As one embodiment of the present invention, calculating the distances between the data points in the data scatter diagram to obtain the total data point distance value includes:
calculating the total distance value between the data points in the data scatter diagram by the following formula:
where H denotes the total distance value between the data points in the data scatter diagram, i denotes the serial number of a data point, q denotes the number of data points, K_{i-1} denotes the coordinate value of the (i-1)-th data point, K_i denotes the coordinate value of the i-th data point, and K_q denotes the coordinate value of the q-th data point.
Determining the discrete data points from the total data point distance values identifies the points that lie far from the others in the data scatter diagram, and thus the discrete data in the standardized big data; a discrete data point is a point at a large distance from the other points. Optionally, each total data point distance value is compared with a preset distance value, and if it is greater than the preset distance value, the corresponding data point is taken as a discrete data point in the data scatter diagram.
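The threshold rule can be sketched with per-point distance totals. The distance formula itself is not reproduced in this text, so the sketch assumes, hypothetically, that each point's total is the sum of its Euclidean distances to every other point; the function name `discrete_points` and the example threshold are illustrative.

```python
import numpy as np


def discrete_points(coords, preset_distance: float) -> np.ndarray:
    """Indices of points whose summed distance to all other points exceeds preset_distance.

    ASSUMPTION: per-point total = sum of Euclidean distances to every other point;
    the patent's formula (stated over K_{i-1}, K_i and K_q) is not reproduced here.
    """
    pts = np.asarray(coords, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]   # pairwise coordinate differences
    dist = np.sqrt((diffs ** 2).sum(axis=-1))   # q x q Euclidean distance matrix
    totals = dist.sum(axis=1)                   # distance total per data point
    return np.flatnonzero(totals > preset_distance)
```

With three clustered points near the origin and one point at (10, 10), only the isolated point has a total above 20, so it alone is reported as discrete.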
Step S4: determining the noise data in the standardized big data by combining the abnormal matrices and the discrete data points, extracting the data signal corresponding to the noise data, calculating the signal sparse value and the signal-to-noise ratio of the data signal, querying the data noise points in the noise data according to the signal-to-noise ratio and the signal sparse value, extracting the noise characteristics of the data noise points, constructing a noise optimization scheme for the data noise points according to the noise characteristics, and executing noise optimization processing of the data noise points according to the noise optimization scheme to obtain a noise optimization processing result;
Determining the noise data in the standardized big data by combining the abnormal matrices and the discrete data points allows the noise data to be extracted completely, which facilitates the subsequent noise point analysis and improves the accuracy of noise point mining; noise data are data in the standardized big data that have been affected by interference.
Extracting the data signal corresponding to the noise data gives the noise data an electrical signal representation, which facilitates the subsequent calculation of the signal sparse value; the data signal is the information-carrying electrical signal within the noise data. Optionally, the data signal may be extracted with a signal collector.
Calculating the signal sparse value of the data signal indicates how sparse the signal is, which helps to identify the noise component of the data signal; the signal sparse value expresses the degree of sparseness of the data signal.
As one embodiment of the present invention, calculating the signal sparse value of the data signal includes: identifying the data time domain signal and the data frequency domain signal in the data signal; performing a Fourier transform on the data time domain signal to obtain a transformed data signal; reconstructing the data signal from the transformed data signal and the data frequency domain signal to obtain a target data signal; and calculating the signal information entropy of the target data signal and taking it as the signal sparse value of the data signal.
The data time domain signal is the part of the data signal that varies over time; the data frequency domain signal is the representation of the data signal in the frequency domain; the transformed data signal is the frequency domain signal obtained by Fourier transforming the data time domain signal; and the target data signal is the signal reconstructed from the transformed data signal and the data frequency domain signal. The signal information entropy is computed from the probability of occurrence of the signal in each frequency band of the target data signal, so it reflects how spread out the signal is and therefore indicates the sparseness of the data signal.
Optionally, the data time domain signal and the data frequency domain signal may be identified with the MATLAB tool, the Fourier transform of the data time domain signal may be performed with a Fourier transform algorithm such as the fast Fourier transform, and the reconstruction of the data signal may be performed with a signal reconstruction algorithm.
As an optional embodiment of the present invention, calculating the signal information entropy corresponding to the target data signal includes:
calculating the signal information entropy corresponding to the target data signal by the following formula:
where P denotes the signal information entropy of the target data signal, j denotes the sequence number of a time period of the target data signal, t denotes the number of packets in the target data signal, N_j denotes the data signal value of the j-th time period in the target data signal, and M(N_j) denotes the probability of occurrence of the data signal value of the j-th time period.
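The entropy formula itself is not reproduced in this text. The sketch below assumes the standard Shannon form P = -Σ M(N_j) log2 M(N_j) and, as a swapped-in simplification, takes M(N_j) as the share of total power in FFT bin j (a spectral entropy) rather than a per-time-period probability; the function name `spectral_entropy` is illustrative.

```python
import numpy as np


def spectral_entropy(signal) -> float:
    """Shannon entropy (bits) of the normalized power spectrum of `signal`.

    ASSUMPTION: M(N_j) is taken as the share of total power in FFT bin j;
    low values indicate a sparse, concentrated spectrum.
    """
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2   # power per frequency bin
    total = power.sum()
    if total == 0:
        return 0.0                        # all-zero signal: entropy defined as 0
    p = power / total                     # probabilities M(N_j)
    p = p[p > 0]                          # skip empty bins (0 * log 0 is taken as 0)
    return float(-np.sum(p * np.log2(p)))
```

A pure tone concentrates its power in one bin and scores close to 0 bits, whereas white noise spreads power over all bins and scores near the maximum of log2 of the bin count, so lower values correspond to sparser signals.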
Calculating the signal-to-noise ratio of the data signal shows the ratio of the strength of the useful signal to that of the interference signal, which facilitates the subsequent query of noise points in the noise data. The signal-to-noise ratio is the ratio of the energy of the useful signal to the energy of the interference signal in the data signal, and it is obtained by computing that energy ratio.
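The energy ratio can be sketched as follows, assuming the useful and interference components have already been separated into two sample arrays. Expressing the ratio in decibels (10 * log10 of the energy ratio) is a common convention assumed here, and the function name `snr_db` is illustrative.

```python
import numpy as np


def snr_db(useful, interference) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(E_useful / E_interference)."""
    e_signal = float(np.sum(np.square(np.asarray(useful, dtype=float))))
    e_noise = float(np.sum(np.square(np.asarray(interference, dtype=float))))
    if e_noise == 0:
        return float("inf")  # no interference energy: ratio is unbounded
    return float(10.0 * np.log10(e_signal / e_noise))
```

For example, a useful-signal energy 100 times the interference energy gives 20 dB; the plain ratio e_signal / e_noise can be used instead if a linear scale is preferred.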
Querying the data noise points in the noise data according to the signal-to-noise ratio and the signal sparse value identifies the specific noise points behind the noise data, which makes it easier to formulate a corresponding noise optimization scheme. A data noise point is the specific cause of the noise data, such as sound interference or an equipment fault. Optionally, the data noise points may be looked up in a preset noise point table according to the signal-to-noise ratio and the signal sparse value; the preset noise point table is built by analyzing a large number of historical noise points and their mapping to signal-to-noise ratios and signal sparse values.
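A table lookup of this kind might look like the sketch below. The table entries, the nearest-neighbour matching rule and the names `NOISE_POINT_TABLE` and `query_noise_point` are all hypothetical; a real table would be built from the analysis of historical noise points as described above.

```python
import math

# HYPOTHETICAL preset noise point table: (snr_db, sparse_value) -> noise point.
# Real entries would come from analyzing historical noise points.
NOISE_POINT_TABLE = [
    ((5.0, 7.5), "sound interference"),
    ((15.0, 2.0), "equipment fault"),
]


def query_noise_point(snr: float, sparse: float, table=NOISE_POINT_TABLE) -> str:
    """Return the noise point whose (SNR, sparse value) entry is nearest to the query."""
    # Nearest-neighbour match in the (SNR, sparse value) plane (an assumed rule).
    best = min(table, key=lambda entry: math.dist(entry[0], (snr, sparse)))
    return best[1]
```

Because the signal-to-noise ratio and the sparse value are on different scales, a practical version would normalize both before measuring distance, or use explicit value ranges per table row instead of nearest-neighbour matching.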
Extracting the noise characteristics of the data noise points and constructing a corresponding noise optimization scheme from them allows the data noise points to be removed and improves the quality of the standardized big data; the noise optimization scheme is a method for removing the data noise points. Optionally, the noise characteristics of the data noise points may be extracted with a power spectrum method.
As an embodiment of the present invention, constructing the noise optimization scheme for the data noise points according to the noise characteristics includes: querying the characteristic indexes of the noise characteristics; extracting the index parameters of the characteristic indexes; calculating the parameter differences between the index parameters and standard index parameters; determining the indexes to be optimized among the characteristic indexes according to the parameter differences; querying the optimization rule for each index to be optimized; and constructing the noise optimization scheme for the data noise points according to the optimization rules.
The characteristic indexes are the feature items of a noise characteristic; the index parameters are the values of the characteristic indexes; the parameter difference is the difference or gap between an index parameter and the corresponding standard index parameter; the indexes to be optimized are the characteristic indexes that require optimization; and an optimization rule is the optimization method for one index to be optimized. Optionally, the characteristic indexes of the noise characteristics may be queried with a Match function, the index parameters may be extracted with a LEFT function, the optimization rule for each index to be optimized may be looked up on the Internet through human-computer interaction, and the noise optimization scheme for the data noise points is obtained by combining the optimization rules.
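The comparison of index parameters against standard values can be sketched as below. The index names, standard values, tolerance and rule texts are all hypothetical placeholders, and a single absolute tolerance is assumed for every index; the function name `build_scheme` is illustrative.

```python
# HYPOTHETICAL index names, standard values and rules, for illustration only.
STANDARD_PARAMS = {"peak_frequency_hz": 50.0, "band_power_db": -40.0}
OPTIMIZATION_RULES = {
    "peak_frequency_hz": "apply a notch filter at the peak frequency",
    "band_power_db": "apply band-stop filtering to the affected band",
}


def build_scheme(index_params: dict, standard=STANDARD_PARAMS,
                 rules=OPTIMIZATION_RULES, tolerance: float = 1.0) -> list:
    """Return the optimization rules for indexes whose parameter difference exceeds tolerance."""
    scheme = []
    for name, value in index_params.items():
        if name not in standard:
            continue  # no standard index parameter to compare against
        difference = abs(value - standard[name])  # parameter difference
        if difference > tolerance:                # index to be optimized
            scheme.append(rules.get(name, "no rule on record"))
    return scheme
```

In practice each index would likely need its own tolerance, since a 1 Hz deviation and a 1 dB deviation are not comparable.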
The noise optimization processing of the data noise points is executed according to the noise optimization scheme to improve the quality of the standardized big data; the noise optimization processing result is the result obtained after the data noise points have been optimized.
Fig. 2 is a functional block diagram of an acquisition noise point mining system based on big data analysis according to an embodiment of the present invention.
The system 100 for mining collected noise points based on big data analysis can be installed in an electronic device. Depending on the functions implemented, the system 100 may include a data processing module 101, a matrix variance calculation module 102, a discrete point determination module 103 and a noise point optimization module 104. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
The data processing module 101 is configured to obtain raw big data to be mined and a data field, perform data cleaning on the raw big data to obtain cleaned big data, and perform data standardization processing on the cleaned big data according to the data field to obtain standardized big data;
The matrix variance calculating module 102 is configured to perform feature extraction on the standardized big data to obtain big data features, construct feature matrices corresponding to the big data features, calculate matrix variance values corresponding to each matrix in the feature matrices, and determine abnormal matrices in the feature matrices according to the matrix variance values, where the calculating the matrix variance values corresponding to each matrix in the feature matrices includes:
Calculating a matrix variance value corresponding to each matrix in the feature matrix through the following formula:
where E denotes the matrix variance value corresponding to each matrix in the feature matrix, a denotes the serial number of a feature matrix, y denotes the total number of feature matrices, G_a denotes the matrix expected value of the a-th feature matrix, G_a denotes the matrix value corresponding to the a-th feature matrix, and Ḡ denotes the average value of the feature matrices;
The discrete point determining module 103 is configured to perform linear conversion on the standardized big data to obtain a big data linear value, construct a data scatter diagram corresponding to the standardized big data according to the big data linear value, calculate a distance value between each data point in the data scatter diagram to obtain a total data point distance value, and determine a discrete data point in the data scatter diagram according to the total data point distance value;
The noise point optimization module 104 is configured to determine the noise data in the standardized big data by combining the abnormal matrices and the discrete data points, extract the data signal corresponding to the noise data, calculate the signal sparse value and the signal-to-noise ratio of the data signal, query the data noise points in the noise data according to the signal-to-noise ratio and the signal sparse value, extract the noise characteristics of the data noise points, construct a noise optimization scheme for the data noise points according to the noise characteristics, and execute noise optimization processing of the data noise points according to the noise optimization scheme to obtain a noise optimization processing result.
In detail, each module of the big data analysis-based acquisition noise point mining system 100 in this embodiment of the present application uses the same technical means as the big data analysis-based acquisition noise point mining method described with reference to Fig. 1 and produces the same technical effects, which are not repeated here.
It should be understood that the described embodiments are for illustration only, and the scope of the patent application is not limited to this configuration.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A method for mining collected noise points based on big data analysis, characterized by comprising the following steps:
Step S1: acquiring the original big data to be mined and a data field, performing data cleaning on the original big data to obtain cleaned big data, and performing data standardization processing on the cleaned big data according to the data field to obtain standardized big data;
Step S2: performing feature extraction on the standardized big data to obtain big data features, constructing feature matrixes corresponding to the big data features, calculating matrix variance values corresponding to each matrix in the feature matrixes, and determining abnormal matrixes in the feature matrixes according to the matrix variance values, wherein the calculating the matrix variance values corresponding to each matrix in the feature matrixes comprises the following steps:
Calculating a matrix variance value corresponding to each matrix in the feature matrix through the following formula:
Wherein E represents the matrix variance value corresponding to each matrix in the feature matrix, a represents the matrix serial number of the feature matrix, y represents the total number of feature matrices, G_a represents the matrix expected value of the a-th feature matrix, G_a represents the matrix value corresponding to the a-th feature matrix, and Ḡ represents the average value of the feature matrices;
The constructing the feature matrix corresponding to the big data feature comprises the following steps:
performing dimension reduction processing on the big data features to obtain dimension reduction features, and performing vector conversion on the dimension reduction features to obtain feature vectors;
Calculating a feature vector value corresponding to the feature vector, and taking the feature vector value as a feature value corresponding to the big data feature;
Calculating vector similarity coefficients among the feature vectors, and constructing a feature matrix corresponding to the big data features according to the vector similarity coefficients and the feature values;
Wherein said calculating vector similarity coefficients between said feature vectors comprises:
vector similarity coefficients between the feature vectors are calculated by the following formula:
Wherein D represents the vector similarity coefficient between the feature vectors, b represents the sequence number of a feature vector, B represents the number of feature vectors, A_b represents the vector length corresponding to the b-th feature vector, Ā represents the average length of all the feature vectors, and A_{b+1} represents the vector length corresponding to the (b+1)-th feature vector;
Step S3: performing linear conversion on the standardized big data to obtain a big data linear value, constructing a data scatter diagram corresponding to the standardized big data according to the big data linear value, calculating a distance value between each data point in the data scatter diagram to obtain a total data point distance value, and determining discrete data points in the data scatter diagram according to the total data point distance value;
The calculating the distance value between each data point in the data scatter diagram to obtain the total distance value of the data points comprises the following steps:
calculating a total value of distances between each data point in the data scatter plot by the following formula:
Wherein H represents the total distance value between the data points in the data scatter plot, i represents the sequence number of a data point in the data scatter plot, q represents the number of data points in the data scatter plot, K_{i-1} represents the coordinate value of the (i-1)-th data point in the data scatter plot, K_i represents the coordinate value of the i-th data point in the data scatter plot, and K_q represents the coordinate value of the q-th data point in the data scatter plot;
Step S4: determining noise data in the standardized big data by combining the abnormal matrix and the discrete data points, extracting a data signal corresponding to the noise data, calculating a signal sparse value corresponding to the data signal, calculating a signal to noise ratio corresponding to the data signal, inquiring a data noise point in the noise data according to the signal to noise ratio and the signal sparse value, extracting a noise characteristic corresponding to the data noise point, constructing a noise optimization scheme corresponding to the data noise point according to the noise characteristic, executing noise optimization processing of the data noise point according to the noise optimization scheme, and obtaining a noise optimization processing result;
The calculating the signal sparse value corresponding to the data signal includes:
identifying a data time domain signal and a data frequency domain signal in the data signals, and carrying out Fourier transform on the data time domain signals to obtain transformed data signals;
According to the transformed data signal and the data frequency domain signal, carrying out signal reconstruction on the data signal to obtain a target data signal;
Calculating signal information entropy corresponding to the target data signal, and taking the signal information entropy as a signal sparse value corresponding to the data signal;
The calculating the signal information entropy corresponding to the target data signal includes:
Calculating signal information entropy corresponding to the target data signal according to the following formula:
Wherein P represents the signal information entropy corresponding to the target data signal, j represents the sequence number of the time period corresponding to the target data signal, t represents the number of packets in the target data signal, N_j represents the data signal value of the j-th time period in the target data signal, and M(N_j) represents the probability of occurrence of the data signal value of the j-th time period in the target data signal.
2. The method for mining collected noise points based on big data analysis according to claim 1, wherein performing data standardization processing on the cleaned big data according to the data field to obtain standardized big data comprises:
retrieving historical big data in the data field, analyzing the data architecture of each datum in the historical big data, and determining the data format of each datum in the historical big data according to the data architecture;
counting the frequency of each format among the data formats, and determining the standard format of the historical big data according to the format frequencies;
querying the format source code corresponding to the standard format, and formulating a format converter for the cleaned big data according to the format source code;
and performing format standardization processing on the cleaned big data with the format converter to obtain standardized big data.
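The format-frequency steps of claim 2 can be illustrated with dates as the data type. This is a hypothetical sketch: it assumes the data formats are date formats drawn from a fixed candidate list, uses `datetime.strptime`/`strftime` format strings in place of the claim's "format source code", and the names `CANDIDATE_FORMATS`, `detect_format` and `standardize` are illustrative.

```python
from collections import Counter
from datetime import datetime

# HYPOTHETICAL candidate formats; the claim derives formats from the historical data.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%Y%m%d"]


def detect_format(value: str):
    """Return the first candidate format that parses `value`, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def standardize(history, cleaned):
    """Convert cleaned values to the most frequent format seen in the historical data."""
    # Format frequency over the historical data; the most frequent is the standard format.
    counts = Counter(f for f in map(detect_format, history) if f is not None)
    if not counts:
        raise ValueError("no recognizable format in historical data")
    standard_fmt = counts.most_common(1)[0][0]
    out = []
    for value in cleaned:
        fmt = detect_format(value)
        # Format converter: re-render recognized values; leave unrecognized ones unchanged.
        out.append(datetime.strptime(value, fmt).strftime(standard_fmt) if fmt else value)
    return out
```

Leaving unrecognized values untouched, rather than dropping them, keeps them visible to the later noise-detection steps.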
3. The method for mining collected noise points based on big data analysis according to claim 1, wherein the constructing the data scatter diagram corresponding to the standardized big data according to the big data linear value includes:
acquiring the data sequence of each datum in the standardized big data, and extracting the variable data of each datum in the standardized big data, wherein the variable data comprise independent variable data and dependent variable data;
analyzing the variable relationship between the independent variable data and the dependent variable data, and calculating the variable values corresponding to the independent variable data and the dependent variable data according to the variable relationship and the big data linear values to obtain a first variable value and a second variable value;
and constructing a data scatter diagram corresponding to the standardized big data according to the first variable value, the second variable value and the data sequence.
4. The method for mining collected noise points based on big data analysis according to claim 1, wherein the constructing a noise optimization scheme corresponding to the data noise points according to the noise characteristics comprises:
Querying the characteristic indexes corresponding to the noise characteristics, and extracting the index parameters corresponding to the characteristic indexes;
Calculating the parameter differences between the index parameters and the standard index parameters, and determining the indexes to be optimized among the characteristic indexes according to the parameter differences;
Querying the optimization rule of each index to be optimized, and constructing the noise optimization scheme corresponding to the data noise points according to the optimization rules.
5. A big data analysis based acquisition noise point mining system for performing the big data analysis based acquisition noise point mining method of any of claims 1-4, the system comprising:
The data processing module is used for acquiring the original big data to be mined and the data field, performing data cleaning on the original big data to obtain cleaned big data, and performing data standardization processing on the cleaned big data according to the data field to obtain standardized big data;
the matrix variance calculating module is configured to perform feature extraction on the standardized big data to obtain big data features, construct feature matrices corresponding to the big data features, calculate matrix variance values corresponding to each matrix in the feature matrices, and determine abnormal matrices in the feature matrices according to the matrix variance values, where the calculating the matrix variance values corresponding to each matrix in the feature matrices includes:
Calculating a matrix variance value corresponding to each matrix in the feature matrix through the following formula:
Wherein E represents the matrix variance value corresponding to each matrix in the feature matrix, a represents the matrix serial number of the feature matrix, y represents the total number of feature matrices, G_a represents the matrix expected value of the a-th feature matrix, G_a represents the matrix value corresponding to the a-th feature matrix, and Ḡ represents the average value of the feature matrices;
The constructing the feature matrix corresponding to the big data feature comprises the following steps:
performing dimension reduction processing on the big data features to obtain dimension reduction features, and performing vector conversion on the dimension reduction features to obtain feature vectors;
Calculating a feature vector value corresponding to the feature vector, and taking the feature vector value as a feature value corresponding to the big data feature;
Calculating vector similarity coefficients among the feature vectors, and constructing a feature matrix corresponding to the big data features according to the vector similarity coefficients and the feature values;
wherein calculating the vector similarity coefficients between the feature vectors comprises:
calculating the vector similarity coefficient between the feature vectors by the following formula:
where the symbols denote, in order: the vector similarity coefficient between the feature vectors; b, the index of a feature vector; the number of feature vectors; the length of the b-th feature vector; the mean length of all the feature vectors; and the length of the (b+1)-th feature vector;
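The defined symbols (the lengths of the b-th and (b+1)-th vectors and the mean length) suggest, but do not confirm, a lag-1 autocorrelation of the vector lengths: the sum over consecutive pairs of (L_b - mean)(L_{b+1} - mean), divided by the sum of squared deviations (L_b - mean)^2 over all vectors. Assuming that reading, a sketch is:

```python
import math


def lag1_length_similarity(vectors):
    """Assumed coefficient: lag-1 autocorrelation of the feature-vector lengths."""
    lengths = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(lengths)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    mean = sum(lengths) / n
    denom = sum((length - mean) ** 2 for length in lengths)
    if denom == 0:
        return 0.0  # all lengths equal: coefficient undefined, this sketch reports 0.0
    num = sum((lengths[b] - mean) * (lengths[b + 1] - mean) for b in range(n - 1))
    return num / denom
```

Under this reading the result lies roughly in [-1, 1]: positive when neighbouring vectors have similar lengths, negative when lengths alternate.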
The discrete point determining module is configured to apply a linear transformation to the standardized big data to obtain big data linear values, construct a data scatter plot of the standardized big data from the big data linear values, calculate the distances between the data points in the data scatter plot to obtain a total data point distance value, and determine discrete data points in the data scatter plot according to the total data point distance value;
calculating the distances between the data points in the data scatter plot to obtain the total data point distance value comprises:
calculating the total distance value between the data points in the data scatter plot by the following formula:
where the symbols denote, in order: the total distance value between the data points in the data scatter plot; i, the index of a data point in the data scatter plot; the number of data points in the data scatter plot; the coordinates of the (i-1)-th data point; the coordinates of the i-th data point; and the coordinates of the q-th data point;
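The distance formula is likewise missing here. Because the symbols mention consecutive points (i-1 and i) and a q-th point, this sketch provides both a consecutive path length and, for every point q, its summed Euclidean distance to all other points; a point is treated as discrete when its total distance has a z-score above a hypothetical threshold `z`. It assumes the scatter-plot points have already been built from the linear values as (x, y) pairs.

```python
import math


def path_length(points):
    """Sum of distances between consecutive points (i-1, i)."""
    return sum(math.dist(points[i - 1], points[i]) for i in range(1, len(points)))


def total_distances(points):
    """For each point q, the summed Euclidean distance to every other point."""
    return [sum(math.dist(p, q) for p in points) for q in points]


def discrete_points(points, z=1.5):
    """Indices whose total distance lies more than z population std devs above the mean."""
    totals = total_distances(points)
    n = len(totals)
    mean = sum(totals) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in totals) / n)
    if std == 0:
        return []  # all points equally spread: nothing stands out
    return [q for q, t in enumerate(totals) if (t - mean) / std > z]
```

With few points the largest attainable z-score is small (it cannot exceed the square root of n - 1 for n points), so a fixed threshold such as 2 may never trigger on tiny samples.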
The noise point optimization module is configured to combine the abnormal matrices and the discrete data points to determine noise data in the standardized big data; extract the data signal corresponding to the noise data; calculate the signal sparse value and the signal-to-noise ratio of the data signal; identify data noise points in the noise data according to the signal-to-noise ratio and the signal sparse value; extract the noise features of the data noise points; construct a noise optimization scheme for the data noise points according to the noise features; and perform noise optimization processing on the data noise points according to the noise optimization scheme to obtain a noise optimization processing result;
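The claim gives no SNR formula. One common definition, used here as an assumption, is 10 * log10 of the power of a clean reference signal over the power of the residual (observed minus clean). The query rule, which marks a point as a data noise point when its SNR is below one threshold and its sparse value is above another, is likewise hypothetical, as are both default thresholds.

```python
import math


def snr_db(clean, observed):
    """SNR in decibels: 10*log10(signal power / noise power), with noise = observed - clean."""
    if len(clean) != len(observed):
        raise ValueError("signals must have the same length")
    signal_power = sum(c * c for c in clean)
    noise_power = sum((o - c) ** 2 for o, c in zip(observed, clean))
    if noise_power == 0:
        return math.inf  # no residual at all
    if signal_power == 0:
        return -math.inf  # residual only
    return 10 * math.log10(signal_power / noise_power)


def query_noise_points(snrs, sparse_values, snr_max_db=10.0, sparse_min=1.0):
    """Hypothetical rule: low SNR AND high sparse value (entropy) marks a data noise point."""
    return [
        i
        for i, (s, h) in enumerate(zip(snrs, sparse_values))
        if s < snr_max_db and h > sparse_min
    ]
```

In practice the "clean" reference would have to be estimated, for example by smoothing the observed signal; that estimation step is outside this sketch.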
Calculating the signal sparse value of the data signal comprises:
identifying a data time-domain signal and a data frequency-domain signal in the data signal, and performing a Fourier transform on the data time-domain signal to obtain a transformed data signal;
reconstructing the data signal from the transformed data signal and the data frequency-domain signal to obtain a target data signal;
calculating the signal information entropy of the target data signal, and taking the signal information entropy as the signal sparse value of the data signal;
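The claim does not specify how the transformed signal and the frequency-domain signal are combined. This sketch assumes a plain discrete Fourier transform (written out with `cmath` so that no third-party package is needed; `numpy.fft` would be the usual choice for real data), a hypothetical weighted average of the two spectra controlled by `weight`, and an inverse transform back to the time domain to give the target data signal.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def reconstruct(time_signal, freq_signal, weight=0.5):
    """Blend the spectrum of the time-domain part with the given frequency-domain part, then invert."""
    transformed = dft(time_signal)
    if len(freq_signal) != len(transformed):
        raise ValueError("spectra must have the same length")
    merged = [weight * a + (1 - weight) * b for a, b in zip(transformed, freq_signal)]
    return [v.real for v in idft(merged)]  # keep the real part as the target signal
```

If the frequency-domain input is itself the spectrum of the same time-domain signal, the blend is a no-op and the original samples come back, which is a convenient sanity check.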
Calculating the signal information entropy of the target data signal comprises:
calculating the signal information entropy of the target data signal by the following formula:
where the symbols denote, in order: the signal information entropy of the target data signal; j, the index of a time period in the target data signal; t, the number of packets in the target data signal; the data signal value of the j-th time period in the target data signal; and the probability of occurrence of the data signal value of the j-th time period in the target data signal.
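With the formula missing, the symbol list (a value per time period and its probability of occurrence) matches the standard Shannon entropy, H = -sum over j of p(x_j) * log2 p(x_j). Assuming that form, a base-2 logarithm, probabilities estimated as relative frequencies, and optional equal-width binning (the `bins` parameter is hypothetical), the signal sparse value could be computed as:

```python
import math
from collections import Counter


def signal_entropy(values, bins=None):
    """Shannon entropy in bits of the empirical distribution of signal values."""
    if not values:
        return 0.0
    if bins is not None:
        # Equal-width binning so that continuous values can be counted.
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0
        values = [min(int((v - lo) / width), bins - 1) for v in values]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Low entropy means the values concentrate on a few levels (a sparser signal); high entropy means they are spread evenly.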
CN202310717597.0A 2023-06-15 2023-06-15 Method and system for mining collected noise points based on big data analysis Active CN116955444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310717597.0A CN116955444B (en) 2023-06-15 2023-06-15 Method and system for mining collected noise points based on big data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310717597.0A CN116955444B (en) 2023-06-15 2023-06-15 Method and system for mining collected noise points based on big data analysis

Publications (2)

Publication Number Publication Date
CN116955444A (en) 2023-10-27
CN116955444B (en) 2024-08-23

Family

ID=88448308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310717597.0A Active CN116955444B (en) 2023-06-15 2023-06-15 Method and system for mining collected noise points based on big data analysis

Country Status (1)

Country Link
CN (1) CN116955444B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110098882A (en) * 2019-05-14 2019-08-06 大连大学 Multiple antennas broadband frequency spectrum detection method based on compressed sensing and entropy
CN111242873A (en) * 2020-01-21 2020-06-05 北京工业大学 Image denoising method based on sparse representation
CN113256805A (en) * 2021-04-27 2021-08-13 浙江省交通运输科学研究院 Rapid pavement linear crack information calculation method based on three-dimensional point cloud reconstruction
CN115329895A (en) * 2022-09-06 2022-11-11 南昌大学 Multi-source heterogeneous data noise reduction analysis and processing method

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
US7490071B2 (en) * 2003-08-29 2009-02-10 Oracle Corporation Support vector machines processing system
WO2007127182A2 (en) * 2006-04-25 2007-11-08 Incel Vision Inc. Noise reduction system and method
US8498982B1 (en) * 2010-07-07 2013-07-30 Openlogic, Inc. Noise reduction for content matching analysis results for protectable content
CN103678500A (en) * 2013-11-18 2014-03-26 南京邮电大学 Data mining improved type K mean value clustering method based on linear discriminant analysis
CN108133232B (en) * 2017-12-15 2021-09-17 南京航空航天大学 Radar high-resolution range profile target identification method based on statistical dictionary learning
CN109189776A (en) * 2018-10-24 2019-01-11 广东电网有限责任公司 A kind of Method of Data with Adding Windows
CN110146876A (en) * 2019-05-31 2019-08-20 湖南省顺鸿智能科技有限公司 The method for carrying out human body target positioning based on comentropy
CN111651440A (en) * 2020-04-30 2020-09-11 深圳壹账通智能科技有限公司 User information identification method, device and computer-readable storage medium
CN112630768B (en) * 2020-09-29 2024-04-02 惠州市德赛西威汽车电子股份有限公司 Noise reduction method for improving frequency modulation continuous wave radar target detection
CN114049267A (en) * 2021-10-29 2022-02-15 西安建筑科技大学 Statistical and bilateral filtering point cloud denoising method based on improved neighborhood search
CN114964226B (en) * 2022-04-29 2024-09-20 燕山大学 Four-rotor gesture resolving method of noise self-adaptive strong tracking extended Kalman filter
CN115684363B (en) * 2022-10-28 2025-06-17 中国电建集团华东勘测设计研究院有限公司 Concrete performance degradation assessment method based on acoustic emission signal processing

Also Published As

Publication number Publication date
CN116955444A (en) 2023-10-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240803

Address after: 164800 group 1, Ninth Committee, Baoan street, Kedong Town, Kedong County, Qiqihar City, Heilongjiang Province

Applicant after: Liu Fu

Country or region after: China

Address before: 510000, Guangdong Province, Guangzhou City, Panyu District, Dashi Street, Li Village, Xinguang Expressway, Dashi Subway Station, Exit B, self numbered 1, B822

Applicant before: Shared EasyPay (Guangzhou) Network Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant