
CN118331948B - Intelligent data management system and method using artificial intelligence and big data technology - Google Patents

Info

Publication number
CN118331948B
Authority
CN
China
Prior art keywords
data
classification
score
module
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410485224.XA
Other languages
Chinese (zh)
Other versions
CN118331948A (en)
Inventor
曾灶烟
李树湖
曾炽强
叶婷
曾幸钦
叶海萍
刘惠玲
朱艳青
董碧飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yechen Information Technology Co ltd
Original Assignee
Guangzhou Yechen Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yechen Information Technology Co ltd
Priority to CN202410485224.XA
Publication of CN118331948A
Application granted
Publication of CN118331948B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data management, and in particular to an intelligent data management system and method applying artificial intelligence and big data technology. The system comprises a data management platform in communication connection with an acquisition module, a screening module, a classification module and a storage module. The acquisition module processes and analyzes the data uploaded by a sending end. The screening module receives the data transmitted by the acquisition module and screens it: the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, and whether the regression distance exceeds a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module. A first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score. The data can thereby be classified rapidly, which improves data management efficiency.

Description

Intelligent data management system and method applying artificial intelligence and big data technology
Technical Field
The invention relates to the technical field of data management, in particular to an intelligent data management system and method applying artificial intelligence and big data technology.
Background
With the rapid development of the information age, data has become an important resource in modern society. Faced with massive amounts of data, enterprises find it a significant challenge to collect, process, analyze and use that data efficiently. Artificial intelligence and big data, two prominent technologies in the current information technology field, provide strong technical support for intelligent data management.
The continuous development of artificial intelligence technology enables machines to simulate intelligent human behavior, giving them capabilities such as learning, reasoning and decision making. In data management, artificial intelligence technology can help a system process and analyze data automatically, reducing manual intervention and improving processing efficiency. For example, through a machine learning algorithm, the system can automatically identify and classify data, realizing intelligent labeling and classification. In addition, deep learning technology can be applied to feature extraction and pattern recognition, providing strong support for in-depth analysis of the data.
Big data technology, for its part, provides powerful storage and computing capacity for data management. It can store and query massive data efficiently and supports real-time processing and analysis. Through a big data platform, distributed processing and parallel computing can be realized, improving the speed and efficiency of data processing. Big data technology can also mine the data in depth, discover latent value and regularities in it, and provide support for decision making.
Although some existing data management systems have attempted to apply artificial intelligence and big data technologies, challenges and problems remain. For example, designing efficient algorithms that cope with complex data processing tasks is still an open problem.
In view of the shortcomings of the prior art, there is a need for an intelligent data management system and method that uses artificial intelligence and big data techniques to solve the above problems.
Disclosure of Invention
The invention aims to provide an intelligent data management system and method applying artificial intelligence and big data technology, which solve the technical problem that existing schemes lack an effective data management system for improving data management efficiency.
The aim of the invention can be achieved by the following technical scheme:
The intelligent data management system applying artificial intelligence and big data technology comprises a data management platform, wherein the data management platform is in communication connection with an acquisition module, a screening module, a classification module and a storage module:
The acquisition module is used for processing and analyzing the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end in the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if yes, the data sent by the sending end is received and transmitted to the screening module, and if not, the data sent by the sending end is refused;
The screening module is used for receiving the data transmitted by the acquisition module and screening it: the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values obtained when the data corresponds to different classifications; whether the regression distance exceeds a preset distance value is judged; if yes, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes the data; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
The classification module is used for receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
The storage module is used for storing the classified data.
Further, acquiring the risk coefficient of the sending end in the management period specifically includes the following steps:
Acquiring and analyzing the risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate;
Substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
[formula not reproduced];
wherein the weighting terms of the formula are weights and the remaining term is a constant correction coefficient, the specific value of which can be adjusted and set by the user according to the size of the data.
Further, acquiring the data transmission coefficient of the sending end in the management period specifically includes the following steps:
The upload speed of the sending end's data transmission in the management period is detected in real time to obtain the upload speed value at each detection point; an upload speed curve is drawn from these values; the integral value of the closed region enclosed by the preset upload speed curve and the measured upload speed curve is calculated; and the integral value is recorded as the data transmission coefficient.
Further, judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically comprises the following steps:
The weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient are obtained through the storage module; the product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight are calculated; the sum of the two products is recorded as a comparison value; and whether the comparison value is larger than a preset acquisition requirement value is judged; if so, the data meets the acquisition requirement, and if not, the data does not meet the acquisition requirement.
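The decision described above amounts to a weighted sum compared against a threshold. A minimal sketch follows; the function and parameter names are hypothetical, and the example values are illustrative only, since the text does not give concrete weights or a concrete acquisition requirement value.

```python
def meets_acquisition_requirement(risk_coefficient, transmission_coefficient,
                                  risk_weight, transmission_weight,
                                  required_value):
    """Return True when the comparison value exceeds the preset requirement value.

    All names are illustrative; the weights and the requirement value are
    assumed to be supplied by the caller (for example, read from storage).
    """
    # Comparison value: sum of each coefficient multiplied by its own weight.
    comparison_value = (risk_weight * risk_coefficient
                        + transmission_weight * transmission_coefficient)
    # The text requires "larger than", so equality does not pass.
    return comparison_value > required_value
```

With equal weights of 0.5, coefficients of 0.8 and 0.6 give a comparison value of about 0.7, which passes a requirement value of 0.65.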
Further, calculating the first classification score of the data based on the regression distance specifically includes the following:
Obtaining the regression distances, wherein the number of classification dimension feature layers is 4 and the regression distances are, respectively, the distances of the data from the classification dimension feature layers;
Calculating the first classification score according to a data proximity calculation formula, the data proximity calculation formula being as follows:
[formula not reproduced];
wherein the terms of the formula denote, respectively, the minimum of two of the regression distances, the minimum of two of the regression distances, the maximum of two of the regression distances, and the maximum of two of the regression distances.
Further, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score, and classifying the data according to the second classification score, specifically includes the following steps:
Obtaining the classification confidence of the classification branch prediction based on an SVM;
Multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score;
Classifying the data according to the second classification score:
creating a reference cluster; setting the range radius of the reference cluster according to the reference score corresponding to the range radius; constructing a neighborhood in a plane space; scattering the data into the plane space according to the second classification score; storing the data whose second classification score lies within the neighborhood of the reference cluster in one set, and storing the data whose second classification score does not lie within the neighborhood of the reference cluster in another set.
Further, the reference cluster is created based on the k-means algorithm.
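As an illustration of this clustering step, the sketch below runs a plain k-means (Lloyd's algorithm) over scalar second classification scores and then splits the data by whether each score falls within a radius of a chosen cluster center. Treating the scores as one-dimensional, the deterministic initialization, and the names `kmeans_1d` and `split_by_neighborhood` are all assumptions made for the example; the text does not specify how the plane space or the range radius is constructed.

```python
def kmeans_1d(scores, k, iterations=20):
    """Plain Lloyd's algorithm on scalar scores; returns the sorted cluster centers."""
    if k < 1 or k > len(scores):
        raise ValueError("k must be between 1 and the number of scores")
    ordered = sorted(scores)
    # Deterministic initialization: pick evenly spaced values from the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in scores:
            # Assign each score to its nearest center.
            nearest = min(range(k), key=lambda j: abs(s - centers[j]))
            groups[nearest].append(s)
        # Move each center to the mean of its group; keep it if the group is empty.
        updated = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


def split_by_neighborhood(scores, center, radius):
    """Separate scores inside the neighborhood of `center` from those outside it."""
    inside = [s for s in scores if abs(s - center) <= radius]
    outside = [s for s in scores if abs(s - center) > radius]
    return inside, outside
```

For scores clustered around 0.11 and 0.9 with `k=2`, the centers settle near those two values, and a radius of 0.05 around the higher center keeps only the high-scoring data in the first set.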
An intelligent data management method applying artificial intelligence and big data technology, the method comprising the following steps:
S1, processing and analyzing the data uploaded by the sending end: generating a management period, taking as its starting point the moment at which the data upload request generated by the sending end is received; acquiring the risk coefficient and the data transmission coefficient of the sending end in the management period; judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient; if so, receiving the data sent by the sending end and transmitting it to a screening module, and if not, refusing the data sent by the sending end;
S2, receiving the transmitted data and screening it: calculating the regression distances of the data from the different classification dimension centers, wherein the classification dimension feature layers are generated according to the similarity feature values obtained when the data corresponds to different classifications; judging whether the regression distance exceeds a preset distance value; if yes, the data meets the classification requirement and is transmitted to a classification module; if not, the data is not transmitted to the classification module and the data management platform deletes the data; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
S3, receiving the data transmitted by the screening module and classifying it: calculating a first classification score of the data based on the regression distance, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score;
and storing the classified data.
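Read as control flow, steps S1 to S3 and the final storage step form a short pipeline with two early exits. The sketch below is an illustration only: it assumes each stage is supplied as a callable, and the names `manage_record`, `accept`, `screen`, `classify` and `store` are hypothetical rather than taken from the patent.

```python
def manage_record(record, accept, screen, classify, store):
    """Pass one record through S1, S2, S3 and storage.

    Returns a string naming the outcome so a caller can see where the
    record stopped. Every stage is an injected callable (an assumption).
    """
    # S1: refuse data whose sending end fails the acquisition requirement.
    if not accept(record):
        return "refused"
    # S2: data that fails screening is deleted and never reaches classification.
    if not screen(record):
        return "deleted"
    # S3: classify the data, then store the classified result.
    label = classify(record)
    store(label, record)
    return "stored"
```

Passing the stages in as functions keeps the ordering of the steps visible without committing to any particular scoring rule.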
Compared with the prior art, the invention has the beneficial effects that:
On the one hand, the acquisition module of the system processes and analyzes the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end in the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if yes, the data sent by the sending end is received and transmitted to the screening module, and if not, the data sent by the sending end is refused. When faced with large volumes of complex data, data that does not meet the requirements can thus be intercepted by means of the risk coefficient and the data transmission coefficient, which saves the computing power of the data management system and further improves its management efficiency.
On the other hand, the screening module receives the data transmitted by the acquisition module and screens it: the regression distances of each position of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values obtained when the data corresponds to different classifications; whether the regression distance exceeds a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes the data; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met. Because the screening module receives and screens the data transmitted by the acquisition module, preliminary processing of the data can be carried out quickly, which saves data processing time.
The classification module is used for receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
The storage module is used for storing the classified data.
Finally, the classification module and the storage module allow the data to be classified rapidly, which improves the efficiency of data management.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.
FIG. 1 is a system block diagram of an intelligent data management system employing artificial intelligence and big data techniques in accordance with an embodiment of the present invention;
FIG. 2 is a workflow diagram of an intelligent data management system employing artificial intelligence and big data techniques in accordance with an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, steps, etc. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
This embodiment provides an intelligent data management system applying artificial intelligence and big data technology. FIG. 1 is a system block diagram of the intelligent data management system in the embodiment of the invention. As shown in FIG. 1, the system includes a data management platform, and the data management platform is communicatively connected with an acquisition module, a screening module, a classification module and a storage module:
The acquisition module is used for processing and analyzing the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end in the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if yes, the data sent by the sending end is received and transmitted to the screening module, and if not, the data sent by the sending end is refused;
The screening module is used for receiving the data transmitted by the acquisition module and screening it: the regression distances between each position of the data on the different classification dimension feature layers and the different classification dimension centers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values obtained when the data corresponds to different classifications; whether the regression distance exceeds a preset distance value is judged; if yes, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes the data; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
The classification module is used for receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
The storage module is used for storing the classified data.
In summary, the acquisition module of the system processes and analyzes the data uploaded by the sending end and judges, according to the risk coefficient and the data transmission coefficient, whether the data acquisition requirement is met; if yes, the data sent by the sending end is received and transmitted to the screening module. The screening module receives the data transmitted by the acquisition module and screens it by calculating the regression distances of each position of the data from the different classification dimension centers on the different classification dimension feature layers and judging whether the regression distance exceeds a preset distance value; if yes, the data is transmitted to the classification module. The classification module receives the data transmitted by the screening module and classifies it: a first classification score of the data is calculated based on the regression distance and multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score. The storage module stores the classified data. The data can thereby be classified rapidly, which improves data management efficiency.
In some embodiments, FIG. 2 is a workflow diagram of the intelligent data management system applying artificial intelligence and big data technology according to an embodiment of the present invention, and as shown in FIG. 2, acquiring the risk coefficient of the sending end in the management period specifically includes the following steps:
Step S201, acquiring and analyzing the risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate, wherein the connection stability rate may be obtained as follows:
Method one, calculation based on the number of connection interruptions
Recording the number of connection interruptions:
The number of connection interruptions during the management period is recorded. This may be achieved by monitoring a network log, a system log, or a dedicated connection monitoring tool.
Recording the total number of connection attempts:
Over the same management period, the number of all attempts to establish a connection is recorded. This includes both successful connections and connections that could not be established for any reason.
Calculating the connection stability rate:
connection stability rate = (total number of connection attempts - number of connection interruptions) / total number of connection attempts × 100%.
Method two, calculation based on connection duration
Recording the total connection duration:
During the management period, the total duration of all successful connections is recorded (in seconds or milliseconds).
Recording the connection interruption duration:
Over the same management period, the total duration of all connection interruptions is recorded.
Calculating the connection stability rate:
connection stability rate = (total connection duration - connection interruption duration) / total connection duration × 100%.
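Both methods reduce to the same ratio applied to different measurements. The sketch below implements the two formulas as stated; the function names and the input validation are additions for illustration and are not part of the described method.

```python
def stability_rate_by_count(total_attempts, interruptions):
    """Method one: percentage of connection attempts that were not interrupted."""
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    if not 0 <= interruptions <= total_attempts:
        raise ValueError("interruptions must lie between 0 and total_attempts")
    return 100.0 * (total_attempts - interruptions) / total_attempts


def stability_rate_by_duration(total_duration, interrupted_duration):
    """Method two: percentage of connected time that was not interrupted.

    Both durations must use the same unit (seconds or milliseconds).
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    if not 0 <= interrupted_duration <= total_duration:
        raise ValueError("interrupted_duration must lie between 0 and total_duration")
    return 100.0 * (total_duration - interrupted_duration) / total_duration
```

For example, 10 interruptions in 200 attempts give a rate of 95%, and 90 seconds of interruption in 3600 seconds of connection time give 97.5%.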
Step S202, substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
[formula not reproduced];
wherein the weighting terms of the formula are weights and the remaining term is a constant correction coefficient, the specific value of which can be adjusted and set by the user according to the size of the data.
In some embodiments, acquiring the data transmission coefficient of the sending end in the management period specifically includes the following procedure:
The upload speed of the sending end's data transmission in the management period is detected in real time to obtain the upload speed value at each detection point; an upload speed curve is drawn from these values; the integral value of the closed region enclosed by the preset upload speed curve and the measured upload speed curve is calculated; and the integral value is recorded as the data transmission coefficient.
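One way to compute such an integral from sampled detection points is the trapezoidal rule applied to the gap between the two curves. The sketch below is a minimal illustration that assumes both curves are sampled at the same time points and that the enclosed area is the integral of the absolute difference; the name `transmission_coefficient` and its parameters are hypothetical.

```python
def transmission_coefficient(times, measured, preset):
    """Approximate the area enclosed between the measured and preset speed curves.

    times:    strictly increasing sample times of the detection points
    measured: upload speed observed at each time
    preset:   preset upload speed at each time
    The trapezoidal rule is applied to |measured - preset|. This slightly
    overestimates the area when the curves cross between two samples.
    """
    if not (len(times) == len(measured) == len(preset)) or len(times) < 2:
        raise ValueError("need at least two samples and equal-length inputs")
    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        gap_prev = abs(measured[i - 1] - preset[i - 1])
        gap_curr = abs(measured[i] - preset[i])
        # Area of one trapezoid between consecutive detection points.
        area += 0.5 * (gap_prev + gap_curr) * dt
    return area
```

With a constant preset speed of 10 and measured speeds of 10, 12 and 10 at times 0, 1 and 2, the enclosed area is 2.0.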
Further, judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically comprises the following steps:
The weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient are obtained through the storage module; the product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight are calculated; the sum of the two products is recorded as a comparison value; and whether the comparison value is larger than a preset acquisition requirement value is judged; if so, the data meets the acquisition requirement, and if not, the data does not meet the acquisition requirement.
In some embodiments, calculating the first classification score of the data based on the regression distance specifically includes the following:
Obtain the regression distances of the data, wherein the number of classification dimension feature layers is 4 and there is one regression distance from the data to each classification dimension feature layer;
Calculate the first classification score according to the data proximity calculation formula, which is as follows:
;
wherein the four terms of the formula denote, respectively, the minimum of two of the regression distances, the minimum of two of the regression distances, the maximum of two of the regression distances, and the maximum of two of the regression distances.
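The proximity formula itself does not survive in the text, so the following Python sketch is only an illustrative guess at its shape, not the patented formula. It assumes a centerness-style ratio of the kind used in anchor-free object detectors: the four distances are split into two pairs, and the score is the square root of the product of each pair's min/max ratio. The pairing (`d1` with `d3`, `d2` with `d4`) and the square root are both assumptions.

```python
import math


def first_classification_score(d1: float, d2: float, d3: float, d4: float) -> float:
    """Centerness-style proximity score in (0, 1].

    Assumption: distances are paired as (d1, d3) and (d2, d4); the score is
    1.0 when each pair is balanced and shrinks as a pair becomes lopsided.
    """
    distances = (d1, d2, d3, d4)
    if any(d <= 0 for d in distances):
        raise ValueError("regression distances must be positive")
    ratio_a = min(d1, d3) / max(d1, d3)
    ratio_b = min(d2, d4) / max(d2, d4)
    return math.sqrt(ratio_a * ratio_b)
```

Under these assumptions, balanced distances such as (2, 3, 2, 3) score 1.0, while (1, 1, 4, 4) scores 0.25.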
In some embodiments, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score specifically includes the following steps:
Obtain the classification confidence of the classification branch prediction based on an SVM (Support Vector Machine). An SVM is a classification model based on statistical learning theory that separates samples of different classes by finding an optimal hyperplane. In an SVM, the branch prediction is typically made by calculating the distance of an input sample from the hyperplane, and samples farther from the hyperplane typically have higher confidence.
Multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score;
Classifying the data according to the second classification score:
Create a reference cluster and its corresponding reference score, set the range radius of the reference cluster, and construct a neighborhood in a plane space; scatter the data in the plane space according to the second classification score; store the data whose second classification score lies within the neighborhood of the reference cluster in one set, and store the data whose second classification score does not lie within the neighborhood of the reference cluster in another set.
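The three steps above can be sketched in Python as follows. This is a hedged illustration under several assumptions that the text does not state: the hyperplane is linear with hypothetical parameters `w` and `b`; the distance `d` to the hyperplane is mapped to a confidence with `d / (1 + d)` so that it stays below 1; and the "plane space" is reduced to the score axis, so the neighborhood is the interval within `radius` of the reference score.

```python
import math
from typing import List, Sequence, Tuple


def hyperplane_confidence(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Confidence that grows with the distance of x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("w must not be the zero vector")
    distance = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
    # Assumed squashing: maps [0, inf) to [0, 1), farther means more confident.
    return distance / (1.0 + distance)


def second_classification_score(first_score: float, confidence: float) -> float:
    """Second score = first classification score times classification confidence."""
    return first_score * confidence


def split_by_reference(
    scores: Sequence[float], reference_score: float, radius: float
) -> Tuple[List[float], List[float]]:
    """Split scores into those inside and outside the reference neighborhood."""
    inside = [s for s in scores if abs(s - reference_score) <= radius]
    outside = [s for s in scores if abs(s - reference_score) > radius]
    return inside, outside
```

With a trained classifier, the hand-written distance would normally be replaced by the model's own decision value; the split into two sets stays the same.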
Further, creating reference clusters based on k-means algorithm:
The K-means algorithm is an unsupervised clustering algorithm that divides data into K clusters. The basic steps for creating a reference cluster based on the K-means algorithm are as follows:
Initialization:
K initial centroids are selected, either by randomly choosing K data points from the dataset or by a more elaborate heuristic (e.g., K-means++).
Cluster assignment:
For each data point in the dataset, its distance to each centroid is calculated (e.g., using the Euclidean distance).
Each data point is assigned to the cluster whose centroid is closest to it.
Centroid update:
For each cluster, the mean of all data points within it is calculated and set as the new centroid (using the median instead gives the related k-medians variant).
Iteration:
The cluster assignment and centroid update steps are repeated until a stop condition is met (e.g., the centroids move by less than a certain threshold, or a preset maximum number of iterations is reached).
Output:
The final K centroids and their corresponding clusters are the reference clusters produced by the K-means algorithm.
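The steps above can be sketched with the Python standard library alone. This is a minimal illustration rather than a production implementation: it uses plain random initialization (not K-means++), the Euclidean distance, and hypothetical parameter names `max_iter`, `tol`, and `seed`.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def k_means(
    points: List[Point], k: int, max_iter: int = 100, tol: float = 1e-9, seed: int = 0
) -> Tuple[List[List[float]], List[List[Point]]]:
    """Return (centroids, clusters) after running basic K-means."""
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    clusters: List[List[Point]] = [[] for _ in range(k)]

    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)

        # Update: each centroid becomes the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])

        # Stop when no centroid moved by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break

    return centroids, clusters
```

If the loop ends because `max_iter` is reached rather than by convergence, the returned clusters reflect the assignment made just before the final centroid update.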
The invention also provides an intelligent data management method applying the artificial intelligence and big data technology, which comprises the following steps:
S1, processing and analyzing the data uploaded by the sending end: a management period is generated with the moment at which the upload data request generated by the sending end is received as its starting point; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused;
S2, receiving the transmitted data and screening it: the regression distances between the positions of the data on the different classification dimension feature layers and the centers of the different classification dimensions are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data under the different classifications; whether the regression distance exceeds the preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer when the classification requirement is met;
S3, receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
and storing the classified data.
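The screening test in step S2 can be sketched in Python as follows. The text does not say how the per-layer regression distances are combined before the comparison, so this sketch assumes they are summed and compared with the preset distance value, which the text defines as the sum of the per-layer minimum distances. The function and parameter names are hypothetical.

```python
from typing import Sequence


def passes_screening(
    regression_distances: Sequence[float],
    layer_minimum_distances: Sequence[float],
) -> bool:
    """Return True when the data should be forwarded to the classification module.

    Assumption: the per-layer regression distances are summed and compared
    with the preset distance value (the sum of the per-layer minimums).
    """
    if len(regression_distances) != len(layer_minimum_distances):
        raise ValueError("one minimum distance is required per feature layer")
    preset_distance = sum(layer_minimum_distances)
    # The text requires the regression distance to exceed the preset value.
    return sum(regression_distances) > preset_distance
```

Data that fails this test is not forwarded to the classification module and, per the text, is deleted by the data management platform.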
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless (e.g., infrared, radio, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a division of some logic functions, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (5)

1. An intelligent data management system using artificial intelligence and big data technology, characterized in that the system comprises a data management platform, the data management platform being communicatively connected to an acquisition module, a screening module, a classification module, and a storage module;
the acquisition module is used to process and analyze the data uploaded by the sending end: a management period is generated with the moment at which the upload data request generated by the sending end is received as its starting point; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused; wherein obtaining the risk coefficient of the sending end within the management period specifically includes the following process:
obtaining and parsing the risk indicator information of the sending end within the management period to obtain the data encryption strength, packet loss rate, delay time, and connection stability rate;
substituting the data encryption strength, packet loss rate, delay time, and connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
;
wherein the formula uses weights and a constant correction coefficient, the specific value of the constant correction coefficient being adjustable by the user according to the size of the data;
obtaining the data transmission coefficient of the sending end within the management period specifically includes the following process:
detecting in real time the upload speed of the sending end's data transmission within the management period, obtaining the upload speed value at each detection point, drawing an upload speed curve from these values, calculating the integral value of the closed region enclosed by the preset upload speed curve and the upload speed curve, and recording this integral value as the data transmission coefficient;
the screening module is used to receive the data transmitted by the acquisition module and screen it: the regression distances between the positions of the data on the different classification dimension feature layers and the centers of the different classification dimensions are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data under the different classifications; whether the regression distance exceeds the preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer when the classification requirement is met;
the classification module is used to receive the data transmitted by the screening module and classify it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score:
obtaining the classification confidence of the classification branch prediction based on an SVM;
multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score;
classifying the data according to the second classification score:
creating a reference cluster and its corresponding reference score, setting the range radius of the reference cluster, constructing a neighborhood in a plane space, scattering the data in the plane space according to the second classification score, storing the data whose second classification score lies within the neighborhood of the reference cluster in one set, and storing the data whose second classification score does not lie within the neighborhood of the reference cluster in another set;
the storage module is used to store the classified data.
2. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically includes the following process:
obtaining, through the storage module, the weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient; calculating the product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight, and recording the sum of the two products as the comparison value; judging whether the comparison value is greater than the preset acquisition requirement value; if so, judging that the data meets the acquisition requirement; if not, judging that the data does not meet the acquisition requirement.
3. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that calculating the first classification score of the data based on the regression distance specifically includes the following process:
obtaining the regression distances, wherein the number of classification dimension feature layers is 4 and there is one regression distance from the data to each classification dimension feature layer;
calculating the first classification score according to the data proximity calculation formula, the data proximity calculation formula being as follows:
;
wherein the four terms of the formula denote, respectively, the minimum of two of the regression distances, the minimum of two of the regression distances, the maximum of two of the regression distances, and the maximum of two of the regression distances.
4. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that the reference cluster is created based on the k-means algorithm.
5. An intelligent data management method using artificial intelligence and big data technology, characterized in that the method comprises the following steps:
S1: processing and analyzing the data uploaded by the sending end: a management period is generated with the moment at which the upload data request generated by the sending end is received as its starting point; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused; wherein obtaining the risk coefficient of the sending end within the management period specifically includes the following process:
obtaining and parsing the risk indicator information of the sending end within the management period to obtain the data encryption strength, packet loss rate, delay time, and connection stability rate;
substituting the data encryption strength, packet loss rate, delay time, and connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
;
wherein the formula uses weights and a constant correction coefficient, the specific value of the constant correction coefficient being adjustable by the user according to the size of the data;
obtaining the data transmission coefficient of the sending end within the management period specifically includes the following process:
detecting in real time the upload speed of the sending end's data transmission within the management period, obtaining the upload speed value at each detection point, drawing an upload speed curve from these values, calculating the integral value of the closed region enclosed by the preset upload speed curve and the upload speed curve, and recording this integral value as the data transmission coefficient;
S2: receiving the transmitted data and screening it: the regression distances between the positions of the data on the different classification dimension feature layers and the centers of the different classification dimensions are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data under the different classifications; whether the regression distance exceeds the preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer when the classification requirement is met;
S3: receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score:
obtaining the classification confidence of the classification branch prediction based on an SVM;
multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score;
classifying the data according to the second classification score:
creating a reference cluster and its corresponding reference score, setting the range radius of the reference cluster, constructing a neighborhood in a plane space, scattering the data in the plane space according to the second classification score, storing the data whose second classification score lies within the neighborhood of the reference cluster in one set, and storing the data whose second classification score does not lie within the neighborhood of the reference cluster in another set;
storing the classified data.
CN202410485224.XA 2024-04-22 2024-04-22 Intelligent data management system and method using artificial intelligence and big data technology Active CN118331948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410485224.XA CN118331948B (en) 2024-04-22 2024-04-22 Intelligent data management system and method using artificial intelligence and big data technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410485224.XA CN118331948B (en) 2024-04-22 2024-04-22 Intelligent data management system and method using artificial intelligence and big data technology

Publications (2)

Publication Number Publication Date
CN118331948A CN118331948A (en) 2024-07-12
CN118331948B true CN118331948B (en) 2024-12-10

Family

ID=91779316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410485224.XA Active CN118331948B (en) 2024-04-22 2024-04-22 Intelligent data management system and method using artificial intelligence and big data technology

Country Status (1)

Country Link
CN (1) CN118331948B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297959A (en) * 2021-05-24 2021-08-24 南京邮电大学 Target tracking method and system based on corner attention twin network
CN116010602A (en) * 2023-01-10 2023-04-25 孔祥山 Data optimization method and system based on big data

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9378336B2 (en) * 2011-05-16 2016-06-28 Dacadoo Ag Optical data capture of exercise data in furtherance of a health score computation
CN117742605A (en) * 2023-12-21 2024-03-22 河南聚合科技有限公司 Storage operation platform based on block chain optimization of industrial big data distributed management

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20241119
Address after: No. 30, Huangluo Village, Aotou Town, Conghua District, Guangzhou City, Guangdong Province, 510000
Applicant after: Guangzhou Yechen Information Technology Co.,Ltd.
Country or region after: China
Address before: No. 30, Huangluo Village, Aotou Town, Conghua District, Guangzhou City, Guangdong Province, 510000
Applicant before: Guangzhou Yechen Information Technology Co.,Ltd.
Country or region before: China
Applicant before: GUANGZHOU SAIDU DETECTION SERVICE CO.,LTD.
GR01 Patent grant