Disclosure of Invention
The invention aims to provide an intelligent data management system and method applying artificial intelligence and big data technology, which solve the technical problem that existing schemes lack an effective data management system for improving data management efficiency.
The aim of the invention is achieved by the following technical solution:
The intelligent data management system applying artificial intelligence and big data technology comprises a data management platform, wherein the data management platform is communicatively connected to an acquisition module, a screening module, a classification module and a storage module:
The acquisition module is used for processing and analyzing the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused;
The screening module is used for receiving the data transmitted by the acquisition module and screening it: the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data with respect to the different classifications; whether the regression distances exceed a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
The classification module is used for receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distances, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
The storage module is used for storing the classified data.
Further, acquiring the risk coefficient of the sending end in the management period specifically includes the following steps:
Acquiring and analyzing risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate;
Substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
;
wherein the weighting terms in the formula are weights and the remaining constant is a constant correction coefficient, whose specific value can be adjusted and set by the user according to the size of the data.
Further, acquiring the data transmission coefficient of the sending end in the management period specifically includes the following steps:
The upload speed of the sending end's data transmission within the management period is detected in real time to obtain the upload speed value at each detection point; an upload speed curve is drawn from these values; the area of the closed region enclosed between the preset upload speed curve and the measured upload speed curve is calculated by integration; and the integral value is recorded as the data transmission coefficient.
Further, judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically comprises the following steps:
The risk coefficient and the corresponding data transmission coefficient are obtained through the storage module, together with the weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient; the product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight are calculated, and the sum of the two products is recorded as a comparison value; whether the comparison value is greater than a preset acquisition requirement value is judged; if so, the data is judged to meet the acquisition requirement; if not, the data is judged not to meet the acquisition requirement.
Further, calculating the first classification score of the data based on the regression distance specifically includes the following:
Obtaining the regression distances, wherein the number of classification dimension feature layers is 4 and the four regression distances are the regression distances of the data from the respective classification dimension feature layers;
Calculating the first classification score according to the data proximity calculation formula, the data proximity calculation formula being as follows:
;
wherein two terms of the formula each denote the minimum value of a pair of regression distances, and two terms each denote the maximum value of a pair of regression distances.
Further, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score specifically includes the following steps:
Based on a support vector machine (SVM), obtaining the classification confidence of the classification branch prediction;
Multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score;
Classifying the data according to the second classification score:
Creating a reference cluster; setting the range radius of the reference cluster according to the reference score corresponding to the range radius, and constructing a neighborhood in a plane space; scattering the data into the plane space according to the second classification score; storing data whose second classification score lies within the neighborhood of the reference cluster in one data storage set, and storing data whose second classification score does not lie within the neighborhood of the reference cluster in another data storage set.
Further, the reference cluster is created based on the K-means algorithm.
An intelligent data management method applying artificial intelligence and big data technology, the method comprising the following steps:
S1, processing and analyzing the data uploaded by the sending end: generating a management period, taking as its starting point the moment at which the data upload request generated by the sending end is received; acquiring the risk coefficient and the data transmission coefficient of the sending end within the management period; judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient; if so, receiving the data sent by the sending end and transmitting it to the screening module; if not, refusing the data sent by the sending end;
S2, receiving the transmitted data and screening it: calculating the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers, wherein the classification dimension feature layers are generated according to the similarity feature values of the data with respect to the different classifications; judging whether the regression distances exceed a preset distance value; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
S3, receiving the data transmitted by the screening module and classifying it: calculating a first classification score of the data based on the regression distances, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score;
S4, storing the classified data.
Compared with the prior art, the invention has the beneficial effects that:
On the one hand, the acquisition module processes and analyzes the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data is refused. When faced with a large volume of complex data, data that does not meet the requirement can be intercepted by means of the risk coefficient and the data transmission coefficient, which saves the computing power of the data management system and further improves its management efficiency.
On the other hand, the screening module receives the data transmitted by the acquisition module and screens it: the regression distances of each position of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data with respect to the different classifications; whether the regression distances exceed a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met. Because the screening module receives and screens the data transmitted by the acquisition module, preliminary processing of the data can be achieved quickly and data processing time is saved.
Finally, the classification module receives the data transmitted by the screening module and classifies it: a first classification score of the data is calculated based on the regression distances, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score. The storage module stores the classified data. Through the classification module and the storage module, the data can be classified rapidly and the efficiency of data management is improved.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, steps, etc. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
This embodiment provides an intelligent data management system applying artificial intelligence and big data technology. Fig. 1 is a system block diagram of the intelligent data management system in an embodiment of the invention. As shown in Fig. 1, the system includes a data management platform, and the data management platform is communicatively connected to an acquisition module, a screening module, a classification module and a storage module:
The acquisition module is used for processing and analyzing the data uploaded by the sending end: a management period is generated, taking as its starting point the moment at which the data upload request generated by the sending end is received; the risk coefficient and the data transmission coefficient of the sending end within the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused;
The screening module is used for receiving the data transmitted by the acquisition module and screening it: the regression distances of each position of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to the similarity feature values of the data with respect to the different classifications; whether the regression distances exceed a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
The classification module is used for receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distances, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;
The storage module is used for storing the classified data.
In summary, the acquisition module processes and analyzes the data uploaded by the sending end and judges, according to the risk coefficient and the data transmission coefficient, whether the data acquisition requirement is met; if so, it receives the data sent by the sending end and transmits it to the screening module. The screening module receives the data transmitted by the acquisition module and screens it by calculating the regression distances of each position of the data from the different classification dimension centers on the different classification dimension feature layers and judging whether the regression distances exceed the preset distance value; if so, it transmits the data to the classification module. The classification module receives the data transmitted by the screening module and classifies it: it calculates a first classification score based on the regression distances, multiplies that score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifies the data according to the second classification score. The storage module stores the classified data. In this way the data can be classified rapidly and data management efficiency is improved.
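The screening decision described above can be sketched as follows. This is a minimal illustration and not the method as claimed: the text does not state how the per-layer regression distances are aggregated before the comparison, so the sketch assumes they are summed, mirroring the definition of the preset distance value as a sum over layers. The names `preset_distance` and `passes_screening` are hypothetical.

```python
from typing import Sequence


def preset_distance(min_distance_per_layer: Sequence[float]) -> float:
    """Preset distance value: sum of the minimum distance values on each feature layer."""
    return sum(min_distance_per_layer)


def passes_screening(regression_distances: Sequence[float],
                     min_distance_per_layer: Sequence[float]) -> bool:
    """Return True if the data should be forwarded to the classification module.

    ASSUMPTION: the per-layer regression distances are summed before being
    compared with the preset distance value.
    """
    if len(regression_distances) != len(min_distance_per_layer):
        raise ValueError("one regression distance is expected per feature layer")
    return sum(regression_distances) > preset_distance(min_distance_per_layer)


# Four feature layers, as in the embodiments described in this document.
print(passes_screening([2.0, 3.0, 1.5, 2.5], [1.0, 1.0, 1.0, 1.0]))  # forwarded
print(passes_screening([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))  # deleted
```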
In some embodiments, Fig. 2 is a workflow diagram of the intelligent data management system applying artificial intelligence and big data technology according to an embodiment of the present invention. As shown in Fig. 2, acquiring the risk coefficient of the sending end in the management period specifically includes the following steps:
Step S201, acquiring and analyzing risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate, wherein the connection stability rate may be obtained by either of the following methods:
Method one: calculation based on the number of connection interruptions
Recording the number of connection interruptions:
The number of connection interruptions within the management period is recorded. This may be achieved by monitoring network logs, system logs, or a dedicated connection monitoring tool.
Recording the total number of connection attempts:
Within the same management period, the number of all attempts to establish a connection is recorded, including both successful connections and connections that failed to be established for any reason.
Calculating the connection stability rate:
Connection stability rate = (total number of connection attempts - number of connection interruptions) / total number of connection attempts × 100%.
Method two: calculation based on connection duration
Recording the total connection duration:
Within the management period, the total duration (in seconds or milliseconds) of all successful connections is recorded.
Recording the connection interruption duration:
Within the same management period, the total duration of all connection interruptions is recorded, in the same unit.
Calculating the connection stability rate:
Connection stability rate = (total connection duration - connection interruption duration) / total connection duration × 100%.
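The two calculations above can be sketched as follows. The function names are hypothetical, and the sketch assumes that the counts and durations have already been extracted from logs or a monitoring tool.

```python
def stability_by_count(total_attempts: int, interruptions: int) -> float:
    """Method one: percentage of connection attempts that were not interrupted."""
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    return 100.0 * (total_attempts - interruptions) / total_attempts


def stability_by_duration(total_duration: float, interrupted_duration: float) -> float:
    """Method two: percentage of connection time that was not interrupted.

    Both durations must be expressed in the same unit.
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    return 100.0 * (total_duration - interrupted_duration) / total_duration


print(stability_by_count(200, 5))           # 97.5
print(stability_by_duration(3600.0, 90.0))  # 97.5
```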
Step S202, substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, the risk coefficient calculation formula being as follows:
;
wherein the weighting terms in the formula are weights and the remaining constant is a constant correction coefficient, whose specific value can be adjusted and set by the user according to the size of the data.
In some embodiments, acquiring the data transmission coefficient of the sending end in the management period specifically includes the following procedure:
The upload speed of the sending end's data transmission within the management period is detected in real time to obtain the upload speed value at each detection point; an upload speed curve is drawn from these values; the area of the closed region enclosed between the preset upload speed curve and the measured upload speed curve is calculated by integration; and the integral value is recorded as the data transmission coefficient.
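One way to compute this integral value numerically is sketched below. It assumes that both curves are sampled at the same detection points and are piecewise linear between them, and that the "closed space" is the unsigned area between the two curves. The function name `enclosed_area` and the sample values are illustrative only.

```python
from typing import Sequence


def enclosed_area(times: Sequence[float],
                  measured: Sequence[float],
                  preset: Sequence[float]) -> float:
    """Unsigned area between the measured and preset upload speed curves."""
    if not (len(times) == len(measured) == len(preset)) or len(times) < 2:
        raise ValueError("need at least two detection points of equal length")
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        d0 = measured[i] - preset[i]
        d1 = measured[i + 1] - preset[i + 1]
        if d0 * d1 >= 0:
            # The curves do not cross in this interval: trapezoid on |difference|.
            area += 0.5 * (abs(d0) + abs(d1)) * dt
        else:
            # The curves cross once: the region splits into two triangles.
            area += 0.5 * (d0 * d0 + d1 * d1) / (abs(d0) + abs(d1)) * dt
    return area


# Detection points at t = 0..3 s, with a constant preset speed of 5 MB/s.
coefficient = enclosed_area([0, 1, 2, 3], [4, 6, 5, 3], [5, 5, 5, 5])
print(coefficient)  # 2.0
```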
Further, judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically comprises the following steps:
The risk coefficient and the corresponding data transmission coefficient are obtained through the storage module, together with the weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient; the product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight are calculated, and the sum of the two products is recorded as a comparison value; whether the comparison value is greater than a preset acquisition requirement value is judged; if so, the data is judged to meet the acquisition requirement; if not, the data is judged not to meet the acquisition requirement.
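The acquisition decision amounts to a weighted sum compared against a threshold. A minimal sketch follows; the weights, coefficients and threshold used in the example are made-up values, and the function name is hypothetical.

```python
def meets_acquisition_requirement(risk_coeff: float,
                                  transmission_coeff: float,
                                  risk_weight: float,
                                  transmission_weight: float,
                                  required_value: float) -> bool:
    """Compare the weighted sum of the two coefficients with the preset requirement value."""
    comparison_value = risk_weight * risk_coeff + transmission_weight * transmission_coeff
    return comparison_value > required_value


# 0.6 * 0.8 + 0.4 * 2.0 = 1.28, which exceeds 1.0, so the data is accepted.
print(meets_acquisition_requirement(0.8, 2.0, 0.6, 0.4, 1.0))
# 0.6 * 0.2 + 0.4 * 0.5 = 0.32, which does not exceed 1.0, so the data is refused.
print(meets_acquisition_requirement(0.2, 0.5, 0.6, 0.4, 1.0))
```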
In some embodiments, calculating the first classification score of the data based on the regression distance specifically includes the following:
Obtaining the regression distances, wherein the number of classification dimension feature layers is 4 and the four regression distances are the regression distances of the data from the respective classification dimension feature layers;
Calculating the first classification score according to the data proximity calculation formula, the data proximity calculation formula being as follows:
;
wherein two terms of the formula each denote the minimum value of a pair of regression distances, and two terms each denote the maximum value of a pair of regression distances.
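The data proximity formula itself is not reproduced in this text. Purely as an illustration, the sketch below uses one form that is consistent with the description of pairwise minima and maxima: the square root of the product of two min/max ratios. The pairing of the four distances (first with second, third with fourth) and the square root are assumptions and may differ from the actual formula.

```python
import math


def first_classification_score(d1: float, d2: float, d3: float, d4: float) -> float:
    """ASSUMED proximity score built from pairwise min/max ratios of four distances.

    The result is 1.0 when both pairs are balanced and approaches 0.0
    as either pair becomes lopsided.
    """
    if min(d1, d2, d3, d4) < 0:
        raise ValueError("regression distances must be non-negative")
    if max(d1, d2) == 0 or max(d3, d4) == 0:
        raise ValueError("each pair needs at least one non-zero distance")
    ratio_a = min(d1, d2) / max(d1, d2)
    ratio_b = min(d3, d4) / max(d3, d4)
    return math.sqrt(ratio_a * ratio_b)


print(first_classification_score(2.0, 2.0, 3.0, 3.0))  # 1.0
print(first_classification_score(1.0, 4.0, 1.0, 4.0))  # 0.25
```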
In some embodiments, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score, specifically includes the following steps:
Based on an SVM (Support Vector Machine), obtaining the classification confidence of the classification branch prediction. An SVM is a classification model based on statistical learning theory that separates samples of different classes by finding an optimal hyperplane. In an SVM, the confidence of a prediction is typically derived from the distance of the input sample from the hyperplane, with samples farther from the hyperplane typically having higher confidence.
Multiplying the first classification score by the classification confidence of the classification branch prediction to obtain the second classification score;
Classifying the data according to the second classification score:
Creating a reference cluster; setting the range radius of the reference cluster according to the reference score corresponding to the range radius, and constructing a neighborhood in a plane space; scattering the data into the plane space according to the second classification score; storing data whose second classification score lies within the neighborhood of the reference cluster in one data storage set, and storing data whose second classification score does not lie within the neighborhood of the reference cluster in another data storage set.
Further, the reference cluster is created based on the K-means algorithm:
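The scoring and partitioning steps can be sketched as follows. For simplicity the sketch treats the neighborhood as an interval around a single reference cluster center in score space, which is an assumption; the text describes a neighborhood in a plane space without giving coordinates. All names and numeric values are illustrative.

```python
from typing import Dict, Tuple


def second_classification_score(first_score: float, confidence: float) -> float:
    """Second score: the first classification score weighted by the classification confidence."""
    return first_score * confidence


def partition_by_neighborhood(scores: Dict[str, float],
                              center: float,
                              radius: float) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split items by whether their score lies within `radius` of the cluster center."""
    inside: Dict[str, float] = {}
    outside: Dict[str, float] = {}
    for item, score in scores.items():
        target = inside if abs(score - center) <= radius else outside
        target[item] = score
    return inside, outside


# (first classification score, classification confidence) for three hypothetical items.
raw = {"a": (0.9, 0.8), "b": (0.5, 0.4), "c": (0.7, 0.9)}
scores = {name: second_classification_score(s, c) for name, (s, c) in raw.items()}
inside, outside = partition_by_neighborhood(scores, center=0.7, radius=0.1)
print(sorted(inside), sorted(outside))  # ['a', 'c'] ['b']
```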
The K-means algorithm is an unsupervised clustering algorithm that divides data into K clusters. The basic steps for creating a reference cluster based on the K-means algorithm are as follows:
1. Initialization:
K initial centroids are selected, either by randomly choosing K data points in the dataset or by a more elaborate heuristic (e.g., K-means++).
2. Cluster assignment:
For each data point in the dataset, its distance to each centroid is calculated (e.g., using the Euclidean distance).
Each data point is assigned to the cluster whose centroid is closest to it.
3. Centroid update:
For each cluster, the mean of all data points within it is calculated and set as the new centroid (using the median instead gives the K-medians variant).
4. Iteration:
Steps 2 and 3 are repeated until a stop condition is met (e.g., the centroids change by less than a threshold, or a preset maximum number of iterations is reached).
5. Output:
The final K centroids and the clusters corresponding to them are the reference clusters based on the K-means algorithm.
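The steps above can be sketched in plain Python as follows. This is a minimal illustration using random initialization, Euclidean distance and a centroid-shift stop condition; the sample points, the `seed` parameter and the handling of empty clusters are choices made for the example, not taken from the text.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int, max_iter: int = 100,
            tol: float = 1e-6, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels), where labels[i] is the cluster index of points[i]."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)  # initialization
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update: each centroid moves to the mean of its cluster members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:  # stop once the centroids have effectively stopped moving
            break
    return centroids, labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = k_means(pts, k=2)
print(centroids, labels)
```

The cluster indices depend on the random initialization, so only the grouping of points, not the numbering of clusters, is meaningful.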
The invention also provides an intelligent data management method applying artificial intelligence and big data technology, which comprises the following steps:
S1, processing and analyzing the data uploaded by the sending end: generating a management period, taking as its starting point the moment at which the data upload request generated by the sending end is received; acquiring the risk coefficient and the data transmission coefficient of the sending end within the management period; judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient; if so, receiving the data sent by the sending end and transmitting it to the screening module; if not, refusing the data sent by the sending end;
S2, receiving the transmitted data and screening it: calculating the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers, wherein the classification dimension feature layers are generated according to the similarity feature values of the data with respect to the different classifications; judging whether the regression distances exceed a preset distance value; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it; wherein the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;
S3, receiving the data transmitted by the screening module and classifying it: calculating a first classification score of the data based on the regression distances, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score;
S4, storing the classified data.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave, etc.) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a division of some logic functions, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part thereof that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of protection of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.