CN118331948B

CN118331948B - Intelligent data management system and method using artificial intelligence and big data technology

Info

Publication number: CN118331948B
Application number: CN202410485224.XA
Authority: CN
Inventors: 曾灶烟; 李树湖; 曾炽强; 叶婷; 曾幸钦; 叶海萍; 刘惠玲; 朱艳青; 董碧飞
Original assignee: Guangzhou Yechen Information Technology Co ltd
Current assignee: Guangzhou Yechen Information Technology Co ltd
Priority date: 2024-04-22
Filing date: 2024-04-22
Publication date: 2024-12-10
Anticipated expiration: 2044-04-22
Also published as: CN118331948A

Abstract

The invention relates to the technical field of data management, and in particular to an intelligent data management system and method applying artificial intelligence and big data technology. The system comprises a data management platform in communication connection with an acquisition module, a screening module, a classification module and a storage module. The acquisition module processes and analyzes data uploaded by a sending end. The screening module receives the data transmitted by the acquisition module and screens it: the regression distances of the position of the data on different classification dimension feature layers from the different classification dimension centers are calculated, and whether the regression distance exceeds a preset distance value is judged; if yes, the data meets the classification requirement and is transmitted to the classification module. The classification module calculates a first classification score of the data based on the regression distance, multiplies the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifies the data according to the second classification score. The data can thus be classified rapidly, which improves data management efficiency.

Description

Intelligent data management system and method applying artificial intelligence and big data technology

Technical Field

The invention relates to the technical field of data management, in particular to an intelligent data management system and method applying artificial intelligence and big data technology.

Background

With the rapid development of the information age, data has become an important resource in modern society. In the face of massive amounts of data, how to efficiently collect, process, analyze, and utilize such data is a significant challenge for enterprises. Artificial intelligence and big data, two popular technologies in the current information technology field, provide strong technical support for intelligent data management.

The continuous development of artificial intelligence technology enables machines to simulate intelligent behaviors of human beings and has the capabilities of learning, reasoning, decision making and the like. In data management, the artificial intelligence technology can help the system to realize automatic processing and analysis of data, reduce manual intervention and improve processing efficiency. For example, through a machine learning algorithm, the system can automatically identify and classify data, and intelligent labeling and classification of the data are realized. In addition, the deep learning technology can also be applied to the feature extraction and pattern recognition of the data, and provides powerful support for the deep analysis of the data.

On the other hand, big data technology provides powerful storage and computing power for data management. The big data technology can store and inquire mass data efficiently, and supports real-time processing and analysis of data. Through the big data platform, distributed processing and parallel computing of data can be realized, and the speed and efficiency of data processing are improved. Meanwhile, the big data technology can further carry out deep mining on the data, find potential values and rules in the data, and provide support for decision making.

Although some existing data management systems have attempted to apply artificial intelligence and big data technologies, challenges and problems remain. For example, how to design efficient algorithms that cope with complex data processing tasks is a problem that must be faced.

In view of the shortcomings of the prior art, there is a need for an intelligent data management system and method that uses artificial intelligence and big data techniques to solve the above problems.

Disclosure of Invention

The invention aims to provide an intelligent data management system and method applying artificial intelligence and big data technology, which solve the technical problem that no effective data management system exists for improving data management efficiency in the existing scheme.

The aim of the invention can be achieved by the following technical scheme:

The intelligent data management system applying the artificial intelligence and big data technology comprises a data management platform, wherein the data management platform is in communication connection with an acquisition module, a screening module, a classification module and a storage module:

The acquisition module is used for processing and analyzing the data uploaded by the sending end: a management period is generated taking as its starting point the moment at which the sending end generates the data upload request; the risk coefficient and the data transmission coefficient of the sending end in the management period are obtained; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if yes, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused;

The screening module is used for receiving the data transmitted by the acquisition module and carrying out screening processing, namely calculating regression distances of the data from different classification dimension centers on different classification dimension feature layers, wherein the classification dimension feature layers are generated according to similarity feature values when the data correspond to different classifications, judging whether the regression distances exceed preset distance values, if yes, the data meet classification requirements, transmitting the data to the classification module, if not, the data are refused to be transmitted to the classification module, and the data management platform carries out deleting processing on the data, wherein the preset distance values are the sum of minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirements are met;

The classification module is used for receiving the data transmitted by the screening module and performing classification processing, namely calculating a first classification score of the data based on the regression distance, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score;

The storage module is used for storing the classified data.
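The screening rule described for the screening module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not say how the per-layer regression distances are combined before the comparison, so the sketch assumes they are summed and compared with the preset distance value (itself the sum of the per-layer minimum distances). The function name `passes_screening` and its parameters are hypothetical.

```python
from typing import Sequence


def passes_screening(
    regression_distances: Sequence[float],
    min_layer_distances: Sequence[float],
) -> bool:
    """Return True if the data should be forwarded to the classification module.

    Assumption: the regression distances on all feature layers are summed and
    compared with the preset distance value, which is the sum of the minimum
    distance values on each classification dimension feature layer.
    """
    if len(regression_distances) != len(min_layer_distances):
        raise ValueError("one distance is expected per feature layer")

    # Preset distance value: sum of the per-layer minimum distances.
    preset_distance = sum(min_layer_distances)

    # Data is forwarded only when its total regression distance exceeds the preset value.
    return sum(regression_distances) > preset_distance
```

Data for which the function returns False would, per the description, be withheld from the classification module and deleted by the data management platform.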

Further, the acquiring the risk coefficient of the sending end in the management period specifically includes the following steps:

Acquiring and analyzing risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate;

Substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, wherein the risk coefficient calculation formula is as follows:

;

wherein the formula contains weights and a constant correction coefficient; the specific value of the constant correction coefficient can be adjusted and set by the user according to the size of the data.

Further, the step of obtaining the data transmission coefficient of the transmitting end in the management period specifically includes the following steps:

The uploading speed of the data transmission of the sending end is detected in real time during the management period to obtain the uploading speed value at each detection point; an uploading speed curve is drawn from these values; the integral value of the closed region enclosed by a preset uploading speed curve and the measured uploading speed curve is calculated; and this integral value is recorded as the data transmission coefficient.
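The calculation of the data transmission coefficient can be sketched as a numerical integration. The following is a minimal sketch under stated assumptions: the text does not specify the integration method or sign convention, so the sketch assumes the enclosed area is approximated by applying the trapezoidal rule to the absolute difference between the two curves at the detection points. The function name and parameters are hypothetical.

```python
from typing import Sequence


def data_transmission_coefficient(
    times: Sequence[float],
    measured: Sequence[float],
    preset: Sequence[float],
) -> float:
    """Approximate the area enclosed by the measured and preset upload speed curves.

    Assumption: the area is estimated with the trapezoidal rule applied to
    |measured - preset| sampled at the detection points.
    """
    if not (len(times) == len(measured) == len(preset)):
        raise ValueError("all sequences must have the same length")
    if len(times) < 2:
        raise ValueError("at least two detection points are required")

    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("detection times must be strictly increasing")
        # Gap between the two curves at both ends of the interval.
        gap_prev = abs(measured[i - 1] - preset[i - 1])
        gap_curr = abs(measured[i] - preset[i])
        # Trapezoid area for this interval.
        area += 0.5 * (gap_prev + gap_curr) * dt
    return area
```

With sparse detection points the estimate is coarse, especially where the two curves cross between samples; denser sampling reduces that error.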

Further, judging whether the data acquisition requirement is met according to the risk coefficient and the data transmission coefficient specifically comprises the following steps:

The risk coefficient and the corresponding data transmission coefficient are obtained through the storage module, together with the weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient. The product of the risk coefficient and its weight and the product of the data transmission coefficient and its weight are calculated, and the sum of the two products is recorded as a comparison value. Whether the comparison value is larger than a preset acquisition requirement value is then judged: if so, the data meets the acquisition requirement; if not, the data does not meet the acquisition requirement.
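The acquisition decision can be illustrated with a short sketch. It assumes that each coefficient is multiplied by its own weight and that the two products are summed, which is one reading of the passage above; the function names, parameter names and example values are hypothetical.

```python
def comparison_value(
    risk_coefficient: float,
    transmission_coefficient: float,
    risk_weight: float,
    transmission_weight: float,
) -> float:
    """Weighted sum of the risk coefficient and the data transmission coefficient."""
    return risk_weight * risk_coefficient + transmission_weight * transmission_coefficient


def meets_acquisition_requirement(
    risk_coefficient: float,
    transmission_coefficient: float,
    risk_weight: float,
    transmission_weight: float,
    required_value: float,
) -> bool:
    """Return True when the comparison value is larger than the preset requirement value."""
    value = comparison_value(
        risk_coefficient, transmission_coefficient, risk_weight, transmission_weight
    )
    return value > required_value


# Example with illustrative numbers: 0.6 * 0.8 + 0.4 * 0.5 is about 0.68.
accepted = meets_acquisition_requirement(0.8, 0.5, 0.6, 0.4, required_value=0.6)
```

The comparison is strict, so a comparison value equal to the preset requirement value is treated as not meeting the requirement.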

Further, calculating the first classification score of the data based on the regression distance specifically includes the following:

Obtaining the regression distances, wherein the number of classification dimension feature layers is 4 and the four regression distances are the regression distances of the data from the respective classification dimension feature layers;

Calculating a first classification score according to a data proximity calculation formula, wherein the data proximity calculation formula is as follows:

;

wherein the first two terms of the formula each denote the minimum value of two regression distances, and the last two terms each denote the maximum value of two regression distances.

Further, multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score specifically includes the following steps:

Obtaining the classification confidence of the classification branch prediction based on an SVM (support vector machine);

Multiplying the first classification score by the classification confidence of the classification branch prediction to obtain a second classification score;

Classifying the data according to the second classification score:

Creating a reference cluster; setting the range radius of the reference cluster according to its corresponding reference score; constructing a neighborhood in a plane space; scattering the data into the plane space according to the second classification score; storing data whose second classification score lies within the neighborhood of the reference cluster in one set, and storing data whose second classification score does not lie within the neighborhood of the reference cluster in another set.

Further, the reference cluster is created based on the k-means algorithm.

An intelligent data management method applying artificial intelligence and big data technology, the method comprises the following steps:

S1, processing and analyzing the data uploaded by the sending end: a management period is generated taking as its starting point the moment at which the sending end generates the data upload request; the risk coefficient and the data transmission coefficient of the sending end in the management period are acquired; whether the data acquisition requirement is met is judged according to the risk coefficient and the data transmission coefficient; if so, the data sent by the sending end is received and transmitted to the screening module; if not, the data sent by the sending end is refused;

S2, receiving the transmitted data and screening it: the regression distances of the data from the different classification dimension centers on the different classification dimension feature layers are calculated, wherein the classification dimension feature layers are generated according to similarity feature values when the data corresponds to different classifications; whether the regression distance exceeds a preset distance value is judged; if yes, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes the data; the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met;

S3, receiving the data transmitted by the screening module and classifying it: a first classification score of the data is calculated based on the regression distance, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score;

And storing the classified data.

Compared with the prior art, the invention has the beneficial effects that:

On the one hand, the acquisition module processes and analyzes the data uploaded by the sending end: a management period is generated taking as its starting point the moment at which the sending end generates the data upload request, the risk coefficient and the data transmission coefficient of the sending end in the management period are obtained, and whether the data acquisition requirement is met is judged according to the two coefficients; if yes, the data sent by the sending end is received and transmitted to the screening module, and if not, the data is refused. When facing a large amount of complex data, data that does not meet the requirement can thus be intercepted by means of the risk coefficient and the data transmission coefficient, which saves the computing power of the data management system and further improves its management efficiency.

On the other hand, the screening module receives the data transmitted by the acquisition module and screens it: the regression distance of each position of the data from the different classification dimension centers on the different classification dimension feature layers is calculated, wherein the classification dimension feature layers are generated according to similarity feature values when the data corresponds to different classifications, and whether the regression distance exceeds a preset distance value is judged; if so, the data meets the classification requirement and is transmitted to the classification module; if not, the data is not transmitted to the classification module and the data management platform deletes it. The preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirement is met. Because the screening module screens the data received from the acquisition module, preliminary processing of the data can be completed quickly and data processing time is saved.

the storage module is used for storing the classified data.

Finally, the data can be classified rapidly through the classification module and the storage module, which improves the efficiency of data management.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to these drawings for a person having ordinary skill in the art.

FIG. 1 is a system block diagram of an intelligent data management system employing artificial intelligence and big data techniques in accordance with an embodiment of the present invention;

FIG. 2 is a workflow diagram of an intelligent data management system employing artificial intelligence and big data techniques in accordance with an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, steps, etc. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

This embodiment provides an intelligent data management system applying artificial intelligence and big data technology. FIG. 1 is a system block diagram of the intelligent data management system in the embodiment of the invention. As shown in FIG. 1, the system includes a data management platform, and the data management platform is communicatively connected with an acquisition module, a screening module, a classification module and a storage module:

The screening module is used for receiving the data transmitted by the acquisition module and carrying out screening processing, namely calculating regression distances between each position of the data on different classification dimension feature layers and different classification dimension centers, wherein the classification dimension feature layers are generated according to similarity feature values when the data correspond to different classifications, judging whether the regression distances exceed preset distance values, if yes, the data meet classification requirements, transmitting the data to the classification module, if not, the data are refused to be transmitted to the classification module, and the data management platform carries out deleting processing on the data, wherein the preset distance values are the sum of minimum distance values of the data on each classification dimension feature layer under the condition that the classification requirements are met;

the storage module is used for storing the classified data.

In summary, the acquisition module processes and analyzes the data uploaded by the sending end and judges, according to the risk coefficient and the data transmission coefficient, whether the data acquisition requirement is met; if yes, the data sent by the sending end is received and transmitted to the screening module. The screening module receives the data transmitted by the acquisition module and screens it by calculating the regression distance of each position of the data from the different classification dimension centers on the different classification dimension feature layers and judging whether the regression distance exceeds a preset distance value; if yes, the data is transmitted to the classification module. The classification module receives the data transmitted by the screening module and classifies it: a first classification score of the data is calculated based on the regression distance and multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and the data is classified according to the second classification score. The storage module stores the classified data. The data can thus be classified rapidly, which improves data management efficiency.

In some embodiments, FIG. 2 is a workflow diagram of the intelligent data management system applying artificial intelligence and big data technology according to an embodiment of the present invention. As shown in FIG. 2, obtaining the risk coefficient of the sending end in the management period specifically includes the following steps:

Step S201, acquiring and analyzing risk index information of the sending end in the management period to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate, wherein the acquisition process of the connection stability rate may include:

Method one: calculation based on the number of connection interruptions

Recording the number of connection interruptions:

The number of connection interruptions is recorded during the management period. This may be achieved by monitoring a network log, a system log, or a dedicated connection monitoring tool.

Recording the total number of connection attempts:

In the same management period, the number of all attempts to establish a connection is recorded. This includes successful connections and connections that could not be established for various reasons.

Calculating the connection stability rate:

Connection stability rate = (total number of connection attempts - number of connection interruptions) / total number of connection attempts × 100%.

Method two: calculation based on connection duration

Recording the total connection duration:

During the management period, the total duration (in milliseconds or seconds) of all successful connections is recorded.

Recording the connection interruption duration:

In the same management period, the total duration of all connection interruptions is recorded.

Calculating the connection stability rate:

Connection stability rate = (total connection duration - connection interruption duration) / total connection duration × 100%.
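Both ways of computing the connection stability rate reduce to the same ratio and can be sketched as follows. The function names and the input validation are illustrative additions; the formulas themselves are the two given above, with the result expressed as a percentage.

```python
def stability_rate_by_count(total_attempts: int, interruptions: int) -> float:
    """Method one: stability rate (in percent) from connection counts."""
    if total_attempts <= 0:
        raise ValueError("total_attempts must be positive")
    if not 0 <= interruptions <= total_attempts:
        raise ValueError("interruptions must lie between 0 and total_attempts")
    return (total_attempts - interruptions) / total_attempts * 100.0


def stability_rate_by_duration(total_duration: float, interruption_duration: float) -> float:
    """Method two: stability rate (in percent) from connection durations.

    Both durations must use the same unit (for example seconds).
    """
    if total_duration <= 0:
        raise ValueError("total_duration must be positive")
    if not 0 <= interruption_duration <= total_duration:
        raise ValueError("interruption_duration must lie between 0 and total_duration")
    return (total_duration - interruption_duration) / total_duration * 100.0


# Illustrative values: 5 interruptions in 200 attempts; 36 s interrupted out of 3600 s.
rate_by_count = stability_rate_by_count(200, 5)
rate_by_duration = stability_rate_by_duration(3600.0, 36.0)
```

The two methods can give different values for the same management period, since one counts events and the other measures time; which one is used is an embodiment choice.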

Step S202, substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, wherein the risk coefficient calculation formula is as follows:

;

In some embodiments, the acquiring the data transmission coefficient of the transmitting end in the management period specifically includes the following procedures:

In some embodiments, calculating the first classification score of the data based on the regression distance specifically includes the following:

;

In some embodiments, the first classification score is multiplied by the classification confidence of the classification branch prediction to obtain a second classification score, and classifying the data according to the second classification score specifically includes the following steps:

The classification confidence of the classification branch prediction is obtained based on an SVM (Support Vector Machine). An SVM is a classification model based on statistical learning theory that separates samples of different classes by finding an optimal hyperplane. In an SVM, branch prediction is typically accomplished by calculating the distance of an input sample from the hyperplane, and samples farther from the hyperplane typically have higher confidence.
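The relationship between hyperplane distance, confidence and the second classification score can be sketched as below. This is a hedged illustration, not the patented method: the hyperplane parameters (`weights`, `bias`) are taken as given rather than trained, and the logistic mapping from distance to confidence, with its hypothetical `scale` parameter, is an assumption, since the text only states that samples farther from the hyperplane have higher confidence.

```python
import math
from typing import Sequence


def hyperplane_distance(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Signed distance of sample x from the hyperplane weights . x + bias = 0."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    norm = math.sqrt(sum(w * w for w in weights))
    if norm == 0.0:
        raise ValueError("weights must not all be zero")
    return (sum(w * xi for w, xi in zip(weights, x)) + bias) / norm


def classification_confidence(distance: float, scale: float = 1.0) -> float:
    """Map the unsigned distance to a confidence in [0.5, 1).

    Assumption: a logistic function is used, so that confidence grows
    monotonically with the distance from the hyperplane.
    """
    return 1.0 / (1.0 + math.exp(-scale * abs(distance)))


def second_classification_score(first_score: float, confidence: float) -> float:
    """Second classification score = first classification score x classification confidence."""
    return first_score * confidence
```

For example, a sample lying exactly on the hyperplane gets a confidence of 0.5 under this mapping, so its second classification score is half of its first classification score.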

Classifying the data according to the second classification score:

Further, creating reference clusters based on k-means algorithm:

The K-means algorithm is an unsupervised clustering algorithm that divides data into K clusters. The basic steps of creating a reference cluster based on the K-means algorithm are as follows:

Initializing:

K initial centroids are selected, either by randomly choosing K data points in the dataset as the initial centroids or by a more elaborate heuristic (e.g., K-means++).

Assigning clusters:

For each data point in the dataset, its distance to each centroid is calculated (e.g., using the Euclidean distance).

Each data point is assigned to the cluster whose centroid is closest to it.

Updating the centroids:

For each cluster, the mean (or median) of all data points within it is calculated and set as the new centroid.

Iterating:

The cluster assignment and centroid update steps are repeated until a stop condition is met (e.g., the centroids change by less than a given threshold, or a preset maximum number of iterations is reached).

Outputting:

The final K centroids and their corresponding clusters are the reference clusters based on the K-means algorithm.
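The steps above can be sketched in plain Python. This is a minimal illustration that uses random initialization, the Euclidean distance and the mean update; the function name, the parameter defaults and the handling of empty clusters (the previous centroid is kept) are choices made for this sketch and are not taken from the description.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Assign every point to the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def k_means(
    points: Sequence[Point],
    k: int,
    max_iter: int = 100,
    tol: float = 1e-6,
    seed: int = 0,
) -> Tuple[List[Point], List[int]]:
    """Return the final centroids and the cluster label of each point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Initialization: pick k distinct data points at random as initial centroids.
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(list(points), k)

    for _ in range(max_iter):
        # Assignment step.
        labels = _assign(points, centroids)

        # Update step: the new centroid is the mean of the cluster members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                # Empty cluster: keep the previous centroid (a choice of this sketch).
                new_centroids.append(centroids[c])

        # Stop when no centroid moved by more than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break

    # Final assignment so that labels match the returned centroids.
    return centroids, _assign(points, centroids)
```

A call such as `k_means([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2)` returns two centroids and one label per point; the numbering of the clusters depends on the random initialization.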

The invention also provides an intelligent data management method applying the artificial intelligence and big data technology, which comprises the following steps:

And storing the classified data.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave, etc.) manner. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more sets of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.

In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a division of some logic functions, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art or in a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An intelligent data management system using artificial intelligence and big data technology, characterized in that the system includes a data management platform, which is communicatively connected to an acquisition module, a screening module, a classification module and a storage module;

The acquisition module is used to process and analyze the data uploaded by the sender: a management cycle is generated taking as its starting point the time at which the sender generates the data upload request, the risk coefficient and data transmission coefficient of the sender within the management cycle are obtained, and whether the data acquisition requirement is met is judged based on the risk coefficient and the data transmission coefficient; if so, the data sent by the sender is received and transmitted to the screening module; if not, the data sent by the sender is rejected; the specific process of obtaining the risk coefficient of the sender within the management cycle includes the following:

Obtaining and parsing the risk indicator information of the sender within the management cycle to obtain the data encryption strength, the packet loss rate, the delay time and the connection stability rate;

Substituting the data encryption strength, the packet loss rate, the delay time and the connection stability rate into the risk coefficient calculation formula to obtain the risk coefficient, wherein the risk coefficient calculation formula is as follows:

;

where the weighting terms in the formula are weights, and the remaining term is a constant correction coefficient whose specific value can be adjusted and set by the user according to the size of the data;

The specific process of obtaining the data transmission coefficient of the sender within the management cycle includes the following steps:

Detect the upload speed of the sender's data transmission in real time within the management cycle, obtain the upload speed value at each detection point, draw an upload speed curve from these values, calculate the integral value of the closed region enclosed between the preset upload speed curve and the measured upload speed curve, and record the integral value as the data transmission coefficient;
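As an illustration only, the integral value described above could be approximated numerically as in the following minimal sketch. The function name `transmission_coefficient`, the use of the trapezoidal rule, and the treatment of the enclosed region as the absolute gap between the two curves at shared detection points are assumptions made for this example; the claim does not specify them.

```python
from typing import Sequence


def transmission_coefficient(
    times: Sequence[float],
    measured: Sequence[float],
    preset: Sequence[float],
) -> float:
    """Approximate the area enclosed between a measured and a preset
    upload speed curve, sampled at the same detection points.

    Assumption: the enclosed region is taken as the integral of the
    absolute difference between the curves, estimated with the
    trapezoidal rule. Crossings between samples are not located exactly.
    """
    if not (len(times) == len(measured) == len(preset)):
        raise ValueError("times, measured and preset must have equal length")
    if len(times) < 2:
        raise ValueError("at least two detection points are required")

    # Absolute gap between the two curves at each detection point.
    gaps = [abs(m - p) for m, p in zip(measured, preset)]

    area = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            raise ValueError("detection times must be strictly increasing")
        # Trapezoid between two consecutive detection points.
        area += 0.5 * (gaps[i - 1] + gaps[i]) * dt
    return area


if __name__ == "__main__":
    # Hypothetical samples: three detection points one second apart.
    print(transmission_coefficient([0, 1, 2], [8, 6, 8], [10, 10, 10]))
```

A denser set of detection points gives a closer approximation of the enclosed area.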

The screening module is used to receive the data transmitted by the acquisition module and perform screening processing: by calculating the regression distance between the position of the data on different classification dimension feature layers and the center of different classification dimensions, where the classification dimension feature layer is generated based on the similarity feature value of the data corresponding to different classifications, and judging whether the regression distance exceeds the preset distance value. If so, the data meets the classification requirements and the data is transmitted to the classification module. If not, the data is refused to be transmitted to the classification module, and the data management platform deletes the data, where the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer when the classification requirements are met;

The classification module is used to receive the data transmitted by the screening module and perform classification processing: the first classification score of the data is calculated based on the regression distance, and the first classification score is multiplied by the classification confidence predicted by the classification branch to obtain the second classification score, and the data is classified according to the second classification score:

Obtain the classification confidence predicted by the classification branch;

Multiply the first classification score by the classification confidence predicted by the classification branch to obtain a second classification score;

Classify the data based on the second classification score:

Create a benchmark cluster and its corresponding benchmark score, set the range radius of the benchmark cluster, and construct neighborhoods in the plane space; scatter the data in the plane space according to the magnitude of the second classification score; store the data whose second classification score lies within the neighborhood of the benchmark cluster in one set, and store the data whose second classification score does not lie within the neighborhood of the benchmark cluster in another set;

The storage module is used to store the classified data.
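The classification step of claim 1 can be sketched as follows, under stated assumptions. The names `Item` and `partition_by_neighborhood` are hypothetical, and the neighborhood test is simplified to a one-dimensional distance between the second classification score and the benchmark score, with the boundary counted as inside; the claim itself describes neighborhoods in a plane space and does not fix these details.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Item:
    """A data item with its first classification score and the
    classification confidence predicted by the classification branch."""
    name: str
    first_score: float
    confidence: float

    @property
    def second_score(self) -> float:
        # Second classification score = first score * confidence.
        return self.first_score * self.confidence


def partition_by_neighborhood(
    items: Sequence[Item],
    benchmark_score: float,
    radius: float,
) -> Tuple[List[Item], List[Item]]:
    """Split items into those whose second score lies within `radius`
    of the benchmark score and those whose score does not.

    Assumption: distance is measured along the score axis only.
    """
    inside: List[Item] = []
    outside: List[Item] = []
    for item in items:
        if abs(item.second_score - benchmark_score) <= radius:
            inside.append(item)
        else:
            outside.append(item)
    return inside, outside


if __name__ == "__main__":
    # Hypothetical data items and benchmark parameters.
    data = [Item("a", 1.0, 0.5), Item("b", 0.5, 0.5), Item("c", 1.0, 1.0)]
    near, far = partition_by_neighborhood(data, benchmark_score=0.5, radius=0.25)
    print([i.name for i in near], [i.name for i in far])
```

With several benchmark clusters, the same test could be repeated per cluster; how ties between overlapping neighborhoods are resolved is not stated in the claim.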

2. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that judging whether the data collection requirements are met according to the risk coefficient and the data transmission coefficient specifically includes the following process:

The weight corresponding to the risk coefficient and the weight corresponding to the data transmission coefficient are obtained through the storage module; the product of the risk coefficient and the weight corresponding to the risk coefficient, and the product of the data transmission coefficient and the weight corresponding to the data transmission coefficient are calculated, and the sum of the two products is recorded as the comparison value; it is determined whether the comparison value is greater than the preset collection requirement value, if so, it is determined that the data meets the collection requirements, if not, it is determined that the data does not meet the collection requirements.
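The comparison described in claim 2 amounts to a weighted sum followed by a threshold test. The sketch below is illustrative only; the function names and the numeric values in the usage lines are hypothetical, and in the claimed system the weights would be read from the storage module.

```python
def comparison_value(
    risk_coefficient: float,
    risk_weight: float,
    transmission_coefficient: float,
    transmission_weight: float,
) -> float:
    """Sum of the two weighted coefficients, as described in claim 2."""
    return (risk_coefficient * risk_weight
            + transmission_coefficient * transmission_weight)


def meets_collection_requirement(
    risk_coefficient: float,
    risk_weight: float,
    transmission_coefficient: float,
    transmission_weight: float,
    required_value: float,
) -> bool:
    """True only when the comparison value is strictly greater than
    the preset collection requirement value."""
    value = comparison_value(
        risk_coefficient, risk_weight,
        transmission_coefficient, transmission_weight,
    )
    return value > required_value


if __name__ == "__main__":
    # Hypothetical coefficients, weights and requirement value.
    print(comparison_value(2.0, 0.5, 4.0, 0.25))
    print(meets_collection_requirement(2.0, 0.5, 4.0, 0.25, required_value=1.5))
```

Because the claim says "greater than", a comparison value exactly equal to the preset requirement value is treated here as not meeting the collection requirements.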

3. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that calculating the first classification score of the data based on the regression distance specifically includes the following process:

Obtain four regression distances, where the number of classification dimension feature layers is 4 and each regression distance is the distance of the data from the corresponding classification dimension feature layer;

Calculate the first classification score using the data closeness calculation formula; the data closeness calculation formula is as follows:

;

where the first two terms each take the minimum value of a pair of regression distances, and the last two terms each take the maximum value of a pair of regression distances.

4. The intelligent data management system using artificial intelligence and big data technology according to claim 1, characterized in that the benchmark clusters are created based on the k-means algorithm.

5. An intelligent data management method using artificial intelligence and big data technology, characterized in that the method comprises the following steps:

S1: Process and analyze the data uploaded by the sender: Generate a management cycle starting from the time when the sender generates a request to upload data, obtain the risk coefficient and data transmission coefficient of the sender within the management cycle, and judge whether the data collection requirements are met according to the risk coefficient and data transmission coefficient. If so, receive the data sent by the sender and transmit the data to the screening module. If not, refuse to receive the data sent by the sender; wherein, obtaining the risk coefficient of the sender within the management cycle specifically includes the following process:

;

S2: Receive the transmitted data and perform screening processing: by calculating the regression distance between the position of the data on different classification dimension feature layers and the center of different classification dimensions, where the classification dimension feature layer is generated based on the similarity feature value of the data corresponding to different classifications, determine whether the regression distance exceeds the preset distance value, if so, the data meets the classification requirements, and the data is transmitted to the classification module, if not, refuse to transmit the data to the classification module, and the data management platform deletes the data, where the preset distance value is the sum of the minimum distance values of the data on each classification dimension feature layer when the classification requirements are met;

S3: Receive the data transmitted by the screening module and perform classification processing: calculate the first classification score of the data based on the regression distance, multiply the first classification score by the classification confidence predicted by the classification branch to obtain the second classification score, and classify the data according to the second classification score:

Classify the data based on the second classification score:

The classified data is stored.