CN118035864B

CN118035864B - Data security transmission method and system based on intelligent cloud terminal

Info

Publication number: CN118035864B
Application number: CN202410436938.1A
Authority: CN
Inventors: 杨坚; 刘俊
Original assignee: Shenzhen Yuanji Technology Co ltd
Current assignee: Shenzhen Yuanji Technology Co ltd
Priority date: 2024-04-12
Filing date: 2024-04-12
Publication date: 2024-06-14
Anticipated expiration: 2044-04-12
Also published as: CN118035864A

Abstract

The invention relates to the technical field of secure data transmission, and in particular to a secure data transmission method and system based on an intelligent cloud terminal. The method comprises the following steps: collecting normal samples and invaded samples; performing hierarchical clustering on all the invaded samples to obtain a hierarchical clustering tree, and determining the feature clusters of the most preferred layer; obtaining the feature vectors of the normal samples and of the feature clusters; obtaining a distinguishing feature vector according to the differences between the feature vectors of the normal samples and those of the invaded samples of each feature cluster; obtaining feature matching sequences according to the matching between the feature vectors of each feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree, and screening out feature vector sequences; determining a unique characteristic value according to the trend of the feature values in each feature vector sequence, determining a base vector, and performing data interpolation in combination with the unique characteristic value to obtain training samples; and training a classifier with the training samples and securing data transmission by means of the classifier. The invention improves the security of the data transmission process.

Description

Data security transmission method and system based on intelligent cloud terminal

Technical Field

The invention relates to the technical field of data security transmission, in particular to a data security transmission method and system based on an intelligent cloud terminal.

Background

When monitoring the running state of an intelligent cloud terminal in real time and troubleshooting its faults, the operation log data of the terminal must first be transmitted to the cloud. To ensure that this operation log data can be transmitted securely, a classifier is usually used to classify the operation log data so as to detect potential intrusion behavior during operation, so that measures can be taken in time to prevent data leakage. However, if a new intrusion behavior differs greatly from the intrusion behaviors in the training samples of the classifier, the classifier may fail to identify it accurately. Therefore, when training the classifier, data enhancement can be used to generate training samples that simulate novel intrusion behaviors that have not yet appeared, which improves the detection performance of the classifier and helps ensure secure data transmission. Existing methods usually enhance data by interpolation, but they do not consider the uniqueness of different intrusion behaviors and interpolate at a fixed density, so the classifier detects intrusions poorly and the security of the data transmission process is low.

Disclosure of Invention

In order to solve the technical problems that the detection effect of a classifier is poor and the safety of a data transmission process is low due to the fact that the existing data enhancement method uses fixed density for data interpolation, the invention aims to provide a data safety transmission method and system based on an intelligent cloud terminal, and the adopted technical scheme is as follows:

In a first aspect, the present invention provides a data security transmission method based on a smart cloud terminal, including:

Collecting operation log samples of the intelligent cloud terminal, wherein the operation log samples comprise normal samples and invaded samples; performing hierarchical clustering on all the invaded samples to obtain a hierarchical clustering tree; determining the feature clusters in the clustering result of the most preferred layer according to the difference between the data fluctuation of each cluster in each layer of the hierarchical clustering tree and the overall data fluctuation;

Extracting features of the normal samples to obtain feature vectors of the normal samples, and extracting features of the invaded samples in the feature cluster to obtain feature vectors of the feature cluster; obtaining distinguishing feature vectors between the normal sample and the feature cluster according to the feature vector difference condition between the normal sample and the invaded sample of the feature cluster;

Obtaining a feature matching sequence corresponding to each feature vector of each feature cluster according to the matching between the feature vectors of the feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree; screening out feature vector sequences according to the similarity distribution between the feature matching sequences and the distinguishing feature vector corresponding to the same feature cluster;

Determining a unique characteristic value of the feature vector sequence according to the change trend of the feature value corresponding to each feature vector in the feature vector sequence, determining a base vector by using the feature vector sequence, and carrying out data interpolation by combining the unique characteristic value to obtain a training sample; and training a classifier by using the training sample, and carrying out safe transmission on the data through the classifier.

Preferably, determining the feature clusters in the clustering result of the most preferred layer according to the difference between the data fluctuation of each cluster in each layer of the hierarchical clustering tree and the overall data fluctuation specifically includes:

Calculating the variance of all invaded samples and recording it as the first variance; calculating the intra-class variance of each cluster in each layer of the hierarchical clustering tree and recording it as the second variance of that cluster; recording the difference between the first variance and each second variance as the difference characteristic value of the corresponding cluster; recording the layer containing the cluster with the maximum difference characteristic value as the most preferred layer; and recording all clusters in the most preferred layer as feature clusters.

Preferably, the feature extraction of the normal sample to obtain a feature vector of the normal sample specifically includes:

Taking each normal sample as a row element of the matrix, and forming a normal matrix by all the normal samples; SVD (singular value decomposition) is carried out on the normal matrix to obtain a right singular matrix of the normal matrix, and a column vector of the right singular matrix of the normal matrix is used as a feature vector of a normal sample;

the method for extracting the characteristics of the invaded sample in the characteristic cluster to obtain the characteristic vector of the characteristic cluster specifically comprises the following steps:

For any one feature cluster, taking each invaded sample as a row element of a matrix, and forming the invaded matrix of the feature cluster by all the invaded samples in the feature cluster;

SVD decomposition is carried out on the invaded matrix of the feature cluster to obtain a right singular matrix of the invaded matrix, and a column vector of the right singular matrix of the invaded matrix is used as a feature vector of the invaded matrix of the feature cluster.

Preferably, the obtaining the distinguishing feature vector between the normal sample and the feature cluster according to the feature vector difference between the normal sample and the invaded sample of the feature cluster specifically includes:

marking any one feature cluster as a target feature cluster; taking each feature vector of the normal sample as a left node, and taking each feature vector of each invaded sample in the target feature cluster as a right node;

Obtaining one-to-one matching pairs between each left node and each right node by using a KM matching algorithm, and obtaining a characteristic value corresponding to each characteristic vector in each matching pair; for any matching pair, calculating a normalization value of the absolute value of the difference value between the characteristic values corresponding to the characteristic vectors in the matching pair to obtain a characteristic difference value of the matching pair;

Taking the average value of the feature vectors in the matched pair corresponding to the feature difference value larger than the preset difference threshold value as a distinguishing vector, and taking the average value vector of all the distinguishing vectors corresponding to the normal sample and the target feature cluster as the distinguishing feature vector between the normal sample and the target feature cluster.

Preferably, obtaining the feature matching sequence corresponding to each feature vector of each feature cluster according to the matching between the feature vectors of the feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree specifically includes:

Marking any one feature cluster as the selected feature cluster; starting from the selected feature cluster in the hierarchical clustering tree and moving up layer by layer, collecting all clusters that contain the selected feature cluster to form the feature class sequence of the selected feature cluster; and marking each element of the feature class sequence as a comparison class;

Obtaining the feature vectors of each comparison class in the feature class sequence, and using the KM matching algorithm to match the feature vectors of every two adjacent comparison classes one to one, in order, to obtain the vector groups of every two adjacent comparison classes, wherein each vector group contains one feature vector from each of the two adjacent comparison classes;

Starting from each feature vector of the selected feature cluster, constructing the feature matching sequence corresponding to that feature vector according to the order of the comparison classes in the feature class sequence, wherein every two adjacent feature vectors in the feature matching sequence belong to the same vector group.
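The chaining of vector groups into feature matching sequences can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_matching_sequences`, the data layout, and the hard-coded matches are all hypothetical, and the matches are assumed to have been produced beforehand by a one-to-one matching step such as KM.

```python
def build_matching_sequences(class_vectors, matches):
    """Chain matched feature vectors upward through the comparison classes.

    class_vectors[k] holds the feature vectors of the k-th comparison class;
    index 0 is the selected feature cluster itself (assumed layout).
    matches[k] maps a vector index in class k to its partner index in class k+1.
    """
    sequences = []
    for start in range(len(class_vectors[0])):
        idx = start
        seq = [class_vectors[0][start]]
        for k, match in enumerate(matches):
            if idx not in match:
                break  # this vector has no partner in the next class
            idx = match[idx]
            seq.append(class_vectors[k + 1][idx])
        sequences.append(seq)
    return sequences

# Toy data: three comparison classes with two feature vectors each.
class_vectors = [
    [[1.0, 0.0], [0.0, 1.0]],  # selected feature cluster
    [[0.0, 1.0], [1.0, 0.0]],  # cluster one layer up
    [[0.6, 0.8], [0.8, 0.6]],  # cluster two layers up
]
# Hypothetical one-to-one matches between adjacent classes.
matches = [{0: 1, 1: 0}, {0: 1, 1: 0}]
sequences = build_matching_sequences(class_vectors, matches)
```

Each resulting sequence starts at one feature vector of the selected feature cluster and contains one vector per comparison class, so adjacent entries always come from the same vector group.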

Preferably, the screening the feature vector sequence according to the similarity distribution between the feature matching sequence and the distinguishing feature vectors corresponding to the same feature cluster specifically includes:

marking any one feature vector of the selected feature cluster as a selected feature vector, and calculating the average value of all feature vectors in a feature matching sequence corresponding to the selected feature vector to obtain a selected reference vector; taking cosine similarity between the selected reference vector and the distinguishing feature vector of the selected feature cluster as a reference coefficient of a feature matching sequence corresponding to the selected feature vector;

And marking the feature matching sequence corresponding to the reference coefficient larger than the preset similarity threshold value as a feature vector sequence of the selected feature cluster.
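The screening step can be sketched as below, assuming the feature matching sequences and the distinguishing feature vector are already available as plain lists. The function names and the threshold value 0.8 are hypothetical; the text above only says that the similarity threshold is preset.

```python
from math import sqrt

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def screen_sequences(sequences, distinguishing, threshold):
    """Keep sequences whose reference coefficient exceeds the threshold."""
    kept = []
    for seq in sequences:
        reference = mean_vector(seq)            # selected reference vector
        coeff = cosine(reference, distinguishing)  # reference coefficient
        if coeff > threshold:
            kept.append((seq, coeff))
    return kept

sequences = [
    [[1.0, 0.0], [0.8, 0.6]],  # points roughly along the distinguishing vector
    [[0.0, 1.0], [0.0, 1.0]],  # orthogonal to it
]
distinguishing = [1.0, 0.0]
kept = screen_sequences(sequences, distinguishing, threshold=0.8)  # 0.8 is an assumed value
```

Only the first sequence survives here, because the mean of its vectors is nearly parallel to the distinguishing feature vector.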

Preferably, the determining the unique characterization value of the feature vector sequence according to the variation trend of the feature value corresponding to each feature vector in the feature vector sequence specifically includes:

Acquiring the feature value corresponding to each feature vector in each feature vector sequence; then, for any one feature vector sequence:

Taking a feature value corresponding to each feature vector in the feature vector sequence as an ordinate of each feature vector, taking an order value of the feature vector in the feature vector sequence as an abscissa, obtaining a coordinate point of each feature vector in the feature vector sequence, and obtaining a projection value corresponding to each coordinate point by using a Principal Component Analysis (PCA) algorithm;

and taking the numerical value of the arctangent angle of the ratio of the ordinate and the abscissa of the coordinate point corresponding to the maximum value of the projection values as the unique characteristic value of the characteristic vector sequence.
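One possible reading of this computation is sketched below. Several points are assumptions rather than statements from the text: the order values start at 1, the "projection value" is the projection of the mean-centered point onto the first principal component, the component is oriented so that its abscissa part is non-negative, and the angle is returned in radians. The function name `unique_characteristic_value` is hypothetical.

```python
from math import atan2, cos, sin

def unique_characteristic_value(feature_values):
    """Angle of the point with the largest projection on the first principal axis."""
    # Coordinate points: abscissa = order in the sequence (1-based, assumed),
    # ordinate = feature value of the vector at that position.
    points = [(float(i + 1), float(v)) for i, v in enumerate(feature_values)]
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form direction of the largest-variance axis of a 2x2 covariance
    # matrix; theta lies in (-pi/2, pi/2], so cos(theta) >= 0.
    theta = 0.5 * atan2(2.0 * sxy, sxx - syy)
    ux, uy = cos(theta), sin(theta)
    # Projection of each centered point onto the first principal component.
    projections = [(x - mx) * ux + (y - my) * uy for x, y in points]
    best = max(range(n), key=lambda i: projections[i])
    bx, by = points[best]
    return atan2(by, bx)  # equals arctan(by / bx) because bx >= 1

value = unique_characteristic_value([9.0, 4.0, 2.0, 1.0])
```

For this decaying sequence the point with the largest projection is the last one, (4, 1), so the result is arctan(1/4). A different sign convention for the principal component would select a different point, which is why the orientation is called out as an assumption.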

Preferably, the determining a basis vector by using the feature vector sequence and performing data interpolation by combining the unique characteristic value to obtain a training sample specifically includes:

For any one of the feature vector sequences, taking the average value of all feature vectors in the feature vector sequences as a base vector, taking the unique characteristic value of the feature vector sequences corresponding to the base vector as the density of interpolation points in the direction of the base vector, performing bilinear interpolation to obtain incremental data, and taking the incremental data, an invaded sample and a normal sample as training samples.

Preferably, the hierarchical clustering of all the invaded samples to obtain a hierarchical clustering tree specifically includes: and performing bottom-up hierarchical clustering on all the invaded samples to obtain a hierarchical clustering tree.

In a second aspect, the present invention provides a data security transmission system based on a smart cloud terminal, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the computer program when executed by the processor implements the steps of a data security transmission method based on a smart cloud terminal.

The embodiment of the invention has at least the following beneficial effects:

In the invention, invaded samples and normal samples are first collected. Because intrusion behaviors are diverse, different invaded samples differ considerably, so the invaded samples are first classified and analyzed to obtain a hierarchical clustering tree and the clustering result of the most preferred layer, which has the best classification effect; the feature clusters can thus accurately represent the intrusion mode corresponding to each invaded sample. Next, features are extracted from the normal samples and from the invaded samples of each class, and the differences between the features of the normal samples and the features of the feature cluster corresponding to each intrusion mode are compared to obtain distinguishing feature vectors, which reflect the feature differences between the normal samples and the invaded samples. Further, the distribution of the parent nodes of each feature cluster in the hierarchical clustering tree is analyzed and combined with the matching relations among feature vectors to obtain feature matching sequences; each feature matching sequence is the set of feature vectors that are similar to or matched with a given feature vector as the layer number increases. The feature matching sequences are then screened according to the similarity distribution between each feature matching sequence and the distinguishing feature vector corresponding to the same feature cluster, which yields the sequences that are close to the distinguishing features, namely the feature vector sequences.
Finally, the trend of the feature values in each feature vector sequence is analyzed to quantify the unique feature of that sequence, a new feature dimension space is constructed from the feature vector sequences, and new samples are constructed by interpolation in combination with the unique characteristic values so as to simulate new intrusion behaviors that have not been seen. This achieves data enhancement and increases the diversity of the training samples, so that the classifier detects better and adapts more easily to intrusion behaviors of unknown types, which improves the security of the data transmission process.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a step flowchart of a data security transmission method based on a smart cloud terminal according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a hierarchical clustering tree provided by an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of a specific implementation, structure, characteristics and effects of the data security transmission method and system based on the intelligent cloud terminal according to the invention with reference to the accompanying drawings and the preferred embodiment. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The invention provides a data security transmission method and a data security transmission system based on an intelligent cloud terminal.

An embodiment of a data security transmission method based on a smart cloud terminal comprises the following steps:

Referring to fig. 1, a flowchart of a data security transmission method based on a smart cloud terminal according to an embodiment of the present invention is shown, where the method includes the following steps:

Step one, collecting operation log samples of the intelligent cloud terminal, wherein the operation log samples comprise normal samples and invaded samples; performing hierarchical clustering on all the invaded samples to obtain a hierarchical clustering tree; and determining the feature clusters in the clustering result of the most preferred layer according to the difference between the data fluctuation of each cluster in each layer of the hierarchical clustering tree and the overall data fluctuation.

It should be noted that the main purpose of this embodiment is to check the operation log data of the intelligent cloud terminal for possible intrusion behavior before transmitting it, so as to secure the transmission process. To this end, the operation log data in the intelligent cloud terminal system is first collected and converted into vectors so that a classifier can classify and identify it; the operation log data that may have been invaded is thereby identified and is encrypted before transmission, which guarantees the security and confidentiality of the transmission.

Accordingly, operation log samples of the intelligent cloud terminal are collected, comprising normal samples and invaded samples. Specifically, historical operation log data of the intelligent cloud terminal system is collected, including normal operation log data and invaded operation log data. A Word2Vec word embedding model is used to convert each piece of operation log data into a vector of the same length, and each vector is labeled: vectors of normal operation log data are labeled 0 and vectors of invaded operation log data are labeled 1. The vector of a piece of normal operation log data is called a normal sample, and the vector of a piece of invaded operation log data is called an invaded sample.

Since different intrusion behaviors in the historical data have different data features, the invaded samples must be classified and analyzed to obtain a clustering result for each kind of feature. That is, hierarchical clustering is performed on all the invaded samples to obtain a hierarchical clustering tree; in this embodiment, bottom-up (agglomerative) hierarchical clustering is used. Hierarchical clustering reveals the hierarchical distribution of features among different invaded samples.
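A bottom-up clustering pass that records every layer can be sketched as follows. This is a minimal illustration under stated assumptions: clusters are merged by smallest squared distance between centroids (the text does not name a linkage criterion), each merge is treated as one layer, and the function name `agglomerative_layers` is hypothetical.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(samples, members):
    """Mean vector of the samples whose indices are in members."""
    dim = len(samples[0])
    return [sum(samples[i][d] for i in members) / len(members) for d in range(dim)]

def agglomerative_layers(samples):
    """Return all layers of the tree, from singletons up to one root cluster."""
    clusters = [[i] for i in range(len(samples))]
    layers = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Pick the two clusters with the closest centroids (assumed linkage).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: sq_dist(centroid(samples, clusters[p[0]]),
                                  centroid(samples, clusters[p[1]])),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        layers.append([list(c) for c in clusters])
    return layers

# Toy invaded samples (already vectorized); indices identify the samples.
samples = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [10.0, 12.0], [30.0, 0.0]]
layers = agglomerative_layers(samples)
```

With these toy samples the two tight pairs merge first, then the pairs merge with each other, and the isolated sample joins last, which gives five layers in total.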

Further, by analyzing the classification effect of the clustering result of each layer in the hierarchical clustering tree, the layer whose clustering result classifies best can be determined. That is, the feature clusters in the clustering result of the most preferred layer are determined according to the difference between the data fluctuation of each cluster in each layer of the hierarchical clustering tree and the overall data fluctuation.

Specifically, the variance of all the invaded samples is calculated and recorded as the first variance, and the intra-class variance of each cluster in each layer is calculated and recorded as the second variance of that cluster. The difference between the first variance and each second variance is recorded as the difference characteristic value of the corresponding cluster, the layer containing the cluster with the maximum difference characteristic value is recorded as the most preferred layer, and all clusters in the most preferred layer are recorded as feature clusters.

In this embodiment, the absolute value of the difference between the first variance and the second variance of each cluster is used as the difference characteristic value of that cluster. The difference characteristic value reflects how much the distribution of each cluster differs from the overall distribution of the invaded data: the larger the difference, the better the data is separated and the better the corresponding clustering result. The clustering result of the most preferred layer is the one with the largest separation between data classes. Compared with the other layers of the hierarchical clustering tree, each cluster in the most preferred layer represents an intrusion mode more accurately.
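The layer selection can be sketched as follows. Two choices here are assumptions rather than statements from the text: the variance of a set of vectors is taken as the mean squared Euclidean distance to their centroid, and single-sample clusters are skipped, because their intra-class variance is zero and they would otherwise trivially produce the maximum. The function names and the `min_size` parameter are hypothetical.

```python
def variance(vectors):
    """Mean squared Euclidean distance to the centroid (assumed definition)."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum(sum((v[d] - mean[d]) ** 2 for d in range(dim)) for v in vectors) / n

def most_preferred_layer(samples, layers, min_size=2):
    """Return (layer index, value) for the largest difference characteristic value."""
    first = variance(samples)  # first variance: all invaded samples
    best_layer, best_value = None, -1.0
    for idx, layer in enumerate(layers):
        for cluster in layer:
            if len(cluster) < min_size:
                continue  # assumption: ignore degenerate singleton clusters
            second = variance([samples[i] for i in cluster])  # second variance
            value = abs(first - second)  # difference characteristic value
            if value > best_value:
                best_layer, best_value = idx, value
    return best_layer, best_value

samples = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [10.0, 12.0], [30.0, 0.0]]
# Layers of a hierarchical clustering tree, listed bottom-up (sample indices).
layers = [
    [[0], [1], [2], [3], [4]],
    [[0, 1], [2], [3], [4]],
    [[0, 1], [2, 3], [4]],
    [[0, 1, 2, 3], [4]],
    [[0, 1, 2, 3, 4]],
]
layer_index, value = most_preferred_layer(samples, layers)
feature_clusters = layers[layer_index]
```

Ties are resolved in favor of the lowest layer in which the best cluster first appears; the text does not say how ties should be handled.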

Step two, extracting features of the normal samples to obtain feature vectors of the normal samples, and extracting features of the invaded samples in the feature cluster to obtain feature vectors of the feature cluster; and obtaining the distinguishing feature vector between the normal sample and the feature cluster according to the feature vector difference condition between the normal sample and the invaded sample of the feature cluster.

It should be noted that, because intrusion modes differ, the data features of different invaded samples differ considerably, and the feature differences among invaded samples can be characterized by comparing how each kind of invaded sample differs from the normal samples. A data dimension space can then be constructed from the features that show such differences, and new sample data points can be interpolated in this space from the existing sample data points, so that invaded data with different features is better represented.

First, feature extraction must be performed on the normal samples and on the invaded samples separately. In this embodiment, all normal samples are treated as one normal cluster, and each cluster in the hierarchical clustering tree is treated as one unit of invaded samples for feature analysis.

Accordingly, each normal sample is taken as a row of a matrix, and all the normal samples together form the normal matrix; in this embodiment, each row of the normal matrix is the vector of one normal sample. SVD is performed on the normal matrix to obtain its right singular matrix, and the column vectors of the right singular matrix are used as the feature vectors of the normal matrix, that is, the feature vectors of the normal samples; the feature value corresponding to each feature vector is obtained at the same time. SVD is a well-known technique and is not described in further detail here.

It can be understood that SVD decomposes the normal matrix into a left singular matrix and a right singular matrix. The column vectors of the left singular matrix describe the column space of the normal matrix, and the column vectors of the right singular matrix describe its row space. Because each row of the normal matrix is a sample vector, the column vectors of the right singular matrix express the data features of the vectors in the normal matrix. They are therefore used as the feature vectors of the normal matrix; several feature vectors are obtained, each corresponding to one feature value.

The features of the invaded samples in each feature cluster are extracted in the same way to obtain the feature vectors of the feature cluster. Specifically, for any one feature cluster, each invaded sample is taken as a row of a matrix, and all the invaded samples in the feature cluster form the invaded matrix of the feature cluster; SVD is performed on the invaded matrix to obtain its right singular matrix, and the column vectors of the right singular matrix are used as the feature vectors of the invaded matrix of the feature cluster. The feature value corresponding to each feature vector of each feature cluster is obtained at the same time.
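The SVD-based feature extraction can be sketched with NumPy as follows. One point is an assumption: the "feature value" of each feature vector is taken to be the corresponding singular value, which the text does not state explicitly. The function name `extract_features` is hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np

def extract_features(sample_matrix):
    """Return (feature_vectors, feature_values) for a matrix whose rows are samples.

    feature_vectors[k] is the k-th column of the right singular matrix V;
    feature_values[k] is the matching singular value (assumed meaning of
    "feature value").
    """
    a = np.asarray(sample_matrix, dtype=float)
    # Compact SVD: a = U @ diag(s) @ Vt, with singular values in descending order.
    _, s, vt = np.linalg.svd(a, full_matrices=False)
    # The rows of Vt are exactly the columns of V.
    return vt, s

# Toy normal matrix: three normal samples of length two.
normal_matrix = [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
vectors, values = extract_features(normal_matrix)
```

The same function can be applied unchanged to the invaded matrix of any feature cluster. Note that singular vectors are only defined up to sign, so downstream comparisons that depend on direction may need a sign convention.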

It can be understood that, by the same method, features can be extracted from every cluster in the hierarchical clustering tree to obtain the feature vectors and corresponding feature values of every cluster. Then, by comparing the feature differences between the normal samples and the invaded samples in each feature cluster, the features that distinguish the invaded samples from the normal samples can be obtained. That is, the distinguishing feature vector between the normal samples and each feature cluster is obtained according to the differences between the feature vectors of the normal samples and those of the invaded samples of the feature cluster.

Specifically, any one feature cluster is marked as the target feature cluster. Each feature vector of the normal samples is taken as a left node and each feature vector of the invaded samples in the target feature cluster as a right node, and the KM (Kuhn-Munkres) matching algorithm is used to obtain one-to-one matching pairs between left nodes and right nodes. The feature value corresponding to each feature vector in each matching pair is also obtained; for any matching pair, the normalized absolute difference between the feature values of its two feature vectors is calculated as the feature difference value of the matching pair.

Taking any one feature cluster as an example, the feature vectors of the normal samples and the feature vectors of the invaded samples in the target feature cluster are matched one to one; that is, each matching pair contains one feature vector of the normal samples and one feature vector of the invaded samples in the target feature cluster.

Each matching pair represents a pair of corresponding, similar features of the normal samples and of the invaded samples in the target feature cluster. The feature value of each feature vector in a matching pair reflects the expression of that feature, so the feature difference value of each matching pair characterizes how differently the paired feature is expressed in the normal samples and in the invaded samples.

Further, the matching pairs with larger feature difference values are screened out to obtain the features that differ between the normal samples and the invaded samples of the class corresponding to the target feature cluster. In this embodiment, the difference threshold is set to 0.5; the implementer can set it according to the specific implementation scenario. Since the feature difference value is normalized, the difference threshold lies in the range (0, 1): the closer it is to 0, the looser the criterion for screening distinguishing features, and the closer it is to 1, the stricter the criterion.

Because the normal samples have several feature vectors and the target feature cluster also has several feature vectors of invaded samples, a feature difference value above the difference threshold indicates that the two feature vectors in the matching pair differ considerably. The features with larger differences obtained in this way can be used to construct the subsequent difference feature space.

Accordingly, for each matching pair whose feature difference value exceeds the difference threshold, the mean of the feature vector of the normal samples and the feature vector of the target feature cluster is taken as a distinguishing vector, which reflects the balanced difference feature between the normal samples and the invaded samples; each screened matching pair thus yields one distinguishing vector. Finally, the mean of all the distinguishing vectors is calculated to obtain the overall distinguishing feature vector between the normal samples and the target feature cluster. That is, the distinguishing feature vector reflects, in terms of overall distribution, the balanced difference feature between the normal samples and the invaded samples contained in the target feature cluster.
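The matching and screening described above can be sketched as follows. Several points are assumptions: the edge weight between a left node and a right node is cosine similarity, the normalization of the feature differences is min-max over all matching pairs, and the two sides have the same number of feature vectors. To keep the sketch short, an exhaustive search over permutations stands in for the KM algorithm; it finds the same maximum-weight matching but is only practical for small inputs. All function names are hypothetical.

```python
from itertools import permutations
from math import sqrt

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_matching(left, right):
    """Brute-force stand-in for KM: perm[i] is the right node matched to left node i."""
    return max(
        permutations(range(len(right))),
        key=lambda perm: sum(cosine(left[i], right[j]) for i, j in enumerate(perm)),
    )

def distinguishing_vector(left, left_vals, right, right_vals, threshold=0.5):
    """Mean of the pair-mean vectors whose normalized feature difference exceeds threshold."""
    perm = best_matching(left, right)
    diffs = [abs(left_vals[i] - right_vals[j]) for i, j in enumerate(perm)]
    lo, hi = min(diffs), max(diffs)
    # Min-max normalization (assumed; the text only says "normalized").
    norm = [(d - lo) / (hi - lo) if hi > lo else 0.0 for d in diffs]
    picked = [
        [(a + b) / 2 for a, b in zip(left[i], right[j])]  # one distinguishing vector
        for (i, j), n in zip(enumerate(perm), norm)
        if n > threshold
    ]
    if not picked:
        return None  # no matching pair passed the threshold
    dim = len(picked[0])
    return [sum(v[d] for v in picked) / len(picked) for d in range(dim)]

# Feature vectors and feature values of the normal samples (left nodes) ...
left = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
left_vals = [5.0, 3.0, 1.0]
# ... and of the target feature cluster (right nodes).
right = [[0.0, 1.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]]
right_vals = [2.8, 9.0, 1.0]
result = distinguishing_vector(left, left_vals, right, right_vals)
```

In this toy case only the first matching pair has a normalized difference above 0.5, so the distinguishing feature vector is simply the mean of the two vectors in that pair.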

In other embodiments, the accuracy and reliability of the matching can be further improved when distinguishing normal features from intrusion features. An edge can be set between each left node and each right node, with the edge value being the cosine similarity between the left node and the right node. The edges are then screened: only edges whose cosine similarity is larger than a preset edge weight threshold are retained, that is, only the matching relations with higher similarity are kept, and the distinguishing vectors are acquired from the retained edges. The edge weight threshold is 0.7; the implementer can set it according to the specific implementation scenario.
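As a rough sketch of this edge screening, the following Python fragment builds every left-right edge, weights it by cosine similarity, and keeps only the edges above the edge weight threshold. The function names and the sample vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retained_edges(left_nodes, right_nodes, edge_weight_threshold=0.7):
    """Return (left index, right index, weight) for edges above the threshold."""
    edges = []
    for i, left in enumerate(left_nodes):
        for j, right in enumerate(right_nodes):
            weight = cosine_similarity(left, right)
            if weight > edge_weight_threshold:
                edges.append((i, j, weight))
    return edges


# Illustrative vectors only.
left = [[1.0, 0.0], [0.0, 1.0]]
right = [[1.0, 0.1], [0.2, 1.0]]
edges = retained_edges(left, right)
kept_pairs = [(i, j) for i, j, _ in edges]
```

Here only the edges between nearly parallel vectors survive, so `kept_pairs` contains the pairs (0, 0) and (1, 1).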

Step three, obtaining a feature matching sequence corresponding to each feature vector of the feature cluster according to the matching between the feature vectors of the feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree; and screening out the feature vector sequences according to the similarity distribution between the feature matching sequences and the distinguishing feature vector corresponding to the same feature cluster.

It should be noted that, in the hierarchical clustering tree, taking the most preferred layer as the reference, the category scale grows as the layer number increases, and invaded samples of other clusters are continuously merged in. That is, each feature cluster of the most preferred layer is merged with other clusters as the layer number increases. If, during this continuous merging, the difference in distinguishing features between a feature cluster and the normal sample gradually increases, the feature dimension information of that feature cluster is biased toward the dimensions that separate normal samples from invaded samples in general. If the difference gradually decreases during merging, the feature dimension information is biased toward the dimensions specific to the intrusion behavior of the invaded samples of that feature cluster.

Based on this property, it is necessary to acquire the sequence of clusters that are successively merged in the direction of increasing layer number in the hierarchical clustering tree. The feature matching sequence corresponding to each feature vector of the feature cluster is obtained by analyzing the matching between the feature vectors of the feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree.

Specifically, any one feature cluster is marked as the selected feature cluster. Taking the selected feature cluster as the starting point in the hierarchical clustering tree, the feature class sequence of the selected feature cluster is obtained in order of increasing layer number: all the clusters that include the selected feature cluster form the feature class sequence, and each element in the feature class sequence is marked as a comparison class.

In this embodiment, fig. 2 shows a schematic diagram of a hierarchical clustering tree, in which X1, X2, X3, X4, X5, X6, and X7 respectively represent seven clusters. Assuming that one of these clusters is the selected feature cluster, then, starting from the selected feature cluster, all the clusters that include it are acquired in order of increasing layer number. These four clusters form the feature class sequence of the selected feature cluster, whose elements A1, A2, A3, and A4 represent the comparison classes. As the order of the elements in the feature class sequence increases, the previous comparison class merges with other clusters into a new comparison class of larger scale.
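The construction of a feature class sequence can be sketched in Python by walking from the selected feature cluster up through its parent nodes. The parent map below is a hypothetical tree over the names X1 to X7 invented for illustration; it is not the actual structure of fig. 2, which the text does not fully specify.

```python
def feature_class_sequence(selected, parent):
    """Collect the selected cluster and every enclosing cluster, bottom-up."""
    sequence = [selected]
    # Each parent cluster contains the previous one, so the scale only grows.
    while sequence[-1] in parent:
        sequence.append(parent[sequence[-1]])
    return sequence


# Hypothetical tree: X5 merges X1 and X2, X6 merges X5 and X3, X7 merges X6 and X4.
parent = {"X1": "X5", "X2": "X5", "X5": "X6", "X3": "X6", "X6": "X7", "X4": "X7"}
sequence = feature_class_sequence("X1", parent)
```

With this assumed tree, selecting X1 yields four comparison classes A1 to A4, namely X1, X5, X6, and X7 in that order.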

Then, the association between the features of every two adjacent comparison classes in the feature class sequence needs to be analyzed. That is, the feature vectors of each comparison class in the feature class sequence are acquired, and the feature vectors of every two adjacent comparison classes are matched one to one using the KM matching algorithm to obtain the vector groups of every two adjacent comparison classes, where each vector group contains one feature vector from each of the two adjacent comparison classes.

In this embodiment, the feature vectors of each comparison class in the feature class sequence are obtained by the same method as the feature vectors of the feature cluster. Taking the first comparison class A1 and the second comparison class A2 in the feature class sequence as an example, each feature vector of comparison class A1 is taken as a left node and each feature vector of comparison class A2 as a right node, and the KM matching algorithm is used to obtain a one-to-one matching between the feature vectors of comparison class A1 and those of comparison class A2, yielding a plurality of vector groups of comparison class A1 and comparison class A2.
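A small Python sketch can illustrate the one-to-one matching between two adjacent comparison classes. Two assumptions are made here and are not taken from the text: the edge weight between a left node and a right node is their cosine similarity, and the KM matching algorithm is replaced by an exhaustive search over assignments, which finds the same maximum-weight matching but is only practical for a handful of vectors.

```python
import math
from itertools import permutations


def cosine_similarity(u, v):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_adjacent_classes(left_vectors, right_vectors):
    """Return (left index, right index) pairs maximizing the total edge weight.

    Brute-force stand-in for KM matching; requires len(left) <= len(right).
    """
    if len(left_vectors) > len(right_vectors):
        raise ValueError("expected at most as many left nodes as right nodes")
    best_total, best_assignment = -math.inf, ()
    for assignment in permutations(range(len(right_vectors)), len(left_vectors)):
        total = sum(
            cosine_similarity(left_vectors[i], right_vectors[j])
            for i, j in enumerate(assignment)
        )
        if total > best_total:
            best_total, best_assignment = total, assignment
    return list(enumerate(best_assignment))


# Illustrative feature vectors of comparison classes A1 and A2.
a1_vectors = [[1.0, 0.0], [0.0, 1.0]]
a2_vectors = [[0.1, 1.0], [1.0, 0.2]]
vector_groups = match_adjacent_classes(a1_vectors, a2_vectors)
```

Each returned pair corresponds to one vector group. For realistic sizes, a polynomial-time implementation of the KM (Hungarian) algorithm would be used instead of the exhaustive search.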

For example, suppose comparison class A1 and comparison class A2 each contain several feature vectors. The vector groups of comparison class A1 and comparison class A2 are then the matched pairs produced by the KM matching algorithm; that is, each vector group includes one feature vector of comparison class A1 and one feature vector of comparison class A2. The vector groups of comparison class A1 and comparison class A2 also characterize the features that are most strongly associated between the two comparison classes.

Further, by the same method used to obtain the vector groups of comparison class A1 and comparison class A2, the vector groups of every two adjacent comparison classes in the feature class sequence can be obtained in turn. Then, according to the association between the feature vectors in the vector groups, the feature vector matched with each feature vector of the selected feature cluster in each of the other comparison classes, that is, the feature vector with similar features, can be obtained.

That is, taking each feature vector of the selected feature cluster as the starting point, the feature matching sequence corresponding to that feature vector is constructed from the vector groups, following the order of the comparison classes in the feature class sequence, such that every two adjacent feature vectors in the feature matching sequence belong to the same vector group.

In this embodiment, the selected feature cluster is the first comparison class A1 in the feature class sequence. Taking one feature vector of comparison class A1 for illustration: the vector groups of comparison class A1 and comparison class A2 include the group that pairs this feature vector with a feature vector of comparison class A2; the vector groups of comparison class A2 and comparison class A3 then include the group that pairs that feature vector of A2 with a feature vector of comparison class A3; and the vector groups of comparison class A3 and comparison class A4 include the group that pairs that feature vector of A3 with a feature vector of comparison class A4. According to the order of the comparison classes in the feature class sequence, the first feature vector in the feature matching sequence is the feature vector of the first comparison class, the second is the feature vector of the second comparison class, and so on. The feature matching sequence corresponding to the chosen feature vector of A1 therefore consists of these four feature vectors in order, and every two adjacent feature vectors in it belong to the same vector group.
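The chaining of vector groups into a feature matching sequence can be sketched as follows. The sketch assumes the matching between each pair of adjacent comparison classes is already available as an index map; the name `feature_matching_sequence` and the sample maps are hypothetical.

```python
def feature_matching_sequence(start_index, matchings):
    """Follow one feature vector through successive comparison classes.

    matchings[k] maps the index of a feature vector in comparison class k
    to the index of its matched feature vector in comparison class k + 1.
    Returns one index per comparison class reached.
    """
    sequence = [start_index]
    for mapping in matchings:
        current = sequence[-1]
        if current not in mapping:
            break  # the vector was left unmatched at this level
        sequence.append(mapping[current])
    return sequence


# Illustrative matchings for A1->A2, A2->A3, and A3->A4.
matchings = [{0: 1, 1: 0}, {0: 0, 1: 2}, {0: 0, 2: 1}]
sequence = feature_matching_sequence(0, matchings)
```

Adjacent entries of the returned list always come from the same vector group, which is the property required of a feature matching sequence.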

For a feature vector of the selected feature cluster, the feature vectors in its corresponding feature matching sequence characterize, at the progressively larger scales of the comparison classes, the features whose distribution is closest and most similar to that feature vector. The distinguishing feature vector of the selected feature cluster characterizes the distribution of distinguishing features between the selected feature cluster and the normal sample. Therefore, by comparing the similarity between the feature matching sequence corresponding to each feature vector of the feature cluster and the distinguishing feature vector of the same feature cluster, the feature matching sequences that are similar to the distinguishing feature vector can be screened out.

Specifically, any one feature vector of the selected feature cluster is marked as a selected feature vector, and the average value of all feature vectors in a feature matching sequence corresponding to the selected feature vector is calculated to obtain a selected reference vector; and taking cosine similarity between the selected reference vector and the distinguishing feature vector of the selected feature cluster as a reference coefficient of the feature matching sequence corresponding to the selected feature vector.
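The reference coefficient can be written out in a few lines of Python. This is a sketch only: the function names and sample vectors are assumptions, and the feature matching sequence is taken to be a list of equally sized vectors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine similarity of two equally sized vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reference_coefficient(matching_sequence, distinguishing_vector):
    """Cosine similarity between the sequence mean and the distinguishing vector."""
    selected_reference_vector = mean_vector(matching_sequence)
    return cosine_similarity(selected_reference_vector, distinguishing_vector)


# Illustrative data: a two-element matching sequence and a distinguishing vector.
matching_sequence = [[1.0, 0.0], [1.0, 2.0]]
distinguishing_vector = [1.0, 1.0]
coefficient = reference_coefficient(matching_sequence, distinguishing_vector)
```

In this example the selected reference vector is parallel to the distinguishing feature vector, so the reference coefficient is approximately 1.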

In this embodiment, taking one feature vector of the selected feature cluster as the selected feature vector, the mean of all elements in its feature matching sequence is the selected reference vector. The reference coefficient of the sequence approximately characterizes the similarity between the feature matching sequence of the selected feature vector and the distinguishing feature vector of the same feature cluster: the larger its value, the greater the similarity between the features represented by the feature vectors in the feature matching sequence and the distinguishing features of the corresponding feature cluster.

Further, the feature matching sequences corresponding to the feature vectors of the feature cluster are screened by their reference coefficients, so that the sequences relatively close to the features that distinguish the cluster from the normal samples are obtained. That is, each feature matching sequence whose reference coefficient is larger than a preset similarity threshold is marked as a feature vector sequence of the selected feature cluster. These feature vector sequences better represent the distinguishing features between the invaded samples and the normal samples.

In this embodiment, the similarity threshold is 0.7; the implementer can set it according to the specific implementation scenario. Because the cosine similarity is taken in this embodiment to lie in the range (0, 1), the similarity threshold also lies in (0, 1): the closer the threshold is to 0, the looser the criterion for screening feature matching sequences, and the closer it is to 1, the stricter the criterion.
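The effect of the similarity threshold can be seen in a short Python sketch. It assumes the reference coefficients have already been computed and are keyed by hypothetical labels `v1` to `v3`.

```python
def screen_sequences(reference_coefficients, similarity_threshold=0.7):
    """Return the keys of sequences whose coefficient exceeds the threshold."""
    if not 0.0 < similarity_threshold < 1.0:
        raise ValueError("similarity threshold is expected to lie in (0, 1)")
    return [
        key
        for key, coefficient in reference_coefficients.items()
        if coefficient > similarity_threshold
    ]


# Illustrative reference coefficients of three feature matching sequences.
coefficients = {"v1": 0.92, "v2": 0.55, "v3": 0.71}
kept_default = screen_sequences(coefficients)       # threshold 0.7
kept_strict = screen_sequences(coefficients, 0.9)   # stricter screening
kept_loose = screen_sequences(coefficients, 0.1)    # looser screening
```

Raising the threshold from 0.7 to 0.9 leaves only `v1`, while lowering it to 0.1 keeps all three sequences.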

When the reference coefficient of a feature matching sequence is larger than the similarity threshold, the similarity between the feature matching sequence and the corresponding distinguishing feature vector is large, so each feature vector in the feature matching sequence is, to a certain extent, similar or close to the distinguishing feature vector and represents the distinguishing feature difference between the invaded samples and the normal samples. This provides a data basis for the subsequent analysis of whether the feature difference increases with the order of the elements in the feature matching sequence.

In each feature vector sequence, if the feature values corresponding to the feature vectors gradually increase as the order of the feature vectors increases, the features represented by the feature vectors in the sequence are closer to the features common to different intrusion modes. If the feature values gradually decrease as the order increases, the features represented by the feature vectors in the sequence are closer to the specific intrusion mode represented by the feature cluster corresponding to the sequence.

Based on this property, the uniqueness of each feature vector sequence can be quantified by analyzing the trend of the feature values of the feature vectors in the sequence, so as to obtain the uniqueness distribution of each feature vector sequence. First, for each feature vector in each feature vector sequence, the magnitude of its corresponding feature value in the matrix formed by the corresponding category needs to be obtained using known techniques.

For any one feature vector sequence, the feature value corresponding to each feature vector is taken as the ordinate and the order value of the feature vector in the sequence is taken as the abscissa, which gives the coordinate point of each feature vector in the sequence. The projection value corresponding to each coordinate point is then obtained using the principal component analysis (PCA) algorithm.

For example, if a feature vector is the 3rd element in the feature vector sequence, its order value is 3. Each feature vector in the feature vector sequence corresponds to one coordinate point. All coordinate points of one feature vector sequence are used as the input of the PCA algorithm, which yields a plurality of two-dimensional vectors and the projection value of each two-dimensional vector in the principal component direction, so that the coordinate point of each feature vector in the sequence obtains a corresponding projection value.

The coordinate point with the largest projection value in the feature vector sequence is then selected to quantify the unique features. That is, the numerical value of the arctangent angle of the ratio of the ordinate to the abscissa of the coordinate point with the maximum projection value is used as the unique characteristic value of the feature vector sequence. For example, if the arctangent angle corresponding to the coordinate point is 80°, the numerical value is 80, and the unique characteristic value of the feature vector sequence containing the corresponding feature vector is 80.
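The computation of the unique characteristic value can be sketched in Python for the two-dimensional case. Several choices below are assumptions rather than details given in the text: the coordinate points are centered before projection, as is usual for PCA; the principal direction is oriented so that its abscissa component is non-negative, since the sign of a principal component is otherwise arbitrary; and the first principal direction is computed in closed form from the 2x2 covariance matrix.

```python
import math


def unique_characteristic_value(feature_values):
    """Arctangent angle (degrees) of the point with the largest PCA projection."""
    if not feature_values:
        raise ValueError("the feature vector sequence must not be empty")
    # Abscissa: order value starting at 1; ordinate: feature value.
    points = [(float(k), float(v)) for k, v in enumerate(feature_values, start=1)]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    # Angle of the major axis of the 2x2 covariance matrix; lies in (-90, 90] degrees,
    # so the direction's abscissa component is non-negative.
    theta = 0.5 * math.atan2(2.0 * cov_xy, var_x - var_y)
    direction = (math.cos(theta), math.sin(theta))
    projections = [
        (x - mean_x) * direction[0] + (y - mean_y) * direction[1] for x, y in points
    ]
    best_x, best_y = points[max(range(n), key=projections.__getitem__)]
    # Arctangent of ordinate / abscissa, reported as a number of degrees.
    return math.degrees(math.atan2(best_y, best_x))


# Illustrative feature values that grow linearly with the order value.
value = unique_characteristic_value([1.0, 2.0, 3.0])
```

For the illustrative input the points lie on the line through the origin with slope 1, the last point has the largest projection, and the resulting value is approximately 45.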

The unique characteristic value reflects the degree to which the feature value changes as the order in the feature vector sequence increases. The larger the unique characteristic value, the greater the degree of change and the stronger the feature uniqueness reflected by the corresponding feature vector sequence. The feature reflected by such a sequence can serve as a feature dimension for new samples, and more new sample points need to be added along that dimension; that is, the density of new samples along that feature dimension is larger.

Based on the feature vector sequence, a base vector is determined, and data interpolation is carried out by combining the unique characteristic values to obtain a training sample. Specifically, for any one feature vector sequence, taking the average value of all feature vectors in the feature vector sequence as a base vector, taking the unique characteristic value of the feature vector sequence corresponding to the base vector as the density of interpolation points in the direction of the base vector, performing bilinear interpolation to obtain incremental data, and taking the incremental data, an invaded sample and a normal sample as training samples.

In this embodiment, for the vector corresponding to each invaded sample in the pre-collected historical data, the projection value of the invaded sample on each base vector can be obtained by the inner product, which gives the coordinates of each invaded sample in the new sample space. Each base vector in the new sample space corresponds to one sample point cluster, and one sample point cluster corresponds to one invaded category. In the sample point cluster corresponding to each base vector, the unique characteristic value of the feature vector sequence corresponding to that base vector is used as the density of interpolation points in the direction of the base vector, and a plurality of interpolation points of the sample point cluster are obtained by bilinear interpolation. This completes the data sample increment operation and yields all the incremental data.
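The following Python sketch illustrates interpolation along one base vector. It is deliberately simplified and differs from the embodiment in labeled ways: it uses plain linear interpolation along the base vector direction instead of bilinear interpolation, whose exact setup the text does not detail, and it interprets the density as the number of evenly spaced interpolation points, which is an assumption. All names are hypothetical.

```python
import math


def interpolate_along_base(vector_sequence, samples, unique_value):
    """Generate new points spaced evenly along the base vector direction."""
    # Base vector: mean of all feature vectors in the feature vector sequence.
    count = len(vector_sequence)
    base = [sum(column) / count for column in zip(*vector_sequence)]
    norm = math.sqrt(sum(b * b for b in base))
    if norm == 0.0:
        raise ValueError("base vector must be non-zero")
    unit = [b / norm for b in base]
    # Coordinate of each invaded sample on the base vector (inner product).
    coords = [sum(s * u for s, u in zip(sample, unit)) for sample in samples]
    low, high = min(coords), max(coords)
    # Assumption: density is read as the number of interpolation points.
    n_points = max(0, int(round(unique_value)))
    step = (high - low) / (n_points + 1)
    return [[(low + k * step) * u for u in unit] for k in range(1, n_points + 1)]


# Illustrative data: a feature vector sequence and two invaded samples.
vector_sequence = [[1.0, 0.0], [3.0, 0.0]]
invaded_samples = [[0.0, 5.0], [4.0, 1.0]]
incremental_data = interpolate_along_base(vector_sequence, invaded_samples, 3)
```

A larger unique characteristic value produces more points between the smallest and largest sample coordinates, which corresponds to a higher density of new samples along that feature dimension.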

Finally, the incremental data together with the normal samples and invaded samples in the historical data are taken as training samples, a classifier is obtained by training with the AdaBoost method, and the data is transmitted securely by means of the classifier.
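To make the training step concrete, the following is a minimal, self-contained Python sketch of AdaBoost with decision stumps. It is an illustration under stated assumptions, not the implementation used in the embodiment: it handles only two classes (+1 for normal, -1 for invaded), whereas the embodiment distinguishes multiple invaded categories, and the function names, number of rounds, and sample data are hypothetical.

```python
import math


def train_adaboost(samples, labels, rounds=10):
    """Train decision stumps; returns a list of (alpha, feature, threshold, polarity)."""
    n = len(samples)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        # Exhaustively search for the stump with the lowest weighted error.
        for feature in range(len(samples[0])):
            for threshold in sorted({s[feature] for s in samples}):
                for polarity in (1, -1):
                    preds = [polarity if s[feature] <= threshold else -polarity
                             for s in samples]
                    error = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
                    if best is None or error < best[0]:
                        best = (error, feature, threshold, polarity, preds)
        error, feature, threshold, polarity, preds = best
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1.0 - error) / error)
        model.append((alpha, feature, threshold, polarity))
        # Increase the weights of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, sample):
    """Weighted vote of all stumps: +1 (normal) or -1 (invaded)."""
    score = sum(alpha * (polarity if sample[feature] <= threshold else -polarity)
                for alpha, feature, threshold, polarity in model)
    return 1 if score >= 0 else -1


# Illustrative one-dimensional training samples.
train_x = [[0.1], [0.2], [0.8], [0.9]]
train_y = [1, 1, -1, -1]
classifier = train_adaboost(train_x, train_y, rounds=3)
```

Once trained, `predict(classifier, sample)` can be applied to the feature vector of a new record; a multi-class variant or a library implementation would be needed to separate the individual invaded categories.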

When the classifier is used to classify real-time operation log data of the intelligent cloud terminal and a piece of operation log data is classified as belonging to a category of invaded data, corresponding security measures need to be taken for that piece of operation log data to ensure secure data transmission. For example, that piece of operation log data may be encrypted using a symmetric encryption algorithm.

In summary, because invaded samples differ considerably across intrusion modes, this embodiment analyzes the dimensions in which the unique features of the invaded samples lie, that is, the dimensions in which the feature values of invaded samples and normal samples differ most. A new feature dimension space is constructed from these dimensions, new samples are constructed in that space by interpolation from the existing sample points, and the synthesized samples produced by this data enhancement operation simulate new, previously unseen intrusion behavior. This increases the diversity of the training samples, so that the trained classifier adapts more easily to, and can be used to detect, unknown types of intrusion behavior.

An embodiment of a data security transmission system based on a smart cloud terminal:

The embodiment provides a data security transmission system based on a smart cloud terminal, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the computer program realizes the steps of the data security transmission method based on the smart cloud terminal when being executed by the processor. Since an embodiment of a data security transmission method based on a smart cloud terminal has been described in detail, it will not be described in detail herein.

The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application and are intended to be included within the scope of the application.

Claims

1. The data security transmission method based on the intelligent cloud terminal is characterized by comprising the following steps of:

determining a unique characteristic value of the feature vector sequence according to the change trend of the feature value corresponding to each feature vector in the feature vector sequence, determining a base vector by using the feature vector sequence, and carrying out data interpolation by combining the unique characteristic value to obtain a training sample; training a classifier by using the training sample, and carrying out safe transmission on the data through the classifier;

The method for obtaining the distinguishing feature vector between the normal sample and the feature cluster according to the feature vector difference condition between the normal sample and the invaded sample of the feature cluster specifically comprises the following steps:

Taking the average value of the feature vectors in the matched pair corresponding to the feature difference value larger than the preset difference threshold value as a distinguishing vector, and taking the average value vector of all the distinguishing vectors corresponding to the normal sample and the target feature cluster as the distinguishing feature vector between the normal sample and the target feature cluster;

the method for obtaining the feature matching sequence corresponding to each feature vector of the feature cluster according to the matching between the feature vectors of the feature cluster and the feature vectors of the clusters corresponding to its parent nodes in the hierarchical clustering tree specifically comprises the following steps:

Starting from each feature vector of the selected feature cluster, constructing a feature matching sequence corresponding to each feature vector of the selected feature cluster according to the arrangement sequence of the comparison categories in the feature category sequence, wherein every two adjacent feature vectors in the feature matching sequence belong to the same vector group;

The feature vector sequence screening according to the similarity distribution between the feature matching sequence and the distinguishing feature vector corresponding to the same feature cluster specifically comprises the following steps:

2. The method for securely transmitting data based on the intelligent cloud terminal according to claim 1, wherein the determining the feature cluster in the clustering result of the most preferred layer according to the difference between the data fluctuation of each cluster of each layer in the hierarchical clustering tree and the overall data fluctuation specifically comprises:

3. The method for securely transmitting data based on the intelligent cloud terminal according to claim 1, wherein the feature extraction of the normal sample is performed to obtain the feature vector of the normal sample, specifically comprising:

4. The method for securely transmitting data based on the intelligent cloud terminal according to claim 1, wherein the determining the unique characterization value of the feature vector sequence according to the variation trend of the feature value corresponding to each feature vector in the feature vector sequence specifically comprises:

5. The method for securely transmitting data based on the intelligent cloud terminal according to claim 1, wherein the determining a basis vector by using a feature vector sequence and performing data interpolation by combining a unique characterization value to obtain a training sample specifically comprises:

6. The method for securely transmitting data based on the intelligent cloud terminal according to claim 1, wherein the step of performing hierarchical clustering on all the invaded samples to obtain the hierarchical clustering tree is specifically as follows: and performing bottom-up hierarchical clustering on all the invaded samples to obtain a hierarchical clustering tree.

7. A smart cloud terminal based data security transmission system comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the computer program when executed by the processor implements the steps of a smart cloud terminal based data security transmission method according to any of claims 1-6.