CN119293670A - Data security management method and system applied to big data management platform - Google Patents

Data security management method and system applied to big data management platform

Info

Publication number
CN119293670A
Authority
CN
China
Prior art keywords
data
risk
threat
processing
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411806548.5A
Other languages
Chinese (zh)
Other versions
CN119293670B (en)
Inventor
相苗苗
任菲菲
耿梓嫣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tongliyu Technology Co ltd
Original Assignee
Nanjing Tongliyu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tongliyu Technology Co ltd
Priority to CN202411806548.5A
Publication of CN119293670A
Application granted
Publication of CN119293670B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data security management method and system applied to a big data management platform, and relates to the field of data security management. The method includes: acquiring a real-time dynamic data set of big data, dividing the data set, and adding the divided data to idle data processing queues; establishing a data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type; applying differential privacy to queries on stored data, and adjusting the amount of query noise in real time according to the data risk threat situation; screening the dynamic data for sensitive data and backing it up to a cache based on the abnormal data risk threat situation; and detecting the current processing speed and the backlog of pending tasks, and automatically adjusting resource allocation in combination with the abnormal data threat risk.

Description

Data security management method and system applied to big data management platform
Technical Field
The invention relates to the field of data security management, in particular to a data security management method and system applied to a big data management platform.
Background
With the rapid development of big data technology, data security has become increasingly important. Traditional data security management methods struggle to cope with the complex security threats found in big data environments, such as data leakage, data tampering and risky access. The prior art relies primarily on encryption and access control, but the performance and adaptability of these methods are challenged in big data environments. In addition, traditional data security management methods generally address the security problem of only one link in the data processing flow and lack security management covering the whole flow.
Therefore, how to realize a data security management method that covers the complete data processing life cycle, namely data processing distribution, data threat identification, secure database query, automatic sensitive data backup and adaptive adjustment of resource allocation, is a problem to be solved.
Disclosure of Invention
To solve the above technical problems, a data security management method applied to a big data management platform is provided; the technical scheme addresses the problems described in the background.
In order to achieve the above purpose, the invention adopts the following technical scheme:
The data security management method applied to the big data management platform comprises the following steps:
Acquiring a real-time dynamic data set of big data, dividing the data set, and adding the divided data to idle data processing queues;
Establishing a data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type;
Applying differential privacy to queries on stored data, and adjusting the amount of query noise in real time according to the data risk threat situation;
Screening the dynamic data for sensitive data and backing it up to a cache based on the abnormal data risk threat situation;
And detecting the current processing speed and the backlog of pending tasks, and automatically adjusting resource allocation in combination with the abnormal data threat risk.
Preferably, acquiring the real-time dynamic data set of big data, dividing the data set, and adding the divided data to idle data processing queues specifically includes:
Acquiring the dynamic data set within a set time window, setting the size of the data processing segmentation window based on the remaining available resources of the server, and segmenting the dynamic data set according to the segmentation window size;
Acquiring the load of every data processing thread and the task backlog of its data processing task queue, and calculating an idle evaluation value for each data processing thread by combining the thread's set initial load weight with the ratio of its task backlog to its processing rate;
Sorting the idle evaluation values of all processing threads in reverse order, and adding the segmented dynamic data to the task waiting queues of the data processing threads one by one according to the sorting result.
Preferably, establishing the data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type specifically includes:
Acquiring historical processing data of the system platform as training data, classifying the training data according to the set data classifications, and batch-labeling the classified data with its type;
Preprocessing the batch-labeled data and extracting features from it, establishing data classification feature vectors, selecting a convolutional neural network, training a data classification recognition model on the feature-extracted training data, and deploying the trained model into the system;
Setting the division of threat identification by data classification, and establishing a corresponding threat identification model for each set data classification, which specifically includes:
setting the risk threat classifications corresponding to the current data classification based on its historical risk classification processing results;
retrieving, from the training data of the current data classification, the data specifically involved in risk investigation results, marking it as risk source data, and labeling all risk source data according to the set threat classifications;
Preprocessing the risk source data and extracting features from it, establishing risk data threat classification feature vectors, selecting a convolutional neural network, and training a dedicated risk threat identification model for the current data classification on the feature-extracted risk source data;
Combining the dedicated risk threat identification models of all data classifications and deploying the combined model into the system;
setting corresponding response strategies based on how risk threats were handled historically, and establishing a risk threat-response strategy association;
Acquiring a data processing thread, identifying the data type of the real-time data being processed in the thread with the data classification recognition model, selecting the corresponding dedicated risk threat identification model based on the identified data type to identify risk threats, and automatically selecting a response strategy and handling the data risk threat according to the identification result and the risk threat-response strategy association.
Preferably, applying differential privacy to queries on stored data and adjusting the amount of query noise in real time according to the data risk threat situation specifically includes:
Acquiring all table structures of the database, selecting a privacy statistics field, and adjusting the set privacy budget in real time with a Sigmoid function based on the hit rate of the risk threat identification model within the currently set segmentation window, the specific expression being as follows:
[expression not reproduced in this text];
wherein ε and ε' are the set privacy budget and the adjusted privacy budget respectively, and f is the hit rate of the risk threat identification model;
Setting the update monitoring time window of the statistics field, obtaining the maximum insertion batch size of the privacy statistics field within the latest time window, and calculating the noise scale parameter in combination with the adjusted privacy budget ε';
Selecting a random value obeying the Laplace distribution, and generating Laplace noise from the noise scale and the selected random value, the specific expression being as follows:
[expression not reproduced in this text];
wherein N is the generated Laplace noise, sgn(U) is the sign of the random value, U is the random value, ΔW is the set insertion batch size of the privacy statistics field, and λ is the noise scale parameter;
and applying the generated noise to queries on the privacy statistics field to increase the privacy of statistical data queries.
Preferably, screening the dynamic data for sensitive data and backing it up to a cache based on the abnormal data risk threat situation specifically includes:
Acquiring the data type identification result of the real-time data being processed by a data processing thread, setting sensitive data classifications and sensitive operation keywords, retrieving the data within the sensitive data classifications that matches a sensitive operation keyword, marking the entire data item containing the keyword as sensitive data, and calculating the proportion of sensitive data in the total processed data;
acquiring the hit rate of the risk threat identification model within the currently set segmentation window, and calculating a sensitive data risk evaluation value as the weighted sum of the sensitive data proportion and the risk threat hit rate;
Setting a threshold for the sensitive data risk evaluation value, and storing all marked sensitive data in a cache as a backup when the risk evaluation value is greater than or equal to the threshold;
If the processing thread is not abnormally interrupted during the current task, clearing the cache backup after the task finishes; if an abnormal interruption occurs, recording the breakpoint, restoring the cached backup data and processing it again.
Preferably, detecting the current processing speed and the backlog of pending tasks, and automatically adjusting resource allocation in combination with the abnormal data threat risk specifically includes:
detecting the processing speed and pending task backlog of all current data processing threads, and recalculating the idle evaluation values of all current data processing threads;
Acquiring the sensitive data risk evaluation values of all current data processing threads, taking the ratio of the idle evaluation value to the sensitive data risk evaluation value as the resource allocation tendency coefficient, calculating the mean of the resource allocation tendency coefficients of all data processing threads, and recording the mean as the thread processing resource allocation reference line;
adjusting the computing resources and storage resources of each processing thread proportionally based on the deviation of the thread's resource allocation tendency coefficient from the thread processing resource allocation reference line.
Further, a data security management system applied to a big data management platform is provided for implementing the above data security management method, including:
the data acquisition and processing distribution module, which acquires the dynamic data set within a set time window, segments it according to a segmentation window size set from the remaining available resources of the server, calculates the idle evaluation values of the data processing threads, and adds the segmented data to the task waiting queues of the data processing threads one by one according to the calculation result;
the data type and risk threat identification module, which uses historical processing data of the system platform as training data to train the data classification recognition model, trains the dedicated risk threat identification model of each data classification based on that classification's historical risk investigation and processing results, establishes the risk threat-response strategy association, identifies the type and risk threats of the data currently being processed, and automatically selects a response strategy and handles the data risk threat;
The database query security module, which adjusts the set privacy budget in real time according to the hit rate of the risk threat identification model within the currently set segmentation window, calculates the noise scale parameter from the insertion batch size of the privacy statistics field and the adjusted privacy budget, selects a random value, generates Laplace noise from the noise scale and the selected random value, and applies the generated noise to queries on the privacy statistics field;
The automatic sensitive data backup module, which calculates the sensitive data risk evaluation value as the weighted sum of the proportion of sensitive data in the total processed data and the hit rate of the risk threat identification model within the currently set segmentation window, and stores all marked sensitive data in a cache as a backup when the risk evaluation value is greater than or equal to the sensitive data risk evaluation value threshold;
And the system resource allocation adaptive adjustment module, which takes the mean, over all data processing threads, of the ratio of the idle evaluation value to the sensitive data risk evaluation value as the thread processing resource allocation reference line, and adjusts the computing resources and storage resources of each processing thread proportionally based on the deviation of the thread's resource allocation tendency coefficient from the thread processing resource allocation reference line.
Compared with the prior art, the invention has the following beneficial effects:
The invention acquires a real-time dynamic data set of big data, divides it and adds the divided data to idle data processing queues; establishes a data threat identification model, detects abnormal data risk threats on the platform in real time and takes automated response measures based on the threat type; applies differential privacy to queries on stored data and adjusts the amount of query noise in real time according to the data risk threat situation; screens the dynamic data for sensitive data and backs it up to a cache based on the abnormal data risk threat situation; and detects the current processing speed and the backlog of pending tasks and automatically adjusts resource allocation in combination with the abnormal data threat risk.
The invention thereby realizes a data security management method covering the complete data processing life cycle of data processing distribution, data threat identification, secure database query, automatic sensitive data backup and adaptive adjustment of resource allocation, provides efficient protection and management of massive data, and ensures the security and privacy of data during transmission, storage and processing.
Drawings
FIG. 1 is a flow chart of a data security management method applied to a big data management platform of the present invention;
FIG. 2 is a flow chart of the present invention for partitioning a data set and adding it to an idle data processing queue;
FIG. 3 is a flow chart of automated countermeasure based on threat types in the present invention;
FIG. 4 is a flow chart of the invention for real-time adjustment of the amount of noise in a query;
FIG. 5 is a flow chart of sensitive data screening and cache backup for dynamic data according to the present invention;
FIG. 6 is a flow chart of the adjustment of resource allocation in the present invention;
FIG. 7 is a schematic structural diagram of a data security management system applied to a big data management platform according to the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.
Referring to FIG. 1, the data security management method applied to a big data management platform includes:
Acquiring a real-time dynamic data set of big data, dividing the data set, and adding the divided data to idle data processing queues;
Establishing a data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type;
Applying differential privacy to queries on stored data, and adjusting the amount of query noise in real time according to the data risk threat situation;
Screening the dynamic data for sensitive data and backing it up to a cache based on the abnormal data risk threat situation;
And detecting the current processing speed and the backlog of pending tasks, and automatically adjusting resource allocation in combination with the abnormal data threat risk.
Referring to FIG. 2, a real-time dynamic data set of big data is acquired, divided, and added to idle data processing queues, which specifically includes:
Acquiring the dynamic data set within a set time window, setting the size of the data processing segmentation window based on the remaining available resources of the server, and segmenting the dynamic data set according to the segmentation window size;
during segmentation, every data segment must be smaller than or equal to the set segmentation window size so as to ensure the integrity of the segmented data.
Acquiring the load of every data processing thread and the task backlog of its data processing task queue, computing for each thread the ratio of its task backlog to its processing speed, setting an initial load weight for each thread, and taking the product of that ratio and the load weight as the idle evaluation value of the data processing thread;
Sorting the idle evaluation values of all processing threads in reverse order, and adding the segmented dynamic data to the task waiting queues of the data processing threads one by one according to the sorting result.
This process distributes processing tasks in order of each thread's remaining processing capacity, which helps avoid some threads being overloaded while others sit idle, and improves system resource utilization and load balance during data processing.
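The distribution step above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Worker` class, the `split` and `dispatch` helpers and the round-robin hand-out are hypothetical choices made for illustration. The idle evaluation value follows the literal wording (load weight multiplied by backlog over processing speed); under that reading a smaller value means a shorter expected wait, so the sketch assumes the ranking places the least-loaded thread first.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Worker:
    """Hypothetical stand-in for one data processing thread."""
    name: str
    load_weight: float          # initial load weight set per thread
    rate: float                 # items processed per second, must be > 0
    queue: List[list] = field(default_factory=list)

    def backlog(self) -> int:
        # Number of items still waiting in this thread's queue.
        return sum(len(segment) for segment in self.queue)

    def idle_score(self) -> float:
        # Literal reading of the text: weight * (backlog / rate).
        return self.load_weight * self.backlog() / self.rate


def split(records: Sequence, window: int) -> List[list]:
    """Cut records into segments no longer than the segmentation window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [list(records[i:i + window]) for i in range(0, len(records), window)]


def dispatch(segments: List[list], workers: List[Worker]) -> None:
    """Rank workers once, then hand out segments in ranked order (assumed)."""
    if not workers:
        raise ValueError("at least one worker is required")
    ranked = sorted(workers, key=lambda w: w.idle_score())
    for index, segment in enumerate(segments):
        ranked[index % len(ranked)].queue.append(segment)
```

Ranking once per batch keeps the sketch short; re-ranking after every assignment would track load more closely at a higher cost.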
Referring to FIG. 3, a data threat identification model is established, abnormal data risk threats on the platform are detected in real time, and automated response measures are taken based on the threat type, which specifically includes:
Acquiring historical processing data of the system platform as training data, classifying the training data according to the set data classifications, and batch-labeling the classified data with its type;
Preprocessing the batch-labeled data and extracting features from it, establishing data classification feature vectors, selecting a convolutional neural network, training a data classification recognition model on the feature-extracted training data, and deploying the trained model into the system;
Establishing the data classification feature vector involves multi-dimensional feature extraction, including data generation frequency-domain features, data length features and data regularity features;
Setting the division of threat identification by data classification, and establishing a corresponding threat identification model for each set data classification, which specifically includes:
setting the risk threat classifications corresponding to the current data classification based on its historical risk classification processing results;
retrieving, from the training data of the current data classification, the data specifically involved in risk investigation results, marking it as risk source data, and labeling all risk source data according to the set threat classifications;
Preprocessing the risk source data and extracting features from it, establishing risk data threat classification feature vectors, selecting a convolutional neural network, and training a dedicated risk threat identification model for the current data classification on the feature-extracted risk source data;
Establishing the risk data threat classification feature vector involves multi-dimensional feature extraction, including data generation frequency-domain features, risk threat data keyword similarity features, data length features and data regularity features;
Combining the dedicated risk threat identification models of all data classifications and deploying the combined model into the system;
setting corresponding response strategies based on how risk threats were handled historically, and establishing a risk threat-response strategy association;
Acquiring a data processing thread, identifying the data type of the real-time data being processed in the thread with the data classification recognition model, selecting the corresponding dedicated risk threat identification model based on the identified data type to identify risk threats, and automatically selecting a response strategy and handling the data risk threat according to the identification result and the risk threat-response strategy association, thereby realizing an automated flow of classification recognition, risk threat identification and response handling for the data.
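The classification, threat identification and response flow described above can be sketched as a small dispatch routine. This is an illustration only: the keyword rules below merely stand in for the trained convolutional models, and the names `handle`, `DEMO_CLASSIFY`, `DEMO_THREAT_MODELS` and `DEMO_RESPONSES`, as well as the `"pass"` and `"escalate"` outcomes, are hypothetical.

```python
from typing import Callable, Dict, Optional

Classifier = Callable[[str], str]                # record -> data type
ThreatModel = Callable[[str], Optional[str]]     # record -> threat type or None
Response = Callable[[str], str]                  # record -> action taken


def handle(record: str,
           classify: Classifier,
           threat_models: Dict[str, ThreatModel],
           responses: Dict[str, Response]) -> str:
    """Classify a record, run the matching threat model, apply the response."""
    data_type = classify(record)
    model = threat_models.get(data_type)
    if model is None:
        return "pass"            # no dedicated model for this data type
    threat = model(record)
    if threat is None:
        return "pass"            # model found no threat
    respond = responses.get(threat)
    if respond is None:
        return "escalate"        # assumed fallback for unmapped threats
    return respond(record)


# Toy stand-ins for the trained models (assumptions, not from the source).
def DEMO_CLASSIFY(record: str) -> str:
    return "sql" if record.lower().startswith("select") else "log"


DEMO_THREAT_MODELS: Dict[str, ThreatModel] = {
    "sql": lambda r: "injection" if "' or 1=1" in r.lower() else None,
}

DEMO_RESPONSES: Dict[str, Response] = {
    "injection": lambda r: "blocked",
}
```

Keeping the threat-to-response mapping in a plain dictionary mirrors the risk threat-response strategy association: adding a new strategy means adding one entry rather than changing the dispatch logic.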
Referring to FIG. 4, differential privacy is applied to queries on stored data, and the amount of query noise is adjusted in real time according to the data risk threat situation, which specifically includes:
Acquiring all table structures of the database, selecting a privacy statistics field, and adjusting the set privacy budget in real time with a Sigmoid function based on the hit rate of the risk threat identification model within the currently set segmentation window, the specific expression being as follows:
[expression not reproduced in this text];
wherein ε and ε' are the set privacy budget and the adjusted privacy budget respectively, and f is the hit rate of the risk threat identification model;
A higher risk threat identification hit rate indicates a higher system security risk; reducing the privacy budget then increases the noise scale applied to data queries, so that statistical queries are more secure.
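Because the expression itself does not survive in this text, the following Python sketch uses an assumed form, ε' = 2ε(1 − σ(k·f)), chosen only because it matches the stated behaviour: the budget is unchanged at a hit rate of zero and shrinks as the hit rate rises. The steepness `k` and the function names are hypothetical and should not be read as the patent's formula.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def adjust_budget(epsilon: float, hit_rate: float, steepness: float = 4.0) -> float:
    """Shrink the privacy budget as the threat hit rate grows.

    Assumed form (not taken from the source):
        epsilon' = 2 * epsilon * (1 - sigmoid(steepness * hit_rate))
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in [0, 1]")
    return epsilon * 2.0 * (1.0 - sigmoid(steepness * hit_rate))
```

With this form the adjusted budget stays strictly positive, so the noise scale derived from it remains finite.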
Setting the update monitoring time window of the statistics field, obtaining the maximum insertion batch size of the privacy statistics field within the latest time window, and calculating the noise scale parameter in combination with the adjusted privacy budget ε';
Selecting a random value obeying the Laplace distribution, and generating Laplace noise from the noise scale and the selected random value, the specific expression being as follows:
[expression not reproduced in this text];
wherein N is the generated Laplace noise, sgn(U) is the sign of the random value, U is the random value, ΔW is the set insertion batch size of the privacy statistics field, and λ is the noise scale parameter;
and applying the generated noise to queries on the privacy statistics field to increase the privacy of statistical data queries. Specifically, the calculated noise value is added to the true value of the statistical query to obtain an approximate statistical value.
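The noise step can be sketched with the standard inverse-transform sampler for the Laplace distribution. Two assumptions are made here because the source expressions are not legible: the random value U is drawn uniformly from (−0.5, 0.5), and the noise scale is λ = ΔW / ε', the usual Laplace-mechanism scale with the batch size acting as the sensitivity. The function names are hypothetical.

```python
import math
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as N = -scale * sgn(U) * ln(1 - 2|U|)."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    u = rng.random() - 0.5            # uniform on [-0.5, 0.5)
    while u == -0.5:                  # avoid ln(0) at the boundary
        u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))


def noisy_query(true_value: float, batch_size: float, epsilon: float,
                rng: Optional[random.Random] = None) -> float:
    """Add Laplace noise with assumed scale batch_size / epsilon to a statistic."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_value + laplace_noise(batch_size / epsilon, rng)
```

A smaller adjusted budget yields a larger scale and therefore noisier answers, which is the behaviour the description attributes to periods of high threat hit rate.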
Referring to FIG. 5, the dynamic data is screened for sensitive data and backed up to a cache based on the abnormal data risk threat situation, which specifically includes:
Acquiring the data type identification result of the real-time data being processed by a data processing thread, setting sensitive data classifications and sensitive operation keywords, retrieving the data within the sensitive data classifications that matches a sensitive operation keyword, marking the entire data item containing the keyword as sensitive data, and calculating the proportion of sensitive data in the total processed data;
acquiring the hit rate of the risk threat identification model within the currently set segmentation window, and calculating a sensitive data risk evaluation value as the weighted sum of the sensitive data proportion and the risk threat hit rate;
Setting a threshold for the sensitive data risk evaluation value, and storing all marked sensitive data in a cache as a backup when the risk evaluation value is greater than or equal to the threshold;
The sensitive data risk evaluation value threshold is set by a person skilled in the art according to the risk threat situation and the specific security requirements of real-time data processing, and is not described in detail here.
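The screening and scoring steps above can be sketched as follows. The record layout (a pair of data type and payload), the equal weights and the function names are assumptions for illustration; the source leaves the weights and the threshold to the practitioner.

```python
from typing import Iterable, List, Sequence, Set, Tuple

Record = Tuple[str, str]    # (data type, payload), a hypothetical layout


def mark_sensitive(records: Iterable[Record],
                   sensitive_types: Set[str],
                   keywords: Sequence[str]) -> List[Record]:
    """Keep records of a sensitive type whose payload contains a keyword."""
    return [
        record for record in records
        if record[0] in sensitive_types
        and any(keyword in record[1].lower() for keyword in keywords)
    ]


def risk_score(n_sensitive: int, n_total: int, hit_rate: float,
               w_ratio: float = 0.5, w_hit: float = 0.5) -> float:
    """Weighted sum of the sensitive-data proportion and the threat hit rate."""
    ratio = n_sensitive / n_total if n_total else 0.0
    return w_ratio * ratio + w_hit * hit_rate


def should_backup(score: float, threshold: float) -> bool:
    """Back up when the score reaches the threshold (greater than or equal)."""
    return score >= threshold
```

For example, if half of the processed records are sensitive and the hit rate is 0.2, equal weights give a score of 0.35, which triggers a backup for any threshold at or below that value.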
When a thread processes data, an uncaught exception terminates the thread, which is one of the most common causes of thread interruption. If the data formats that the system can process are leaked, an outsider can attack the system by submitting large amounts of non-compliant and risky data; the data processing threads are then interrupted frequently, and important sensitive data operations may be lost in the process.
If the processing thread is not abnormally interrupted during the current task, the cache backup is cleared after the task finishes; if an abnormal interruption occurs, the breakpoint is recorded, and the cached backup data is restored and processed again.
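The backup, clear and restore cycle can be sketched as a small task runner. This is an illustrative reading rather than the patented procedure: the name `run_task`, the retry limit and the choice to resume at the recorded breakpoint instead of restarting the task are assumptions.

```python
from typing import Callable, List, Tuple


def run_task(items: List, handler: Callable, backup_enabled: bool,
             max_retries: int = 1) -> Tuple[List, List[int]]:
    """Process items; on failure, restore from the cache and resume.

    Returns the results and the list of recorded breakpoint positions.
    """
    cache = list(items) if backup_enabled else None   # cache backup
    results: List = []
    breakpoints: List[int] = []
    position = 0
    retries = 0
    while position < len(items):
        try:
            results.append(handler(items[position]))
            position += 1
        except Exception:
            if cache is None or retries >= max_retries:
                raise                     # nothing to restore from
            retries += 1
            breakpoints.append(position)  # record the breakpoint
            items = list(cache)           # restore data from the backup
    cache = None                          # clear the backup after success
    return results, breakpoints
```

Without a backup the interruption propagates and the remaining items are lost, which is the failure mode the description sets out to avoid.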
Referring to fig. 6, detecting the current processing speed and the task accumulation to be processed, and automatically adjusting the resource allocation in combination with the threat risk of the abnormal data specifically includes:
detecting the processing speed of all current data processing threads and accumulation of tasks to be processed, and recalculating idle evaluation values of all current data processing threads;
Acquiring sensitive data risk assessment values of all current data processing threads, taking the ratio of the idle assessment value to the sensitive data risk assessment value as a resource allocation tendency coefficient, calculating the average value of the resource allocation tendency coefficients of all the data processing threads, and marking the average value as a thread processing resource allocation reference line;
When a thread's idle evaluation value rises and its sensitive data risk evaluation value falls, the thread has sufficient spare resources and needs no additional resources for risk data processing, so part of its idle resources can be reclaimed. Conversely, when the idle evaluation value falls and the sensitive data risk evaluation value rises, the thread is short of resources and needs more system resources to maintain the efficiency of risk data processing, so more resources should be allocated to it.
Based on the deviation between the resource allocation tendency coefficient of each processing thread and the thread processing resource allocation reference line, the computing resource and the storage resource of each processing thread are proportionally adjusted, and the dynamic allocation balance of the data processing resources is realized.
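The reference-line adjustment can be sketched as follows. The source states only that resources are adjusted in proportion to each thread's deviation from the reference line; the linear update rule, the `gain` factor and the final normalisation are assumptions introduced for illustration.

```python
def tendency(idle_value, risk_value, eps=1e-9):
    """Resource allocation tendency coefficient: idle value over risk value."""
    return idle_value / max(risk_value, eps)


def rebalance(threads, gain=0.5):
    """Return new resource shares per thread.

    `threads` maps a thread name to (idle_value, risk_value, current_share).
    A coefficient above the reference line means spare capacity, so the share
    shrinks; a coefficient below it means the share grows. `gain` is a
    hypothetical tuning factor.
    """
    coefs = {name: tendency(i, r) for name, (i, r, _) in threads.items()}
    baseline = sum(coefs.values()) / len(coefs)  # the reference line
    if baseline == 0:
        return {name: share for name, (_, _, share) in threads.items()}
    raw = {}
    for name, (_, _, share) in threads.items():
        deviation = (coefs[name] - baseline) / baseline
        raw[name] = share * max(0.0, 1.0 - gain * deviation)
    total = sum(raw.values())
    if total == 0:
        return {name: share for name, (_, _, share) in threads.items()}
    return {name: value / total for name, value in raw.items()}
```

For two threads with equal starting shares, one mostly idle with low risk and one busy with high risk, this rule moves resources from the first thread to the second while keeping the shares summing to one.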
Further, referring to fig. 7, based on the same inventive concept as the data security management method applied to the big data management platform, the present disclosure further proposes a data security management system applied to the big data management platform, including:
the data acquisition and processing distribution module, which acquires the dynamic data set within a set time window, divides it according to the segmentation window size based on the server's remaining available resources, calculates the idle evaluation value of each data processing thread, and adds the divided data to the task waiting queues of the data processing threads in sequence according to the calculation results;
the data type and risk threat identification module, which uses the historical processing data of the system platform as training data to train a data classification identification model, trains a dedicated risk threat identification model for each data classification based on that classification's historical risk screening results, establishes a risk threat-response strategy association, identifies the type and risk threat of the data currently being processed, and performs automated response strategy selection and data risk threat processing;
the database query security module, which adjusts the set privacy budget in real time according to the identification hit rate of the risk threat identification model within the currently set segmentation window, calculates a noise scale parameter from the insertion batch size of the privacy statistics field and the adjusted privacy budget, selects a random value, generates Laplace noise from the noise scale and the selected random value, and applies the generated noise to queries on the privacy statistics field;
the automatic sensitive data backup module, which calculates a sensitive data risk assessment value as the weighted sum of the proportion of sensitive data in the total processed data and the identification hit rate of the risk threat identification model within the currently set segmentation window, and stores all marked sensitive data in a cache as a backup when the risk assessment value is greater than or equal to the sensitive data risk assessment value threshold;
and the system resource allocation self-adaptive adjustment module, which takes the average of the ratios of idle evaluation value to sensitive data risk assessment value over all data processing threads as the thread processing resource allocation reference line, and proportionally adjusts the computing resources and storage resources of each processing thread based on the deviation of that thread's resource allocation tendency coefficient from the reference line.
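For the database query security module listed above, the following Python sketch shows one way the noise could be generated. The exact Sigmoid expression is not reproduced in this text, so `adjusted_budget` uses an assumed form in which the budget equals the set value at a hit rate of zero and shrinks as the hit rate rises; the steepness `k` is hypothetical. The scale λ = ΔW / ε' follows the usual Laplace mechanism and is also an assumption here. The sampler is the standard inverse-CDF method, N = -λ · sgn(U) · ln(1 - 2|U|) with U uniform on (-0.5, 0.5), which uses the same symbols the claims name.

```python
import math
import random


def adjusted_budget(epsilon, hit_rate, k=4.0):
    """Assumed Sigmoid-style adjustment: equals epsilon at hit_rate 0, then decreases."""
    return 2.0 * epsilon / (1.0 + math.exp(k * hit_rate))


def noise_scale(batch_size, epsilon_adj):
    """Assumed Laplace scale: lambda = delta_W / epsilon'."""
    return batch_size / epsilon_adj


def laplace_noise(scale, rng=random):
    """Inverse-CDF Laplace sample with location 0 and the given scale."""
    u = rng.random() - 0.5
    while u == -0.5:  # avoid log(0) at the boundary of the interval
        u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))


def noisy_statistic(true_value, batch_size, epsilon, hit_rate, rng=random):
    """Add Laplace noise to a statistic queried from a privacy statistics field."""
    eps_adj = adjusted_budget(epsilon, hit_rate)
    return true_value + laplace_noise(noise_scale(batch_size, eps_adj), rng)
```

Under these assumptions a higher hit rate gives a smaller adjusted budget and therefore a larger noise scale, so query results become noisier while the platform is under more threat.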
Still further, the present solution also provides a storage medium on which a computer readable program is stored; when the program is invoked, the above data security management method applied to the big data management platform is executed.
It is understood that the storage medium may be a magnetic medium such as a floppy disk, a hard disk, or a magnetic tape; an optical medium such as a DVD; or a semiconductor medium such as a solid state disk (SSD).
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, that the above embodiments and descriptions merely illustrate the principles of the invention, and that various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. A data security management method applied to a big data management platform, characterized by comprising:
acquiring a real-time dynamic data set of big data, dividing the data set, and adding it to idle data processing queues;
establishing a data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type;
applying differential privacy to queries on stored data, and adjusting the amount of query noise in real time according to the data risk threat situation;
performing sensitive data screening and cache backup on dynamic data based on the abnormal data risk threat situation;
detecting the current processing speed and the backlog of pending tasks, and automatically adjusting resource allocation in combination with the abnormal data threat risk.

2. The data security management method applied to a big data management platform according to claim 1, characterized in that acquiring the real-time dynamic data set of big data, dividing the data set, and adding it to idle data processing queues specifically comprises:
acquiring the dynamic data set within a set time window, setting the data processing segmentation window size based on the server's remaining available resources, and dividing the dynamic data set according to the segmentation window size;
acquiring the load of all data processing threads and the number of tasks accumulated in the corresponding data processing task queues, and calculating the idle evaluation value of each data processing thread by combining the ratio of the thread's task backlog to its processing rate with the thread's set initial load weight;
sorting the idle evaluation values of all processing threads in reverse order, and adding the divided dynamic data to the task waiting queues of the data processing threads in sequence according to the sorting result.

3. The data security management method applied to a big data management platform according to claim 2, characterized in that establishing the data threat identification model, detecting abnormal data risk threats on the platform in real time, and taking automated response measures based on the threat type specifically comprises:
acquiring historical processing data of the system platform as training data, dividing the training data into types based on set data classifications, and batch-labeling the divided data with its type;
preprocessing the batch-labeled data and extracting features, establishing data classification feature vectors, selecting a convolutional neural network, training a data classification identification model on the feature-extracted training data, and deploying the trained model into the system;
setting a division of labor for data classification threat identification and establishing a corresponding threat identification model for each set data classification, specifically:
based on the historical risk screening results of the current data classification, setting the risk threat classifications corresponding to the data classification;
retrieving the data specifically involved in the risk screening results within the training data of the current data classification, marking it as risk source data, and labeling all risk source data according to the set threat classifications;
preprocessing the risk source data and extracting features, establishing risk data threat classification feature vectors, selecting a convolutional neural network, and training a dedicated risk threat identification model for the current data classification on the feature-extracted risk source data;
combining the dedicated risk threat identification models of all data classifications and deploying them into the system;
setting corresponding response strategies based on historical risk threat handling methods, and establishing a risk threat-response strategy association;
acquiring the data processing threads, identifying the data type of the real-time data being processed in each thread with the data classification identification model, selecting the corresponding dedicated risk threat identification model based on the data type identification result to identify risk threats, and performing automated response strategy selection and data risk threat processing according to the risk threat identification result and the risk threat-response strategy association.

4. The data security management method applied to a big data management platform according to claim 3, characterized in that applying differential privacy to queries on stored data and adjusting the amount of query noise in real time according to the data risk threat situation specifically comprises:
acquiring all table structures of the database, selecting privacy statistics fields, and using a Sigmoid function to adjust the set privacy budget in real time according to risk, based on the identification hit rate of the risk threat identification model within the currently set segmentation window, the specific expression being: [formula image not reproduced];
where ε and ε' are the set privacy budget and the adjusted privacy budget respectively, and f is the identification hit rate of the risk threat identification model;
setting a statistics field update monitoring window time, acquiring the maximum insertion batch size of the privacy statistics field in the most recent time window, and calculating the noise scale parameter in combination with the adjusted privacy budget ε';
selecting a random value that obeys the Laplace distribution, and generating Laplace noise from the noise scale and the selected random value, the specific expression being: [formula image not reproduced];
where N is the generated Laplace noise, sgn(U) is the sign of the random value, U is the random value, ΔW is the set insertion batch size of the privacy statistics field, and λ is the noise scale parameter;
applying the generated noise to queries on the privacy statistics field to increase the privacy of statistical data queries.

5. The data security management method applied to a big data management platform according to claim 4, characterized in that performing sensitive data screening and cache backup on dynamic data based on the abnormal data risk threat situation specifically comprises:
acquiring the data type identification result of the real-time data being processed by the data processing thread, setting sensitive data classifications and sensitive operation keywords, retrieving the data in the sensitive data classifications that matches the sensitive operation keywords, marking each entire record that contains a matched keyword as sensitive data, and calculating the proportion of sensitive data in the total processed data;
acquiring the identification hit rate of the risk threat identification model within the currently set segmentation window, performing a weighted summation of the sensitive data proportion and the risk threat hit rate, and calculating a sensitive data risk assessment value;
setting a sensitive data risk assessment value threshold, and storing all marked sensitive data in a cache as a backup when the risk assessment value is greater than or equal to the threshold;
if the processing thread is not abnormally interrupted during the current task, clearing the cache backup once the task finishes; if an abnormal interruption occurs, recording the breakpoint and restoring and reprocessing the cached backup data.

6. The data security management method applied to a big data management platform according to claim 5, characterized in that detecting the current processing speed and the backlog of pending tasks and automatically adjusting resource allocation in combination with the abnormal data threat risk specifically comprises:
detecting the processing speed of all current data processing threads and the backlog of pending tasks, and recalculating the idle evaluation values of all current data processing threads;
acquiring the sensitive data risk assessment values of all current data processing threads, taking the ratio of the idle evaluation value to the sensitive data risk assessment value as the resource allocation tendency coefficient, calculating the average of the resource allocation tendency coefficients of all data processing threads, and marking it as the thread processing resource allocation reference line;
proportionally adjusting the computing resources and storage resources of each processing thread based on the deviation of that thread's resource allocation tendency coefficient from the thread processing resource allocation reference line.

7. A data security management system applied to a big data management platform, used to implement the data security management method applied to a big data management platform according to any one of claims 1 to 6, characterized by comprising:
a data acquisition and processing distribution module, which acquires the dynamic data set within a set time window, divides it according to the segmentation window size based on the server's remaining available resources, calculates the idle evaluation value of each data processing thread, and adds the divided data to the task waiting queues of the data processing threads in sequence according to the calculation results;
a data type and risk threat identification module, which uses the historical processing data of the system platform as training data to train a data classification identification model, trains a dedicated risk threat identification model for each data classification based on that classification's historical risk screening results, establishes a risk threat-response strategy association, identifies the type and risk threat of the data currently being processed, and performs automated response strategy selection and data risk threat processing;
a database query security module, which adjusts the set privacy budget in real time according to the identification hit rate of the risk threat identification model within the currently set segmentation window, calculates a noise scale parameter from the insertion batch size of the privacy statistics field and the adjusted privacy budget, selects a random value, generates Laplace noise from the noise scale and the selected random value, and applies the generated noise to queries on the privacy statistics field;
a sensitive data automatic backup module, which calculates a sensitive data risk assessment value as the weighted sum of the proportion of sensitive data in the total processed data and the identification hit rate of the risk threat identification model within the currently set segmentation window, and stores all marked sensitive data in a cache as a backup when the risk assessment value is greater than or equal to the sensitive data risk assessment value threshold;
a system resource allocation adaptive adjustment module, which takes the average of the ratios of idle evaluation value to sensitive data risk assessment value over all data processing threads as the thread processing resource allocation reference line, and proportionally adjusts the computing resources and storage resources of each processing thread based on the deviation of that thread's resource allocation tendency coefficient from the reference line.

CN202411806548.5A 2024-12-10 2024-12-10 Data security management method and system applied to big data management platform Active CN119293670B (en)


Publications (2)

Publication Number Publication Date
CN119293670A true CN119293670A (en) 2025-01-10
CN119293670B CN119293670B (en) 2025-05-02

Family

ID=94167731


Country Status (1)

Country Link
CN (1) CN119293670B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110297711A (en) * 2019-05-16 2019-10-01 平安科技(深圳)有限公司 Batch data processing method, device, computer equipment and storage medium
US20200250335A1 (en) * 2019-02-01 2020-08-06 LeapYear Technologies, Inc. Differentially Private Query Budget Refunding
CN115329745A (en) * 2022-07-28 2022-11-11 中国电信股份有限公司 Data processing method and device
CN117056951A (en) * 2023-08-09 2023-11-14 上海好芯好翼智能科技有限公司 Data security management method for digital platform


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NHATHAI PHAN等: "Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learnin", 《ARXIV:1709.05750V2》, 23 April 2018 (2018-04-23), pages 1 - 13 *
吴宁博等: "面向关联属性的差分隐私信息熵度量方法", 《电子学报》, vol. 47, no. 11, 2 December 2019 (2019-12-02), pages 2337 - 2343 *

Also Published As

Publication number Publication date
CN119293670B (en) 2025-05-02


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant