CN110377977A - Sensitive information leakage detection method and device and storage medium - Google Patents
- Publication number
- CN110377977A (CN201910579777.0A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
Abstract
The invention discloses a method for detecting sensitive information leakage, comprising: collecting network data packets to be detected; classifying the network data packets to be detected; sending the network data packets to be detected to a plurality of processors respectively, and performing parallel TCP stream restoration with the processors to obtain restored text information; performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information; and judging the network data packet containing the sensitive text information to be a sensitive-information-leakage data packet, tracing the source address of that data packet, and locating the source of the sensitive information leakage. The method can increase the packet capture speed and reduce the time needed to identify and detect sensitive information on a network. The invention also discloses a device and a storage medium for detecting sensitive information leakage.
Description
Technical field
The present invention relates to the technical field of sensitive information detection, and in particular to a method, a device, and a storage medium for detecting sensitive information leakage.
Background art
With the rapid advance of enterprise informatization, more and more OA office systems, internal mail systems, and instant messaging tools are in wide use, which brings great convenience to daily work and life. Along with this convenience, however, numerous information security problems have emerged, and sensitive information leakage incidents occur from time to time.
At present, most enterprises take two kinds of precautionary measures against sensitive information leakage: first, strengthening security training for staff who handle classified material; second, having the security department inspect internal networked computers or subordinate units by random checks. Although these measures solve the problem to some extent, the risk of sensitive information leakage remains. On the one hand, training cannot cover all personnel, especially the highly mobile staff of outside cooperating units, and even where it does, it is hard to ensure that everyone complies with the confidentiality requirements. On the other hand, irregular manual spot checks require a large investment of manpower. To guard effectively against information leakage inside the enterprise while reducing the manpower required, intelligent analysis and judgment techniques for sensitive information leakage need to be applied further.
In the prior art, methods for identifying and detecting sensitive information on a network include:
(1) for information collection on high-speed networks, setting the network card to promiscuous mode and copying network packets in combination with libpcap;
(2) for the captured packets, restoring them to the application layer for content analysis, for which TCP stream restoration is mostly used at present;
(3) for content that has been restored to the application layer, analysis by sensitive-word retrieval, similar-text analysis, and similar methods.
In the course of practicing the invention, the inventors found the following technical problems in the prior art:
The packet capture speed of network sensitive-information detection techniques is limited, and packet loss occurs easily. Existing system calls and data copies from kernel space to user space often reduce capture speed and thereby cause packet loss. If the data structures used in processing are poorly designed, the space complexity and time complexity of TCP stream restoration become excessive. The volume of text data restored from the network in real time is very large, so retrieval and parsing are very time-consuming.
The present technical solution captures packets on a high-speed network, performs parallel TCP stream restoration on the captured packets, then performs intelligent text analysis on the parsed and merged application-layer content to assess the degree of sensitive information leakage, and finally traces the source of the leaked sensitive information.
Summary of the invention
Embodiments of the present invention provide a method for detecting sensitive information leakage that can increase the packet capture speed and reduce the time needed to identify and detect sensitive information on a network.
Embodiment one of the present invention provides a method for detecting sensitive information leakage, comprising:
collecting network data packets to be detected;
classifying the network data packets to be detected;
sending the network data packets to be detected to a plurality of processors respectively, and performing parallel TCP stream restoration with the processors to obtain restored text information;
performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information;
judging the network data packet containing the sensitive text information to be a sensitive-information-leakage data packet, tracing the source address of the sensitive-information-leakage data packet, and locating the source of the sensitive information leakage.
As an improvement of the above scheme, collecting the network data packets to be detected specifically comprises:
sending the captured network data packets to be detected into a pre-allocated address space;
wherein the address space corresponds to a buffer queue; the buffer queue uses a first-in-first-out mechanism, the head of the queue being used for reading data and the tail of the queue for writing the received data to be analyzed.
As an improvement of the above scheme, classifying the network data packets to be detected specifically comprises:
performing address resolution on the network data packets to be detected that were collected within a given time period;
clustering the parsed network data packets to be detected according to the address segment of their destination address, and classifying them according to the clustering result.
As an improvement of the above scheme, sending the network data packets to be detected to a plurality of processors respectively and performing parallel TCP stream restoration with the processors to obtain restored text information specifically comprises:
taking the TCP connection identifiers SIP, SPT, DIP, and DPT as keywords, the hash linked list is calculated by the following formula:
calculating the hash linked list and assigning all TCP connection points to the entries of the hash linked list, thereby achieving TCP stream restoration and obtaining the restored text information.
As an improvement of the above scheme, the method further comprises:
organizing all connection points of each entry in the hash linked list with a Splay tree to obtain the corresponding connection identifiers;
calculating a hash address from the connection identifier with the hash function, and then searching the Splay tree corresponding to that hash address, so that the restored text information can be looked up, deleted, and modified.
As an improvement of the above scheme, performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information specifically comprises:
building keywords related to preset sensitive information into a sensitive dictionary;
comparing the restored text information with the keywords in the sensitive dictionary;
identifying the text in the restored text information that is identical to keywords in the sensitive dictionary, to obtain the sensitive text information;
calculating the leakage rate X of the sensitive text information according to the following formula;
where S is the total quantity of sensitive text information and S' is the total number of keywords in the sensitive dictionary.
As an improvement of the above scheme, the method further comprises:
setting a sensitive-information leakage recognition threshold;
comparing the sensitive-information leakage rate X with the threshold;
depending on the result of the comparison, the text is judged a suspected information leak, an information leak, or, at the upper end of the range with X ≤ 100%, a serious information leak.
Correspondingly, embodiment two of the present invention provides a device for detecting sensitive information leakage, comprising:
a packet capture unit, configured to collect network data packets to be detected;
a packet classification unit, configured to classify the network data packets to be detected;
a packet restoration unit, configured to send the network data packets to be detected to a plurality of processors respectively and to perform parallel TCP stream restoration with the processors to obtain restored text information;
a leakage calculation unit, configured to perform natural language processing on the restored text information, judge whether sensitive text information exists in the restored text information, and calculate the leakage rate of the sensitive text information;
a leakage source locating unit, configured to judge the network data packet containing the sensitive text information to be a sensitive-information-leakage data packet, trace the source address of the sensitive-information-leakage data packet, and locate the source of the sensitive information leakage.
Correspondingly, embodiment three of the present invention provides a device for detecting sensitive information leakage, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for detecting sensitive information leakage described in embodiment one of the present invention.
Correspondingly, embodiment four of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program, and wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for detecting sensitive information leakage described in embodiment one of the present invention.
The method for detecting sensitive information leakage provided by the embodiments of the present invention has the following beneficial effects:
Network data packets to be detected are captured with DMA technology, which gives direct memory access to the data, avoids the system calls of memory copying, and improves access efficiency. Packets are clustered by destination IP network segment, packets with the same destination IP network segment are assigned by the clustering to the same CPU for stream restoration, and packets of different network segments are processed in parallel. Frequently hit nodes are found progressively earlier in the linear list, which reduces the number of comparisons per traversal and increases query speed. Keywords are extracted by natural language processing to build a sensitive dictionary, which improves the efficiency of sensitive information recognition. After sensitive information is found to exist, the leakage source is traced from the source IP of the sensitive-information packet and the target host is located, so as to identify the party responsible for the leak and to block the information leak at its source.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for detecting sensitive information leakage provided by embodiment one of the present invention.
Fig. 2 is a schematic structural diagram of a device for detecting sensitive information leakage provided by embodiment two of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, a schematic flowchart of a method for detecting sensitive information leakage provided by embodiment one of the present invention, the method comprises:
S101, collecting network data packets to be detected;
S102, classifying the network data packets to be detected;
S103, sending the network data packets to be detected to a plurality of processors respectively, and performing parallel TCP stream restoration with the processors to obtain restored text information;
S104, performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information;
S105, judging the network data packet containing the sensitive text information to be a sensitive-information-leakage data packet, tracing the source address of the sensitive-information-leakage data packet, and locating the source of the sensitive information leakage.
Further, collecting the network data packets to be detected specifically comprises:
sending the captured network data packets to be detected into a pre-allocated address space;
wherein the address space corresponds to a buffer queue; the buffer queue uses a first-in-first-out mechanism, the head of the queue being used for reading data and the tail of the queue for writing the received data to be analyzed.
Preferably, the network data packets to be detected are captured with DMA technology, which gives direct memory access to the data.
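The first-in-first-out buffer queue described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `PacketBuffer`, its `capacity` parameter, and the drop counter are hypothetical, and the sketch models the queue discipline only, not the DMA mapping or the pre-allocated address space.

```python
from collections import deque


class PacketBuffer:
    """Bounded FIFO: writers append at the tail, readers take from the head."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical size of the pre-allocated space
        self._queue = deque()
        self.dropped = 0          # packets rejected because the buffer was full

    def write(self, packet):
        """Append a captured packet at the tail; return False if the buffer is full."""
        if len(self._queue) >= self.capacity:
            self.dropped += 1
            return False
        self._queue.append(packet)
        return True

    def read(self):
        """Remove and return the oldest packet from the head, or None if empty."""
        return self._queue.popleft() if self._queue else None
```

Counting rejected writes makes packet loss visible, which is the failure mode the capture step is meant to reduce.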
Further, classifying the network data packets to be detected specifically comprises:
performing address resolution on the network data packets to be detected that were collected within a given time period;
clustering the parsed network data packets to be detected according to the address segment of their destination address, and classifying them according to the clustering result.
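A minimal sketch of the classification step, assuming that "clustering by address segment" can be approximated by grouping packets on a fixed-length destination prefix. The source does not name a clustering algorithm, so the grouping rule, the `/24` default, and the `dst_ip` field name are all assumptions.

```python
import ipaddress
from collections import defaultdict


def cluster_by_destination_segment(packets, prefix_len=24):
    """Group packets by the network segment of their destination address."""
    clusters = defaultdict(list)
    for packet in packets:
        # strict=False lets a host address stand in for its enclosing network.
        segment = ipaddress.ip_network(
            f"{packet['dst_ip']}/{prefix_len}", strict=False
        )
        clusters[str(segment)].append(packet)
    return dict(clusters)
```

Each key of the returned dictionary is one class of packets that can later be handed to a single processor.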
Preferably, the network card is set to promiscuous mode to capture packets on the network, and data-link-layer packets are filtered by the BPF packet filtering mechanism. However, the conventional libpcap capture mode lowers the packet capture rate because of data copying, system calls, and hardware interrupt handling. To avoid CPU data copies and to reduce system calls and interrupts, in the data-reading process of the detection method provided by the invention, DMA technology maps the address space that stores data in the queue into user space, so that user space can access this memory directly, avoiding the system calls of memory copying.
Further, sending the network data packets to be detected to a plurality of processors respectively and performing parallel TCP stream restoration with the processors to obtain restored text information specifically comprises:
taking the TCP connection identifiers SIP, SPT, DIP, and DPT (source IP, source port, destination IP, and destination port) as keywords; the corresponding hash function is required to be symmetric in source and destination, that is, the hash value computed after the source and destination elements among SIP, SPT, DIP, and DPT are exchanged must be identical; the hash linked list is then calculated by the following formula:
calculating the hash linked list and assigning all TCP connection points to the entries of the hash linked list, thereby achieving TCP stream restoration and obtaining the restored text information.
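The hash formula itself is not reproduced in this text, so the following is only a sketch of one function with the required source/destination symmetry, not the formula of the invention. The XOR-and-add combination, `TABLE_SIZE`, the segment field names, and the simplified reassembly (ordering payloads by sequence number, ignoring wraparound and overlapping segments) are all assumptions.

```python
import ipaddress

TABLE_SIZE = 4096  # hypothetical number of hash table entries


def symmetric_hash(sip, spt, dip, dpt, table_size=TABLE_SIZE):
    """Hash a TCP 4-tuple so that both directions of a connection share an entry."""
    sip_int = int(ipaddress.ip_address(sip))
    dip_int = int(ipaddress.ip_address(dip))
    # XOR is commutative, so swapping source and destination gives the same value.
    return ((sip_int ^ dip_int) + (spt ^ dpt)) % table_size


def connection_key(sip, spt, dip, dpt):
    """Direction-independent identifier of a connection."""
    return tuple(sorted([(sip, spt), (dip, dpt)]))


def restore_streams(segments):
    """Reassemble payload bytes per connection and sender, ordered by sequence number."""
    table = {}  # entry index -> {connection key -> {sender -> {seq -> payload}}}
    for seg in segments:
        four_tuple = (seg["sip"], seg["spt"], seg["dip"], seg["dpt"])
        entry = table.setdefault(symmetric_hash(*four_tuple), {})
        connection = entry.setdefault(connection_key(*four_tuple), {})
        sender = connection.setdefault((seg["sip"], seg["spt"]), {})
        sender[seg["seq"]] = seg["payload"]  # a retransmission overwrites itself
    streams = {}
    for entry in table.values():
        for key, connection in entry.items():
            for sender, by_seq in connection.items():
                streams[(key, sender)] = b"".join(by_seq[s] for s in sorted(by_seq))
    return streams
```

Because the hash is symmetric, a request and its reply land in the same table entry, so one lookup structure serves both directions of a connection.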
Preferably, packets with the same destination IP are likely to belong to the same message, and packets with the same destination IP network segment are relatively likely to come from the same network segment. However, if all packets collected in a time period were handed to the same CPU or thread, queuing would greatly reduce the efficiency of stream restoration. Therefore, in the detection method provided by the invention, packets with the same destination IP network segment are assigned by clustering to the same CPU for stream restoration, and packets of different network segments are processed in parallel.
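The assignment rule in the paragraph above (same destination segment, same CPU; different segments in parallel) can be sketched as a deterministic mapping from segment to worker index. The CRC32-modulo mapping and the function names are assumptions chosen for illustration; the source does not specify how segments are mapped to processors.

```python
import zlib


def assign_worker(segment, n_workers):
    """Map a destination network segment to a worker index deterministically."""
    return zlib.crc32(segment.encode("utf-8")) % n_workers


def dispatch(clusters, n_workers):
    """Distribute whole clusters across workers; a cluster is never split."""
    batches = [[] for _ in range(n_workers)]
    for segment, packets in clusters.items():
        batches[assign_worker(segment, n_workers)].extend(packets)
    return batches
```

Each batch could then be handed to a separate process, for example through `multiprocessing.Pool.map`, so that restoration for different segments runs in parallel.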
Preferably, the transmission of packets on a TCP connection has the following characteristics: (1) once a packet has been determined by its connection identifier to belong to a certain TCP connection, the next packet is likely to come from the same TCP connection; (2) after a packet of a certain TCP connection arrives, the next packet of that connection will also arrive soon.
To reduce the time spent searching and traversing packets during TCP reassembly, the principle of most-recently-accessed-first is applied, so a data structure combining a hash linked list with a Splay tree is used to implement lookup and traversal in TCP stream restoration. Based on the principle of locality, this data structure moves the node hit by each lookup to the front, so that frequently hit nodes are found progressively earlier in the linear list, which reduces the number of comparisons per traversal and speeds up lookup.
Lookup and traversal in TCP stream restoration are implemented with the data structure combining a hash linked list and a Splay tree. The key to the hash linked list is the design of the hash function: the hash values it computes should distribute the keywords uniformly over the address range.
Further, the method also comprises: organizing all connection points of each entry in the hash linked list with a Splay tree to obtain the corresponding connection identifiers;
calculating a hash address from the connection identifier with the hash function, and then searching the Splay tree corresponding to that hash address, so that the restored text information can be looked up, deleted, and modified.
A Splay tree is a self-adjusting binary search tree. On each access, the accessed node is moved to the top by a series of rotations and becomes the root, which improves lookup efficiency. Because the arrival of network packets exhibits locality, the next packet of the connection to which the current packet belongs will also arrive soon; the node accessed this time is therefore likely to be accessed again, in which case the lookup needs only one comparison. After the tree has undergone a series of accesses, the nodes accessed most recently and most frequently lie close to the root, which reduces the average number of traversal steps per lookup and raises the lookup speed.
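The self-adjusting behaviour described above can be illustrated with a compact recursive Splay tree. This is a textbook-style sketch, not the implementation of the invention; the names `Node`, `splay`, `insert`, and `find` are illustrative, and deletion is omitted for brevity. Keys may be any mutually comparable values, such as connection identifiers.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def _rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y


def _rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root, key):
    """Move the node holding key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:        # zig-zig: rotate the grandparent first
            root.left.left = splay(root.left.left, key)
            root = _rotate_right(root)
        elif key > root.left.key:      # zig-zag: rotate the parent first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = _rotate_left(root.left)
        return root if root.left is None else _rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:           # zig-zig
        root.right.right = splay(root.right.right, key)
        root = _rotate_left(root)
    elif key < root.right.key:         # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = _rotate_right(root.right)
    return root if root.right is None else _rotate_left(root)


def insert(root, key, value):
    """Insert or update key and return the new root, which holds key."""
    if root is None:
        return Node(key, value)
    root = splay(root, key)
    if root.key == key:
        root.value = value
        return root
    node = Node(key, value)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


def find(root, key):
    """Return (new_root, value); value is None when key is absent."""
    root = splay(root, key)
    if root is not None and root.key == key:
        return root, root.value
    return root, None
```

After `find` succeeds, the key sits at the root, so an immediately repeated lookup for the same connection is resolved by a single comparison, which matches the locality argument in the text.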
Further, performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information specifically comprises:
building keywords related to preset sensitive information into a sensitive dictionary;
comparing the restored text information with the keywords in the sensitive dictionary;
identifying the text in the restored text information that is identical to keywords in the sensitive dictionary, to obtain the sensitive text information;
calculating the leakage rate X of the sensitive text information according to the following formula;
where S is the total quantity of sensitive text information and S' is the total number of keywords in the sensitive dictionary.
Keywords are extracted by natural language processing to build the sensitive dictionary, which improves the efficiency of sensitive information recognition.
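The formula for X is not reproduced in this text. A form consistent with the definitions of S and S', and with the bound X ≤ 100% used later, would be X = S / S' × 100%; the sketch below assumes that form and further assumes that S counts the distinct dictionary keywords found in the restored text. Matching by plain substring search is also a simplification, and the function names are illustrative.

```python
def find_sensitive_keywords(text, sensitive_dictionary):
    """Return the dictionary keywords that occur in the restored text."""
    return {keyword for keyword in sensitive_dictionary if keyword in text}


def leakage_rate(text, sensitive_dictionary):
    """Assumed form X = S / S' * 100, expressed in percent."""
    if not sensitive_dictionary:
        return 0.0
    matched = find_sensitive_keywords(text, sensitive_dictionary)
    return 100.0 * len(matched) / len(sensitive_dictionary)
```

For example, a text containing two of four dictionary keywords would yield a leakage rate of 50% under these assumptions.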
Further, the method also comprises: setting a sensitive-information leakage recognition threshold;
comparing the sensitive-information leakage rate X with the threshold;
depending on the result of the comparison, the text is judged a suspected information leak, an information leak, or, at the upper end of the range with X ≤ 100%, a serious information leak.
Early warning, source tracing, and other handling are then carried out according to the degree of sensitive information leakage.
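The three-level judgment can be sketched as a simple range check. The threshold symbols and values did not survive in this text, so the two bounds `lower` and `upper`, their default values, and the choice of which side of each bound is inclusive are hypothetical; only the three labels and the upper limit of 100% come from the source.

```python
def classify_leak(x, lower=20.0, upper=60.0):
    """Map a leakage rate x (percent) to a severity label.

    lower and upper are hypothetical thresholds chosen for illustration.
    """
    if not 0.0 <= x <= 100.0:
        raise ValueError("leakage rate must lie between 0 and 100 percent")
    if x < lower:
        return "suspected information leak"
    if x < upper:
        return "information leak"
    return "serious information leak"
```

The returned label could then drive the early-warning and source-tracing steps that follow.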
In a specific embodiment, after it is determined that sensitive information exists, the leakage source is further traced from the source IP of the sensitive-information packet and the target host is located, so as to identify the party responsible for the leak and to block the information leak at its source.
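As a minimal sketch of the tracing step, the packets already judged to be leaks can be grouped by source IP so that the hosts responsible stand out. The `src_ip` field name and the ranking by packet count are assumptions; the source only states that the source IP is used to locate the host.

```python
from collections import Counter


def trace_leak_sources(leak_packets):
    """Rank source IP addresses by the number of leak packets they sent."""
    counts = Counter(packet["src_ip"] for packet in leak_packets)
    return counts.most_common()
```

The highest-ranked addresses are the natural candidates for source-based blocking.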
The method, device, and storage medium for detecting sensitive information leakage provided by the embodiments of the present invention have the following beneficial effects:
Network data packets to be detected are captured with DMA technology, which gives direct memory access to the data, avoids the system calls of memory copying, and improves access efficiency. Packets are clustered by destination IP network segment, packets with the same destination IP network segment are assigned by the clustering to the same CPU for stream restoration, and packets of different network segments are processed in parallel. Frequently hit nodes are found progressively earlier in the linear list, which reduces the number of comparisons per traversal and increases query speed. Keywords are extracted by natural language processing to build a sensitive dictionary, which improves the efficiency of sensitive information recognition. After sensitive information is found to exist, the leakage source is traced from the source IP of the sensitive-information packet and the target host is located, so as to identify the party responsible for the leak and to block the information leak at its source.
Referring to Fig. 2, a schematic structural diagram of a device for detecting sensitive information leakage provided by embodiment two of the present invention, the device comprises:
a packet capture unit 201, configured to collect network data packets to be detected;
a packet classification unit 202, configured to classify the network data packets to be detected;
a packet restoration unit 203, configured to send the network data packets to be detected to a plurality of processors respectively and to perform parallel TCP stream restoration with the processors to obtain restored text information;
a leakage calculation unit 204, configured to perform natural language processing on the restored text information, judge whether sensitive text information exists in the restored text information, and calculate the leakage rate of the sensitive text information;
a leakage source locating unit 205, configured to judge the network data packet containing the sensitive text information to be a sensitive-information-leakage data packet, trace the source address of the sensitive-information-leakage data packet, and locate the source of the sensitive information leakage.
Correspondingly, embodiment three of the present invention provides a device for detecting sensitive information leakage, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for detecting sensitive information leakage described in embodiment one of the present invention. The device for detecting sensitive information leakage may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The device may include, but is not limited to, a processor and a memory.
Correspondingly, embodiment four of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium comprises a stored computer program, and wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for detecting sensitive information leakage described in embodiment one of the present invention.
The processor may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the device for detecting sensitive information leakage and connects the various parts of the whole device through various interfaces and lines.
The memory may be used to store the computer program and/or modules. The processor implements the various functions of the device for detecting sensitive information leakage by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of a mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the integrated modules/units of the device for detecting sensitive information leakage are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on.
It should be noted that the device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the device embodiments provided by the present invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for detecting sensitive information leakage, characterized by comprising:
acquiring network data packets to be detected;
classifying the network data packets to be detected;
sending the network data packets to be detected to several processors respectively, and performing parallel TCP stream restoration with the processors to obtain restored text information;
performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information;
determining the network data packet containing the sensitive text information to be a sensitive information leakage data packet, performing source tracing analysis on the source address of the sensitive information leakage data packet, and locating the source of the sensitive information leakage.
2. The method for detecting sensitive information leakage according to claim 1, characterized in that acquiring the network data packets to be detected specifically comprises:
sending the captured network data packets to be detected into a pre-allocated address space;
wherein the address space corresponds to a buffer queue; the buffer queue uses a first-in, first-out mechanism, with the head of the queue used for reading data and the tail of the queue used for writing the received data to be analyzed.
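The buffer queue of claim 2 can be sketched as follows. This is only an illustration of the first-in, first-out behaviour, assuming a bounded in-memory queue; the class name `PacketBuffer`, the `capacity` parameter, and the policy of rejecting writes when full are assumptions, since the claim does not specify how the pre-allocated address space is sized or what happens on overflow.

```python
from collections import deque


class PacketBuffer:
    """Bounded FIFO buffer: write at the tail, read from the head."""

    def __init__(self, capacity):
        self._queue = deque()
        self._capacity = capacity  # stands in for the pre-allocated space

    def write(self, packet):
        """Append a captured packet at the tail; return False if full."""
        if len(self._queue) >= self._capacity:
            return False
        self._queue.append(packet)
        return True

    def read(self):
        """Remove and return the packet at the head, or None if empty."""
        return self._queue.popleft() if self._queue else None


buf = PacketBuffer(capacity=2)
accepted = [buf.write(b"pkt-1"), buf.write(b"pkt-2"), buf.write(b"pkt-3")]
first = buf.read()  # the oldest packet leaves the queue first
```

Separating the write end from the read end lets a capture thread and an analysis thread work on the same queue without reordering packets; a production version would need locking or a thread-safe queue.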
3. The method for detecting sensitive information leakage according to claim 1, characterized in that classifying the network data packets to be detected specifically comprises:
performing address resolution on the network data packets to be detected that were collected within a certain time period;
clustering the parsed network data packets to be detected according to the address segment of the destination address, and classifying them according to the clustering result.
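One way to read the classification step of claim 3 is as grouping packets by the network segment of their destination address. The sketch below assumes IPv4 addresses, a hypothetical `dst_ip` field, and a /24 prefix as the "address segment"; none of these details are given in the claim, and a real clustering step could use a different segment definition.

```python
import ipaddress
from collections import defaultdict


def classify_by_destination(packets, prefix_len=24):
    """Group packets by the network segment of their destination address.

    `prefix_len` (an assumed parameter) controls the size of each segment.
    """
    groups = defaultdict(list)
    for pkt in packets:
        # strict=False lets a host address stand in for its enclosing network.
        segment = ipaddress.ip_network(
            f"{pkt['dst_ip']}/{prefix_len}", strict=False
        )
        groups[str(segment)].append(pkt)
    return dict(groups)


packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9"},
    {"src_ip": "10.0.0.6", "dst_ip": "203.0.113.77"},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.2"},
]
classes = classify_by_destination(packets)
```

Each resulting class can then be handed to a different processor for the parallel restoration step.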
4. The method for detecting sensitive information leakage according to claim 1, characterized in that sending the network data packets to be detected to several processors respectively and performing parallel TCP stream restoration with the processors to obtain the restored text information specifically comprises:
taking the TCP connection identifiers SIP, SPT, DIP and DPT as the key, the hash chain table is calculated by the following formula:
calculating the hash chain table and assigning all TCP connections to the entries of the hash chain table, thereby realizing TCP stream restoration and obtaining the restored text information.
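The hashing formula of claim 4 is not reproduced in this text, so the sketch below substitutes a CRC32 over the four-tuple purely for illustration. It assumes that SIP, SPT, DIP and DPT denote source IP, source port, destination IP and destination port, that segments arrive as dictionaries with hypothetical `seq` and `payload` fields, and that restoration means concatenating payloads in sequence-number order. Retransmissions, overlapping segments and sequence-number wraparound are deliberately ignored.

```python
import zlib
from collections import defaultdict

TABLE_SIZE = 1024  # hypothetical number of hash table entries


def bucket_index(sip, spt, dip, dpt, table_size=TABLE_SIZE):
    """Map a connection four-tuple to a table entry (CRC32 is a stand-in)."""
    key = f"{sip}:{spt}-{dip}:{dpt}".encode()
    return zlib.crc32(key) % table_size


def restore_streams(segments):
    """Group segments by connection and join payloads in sequence order."""
    # table entry -> {connection id -> list of (seq, payload)}
    table = defaultdict(dict)
    for seg in segments:
        conn = (seg["sip"], seg["spt"], seg["dip"], seg["dpt"])
        entry = table[bucket_index(*conn)]
        entry.setdefault(conn, []).append((seg["seq"], seg["payload"]))

    restored = {}
    for entry in table.values():
        for conn, parts in entry.items():
            ordered = sorted(parts, key=lambda part: part[0])
            data = b"".join(payload for _, payload in ordered)
            restored[conn] = data.decode("utf-8", errors="replace")
    return restored


conn_id = ("10.0.0.5", 40000, "203.0.113.9", 80)
segments = [
    dict(zip(("sip", "spt", "dip", "dpt"), conn_id), seq=6, payload=b"world"),
    dict(zip(("sip", "spt", "dip", "dpt"), conn_id), seq=0, payload=b"hello "),
]
restored = restore_streams(segments)
```

Because every segment of a connection hashes to the same entry, entries can be processed independently, which is what makes the restoration step easy to spread across processors.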
5. The method for detecting sensitive information leakage according to claim 4, characterized by further comprising:
organizing all connections of each entry in the hash chain table with a Splay tree to obtain the corresponding connection identifier;
calculating a hash address from the connection identifier with a hash function, and then searching the Splay tree corresponding to that hash address, so that the restored text information can be looked up, deleted and modified.
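The combination of a hash table with one Splay tree per entry, as described in claim 5, can be sketched as below. This is a simplified illustration: the hash function (CRC32 over the tuple's `repr`) and the table size are assumptions, the splay routine is the textbook recursive version, and only lookup and insert-or-modify are shown (deletion is omitted to keep the example short).

```python
import zlib


class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None


def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y


def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y


def splay(root, key):
    """Move the node with `key` (or the last node visited) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                      # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                    # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                         # zig-zig
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:                       # zig-zag
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)


def insert(root, key, value):
    """Insert a key or overwrite its value; return the new root."""
    if root is None:
        return Node(key, value)
    root = splay(root, key)
    if root.key == key:
        root.value = value
        return root
    node = Node(key, value)
    if key < root.key:
        node.right, node.left, root.left = root, root.left, None
    else:
        node.left, node.right, root.right = root, root.right, None
    return node


class ConnectionTable:
    """Hash table whose entries are Splay trees keyed by connection id."""

    def __init__(self, size=64):
        self.buckets = [None] * size

    def _index(self, conn):
        return zlib.crc32(repr(conn).encode()) % len(self.buckets)

    def put(self, conn, text):
        i = self._index(conn)
        self.buckets[i] = insert(self.buckets[i], conn, text)

    def get(self, conn):
        i = self._index(conn)
        root = self.buckets[i] = splay(self.buckets[i], conn)
        return root.value if root and root.key == conn else None


table = ConnectionTable(size=1)  # size 1 forces every key into one tree
for port in (40001, 40003, 40002):
    table.put(("10.0.0.5", port, "203.0.113.9", 80), f"text-{port}")
found = table.get(("10.0.0.5", 40001, "203.0.113.9", 80))
```

A Splay tree moves recently accessed connections toward the root, so packets belonging to currently active connections tend to be found after only a few comparisons.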
6. The method for detecting sensitive information leakage according to claim 1, characterized in that performing natural language processing on the restored text information, judging whether sensitive text information exists in the restored text information, and calculating the leakage rate of the sensitive text information specifically comprises:
building keywords related to preset sensitive information into a sensitive dictionary;
comparing the restored text information with the keywords in the sensitive dictionary;
identifying the text in the restored text information that is identical to a keyword in the sensitive dictionary, to obtain the sensitive text information;
calculating the leakage rate X of the sensitive text information according to the following formula;
where S is the total number of items of sensitive text information and S' is the total number of keywords in the sensitive dictionary.
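The formula for X is not reproduced in this text. The sketch below therefore rests on an assumption: that X is the ratio S / S' expressed as a percentage, with S counted as the number of distinct dictionary keywords found in the restored text (which keeps X at or below 100%). Keyword matching is plain substring search; the function names and the sample dictionary are illustrative.

```python
def find_sensitive_terms(text, sensitive_dictionary):
    """Return the dictionary keywords that occur verbatim in the text."""
    return {kw for kw in sensitive_dictionary if kw in text}


def leakage_rate(text, sensitive_dictionary):
    """Leakage rate X in percent, ASSUMING X = S / S' * 100.

    S  = number of distinct dictionary keywords found in the text
    S' = total number of keywords in the sensitive dictionary
    """
    if not sensitive_dictionary:
        return 0.0
    matched = find_sensitive_terms(text, sensitive_dictionary)
    return 100.0 * len(matched) / len(sensitive_dictionary)


dictionary = {"password", "salary", "contract", "id number"}
text = "Attached: the contract and the salary sheet."
matched = find_sensitive_terms(text, dictionary)
rate = leakage_rate(text, dictionary)
```

If S were instead meant to count every occurrence of a keyword, the ratio could exceed 100%, which is why the distinct-keyword reading was chosen for this illustration.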
7. The method for detecting sensitive information leakage according to claim 6, characterized by further comprising:
setting a sensitive information leakage recognition threshold;
comparing the sensitive information leakage rate X with the threshold;
if [...], the text is judged to be a suspected information leakage; if [...] and [...], it is judged to be an information leakage; if [...] and X ≤ 100%, it is judged to be a serious information leakage.
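The three-level judgment of claim 7 can be sketched as a simple banding function. The threshold expressions are not legible in this text, so the cut-offs `low` and `high` below are purely hypothetical, as is the assumption that the bands are contiguous and that a rate of zero means nothing was detected.

```python
def classify_leakage(x, low=10.0, high=50.0):
    """Map a leakage rate x (in percent) to a severity label.

    `low` and `high` are hypothetical cut-offs; the patent's actual
    threshold conditions are not available in this text.
    """
    if not 0.0 <= x <= 100.0:
        raise ValueError("leakage rate must lie between 0 and 100 percent")
    if x == 0.0:
        return "no leakage detected"
    if x <= low:
        return "suspected information leakage"
    if x <= high:
        return "information leakage"
    return "serious information leakage"


labels = [classify_leakage(x) for x in (5.0, 30.0, 80.0)]
```

Keeping the cut-offs as parameters makes it easy to tune the severity bands to a particular network once the intended thresholds are known.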
8. A device for detecting sensitive information leakage, characterized by comprising:
a data packet acquisition unit, configured to acquire network data packets to be detected;
a data packet classification unit, configured to classify the network data packets to be detected;
a data packet restoration unit, configured to send the network data packets to be detected to several processors respectively and to perform parallel TCP stream restoration with the processors to obtain restored text information;
a leakage calculation unit, configured to perform natural language processing on the restored text information, judge whether sensitive text information exists in the restored text information, and calculate the leakage rate of the sensitive text information;
a leakage source locating unit, configured to determine the network data packet containing the sensitive text information to be a sensitive information leakage data packet, perform source tracing analysis on the source address of the sensitive information leakage data packet, and locate the source of the sensitive information leakage.
9. A device for detecting sensitive information leakage, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the method for detecting sensitive information leakage according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein, when the computer program runs, the device on which the computer-readable storage medium resides is controlled to execute the method for detecting sensitive information leakage according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910579777.0A CN110377977A (en) | 2019-06-28 | 2019-06-28 | Sensitive information leakage detection method and device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110377977A true CN110377977A (en) | 2019-10-25 |
Family
ID=68251312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910579777.0A Pending CN110377977A (en) | 2019-06-28 | 2019-06-28 | Sensitive information leakage detection method and device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377977A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191025 |