CN104375984A - Method for detecting sensitive tracks of uploaded files in network - Google Patents
- Publication number
- CN104375984A (application CN201410668759.7A)
- Authority
- CN
- China
- Prior art keywords
- sensitive
- track
- vocabulary
- upload file
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Machine Translation (AREA)
Abstract
The invention discloses a method for detecting sensitive tracks in files uploaded to a network. A sensitive track set is set up; sensitive words are searched for in an uploaded file and extracted in sequence together with the character positions at which they occur; the extracted sensitive words form a sensitive track to be detected according to the order of extraction; and the sensitive track to be detected is matched one by one against the sensitive tracks in the sensitive track set. After a successful match, whether the character pitch between every two adjacent sensitive words is less than or equal to the character pitch threshold M is judged, which determines whether the sensitive track to be detected is a sensitive track in the uploaded file. The sensitive words are thereby checked precisely rather than one keyword at a time.
Description
Technical field
The present invention relates to the field of secure file uploading, and in particular to a method for detecting sensitive tracks in files uploaded to a network.
Background art
With the development of the network, users can freely express opinions online. Although this makes fuller information available, it also allows harmful speech to appear on the network, which in serious cases can even incite panic. Files uploaded to the network therefore need to be strictly monitored.
If every terminal server had to review file content manually, efficiency would be very low and human resources would be wasted. To improve efficiency, the method commonly used at present is to configure keywords on the server and screen the content of uploaded files against those keywords to achieve monitoring.
For example, application No. "200710308404.7", entitled "Keyword prevention method for bad short messages", provides a keyword prevention method for bad short messages comprising the following steps: (1) an operator or service provider provides in advance a keyword list for bad short messages, each entry of which contains two items: a keyword and the probability of occurrence of that keyword; (2) the user obtains all or a subset of the keyword list; (3) the obtained keywords are merged into the keyword list inside the user's mobile phone; (4) the user's mobile phone directly filters short messages containing keywords according to the keyword list. That invention effectively remedies the practical shortcomings of the original "keyword method".
Application No. "201210479196.8", entitled "Text filtering method based on keyword weights", provides a text filtering method based on keyword weights comprising the following steps: calculating the weights of keywords; and filtering text based on the calculated weights. The step of calculating keyword weights comprises: judging whether the keyword is a brand-new keyword and, if so, calculating the number of correct decision data and the number of wrong decision data in the historical decision data, as well as the number of correct decision data and the number of wrong decision data that contain the keyword; and calculating the weight of the keyword. That application also provides a text filtering system based on keyword weights.
Although single keywords and keyword weights can effectively screen out harmful information, uploaded files that merely contain a single keyword and are not harmful are screened out as well.
Summary of the invention
The technical problem solved by the invention is to provide a method for detecting sensitive tracks in files uploaded to a network, which determines whether an uploaded file contains a sensitive track by matching sensitive tracks and then verifying the sensitive words in the uploaded file.
The technical solution for achieving the object of the invention is a method for detecting sensitive tracks in files uploaded to a network, comprising the following steps:
Step 1: set up a sensitive lexicon for storing sensitive words;
Step 2: set up a sensitive track set from the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (ordered), and the character pitch threshold between two adjacent sensitive words is M, M being a positive integer;
Step 3: monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise the uploaded file contains no sensitive track;
Step 4: determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: extract the sensitive words from the uploaded file in order, together with the character position of each, form the extracted sensitive words in order into a sensitive track to be detected, and then perform step 6;
Step 6: match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2; once it matches one of them, perform step 7;
Step 7: determine whether the character pitch between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character pitch threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
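The seven steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function names (`extract_hits`, `contains_track`, `has_sensitive_track`) are hypothetical, overlapping lexicon entries are resolved by preferring the longest word (the text does not say how), "matching" is taken to mean that a track in the set contains the candidate as a contiguous, ordered run, and character pitch is taken to be the difference between the start positions of adjacent words.

```python
from typing import List, Sequence, Tuple


def extract_hits(text: str, lexicon: Sequence[str]) -> List[Tuple[str, int]]:
    """Step 5: scan from the first character, collecting (word, position) in order."""
    hits = []
    i = 0
    while i < len(text):
        # Assumption: when several lexicon words start here, take the longest.
        word = max((w for w in lexicon if w and text.startswith(w, i)),
                   key=len, default=None)
        if word is None:
            i += 1
        else:
            hits.append((word, i))
            i += len(word)
    return hits


def contains_track(track: Sequence[str], candidate: Sequence[str]) -> bool:
    """Step 6 (assumed rule): candidate is a contiguous ordered run of track."""
    n = len(candidate)
    return any(list(track[k:k + n]) == list(candidate)
               for k in range(len(track) - n + 1))


def has_sensitive_track(text: str, lexicon: Sequence[str],
                        track_set: Sequence[Sequence[str]], m: int = 20) -> bool:
    hits = extract_hits(text, lexicon)
    if len(hits) < 2:                       # steps 3 and 4: zero or one word
        return False
    candidate = [w for w, _ in hits]        # step 5: track to be detected
    if not any(contains_track(t, candidate) for t in track_set):  # step 6
        return False
    positions = [p for _, p in hits]
    # Step 7: every adjacent pair must be at most m characters apart.
    return all(b - a <= m for a, b in zip(positions, positions[1:]))
```

A file containing a single lexicon word is never flagged under this sketch, which is the behaviour the method relies on to avoid the false positives of single-keyword screening.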
In a further preferred scheme of the method of the invention for detecting sensitive tracks in files uploaded to a network, the character pitch threshold M is 20.
In a further preferred scheme of the method, extracting in order in step 5 means searching from the first character of the file and extracting the words that belong to the sensitive words stored in the sensitive lexicon.
In a further preferred scheme of the method, forming the extracted sensitive words in order into a sensitive track to be detected in step 5 specifically means forming the sensitive words into the sensitive track to be detected according to the order in which they were extracted.
In a further preferred scheme of the method, matching the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2 specifically means: if a sensitive track in the set contains the sensitive track to be detected, the sensitive track to be detected matches that sensitive track successfully.
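The containment rule for matching can be illustrated with a short Python sketch. The text does not say whether "contains" means a contiguous run or any ordered subsequence; the sketch below assumes a contiguous, ordered run, and the function name `find_matching_track` is hypothetical.

```python
from typing import List, Optional, Sequence


def find_matching_track(candidate: Sequence[str],
                        track_set: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Return the first track that contains `candidate`, or None.

    Assumption: containment means `candidate` appears inside the track
    as a contiguous run in the same order (tracks are directional).
    """
    cand = list(candidate)
    n = len(cand)
    if n == 0:
        return None
    for track in track_set:
        t = list(track)
        for k in range(len(t) - n + 1):
            if t[k:k + n] == cand:
                return t
    return None
```

Because the comparison is order-sensitive, a candidate whose words appear in the reverse order of a stored track does not match, which reflects the directionality required in step 2.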
Compared with the prior art, the invention has the following notable advantages:
(1) By setting up a sensitive track set, searching the uploaded file for sensitive words, extracting the sensitive words and their character positions in order, forming the extracted sensitive words into sensitive tracks to be detected in extraction order, and matching each sensitive track to be detected one by one against the sensitive tracks in the set, the search speed for associated keywords can be improved.
(2) After a successful match, judging whether the character pitch between every two adjacent sensitive words is less than or equal to the character pitch threshold M confirms whether the sensitive track to be detected is a sensitive track in the uploaded file, so that the sensitive words are checked precisely.
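The pitch check in advantage (2) can be written as a single condition. The symbols are introduced here for illustration only: $p_1 < p_2 < \dots < p_n$ are assumed to be the character positions of the $n$ extracted sensitive words, and the character pitch between adjacent words is assumed to be the difference of their positions (which agrees with positions 20 and 24 giving a pitch of 4 in Embodiment 2).

```latex
\text{track confirmed} \iff p_{i+1} - p_i \le M \quad \text{for all } i \in \{1, \dots, n-1\}
```

With $M = 20$, positions $(20, 24)$ satisfy the condition because $24 - 20 = 4 \le 20$, whereas positions $(20, 45)$ would not because $45 - 20 = 25 > 20$.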
Detailed description
The method of the invention for detecting sensitive tracks in files uploaded to a network comprises the following steps:
Step 1: set up a sensitive lexicon for storing sensitive words;
Step 2: set up a sensitive track set from the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (ordered), and the character pitch threshold between two adjacent sensitive words is M, M being a positive integer;
Step 3: monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise the uploaded file contains no sensitive track;
Step 4: determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: extract the sensitive words from the uploaded file in order, together with the character position of each, form the extracted sensitive words in order into a sensitive track to be detected, and then perform step 6;
Step 6: match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2; once it matches one of them, perform step 7;
Step 7: determine whether the character pitch between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character pitch threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
The invention first sets up a sensitive track set, searches the uploaded file for sensitive words, extracts the sensitive words and their character positions in order, forms the extracted sensitive words into sensitive tracks to be detected in extraction order, and matches each sensitive track to be detected one by one against the sensitive tracks in the set. After a successful match, it judges whether the character pitch between every two adjacent sensitive words is less than or equal to the character pitch threshold M, which confirms whether the sensitive track to be detected is a sensitive track in the uploaded file, so that the sensitive words are checked precisely.
Embodiment 1
A method for detecting sensitive tracks in files uploaded to a network, comprising the following steps:
Step 1: set up a sensitive lexicon for storing sensitive words;
Step 2: set up a sensitive track set from the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (ordered), and the character pitch threshold M between two adjacent sensitive words is 20;
Step 3: monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise the uploaded file contains no sensitive track;
Step 4: determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: extract the sensitive words from the uploaded file in order, together with the character position of each, where extracting in order means searching from the first character of the file and extracting the words that belong to the sensitive words stored in the sensitive lexicon; form the extracted sensitive words in order into a sensitive track to be detected, and then perform step 6;
Step 6: match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2, the matching rule being: if a sensitive track in the set contains the sensitive track to be detected, the sensitive track to be detected matches that sensitive track successfully; once it matches one of them, perform step 7;
Step 7: determine whether the character pitch between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character pitch threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
Embodiment 2
In this embodiment, the sensitive lexicon contains the sensitive words "big bang", "Nanjing", "tomorrow evening" and so on. One sensitive track is set up from this sensitive lexicon: tomorrow evening-Nanjing-big bang, and the character pitch threshold between these three sensitive words is 20.
Network upload file 1 and network upload file 2 are searched separately. The sensitive word "tomorrow evening" is found in upload file 1; because upload file 1 contains only one sensitive word, it is considered to contain no sensitive track.
The two sensitive words "Nanjing" and "big bang" are found in upload file 2 and extracted in order as "Nanjing 20" and "big bang 24", where 20 and 24 are the character positions of "Nanjing" and "big bang" in upload file 2. The two sensitive words are formed in extraction order into the sensitive track to be detected: Nanjing-big bang. This sensitive track to be detected is matched against the sensitive track. Because the sensitive track contains the sensitive track to be detected, and the character pitch between "Nanjing" and "big bang" in upload file 2 is 4, which is less than the character pitch threshold of 20, the sensitive track to be detected is considered to be a sensitive track in upload file 2.
Obviously, the above embodiments are merely examples given to describe the invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or variations derived from the spirit of the invention remain within its scope of protection.
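The two upload files of Embodiment 2 can be replayed with a small Python sketch that works directly on the extracted (word, position) pairs. It is a sketch under assumptions: the function name `judge` is hypothetical, containment is assumed to mean a contiguous ordered run, character pitch is assumed to be the difference of positions, and the position 0 used for "tomorrow evening" in upload file 1 is a placeholder because the text gives no position for it.

```python
from typing import List, Sequence, Tuple


def judge(hits: Sequence[Tuple[str, int]],
          track_set: Sequence[Sequence[str]], m: int) -> bool:
    """Decide whether the extracted hits form a sensitive track."""
    if len(hits) < 2:
        return False                      # a single word is never a track
    words: List[str] = [w for w, _ in hits]
    n = len(words)
    matched = any(list(t[k:k + n]) == words
                  for t in track_set
                  for k in range(len(t) - n + 1))
    if not matched:
        return False
    positions = [p for _, p in hits]
    return all(b - a <= m for a, b in zip(positions, positions[1:]))


track_set = [["tomorrow evening", "Nanjing", "big bang"]]
M = 20

file1_hits = [("tomorrow evening", 0)]    # position 0 is a placeholder
file2_hits = [("Nanjing", 20), ("big bang", 24)]

file1_result = judge(file1_hits, track_set, M)   # only one sensitive word
file2_result = judge(file2_hits, track_set, M)   # pitch 24 - 20 = 4 <= 20
```

Under these assumptions `file1_result` is `False` and `file2_result` is `True`, matching the conclusions stated for upload file 1 and upload file 2.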
Claims (5)
1. A method for detecting sensitive tracks in files uploaded to a network, characterized by comprising the following steps:
Step 1: set up a sensitive lexicon for storing sensitive words;
Step 2: set up a sensitive track set from the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (ordered), and the character pitch threshold between two adjacent sensitive words is M, M being a positive integer;
Step 3: monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise the uploaded file contains no sensitive track;
Step 4: determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: extract the sensitive words from the uploaded file in order, together with the character position of each, form the extracted sensitive words in order into a sensitive track to be detected, and then perform step 6;
Step 6: match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2; once it matches one of them, perform step 7;
Step 7: determine whether the character pitch between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character pitch threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
2. The method for detecting sensitive tracks in files uploaded to a network according to claim 1, characterized in that the character pitch threshold M is 20.
3. The method for detecting sensitive tracks in files uploaded to a network according to claim 1, characterized in that extracting in order in step 5 means searching from the first character of the file and extracting the words that belong to the sensitive words stored in the sensitive lexicon.
4. The method for detecting sensitive tracks in files uploaded to a network according to claim 1, characterized in that forming the extracted sensitive words in order into a sensitive track to be detected in step 5 specifically means forming the sensitive words into the sensitive track to be detected according to the order in which they were extracted.
5. The method for detecting sensitive tracks in files uploaded to a network according to claim 1, characterized in that matching the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2 specifically means: if a sensitive track in the set contains the sensitive track to be detected, the sensitive track to be detected matches that sensitive track successfully.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410668759.7A CN104375984A (en) | 2014-11-21 | 2014-11-21 | Method for detecting sensitive tracks of uploaded files in network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410668759.7A CN104375984A (en) | 2014-11-21 | 2014-11-21 | Method for detecting sensitive tracks of uploaded files in network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104375984A (en) | 2015-02-25 |
Family
ID=52554907
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410668759.7A (Withdrawn) CN104375984A (en) | 2014-11-21 | 2014-11-21 | Method for detecting sensitive tracks of uploaded files in network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104375984A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10240759A (en) * | 1997-02-28 | 1998-09-11 | Sharp Corp | Retrieval device |
CN1403965A (en) * | 2001-09-05 | 2003-03-19 | 联想(北京)有限公司 | Jamproof theme word extracting method |
CN101477544A (en) * | 2009-01-12 | 2009-07-08 | 腾讯科技(深圳)有限公司 | Rubbish text recognition method and system |
CN102779176A (en) * | 2012-06-27 | 2012-11-14 | 北京奇虎科技有限公司 | System and method for key word filtering |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105847516A (en) * | 2016-05-28 | 2016-08-10 | 腾讯科技(深圳)有限公司 | Method and device for managing contact person information |
CN105847516B (en) * | 2016-05-28 | 2019-04-26 | 腾讯科技(深圳)有限公司 | A kind of method for managing contact person information and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C04 | Withdrawal of patent application after publication (patent law 2001) | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20150225 |