CN104375984A - Method for detecting sensitive tracks of uploaded files in network - Google Patents

Method for detecting sensitive tracks of uploaded files in network

Info

Publication number
CN104375984A
Authority
CN
China
Prior art keywords
sensitive
track
vocabulary
upload file
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201410668759.7A
Other languages
Chinese (zh)
Inventor
沈智广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUXI COCIS ELECTRONICS SCIENCE AND TECHNOLOGY Co Ltd
Original Assignee
WUXI COCIS ELECTRONICS SCIENCE AND TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-11-21
Filing date
2014-11-21
Publication date
2015-02-25
Application filed by WUXI COCIS ELECTRONICS SCIENCE AND TECHNOLOGY Co Ltd
Priority to CN201410668759.7A
Publication of CN104375984A
Current legal status: Withdrawn

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method for detecting sensitive tracks in files uploaded to a network. A sensitive track set is configured; sensitive words are searched for in an uploaded file and extracted in order, together with the character positions at which they occur; the extracted sensitive words form, in extraction order, a sensitive track to be detected; and that track is matched one by one against the sensitive tracks in the sensitive track set. After a successful match, the character spacing (character pitch) between every two adjacent sensitive words is compared with the character spacing threshold M to determine whether the sensitive track to be detected is a sensitive track in the uploaded file, so that the sensitive words are examined precisely rather than one keyword at a time.

Description

A method for detecting sensitive tracks in files uploaded to a network
Technical field
The present invention relates to the field of secure file uploading, and in particular to a method for detecting sensitive tracks in files uploaded to a network.
Background
With the development of networks, users can freely express their views online. Although this makes more complete information available, it also allows harmful speech to appear on the network, which in serious cases can even cause panic. Files uploaded to the network therefore need to be strictly monitored.
If every terminal server had to review file content manually, the process would be inefficient and would waste human resources. To improve efficiency, the method currently in common use is to configure keywords on the server and to screen the content of uploaded files against those keywords, thereby achieving a monitoring effect.
For example, application No. "200710308404.7", titled "Keyword blocking method for undesirable short messages", provides a keyword-based method for blocking undesirable short messages, comprising the following steps: (1) an operator or service provider provides in advance a keyword list for undesirable short messages, each entry of which contains two items: a keyword and the probability of occurrence of that keyword; (2) the user obtains all or a subset of the keyword list; (3) the obtained keywords are merged into the keyword list stored in the user's mobile phone; (4) the user's mobile phone directly filters out short messages containing the keywords according to the keyword list. That invention remedies practical shortcomings of the original "keyword method".
Application No. "201210479196.8", titled "Text filtering method based on keyword weights", provides a text filtering method based on keyword weights, comprising the following steps: calculating the weights of keywords; and filtering text based on the calculated keyword weights. The step of calculating a keyword weight comprises: determining whether the keyword is a brand-new keyword and, if so, calculating the number of correct decisions and the number of wrong decisions in the historical decision data, as well as the number of correct decisions and the number of wrong decisions that involve the keyword; and calculating the weight of the keyword. That application also provides a text filtering system based on keyword weights.
Although harmful information can be blocked effectively using single keywords and keyword weights, this approach also blocks uploaded files that merely contain a single keyword and do not actually constitute harmful information.
Summary of the invention
The technical problem solved by the present invention is to provide a method for detecting sensitive tracks in files uploaded to a network. The method uses sensitive-track matching to verify the sensitive words found in an uploaded file and then determines whether the file contains a sensitive track.
The technical solution that achieves the object of the invention is a method for detecting sensitive tracks in files uploaded to a network, comprising the following steps:
Step 1: Set up a sensitive lexicon for storing sensitive words;
Step 2: Set up a sensitive track set based on the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (that is, their order matters), and the character spacing threshold between two adjacent sensitive words is M, where M is a positive integer;
Step 3: Monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise, the uploaded file contains no sensitive track;
Step 4: Determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: Extract the sensitive words from the uploaded file in order, together with the character position at which each one occurs; form the extracted sensitive words, in order, into a sensitive track to be detected; then perform step 6;
Step 6: Match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2; once it matches one of the sensitive tracks, perform step 7;
Step 7: Determine whether the character spacing between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character spacing threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
In a further preferred scheme of the method, the character spacing threshold M is 20.
In a further preferred scheme of the method, extracting in order in step 5 means searching from the first character of the file and extracting every word that belongs to the sensitive words stored in the sensitive lexicon.
In a further preferred scheme of the method, forming the extracted sensitive words into a sensitive track to be detected in step 5 specifically means arranging the sensitive words in the order in which they were extracted.
In a further preferred scheme of the method, matching the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2 specifically means: if a sensitive track contains the sensitive track to be detected, the sensitive track to be detected matches that sensitive track.
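The seven steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names (`extract_hits`, `is_contained`, `contains_sensitive_track`) are hypothetical, positions are 0-based, overlapping lexicon entries are resolved by taking the longest match, "character spacing" is read as the difference between the start positions of adjacent sensitive words, and "contains" in step 6 is read as a contiguous, order-preserving run.

```python
from typing import List, Optional, Tuple


def extract_hits(text: str, lexicon: List[str]) -> List[Tuple[str, int]]:
    """Steps 3 and 5: scan from the first character, collecting (word, position)."""
    hits: List[Tuple[str, int]] = []
    i = 0
    while i < len(text):
        # Assumption: when several lexicon words start here, keep the longest.
        word: Optional[str] = max(
            (w for w in lexicon if w and text.startswith(w, i)),
            key=len,
            default=None,
        )
        if word is None:
            i += 1
        else:
            hits.append((word, i))
            i += len(word)
    return hits


def is_contained(candidate: List[str], track: List[str]) -> bool:
    """Step 6 (assumed reading): candidate is a contiguous run inside track."""
    n = len(candidate)
    return any(track[k:k + n] == candidate for k in range(len(track) - n + 1))


def contains_sensitive_track(
    text: str, lexicon: List[str], tracks: List[List[str]], m: int = 20
) -> bool:
    hits = extract_hits(text, lexicon)
    if len(hits) < 2:  # steps 3 and 4: zero or one sensitive word
        return False
    candidate = [word for word, _ in hits]  # step 5: track to be detected
    if not any(is_contained(candidate, t) for t in tracks):  # step 6
        return False
    positions = [pos for _, pos in hits]
    # Step 7: every adjacent pair must be at most m characters apart.
    return all(b - a <= m for a, b in zip(positions, positions[1:]))
```

Read literally, step 5 builds one track from every sensitive word in the file, so a file that repeats a word or mixes words from several tracks would not match under this sketch; the source does not say how such cases are handled.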
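The matching rule in the last preferred scheme says only that a sensitive track "contains" the track to be detected, which leaves two readings open. The sketch below shows both for comparison; the function names are hypothetical, and the source does not state which reading is intended.

```python
from typing import Sequence


def contained_contiguously(candidate: Sequence[str], track: Sequence[str]) -> bool:
    """Reading 1: candidate appears as an unbroken, ordered run inside track."""
    n = len(candidate)
    return n > 0 and any(
        list(track[k:k + n]) == list(candidate)
        for k in range(len(track) - n + 1)
    )


def contained_as_subsequence(candidate: Sequence[str], track: Sequence[str]) -> bool:
    """Reading 2: candidate appears in order, but words of track may be skipped."""
    remaining = iter(track)
    # `word in remaining` advances the iterator, so order is enforced.
    return len(candidate) > 0 and all(word in remaining for word in candidate)
```

Both readings respect the directionality required by step 2: a candidate whose words appear in the reverse order is rejected by either function. They differ only for candidates that skip a word of the track, such as tomorrow evening-big bang.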
Compared with the prior art, the present invention has the following notable advantages:
(1) A sensitive track set is configured; sensitive words are searched for in the uploaded file and extracted in order together with their character positions; the extracted sensitive words form, in extraction order, a sensitive track to be detected; and each sensitive track to be detected is matched one by one against the sensitive tracks in the sensitive track set. This improves the speed of searching for associated keywords.
(2) After a successful match, the character spacing between every two adjacent sensitive words is compared with the character spacing threshold M to determine whether the sensitive track to be detected is a sensitive track in the uploaded file, so that the sensitive words are examined precisely.
Embodiment
The method of the present invention for detecting sensitive tracks in files uploaded to a network comprises the following steps:
Step 1: Set up a sensitive lexicon for storing sensitive words;
Step 2: Set up a sensitive track set based on the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (that is, their order matters), and the character spacing threshold between two adjacent sensitive words is M, where M is a positive integer;
Step 3: Monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise, the uploaded file contains no sensitive track;
Step 4: Determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: Extract the sensitive words from the uploaded file in order, together with the character position at which each one occurs; form the extracted sensitive words, in order, into a sensitive track to be detected; then perform step 6;
Step 6: Match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2; once it matches one of the sensitive tracks, perform step 7;
Step 7: Determine whether the character spacing between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character spacing threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
The invention first configures a sensitive track set, searches the uploaded file for sensitive words, and extracts them in order together with their character positions. The extracted sensitive words form, in extraction order, a sensitive track to be detected, and each sensitive track to be detected is matched one by one against the sensitive tracks in the sensitive track set. After a successful match, the character spacing between every two adjacent sensitive words is compared with the character spacing threshold M to determine whether the sensitive track to be detected is a sensitive track in the uploaded file, so that the sensitive words are examined precisely.
Embodiment 1
A method for detecting sensitive tracks in files uploaded to a network comprises the following steps:
Step 1: Set up a sensitive lexicon for storing sensitive words;
Step 2: Set up a sensitive track set based on the sensitive words in the sensitive lexicon, wherein each sensitive track consists of at least two sensitive words, the sensitive words in a track are directional (that is, their order matters), and the character spacing threshold M between two adjacent sensitive words is 20;
Step 3: Monitor uploaded files and search each uploaded file for sensitive words; if any are found, perform step 4; otherwise, the uploaded file contains no sensitive track;
Step 4: Determine the number of sensitive words contained in the uploaded file; if it contains only one sensitive word, the uploaded file contains no sensitive track; if it contains two or more sensitive words, perform step 5;
Step 5: Extract the sensitive words from the uploaded file in order, together with the character position at which each one occurs, where extracting in order means searching from the first character of the file and extracting every word that belongs to the sensitive words stored in the sensitive lexicon; form the extracted sensitive words, in order, into a sensitive track to be detected; then perform step 6;
Step 6: Match the sensitive track to be detected obtained in step 5 against each sensitive track in the sensitive track set of step 2, where the matching rule is: if a sensitive track contains the sensitive track to be detected, the sensitive track to be detected matches that sensitive track; once it matches one of the sensitive tracks, perform step 7;
Step 7: Determine whether the character spacing between every two adjacent sensitive words in the sensitive track to be detected is less than or equal to the character spacing threshold M; if so, the sensitive track to be detected is judged to be a sensitive track in the uploaded file; otherwise, the uploaded file is judged to contain no sensitive track.
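The in-order extraction described in step 5 of this embodiment can also be sketched with a regular-expression scan. This is only one possible way to do it and is not taken from the source: the function name `extract_in_order` is hypothetical, positions are 0-based, and longer lexicon entries are tried first so that a short word does not mask a longer one starting at the same character.

```python
import re
from typing import List, Tuple


def extract_in_order(text: str, lexicon: List[str]) -> List[Tuple[str, int]]:
    """Return (sensitive word, start position) pairs in order of appearance."""
    # Drop empty entries and duplicates; try longer words first.
    words = sorted((w for w in set(lexicon) if w), key=len, reverse=True)
    if not words:
        return []
    # re.escape keeps characters such as "." in a word from acting as wildcards.
    pattern = re.compile("|".join(re.escape(w) for w in words))
    # finditer scans left to right from the first character, without overlaps.
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]
```

The list of words alone gives the sensitive track to be detected, and the positions are kept for the spacing check in step 7.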
Embodiment 2
In this embodiment, the sensitive lexicon contains the sensitive words "big bang", "Nanjing", "tomorrow evening", and so on. One sensitive track is configured from this sensitive lexicon: tomorrow evening-Nanjing-big bang, and the character spacing threshold between adjacent words among these three sensitive words is 20.
Network uploaded file 1 and network uploaded file 2 are each searched. In uploaded file 1, only the sensitive word "tomorrow evening" is found. Because uploaded file 1 contains only one sensitive word, it is considered to contain no sensitive track.
In uploaded file 2, the two sensitive words "Nanjing" and "big bang" are found and extracted in order as "Nanjing 20" and "big bang 24", where 20 and 24 are the character positions of "Nanjing" and "big bang" in uploaded file 2. The two sensitive words form, in extraction order, the sensitive track to be detected: Nanjing-big bang. This track is matched against the configured sensitive track. Because the sensitive track contains the sensitive track to be detected, and the character spacing between "Nanjing" and "big bang" in uploaded file 2 is 4 (24 minus 20), which is less than the character spacing threshold of 20, the sensitive track to be detected is considered to be a sensitive track in uploaded file 2.
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementation. A person of ordinary skill in the art can make other changes in different forms on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious changes or variations derived from the substance of the present invention still fall within its scope of protection.
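The walk-through in Embodiment 2 can be checked with a few lines of Python. The sketch starts from the already extracted (word, position) pairs rather than from file text, since the file contents are not given. The helper names `adjacent_gaps` and `judge` are hypothetical, the position used for uploaded file 1 is a placeholder (the source does not state it), and containment is assumed to mean a contiguous, ordered run.

```python
from typing import List, Tuple

M = 20  # character spacing threshold from Embodiment 2
TRACK_SET = [["tomorrow evening", "Nanjing", "big bang"]]

# Position 0 for file 1 is a placeholder; the source gives no position.
HITS_FILE_1: List[Tuple[str, int]] = [("tomorrow evening", 0)]
# Positions 20 and 24 are the ones stated in the embodiment.
HITS_FILE_2: List[Tuple[str, int]] = [("Nanjing", 20), ("big bang", 24)]


def adjacent_gaps(hits: List[Tuple[str, int]]) -> List[int]:
    """Spacing between adjacent sensitive words, as position differences."""
    return [b - a for (_, a), (_, b) in zip(hits, hits[1:])]


def judge(hits: List[Tuple[str, int]], tracks: List[List[str]], m: int) -> bool:
    if len(hits) < 2:  # step 4: a single sensitive word is not a track
        return False
    words = [w for w, _ in hits]
    n = len(words)
    matched = any(
        t[k:k + n] == words for t in tracks for k in range(len(t) - n + 1)
    )  # step 6: the configured track contains the track to be detected
    return matched and all(g <= m for g in adjacent_gaps(hits))  # step 7


print(judge(HITS_FILE_1, TRACK_SET, M))  # False: only one sensitive word
print(adjacent_gaps(HITS_FILE_2))        # [4], i.e. 24 - 20
print(judge(HITS_FILE_2, TRACK_SET, M))  # True: matched and 4 <= 20
```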

Claims (5)

1. the method for responsive track in Sampling network upload file, is characterized in that, specifically comprise the following steps:
Step 1, responsive lexicon is set, for storing responsive vocabulary;
Step 2, arrange responsive track collection according to the responsive vocabulary in responsive lexicon, wherein each responsive track is made up of at least two responsive vocabulary, and responsive vocabulary has aeoplotropism, and the character pitch threshold value between adjacent two responsive vocabulary is M, M is positive integer;
Step 3, monitoring upload file, search in upload file whether comprise responsive vocabulary, if comprise, performs step 4, otherwise do not comprise responsive track in this upload file;
Step 4, judge the quantity of the responsive vocabulary comprised in this upload file when only comprising a responsive vocabulary, then not comprise responsive track in this upload file; When comprising two or more responsive vocabulary, perform step 5;
Step 5, from this upload file, extract responsive vocabulary and current residing character position thereof in order, after the responsive vocabulary extracted is formed responsive track to be detected in order, perform step 6;
Step 6, concentrate each responsive track to mate with the responsive track in step 2 the to be detected responsive track obtained in step 5, when with a wherein responsive path matching success after, execution step 7;
Step 7, judge whether the character pitch in responsive track to be detected between adjacent two responsive vocabulary is less than or equal to character pitch threshold value M, if be less than or equal to, then judge that this responsive track to be detected is the responsive track in this upload file; Otherwise, judge not comprise responsive track in this upload file.
2. the method for responsive track in Sampling network upload file according to claim 1, it is characterized in that, character pitch threshold value M is 20.
3. the method for responsive track in Sampling network upload file according to claim 1, is characterized in that, be extracted as in order and search from file first character in step 5, extract and belong in responsive lexicon the responsive vocabulary stored.
4. the method for responsive track in Sampling network upload file according to claim 1, it is characterized in that, the responsive vocabulary extracted is formed responsive track to be detected by step 5 in order, is specially and responsive vocabulary is formed responsive track to be detected according to the sequencing extracted.
5. the method for responsive track in Sampling network upload file according to claim 1, it is characterized in that, concentrate each responsive track to mate with the responsive track in step 2 the to be detected responsive track obtained in step 5, be specially: comprise responsive track to be detected in responsive track and be this responsive track to be detected and the success of responsive path matching.
CN201410668759.7A 2014-11-21 2014-11-21 Method for detecting sensitive tracks of uploaded files in network Withdrawn CN104375984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410668759.7A CN104375984A (en) 2014-11-21 2014-11-21 Method for detecting sensitive tracks of uploaded files in network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410668759.7A CN104375984A (en) 2014-11-21 2014-11-21 Method for detecting sensitive tracks of uploaded files in network

Publications (1)

Publication Number Publication Date
CN104375984A true CN104375984A (en) 2015-02-25

Family

ID=52554907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410668759.7A Withdrawn CN104375984A (en) 2014-11-21 2014-11-21 Method for detecting sensitive tracks of uploaded files in network

Country Status (1)

Country Link
CN (1) CN104375984A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10240759A (en) * 1997-02-28 1998-09-11 Sharp Corp Retrieval device
CN1403965A (en) * 2001-09-05 2003-03-19 联想(北京)有限公司 Jamproof theme word extracting method
CN101477544A (en) * 2009-01-12 2009-07-08 腾讯科技(深圳)有限公司 Rubbish text recognition method and system
CN102779176A (en) * 2012-06-27 2012-11-14 北京奇虎科技有限公司 System and method for key word filtering

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105847516A (en) * 2016-05-28 2016-08-10 腾讯科技(深圳)有限公司 Method and device for managing contact person information
CN105847516B (en) * 2016-05-28 2019-04-26 腾讯科技(深圳)有限公司 A kind of method for managing contact person information and device

Similar Documents

Publication Publication Date Title
CN103336766A (en) Short text garbage identification and modeling method and device
US20120159625A1 (en) Malicious code detection and classification system using string comparison and method thereof
US20150207704A1 (en) Public opinion information display system and method
US20180173450A1 (en) Method and Device for File Name Identification and File Cleaning
CN104700033A (en) Virus detection method and virus detection device
CN102542061B (en) Intelligent product classification method
CN102170640A (en) Mode library-based smart mobile phone terminal adverse content website identifying method
CN102646124A (en) Method for automatically identifying address information
CN105718795B (en) Malicious code evidence collecting method and system under Linux based on condition code
US20190005057A1 (en) Methods and Devices for File Folder Path Identification and File Folder Cleaning
CN101673266A (en) Method for searching audio and video contents
CN104317891A (en) Method and device for tagging pages
CN104317909A (en) Method and device for verifying data of points of interest
CN104951553B (en) A kind of accurate content of data processing is collected and data mining platform and its implementation
CN106650451A (en) Detection method and device
CN103902906A (en) Mobile terminal malicious code detecting method and system based on application icon
CN103324888A (en) Method and system for automatically extracting virus characteristics based on family samples
CN102194503B (en) Player and character code detection method and device for subtitle file
CN109857842B (en) Method and device for recognizing fault-reporting text
CN103823809A (en) Query phrase classification method and device, and classification optimization method and device
CN105224603A (en) Corpus acquisition methods and device
CN104375983A (en) Detection system of sensitive track in network uploaded file
CN104978523A (en) Malicious sample capture method and system based on network hot word recognition
CN104636340A (en) Webpage URL filtering method, device and system
CN109670153A (en) A kind of determination method, apparatus, storage medium and the terminal of similar model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C04 Withdrawal of patent application after publication (patent law 2001)
WW01 Invention patent application withdrawn after publication

Application publication date: 20150225