CN104079560A - Web address security detecting method and device and server - Google Patents

Info

Publication number
CN104079560A
Authority
CN
China
Prior art keywords
coding
code
network address
text
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410248182.4A
Other languages
Chinese (zh)
Inventor
张辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201410248182.4A
Publication of CN104079560A

Landscapes

  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The embodiment of the invention discloses a web address security detecting method and device and a server. The method comprises the steps that a JS code text is obtained from a page file corresponding to a web address according to the web address reported by a client side; the JS code text is converted into a code to be detected through the preset coding algorithm; the code to be detected is analyzed to determine the security of the web address. By means of the method, the device and the server, the security of the web address can be fast detected, and the security of the network of the client side is ensured.

Description

Web address security detection method, device, and server
Technical field
The present invention relates to the field of Internet technology, specifically to the field of network security technology, and in particular to a web address security detection method, device, and server.
Background
A web address is the address of a web page (or website) and may be a URL (Uniform Resource Locator). By security, web addresses can be divided into safe web addresses and malicious web addresses. A safe web address is the address of a legitimate website, for example the official web address of a major bank or of a shopping website. A malicious web address is the address of a web page used for fraud, counterfeiting, phishing, Trojan implantation, and the like, for example a web address that counterfeits a legitimate website; a client that visits a malicious web address may suffer harm such as leakage of the client's private information or infection by a Trojan or virus. With the development of Internet technology, how to detect the security of web addresses so as to protect the network security of clients has become a problem that urgently needs to be solved.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a web address security detection method, device, and server that can quickly detect the security of a web address and protect the network security of the client.
To solve the above technical problem, a first aspect of the embodiments of the present invention provides a web address security detection method, which may comprise:
obtaining JS (JavaScript, a client-side scripting language) code text from the web page file corresponding to a web address reported by a client;
converting the JS code text into a code to be detected using a preset encoding algorithm;
analyzing the code to be detected to determine the security of the web address.
Based on the first aspect, in a first implementation, obtaining JS code text from the web page file corresponding to the web address reported by the client may comprise:
downloading the web page file corresponding to the web address reported by the client;
parsing the web page file to obtain the source code text of the web page file;
extracting the JS code text from the source code text of the web page file.
Based on the first aspect or the first implementation of the first aspect, in a second implementation, the preset encoding algorithm is the Simhash algorithm (a locality-sensitive hashing algorithm); converting the JS code text into a code to be detected using the preset encoding algorithm comprises:
performing word segmentation on the JS code text to obtain at least one feature code;
encoding each feature code with a hash algorithm to obtain the hash code of each feature code;
weighting the hash code of each feature code to obtain the weight sequence of each feature code;
merging the weight sequences of the feature codes to obtain the weight sequence string corresponding to the JS code text;
performing dimension reduction on the weight sequence string corresponding to the JS code text to generate the Simhash code corresponding to the JS code text;
determining the Simhash code corresponding to the JS code text as the code to be detected.
Based on the second implementation of the first aspect, in a third implementation, analyzing the code to be detected to determine the security of the web address comprises:
judging whether a malicious sample code similar to the code to be detected exists in a malicious sample code database, where the malicious sample code database comprises at least one malicious sample code and each malicious sample code is a Simhash code;
if a malicious sample code similar to the code to be detected exists in the malicious sample code database, determining that the web address is a malicious web address.
Based on the third implementation of the first aspect, in a fourth implementation, judging whether a malicious sample code similar to the code to be detected exists in the malicious sample code database comprises:
comparing the code to be detected with each malicious sample code in the malicious sample code database, binary digit by binary digit;
if any malicious sample code in the malicious sample code database differs from the code to be detected in some binary digits, and the number of differing binary digits is less than a preset threshold, determining that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
A second aspect of the present invention provides a web address security detection device, which may comprise:
a text acquisition module, configured to obtain JS code text from the web page file corresponding to a web address reported by a client;
a coding module, configured to convert the JS code text into a code to be detected using a preset encoding algorithm;
a security detection module, configured to analyze the code to be detected to determine the security of the web address.
Based on the second aspect, in a first implementation, the text acquisition module comprises:
a download unit, configured to download the web page file corresponding to the web address reported by the client;
a parsing unit, configured to parse the web page file to obtain the source code text of the web page file;
a text extraction unit, configured to extract the JS code text from the source code text of the web page file.
Based on the second aspect or the first implementation of the second aspect, in a second implementation, the coding module comprises:
a word segmentation unit, configured to perform word segmentation on the JS code text to obtain at least one feature code;
a code calculation unit, configured to encode each feature code with a hash algorithm to obtain the hash code of each feature code;
a weighting unit, configured to weight the hash code of each feature code to obtain the weight sequence of each feature code;
a merging unit, configured to merge the weight sequences of the feature codes to obtain the weight sequence string corresponding to the JS code text;
a dimension reduction unit, configured to perform dimension reduction on the weight sequence string corresponding to the JS code text to generate the Simhash code corresponding to the JS code text;
a code determination unit, configured to determine the Simhash code corresponding to the JS code text as the code to be detected;
wherein the preset encoding algorithm is the Simhash algorithm.
Based on the second implementation of the second aspect, in a third implementation, the security detection module comprises:
a judging unit, configured to judge whether a malicious sample code similar to the code to be detected exists in a malicious sample code database, where the malicious sample code database comprises at least one malicious sample code and each malicious sample code is a Simhash code;
a security determination unit, configured to determine that the web address is a malicious web address when it is judged that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
Based on the third implementation of the second aspect, in a fourth implementation, the judging unit comprises:
a comparison subunit, configured to compare the code to be detected with each malicious sample code in the malicious sample code database, binary digit by binary digit;
a judging subunit, configured to determine that a malicious sample code similar to the code to be detected exists in the malicious sample code database when any malicious sample code in the database differs from the code to be detected in some binary digits and the number of differing binary digits is less than a preset threshold.
A third aspect of the present invention further provides a server, which may comprise the web address security detection device described in the second aspect.
Implementing the embodiments of the present invention has the following beneficial effects:
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiments can both detect the security of the web address and avoid the detection errors that arise when a malicious web address is encrypted and encapsulated with JS code, which effectively improves the accuracy of web address security detection and effectively protects the network security of the client.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a web address security detection method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of another web address security detection method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a web address security detection device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a text acquisition module provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of a coding module provided by an embodiment of the present invention;
Fig. 6 is a structural diagram of a security detection module provided by an embodiment of the present invention;
Fig. 7 is a structural diagram of a judging unit provided by an embodiment of the present invention;
Fig. 8 is a structural diagram of a server provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The web address security detection method provided by the embodiments of the present invention is described in detail below with reference to Figs. 1 and 2. It should be noted that the method shown in Figs. 1 and 2 may be performed by the web address security detection device provided by the embodiments of the present invention, and that this device may run in a server. In the embodiments of the present invention, the client may include, but is not limited to, a terminal device such as a PC (personal computer), a PAD (tablet computer), a mobile phone, a smartphone, or a notebook computer; alternatively, the client may be an application client on such a terminal device, for example a computer manager client on a PC or a security manager client on a mobile phone.
Referring to Fig. 1, which is a flowchart of a web address security detection method provided by an embodiment of the present invention, the method may comprise the following steps S101-S103.
S101, obtain JS code text from the web page file corresponding to the web address reported by the client.
Here, a web address is the address of a web page (or website) and may be a URL. By security, web addresses can be divided into safe web addresses and malicious web addresses. A safe web address is the address of a legitimate website, for example the official web address of a major bank or of a shopping website. A malicious web address is the address of a web page used for fraud, counterfeiting, phishing, Trojan implantation, and the like, for example a web address that counterfeits a legitimate website; a client that visits a malicious web address may suffer harm such as leakage of the client's private information or infection by a Trojan or virus. At present, the page text content corresponding to a malicious web address is usually encrypted and encapsulated with JS code to hide its malicious nature; for example, the page text content is encrypted with JS.document.write and then output. To counter this JS encapsulation, this step obtains the JS code text from the web page file corresponding to the web address reported by the client, for use in the subsequent security analysis. In the embodiments of the present invention, the web address reported by the client is the currently visited web address that the client collects from the browser address bar, so as to protect the client's current visit. It can be understood, however, that the embodiments are not limited to this; the web address reported by the client may also be, for example, a web address that the user enters and asks the client to query.
S102, convert the JS code text into a code to be detected using a preset encoding algorithm.
The preset encoding algorithm is preferably the Simhash algorithm. Simhash is a locality-sensitive hashing (LSH) algorithm: when the input content changes slightly, the hash value computed by Simhash stays the same or changes only slightly. It can be understood that the preset encoding algorithm may also be an algorithm of another type, and the embodiments of the present invention are not limited in this respect; for example, the preset encoding algorithm may also be a PHA (Perceptual Hashing) algorithm.
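The locality-sensitive property described above can be illustrated with a minimal, unweighted Simhash sketch. The function names, the 64-bit width, the use of MD5 as the per-token hash, and the sample script strings are all assumptions made for illustration; the patent does not prescribe them.

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Unweighted Simhash over whitespace-separated tokens (illustrative only)."""
    totals = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # A 1 bit votes +1 for position i, a 0 bit votes -1.
            totals[i] += 1 if (h >> i) & 1 else -1
    code = 0
    for i, total in enumerate(totals):
        if total > 0:
            code |= 1 << i
    return code

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")

# Hypothetical inputs: two scripts that differ in a single token.
original = "var a = 1 ; var b = 2 ; document . write ( a + b ) ;"
variant = "var a = 1 ; var b = 3 ; document . write ( a + b ) ;"

print(hamming(simhash(original), simhash(original)))  # identical input: 0
print(hamming(simhash(original), simhash(variant)))   # one changed token
```

With a cryptographic hash applied to the whole text, a one-token change would be expected to flip about half of the bits; with Simhash only the positions whose vote totals are close to zero can flip, so the second distance would normally be small.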
S103, analyze the code to be detected to determine the security of the web address.
Because the result of encrypting and encapsulating the page text content of a malicious web address with JS in its various ways generally does not vary greatly, this step can compare the code to be detected with the codes of the JS code texts corresponding to known malicious web addresses. If the two are similar, the web address to be detected that the client reported can be determined to be a malicious web address, so that the security of the web address is determined quickly.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can both detect the security of the web address and avoid the detection errors that arise when a malicious web address is encrypted and encapsulated with JS code, which effectively improves the accuracy of web address security detection and effectively protects the network security of the client.
Referring to Fig. 2, which is a flowchart of another web address security detection method provided by an embodiment of the present invention; in this embodiment the preset encoding algorithm is preferably the Simhash algorithm. The method may comprise the following steps S201-S211.
S201, download the web page file corresponding to the web address reported by the client.
Here, a web address is the address of a web page (or website) and may be a URL. By security, web addresses can be divided into safe web addresses and malicious web addresses. A safe web address is the address of a legitimate website, for example the official web address of a major bank or of a shopping website. A malicious web address is the address of a web page used for fraud, counterfeiting, phishing, Trojan implantation, and the like, for example a web address that counterfeits a legitimate website; a client that visits a malicious web address may suffer harm such as leakage of the client's private information or infection by a Trojan or virus. In this step, the web page file corresponding to the web address reported by the client can be downloaded from the web server. The web page file includes, but is not limited to, an HTML (HyperText Markup Language) file, a JS file, a CSS (Cascading Style Sheets) file, and so on. In the embodiments of the present invention, the web address reported by the client is the currently visited web address that the client collects from the browser address bar, so as to protect the client's current visit. It can be understood, however, that the embodiments are not limited to this; the web address reported by the client may also be, for example, a web address that the user enters and asks the client to query.
S202, parse the web page file to obtain the source code text of the web page file.
The source code text contains the page text content of the web page file and the way that content is presented; by running the source code text, the client can present the page text content of the web page file.
S203, extract the JS code text from the source code text of the web page file.
At present, the page text content corresponding to a malicious web address is usually encrypted and encapsulated with JS code to hide its malicious nature; for example, the page text content is encrypted with JS.document.write and then output. To counter this JS encapsulation, this step extracts the JS code text from the source code text of the web page file, so that it can be analyzed to decide whether it is maliciously encapsulated JS text and the security of the web address can thereby be determined.
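Steps S202-S203 can be sketched as follows for the case of an HTML file whose JS is embedded in inline script elements. This is a minimal sketch using Python's standard html.parser module; the class name ScriptExtractor, the function extract_js, and the sample page are hypothetical, and externally referenced JS files (which the patent also lists as web page files) are not fetched here.

```python
from html.parser import HTMLParser

class ScriptExtractor(HTMLParser):
    """Collects the text content of inline <script> elements."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.scripts[-1] += data

def extract_js(html_source: str) -> str:
    """Return all inline JS in the page, one script per line group."""
    parser = ScriptExtractor()
    parser.feed(html_source)
    parser.close()
    return "\n".join(s.strip() for s in parser.scripts if s.strip())

# Hypothetical page source standing in for a downloaded web page file.
page = "<html><body><p>hi</p><script>document.write('x');</script></body></html>"
print(extract_js(page))
```

Script elements that only carry a src attribute contribute empty text and are skipped; a fuller implementation would presumably download those files as well.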
Steps S201-S203 may be a specific refinement of step S101 in the embodiment shown in Fig. 1.
S204, perform word segmentation on the JS code text to obtain at least one feature code.
Various word segmentation methods can be used flexibly to segment the JS code text, including but not limited to whitespace-based segmentation and statistics-based segmentation. The purpose of word segmentation is to extract feature codes that characterize the JS code text. For example, for the JS code text formed by JS-encapsulating the page content text "development trend of China's Internet technology", the word segmentation of this step yields the following feature codes: the JS code of "China", the JS code of "Internet", the JS code of "technology", the JS code of "development", and the JS code of "trend".
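A simple segmentation of this kind can be sketched as below. The function name segment and the regular expression are assumptions: the sketch splits on whitespace and additionally treats each punctuation character as its own token, which goes slightly beyond plain whitespace-based segmentation and is only one of many possible choices.

```python
import re

def segment(js_text: str) -> list[str]:
    """Split JS text into feature codes: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\s\w]", js_text)

print(segment("var a = 1; document.write(a);"))
# ['var', 'a', '=', '1', ';', 'document', '.', 'write', '(', 'a', ')', ';']
```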
S205, encode each feature code with a hash algorithm to obtain the hash code of each feature code.
Each feature code corresponds to one hash code. Encoding a feature code as a hash code in this step is the process of converting a piece of JS code into a binary value.
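As a minimal sketch of S205, a feature code can be mapped to a fixed-width binary string. The patent does not name a specific hash algorithm; MD5 truncated to 64 bits, and the function name hash_code, are assumptions chosen for illustration.

```python
import hashlib

def hash_code(feature: str, bits: int = 64) -> str:
    """Map a feature code to a fixed-width string of '0'/'1' characters."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest[: bits // 8], "big")
    # Zero-pad on the left so every feature code yields exactly `bits` digits.
    return format(value, f"0{bits}b")

print(hash_code("document"))
```

Any hash with well-mixed output bits would serve the same purpose; the fixed width matters because the later steps combine the hash codes position by position.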
S206, weight the hash code of each feature code to obtain the weight sequence of each feature code.
A weight can be assigned to each feature code according to its importance in the JS code text. For example, continuing the example in step S204, the weight levels may be set to 1-5, where a larger weight indicates that the feature code is more important in the JS code text: the JS code of "China" may have weight 4, the JS code of "Internet" weight 5, the JS code of "technology" weight 3, the JS code of "development" weight 4, and the JS code of "trend" weight 2. In this step, the hash code of each feature code is weighted with that feature code's own weight to form a weighted number string, which is the weight sequence of the feature code. It should be noted that, during weighting, a binary digit of 1 contributes the positive weight and a binary digit of 0 contributes the negative weight. For example, if the hash code of the JS code of "China" is "100101", weighting it with weight 4 gives the weighted number string (4, -4, -4, 4, -4, 4), which is the weight sequence of the feature code for "China".
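The weighting rule of S206 can be sketched directly from the example above. The function name weight_sequence is hypothetical; the hash code "100101" and the weight 4 are the values used in the text.

```python
def weight_sequence(hash_bits: str, weight: int) -> list[int]:
    """A '1' bit contributes +weight, a '0' bit contributes -weight."""
    return [weight if bit == "1" else -weight for bit in hash_bits]

# The "China" example from the text: hash code "100101" with weight 4.
print(weight_sequence("100101", 4))  # [4, -4, -4, 4, -4, 4]
```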
S207, merge the weight sequences of the feature codes to obtain the weight sequence string corresponding to the JS code text.
After step S206, each feature code has a corresponding weight sequence. This step adds the weight sequences of all feature codes position by position and merges them into one weight sequence string, which characterizes the JS code text. For example, if the weight sequence of the JS code of "China" is (4, -4, -4, 4, -4, 4) and the weight sequence of the JS code of "Internet" is (5, -5, 5, -5, 5, 5), merging the two computes (4+5, -4-5, -4+5, 4-5, -4+5, 4+5) and gives the weight sequence string (9, -9, 1, -1, 1, 9). The weight sequence string corresponding to the whole JS code text is obtained in the same way.
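The position-by-position accumulation of S207 can be sketched as follows. The function name merge and the length check are assumptions; the two input sequences are the "China" and "Internet" examples from the text.

```python
def merge(sequences: list[list[int]]) -> list[int]:
    """Add weight sequences position by position."""
    if not sequences:
        raise ValueError("at least one weight sequence is required")
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all weight sequences must have the same length")
    return [sum(column) for column in zip(*sequences)]

china = [4, -4, -4, 4, -4, 4]
internet = [5, -5, 5, -5, 5, 5]
print(merge([china, internet]))  # [9, -9, 1, -1, 1, 9]
```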
S208, perform dimension reduction on the weight sequence string corresponding to the JS code text to generate the Simhash code corresponding to the JS code text.
In this step, the purpose of dimension reduction is to convert the weight sequence string corresponding to the JS code text into a binary code, thereby generating the Simhash code corresponding to the JS code text. It should be noted that, during dimension reduction, each position of the weight sequence string that is greater than 0 is set to 1 and each position that is less than 0 is set to 0. For example, after dimension reduction the weight sequence string (9, -9, 1, -1, 1, 9) gives the Simhash code "101011".
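The dimension reduction of S208 reduces to a sign test per position, sketched below with the example from the text. The function name reduce_to_bits is hypothetical, and mapping a position that is exactly 0 to the digit 0 is an assumption, since the text only covers values greater than or less than 0.

```python
def reduce_to_bits(weight_string: list[int]) -> str:
    """Positive positions become '1'; all other positions become '0'."""
    # Assumption: a position equal to 0 is treated like a negative one.
    return "".join("1" if value > 0 else "0" for value in weight_string)

print(reduce_to_bits([9, -9, 1, -1, 1, 9]))  # 101011
```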
S209, determine the Simhash code corresponding to the JS code text as the code to be detected.
Steps S204-S209 may be a specific refinement of step S102 in the embodiment shown in Fig. 1.
S210, judge whether a malicious sample code similar to the code to be detected exists in the malicious sample code database; if so, go to step S211; otherwise, end.
The malicious sample code database comprises at least one malicious sample code, and a malicious sample code is the code obtained by encoding, with the preset encoding algorithm, the JS code text corresponding to a malicious web address. The preset encoding algorithm is preferably the Simhash algorithm, and the malicious sample codes are preferably Simhash codes. In this step, if a malicious sample code similar to the code to be detected exists in the malicious sample code database, it follows from the characteristics of the Simhash algorithm that the code to be detected is itself a malicious sample code, so the flow proceeds to step S211 to determine that the web address reported by the client is a malicious web address.
The judgment in step S210 may follow the flow below:
A. Compare the code to be detected with each malicious sample code in the malicious sample code database, binary digit by binary digit.
For example, if the malicious sample code database contains three malicious sample codes a, b, and c, and the code to be detected is k, step A compares k with a, k with b, and k with c, binary digit by binary digit.
B. If any malicious sample code in the malicious sample code database differs from the code to be detected in some binary digits, and the number of differing binary digits is less than a preset threshold, determine that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
Continuing the example in step A, if k differs from any one of a, b, and c in some binary digits and the number of differing binary digits is less than the preset threshold, it can be determined that a malicious sample code similar to k exists in the malicious sample code database. For example, suppose k, a, b, and c are all 6-bit binary values and the preset threshold is 2; if k and a differ in only 1 binary digit, k is judged to be similar to a, that is, a malicious sample code similar to the code to be detected exists in the malicious sample code database. Conversely, if the number of binary digits in which k differs from each of a, b, and c is greater than or equal to the preset threshold, k is judged to be dissimilar to all of a, b, and c, and it is determined that no malicious sample code similar to the code to be detected exists in the malicious sample code database.
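Steps A and B amount to a Hamming-distance search over the sample database, sketched below. The function names and the concrete 6-bit values for k, a, b, and c are hypothetical (the text gives only their width and the threshold 2). The sketch also treats an identical code (zero differing digits) as similar, which is an assumption beyond the literal wording of step B.

```python
from typing import Optional

def hamming_distance(a: str, b: str) -> int:
    """Count the binary digits in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))

def find_similar(code: str, samples: list[str], threshold: int) -> Optional[str]:
    """Return the first sample differing in fewer than `threshold` digits, else None."""
    for sample in samples:
        if hamming_distance(code, sample) < threshold:
            return sample
    return None

# Hypothetical 6-bit codes; the threshold 2 follows the example in the text.
k = "101011"
a, b, c = "101010", "010100", "111111"
print(find_similar(k, [b, a, c], threshold=2))  # 101010
print(find_similar(k, [b, c], threshold=2))     # None
```

Here k and a differ in 1 digit, k and b in 6 digits, and k and c in 2 digits, so only a falls below the threshold. A linear scan is adequate for a sketch; a large sample database would presumably need an index over the codes.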
S211, determine that the web address is a malicious web address.
Steps S210-S211 of this embodiment may be a specific refinement of step S103 in the embodiment shown in Fig. 1.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can both detect the security of the web address and avoid the detection errors that arise when a malicious web address is encrypted and encapsulated with JS code, which effectively improves the accuracy of web address security detection and effectively protects the network security of the client.
The web address security detection device provided by the embodiments of the present invention is described in detail below with reference to Figs. 3-7. It should be noted that the device shown in Figs. 3-7 may run in a server and is used to perform the web address security detection method shown in Figs. 1 and 2.
Referring to Fig. 3, which is a structural diagram of a web address security detection device provided by an embodiment of the present invention, the device may comprise a text acquisition module 101, a coding module 102, and a security detection module 103.
The text acquisition module 101 is configured to obtain JS code text from the web page file corresponding to the web address reported by the client.
Here, a web address is the address of a web page (or website) and may be a URL. By security, web addresses can be divided into safe web addresses and malicious web addresses. A safe web address is the address of a legitimate website, for example the official web address of a major bank or of a shopping website. A malicious web address is the address of a web page used for fraud, counterfeiting, phishing, Trojan implantation, and the like, for example a web address that counterfeits a legitimate website; a client that visits a malicious web address may suffer harm such as leakage of the client's private information or infection by a Trojan or virus. At present, the page text content corresponding to a malicious web address is usually encrypted and encapsulated with JS code to hide its malicious nature; for example, the page text content is encrypted with JS.document.write and then output. To counter this JS encapsulation, the text acquisition module 101 obtains the JS code text from the web page file corresponding to the web address reported by the client, for use in the subsequent security analysis. In the embodiments of the present invention, the web address reported by the client is the currently visited web address that the client collects from the browser address bar, so as to protect the client's current visit. It can be understood, however, that the embodiments are not limited to this; the web address reported by the client may also be, for example, a web address that the user enters and asks the client to query.
Coding module 102, for adopting default encryption algorithm that described JS code text is converted to coding to be detected.
Wherein, described default encryption algorithm is preferably Simhash algorithm.Simhash algorithm is the one of Local Sensitive Hash (local sensitivity Hash) algorithm, its feature is local sensitivity, in the time that a small amount of variation occurs input content, calculate by Simhash algorithm the hash value obtaining constant or slight variation only occur.Be understandable that, described default encryption algorithm can also be the algorithm of other types, the embodiment of the present invention does not limit this, for example: default encryption algorithm can also adopt PHA (Perceptual Hashing, perception Hash) algorithm etc.
Safety detection module 103, for analyzing described coding to be detected to determine the fail safe of described network address.
The JS that carries out variety of way due to page text content corresponding to malice network address encrypts the general variation of encapsulation can great changes will take place, therefore, described safety detection module 103 can be compared the coding of JS code text corresponding with known malice network address coding to be detected, if the two is similar, can determine that the network address to be detected that client reports is malice network address, thereby determine fast network address fail safe.
The embodiment of the present invention is encoded and is analyzed by the JS code text in web page files corresponding to network address that client is reported; can either realize the detection of network address fail safe; can avoid again malice network address to encapsulate the detection error causing by JS code encryption; effectively promote the accuracy that network address fail safe detects, effectively protect the network security of client.
Refer to Fig. 4, which is a schematic structural diagram of a text acquisition module provided by an embodiment of the present invention. The text acquisition module 101 may include a download unit 1101, a parsing unit 1102, and a text extraction unit 1103.
The download unit 1101 is configured to download, according to the web address reported by the client, the web page file corresponding to the web address.
Here, a web address refers to the address of a web page (or website) and may be a URL. In terms of security, web addresses can be divided into safe web addresses and malicious web addresses. A safe web address is the address of a legitimate website, for example the official address of a major bank or of a shopping website. A malicious web address is the address of a fraudulent, counterfeit, phishing, or trojan-planted web page, for example an address that imitates a legitimate website; a client that visits a malicious web address may suffer harm such as leakage of private information or trojan infection. According to the web address reported by the client, the download unit 1101 can download the corresponding web page file from the web server; the web page file includes but is not limited to an HTML file, a JS file, a CSS file, and so on. In this embodiment of the present invention, the web address reported by the client is the address currently being visited, which the client collects from the browser address bar, so that the security of the client's current visit can be ensured. It can be understood, however, that the embodiment is not limited to this: the reported web address may also be one that the user enters and asks the client to query, and so on.
The parsing unit 1102 is configured to parse the web page file to obtain the source code text of the web page file.
The source code text contains the page text content of the web page file and the way it is presented; by running the source code text, the client can display the page text content of the web page file.
The text extraction unit 1103 is configured to extract the JS code text from the source code text of the web page file.
At present, the page text content corresponding to a malicious web address is usually encrypted and encapsulated with JS code in order to hide its malicious nature; for example, the page text content is encrypted and then output through the JS document.write method. To counter this JS encapsulation, the text extraction unit 1103 can extract the JS code text from the source code text of the web page file, so that it can be analyzed to decide whether it is maliciously encapsulated JS text, thereby determining the security of the web address.
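The extraction step can be illustrated with a minimal Python sketch. It assumes that the JS code text is the body of inline script elements in the page source; the class ScriptExtractor, the function extract_js_text, and the sample page are hypothetical names introduced only for illustration and are not part of the described embodiment.

```python
from html.parser import HTMLParser


class ScriptExtractor(HTMLParser):
    """Collects the text found inside <script> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append to the current script.
        if self._in_script:
            self.scripts[-1] += data


def extract_js_text(page_source):
    """Return all non-empty inline script bodies joined by newlines."""
    parser = ScriptExtractor()
    parser.feed(page_source)
    parser.close()
    return "\n".join(s.strip() for s in parser.scripts if s.strip())


# Hypothetical page source standing in for a downloaded web page file.
sample_page = (
    "<html><body><p>hello</p>"
    "<script>document.write(unescape('%3Cb%3Ehi%3C/b%3E'));</script>"
    "</body></html>"
)
js_text = extract_js_text(sample_page)
```

Scripts referenced through a src attribute have empty bodies and are skipped here; a fuller version would presumably download those JS files as well, since the web page file may include separate JS files.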
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can detect the security of the web address while avoiding the detection errors caused by malicious web addresses that encrypt and encapsulate their content with JS code. This effectively improves the accuracy of web address security detection and protects the network security of the client.
Refer to Fig. 5, which is a schematic structural diagram of a coding module provided by an embodiment of the present invention. In this embodiment, the preset encoding algorithm is the Simhash algorithm. The coding module 102 may include a word segmentation unit 1201, a code calculation unit 1202, a weighting unit 1203, a merging unit 1204, a dimension reduction unit 1205, and a code determination unit 1206.
The word segmentation unit 1201 is configured to perform word segmentation on the JS code text to obtain at least one feature code.
The word segmentation unit 1201 can flexibly use any of various segmentation methods to segment the JS code text, including but not limited to whitespace-based segmentation and statistics-based segmentation. The purpose of word segmentation is to extract feature codes that characterize the JS code text. For example, for the JS code text formed by JS-encapsulating the page content text "the development trend of China's Internet technology", the word segmentation unit 1201 may obtain the following feature codes: the JS code of "China", the JS code of "Internet", the JS code of "technology", the JS code of "development", and the JS code of "trend".
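As a rough illustration of word segmentation on JS code text, the following Python sketch splits the text into identifiers, numbers, and single symbols. The token pattern and the function name segment_js are assumptions made for this example; the embodiment does not prescribe a particular segmentation method.

```python
import re

# Hypothetical token pattern: identifiers, integer literals, or any single
# non-whitespace symbol. Whitespace is dropped.
TOKEN_PATTERN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+|\S")


def segment_js(js_text):
    """Split JS code text into a list of feature codes (tokens)."""
    return TOKEN_PATTERN.findall(js_text)


tokens = segment_js("document.write(x1 + 42);")
# tokens == ['document', '.', 'write', '(', 'x1', '+', '42', ')', ';']
```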
The code calculation unit 1202 is configured to encode each feature code with a hash algorithm to obtain the hash code of each feature code.
Each feature code corresponds to one hash code. When the code calculation unit 1202 encodes a feature code into a hash code, it converts a piece of JS code into a binary value.
The weighting unit 1203 is configured to weight the hash code of each feature code to obtain the weight sequence of each feature code.
A weight can be assigned to each feature code according to its importance in the JS code text. For example, continuing the example of this embodiment, weight levels 1 to 5 can be set, where a larger weight indicates that the feature code is more important in the JS code text: the JS code of "China" may have weight 4, that of "Internet" weight 5, that of "technology" weight 3, that of "development" weight 4, and that of "trend" weight 2. In this step, the hash code of each feature code is weighted with that feature code's own weight, forming a weighted number string, which is the weight sequence of the feature code. Note that during weighting, a binary digit of 1 contributes the positive weight and a binary digit of 0 contributes the negative weight. For example, if the hash code of the JS code of "China" is "100101", weighting it with weight 4 gives the weighted number string "4 -4 -4 4 -4 4", which is the weight sequence of the feature code for "China".
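The weighting rule can be checked with a short Python sketch that reproduces the "China" example. The function name weight_sequence is hypothetical.

```python
def weight_sequence(hash_bits, weight):
    """Map each bit of a hash code to +weight (bit 1) or -weight (bit 0)."""
    return [weight if bit == "1" else -weight for bit in hash_bits]


# Hash code "100101" with weight 4, as in the example in the text.
china_sequence = weight_sequence("100101", 4)
# china_sequence == [4, -4, -4, 4, -4, 4]
```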
The merging unit 1204 is configured to merge the weight sequences of all feature codes to obtain the weight sequence string corresponding to the JS code text.
Each feature code corresponds to one weight sequence. The merging unit 1204 adds the weight sequences of all feature codes position by position and merges them into one weight sequence string, which characterizes the JS code text. For example, if the weight sequence of the JS code of "China" is "4 -4 -4 4 -4 4" and that of the JS code of "Internet" is "5 -5 5 -5 5 5", the merging unit 1204 merges the two as "4+5, -4+(-5), -4+5, 4+(-5), -4+5, 4+5", giving the weight sequence string "9 -9 1 -1 1 9". In the same way, the merging unit 1204 can obtain the weight sequence string corresponding to the whole JS code text.
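The position-by-position accumulation can be sketched in Python as follows, using the two weight sequences from the example. The function name merge_sequences is an assumption for illustration.

```python
def merge_sequences(sequences):
    """Add equal-length weight sequences position by position."""
    return [sum(column) for column in zip(*sequences)]


merged = merge_sequences([
    [4, -4, -4, 4, -4, 4],   # weight sequence for "China"
    [5, -5, 5, -5, 5, 5],    # weight sequence for "Internet"
])
# merged == [9, -9, 1, -1, 1, 9]
```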
The dimension reduction unit 1205 is configured to perform dimension reduction on the weight sequence string of the JS code text to generate the Simhash code corresponding to the JS code text.
The purpose of the dimension reduction performed by the dimension reduction unit 1205 is to convert the weight sequence string corresponding to the JS code text into a binary code, thereby generating the Simhash code corresponding to the JS code text. Note that during dimension reduction, each position of the weight sequence string that is greater than 0 is set to 1 and each position that is less than 0 is set to 0. For example, the weight sequence string "9 -9 1 -1 1 9" yields the Simhash code "101011" after dimension reduction.
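Putting the steps together, a minimal Simhash sketch in Python might look like the following. Several choices here are assumptions not stated in the text: MD5 is used as the per-token hash, the code width is 64 bits, token frequency stands in for the importance weight, and a position equal to 0 is mapped to 0. The function names are hypothetical.

```python
import hashlib
from collections import Counter


def reduce_to_bits(weight_string):
    """Dimension reduction: positions > 0 become 1, all others become 0."""
    return "".join("1" if value > 0 else "0" for value in weight_string)


def token_hash_bits(token, width=64):
    """Hash a token to a bit string of the given width (MD5 is an assumption)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big") >> (128 - width)
    return format(value, "0{}b".format(width))


def simhash(tokens, width=64):
    """Weight, merge, and reduce token hashes into one Simhash bit string."""
    totals = [0] * width
    # Assumption: a token's weight is the number of times it occurs.
    for token, weight in Counter(tokens).items():
        for i, bit in enumerate(token_hash_bits(token, width)):
            totals[i] += weight if bit == "1" else -weight
    return reduce_to_bits(totals)


# The worked example from the text.
example = reduce_to_bits([9, -9, 1, -1, 1, 9])  # "101011"

# A 64-bit code for a hypothetical token list.
code = simhash(["document", ".", "write", "(", "x", ")", ";"])
```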
The code determination unit 1206 is configured to determine the Simhash code corresponding to the JS code text as the code to be detected.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can detect the security of the web address while avoiding the detection errors caused by malicious web addresses that encrypt and encapsulate their content with JS code. This effectively improves the accuracy of web address security detection and protects the network security of the client.
Refer to Fig. 6, which is a schematic structural diagram of a safety detection module provided by an embodiment of the present invention. The safety detection module 103 may include a judging unit 1301 and a security determination unit 1302.
The judging unit 1301 is configured to judge whether a malicious sample code similar to the code to be detected exists in the malicious sample code database.
In a specific implementation, the structure of the judging unit 1301 is shown in Fig. 7, which is a schematic structural diagram of a judging unit provided by an embodiment of the present invention. The judging unit 1301 may include a comparison subunit 1311 and a judgment subunit 1312.
The comparison subunit 1311 is configured to compare the code to be detected, binary digit by binary digit, with each malicious sample code in the malicious sample code database.
For example, suppose the malicious sample code database contains three malicious sample codes a, b, and c, and the code to be detected is k. The comparison subunit 1311 then compares k with a, k with b, and k with c, binary digit by binary digit.
The judgment subunit 1312 is configured to judge that a malicious sample code similar to the code to be detected exists in the malicious sample code database when any malicious sample code in the database has differing binary digits with respect to the code to be detected and the number of differing binary digits is less than a preset threshold.
Continuing the example above, if k differs from any one of a, b, and c in some binary digits and the number of differing binary digits is less than the preset threshold, it can be judged that a malicious sample code similar to k exists in the malicious sample code database. For example, suppose k, a, b, and c are all 6-bit binary codes and the preset threshold is 2; if k and a differ in only 1 binary digit, k is judged to be similar to a, that is, a malicious sample code similar to the code to be detected exists in the malicious sample code database. Conversely, if the number of differing binary digits between k and each of a, b, and c is greater than or equal to the preset threshold, k is judged to be dissimilar to a, b, and c, and therefore no malicious sample code similar to the code to be detected exists in the malicious sample code database.
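The comparison and judgment can be sketched in Python by counting differing binary digits (the Hamming distance) and testing the count against the threshold. The concrete values of k, a, b, and c below are hypothetical, as are the function names; the sketch also assumes that an identical code (zero differing digits) counts as similar, which the text does not state explicitly.

```python
def differing_bits(code_a, code_b):
    """Count positions at which two equal-length bit strings differ."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(code_a, code_b) if x != y)


def find_similar_sample(candidate, malicious_samples, threshold):
    """Return the first sample with fewer than `threshold` differing bits."""
    for sample in malicious_samples:
        if differing_bits(candidate, sample) < threshold:
            return sample
    return None


# Hypothetical 6-bit codes; the preset threshold is 2 as in the example.
k = "101011"
a = "101010"  # differs from k in 1 bit
b = "010100"  # differs from k in 6 bits
c = "111111"  # differs from k in 2 bits
match = find_similar_sample(k, [b, c, a], threshold=2)
is_malicious = match is not None
```

A linear scan over the database is the simplest reading of the text; a large database would presumably need an index over the codes, which is outside what the embodiment describes.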
The security determination unit 1302 is configured to determine that the web address is a malicious web address when it is judged that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
The malicious sample code database contains at least one malicious sample code; a malicious sample code is the code obtained by applying the preset encoding algorithm to the JS code text corresponding to a malicious web address. The preset encoding algorithm is preferably the Simhash algorithm, and the malicious sample code is preferably a Simhash code. If the judging unit 1301 judges that a malicious sample code similar to the code to be detected exists in the malicious sample code database, then, given the locality-sensitive property of the Simhash algorithm, the security determination unit 1302 can determine that the code to be detected is a malicious sample code, and hence that the web address reported by the client is a malicious web address.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can detect the security of the web address while avoiding the detection errors caused by malicious web addresses that encrypt and encapsulate their content with JS code. This effectively improves the accuracy of web address security detection and protects the network security of the client.
An embodiment of the present invention also discloses a server. The server may include a web address security detection device, whose structure and function are described in the embodiments shown in Figs. 3 to 7 above and are not repeated here.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can detect the security of the web address while avoiding the detection errors caused by malicious web addresses that encrypt and encapsulate their content with JS code. This effectively improves the accuracy of web address security detection and protects the network security of the client.
An embodiment of the present invention also discloses another server; refer to Fig. 8, which is a schematic structural diagram of a server provided by an embodiment of the present invention. The server of this embodiment includes at least one processor 201 (for example a CPU), at least one communication bus 202, at least one network interface 203, and a memory 204. The communication bus 202 is used to implement connection and communication between these components. The network interface 203 may optionally include a standard wired interface and a wireless interface (such as Wi-Fi or a mobile communication interface). The memory 204 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory. The memory 204 may optionally also be at least one storage device located remotely from the processor 201. As shown in Fig. 8, the memory 204, as a computer storage medium, stores an operating system, a network communication module, a program for performing web address security detection, and other programs.
Specifically, the processor 201 can be used to call the program for performing web address security detection stored in the memory 204 and carry out the following steps:
obtaining, according to a web address reported by a client, JS code text from the web page file corresponding to the web address;
converting the JS code text into a code to be detected by using a preset encoding algorithm;
analyzing the code to be detected to determine the security of the web address.
Further, when carrying out the step of obtaining, according to the web address reported by the client, the JS code text from the web page file corresponding to the web address, the processor 201 specifically carries out the following steps:
downloading, according to the web address reported by the client, the web page file corresponding to the web address;
parsing the web page file to obtain the source code text of the web page file;
extracting the JS code text from the source code text of the web page file.
Further, the preset encoding algorithm is the Simhash algorithm. When carrying out the step of converting the JS code text into the code to be detected by using the preset encoding algorithm, the processor 201 specifically carries out the following steps:
performing word segmentation on the JS code text to obtain at least one feature code;
encoding each feature code with a hash algorithm to obtain the hash code of each feature code;
weighting the hash code of each feature code to obtain the weight sequence of each feature code;
merging the weight sequences of all feature codes to obtain the weight sequence string corresponding to the JS code text;
performing dimension reduction on the weight sequence string corresponding to the JS code text to generate the Simhash code corresponding to the JS code text;
determining the Simhash code corresponding to the JS code text as the code to be detected.
Further, when carrying out the step of analyzing the code to be detected to determine the security of the web address, the processor 201 specifically carries out the following steps:
judging whether a malicious sample code similar to the code to be detected exists in a malicious sample code database, where the malicious sample code database contains at least one malicious sample code and the malicious sample code is a Simhash code;
if it is judged that a malicious sample code similar to the code to be detected exists in the malicious sample code database, determining that the web address is a malicious web address.
Further, when carrying out the step of judging whether a malicious sample code similar to the code to be detected exists in the malicious sample code database, the processor 201 specifically carries out the following steps:
comparing the code to be detected, binary digit by binary digit, with each malicious sample code in the malicious sample code database;
if any malicious sample code in the malicious sample code database has differing binary digits with respect to the code to be detected and the number of differing binary digits is less than a preset threshold, judging that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
By encoding and analyzing the JS code text in the web page file corresponding to the web address reported by the client, the embodiment of the present invention can detect the security of the web address while avoiding the detection errors caused by malicious web addresses that encrypt and encapsulate their content with JS code. This effectively improves the accuracy of web address security detection and protects the network security of the client.
A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention. A person of ordinary skill in the art can understand that implementations of all or part of the processes of the above embodiments, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.

Claims (11)

1. A web address security detection method, characterized by comprising:
obtaining, according to a web address reported by a client, JS code text from the web page file corresponding to the web address;
converting the JS code text into a code to be detected by using a preset encoding algorithm;
analyzing the code to be detected to determine the security of the web address.
2. The method according to claim 1, characterized in that obtaining, according to the web address reported by the client, the JS code text from the web page file corresponding to the web address comprises:
downloading, according to the web address reported by the client, the web page file corresponding to the web address;
parsing the web page file to obtain the source code text of the web page file;
extracting the JS code text from the source code text of the web page file.
3. The method according to claim 1 or 2, characterized in that the preset encoding algorithm is the Simhash algorithm;
converting the JS code text into the code to be detected by using the preset encoding algorithm comprises:
performing word segmentation on the JS code text to obtain at least one feature code;
encoding each feature code with a hash algorithm to obtain the hash code of each feature code;
weighting the hash code of each feature code to obtain the weight sequence of each feature code;
merging the weight sequences of all feature codes to obtain the weight sequence string corresponding to the JS code text;
performing dimension reduction on the weight sequence string corresponding to the JS code text to generate the Simhash code corresponding to the JS code text;
determining the Simhash code corresponding to the JS code text as the code to be detected.
4. The method according to claim 3, characterized in that analyzing the code to be detected to determine the security of the web address comprises:
judging whether a malicious sample code similar to the code to be detected exists in a malicious sample code database, where the malicious sample code database contains at least one malicious sample code and the malicious sample code is a Simhash code;
if it is judged that a malicious sample code similar to the code to be detected exists in the malicious sample code database, determining that the web address is a malicious web address.
5. The method according to claim 4, characterized in that judging whether a malicious sample code similar to the code to be detected exists in the malicious sample code database comprises:
comparing the code to be detected, binary digit by binary digit, with each malicious sample code in the malicious sample code database;
if any malicious sample code in the malicious sample code database has differing binary digits with respect to the code to be detected and the number of differing binary digits is less than a preset threshold, judging that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
6. A web address security detection device, characterized by comprising:
a text acquisition module, configured to obtain, according to a web address reported by a client, JS code text from the web page file corresponding to the web address;
a coding module, configured to convert the JS code text into a code to be detected by using a preset encoding algorithm;
a safety detection module, configured to analyze the code to be detected to determine the security of the web address.
7. The device according to claim 6, characterized in that the text acquisition module comprises:
a download unit, configured to download, according to the web address reported by the client, the web page file corresponding to the web address;
a parsing unit, configured to parse the web page file to obtain the source code text of the web page file;
a text extraction unit, configured to extract the JS code text from the source code text of the web page file.
8. The device according to claim 6 or 7, characterized in that the coding module comprises:
a word segmentation unit, configured to perform word segmentation on the JS code text to obtain at least one feature code;
a code calculation unit, configured to encode each feature code with a hash algorithm to obtain the hash code of each feature code;
a weighting unit, configured to weight the hash code of each feature code to obtain the weight sequence of each feature code;
a merging unit, configured to merge the weight sequences of all feature codes to obtain the weight sequence string corresponding to the JS code text;
a dimension reduction unit, configured to perform dimension reduction on the weight sequence string of the JS code text to generate the Simhash code corresponding to the JS code text;
a code determination unit, configured to determine the Simhash code corresponding to the JS code text as the code to be detected;
wherein the preset encoding algorithm is the Simhash algorithm.
9. The device according to claim 8, characterized in that the safety detection module comprises:
a judging unit, configured to judge whether a malicious sample code similar to the code to be detected exists in a malicious sample code database, where the malicious sample code database contains at least one malicious sample code and the malicious sample code is a Simhash code;
a security determination unit, configured to determine that the web address is a malicious web address when it is judged that a malicious sample code similar to the code to be detected exists in the malicious sample code database.
10. The device according to claim 9, characterized in that the judging unit comprises:
a comparison subunit, configured to compare the code to be detected, binary digit by binary digit, with each malicious sample code in the malicious sample code database;
a judgment subunit, configured to judge that a malicious sample code similar to the code to be detected exists in the malicious sample code database when any malicious sample code in the database has differing binary digits with respect to the code to be detected and the number of differing binary digits is less than a preset threshold.
11. A server, characterized by comprising the web address security detection device according to any one of claims 7-12.
CN201410248182.4A 2014-06-05 2014-06-05 Web address security detecting method and device and server Pending CN104079560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410248182.4A CN104079560A (en) 2014-06-05 2014-06-05 Web address security detecting method and device and server


Publications (1)

Publication Number Publication Date
CN104079560A (en) 2014-10-01

Family

ID=51600604


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596016A (en) * 2021-07-27 2021-11-02 北京丁牛科技有限公司 Malicious domain name detection method and device, electronic equipment and storage medium
CN114154153A (en) * 2021-11-23 2022-03-08 中国电信股份有限公司 Malicious code detection method and device, electronic equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090216868A1 (en) * 2008-02-21 2009-08-27 Microsoft Corporation Anti-spam tool for browser
CN102831198A (en) * 2012-08-07 2012-12-19 人民搜索网络股份公司 Similar document identifying device and similar document identifying method based on document signature technology
CN102902686A (en) * 2011-07-27 2013-01-30 腾讯科技(深圳)有限公司 Web page detection method and system
CN102957664A (en) * 2011-08-17 2013-03-06 阿里巴巴集团控股有限公司 Method and device for identifying phishing websites
CN102999638A (en) * 2013-01-05 2013-03-27 南京邮电大学 Phishing website detection method excavated based on network group
US20130086677A1 (en) * 2010-12-31 2013-04-04 Huawei Technologies Co., Ltd. Method and device for detecting phishing web page
US8468597B1 (en) * 2008-12-30 2013-06-18 Uab Research Foundation System and method for identifying a phishing website
CN103501306A (en) * 2013-10-23 2014-01-08 腾讯科技(武汉)有限公司 Web site identification method, server and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596016A (en) * 2021-07-27 2021-11-02 北京丁牛科技有限公司 Malicious domain name detection method and device, electronic equipment and storage medium
CN113596016B (en) * 2021-07-27 2022-02-25 北京丁牛科技有限公司 Malicious domain name detection method and device, electronic equipment and storage medium
CN114154153A (en) * 2021-11-23 2022-03-08 中国电信股份有限公司 Malicious code detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104079559A (en) Web address security detecting method and device and server
CN107204960B (en) Webpage identification method and device and server
CN112148305B (en) Application detection method, device, computer equipment and readable storage medium
KR101724307B1 (en) Method and system for detecting a malicious code
US10679088B1 (en) Visual domain detection systems and methods
US20130042306A1 (en) Determining machine behavior
EP2977928B1 (en) Malicious code detection
KR101530941B1 (en) Method, system and client terminal for detection of phishing websites
CN104168293A (en) Method and system for recognizing suspicious phishing web page in combination with local content rule base
CN110096872A (en) The detection method and server of homepage invasion script attack tool
CN111444961A (en) Method for judging internet website affiliation through clustering algorithm
CN115801455A (en) Website fingerprint-based counterfeit website detection method and device
CN107786529B (en) Website detection method, device and system
CN105975599B (en) Method and device for monitoring page embedded points of website
CN106911635B (en) Method and device for detecting whether backdoor program exists in website
CN104079560A (en) Web address security detecting method and device and server
CN108287831B (en) URL classification method and system and data processing method and system
CN111125704B (en) Webpage Trojan horse recognition method and system
US9396170B2 (en) Hyperlink data presentation
CN106911636B (en) A method and device for detecting whether a website has a backdoor program
CN105989284B (en) The recognition methods and equipment of homepage invasion script feature
US10372574B1 (en) Skew detector for data storage system
CN114124913B (en) Method and device for monitoring network asset change and electronic equipment
CN103577449B (en) Phishing website characteristic self-learning mining method and system
CN116896455A (en) Network attack detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2014-10-01