CN109561163A - The generation method and device of uniform resource locator rewriting rule - Google Patents
The generation method and device of uniform resource locator rewriting rule
- Publication number
- CN109561163A CN201710892706.7A
- Authority
- CN
- China
- Prior art keywords
- url
- parameter
- prefix
- rewriting rule
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/09—Mapping addresses
- H04L61/10—Mapping addresses of different types
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L61/00—Network arrangements, protocols or services for addressing or naming
- H04L61/45—Network directories; Name-to-address mapping
- H04L61/4505—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
- H04L61/4511—Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0263—Rule management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2101/00—Indexing scheme associated with group H04L61/00
- H04L2101/60—Types of network addresses
- H04L2101/604—Address structures or formats
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Information Transfer Between Computers (AREA)
Abstract
This application provides a method and device for generating uniform resource locator (URL) rewriting rules. The method comprises: obtaining a target URL set of a target website, the target website being the website for which URL rewriting rules are to be generated; obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, where each resource parameter is a subpath of its prefix parameter; and generating a URL rewriting rule set for the target website according to the parameter set. With the embodiments of this application, the URL rewriting rules of a website can be derived automatically from its access log, without manual involvement.
Description
Technical field
This application relates to the field of Internet data processing, and in particular to a method and device for generating uniform resource locator (URL) rewriting rules, a URL scanning method and scanner based on URL rewriting rules, and a network device.
Background
To keep a website secure, a scanner is usually used to run security scans against it. The scanner can take the website's web access log as its input source, deduplicate the URLs, and then scan the parameters under each URL. A web access log may contain thousands of URLs, and a large number of different URLs may represent the same scan path, because URLs can also contain meaningless parameters that are not part of the path. In that case the scanner still has to scan each of these URLs separately, so scanning efficiency is low.
In the prior art, to improve the scanner's efficiency, developers can manually configure URL rewriting rules for a website. The rules indicate which parameters in a URL are meaningless, so that after the original URLs are mapped according to the rules, the many URLs that represent the same scan path can be filtered out and only one URL is retained for the scanner to scan.
Summary of the invention
However, in the course of research the inventors found that manually configuring URL rewriting rules requires developers to inspect all web logs and configure the rules by hand based on what they observe. Because the volume of web log data is large, this approach is error-prone, and it also consumes considerable manpower and material resources.
On this basis, this application provides a method and device for generating URL rewriting rules, a URL scanning method and scanner based on URL rewriting rules, and a network device. They automatically analyze the URLs in a website's web access log according to certain rules, so that the website's URL rewriting rules can be generated without manual involvement.
To solve the above-mentioned problems, this application discloses a kind of generation methods of URL rewriting rule, this method comprises:
Obtain the target set of URL of targeted website;The targeted website are as follows: uniform resource position mark URL to be generated rewrites rule
Website then;
Obtain the parameter set of mutual corresponding prefix parameter and resource parameters in the target set of URL, wherein the resource
Parameter is the subpath of the prefix parameter;
The URL rewriting rule collection of the targeted website is generated according to the parameter set.
Obtaining the target URL set of the target website comprises:
Pre-processing the initial URL set in the access log of the target website to obtain the target URL set.
Pre-processing the initial URL set in the access log of the target website to obtain the target URL set comprises:
Filtering out, according to Hypertext Transfer Protocol (HTTP) status codes, the illegal URLs corresponding to illegal URL requests from the initial URL set in the access log of the target website;
Normalizing the initial URL set from which the illegal URLs have been filtered to obtain a normalized URL set, where each normalized URL comprises a domain name, a path and a file name;
Deduplicating the normalized URL set to obtain the target URL set.
Obtaining the parameter set of prefix parameters and resource parameters in the target URL set comprises:
Splitting each target URL in the target URL set on a preset separator to obtain the string array corresponding to each target URL;
Determining, according to the order in which the strings of each string array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL, so as to obtain the parameter set.
Determining the corresponding prefix parameters and resource parameters in each target URL, according to the order in which the strings of the string array make up the target URL, comprises:
Taking any one string array as the current array and executing an array loop process, the array loop process comprising:
In order from front to back, taking the first string in the current array as the current prefix parameter;
Saving the current prefix parameter, in correspondence with the resource parameter immediately following it, to an initial parameter set;
Judging whether the current prefix parameter is in an initial URL rewriting rule set; if so, combining the current prefix parameter with a preset rewrite parameter to form an updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter;
Taking the updated prefix parameter as the current prefix parameter and returning to the step of saving the current prefix parameter, in correspondence with the resource parameter immediately following it, to the initial parameter set, until all strings of the current array have been processed;
Judging whether all string arrays have been processed; if not, taking any unprocessed string array as the current array and executing the array loop process again;
If so, taking the initial parameter set as the target parameter set corresponding to the target URL set.
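The array loop process above can be sketched in Python as follows. This is only an illustrative reading of the steps, not the applicant's implementation: the function name `collect_parameters`, the dictionary-of-sets representation of the parameter set, and joining a prefix to the next segment with "/" are assumptions; the placeholder `${dynamic}` is the rewrite parameter named later in the description.

```python
from collections import defaultdict

REWRITE_PARAM = "${dynamic}"  # rewrite parameter named in the description


def collect_parameters(target_urls, rule_set, sep="/"):
    """Map each prefix parameter to the resource parameters seen directly under it.

    target_urls: normalized URLs such as "a.com/search/winter/2" (assumed input form)
    rule_set:    prefixes currently in the initial URL rewriting rule set
    """
    params = defaultdict(set)
    for url in target_urls:
        parts = url.split(sep)           # one string array per target URL
        prefix = parts[0]                # first string is the current prefix
        for resource in parts[1:]:
            params[prefix].add(resource)  # save prefix with its adjacent resource
            if prefix in rule_set:
                # prefix is already a rule: continue with the rewrite parameter
                prefix = prefix + sep + REWRITE_PARAM
            else:
                # otherwise extend the prefix with the actual resource
                prefix = prefix + sep + resource
    return params
```

With an empty rule set, the URL "a.com/search/winter/2" contributes the pairs ("a.com", "search"), ("a.com/search", "winter") and ("a.com/search/winter", "2").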
Generating the URL rewriting rule set of the target website according to the path parameters and non-path parameters comprises:
For each prefix parameter, judging whether the number of resource parameters under that prefix parameter is greater than a preset threshold; if so, adding the prefix parameter to the initial URL rewriting rule set to obtain an updated URL rewriting rule set, until the URL rewriting rule set is no longer updated;
Determining the updated URL rewriting rule set as the target URL rewriting rule set.
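A minimal sketch of this threshold test, under stated assumptions: the text says prefixes are added "until the rule set is no longer updated", and the sketch interprets that as re-collecting the parameter set with the enlarged rule set on each pass. That interpretation, the helper `collect_parameters`, the function name `generate_rule_set` and the "/" join are all hypothetical.

```python
from collections import defaultdict

REWRITE_PARAM = "${dynamic}"  # rewrite parameter named in the description


def collect_parameters(target_urls, rule_set, sep="/"):
    """Prefix -> set of resource parameters directly under it (illustrative helper)."""
    params = defaultdict(set)
    for url in target_urls:
        parts = url.split(sep)
        prefix = parts[0]
        for resource in parts[1:]:
            params[prefix].add(resource)
            nxt = REWRITE_PARAM if prefix in rule_set else resource
            prefix = prefix + sep + nxt
    return params


def generate_rule_set(target_urls, threshold, sep="/"):
    """Grow the rule set until a pass adds no new prefix (fixed point)."""
    rules = set()  # the initial URL rewriting rule set starts empty
    while True:
        params = collect_parameters(target_urls, rules, sep)
        # a prefix becomes a rule when it has more than `threshold` resources
        candidates = {p for p, res in params.items() if len(res) > threshold}
        if candidates <= rules:
            return rules  # nothing new was added, so the rule set is final
        rules |= candidates
```

For example, with three hypothetical URLs under "a.com/search" that differ only in the next segment and a threshold of 2, the only generated rule is "a.com/search".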
The method further comprises:
Mapping a URL to be mapped to a rewritten URL according to the URL rewriting rule set of the target website.
Mapping the URL to be mapped to the rewritten URL according to the URL rewriting rule set of the target website comprises:
Normalizing the URL to be mapped to obtain a normalized URL;
Splitting the normalized URL on the preset separator to obtain a split string array;
Mapping the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split string array against the URL rewriting rule set.
Mapping the URL to be mapped to the rewritten URL, according to the result of matching each prefix parameter in the split string array against the URL rewriting rule set, comprises:
In order from front to back, taking the first string in the split string array as the current prefix parameter;
Judging whether the current prefix parameter is in the URL rewriting rule set; if so, combining the current prefix parameter with the preset rewrite parameter to form an updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter;
Taking the updated prefix parameter as the current prefix parameter and repeating the above steps until all strings in the split string array have been processed;
Taking the resulting updated prefix parameter as the rewritten URL.
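The mapping loop can be sketched as below. The function name `rewrite_url` and the "/" join are assumptions made for illustration, and the input is assumed to be already normalized (domain name, path and file name only).

```python
REWRITE_PARAM = "${dynamic}"  # rewrite parameter named in the description


def rewrite_url(normalized_url, rule_set, sep="/"):
    """Map a normalized URL to its rewritten form using a rule set of prefixes."""
    parts = normalized_url.split(sep)
    prefix = parts[0]  # first string is the current prefix parameter
    for resource in parts[1:]:
        if prefix in rule_set:
            # the segment under a rule prefix is replaced by the rewrite parameter
            prefix = prefix + sep + REWRITE_PARAM
        else:
            # otherwise the segment is kept as part of the path
            prefix = prefix + sep + resource
    return prefix  # the final updated prefix is the rewritten URL
```

With the rule set {"a.com/search"}, the URL "a.com/search/winter/2" maps to "a.com/search/${dynamic}/2", while "a.com/blog.php" is returned unchanged.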
In the case where the current prefix parameter is in the URL rewriting rule set, the method further comprises:
Obtaining the resource parameter corresponding to the current prefix parameter and the value of that resource parameter;
Saving the resource parameter, the value of the resource parameter and the query string in correspondence with the rewritten URL.
The embodiments of this application also disclose a URL scanning method based on URL rewriting rules, the method comprising:
Obtaining a pre-generated URL rewriting rule set and the initial URL set of a target website to be scanned, where the URL rewriting rule set is generated as follows: obtaining a target URL set of the target website, the target website being the website for which uniform resource locator (URL) rewriting rules are to be generated; obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set; and generating the URL rewriting rule set of the target website according to the parameter set;
Rewriting the initial URLs in the initial URL set according to the URL rewriting rule set to obtain a rewritten initial URL set;
Deduplicating the rewritten initial URL set to obtain a target URL set;
Scanning the target URLs in the target URL set.
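The rewrite-then-deduplicate part of the scanning method can be sketched as follows. The names `rewrite_url` and `urls_to_scan` are hypothetical, the input URLs are assumed to be normalized already, and the scan itself is left out because the text does not describe the scanner's interface.

```python
REWRITE_PARAM = "${dynamic}"  # rewrite parameter named in the description


def rewrite_url(normalized_url, rule_set, sep="/"):
    """Rewrite one normalized URL using a rule set of prefixes (illustrative helper)."""
    parts = normalized_url.split(sep)
    prefix = parts[0]
    for resource in parts[1:]:
        nxt = REWRITE_PARAM if prefix in rule_set else resource
        prefix = prefix + sep + nxt
    return prefix


def urls_to_scan(initial_urls, rule_set):
    """Rewrite every initial URL, then keep the first occurrence of each result."""
    seen = set()
    targets = []
    for url in initial_urls:
        rewritten = rewrite_url(url, rule_set)
        if rewritten not in seen:
            seen.add(rewritten)
            targets.append(rewritten)
    return targets  # each entry would then be handed to the scanner
```

Two URLs that differ only in a segment under a rule prefix collapse into one scan target, which is how the rule set reduces the scanner's workload.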
The embodiments of this application also disclose a device for generating URL rewriting rules, the device comprising:
A URL set obtaining unit, configured to obtain a target URL set of a target website, the target website being the website for which uniform resource locator (URL) rewriting rules are to be generated;
A parameter obtaining unit, configured to obtain a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, where each resource parameter is a subpath of its prefix parameter;
A generation unit, configured to generate the URL rewriting rule set of the target website according to the parameter set.
The URL set obtaining unit is configured to pre-process the initial URL set in the access log of the target website to obtain the target URL set.
The URL set obtaining unit comprises:
A filtering subunit, configured to filter out, according to Hypertext Transfer Protocol (HTTP) status codes, the illegal URLs corresponding to illegal URL requests from the initial URL set in the access log of the target website;
A normalization subunit, configured to normalize the initial URL set from which the illegal URLs have been filtered to obtain a normalized URL set, where each normalized URL in the normalized URL set comprises a domain name, a path and a file name; and
A deduplication subunit, configured to deduplicate the normalized URL set to obtain the target URL set.
The parameter obtaining unit comprises:
A splitting subunit, configured to split each target URL in the target URL set on a preset separator to obtain the string array corresponding to each target URL;
A parameter determination subunit, configured to determine, according to the order in which the strings of each string array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL, so as to obtain the parameter set.
The parameter determination subunit is specifically configured to:
Take any one string array as the current array and execute an array loop process, the array loop process comprising:
In order from front to back, taking the first string in the current array as the current prefix parameter;
Saving the current prefix parameter, in correspondence with the resource parameter immediately following it, to an initial parameter set; judging whether the current prefix parameter is in an initial URL rewriting rule set; if so, combining the current prefix parameter with a preset rewrite parameter to form an updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter; taking the updated prefix parameter as the current prefix parameter and returning to the step of saving the current prefix parameter, in correspondence with the resource parameter immediately following it, to the initial parameter set, until all strings of the current array have been processed;
Judge whether all string arrays have been processed; if not, take any unprocessed string array as the current array and execute the array loop process again;
If so, take the initial parameter set as the target parameter set corresponding to the target URL set.
The generation unit comprises:
A judgment subunit, configured to judge, for each prefix parameter, whether the number of resource parameters under that prefix parameter is greater than a preset threshold;
An update subunit, configured to add the prefix parameter to the initial URL rewriting rule set when the result of the judgment subunit is yes, obtaining an updated URL rewriting rule set, until the URL rewriting rule set is no longer updated;
A rule determination subunit, configured to determine the updated URL rewriting rule set as the target URL rewriting rule set.
The device further comprises:
A mapping unit, configured to map an original URL to a rewritten URL according to the URL rewriting rule set of the target website.
The mapping unit 504 may comprise:
A normalization subunit, configured to normalize the URL to be mapped to obtain a normalized URL;
A splitting subunit, configured to split the normalized URL on the preset separator to obtain a split string array;
A mapping subunit, configured to map the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split string array against the URL rewriting rule set.
The mapping subunit is specifically configured to:
In order from front to back, take the first string in the split string array as the current prefix parameter; judge whether the current prefix parameter is in the URL rewriting rule set; if so, combine the current prefix parameter with the preset rewrite parameter to form an updated prefix parameter; if not, combine the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter; take the updated prefix parameter as the current prefix parameter and repeat the above steps until all strings in the split string array have been processed; and take the resulting updated prefix parameter as the rewritten URL.
The mapping subunit is further configured to:
Obtain the resource parameter corresponding to the current prefix parameter and the value of that resource parameter; and save the resource parameter, the value of the resource parameter and the query string in correspondence with the rewritten URL.
The embodiments of this application also disclose a scanner, the scanner comprising:
A URL obtaining unit, configured to obtain a pre-generated URL rewriting rule set and the initial URL set of a target website to be scanned, where the URL rewriting rule set is generated as follows: obtaining a target URL set of the target website, the target website being the website for which uniform resource locator (URL) rewriting rules are to be generated; obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set; and generating the URL rewriting rule set of the target website according to the parameter set;
A rewriting unit, configured to rewrite the initial URLs in the initial URL set according to the URL rewriting rule set to obtain a rewritten initial URL set;
A deduplication unit, configured to deduplicate the rewritten initial URL set to obtain a target URL set;
A scanning unit, configured to scan the target URLs in the target URL set.
The embodiments of this application also disclose a network device comprising a processor, a memory, a network interface and a bus system;
The bus system is configured to couple the hardware components of the network device together;
The network interface is configured to implement a communication connection between the network device and at least one other network device;
The memory is configured to store program instructions and/or data;
The processor is configured to read the instructions and/or data stored in the memory and perform the following operations:
Obtaining a target URL set of a target website, the target website being the website for which uniform resource locator (URL) rewriting rules are to be generated;
Obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, where each resource parameter is a subpath of its prefix parameter;
Generating the URL rewriting rule set of the target website according to the parameter set.
Compared with the prior art, the embodiments of this application have the following advantages:
In the embodiments of this application, the prefix parameters and resource parameters of each URL in the URL set from a website's web access log can be analyzed to determine the URL rewrite parameters, and the prefix parameter preceding each URL rewrite parameter is taken as a URL rewriting rule, finally yielding the website's target URL rewriting rule set. Because the embodiments do not require manual analysis of web access logs, they save considerable manpower and material resources and reduce the errors that arise when URL rules are configured by hand, so that URL rewriting rules can be generated quickly even for a large or very large number of websites.
Further, an original URL can be rewritten, according to the rules in the URL rewriting rule set, into another URL different from the original for the scanner to scan. The part replaced by the URL rewrite parameter "${dynamic}" is not treated by the scanner as a path to scan, which not only reduces the number of objects the scanner must scan but also helps ensure that the scanner cannot be easily attacked by an attacker.
Of course, a product implementing this application does not necessarily need to achieve all of the above advantages at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an embodiment of the method for generating URL rewriting rules of this application;
Fig. 2 is a schematic diagram of the results of pre-processing URLs in the method embodiment of this application;
Fig. 3 is a flowchart of splitting URLs to obtain the parameter set in the method embodiment of this application;
Fig. 4 is a flowchart of mapping an original URL in the method embodiment of this application;
Fig. 5 is a flowchart of an embodiment of the URL scanning method based on URL rewriting rules of this application;
Fig. 6 is a structural block diagram of an embodiment of the device for generating URL rewriting rules of this application;
Fig. 7 is a structural block diagram of an embodiment of the scanner of this application;
Fig. 8 is a structural block diagram of a network device 800 according to an exemplary embodiment.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of this application.
One of the main ideas of this application is to obtain the web access logs of one or more websites and generate a URL rewriting rule set for each website from the URLs in its web access log. Specifically, the URLs in the access log can first be pre-processed, for example by filtering, normalization or deduplication, and the pre-processed URL set is taken as the target URL set. The target URL set is then split by path to obtain arrays corresponding to the website. Next, the subordination relationship between each prefix parameter in the arrays and the path parameters under it is used, for example whether the number of path parameters under a prefix parameter exceeds a preset threshold, to identify all prefix parameters that need to be added to the URL rewriting rule set, yielding the website's URL rewriting rule set.
Referring to Fig. 1, which shows a flowchart of an embodiment of the method for generating URL rewriting rules of this application, this embodiment may comprise the following steps:
Step 101: Obtain a target URL set of a target website, the target website being the website for which uniform resource locator (URL) rewriting rules are to be generated.
In this embodiment, the target website is a web site for which a URL rewriting rule set is to be generated. In practice there may be one or more target websites, and a URL rewriting rule set is generated separately for each of them. The access log of a target website can be obtained by querying, for example, the database of the server corresponding to the target website.
In this step, the initial URLs in the access log of the target website can be used directly as the target URL set. Alternatively, the initial URLs in the access log can be pre-processed and the pre-processed URLs used as the target URL set. Pre-processing mainly consists of filtering, normalizing and deduplicating the URLs in the web site's access log. Specifically, step 101 may comprise the following steps A1 to A3:
Step A1: According to HTTP status codes, filter out the illegal URLs corresponding to illegal URL requests from the initial URL set in the access log of the target website.
An HTTP status code is a 3-digit code that indicates the state of a web server's HTTP response. Status code 200 indicates that the HTTP request was processed successfully and that the response headers or body expected by the request are returned with the response. Therefore, in this step, URLs whose HTTP status code is not 200 need to be filtered out, so that URLs that do not exist or that produce errors do not interfere with the generation of the URL rewriting rule set.
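Step A1 can be sketched as a simple filter. The log format is not specified in the text, so the sketch assumes each access-log entry has already been parsed into a hypothetical `(url, status_code)` pair; the function name `filter_legal_urls` is likewise an assumption.

```python
def filter_legal_urls(log_entries):
    """Keep only URLs whose request returned HTTP status code 200.

    log_entries: iterable of (url, status_code) pairs (assumed, pre-parsed form)
    """
    return [url for url, status in log_entries if status == 200]
```

For instance, an entry such as `("http://a.com/blog.php", 404)` is dropped, while entries with status 200 are kept in their original order.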
Step A2: Normalize the initial URL set from which the illegal URLs have been filtered to obtain a normalized URL set, where each normalized URL in the normalized URL set comprises a domain name, a path and a file name.
After the illegal URLs whose HTTP status code is not 200 have been filtered out, the remaining URLs are normalized by discarding the protocol, user name, query string and similar components of every URL, so that each normalized URL contains the domain name, path and file name. If the port is 80 or 443, the port can also be discarded, leaving only the domain name, path and file name. A person skilled in the art may also convert all uppercase letters in the domain name to lowercase to obtain the final normalized URL.
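A sketch of this normalization using the Python standard library's `urllib.parse.urlsplit`. It assumes every input URL carries a scheme such as "http://"; keeping a non-default port as "host:port" and discarding the fragment are assumptions the text does not spell out, and the function name `normalize_url` is hypothetical.

```python
from urllib.parse import urlsplit


def normalize_url(raw_url):
    """Reduce a URL to 'domain/path/file' as described in step A2."""
    parts = urlsplit(raw_url)
    host = parts.hostname or ""   # urlsplit's hostname is already lowercased
    port = parts.port
    if port is not None and port not in (80, 443):
        host = f"{host}:{port}"   # assumption: non-default ports are kept
    # scheme, user name, query string and fragment are dropped here
    return host + parts.path
```

For example, "http://A.COM/blog.php?id=211" normalizes to "a.com/blog.php".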
Step A3: Deduplicate the normalized URLs in the normalized URL set to obtain the target URL set.
In this step, the normalized URLs in the normalized URL set are deduplicated: among URLs that duplicate one another, only one is retained, finally yielding the target URL set.
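Deduplication as in step A3 can be sketched in one line. Preserving the first-seen order is an assumption (the text only requires that one copy be kept), and the function name `deduplicate` is hypothetical.

```python
def deduplicate(normalized_urls):
    """Keep one copy of each normalized URL, preserving first-seen order."""
    # dict keys are unique and keep insertion order
    return list(dict.fromkeys(normalized_urls))
```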
Fig. 2 shows the result of each step when URLs are filtered, normalized and deduplicated. The URL "http://a.com/blog.php" with HTTP status code "404" at the upper left of Fig. 2 is filtered out in step A1, yielding the five URLs at the upper right of Fig. 2. The five URLs at the upper right of Fig. 2 are then normalized: protocols such as "http://" and query strings such as "?id=211" are deleted, "A.COM" is converted to lowercase "a.com", and so on, yielding the five normalized URLs at the lower right of Fig. 2. These URLs are then deduplicated: of the two duplicate URLs "a.com/blog.php", only one is retained, yielding the four URLs at the lower right of Fig. 2. Of course, the URLs in Fig. 2 are only a specific example from practice, and a person skilled in the art should not understand them as limiting this application.
Step 102: obtaining the parameter set of mutual corresponding prefix parameter and resource parameters in the target set of URL, wherein
The resource parameters are the subpath of the prefix parameter.
In this step, it for the target URL in each pretreated target set of URL, then needs to carry out cutting, i.e. foundation
Path is split pretreated target URL to obtain an array, includes based on after the segmentation of path in the array
Each character string, first character string are domain names.Then, respectively according to path relation, successively by domain name, domain name and its subpath,
Subpath of domain name and its subpath and subpath etc. is used as prefix parameter, to determine the corresponding resource parameters of the prefix parameter,
Finally obtain the parameter set of prefix parameter preservation corresponding with resource parameters.
In practical applications, an empty URL rewriting rule set can be preset as one of the bases for determining the parameter set. Step 102 then first splits each target URL in the target URL set on a preset separator to obtain a character array for each target URL; the preset separator can be "/", i.e. the path separator. Next, following the order in which the strings of each character array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL are determined to obtain the parameter set.
Specifically, the process of determining the corresponding prefix parameters and resource parameters to obtain the parameter set may include:
Step B1: obtain any one character array as the current array.
In this embodiment, each target URL corresponds to one character array. Any character array that has not yet been processed is first taken as the current array. For example, the current array obtained for the target URL "a.com/search/winter/2" is: {'a.com', 'search', 'winter', '2'}, where 'a.com' is the 1st string of the array and '2' is the 4th string, so the array contains 4 strings in total.
Step B2: execute the array loop process.
For the current array {'a.com', 'search', 'winter', '2'}, the array loop process includes steps B21 to B24:
Step B21: in front-to-back order, obtain the first string in the current array as the current prefix parameter.
In this step, the 1st string 'a.com' of the array is taken as the current prefix parameter "prefix". The resource parameter "resource" immediately following the current prefix parameter is the second string 'search' in the array.
Step B22: save the current prefix parameter together with the resource parameter immediately following it to the initial parameter set.
'a.com' and 'search' are saved as a pair to the initial parameter set. In this embodiment, the initial parameter set can start out empty.
Step B23: judge whether the current prefix parameter is in the initial URL rewriting rule set. If it is, combine the current prefix parameter with the default overwrite parameter to form the updated prefix parameter; if not, combine the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter.
For 'a.com', it is judged whether it is in the initial URL rewriting rule set. The initial URL rewriting rule set may be empty at first; as more and more target URLs are analyzed, the initial URL rewriting rule set accumulates more and more rewriting rules, until it is no longer updated.
Assuming that the initial URL rewriting rule set is empty, 'a.com' is not in the initial URL rewriting rule set, so the current prefix parameter 'a.com' and the immediately following resource parameter 'search' are combined into the updated prefix parameter "a.com/search". If instead 'a.com/search' is in the initial URL rewriting rule set, the resource parameter 'winter' immediately following 'a.com/search' is a URL overwrite parameter; in that case the current prefix parameter 'a.com/search' is combined with the default overwrite parameter (such as "dynamic") to form the updated prefix parameter "a.com/search/${dynamic}".
Step B24: take the updated prefix parameter as the current prefix parameter, and execute the step of saving the current prefix parameter together with the resource parameter immediately following it to the initial parameter set, until all strings of the current array have been looped over.
In this step, the updated prefix parameter obtained in step B23, i.e. "a.com/search" or "a.com/search/${dynamic}", is taken as the current prefix parameter, and it is saved together with the resource parameter immediately following it to the initial parameter set. For example, the prefix parameter "a.com/search" and its resource parameter "winter" are saved as a pair to the initial parameter set. As another example, the prefix parameter "a.com/search/${dynamic}" and its resource parameter "2" are saved as a pair to the initial parameter set, and so on until all strings in the array have been looped over.
Step B3: judge whether all character arrays have been looped over. If not, take any character array that has not been looped over as the current array and trigger step B2 to execute the array loop process; if so, enter step B4.
That is, it is judged whether all character arrays corresponding to the target website have been looped over; if not, any character array that has not yet been looped over is taken as the current array, and step B2 is triggered to execute the array loop process.
Step B4: take the initial parameter set as the target parameter set corresponding to the target URL set.
In this step, once the initial parameter set is no longer updated, each group of prefix parameter and resource parameters in it can be output as corresponding pairs, to serve as the basis for updating the URL rewriting rule set.
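Steps B1 to B4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation of the present application: the function name collect_parameters, the dictionary-of-sets representation of the parameter set, and the constant name DYNAMIC are hypothetical, and the input URLs are assumed to be already preprocessed.

```python
from collections import defaultdict

DYNAMIC = "${dynamic}"  # default overwrite parameter used in the examples of the text


def collect_parameters(target_urls, rule_set):
    """Return {prefix parameter: set of resource parameters seen right after it}."""
    params = defaultdict(set)
    for url in target_urls:                  # steps B1 / B3: one character array per URL
        parts = url.split("/")
        prefix = parts[0]                    # step B21: start from the domain name
        for resource in parts[1:]:
            params[prefix].add(resource)     # step B22: save the (prefix, resource) pair
            # Step B23: a prefix already in the rule set is followed by an
            # overwrite parameter, so the placeholder replaces the actual value.
            suffix = DYNAMIC if prefix in rule_set else resource
            prefix = prefix + "/" + suffix   # step B24: continue with the updated prefix
    return dict(params)                      # step B4: the finished parameter set


# With an empty rule set, every path segment is kept literally.
empty_case = collect_parameters(["a.com/search/winter/2"], set())
# With "a.com/search" already a rule, 'winter' is replaced by the placeholder.
rule_case = collect_parameters(["a.com/search/winter/2"], {"a.com/search"})
```

Storing the resource parameters in a set means that repeated values under the same prefix are counted once, which matches the later use of the number of different resource parameters per prefix.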
Step 103: generate the URL rewriting rule set of the target website according to the parameter set.
In this step, whether a prefix parameter should be added to the URL rewriting rule set can be determined from how many resource parameters correspond to that prefix parameter in the parameter set.
In practical applications, the number of different resource parameters under the same prefix parameter can be counted. For a normal URL path, the number of resource parameters under one prefix parameter should be limited, whereas for a prefix parameter that ought to be a URL rewriting rule, the corresponding resource parameters are effectively arbitrary values entered by users, so their number is large. A threshold can therefore be preset: if the number of resource parameters is greater than the threshold, the prefix parameter is added to the URL rewriting rule set. This finally yields an updated URL rewriting rule set; once the initial URL rewriting rule set is no longer updated, the updated URL rewriting rule set can be determined as the target URL rewriting rule set.
A URL rewriting rule takes the form of a single URL that indicates which parameter is an overwrite parameter: the path parameter immediately following a URL in the URL rewriting rule set is a URL overwrite parameter. Specifically, the URL rewriting rule set may include multiple URL rewriting rules, where each rule is formatted as a URL, such as "a.com/s". This rule means that the path parameter after "a.com/s" (such as "a1" in a.com/s/a1/a2) is a URL overwrite parameter. Because a single URL in the access log may be subject to multiple URL rewriting rules along its path (for example, "a.com/search/test/2" can be mapped to "a.com/search.php?keyword=test&page=2"), the URLs in the access log need to be processed iteratively until the URLs in the URL rewriting rule set no longer change.
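The threshold test and the iteration until the rule set stops changing can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name generate_rules, the threshold value 2, and the sample URLs are hypothetical, and the text does not prescribe a particular threshold or data structure.

```python
from collections import defaultdict

DYNAMIC = "${dynamic}"  # default overwrite parameter used in the examples of the text


def generate_rules(target_urls, threshold):
    """Iterate over the target URLs until the URL rewriting rule set is stable."""
    rules = set()  # the preset, initially empty URL rewriting rule set
    while True:
        # Rebuild the parameter set using the rules found so far.
        params = defaultdict(set)
        for url in target_urls:
            parts = url.split("/")
            prefix = parts[0]
            for resource in parts[1:]:
                params[prefix].add(resource)
                suffix = DYNAMIC if prefix in rules else resource
                prefix = prefix + "/" + suffix
        # A prefix with more distinct resource parameters than the threshold
        # is assumed to be followed by an overwrite parameter.
        candidates = {p for p, resources in params.items() if len(resources) > threshold}
        if candidates <= rules:
            return rules  # no new rule was found, so the set no longer updates
        rules |= candidates


# Hypothetical sample: three keywords, each seen with two of three page numbers.
sample_urls = [
    "a.com/search/winter/1", "a.com/search/winter/2",
    "a.com/search/summer/2", "a.com/search/summer/3",
    "a.com/search/spring/1", "a.com/search/spring/3",
]
rule_set = generate_rules(sample_urls, threshold=2)
```

In this sample the first pass only finds "a.com/search", because each literal keyword prefix sees just two page numbers. Once the keywords are replaced by the placeholder in the second pass, the three page numbers accumulate under one prefix and the second rule appears, which is why a single pass is not sufficient.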
It can be seen that, with the embodiment of the present application, the prefix parameters and resource parameters of each URL in the URL set from the web access log of a website can be analyzed to determine the URL overwrite parameters, and the prefix parameter preceding each URL overwrite parameter is taken as a URL rewriting rule, thereby obtaining the target URL rewriting rule set of the website. Because the embodiment of the present application does not require manual analysis of the web access log, it saves substantial labor and material costs and also reduces the errors made when URL rules are configured manually, so that URL rewriting rules can be generated quickly even in scenarios involving a large or massive number of websites.
In practical applications, the original web log can also be processed based on the obtained URL rewriting rule set, mapping each original URL to its rewritten URL path. Because a URL rewriting rule identifies the URL overwrite parameters, all URLs that contain a URL overwrite parameter in the same position can be handed to the scanner as one and the same URL, which reduces the number of URLs the scanner has to call. Therefore, after step 103, the method may further include:
Step 104: map the original URL to the rewritten URL according to the URL rewriting rule set of the target website.
In this step, the original URL can be mapped to the rewritten URL based on the URL rewriting rule set, and the query string and the overwrite parameters identified by the URL rewriting rules can be extracted as the input source of the scanner. Specifically, the implementation of step 104 may include steps C1 to C5:
Step C1: normalize the URL to be mapped to obtain the normalized URL, and store the query string of the URL to be mapped.
In this step, the original URL to be mapped needs to be normalized. For the specific normalization steps, refer to the description of step A2, which is not repeated here. For example, if the original URL is "a.com/search/winter/2?a=b", then "?a=b" is stored as the query string, and the normalized URL in this step is: a.com/search/winter/2.
Step C2: split the normalized URL on the preset separator to obtain the split character array.
In this step, the normalized URL can likewise be split on the preset path separator "/" to obtain the character array: {'a.com', 'search', 'winter', '2'}.
Step C3: map the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split character array against the URL rewriting rule set.
That is, each prefix parameter in the character array, for example "a.com", "a.com/search", "a.com/search/winter", "a.com/search/winter/2", and so on, is matched against the URL rewriting rule set, and the URL to be mapped is mapped to the rewritten URL according to the matching result.
Specifically, the mapping process of step C3 may include steps C31 to C35:
Step C31: in front-to-back order, obtain the first string in the split character array as the current prefix parameter.
Here, "a.com" obtained from the character array is again taken as the current prefix parameter.
Step C32: judge whether the current prefix parameter is in the URL rewriting rule set. If it is, enter step C33; if not, enter step C34.
If the current prefix parameter, for example "a.com/search", is in the URL rewriting rule set, step C33 is entered; if the current prefix parameter, for example "a.com", is not in the URL rewriting rule set, step C34 is entered.
Step C33: combine the current prefix parameter with the default overwrite parameter to form the updated prefix parameter, obtain the resource parameter corresponding to the current prefix parameter and the value of that resource parameter, save the resource parameter, the resource parameter value, and the query string in association with the rewritten URL, and enter step C35.
In this step, the current prefix parameter "a.com/search" is combined with the default overwrite parameter "${dynamic}" to form the updated prefix parameter "a.com/search/${dynamic}". In addition, the resource parameter value "winter" of "a.com/search" and the query string "?a=b" also need to be obtained.
Step C34: combine the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter, and enter step C35.
In this step, the current prefix parameter "a.com/search" and its resource parameter "winter" are combined into the updated prefix parameter "a.com/search/winter".
Step C4: take the updated prefix parameter as the current prefix parameter, and execute the step of saving the current prefix parameter together with the resource parameter immediately following it to the initial parameter set, until all strings in the split character array have been looped over.
That is, "a.com/search/${dynamic}" from step C33 or "a.com/search/winter" from step C34 is taken as the current prefix parameter, and the process returns to the judgment of whether it is in the URL rewriting rule set (step C32), until all strings in the character array have been looped over. At this point the URL overwrite parameters, such as "winter" or "2", and the query string "?a=b" have been obtained.
Step C5: obtain the updated prefix parameter as the rewritten URL.
The updated prefix parameter that is no longer updated is finally taken as the rewritten URL. Assuming the URL rewriting rule set includes "a.com/search" and "a.com/search/${dynamic}", the rewritten URL obtained in this step is "a.com/search/${dynamic}/${dynamic}", where the resource parameter dynamic_1 has the value winter, the resource parameter dynamic_2 has the value 2, and the resource parameter a has the value b.
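Steps C1 to C5 can be sketched in Python as follows, using the example values from the text. This is a minimal sketch under stated assumptions: the function name map_url and the three-part return value are hypothetical, and the numbering scheme dynamic_1, dynamic_2 is inferred from the example above rather than specified by the text.

```python
from urllib.parse import parse_qsl, urlsplit

DYNAMIC = "${dynamic}"  # default overwrite parameter used in the examples of the text


def map_url(original_url, rule_set):
    """Map an original URL to (rewritten URL, overwrite parameters, query parameters)."""
    # Step C1: normalize the URL and keep the query string parameters.
    parts = urlsplit(original_url if "://" in original_url else "//" + original_url)
    query_params = dict(parse_qsl(parts.query))
    # Step C2: split the normalized URL on the path separator.
    segments = (parts.netloc.lower() + parts.path).split("/")
    # Steps C3 / C4: walk the prefixes and match each one against the rule set.
    prefix = segments[0]
    overwrite_params = {}
    for resource in segments[1:]:
        if prefix in rule_set:
            # Step C33: record the value and put the placeholder in the path.
            overwrite_params[f"dynamic_{len(overwrite_params) + 1}"] = resource
            prefix = prefix + "/" + DYNAMIC
        else:
            # Step C34: keep the literal path segment.
            prefix = prefix + "/" + resource
    # Step C5: the final prefix is the rewritten URL.
    return prefix, overwrite_params, query_params


rules = {"a.com/search", "a.com/search/${dynamic}"}
rewritten, overwrite_params, query_params = map_url("a.com/search/winter/2?a=b", rules)
```

The overwrite parameter values and the query parameters are returned separately from the rewritten URL so that they can be passed to a scanner as input, as the text describes.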
It can be seen that, in the embodiment of the present application, multiple original URLs that contain the same URL overwrite parameter are also rewritten into a single target URL according to the URL rewriting rules in the URL rewriting rule set, for the scanner to scan. The URL overwrite parameter portion "${dynamic}" is not scanned by the scanner as a path, which not only reduces the number of objects the scanner has to scan but also ensures that the scanner is not easily attacked by an attacker.
Fig. 3 shows a flow chart of splitting the URLs in the access log and obtaining the prefix parameters and resource parameters in a method embodiment of the present application. The process may include the following steps:
Step 301: obtain the target URL set.
Step 302: split a target URL in the target URL set to obtain an array.
Step 303: initialize the parameters: set n equal to 1 and take the prefix parameter prefix to be the 0th element of the array, i.e. the domain name.
In this step, the array url_array is again taken to be {'a.com', 'search', 'winter', '2'} as an example. After initialization, n=1 and prefix is "a.com".
Step 304: start the loop of steps 304 to 307; take resource to be the nth element of the array, which on the first pass is the 1st element, "search".
Step 305: store the current value of prefix and the corresponding resource.
"a.com" and "search" are stored as a pair.
Step 306: judge whether the current value of prefix is in the URL rewriting rule set. If it is, set prefix to prefix + "${dynamic}"; otherwise set prefix to prefix + resource.
It is judged whether "a.com" is in the URL rewriting rule set. Assuming the URL rewriting rule set contains the URL rewriting rule "a.com/search", "a.com" is not in the URL rewriting rule set, so prefix becomes prefix + resource, i.e. a.com/search.
Step 307: set n=n+1 and judge whether n is less than the length of the array. If it is, continue with the first step of the loop of steps 304 to 307; otherwise enter step 308.
With n=2, n is less than the array length 4, so the resource for prefix is taken again, namely "winter"; step 305 is entered to store the pair, and steps 306 and 307 are executed in turn, until n equals the length of the array.
Step 308: output all the values of prefix and their corresponding resource values stored in step 305.
For example, in this example, the output result can be as shown in Table 1:
Table 1
Fig. 4 shows an example flow chart of mapping an original URL according to the URL rule set in a method embodiment of the present application. This embodiment may include the following steps:
Step 401: normalize the URL to be mapped, and store the parameter names and parameter values in the query string.
Assuming the URL to be mapped is "a.com/search/winter/2?a=b", the query string parameter stored in this step has the name a and the value b.
Step 402: split the normalized URL on the path separator "/" to obtain an array.
The normalized URL "a.com/search/winter/2" is split on the separator "/" to obtain an array.
Step 403: initialize the parameters: set n=1 and take the prefix parameter prefix to be the 0th element of the array, i.e. the domain name.
In this step, after initialization, n=1 and prefix is "a.com".
Step 404: start the loop body of steps 404 to 406; take resource to be the nth element of the array, which on the first pass is the 1st element, "search".
Step 405: judge whether the current value of prefix is in the URL rewriting rule set. If it is, set prefix to prefix + "${dynamic}" and store the value of resource; otherwise set prefix to prefix + resource.
Step 406: set n=n+1 and judge whether n is less than the length of the array. If it is, continue with the first step of the loop body, step 404; otherwise enter step 407.
Step 407: output the current prefix as the rewritten URL, and also output the parameter names and parameter values of the query string from step 401 together with all the resource parameter names and resource parameter values stored in step 405 of the loop body.
Specifically, assuming the rewriting rules in the URL rewriting rule set are "a.com/search" and "a.com/search/${dynamic}", one possible output of step 407 is shown in Table 2.
Table 2
Referring to Fig. 5, the present application also provides a flow chart of an embodiment of a URL scanning method based on URL rewriting rules. This embodiment may include the following steps:
Step 501: obtain the pre-generated URL rewriting rule set and the initial URL set of the target website to be scanned.
In practical applications, after the URL rewriting rule set has been obtained using the method shown in Fig. 1, the URL set in the web log can also be rewritten according to the URL rewriting rule set. Specifically, the scanner can save in advance the URL rewriting rule set obtained with the method shown in Fig. 1, for example in memory, and can also obtain all URLs of the target website as the initial URL set to be scanned. Because a URL rewriting rule identifies the URL overwrite parameters, the many initial URLs that contain the same URL overwrite parameter can, after the subsequent rewriting and de-duplication, be handed to the scanner as a single URL, which reduces the number of URLs the scanner scans.
Step 502: rewrite the initial URLs in the initial URL set according to the URL rewriting rule set to obtain the rewritten initial URL set.
After the URL rewriting rule set and the initial URL set are obtained, each initial URL in the initial URL set is rewritten using the URL rewriting rule set. For the specific rewriting process, refer to the detailed description of steps C1 to C5, which is not repeated here.
Step 503: de-duplicate the rewritten initial URL set to obtain the target URL set.
The rewritten initial URLs are de-duplicated to obtain multiple distinct target URLs. Many initial URLs may contain the same URL overwrite parameter, which means that they in fact all point to the same page, so their addresses after rewriting are identical; therefore only one of these rewritten initial URLs is kept. By analogy, a target URL set much smaller than the initial URL set can be obtained.
Step 504: scan the target URLs in the target URL set.
The scanner then scans each target URL in the target URL set. Because the number of target URLs is much smaller than the number of initial URLs, the scanning efficiency of the scanner is higher.
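Steps 502 and 503 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names rewrite and build_scan_targets and the sample URLs are hypothetical, the initial URLs are assumed to be already normalized, and the scanning of step 504 itself is outside the scope of the sketch.

```python
DYNAMIC = "${dynamic}"  # default overwrite parameter used in the examples of the text


def rewrite(url, rule_set):
    """Rewrite one normalized URL (no protocol, no query string)."""
    segments = url.split("/")
    prefix = segments[0]
    for resource in segments[1:]:
        # A prefix in the rule set is followed by an overwrite parameter.
        suffix = DYNAMIC if prefix in rule_set else resource
        prefix = prefix + "/" + suffix
    return prefix


def build_scan_targets(initial_urls, rule_set):
    """Rewrite every initial URL (step 502) and de-duplicate the result (step 503)."""
    rewritten = [rewrite(url, rule_set) for url in initial_urls]
    # dict.fromkeys keeps the first occurrence of each URL in its original order.
    return list(dict.fromkeys(rewritten))


rules = {"a.com/search", "a.com/search/${dynamic}"}
initial_url_set = [
    "a.com/search/winter/2",
    "a.com/search/summer/7",
    "a.com/about",
]
target_url_set = build_scan_targets(initial_url_set, rules)
```

Here the two search URLs collapse into one target URL, so a scanner would be given two targets instead of three.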
For brevity, the foregoing method embodiments are described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
Corresponding to the method provided by the above embodiment of the method for generating URL rewriting rules, and referring to Fig. 6, the present application also provides an embodiment of an apparatus for generating URL rewriting rules. In this embodiment, the apparatus may include:
A URL set obtaining unit 601, configured to obtain the target URL set of a target website, where the target website is a website for which uniform resource locator (URL) rewriting rules are to be generated.
The URL set obtaining unit 601 can be configured to preprocess the initial URL set in the access log of the target website to obtain the target URL set.
For preprocessing, the URL set obtaining unit 601 may include:
A filtering subunit, configured to filter out, according to the Hypertext Transfer Protocol (HTTP) status code, the illegal URLs corresponding to illegal URL requests from the initial URL set in the access log of the target website;
A normalization subunit, configured to normalize the initial URL set from which the illegal URLs have been filtered to obtain a normalized URL set, where each normalized URL in the normalized URL set includes a domain name, a path, and a file name; and
A de-duplication subunit, configured to de-duplicate the normalized URL set to obtain the target URL set.
A parameter obtaining unit 602, configured to obtain a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, where each resource parameter is a subpath of its prefix parameter.
The parameter obtaining unit 602 may include:
A splitting subunit, configured to split each target URL in the target URL set on a preset separator to obtain the character array corresponding to each target URL; and
A parameter determining subunit, configured to determine, following the order in which the strings in the character array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL to obtain the parameter set.
The parameter determining subunit is specifically configured to:
Obtain any one character array as the current array and execute the array loop process, where the array loop process includes:
In front-to-back order, obtaining the first string in the current array as the current prefix parameter;
Saving the current prefix parameter together with the resource parameter immediately following it to the initial parameter set;
Judging whether the current prefix parameter is in the initial URL rewriting rule set; if it is, combining the current prefix parameter with the default overwrite parameter to form the updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter;
Taking the updated prefix parameter as the current prefix parameter and executing the step of saving the current prefix parameter together with the resource parameter immediately following it to the initial parameter set, until all strings of the current array have been looped over;
Judging whether all character arrays have been looped over; if not, taking any character array that has not been looped over as the current array and triggering execution of the array loop process;
If so, taking the initial parameter set as the target parameter set corresponding to the target URL set.
A generating unit 603, configured to generate the URL rewriting rule set of the target website according to the parameter set.
The generating unit 603 may specifically include:
A judging subunit, configured to judge, for each prefix parameter, whether the number of resource parameters under the prefix parameter is greater than a preset threshold;
An updating subunit, configured to, when the result of the judging subunit is yes, add the prefix parameter to the initial URL rewriting rule set to obtain an updated URL rewriting rule set, until the initial URL rewriting rule set is no longer updated;
A rule determining subunit, configured to determine the updated URL rewriting rule set as the target URL rewriting rule set.
With the apparatus of the embodiment of the present application, the prefix parameters and resource parameters of the URLs in the web access log of a website can be analyzed to determine the URL overwrite parameters, and the prefix parameter preceding each URL overwrite parameter is used to generate a URL rewriting rule, finally yielding the target URL rewriting rule set. Because no manual analysis of the web access log is required, substantial labor and material costs are saved and the errors made when URL rules are configured manually are also reduced, so that URL rewriting rules can be generated quickly even in scenarios involving a large or massive number of websites.
The apparatus may further include:
A mapping unit 604, configured to map the original URL to the rewritten URL according to the URL rewriting rule set of the target website.
The mapping unit 604 may include:
A normalization subunit, configured to normalize the URL to be mapped to obtain the normalized URL; a splitting subunit, configured to split the normalized URL on a preset separator to obtain the split character array; and a mapping subunit, configured to map the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split character array against the URL rewriting rule set.
The mapping subunit can specifically be configured to:
In front-to-back order, obtain the first string in the split character array as the current prefix parameter; judge whether the current prefix parameter is in the URL rewriting rule set; if it is, combine the current prefix parameter with the default overwrite parameter to form the updated prefix parameter; if not, combine the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter; take the updated prefix parameter as the current prefix parameter and execute the step of saving the current prefix parameter together with the resource parameter immediately following it to the initial parameter set, until all strings in the split character array have been looped over; and obtain the updated prefix parameter as the rewritten URL.
The mapping subunit can also be configured to: obtain the resource parameter corresponding to the current prefix parameter and the value of that resource parameter; and save the resource parameter, the resource parameter value, and the query string in association with the rewritten URL.
It can be seen that the mapping unit 604 also rewrites the original URL into another URL according to the URL rewriting rules in the URL rewriting rule set, for the scanner to scan. The URL overwrite parameter portion "${dynamic}" is not scanned by the scanner as a path, which not only reduces the number of objects the scanner has to scan but also ensures that the scanner is not easily attacked by an attacker.
Corresponding to the scanning method provided in Fig. 5, and referring to Fig. 7, the present application also provides a scanner, which may include:
A URL obtaining unit 701, configured to obtain the pre-generated URL rewriting rule set and the initial URL set of the target website to be scanned. The URL rewriting rule set is generated as follows: obtain the target URL set of the target website, where the target website is a website for which uniform resource locator (URL) rewriting rules are to be generated; obtain a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set; and generate the URL rewriting rule set of the target website according to the parameter set.
A rewriting unit 702, configured to rewrite the initial URLs in the initial URL set according to the URL rewriting rule set to obtain the rewritten initial URL set.
A de-duplication unit 703, configured to de-duplicate the rewritten initial URL set to obtain the target URL set.
A scanning unit 704, configured to scan the target URLs in the target URL set.
Because the number of target URLs in this embodiment is much smaller than the number of initial URLs, the scanning efficiency of the scanner of this embodiment is higher.
Fig. 8 is a schematic diagram of the hardware structure of a network device 800 in an embodiment of the present invention. The network device 800 can be used to execute the methods provided in the above embodiments. In this embodiment, the network device 800 includes: a processor 801, a memory 802, a network interface 803, and a bus system 804.
The bus system 804 is configured to couple the hardware components of the network device 800 together.
The network interface 803 is configured to implement the communication connection between the network device 800 and at least one other network device, and can use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
The memory 802 is configured to store program instructions and/or data.
The processor 801 is configured to read the instructions and/or data stored in the memory 802 and perform the following operations:
Obtain the pre-generated URL rewriting rule set and the initial URL set of the target website to be scanned, where the URL rewriting rule set is generated using the aforementioned method for generating URL rewriting rules;
Rewrite the initial URLs in the initial URL set according to the URL rewriting rule set to obtain the rewritten initial URL set;
De-duplicate the rewritten initial URL set to obtain the target URL set;
Scan the target URLs in the target URL set.
It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments can be referred to one another. Because the apparatus embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments.
Finally, it is to be noted that, herein, relational terms such as first and second and the like be used merely to by
One entity or operation are distinguished with another entity or operation, without necessarily requiring or implying these entities or operation
Between there are any actual relationship or orders.Moreover, the terms "include", "comprise" or its any other variant meaning
Covering non-exclusive inclusion, so that the process, method, article or equipment for including a series of elements not only includes that
A little elements, but also including other elements that are not explicitly listed, or further include for this process, method, article or
The intrinsic element of equipment.In the absence of more restrictions, the element limited by sentence "including a ...", is not arranged
Except there is also other identical elements in the process, method, article or apparatus that includes the element.
The generation method and device, scan method and device of URL rewriting rule provided herein are carried out above
It is discussed in detail, specific examples are used herein to illustrate the principle and implementation manner of the present application, above embodiments
Illustrate to be merely used to help understand the present processes and its core concept;At the same time, for those skilled in the art, according to
According to the thought of the application, there will be changes in the specific implementation manner and application range, in conclusion the content of the present specification
It should not be construed as the limitation to the application.
Claims (14)
1. A method for generating uniform resource locator (URL) rewriting rules, characterized in that the method comprises:
obtaining a target URL set of a target website, the target website being a website for which URL rewriting rules are to be generated;
obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, wherein each resource parameter is a subpath of its prefix parameter;
generating a URL rewriting rule set of the target website according to the parameter set.
2. The method according to claim 1, characterized in that obtaining the target URL set of the target website comprises:
preprocessing an initial URL set in an access log of the target website to obtain the target URL set.
3. The method according to claim 2, characterized in that preprocessing the initial URL set in the access log of the target website to obtain the target URL set comprises:
filtering out, according to Hypertext Transfer Protocol (HTTP) status codes, the illegal URLs corresponding to illegal URL requests from the initial URL set in the access log of the target website;
normalizing the initial URL set from which the illegal URLs have been filtered to obtain a normalized URL set, wherein each normalized URL in the normalized URL set comprises a domain name, a path, and a file name;
deduplicating the normalized URL set to obtain the target URL set.
4. The method according to claim 1, characterized in that obtaining the parameter set of prefix parameters and resource parameters in the target URL set comprises:
splitting each target URL in the target URL set based on a preset separator to obtain a character array corresponding to each target URL;
determining, according to the order in which the character strings in the character array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL, so as to obtain the parameter set.
5. The method according to claim 4, characterized in that determining, according to the order in which the character strings in the character array make up the target URL, the corresponding prefix parameters and resource parameters in each target URL comprises:
taking any one character array as the current array and executing an array loop procedure, the array loop procedure comprising:
obtaining, in front-to-back order, the first character string in the current array as the current prefix parameter;
saving the current prefix parameter and the resource parameter immediately following it, in correspondence, to an initial parameter set;
determining whether the current prefix parameter is in an initial URL rewriting rule set; if so, combining the current prefix parameter with a preset rewrite parameter to form an updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter;
taking the updated prefix parameter as the current prefix parameter and executing again the step of saving the current prefix parameter and the resource parameter immediately following it, in correspondence, to the initial parameter set, until all character strings in the current array have been looped through;
determining whether all character arrays have been looped through; if not, taking any character array not yet looped through as the current array and triggering execution of the array loop procedure;
if so, taking the initial parameter set as the target parameter set corresponding to the target URL set.
6. The method according to claim 5, characterized in that generating the URL rewriting rule set of the target website according to the path parameter and the non-path parameter comprises:
for each prefix parameter, determining whether the number of resource parameters under the prefix parameter is greater than a preset threshold; if so, adding the prefix parameter to the initial URL rewriting rule set to obtain an updated URL rewriting rule set, until the initial URL rewriting rule set is no longer updated;
determining the updated URL rewriting rule set as the target URL rewriting rule set.
7. The method according to claim 1, characterized by further comprising:
mapping a URL to be mapped to a rewritten URL according to the URL rewriting rule set of the target website.
8. The method according to claim 7, characterized in that mapping the URL to be mapped to the rewritten URL according to the URL rewriting rule set of the target website comprises:
normalizing the URL to be mapped to obtain a normalized URL;
splitting the normalized URL based on a preset separator to obtain a split character array;
mapping the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split character array against the URL rewriting rule set.
9. The method according to claim 8, characterized in that mapping the URL to be mapped to the rewritten URL according to the result of matching each prefix parameter in the split character array against the URL rewriting rule set comprises:
obtaining, in front-to-back order, the first character string in the split character array as the current prefix parameter;
determining whether the current prefix parameter is in the URL rewriting rule set; if so, combining the current prefix parameter with a preset rewrite parameter to form an updated prefix parameter; if not, combining the current prefix parameter with the resource parameter immediately following it to form the updated prefix parameter;
taking the updated prefix parameter as the current prefix parameter and executing again the step of saving the current prefix parameter and the resource parameter immediately following it, in correspondence, to the initial parameter set, until all character strings in the split character array have been looped through;
taking the updated prefix parameter as the rewritten URL.
10. The method according to claim 9, characterized in that, in the case where the current prefix parameter is in the URL rewriting rule set, the method further comprises:
obtaining the resource parameter corresponding to the current prefix parameter and the value of the resource parameter;
saving the resource parameter, the value of the resource parameter, and the query string in correspondence with the rewritten URL.
11. A URL scanning method, characterized in that the method comprises:
obtaining a pre-generated URL rewriting rule set and an initial URL set to be scanned of a target website, wherein the URL rewriting rule set is generated as follows: obtaining a target URL set of the target website, the target website being a website for which URL rewriting rules are to be generated; obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set; and generating the URL rewriting rule set of the target website according to the parameter set;
rewriting the initial URLs in the initial URL set according to the URL rewriting rule set to obtain a rewritten initial URL set;
deduplicating the rewritten initial URL set to obtain a target URL set;
scanning the target URLs in the target URL set.
12. An apparatus for generating URL rewriting rules, characterized in that the apparatus comprises:
a URL set obtaining unit, configured to obtain a target URL set of a target website, the target website being a website for which URL rewriting rules are to be generated;
a parameter obtaining unit, configured to obtain a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, wherein each resource parameter is a subpath of its prefix parameter;
a generation unit, configured to generate a URL rewriting rule set of the target website according to the parameter set.
13. A scanner, characterized in that the scanner comprises:
a URL obtaining unit, configured to obtain a pre-generated URL rewriting rule set and an initial URL set to be scanned of a target website, wherein the URL rewriting rule set is generated as follows: obtaining a target URL set of the target website, the target website being a website for which URL rewriting rules are to be generated; obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set; and generating the URL rewriting rule set of the target website according to the parameter set;
a rewriting unit, configured to rewrite the initial URLs in the initial URL set according to the URL rewriting rule set to obtain a rewritten initial URL set;
a deduplication unit, configured to deduplicate the rewritten initial URL set to obtain a target URL set;
a scanning unit, configured to scan the target URLs in the target URL set.
14. A network device, characterized in that the network device comprises: a processor, a memory, a network interface, and a bus system;
the bus system is configured to couple the hardware components of the network device together;
the network interface is configured to implement a communication connection between the network device and at least one other network device;
the memory is configured to store program instructions and/or data;
the processor is configured to read the instructions and/or data stored in the memory and perform the following operations:
obtaining a target URL set of a target website, the target website being a website for which URL rewriting rules are to be generated;
obtaining a parameter set of mutually corresponding prefix parameters and resource parameters in the target URL set, wherein each resource parameter is a subpath of its prefix parameter;
generating a URL rewriting rule set of the target website according to the parameter set.
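As an illustration of the flow recited in claims 3 to 6 and claims 8 to 9, the following Python sketch preprocesses logged URLs, grows a rewriting rule set until it stops changing, and maps a URL to its rewritten form. It is one possible reading of the claims, not the patented implementation. All function names are hypothetical, and the following are assumptions made for the sketch: `/` as the preset separator, `*` as the preset rewrite parameter, HTTP status codes of 400 and above as the marker of illegal requests, and the number of distinct resource parameters as the quantity compared with the threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple
from urllib.parse import urlsplit

PLACEHOLDER = "*"  # assumed stand-in for the "preset rewrite parameter"
SEPARATOR = "/"    # assumed preset separator


def preprocess(log_entries: Iterable[Tuple[str, int]]) -> List[str]:
    """Filter, normalize, and deduplicate logged URLs (claim 3)."""
    targets: Dict[str, None] = {}  # dict keys give order-preserving deduplication
    for url, status in log_entries:
        if status >= 400:  # assumption: 4xx/5xx responses mark illegal requests
            continue
        parts = urlsplit(url if "://" in url else "//" + url)
        # Keep only domain name, path, and file name; drop scheme and query.
        targets[(parts.hostname or "") + parts.path] = None
    return list(targets)


def walk(url: str, rules: Set[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Traverse one URL front to back (claims 5 and 9).

    Returns the final prefix, which is the rewritten URL, together with
    the (prefix, resource) pairs encountered along the way.
    """
    parts = url.split(SEPARATOR)
    prefix, pairs = parts[0], []
    for resource in parts[1:]:
        pairs.append((prefix, resource))
        # A prefix already in the rule set absorbs whatever follows it.
        following = PLACEHOLDER if prefix in rules else resource
        prefix = prefix + SEPARATOR + following
    return prefix, pairs


def generate_rules(target_urls: List[str], threshold: int) -> Set[str]:
    """Add prefixes with too many resources until nothing changes (claim 6)."""
    rules: Set[str] = set()
    while True:
        params: Dict[str, Set[str]] = defaultdict(set)
        for url in target_urls:
            for prefix, resource in walk(url, rules)[1]:
                params[prefix].add(resource)
        crowded = {p for p, res in params.items() if len(res) > threshold}
        if crowded <= rules:  # rule set no longer updated
            return rules
        rules |= crowded


def map_url(url: str, rules: Set[str]) -> str:
    """Map an already normalized URL to its rewritten form (claim 9)."""
    return walk(url, rules)[0]
```

Under these assumptions, a log containing `example.com/item/1/view`, `example.com/item/2/view`, and `example.com/item/3/view` with a threshold of 2 places the prefix `example.com/item` in the rule set, after which every item page maps to `example.com/item/*/view`. The rule set is recomputed from scratch on each pass because adding a prefix changes how deeper prefixes are formed.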
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710892706.7A CN109561163B (en) | 2017-09-27 | 2017-09-27 | Method and device for generating uniform resource locator rewriting rule |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710892706.7A CN109561163B (en) | 2017-09-27 | 2017-09-27 | Method and device for generating uniform resource locator rewriting rule |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109561163A CN109561163A (en) | 2019-04-02 |
CN109561163B CN109561163B (en) | 2022-03-15 |
Family
ID=65864234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710892706.7A Active CN109561163B (en) | 2017-09-27 | 2017-09-27 | Method and device for generating uniform resource locator rewriting rule |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109561163B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101267457A (en) * | 2008-04-14 | 2008-09-17 | 华耀环宇科技(北京)有限公司 | A network resource mapping method oriented to L1 customer |
US8510454B2 (en) * | 2006-05-04 | 2013-08-13 | Digital River, Inc. | Mapped parameter sets using bulk loading system and method |
CN103685237A (en) * | 2013-11-22 | 2014-03-26 | 北京奇虎科技有限公司 | Method and device for improving website vulnerability scanning speed |
CN104933056A (en) * | 2014-03-18 | 2015-09-23 | 腾讯科技(深圳)有限公司 | Uniform resource locator (URL) de-duplication method and device |
CN106708952A (en) * | 2016-11-25 | 2017-05-24 | 北京神州绿盟信息安全科技股份有限公司 | Web page clustering method and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110399546A (en) * | 2019-07-23 | 2019-11-01 | 中南民族大学 | Link De-weight method, device, equipment and storage medium based on web crawlers |
CN110413861A (en) * | 2019-07-23 | 2019-11-05 | 中南民族大学 | Link extracting method, device, equipment and storage medium based on web crawlers |
CN110413861B (en) * | 2019-07-23 | 2021-10-22 | 中南民族大学 | Link extraction method, device, equipment and storage medium based on web crawler |
CN110399546B (en) * | 2019-07-23 | 2022-02-08 | 中南民族大学 | Link duplicate removal method, device, equipment and storage medium based on web crawler |
CN111461537A (en) * | 2020-03-31 | 2020-07-28 | 山东胜软科技股份有限公司 | Oil gas production data based classified quantity counting method and control system |
CN114157648A (en) * | 2021-11-30 | 2022-03-08 | 北京知道创宇信息技术股份有限公司 | Request matching rule generation method and device, website server and storage medium |
CN114157648B (en) * | 2021-11-30 | 2023-11-28 | 北京知道创宇信息技术股份有限公司 | Request matching rule generation method and device, website server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109561163B (en) | 2022-03-15 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN102222187B (en) | Domain name structural feature-based hang horse web page detection method | |
CN103501306B (en) | A kind of network address knows method for distinguishing, server and system | |
CN109561163A (en) | The generation method and device of uniform resource locator rewriting rule | |
CN104486461B (en) | Domain name classification method and device, domain name identification method and system | |
CN101471818B (en) | Detection method and system for malevolence injection script web page | |
CN101370024B (en) | Distributed information collection method and system | |
CN106708952B (en) | A kind of Webpage clustering method and device | |
CN105243159A (en) | Visual script editor-based distributed web crawler system | |
CN109344053B (en) | Interface coverage test method, system, computer device and storage medium | |
CN102855418A (en) | Method for discovering Web intranet agent bugs | |
CN106095979A (en) | URL merging treatment method and apparatus | |
CN108959539B (en) | Rule-configurable webpage data analysis method | |
CN105447035B (en) | data scanning method and device | |
CN109308258A (en) | Construction method, device, computer equipment and storage medium of test data | |
CN109885782B (en) | Ecological environment space big data integration method | |
CN109710826A (en) | A kind of internet information artificial intelligence acquisition method and its system | |
CN111723400A (en) | JS sensitive information leakage detection method, device, equipment and medium | |
CN106940711B (en) | URL detection method and detection device | |
CN111597422A (en) | Buried point mapping method and device, computer equipment and storage medium | |
CN103647774A (en) | Web content information filtering method based on cloud computing | |
CN103927325A (en) | URL (uniform resource locator) classifying method and device | |
CN114186102A (en) | Tree structure data construction method and device and computer equipment | |
CN111090802B (en) | Malicious web crawler monitoring and processing method and system based on machine learning | |
US20180309854A1 (en) | Protocol model generator and modeling method thereof | |
CN103685237A (en) | Method and device for improving website vulnerability scanning speed |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||