CN107786604B - A method and apparatus for determining a content server - Google Patents

Info

Publication number
CN107786604B
Authority
CN
China
Prior art keywords
urls
url
target
condition
sorting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610767748.3A
Other languages
Chinese (zh)
Other versions
CN107786604A (en)
Inventor
槐昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Kunlun Technology Co ltd
XFusion Digital Technologies Co Ltd
Original Assignee
Huawei Digital Technologies Suzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Digital Technologies Suzhou Co Ltd
Priority to CN201610767748.3A
Publication of CN107786604A
Application granted
Publication of CN107786604B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/101 Server selection for load balancing based on network conditions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention disclose a method and an apparatus for determining a content server, relating to the technical field of website detection and intended to improve the efficiency of determining a content server. The method includes: a gateway device obtains a website access record for a preset time period, where the website access record includes the N URLs accessed within the preset time period and the number of accesses corresponding to the N URLs; the gateway device determines M target URLs among the N URLs according to the website access record, where the M target URLs are the M URLs among the N URLs with the highest probability of being a content server; the gateway device accesses the Host corresponding to each of the M target URLs and receives M parameters returned by the target servers running the M Hosts of the M target URLs, where each parameter includes an HTTP return value and the number of bytes of returned data; and the gateway device determines the content servers among the M target URLs according to the M parameters.

Figure 201610767748

Description

Method and device for determining content server
Technical Field
The invention relates to the technical field of website detection, in particular to a method and a device for determining a content server.
Background
A content server is a type of website that provides services for other websites, such as storing pictures for them, analyzing viewer information, or performing traffic rating and content filtering. A content server is typically not displayed directly to the user.
After a user enters a web address in the browser of a terminal device and submits it, the browser visits the corresponding website and, along with it, a number of other websites, many of which are content-server-type websites. These accompanying websites provide advertisement content, access statistics, or pictures for the website the user visits directly, and the user is unaware of them.
Because content servers are generally not malicious websites, filtering them out during website security detection improves the efficiency of that detection.
At present, a method for determining whether a website is a content server specifically includes: when a user accesses a certain website through a browser of a terminal device, the browser initiates a request for accessing the website to a gateway device, the gateway device obtains the request, determines a Uniform Resource Locator (URL) of the website according to the request, records the URL, forwards the request to a target server (a server running the website), the target server responds to the request after receiving the request and returns a response message to the gateway device, and the gateway device judges whether the URL is a content server according to data contained in the response message. For example, when the data included in the response message is a session, blank content, or a 1 × 1 picture, the URL is determined to be the content server.
The above method of determining a content server is inefficient, because the gateway device must examine the response for every accessed URL.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a content server, which are used for improving the efficiency of determining the content server.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, a method for determining a content server is provided, including: a gateway device obtains a website access record for a preset time period, where the website access record includes N Uniform Resource Locators (URLs) accessed within the preset time period and the number of accesses corresponding to the N URLs, and N is an integer greater than 0; the gateway device determines M target URLs among the N URLs according to the website access record, where the M target URLs are the M URLs among the N URLs with the highest probability of being a content server, and M is an integer greater than 0 and less than or equal to N; the gateway device accesses the Host corresponding to each of the M target URLs and receives M parameters returned by the target servers running the M Hosts of the M target URLs, where each parameter includes a Hypertext Transfer Protocol (HTTP) return value and the number of bytes of returned data; and the gateway device determines the content servers among the M target URLs according to the M parameters.
In the method provided by the first aspect, URLs with a low probability of being a content server are excluded using the website access record. This greatly reduces the number of URLs that the gateway device must examine to decide whether they are content servers, and so improves the efficiency with which the gateway device determines content servers. During website security detection, the excluded URLs have a low probability of being content servers; even if some content servers are among them, their number is small, so the efficiency of website security detection is not significantly affected.
With reference to the first aspect, in a first possible implementation manner, the determining, by the gateway device, M target URLs in N URLs according to the website access record includes: the gateway device determines a URL meeting a condition 1 and/or a condition 2 in the N URLs as a target URL, wherein the condition 1 is as follows: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently, and X, Y are all integers greater than 0 and less than 100.
With reference to the first aspect, in a second possible implementation manner, the website access record further includes an identifier of a terminal device accessing N URLs within a preset time period, and the determining, by the gateway device, M target URLs in the N URLs according to the website access record includes: the gateway device determines, as a target URL, a URL that satisfies one or more of conditions 1, 2, and 3 among the N URLs, where condition 1 is: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently; the condition 3 is: the number of the identifications of the terminal devices accessing the URL is greater than or equal to a first preset threshold, and X, Y are integers which are greater than 0 and smaller than 100.
In the two possible implementations above, a URL that satisfies the preset condition (one or more of condition 1, condition 2, or condition 3) has a higher probability of being a content server than a URL that does not satisfy the preset condition.
With reference to the first aspect, the first possible implementation manner, or the second possible implementation manner of the first aspect, in a third possible implementation manner, the determining, by the gateway device, content servers in M target URLs according to M parameters includes: when the HTTP return value in the parameter corresponding to one target URL is not 200, or the HTTP return value in the parameter corresponding to one target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold value, the gateway device determines that the target URL is the content server.
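The decision rule in this third implementation can be sketched as a small predicate. This is a minimal sketch under stated assumptions, not the patented implementation: the names `ProbeResult` and `is_content_server` are hypothetical, and the 64-byte value used for the second preset threshold is an illustrative assumption, since the source does not give a value.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One 'parameter' returned for a target URL's Host (hypothetical type)."""
    http_status: int  # HTTP return value
    body_bytes: int   # number of bytes of returned data


# Assumed value for the "second preset threshold"; the source does not specify one.
SECOND_PRESET_THRESHOLD = 64


def is_content_server(result: ProbeResult,
                      threshold: int = SECOND_PRESET_THRESHOLD) -> bool:
    """Return True if the status is not 200, or if it is 200 with a small body."""
    if result.http_status != 200:
        return True
    return result.body_bytes <= threshold


print(is_content_server(ProbeResult(404, 5000)))  # prints "True"
print(is_content_server(ProbeResult(200, 5000)))  # prints "False"
```

The threshold is a parameter rather than a constant baked into the function, because the text describes it as a preset value chosen per deployment.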
In a second aspect, a gateway device is provided, including: an obtaining unit, configured to obtain a website access record for a preset time period, where the website access record includes N Uniform Resource Locators (URLs) accessed within the preset time period and the number of accesses corresponding to the N URLs, and N is an integer greater than 0; a first determining unit, configured to determine M target URLs among the N URLs according to the website access record, where the M target URLs are the M URLs among the N URLs with the highest probability of being a content server, and M is an integer greater than 0 and less than or equal to N; a transceiver unit, configured to access the Host corresponding to each of the M target URLs and receive M parameters returned by the target servers running the M Hosts of the M target URLs, where each parameter includes a Hypertext Transfer Protocol (HTTP) return value and the number of bytes of returned data; and a second determining unit, configured to determine the content servers among the M target URLs according to the M parameters.
Each unit in the gateway device provided in the second aspect is configured to execute the method provided in the first aspect, and therefore beneficial effects of the gateway device may refer to beneficial effects of the method provided in the first aspect, which are not described herein again.
With reference to the second aspect, in a first possible implementation manner, the first determining unit is specifically configured to: determining a URL meeting the condition 1 and/or the condition 2 in the N URLs as a target URL, wherein the condition 1 is as follows: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently, and X, Y are all integers greater than 0 and less than 100.
With reference to the second aspect, in a second possible implementation manner, the website access record further includes identifiers of terminal devices accessing N URLs within a preset time period, and the first determining unit is specifically configured to: determining, as a target URL, a URL that satisfies one or more of conditions 1, 2, and 3 among the N URLs, where condition 1 is: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently; the condition 3 is: the number of the identifications of the terminal devices accessing the URL is greater than or equal to a first preset threshold, and X, Y are integers which are greater than 0 and smaller than 100.
In the two possible implementations above, a URL that satisfies the preset condition (one or more of condition 1, condition 2, or condition 3) has a higher probability of being a content server than a URL that does not satisfy the preset condition.
With reference to the second aspect, the first possible implementation manner or the second possible implementation manner of the second aspect, in a third possible implementation manner, the second determining unit is specifically configured to: and when the HTTP return value in the parameter corresponding to the target URL is not 200, or the HTTP return value in the parameter corresponding to the target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold value, determining that the target URL is the content server.
In a third aspect, a gateway device is provided, including a memory, a processor, and a transceiver. The memory is configured to store code, and the processor is configured to perform the following actions in accordance with the code: obtaining a website access record for a preset time period, where the website access record includes N Uniform Resource Locators (URLs) accessed within the preset time period and the number of accesses corresponding to the N URLs, and N is an integer greater than 0; and determining M target URLs among the N URLs according to the website access record, where the M target URLs are the M URLs among the N URLs with the highest probability of being a content server, and M is an integer greater than 0 and less than or equal to N. The transceiver is configured to access the Host corresponding to each of the M target URLs and receive M parameters returned by the target servers running the M Hosts of the M target URLs, where each parameter includes a Hypertext Transfer Protocol (HTTP) return value and the number of bytes of returned data. The processor is further configured to determine the content servers among the M target URLs according to the M parameters.
Each component of the gateway device provided in the third aspect is configured to execute the method provided in the first aspect; for the beneficial effects of the gateway device, refer to those of the method provided in the first aspect, which are not described here again.
With reference to the third aspect, in a first possible implementation manner, the processor is specifically configured to: determining a URL meeting the condition 1 and/or the condition 2 in the N URLs as a target URL, wherein the condition 1 is as follows: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently, and X, Y are all integers greater than 0 and less than 100.
With reference to the third aspect, in a second possible implementation manner, the website access record further includes identifiers of terminal devices accessing N URLs within a preset time period, and the processor is specifically configured to: determining, as a target URL, a URL that satisfies one or more of conditions 1, 2, and 3 among the N URLs, where condition 1 is: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently; the condition 3 is: the number of the identifications of the terminal devices accessing the URL is greater than or equal to a first preset threshold, and X, Y are integers which are greater than 0 and smaller than 100.
In the two possible implementations above, a URL that satisfies the preset condition (one or more of condition 1, condition 2, or condition 3) has a higher probability of being a content server than a URL that does not satisfy the preset condition.
With reference to the third aspect, the first possible implementation manner, or the second possible implementation manner of the third aspect, in a third possible implementation manner, the processor is specifically configured to: and when the HTTP return value in the parameter corresponding to the target URL is not 200, or the HTTP return value in the parameter corresponding to the target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold value, determining that the target URL is the content server.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating a network system according to an embodiment of the present invention;
Fig. 2 is a schematic composition diagram of a gateway device according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for determining a content server according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for determining a content server according to another embodiment of the present invention;
Fig. 5 is a schematic composition diagram of a gateway device according to an embodiment of the present invention;
Fig. 6 is a schematic composition diagram of another gateway device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. "Plurality" herein means two or more.
An embodiment of the present invention provides a network system for implementing the method provided in the embodiment of the present invention. As shown in Fig. 1, the system includes one or more terminal devices, a gateway device connected to the terminal devices, and one or more target servers connected to the gateway device. A user accesses websites through a terminal device, which may be a computer, a mobile phone, or a tablet computer. The gateway device is deployed at the egress of the terminal devices' network and processes (for example, filters or inspects) and/or forwards the messages exchanged between the terminal devices and the target servers; it may specifically be a router or a firewall. One or more websites may run on one target server. For clarity, in the description of the embodiments of the present invention, a content server refers to a type of website that provides services for other websites, and a target server refers to the hardware carrier that runs a website.
The device for executing the method provided by the embodiment of the present invention may be a gateway device, and the hardware architecture composition of the gateway device may refer to fig. 2, including: a network interface, a memory connected to the network interface, and a Central Processing Unit (CPU) connected to the memory.
The network interface can be divided into an input interface and an output interface. The input interface is used to input network data to the gateway device, and the output interface is used to output network data from the gateway device.
The CPU consists of an arithmetic unit and a controller. The arithmetic unit mainly processes network data; the controller decodes instructions and, as each instruction requires, sends control signals to all parts of the system in an orderly way so that the whole system works in a coordinated manner. The memory can store network data and read out the stored network data on command. The CPU may specifically be an ARM (Advanced RISC Machines), MIPS (Microprocessor without Interlocked Pipeline Stages), or x86 processor, among others.
An embodiment of the present invention provides a method for determining a content server, as shown in fig. 3, including:
301. The gateway device obtains a website access record for a preset time period, where the website access record includes the N URLs accessed within the preset time period and the number of accesses corresponding to the N URLs, and N is an integer greater than 0.
The length of the preset time period may be set according to an actual application scenario or a requirement, for example, the preset time period may be 5 minutes or 10 minutes, and the length of the preset time period is not specifically limited in the embodiment of the present invention.
Specifically, in the website access record, one URL represents one website, and one URL corresponds to the number of times of access of the URL. The gateway device may record all URLs visited within a preset time period, and then count the number of visits of each URL to obtain a website visit record. For example, the website access record may be as shown in table 1, wherein the number of accesses of URL1 is 3, the number of accesses of URL2 is 9, and the number of accesses of URL3 is 7.
TABLE 1
URL     Number of accesses
URL1    3
URL2    9
URL3    7
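The counting step that produces a record like Table 1 can be sketched as follows. This is only an illustration: the helper name `build_access_record` is hypothetical, and the list of observed URLs is made-up input chosen to reproduce the counts in Table 1.

```python
from collections import Counter


def build_access_record(observed_urls):
    """Count how many times each URL was observed in the preset time period."""
    return Counter(observed_urls)


# Hypothetical observations chosen to reproduce Table 1.
observed = ["URL1"] * 3 + ["URL2"] * 9 + ["URL3"] * 7
record = build_access_record(observed)
print(record["URL1"], record["URL2"], record["URL3"])  # prints "3 9 7"
```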
To make clearer how the gateway device obtains a URL, the process by which a user accesses a website is first described briefly. When a user accesses a website through a terminal device, the terminal device sends a request for accessing the website to the gateway device. The gateway device processes the request according to the functions it provides and then sends the request to the target server running the website. After responding to the request, the target server returns a response message to the gateway device, and the gateway device inspects the data contained in the response message according to the functions it provides and then returns the data to the terminal device.
The gateway device may obtain the URL of a website from the received request in which the terminal device accesses the website. Specifically, the request may be an HTTP message whose header includes a Host field and a Path field, and the URL of the website is obtained by concatenating the contents of the Host field and the Path field in order. For example, if the content of the Host field is s3.tbcdn.com and the content of the Path field is get/img/3.js, then the URL of the website is s3.tbcdn.com/get/img/3.js. Because the method provided by the embodiment of the invention determines content servers from actual requests to access websites, it can adapt to content servers that change continuously or are newly added.
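Joining the Host and Path contents can be sketched as below. The function name `url_from_request` is hypothetical, and the slash handling (inserting exactly one `/` between the two parts, and treating an empty path or a bare `/` as "Host only") is an assumption made so that the example output matches the URL form used in the text.

```python
def url_from_request(host: str, path: str) -> str:
    """Join the Host and Path contents into the URL form used in the text."""
    host = host.strip().rstrip("/")
    path = path.strip().lstrip("/")
    # A request with no path (or just "/") yields a URL that is only the Host.
    return f"{host}/{path}" if path else host


print(url_from_request("s3.tbcdn.com", "get/img/3.js"))
# prints "s3.tbcdn.com/get/img/3.js"
```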
302. The gateway device determines M target URLs in the N URLs according to the website access records, wherein the M target URLs are M URLs with the highest probability of being a content server in the N URLs, and M is an integer which is larger than 0 and smaller than or equal to N.
Optionally, in a specific implementation, step 302 may include: determining, as target URLs, the URLs among the N URLs that satisfy condition 1 and/or condition 2.
Optionally, the website access record further includes the identifiers of the terminal devices that accessed the N URLs within the preset time period. In this case, in a specific implementation, step 302 may include: determining, as target URLs, the URLs among the N URLs that satisfy one or more of condition 1, condition 2, and condition 3. The identifier of a terminal device uniquely identifies the terminal device and may specifically be its Internet Protocol (IP) address or Media Access Control (MAC) address. The gateway device may obtain the identifier of a terminal device accessing a website during the process of establishing a connection with that terminal device. Specifically, for each URL, the website access record includes as many terminal device identifiers as there are distinct terminal devices that accessed the URL within the preset time period.
In the above two alternative methods, condition 1 is: the first-level domain name corresponding to the URL is located at the top X% in the first sequencing result, the first sequencing result is obtained after sequencing the first-level domain names corresponding to the N URLs according to the sequence of the access times from large to small, or the Host corresponding to the URL is located at the top Y% in the second sequencing result, and the second sequencing result is obtained after sequencing the Host corresponding to the N URLs according to the sequence of the access times from large to small; the condition 2 is: the Host corresponding to the URL is not accessed independently; the condition 3 is: the number of the identifications of the terminal devices accessing the URL is greater than or equal to a first preset threshold, and X, Y are integers which are greater than 0 and smaller than 100.
The value of X may be determined according to the actual application scenario, for example, when the value of N is large, the value of X may be set to be large, and when the value of N is small, the value of X may be set to be small, for example, X may be 80 or 50. The value of Y is determined in the same way. The first preset threshold may be determined according to an actual application scenario, which is not specifically limited in the embodiment of the present invention, for example, when determining a content server in a website accessed by a terminal device in an enterprise, the first preset threshold may be set to 10% or 20% of the total number of the terminal devices in the enterprise.
A URL includes a Host and a Path. For example, when the URL is s3.tbcdn.com/get/img/3.js, the Host of the URL is s3.tbcdn.com, the Path is get/img/3.js, and the first-level domain name corresponding to the URL is tbcdn.com. When a URL includes only a Host, the Host corresponding to the URL is the URL itself.
Specifically, since a content server is accessed more often than an ordinary website (i.e., a non-content server), the larger the access count of the first-level domain name or Host of a URL, the higher the probability that the URL is a content server. Since most users access the Host of a commonly used website (e.g., Baidu or Taobao) directly, and a commonly used website is not a content server, a URL whose Host has not been accessed on its own has a high probability of being a content server. Since a content server is accessed as a by-product of visiting other websites rather than being visited directly by the user, and different users may incidentally access the same content server when visiting different websites, the probability that a URL is a content server increases with the number of identifiers of terminal devices accessing the URL.
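The decomposition of a URL into Host, Path, and first-level domain name can be sketched as follows. The function names are hypothetical, and taking the last two labels of the Host as the first-level domain name is a simplifying assumption that fits the tbcdn.com example; suffixes such as .com.cn would likely need a public suffix list instead.

```python
def split_url(url: str) -> tuple[str, str]:
    """Split a scheme-less URL such as 's3.tbcdn.com/get/img/3.js' into (Host, Path)."""
    host, _, path = url.partition("/")
    return host, path  # path is '' when the URL consists of a Host only


def first_level_domain(host: str) -> str:
    """Assumption: the first-level domain name is the last two labels of the Host."""
    return ".".join(host.split(".")[-2:])


host, path = split_url("s3.tbcdn.com/get/img/3.js")
print(host, path, first_level_domain(host))
```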
Step 302 is illustrated below with a specific example in which condition 1 is: the first-level domain name corresponding to the URL is in the top X% of the first sorting result. If N is 10 and the 10 URLs are: s3.tbcdn.com/get/img/3.js, da.so.com/q/136614, s3.tbcdn.com, china.baidu.com/query 64, wenwen.sogou.com/ques, mingyi.sogou.com/mingyiquery, pic.tbcdn.com/p=06050, china.baidu.com, s3.tbcdn.com/query 64, and wenwen.sogou.com/?query, then the first-level domain names of the 10 URLs are, respectively: tbcdn.com, so.com, tbcdn.com, baidu.com, sogou.com, sogou.com, tbcdn.com, baidu.com, tbcdn.com, and sogou.com. The number of times the first-level domain names corresponding to the 10 URLs were accessed is shown in Table 2.
Table 2
First-level domain name    Number of accesses
tbcdn.com                  4
so.com                     1
baidu.com                  2
sogou.com                  3
The first-level domain names corresponding to the N URLs are then sorted in descending order of access count, giving the sorting result: tbcdn.com, sogou.com, baidu.com, so.com. When X is 50, the first-level domain names in the top 50% of the sorting result are tbcdn.com and sogou.com.
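The sorting and top-X% selection for Table 2 can be sketched as follows. The function name is hypothetical, and rounding the cut-off up with `math.ceil` is an assumption, since the embodiment does not say how a fractional position is handled.

```python
import math


def top_percent(access_counts: dict[str, int], percent: int) -> list[str]:
    """Return the names in the top `percent`% after sorting by access count, descending."""
    ranked = sorted(access_counts, key=lambda name: access_counts[name], reverse=True)
    keep = math.ceil(len(ranked) * percent / 100)  # assumed rounding rule
    return ranked[:keep]


# Access counts from Table 2.
table_2 = {"tbcdn.com": 4, "so.com": 1, "baidu.com": 2, "sogou.com": 3}
print(top_percent(table_2, 50))  # ['tbcdn.com', 'sogou.com']
```

Because Python's sort is stable, names with equal counts keep their input order in this sketch; the embodiment does not specify a tie-breaking rule.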
In the example based on Table 2, the URLs whose Host was not accessed on its own are: da.so.com/q/136614, wenwen.sogou.com/ques, mingyi.sogou.com/mingyiquery, pic.tbcdn.com/p=06050, and wenwen.sogou.com/?query.
In this example, if the URLs among the 10 URLs that satisfy both condition 1 and condition 2 are determined as the target URLs, the target URLs are:
wenwen.sogou.com/?query, mingyi.sogou.com/mingyiquery, wenwen.sogou.com/ques, and pic.tbcdn.com/p=06050.
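The selection of target URLs under condition 1 and condition 2 can be sketched as follows. The helper names are hypothetical, each list entry is assumed to represent one access, and the URL strings are simplified stand-ins for the ten URLs of the example (their exact paths are not significant here).

```python
import math
from collections import Counter


def host_of(url: str) -> str:
    return url.partition("/")[0]


def domain_of(url: str) -> str:
    # Assumption: the first-level domain name is the last two labels of the Host.
    return ".".join(host_of(url).split(".")[-2:])


def select_targets(urls: list[str], x_percent: int) -> list[str]:
    """Keep URLs whose domain is in the top X% (condition 1) and whose Host
    was never accessed on its own (condition 2)."""
    counts = Counter(domain_of(u) for u in urls)
    ranked = [name for name, _ in counts.most_common()]
    top = set(ranked[: math.ceil(len(ranked) * x_percent / 100)])
    hosts_alone = {u for u in urls if u.partition("/")[2] == ""}
    return [u for u in urls if domain_of(u) in top and host_of(u) not in hosts_alone]


urls = [
    "s3.tbcdn.com/get/img/3.js", "da.so.com/q/136614", "s3.tbcdn.com",
    "china.baidu.com/query64", "wenwen.sogou.com/ques",
    "mingyi.sogou.com/mingyiquery", "pic.tbcdn.com/p=06050",
    "china.baidu.com", "s3.tbcdn.com/query64", "wenwen.sogou.com/?query",
]
print(select_targets(urls, 50))
```

The sketch returns the same four target URLs as the example, in the order in which they appear in the access record.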
303. The gateway device accesses the Host corresponding to each of the M target URLs, and receives M parameters returned by the target servers running the M Hosts of the M target URLs, where each parameter includes a Hypertext Transfer Protocol (HTTP) return value and the number of bytes of returned data.
Specifically, the gateway device may obtain the parameter corresponding to a target URL from the response message returned by the target server running the Host of that target URL. The response message received by the gateway device may be an HTTP message that includes an HTTP return value and the returned data. The response message may also include a field indicating the number of bytes of the returned data, from which the gateway device can obtain that number.
Specifically, when the HTTP return value is 200, it indicates that the target server has successfully returned the data requested by the gateway device, and the data included in the response message is the data requested by the gateway device. When the HTTP return value is not 200, it indicates that the target server has not successfully returned the data requested by the gateway device.
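Extracting the parameter (HTTP return value and number of bytes of returned data) from a response message can be sketched as follows. The function name is hypothetical, and reading the byte count from the `Content-Length` header is an assumption about which field the response carries; when that header is absent, this sketch falls back to the length of the body it received.

```python
def parse_response(raw: bytes) -> tuple[int, int]:
    """Return (HTTP return value, number of bytes of returned data) from a raw HTTP response."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status = int(lines[0].split(" ", 2)[1])  # status line: 'HTTP/1.1 200 OK'
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        return status, int(headers["content-length"])
    return status, len(body)  # fallback when no length field is present


print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"))
```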
304. The gateway device determines the content servers in the M target URLs according to the M parameters.
Step 304 may be implemented as follows: when the HTTP return value in the parameter corresponding to a target URL is not 200, or when the HTTP return value in the parameter corresponding to the target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold, the target URL is determined to be a content server.
The second preset threshold may be an empirical value, generally not greater than 100, and may specifically be set to a few bytes or a few tens of bytes according to practical experience.
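The decision rule of step 304 can be sketched as a single predicate. The function and parameter names are hypothetical, and the threshold is passed in explicitly because the embodiment only describes it as an empirical value.

```python
def is_content_server(http_status: int, body_bytes: int, second_threshold: int) -> bool:
    """Step 304: a non-200 return value, or a 200 with a very small body,
    marks the target URL as a content server."""
    if http_status != 200:
        return True
    return body_bytes <= second_threshold


print(is_content_server(200, 12, second_threshold=50))
```

The intuition, as far as the description suggests, is that a Host that serves little or nothing when requested directly is not a site users browse on its own.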
An embodiment of the present invention further provides a method for determining a content server, which illustrates the method described with reference to FIG. 3. In this example, a target URL satisfies condition 1, condition 2, and condition 3, where condition 1 is: the first-level domain name corresponding to the URL is in the top X% of the first sorting result. For explanations of content related to the above embodiment, refer to the description above. As shown in FIG. 4, the method includes:
401. The gateway device obtains the website access record within a preset time period.
The website access record includes the N URLs accessed within the preset time period, the access counts corresponding to the N URLs, and the identifiers of the terminal devices that accessed the N URLs within the preset time period.
402. The gateway device sorts the first-level domain names corresponding to the N URLs in descending order of access count to obtain a first sorting result.
403. The gateway device determines whether the first-level domain name corresponding to each of the N URLs is in the top X% of the first sorting result.
If yes, go to step 404, otherwise go to step 409.
404. The gateway device determines whether the Host corresponding to the URL has not been accessed separately.
If yes, go to step 405, otherwise go to step 409.
405. The gateway device determines whether the number of different identifiers of terminal devices accessing the URL is greater than or equal to a first preset threshold.
If yes, go to step 406, otherwise go to step 409.
406. The gateway device accesses the Host corresponding to the URL and receives the parameter returned by the target server running the Host.
Wherein the parameters comprise an HTTP return value and the number of bytes of return data.
407. The gateway device determines whether the HTTP return value is not 200 or whether the HTTP return value is 200 and the number of bytes of returned data is less than or equal to a second preset threshold.
If yes, go to step 408, otherwise go to step 409.
408. The URL is determined to be a content server.
409. It is determined that the URL is not a content server.
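Steps 403 to 409 can be sketched for a single URL as follows. All names are hypothetical, the inputs are assumed to have been precomputed from the website access record (steps 401 and 402), and `probe` is an injected callable standing in for the network access of step 406 so that the sketch does not depend on any particular HTTP client.

```python
from typing import Callable


def classify_url(
    url: str,
    top_domains: set[str],            # first-level domains in the top X% (steps 402-403)
    hosts_accessed_alone: set[str],   # Hosts that were accessed on their own
    terminal_counts: dict[str, int],  # distinct terminal identifiers per URL
    first_threshold: int,
    second_threshold: int,
    probe: Callable[[str], tuple[int, int]],  # Host -> (HTTP return value, body bytes)
) -> bool:
    host = url.partition("/")[0]
    domain = ".".join(host.split(".")[-2:])  # assumed first-level domain rule
    if domain not in top_domains:                        # step 403
        return False                                     # step 409
    if host in hosts_accessed_alone:                     # step 404
        return False                                     # step 409
    if terminal_counts.get(url, 0) < first_threshold:    # step 405
        return False                                     # step 409
    status, body_bytes = probe(host)                     # step 406
    if status != 200 or body_bytes <= second_threshold:  # step 407
        return True                                      # step 408
    return False                                         # step 409
```

Ordering the checks this way means the comparatively expensive probe in step 406 runs only for URLs that have already passed the three record-based conditions.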
As can be seen from the embodiment described with reference to FIG. 4, deploying a program that implements the method shown in FIG. 4 in the gateway device allows the gateway device to detect online whether a URL is a content server.
With the method provided in the embodiments of the present invention, URLs that have a low probability of being content servers are excluded from the URLs of multiple websites by using the website access record, so that the number of URLs for which the gateway device needs to determine whether they are content servers is greatly reduced, which improves the efficiency with which the gateway device determines content servers. For website security detection, the excluded URLs have a low probability of being content servers, and even if they include content servers, the number of such URLs is small, so the efficiency of the website security detection is not greatly affected.
An embodiment of the present invention further provides a gateway device 50, as shown in fig. 5, including:
an obtaining unit 501, configured to obtain a website access record within a preset time period, where the website access record includes N URLs accessed within the preset time period and the access counts corresponding to the N URLs, and N is an integer greater than 0;
a first determining unit 502, configured to determine M target URLs among the N URLs according to the website access record, where the M target URLs are the M URLs among the N URLs that have the highest probability of being a content server, and M is an integer greater than 0 and less than or equal to N;
a transceiving unit 503, configured to access the Host corresponding to each of the M target URLs, and receive M parameters returned by a plurality of target servers running the M Hosts of the M target URLs, where each parameter includes an HTTP return value and the number of bytes of returned data;
a second determining unit 504, configured to determine, according to the M parameters, content servers in the M target URLs.
Optionally, the first determining unit 502 is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies condition 1 and/or condition 2, where condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; and X and Y are both integers greater than 0 and less than 100.
Optionally, the website access record further includes the identifiers of the terminal devices that accessed the N URLs within the preset time period, and the first determining unit 502 is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies one or more of condition 1, condition 2, and condition 3, where condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; condition 3 is: the number of identifiers of terminal devices accessing the URL is greater than or equal to a first preset threshold; and X and Y are both integers greater than 0 and less than 100.
Optionally, the second determining unit 504 is specifically configured to: when the HTTP return value in the parameter corresponding to a target URL is not 200, or when the HTTP return value in the parameter corresponding to the target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold, determine that the target URL is a content server.
Each unit in the gateway device 50 provided in the embodiment of the present invention is configured to execute the method, and therefore, beneficial effects of the gateway device 50 may refer to beneficial effects of the method, which are not described herein again.
An embodiment of the present invention further provides a gateway device 60, as shown in FIG. 6, including: a memory 601, a processor 602, a transceiver 603, and a bus system 604. The memory 601 is configured to store code. The processor 602 is configured to execute, according to the code, steps 301, 302, and 304 of the method shown in FIG. 3, and the transceiver 603 is configured to execute step 303 of the method shown in FIG. 3. The processor 602 is further configured to execute steps 401 to 405 and steps 407 to 409 of the method shown in FIG. 4, and the transceiver 603 is further configured to execute step 406 of the method shown in FIG. 4.
The memory 601, the processor 602, and the transceiver 603 are coupled via a bus system 604, wherein the memory 601 may comprise a random access memory, and may further comprise a non-volatile memory, such as at least one disk memory. The bus system 604 may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus system 604 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The transceiving unit 503 in FIG. 5 may be the transceiver 603, and the remaining units may be the processor 602. The remaining units may be embedded in hardware form in, or be independent of, the processor of the gateway device, or may be stored in software form in the memory of the gateway device so that the processor can invoke and execute the operations corresponding to the above units. The processor may be a CPU, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
Each component of the gateway device 60 provided in the embodiments of the present invention is configured to execute the above method. Therefore, for the beneficial effects of the gateway device 60, refer to the beneficial effects of the method, which are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Claims (12)

1. A method for determining a content server, comprising: obtaining, by a gateway device, a website access record within a preset time period, wherein the website access record comprises N Uniform Resource Locators (URLs) accessed within the preset time period and access counts corresponding to the N URLs, N being an integer greater than 0; determining, by the gateway device, M target URLs among the N URLs according to the website access record, wherein the M target URLs are the M URLs among the N URLs that have the highest probability of being a content server, M being an integer greater than 0 and less than or equal to N; accessing, by the gateway device, the Host corresponding to each of the M target URLs, and receiving M parameters returned by a plurality of target servers running the M Hosts of the M target URLs, wherein each parameter comprises a Hypertext Transfer Protocol (HTTP) return value and a number of bytes of returned data; and determining, by the gateway device, the content servers among the M target URLs according to the M parameters.
2. The method according to claim 1, wherein determining, by the gateway device, M target URLs among the N URLs according to the website access record comprises: determining, by the gateway device, as a target URL, a URL among the N URLs that satisfies condition 1 and/or condition 2, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; and X and Y are both integers greater than 0 and less than 100.
3. The method according to claim 1, wherein the website access record further comprises identifiers of terminal devices that accessed the N URLs within the preset time period, and determining, by the gateway device, M target URLs among the N URLs according to the website access record comprises: determining, by the gateway device, as a target URL, a URL among the N URLs that satisfies one or more of condition 1, condition 2, and condition 3, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; condition 3 is: the number of identifiers of terminal devices accessing the URL is greater than or equal to a first preset threshold; and X and Y are both integers greater than 0 and less than 100.
4. The method according to any one of claims 1-3, wherein determining, by the gateway device, the content servers among the M target URLs according to the M parameters comprises: when the HTTP return value in the parameter corresponding to a target URL is not 200, or when the HTTP return value in the parameter corresponding to a target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold, determining, by the gateway device, that the target URL is a content server.
5. A gateway device, comprising: an obtaining unit, configured to obtain a website access record within a preset time period, wherein the website access record comprises N Uniform Resource Locators (URLs) accessed within the preset time period and access counts corresponding to the N URLs, N being an integer greater than 0; a first determining unit, configured to determine M target URLs among the N URLs according to the website access record, wherein the M target URLs are the M URLs among the N URLs that have the highest probability of being a content server, M being an integer greater than 0 and less than or equal to N; a transceiving unit, configured to access the Host corresponding to each of the M target URLs, and receive M parameters returned by a plurality of target servers running the M Hosts of the M target URLs, wherein each parameter comprises a Hypertext Transfer Protocol (HTTP) return value and a number of bytes of returned data; and a second determining unit, configured to determine the content servers among the M target URLs according to the M parameters.
6. The gateway device according to claim 5, wherein the first determining unit is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies condition 1 and/or condition 2, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; and X and Y are both integers greater than 0 and less than 100.
7. The gateway device according to claim 5, wherein the website access record further comprises identifiers of terminal devices that accessed the N URLs within the preset time period, and the first determining unit is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies one or more of condition 1, condition 2, and condition 3, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; condition 3 is: the number of identifiers of terminal devices accessing the URL is greater than or equal to a first preset threshold; and X and Y are both integers greater than 0 and less than 100.
8. The gateway device according to any one of claims 5-7, wherein the second determining unit is specifically configured to: when the HTTP return value in the parameter corresponding to a target URL is not 200, or when the HTTP return value in the parameter corresponding to a target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold, determine that the target URL is a content server.
9. A gateway device, comprising a memory, a processor, and a transceiver, wherein the memory is configured to store code, and the processor is configured to perform the following actions according to the code: obtaining a website access record within a preset time period, wherein the website access record comprises N Uniform Resource Locators (URLs) accessed within the preset time period and access counts corresponding to the N URLs, N being an integer greater than 0; and determining M target URLs among the N URLs according to the website access record, wherein the M target URLs are the M URLs among the N URLs that have the highest probability of being a content server, M being an integer greater than 0 and less than or equal to N; the transceiver is configured to access the Host corresponding to each of the M target URLs, and receive M parameters returned by a plurality of target servers running the M Hosts of the M target URLs, wherein each parameter comprises a Hypertext Transfer Protocol (HTTP) return value and a number of bytes of returned data; and the processor is further configured to determine the content servers among the M target URLs according to the M parameters.
10. The gateway device according to claim 9, wherein the processor is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies condition 1 and/or condition 2, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; and X and Y are both integers greater than 0 and less than 100.
11. The gateway device according to claim 9, wherein the website access record further comprises identifiers of terminal devices that accessed the N URLs within the preset time period, and the processor is specifically configured to: determine, as a target URL, a URL among the N URLs that satisfies one or more of condition 1, condition 2, and condition 3, wherein condition 1 is: the first-level domain name corresponding to the URL is in the top X% of a first sorting result, the first sorting result being obtained by sorting the first-level domain names corresponding to the N URLs in descending order of access count, or the Host corresponding to the URL is in the top Y% of a second sorting result, the second sorting result being obtained by sorting the Hosts corresponding to the N URLs in descending order of access count; condition 2 is: the Host corresponding to the URL has not been accessed on its own; condition 3 is: the number of identifiers of terminal devices accessing the URL is greater than or equal to a first preset threshold; and X and Y are both integers greater than 0 and less than 100.
12. The gateway device according to any one of claims 9-11, wherein the processor is specifically configured to: when the HTTP return value in the parameter corresponding to a target URL is not 200, or when the HTTP return value in the parameter corresponding to a target URL is 200 and the number of bytes of returned data is less than or equal to a second preset threshold, determine that the target URL is a content server.
CN201610767748.3A 2016-08-30 2016-08-30 A method and apparatus for determining a content server Active CN107786604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610767748.3A CN107786604B (en) 2016-08-30 2016-08-30 A method and apparatus for determining a content server

Publications (2)

Publication Number Publication Date
CN107786604A CN107786604A (en) 2018-03-09
CN107786604B true CN107786604B (en) 2020-04-28

Family

ID=61440789

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610767748.3A Active CN107786604B (en) 2016-08-30 2016-08-30 A method and apparatus for determining a content server

Country Status (1)

Country Link
CN (1) CN107786604B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185619B1 (en) * 1996-12-09 2001-02-06 Genuity Inc. Method and apparatus for balancing the process load on network servers according to network and serve based policies
JP2003256310A (en) * 2002-03-05 2003-09-12 Nec Corp Server load decentralizing system, server load decentralizing apparatus, content management apparatus and server load decentralizing program
CN105323320B (en) * 2015-11-11 2018-09-25 中国联合网络通信集团有限公司 A kind of method and device of content distribution

Also Published As

Publication number Publication date
CN107786604A (en) 2018-03-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211222

Address after: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee after: HUAWEI TECHNOLOGIES Co.,Ltd.

Address before: 215123 Building A3, Creative Industry Park, 328 Xinghu Street, Suzhou Industrial Park, Jiangsu Province

Patentee before: Huawei digital technology (Suzhou) Co.,Ltd.

Effective date of registration: 20211222

Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee after: xFusion Digital Technologies Co., Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20250306

Address after: 450046, 10th Floor, North Chuangzhi Tiandi Building, Dongshigeng Street, Longzihu Wisdom Island Middle Road East, Zhengdong New District, Zhengzhou City, Henan Province

Patentee after: Henan Kunlun Technology Co.,Ltd.

Country or region after: China

Patentee after: xFusion Digital Technologies Co., Ltd.

Address before: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Patentee before: xFusion Digital Technologies Co., Ltd.

Country or region before: China