
CN105335511A - Webpage access method and device - Google Patents


Info

Publication number
CN105335511A
Authority
CN
China
Prior art keywords
proxy server
webpage
access
information
restricted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510725908.3A
Other languages
Chinese (zh)
Inventor
庞凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510725908.3A
Publication of CN105335511A
Priority to US15/745,987 (US20180225387A1)
Priority to EP16858633.7A (EP3273362A4)
Priority to PCT/CN2016/082981 (WO2017071189A1)
Priority to JP2017548061A (JP6488508B2)
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a webpage access method and device. According to embodiments of the invention, once it is determined that access to a webpage is restricted, information of a proxy server is obtained and used to access the webpage. Because the proxy server information is obtained automatically, the user does not need to search manually for websites that publish proxy servers; the operation is simple and the success rate is high, which improves the efficiency and reliability of webpage access.

Description

Webpage access method and device
[Technical Field]
The present invention relates to Internet technology, and in particular to a webpage access method and device.
[Background Art]
With the development of the Internet industry, webpages provide increasingly rich information, and the content they display grows richer accordingly. When webpages are accessed, however, some of them belong to access-restricted websites, such as foreign websites or school websites, so these webpages cannot be accessed normally.
In this situation, the user has to search with relevant keywords, such as "proxy server publishing website", to find a website that publishes proxy servers. The user then visits that website and configures proxy settings for each proxy server it publishes, one after another, until an available proxy server allows the webpages to be accessed. This process is cumbersome and time-consuming, and its success rate is low, which reduces the efficiency and reliability of webpage access.
[Summary of the Invention]
Aspects of the present invention provide a webpage access method and device for improving the efficiency and reliability of webpage access.
One aspect of the present invention provides a webpage access method, comprising:
determining that access to a webpage is restricted;
obtaining information of a proxy server; and
accessing the webpage using the information of the proxy server.
In the aspect above and any possible implementation thereof, an implementation is further provided in which determining that access to the webpage is restricted comprises:
obtaining an access request for the webpage;
determining, according to the access request for the webpage, that the webpage cannot be accessed;
determining, according to an access-restricted list, that the website to which the webpage belongs is an access-restricted website; and
determining that access to the webpage is restricted.
In the aspect above and any possible implementation thereof, an implementation is further provided in which obtaining the information of the proxy server comprises:
obtaining the information of the proxy server according to the identifier of the webpage.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the method further comprises, before obtaining the information of the proxy server:
obtaining a proxy server set by means of a web crawler, the proxy server set containing information of each of at least one available proxy server, so that the information of the proxy server can be obtained according to the proxy server set.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the method further comprises, after obtaining the proxy server set by means of the web crawler:
performing quality verification on the at least one proxy server; and
filtering out the information of proxy servers that fail the quality verification.
Another aspect of the present invention provides a webpage access device, comprising:
an access unit, configured to determine that access to a webpage is restricted;
an obtaining unit, configured to obtain information of a proxy server;
the access unit being further configured to access the webpage using the information of the proxy server.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the access unit is further configured to:
obtain an access request for the webpage;
determine, according to the access request for the webpage, that the webpage cannot be accessed;
determine, according to an access-restricted list, that the website to which the webpage belongs is an access-restricted website; and
determine that access to the webpage is restricted.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the obtaining unit is specifically configured to:
obtain the information of the proxy server according to the identifier of the webpage.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the device further comprises a collecting unit configured to:
obtain a proxy server set by means of a web crawler, the proxy server set containing information of each of at least one available proxy server, so that the information of the proxy server can be obtained according to the proxy server set.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the collecting unit is further configured to:
perform quality verification on the at least one proxy server; and
filter out the information of proxy servers that fail the quality verification.
It can be seen from the above technical solutions that, in the embodiments of the present invention, once it is determined that access to a webpage is restricted, information of a proxy server is obtained so that the webpage can be accessed using that information. Because the proxy server information is obtained automatically, the user does not need to search manually for websites that publish proxy servers; the operation is simple and the success rate is high, which improves the efficiency and reliability of webpage access.
In addition, with the technical solution provided by the present invention, quality verification is performed on each of the at least one available proxy server contained in the obtained proxy server set, and the information of proxy servers that fail the verification is filtered out, which effectively ensures the quality of the obtained proxy servers.
In addition, with the technical solution provided by the present invention, the user does not need to search manually for websites that publish proxy servers, and the process is completely transparent to the user, which effectively improves the user's access experience.
[Brief Description of the Drawings]
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a webpage access method provided by one embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a webpage access device provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a webpage access device provided by another embodiment of the present invention.
[Detailed Description of the Embodiments]
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It can be understood that the webpage involved in the present invention, which may also be called a web page, may be a webpage written in HyperText Markup Language (HTML), i.e. an HTML page; a webpage written in HTML and the Java language, i.e. a JavaServer Page (JSP); or a webpage written in another language; this embodiment imposes no particular limitation. A webpage may contain display blocks, called web page elements, each defined by one or more web page tags such as HTML tags or JSP tags; examples include text, images, hyperlinks, buttons, input boxes and drop-down boxes.
It should be noted that the terminal involved in the embodiments of the present invention may include, but is not limited to, a mobile phone, a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a personal computer (PC), an MP3 player, an MP4 player and a wearable device (for example, smart glasses, a smart watch or a smart band).
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of a webpage access method provided by one embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps.
101: Determine that access to a webpage is restricted.
102: Obtain information of a proxy server.
103: Access the webpage using the information of the proxy server.
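Steps 101 to 103 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the callables `is_restricted`, `get_proxy_info` and `fetch`, and the `host:port` form of the proxy information, are hypothetical stand-ins for whatever the executing entity (application, plug-in, SDK or server) actually uses. `urllib_fetch` shows one possible `fetch` built on the standard library's `urllib.request.ProxyHandler`.

```python
import urllib.request
from typing import Callable, Optional


def urllib_fetch(url: str, proxy: Optional[str], timeout: float = 10.0) -> str:
    """One possible fetch: download `url`, through an HTTP proxy 'host:port' if given."""
    handlers = []
    if proxy is not None:
        proxy_url = f"http://{proxy}"
        handlers.append(urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def access_webpage(
    url: str,
    is_restricted: Callable[[str], bool],
    get_proxy_info: Callable[[str], Optional[str]],
    fetch: Callable[[str, Optional[str]], str] = urllib_fetch,
) -> str:
    # 101: determine whether access to the webpage is restricted.
    if not is_restricted(url):
        return fetch(url, None)  # not restricted: access directly
    # 102: obtain information of a proxy server (hypothetical 'host:port' string).
    proxy = get_proxy_info(url)
    if proxy is None:
        raise LookupError(f"no proxy server information available for {url}")
    # 103: access the webpage using the proxy server information.
    return fetch(url, proxy)
```

Passing `fetch` as a parameter keeps the three steps independent of any particular network stack, so the flow can be exercised with a fake fetcher.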
It should be noted that the entity executing 101 to 103 may be an application located on the local terminal; a functional unit, such as a plug-in or Software Development Kit (SDK), within an application on the local terminal; a search engine located on a server on the network side; or a distributed system located on the network side. This embodiment imposes no particular limitation.
It can be understood that the application may be a native program (nativeApp) installed on the terminal, such as a browser application or the Mobile Baidu application, or a web program (webApp) running in a browser on the terminal; this embodiment imposes no particular limitation.
In this way, once it is determined that access to a webpage is restricted, information of a proxy server is obtained so that the webpage can be accessed using that information. Because the proxy server information is obtained automatically, the user does not need to search manually for websites that publish proxy servers; the operation is simple and the success rate is high, which improves the efficiency and reliability of webpage access.
It should be noted that the webpage involved in this embodiment may be a page of a PC website or of a mobile website; this embodiment imposes no particular limitation.
At present, when an application such as a browser or the Baidu app accesses a webpage, it first downloads the webpage's main resource and then parses and renders it. When parsing reaches the Uniform Resource Locator (URL) of a webpage sub-resource referenced in the main resource, the application starts downloading that sub-resource and uses it to continue rendering the main resource. If the website to which the webpage belongs is an access-restricted website, the main resource cannot be downloaded, and the application simply outputs information indicating that the webpage cannot be accessed.
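To make the loading order concrete, the hypothetical Python sketch below shows only the sub-resource discovery step: it parses a downloaded main resource with the standard `html.parser` module and resolves the URLs of the sub-resources it references. The set of tag/attribute pairs treated as sub-resources is an assumption; real browsers recognize many more, and this is not how any particular browser is implemented.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Assumption: these tag/attribute pairs reference sub-resources of the main resource.
SUB_RESOURCE_ATTRS = {("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src")}


class SubResourceCollector(HTMLParser):
    """Collect absolute URLs of sub-resources referenced by a main resource."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lower-cases tag and attribute names before calling this method.
        for name, value in attrs:
            if (tag, name) in SUB_RESOURCE_ATTRS and value:
                self.urls.append(urljoin(self.base_url, value))


def sub_resource_urls(main_resource_html: str, base_url: str) -> List[str]:
    collector = SubResourceCollector(base_url)
    collector.feed(main_resource_html)
    collector.close()
    return collector.urls
```

If the main resource itself cannot be downloaded, as for an access-restricted website, this step is never reached.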
Optionally, in one possible implementation of this embodiment, in 101 an access request for the webpage may be obtained; according to the access request, it is determined that the webpage cannot be accessed; according to an access-restricted list, it is determined that the website to which the webpage belongs is an access-restricted website; and it can then be determined that access to the webpage is restricted.
After the access request for the webpage triggered by the user is obtained, the request is sent to the server of the website to which the webpage belongs. If that website is an access-restricted website, the request may be blocked and never reach the website's server. Information indicating that the webpage cannot be accessed is then received, and at this point it can be determined that the webpage cannot be accessed.
Because a webpage may be inaccessible for many reasons, after it is determined that the webpage cannot be accessed, the access-restricted list is further queried to determine whether the website to which the webpage belongs is an access-restricted website. If the website is in the access-restricted list, it is determined to be an access-restricted website.
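The two-stage check described above (the webpage cannot be accessed, and its website appears in the access-restricted list) might be sketched as below. This is an illustrative assumption rather than the patent's code: the list is modelled as a hypothetical set of hostnames matched exactly, and `request_failed` stands for whatever signal the executing entity receives that the request was blocked.

```python
from typing import AbstractSet
from urllib.parse import urlsplit

# Hypothetical access-restricted list: hostnames of access-restricted websites.
ACCESS_RESTRICTED_LIST = frozenset({"blocked.example.com", "campus.example.edu"})


def website_of(url: str) -> str:
    """Return the website (hostname, lower-cased by urlsplit) that a webpage URL belongs to."""
    return urlsplit(url).hostname or ""


def access_is_restricted(url: str, request_failed: bool,
                         restricted_list: AbstractSet[str] = ACCESS_RESTRICTED_LIST) -> bool:
    # Stage 1: the webpage could not be accessed (e.g. the request was blocked).
    if not request_failed:
        return False
    # Stage 2: failures have many causes, so also require the website to be on the list.
    return website_of(url) in restricted_list
```

Requiring both conditions keeps ordinary failures, such as a server that is simply down, from being treated as restricted access.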
In summary, since the webpage the user wants to access cannot be accessed and the website to which it belongs is an access-restricted website, it can be determined that access to this webpage is restricted.
In the present invention, the proxy server information obtained in 102 may include, but is not limited to, the proxy server's Uniform Resource Locator (URL), Uniform Resource Name (URN), IP address or other access identifier; this embodiment imposes no particular limitation.
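As an illustration of how proxy server information given as a URL or an IP address might be turned into something an HTTP client can use, the sketch below normalizes it into a proxy URL and builds a `urllib.request` opener. The default scheme and port are hypothetical choices; a URN or other access identifier would first have to be resolved by a mechanism the text does not specify, IPv6 literals are not handled, and the standard library's `ProxyHandler` does not support SOCKS proxies.

```python
import urllib.request
from urllib.parse import urlsplit


def to_proxy_url(info: str, default_scheme: str = "http", default_port: int = 8080) -> str:
    """Normalize proxy information ('203.0.113.7', '203.0.113.7:3128' or a URL) to a proxy URL."""
    if "://" not in info:
        info = f"{default_scheme}://{info}"  # bare IP address or host:port
    parts = urlsplit(info)
    if not parts.hostname:
        raise ValueError(f"unusable proxy information: {info!r}")
    port = parts.port or default_port  # hypothetical default when no port is given
    return f"{parts.scheme}://{parts.hostname}:{port}"


def opener_for(info: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through the given proxy."""
    proxy_url = to_proxy_url(info)
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```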
Optionally, in one possible implementation of this embodiment, in 102 the information of one proxy server, or the information of multiple proxy servers, may be obtained.
If the information of one proxy server is obtained, that information is used to perform the subsequent step 103.
If the information of multiple proxy servers is obtained, a preset selection strategy may be used to first select the information of one proxy server, which is then used to perform step 103. If access to the webpage is still restricted, the information of the next proxy server is selected and the above operation is repeated until access to the webpage is no longer restricted.
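The retry loop over multiple proxy servers can be sketched as follows, assuming a hypothetical `try_access(url, proxy)` callable that reports whether access through that proxy succeeded. The text leaves the preset selection strategy open; here it is a pluggable `order` function whose default simply keeps the given order.

```python
from typing import Callable, Iterable, Optional, Sequence


def access_with_failover(
    url: str,
    proxies: Sequence[str],
    try_access: Callable[[str, str], bool],
    order: Callable[[Sequence[str]], Iterable[str]] = lambda ps: ps,
) -> Optional[str]:
    """Try proxies in the order chosen by the selection strategy.

    Returns the proxy through which access succeeded, or None if access
    remains restricted with every proxy.
    """
    for proxy in order(proxies):
        if try_access(url, proxy):
            return proxy  # access is no longer restricted
    return None
```

A different strategy only changes `order`, for example `order=lambda ps: sorted(ps, key=latency.get)` with a hypothetical `latency` dictionary of measured response times.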
Optionally, in one possible implementation of this embodiment, in 102 the information of the proxy server may be obtained according to the identifier of the webpage. Specifically, a mapping between webpages and the information of available proxy servers may be stored in advance, associating each webpage with the proxy servers available to it. The information of the proxy server corresponding to the identifier of the webpage can then be obtained using that identifier and the mapping, which ensures the availability of the obtained proxy server information.
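A minimal in-memory sketch of the pre-stored mapping between webpages and available proxy servers follows. Deriving the webpage identifier from scheme, host and path (ignoring the query string and fragment) is an assumption made for illustration; the text does not define how the identifier is formed.

```python
from typing import Dict, List, Optional
from urllib.parse import urlsplit


class ProxyMapping:
    """Hypothetical in-memory store: webpage identifier -> available proxy servers."""

    def __init__(self) -> None:
        self._by_page: Dict[str, List[str]] = {}

    @staticmethod
    def identifier(url: str) -> str:
        # Assumption: scheme + host + path identify a webpage; query and fragment are ignored.
        parts = urlsplit(url)
        return f"{parts.scheme}://{parts.hostname}{parts.path or '/'}"

    def associate(self, url: str, proxy: str) -> None:
        """Record that `proxy` is available for the webpage at `url`."""
        proxies = self._by_page.setdefault(self.identifier(url), [])
        if proxy not in proxies:
            proxies.append(proxy)

    def lookup(self, url: str) -> Optional[str]:
        """Return the first proxy associated with the webpage, or None."""
        proxies = self._by_page.get(self.identifier(url))
        return proxies[0] if proxies else None
```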
In a specific implementation, the identifier of each webpage and the information of the proxy server corresponding to that identifier may be stored correspondingly in a database or a file system.
The identifier of the webpage may include, but is not limited to, the parameter name and parameter value of the webpage identifier; the information of the proxy server may include, but is not limited to, the parameter name and parameter value of the proxy server information; this embodiment imposes no particular limitation on either.
The database may be a relational database, such as an Oracle database, a DB2 database, a Structured Query Language (SQL) Server database or a MySQL database, or a key-value database, such as a Not Only SQL (NoSQL) database like Redis; this embodiment imposes no particular limitation.
For example, the parameter name and parameter value of the identifier of each webpage, together with the parameter value of the information of the proxy server corresponding to that identifier, may be stored correspondingly in a database or a file system. In a key-value database, for instance, the parameter value of the proxy server information corresponding to each webpage identifier may serve as the Key and the parameter name and parameter value of that webpage identifier as the Value, and the two are stored together.
As another example, the parameter name and parameter value of the identifier of each webpage, together with the parameter name and parameter value of the information of the corresponding proxy server, may be stored correspondingly in a database or a file system. In a key-value database, for instance, the parameter name and parameter value of the proxy server information corresponding to each webpage identifier may serve as the Key and the parameter name and parameter value of that webpage identifier as the Value, and the two are stored together.
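The key-value layout in the second example can be mimicked with a plain dictionary standing in for a key-value database such as Redis (a hypothetical substitute; no Redis client is used). Parameter names and values are serialized to JSON strings, with the proxy server information as the Key and the webpage identifier as the Value, as the text describes.

```python
import json
from typing import Dict, MutableMapping, Optional


def put_mapping(store: MutableMapping[str, str],
                proxy_info: Dict[str, str], page_id: Dict[str, str]) -> None:
    """Store one record: Key = proxy server information, Value = webpage identifier."""
    key = json.dumps(proxy_info, sort_keys=True)      # proxy parameter name(s) and value(s)
    store[key] = json.dumps(page_id, sort_keys=True)  # webpage parameter name(s) and value(s)


def find_proxy_for_page(store: MutableMapping[str, str],
                        page_id: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the proxy information whose stored Value matches the webpage identifier."""
    wanted = json.dumps(page_id, sort_keys=True)
    for key, value in store.items():
        if value == wanted:
            return json.loads(key)
    return None
```

Because the webpage identifier is the Value in this layout, finding the proxy server for a given webpage scans every entry; a deployment could add a reverse index, a detail the text leaves open.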
It should be noted that, when the identifier of the webpage and the information of the proxy server are stored, at least one of the time of first storage (Init_time) and the time of subsequent update (update_time) should also be recorded, to meet the basic requirements of subsequent management operations.
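For the relational option, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the databases listed above. The table and column names are hypothetical; `init_time` and `update_time` record the time of first storage (Init_time) and of the latest update (update_time).

```python
import sqlite3
import time
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS page_proxy (
    page_id     TEXT PRIMARY KEY,  -- identifier of the webpage
    proxy_info  TEXT NOT NULL,     -- information of the proxy server, e.g. 'host:port'
    init_time   REAL NOT NULL,     -- time of first storage (Init_time)
    update_time REAL NOT NULL      -- time of latest update (update_time)
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save(conn: sqlite3.Connection, page_id: str, proxy_info: str,
         now: Optional[float] = None) -> None:
    """Store a mapping; keep init_time on later saves and refresh update_time."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE page_proxy SET proxy_info = ?, update_time = ? WHERE page_id = ?",
        (proxy_info, now, page_id),
    )
    if cur.rowcount == 0:  # first storage of this webpage identifier
        conn.execute(
            "INSERT INTO page_proxy (page_id, proxy_info, init_time, update_time) "
            "VALUES (?, ?, ?, ?)",
            (page_id, proxy_info, now, now),
        )
    conn.commit()


def load(conn: sqlite3.Connection, page_id: str) -> Optional[Tuple[str, float, float]]:
    """Return (proxy_info, init_time, update_time) for a webpage, or None."""
    return conn.execute(
        "SELECT proxy_info, init_time, update_time FROM page_proxy WHERE page_id = ?",
        (page_id,),
    ).fetchone()
```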
Specifically, the database or file system may be deployed on a storage device of the terminal.
For example, the storage device of the terminal may be a slow storage device, specifically the hard disk of a computer system, or the non-running memory of a mobile phone, i.e. its physical storage, such as read-only memory (ROM) and a memory card; this embodiment imposes no particular limitation.
As another example, the storage device of the terminal may be a fast storage device, specifically the main memory of a computer system, or the running memory (system memory) of a mobile phone, such as random access memory (RAM); this embodiment imposes no particular limitation.
Optionally, in one possible implementation of this embodiment, before 102, a web crawler may further be used to obtain a proxy server set containing the information of each of at least one available proxy server, so that the information of the proxy server can be obtained according to the proxy server set.
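A web crawler that builds the proxy server set could, in its simplest form, download seed pages and extract `IPv4:port` pairs from them. The sketch below assumes that proxy-listing pages print proxies in that textual form; the seed URLs are whatever the operator chooses, and robots rules, pagination and table layouts that put address and port in separate cells are not handled.

```python
import re
import urllib.request
from typing import Iterable, List

# Assumption: proxies appear in page text as 'a.b.c.d:port', e.g. '203.0.113.7:8080'.
PROXY_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def extract_proxies(html: str) -> List[str]:
    """Return distinct, well-formed 'IPv4:port' strings found in a page, in order."""
    found: List[str] = []
    for ip, port in PROXY_PATTERN.findall(html):
        octets_ok = all(0 <= int(octet) <= 255 for octet in ip.split("."))
        if octets_ok and 0 < int(port) <= 65535:
            candidate = f"{ip}:{port}"
            if candidate not in found:
                found.append(candidate)
    return found


def crawl_proxy_set(seed_urls: Iterable[str], timeout: float = 10.0) -> List[str]:
    """Download each seed page and merge the proxies found into one proxy server set."""
    proxy_set: List[str] = []
    for url in seed_urls:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                html = resp.read().decode("utf-8", errors="replace")
        except OSError:  # URLError and timeouts are OSError subclasses
            continue     # skip pages that cannot be fetched
        for proxy in extract_proxies(html):
            if proxy not in proxy_set:
                proxy_set.append(proxy)
    return proxy_set
```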
After the proxy server set is obtained by the web crawler, quality verification may further be performed on the at least one proxy server, and the information of proxy servers that fail the quality verification may then be filtered out. Performing quality verification on each available proxy server in the obtained set and filtering out those that fail effectively ensures the quality of the obtained proxy servers.
Quality verification means verifying a proxy server's stability, timeliness and similar properties to ensure that the proxy server is usable. It can be understood that quality verification may be performed periodically, for example once a day or once a week, which further ensures the quality of the obtained proxy servers.
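Quality verification with filtering might look like the sketch below. Stability is approximated by the success rate over repeated probes and timeliness by the average response time; both criteria and their thresholds are hypothetical, as is `tcp_probe`, which only checks that a TCP connection to the proxy can be opened rather than fetching a page through it.

```python
import socket
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProbeResult:
    ok: bool          # whether the probe succeeded
    latency_s: float  # response time of the probe in seconds


def tcp_probe(proxy: str, timeout: float = 3.0) -> ProbeResult:
    """Hypothetical probe: time a TCP connection to a 'host:port' proxy."""
    host, _, port = proxy.rpartition(":")
    start = time.monotonic()
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return ProbeResult(True, time.monotonic() - start)
    except (OSError, ValueError):
        return ProbeResult(False, time.monotonic() - start)


def verify_and_filter(
    proxies: Iterable[str],
    probe: Callable[[str], ProbeResult] = tcp_probe,
    attempts: int = 3,
    min_success_rate: float = 2 / 3,
    max_avg_latency_s: float = 2.0,
) -> List[str]:
    """Keep proxies that pass verification; information of the others is filtered out."""
    passed = []
    for proxy in proxies:
        results = [probe(proxy) for _ in range(attempts)]
        successes = [r for r in results if r.ok]
        # Stability: enough of the repeated probes must succeed.
        if not successes or len(successes) / attempts < min_success_rate:
            continue
        # Timeliness: successful probes must be fast enough on average.
        if sum(r.latency_s for r in successes) / len(successes) > max_avg_latency_s:
            continue
        passed.append(proxy)
    return passed
```

Periodic verification, such as the daily or weekly runs mentioned above, would simply call `verify_and_filter` on the stored proxy server set from a scheduled job.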
In this embodiment, once it is determined that access to a webpage is restricted, information of a proxy server is obtained so that the webpage can be accessed using that information. Because the proxy server information is obtained automatically, the user does not need to search manually for websites that publish proxy servers; the operation is simple and the success rate is high, which improves the efficiency and reliability of webpage access.
In addition, with the technical solution provided by the present invention, quality verification is performed on each of the at least one available proxy server contained in the obtained proxy server set, and the information of proxy servers that fail the verification is filtered out, which effectively ensures the quality of the obtained proxy servers.
In addition, with the technical solution provided by the present invention, the user does not need to search manually for websites that publish proxy servers, and the process is completely transparent to the user, which effectively improves the user's access experience.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
Fig. 2 is a schematic structural diagram of a webpage access device provided by another embodiment of the present invention. As shown in Fig. 2, the webpage access device of this embodiment may include an access unit 21 and an obtaining unit 22. The access unit 21 is configured to determine that access to a webpage is restricted; the obtaining unit 22 is configured to obtain information of a proxy server; and the access unit 21 is further configured to access the webpage using the information of the proxy server.
It should be noted that the webpage access device provided by this embodiment may be an application located on the local terminal; a functional unit, such as a plug-in or Software Development Kit (SDK), within an application on the local terminal; a search engine located on a server on the network side; or a distributed system located on the network side. This embodiment imposes no particular limitation.
It can be understood that the application may be a native program (nativeApp) installed on the terminal or a web program (webApp) running in a browser on the terminal; this embodiment imposes no particular limitation.
Optionally, in one possible implementation of this embodiment, the access unit 21 may further be configured to obtain an access request for the webpage; determine, according to the access request, that the webpage cannot be accessed; determine, according to an access-restricted list, that the website to which the webpage belongs is an access-restricted website; and determine that access to the webpage is restricted.
Optionally, in one possible implementation of this embodiment, the obtaining unit 22 may specifically be configured to obtain the information of the proxy server according to the identifier of the webpage.
Optionally, in one possible implementation of this embodiment, as shown in Fig. 3, the webpage access device provided by this embodiment may further include a collecting unit 31, which may be configured to obtain a proxy server set by means of a web crawler, the proxy server set containing the information of each of at least one available proxy server, so that the information of the proxy server can be obtained according to the proxy server set.
Further, the collecting unit 31 may also be configured to perform quality verification on the at least one proxy server and to filter out the information of proxy servers that fail the quality verification.
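The division of the device into units 21, 22 and 31 can be mirrored by a few small classes, as in the hypothetical sketch below. The constructor parameters (`crawl`, `verify`, `fetch` and the set of restricted hostnames) are assumptions standing in for the crawler, quality verification and network access described above, and the webpage URL itself is used as its identifier for simplicity.

```python
from typing import Callable, Dict, List, Optional, Set
from urllib.parse import urlsplit


class CollectingUnit:
    """Collecting unit 31: builds the verified proxy server set."""

    def __init__(self, crawl: Callable[[], List[str]], verify: Callable[[str], bool]) -> None:
        self._crawl = crawl    # hypothetical web-crawler callable
        self._verify = verify  # hypothetical quality-verification callable

    def collect(self) -> List[str]:
        # Obtain the proxy server set, then filter out proxies that fail verification.
        return [proxy for proxy in self._crawl() if self._verify(proxy)]


class ObtainingUnit:
    """Obtaining unit 22: returns proxy server information for a webpage."""

    def __init__(self) -> None:
        self.by_page: Dict[str, List[str]] = {}  # webpage identifier -> proxies
        self.proxy_set: List[str] = []           # fallback: verified proxy server set

    def obtain(self, page_id: str) -> List[str]:
        return list(self.by_page.get(page_id) or self.proxy_set)


class AccessUnit:
    """Access unit 21: detects restricted access and accesses the webpage."""

    def __init__(self, restricted_sites: Set[str],
                 fetch: Callable[[str, Optional[str]], Optional[str]]) -> None:
        self._restricted = restricted_sites  # hypothetical access-restricted list
        self._fetch = fetch                  # returns page content, or None on failure

    def is_restricted(self, url: str) -> bool:
        return (urlsplit(url).hostname or "") in self._restricted

    def access(self, url: str, obtaining_unit: ObtainingUnit) -> Optional[str]:
        page = self._fetch(url, None)
        if page is not None:
            return page  # accessible directly
        if not self.is_restricted(url):
            return None  # inaccessible for some other reason
        for proxy in obtaining_unit.obtain(url):
            page = self._fetch(url, proxy)
            if page is not None:
                return page
        return None
```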
It should be noted that the method in the embodiment corresponding to Fig. 1 may be implemented by the webpage access device provided by this embodiment. For a detailed description, refer to the related content in the embodiment corresponding to Fig. 1; details are not repeated here.
In this embodiment, the access unit determines that access to a webpage is restricted, and the obtaining unit then obtains information of a proxy server, so that the access unit can access the webpage using that information. Because the proxy server information is obtained automatically, the user does not need to search manually for websites that publish proxy servers; the operation is simple and the success rate is high, which improves the efficiency and reliability of webpage access.
In addition, with the technical solution provided by the present invention, the collecting unit performs quality verification on each of the at least one available proxy server contained in the obtained proxy server set and filters out the information of proxy servers that fail the verification, which effectively ensures the quality of the obtained proxy servers.
In addition, with the technical solution provided by the present invention, the user does not need to search manually for websites that publish proxy servers, and the process is completely transparent to the user, which effectively improves the user's access experience.
Those skilled in the art can be well understood to, and for convenience and simplicity of description, the system of foregoing description, the specific works process of device and unit, with reference to the corresponding process in preceding method embodiment, can not repeat them here.
In several embodiment provided by the present invention, should be understood that, disclosed system, apparatus and method, can realize by another way.Such as, device embodiment described above is only schematic, such as, the division of described unit, be only a kind of logic function to divide, actual can have other dividing mode when realizing, such as multiple unit or assembly can in conjunction with or another system can be integrated into, or some features can be ignored, or do not perform.Another point, shown or discussed coupling each other or direct-coupling or communication connection can be by some interfaces, and the indirect coupling of device or unit or communication connection can be electrical, machinery or other form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A webpage access method, characterized by comprising:
determining that access to a webpage is restricted;
obtaining information of a proxy server; and
accessing the webpage using the information of the proxy server.
2. The method according to claim 1, characterized in that determining that access to the webpage is restricted comprises:
obtaining an access request for the webpage;
determining, according to the access request for the webpage, that the webpage cannot be accessed;
determining, according to an access-restricted list, that the website to which the webpage belongs is an access-restricted website; and
determining that access to the webpage is restricted.
3. The method according to claim 1, characterized in that obtaining the information of the proxy server comprises:
obtaining the information of the proxy server according to the domain name of the webpage.
4. The method according to any one of claims 1 to 3, characterized by further comprising, before obtaining the information of the proxy server:
obtaining a proxy server set by using a web crawler, the proxy server set comprising information of each of at least one available proxy server, so that the information of the proxy server can be obtained from the proxy server set.
5. The method according to claim 4, characterized by further comprising, after obtaining the proxy server set by using the web crawler:
performing quality verification on the at least one proxy server; and
filtering out the information of any proxy server that fails the quality verification.
6. A webpage access apparatus, characterized by comprising:
an access unit, configured to determine that access to a webpage is restricted; and
an acquiring unit, configured to obtain information of a proxy server;
wherein the access unit is further configured to access the webpage using the information of the proxy server.
7. The apparatus according to claim 6, characterized in that the access unit is further configured to:
obtain an access request for the webpage;
determine, according to the access request for the webpage, that the webpage cannot be accessed;
determine, according to an access-restricted list, that the website to which the webpage belongs is an access-restricted website; and
determine that access to the webpage is restricted.
8. The apparatus according to claim 6, characterized in that the acquiring unit is specifically configured to:
obtain the information of the proxy server according to the domain name of the webpage.
9. The apparatus according to any one of claims 6 to 8, characterized in that the apparatus further comprises a collecting unit configured to:
obtain a proxy server set by using a web crawler, the proxy server set comprising information of each of at least one available proxy server, so that the information of the proxy server can be obtained from the proxy server set.
10. The apparatus according to claim 9, characterized in that the collecting unit is further configured to:
perform quality verification on the at least one proxy server; and
filter out the information of any proxy server that fails the quality verification.
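Claims 4 and 9 leave open how the collecting unit turns crawled pages into a proxy server set. A minimal sketch, assuming the crawled listing pages print proxies as plain `IPv4:port` text, extracts and de-duplicates such pairs; downloading the pages and verifying the proxies are outside its scope, and `extract_proxies` is a hypothetical name.

```python
import ipaddress
import re
from typing import List, Tuple

# Matches "a.b.c.d:port" anywhere in crawled text. Assumes the listing page
# prints host and port together; pages that split them into separate table
# cells would need an HTML parser instead.
_PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def extract_proxies(page_text: str) -> List[Tuple[str, int]]:
    """Build a de-duplicated proxy server set from one crawled page."""
    proxies: List[Tuple[str, int]] = []
    seen = set()
    for host, port_text in _PROXY_RE.findall(page_text):
        try:
            ipaddress.IPv4Address(host)  # rejects octets such as 999
        except ValueError:
            continue
        port = int(port_text)
        if not 1 <= port <= 65535:
            continue  # not a valid TCP port
        if (host, port) in seen:
            continue  # same proxy listed twice
        seen.add((host, port))
        proxies.append((host, port))
    return proxies
```

Each extracted `(host, port)` pair would then go through quality verification before being handed to the acquiring unit.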
CN201510725908.3A 2015-10-30 2015-10-30 Webpage access method and device Pending CN105335511A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201510725908.3A CN105335511A (en) 2015-10-30 2015-10-30 Webpage access method and device
US15/745,987 US20180225387A1 (en) 2015-10-30 2016-05-23 Method and apparatus for accessing webpage, apparatus and non-volatile computer storage medium
EP16858633.7A EP3273362A4 (en) 2015-10-30 2016-05-23 Webpage access method, apparatus, device and non-volatile computer storage medium
PCT/CN2016/082981 WO2017071189A1 (en) 2015-10-30 2016-05-23 Webpage access method, apparatus, device and non-volatile computer storage medium
JP2017548061A JP6488508B2 (en) 2015-10-30 2016-05-23 Web page access method, apparatus, device, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510725908.3A CN105335511A (en) 2015-10-30 2015-10-30 Webpage access method and device

Publications (1)

Publication Number Publication Date
CN105335511A (en) 2016-02-17

Family

ID=55286038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510725908.3A Pending CN105335511A (en) 2015-10-30 2015-10-30 Webpage access method and device

Country Status (5)

Country Link
US (1) US20180225387A1 (en)
EP (1) EP3273362A4 (en)
JP (1) JP6488508B2 (en)
CN (1) CN105335511A (en)
WO (1) WO2017071189A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017071189A1 (en) * 2015-10-30 2017-05-04 百度在线网络技术(北京)有限公司 Webpage access method, apparatus, device and non-volatile computer storage medium
CN108769278A (en) * 2018-04-11 2018-11-06 北京中科闻歌科技股份有限公司 Social media account management method and system
CN110147271A (en) * 2019-05-15 2019-08-20 重庆八戒传媒有限公司 Method and device for improving quality of crawler proxy and computer readable storage medium
CN111428179A (en) * 2020-03-19 2020-07-17 北大方正集团有限公司 Image monitoring method, device and electronic equipment
CN111767450A (en) * 2020-07-27 2020-10-13 深圳快学教育科技有限公司 Browser data acquisition system and method
CN112583780A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Method, device, system and equipment for accessing website data by using proxy IP

Families Citing this family (13)

Publication number Priority date Publication date Assignee Title
US8560604B2 (en) 2009-10-08 2013-10-15 Hola Networks Ltd. System and method for providing faster and more efficient data communication
US9241044B2 (en) 2013-08-28 2016-01-19 Hola Networks, Ltd. System and method for improving internet communication by using intermediate nodes
US11023846B2 (en) 2015-04-24 2021-06-01 United Parcel Service Of America, Inc. Location-based pick up and delivery services
US11057446B2 (en) 2015-05-14 2021-07-06 Bright Data Ltd. System and method for streaming content from multiple servers
EP3472717B1 (en) 2017-08-28 2020-12-02 Luminati Networks Ltd. Method for improving content fetching by selecting tunnel devices
US11190374B2 (en) 2017-08-28 2021-11-30 Bright Data Ltd. System and method for improving content fetching by selecting tunnel devices
US20210067577A1 (en) 2019-02-25 2021-03-04 Luminati Networks Ltd. System and method for url fetching retry mechanism
CN111641664B (en) * 2019-03-01 2023-12-05 北京京东尚科信息技术有限公司 A crawler device service request method, device, system and storage medium
LT4027618T (en) 2019-04-02 2024-08-26 Bright Data Ltd. Managing a non-direct url fetching service
US10637956B1 (en) * 2019-10-01 2020-04-28 Metacluster It, Uab Smart proxy rotator
CN111488392B (en) * 2020-04-16 2023-07-07 北京思特奇信息技术股份有限公司 Query method, query system and electronic equipment
CN114595253A (en) * 2022-02-22 2022-06-07 深圳海域信息技术有限公司 Brand monitoring method, device, electronic device and medium
KR102681000B1 (en) * 2023-02-28 2024-07-04 쿠팡 주식회사 Operating method for electronic apparatus for managing transmission of information and electronic apparatus supporting thereof

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101102313A (en) * 2007-06-21 2008-01-09 潘晓梅 Network download system and method with automatic proxy server replacement
US20080195665A1 (en) * 2007-02-09 2008-08-14 Proctor & Stevenson Limited Tracking web server
CN101931635A (en) * 2009-06-18 2010-12-29 北京搜狗科技发展有限公司 Network resource access method and proxy device
CN102694772A (en) * 2011-03-23 2012-09-26 腾讯科技(深圳)有限公司 Apparatus, system and method for accessing internet web pages
CN104462570A (en) * 2014-12-26 2015-03-25 小米科技有限责任公司 Webpage content obtaining method and device

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
US6829638B1 (en) * 2000-08-03 2004-12-07 International Business Machines Corporation System and method for managing multiple proxy servers
US7483910B2 (en) * 2002-01-11 2009-01-27 International Business Machines Corporation Automated access to web content based on log analysis
US20030145046A1 (en) * 2002-01-31 2003-07-31 Keller S. Brandon Generating a list of addresses on a proxy server
CN101800758B (en) * 2009-02-09 2012-09-05 华为终端有限公司 Mobile terminal network visiting method, system and gateway
US20100205215A1 (en) * 2009-02-11 2010-08-12 Cook Robert W Systems and methods for enforcing policies to block search engine queries for web-based proxy sites
US9009330B2 (en) * 2010-04-01 2015-04-14 Cloudflare, Inc. Internet-based proxy service to limit internet visitor connection speed
US9049244B2 (en) * 2011-04-19 2015-06-02 Cloudflare, Inc. Registering for internet-based proxy services
CN103024933B (en) * 2011-09-28 2016-01-20 腾讯科技(深圳)有限公司 Mobile Internet access system and method for accessing the mobile Internet
US9386114B2 (en) * 2011-12-28 2016-07-05 Google Inc. Systems and methods for accessing an update server
CN103678311B (en) * 2012-08-31 2018-11-13 腾讯科技(深圳)有限公司 Web access method and system based on transfer mode, and crawl routing service device
US9241044B2 (en) * 2013-08-28 2016-01-19 Hola Networks, Ltd. System and method for improving internet communication by using intermediate nodes
CN104767837B (en) * 2014-01-08 2018-08-24 阿里巴巴集团控股有限公司 Method and device for identifying proxy IP addresses
CN103973682B (en) * 2014-04-30 2018-09-04 北京奇虎科技有限公司 Method and device for performing web page access
CN105335511A (en) * 2015-10-30 2016-02-17 百度在线网络技术(北京)有限公司 Webpage access method and device

Cited By (9)

Publication number Priority date Publication date Assignee Title
WO2017071189A1 (en) * 2015-10-30 2017-05-04 百度在线网络技术(北京)有限公司 Webpage access method, apparatus, device and non-volatile computer storage medium
CN108769278A (en) * 2018-04-11 2018-11-06 北京中科闻歌科技股份有限公司 Social media account management method and system
CN110147271A (en) * 2019-05-15 2019-08-20 重庆八戒传媒有限公司 Method and device for improving quality of crawler proxy and computer readable storage medium
CN110147271B (en) * 2019-05-15 2020-04-28 重庆八戒传媒有限公司 Method and device for improving quality of crawler proxy and computer readable storage medium
CN112583780A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Method, device, system and equipment for accessing website data by using proxy IP
CN112583780B (en) * 2019-09-30 2023-04-07 北京国双科技有限公司 Method, device, system and equipment for accessing website data by using proxy IP
CN111428179A (en) * 2020-03-19 2020-07-17 北大方正集团有限公司 Image monitoring method, device and electronic equipment
CN111428179B (en) * 2020-03-19 2023-09-19 新方正控股发展有限责任公司 Picture monitoring method and device and electronic equipment
CN111767450A (en) * 2020-07-27 2020-10-13 深圳快学教育科技有限公司 Browser data acquisition system and method

Also Published As

Publication number Publication date
WO2017071189A1 (en) 2017-05-04
EP3273362A1 (en) 2018-01-24
JP2018514846A (en) 2018-06-07
EP3273362A4 (en) 2018-04-25
JP6488508B2 (en) 2019-03-27
US20180225387A1 (en) 2018-08-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217