
CN114385950A - Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114385950A
Authority
CN
China
Prior art keywords
link
target website
distinguished
website
icp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111674847.4A
Other languages
Chinese (zh)
Inventor
马科
高静
杨哲
陈云柯
葛裴
夏立强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Information and Communications Technology CAICT
Original Assignee
China Academy of Information and Communications Technology CAICT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Academy of Information and Communications Technology CAICT
Priority to CN202111674847.4A
Publication of CN114385950A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This application relates to the technical field of distinguishing internal and external links, and discloses a method for distinguishing internal and external links of a target website, including: obtaining first ICP filing information of the target website; obtaining a URL link of the target website to be distinguished; and determining, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website. In this way, the misjudgments that arise from relying on the relationship between the main domain name and IP address of the target website and the main domain name and IP address of the URL link to be distinguished are avoided, and using the first ICP filing information of the target website improves the accuracy of distinguishing the internal and external links of the target website. The present application also discloses a device, an electronic device, and a storage medium for distinguishing internal and external links of a target website.

Description

Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium
Technical Field
The present application relates to the technical field of distinguishing internal and external links, and for example, to a method and a device for distinguishing internal and external links of a target website, an electronic device, and a storage medium.
Background
The links on a website are divided into internal links and external links: an internal link points to a page within the website, and an external link points to a page of another website. In some web-related applications, after a link is collected from a website, it may be necessary to identify whether the link is an internal link or an external link so that it can be processed accordingly. However, cloud deployment of services and applications has become a main direction of current website construction, since it favors rapid deployment, cost reduction, and distributed operation. In particular, for non-commercial websites such as those of governments and large enterprises, intensive website construction is being actively pursued: a unified data center and service platform is built, and websites of various types and departments are deployed centrally in the cloud. This intensive construction blurs the boundary between the internal and external links of a website and also changes what the two terms mean.
In the process of implementing the embodiments of the present disclosure, it was found that the related art has at least the following problem: the prior art uses a URL identification method that compares the main domain name or IP address of the target website with the main domain name or IP address of the URL link. As a result, the accuracy of determining whether the URL link is an internal link or an external link of the target website is low, and misjudgments are likely: an external link with the same IP address and the same main domain name is judged to be an internal link, and an internal link with a different IP address and a main domain name that is not identical is judged to be an external link.
Disclosure of Invention
The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview nor is intended to identify key/critical elements or to delineate the scope of such embodiments but rather as a prelude to the more detailed description that is presented later.
The embodiment of the disclosure provides a method and a device for distinguishing an internal link and an external link of a target website, electronic equipment and a storage medium, so as to improve the accuracy of distinguishing the internal link and the external link of the target website.
In some embodiments, the method for distinguishing internal and external links of a target website comprises: acquiring first ICP filing information of the target website; acquiring a URL link to be distinguished of the target website; and distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
In some embodiments, the device for distinguishing internal and external links of a target website comprises: a first acquisition module configured to acquire first ICP filing information of the target website; a second acquisition module configured to acquire a URL link to be distinguished of the target website; and a distinguishing module configured to distinguish whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
In some embodiments, the device for distinguishing internal and external links of a target website comprises a processor and a memory storing program instructions, the processor being configured, upon execution of the program instructions, to perform the above-described method for distinguishing internal and external links of a target website.
In some embodiments, the electronic device includes the above-described device for distinguishing internal and external links of a target website.
In some embodiments, the storage medium stores program instructions that, when executed, perform the above-described method for distinguishing internal and external links of a target website.
The method and the device for distinguishing internal and external links of a target website, the electronic device, and the storage medium provided by the embodiments of the present disclosure can achieve the following technical effects: first ICP filing information of the target website is obtained, a URL link to be distinguished of the target website is obtained, and the URL link to be distinguished is then determined to be an internal link or an external link of the target website according to the first ICP filing information. The misjudgments caused by relying on the relationship between the main domain name and IP address of the target website and the main domain name and IP address of the URL link to be distinguished are therefore avoided, and the accuracy of distinguishing the internal and external links of the target website is improved.
The foregoing general description and the following description are exemplary and explanatory only and are not restrictive of the application.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the accompanying drawings, in which elements having the same reference numerals denote like elements, and in which:
FIG. 1 is a schematic diagram of a method for distinguishing between internal and external links of a target website according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another method for distinguishing between internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another method for distinguishing between internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another method for distinguishing between internal and external links of a target website provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a device for distinguishing internal and external links of a target website according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another device for distinguishing internal and external links of a target website according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and elements of the disclosed embodiments can be understood in detail, a more particular description of the disclosed embodiments, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. In the following description of the technology, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the disclosed embodiments. However, one or more embodiments may be practiced without these details. In other instances, well-known structures and devices may be shown in simplified form in order to simplify the drawing.
The terms "first," "second," and the like in the description and in the claims, and the above-described drawings of embodiments of the present disclosure, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the present disclosure described herein may be made. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions.
The term "plurality" means two or more unless otherwise specified.
In the embodiments of the present disclosure, the character "/" indicates that the preceding and following objects are in an "or" relationship. For example, A/B represents: A or B.
The term "and/or" describes an association between objects and means that three relationships may exist. For example, A and/or B represents: A, or B, or A and B.
The term "correspond" may refer to an association or binding relationship; "A corresponds to B" means that there is an association or binding relationship between A and B.
The technical solution of the embodiments of the present disclosure can be applied to an intelligent terminal or a server. In some embodiments, the intelligent terminal is a device capable of accessing a website, such as a smartphone, a tablet, or a computer.
In the embodiments of the present disclosure, the intelligent terminal or the server is used to distinguish the URL links of the target website: when the intelligent terminal or the server accesses the target website, it can use the ICP filing information of the target website to determine whether a URL link to be distinguished is an internal link or an external link of the target website, which facilitates subsequent processing of the link.
Referring to fig. 1, an embodiment of the present disclosure provides a method for distinguishing an internal link and an external link of a target website, including:
Step S101, the electronic device obtains first ICP (Internet Content Provider) filing information of the target website.
Step S102, the electronic device acquires the URL link to be distinguished of the target website.
Step S103, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With the method for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, first ICP filing information of the target website is obtained, a URL link to be distinguished of the target website is obtained, and the URL link to be distinguished is then determined to be an internal link or an external link of the target website according to the first ICP filing information. The misjudgments caused by relying on the relationship between the main domain name and IP address of the target website and the main domain name and IP address of the URL link to be distinguished are therefore avoided, and the accuracy of distinguishing the internal and external links of the target website is improved.
Optionally, the target website is a non-commercial website.
Optionally, the obtaining, by the electronic device, first ICP filing information of the target website includes: the electronic device accesses a first website home page of the target website; the electronic device extracts the content of the first website home page; and the electronic device acquires the first ICP filing information of the target website from the content of the first website home page. In this way, the first ICP filing information is read directly from the home page of the target website and can then be used to determine whether the URL link to be distinguished is an internal link or an external link, without relying on the main domain names or IP addresses of the target website and of the URL link, which improves the accuracy of distinguishing the internal and external links of the target website.
Optionally, the electronic device accessing a first website home page of the target website, including: the electronic equipment acquires a URL link of a target website; the electronic equipment acquires a home page address of a URL link of a target website; and the electronic equipment accesses the first website home page corresponding to the home page address. Optionally, the home page address of the URL link of the target website is a website address corresponding to the host field of the URL link of the target website.
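The derivation of the home page address from the host field can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `home_page_address` and the default `http` scheme for scheme-less links are assumptions introduced here.

```python
from urllib.parse import urlsplit


def home_page_address(url: str, default_scheme: str = "http") -> str:
    """Return the address built from the URL's host field (hypothetical helper)."""
    # urlsplit only recognises the host when a scheme is present, so a
    # scheme is prepended to bare links such as "www.aaa.com/a/b".
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    parts = urlsplit(url)
    # The home page is taken to be the root of the host field.
    return f"{parts.scheme}://{parts.netloc}/"


print(home_page_address("www.aaa.com/a/b/files/202111/index.html"))
```

Treating the root of the host as the home page follows the text above; sites that serve their home page from another path would need additional handling.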
In some embodiments, the electronic device accesses a first website homepage of the target website, then extracts content of the first website homepage through a crawler, and the electronic device obtains first ICP filing information of the target website from the content of the first website homepage.
Optionally, the first ICP filing information of the target website is the ICP filing number shown on the first website home page of the target website.
In some embodiments, according to the relevant policies and regulations, a non-commercial internet information service provider must file its website information before the website is opened and thereby obtains ICP filing information, i.e., an ICP filing number. The ICP filing number is then placed at the bottom of the website home page for public inquiry and verification, so the ICP filing information is a public attribute field specific to the content of the website home page. The ICP filing number is therefore particularly suitable for distinguishing the internal and external links of a non-commercial website: the misjudgments caused by relying on the main domain names and IP addresses of the target website and of the URL link to be distinguished are avoided, and the accuracy of distinguishing the internal and external links of the non-commercial website is improved. Moreover, by exploiting this attribute specific to non-commercial websites, the internal and external links of the website can be distinguished merely by analyzing the ICP filing information in the page content, which improves the simplicity, efficiency, and accuracy of distinguishing the internal and external links of the target website.
Optionally, the obtaining, by the electronic device, first ICP filing information of the target website in the content of the first website home page includes: the electronic device performs format matching on the content of the first website home page and, when a field matching a preset format is found, determines that field to be the first ICP filing information of the target website.
Optionally, the preset format is the format of the ICP filing number.
In some embodiments, the format of the ICP filing number is: "province abbreviation + ICP备" + "entity filing number" + "website serial number"; or, the format of the ICP filing number is: "ICP gets to" + "entity filing number" + "website serial number".
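The format matching described above can be sketched with a regular expression. This is a minimal sketch under stated assumptions: the pattern covers only the first format (province abbreviation, "ICP备", entity filing number, website serial number), and the "号" character and the hyphen before the serial number are assumptions based on how such numbers commonly appear on pages, not details given in the text. The number in the example is made up.

```python
import re
from typing import Optional

# Assumed layout: one Chinese character (province abbreviation), the
# literal "ICP备", the entity filing number, "号", and an optional
# "-<serial>" suffix. Only the overall ordering comes from the text.
ICP_PATTERN = re.compile(r"[\u4e00-\u9fff]ICP备\s*\d+号(?:-\d+)?")


def find_icp_number(page_text: str) -> Optional[str]:
    """Return the first field matching the assumed ICP format, or None."""
    match = ICP_PATTERN.search(page_text)
    return match.group(0) if match else None


footer = "<footer>版权所有 京ICP备12345678号-1</footer>"  # hypothetical page content
print(find_icp_number(footer))
```

Returning `None` when nothing matches lets the caller detect the case in which no ICP filing information was acquired.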
Optionally, under the condition that the first ICP filing information of the target website is not acquired, the electronic device extracts the URL link to be distinguished of the target website from the first website home page; the electronic equipment distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website by using a URL identification method.
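For reference, the fallback URL identification method can be sketched as a main-domain comparison. This is an illustrative simplification, not the method as specified anywhere in the text: taking the last two labels of the host as the "main domain name" is an assumption made here, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def main_domain(url: str) -> str:
    """Naive main domain: the last two labels of the host (an assumption)."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_internal_by_domain(target_url: str, link_url: str) -> bool:
    """Classify a link as internal when both main domains are equal."""
    return main_domain(target_url) == main_domain(link_url)


print(is_internal_by_domain("http://www.aaa.com/", "http://news.aaa.com/x"))
# Two unrelated hosts under a shared suffix compare as equal here,
# which is the kind of misjudgment the background section describes.
print(is_internal_by_domain("http://a.gov.cn/", "http://b.gov.cn/"))
```

The second call shows why a domain-only comparison is used only as a fallback: hosts that share a suffix are treated as one website.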
Optionally, the obtaining, by the electronic device, a URL link to be distinguished of the target website includes: when the electronic device has acquired the first ICP filing information of the target website, the electronic device extracts the URL link to be distinguished of the target website from the first website home page. In this way, links are extracted only once the first ICP filing information is available, so each extracted link can be classified as an internal link or an external link according to that information, without relying on the main domain names or IP addresses of the target website and of the URL link, which improves the accuracy of distinguishing the internal and external links of the target website.
Optionally, the extracting, by the electronic device, the to-be-distinguished URL link of the target website in the first website home page includes: the electronic equipment carries out format matching on the content of the first website home page; and under the condition that the field with the same format as the preset URL is matched, the electronic equipment determines the field as the URL link to be distinguished of the target website.
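Extracting candidate links by format matching can be sketched as follows. The preset URL format is not specified in the text, so the pattern below (absolute `http`/`https` URLs) and the function name `extract_links` are assumptions for illustration.

```python
import re
from typing import List

# Assumed "preset URL format": an absolute http(s) URL that runs until
# whitespace, a quote, or an angle bracket.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""")


def extract_links(page_text: str) -> List[str]:
    """Return matching URLs in page order, without duplicates."""
    seen = set()
    links = []
    for url in URL_PATTERN.findall(page_text):
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


html = (
    '<a href="http://www.aaa.com/a/index.html">A</a> '
    "<a href='https://news.bbb.cn/'>B</a> "
    '<a href="http://www.aaa.com/a/index.html">A again</a>'
)
print(extract_links(html))
```

Relative links such as `/a/index.html` are not matched by this pattern; they would need to be resolved against the home page address first.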
Optionally, the URL links to be distinguished include secondary links of the target website, tertiary links of the target website, and the like.
Optionally, the distinguishing, by the electronic device, whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information includes: the electronic device acquires a first host field of the first website home page; the electronic device acquires a second host field of the URL link to be distinguished; when the first host field is the same as the second host field, the electronic device determines the URL link to be distinguished to be an internal link of the target website; and when the first host field is different from the second host field, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information. In this way, a link whose host field equals that of the home page is classified as internal immediately, and only links with a different host field are classified using the ICP filing information. The misjudgments caused by relying on the main domain names and IP addresses of the target website and of the URL link to be distinguished are avoided, and the accuracy of distinguishing the internal and external links of the target website is improved.
In some embodiments, a URL link includes a host field, a path field, and a file field. For example, in www.aaa.com/a/b/files/202111/index.html, www.aaa.com is the host field, a/b/files/202111 is the path field, index.html is the file field, and aaa.com is the domain name field.
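The field breakdown in the example above can be reproduced with the standard library. This is a small sketch; the function name `split_url_fields` and the returned dictionary keys are chosen here for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def split_url_fields(url: str) -> dict:
    """Split a URL link into host, path, and file fields."""
    # Prepend a scheme so that urlsplit can recognise the host field.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    # Separate the directory part of the path from the final file name.
    directory, filename = posixpath.split(parts.path)
    return {
        "host": parts.hostname,
        "path": directory.strip("/"),
        "file": filename,
    }


print(split_url_fields("www.aaa.com/a/b/files/202111/index.html"))
```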
Referring to fig. 2, another method for distinguishing an internal link and an external link of a target website is provided in an embodiment of the present disclosure, including:
Step S201, the electronic device accesses a first website home page of the target website.
Step S202, the electronic device extracts the content of the first website home page.
Step S203, the electronic device obtains first ICP filing information of the target website from the content of the first website home page.
Step S204, the electronic device acquires the URL link to be distinguished of the target website.
Step S205, the electronic device obtains a first host field of the first website home page.
Step S206, the electronic device obtains a second host field of the URL link to be distinguished.
Step S207, when the first host field is the same as the second host field, the electronic device determines the URL link to be distinguished to be an internal link of the target website.
Step S208, when the first host field is different from the second host field, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With the method for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained from the first website home page of the target website, the URL link to be distinguished is obtained from the first website home page, and the URL link to be distinguished is then determined to be an internal link or an external link of the target website according to the first host field, the second host field, and the first ICP filing information. The misjudgments caused by relying on the main domain names and IP addresses of the target website and of the URL link to be distinguished are therefore avoided, and the accuracy of distinguishing the internal and external links of the target website is improved.
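The two-stage decision of steps S205 to S208 can be sketched as below. This is an outline under assumptions rather than the actual implementation: the ICP-based check is passed in as a callable named `icp_check`, a placeholder introduced here, and the labels `"internal"` and `"external"` are illustrative.

```python
from typing import Callable
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host field of an absolute URL."""
    return (urlsplit(url).hostname or "").lower()


def classify_link(home_url: str, link_url: str,
                  icp_check: Callable[[str], str]) -> str:
    """Compare host fields first; defer to the ICP-based check otherwise."""
    # S207: identical host fields mean the link is internal.
    if host_of(home_url) == host_of(link_url):
        return "internal"
    # S208: different host fields, so the ICP filing information decides.
    return icp_check(link_url)


# Usage with a stand-in ICP check that always answers "external".
print(classify_link("http://www.aaa.com/",
                    "http://www.aaa.com/a/index.html",
                    lambda url: "external"))
```

Passing the ICP check as a parameter keeps the inexpensive host comparison separate from the step that has to fetch another home page.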
Referring to fig. 3, another method for distinguishing between an internal link and an external link of a target website is provided in an embodiment of the present disclosure, which includes:
Step S301, the electronic device accesses a first website home page of the target website.
Step S302, the electronic device extracts the content of the first website home page.
Step S303, the electronic device obtains first ICP filing information of the target website from the content of the first website home page.
Step S304, when the electronic device has acquired the first ICP filing information of the target website, the electronic device extracts the URL link to be distinguished of the target website from the first website home page.
Step S305, the electronic device obtains a first host field of the first website home page.
Step S306, the electronic device obtains a second host field of the URL link to be distinguished.
Step S307, when the first host field is the same as the second host field, the electronic device determines the URL link to be distinguished to be an internal link of the target website.
Step S308, when the first host field is different from the second host field, the electronic device distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With the method for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained from the first website home page of the target website; when the first ICP filing information has been obtained, the URL link to be distinguished is extracted from the first website home page; and the URL link to be distinguished is then determined to be an internal link or an external link of the target website according to the first host field, the second host field, and the first ICP filing information. The misjudgments caused by relying on the main domain names and IP addresses of the target website and of the URL link to be distinguished are therefore avoided, and the accuracy of distinguishing the internal and external links of the target website is improved.
Optionally, the distinguishing, by the electronic device, that the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information when the first host field is different from the second host field includes: under the condition that the first host field is different from the second host field, the electronic equipment accesses a second website home page linked with the URL to be distinguished according to the second host field; the electronic equipment extracts the content of the home page of the second website; the electronic equipment acquires second ICP filing information of the URL link to be distinguished from the content of the home page of the second website; and the electronic equipment distinguishes whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP record information and the second ICP record information. Therefore, the second ICP record information of the URL link to be distinguished is obtained according to the second host field under the condition that the first host field is different from the second host field, the first ICP record information and the second ICP record information are used for distinguishing whether the URL link to be distinguished is the inner link or the outer link of the target website, misjudgment caused by the relation among the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished and the IP address of the URL link to be distinguished is not needed to be considered, and the accuracy of distinguishing the inner link and the outer link of the target website is improved.
Optionally, the electronic device accesses a second website home page linked with the URL to be distinguished according to the second host field, including: and the electronic equipment accesses the second website home page of the URL link to be distinguished corresponding to the second host field.
Optionally, the electronic device obtaining the second ICP filing information of the URL link to be distinguished from the content of the second website home page includes: the electronic device performs format matching on the content of the second website home page; and when the electronic device matches a field whose format is the same as a preset format, it determines that field to be the second ICP filing information of the URL link to be distinguished.
In some embodiments, when the first host field is different from the second host field, the electronic device accesses the second website home page of the URL link to be distinguished corresponding to the second host field, extracts the content of the second website home page with a crawler, and obtains the second ICP filing information of the URL link to be distinguished from that content by format matching.
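The format matching described above can be sketched with a regular expression. This is only an illustrative sketch: the disclosure does not give the preset format, so the pattern below assumes the commonly seen layout of a province abbreviation, the literal "ICP备", a main filing number, "号", and an optional "-" plus website serial number. The names `IcpRecord`, `ICP_PATTERN`, and `extract_icp` are hypothetical, and the page content is assumed to have been fetched already.

```python
import re
from typing import NamedTuple, Optional


class IcpRecord(NamedTuple):
    province: str               # province abbreviation (one Chinese character)
    main_number: str            # main ICP filing number
    site_serial: Optional[str]  # website serial number, None if not present


# Assumed preset format: <province>ICP备<digits>号[-<serial>]
ICP_PATTERN = re.compile(
    r"([\u4e00-\u9fa5])\s*ICP\s*备\s*(\d+)\s*号(?:\s*-\s*(\d+))?",
    re.IGNORECASE,
)


def extract_icp(page_text: str) -> Optional[IcpRecord]:
    """Return the first field in page_text that matches the assumed format."""
    match = ICP_PATTERN.search(page_text)
    if match is None:
        return None  # no ICP filing information found on this page
    province, main_number, site_serial = match.groups()
    return IcpRecord(province, main_number, site_serial)
```

Returning `None` when nothing matches lets the caller decide how to handle a page without filing information, a case the description leaves open.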
Optionally, the electronic device distinguishing, according to the first ICP filing information and the second ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website includes: the electronic device determines that the URL link to be distinguished is an internal link of the target website when the first ICP filing information is the same as the second ICP filing information; and/or the electronic device determines that the URL link to be distinguished is an external link of the target website when the first ICP filing information is different from the second ICP filing information. Because the decision rests only on whether the two pieces of ICP filing information match, it does not rely on the relationship among the main domain names and IP addresses of the target website and of the URL link to be distinguished, which avoids the misjudgments that relationship can cause and improves the accuracy of distinguishing internal and external links of the target website.
In some embodiments, the first ICP filing information and the second ICP filing information are determined to be the same when their "province abbreviation" field, "main ICP filing number" field, and "website serial number" field are all the same.
In some embodiments, the first ICP filing information and the second ICP filing information are determined to be different when any one of their "province abbreviation" field, "main ICP filing number" field, and "website serial number" field differs.
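The three-field comparison can be written out explicitly. The sketch below assumes the filing information has already been parsed into the three named fields; the type `IcpRecord` and the function `same_icp` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class IcpRecord(NamedTuple):
    province: str     # "province abbreviation" field
    main_number: str  # "main ICP filing number" field
    site_serial: str  # "website serial number" field


def same_icp(first: IcpRecord, second: IcpRecord) -> bool:
    """True only if all three fields match; any single mismatch gives False."""
    return (
        first.province == second.province
        and first.main_number == second.main_number
        and first.site_serial == second.site_serial
    )
```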
Optionally, after obtaining the first ICP filing information and the second ICP filing information, the electronic device stores the first ICP filing information and the second ICP filing information.
Referring to Fig. 4, an embodiment of the present disclosure provides another method for distinguishing internal and external links of a target website, which includes:
In step S401, the electronic device accesses the first website home page of the target website.
In step S402, the electronic device extracts the content of the first website home page.
In step S403, the electronic device obtains the first ICP filing information of the target website from the content of the first website home page.
In step S404, when the electronic device has obtained the first ICP filing information of the target website, it extracts the URL link to be distinguished of the target website from the first website home page.
In step S405, the electronic device obtains the first host field of the first website home page.
In step S406, the electronic device obtains the second host field of the URL link to be distinguished.
In step S407, the electronic device determines whether the first host field is the same as the second host field; if they are the same, step S412 is executed; if they are different, step S408 is executed.
In step S408, the electronic device accesses the second website home page of the URL link to be distinguished according to the second host field.
In step S409, the electronic device extracts the content of the second website home page.
In step S410, the electronic device obtains the second ICP filing information of the URL link to be distinguished from the content of the second website home page.
In step S411, the electronic device determines whether the first ICP filing information is the same as the second ICP filing information; if they are the same, step S412 is executed; if they are different, step S413 is executed.
In step S412, the electronic device determines that the URL link to be distinguished is an internal link of the target website.
In step S413, the electronic device determines that the URL link to be distinguished is an external link of the target website.
With the method for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website is obtained from the first website home page of the target website, and once that information has been obtained, the URL link to be distinguished is extracted from the first website home page. If the first host field is the same as the second host field, the URL link to be distinguished is determined to be an internal link of the target website. If the first host field is different from the second host field, the second ICP filing information of the URL link to be distinguished is obtained from the second website home page of that link, and the link is classified as an internal link or an external link of the target website according to the first ICP filing information and the second ICP filing information. The classification therefore does not rely on the relationship among the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished, and the IP address of the URL link to be distinguished, which avoids the misjudgments that relationship can cause and improves the accuracy of distinguishing internal and external links of the target website.
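The decision flow of steps S401 to S413 can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: page access and ICP extraction (steps S401 to S403 and S408 to S410) are abstracted into a caller-supplied function `get_icp`, which is hypothetical and maps a host to its ICP filing information or `None`; `host_of` and `classify_link` are likewise illustrative names. The sketch reads the host before looking up the first filing information, which reorders S405 slightly but does not change the outcome.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the host field of a URL, tolerating a missing scheme."""
    # urlsplit only fills in hostname when a scheme or a leading "//" is present.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def classify_link(
    target_home: str,
    link: str,
    get_icp: Callable[[str], Optional[str]],  # hypothetical: host -> filing info
) -> Optional[str]:
    first_host = host_of(target_home)          # S405
    first_icp = get_icp(first_host)            # stands in for S401-S403
    if first_icp is None:
        return None                            # flow only continues once S403 succeeds
    second_host = host_of(link)                # S406
    if first_host == second_host:              # S407
        return "internal"                      # S412
    second_icp = get_icp(second_host)          # stands in for S408-S410
    # S411: a missing second record compares as different (an assumption here).
    return "internal" if second_icp == first_icp else "external"  # S412 / S413
```

Passing `get_icp` as a parameter keeps the decision logic testable without network access; in use it would wrap the crawler and the format matching described earlier.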
In some embodiments, when internal and external links are determined for the target website www.xxx1.gov.cn, the obtained URL links to be distinguished include URL1 "www.xxx1.gov.cn/hudong/hdjl/..." and URL2 "jw.xxx1.gov.cn/xxgk/zfxxgkml/...". If URL1 and URL2 are distinguished by the URL identification method, the main domain name of the target website is xxx1.gov.cn, the main domain name of URL1 is xxx1.gov.cn, and the main domain name of URL2 is xxx1.gov.cn. The three main domain names are the same, and domain name resolution gives all three a common IP address, x.x.111.13, so the URL identification method treats both URL1 and URL2 as internal links of the website. However, the ICP filing information of the target website is the same as that of URL1 and different from that of URL2. When the method of this embodiment is used, URL1 is therefore determined to be an internal link of the target website and URL2 an external link. Checking the sponsoring units of the target website, URL1, and URL2 confirms this: the sponsoring unit of the target website and of URL1 is the xx1 municipal xx2 government, while the sponsoring unit of URL2 is the xx1 municipal xx3 committee, so URL2 is indeed an external link of the target website. By using the ICP filing information of the target website, this scheme thus prevents an external link that shares the IP address and main domain name of the target website from being judged as an internal link, which improves the accuracy of distinguishing internal and external links of the target website.
In some embodiments, when internal and external links are determined for the target website www.xxx2.gov.cn, the obtained URL links to be distinguished include URL3 "www.xxx2.gov.cn/col/col80524/index.html" and URL4 "www.xxx2.cn/col/col80524/index.html". If URL3 and URL4 are distinguished by the URL identification method, the main domain name of the target website is xxx2.gov.cn, the main domain name of URL3 is xxx2.gov.cn, and the main domain name of URL4 is xxx2.cn. The main domain name of the target website is the same as that of URL3, and domain name resolution gives both the same IP address, 119.188.x.x, so URL3 is treated as an internal link of the target website. The main domain name of the target website differs from that of URL4, and domain name resolution gives URL4 the IP address 202.110.x.x, which differs from the IP address of the target website, so URL4 is treated as an external link of the target website. However, the ICP filing information of the target website is the same as that of both URL3 and URL4. When the method of this embodiment is used, both URL3 and URL4 are therefore determined to be internal links of the target website. Checking the sponsoring units confirms this: the target website, URL3, and URL4 share the same sponsoring unit, the xx3 xx4 government, so URL3 and URL4 are both internal links of the target website. By using the ICP filing information of the target website, this scheme thus prevents an internal link whose IP address differs and whose main domain name is not identical from being judged as an external link, which improves the accuracy of distinguishing internal and external links of the target website.
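The two examples above can be replayed side by side in a few lines. This is a sketch under assumptions: the URL identification baseline is modeled here as "internal only if both main domain name and IP address match", which is one plausible reading and is not spelled out in the text, and the ICP values are placeholder labels rather than real filing numbers. `Site`, `by_url_identification`, and `by_icp` are hypothetical names.

```python
from typing import NamedTuple


class Site(NamedTuple):
    main_domain: str
    ip: str
    icp: str  # placeholder label standing in for ICP filing information


def by_url_identification(target: Site, link: Site) -> str:
    # Assumed baseline rule: same main domain name and same IP address.
    same = target.main_domain == link.main_domain and target.ip == link.ip
    return "internal" if same else "external"


def by_icp(target: Site, link: Site) -> str:
    # Rule of the embodiments: compare ICP filing information only.
    return "internal" if target.icp == link.icp else "external"


# First example: same main domain name and IP address, different filing info.
target1 = Site("xxx1.gov.cn", "x.x.111.13", "filing-A")
url2 = Site("xxx1.gov.cn", "x.x.111.13", "filing-B")

# Second example: different main domain name and IP address, same filing info.
target2 = Site("xxx2.gov.cn", "119.188.x.x", "filing-C")
url4 = Site("xxx2.cn", "202.110.x.x", "filing-C")

print(by_url_identification(target1, url2), by_icp(target1, url2))
print(by_url_identification(target2, url4), by_icp(target2, url4))
```

In both cases the two rules disagree, and the ICP-based result is the one that matches the sponsoring units reported in the examples.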
In some embodiments, in scenarios where services are moved to the cloud and websites are built intensively, the URL identification method cannot accurately distinguish internal links from external links. The embodiments of the present disclosure introduce ICP filing information to classify the URL links to be distinguished, so that each link is classified as an internal link or an external link of the target website from the perspective of the management domain, which improves the accuracy of the classification.
Referring to Fig. 5, an apparatus for distinguishing internal and external links of a target website according to an embodiment of the present disclosure includes a first obtaining module 1, a second obtaining module 2, and a distinguishing module 3. The first obtaining module 1 is configured to obtain first ICP filing information of a target website; the second obtaining module 2 is configured to obtain a URL link to be distinguished of the target website; and the distinguishing module 3 is configured to distinguish whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP filing information.
With the apparatus for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website and the URL link to be distinguished of the target website are obtained, and the URL link to be distinguished is then classified as an internal link or an external link of the target website according to the first ICP filing information. The classification therefore does not rely on the relationship among the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished, and the IP address of the URL link to be distinguished, which avoids the misjudgments that relationship can cause and improves the accuracy of distinguishing internal and external links of the target website.
Optionally, the first obtaining module is configured to obtain the first ICP filing information of the target website by: accessing a first website home page of a target website; extracting the content of a first website home page; and acquiring first ICP filing information of the target website from the content of the first website home page.
Optionally, the second obtaining module is configured to obtain the URL link to be distinguished of the target website by: under the condition that first ICP filing information of the target website is obtained, the URL link to be distinguished of the target website is extracted from a first website homepage.
Optionally, the distinguishing module is configured to distinguish whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP docket information by: acquiring a first host field of a first website home page; acquiring a second host field of the URL link to be distinguished; under the condition that the first host field is the same as the second host field, determining the URL link to be distinguished as an inner chain of the target website; and under the condition that the first host field is different from the second host field, distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP record information.
Optionally, the distinguishing module is configured to distinguish whether the URL link to be distinguished is an in-link or an out-link of the target website according to the first ICP docket information if the first host field is not the same as the second host field by: under the condition that the first host field is different from the second host field, accessing a second website home page linked with the URL to be distinguished according to the second host field; extracting the content of the home page of the second website; acquiring second ICP filing information of the URL link to be distinguished from the content of the home page of the second website; and distinguishing whether the URL link to be distinguished is an internal link or an external link of the target website according to the first ICP record information and the second ICP record information.
Optionally, the distinguishing module is configured to distinguish whether the URL link to be distinguished is an in-link or an out-link of the target website according to the first ICP docket information and the second ICP docket information by: under the condition that the first ICP record information is the same as the second ICP record information, determining that the URL link to be distinguished is an inner link of the target website; and/or determining that the URL link to be distinguished is an external link of the target website under the condition that the first ICP record information is different from the second ICP record information.
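The three-module structure described above can be sketched as a small container of callables. This is an illustrative arrangement only: the disclosure does not prescribe a programming interface, so the class name `LinkDistinguishingApparatus`, the `run` method, and the callable signatures are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LinkDistinguishingApparatus:
    # target website -> first ICP filing information (None if not found)
    first_obtaining_module: Callable[[str], Optional[str]]
    # target website -> URL links to be distinguished
    second_obtaining_module: Callable[[str], List[str]]
    # (target website, link, first ICP filing information) -> "internal" / "external"
    distinguishing_module: Callable[[str, str, str], str]

    def run(self, target: str) -> Dict[str, str]:
        first_icp = self.first_obtaining_module(target)
        if first_icp is None:
            # Links are only extracted once the first filing information is obtained.
            return {}
        links = self.second_obtaining_module(target)
        return {
            link: self.distinguishing_module(target, link, first_icp)
            for link in links
        }
```

Keeping each module as an injected callable mirrors the logical division in the description and lets each one be replaced or tested on its own.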
As shown in fig. 6, an apparatus for distinguishing between an internal link and an external link of a target website according to an embodiment of the present disclosure includes a processor (processor)100 and a memory (memory) 101. Optionally, the apparatus may also include a Communication Interface (Communication Interface)102 and a bus 103. The processor 100, the communication interface 102, and the memory 101 may communicate with each other via a bus 103. The communication interface 102 may be used for information transfer. The processor 100 may invoke logic instructions in the memory 101 to perform the method for distinguishing between internal and external links of a target web site of the above-described embodiments.
In addition, the logic instructions in the memory 101 may be implemented in the form of software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products.
The memory 101, which is a computer-readable storage medium, may be used for storing software programs, computer-executable programs, such as program instructions/modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 executes functional applications and data processing by executing program instructions/modules stored in the memory 101, namely, implements the method for distinguishing between an internal link and an external link of a target website in the above embodiments.
The memory 101 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal device, and the like. In addition, the memory 101 may include a high-speed random access memory, and may also include a nonvolatile memory.
With the apparatus for distinguishing internal and external links of a target website provided by the embodiments of the present disclosure, the first ICP filing information of the target website and the URL link to be distinguished of the target website are obtained, and the URL link to be distinguished is then classified as an internal link or an external link of the target website according to the first ICP filing information. The classification therefore does not rely on the relationship among the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished, and the IP address of the URL link to be distinguished, which avoids the misjudgments that relationship can cause and improves the accuracy of distinguishing internal and external links of the target website.
The embodiment of the disclosure provides an electronic device, which includes the above device for distinguishing the internal link and the external link of a target website.
With the electronic device provided by the embodiments of the present disclosure, the first ICP filing information of the target website and the URL link to be distinguished of the target website are obtained, and the URL link to be distinguished is then classified as an internal link or an external link of the target website according to the first ICP filing information. The classification therefore does not rely on the relationship among the main domain name of the target website, the IP address of the target website, the main domain name of the URL link to be distinguished, and the IP address of the URL link to be distinguished, which avoids the misjudgments that relationship can cause and improves the accuracy of distinguishing internal and external links of the target website.
Optionally, the electronic device comprises a smart terminal or a server. Optionally, the smart terminal comprises a smart phone, a tablet or a computer, etc. capable of accessing the website.
The disclosed embodiments provide a storage medium storing computer-executable instructions configured to perform the above-described method for distinguishing between internal and external links of a target website.
The disclosed embodiments provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the above-described method for distinguishing between an inside and an outside link of a target website.
The computer-readable storage medium described above may be a transitory computer-readable storage medium or a non-transitory computer-readable storage medium.
The technical solution of the embodiments of the present disclosure may be embodied in the form of a software product, where the computer software product is stored in a storage medium and includes one or more instructions to enable a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present disclosure. The aforementioned storage medium may be a non-transitory storage medium, including a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code, or it may be a transitory storage medium.
The above description and drawings sufficiently illustrate embodiments of the disclosure to enable those skilled in the art to practice them. Other embodiments may incorporate structural, logical, electrical, process, and other changes. The examples merely typify possible variations. Individual components and functions are optional unless explicitly required, and the sequence of operations may vary. Portions and features of some embodiments may be included in or substituted for those of others. Furthermore, the words used in the specification are words of description only and are not intended to limit the claims. As used in the description of the embodiments and the claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Similarly, the term "and/or" as used in this application is meant to encompass any and all possible combinations of one or more of the associated listed. Furthermore, the terms "comprises" and/or "comprising," when used in this application, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Without further limitation, an element defined by the phrase "comprising an …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element. In this document, each embodiment may be described with emphasis on differences from other embodiments, and the same and similar parts between the respective embodiments may be referred to each other. For methods, products, etc. of the embodiment disclosures, reference may be made to the description of the method section for relevance if it corresponds to the method section of the embodiment disclosure.
Those of skill in the art would appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software may depend upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments. It can be clearly understood by the skilled person that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments disclosed herein, the disclosed methods, products (including but not limited to devices, apparatuses, etc.) may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be merely a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to implement the present embodiment. In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. In the description corresponding to the flowcharts and block diagrams in the figures, operations or steps corresponding to different blocks may also occur in different orders than disclosed in the description, and sometimes there is no specific order between the different operations or steps. For example, two sequential operations or steps may in fact be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (10)

1.一种用于区分目标网站内外链的方法,其特征在于,包括:1. a method for distinguishing internal and external links of target website, is characterized in that, comprises: 获取目标网站的第一ICP备案信息;Obtain the first ICP filing information of the target website; 获取所述目标网站的待区分URL链接;Obtain the URL link to be distinguished of the target website; 根据所述第一ICP备案信息区分所述待区分URL链接是所述目标网站的内链或外链。According to the first ICP filing information, distinguish whether the URL link to be distinguished is an internal link or an external link of the target website. 2.根据权利要求1所述的方法,其特征在于,获取目标网站的第一ICP备案信息,包括:2. method according to claim 1, is characterized in that, obtains the first ICP filing information of target website, comprises: 访问所述目标网站的第一网站首页;Visit the first website homepage of the target website; 提取所述第一网站首页的内容;extracting the content of the home page of the first website; 在所述第一网站首页的内容中获取所述目标网站的第一ICP备案信息。Obtain the first ICP filing information of the target website from the content of the homepage of the first website. 3.根据权利要求2所述的方法,其特征在于,获取所述目标网站的待区分URL链接,包括:3. The method according to claim 2, wherein obtaining the URL link to be distinguished of the target website, comprising: 在获取到所述目标网站的第一ICP备案信息的情况下,在所述第一网站首页中提取所述目标网站的待区分URL链接。In the case of acquiring the first ICP filing information of the target website, extract the URL link of the target website to be distinguished from the home page of the first website. 4.根据权利要求2或3所述的方法,其特征在于,根据所述第一ICP备案信息区分所述待区分URL链接是所述目标网站的内链或外链,包括:4. 
The method according to claim 2 or 3, wherein distinguishing, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website comprises:
obtaining a first host field of the first website home page;
obtaining a second host field of the URL link to be distinguished;
in the case that the first host field is the same as the second host field, determining the URL link to be distinguished to be an internal link of the target website;
in the case that the first host field is different from the second host field, distinguishing, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.

5. The method according to claim 4, wherein, in the case that the first host field is different from the second host field, distinguishing, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website comprises:
in the case that the first host field is different from the second host field, accessing a second website home page of the URL link to be distinguished according to the second host field;
extracting the content of the second website home page;
obtaining, from the content of the second website home page, second ICP filing information of the URL link to be distinguished;
distinguishing, according to the first ICP filing information and the second ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.

6. The method according to claim 5, wherein distinguishing, according to the first ICP filing information and the second ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website comprises:
in the case that the first ICP filing information and the second ICP filing information are the same, determining that the URL link to be distinguished is an internal link of the target website; and/or,
in the case that the first ICP filing information and the second ICP filing information are different, determining that the URL link to be distinguished is an external link of the target website.

7. A device for distinguishing internal and external links of a target website, comprising:
a first obtaining module, configured to obtain first ICP filing information of the target website;
a second obtaining module, configured to obtain a URL link to be distinguished of the target website;
a distinguishing module, configured to distinguish, according to the first ICP filing information, whether the URL link to be distinguished is an internal link or an external link of the target website.

8. A device for distinguishing internal and external links of a target website, comprising a processor and a memory storing program instructions, wherein the processor is configured to execute, when running the program instructions, the method for distinguishing internal and external links of a target website according to any one of claims 1 to 6.

9. An electronic device, comprising the device for distinguishing internal and external links of a target website according to claim 9.

10. A storage medium storing program instructions, wherein the program instructions, when run, execute the method for distinguishing internal and external links of a target website according to any one of claims 1 to 6.
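The flow recited in claims 4 to 6 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `host_field`, `extract_icp`, `classify_link` and `fetch_home`, the regular expression for the filing number, and the choice to treat a missing filing number as an external link are all assumptions made for illustration only; the claims do not specify them. Page fetching is passed in as a callable so that the sketch runs without network access.

```python
import re
from typing import Callable, Optional
from urllib.parse import urljoin, urlsplit

# Assumed shape of an ICP filing number, e.g. "京ICP备12345678号-1".
# Real pages vary (spacing, links, images); this pattern is illustrative only.
ICP_PATTERN = re.compile(r"[\u4e00-\u9fa5]ICP[备证]\d+号(?:-\d+)?")


def host_field(url: str) -> str:
    """Return the lower-cased host name of a URL (empty string if absent)."""
    return (urlsplit(url).hostname or "").lower()


def extract_icp(page_content: str) -> Optional[str]:
    """Return the first ICP filing number found in the page content, if any."""
    match = ICP_PATTERN.search(page_content)
    return match.group(0) if match else None


def classify_link(
    target_home_url: str,
    link_url: str,
    first_icp: str,
    fetch_home: Callable[[str], str],
) -> str:
    """Classify link_url as 'internal' or 'external' for the target website."""
    # Resolve relative links against the target home page (an assumption).
    absolute = urljoin(target_home_url, link_url)
    first_host = host_field(target_home_url)
    second_host = host_field(absolute)

    # Claim 4: identical host fields mean an internal link.
    if first_host == second_host:
        return "internal"

    # Claim 5: otherwise visit the home page of the second host and
    # extract its ICP filing information from the page content.
    scheme = urlsplit(absolute).scheme or "http"
    second_home = fetch_home(f"{scheme}://{second_host}/")
    second_icp = extract_icp(second_home)

    # Claim 6: same filing information means internal, different means external.
    # Assumption: a missing filing number is treated as external.
    if second_icp is not None and second_icp == first_icp:
        return "internal"
    return "external"
```

In use, `fetch_home` would wrap whatever HTTP client is available; for a quick check it can be a dictionary lookup over canned pages. Comparing the filing numbers as exact strings is the simplest reading of "the same" in claim 6; whether a trailing site suffix such as `-1` should be ignored is left open here.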
CN202111674847.4A 2021-12-31 2021-12-31 Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium Pending CN114385950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111674847.4A CN114385950A (en) 2021-12-31 2021-12-31 Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111674847.4A CN114385950A (en) 2021-12-31 2021-12-31 Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114385950A (en) 2022-04-22

Family

ID=81199699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111674847.4A Pending CN114385950A (en) 2021-12-31 2021-12-31 Method and device for distinguishing internal link and external link of target website, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114385950A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002073607A (en) * 2000-08-25 2002-03-12 Nippon Telegr & Teleph Corp <Ntt> Method and apparatus for automatically estimating similarity between web pages and medium recording the program thereof
CN102667748A (en) * 2009-10-30 2012-09-12 日立数据系统有限公司 Use fixed content storage replicated on content platforms with namespaced partitions
CN102682097A (en) * 2012-04-27 2012-09-19 北京神州绿盟信息安全科技股份有限公司 Method and equipment for detecting secret links in web page
CN102902917A (en) * 2011-07-29 2013-01-30 国际商业机器公司 Method and system for preventing phishing attacks
US20150170072A1 (en) * 2013-07-26 2015-06-18 Ad-Vantage Networks, Inc. Systems and methods for managing network resource requests
CN108270754A (en) * 2017-01-03 2018-07-10 中国移动通信有限公司研究院 Method and device for detecting phishing website
CN111865886A (en) * 2019-04-30 2020-10-30 深信服科技股份有限公司 IP address information configuration method, system, device and storage medium
CN112217815A (en) * 2020-10-10 2021-01-12 杭州安恒信息技术股份有限公司 Phishing website identification method and device and computer equipment
CN113407802A (en) * 2021-06-10 2021-09-17 杭州安恒信息技术股份有限公司 Spider pool website identification method and device, electronic device and storage medium
US20210397671A1 (en) * 2018-12-18 2021-12-23 Wangsu Science & Technology Co., Ltd. Method and device for processing resource description file and for obtaining page resource

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination