
CN102137022B - Method for identifying information of data packet, crawler engine and network system - Google Patents


Info

Publication number
CN102137022B
Authority
CN
China
Prior art keywords
application protocol
network entity
information
query
crawler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110082236
Other languages
Chinese (zh)
Other versions
CN102137022A (en
Inventor
何有树
唐华新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Innovation Polymerization LLC
Gw Partnership Co ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN 201110082236
Publication of CN102137022A
Application granted
Publication of CN102137022B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Computer And Data Communications (AREA)

Abstract

An embodiment of the present invention discloses a method for providing information for identifying data packets, a crawler engine, and a network system. The method includes: using a crawler program of an application protocol to establish a correspondence between the application protocol and information about the network entities that use the application protocol; and sending identification information that includes the correspondence to a deep packet inspection (DPI) device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs. The information about a network entity that uses the application protocol includes the address of the network entity and the identifier of the transport layer protocol that the network entity uses. The technical solution provided by the embodiment can reduce the identification time and performance overhead of the DPI device when it identifies traffic flows.

Description

Method for providing information for identifying data packets, crawler engine, and network system

Technical Field

The present invention relates to the field of communication technology, and in particular to a method for providing information for identifying data packets, a crawler engine, and a network system.

Background

Deep packet inspection (DPI) is a technique in which a DPI device identifies the specific application protocol to which a traffic flow belongs from characteristic characters or characteristic behaviors in that flow. Upper-layer services such as charging and flow control can then be performed according to the identification result. A DPI device is deployed at the same network position as a gateway, so all traffic flows of all users served by the gateway pass through it. The number of users may reach millions or more, yet flow identification must not delay the transmission or subsequent processing of the flows.

To enable a DPI device to identify traffic flows, the specific character features of the flows of certain application protocols are analyzed offline, a knowledge base is generated from the analysis results, and the knowledge base is loaded onto the DPI device. The DPI device then uses the knowledge base to determine the specific application protocol to which a received flow belongs.

The prior art has the following problems:

When an application protocol in the network is updated, the character features of the updated protocol's flows must be analyzed offline and a new knowledge base must be generated. Flows of that protocol can be identified only after the new knowledge base has been loaded onto the DPI device, so they cannot be identified while the new knowledge base is being generated and loaded, which increases the identification time. In addition, during identification the DPI device must first decrypt some encrypted flows and then identify them from the characteristic characters or characteristic behaviors in the decrypted flow, which consumes considerable performance overhead.

Summary of the Invention

Embodiments of the present invention provide a method for providing information for identifying data packets, a crawler engine, and a network system, so as to reduce the identification time and performance overhead of a DPI device when it identifies traffic flows.

To this end, embodiments of the present invention provide:

A method for providing information for identifying data packets, including:

using a crawler program of an application protocol to establish a correspondence between the application protocol and information about the network entities that use the application protocol; and

sending identification information that includes the correspondence to a deep packet inspection (DPI) device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs;

where the information about a network entity that uses the application protocol includes: the address of the network entity and the identifier of the transport layer protocol that the network entity uses.

A web crawler, including:

an establishing unit, configured to use a crawler program of an application protocol to establish a correspondence between the application protocol and information about the network entities that use the application protocol; and

a sending unit, configured to send identification information that includes the correspondence to a DPI device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs;

where the information about a network entity that uses the application protocol includes: the address of the network entity and the identifier of the transport layer protocol that the network entity uses.

A network system, including the above web crawler and a deep packet inspection (DPI) device, where the DPI device is configured to receive the identification information sent by the web crawler and to use the correspondence to identify the application protocol to which a data packet belongs.

In the embodiments of the present invention, a crawler program of an application protocol is used to establish the correspondence between the application protocol and the information about the network entities that use it, and the correspondence is sent to the DPI device, which uses it to identify the application protocol to which a data packet belongs. The DPI device therefore does not need to decrypt data packets, which lowers the performance overhead, and the character features of the protocol's flows no longer need to be analyzed offline, which shortens the identification time.

Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for providing a DPI device with information for identifying data packets according to an embodiment of the present invention;

FIG. 2A is a flowchart of a method for providing a DPI device with information for identifying data packets according to another embodiment of the present invention;

FIG. 2B is a schematic diagram of a crawler engine obtaining Peer information according to another embodiment of the present invention;

FIG. 3 is a flowchart of a method for providing a DPI device with information for identifying data packets according to yet another embodiment of the present invention;

FIG. 4 is a flowchart of a method by which a DPI device identifies the application protocol to which a data packet belongs according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of identifying an application protocol according to an embodiment of the present invention;

FIG. 6 is a flowchart of a method by which the crawler engine obtains Peer information when the application protocol is the BitTorrent non-DHT protocol according to an embodiment of the present invention;

FIG. 7 is a flowchart of a method by which the crawler engine obtains Peer information when the application protocol is the BitTorrent DHT protocol according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of one way of sharing identification information according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of another way of sharing identification information according to an embodiment of the present invention;

FIG. 10 is a structural diagram of a web crawler according to an embodiment of the present invention;

FIG. 11 is a structural diagram of another web crawler according to an embodiment of the present invention;

FIG. 12 is a structural diagram of yet another web crawler according to an embodiment of the present invention;

FIG. 13 is a structural diagram of a network system according to an embodiment of the present invention;

FIG. 14 is a structural diagram of another network system according to an embodiment of the present invention.

Detailed Description

Referring to FIG. 1, an embodiment of the present invention provides a method for providing a DPI device with information for identifying data packets, which includes:

101. Use a crawler program of an application protocol to establish a correspondence between the application protocol and information about the network entities that use the application protocol.

This embodiment may be executed by a crawler engine. The crawler engine may be located on the DPI device or on a device separate from the DPI device; either placement is compatible with the invention.

The information about a network entity includes the address of the network entity and the identifier of the transport layer protocol that the network entity uses. The address of the network entity includes its IP address and port number or, alternatively, its domain name. The correspondence between the application protocol and the information about the network entities that use it may specifically be a correspondence between the identifier of the application protocol and that information, where the identifier of the application protocol may be the ID or the name of the application protocol.
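The correspondence described above can be pictured as a small data model. The following is a minimal sketch, not the patent's implementation: the class name `NetworkEntityInfo`, the helper `add_entity`, and the choice of a dictionary keyed by protocol identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class NetworkEntityInfo:
    """Hypothetical record for one network entity (names are illustrative)."""
    transport: str                 # transport layer protocol identifier, e.g. "TCP" or "UDP"
    ip: Optional[str] = None       # address form 1: IP address ...
    port: Optional[int] = None     # ... together with a port number
    domain: Optional[str] = None   # address form 2: domain name

    def __post_init__(self):
        # ip and port must be given together.
        if (self.ip is None) != (self.port is None):
            raise ValueError("ip and port must be given together")
        # Exactly one address form is expected: ip+port or domain.
        if (self.ip is None) == (self.domain is None):
            raise ValueError("give either ip+port or domain")


# Correspondence: application protocol identifier -> entities using that protocol.
Correspondence = Dict[str, Set[NetworkEntityInfo]]


def add_entity(table: Correspondence, protocol_id: str, entity: NetworkEntityInfo) -> None:
    """Record that `entity` uses the application protocol `protocol_id`."""
    table.setdefault(protocol_id, set()).add(entity)
```

Because the record is frozen and therefore hashable, adding the same entity twice leaves a single entry in the set.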

Specifically, the network entities in a network may use different application protocols, or several network entities may use the same application protocol. Each application protocol corresponds to one crawler program, and different application protocols correspond to different crawler programs. Examples of crawler programs for specific application protocols include the BT program, the eDonkey program, and the QQ program.

This step can be implemented in either of the following two ways:

First way: the crawler engine sequentially invokes the crawler programs of the application protocols in the crawler program set to send probe request messages to a network entity in the network, until it receives a response message from the network entity indicating that the probe succeeded. It then establishes a correspondence between the application protocol used by the probe request message that elicited the response and the information about the network entity.

The crawler program set contains crawler programs for multiple application protocols, such as the BT program, the eDonkey program, and the QQ program. The crawler engine invokes them one after another to probe the network entity and find out whether it uses the corresponding application protocol. If the network entity uses a given application protocol, then after receiving the probe request message that the crawler engine sent with that protocol's crawler program, it returns to the crawler engine a response message indicating that the probe succeeded.

Preferably, in this implementation, the following may also take place before this step: the crawler engine receives, from the DPI device, information about a network entity whose application protocol needs to be identified, that is, a network entity for which the DPI device could not identify the application protocol in use. In this step, the crawler engine then sequentially invokes the crawler programs of the application protocols in the crawler program set to send probe request messages to that network entity, until it receives a response message from that entity indicating that the probe succeeded.
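The sequential probing of the first way can be sketched as a simple loop. This is an illustration under stated assumptions: each crawler program is modeled as a hypothetical callable that returns `True` when the entity answers with a successful response, and the function and variable names are not taken from the patent.

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical probe: takes an entity address and returns True if the entity
# answered the probe request with a response indicating success.
Prober = Callable[[str], bool]


def identify_protocol(entity_addr: str,
                      crawlers: Iterable[Tuple[str, Prober]]) -> Optional[str]:
    """Try each protocol's crawler in order; stop at the first successful probe."""
    for protocol_id, probe in crawlers:
        if probe(entity_addr):
            return protocol_id   # this protocol is recorded for the entity
    return None                  # no crawler in the set matched


# Stub probers standing in for real BT / eDonkey / QQ crawler programs.
crawler_set = [
    ("BT", lambda addr: False),
    ("eDonkey", lambda addr: addr == "192.0.2.7:4662"),
    ("QQ", lambda addr: False),
]
matched = identify_protocol("192.0.2.7:4662", crawler_set)
unmatched = identify_protocol("192.0.2.8:80", crawler_set)
```

A real prober would have to speak the protocol's handshake and apply a timeout; those details are outside what the text specifies.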

The second way includes the following steps:

A. Determine a network entity to serve as the query source, where this network entity uses the application protocol.

B. Use the crawler program of the application protocol to obtain, from the query source, information about the network entities associated with the query source. A network entity associated with the query source is one that uses the application protocol and holds the same resource as the query source.

A network entity that holds the same resource as the query source may be one that shares the same file with it: for example, a network entity that is downloading the same file as the query source, or a network entity that is downloading a file that the query source is uploading.

C. Take the network entities that are associated with the query source and have not yet served as a query source as the updated query sources. If the query end condition has not been met, return to step B, in which the query sources are now the updated query sources. If the query end condition has been met, establish the correspondence between the application protocol and the information about the network entities that use it. The network entities that use the application protocol include the network entity that served as the query source in step A and the network entities obtained in step B.

The query end condition may be that the number of network entities obtained has reached a predetermined number, or that a timer has expired, that is, the time spent on the query operation (the time spent performing steps A, B, and C) has exceeded a predetermined duration.
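Steps A to C amount to a breadth-first traversal with two stop conditions. The sketch below assumes a hypothetical `query_peers(source)` callable that stands in for the protocol-specific crawler program and returns the entities associated with a source; stopping when no unqueried entity remains is an added assumption, since the text names only the count and timer conditions.

```python
import time
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def crawl(seed: Hashable,
          query_peers: Callable[[Hashable], Iterable[Hashable]],
          max_entities: int = 1000,
          timeout_s: float = 30.0) -> Set[Hashable]:
    """Collect entities that use the protocol, starting from one query source."""
    deadline = time.monotonic() + timeout_s
    known = {seed}            # step A: the initial query source
    pending = deque([seed])   # entities that have not yet served as a query source
    while pending and len(known) < max_entities and time.monotonic() < deadline:
        source = pending.popleft()
        for peer in query_peers(source):      # step B: ask the source for its peers
            if peer not in known:             # step C: only new entities become sources
                known.add(peer)
                pending.append(peer)
                if len(known) >= max_entities:
                    break                     # end condition: enough entities found
    return known


# Toy topology standing in for real query responses.
graph = {"P1": ["P2", "P3"], "P2": ["P4", "P5", "P6"], "P3": ["P5", "P6"]}
all_found = crawl("P1", lambda p: graph.get(p, []))
capped = crawl("P1", lambda p: graph.get(p, []), max_entities=3)
```

The returned set contains both the initial query source and every entity obtained in step B, which matches the definition of the entities that enter the correspondence.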

The network entity that serves as the query source in step A may be determined in any of the following ways. It may be a predetermined network entity that uses the application protocol. It may be a network entity that is extracted from the torrent (seed) file of a specific resource and uses the application protocol, or a network entity that uses the application protocol and is tracked by a tracker extracted from that torrent file. It may be found by probing: the crawler engine sequentially invokes the crawler programs of the application protocols in the crawler program set to send probe request messages to a network entity in the network, until that entity returns a response message indicating that the probe succeeded; having determined that the application protocol used by the probe request message that elicited the response is the one the entity uses, the crawler engine takes that entity as the query source. It may also be reported to the crawler engine by the DPI device. Specifically, after receiving a data packet, the DPI device uses deep packet inspection to determine the application protocol to which the packet belongs, establishes a mapping between the source and/or destination of the packet and the identifier of the application protocol, and sends the mapping to the crawler engine. The crawler engine then takes the network entity that the mapping associates with the application protocol identifier as the query source.

The DPI device may determine the application protocol to which the data packet belongs by feature-based identification, behavior-based identification, heuristic identification, or association-based identification. For how these methods identify the service type of a data packet, refer to existing solutions; the details are not repeated here.

This way applies to crawler programs of peer-to-peer application protocols, such as the BT program and the eDonkey program.

102. Send identification information that includes the correspondence to the DPI device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs.

The identification information may further include a suggested aging time for the correspondence. When the suggested aging time is reached, the DPI device invalidates the correspondence, for example by deleting it, so that it can no longer use the correspondence to determine the application protocol to which a data packet belongs.
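On the DPI side, the suggested aging time can be handled as a per-entry expiry. The class below is a hypothetical sketch: its name, its methods, and the decision to delete an expired entry lazily at lookup time are assumptions, chosen as one of several ways to make the correspondence invalid once the time is reached.

```python
import time
from typing import Callable, Dict, Hashable, Optional, Tuple


class CorrespondenceTable:
    """Hypothetical DPI-side table: (address, transport) -> application protocol."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[Tuple[Hashable, str], Tuple[str, float]] = {}

    def install(self, address: Hashable, transport: str,
                protocol_id: str, aging_s: float) -> None:
        """Store one correspondence together with its expiry time."""
        self._entries[(address, transport)] = (protocol_id, self._clock() + aging_s)

    def lookup(self, address: Hashable, transport: str) -> Optional[str]:
        """Return the protocol for a packet endpoint, or None if unknown or aged out."""
        key = (address, transport)
        entry = self._entries.get(key)
        if entry is None:
            return None
        protocol_id, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]   # aged out: the correspondence is no longer valid
            return None
        return protocol_id
```

Injecting the clock keeps the expiry logic testable without waiting for real time to pass; a lookup miss would fall back to the device's ordinary inspection path.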

The network entity in the embodiments of the present invention may be a peer-to-peer network entity (Peer) or another kind of network entity; either is compatible with the invention.

In this embodiment, a crawler program of an application protocol is used to establish the correspondence between the identifier of the application protocol and the information about the network entities that use it, and the correspondence is sent to the DPI device, which uses it to identify the application protocol to which a data packet belongs. The DPI device therefore does not need to decrypt data packets, which lowers the performance overhead, and the character features of the protocol's flows no longer need to be analyzed offline, which shortens the identification time.

To make the above technical solution clearer, the following embodiment takes a Peer (peer-to-peer network entity) as the network entity and describes in detail the method for providing a DPI device with information for identifying data packets. It specifically includes:

201A. The crawler engine in the DPI device determines that the peer-to-peer network entity serving as the query source is Peer1, and uses the crawler program of the application protocol to send a query request message to Peer1.

In this embodiment, the crawler engine stores crawler programs developed for different application protocols. When a Peer needs to be accessed, the crawler engine determines the peer-to-peer network entity serving as the query source according to predetermined information, invokes the crawler program of the application protocol, and sends a query request to the query source. The predetermined information may be preset information about the query source for a specific application protocol. For example, if Peer1 is preset as the query source for the Bittorent_DHT protocol, the predetermined information may be the address information of Peer1 and the transport layer protocol that Peer1 uses.

Optionally, the crawler engine in the DPI device may also determine the query source from information reported by the identification engine of the DPI device. Specifically, after receiving a data packet, the identification engine uses deep packet inspection to determine the application protocol to which the packet belongs, establishes a mapping between the source and/or destination of the packet and the identifier of the application protocol, and sends the mapping to the crawler engine. The crawler engine takes the source and/or destination that the mapping associates with the application protocol identifier as the query source.

202A. Peer1 sends the crawler engine a query response message that includes information about the Peers associated with Peer1.

The information about a Peer associated with Peer1 includes the address information of that Peer and the transport layer protocol that it uses. The address information may include an IP address and port number, or a domain name.

A Peer associated with Peer1 is a Peer that uses the application protocol and holds the same resource as Peer1, such as the same media resource; for example, a Peer that is downloading the same movie as Peer1. This embodiment assumes that the Peers associated with Peer1 are Peer2 and Peer3.

203A. The crawler engine in the DPI device takes Peer2 as a query source and uses the crawler program of the application protocol to send a query request message to Peer2.

204A. Peer2 sends the crawler engine a query response message that includes information about the Peers associated with Peer2, where a Peer associated with Peer2 is a Peer that uses the application protocol and holds the same resource as Peer2.

205A. The crawler engine takes Peer3 as a query source and uses the crawler program of the application protocol to send a query request message to Peer3.

206A. Peer3 sends the crawler engine a query response message that includes information about the Peers associated with Peer3, where a Peer associated with Peer3 is a Peer that uses the application protocol and holds the same resource as Peer3.

Note that step 203A precedes step 204A and step 205A precedes step 206A, but there is no required order between the pair 203A-204A and the pair 205A-206A. The two pairs may also be performed in parallel.
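Because the queries to different query sources are independent, they can be issued concurrently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` for this; `query_peers` is again a hypothetical stand-in for the protocol-specific crawler program, and using threads is an illustrative choice rather than anything the text prescribes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def query_all(sources: List[str],
              query_peers: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Send one query per source in parallel and collect each source's response."""
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        # pool.map yields results in the same order as `sources`.
        responses = list(pool.map(query_peers, sources))
    return dict(zip(sources, responses))


# Toy responses for Peer2 and Peer3, matching the example in this embodiment.
replies = {"Peer2": ["Peer4", "Peer5", "Peer6"], "Peer3": ["Peer5", "Peer6"]}
collected = query_all(["Peer2", "Peer3"], lambda p: replies.get(p, []))
```

Each request and its own response still happen in order inside one worker, so only the ordering between different query sources is relaxed.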

207A. The crawler engine in the DPI device determines whether the query end condition has been met. If not, it performs step 208A; if so, it performs step 210A.

Determining whether the query end condition has been met may mean determining whether the number of Peers found to use the application protocol has reached a predetermined number, or determining whether a timer has expired, that is, whether the time spent on the query operation has reached a predetermined duration.

Before this step, if the crawler engine determines from the query response messages returned by the query sources that the Peers reported as using the application protocol contain duplicates, it removes the duplicates. The "Peers found to use the application protocol" counted above therefore do not include duplicate Peers.

Here, duplicates are removed from among the Peers associated with Peer2 and the Peers associated with Peer3. Invalid Peers among those reported by the query sources may also be removed at this point, where an invalid Peer may be a node that is currently in a faulty state.

208A. After removing the duplicate Peers among those reported by the query sources, the crawler engine in the DPI device takes the remaining Peers as the updated query sources and uses the crawler program of the application protocol to send query request messages to them.

This embodiment assumes that the Peers associated with Peer2 are Peer4, Peer5, and Peer6, and that the Peers associated with Peer3 are Peer5 and Peer6. The query sources in step 208A are therefore Peer4, Peer5, and Peer6.

209A. The crawler engine in the DPI device receives a query response message for the query request message, which includes information on the Peers associated with the updated query sources, and the procedure returns to step 207A.

The information on a Peer associated with an updated query source is the address of a Peer that uses the application protocol and holds the same resource as the updated query source, together with the transport layer protocol used by that Peer.

210A. The crawler engine in the DPI device reports identification information to the identification engine in the DPI device. The identification information includes the correspondence between the information on the Peers using the application protocol and the application protocol identifier.

The information on a Peer using the application protocol includes the address of the Peer and the transport layer protocol used by the Peer.

The crawler engine in the DPI device may also store a suggested aging time for each application protocol. In this step, therefore, the identification information reported by the crawler engine to the identification engine may further include the suggested aging time, which indicates when the correspondence ages out. Once the suggested aging time is reached, the correspondence between the information on the Peers using the application protocol and the application protocol identifier is no longer valid.

211A. The identification engine in the DPI device receives and stores the identification information sent by the crawler engine.

FIG. 2B is a schematic diagram of how the crawler engine obtains Peer information in the embodiment shown in FIG. 2A, where Peer1 is assumed to be the predetermined query source. In FIG. 2B, "1" denotes the first-stage query operation, that is, querying Peer1, which yields Peer2 and Peer3 as the Peers associated with Peer1. "2" denotes the second-stage query operation, that is, querying Peer2 and Peer3: querying Peer2 yields Peer4, Peer5 and Peer6 as the Peers associated with Peer2, and querying Peer3 yields Peer4 and Peer5 as the Peers associated with Peer3. "3" denotes the third-stage query operation, that is, querying Peer4, Peer5 and Peer6.
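The staged query loop of steps 207A to 209A can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `ASSOCIATED` table stands in for real query responses and uses the FIG. 2B topology, and the names `query`, `crawl`, `max_peers` and `timeout_s` are hypothetical.

```python
import time

# Assumed topology, copied from the FIG. 2B description; a real crawler
# would obtain these lists from query response messages.
ASSOCIATED = {
    "Peer1": ["Peer2", "Peer3"],
    "Peer2": ["Peer4", "Peer5", "Peer6"],
    "Peer3": ["Peer4", "Peer5"],
}


def query(peer):
    """Hypothetical stand-in for sending a query request to one query source."""
    return ASSOCIATED.get(peer, [])


def crawl(seed, max_peers=100, timeout_s=5.0):
    """Expand query sources stage by stage until an end condition is met."""
    found = [seed]   # Peers known to use the protocol, in discovery order
    seen = {seed}    # used to drop duplicate Peers
    sources = [seed]
    stages = []      # query sources used in each stage
    deadline = time.monotonic() + timeout_s
    while sources:
        stages.append(list(sources))
        responses = [p for src in sources for p in query(src)]
        fresh = []
        for peer in responses:
            if peer not in seen:  # remove duplicates before counting
                seen.add(peer)
                fresh.append(peer)
        found.extend(fresh)
        # End condition: enough Peers found, or the timer has expired.
        if len(found) >= max_peers or time.monotonic() >= deadline:
            break
        sources = fresh  # remaining Peers become the updated query sources
    return found, stages
```

Calling `crawl("Peer1")` on this topology runs three stages, matching the stages labelled "1", "2" and "3" in FIG. 2B. Stopping when no new Peer is returned is an extra assumption of the sketch; the text only names the Peer count and the timer as end conditions.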

Optionally, the crawler engine emulates a real Peer. After obtaining the information on a Peer from a query response, the crawler engine can emulate that Peer and either actively provide the Peer's information to other Peers, or provide it to other Peers after receiving a related request from them.

In this embodiment of the present invention, the crawler program of the application protocol obtains, from a network entity serving as a query source, information on the Peers associated with that query source. The Peers so obtained are then used as updated query sources, from which information on their associated Peers is obtained in turn. In this way, information on the Peers in the network that use the application protocol is collected, and the correspondence between the identifier of the application protocol and the information on the Peers using it is sent to the identification engine in the DPI device, which uses the correspondence to identify the application protocol to which a data packet belongs. The DPI device therefore does not need to decrypt data packets, which reduces performance overhead, and it does not need to analyze offline the specific character features of the application protocol's bit stream, which shortens bit-stream identification time.

Referring to FIG. 3, the following embodiment is another method, provided by an embodiment of the present invention, for providing a DPI device with information for identifying data packets. The method specifically includes:

301. After receiving a data packet, the identification engine in the DPI device looks up the locally stored correspondence between application protocol identifiers and network entity information. If the application protocol identifier of the network entity corresponding to the data packet cannot be determined from the stored correspondence, the identification engine sends the information on that network entity to the crawler engine in the DPI device. The network entity corresponding to the data packet is the network entity whose application protocol needs to be identified.

The network entity corresponding to the data packet may be the source and/or the destination of the data packet.

Optionally, when the identification engine in the DPI device cannot determine, from the stored correspondence, the application protocol identifier of the network entity corresponding to the data packet, it may first use deep packet inspection to identify the application protocol to which the data packet belongs, that is, identify it from characteristic characters or characteristic behavior in the data packet. If the application protocol cannot be identified within a predetermined time, the identification engine sends the information on the network entity corresponding to the data packet to the crawler engine in the DPI device.

302. The crawler engine in the DPI device calls the application protocol crawler programs in the crawler program set one after another to send probe request messages to the network entity whose application protocol needs to be identified, until a response message indicating a successful probe is received from that network entity.

For example, the crawler engine in the DPI device may call the BT program in the crawler program set to send a probe request message to the network entity whose application protocol needs to be identified. If a response message indicating a successful probe is received from that network entity, step 303 is performed. If no such response message is received, the crawler engine goes on to call the QQ program to send a probe request message to the network entity, and so on, until a response message indicating a successful probe is received from the network entity.

303. The crawler engine in the DPI device establishes a correspondence between the identifier of the application protocol used by the probe request message that elicited the response message and the information on the network entity whose application protocol needs to be identified.

Continuing the example above, if a response message indicating a successful probe is received when the QQ program is called to send the probe request message, a correspondence is established between the QQ program identifier and the information on the network entity, that is, between the QQ program identifier and the network entity's address and the identifier of the transport layer protocol it uses.

304. The crawler engine in the DPI device sends identification information to the identification engine in the DPI device. The identification information includes the correspondence and, optionally, a suggested aging time for the correspondence.

In this embodiment of the present invention, the application protocol crawler programs in the crawler program set are called one after another to send probe request messages to the network entity whose application protocol needs to be identified, until a response message indicating a successful probe is received from that network entity. A correspondence is then established between the information on the network entity and the identifier of the application protocol used by the probe request message that elicited the response, and is sent to the DPI device, whose identification engine uses it to identify the application protocol to which a data packet belongs. The DPI device therefore does not need to decrypt data packets, which reduces performance overhead, and it does not need to analyze offline the specific character features of the application protocol's bit stream, which shortens bit-stream identification time.
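Steps 302 and 303 amount to trying each protocol crawler in order and recording the first one that gets a successful probe response. The sketch below illustrates that idea only; `probe_sequentially`, `bt_probe`, `qq_probe` and the `(ip, port, transport)` entity tuple are hypothetical names, and the probe functions are stubs that mimic the BT-then-QQ example rather than speaking any real protocol.

```python
def probe_sequentially(entity, crawlers):
    """Try each (protocol_id, probe) pair in order; return the first match.

    entity   -- assumed shape: (ip, port, transport)
    crawlers -- ordered list of (protocol_id, probe), where probe(entity)
                returns True when the entity answers the probe successfully
    """
    for protocol_id, probe in crawlers:
        if probe(entity):
            ip, port, transport = entity
            # Step 303: bind the protocol identifier to the entity information.
            return {
                "protocol_id": protocol_id,
                "address": (ip, port),
                "transport": transport,
            }
    # Assumption: the text does not say what happens if every probe fails.
    return None


def bt_probe(entity):
    """Stub: pretend the entity does not answer a BT probe."""
    return False


def qq_probe(entity):
    """Stub: pretend the entity answers a QQ probe."""
    return True


CRAWLERS = [("BT", bt_probe), ("QQ", qq_probe)]
```

With these stubs, `probe_sequentially(("192.168.0.16", 5566, "TCP"), CRAWLERS)` skips BT and returns a mapping whose `protocol_id` is `"QQ"`. The order of `CRAWLERS` decides which protocol is tried first, so a deployment would presumably put the most likely protocols at the front.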

FIG. 4 shows a method, provided by an embodiment of the present invention, by which a DPI device identifies the application protocol to which a data packet belongs. The network entity in this embodiment may be a Peer (peer network entity). The method includes:

401. The identification engine in the DPI device receives a data packet that carries a five-tuple.

The five-tuple includes the IP address of the source network entity, the IP address of the destination network entity, the port number of the source network entity, the port number of the destination network entity, and the transport layer protocol identifier.

The source network entity is the network entity that sends the data packet, and the destination network entity is the network entity that receives it.

402. The identification engine in the DPI device determines the application protocol used by the data packet according to the stored correspondence.

The stored correspondence is the correspondence between the identifier of an application protocol and the information on a network entity using that protocol. The information on the network entity includes the address of the network entity and the identifier of the transport layer protocol it uses; the address of the network entity comprises its IP address and port number.

Specifically, this step compares the five-tuple with the network entity information. If the transport layer protocol identifier in the network entity information denotes the same transport layer protocol as the transport layer protocol identifier in the five-tuple, and the address of the network entity matches either the IP address and port number of the source network entity or the IP address and port number of the destination network entity in the five-tuple, the application protocol used by the data packet is determined to be the application protocol that corresponds to that network entity in the correspondence.

Note that if the transport layer protocol identifier in the network entity information denotes a different transport layer protocol from the one in the five-tuple, or if the address of the network entity matches neither the source nor the destination IP address and port number in the five-tuple, the application protocol used by the data packet can be identified from characteristic characters or characteristic behavior in the data packet. How to perform such identification is common knowledge in the art and is not described further here.

To make the above technical solution clearer, an example follows. Assume that the correspondences established by the identification engine are:

TCP 192.168.0.1:5566<-->Bittorent_DATA 1800;

UDP 192.168.0.1:5566<-->Bittorent_DHT_Control 1801;

UDP 192.168.0.16:5566<-->Bittorent_DHT_Control 1801;

TCP 192.168.0.16:5566<-->Bittorent_DATA 1800;

Here, TCP and UDP are names of transport layer protocols; Bittorent_DATA and Bittorent_DHT_Control are names of application protocols, and 1800 and 1801 are the respective application protocol IDs.

Assume that the transport layer protocol used by the network entity carried in a received data packet is TCP, and that the source network entity's IP address is 192.168.0.16 and its port number is 5566. From the established correspondences, the identification engine then finds that the application protocol name for this network entity is Bittorent_DATA and that the application protocol ID is 1800.
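The lookup in step 402 can be illustrated with the example table above. The following is a sketch under assumptions: the dictionary layout, the function name `identify` and the destination endpoint used in the usage note are hypothetical, while the four table entries are copied from the example.

```python
# Correspondence table from the example: (transport, ip, port) -> (name, id)
TABLE = {
    ("TCP", "192.168.0.1", 5566): ("Bittorent_DATA", 1800),
    ("UDP", "192.168.0.1", 5566): ("Bittorent_DHT_Control", 1801),
    ("UDP", "192.168.0.16", 5566): ("Bittorent_DHT_Control", 1801),
    ("TCP", "192.168.0.16", 5566): ("Bittorent_DATA", 1800),
}


def identify(five_tuple, table=TABLE):
    """Match the packet's source or destination endpoint against the table.

    five_tuple -- (src_ip, dst_ip, src_port, dst_port, transport)
    Returns (protocol_name, protocol_id), or None when neither endpoint
    matches, in which case feature-based inspection would take over.
    """
    src_ip, dst_ip, src_port, dst_port, transport = five_tuple
    for endpoint in ((transport, src_ip, src_port),
                     (transport, dst_ip, dst_port)):
        if endpoint in table:
            return table[endpoint]
    return None
```

For the packet described above, with an assumed destination of 10.0.0.7:40000, `identify(("192.168.0.16", "10.0.0.7", 5566, 40000, "TCP"))` returns `("Bittorent_DATA", 1800)`. Keying the table on the transport protocol as well as the address is what lets the same IP address and port map to two different application protocols.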

Note that the identification engine and the crawler engine may be deployed on the same device, namely the DPI device, or on different devices, with the identification engine on the DPI device and the crawler engine on another device. When they are deployed on different devices, the two may be located in the same network or in different networks. The deployment can follow particular requirements. For example, the identification engine is deployed on the DPI device, and the DPI device is also able to decrypt encrypted data packets; if the DPI device is deployed in an operator's equipment room, it may collect sensitive information of the operator. Some operators therefore do not allow a DPI device to access networks outside the operator's network on its own, to prevent it from leaking sensitive information. In that case, if the DPI device is to collect information from the crawler engine, the crawler engine needs to be deployed in a network outside the operator's network.

Note that protocols such as BitTorrent can be subdivided into several sub-protocols, for example the BitTorrent non-DHT (Distributed Hash Table) protocol and the BitTorrent DHT protocol. The following two embodiments take the BitTorrent non-DHT protocol and the BitTorrent DHT protocol, respectively, as the application protocol and describe how the crawler engine obtains Peer information and reports identification information to the identification engine.

Referring to FIG. 6, this embodiment takes the BitTorrent non-DHT protocol as the application protocol and a Peer (peer network entity) as the network entity, and describes how the crawler engine obtains Peer information and reports identification information to the identification engine. In this embodiment, Peer information is searched for on the basis of a specific resource, which may be preconfigured or obtained from another device. The process specifically includes:

601. The crawler engine uses the crawler program of the application protocol to create a search request, uses the search request to search for the specific resource, and finds the torrent file of the specific resource.

The specific resource may be a video resource, such as the film The Founding of a Republic, or an audio resource; the choice does not affect the implementation of the present invention.

602. The crawler engine parses the torrent file to extract Tracker information and Peer information, and writes the Peer information into the Peer list.

The Peer information extracted in this step is information on Peers that use the application protocol. Specifically, the Peer information includes the address of the Peer and the identifier of the transport layer protocol used by the Peer.

603. The crawler engine uses the crawler program of the application protocol to create a query request and, according to the extracted Tracker information, sends the query request to the corresponding Tracker. The Tracker returns a query response to the crawler engine that includes the addresses of the Peers that use the application protocol and are tracked by the Tracker, together with the identifiers of the transport layer protocols those Peers use.

Optionally, in this step the crawler engine may also determine whether the received query response is valid. When the query response is valid, step 605 is performed; otherwise the procedure ends. The crawler engine may determine the validity of the received query response from the message format of the query response, from the content of the query response, from the interaction flow between the query request and the query response, or in some other way; the choice does not affect the implementation of the present invention.

604. The crawler engine takes the Peers extracted in step 602 as query sources, uses the crawler program of the application protocol to create a query request, and sends the query request to each Peer serving as a query source. The Peer returns a query response to the crawler engine that includes the addresses of the Peers associated with it and the transport layer protocols they use.

A Peer associated with the queried Peer is a Peer that uses the application protocol and holds the same resource as the queried Peer.

Optionally, in this step the crawler engine may also determine whether the received query response is valid and, only when it is valid, subsequently write the addresses of the associated Peers and the transport layer protocols they use into the Peer list. The crawler engine determines the validity of the received query response in the same way as in step 603, which is not repeated here.

605. The crawler engine removes the duplicate Peer information and the invalid Peer information obtained in steps 603 and 604, and writes the remaining Peer information into the Peer list.

606. The crawler engine determines whether the number of Peers written into the Peer list has reached a threshold. If so, step 608 is performed; if not, step 607 is performed.

607. The crawler engine takes the remaining Peers (that is, the valid Peers) as updated query sources and returns to step 604, using the crawler program of the application protocol to create a query request and sending it to the updated query sources.

608. The crawler engine sends identification information to the identification engine. The identification information includes the Peer information in the Peer list, the identifier of the application protocol, and the suggested aging time.

The Peer information in the Peer list includes the address of each Peer and the identifier of the transport layer protocol used by that Peer.

This step and the preceding steps may be implemented in the same thread, or in different threads or processes; the choice does not affect the implementation of the present invention.
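The identification information sent in step 608 and its suggested aging time can be pictured with a small sketch. Everything here is an assumption made for illustration: the dictionary fields, the names `build_identification_info` and `MappingStore`, and the choice to interpret the suggested aging time as a lifetime in seconds counted from the moment the information is stored.

```python
import time


def build_identification_info(peer_list, protocol_id, aging_s):
    """Bundle the Peer list, protocol identifier and suggested aging time."""
    return {
        "protocol_id": protocol_id,
        "peers": list(peer_list),  # each Peer: (transport, ip, port)
        "aging_s": aging_s,
    }


class MappingStore:
    """Hypothetical store on the identification engine side."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (transport, ip, port) -> (protocol_id, expires_at)

    def save(self, info):
        expires_at = self._clock() + info["aging_s"]
        for peer in info["peers"]:
            self._entries[peer] = (info["protocol_id"], expires_at)

    def lookup(self, peer):
        entry = self._entries.get(peer)
        if entry is None:
            return None
        protocol_id, expires_at = entry
        if self._clock() >= expires_at:
            # The correspondence has aged out and is no longer valid.
            del self._entries[peer]
            return None
        return protocol_id
```

The `clock` parameter exists only so that expiry can be exercised without waiting; passing a function that returns a controllable value makes the aging behavior easy to check.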

Referring to FIG. 7, this embodiment takes the BitTorrent DHT protocol as the application protocol and a Peer (peer network entity) as the network entity, and describes how the crawler engine obtains Peer information and reports identification information to the identification engine. The process includes:

701. The crawler engine uses the crawler program of the application protocol to create a query request and sends the query request to a known Peer. The Peer returns a query response to the crawler engine that includes information on the Peers associated with it.

A Peer associated with the queried Peer is a Peer that uses the application protocol and holds the same resource as the queried Peer. The information on an associated Peer includes its address and the identifier of the transport layer protocol it uses.

The known Peer in this step is either a predetermined Peer or a Peer that was carried in a received query response message, is valid, and has not yet served as a query source. The predetermined Peer is a Peer designated in advance as a query source for the BitTorrent DHT protocol.

Optionally, after receiving the query response, the crawler engine may determine whether the query response is valid, in the same way as described in the embodiment above, which is not repeated here. The subsequent steps are performed only when the query response is valid.

702. The crawler engine removes the invalid Peer information obtained in step 701 and writes the remaining valid Peer address information into the Peer list.

703. The crawler engine determines whether the number of Peers written into the Peer list has reached a threshold. If so, step 705 is performed; if not, step 704 is performed.

704. The crawler engine takes the Peers reported in the query response that are valid and have not yet served as query sources as updated query sources, and returns to step 701, using the crawler program of the application protocol to create a query request and sending it to the updated query sources.

705. The crawler engine sends identification information to the identification engine. The identification information includes the Peer information in the Peer list, the identifier of the application protocol, and the suggested aging time.

This step and the preceding steps may be implemented in the same thread, or in different threads or processes; the choice does not affect the implementation of the present invention.

Optionally, after obtaining identification information, an identification engine may share it with other identification engines. The identification information includes the correspondence between the identifier of an application protocol and the information on a network entity using that protocol, and may further include a suggested aging time for the correspondence. Specifically, there are two ways of sharing:

1. The identification engine in each DPI device reports identification information to an information sharing control center, from which the identification engines in other DPI devices obtain it, as shown in FIG. 8.

2. The identification engines in the DPI devices notify one another of identification information, as shown in FIG. 9.

The identification engine in a DPI device can obtain identification information in two ways. In the first way, the crawler engine sends the identification information to the identification engine, as described in the embodiments above. In the second way, the identification engine in the DPI device identifies the application protocol used by a data packet from characteristic characters or characteristic behavior in the packet and records identification information, which includes the correspondence between the identifier of that application protocol and the information on the network entity that sends and/or receives the packet. Optionally, this identification information may also include a suggested aging time.
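The two sharing modes can be contrasted with a short sketch. It is purely illustrative: `SharingCenter`, `notify_all`, the engine names and the in-memory lists are hypothetical and say nothing about the transport or message format a real deployment would use.

```python
class SharingCenter:
    """Mode 1: engines report to a central point and fetch from it."""

    def __init__(self):
        self._reports = []  # list of (reporter_name, identification_info)

    def report(self, reporter, info):
        self._reports.append((reporter, info))

    def fetch(self, requester):
        # An engine fetches what the other engines have reported.
        return [info for reporter, info in self._reports
                if reporter != requester]


def notify_all(sender, inboxes, info):
    """Mode 2: an engine pushes its information to every other engine.

    inboxes -- dict mapping engine name -> list of received information
    """
    for name, inbox in inboxes.items():
        if name != sender:
            inbox.append(info)
```

In the first mode each engine needs to know only the control center, whereas in the second mode every engine needs to know every other engine; which of the two is preferable depends on the deployment and is not settled by the text.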

Referring to FIG. 10, an embodiment of the present invention provides a network crawler, which includes:

an establishing unit 50, configured to use the crawler program of an application protocol to establish a correspondence between the application protocol and the information on network entities using the application protocol; and

a sending unit 60, configured to send identification information including the correspondence to a DPI device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs, where the information on a network entity using the application protocol includes the address of the network entity and the identifier of the transport layer protocol used by the network entity.

For a description of the network entity information and related matters, see the description under step 101 of the method embodiment, which is not repeated here.

In this embodiment of the present invention, the crawler program of the application protocol is used to establish the correspondence between the application protocol and the information on the network entities using it, and the correspondence is sent to the DPI device, which uses it to identify the application protocol to which a data packet belongs. The DPI device therefore does not need to decrypt data packets, which reduces performance overhead, and it does not need to analyze offline the specific character features of the application protocol's bit stream, which shortens bit-stream identification time.

Referring to Fig. 11, in one implementation, the establishing unit 50 specifically includes: a query source determining unit 51, configured to determine a network entity serving as a query source, where the network entity serving as the query source uses the application protocol; a query unit 52, configured to use the crawler program of the application protocol to obtain, from the query source, information about the network entities associated with the query source, where a network entity associated with the query source is one that uses the application protocol and holds the same resource as the query source; a query source updating unit 53, configured to, when the query end condition has not been reached, take the network entities associated with the query source that have not yet served as a query source as the updated query source, send the updated query source to the query unit 52, and trigger the query unit 52 to use the crawler program of the application protocol to obtain, from the network entity serving as the updated query source, information about the network entities associated with the updated query source; and a correspondence establishing unit 54, configured to, when the query end condition is reached, establish the correspondence between the application protocol and both the network entity information obtained by the query unit 52 and the information about the network entity that the query source determining unit 51 determined as the query source.

Specifically, the query source determining unit 51 is configured to determine, according to a mapping between network entity information and an application protocol identifier sent by the DPI device, that the network entity corresponding to the application protocol identifier in the mapping is the network entity serving as the query source, where the DPI device determines the mapping by using deep packet inspection. Alternatively, the query source determining unit 51 determines that a network entity that is extracted from the seed file of a specific resource and uses the application protocol is the network entity serving as the query source; or it determines that a network entity that is tracked by the Tracker extracted from the seed file of a specific resource and uses the application protocol is the network entity serving as the query source.
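The interplay of units 51 to 54 amounts to a breadth-first traversal of the peer graph, which can be sketched as below. The sketch rests on several assumptions: `crawl`, `query_peers`, and `max_queries` are hypothetical names, the peer tuple layout is illustrative, and the query end condition is not specified in this section, so an empty source queue or an exhausted query budget is used as a stand-in.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple

Peer = Tuple[str, int, str]  # (ip, port, transport); hypothetical layout


def crawl(protocol: str,
          seeds: Iterable[Peer],
          query_peers: Callable[[Peer], Iterable[Peer]],
          max_queries: int = 1000) -> Dict[Peer, str]:
    """Build the protocol-to-peer correspondence by iterative querying."""
    pending = deque(seeds)            # query sources still to be asked (unit 51)
    known: Set[Peer] = set(pending)   # every peer seen so far, seeds included
    queries = 0
    # Assumed end condition: no sources left, or the query budget is spent.
    while pending and queries < max_queries:
        source = pending.popleft()
        queries += 1
        for peer in query_peers(source):   # unit 52: ask the source for its peers
            if peer not in known:          # unit 53: only peers never used as a source
                known.add(peer)
                pending.append(peer)
    # Unit 54: map the protocol to the seeds and to every discovered peer.
    return {peer: protocol for peer in known}
```

Here `query_peers` stands in for the protocol-specific crawler program; in practice it would speak the application protocol to the source and return the peers that hold the same resource.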

This implementation uses the crawler program of the application protocol to obtain, from a query source, the Peer information associated with that query source, then takes the obtained Peers as updated query sources and obtains the Peer information associated with them. In this way, the information about the Peers in the network that use the application protocol is obtained, and the correspondence between the application protocol and this Peer information is sent to the identification engine in the DPI device, which uses the correspondence to identify the application protocol to which a data packet belongs.

Referring to Fig. 12, in another implementation, the establishing unit 50 specifically includes: a calling unit 56, configured to sequentially call the crawler programs of the application protocols in a crawler program set to send probe request messages to a network entity in the network, until a response message indicating a successful probe is received from the network entity; and a correspondence establishing unit 57, configured to establish the correspondence between the application protocol used by the probe request message that the response message answers and the information about the network entity. In this implementation, the web crawler further includes a receiving unit 61, configured to receive, from the DPI device, information about a network entity whose application protocol needs to be identified; the calling unit 56 is then configured to sequentially call the crawler programs of the application protocols in the crawler program set to send probe request messages to that network entity, until a response message indicating a successful probe is received from it. The correspondence between the network entity information and the application protocol used by the answered probe request message is then established and sent to the DPI device, so that the identification engine in the DPI device uses the correspondence to identify the application protocol to which a data packet belongs.
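The sequential probing performed by the calling unit 56 can be sketched as a loop over protocol-specific probe functions. This is a minimal sketch under stated assumptions: `identify_by_probing`, `Prober`, and the tuple layout are hypothetical, each probe function is assumed to return `True` when the entity answers with a successful response, and treating an `OSError` as a failed probe is a choice made for the example.

```python
from typing import Callable, Mapping, Optional, Tuple

Entity = Tuple[str, int, str]       # (ip, port, transport); hypothetical layout
Prober = Callable[[Entity], bool]   # True means a successful probe response


def identify_by_probing(entity: Entity,
                        crawlers: Mapping[str, Prober]
                        ) -> Optional[Tuple[Entity, str]]:
    """Try each protocol's crawler in order until one probe succeeds."""
    for protocol, probe in crawlers.items():
        try:
            if probe(entity):
                # Correspondence between the entity and the protocol of the
                # probe request that was answered (unit 57).
                return entity, protocol
        except OSError:
            continue  # assumption: a network error counts as a failed probe
    return None  # no crawler in the set got a successful response
```

Passing a plain `dict` keeps the probing order equal to the insertion order, so the crawler set can be ordered by how likely each protocol is.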

Referring to Fig. 13, an embodiment of the present invention provides a network system, which includes a web crawler 70 and a DPI device 80, where:

the web crawler 70 is configured to use a crawler program of an application protocol to establish a correspondence between the application protocol and information about the network entities that use the application protocol. For a description of the network entity information and related details, see the description under step 101 of the method embodiment; it is not repeated here.

the DPI device 80 is configured to receive the identification information sent by the web crawler 70 and to use the correspondence to identify the application protocol to which a data packet belongs.

The web crawler 70 may be integrated into the DPI device 80. The structure of the web crawler 70 is similar to that of the embodiments shown in Fig. 10, Fig. 11, and Fig. 12 and is not described again here.

To share the identification information, in one implementation the network system further includes an information sharing control center 90, where:

the DPI device 80 is further configured to send the identification information received from the web crawler to the information sharing control center 90. The information sharing control center 90 is configured to receive the identification information sent by the DPI device 80, so that the other DPI devices in the network can obtain the identification information from the information sharing control center 90. Optionally, the DPI device 80 may also send identification information obtained through deep packet inspection to the information sharing control center. Identification information obtained through deep packet inspection includes the correspondence between the address and transport layer protocol of a data packet's source and/or destination and the application protocol to which the data packet belongs; optionally, it may also include a suggested aging time for the correspondence.
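The sharing arrangement with a suggested aging time can be sketched as a small store that DPI devices publish to and fetch from. This is a sketch under assumptions, not the patent's design: `SharingCenter`, `publish`, and `fetch` are hypothetical names, and expiring entries inside the center is a simplification, since the text only states that a suggested aging time may accompany a correspondence.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Entity = Tuple[str, int, str]  # (ip, port, transport); hypothetical layout


class SharingCenter:
    """Holds identification information published by DPI devices."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[Entity, Tuple[str, Optional[float]]] = {}

    def publish(self, entity: Entity, protocol: str,
                aging_seconds: Optional[float] = None) -> None:
        # Record the correspondence and, if given, when it should age out.
        expires = None if aging_seconds is None else self._clock() + aging_seconds
        self._entries[entity] = (protocol, expires)

    def fetch(self) -> Dict[Entity, str]:
        # Drop aged-out correspondences, then return what is still valid.
        now = self._clock()
        self._entries = {e: (p, x) for e, (p, x) in self._entries.items()
                         if x is None or x > now}
        return {e: p for e, (p, _) in self._entries.items()}
```

One DPI device calls `publish` for each correspondence it learns, and the other DPI devices call `fetch` to pick up the entries that have not yet aged out.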

In another implementation, the DPI device 80 is further configured to send the identification information received from the web crawler to the other DPI devices in the network.

In this embodiment of the present invention, the crawler program of an application protocol is used to establish the correspondence between the application protocol and the information about the network entities that use it, and the correspondence is sent to the DPI device, which uses it to identify the application protocol to which a data packet belongs. As a result, the DPI device does not need to decrypt data packets, which reduces performance overhead, and it does not need to analyze offline the specific character features of the application protocol's code stream, which shortens code stream identification time.

The DPI device described above can be used in the network system shown in Fig. 14. The DPI device may exist independently of the GGSN (Gateway GPRS Support Node) or be integrated with the GGSN in a single device; neither arrangement affects the implementation of the present invention. In the figure, the GGSN is connected to the GPRS (General Packet Radio Service)/UMTS (Universal Mobile Telecommunications System) network through the SGSN (Serving GPRS Support Node), the DPI device is connected to the Internet through a firewall, and Peers are located in the GPRS/UMTS network or on the Internet.

A person of ordinary skill in the art can understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.

The method for providing a DPI device with information for identifying data packets, the web crawler, and the network system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the foregoing embodiments is intended only to help understand the method of the present invention and its core idea. A person of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (11)

1. A method for providing information for identifying a data packet, comprising:
using a crawler program of an application protocol to establish a correspondence between the application protocol and information about network entities that use the application protocol; and
sending identification information including the correspondence to a deep packet inspection (DPI) device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs;
wherein the information about a network entity that uses the application protocol comprises the address of the network entity and the identifier of the transport layer protocol used by the network entity; and
wherein using the crawler program of the application protocol to establish the correspondence between the application protocol and the information about the network entities that use the application protocol comprises:
A. determining a network entity serving as a query source, wherein the network entity serving as the query source uses the application protocol;
B. using the crawler program of the application protocol to obtain, from the query source, information about network entities associated with the query source, wherein a network entity associated with the query source is a network entity that uses the application protocol and holds the same resource as the query source; and
C. taking, as an updated query source, a network entity that is associated with the query source and has not served as a query source; when a query end condition has not been reached, returning to step B, in which the query source is then the updated query source; and when the query end condition is reached, establishing the correspondence between the application protocol and the information about the network entities that use the application protocol.

2. The method according to claim 1, wherein determining the query source comprises:
determining, according to a mapping between network entity information and an application protocol identifier sent by the DPI device, that the network entity corresponding to the application protocol identifier in the mapping is the network entity serving as the query source.

3. The method according to claim 1, wherein using the crawler program of the application protocol to establish the correspondence between the application protocol and the information about the network entities that use the application protocol comprises:
sequentially calling the crawler programs of the application protocols in a crawler program set to send probe request messages to a network entity, until a response message indicating a successful probe is received from the network entity, and establishing a correspondence between the application protocol used by the probe request message that the response message answers and the information about the network entity.

4. The method according to claim 3, wherein before sequentially calling the crawler programs of the application protocols in the crawler program set to send probe request messages to the network entity, the method further comprises:
receiving, from the DPI device, information about a network entity whose application protocol needs to be identified;
and wherein sequentially calling the crawler programs of the application protocols in the crawler program set to send probe request messages to the network entity comprises:
sequentially calling the crawler programs of the application protocols in the crawler program set to send probe request messages to the network entity whose application protocol needs to be identified.

5. A web crawler for providing information for identifying a data packet, comprising:
an establishing unit, configured to use a crawler program of an application protocol to establish a correspondence between the application protocol and information about network entities that use the application protocol; and
a sending unit, configured to send identification information including the correspondence to a DPI device, so that the DPI device uses the correspondence to identify the application protocol to which a data packet belongs;
wherein the information about a network entity that uses the application protocol comprises the address of the network entity and the identifier of the transport layer protocol used by the network entity; and
wherein the establishing unit comprises:
a query source determining unit, configured to determine a network entity serving as a query source, wherein the network entity serving as the query source uses the application protocol;
a query unit, configured to use the crawler program of the application protocol to obtain, from the query source, information about network entities associated with the query source, wherein a network entity associated with the query source is a network entity that uses the application protocol and holds the same resource as the query source;
a query source updating unit, configured to, when a query end condition has not been reached, take, as an updated query source, a network entity that is associated with the query source and has not served as a query source, send the updated query source to the query unit, and trigger the query unit to use the crawler program of the application protocol to obtain, from the network entity serving as the updated query source, information about network entities associated with the updated query source; and
a correspondence establishing unit, configured to, when the query end condition is reached, establish the correspondence between the application protocol and both the network entity information obtained by the query unit and the information about the network entity determined by the query source determining unit as the query source.

6. The web crawler according to claim 5, wherein
the query source determining unit is configured to determine, according to a mapping between network entity information and an application protocol identifier sent by the DPI device, that the network entity corresponding to the application protocol identifier in the mapping is the network entity serving as the query source.

7. The web crawler according to claim 5, wherein the establishing unit comprises:
a calling unit, configured to sequentially call the crawler programs of the application protocols in a crawler program set to send probe request messages to a network entity in the network, until a response message indicating a successful probe is received from the network entity; and
a correspondence establishing unit, configured to establish a correspondence between the application protocol used by the probe request message that the response message answers and the information about the network entity.

8. The web crawler according to claim 7, further comprising:
a receiving unit, configured to receive, from the DPI device, information about a network entity whose application protocol needs to be identified;
wherein the calling unit is configured to sequentially call the crawler programs of the application protocols in the crawler program set to send probe request messages to the network entity whose application protocol needs to be identified, until a response message indicating a successful probe is received from that network entity.

9. A network system for providing information for identifying a data packet, comprising the web crawler according to any one of claims 5 to 8 and a deep packet inspection (DPI) device, wherein
the DPI device is configured to receive the identification information sent by the web crawler and to use the correspondence to identify the application protocol to which a data packet belongs.

10. The network system according to claim 9, wherein
the DPI device is further configured to send the identification information received from the web crawler to an information sharing control center, so that the other DPI devices in the network can obtain the identification information from the information sharing control center;
or the DPI device is further configured to send the identification information received from the web crawler to the other DPI devices in the network.

11. The network system according to claim 9, wherein
the identification information further comprises a suggested aging time for the correspondence; and
the DPI device is further configured to invalidate the correspondence when the suggested aging time is reached.
CN 201110082236 2011-04-01 2011-04-01 Method for identifying information of data packet, crawler engine and network system Expired - Fee Related CN102137022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110082236 CN102137022B (en) 2011-04-01 2011-04-01 Method for identifying information of data packet, crawler engine and network system

Publications (2)

Publication Number Publication Date
CN102137022A CN102137022A (en) 2011-07-27
CN102137022B true CN102137022B (en) 2013-11-06

Family

ID=44296681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110082236 Expired - Fee Related CN102137022B (en) 2011-04-01 2011-04-01 Method for identifying information of data packet, crawler engine and network system

Country Status (1)

Country Link
CN (1) CN102137022B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567513B (en) * 2011-12-27 2014-09-17 北京神州绿盟信息安全科技股份有限公司 Method and equipment for collecting phishing websites
CN104408195B (en) * 2014-12-15 2017-12-19 北京国双科技有限公司 The determination methods and device of crawlers working condition
CN108200586B (en) * 2016-12-08 2021-03-23 中国电信股份有限公司 Method and system for mobile network aware data association
CN106941459A (en) * 2017-05-02 2017-07-11 武汉绿色网络信息服务有限责任公司 The processing method and system of HTTP downlink traffics in asymmetric routed environment
WO2019075608A1 (en) 2017-10-16 2019-04-25 Oppo广东移动通信有限公司 Method and device for identifying encrypted data stream, storage medium, and system
CN111371655B (en) * 2020-04-07 2022-02-25 中移雄安信息通信科技有限公司 Deep packet inspection method, DPI device, transit device, system and storage medium
CN113765728B (en) * 2020-06-04 2023-07-14 深信服科技股份有限公司 Network detection method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101534248A (en) * 2009-04-14 2009-09-16 华为技术有限公司 Deep packet identification method, system and business board
CN101582897A (en) * 2009-06-02 2009-11-18 中兴通讯股份有限公司 Deep packet inspection method and device
CN101621504A (en) * 2008-06-30 2010-01-06 中兴通讯股份有限公司 Deep packet inspection method and system
CN101714952A (en) * 2009-12-22 2010-05-26 北京邮电大学 Method and device for identifying traffic of access network
CN101984598A (en) * 2010-11-04 2011-03-09 成都市华为赛门铁克科技有限公司 Message forwarding method and deep packet inspection (DPI) device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8264965B2 (en) * 2008-03-21 2012-09-11 Alcatel Lucent In-band DPI application awareness propagation enhancements

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
简光垚.《基于启发式识别的深层数据包检测P2P流的研究与实现》.《中国优秀硕士学位论文全文数据库(电子期刊)信息科技辑2009年第03期》.2009,全文. *

Legal Events

Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right. Effective date of registration: 20180507. Address after: California, USA. Patentee after: Global innovation polymerization LLC. Address before: London, England. Patentee before: GW partnership Co.,Ltd.
TR01 Transfer of patent right. Effective date of registration: 20180507. Address after: London, England. Patentee after: GW partnership Co.,Ltd. Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen. Patentee before: HUAWEI TECHNOLOGIES Co.,Ltd.
CF01 Termination of patent right due to non-payment of annual fee. Granted publication date: 20131106.