CN110390043A

CN110390043A - Method, device, terminal and storage medium for crawling web mailbox data

Info

Publication number: CN110390043A
Application number: CN201910522340.3A
Authority: CN
Inventors: 卢俊
Original assignee: OneConnect Smart Technology Co Ltd
Current assignee: OneConnect Smart Technology Co Ltd
Priority date: 2019-06-17
Filing date: 2019-06-17
Publication date: 2019-10-29
Also published as: WO2020253366A1

Abstract

The present invention relates to the technical field of deep web crawlers, and in particular to a method, device, terminal and storage medium for crawling web mailbox data. The method comprises:

- after the mailbox main page has loaded successfully, calling a callback function of the browser, the callback function including an injected script file;
- obtaining specified search information, and crawling the mail data of the mailbox main page with the crawling script of the script file to obtain crawled data corresponding to the search information;
- after the crawling operation is completed, uploading the crawled data to a server for parsing;
- receiving and displaying the parsing result returned by the server, the parsing result including target data that matches the search information.

This solution prevents the server side from being blocked for crawling data repeatedly, and saves the resources the server side would otherwise consume while crawling.

Description

Method, device, terminal and storage medium for crawling web mailbox data

Technical field

The present invention relates to the technical field of deep web crawlers, and in particular to a method, device, terminal and storage medium for crawling web mailbox data.

Background art

With the continuous development of computers and the Internet, the information users receive is increasingly complex, and using it to look up further information is correspondingly inconvenient. At present, when users query the information they need, a server crawls the relevant web pages, parses them, and loads the result into a page for display.

For example, when a user logs in to a mailbox to obtain information from emails, the user account and password are first crawled. The server logs in with them and then crawls and analyzes the emails in the user's mailbox (see Fig. 1). During this process, however, the server repeatedly logs in on the mailbox home page and therefore accesses the user's mailbox too frequently. This consumes a large amount of computing resources, and the server's ID is easily blocked by the website, so that subsequent operations cannot be performed.

Summary of the invention

The purpose of the present invention is to solve at least one of the above technical defects. In particular, it addresses the defect in the prior art that the server crawls web page information so frequently that it is easily blocked by the website and consumes a large amount of computing resources.

The present invention provides a method for crawling web mailbox data, comprising the following steps:

after the mailbox main page has loaded successfully, calling a callback function of the browser, wherein the callback function includes an injected script file;

obtaining specified search information, and crawling the mail data of the mailbox main page through the crawling script of the script file to obtain crawled data corresponding to the search information;

after the crawling operation is completed, uploading the crawled data to a server for parsing;

receiving the parsing result returned by the server and displaying it, wherein the parsing result includes target data matching the search information.
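The four steps can be pictured as one client-side routine that runs once the page-loaded callback fires. The following is a minimal Python sketch under stated assumptions, not the patent's implementation. `Mail`, `crawl_mail_data` and `run_pipeline` are hypothetical names. Plain substring matching stands in for whatever matching the real crawling script performs. The `upload` and `display` callables stand in for the server request and the page rendering.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical names throughout; this only mirrors the order of the claimed steps.


@dataclass
class Mail:
    subject: str
    body: str


def crawl_mail_data(mails: List[Mail], search_info: str) -> List[Mail]:
    """Client-side crawl: keep mails whose subject or body mentions the search info."""
    return [m for m in mails if search_info in m.subject or search_info in m.body]


def run_pipeline(
    mails: List[Mail],
    search_info: str,
    upload: Callable[[List[Mail]], list],
    display: Callable[[list], None],
) -> list:
    """Steps after the page-loaded callback fires: crawl, upload once, display the result."""
    crawled = crawl_mail_data(mails, search_info)  # crawl on the client, not the server
    result = upload(crawled)                       # single batched upload after crawling ends
    display(result)                                # show the server's parsing result
    return result
```

The point of the structure is that the server is contacted once, with the already filtered data, instead of logging in and crawling the mailbox itself.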

In one embodiment, before the step of calling the callback function of the browser after the mailbox main page has loaded successfully, the method further includes:

obtaining the script file from a specified directory according to the version information file of the client;

injecting the script file into the callback function of the browser.

In one embodiment, before the step of obtaining the script file from the specified directory according to the version information file of the client, the method further includes:

requesting a communication interface and obtaining an updated version information file through it;

comparing the updated version information file with the original version information file;

determining, according to the comparison result, whether the original version information file needs to be updated.
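One way to compare the two version information files is by their version numbers. The following is a minimal Python sketch under two assumptions: versions are dotted integers such as the "1.1.3" value in the version information file described later, and "needs updating" means the interface reports a strictly newer version. Both helper names are hypothetical.

```python
from typing import Tuple

# Hypothetical helpers; the patent only says the two files are compared.


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "1.1.3" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def needs_update(local_version: str, remote_version: str) -> bool:
    """True when the version reported by the interface is newer than the local one."""
    # Tuple comparison is element-wise, so "1.1.10" correctly beats "1.1.3".
    return parse_version(remote_version) > parse_version(local_version)
```

Comparing integer tuples rather than raw strings avoids the classic bug where "1.1.10" sorts before "1.1.3".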

In one embodiment, after the step of determining whether the original version information file needs to be updated according to the comparison result, the method further includes:

when the original version information file needs to be updated, downloading and saving the updated version information file;

comparing the check value of the updated version information file with the check value provided by the communication interface;

determining, according to the comparison result, whether the updated version information file was downloaded correctly.
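The check-value comparison can be sketched as a digest check on the downloaded bytes. The following is a minimal Python sketch that assumes the check value is an MD5 hex digest. This matches the "md5" field in the version information file example later in this description, but the exact algorithm is an assumption. MD5 here only detects corrupted downloads; it is not a security guarantee. The function name is hypothetical.

```python
import hashlib

# Hypothetical helper; assumes the interface publishes an MD5 hex digest.


def downloaded_correctly(file_bytes: bytes, expected_md5: str) -> bool:
    """Compare the MD5 of the downloaded bytes with the value advertised by the interface."""
    return hashlib.md5(file_bytes).hexdigest() == expected_md5.lower()
```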

In one embodiment, the step of obtaining the script file from the specified directory according to the version information file of the client includes:

searching for a configuration file in the specified directory, wherein the configuration file includes a plurality of script files and configuration data corresponding to each script file;

when the script file exists, obtaining the configuration data corresponding to the script file, and comparing the check value in the configuration data with the check value of the script file;

obtaining the script file according to the comparison result.
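The lookup and verification described above can be sketched as follows. This is a Python sketch under assumptions that do not come from the patent:

- the configuration file is a JSON file named `config.json` (hypothetical name) in the specified directory;
- it maps each script file name to an entry holding an `md5` check value;
- a script is returned only when both files exist and the digests match.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

# Hypothetical layout: config.json = {"crawl.js": {"md5": "<hex digest>"}, ...}


def load_script(directory: Path, script_name: str, config_name: str = "config.json") -> Optional[str]:
    """Return the script source if it exists and its MD5 matches the configuration data, else None."""
    config_path = directory / config_name
    script_path = directory / script_name
    if not config_path.is_file() or not script_path.is_file():
        return None
    config = json.loads(config_path.read_text(encoding="utf-8"))
    entry = config.get(script_name)
    if entry is None:
        return None  # script present on disk but not described by the configuration data
    data = script_path.read_bytes()
    if hashlib.md5(data).hexdigest() != entry.get("md5", "").lower():
        return None  # check values differ: treat the local copy as stale or corrupted
    return data.decode("utf-8")
```

Returning `None` on a mismatch lets the caller fall back to re-downloading, which fits the update flow described in the surrounding embodiments.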

In one embodiment, the search information includes credit card bill information;

the step of obtaining the specified search information and crawling the mail data of the mailbox main page through the crawling script of the script file to obtain the crawled data corresponding to the search information includes:

executing the script file according to the specified credit card bill information, wherein the script file includes the crawling script;

using the crawling script to crawl mail data related to the credit card bill information from the mailbox main page;

counting the mail data obtained over multiple crawls by the crawling script, and obtaining the crawled data corresponding to the credit card bill information according to the statistics.
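Counting the results of several crawl passes can be sketched as a merge that drops duplicates and records how often each mail was seen. The following is a minimal Python sketch under assumptions that do not come from the patent:

- mails are represented as dicts with `id` and `subject` keys;
- matching is a case-insensitive substring test on the subject;
- a mail's first occurrence is the one kept.

The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: {"id": "...", "subject": "..."}


def merge_crawl_passes(
    passes: Iterable[List[Dict[str, str]]], keyword: str
) -> Tuple[List[Dict[str, str]], Dict[str, int]]:
    """Merge crawl passes: keep keyword matches once each and count how many passes saw them."""
    hits: Counter = Counter()
    merged: List[Dict[str, str]] = []
    for crawl_pass in passes:
        for mail in crawl_pass:
            if keyword.lower() not in mail["subject"].lower():
                continue
            if hits[mail["id"]] == 0:
                merged.append(mail)  # first sighting: keep this copy
            hits[mail["id"]] += 1
    return merged, dict(hits)
```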

In one embodiment, the step of receiving the parsing result returned by the server and displaying it, wherein the parsing result includes target data matching the search information, includes:

obtaining the parsing result returned by the server, wherein the parsing result includes bill data matching the credit card bill information;

displaying the bill data on the mailbox main page.

The present invention also provides a device for crawling web mailbox data, comprising:

a calling module, configured to call the callback function of the browser after the mailbox main page has loaded successfully, wherein the callback function includes the injected script file;

a crawling module, configured to obtain specified search information and crawl the mail data of the mailbox main page through the crawling script of the script file to obtain crawled data corresponding to the search information;

a data transmission module, configured to upload the crawled data to a server for parsing after the crawling operation is completed;

an information display module, configured to receive the parsing result returned by the server and display it, wherein the parsing result includes target data matching the search information.

The present invention also provides a terminal, comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for crawling web mailbox data according to any one of the above embodiments.

The present invention also provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for crawling web mailbox data according to any one of the above embodiments.

The above method, device, terminal and storage medium for crawling web mailbox data operate as follows:

- After the mailbox main page has loaded successfully, the callback function of the browser, which includes the injected script file, is called.
- Specified search information is obtained, and the mail data of the mailbox main page is crawled through the crawling script of the script file to obtain crawled data corresponding to the search information.
- After the crawling operation is completed, the crawled data is uploaded to the server for parsing.
- The parsing result returned by the server, which includes target data matching the search information, is received and displayed.

In this solution, the script file is injected into the callback function in advance. After the mailbox main page has loaded successfully, the browser's callback function is called, and the injected script file starts to run. It crawls the mail data on the mailbox main page to obtain crawled data corresponding to the specified search information. When crawling is finished, all the crawled data is uploaded to the server in one batch for parsing, and the parsing result returned by the server is the target data matching the specified search information.

Because the script file extracts user-related information directly on the client, the relevant content does not have to be crawled entirely by the server. This prevents the server from being blocked for crawling data repeatedly. In addition, the script injected into the browser's callback function performs the crawling in place of the server, which saves the resources the server would otherwise consume while crawling.

Additional aspects and advantages of the invention will be set forth in part in the description that follows, will in part become apparent from that description, or may be learned by practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic structural diagram of the background-art solution;

Fig. 2 is a diagram of the application environment of an embodiment of the present invention;

Fig. 3 is a flowchart of a method for crawling web mailbox data according to an embodiment;

Fig. 4 is a schematic diagram of web mailbox data interaction according to the present invention;

Fig. 5 is a schematic structural diagram of a device for crawling web mailbox data according to an embodiment;

Fig. 6 is a block diagram of part of the structure of a mobile phone serving as the terminal according to an embodiment.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of them are shown in the drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it.

Those skilled in the art will understand that, unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should further be understood that the word "comprising" used in the description of the present invention indicates the presence of the stated features, integers, steps, operations, elements and/or components. It does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this invention belongs. Terms such as those defined in general dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art. Unless specifically defined as here, they are not to be interpreted in an idealized or overly formal sense.

Fig. 2 shows the application environment of an embodiment of the present invention. In this embodiment, the technical solution of the present invention may be implemented on a terminal device 110. As shown in Fig. 2, the user installs a client on the terminal device 110, for example as an app, and browses the relevant pages by logging in to the client. The client crawls user-related information through the script file and transmits it to the server 120 over a communication network to implement the related functions.

In the embodiment of the present invention, the user logs in to the client through the terminal device 110. The script file crawls the user's behavior information in the client and obtains matching target data according to that behavior information. The target data is then transmitted to the server 120, which parses it and returns the specific data information. The terminal device 110 may be, but is not limited to, a smartphone, a tablet computer or a PC, and the server 120 is a server device that implements various back-end functions.

In one embodiment, as shown in Fig. 3, which is a flowchart of a method for crawling web mailbox data according to an embodiment, a method for crawling web mailbox data is proposed. It may specifically include the following steps:

S110: after the mailbox main page has loaded successfully, call the callback function of the browser, wherein the callback function includes the injected script file.

In this step, after the user has logged in successfully, a page mask animation is started on the mailbox main page, displaying a "crawling, loading..." effect. Its main purpose is to cover the mailbox main page so that user operations do not interfere with the automatic crawling.

The mailbox login page is accessed through the callback function because the callback belongs to a native control in the client's system code (the WebView). The WebView can be understood as a simple browser built into the system, and the callback is invoked as soon as the mailbox page has loaded successfully. By that point the script file has already been injected into the page, so when the callback is invoked the injected JavaScript is running and monitors the user's operations in real time.
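The timing relationship (scripts registered first, run when the page-loaded callback fires) can be modelled with a toy object. The following is a Python sketch for illustration only. `FakeWebView` and its methods are hypothetical and are not a real Android or iOS WebView API. On a real device, the native WebView's page-finished callback would evaluate the injected JavaScript instead.

```python
from typing import Callable, List, Optional

# Hypothetical toy model of "inject first, run on page-loaded callback"; not a real WebView API.


class FakeWebView:
    def __init__(self) -> None:
        self._injected: List[Callable[[str], None]] = []
        self.loaded_url: Optional[str] = None

    def inject(self, script: Callable[[str], None]) -> None:
        """Register a script before the page loads; it runs when the callback fires."""
        self._injected.append(script)

    def on_page_finished(self, url: str) -> None:
        """Callback invoked after the page has loaded successfully: run scripts in injection order."""
        self.loaded_url = url
        for script in self._injected:
            script(url)

    def load(self, url: str) -> None:
        # A real WebView would fetch and render the page here; the toy reports success at once.
        self.on_page_finished(url)
```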

S120: obtain specified search information, and crawl the mail data of the mailbox main page through the crawling script of the script file to obtain crawled data corresponding to the search information.

In this step, step S110 has already called the browser's callback function (which includes the injected script file) after the mailbox main page loaded successfully. The specified search information is then obtained, and the mail data of the mailbox main page is crawled through the crawling script of the script file to obtain crawled data corresponding to the search information.

In this process, the crawling script in the injected script file crawls, from the mailbox main page, the data that matches the specified search information. For example, suppose the search information entered by the user is credit card bill data. The script file detects the user's input, crawls the mail data in the mailbox, and looks for crawled data related to the credit card bill data.

It should be noted that the specified search information may be either of the following:

- specific search information entered by the user after logging in to the mailbox main page;
- saved search information that is queried periodically through back-end operations. Business personnel can modify the content of such saved search information on the back-end operation page after logging in.

S130: after the crawling operation is completed, upload the crawled data to the server for parsing.

In this step, after the client completes the crawling operation, it uploads the crawled data to the server, and the server can perform in-depth text parsing on the crawled data.

For example, in the scenario of crawling emails for credit card bill data, the server parses the text of the crawled data to obtain the credit card bill data. It then returns the bill data to the mailbox main page, so that the user obtains the crawling result promptly.
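Server-side text parsing could start from simple pattern extraction over the uploaded mail text. The following is a minimal Python sketch that assumes a hypothetical statement wording ("Amount due: CNY 1,234.50"). Real bank emails differ, and a production parser would likely need per-issuer patterns. The pattern and function name are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical statement wording; real issuers format their bill emails differently.
AMOUNT_DUE = re.compile(r"Amount due:\s*(?:CNY|¥)?\s*([\d,]+\.\d{2})")


def parse_amount_due(mail_text: str) -> Optional[Decimal]:
    """Extract the amount due from one bill email, or None when the pattern is absent."""
    match = AMOUNT_DUE.search(mail_text)
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding for money.
    return Decimal(match.group(1).replace(",", ""))
```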

S140: receive the parsing result returned by the server and display it, wherein the parsing result includes target data matching the search information.

In this step, after the server completes the parsing, the parsing result it returns is received and displayed on the mailbox main page. The displayed information includes target data matching the specified search information.

For example, in the scenario of crawling emails for credit card bill data, the server parses the text of the crawled data to obtain the credit card bill data and returns it to the mailbox main page. The content loaded on the mailbox main page then includes the obtained credit card bill data. This data may be arranged in date order or aggregated by payment category.

The bill data may cover spending on daily necessities, housing payments, transportation, food and drink, clothing and beauty, sports and health, culture, education and entertainment, communication and logistics, and other consumption.
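The two display options (date order, or totals per payment category) can be sketched as follows. This is a minimal Python sketch that assumes each bill item carries a date, a category label and an amount. `BillItem`, `by_date` and `totals_by_category` are hypothetical names.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, List, NamedTuple

# Hypothetical record shape for one parsed bill line.


class BillItem(NamedTuple):
    day: date
    category: str
    amount: Decimal


def by_date(items: List[BillItem]) -> List[BillItem]:
    """Arrange bill items in date order, earliest first."""
    return sorted(items, key=lambda item: item.day)


def totals_by_category(items: List[BillItem]) -> Dict[str, Decimal]:
    """Sum bill amounts per payment category."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for item in items:
        totals[item.category] += item.amount
    return dict(totals)
```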

In the above method for crawling web mailbox data:

- after the mailbox main page has loaded successfully, the callback function of the browser, which includes the injected script file, is called;
- specified search information is obtained, and the mail data of the mailbox main page is crawled through the crawling script of the script file to obtain crawled data corresponding to the search information;
- after the crawling operation is completed, the crawled data is uploaded to the server for parsing;
- the parsing result returned by the server, which includes target data matching the search information, is received and displayed.

In this solution, the script file is injected into the callback function in advance. After the mailbox main page has loaded successfully, the browser's callback function is called, and the injected script file starts to run. It crawls the mail data on the mailbox main page to obtain crawled data corresponding to the specified search information. When crawling is finished, all the crawled data is uploaded to the server in one batch for parsing, and the parsing result returned by the server is the target data matching the specified search information.

Because the script file extracts user-related information directly on the client, the relevant content does not have to be crawled entirely by the server. This prevents the server from being blocked for crawling data repeatedly. In addition, the script file injected into the browser's callback function performs the crawling in place of the server, which saves the resources the server would otherwise consume while crawling.

In one embodiment, before step S110 (calling the browser's callback function, which includes the injected script file, after the mailbox main page has loaded successfully), the method may further include:

(1) obtaining the script file from the specified directory according to the version information file of the client;

(2) injecting the script file into the callback function of the browser.

In this step, the script file is obtained from the specified directory according to the version information file of the client. The terms used here are as follows:

- the version information file is the information file corresponding to the release version number of the software program;
- the software program may be mailbox office software installed on the client;
- the specified directory is the directory under the path determined by the storage path of the version information file.

It follows that the script file can be obtained from the specified directory of the version information file of the mailbox installed on the client. The script file may be of several types, for example the public script of the common function library, the public script of Anydoor, the crawling script and the login script.

① Public script of the common function library: jquery-2.2.4.min.js. It can be called as the basic JS function library;

② Public script of the Anydoor script: native_RYM.js. It is used for code interaction between the web page and the client's iOS or Android system;

The Anydoor script determines which system the current environment is running and calls the WebView interface for that system, so iOS and Android can share the same crawling or login script;

③ Crawling script: crawl.js. It is used to crawl web page data; in the email crawling scenario, it mainly crawls the mail data in the emails;

④ Login script: autologin.js. It is used to log in automatically when the user account and password have been saved.

The crawling script and login script above depend on the public scripts to execute, so when the script files are injected, the public scripts are injected first.
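The injection order rule (public scripts first, dependent scripts after) can be sketched as a small ordering helper. The following is a minimal Python sketch. The file names come from the list above. Placing jquery-2.2.4.min.js before native_RYM.js is an assumption, since the text does not say whether the Anydoor script depends on jQuery. The helper name is hypothetical.

```python
from typing import List, Sequence

# Hypothetical helper; the relative order of the two public scripts is an assumption.
PUBLIC_SCRIPTS = ("jquery-2.2.4.min.js", "native_RYM.js")


def injection_order(scripts: Sequence[str]) -> List[str]:
    """Put the public scripts first, then the dependent scripts in their original order."""
    public = [name for name in PUBLIC_SCRIPTS if name in scripts]
    dependent = [name for name in scripts if name not in PUBLIC_SCRIPTS]
    return public + dependent
```

Keeping the dependent scripts in their given order means crawl.js and autologin.js are not reordered relative to each other.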

After the script file is obtained, it is injected into the callback function of the browser. The mailbox login page can then be accessed through the callback function, and the login script of the script file can crawl the user login information on the mailbox login page.

The mailbox login page is accessed through the callback function because the WebView is a native control in the client's system code and can be understood as a simple browser built into the system. When the mailbox login page is accessed, the callback is invoked as soon as the page loads successfully. If the user has previously saved the mailbox login account and password, the JavaScript can find them and perform the crawling operation.

Preferably, the user login information is displayed, and when the user's confirmation of that information is received, the mailbox login page jumps to the mailbox main page.

In this step, the mailbox login page is accessed through the callback function, and the injected script file crawls the user login information from it. That information is displayed, and when the user's confirmation is received, the mailbox login page jumps to the mailbox main page.

In the above process, after the script file has crawled the user login information saved by the user, it displays that information for the user to review and confirm. After the user's confirmation is received, the page jumps to the mailbox main page.

If the user has not saved login information for the mailbox login page, the script file does not need to crawl anything further. The user enters the account and password manually and, after logging in successfully, is taken to the mailbox main page. The back end also asks the user whether to save the account and password just entered.

在一个实施例中，根据客户端的版本信息文件从指定目录中获取脚本文件的步骤之前，还可以包括：In one embodiment, before the step of obtaining the script file from the specified directory according to the version information file of the client, it may further include:

(1)请求通信接口，通过所述通信接口获取更新后的版本信息文件；(1) requesting a communication interface, and obtaining an updated version information file through the communication interface;

(2)将所述更新后的版本信息文件与原有的版本信息文件之间进行比对；(2) comparing the updated version information file with the original version information file;

(3)当有比对结果时，更新所述原有的版本信息文件；(3) When there is a comparison result, update the original version information file;

(4)当无比对结果时，无需更新所述原有的版本信息文件。(4) When there is no comparison result, there is no need to update the original version information file.

In these steps, the latest version information file for the script is obtained by requesting the communication interface and compared with the locally configured version information file to decide whether an update is needed.

The communication interface here may be a GP interface, the interface used between GPRS support nodes (GSNs) of different PLMNs. It adds a border gateway (BG) and a firewall, and the BG provides a border gateway routing protocol so that GPRS support nodes belonging to different PLMNs can communicate.

Furthermore, the version information file may be saved as a JSON file in the following format: {"downloadUrl":"","jsName":"qq","md5":"279f1dd1759c2c270ea2837f1121ebfb","needUpdateVersion":false,"updatedJsPath":"","version":"1.1.3"}.
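As a minimal sketch (Python is used only for illustration), the version information file shown above can be read into a typed record. The JSON keys come from the example; the Python field names, the defaults for missing keys, and the reading of each key from its name (for instance, that `md5` is the script's check value) are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class VersionInfo:
    """Fields of the version information file, named after the JSON keys in the example."""
    download_url: str
    js_name: str
    md5: str
    need_update_version: bool
    updated_js_path: str
    version: str


def parse_version_info(raw: str) -> VersionInfo:
    """Parse a version information file stored as a JSON object (field meanings assumed)."""
    data = json.loads(raw)
    return VersionInfo(
        download_url=data.get("downloadUrl", ""),
        js_name=data["jsName"],
        md5=data["md5"],
        need_update_version=bool(data.get("needUpdateVersion", False)),
        updated_js_path=data.get("updatedJsPath", ""),
        version=data["version"],
    )


EXAMPLE = (
    '{"downloadUrl":"","jsName":"qq","md5":"279f1dd1759c2c270ea2837f1121ebfb",'
    '"needUpdateVersion":false,"updatedJsPath":"","version":"1.1.3"}'
)

info = parse_version_info(EXAMPLE)
print(info.js_name, info.version, info.need_update_version)  # qq 1.1.3 False
```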

In this embodiment, before the script file is obtained from the specified directory via the client's version information file, the GP interface is requested to check for and fetch the latest version information file. The client can therefore obtain the updated script file, which reduces the information mismatch caused by an outdated local script file at run time.

In one embodiment, the step of comparing the updated version information file with the original version information file may include:

(1) obtaining the version number of the updated version information file and the version number of the original version information file;

(2) comparing the two version numbers.

By comparing the version numbers of the two files, these steps determine whether the locally configured version information file needs to be updated.

Note that each version information file has its own version number. The version number identifies a particular version information file and is used to compare the old and new files, so as to check whether the local file needs updating.

In this embodiment, comparing the version number of the fetched version information file with that of the local original file shows whether the fetched file is actually newer. This avoids pulling the file to the local device only to find it identical to the original, which would waste resources.
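The version-number comparison can be sketched as follows, assuming version numbers are dotted integers such as "1.1.3". Comparing the parts numerically matters: as plain strings, "1.10.0" sorts before "1.9.0", so a string comparison would miss that update. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as "1.1.3" into (1, 1, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_update(local_version: str, remote_version: str) -> bool:
    """Return True only when the remote version is strictly newer than the local one."""
    local = parse_version(local_version)
    remote = parse_version(remote_version)
    # Pad the shorter tuple with zeros so that "1.1" and "1.1.0" compare as equal.
    width = max(len(local), len(remote))
    local += (0,) * (width - len(local))
    remote += (0,) * (width - len(remote))
    return remote > local
```

For example, `needs_update("1.1.3", "1.1.4")` returns `True`, while `needs_update("1.1.3", "1.1.3")` returns `False`, so an identical file is never pulled again.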

In one embodiment, after the step of determining from the comparison result whether the original version information file needs to be updated, the method may further include:

(1) when the original version information file needs to be updated, downloading and saving the updated version information file;

(2) comparing the check value of the updated version information file with the check value provided by the communication interface;

(3) determining from the comparison result whether the updated version information file was downloaded correctly.

In this step, when the comparison in the embodiment above shows that the original version information file needs updating, the new file is downloaded and saved, and its check value is compared with the check value from the communication interface to ensure that the downloaded file is correct.

The check value may be an MD5 value and the communication interface may be the GP interface. When an update is needed, the latest version information file is downloaded and its MD5 value is compared with the MD5 value supplied by the GP interface. If they match, the download is judged correct and the latest file is stored as configuration data.

Furthermore, if the MD5 value of the latest version information file does not match the MD5 value from the GP interface, the downloaded file is incorrect. The previously downloaded and saved file is then deleted and downloaded again until the download is correct, which prevents an incorrect file (for example, one containing a Trojan) from infecting the terminal device 110.
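A sketch of the download-and-verify step, assuming the interface publishes the expected MD5 value as a hex string and that `fetch` is a hypothetical callable performing the actual download. The text describes retrying until the download is correct; the sketch caps the number of attempts (`max_attempts`, an assumption) so that a persistently wrong file cannot cause an endless loop.

```python
import hashlib
import os
from typing import Callable


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, the check-value form assumed here."""
    return hashlib.md5(data).hexdigest()


def download_and_verify(
    fetch: Callable[[], bytes],
    expected_md5: str,
    save_path: str,
    max_attempts: int = 3,
) -> bool:
    """Download, save, and verify a file against the MD5 value published by the interface.

    On a mismatch the saved copy is deleted and the download is retried.
    """
    for _ in range(max_attempts):
        payload = fetch()
        with open(save_path, "wb") as fh:
            fh.write(payload)
        # Hash what was actually written, not just what was received.
        with open(save_path, "rb") as fh:
            actual = md5_hex(fh.read())
        if actual == expected_md5.lower():
            return True  # correct download: the caller may store it as configuration data
        os.remove(save_path)  # discard the incorrect file before retrying
    return False
```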

In one embodiment, the step of obtaining the script file from the specified directory according to the client's version information file may include:

(1) searching the specified directory for configuration files, where the configuration files include multiple script files and configuration data corresponding to each script file;

(2) when the script file exists, obtaining its configuration data and comparing the check value recorded in the configuration data with the check value of the script file;

(3) obtaining the script file according to the comparison result.

In this embodiment, obtaining the script file from the specified directory according to the client's version information file may specifically proceed as follows:

Search for the configuration file in the specified data directory. For example, if the specified directory is /data/…/app_emailcrawl/, look for the xx.js file in that directory. If the script file exists, obtain its configuration data and compare the MD5 value in the configuration data with the MD5 value of the script file under /data/. If the MD5 value in the configuration data is correct, use the script file under /data/.

Furthermore, if the MD5 value in the configuration data is wrong, the script file does not exist in the specified data directory, or the script file there has no configuration data, then the related data in the specified data directory is deleted, the script under Asset is copied over, its configuration data is stored, and an external party is notified so that it can intervene.
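The lookup-and-fallback logic can be sketched as below. The file layout is hypothetical: the source names only the directory and an `xx.js` file, so the sketch assumes the configuration data for `<name>.js` is a sidecar `<name>.config.json` containing an `"md5"` key, and it models the Asset directory as an ordinary directory and the external notification as a `notify` callback.

```python
import hashlib
import json
import os
import shutil
from typing import Callable


def _md5_of_file(path: str) -> str:
    with open(path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


def load_script(
    data_dir: str,
    asset_dir: str,
    js_name: str,
    notify: Callable[[str], None],
) -> str:
    """Return the path of a verified script file (hypothetical sidecar layout)."""
    script = os.path.join(data_dir, f"{js_name}.js")
    config = os.path.join(data_dir, f"{js_name}.config.json")

    if os.path.isfile(script) and os.path.isfile(config):
        with open(config, encoding="utf-8") as fh:
            recorded = json.load(fh).get("md5", "")
        if recorded == _md5_of_file(script):
            return script  # configuration data matches the script: use it as-is

    # Wrong MD5, missing script, or missing configuration data:
    # delete the related data, copy the bundled script, store fresh configuration data.
    for path in (script, config):
        if os.path.exists(path):
            os.remove(path)
    os.makedirs(data_dir, exist_ok=True)
    shutil.copyfile(os.path.join(asset_dir, f"{js_name}.js"), script)
    with open(config, "w", encoding="utf-8") as fh:
        json.dump({"md5": _md5_of_file(script)}, fh)
    notify(f"{js_name}.js restored from bundled assets")  # hand over to external handling
    return script
```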

In the embodiment above, comparing the check value of the script file obtained from the specified directory with the check value in its configuration data prevents a wrong script file from being loaded and avoids wasting memory.

In one embodiment, the search information includes credit card bill information;

In step S120, the step of obtaining the specified search information and crawling the mail data of the mailbox main page through the crawling script of the script file, to obtain crawled data corresponding to the search information, may include:

(1) executing the script file according to the specified credit card bill information, where the script file includes a crawling script;

(2) using the crawling script to crawl mail data related to the credit card bill information from the mailbox main page;

(3) aggregating the mail data obtained by the crawling script over multiple crawls, and obtaining from the aggregated result the crawled data corresponding to the credit card bill information.

In this embodiment, the specified search information is obtained and the crawling script in the script file is run accordingly. The crawling script crawls mail data related to the search information from the mailbox main page, and the content of the crawled mail data is further parsed to obtain target data matching the search information entered by the user.

Furthermore, while crawling, the script file can report the crawling status back to the client, such as the crawling progress, the percentage already crawled, and the estimated time remaining.
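One way to read the aggregation and status-reporting steps, sketched under stated assumptions: each crawled mail has a stable ID, bill mails are recognized by subject keywords, and the remaining-time estimate assumes later items take as long as earlier ones. The `Mail` type, the keyword matching, and the status fields are all hypothetical; the source does not say how mails are matched or how progress is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mail:
    mail_id: str  # hypothetical stable identifier of a crawled mail
    subject: str


def aggregate_passes(passes: list[list[Mail]], keywords: tuple[str, ...]) -> list[Mail]:
    """Merge mails from several crawling passes, keep bill-related ones, drop duplicates."""
    seen: set[str] = set()
    result: list[Mail] = []
    for batch in passes:
        for mail in batch:
            if mail.mail_id in seen:
                continue  # already collected in an earlier pass
            if any(k in mail.subject for k in keywords):
                seen.add(mail.mail_id)
                result.append(mail)
    return result


def crawl_status(done: int, total: int, elapsed_s: float) -> dict:
    """Progress report the script could send back to the client while crawling."""
    percent = 100.0 if total == 0 else 100.0 * done / total
    # Naive estimate: remaining items take as long on average as finished ones did.
    remaining: Optional[float] = None if done == 0 else elapsed_s / done * max(total - done, 0)
    return {
        "done": done,
        "total": total,
        "percent": round(percent, 1),
        "eta_s": None if remaining is None else round(remaining, 1),
    }
```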

In one embodiment, as shown in FIG. 4, which is a schematic diagram of webpage mailbox data interaction in one embodiment, a method for crawling webpage mailbox data is provided, which may include:

The terminal device 110 injects a script file into the browser's callback function. When the client opens the mailbox main page, the callback function is called and executes the content of the script file; the crawling script crawls the mail data on the mailbox main page, and all resulting crawled data is uploaded together to the server 120. The server 120 performs in-depth text parsing on the crawled data and returns the parsing result to the mailbox main page for display.

In this process, the server 120 only needs to receive the crawled data uploaded by the terminal device 110 and does not need to make repeated crawling requests itself. This prevents it from being blocked for frequent access and further reduces the resource consumption of the server 120.
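A sketch of the single-batch upload, assuming the server accepts one JSON POST containing all crawled mails. The endpoint URL and the payload shape (`count`, `mails`) are hypothetical; only standard-library `json` and `urllib.request` are used, and the request is built but not sent.

```python
import json
import urllib.request


def build_upload_request(endpoint: str, mails: list[dict]) -> urllib.request.Request:
    """Package every crawled mail into one JSON body so the server is contacted only once."""
    body = json.dumps({"count": len(mails), "mails": mails}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


req = build_upload_request("https://parser.example.com/upload", [{"id": "1", "subject": "bill"}])
# urllib.request.urlopen(req) would send it; omitted because the endpoint is hypothetical.
```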

In one embodiment, as shown in FIG. 5, which is a schematic structural diagram of a device for crawling webpage mailbox data in one embodiment, the device includes a calling module 210, a crawling module 220, a data transmission module 230 and an information display module 240, where:

The calling module 210 is configured to call a callback function of the browser after the mailbox main page has loaded successfully, where the callback function includes an injected script file.

In this module, after the user logs in successfully, a page mask animation is shown over the mailbox main page with a "crawling, loading..." indicator. Its main purpose is to cover the mailbox main page so that user operations do not interfere with the automatic crawling.

The crawling module 220 is configured to obtain the specified search information and crawl the mail data of the mailbox main page through the crawling script of the script file, to obtain crawled data corresponding to the search information.

In this module, after the calling module 210 has called the browser's callback function (which includes the injected script file) upon successful loading of the mailbox main page, the specified search information is obtained, and the crawling script of the script file crawls the mail data of the mailbox main page to obtain crawled data corresponding to the search information.

The data transmission module 230 is configured to upload the crawled data to the server for parsing after the crawling operation is completed.

In this module, after the client completes the crawling operation, it uploads the crawled data to the server, which can then perform in-depth text parsing on it.

The information display module 240 is configured to receive the parsing result returned by the server and display it, where the parsing result includes target data matching the search information.

In this module, after the server finishes parsing, the parsing result it returns is received and displayed on the mailbox main page. The displayed information includes the target data matching the specified search information.

For example, the bill data may cover bill information for daily necessities, housing payments, transportation, food, clothing and beauty, sports and health, culture, education and entertainment, communication and logistics, and other consumption.

With the above device for crawling webpage mailbox data, after the mailbox main page has loaded successfully, the browser's callback function, which includes the injected script file, is called; the specified search information is obtained, and the crawling script of the script file crawls the mail data of the mailbox main page to obtain crawled data corresponding to the search information; after the crawling operation is completed, the crawled data is uploaded to the server for parsing; and the parsing result returned by the server is received and displayed, where the parsing result includes target data matching the search information.
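The four modules of FIG. 5 can be wired together as in the following sketch. Each module is represented by a hypothetical callable, since the source does not define their interfaces; the point is only the order of operations the modules perform.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MailboxCrawler:
    """Wires the four modules of FIG. 5 together; every callable is a hypothetical stand-in."""
    call_back: Callable[[], None]          # calling module 210: run the injected script
    crawl: Callable[[str], list[dict]]     # crawling module 220
    upload: Callable[[list[dict]], dict]   # data transmission module 230 (returns parse result)
    show: Callable[[dict], None]           # information display module 240

    def run(self, search_info: str) -> dict:
        self.call_back()                   # main page loaded: callback with injected script fires
        crawled = self.crawl(search_info)  # crawl mail data matching the search information
        result = self.upload(crawled)      # one upload after crawling completes; server parses
        self.show(result)                  # display the target data from the parsing result
        return result
```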

For specific limitations of the device for crawling webpage mailbox data, refer to the limitations of the crawling method above; they are not repeated here. All or some of the modules in the device may be implemented in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of the terminal device in hardware form, or stored in the memory of the terminal device in software form, so that the processor can call them to perform the corresponding operations.

In one embodiment, a terminal is provided, including a memory 320 and a processor 380. The memory 320 stores computer-readable instructions which, when executed by the processor 380, cause the processor 380 to perform the steps of the method for crawling webpage mailbox data according to any one of the above embodiments.

FIG. 6 is a partial structural block diagram of a mobile phone related to the terminal provided by an embodiment of the present invention. Referring to FIG. 6, the mobile phone includes a radio frequency (RF) circuit 310, a memory 320, an input unit 330, a display unit 340, a sensor 350, an audio circuit 360, a wireless fidelity (WiFi) module 370, a processor 380, a power supply 390 and other components. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the mobile phone, which may include more or fewer components than shown, combine some components, or arrange the components differently.

Each component of the mobile phone is described below with reference to FIG. 6:

The RF circuit 310 can receive and send signals while information is being sent and received or during a call; in particular, it passes downlink information received from a base station to the processor 380 for processing, and sends uplink data to the base station. The memory 320 can store computer-readable instructions and modules, and the processor 380 runs them to perform the phone's various functional applications and data processing. The input unit 330 can receive numeric or character input and generate key signal input related to user settings and function control of the phone. The display unit 340 can display information entered by or provided to the user, as well as the phone's various menus.

The mobile phone may also include at least one sensor 350, such as a light sensor, a motion sensor or other sensors. The audio circuit 360, speaker 361 and microphone 362 provide an audio interface between the user and the phone. WiFi is a short-range wireless transmission technology; through the WiFi module 370 the phone can help the user send and receive email, browse webpages and access streaming media, providing wireless broadband Internet access. The processor 380 is the control center of the phone: it connects all parts of the phone through various interfaces and lines, and performs the phone's functions and processes data by running or executing the computer-readable instructions and/or modules stored in the memory 320 and calling the data stored there, thereby monitoring the phone as a whole.

The mobile phone also includes a power supply 390 (such as a battery) that powers each component. Preferably, the power supply is logically connected to the processor 380 through a power management system, which manages charging, discharging and power consumption.

In one embodiment, a storage medium is provided. When the computer-readable instructions and modules stored in the memory 320 are executed by the processor 380, they cause the processor 380 to implement the above method for crawling webpage mailbox data and the functions of the corresponding modules of the crawling device in the embodiment shown in FIG. 5.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; nor need they be executed sequentially, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

The above describes only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. A method for crawling webpage mailbox data, comprising the following steps:

after a mailbox main page has loaded successfully, calling a callback function of a browser, wherein the callback function includes an injected script file;

obtaining specified search information, and crawling mail data of the mailbox main page through a crawling script of the script file to obtain crawled data corresponding to the search information;

after the crawling operation is completed, uploading the crawled data to a server for parsing;

receiving a parsing result returned by the server and displaying the parsing result, wherein the parsing result includes target data matching the search information.

2. The method according to claim 1, wherein before the step of calling the callback function of the browser after the mailbox main page has loaded successfully, the method further comprises:

obtaining a script file from a specified directory according to a version information file of a client;

injecting the script file into the callback function of the browser.

3. The method according to claim 2, wherein before the step of obtaining the script file from the specified directory according to the version information file of the client, the method further comprises:

requesting a communication interface, and obtaining an updated version information file through the communication interface;

comparing the updated version information file with an original version information file;

determining, according to the comparison result, whether the original version information file needs to be updated.

4. The method according to claim 3, wherein after the step of determining, according to the comparison result, whether the original version information file needs to be updated, the method further comprises:

when the original version information file needs to be updated, downloading and saving the updated version information file;

comparing a check value of the updated version information file with a check value of the communication interface;

determining, according to the comparison result, whether the updated version information file has been downloaded correctly.

5. The method according to claim 2, wherein the step of obtaining the script file from the specified directory according to the version information file of the client comprises:

searching the specified directory for a configuration file, wherein the configuration file includes a plurality of script files and configuration data corresponding to the script files;

when the script file exists, obtaining the configuration data corresponding to the script file, and comparing a check value of the configuration data with a check value of the script file;

obtaining the script file according to the comparison result.

6. The method according to claim 1, wherein the search information includes credit card bill information;

the step of obtaining specified search information and crawling mail data of the mailbox main page through the crawling script of the script file to obtain crawled data corresponding to the search information comprises:

executing the script file according to specified credit card bill information, wherein the script file includes the crawling script;

crawling, by using the crawling script, mail data related to the credit card bill information from the mailbox main page;

aggregating the mail data obtained by the crawling script over multiple crawls, and obtaining, according to the aggregation result, crawled data corresponding to the credit card bill information.

7. The method according to claim 6, wherein the step of receiving the parsing result returned by the server and displaying the parsing result, the parsing result including target data matching the search information, comprises:

obtaining the parsing result returned by the server, wherein the parsing result includes bill data matching the credit card bill information;

displaying the bill data on the mailbox main page.

8. A device for crawling webpage mailbox data, comprising:

a calling module, configured to call a callback function of a browser after a mailbox main page has loaded successfully, wherein the callback function includes an injected script file;

a crawling module, configured to obtain specified search information and crawl mail data of the mailbox main page through a crawling script of the script file to obtain crawled data corresponding to the search information;

a data transmission module, configured to upload the crawled data to a server for parsing after the crawling operation is completed;

an information display module, configured to receive a parsing result returned by the server and display the parsing result, wherein the parsing result includes target data matching the search information.

9. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for crawling webpage mailbox data according to any one of claims 1 to 7.

10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for crawling webpage mailbox data according to any one of claims 1 to 7.