CN106612283B - A method and device for identifying the source of a downloaded file

Info

Publication number: CN106612283B
Application number: CN201611248788.3A
Authority: CN
Inventors: 张皓秋
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing 360 Zhiling Technology Co ltd
Priority date: 2016-12-29
Filing date: 2016-12-29
Publication date: 2020-02-28
Anticipated expiration: 2036-12-29
Also published as: CN106612283A

Abstract

The invention discloses a method and a device for identifying the source of a downloaded file, relating to the technical field of network security. The main technical scheme is as follows: monitor the traffic of an accessed network file and extract part of the file's data stream to obtain verification data; associate the link address of the network file with the verification data; monitor the data stream of a file being written to local storage and match the verification data contained in that data stream; and, for verification data that is successfully matched, mark the associated link address in the written file. The method and the device are mainly used for identifying the source of a downloaded file.

Description

A method and device for identifying the source of a downloaded file

Technical field

The present invention relates to the technical field of network security, and in particular to a method and device for identifying the source of a downloaded file.

Background

With the rapid development of computer technology, information networks have become an important foundation for social development, and people can obtain the information they need through the network. However, when information or network files are browsed online, the files are generally cached temporarily in local memory, and only when the user decides to download them is the required file saved from memory to local storage. In this process the memory does not mark the cached content, and it often holds a large amount of data from the many different URLs the user has visited. When data in memory is saved to local storage, it is therefore difficult for the user to determine the source of the saved file. This poses a serious security risk for devices and systems that lack security protection.

Summary of the invention

In view of this, the present invention provides a method and device for identifying the source of a downloaded file. Accessed network files are sampled and their source is recorded, and files written to local storage are matched against those samples to determine where each file came from.

According to one aspect of the present invention, a method for identifying the source of a downloaded file is proposed, the method comprising:

monitoring the traffic of an accessed network file, and extracting part of the data stream of the network file to obtain verification data;

associating the link address of the network file with the verification data;

monitoring the data stream of a file being written to local storage, and matching the verification data contained in the data stream;

according to the successfully matched verification data, marking the link address associated with the verification data in the written file.

Preferably, extracting part of the data stream of the network file to obtain the verification data includes:

randomly extracting one fixed-size segment of the data stream of the network file to generate the verification data;

or extracting multiple segments of the data stream of the network file to generate verification data containing multiple data stream files.

When the accessed network file is an encrypted file, the decryption function of the encrypted file is called;

part of the plaintext data stream output by the decryption function is extracted to obtain the verification data.

Preferably, associating the link address of the network file with the verification data includes:

obtaining the link address of the network file;

associating the link address with the verification data using a KEY-VALUE pair of an associative array, where the KEY stores the link address and the VALUE stores the verification data.

Preferably, the method further includes:

reporting the associative array to a cloud server, so that the cloud server judges whether the network file is safe according to the link address in the associative array;

determining, according to the information fed back by the cloud server, whether the network file may be saved to local storage.

Preferably, the method further includes:

judging, according to the link address marked in the written file, whether the link address is a safe link;

if not, presenting an alarm message for the written file and performing an isolation or deletion operation.

According to another aspect of the present invention, a device for identifying the source of a downloaded file is provided, the device comprising:

an extraction unit, configured to monitor the traffic of an accessed network file and extract part of the data stream of the network file to obtain verification data;

an association unit, configured to associate the link address of the network file with the verification data extracted by the extraction unit;

a matching unit, configured to monitor the data stream of a file being written to local storage and match the verification data contained in the data stream;

a marking unit, configured to mark, according to the verification data successfully matched by the matching unit, the link address associated with the verification data in the written file.

Preferably, the extraction unit includes:

a first extraction module, configured to randomly extract one fixed-size segment of the data stream of the network file to generate the verification data;

a second extraction module, configured to extract multiple segments of the data stream of the network file to generate verification data containing multiple data stream files.

Preferably, the extraction unit further includes:

a calling module, configured to call the decryption function of the encrypted file when the accessed network file is an encrypted file;

a third extraction module, configured to extract part of the plaintext data stream output after the decryption function called by the calling module decrypts the encrypted file, to obtain the verification data.

Preferably, the association unit includes:

an acquisition module, configured to obtain the link address of the network file;

an association module, configured to associate the link address with the verification data using a KEY-VALUE pair of an associative array, where the KEY stores the link address and the VALUE stores the verification data.

Preferably, the device further includes:

a sending unit, configured to report the associative array obtained by the association unit to a cloud server, so that the cloud server judges whether the network file is safe according to the link address in the associative array;

a determining unit, configured to determine, according to the information fed back by the cloud server, whether the network file may be saved to local storage.

Preferably, the device further includes:

a judging unit, configured to judge, according to the link address marked in the written file by the marking unit, whether the link address is a safe link;

a processing unit, configured to present an alarm message for the written file and perform an isolation or deletion operation when the judging unit judges that the link address is not a safe link.

The method and device for identifying the source of a downloaded file adopted in the present invention are mainly used to mark the source of a downloaded file, so that the security of the file can be judged from its source. During identification, part of the data is extracted as verification data while the file is being cached, and the link address used to access the file is recorded and associated with the corresponding verification data. Later, when the cached file is saved to local storage, the data stream of the written file is monitored, the verification data contained in that data stream is matched, and the link address associated with the verification data is marked in the written file, which completes the marking of the source of the downloaded file. With the source marked, a file whose content cannot be identified can still be judged safe or unsafe from its source, which improves the system's ability to identify files.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention are more apparent, specific embodiments of the invention are given below.

Description of the drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are for the purpose of illustrating preferred embodiments only and are not to be considered limiting of the invention. The same components are denoted by the same reference numerals throughout the drawings. In the drawings:

FIG. 1 shows a flowchart of a method for identifying the source of a downloaded file provided by an embodiment of the present invention;

FIG. 2 shows a flowchart of another method for identifying the source of a downloaded file provided by an embodiment of the present invention;

FIG. 3 shows a block diagram of a device for identifying the source of a downloaded file provided by an embodiment of the present invention;

FIG. 4 shows a block diagram of another device for identifying the source of a downloaded file provided by an embodiment of the present invention.

Detailed description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the invention will be understood more thoroughly and so that its scope is fully conveyed to those skilled in the art.

An embodiment of the present invention provides a method for identifying the source of a downloaded file. The method is mainly used to mark the source of a downloaded file, and it confirms and marks the source information by dividing the download process into two stages: a caching stage and a saving stage. The caching stage is the process in which files are cached while the user browses network files; in this stage each file is associated with its network address and the association is saved. The saving stage is the process of saving a cached file to local storage; in this stage the network address corresponding to the file is marked in the file. Since most of the network files a user views do not need to be saved, the device cache generally holds a large number of network files that the user has browsed. Existing approaches to saving files merely move the file when the user wants to save a cached file locally, after which the source of the file can no longer be traced. For an unknown file, however, security can to a large extent be judged from its source. The method provided by the embodiment of the present invention is a solution to this problem. Its specific steps are shown in FIG. 1 and include:

101. Monitor the traffic of the accessed network file, and extract part of the data stream of the network file to obtain verification data.

This step takes place while the user views a network file over the Internet and the file is cached in the memory or cache area of the local device. The network file may be of any type, such as text, picture or video. In this process the local device obtains network data through the network interface. Network requests and responses are sent and transmitted as network data streams, a data stream being the digitally encoded signal sequence of the information in transit. By monitoring these data streams, that is, by monitoring the traffic of accessing network files, the data content of a network file being saved to the cache can be obtained in real time. Because a data stream is a sequence, different network files have different data sequences. A part of the data stream extracted from a network file as it is written to the cache therefore corresponds uniquely to that network file, and the extracted part can be used later to verify the data when the cached file is written to local storage.

In this step, therefore, each time the user browses a network file and its stream data is downloaded into the cache, part of the data stream is extracted as the verification data of that network file. The embodiment of the present invention does not restrict how the part is extracted; it may, for example, be taken by random sampling or from a fixed position. The extracted part of the data stream is associated with the network file and saved locally for use in the subsequent matching step.
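The extraction in step 101 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `choose_offset` and `extract_sample`, the chunked representation of the data stream, and the idea that the total length is known in advance (for example from a length reported by the server) are all assumptions introduced here.

```python
import random
from typing import Iterable, Optional


def choose_offset(total_length: int, size: int,
                  rng: Optional[random.Random] = None) -> int:
    """Pick a random start offset so that `size` bytes fit inside the file."""
    if size <= 0 or total_length < size:
        raise ValueError("file is shorter than the requested sample size")
    rng = rng or random.Random()
    return rng.randrange(0, total_length - size + 1)


def extract_sample(chunks: Iterable[bytes], offset: int,
                   size: int) -> Optional[bytes]:
    """Collect bytes [offset, offset + size) while the chunks stream past."""
    buf = bytearray()
    pos = 0  # absolute offset of the first byte of the current chunk
    for chunk in chunks:
        start = max(offset - pos, 0)
        if len(buf) < size and start < len(chunk):
            buf += chunk[start:start + size - len(buf)]
        pos += len(chunk)
    # A stream that ends too early cannot yield a full sample.
    return bytes(buf) if len(buf) == size else None
```

Passing a fixed `offset` gives the fixed-position variant; passing the result of `choose_offset` gives the random-sampling variant. For the chunks `[b"abcde", b"fghij", b"klmno"]`, an offset of 3 and a size of 6, the sample spans two chunks and comes out as `b"defghi"`.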

102. Associate the link address of the network file with the verification data.

When the part of the data stream extracted in step 101, that is, the verification data, is saved, it must be associated not only with the network file but also with the source of the network file, which means obtaining the link address of the network file. For this reason, the link address of the data stream currently being received is recorded while the verification data is extracted, and the link address is saved locally together with it. Specifically, the associated data can be stored in the form of a data table that records, for each cached network file, the file name, the extracted verification data and the corresponding link address. Querying the table by file name, verification data or link address returns the other two items associated with the queried item.
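One way to picture the data table of step 102 is a small in-memory structure indexed on all three columns. The sketch below is illustrative only; the names `SourceRecord` and `SourceTable` are hypothetical, and it assumes for simplicity that each file name, each piece of verification data and each link address appears in at most one record.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SourceRecord:
    file_name: str
    verification_data: bytes
    link_address: str


class SourceTable:
    """Three-column table that can be queried by any one of its columns."""

    def __init__(self) -> None:
        self._by_name: Dict[str, SourceRecord] = {}
        self._by_data: Dict[bytes, SourceRecord] = {}
        self._by_link: Dict[str, SourceRecord] = {}

    def add(self, record: SourceRecord) -> None:
        self._by_name[record.file_name] = record
        self._by_data[record.verification_data] = record
        self._by_link[record.link_address] = record

    def find(self, *, file_name: Optional[str] = None,
             verification_data: Optional[bytes] = None,
             link_address: Optional[str] = None) -> Optional[SourceRecord]:
        """Return the record matching whichever single key was supplied."""
        if file_name is not None:
            return self._by_name.get(file_name)
        if verification_data is not None:
            return self._by_data.get(verification_data)
        if link_address is not None:
            return self._by_link.get(link_address)
        return None
```

A lookup such as `table.find(verification_data=sample)` then yields the file name and link address in one step, which is the query the matching stage needs.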

103. Monitor the data stream of the file being written to local storage, and match the verification data contained in the data stream.

The written file monitored in this step is a cached network file that is being saved to local storage. Because such a file may not be the network content currently being browsed, determining its source requires monitoring the written content in real time as the file is written to local storage, and judging whether that content contains verification data saved in the preceding steps.

Since verification data was extracted from every network file when it was cached, and the verification data corresponds uniquely to the network file, a network file being written to local storage is expected to match one piece of verification data. When the corresponding verification data is matched, step 104 is executed. When no verification data is matched, there are two possible reasons. One is that the network file was stored in the cache before the extraction of verification data began, so the cached file has no corresponding verification data. The other is that the network file was rewritten before or during the write from the cache to local storage, so the corresponding verification data can no longer be matched.

In both cases the link address of the written file cannot be obtained, so the security of the file cannot be judged. The user can then be informed through a prompt page that the file being saved carries a risk, and asked to decide whether to continue the write operation or terminate it.
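The matching in step 103 can be sketched as a substring search over the chunks being written. This is a minimal sketch under assumptions: the function name `match_write_stream` is hypothetical, the write is modelled as an iterable of byte chunks, and `samples` maps verification data to link address (the reverse direction of the stored association) purely to make the lookup convenient. A short tail of the previous chunk is kept so that a sample straddling two chunks is still found.

```python
from typing import Dict, Iterable, Optional


def match_write_stream(chunks: Iterable[bytes],
                       samples: Dict[bytes, str]) -> Optional[str]:
    """Return the link address whose verification data occurs in the
    written stream, or None when nothing matches."""
    if not samples:
        return None
    # Bytes to carry over so a sample split across chunks is not missed.
    keep = max(len(sample) for sample in samples) - 1
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        for sample, link in samples.items():
            if sample and sample in window:
                return link
        tail = window[-keep:] if keep > 0 else b""
    return None
```

A `None` result corresponds to the two unmatched cases described above, where the caller would warn the user and ask whether to continue or abort the write.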

104. According to the successfully matched verification data, mark the link address associated with the verification data in the written file.

When the monitored written file contains matching verification data, the system uses the verification data to look up the corresponding link address and marks that link address in the written file. In the embodiment of the present invention, the link address is either added to the attributes of the file in the form of a tag, or added to the file name of the written file.

The method for identifying the source of a downloaded file provided by the above embodiment of the present invention is mainly used to mark the source of a downloaded file, so that the security of the file can be judged from its source. During identification, part of the data is extracted as verification data while the file is being cached, and the link address used to access the file is recorded and associated with the corresponding verification data. Later, when the cached file is saved to local storage, the data stream of the written file is monitored, the verification data contained in that data stream is matched, and the link address associated with the verification data is marked in the written file, which completes the marking of the source of the downloaded file. With the source marked, a file whose content cannot be identified can still be judged safe or unsafe from its source, which improves the system's ability to identify files.

Further, to explain in more detail how the above method is implemented in practice, in particular how verification data is extracted from the data stream of a network file and saved during the caching stage, the following embodiment is described in detail. As shown in FIG. 2, it includes:

201. Extract part of the data stream of the cached network file to obtain verification data.

In the embodiment of the present invention, the part of the data stream may be extracted in this step by randomly taking one fixed-size segment from the network file. To ensure that the segment corresponds uniquely to the network file, it should be large enough to contain sufficient data content, and the verification data is generated from it. Alternatively, multiple segments of the data stream may be extracted from the network file and combined into one piece of verification data. When the verification data is matched in this case, the written file is determined to be the same file as the network file corresponding to the verification data only if every segment matches.
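The multi-segment variant can be sketched as a list of (offset, bytes) pairs that must all agree with the candidate file. This is an illustrative sketch: comparing each segment at its recorded offset is an assumption made here for simplicity (the source only requires that every segment match), and the names `extract_segments` and `matches_all` are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, bytes]  # (offset within the file, extracted bytes)


def extract_segments(data: bytes, offsets: Sequence[int],
                     size: int) -> List[Segment]:
    """Take one `size`-byte segment at each offset."""
    segments = []
    for offset in offsets:
        if offset < 0 or offset + size > len(data):
            raise ValueError(f"segment at {offset} does not fit in the file")
        segments.append((offset, data[offset:offset + size]))
    return segments


def matches_all(candidate: bytes, segments: Sequence[Segment]) -> bool:
    """True only if every recorded segment is found at its recorded offset."""
    return all(candidate[offset:offset + len(chunk)] == chunk
               for offset, chunk in segments)
```

Because every segment has to agree, a file that was altered in any sampled region fails the match even if the other regions are untouched.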

Further, when the network file accessed by the user is an encrypted file, the system calls the decryption function corresponding to the file's encryption format when the encrypted file is cached, decrypts the file, and extracts part of the data stream from the resulting plaintext data to obtain the corresponding verification data.
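The encrypted case differs from the plain case only in where the sample is taken. The sketch below passes the decryption function in as a parameter; `sample_plaintext` is a hypothetical name, and the single-byte XOR used as `toy_xor_decrypt` is a stand-in chosen so the example runs without any cryptography library. It is not a real cipher and not the scheme the source has in mind.

```python
from typing import Callable


def sample_plaintext(ciphertext: bytes,
                     decrypt: Callable[[bytes], bytes],
                     offset: int, size: int) -> bytes:
    """Decrypt first, then take the verification sample from the plaintext."""
    plaintext = decrypt(ciphertext)
    return plaintext[offset:offset + size]


def toy_xor_decrypt(key: int) -> Callable[[bytes], bytes]:
    """Toy stand-in for a real decryption function (single-byte XOR)."""
    return lambda data: bytes(b ^ key for b in data)
```

Sampling the plaintext rather than the ciphertext presumably lets the verification data be compared with the decrypted content that is later written to local storage.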

202. Associate the link address of the network file with the verification data.

In this step, once the verification data has been obtained, the link address of the network file is associated with it. In the embodiment of the present invention the association is expressed as an associative array, specifically as KEY-VALUE pairs in which the KEY stores the link address and the VALUE stores the verification data. The written file is checked for content identical to a VALUE; if such content is found, the written file is the network file from which that verification data was extracted, and the source of the written file is the link address stored in the corresponding KEY.

The two steps above are the processing performed while the user browses network files and the files are cached in the memory of the local device or in a designated cache area. They consist mainly of extracting the verification data and establishing the association. The result is that verification data is created for each network file in the cache and associated with the link address of that network file, giving a list of associations in which the associative arrays described above are stored. On this basis, the embodiment of the present invention can also report the list to a cloud server. The cloud server parses the associated data in the list, identifies the link addresses recorded in it, judges whether each link address is a malicious or a safe website, and feeds the corresponding information back to the local device. The local device can then delete the cached network files whose link addresses were identified as malicious, or mark those files as data files that may not be written to local storage.
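The associative array and the cloud feedback can be sketched together. In this illustration the associative array is a plain dictionary from link address (KEY) to verification data (VALUE), and the cloud server is represented by a callable that returns a verdict per link address. The verdict labels, the names `links_to_block` and `fake_cloud_check`, and the host `bad.example` are all assumptions made for the example; no real reporting protocol is implied.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical verdict labels; the source only distinguishes malicious
# from safe addresses.
MALICIOUS = "malicious"
SAFE = "safe"

CloudCheck = Callable[[Iterable[str]], Dict[str, str]]


def links_to_block(associations: Dict[str, bytes],
                   check: CloudCheck) -> Set[str]:
    """Report the KEYs (link addresses) and return those judged malicious."""
    verdicts = check(list(associations))
    return {link for link in associations if verdicts.get(link) == MALICIOUS}


def fake_cloud_check(links: Iterable[str]) -> Dict[str, str]:
    """Stand-in for the cloud server: flags one hard-coded host."""
    return {link: (MALICIOUS if "bad.example" in link else SAFE)
            for link in links}
```

The returned set is what the local device would act on, either by deleting the corresponding cached files or by refusing to write them to local storage.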

203. Monitor the data stream of the file being written to local storage, and match the verification data contained in the data stream.

204. According to the successfully matched verification data, mark the link address associated with the verification data in the written file.

These two steps are the same as steps 103 and 104 above. Both are the specific operations performed in the saving stage of the network file: the written data stream is monitored in real time and matched against the verification data extracted in step 201, and the source of the written file, that is, the link address associated in step 202, is marked in the written file according to the matched verification data.

The steps above complete the process of identifying the source of a written file. It should be noted that writing the file here means saving it from the local cache to local storage, for example writing a file from memory to the hard disk. In this process the user selects one data file from the multiple files in the cache, and the method provided by the embodiment of the present invention marks that data file with its source information. The main purpose is to let the user judge the security of the data file without executing it locally, thereby avoiding the harm and loss that executing the file could cause to the local device or system. For how the security of a network file marked with source information is identified, see step 205.

205. Determine, according to the link address marked in the written file, whether the written file may be saved to local storage.

Once the source identifier of the written file, that is, the link address, has been determined, the local device can have the defense software in the local system examine the link address to determine whether it is a malicious website. When the local defense software cannot determine whether the address is safe, the link address can additionally be sent over the network to a cloud server, which judges whether it is a malicious or a safe website.

Further, when the link address of the written file is determined to be an unsafe website, the system outputs an alarm message to the user. The alarm message presents handling advice at different levels depending on the nature of the link address. For example, for a malicious website the alarm message advises the user to delete the written file, while for a link address whose security cannot be determined the alarm message warns the user that the written file carries a risk and advises isolating it. The alarm message is also used to obtain the user's operation instruction: from the alarm message the user can choose, for example, to save the written file or to delete it.
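The graded alarm handling of step 205 amounts to a small decision table. The sketch below is one possible reading, with assumptions labeled: the three verdict labels, the three actions, the message wording and the rule that an explicit user choice overrides the recommendation are all introduced here for illustration.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SAFE = "safe"
    UNKNOWN = "unknown"      # security of the link address could not be determined
    MALICIOUS = "malicious"


class Action(Enum):
    KEEP = "keep"
    QUARANTINE = "quarantine"
    DELETE = "delete"


# Recommended handling per verdict, following the two examples in the text.
RECOMMENDED = {
    Verdict.SAFE: Action.KEEP,
    Verdict.UNKNOWN: Action.QUARANTINE,
    Verdict.MALICIOUS: Action.DELETE,
}


def alert_message(verdict: Verdict, link: str) -> Optional[str]:
    """Build the alarm text, or None when no alarm is needed."""
    if verdict is Verdict.MALICIOUS:
        return f"File came from a malicious address ({link}); deletion is advised."
    if verdict is Verdict.UNKNOWN:
        return f"The source {link} could not be verified; isolation is advised."
    return None


def resolve(verdict: Verdict, user_choice: Optional[Action] = None) -> Action:
    """Use the user's instruction if one was given, else the recommendation."""
    return user_choice if user_choice is not None else RECOMMENDED[verdict]
```

For instance, `resolve(Verdict.UNKNOWN)` yields the quarantine recommendation, while `resolve(Verdict.UNKNOWN, Action.KEEP)` reflects a user who chose to save the file anyway.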

As the method proposed in the above embodiments shows, marking the link address on an unknown file when it is saved from the cache to local storage not only helps the user learn about the file, but also allows the security of the file to be judged from the link address, which improves the security of using network files locally.

The above describes in detail how the method for identifying the source of a downloaded file is implemented in practice. As a specific device for implementing the method, an embodiment of the present invention further provides a device for identifying the source of a downloaded file. As shown in FIG. 3, the device includes:

an extraction unit 31, configured to monitor the traffic of an accessed network file and extract part of the data stream of the network file to obtain verification data;

an association unit 32, configured to associate the link address of the network file with the verification data extracted by the extraction unit 31;

a matching unit 33, configured to monitor the data stream of a written file in local storage and match the verification data, extracted by the extraction unit 31, that is contained in the data stream;

a marking unit 34, configured to mark, according to the verification data successfully matched by the matching unit 33, the link address associated with the verification data in the written file.
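A minimal sketch of the matching and marking steps is given below, assuming the verification data is a single byte segment per link address and that a match means the segment occurs somewhere in the written bytes. The function names, the example URL and path, and the use of a plain record to hold the mark are all assumptions for illustration; the text does not specify how the link address is physically stored with the written file.

```python
from typing import Dict, Optional


def match_source(written: bytes, table: Dict[str, bytes]) -> Optional[str]:
    """Matching step: return the link address whose verification data occurs in the written bytes."""
    for url, sample in table.items():
        if sample and sample in written:
            return url
    return None


def mark_source(path: str, written: bytes, table: Dict[str, bytes]) -> Dict[str, Optional[str]]:
    """Marking step: attach the matched link address to a record describing the written file."""
    return {"path": path, "source": match_source(written, table)}


# Extraction and association (units 31 and 32) are reduced here to a hand-built table.
table = {"http://example.com/setup.exe": b"\x4d\x5a\x90\x00demo-bytes"}
record = mark_source("C:/Downloads/setup.exe",
                     b"....\x4d\x5a\x90\x00demo-bytes....", table)
```

A real implementation would observe the write as a stream of chunks, so a segment could straddle a chunk boundary; the sketch sidesteps this by treating the written data as one byte string.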

Further, as shown in FIG. 4, the extraction unit 31 includes:

a first extraction module 311, configured to randomly extract one fixed-size segment of the data stream of the network file to generate verification data;

a second extraction module 312, configured to extract multiple segments of the data stream of the network file to generate verification data containing multiple data stream files.
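The two extraction strategies can be sketched as follows. The segment sizes, the default count, and the choice of evenly spaced offsets for the multi-segment case are assumptions; the text only states that one fixed-size segment is taken at random, or that several segments are taken.

```python
import random
from typing import List, Optional


def extract_fixed_segment(data: bytes, size: int = 32,
                          rng: Optional[random.Random] = None) -> bytes:
    """First strategy: one fixed-size segment at a random offset."""
    rng = rng or random.Random()
    if len(data) <= size:
        return data  # too short to sample, use the whole payload
    start = rng.randrange(len(data) - size + 1)
    return data[start:start + size]


def extract_multiple_segments(data: bytes, count: int = 3, size: int = 16) -> List[bytes]:
    """Second strategy: several segments, here taken at evenly spaced offsets (an assumption)."""
    if len(data) <= size:
        return [data]
    last_start = len(data) - size
    if count > 1:
        starts = [last_start * i // (count - 1) for i in range(count)]
    else:
        starts = [0]
    return [data[s:s + size] for s in starts]
```

Taking several segments presumably lowers the chance that two different files share the same verification data, at the cost of storing and matching more bytes per file.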

Further, as shown in FIG. 4, the extraction unit 31 further includes:

a calling module 313, configured to call the decryption function of an encrypted file when the accessed network file is an encrypted file;

a third extraction module 314, configured to extract part of the plaintext data stream that the decryption function called by the calling module 313 outputs after decrypting the encrypted file, to obtain the verification data.
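One way to picture the encrypted case is a wrapper around the decryption function that samples its plaintext output. In the sketch below, `sample_plaintext` and `toy_decrypt` are hypothetical names, and the single-byte XOR is only a stand-in for whatever decryption function the encrypted file actually uses; it is not real cryptography.

```python
from typing import Callable, List


def sample_plaintext(decrypt: Callable[[bytes], bytes], samples: List[bytes],
                     size: int = 16) -> Callable[[bytes], bytes]:
    """Wrap a decryption function so a prefix of its plaintext output is recorded."""
    def wrapped(ciphertext: bytes) -> bytes:
        plaintext = decrypt(ciphertext)
        samples.append(plaintext[:size])  # sample the plaintext, not the ciphertext
        return plaintext
    return wrapped


def toy_decrypt(ciphertext: bytes) -> bytes:
    """Hypothetical stand-in for the real decryption function (single-byte XOR)."""
    return bytes(b ^ 0x5A for b in ciphertext)


collected: List[bytes] = []
decrypt_and_sample = sample_plaintext(toy_decrypt, collected)
plaintext = decrypt_and_sample(bytes(b ^ 0x5A for b in b"hello, downloaded file"))
```

Sampling after decryption matters presumably because the data later written to local storage is the plaintext, so a sample taken from the ciphertext would not be found in the written file.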

Further, as shown in FIG. 4, the association unit 32 includes:

an obtaining module 321, configured to obtain the link address of the network file;

an association module 322, configured to associate the link address obtained by the obtaining module 321 with the verification data by using a KEY-VALUE pair of an associative array, where the KEY stores the link address and the VALUE stores the verification data.
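The KEY-VALUE association can be sketched with an ordinary dictionary. The type alias `AssociationTable`, the helper names, and the choice to store the verification data as a list of segments are assumptions made for the example.

```python
from typing import Dict, List

# KEY: link address of the network file; VALUE: its verification data (one or more segments).
AssociationTable = Dict[str, List[bytes]]


def associate(table: AssociationTable, url: str, segments: List[bytes]) -> None:
    """Store the verification data under the link address it was extracted from."""
    table.setdefault(url, []).extend(segments)


def lookup_by_segment(table: AssociationTable, segment: bytes) -> List[str]:
    """Return every link address whose verification data contains the given segment."""
    return [url for url, segs in table.items() if segment in segs]
```

Because the link address is the key, the lookup from matched verification data back to a link address is a scan over the values; a reverse index could be added if the table grows large.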

Further, as shown in FIG. 4, the device further includes:

a sending unit 35, configured to report the associative array obtained by the association unit 32 to a cloud server, so that the cloud server judges whether the network file is safe according to the link address in the associative array;

a determining unit 36, configured to determine, according to the information fed back by the cloud server, whether the network file can be saved to local storage;

a judging unit 37, configured to judge whether the link address is a safe link according to the link address marked in the written file by the marking unit 34;

a processing unit 38, configured to, when the judging unit 37 judges that the link address is not a safe link, prompt alarm information for the written file and perform an isolation or deletion operation.

In summary, the method and device for identifying the source of a downloaded file provided by the embodiments of the present invention are mainly used to mark the source of a downloaded file so that the safety of the file can be judged from its source. During identification, part of the data of a file is extracted as verification data while the file is being cached, and the link address used to access the file is recorded and associated with the corresponding verification data. Later, when the cached file is saved to local storage, the data stream of the written file is monitored, the verification data contained in the data stream is matched, and the link address associated with that verification data is marked in the written file, which completes the marking of the source of the downloaded file. When the content of a file cannot be identified, the source mark gives the user relevant information about the file, so that the user can judge whether it is the file that was meant to be downloaded. More importantly, judging the source of the file makes it possible to identify whether the file is safe, which improves the system's ability to identify files.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

It can be understood that the relevant features of the above cloud server and device may be referred to one another. In addition, "first", "second", and the like in the above embodiments are used to distinguish the embodiments and do not indicate that any embodiment is better or worse than another.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above may refer to the corresponding processes in the foregoing cloud server embodiments, and are not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known cloud servers, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention, features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed cloud server should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules of the device in an embodiment may be adaptively changed and placed in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any cloud server or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the title of the invention (for example, a device for determining link levels within a website) according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for executing part or all of the cloud server described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not denote any order. These words may be interpreted as names.

Claims

1. A method for identifying a source of a downloaded file, the method comprising:

monitoring traffic of an accessed network file, and extracting part of the data stream of the network file to obtain verification data;

associating a link address of the network file with the verification data;

monitoring a data stream of a written file in local storage, and matching the verification data contained in the data stream;

and marking, according to the successfully matched verification data, the link address associated with the verification data in the written file.

2. The method of claim 1, wherein extracting part of the data stream of the network file to obtain the verification data comprises:

randomly extracting one fixed-size segment of the data stream of the network file to generate the verification data;

or extracting a plurality of segments of the data stream of the network file to generate verification data containing a plurality of data stream files.

3. The method of claim 1 or 2, wherein extracting part of the data stream of the network file to obtain the verification data comprises:

when the accessed network file is an encrypted file, calling a decryption function of the encrypted file;

and extracting part of the plaintext data stream output by the decryption function to obtain the verification data.

4. The method of claim 1, wherein associating the link address of the network file with the verification data comprises:

acquiring the link address of the network file;

and associating the link address with the verification data by using a KEY-VALUE pair of an associative array, wherein the KEY stores the link address and the VALUE stores the verification data.

5. The method of claim 4, further comprising:

reporting the associative array to a cloud server, so that the cloud server judges whether the network file is safe according to the link address in the associative array;

and determining, according to the information fed back by the cloud server, whether the network file can be saved to local storage.

6. The method of claim 1, further comprising:

judging whether the link address is a safe link according to the link address marked in the written file;

and if not, prompting alarm information for the written file, and performing an isolation or deletion operation.

7. An apparatus for identifying a source of a downloaded file, the apparatus comprising:

an extraction unit, configured to monitor traffic of an accessed network file and extract part of the data stream of the network file to obtain verification data;

an association unit, configured to associate a link address of the network file with the verification data extracted by the extraction unit;

a matching unit, configured to monitor a data stream of a written file in local storage and match the verification data contained in the data stream;

and a marking unit, configured to mark, according to the verification data successfully matched by the matching unit, the link address associated with the verification data in the written file.

8. The apparatus of claim 7, wherein the extraction unit comprises:

a first extraction module, configured to randomly extract one fixed-size segment of the data stream of the network file to generate the verification data;

and a second extraction module, configured to extract a plurality of segments of the data stream of the network file to generate verification data containing a plurality of data stream files.

9. The apparatus of claim 7 or 8, wherein the extraction unit further comprises:

a calling module, configured to call a decryption function of an encrypted file when the accessed network file is the encrypted file;

and a third extraction module, configured to extract part of the plaintext data stream that the decryption function called by the calling module outputs after decrypting the encrypted file, to obtain the verification data.

10. The apparatus of claim 7, wherein the association unit comprises:

an acquisition module, configured to acquire the link address of the network file;

and an association module, configured to associate the link address with the verification data by using a KEY-VALUE pair of an associative array, wherein the KEY stores the link address and the VALUE stores the verification data.

11. The apparatus of claim 10, further comprising:

a sending unit, configured to report the associative array obtained by the association unit to a cloud server, so that the cloud server judges whether the network file is safe according to the link address in the associative array;

and a determining unit, configured to determine, according to the information fed back by the cloud server, whether the network file can be saved to local storage.

12. The apparatus of claim 7, further comprising:

a judging unit, configured to judge whether the link address is a safe link according to the link address marked in the written file by the marking unit;

and a processing unit, configured to, when the judging unit judges that the link address is not a safe link, prompt alarm information for the written file and perform an isolation or deletion operation.