CN116186764B

CN116186764B - Data security inspection method and system

Info

Publication number: CN116186764B
Application number: CN202310011989.5A
Authority: CN
Inventors: 陈剑飞; 刘新; 刘冬兰; 张昊; 王睿; 孙莉莉; 张方哲; 刘圣龙; 王迪; 牛德玲; 张婕; 王聪; 鲁统贺
Original assignee: Big Data Center Of State Grid Corp Of China; State Grid Shandong Electric Power Co Ltd
Current assignee: Big Data Center Of State Grid Corp Of China; State Grid Shandong Electric Power Co Ltd
Priority date: 2023-01-05
Filing date: 2023-01-05
Publication date: 2023-09-15
Anticipated expiration: 2043-01-05
Also published as: CN116186764A

Abstract

The invention relates to the technical field of data security inspection and discloses a data security inspection method. A processing module creates a task entry in a task database and writes the information of the new task into it, including the storage path of the files to be scanned. A scanning module then scans the data and, when scanning is complete, writes the scan results to an inspection report file saved at a designated path. Through a query module, the user queries the task entry in the database; if the task state is completed, the scan results are read from the file system at the designated path. The method thus scans the data, returns the scan results quickly, and lets the user take corresponding measures promptly.

Description

A data security inspection method and system

Technical field

The invention relates to the technical field of data security inspection, and in particular to a data security inspection method and system.

Background art

With the spread of Internet technology and its applications, data security can no longer be ignored. Computer system security refers to the technical and administrative protection established for data processing systems so that computer hardware, software and data are not damaged, altered or leaked through accidental or malicious causes. In other words, technical and administrative measures keep the network system running normally, which ensures the availability, integrity and confidentiality of network data and ensures that data transmitted and exchanged over the network is not added to, modified, lost or disclosed.

In today's information age, companies increasingly store technical achievements, production records, market statistics and other information as data, much of it in the form of electronic documents. Chinese patent CN108133148B discloses a data security inspection method and system that performs trusted authentication of the collection terminal and securely collects, transmits and stores data, so that the data to be inspected is safe and reliable and the inspection results are genuine and valid. That invention also supports multiple application modes, including a local inspection mode, a cloud inspection mode, a cloud-local synchronous inspection mode and a cloud-local intelligent switching inspection mode, which meets the needs of data security inspection in complex business scenarios.

Although that patent can use its data security inspection module to call the security inspection sub-module matching a preset inspection mode and inspect the data with that sub-module to obtain inspection results, it provides no concrete method for inspecting the many different types of data files. An offender could therefore hide sensitive data among normal files and carry it out undetected.

Summary of the invention

The purpose of the present invention is to provide a data security inspection method and system that solves the technical problem described above.

The object of the present invention is achieved through the following technical solution:

A data security inspection method, comprising the following steps:

Step 1. The user uploads the data to be security-scanned through the communication module and submits a task through the processing module;

Step 2. On receiving the task request and the attached data, the system saves the data to the file system, creates a task entry in the task database through the processing module, and writes the information of the new task into it, including the storage path of the files to be scanned; the scanning module then inspects the data;

Step 3. The scanning program of the scanning module polls the database at regular intervals. When it finds a new task, it reads the task attributes to obtain the path of the data to be scanned and scans the data at that path. After scanning, it writes the scan results to an inspection report file saved at a designated path and updates the state of the task entry in the task database to completed;

Step 4. The next time the user requests the task status, the system queries the task entry in the database through the query module. If the task is completed, the scan results are read from the file system at the designated path.
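The task lifecycle in steps 1 to 4 can be sketched with an SQL task table. This is a minimal sketch assuming SQLite as the task database; the table name `tasks`, its columns and the helper functions are hypothetical, since the patent only states that each entry records the scan path and a task state.

```python
import sqlite3
import uuid

# Hypothetical schema: the patent names only "task state" and the storage
# path of the files to be scanned; column names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id     TEXT PRIMARY KEY,
    scan_path   TEXT NOT NULL,
    report_path TEXT,
    state       TEXT NOT NULL DEFAULT 'pending'
)
"""


def open_task_db(path=":memory:"):
    """Open (or create) the task database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def submit_task(conn, scan_path):
    """Step 2: record a new task that points at the uploaded data."""
    task_id = uuid.uuid4().hex
    with conn:  # commits on success
        conn.execute(
            "INSERT INTO tasks (task_id, scan_path) VALUES (?, ?)",
            (task_id, scan_path),
        )
    return task_id


def complete_task(conn, task_id, report_path):
    """End of step 3: store the report location and mark the task completed."""
    with conn:
        conn.execute(
            "UPDATE tasks SET state = 'completed', report_path = ? WHERE task_id = ?",
            (report_path, task_id),
        )


def query_task(conn, task_id):
    """Step 4: return the report path if the task is completed, else None."""
    row = conn.execute(
        "SELECT state, report_path FROM tasks WHERE task_id = ?", (task_id,)
    ).fetchone()
    if row is None:
        raise KeyError(task_id)
    state, report_path = row
    return report_path if state == "completed" else None
```

The caller then reads the report file at the returned path; returning `None` for an unfinished task lets the front end simply ask again later, as step 4 describes.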

With the above technical solution, when data security needs to be checked, the processing module creates a task entry in the task database and writes the information of the new task, including the storage path of the files to be scanned; the scanning module then scans the data and, once scanning is complete, saves the scan results as an inspection report file at a designated path. The user can query the task entry in the database through the query module and, if the task is completed, read the scan results from the file system at the designated path.

As a further feature of the invention, step 3 specifically comprises the following steps:

Step 31. The scanning program obtains the scan path and traverses all files under that path and all of its subdirectories;

Step 32. Each file undergoes file type filtering, file type verification and an encryption check. A file that fails any check is reported directly as suspicious; a file that passes proceeds to the next check;

Step 33. Office documents undergo an Office document structure check followed by a text keyword check. A document that fails is reported as suspicious; a document that passes proceeds to the next check;

Step 34. Noise is added to image files and geometric transformations are applied to them;

Step 35. Archive files are decompressed, and the process returns to the first step (Step 31) to scan every file inside the archive;

Step 36. Once every file under the scan path has been enumerated, the scan ends. The report generator collects all inspection results, compiles them into a report file and saves it to the designated path, which completes the task.
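Steps 31 and 36 amount to a recursive directory walk followed by report generation. The sketch below assumes a hypothetical `check_file(path)` callback standing in for steps 32 to 35, and assumes a JSON report format; the patent does not specify the report's layout.

```python
import json
import os


def scan_tree(root, check_file):
    """Step 31: visit every file under root and all of its subdirectories.

    check_file(path) is a hypothetical placeholder for steps 32-35; it must
    return None for a clean file or a short reason string for a suspicious one.
    """
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            reason = check_file(path)
            if reason is not None:
                findings.append({"file": os.path.relpath(path, root), "reason": reason})
    return findings


def write_report(findings, report_path):
    """Step 36: collect all results into one report file at the given path."""
    report = {"suspicious_count": len(findings), "findings": findings}
    with open(report_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, ensure_ascii=False, indent=2)
    return report
```

Running `scan_tree` again over the extracted contents of an archive gives the loop back to step 31 that step 35 describes.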

With the above technical solution, the scanning program applies file type filtering, file type verification and an encryption check to each file and reports any file that fails directly as suspicious; it adds noise to and transforms image files; and it decompresses archive files and checks the extracted files in the same way, reporting failures as suspicious and passing clean files on to the next check.

As a further feature of the invention, the scanning program continuously polls the task database, retrieves pending tasks and executes them in order. When all tasks have been executed, polling continues so that new tasks are picked up as they appear.
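A polling scanner of this kind could look like the following sketch. It assumes a SQLite task table whose `state` column uses the value `'pending'` for unprocessed tasks (both hypothetical) and a caller-supplied `run_scan` function that returns the report path; `max_rounds` exists only so the loop can be stopped in tests.

```python
import sqlite3
import time


def open_queue(path=":memory:"):
    """Create a hypothetical task table for the polling loop."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " scan_path TEXT NOT NULL,"
        " report_path TEXT,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def poll_tasks(conn, run_scan, interval=5.0, max_rounds=None):
    """Poll for pending tasks, run them in insertion order, keep polling."""
    rounds = 0
    while True:
        rows = conn.execute(
            "SELECT rowid, scan_path FROM tasks WHERE state = 'pending' ORDER BY rowid"
        ).fetchall()
        for rowid, scan_path in rows:
            report_path = run_scan(scan_path)
            with conn:  # mark the task completed and commit
                conn.execute(
                    "UPDATE tasks SET state = 'completed', report_path = ? WHERE rowid = ?",
                    (report_path, rowid),
                )
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            return
        if not rows:
            time.sleep(interval)  # queue empty: wait, then poll again
```

Polling keeps the scanner decoupled from the upload front end at the cost of up to `interval` seconds of latency before a new task is noticed.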

As a further feature of the invention, step 32 specifically comprises:

The file type filtering method is:

First, read the file's extension;

Second, compare the extension with a preset whitelist;

Finally, pass only file types on the whitelist and report all other types as suspicious files;
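The whitelist comparison reduces to a set lookup on the normalized extension. The contents of `ALLOWED_EXTENSIONS` below are an illustrative assumption; the patent says only that the whitelist is set in advance.

```python
import os

# Hypothetical whitelist; the real list is configured in advance by the operator.
ALLOWED_EXTENSIONS = {
    ".txt", ".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpg", ".zip", ".rar",
}


def filter_by_extension(path, allowed=ALLOWED_EXTENSIONS):
    """Return None if the extension is whitelisted, else a reason string."""
    ext = os.path.splitext(path)[1].lower()  # case-insensitive comparison
    if ext in allowed:
        return None
    return f"extension {ext or '(none)'} is not in the whitelist"
```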

The file type verification method is:

First, read the file's extension;

Second, check the file's structure against the format declared by its extension;

Finally, report the file as suspicious if it does not conform to that format;
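One common way to check a file's structure against its declared format is to compare its leading bytes (its "magic number") with the signature of that format. The sketch below uses that technique as a first-level structural check and additionally requires OOXML packages to be ZIP files containing `[Content_Types].xml`; the signature table covers only a few types and is an assumption, not the patent's rule set.

```python
import os
import zipfile

# Leading-byte signatures for a few formats (illustrative subset).
MAGIC = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF-",),
    ".zip": (b"PK\x03\x04",),
    ".rar": (b"Rar!\x1a\x07",),
}
OOXML = {".docx", ".xlsx", ".pptx"}


def verify_type(path):
    """Return None if the content matches the declared format, else a reason."""
    ext = os.path.splitext(path)[1].lower()
    if ext in OOXML:
        # OOXML documents are ZIP packages that must declare their content types.
        if not zipfile.is_zipfile(path):
            return f"{ext} file is not a ZIP package"
        with zipfile.ZipFile(path) as zf:
            if "[Content_Types].xml" not in zf.namelist():
                return f"{ext} package lacks [Content_Types].xml"
        return None
    signatures = MAGIC.get(ext)
    if signatures is None:
        return None  # no rule for this type; the whitelist step decides
    with open(path, "rb") as fh:
        head = fh.read(16)
    if not head.startswith(signatures):
        return f"content does not match the {ext} format"
    return None
```

A renamed executable such as `photo.png` starting with `MZ` fails this check even though it passes the extension whitelist, which is the case the patent targets.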

The file encryption check method is: the system treats every password-protected file as suspicious. It scans the file types that support encryption, checks whether each file is encrypted, and reports encrypted files as suspicious.
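Whether a file is encrypted can often be read from format metadata. The heuristics below are assumptions chosen for illustration: bit 0 of a ZIP entry's general-purpose flag marks encryption, password-protected OOXML documents are stored as OLE2 compound files rather than ZIP packages, and an encrypted PDF carries an `/Encrypt` entry, which the sketch only approximates with a byte search.

```python
import os
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # compound file header


def is_encrypted(path):
    """Heuristic encryption check for a few common formats (sketch only)."""
    ext = os.path.splitext(path)[1].lower()
    with open(path, "rb") as fh:
        data = fh.read()  # whole file read for simplicity; fine for a sketch
    if ext in {".docx", ".xlsx", ".pptx"}:
        # Encrypted OOXML is wrapped in an OLE2 container instead of a ZIP.
        return data.startswith(OLE2_MAGIC)
    if ext == ".zip" and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            # Bit 0 of the general-purpose flag marks an encrypted entry.
            return any(info.flag_bits & 0x1 for info in zf.infolist())
    if ext == ".pdf":
        # Rough stand-in for parsing the trailer's /Encrypt entry.
        return b"/Encrypt" in data
    return False
```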

With the above technical solution, file type filtering, file type verification and the encryption check defeat the trick of hiding a file by changing its extension: checking the extension exposes files whose content does not match the format it declares, and every password-protected file is detected and reported as suspicious.

As a further feature of the invention, step 33 specifically comprises:

The Office document structure check method is: check the file structure of each Office document against Microsoft's format definitions for the various Office document types, look for commonly used data hiding techniques, and report files that contain hidden data;
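Two places where data can be hidden in an OOXML package (.docx, .xlsx, .pptx) are parts that the package's `[Content_Types].xml` does not declare and bytes stored in the ZIP comment or appended after the archive's end-of-central-directory record. The sketch below checks only these; which "commonly used data hiding methods" the patent actually covers is not stated, so the selection is an assumption.

```python
import struct
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def check_ooxml_structure(path):
    """Return a list of structural anomalies found in an OOXML package."""
    problems = []
    with open(path, "rb") as fh:
        raw = fh.read()
    # 1. Data after the 22-byte end-of-central-directory record, or in its comment.
    eocd = raw.rfind(b"PK\x05\x06")
    if eocd < 0 or eocd + 22 > len(raw):
        return ["no ZIP end-of-central-directory record"]
    (comment_len,) = struct.unpack("<H", raw[eocd + 20:eocd + 22])
    if eocd + 22 + comment_len < len(raw):
        problems.append("data appended after the ZIP archive")
    if comment_len:
        problems.append("non-empty ZIP comment")
    # 2. Parts whose type [Content_Types].xml does not declare.
    with zipfile.ZipFile(path) as zf:
        names = zf.namelist()
        if "[Content_Types].xml" not in names:
            return problems + ["missing [Content_Types].xml"]
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    defaults = {e.get("Extension", "").lower() for e in root.iter(CT_NS + "Default")}
    overrides = {e.get("PartName", "") for e in root.iter(CT_NS + "Override")}
    for name in names:
        if name.endswith("/") or name == "[Content_Types].xml":
            continue
        base = name.rsplit("/", 1)[-1]
        ext = base.rsplit(".", 1)[1].lower() if "." in base else ""
        if ext not in defaults and "/" + name not in overrides:
            problems.append(f"undeclared part: {name}")
    return problems
```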

The text keyword check method is:

First, scan every byte of the file;

Then, compare the scanned bytes with the predefined comparison keywords;

Finally, if any comparison keyword occurs, report the file as containing suspicious text.
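A byte-level keyword search has to decide how the keywords themselves are encoded. The sketch assumes UTF-8, GBK and UTF-16-LE as candidate encodings; note that text inside OOXML documents is deflate-compressed, so such files would have to be unpacked before a raw byte scan can find anything.

```python
def find_keywords(data, keywords, encodings=("utf-8", "gbk", "utf-16-le")):
    """Return the sorted keywords whose encoded bytes occur anywhere in data."""
    hits = set()
    for kw in keywords:
        for enc in encodings:
            try:
                pattern = kw.encode(enc)
            except UnicodeEncodeError:
                continue  # keyword not representable in this encoding
            if pattern in data:  # plain substring search over raw bytes
                hits.add(kw)
                break
    return sorted(hits)
```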

As a further feature of the invention, the comparison keywords are defined as follows:

First, randomly select a number of sensitive documents as an experimental data set, process the data set with algorithmic tools to extract candidate keywords, and collect the candidates into a keyword vocabulary;

Then, compute a weight for each candidate keyword in the vocabulary, rank the candidates by weight and select the five with the highest weight;

Finally, discard the remaining candidates; the five selected keywords are the comparison keywords.
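The patent does not name the weighting scheme. The sketch below assumes a TF-IDF-style weight in which term frequency is measured over the sensitive sample and inverse document frequency over a background corpus of ordinary documents, so that words common everywhere (such as "the") rank low; tokenization (for Chinese, word segmentation) is assumed to have been done already.

```python
import math
from collections import Counter


def select_keywords(sensitive_docs, background_docs, top_k=5):
    """Rank candidate words from the sensitive sample and keep the top_k.

    Both arguments are lists of token lists; the background corpus is an
    assumption used only to compute inverse document frequency.
    """
    tf = Counter(tok for doc in sensitive_docs for tok in doc)
    total = sum(tf.values())
    n_bg = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))  # document frequency, not raw counts

    def weight(word):
        idf = math.log((1 + n_bg) / (1 + df[word])) + 1.0  # smoothed IDF
        return (tf[word] / total) * idf

    # Highest weight first; ties broken alphabetically for a stable result.
    ranked = sorted(tf, key=lambda w: (-weight(w), w))
    return ranked[:top_k]
```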

With the above technical solution, Office documents undergo a structure check followed by a text keyword check against the predefined keywords, which prevents suspicious text from passing undetected.

As a further feature of the invention, step 34 specifically comprises: applying an active attack to the image file to destroy any hidden data it may contain, namely adding noise to the image to destroy possible LSB steganographic data and then applying a slight rotation and scaling to destroy possible DCT steganographic data.
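The two-stage cleaning can be illustrated on a grayscale image held as nested lists. This pure-Python sketch assumes ±1 noise per pixel (enough to scramble least-significant bits) and nearest-neighbour resampling for a 0.5° rotation with 1% scaling; these parameters are assumptions, and a production system would more likely use an imaging library with interpolation and re-encode JPEG output, which perturbs DCT coefficients more strongly than nearest-neighbour sampling.

```python
import math
import random


def clean_image(pixels, angle_deg=0.5, scale=1.01, seed=None):
    """Return a cleaned copy of a grayscale image (rows of 0-255 ints)."""
    rng = random.Random(seed)
    h, w = len(pixels), len(pixels[0])
    # Stage 1: perturb every pixel by -1, 0 or +1 to scramble LSB payloads.
    noisy = [[min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in row] for row in pixels]
    # Stage 2: slight rotation and scaling about the centre, computed by
    # inverse mapping each output pixel to its nearest source pixel.
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = (x - cx) / scale, (y - cy) / scale
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix = min(w - 1, max(0, round(sx)))  # clamp to the image border
            iy = min(h - 1, max(0, round(sy)))
            out[y][x] = noisy[iy][ix]
    return out
```

The input list is never modified, which mirrors the patent's statement that the cleaning step leaves the original image file untouched.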

With the above technical solution, the active attack destroys any hidden data the image file may contain without affecting the original image file.

As a further feature of the invention, in step 35 the system uses the Zlib development package to compress and decompress Zip files, rar.exe to compress RAR archives, and unrar.dll to decompress RAR archives.
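In Python, the standard `zipfile` module (which relies on zlib for deflate) can play the role of the Zlib package for ZIP archives; the standard library has no RAR support, so RAR handling is left out here. The sketch extracts each archive next to itself, repeats for nested archives up to an assumed depth limit, and rejects archives whose entry names would escape the extraction directory. The `_extracted` suffix and the `max_depth` value are hypothetical choices.

```python
import os
import zipfile


def safe_extract(zip_path, target):
    """Extract zip_path into target, refusing entries that escape target."""
    try:
        with zipfile.ZipFile(zip_path) as zf:
            base = os.path.realpath(target)
            for member in zf.namelist():
                dest = os.path.realpath(os.path.join(base, member))
                if os.path.commonpath([base, dest]) != base:
                    return False  # "../" or absolute-path traversal attempt
            zf.extractall(base)
        return True
    except (zipfile.BadZipFile, RuntimeError):  # corrupt or password-protected
        return False


def expand_archives(root, max_depth=3):
    """Expand .zip files under root, including nested ones, up to max_depth rounds.

    Returns the archives that could not be expanded safely.
    """
    rejected, seen = [], set()
    for _ in range(max_depth):
        pending = [
            os.path.join(dirpath, name)
            for dirpath, _dirs, files in os.walk(root)
            for name in files
            if name.lower().endswith(".zip") and os.path.join(dirpath, name) not in seen
        ]
        if not pending:
            break
        for path in pending:
            seen.add(path)
            if not safe_extract(path, path + "_extracted"):
                rejected.append(path)
    return rejected
```

After expansion, the extracted directories are scanned like any other files, which corresponds to returning to the start of the scan in step 35.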

With the above technical solution, archive files are decompressed before scanning, which prevents suspicious files from being hidden inside archives.

A data security inspection system, comprising a communication module, a processing module and a scanning module;

Communication module: uploads the files to be security-scanned and sends task requests to the system;

Processing module: creates scanning task entries and writes the information of each new task;

Scanning module: scans and inspects the data; it is the core module of the system.

Beneficial effects of the present invention:

1. The processing module creates a task entry in the task database and writes the information of the new task, including the storage path of the files to be scanned; the scanning module then scans the data and, once scanning is complete, saves the scan results as an inspection report file at a designated path. The user queries the task entry through the query module and, if the task is completed, reads the scan results from the file system at the designated path. The data is thus scanned, the results are available quickly, and the user can take appropriate measures promptly.

2. The scanning program applies file type filtering, file type verification and an encryption check to each file and reports any file that fails directly as suspicious; it adds noise to and transforms image files; and it decompresses archive files and checks the extracted files. Checking file names and file formats prevents extra files from being embedded.

3. Office documents undergo a structure check followed by a text keyword check against predefined keywords, which prevents additional messages from being embedded in the text.

4. The active attack destroys any hidden data an image file may contain without affecting the original image file, preventing additional information from being embedded in image files.

5. Archive files are decompressed and their contents scanned, which prevents suspicious files from being hidden inside archives.

Description of the drawings

The present invention is further described below with reference to the accompanying drawing.

Figure 1 is a partial flow diagram of the data security inspection method provided by the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

The specific working method of step 3 comprises the following steps:

The specific method of step 32 comprises:

The file type filtering method is:

First, read the file's extension;

Finally, pass only file types on the whitelist and report all other types as suspicious files.

The file type verification method is:

First, read the file's extension;

Finally, report the file as suspicious if it does not conform to the format declared by its extension.

The specific method of step 33 comprises:

The text keyword check method is:

First, scan every byte of the file;

The comparison keywords are defined as follows:

An active attack is applied to the image file to destroy any hidden data it may contain: noise is added to destroy possible LSB steganographic data, and a slight rotation and scaling are then applied to destroy possible DCT steganographic data. This step is a "cleaning" operation: it produces no "suspicious" inspection results and does not affect the original image file.

The system uses the Zlib development package to compress and decompress Zip files, rar.exe to compress RAR archives, and unrar.dll to decompress RAR archives.

The scanning program continuously polls the task database, retrieves pending tasks and executes them in order. When all tasks have been executed, polling continues so that new tasks are picked up as they appear.

With the above technical solution, when data security needs to be checked, the processing module creates a task entry in the task database and writes the information of the new task, including the storage path of the files to be scanned; the scanning module then scans the data and, once scanning is complete, saves the scan results as an inspection report file at a designated path. The user can query the task entry in the database through the query module and, if the task is completed, read the scan results from the file system at the designated path.

A data security inspection system, characterized in that it comprises a communication module, a processing module and a scanning module;

Scanning module: scans and inspects files; it is the core module of the system.

One embodiment of the present invention has been described in detail above, but it is only a preferred embodiment and should not be regarded as limiting the scope of the invention. All equivalent changes and improvements made within the scope of this application shall remain within the scope of this patent.

Claims

1. A data security inspection method, characterized in that the method includes the following steps:

Step 1. The user uploads the data that needs to be security scanned through the communication module and submits the task through the processing module;

Step 2. The system receives the task request and the attached data to be scanned, saves the data to the file system, creates a task entry in the task database through the processing module, and writes the information of the new task, including the storage path of the files to be scanned; the scanning module then inspects the data;

Step 3. The scanning program of the scanning module polls the database periodically. If a new task is found, it reads the task attributes to obtain the path of the data to be scanned and scans the data at that path; after scanning, it saves the scan results as an inspection report file at a designated path and updates the state of the current task entry in the task database to completed;

Step 4. When the user requests the task status next time, the system queries the task entries in the database through the query module. If the task status has been completed, the scan results are read from the file system according to the specified path;

The specific working method of step 3 includes the following steps:

Step 31. First, the scanning program obtains the scan path and begins to traverse all files under the path and all sub-paths;

Step 32: Perform file type filtering, file type verification, and file encryption checks on the file. If the check fails, it is directly reported as a suspicious file. If the check is normal, proceed to the next step of the check;

Step 33. Perform Office document structure inspection and text keyword inspection on the Office document in sequence. If the inspection fails, it is reported as a suspicious file. If the inspection is normal, proceed to the next step of inspection;

Step 34. Add noise to image files and apply geometric transformations to them;

Step 35. For archive files, decompress the files they contain, return to the first step (Step 31), and scan all files in the archive;

Step 36. When all files under the scan path have been enumerated, the scan ends; the report generator collects all inspection results, compiles them into a report file and saves it to the designated path, at which point task execution ends.

2. The data security inspection method according to claim 1, characterized in that the scanning program continuously polls the task database, retrieves the pending tasks and executes them in order; once all tasks are completed, polling continues in order to wait for new tasks.

3. The data security inspection method according to claim 1, characterized in that the specific method of step 32 includes:

The file type filtering method is:

First, scan the file extension;

Secondly, compare the scanned suffix name with the whitelist set in advance;

Finally, only the file types in the whitelist can pass, and other types are reported as suspicious files;

The method of file type verification is:

First, scan the file extension;

Secondly, check the file structure according to the format declared by the file extension name;

Finally, if it does not conform to the format identified by the suffix, it is reported as suspicious;

The file encryption check method is: the system reports all password-protected files as suspicious files, scans file types that support encryption, checks whether they are encrypted, and reports encrypted files as suspicious.

4. The data security inspection method according to claim 1, characterized in that,

The specific method of step 33 includes:

The method for checking the Office document structure is: check the file structure of the Office document according to Microsoft's definition of various Office documents, check some commonly used data hiding methods, and report files with hidden data;

The method of checking text keywords is:

First, scan every byte of the file;

Then, compare the scanned bytes with the comparison keywords defined in advance;

Finally, if a comparison keyword appears, it is reported as suspicious text.

5. The data security inspection method according to claim 4, characterized in that,

The definition method of the comparison keyword is:

First, multiple sensitive documents are randomly selected as the experimental data set, algorithm tools are used to process the experimental data set, candidate keywords are selected, and the obtained keywords are formed into a keyword vocabulary;

Then, the weight is calculated for each candidate keyword in the keyword vocabulary, and the top five candidate keywords with the largest weight are selected according to the ranking;

Finally, other candidate keywords are deleted, and the five filtered candidate keywords are the comparison keywords.

6. The data security inspection method according to claim 1, characterized in that the specific method of step 34 is: applying an active attack to the image file to destroy hidden data that may exist therein, namely adding noise to the image file to destroy possible LSB steganographic data and then applying slight rotation and scaling transformations to destroy possible DCT steganographic data.

7. The data security inspection method according to claim 1, characterized in that in step 35 the system uses the Zlib development package to compress and decompress Zip files, rar.exe to compress RAR files, and unrar.dll to decompress RAR files.

8. A data security inspection system using the method according to any one of claims 1 to 7, characterized in that it includes: a communication module, a processing module and a scanning module;

Communication module: used to upload files that need to be security scanned and send task requirements to the system;

Processing module: used to create scanning task entries and write various information about the new task;

Scanning module: Responsible for scanning detection data, it is the core module of this system.