CN117634501B - A computer file confidentiality checking method and system

Info

Publication number: CN117634501B
Application number: CN202410089364.5A
Authority: CN
Inventors: 路成刚; 胡现龙
Original assignee: Qingdao University of Technology
Current assignee: Qingdao University of Technology
Priority date: 2024-01-23
Filing date: 2024-01-23
Publication date: 2024-06-04
Anticipated expiration: 2044-01-23
Also published as: CN117634501A

Abstract

The invention relates to the technical field of code security measures, and in particular to a computer file confidentiality checking method and system. The method comprises the following steps: based on deep learning, a transformer algorithm and a convolutional neural network are used to perform deep semantic analysis and key feature extraction on text and image content, and the feature data are integrated to generate text and image feature data. By combining deep learning, the transformer algorithm, and image processing, the invention achieves deep semantic analysis and key feature extraction for text and images. Combining natural language processing with an image recognition algorithm improves the accuracy with which confidential information in files is identified and reduces the false alarm rate and the miss rate. A simulation mechanism based on a generative adversarial network strengthens early warning of potential security vulnerabilities and attack behaviors in the system. Data flow analysis improves the monitoring and prevention of data leakage, and file integrity verification combined with blockchain technology improves file security and tamper resistance.

Description

A computer file confidentiality checking method and system

Technical Field

The present invention relates to the technical field of code security measures, and in particular to a computer file confidentiality checking method and system.

Background Art

The field of code security measures is an important branch of information security that focuses on ensuring the security of software, applications, and computer systems. It covers a range of measures and technologies that prevent potential threat actors (including hackers, malware, and insider threats) from gaining unauthorized access to, information from, or control over code, applications, or systems. The field spans authentication, authorization, encryption, vulnerability analysis, auditing, code analysis, and file confidentiality.

Computer file confidentiality checking methods belong to the field of code security measures. They are a set of techniques and procedures for checking and protecting computer files, including application source code, configuration files, and data files. Their main task is to ensure that these files are not accessed, viewed, copied, or modified by unauthorized users or systems. Their purposes include protecting sensitive information and intellectual property, reducing legal and business risk, and ensuring compliance. These methods work by implementing access control, code auditing, encryption, secure development practices, and monitoring and alerting, thereby preserving the confidentiality of computer files, reducing potential threats and risks, and protecting information security and compliance. This is particularly important given today's growing cyber threats and regulatory requirements.

Most existing computer file confidentiality checking methods are based on traditional text and image analysis techniques, which are limited in extracting complex, deep semantic information and therefore produce many false positives or omissions when identifying confidential information. In addition, many methods do not use generative adversarial networks to simulate system vulnerabilities, so potential threats are not identified and responded to quickly enough. Data flow analysis in many existing methods also remains at an early stage: the full chain of data flow is not comprehensively monitored, which increases the risk of data leakage. Finally, file integrity verification usually does not use a distributed technology such as blockchain and is therefore exposed to threats against a centralized server, which reduces the reliability of integrity verification.

Summary of the Invention

The purpose of the present invention is to overcome the shortcomings of the prior art by proposing a computer file confidentiality checking method and system.

To achieve the above purpose, the present invention adopts the following technical solution: a computer file confidentiality checking method, comprising the following steps:

S1: Based on deep learning, a transformer algorithm and a convolutional neural network are used to perform deep semantic analysis and key feature extraction on text and image content, and the feature data are integrated to generate text and image feature data;

S2: Based on the text and image feature data, natural language processing technology and an image recognition algorithm are used to identify confidential information in the text and images of the file, and the results are classified to generate a confidential information report;

S3: Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and attack behaviors and to strengthen the early warning mechanism of the system, and the vulnerability data are integrated to generate security vulnerability simulation data;

S4: Based on the security vulnerability simulation data, a data flow analysis algorithm is used to dynamically monitor and statically analyze file operations and system behaviors and to find potential data leakage points, and the data flows are integrated to generate a data flow analysis report;

S5: Based on the original data of the file, the SHA-256 hash algorithm is used to perform integrity verification on the file, and the resulting hash value is compared with the hash value stored on the blockchain to determine the integrity of the file and generate an integrity verification record;

S6: In combination with the integrity verification record, role-based access control and dynamic encryption technology are used to apply access control and encryption to the file, and access rights are set to generate an access control and encryption policy;

The transformer algorithm is specifically a model of the BERT or GPT series and is used for semantic understanding of text content. The convolutional neural network is used to extract key features from images. The natural language processing technology is used to identify text-based confidential information, including private API keys and passwords. The image recognition algorithm is used to identify confidential content in images. The data flow analysis algorithm specifically tracks the flow and storage paths of confidential data in the system and identifies attempts at illegal access or tampering. The dynamic encryption technology includes encrypting files with AES and managing the AES keys with RSA.

As a further solution of the present invention, the step of using a transformer algorithm and a convolutional neural network, based on deep learning, to perform deep semantic analysis and key feature extraction on text and image content and integrating the feature data to generate text and image feature data is specifically as follows:

S101: Based on a deep learning framework, the transformer algorithm is used to preprocess the text and convert it into an intermediate vector representation, and features are extracted to generate a text intermediate vector;

S102: Based on the text intermediate vector, a BERT model is used for deep learning training to extract the deep semantic features of the text and generate a text deep feature vector;

S103: A convolutional neural network is used to extract primary features from the image content and convert the image into a preliminary feature matrix, generating an image primary feature matrix;

S104: Based on the text deep feature vector and the image primary feature matrix, an autoencoder is used to perform feature fusion, integrating the key features of the text and the image to generate text and image feature data;

The text intermediate vector is specifically a vectorized representation of the text content. The text deep feature vector specifically refers to the deep learning feature representation of the original text. The image primary feature matrix is specifically a feature representation of the image content. The text and image feature data include a fused feature vector representation of the text and the image.

As a further solution of the present invention, the step of using natural language processing technology and an image recognition algorithm, based on the text and image feature data, to identify confidential information in the text and images of the file and classifying the results to generate a confidential information report is specifically as follows:

S201: Based on the text and image feature data, natural language processing technology is used to label potential risk information in the text content, generating a preliminary text confidentiality report;

S202: Based on the text and image feature data, an image recognition algorithm is used to label potential risk information in the image content, generating a preliminary image confidentiality report;

S203: Based on the preliminary text confidentiality report and the preliminary image confidentiality report, a clustering algorithm is used to group similar confidential information, generating confidential information classification data;

S204: Based on the confidential information classification data, statistical methods are used to integrate the information into an overview and details of the confidential information, generating a confidential information report;

The preliminary text confidentiality report specifically includes the location and content of potentially risky words, sentences, or paragraphs. The preliminary image confidentiality report specifically refers to the potentially risky regions or objects labeled in the image. The confidential information classification data specifically include confidential information classified by type, source, or importance.
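
As a rough illustration of the text branch of steps S201, S203, and S204, the sketch below labels potential secrets with their position and content, groups them by type, and counts each group. It is not the patented method: regular expressions stand in for the natural language processing model, grouping by a `type` field stands in for the clustering algorithm, and the pattern table and the function names `label_text`, `classify`, and `summarize` are assumptions introduced only for this example.

```python
import re
from collections import defaultdict

# Hypothetical patterns standing in for the NLP model described in S201.
PATTERNS = {
    "api_key": re.compile(
        r"\b(?:api[_-]?key|token)\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{16,})",
        re.IGNORECASE,
    ),
    "password": re.compile(
        r"\bpassword\s*[:=]\s*['\"]?([^\s'\"]+)",
        re.IGNORECASE,
    ),
}


def label_text(text):
    """Preliminary text report: type, position and content of each finding."""
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append({
                "type": kind,
                "start": match.start(1),
                "end": match.end(1),
                "content": match.group(1),
            })
    return sorted(findings, key=lambda f: f["start"])


def classify(findings):
    """Group findings by type (a simple stand-in for the clustering in S203)."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding["type"]].append(finding)
    return dict(groups)


def summarize(groups):
    """Overview part of the report: number of findings per category (S204)."""
    return {kind: len(items) for kind, items in groups.items()}
```

Calling `summarize(classify(label_text(text)))` yields the overview, while the grouped findings themselves carry the per-item details (location and content) that the preliminary report is said to contain.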

As a further solution of the present invention, the step of using a generative adversarial network, based on the confidential information report, to simulate potential security vulnerabilities and attack behaviors, strengthening the early warning mechanism of the system, and integrating the vulnerability data to generate security vulnerability simulation data is specifically as follows:

S301: Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and generate potential security vulnerability simulation scenarios;

S302: Based on the potential security vulnerability simulation scenarios, reinforcement learning is used to simulate attacker behavior and generate simulated attack behavior data;

S303: Based on the simulated attack behavior data, pattern recognition is used to identify weaknesses in the system and generate system vulnerability and weakness identification results;

S304: Based on the system vulnerability and weakness identification results, the data are integrated and the early warning mechanism is optimized, generating an optimized early warning mechanism and vulnerability assessment report;

The generative adversarial network specifically uses a generator and a discriminator to capture the data distribution and is used to simulate attack scenarios. The pattern recognition specifically uses support vector machine and decision tree algorithms to automatically identify and classify attack patterns. The optimized early warning mechanism and vulnerability assessment report include descriptions of the identified vulnerabilities, impact assessments, and recommended protective measures.

As a further solution of the present invention, the step of using a data flow analysis algorithm, based on the security vulnerability simulation data, to dynamically monitor and statically analyze file operations and system behaviors, finding potential data leakage points, and integrating the data flows to generate a data flow analysis report is specifically as follows:

S401: Based on the optimized early warning mechanism and vulnerability assessment report, the data flow analysis algorithm is used to dynamically monitor file operations in the system and generate dynamic file operation monitoring data;

S402: Based on the dynamic file operation monitoring data, the data flow analysis algorithm is further used to statically analyze system behavior and generate system behavior static analysis data;

S403: Based on the system behavior static analysis data, potential data leakage points are marked, generating potential data leakage point marks;

S404: Based on the potential data leakage point marks, the data flows are integrated and the data security risk of the overall system is evaluated, generating a data flow analysis report;

The data flow analysis algorithm specifically analyzes the behavior patterns of data flows inside the system. The system behavior static analysis data specifically refer to the result of analyzing, by static methods, how the system behaves without external input. The potential data leakage point marks specifically identify the risk areas in system behavior that could lead to information leakage. The data flow analysis report includes the overall distribution of data flows, the marked leakage points, and recommended remediation measures.
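
One simple way to picture the tracking of confidential data described above is label propagation over a log of file operations: anything copied or derived from a confidential file inherits its label, and an event that moves labeled data to an external destination is marked as a potential leakage point. The sketch below assumes a chronologically ordered event log; the `FileEvent` record, the `find_leak_points` function, and the notion of a fixed set of external sinks are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEvent:
    op: str   # e.g. "copy", "write", "send" (illustrative operation names)
    src: str  # source path
    dst: str  # destination path or remote endpoint


def find_leak_points(events, confidential, external_sinks):
    """Propagate a 'confidential' label along events and flag risky flows.

    events:         iterable of FileEvent in chronological order (assumption)
    confidential:   set of paths initially known to hold confidential data
    external_sinks: set of destinations considered to be outside the system
    """
    tainted = set(confidential)
    leaks = []
    for event in events:
        if event.src not in tainted:
            continue  # this event does not move confidential data
        if event.dst in external_sinks:
            leaks.append(event)      # potential data leakage point
        else:
            tainted.add(event.dst)   # the copy now carries the label too
    return leaks, tainted
```

The returned `leaks` list corresponds loosely to the potential data leakage point marks, and the final `tainted` set shows where confidential data ended up being stored.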

As a further solution of the present invention, the step of using the SHA-256 hash algorithm, based on the original data of the file, to perform integrity verification on the file and comparing the resulting hash value with the hash value stored on the blockchain to determine the integrity of the file and generate an integrity verification record is specifically as follows:

S501: Based on the file data, a binary reading method is used to extract the file content, and the content is formatted to generate an original data extraction report;

S502: Based on the original data extraction report, the SHA-256 hash algorithm is used to compute the hash of the file, and the hash value is formatted to generate a file hash value;

S503: Based on a blockchain network interface, a hash retrieval method is used to retrieve the record associated with the file hash value in preparation for comparison, generating a blockchain hash value;

S504: Based on the file hash value and the blockchain hash value, a hash comparison method is used to confirm the integrity of the file and complete the integrity verification, generating an integrity verification record;

The original data extraction report is specifically the byte stream of the file. The file hash value is specifically a string of 64 hexadecimal characters. The blockchain hash value is specifically the hash record of the file stored on the blockchain. The integrity verification record specifically indicates whether the file has been tampered with or damaged.
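
Steps S501, S502, and S504 can be sketched with Python's standard `hashlib` module. The blockchain lookup of S503 is not modeled here: a plain dictionary named `ledger` stands in for whatever blockchain network interface a real system would query, and the function names and the shape of the returned record are assumptions made for this example.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Read the file in binary mode and return its SHA-256 digest as hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # 64 hexadecimal characters


def check_integrity(path, file_id, ledger):
    """Compare the file's hash with the recorded one.

    ledger: dict mapping file_id -> expected hex digest; a stand-in for
            the hash record that the method stores on a blockchain.
    """
    actual = sha256_of_file(path)
    expected = ledger.get(file_id)
    return {
        "file_id": file_id,
        "file_hash": actual,
        "recorded_hash": expected,
        "intact": expected is not None and actual == expected,
    }
```

A record with `"intact": False` means either that the file differs from what was recorded or that no record exists for it; a fuller integrity verification record would probably distinguish the two cases.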

As a further solution of the present invention, the step of using role-based access control and dynamic encryption technology, in combination with the integrity verification record, to apply access control and encryption to the file and setting access rights to generate an access control and encryption policy is specifically as follows:

S601: Based on the integrity verification record, a role analysis method is used to define the roles for the file and set their permissions, generating a role definition and permission allocation table;

S602: Based on the role definition and permission allocation table, a role-based access control algorithm is used to allocate access rights and set access policies, generating a file access control policy;

S603: Based on the file access control policy, the AES dynamic encryption method is used to encrypt the file, and the encrypted data are integrated to generate encrypted file data;

S604: Based on the encrypted file data and the file access control policy, a policy integration method is used to formulate and confirm a secure storage policy for the file, generating an access control and encryption policy;

The role definition and permission allocation table includes role names, role descriptions, and access rights. The file access control policy specifically consists of the roles and the levels of their access rights. The encrypted file data specifically refer to the byte stream obtained by encrypting the original file data. The access control and encryption policy is specifically the complete policy and method for accessing and decrypting the file.
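
The role definition and permission allocation table of S601 and the access control policy of S602 can be sketched as plain data structures. The example below is a minimal illustration under assumed names (`Role`, `build_policy`, `is_allowed`, and the permission strings are all hypothetical). It covers only the role-based access control part; the AES encryption and RSA key management of S603 are left out because they need a cryptography library outside the Python standard library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    name: str               # role name
    description: str        # role description
    permissions: frozenset  # access rights, e.g. {"read", "decrypt"}


def build_policy(roles, assignments):
    """Build a per-user policy from the role table.

    roles:       list of Role objects (the role definition table)
    assignments: dict mapping user -> list of role names
    Returns a dict mapping user -> frozenset of permitted actions.
    """
    by_name = {role.name: role for role in roles}
    policy = {}
    for user, role_names in assignments.items():
        permitted = set()
        for role_name in role_names:
            permitted |= by_name[role_name].permissions
        policy[user] = frozenset(permitted)
    return policy


def is_allowed(policy, user, action):
    """Deny by default: unknown users have no permissions."""
    return action in policy.get(user, frozenset())
```

In a fuller sketch, `is_allowed(policy, user, "decrypt")` would gate the release of the AES key, so that only roles holding the decrypt permission can recover the file contents.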

A computer file confidentiality checking system for executing the above computer file confidentiality checking method, the system comprising a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module, and a security policy formulation module.

As a further solution of the present invention, the feature extraction and fusion module, based on a deep learning framework, uses the transformer algorithm and the BERT model to process text and generate a text deep feature vector, uses a convolutional neural network to extract image features, and fuses the image features with the text features through an autoencoder to generate text and image feature data;

The risk information labeling module, based on the generated text and image feature data, labels risk information through natural language processing and image recognition technology, groups and integrates the results using statistical methods, and generates a confidential information report;

The risk vulnerability simulation module, based on the generated confidential information report, uses a generative adversarial network to simulate potential risks, identifies system weaknesses through pattern recognition, optimizes the early warning mechanism, and generates an optimized early warning mechanism and vulnerability assessment report;

The system behavior analysis module, based on the optimized early warning mechanism and vulnerability assessment report, uses a data flow analysis algorithm to statically analyze system behavior, mark potential data leakage points, and evaluate security risks, and generates a data flow analysis report;

The file content extraction module uses a binary reading method to read and format the file data, and generates an original data extraction report;

The file integrity verification module, based on the generated original data extraction report, uses the SHA-256 hash algorithm to compute the file hash value, compares the file hash value with the blockchain hash value to confirm file integrity, and generates an integrity verification record;

The security policy formulation module, based on the integrity verification record, defines the roles for the file and sets their permissions, encrypts the file using the AES dynamic encryption method, formulates a secure storage policy for the file, and generates an access control and encryption policy.

As a further solution of the present invention, the feature extraction and fusion module includes a text processing submodule, an image feature extraction submodule, and a feature fusion submodule;

The risk information labeling module includes a risk information labeling submodule, a confidential information classification submodule, and an information integration submodule;

The risk vulnerability simulation module includes a security vulnerability simulation submodule, an attacker behavior simulation submodule, a system weakness identification submodule, and an early warning mechanism optimization submodule;

The system behavior analysis module includes a file operation monitoring submodule, a system behavior analysis submodule, a data leakage point marking submodule, and a data flow integration submodule;

The file content extraction module includes a data reading submodule and a data formatting submodule;

The file integrity verification module includes a hash calculation submodule, a hash retrieval submodule, and an integrity confirmation submodule;

The security policy formulation module includes a file role definition submodule, an access permission setting submodule, a file encryption submodule, and a policy confirmation submodule.

Compared with the prior art, the advantages and positive effects of the present invention are as follows:

In the present invention, deep learning technology, the transformer algorithm, and image processing technology make deep semantic analysis and key feature extraction for text and images more accurate. Combining natural language processing with image recognition algorithms makes the identification of confidential information in files more precise and reduces the false alarm rate and the miss rate. The simulation mechanism of the generative adversarial network provides stronger early warning of potential security vulnerabilities and attack behaviors in the system and improves the ability to identify and respond to external threats. The data flow analysis method makes the monitoring and prevention of data leakage more comprehensive. In addition, verifying file integrity in combination with blockchain technology improves the security and tamper resistance of files.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the workflow of the present invention;

FIG. 2 is a detailed flow chart of S1 of the present invention;

FIG. 3 is a detailed flow chart of S2 of the present invention;

FIG. 4 is a detailed flow chart of S3 of the present invention;

FIG. 5 is a detailed flow chart of S4 of the present invention;

FIG. 6 is a detailed flow chart of S5 of the present invention;

FIG. 7 is a detailed flow chart of S6 of the present invention;

FIG. 8 is a system flow chart of the present invention;

FIG. 9 is a schematic diagram of the system framework of the present invention.

Detailed Description

To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.

In the description of the present invention, it should be understood that the terms "length", "width", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inside", "outside", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only to facilitate and simplify the description of the present invention and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present invention. In addition, in the description of the present invention, "plurality" means two or more, unless otherwise clearly and specifically defined.

Embodiment

Referring to FIG. 1, the present invention provides a technical solution: a computer file confidentiality checking method, comprising the following steps:

S1: Based on deep learning, a transformer algorithm and a convolutional neural network are used to perform deep semantic analysis and key feature extraction on text and image content, and the feature data are integrated to generate text and image feature data;

S2: Based on the text and image feature data, natural language processing technology and an image recognition algorithm are used to identify confidential information in the text and images of the file, and the results are classified to generate a confidential information report;

S3: Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and attack behaviors and to strengthen the early warning mechanism of the system, and the vulnerability data are integrated to generate security vulnerability simulation data;

S4: Based on the security vulnerability simulation data, a data flow analysis algorithm is used to dynamically monitor and statically analyze file operations and system behaviors and to find potential data leakage points, and the data flows are integrated to generate a data flow analysis report;

S5: Based on the original data of the file, the SHA-256 hash algorithm is used to perform integrity verification on the file, and the resulting hash value is compared with the hash value stored on the blockchain to determine the integrity of the file and generate an integrity verification record;

S6: In combination with the integrity verification record, role-based access control and dynamic encryption technology are used to apply access control and encryption to the file, and access rights are set to generate an access control and encryption policy;

The transformer algorithm is specifically a model of the BERT or GPT series and is used for semantic understanding of text content. The convolutional neural network is used to extract key features from images. The natural language processing technology is used to identify text-based confidential information, including private API keys and passwords. The image recognition algorithm is used to identify confidential content in images. The data flow analysis algorithm specifically tracks the flow and storage paths of confidential data in the system and identifies attempts at illegal access or tampering. The dynamic encryption technology includes encrypting files with AES and managing the AES keys with RSA.

The deep-learning-based text and image analysis uses the transformer algorithm and a convolutional neural network for deep semantic analysis and key feature extraction, which allows confidential information in files to be identified effectively. Natural language processing technology and image recognition algorithms are then used to identify and classify confidential information in the text and images of the file and to generate a confidential information report, which helps sensitive information to be discovered and protected in time.

The generative adversarial network simulates potential security vulnerabilities and attack behaviors and strengthens the early warning mechanism of the system. Dynamic monitoring and static analysis of file operations and system behaviors locate potential data leakage points, and the data flows are integrated into a data flow analysis report, which helps security risks to be discovered and prevented in advance.

The SHA-256 hash algorithm is used to verify file integrity: the computed hash value is compared with the hash value on the blockchain to determine the integrity of the file, and an integrity verification record is generated. This helps ensure that the file has not been tampered with or damaged during transmission and storage.

基于角色的访问控制和动态加密技术的结合，可以实现对文件的访问控制和加密处理，并设定访问权限，生成访问控制和加密策略。这有助于保护文件的安全性和隐私性，防止未经授权的访问和泄露。The combination of role-based access control and dynamic encryption technology can achieve access control and encryption processing for files, set access rights, and generate access control and encryption policies. This helps to protect the security and privacy of files and prevent unauthorized access and leakage.

Please refer to Figure 2. The step of using a transformer algorithm and a convolutional neural network, based on deep learning, to perform deep semantic analysis and key feature extraction on text and image content, integrate the feature data, and generate text and image feature data is specifically as follows:

S102: Based on the text intermediate vector, the BERT model is used for deep learning training to extract the deep semantic features of the text and generate a text deep feature vector;

S103: A convolutional neural network is used to extract primary features from the image content, converting the image into a preliminary feature matrix and generating an image primary feature matrix;

S104: Based on the text deep feature vector and the image primary feature matrix, an autoencoder is used for feature fusion, integrating the key features of the text and the image and generating text and image feature data;

The text intermediate vector is specifically the vectorized expression of the text content. The text deep feature vector specifically refers to the deep learning feature representation of the original text. The image primary feature matrix is specifically the feature representation of the image content. The text and image feature data includes the fused feature vector representation of the text and the image.

In S101, a deep learning framework such as TensorFlow or PyTorch is used, and the raw text is converted into a vectorized representation as the first part of the transformer algorithm step, for example with Word2Vec or FastText (these are word-embedding models rather than transformer architectures).

Code example:

# Import the required libraries
from gensim.models import Word2Vec
import nltk
import numpy as np

# Tokenize and preprocess the text (word_tokenize needs NLTK's tokenizer data to be downloaded)
text = "Your input text here."
tokens = nltk.word_tokenize(text.lower())

# Train the Word2Vec model; gensim expects a list of tokenized sentences, not a flat token list
model = Word2Vec([tokens], vector_size=100, window=5, min_count=1, sg=0)

# Get the vector of one word, then a whole-text vector by averaging the word vectors
word_vector = model.wv[tokens[0]]
text_vector = np.mean([model.wv[token] for token in tokens], axis=0)

In S102, a pre-trained BERT model, loaded for example through Hugging Face's Transformers library, is used for deep learning on the text to extract deep semantic features.

Code example:

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()  # this example only runs inference; it does not fine-tune the model

text = "Your input text here."
inputs = tokenizer(text, return_tensors='pt')  # input_ids plus attention_mask
with torch.no_grad():
    outputs = model(**inputs)

# Get deep semantic features: mean-pool the last hidden states into one vector per text
text_embeddings = outputs.last_hidden_state.mean(dim=1)

In S103, a convolutional neural network (CNN) model is created with a deep learning framework, such as PyTorch or TensorFlow, to extract the primary features of the image.

Code example:

import torch
import torch.nn as nn
import torchvision.models as models
from torchvision import transforms
from PIL import Image

# Load the pre-trained CNN model
cnn_model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Drop the final average-pooling and fully connected layers to keep the feature maps
cnn_model = nn.Sequential(*list(cnn_model.children())[:-2])
cnn_model.eval()

# Image preprocessing and feature extraction
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = Image.open("your_image.jpg").convert("RGB")  # load the image
image = preprocess(image).unsqueeze(0)               # preprocess and add a batch dimension
with torch.no_grad():
    image_features = cnn_model(image)                # extract primary features

In S104, an autoencoder model is created, and the deep semantic features of the text and the primary features of the image are fed into it to generate the fused text and image feature data.

Code example:

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, text_feature_size, image_feature_size, latent_size):
        super().__init__()
        self.text_encoder = nn.Linear(text_feature_size, latent_size)
        self.image_encoder = nn.Linear(image_feature_size, latent_size)
        # The decoder input is the concatenation of both latent codes
        self.decoder = nn.Linear(2 * latent_size, text_feature_size + image_feature_size)

    def encode(self, text_features, image_features):
        text_latent = self.text_encoder(text_features)
        image_latent = self.image_encoder(image_features)
        return torch.cat((text_latent, image_latent), dim=1)  # fused feature vector

    def forward(self, text_features, image_features):
        combined_latent = self.encode(text_features, image_features)
        reconstructed_features = self.decoder(combined_latent)
        return reconstructed_features

# Placeholder sizes and inputs (not specified in the source); both inputs are 2-D (batch, features)
text_feature_size, image_feature_size, latent_size = 768, 2048, 128
learning_rate, num_epochs = 1e-3, 10
text_features = torch.randn(4, text_feature_size)    # stands in for the S102 text vectors
image_features = torch.randn(4, image_feature_size)  # stands in for S103 feature maps pooled to vectors

# Initialize and train the autoencoder
autoencoder = AutoEncoder(text_feature_size, image_feature_size, latent_size)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=learning_rate)

# Train the autoencoder to reconstruct the concatenated input features
for epoch in range(num_epochs):
    optimizer.zero_grad()
    outputs = autoencoder(text_features, image_features)
    loss = loss_fn(outputs, torch.cat((text_features, image_features), dim=1))
    loss.backward()
    optimizer.step()

# The fused text and image feature data is the concatenated latent code
fused_features = autoencoder.encode(text_features, image_features).detach()

Please refer to Figure 3. The step of using natural language processing technology and an image recognition algorithm, based on the text and image feature data, to judge and classify the confidential information in the text and images of the file and generate a confidential information report is specifically as follows:

S201: Based on the text and image feature data, natural language processing technology is used to annotate potential risk information in the text content and generate a preliminary text confidentiality report;

S202: Based on the text and image feature data, an image recognition algorithm is used to annotate potential risk information in the image content and generate a preliminary image confidentiality report;

S203: Based on the preliminary text confidentiality report and the preliminary image confidentiality report, a clustering algorithm is used to group similar confidential information and generate confidential information classification data;

S204: Based on the confidential information classification data, statistical methods are used to integrate the information, form an overview and details of the confidential information, and generate a confidential information report;

The preliminary text confidentiality report specifically includes the location and content of potentially risky words, sentences, or paragraphs. The preliminary image confidentiality report specifically refers to the potential risk areas or objects marked in the image. The confidential information classification data specifically includes confidential information classified by type, source, or importance.

Based on the text and image feature data, natural language processing technology is used to annotate potential risk information in the text content. Text analysis algorithms identify potentially sensitive words, phrases, or sentences and mark them as potential risk information. The location and content of each item in the text are recorded as well.
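The annotation step can be illustrated with a minimal sketch in which simple regular-expression rules stand in for the natural language processing model. The rule names and expressions in `RISK_PATTERNS` and the `Finding` record are assumptions made for illustration, not rules taken from the method; a real system would more likely combine such rules with a trained classifier.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: each name maps to a pattern for one kind of secret.
RISK_PATTERNS = {
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*(\S+)"),
    "api_key_assignment": re.compile(r"(?i)\bapi[_-]?key\s*[:=]\s*(\S+)"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

@dataclass
class Finding:
    kind: str      # which rule matched
    start: int     # character offset where the match begins
    end: int       # character offset just past the match
    content: str   # the matched text itself

def annotate_text(text: str) -> list:
    """Return every rule match with its position and content, ordered by position."""
    findings = []
    for kind, pattern in RISK_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(kind, match.start(), match.end(), match.group(0)))
    return sorted(findings, key=lambda f: f.start)

sample = "db host=10.0.0.5\npassword = hunter2\nAPI_KEY: abc123XYZ\n"
findings = annotate_text(sample)
for f in findings:
    print(f.kind, f.start, f.end, repr(f.content))
```

Recording offsets alongside the matched text is what lets a later report point back to the exact place in the file.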

Next, based on the text and image feature data, image recognition algorithms are used to annotate potential risk information in the image content. Image recognition technology detects specific areas or objects in the image and marks them as potential risk information. For example, it can detect whether an image shows a device or document that is prohibited from being photographed.

After the preliminary text confidentiality report and the preliminary image confidentiality report have been generated, the potential risk information in them needs to be summarized and classified. To this end, a clustering algorithm groups similar confidential information to form the confidential information classification data. The groups can be divided by criteria such as type, source, or importance.
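The text does not name a particular clustering algorithm. As one possible illustration, the sketch below uses a greedy single-pass grouping by Jaccard similarity of word sets; both this choice and the 0.5 threshold are assumptions, and a production system would probably cluster the learned feature vectors instead.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_findings(findings: list, threshold: float = 0.5) -> list:
    """Put each finding into the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed token set, member texts)
    for text in findings:
        tokens = set(text.lower().split())
        for seed_tokens, members in clusters:
            if jaccard(tokens, seed_tokens) >= threshold:
                members.append(text)
                break
        else:
            # No existing cluster was close enough, so this finding seeds a new one.
            clusters.append((tokens, [text]))
    return [members for _, members in clusters]

findings = [
    "database password in config file",
    "database password in backup file",
    "badge photo shows server room",
    "photo shows server room whiteboard",
]
groups = cluster_findings(findings)
for group in groups:
    print(group)
```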

Finally, based on the confidential information classification data, statistical methods are used to integrate the information, form an overview and details of the confidential information, and generate the confidential information report. The report contains a statistical analysis of each type of confidential information together with a description and explanation of the details in each category.
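A minimal sketch of this integration step, assuming each classified item is a dictionary with the `type`, `source`, and `importance` fields mentioned above. The function name `build_confidential_report` and the report layout are illustrative assumptions.

```python
from collections import Counter

def build_confidential_report(classified: list) -> dict:
    """Summarize classified findings into an overview plus sorted details."""
    return {
        "overview": {
            "total": len(classified),
            "by_type": dict(Counter(item["type"] for item in classified)),
            "by_importance": dict(Counter(item["importance"] for item in classified)),
        },
        # Details are ordered by type, then source, so related items sit together.
        "details": sorted(classified, key=lambda item: (item["type"], item["source"])),
    }

classified = [
    {"type": "password", "source": "config.txt", "importance": "high"},
    {"type": "api_key", "source": "notes.docx", "importance": "high"},
    {"type": "password", "source": "backup.txt", "importance": "medium"},
]
report = build_confidential_report(classified)
print(report["overview"])
```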

Please refer to Figure 4. The step of using a generative adversarial network, based on the confidential information report, to simulate potential security vulnerabilities and attack behaviors, strengthen the system's early warning mechanism, integrate the vulnerability data, and generate security vulnerability simulation data is specifically as follows:

S301: Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and generate potential security vulnerability simulation scenarios;

S302: Based on the potential security vulnerability simulation scenarios, reinforcement learning is used to simulate attacker behavior and generate simulated attack behavior data;

S303: Based on the simulated attack behavior data, pattern recognition is used to identify the weaknesses of the system and generate system vulnerability and weakness identification results;

S304: Based on the system vulnerability and weakness identification results, the data is integrated, the early warning mechanism is optimized, and an optimized early warning mechanism and vulnerability assessment report is generated;

The generative adversarial network specifically uses a generator and a discriminator to capture the data distribution and simulate attack scenarios. Pattern recognition specifically uses support vector machine and decision tree algorithms to identify and classify attack patterns automatically. The optimized early warning mechanism and vulnerability assessment report includes descriptions of the identified vulnerabilities, impact assessments, and recommended protective measures.

In S301, a generative adversarial network model, consisting of a generator and a discriminator, is used to simulate potential security vulnerability scenarios. A dataset is prepared that contains the descriptions and context of the potential security vulnerabilities identified in the confidential information report. The generator model then produces samples of potential vulnerability scenarios: it receives random noise as input and generates scenarios that resemble actual vulnerabilities.

# Code example: generator model
# create_generator_model() is a placeholder that the source does not define;
# the calls below assume a Keras-style model.
generator = create_generator_model()
generator.compile(loss='binary_crossentropy', optimizer='adam')
# As written this is a direct supervised fit; the adversarial signal comes from the combined GAN model
generator.fit(random_noise, simulated_vulnerability_scenarios, epochs=100)

The discriminator is trained to distinguish generated vulnerability scenarios from real ones. This in turn helps improve the performance of the generator.

# Code example: discriminator model
# create_discriminator_model() is a placeholder that the source does not define
discriminator = create_discriminator_model()
discriminator.compile(loss='binary_crossentropy', optimizer='adam')
# scenarios mixes real and generated samples; the labels mark which is which
discriminator.fit(scenarios, real_vs_generated_labels, epochs=100)

The generator and the discriminator are combined into a GAN and trained alternately to improve the quality of the generated vulnerability scenarios.

# Code example: GAN model
# create_gan() is a placeholder that the source does not define. In the usual Keras pattern the
# discriminator's weights are frozen inside the combined model, so this fit updates only the generator.
gan = create_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer='adam')
# The generator is rewarded when the discriminator labels its output as real
gan.fit(random_noise, real_labels, epochs=100)

In S302, a reinforcement learning model is used to simulate attacker behavior. The potential security vulnerability scenario serves as the environment, and a mapping of states, actions, and rewards is established. A reinforcement learning model, such as a deep reinforcement learning (DRL) model, is then trained to imitate the attacker.

# Code example: reinforcement learning training
# create_reinforcement_learning_agent() and environment are placeholders that the source does not define
rl_agent = create_reinforcement_learning_agent()
rl_agent.train(environment, num_episodes=1000)

After training, the reinforcement learning model is used to simulate attacker behavior and generate attack behavior data.

# Code example: generate simulated attack behavior data
# simulate_attacks() is a method of the placeholder agent above, not a library call
simulated_attack_data = rl_agent.simulate_attacks(num_samples=1000)
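To make the state, action, and reward mapping concrete, here is a small self-contained sketch. It swaps the deep reinforcement learning model for plain tabular Q-learning, and the environment (a chain of five hosts in which the last one holds the confidential file) is a toy assumption; none of the names come from the method itself.

```python
import random

N_STATES = 5                       # hypothetical chain of hosts
GOAL = N_STATES - 1                # the last host holds the confidential file
ACTIONS = ("retreat", "advance")   # action 0 moves back, action 1 moves forward

def step(state: int, action: int):
    """Toy environment: reaching GOAL pays +1.0, every other move costs 0.01."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    if nxt == GOAL:
        return nxt, 1.0, True
    return nxt, -0.01, False

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)                          # explore
            else:
                action = max((0, 1), key=lambda a: q[state][a])    # exploit
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def simulate_attack(q, start=0):
    """Roll out the greedy policy and record (state, action) pairs."""
    state, trace = start, []
    for _ in range(2 * N_STATES):
        action = max((0, 1), key=lambda a: q[state][a])
        trace.append((state, ACTIONS[action]))
        state, _reward, done = step(state, action)
        if done:
            break
    return trace

q_table = train()
attack_trace = simulate_attack(q_table)
print(attack_trace)
```

The recorded trace plays the role of the simulated attack behavior data that the next step analyzes.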

In S303, a pattern recognition algorithm, such as a support vector machine or a decision tree, is used to identify system vulnerabilities and weaknesses automatically. The simulated attack behavior data and the related system data are collated, features are extracted, and the data is converted into a form suitable for pattern recognition. A support vector machine, decision tree, or other model is then trained to identify attack patterns and system weaknesses.

# Code example: pattern recognition model training
# scikit-learn's decision tree is one possible choice (an assumption; the source names no library).
# features and labels are placeholders for the extracted feature matrix and the attack-pattern labels.
from sklearn.tree import DecisionTreeClassifier
pattern_recognition_model = DecisionTreeClassifier()
pattern_recognition_model.fit(features, labels)

Identify weaknesses: the trained pattern recognition model is applied to the system data to identify weaknesses in the system.

# Code example: identify weaknesses in the system
# system_data is a placeholder for feature vectors built the same way as the training features
vulnerability_identification = pattern_recognition_model.predict(system_data)

In S304, the identification results for system vulnerabilities and weaknesses are integrated with the data of the earlier confidential information report. Based on these results, the system's early warning mechanism is optimized to increase its alertness to potential threats. A vulnerability assessment report is then generated that includes vulnerability descriptions, impact assessments, and recommended protective measures.

# Code example: generate the vulnerability assessment report
# generate_vulnerability_report() is a placeholder that the source does not define
vulnerability_report = generate_vulnerability_report(vulnerability_identification)
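One way such a report function could look is sketched below. The lookup table `KNOWLEDGE_BASE`, its labels, and its texts are purely hypothetical and exist only to show the three report fields named above (description, impact assessment, recommended measure).

```python
from dataclasses import dataclass, asdict

@dataclass
class ReportEntry:
    weakness: str
    description: str
    impact: str
    recommendation: str

# Hypothetical knowledge base; the labels and texts are illustrative only.
KNOWLEDGE_BASE = {
    "weak_file_permissions": (
        "Confidential files readable by all users", "high", "Restrict access by role"),
    "unencrypted_backup": (
        "Backups stored without encryption", "medium", "Encrypt backups at rest"),
}

def generate_vulnerability_report(identified: list) -> list:
    """Build one report entry per distinct weakness label, in alphabetical order."""
    report = []
    for label in sorted(set(identified)):
        description, impact, recommendation = KNOWLEDGE_BASE.get(
            label, ("No description available", "unknown", "Review manually"))
        report.append(asdict(ReportEntry(label, description, impact, recommendation)))
    return report

identified = ["weak_file_permissions", "unencrypted_backup",
              "weak_file_permissions", "odd_login_hours"]
vulnerability_report = generate_vulnerability_report(identified)
for entry in vulnerability_report:
    print(entry)
```

Labels that the table does not know fall back to a "review manually" entry rather than being dropped, so nothing the classifier flagged disappears from the report.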

Please refer to Figure 5. The step of using a data flow analysis algorithm, based on the security vulnerability simulation data, to dynamically monitor and statically analyze file operations and system behaviors, find potential data leakage points, integrate the data flows, and generate a data flow analysis report is specifically as follows:

S401: Based on the optimized early warning mechanism and vulnerability assessment report, a data flow analysis algorithm is used to dynamically monitor file operations in the system and generate dynamic file operation monitoring data;

S402: Based on the dynamic file operation monitoring data, the data flow analysis algorithm is further used to perform static analysis of the system behavior and generate system behavior static analysis data;

S403: Based on the system behavior static analysis data, potential data leakage points are marked and potential data leakage point markers are generated;

S404: Based on the potential data leakage point markers, the data flows are integrated, the data security risk of the overall system is assessed, and a data flow analysis report is generated;

The data flow analysis algorithm specifically analyzes the behavior patterns of the data flows inside the system. The system behavior static analysis data specifically refers to the result of analyzing, by static methods, how the system behaves without external input. The potential data leakage point markers specifically denote the risk areas in the system behavior that may lead to information leakage. The data flow analysis report includes the overall distribution of the data flows, the marked leakage points, and recommended remediation measures.

Based on the optimized early warning mechanism and vulnerability assessment report, the data flow analysis algorithm is used to dynamically monitor file operations in the system. This includes collecting and analyzing the system's file operation logs, identifying the security-related operations, and generating the dynamic file operation monitoring data.
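As a rough illustration of filtering file operation logs for security-related events, the sketch below assumes a whitespace-separated log format (`timestamp user operation path`). The format, the protected prefix in `SENSITIVE_PREFIXES`, and the operations in `WATCHED_OPERATIONS` are all assumptions made for the example.

```python
from dataclasses import dataclass

SENSITIVE_PREFIXES = ("/secure/",)                  # hypothetical protected directories
WATCHED_OPERATIONS = {"copy", "delete", "upload"}   # hypothetical operations of interest

@dataclass
class FileEvent:
    timestamp: str
    user: str
    operation: str
    path: str

def parse_line(line: str) -> FileEvent:
    """Parse one log line of the assumed form: timestamp user operation path."""
    timestamp, user, operation, path = line.split(maxsplit=3)
    return FileEvent(timestamp, user, operation, path)

def security_relevant(event: FileEvent) -> bool:
    """Flag watched operations that touch a protected directory."""
    return (event.operation in WATCHED_OPERATIONS
            and event.path.startswith(SENSITIVE_PREFIXES))

log = """\
2024-01-23T09:00:01 alice read /secure/plan.docx
2024-01-23T09:00:07 bob copy /secure/plan.docx
2024-01-23T09:01:15 bob delete /tmp/cache.bin
"""
monitoring_data = [e for e in map(parse_line, log.splitlines()) if security_relevant(e)]
for event in monitoring_data:
    print(event)
```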

Next, based on the dynamic file operation monitoring data, the data flow analysis algorithm is further used for static analysis of the system behavior. By analyzing how the system behaves without external input, this step identifies risk areas that may lead to information leakage and generates the system behavior static analysis data.

Once the system behavior static analysis data has been generated, the potential data leakage points in it need to be marked. This can be done by looking for abnormal patterns in the system behavior or for operations related to known vulnerabilities. Marking the potential data leakage points separates them from normal behavior so that they can be analyzed and handled in later steps.

Finally, based on the marked potential data leakage points, the data flows are integrated, the data security risk of the overall system is assessed, and the data flow analysis report is generated. The report analyzes the overall distribution of the data flows in the system and provides remediation suggestions and measures for the marked leakage points.
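Tracking how confidential data moves through a system can be pictured as a reachability question on a directed graph. The sketch below is a simplified assumption of that idea: the node names and the split into confidential sources and external sinks are invented for illustration, and a breadth-first search lists every path along which data could leave.

```python
from collections import deque

def find_leak_paths(flows, sources, sinks):
    """Breadth-first search for paths from confidential sources to external sinks."""
    graph = {}
    for src, dst in flows:
        graph.setdefault(src, []).append(dst)
    paths = []
    for source in sources:
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)       # data can reach an external sink here
                continue
            for nxt in graph.get(node, []):
                if nxt not in path:      # skip nodes already on this path (cycles)
                    queue.append(path + [nxt])
    return paths

# Hypothetical observed data movements: (from, to)
flows = [
    ("hr_db", "report_service"),
    ("report_service", "local_cache"),
    ("report_service", "email_gateway"),
    ("local_cache", "usb_export"),
    ("public_site", "email_gateway"),
]
leaks = find_leak_paths(flows, sources={"hr_db"}, sinks={"email_gateway", "usb_export"})
for path in leaks:
    print(" -> ".join(path))
```

Each returned path is a candidate leakage point marker; the flow from `public_site` is ignored because it does not start at a confidential source.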

Please refer to Figure 6. The step of using the SHA-256 hash algorithm, based on the original data of the file, to verify the integrity of the file, compare the hash value with the hash value stored on the blockchain, determine the integrity of the file, and generate an integrity check record is specifically as follows:

S502: Based on the original data extraction report, the SHA-256 hash algorithm is used to compute the file hash, and the result is formatted to generate the file hash value;

S503: Based on the blockchain network interface, a hash retrieval method is used to retrieve the record associated with the file hash value and prepare it for comparison, generating the blockchain hash value;

S504: Based on the file hash value and the blockchain hash value, a hash comparison method is used to confirm and verify the integrity of the file and generate the integrity check record;

The original data extraction report is specifically the file in byte stream form. The file hash value is specifically a string of 64 hexadecimal characters (256 bits). The blockchain hash value is specifically the hash record of the file stored on the blockchain. The integrity check record specifically states whether the file has been tampered with or damaged.

First, the original data of the file is read. A binary read extracts the data from the file and formats it, so that the file content is available as a byte stream. Next, the SHA-256 hash algorithm is applied to the file content. SHA-256 is a widely used hash algorithm that maps data of any length to a fixed-length 256-bit hash value, usually written as 64 hexadecimal characters. Feeding the byte stream of the file into SHA-256 yields a string that serves as the file hash value; because the algorithm is collision resistant, this value is unique to the content for practical purposes.
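The binary read and hash computation can be sketched with Python's standard `hashlib` module. Reading in chunks, the chunk size, and the helper name `sha256_of_file` are choices made for this example rather than details given in the text.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Read the file as a byte stream in chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file holding the three bytes "abc"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
file_hash = sha256_of_file(tmp.name)
os.remove(tmp.name)

print(len(file_hash))  # 64 hexadecimal characters, i.e. 256 bits
print(file_hash)
```

Chunked reading keeps memory use constant, which matters when the files being checked are large.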

After the file hash value has been generated, the blockchain network interface is used to retrieve the record associated with it. This is done by querying the file hash records stored on the blockchain and locating the matching record by the file name or another identifier. The generated file hash value and the hash record from the blockchain are then prepared for comparison by making sure that both have the same length and format.

Finally, the hash comparison method confirms the integrity of the file. The two hash values are compared: if they are identical, the file is intact; if they differ, the file may have an integrity problem. An integrity check record is generated from the comparison result and states whether the file has been tampered with or damaged.
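The comparison step might look like the following sketch. A plain dictionary, `LEDGER`, stands in for the blockchain query, which the text does not specify; the record fields and status strings are likewise assumptions.

```python
import hashlib
import hmac
from datetime import datetime, timezone

# Hypothetical stand-in for the blockchain lookup: file identifier -> recorded hex digest.
# The digest is stored in upper case here to show why normalization is needed.
LEDGER = {"contract.pdf": hashlib.sha256(b"original contents").hexdigest().upper()}

def verify_integrity(file_id: str, data: bytes, ledger: dict) -> dict:
    """Hash the data, compare it with the recorded digest, and return a check record."""
    file_hash = hashlib.sha256(data).hexdigest()
    recorded = ledger.get(file_id)
    if recorded is None:
        status = "no_record"
    else:
        # Normalize case and whitespace so both digests share one format.
        recorded = recorded.strip().lower()
        status = "intact" if hmac.compare_digest(file_hash, recorded) else "mismatch"
    return {
        "file_id": file_id,
        "file_hash": file_hash,
        "ledger_hash": recorded,
        "status": status,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }

ok = verify_integrity("contract.pdf", b"original contents", LEDGER)
bad = verify_integrity("contract.pdf", b"tampered contents", LEDGER)
print(ok["status"], bad["status"])
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on where the two strings first differ.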

Please refer to Figure 7. The step of using role-based access control and dynamic encryption technology, combined with the integrity check record, to apply access control and encryption to the file, set access rights, and generate an access control and encryption policy is specifically as follows:

S601: Based on the integrity check record, a role analysis method is used to define the file roles and set their permissions, generating a role definition and permission allocation table;

S602: Based on the role definition and permission allocation table, a role-based access control algorithm is used to allocate access rights and set the access policy, generating a file access control policy;

S603: Based on the file access control policy, the AES dynamic encryption method is used to encrypt the file and consolidate the encrypted data, generating the encrypted file data;

S604: Based on the encrypted file data and the file access control policy, a policy integration method is used to formulate and confirm a secure storage policy for the file, generating the access control and encryption policy;

The role definition and permission allocation table includes the role name, role description, and access rights. The file access control policy specifically consists of the roles and the level of their access rights. The encrypted file data specifically refers to the byte stream obtained by encrypting the original file data. The access control and encryption policy is specifically the complete policy and method for accessing and decrypting the file.

In S601, the different roles for the file are determined based on the integrity check record. Each role is then assigned appropriate access rights, such as read, write, or execute. Finally, a role definition and permission allocation table is generated that lists the role name, the role description, and the specific access rights of each role.

In S602, a role-based access control algorithm allocates the access rights to the file according to the role definition and permission allocation table generated in the previous step. At the same time, the file access policy is set: it determines which roles may access the file with which permissions and which roles are to be denied access.
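The role table and the access decision can be illustrated with a few lines of Python. The roles, users, and permission names below are invented for the example; the only behavior taken from the text is that each role carries a name, a description, and a set of access rights, and that access not granted by a role is denied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Role:
    name: str
    description: str
    permissions: frozenset

# Hypothetical role definition and permission allocation table
ROLES = {
    "owner": Role("owner", "File owner", frozenset({"read", "write", "execute"})),
    "auditor": Role("auditor", "Read-only reviewer", frozenset({"read"})),
}

# Hypothetical assignment of users to roles
USER_ROLES = {"alice": {"owner"}, "bob": {"auditor"}}

def is_allowed(user: str, permission: str) -> bool:
    """Deny by default; allow only if one of the user's roles grants the permission."""
    return any(permission in ROLES[role].permissions
               for role in USER_ROLES.get(user, ()))

print(is_allowed("alice", "write"))   # owner role grants write
print(is_allowed("bob", "write"))     # auditor role grants read only
print(is_allowed("mallory", "read"))  # unknown user has no roles
```

Because users are checked through their roles rather than individually, changing one row of the role table updates the rights of everyone who holds that role.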

In S603, the AES dynamic encryption method is used to encrypt the file; another suitable encryption algorithm may be used instead to protect the file. The encrypted file data is consolidated into a byte stream or another binary format so that it can be stored securely.

In S604, the policy integration method formulates a secure storage policy for the file based on the encrypted file data and the file access control policy. This includes determining the physical storage location of the file, defining a backup policy, and planning measures such as access auditing. The resulting access control and encryption policy ensures the confidentiality and integrity of the file.

请参阅图8，一种计算机文件保密检查系统，计算机文件保密检查系统用于执行上述计算机文件保密检查方法，系统包括特征提取与融合模块、风险信息标注模块、风险漏洞模拟模块、系统行为分析模块、文件内容提取模块、文件完整性验证模块、安全策略制定模块。Please refer to Figure 8, a computer file confidentiality inspection system, the computer file confidentiality inspection system is used to execute the above-mentioned computer file confidentiality inspection method, the system includes a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module, and a security policy formulation module.

The feature extraction and fusion module, based on a deep learning framework, processes text with the transformer algorithm and a BERT model to generate text deep feature vectors, extracts image features with a convolutional neural network, and fuses them with the text features through an autoencoder to generate text and image feature data;

The risk information labeling module labels risk information based on the generated text and image feature data through natural language processing and image recognition techniques, summarizes and integrates it using statistical methods, and generates a confidential information report;

The risk vulnerability simulation module, based on the generated confidential information report, uses a generative adversarial network to simulate potential risks, confirms system weaknesses through pattern recognition, optimizes the early warning mechanism, and generates the optimized early warning mechanism and vulnerability assessment report;

The system behavior analysis module, based on the optimized early warning mechanism and vulnerability assessment report, uses a data flow analysis algorithm to statically analyze system behavior, mark potential data leakage points, and evaluate security risks, and generates a data flow analysis report;

The file content extraction module formats the file data using a binary reading method and generates a raw data extraction report;

The file integrity verification module, based on the generated raw data extraction report, calculates the file hash value with the SHA-256 hash algorithm, compares it with the hash value on the blockchain to ensure file integrity, and generates an integrity check record;

The security policy formulation module, based on the integrity check record, defines file roles and sets permissions, encrypts the file with the AES dynamic encryption method, formulates a secure storage policy for the file, and generates the access control and encryption policy.

Because it uses deep learning and several advanced algorithms, the system can accurately extract and fuse text and image features. This deep feature extraction ensures the security of information during transmission, storage, and processing, greatly reducing the security risks caused by information leakage, tampering, or loss.

The system's built-in risk information labeling and vulnerability simulation modules allow potential security risks to be identified and flagged early. Such early identification and response make it possible to act before a problem worsens.

By combining the SHA-256 hash algorithm with blockchain technology, the system verifies file integrity with high reliability. Introducing blockchain technology makes files harder to tamper with and thereby improves file security.
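Why chaining hash records raises the cost of tampering can be seen in a small sketch. The patent does not specify which blockchain or block layout is used, so the in-memory list `chain`, the block fields, and the helpers `block_hash`, `append_block`, and `chain_is_valid` below are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index: int, file_hash: str, prev_hash: str) -> str:
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "file_hash": file_hash, "prev_hash": prev_hash},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, file_hash: str) -> None:
    """Append a record that is linked to the last block in the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    index = len(chain)
    chain.append({
        "index": index,
        "file_hash": file_hash,
        "prev_hash": prev_hash,
        "hash": block_hash(index, file_hash, prev_hash),
    })


def chain_is_valid(chain: list) -> bool:
    """Recompute every link; any edited record breaks a later comparison."""
    prev_hash = GENESIS
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, block["file_hash"], prev_hash):
            return False
        prev_hash = block["hash"]
    return True


chain = []
for content in (b"file A", b"file B", b"file C"):
    append_block(chain, hashlib.sha256(content).hexdigest())
```

Because each block's hash covers the previous block's hash, replacing one stored file hash requires recomputing every later block. In a distributed ledger that recomputation would also have to be accepted by the other participants.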

The system addresses not only data security but also effective data management. From feature extraction through risk assessment to file integrity verification, every step serves to ensure the accuracy and integrity of the data, which improves the efficiency of the whole data management process.

The security policy formulation module makes file permission management clearer and stricter. Based on a file's role and importance, the system can assign permissions dynamically so that only authorized users can access the relevant information.

From text processing and image feature extraction to file integrity and access control, the system provides a comprehensive data protection strategy. Each module adds a further barrier for data security, ensuring that the data is given the best protection in any environment and under any conditions.

Referring to Figure 9, the feature extraction and fusion module includes a text processing submodule, an image feature extraction submodule, and a feature fusion submodule;

The risk information labeling module includes a risk information labeling submodule, a confidential information classification submodule, and an information integration submodule;

The risk vulnerability simulation module includes a security vulnerability simulation submodule, an attacker behavior simulation submodule, a system weakness identification submodule, and an early warning mechanism optimization submodule;

The system behavior analysis module includes a file operation monitoring submodule, a system behavior analysis submodule, a data leakage point marking submodule, and a data flow integration submodule;

The file content extraction module includes a data reading submodule and a data formatting submodule;

The file integrity verification module includes a hash calculation submodule, a hash retrieval submodule, and an integrity confirmation submodule;

The security policy formulation module includes a file role definition submodule, an access permission setting submodule, a file encryption submodule, and a policy confirmation submodule.

In the feature extraction and fusion module, the text processing submodule uses a deep learning framework and processes text with the transformer algorithm and a BERT model to generate text deep feature vectors. The image feature extraction submodule extracts image features with a convolutional neural network. The feature fusion submodule fuses the text features and image features through an autoencoder to generate text and image feature data.

In the risk information labeling module, the risk information labeling submodule labels risk information based on the generated text and image feature data through natural language processing and image recognition techniques. The confidential information classification submodule uses statistical methods to summarize and integrate the labeled risk information and generates a confidential information report. The information integration submodule consolidates this report into the final confidential information report.
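The labeling and summarizing flow can be sketched with a deliberately simple rule-based stand-in. The patent describes learned natural language processing models; the regular expressions below replace them only for illustration, and `PATTERNS`, `Finding`, `annotate`, and `summarize` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed example patterns; a real system would use trained models
# or a much more carefully tuned rule set.
PATTERNS = {
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    "api_key": re.compile(r"(?i)\bapi[_-]?key\s*[:=]\s*[A-Za-z0-9_\-]{16,}"),
}


@dataclass(frozen=True)
class Finding:
    category: str  # which pattern matched
    line_no: int   # 1-based line number in the text
    start: int     # character offset within the line
    end: int


def annotate(text: str) -> list:
    """Label every span in the text that matches a risk pattern."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append(
                    Finding(category, line_no, match.start(), match.end())
                )
    return findings


def summarize(findings: list) -> dict:
    """Count findings per category, as a minimal 'report' overview."""
    return dict(Counter(f.category for f in findings))


sample = "user = alice\npassword = hunter2\napi_key: ABCDEFGHIJKLMNOP1234\n"
findings = annotate(sample)
report = summarize(findings)
```

The findings record where each risky span sits, and the summary counts them by category, mirroring the split between the labeling and classification submodules.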

In the risk vulnerability simulation module, the security vulnerability simulation submodule uses a generative adversarial network to simulate potential risks based on the generated confidential information report. The attacker behavior simulation submodule simulates attacker behavior patterns. The system weakness identification submodule confirms system weaknesses through pattern recognition. The early warning mechanism optimization submodule optimizes the early warning mechanism according to the simulation results and generates the optimized early warning mechanism and vulnerability assessment report.

In the system behavior analysis module, the file operation monitoring submodule monitors file operations in the system. The system behavior analysis submodule analyzes the behavior of the system in the absence of external input. The data leakage point marking submodule marks potential data leakage points. The data flow integration submodule integrates the analysis results and generates a data flow analysis report.
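One simple way to picture the marking of potential leakage points is reachability over a static data flow graph. The patent does not give the algorithm in this detail, so the graph representation, the function `find_leak_points`, and the example node names are assumptions for illustration only.

```python
from collections import deque


def find_leak_points(flows: dict, confidential_sources: set, external_sinks: set) -> list:
    """Return (node, sink) edges where confidential data can leave the system.

    flows maps each node to the nodes it can pass data to.
    """
    # Breadth-first search: everything reachable from a confidential
    # source is treated as carrying confidential data ("tainted").
    tainted = set(confidential_sources)
    queue = deque(confidential_sources)
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, ()):
            if nxt not in tainted:
                tainted.add(nxt)
                queue.append(nxt)
    # A leakage point is an edge from a tainted node into an external sink.
    return sorted(
        (src, dst)
        for src in tainted
        for dst in flows.get(src, ())
        if dst in external_sinks
    )


# Hypothetical flow graph of a small system.
flows = {
    "secret.docx": ["parser"],
    "parser": ["report_cache", "thumbnailer"],
    "report_cache": ["email_gateway"],
    "public_readme": ["web_upload"],
    "thumbnailer": [],
}
leaks = find_leak_points(flows, {"secret.docx"}, {"email_gateway", "web_upload"})
```

Here the only marked point is the edge from `report_cache` to `email_gateway`. The `web_upload` sink is not flagged because no confidential source reaches it.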

In the file content extraction module, the data reading submodule reads the file data using a binary reading method. The data formatting submodule formats the data that was read and generates a raw data extraction report.

In the file integrity verification module, the hash calculation submodule calculates the file's hash value with the SHA-256 hash algorithm based on the generated raw data extraction report. The hash retrieval submodule compares the file hash value with the hash value on the blockchain. The integrity confirmation submodule confirms the file's integrity from the comparison result and generates an integrity check record.
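The binary reading, hash calculation, and comparison steps can be sketched with Python's standard `hashlib`. The blockchain lookup is replaced here by a plain dictionary named `ledger`, which is an assumption; `sha256_of_file`, `verify_integrity`, and the record fields are likewise hypothetical names.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Read the file in binary chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_integrity(path: str, file_id: str, ledger: dict) -> dict:
    """Compare the computed hash with the recorded one and build a record."""
    computed = sha256_of_file(path)
    recorded = ledger.get(file_id)  # stand-in for a blockchain query
    return {
        "file_id": file_id,
        "computed": computed,
        "recorded": recorded,
        "intact": recorded is not None and computed == recorded,
    }


# Demonstration on a temporary file: register, verify, tamper, verify again.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "contract.txt")
    with open(path, "wb") as fh:
        fh.write(b"confidential draft v1")
    ledger = {"contract": sha256_of_file(path)}
    before = verify_integrity(path, "contract", ledger)
    with open(path, "ab") as fh:
        fh.write(b" (edited)")
    after = verify_integrity(path, "contract", ledger)
```

Reading in chunks keeps memory use constant for large files. A file with no recorded hash is reported as not intact in this sketch, which is a conservative design assumption.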

In the security policy formulation module, the file role definition submodule defines the roles for the file. The access permission setting submodule sets the file's access permissions. The file encryption submodule encrypts the file using the AES dynamic encryption method. The policy confirmation submodule confirms the formulated policy and generates the access control and encryption policy.

The above are merely preferred embodiments of the present invention and do not limit the invention in any other form. Any person skilled in the art may use the technical content disclosed above to make changes or modifications into equivalent embodiments applied in other fields. However, any simple modification, equivalent change, or adaptation made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims

1. A computer file confidentiality checking method, characterized in that it comprises the following steps:

Based on deep learning, transformer algorithms and convolutional neural networks are used to perform deep semantic analysis and key feature extraction on text and image content, and integrate feature data to generate text and image feature data;

Based on the text and image feature data, natural language processing technology and image recognition algorithms are used to determine the confidential information of the text and images in the file, and classification processing is performed to generate a confidential information report;

Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and attack behaviors, enhance the system's early warning mechanism, and integrate vulnerability data to generate security vulnerability simulation data;

Based on the security vulnerability simulation data, a data flow analysis algorithm is used to dynamically monitor and statically analyze file operations and system behaviors, find potential data leakage points, and integrate data flows to generate a data flow analysis report;

Based on the original data of the file, the SHA-256 hash algorithm is used to check the integrity of the file, and the hash value is compared with the hash value stored on the blockchain to determine the integrity of the file and generate an integrity check record;

In combination with the integrity check record, role-based access control and dynamic encryption technology are used to perform access control and encryption processing on the file, and access rights are set to generate access control and encryption policies;

The transformer algorithm is specifically BERT and GPT series models used for semantic understanding of text content; the convolutional neural network is used to extract key features from images; the natural language processing technology is used to identify text-based confidential information, including private API keys and passwords; the image recognition algorithm is used to identify confidential content in images; the data flow analysis algorithm is specifically used to track the flow and storage paths of confidential data in the system and to identify attempts at illegal access or tampering; and the dynamic encryption technology includes using AES to encrypt files and RSA to manage the AES keys;

The step of using the data flow analysis algorithm, based on the security vulnerability simulation data, to dynamically monitor and statically analyze file operations and system behaviors, find potential data leakage points, integrate data flows, and generate a data flow analysis report specifically comprises:

Based on the optimized early warning mechanism and vulnerability assessment report, a data flow analysis algorithm is used to dynamically monitor file operations in the system and generate dynamic file operation monitoring data;

Based on the dynamic file operation monitoring data, continue to use the data flow analysis algorithm to perform static analysis of system behavior and generate system behavior static analysis data;

Based on the system behavior static analysis data, potential data leakage points are marked to generate potential data leakage point markers;

Based on the potential data leakage point markers, data flow integration is performed to evaluate the data security risks of the overall system and generate a data flow analysis report;

The data flow analysis algorithm specifically analyzes the behavioral patterns of the data flow within the system. The system behavior static analysis data specifically refers to analyzing the system behavior without external input through static methods. The potential data leakage point marker is specifically the risk area that may lead to information leakage in the system behavior. The data flow analysis report includes the overall distribution of the data flow, the marked leakage points and the recommended repair measures.

2. The computer file confidentiality checking method according to claim 1, characterized in that the step of using a transformer algorithm and a convolutional neural network, based on deep learning, to perform deep semantic analysis and key feature extraction on text and image content, integrate the feature data, and generate text and image feature data specifically comprises:

Based on the deep learning framework, the transformer algorithm is used to perform preliminary text processing, convert it into an intermediate vector representation, and perform feature extraction to generate a text intermediate vector;

Based on the text intermediate vector, a BERT model is used for deep learning training to extract the deep semantic features of the text and generate a text deep feature vector;

A convolutional neural network is used to extract primary features of the image content and convert the image into a preliminary feature matrix, generating an image primary feature matrix;

Based on the text deep feature vector and the image primary feature matrix, an autoencoder is used to perform feature fusion, integrate key features of the text and the image, and generate text and image feature data;

The text intermediate vector is specifically a vectorized expression of the text content, the text deep feature vector specifically refers to the deep learning feature representation of the original text, the image primary feature matrix is specifically a feature representation of the image content, and the text and image feature data includes a fused feature vector representation of the text and image.

3. The computer file confidentiality checking method according to claim 1, characterized in that the step of using natural language processing technology and image recognition algorithms, based on the text and image feature data, to determine the confidential information of the text and images in the file, perform classification processing, and generate a confidential information report specifically comprises:

Based on the text and image feature data, natural language processing technology is used to label potential risk information in the text content and generate a text confidentiality preliminary report;

Based on the text and image feature data, an image recognition algorithm is used to label potential risk information in the image content and generate an image confidentiality preliminary report;

Based on the text confidentiality preliminary report and the image confidentiality preliminary report, a clustering algorithm is used to summarize similar confidential information to generate confidential information classification data;

Based on the classified data of confidential information, statistical methods are used to integrate the information, form an overview and details of confidential information, and generate a confidential information report;

The text confidentiality preliminary report specifically includes the location and content of potential risk words, sentences or paragraphs, the image confidentiality preliminary report specifically indicates potential risk areas or objects in the image, and the confidential information classification data specifically includes confidential information classified by type, source or importance.

4. The computer file confidentiality checking method according to claim 1, characterized in that the step of using a generative adversarial network, based on the confidential information report, to simulate potential security vulnerabilities and attack behaviors, enhance the system's early warning mechanism, integrate vulnerability data, and generate security vulnerability simulation data specifically comprises:

Based on the confidential information report, a generative adversarial network is used to simulate potential security vulnerabilities and generate potential security vulnerability simulation scenarios;

Based on the potential security vulnerability simulation scenario, reinforcement learning is used to simulate attacker behavior and generate simulated attack behavior data;

Based on the simulated attack behavior data, pattern recognition is used to identify system weaknesses and generate system vulnerability and weakness identification results;

Based on the system vulnerability and weakness identification results, data integration is performed to optimize the early warning mechanism, and an optimized early warning mechanism and vulnerability assessment report is generated;

The generative adversarial network specifically uses a generator and a discriminator to capture data distribution for simulating attack scenarios. The pattern recognition specifically uses a support vector machine and a decision tree algorithm to automatically identify and classify attack patterns. The optimized early warning mechanism and vulnerability assessment report include a description of the identified vulnerability, an impact assessment, and recommended protective measures.

5. The computer file confidentiality checking method according to claim 1, characterized in that the step of using the SHA-256 hash algorithm, based on the original data of the file, to check the integrity of the file, compare the hash value with the hash value stored on the blockchain to determine the integrity of the file, and generate an integrity check record specifically comprises:

Based on the file data, the binary reading method is used to extract the file content, format it, and generate a raw data extraction report;

Based on the raw data extraction report, the SHA-256 hash algorithm is used to perform file hash calculation and format the hash value to generate a file hash value;

Based on the blockchain network interface, a hash search method is used to perform a search associated with the hash value of the file, and a hash value comparison is performed to generate a blockchain hash value;

Based on the file hash value and the blockchain hash value, a hash comparison method is used to confirm the integrity of the file, perform integrity verification, and generate an integrity check record;

The raw data extraction report is specifically in the form of a byte stream of the file, the file hash value is specifically a string consisting of 64 characters, the blockchain hash value is specifically a hash record of the file stored on the blockchain, and the integrity check record specifically indicates whether the file has been tampered with or damaged.

6. The computer file confidentiality checking method according to claim 1, characterized in that the step of using role-based access control and dynamic encryption technology, in combination with the integrity check record, to perform access control and encryption processing on the file, set access rights, and generate an access control and encryption policy specifically comprises:

Based on the integrity check record, a role analysis method is used to define the file role, set permissions, and generate a role definition and permission allocation table;

Based on the role definition and the permission allocation table, a role-based access control algorithm is used to allocate access rights, set access policies, and generate a file access control policy;

Based on the file access control policy, the AES dynamic encryption method is used to encrypt the file, and the encrypted data is integrated to generate encrypted file data;

Based on the encrypted file data and file access control policy, a policy integration method is used to formulate a secure storage policy for the file, confirm the policy, and generate an access control and encryption policy;

The role definition and permission allocation table includes role name, role description and access rights. The file access control policy specifically refers to the role and its access rights level. The encrypted file data specifically refers to the byte stream after the original file data is encrypted. The access control and encryption policy specifically refers to the complete strategy and method for accessing and decrypting the file.

7. A computer file confidentiality checking system, characterized in that the system is used to execute the computer file confidentiality checking method according to any one of claims 1-6, and the system includes a feature extraction and fusion module, a risk information labeling module, a risk vulnerability simulation module, a system behavior analysis module, a file content extraction module, a file integrity verification module, and a security policy formulation module.

8. The computer file confidentiality checking system according to claim 7, characterized in that the feature extraction and fusion module, based on a deep learning framework, processes text with a transformer algorithm and a BERT model to generate a text deep feature vector, extracts image features with a convolutional neural network, and fuses them with the text features through an autoencoder to generate text and image feature data;

The risk information labeling module labels risk information based on the generated text and image feature data through natural language processing and image recognition technology, summarizes and integrates it using statistical methods, and generates a confidential information report;

The risk vulnerability simulation module, based on the generated confidential information report, uses a generative adversarial network to simulate potential risks, confirms system weaknesses through pattern recognition, optimizes the early warning mechanism, and generates an optimized early warning mechanism and vulnerability assessment report;

The system behavior analysis module, based on the optimized early warning mechanism and vulnerability assessment report, uses a data flow analysis algorithm to statically analyze system behavior, mark potential data leakage points, and evaluate security risks, and generates a data flow analysis report;

The file content extraction module formats the file data using a binary reading method to generate a raw data extraction report;

The file integrity verification module, based on the generated raw data extraction report, calculates the file hash value using the SHA-256 hash algorithm, compares the file hash value with the blockchain hash value to ensure file integrity, and generates an integrity check record;

The security policy formulation module, based on the integrity check record, defines the file role and sets permissions, encrypts the file using the AES dynamic encryption method, formulates a secure storage policy for the file, and generates an access control and encryption policy.

9. The computer file confidentiality checking system according to claim 7, characterized in that the feature extraction and fusion module includes a text processing submodule, an image feature extraction submodule, and a feature fusion submodule;

The risk information labeling module includes a risk information labeling submodule, a confidential information classification submodule, and an information integration submodule;

The risk vulnerability simulation module includes a security vulnerability simulation submodule, an attacker behavior simulation submodule, a system weakness identification submodule, and an early warning mechanism optimization submodule;

The system behavior analysis module includes a file operation monitoring submodule, a system behavior analysis submodule, a data leakage point marking submodule, and a data flow integration submodule;

The file content extraction module includes a data reading submodule and a data formatting submodule;

The file integrity verification module includes a hash calculation submodule, a hash retrieval submodule, and an integrity confirmation submodule;

The security policy formulation module includes a file role definition submodule, an access permission setting submodule, a file encryption submodule, and a policy confirmation submodule.