
CN103218574A - Hash tree-based data dynamic operation verifiability method - Google Patents


Info

Publication number
CN103218574A
Authority
CN
China
Prior art keywords
data
hash tree
cdc
file
merkle hash
Prior art date
Legal status
Pending
Application number
CN2013101325650A
Other languages
Chinese (zh)
Inventor
邢建川
韩帅
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN2013101325650A priority Critical patent/CN103218574A/en
Publication of CN103218574A publication Critical patent/CN103218574A/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract


The invention discloses a hash-tree-based data dynamic operation verifiability method involving three parties connected by a communication network: a user (USER), a cloud computing data center (CDC), and a third-party auditor (TPA). USER, the party requesting the data storage service, wishes to store its data files in the cloud storage space of the cloud computing data center; USER may be an individual or an enterprise. The CDC responds to users' storage requests, stores their data files in its vast data center according to certain rules, and is responsible for managing and maintaining those files. The TPA, a reliable third-party auditor, is entrusted by USER to check the integrity and consistency of the data files stored at the CDC data center. The invention solves the problem of verifying the integrity and consistency of user data files in the cloud computing environment.


Description

A Hash Tree-Based Data Dynamic Operation Verifiability Method

Technical Field

The invention belongs to the field of computer technology and relates to a hash-tree-based data dynamic operation verifiability method.

Background Art

Although cloud computing service providers can give users convenient and efficient remote data storage and access services over stable, high-speed network connections, characteristics inherent to cloud computing, such as virtualization, large scale, dynamic configuration, and scalability, introduce many security risks and challenges for large-scale data storage services in the cloud environment. To improve storage efficiency and space utilization, providers usually split large data files into several small data blocks for storage. Because the geographical location and storage status of each block are unknown to the user, users inevitably come to doubt the integrity and consistency of their own data files. As a key indicator of data storage service quality, guaranteeing the integrity and consistency of user data files stored across huge, geographically opaque server clusters has long been a major difficulty for cloud data storage services. Especially after the successive unexpected service outages of Amazon Simple Storage Service and Google Docs, users have developed great distrust over whether providers conceal further security incidents to save resources and cut costs. Users therefore want a complete mechanism that gives them the ability to audit the integrity and consistency of their data files without spending excessive computing resources or time.

Related research began long ago and has achieved good results in the efficiency, verifiability, queryability, and recoverability of the designed schemes. At present, the common solutions for data integrity and consistency fall into two types, private auditing and public auditing, as shown in Figure 1.

In private auditing, as the name implies, the user performs the audit of the data files; in public auditing, the audit is entrusted to a trusted third-party audit agency. Although private auditing achieves higher efficiency thanks to its simple logic, public auditing not only provides users with safe and reliable data audits but also saves them a great deal of computing resources and time. In a cloud computing environment, users are unlikely to have the time and energy to audit their own data files frequently, so handing this time-consuming and laborious task to a trusted third party with reliable audit protocols and a complete solution is a very good choice. Within a cloud storage security architecture that includes a trusted third-party auditor, and targeting the storage and operation characteristics of data files in the cloud, the invention designs a verifiability scheme for dynamic data operations based on the Merkle Hash Tree.

Research on the verifiability of data storage attracted broad industry attention long ago, and the solutions proposed by scholars have achieved certain results in efficiency, verifiability, queryability, and recoverability. Unfortunately, most of this research is limited to operations on static data files, and many results cannot support frequent dynamic operations on those files. The relevant prior work is summarized below.

In the area of remote storage data authentication, Burns et al. were the first to address the verifiability of data stored on untrusted storage media, proposing a model called provable data possession (PDP). Using RSA-based tags, Burns et al. implemented an audit function for outsourced data, but they did not adequately consider dynamic data storage. Subsequently, targeting the security flaws of the PDP model when static storage is converted to dynamic storage, Ateniese et al. proposed a new model supporting dynamic data operations under PDP. Regrettably, this new model does not support all basic dynamic data operations; important operations, including insertion, are absent from it. Dr. Wang Cong, a Chinese scientist at the Illinois Institute of Technology, and his co-authors later proposed a scheme for distributed environments that can check the correctness of data files and locate possible failed nodes, but like the model of Ateniese et al., their scheme also fails to support all basic dynamic operations. In addition, Juels and Kaliski proposed a conceptual model called proof of retrievability (POR), in which check codes and sampling are used to ensure that data is retrievable and stored intact in the data center; they achieve detection by randomly embedding additional information into the data blocks. However, the POR model limits the number of queries and does not support public auditing schemes. Waters and Shacham later designed an improved POR model, but it only provides data recovery for static data. Papamanthou et al. were the first to explore the possibility of constructing a dynamic PDP model; their improved PDP model offers complete dynamic operations and successfully avoids the use of additional data tags. Furthermore, Devanbu et al. gave a query verification solution based on the Merkle Hash Tree, which current mainstream Merkle-Hash-Tree-based query verification algorithms take as their foundation. The prior art also includes a block verification model based on the POR model and the Merkle Hash Tree data structure, but it does not consider the effect that the geographical placement of data storage in a distributed environment has on computational efficiency.

Summary of the Invention

To solve the above technical problems, the invention provides a Merkle-Hash-Tree-based verifiability scheme for dynamic data operations, composed of three parties connected by a communication network: the user (USER), the cloud computing data center (CDC), and the third-party auditor (TPA). USER, the party requesting the data storage service, wishes to store its data files in the cloud storage space of the cloud computing data center; USER may be an individual or an enterprise. The CDC responds to users' storage requests, stores their data files in its vast data center according to certain rules, and is responsible for managing and maintaining those files. The TPA, a reliable third-party auditor, is entrusted by USER to check the integrity and consistency of the data files stored at the CDC data center. Building on previous research and exploiting the particular tree structure of the Merkle Hash Tree, the scheme solves the problem of verifying the integrity and consistency of user data files in the cloud computing environment. It not only effectively supports all basic dynamic data operations in the cloud (including insertion, deletion, and modification) but also takes full account of the effect that the geographical placement of data storage in a distributed environment has on computational efficiency, contributing to research on Merkle-Hash-Tree-based verifiability.

The technical scheme is as follows:

A hash-tree-based data dynamic operation verifiability method comprises the following steps:

A. File Preprocessing

Before a file is preprocessed, USER submits a storage request for the data file to the CDC. The CDC authenticates USER according to its preset access control rules; a legitimate user who passes authentication is granted permission to store files.

After USER passes the CDC's authentication, the CDC begins to receive the data file that USER needs to store and preprocesses it as follows:

(1) First, the data file is divided into several data blocks of equal size: F → (f1, f2, f3, f4, f5, f6, ..., fn).

(2) Then, the CDC hashes each data block and obtains the hash values H(fi) (1 ≤ i ≤ n) of all blocks.

(3) After hashing, the data file can be written as F' → (f1 + H(f1), f2 + H(f2), f3 + H(f3), ..., fn + H(fn)). The CDC temporarily keeps the hash values of all blocks in preparation for constructing the file's verification data structure, the Merkle Hash Tree.

(4) To give each data block a unique location mark, the scheme attaches a 5-byte location tag (LTag) to every block, consisting of 2 bytes of order information, 1 byte of rack information, and 2 bytes of node information. The order information records the block's sequence number among all blocks, the rack information records the specific rack in the data center on which the block is stored, and the node information identifies the specific node server storing the block. The CDC maintains an LTag list for each data file, recording the LTag information of all of the file's blocks.

(5) After the tags are added, the CDC stores all data blocks in the data center, and the geographical location of each block is recorded in its own LTag. Once the LTags of all blocks have been recorded, the CDC creates for the data file an LTag information list containing the location information of every block.

(6) Next, the key data structure, the Merkle Hash Tree, is constructed:

1) First, the CDC takes the rack values Rack(LTag(fi)) of all of the file's blocks from the LTag list, partitions the blocks by the rack on which they are stored, and places them in the ListRack[] list of the corresponding rack.

2) Then, for each rack in turn, the CDC takes the node information from the LTags of all blocks in that rack's ListRack[] list and further partitions the blocks stored on different nodes of the same rack.

3) Next, for the blocks stored on the same node of the same rack, the CDC takes the order information from their LTags, sorts them in sequence, constructs the node Merkle Hash Tree with the blocks' hash values as leaf nodes, and computes its root value (if a node holds only one block, that block's hash value is the root value of the node's Merkle Hash Tree), denoted NRoot[i, j] (where i is the rack number and j is the node number).

4) After constructing the node Merkle Hash Trees and computing the root values for all nodes storing the file, the CDC sorts the root values of all node Merkle Hash Trees within the same rack by node order, constructs each rack's Merkle Hash Tree with them as leaf nodes, and computes its root value (if only one node of a rack stores the file's data, that node's root value is the rack's root value), denoted RRoot[i] (where i is the rack number).

5) Finally, the root values of all rack Merkle Hash Trees are arranged in order and used as leaf nodes to construct the file's Merkle Hash Tree, whose root value is then computed (if only one rack stores the data file, that rack's root value is the file's root value), denoted FRoot.
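The node-then-rack-then-file construction of steps 1)–5) can be sketched as follows. SHA-1 is used as the hash function, following the preferred embodiment; the big-endian packing of the 5-byte LTag and the promotion of an unpaired value when a level has an odd number of entries are assumptions the text leaves open.

```python
import hashlib
import struct
from collections import defaultdict

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def pack_ltag(order: int, rack: int, node: int) -> bytes:
    # 5-byte LTag: 2-byte order, 1-byte rack, 2-byte node.
    # Big-endian packing is an assumption; the text fixes only the sizes.
    return struct.pack(">HBH", order, rack, node)

def merkle_root(leaves):
    # A single leaf is its own root, as the scheme specifies for
    # single-block nodes. An unpaired value at a level is promoted
    # unchanged (an assumption; the text does not say how odd counts
    # are handled).
    level = list(leaves)
    while len(level) > 1:
        nxt = [sha1(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def file_root(blocks, ltags):
    # Group block hashes node-first, then rack, then file, per step (6):
    # NRoot[i, j] per node, RRoot[i] per rack, FRoot for the file.
    by_rack = defaultdict(lambda: defaultdict(list))
    for blk, (order, rack, node) in zip(blocks, ltags):
        by_rack[rack][node].append((order, sha1(blk)))
    rack_roots = []
    for rack in sorted(by_rack):
        node_roots = [merkle_root([h for _, h in sorted(blks)])
                      for node, blks in sorted(by_rack[rack].items())]
        rack_roots.append(merkle_root(node_roots))  # RRoot[rack]
    return merkle_root(rack_roots)                  # FRoot
```

For example, `file_root([b"a", b"b", b"c"], [(0, 0, 0), (1, 0, 0), (2, 1, 0)])` builds one node tree per occupied node, one rack tree per occupied rack, and returns the 20-byte FRoot.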

B. Verification of Data Files

(1) First, the CDC completes the storage of USER's data file and submits the file's verification information to the TPA. Until the file is deleted by USER, whenever the file changes the CDC is responsible for generating the latest verification information for it and updating the TPA in real time.

(2) Then, the CDC notifies USER that the storage of the data file and the generation of its verification information are complete, and that USER may begin entrusting the TPA with verifying the file's integrity and consistency.

(3) USER may choose to communicate with the TPA immediately, or at any later time, to entrust it with the integrity and consistency verification of the data file.

(4) After receiving USER's audit request, the TPA performs regular or irregular audit verification of the data file according to USER's needs and keeps verification logs of all audit operations for USER's later inspection. If a data file fails verification, the TPA informs USER by e-mail, SMS, or similar means, and at the same time requires the CDC to take remedial measures such as restoring the file from a replica.

The verification process for data files in the scheme refers to the regular and irregular integrity and consistency checks that USER entrusts to the TPA. The TPA completes the entrusted verification by sending the CDC an integrity and consistency verification request for the data file and examining the verification information the CDC returns. The details are as follows:

1) At the start of verification, the TPA generates the verification message checkM(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i, j])). The checkM message consists of three parts: first, the user's personal information UInfo, which helps the CDC quickly and accurately locate the owner of the data file to be checked; second, the information FInfo on the data file to be checked, which identifies that file unambiguously; and third, the specific Merkle Hash Tree root information RInfo, composed of the file's root value FRoot, the specified rack's root value RRoot[i], and the specified node's root value NRoot[i, j].

2) After the verification message checkM has been generated, the TPA transmits it to the CDC over the communication network.

3) On receiving the message, the CDC first decomposes checkM to determine what the TPA wants to check.

4) Then, using the Merkle Hash Tree generation algorithms introduced in section A above, the CDC regenerates the corresponding Merkle Hash Trees and recomputes their root values.

5) After generating all the verification information, the CDC produces the verification reply respondM(UInfo, FInfo, RInfo'(FRoot', RRoot'[i], NRoot'[i, j])). Here UInfo and FInfo are identical to the first two parts of checkM, while RInfo' returns the Merkle Hash Tree root values the TPA needs to examine: the specified file's root value FRoot', the specified rack's root value RRoot'[i], and the specified node's root value NRoot'[i, j].

6) Finally, after receiving the respondM reply, the TPA compares it with the corresponding Merkle Hash Tree root values it has kept. If the results agree, the check passes: the data is complete and consistent with the previous version. Otherwise, the data is inconsistent with the previous version; the TPA requires the CDC to perform exception handling such as restoring the file from a replica, and informs USER of the result.
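The checkM/respondM round trip of steps 1)–6) can be sketched as below. The message field names (UInfo, FInfo, RInfo, FRoot) come from the text; representing messages as Python dicts, and comparing only a single flat FRoot rather than the full node/rack/file hierarchy, are simplifications for illustration.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(leaves):
    # Pairwise hash-and-concatenate up to a single root; an unpaired
    # value is promoted unchanged (an assumption).
    level = list(leaves)
    while len(level) > 1:
        nxt = [sha1(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def cdc_respond(check_m, storage):
    # CDC side: decompose checkM, rebuild the requested tree from the
    # blocks it currently holds, and answer with RInfo'.
    blocks = storage[(check_m["UInfo"], check_m["FInfo"])]
    froot_prime = merkle_root([sha1(b) for b in blocks])
    return {"UInfo": check_m["UInfo"], "FInfo": check_m["FInfo"],
            "RInfo": {"FRoot": froot_prime}}

def tpa_check(check_m, respond_m):
    # TPA side: the file is intact iff RInfo' matches the roots it kept.
    return respond_m["RInfo"] == check_m["RInfo"]
```

Tampering with any stored block changes the recomputed FRoot', so `tpa_check` returns False and the TPA can demand replica recovery.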

C. Dynamic Data Operations

C1. Data insertion

Insertion is the most basic dynamic operation on a data file and, compared with modification and deletion, also the most complex dynamic operation in the scheme. For example, to insert a data block fi' after the i-th data block fi of the file, the steps of the scheme are as follows:

(1) The CDC computes the hash value H(fi') of the new data block fi' and creates the tag information LTag(fi') for the block fi' to be inserted.

(2) The insertion function Insert(fi', H(fi'), LTag(fi)) is generated and run; LTag(fi) locates the exact position for the insertion, and the block is inserted.

(3) The Merkle Hash Tree insertion function MerkleInsert(H(fi), H(fi'), LTag(fi)) is called. It concatenates H(fi) and H(fi') and hashes the concatenated value. The CDC then substitutes this new hash for the hash value of the node at position LTag(fi) in the original node Merkle Hash Tree. Note that the hash of the concatenation is no longer a leaf of the Merkle Hash Tree but an intermediate node generated from the two leaves H(fi) and H(fi').

(4) The Merkle Hash Tree generation function CreateMerkle() of the node where fi' resides is called to generate a new node Merkle Hash Tree and compute its root value NRoot[i, j], where i is the rack and j is the node holding the block.

(5) The Merkle Hash Tree generation function CreateMerkle() of the rack where fi' resides is called to generate a new rack Merkle Hash Tree and compute its root value RRoot[i], where i is the rack holding the block.

(6) The file Merkle Hash Tree generation function CreateMerkle() is called again to generate a new file Merkle Hash Tree and compute its root value FRoot.

(7) The CDC reports the update to the TPA by sending the update request UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i, j])), where UInfo, FInfo, and RInfo are defined as in section B.

(8) The TPA updates the verification information of the data file according to the content of the update request, for later checks.
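The tree update of step (3) can be sketched as follows; `merkle_insert` is an illustrative stand-in for the patent's MerkleInsert, operating on a node tree's ordered leaf hashes with the LTag position reduced to a list index, and SHA-1 per the preferred embodiment.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(leaves):
    # Pairwise hash-and-concatenate up to a single root; an unpaired
    # value is promoted unchanged (an assumption).
    level = list(leaves)
    while len(level) > 1:
        nxt = [sha1(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def merkle_insert(leaf_hashes, pos, new_hash):
    # Step (3): concatenate H(fi) and H(fi'), hash the result, and let
    # it take the old leaf's place; it is now an intermediate node with
    # the two leaves H(fi) and H(fi') beneath it.
    leaves = list(leaf_hashes)
    leaves[pos] = sha1(leaves[pos] + new_hash)
    return leaves, merkle_root(leaves)
```

Because only the value at position LTag(fi) changes, the rest of the node tree keeps its shape, and steps (4)–(6) then propagate the new NRoot[i, j] upward through the rack and file trees.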

C2. Data modification

Modification is the most frequent dynamic data operation in cloud computing applications. For example, suppose the i-th data block fi of the file is modified and becomes fi'; the steps of the scheme are as follows:

(1) The CDC computes the hash value H(fi') of the new data block fi' and uses fi's tag information LTag(fi) as fi''s tag information LTag(fi').

(2) The update function Update(fi', H(fi'), LTag(fi)) is generated and run; LTag(fi) locates the exact position of the block to be modified, and the block is modified.

(3) The Merkle Hash Tree modification function MerkleUpdate(fi', H(fi'), LTag(fi')) is called; LTag(fi') locates the exact position of the block to be modified, and H(fi') replaces H(fi).

(4) The Merkle Hash Tree generation function CreateMerkle() of the node where fi' resides is called to generate a new node Merkle Hash Tree and compute its root value NRoot[i, j], where i is the rack and j is the node holding the block.

(5) The Merkle Hash Tree generation function CreateMerkle() of the rack where fi' resides is called to generate a new rack Merkle Hash Tree and compute its corresponding root value RRoot[i], where i is the rack holding the block.

(6) The file Merkle Hash Tree generation function CreateMerkle() is called to generate a new file Merkle Hash Tree and compute its root value FRoot.

(7) The CDC reports the update to the TPA by sending the update request UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i, j])), where UInfo, FInfo, and RInfo are defined as in section B.

(8) The TPA updates the verification information of the data file according to the content of the update request, for later checks.
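Step (3) of the modification case reduces to a leaf replacement; the sketch below, with `merkle_update` as an illustrative stand-in for MerkleUpdate and the LTag position reduced to a list index, shows that the tree's shape is untouched and only the root values need recomputing.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(leaves):
    # Pairwise hash-and-concatenate up to a single root; an unpaired
    # value is promoted unchanged (an assumption).
    level = list(leaves)
    while len(level) > 1:
        nxt = [sha1(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def merkle_update(leaf_hashes, pos, new_hash):
    # Step (3): H(fi') simply replaces H(fi) at the position given by
    # LTag(fi'); the tree keeps its shape and only the roots change.
    leaves = list(leaf_hashes)
    leaves[pos] = new_hash
    return leaves, merkle_root(leaves)
```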

C3数据的删除操作C3 data deletion operation

For example, to delete the data block fi+1 that follows the i-th data block fi of the data file, the scheme proceeds as follows:

(1) Generate and run the deletion function Delete(fi+1, H(fi+1), LTag(fi+1)), use LTag(fi+1) to locate the data block to be deleted, and delete it.

(2) Call the Merkle Hash Tree deletion function MerkleDelete(LTag(fi+1), LTag(fi)), which removes h(fi+1) at position LTag(fi+1) and replaces the intermediate node h(h(fi)+h(fi+1)), originally produced by hashing h(fi) and h(fi+1), with the value h(fi). Note that although the deletion removes two leaf nodes from the Merkle Hash Tree, the basic logical structure of the tree is unchanged.

(3) Call the Merkle Hash Tree generation function CreateMerkle() for the node where fi resides, generate the new node Merkle Hash Tree, and compute its root value NRoot[i,j], where i denotes the rack and j the node holding the data block.

(4) Call the Merkle Hash Tree generation function CreateMerkle() for the rack where fi resides, generate the new rack Merkle Hash Tree, and compute its root value RRoot[i], where i denotes the rack holding the data block.

(5) Call the file Merkle Hash Tree generation function CreateMerkle(), generate the new file Merkle Hash Tree, and compute its root value FRoot.

(6) The CDC reports the update to the TPA by sending the update request message UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo and RInfo are defined as in Section B.

(7) The TPA updates the verification information of the USER's data file according to the contents of the update request message, for use in later audits.
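The leaf-level effect of MerkleDelete in step (2) — the deleted block's hash is dropped, and its former sibling h(fi) takes the place of the pair's parent when the node tree is rebuilt — can be sketched as follows. This is a minimal illustration, not the patent's code; the helper names and the pairing rule for an odd number of leaves are assumptions.

```python
import hashlib

def h(x: bytes) -> bytes:
    # SHA-1 is the hash function selected by the scheme
    return hashlib.sha1(x).digest()

def merkle_root(leaves):
    """Fold a list of leaf hashes up to a single root value."""
    level = list(leaves)
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def merkle_delete(leaves, pos):
    """Drop the deleted block's leaf; rebuilding from the remaining leaves
    lets h(fi) stand where the intermediate node h(h(fi)+h(fi+1)) was."""
    return leaves[:pos] + leaves[pos + 1:]

leaves = [h(b"f1"), h(b"f2"), h(b"f3"), h(b"f4")]
new_root = merkle_root(merkle_delete(leaves, 2))  # delete f3 (fi+1 with i = 2)
```

Rebuilding from the remaining leaves, rather than patching the old tree in place, matches the scheme's regeneration of the node Merkle Hash Tree in step (3).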

Further preferably, in step (2) of A, the CDC hashes every data block and computes the hash values of all blocks. The scheme selects SHA-1 as the hash function applied to the data blocks.

Further preferably, in step (6) of A, the key data structure, the Merkle Hash Tree, is constructed. The scheme stipulates that the hash values of all data blocks of the data file serve as the leaf nodes of the Merkle Hash Tree. Unlike earlier work, which built the file Merkle Hash Tree simply by arranging the data blocks in sequence order, the scheme consults the LTag information of all data blocks and constructs the node Merkle Hash Trees, the rack Merkle Hash Trees and the file Merkle Hash Tree in that order (node first, then rack, then file), computing the corresponding root values.

Further preferably, in item 1) of step (4) of B, at the start of verification the TPA generates the verification message checkM(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])). The scheme allows the TPA to select any one, two or three of the Merkle Hash Tree root items for review.

Further preferably, regarding the data insertion operation of C1: the scheme defines insertion as placing a new data block after a particular data block of the data file, and stipulates that the newly inserted block is stored on the same server node as the block that precedes it. Consequently, the insertion operation does not change the logical structure of the Merkle Hash Tree of a single server node.

Further preferably, regarding the data modification operation of C2: the scheme defines the basic modification operation as replacing the data block to be modified with a new data block, and stipulates that the modified block is stored on the same node as before the modification. Consequently, the modification operation does not change the logical structure of the Merkle Hash Tree of a single server node.

Further preferably, regarding the data deletion operation of C3: the scheme defines deletion as removing the data block that follows a particular data block of the data file. If the deleted block is the only data block on its node, step 2 of C3 operates directly on the rack Merkle Hash Tree, steps 3 and 4 are skipped, and execution resumes at step 5.

Beneficial effects of the present invention:

The present invention rests on certain security assumptions (reliable communication channels, a trustworthy TPA, a faithful CDC, and so on). The USER can therefore confidently delegate the verification and auditing of data files to the TPA, and no longer has to maintain and compute the verification information personally. The scheme makes full use of the CDC's vast computing resources and storage space in the cloud computing environment, exploiting the large scale and virtualization of cloud computing to the fullest while saving substantial user resources. Moreover, compared with earlier research, the scheme supports all basic dynamic data operations (insertion, deletion, modification, etc.) in the cloud computing environment. For integrity and consistency verification, the scheme uses the root values of the hierarchically partitioned Merkle Hash Tree as verification information, organically combining the strengths of that data structure for integrity and consistency checking with the distributed character of data storage in the cloud.
In constructing the Merkle Hash Tree and computing its root values, the scheme makes clever use of the LTag label that records the storage location of each data block. This not only substantially reduces the time spent on internal CDC communication while computing Merkle Hash Tree root values, but also means that after the USER performs a dynamic operation the system no longer needs the hash values of every data block of the file to rebuild the file Merkle Hash Tree and recompute its root, markedly raising the rate at which verification information is regenerated after dynamic operations. In addition, the LTag label and the hierarchical Merkle Hash Tree design allow the TPA to audit a single server node or rack storing data blocks in the CDC. Compared with schemes that do not build the Merkle Hash Tree hierarchically, this scheme can verify a specific storage node or rack through the checkM message, and when verification reveals an inconsistency the failed node can be located quickly and accurately. Likewise, because rebuilding a Merkle Hash Tree does not require all data blocks of the file, the scheme enjoys a considerable time advantage. Compared with previous research results, the scheme of the present invention therefore offers certain efficiency advantages.

Description of the drawings

Fig. 1 shows the audit types of cloud computing data files discussed in the background art;

Fig. 2 is a flowchart of the CDC performing USER identity verification;

Fig. 3 is a schematic diagram of file partitioning into blocks;

Fig. 4 is a schematic diagram of the LTag label format;

Fig. 5 is a schematic diagram of the construction of the Merkle Hash Tree;

Fig. 6 is a flowchart of the rack partitioning of data blocks;

Fig. 7 is a flowchart of the node partitioning of data blocks;

Fig. 8 is a flowchart of the construction of a node Merkle Hash Tree and the computation of its root value;

Fig. 9 is a flowchart of the construction of a rack Merkle Hash Tree and the computation of its root value;

Fig. 10 is a flowchart of the construction of the file Merkle Hash Tree and the computation of its root value;

Fig. 11 describes the basic tasks of the USER, the CDC and the TPA;

Fig. 12 shows the verification process for a data file;

Fig. 13 is a schematic diagram of the Merkle Hash Tree insertion operation;

Fig. 14 is a schematic diagram of the Merkle Hash Tree modification operation;

Fig. 15 is a schematic diagram of the Merkle Hash Tree deletion operation;

Fig. 16 compares reconstruction times for the deletion operation;

Fig. 17 compares reconstruction times for the modification operation;

Fig. 18 compares reconstruction times for the insertion operation.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Security assumptions

The scheme assumes that none of the communication channels suffers large-scale packet loss (the channels comprise three segments: USER-CDC, CDC-TPA and TPA-USER). The TPA is an unbiased, fully trusted third-party auditor that faithfully completes every task entrusted to it by the USER. The security assumption on the CDC differs slightly from previous work: the CDC is no longer wholly untrusted but honest-but-curious, faithfully completing all of its tasks. The CDC guarantees that every parameter it computes is correct, non-deceptive and non-repudiable, and it responds unconditionally, at any time, to audit requests for any user data file. After all user data files have been preprocessed under the scheme, data blocks cannot interfere with one another. The focus of the scheme is how to support integrity and consistency verification of user data files stored at the CDC, together with dynamic operations on the data.

File preprocessing

Before a file is preprocessed, the USER submits a storage request for the data file to the CDC. The CDC authenticates the USER according to its preset access control rules; a legitimate user who passes the CDC's authentication is granted permission to store files, as shown in Fig. 2.

Once the USER has passed the CDC's authentication, the CDC begins receiving the data file the USER wishes to store and preprocesses it as follows:

First, the data file is divided into data blocks of equal size, F → (f1, f2, f3, f4, f5, f6, ..., fn), as shown in Fig. 3.

Then, the CDC hashes every data block and obtains the hash values H(fi) (1 ≤ i ≤ n) of all blocks. The scheme selects SHA-1 as the hash function applied to the data blocks. After hashing, the data file can be written as F' → (f1 + H(f1), f2 + H(f2), f3 + H(f3), ..., fn + H(fn)). The CDC temporarily keeps the hash values of all blocks in preparation for constructing the file's verification data structure, the Merkle Hash Tree.
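As a minimal sketch of this chunk-and-hash step (the block size and helper names are illustrative; the patent fixes neither), the preprocessing can be expressed as:

```python
import hashlib

BLOCK_SIZE = 4  # illustrative block size in bytes; the patent does not specify one

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Divide the file F into equal-sized blocks f1..fn (the last block may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def sha1_hex(block: bytes) -> str:
    """H(fi): the scheme selects SHA-1 as the hash function."""
    return hashlib.sha1(block).hexdigest()

blocks = split_blocks(b"example data file")  # 17 bytes -> 5 blocks
hashes = [sha1_hex(b) for b in blocks]       # kept by the CDC for tree construction
```

The list of block hashes is exactly the material later used as Merkle Hash Tree leaves.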

To mark each data block with a unique storage location, the scheme attaches a 5-byte location tag (LTag) to every block, consisting of 2 bytes of sequence information, 1 byte of rack information and 2 bytes of node information. The sequence field records the block's sequence number among all blocks, the rack field records the specific rack of the data center where the block is stored, and the node field identifies the specific node server holding the block. The CDC maintains an LTag list for each data file, recording the LTag information of all of the file's blocks. The tag format is shown in Fig. 4.
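The 5-byte LTag layout (2-byte sequence, 1-byte rack, 2-byte node) could be packed and unpacked as follows; the big-endian byte order is an assumption, since the patent does not specify one:

```python
import struct

def pack_ltag(order: int, rack: int, node: int) -> bytes:
    """Pack the 5-byte LTag: 2-byte sequence number, 1-byte rack, 2-byte node.
    Byte order (big-endian here) is an assumption not fixed by the patent."""
    return struct.pack(">HBH", order, rack, node)

def unpack_ltag(tag: bytes) -> tuple:
    """Recover (sequence, rack, node) from a 5-byte LTag."""
    return struct.unpack(">HBH", tag)

tag = pack_ltag(order=1234, rack=7, node=42)
```

The 2-byte fields bound the addressable range to 65,536 sequence numbers and node IDs per file, and the 1-byte rack field to 256 racks, consistent with the example data-center sizes given below.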

The LTag label plays an extremely important role in the subsequent operation of the scheme. Rate-first data block placement, in which blocks are stored quickly on whatever server nodes in the data center happen to be idle, is increasingly favored in the industry because it avoids queuing and waiting during block storage. For example, when a data center with 10 racks of 100 nodes each stores a data file of 10,000 data blocks, each node does not receive an even 10 blocks; the actual placement depends on the state of the server nodes in each rack at storage time. Attaching an LTag to each block pins down the exact node location of every block, which supplies essential location information for the later construction of the Merkle Hash Tree, the computation of root values and the localization of faulty nodes, and thereby improves the efficiency of the scheme.

After tagging, the CDC stores all data blocks in the data center, and the storage location of each block is recorded in its own LTag. Once the LTags of all blocks have been recorded, the CDC creates for the data file an LTag list containing the location information of every block.

Next, the key data structure, the Merkle Hash Tree, is constructed. The scheme stipulates that the hash values of all data blocks of the file serve as the leaf nodes of the tree. Unlike earlier work, which built the file Merkle Hash Tree simply by arranging the data blocks in sequence order, the scheme consults the LTag information of all blocks and constructs the file's node Merkle Hash Trees, rack Merkle Hash Trees and file Merkle Hash Tree in that order, computing the corresponding root values. The construction algorithm is described in Fig. 5:

First, the CDC takes from the LTag list the rack values Rack(LTag(fi)) of all blocks of the file, partitions the blocks by rack, and stores them in the ListRack[] list of the corresponding rack. Next, the CDC takes the node fields from the LTags of all blocks in each rack's ListRack[] list and partitions the blocks of each rack by node. Then, for the blocks stored on the same node of the same rack, the CDC extracts the sequence fields of their LTags, orders the blocks accordingly, builds the node Merkle Hash Tree with the blocks' hash values as leaves, and computes its root value (if a node holds only one block, that block's hash value is the root value of the node's Merkle Hash Tree), denoted NRoot[i,j] (where i is the rack number and j the node number).
After the Merkle Hash Trees and root values of all nodes storing the file have been computed, the CDC orders the root values of all node trees within the same rack by node number, uses them as leaves to build each rack's Merkle Hash Tree, and computes its root value (if only one node of a rack stores the file, that node's root value is the rack's root value), denoted RRoot[i] (where i is the rack number). Finally, the root values of all rack Merkle Hash Trees are ordered, used as leaves to build the file's Merkle Hash Tree, and its root value is computed (if only one rack stores the file, that rack's root value is the file's root value), denoted FRoot.

The per-step construction algorithms of the Merkle Hash Tree in the scheme, together with their code and flowcharts, are described below:

The scheme stipulates that in the rack-partitioning stage all data blocks making up the data file are traversed and classified by the rack information they carry, yielding the blocks belonging to each rack, as shown in Fig. 6.

(Code listing shown in patent figure BSA00000880170000101.)

After the rack partitioning of the data blocks is complete, the scheme partitions the blocks of each rack by node. The CDC traverses all blocks belonging to the same rack and subdivides them according to the node they belong to, as shown in Fig. 7.

Once the blocks have been partitioned, the scheme's key data structure, the Merkle Hash Tree, can be built. The CDC first builds the Merkle Hash Trees of all nodes: it orders the hash values of the blocks belonging to the same node and then calls the tree-building function to construct the node's Merkle Hash Tree. After the node trees are built, the CDC also computes the root value of every node Merkle Hash Tree, as shown in Fig. 8.

(Code listing shown in patent figure BSA00000880170000103.)

The rack Merkle Hash Trees are built with the root values of the relevant node Merkle Hash Trees as their leaves. The CDC first orders the node-tree root values belonging to the same rack and then calls the tree-building function to construct the rack Merkle Hash Tree. After every rack tree has been built, the CDC computes its corresponding root value, as shown in Fig. 9.

(Code listing shown in patent figure BSA00000880170000111.)

The file Merkle Hash Tree is built in the same way as the rack trees, except that the CDC uses the root values of all rack Merkle Hash Trees. The CDC orders the rack-tree root values and then calls the tree-building function to construct the file's Merkle Hash Tree. Finally, the CDC computes the root value of the file's Merkle Hash Tree, as shown in Fig. 10.

(Code listing shown in patent figure BSA00000880170000112.)
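The node-rack-file construction just described can be sketched compactly as follows. This is an illustration rather than the patent's listed code; the pairing rule for an odd number of leaves is an assumption the patent leaves open, and the placement dictionary stands in for the LTag-driven partitioning.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-1, the hash function selected by the scheme
    return hashlib.sha1(data).digest()

def merkle_root(leaves):
    """Root of a Merkle Hash Tree; a single leaf is its own root, per the scheme."""
    level = list(leaves)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(h(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])  # odd leaf promoted (an assumption)
        level = nxt
    return level[0]

# Block hashes grouped as {rack: {node: [hashes in LTag sequence order]}}
placement = {
    0: {0: [h(b"f1"), h(b"f2")], 1: [h(b"f3")]},
    1: {0: [h(b"f4"), h(b"f5")]},
}
# NRoot[i,j]: per-node trees; RRoot[i]: per-rack trees over node roots; FRoot on top
nroot = {(r, n): merkle_root(hs) for r, nodes in placement.items() for n, hs in nodes.items()}
rroot = {r: merkle_root([nroot[(r, n)] for n in sorted(nodes)]) for r, nodes in placement.items()}
froot = merkle_root([rroot[r] for r in sorted(rroot)])
```

Note how the single-leaf cases fall out directly: a node holding one block has that block's hash as its root, and a rack with one storing node has that node's root as its own.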

Verification of data files

During file preprocessing, the scheme constructs the node, rack and file Merkle Hash Trees and computes the corresponding root values, all of which become the scheme's primary verification information. The scheme stipulates that all verification information of a data file is kept not only at the CDC but also as a backup at the TPA. The CDC is responsible for supplying the TPA, over the communication network and in real time, with the latest verification information, so that the TPA can carry out the integrity and consistency verification entrusted to it by the USER. Specifically, the TPA must hold the file's latest LTag list, the root values of all node Merkle Hash Trees of the stored file, the root values of all its rack Merkle Hash Trees, and the root value of its file Merkle Hash Tree. In addition, the TPA knows the rules and methods by which all verification information of the data file is generated.

The interoperation of the USER, the CDC and the TPA in the scheme is described below, as shown in Fig. 11:

First, the CDC completes the storage of the USER's data file and submits the file's verification information to the TPA. Until the file is deleted by the USER, whenever it changes the CDC is responsible for generating the latest verification information and updating the TPA in real time. The CDC then notifies the USER that storage and verification-information generation are complete and that the USER may begin entrusting the TPA with integrity and consistency verification of the file. The USER may contact the TPA immediately, or at any later time, to commission such verification. Upon receiving the USER's audit request, the TPA audits the file regularly or on demand according to the USER's requirements, keeping verification logs of all audit operations for the USER's later inspection. If a verification fails, the TPA informs the USER by e-mail, text message or similar means, and requires the CDC to take remedial measures such as restoring the file from a replica.

Verification of a data file in the scheme refers to the periodic or on-demand integrity and consistency review that the USER entrusts to the TPA. The TPA completes this work by sending the CDC a verification request for the file's integrity and consistency and auditing the verification information the CDC returns, as shown in Fig. 12. The details are as follows:

At the start of verification, the TPA generates the verification message checkM(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])). The checkM message consists of three parts: first, the user's personal information UInfo, which helps the CDC locate the owner of the file to be checked quickly and accurately; second, the information FInfo identifying the data file to be checked; and third, the Merkle Hash Tree root information RInfo, comprising the file root value FRoot, the root value RRoot[i] of a designated rack, and the root value NRoot[i,j] of a designated node. The scheme allows the TPA to select any one, two or three of these root items for review. Once checkM has been generated, the TPA transmits it to the CDC over the communication network.
On receiving the verification message, the CDC first parses checkM to determine what the TPA wishes to review. The CDC then uses the relevant Merkle Hash Tree generation algorithms to regenerate the corresponding trees and compute their root values. Having generated all the required verification information, the CDC produces the reply respondM(UInfo, FInfo, RInfo'(FRoot', RRoot'[i], NRoot'[i,j])), in which UInfo and FInfo match the first two parts of checkM and RInfo' returns the root values the TPA asked to review: the file root value FRoot', the designated rack root value RRoot'[i] and the designated node root value NRoot'[i,j]. Finally, on receiving respondM, the TPA compares the returned values with the Merkle Hash Tree root values it holds. If they match, the review passes: the data are complete and consistent with the previous version. Otherwise the data are inconsistent with the previous version; the TPA requires the CDC to perform exception handling such as restoring the file from a replica, and informs the USER of the result.
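The checkM/respondM exchange ultimately reduces to a comparison of root values on the TPA side. A minimal sketch, with message shapes and helper names that are illustrative rather than taken from the patent:

```python
def build_checkM(uinfo: str, finfo: str, rinfo: dict) -> dict:
    """TPA side: request any subset of {FRoot, RRoot[i], NRoot[i,j]} for review."""
    return {"UInfo": uinfo, "FInfo": finfo, "RInfo": rinfo}

def audit(respond_rinfo: dict, tpa_rinfo: dict) -> bool:
    """TPA side: compare the roots the CDC returned against the stored copies.
    True means the file is complete and consistent with the previous version."""
    return all(tpa_rinfo.get(k) == v for k, v in respond_rinfo.items())

# The TPA's backup copy of the file's root values (placeholder hex strings)
tpa_copy = {"FRoot": "aa", "RRoot[1]": "bb", "NRoot[1,2]": "cc"}
ok = audit({"FRoot": "aa", "NRoot[1,2]": "cc"}, tpa_copy)   # consistent
bad = audit({"FRoot": "aa", "NRoot[1,2]": "xx"}, tpa_copy)  # mismatch at a node root
```

Because the TPA may request any subset of the three root levels, a mismatch in NRoot[i,j] alone already localizes the failure to a single server node, which is the efficiency point the hierarchical design is aiming at.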

Dynamic operations on data

In the cloud computing environment, the three basic dynamic operations most frequently performed on data files are data insertion, data modification and data deletion. In this scheme, after a data file undergoes any of these operations, all verification information related to the file must be regenerated by the CDC. The present invention describes in detail the three reconstruction algorithms for the Merkle Hash Tree data structure after a file has been dynamically operated on.

Data insertion operation

Insertion is the most basic dynamic operation on a data file and, compared with modification and deletion, the most complex in the scheme. The scheme defines insertion as placing a new data block after a particular data block of the data file, and stipulates that the newly inserted block is stored on the same server node as the block that precedes it. Consequently, the insertion operation does not change the logical structure of the Merkle Hash Tree of a single server node. For example, to insert a data block fi' after the i-th data block fi of the file, the scheme proceeds as follows:

1、CDC计算出新的数据块fi’的哈希值H(fi’),并为需要进行插入操作的数据块fi’创建标签信息LTag(fi’)。1. The CDC calculates the hash value H(f i ') of the new data block f i ', and creates tag information LTag(f i ') for the data block f i ' that needs to be inserted.

2、生成并运行插入操作函数Insert(fi’,H(fi’),LTag(fi)),通过LTag(fi)定位数据块插入的具体位置,进行数据块的插入操作。2. Generate and run the insert operation function Insert(f i ', H(f i '), LTag(f i )), locate the specific position where the data block is inserted through LTag(f i ), and perform the insert operation of the data block.

3. Call the Merkle Hash Tree insertion function MerkleInsert(h(fi), h(fi'), LTag(fi)), which concatenates h(fi) and h(fi') and hashes the concatenated value. The CDC then replaces the hash value of the node at position LTag(fi) in the original node Merkle Hash Tree with the new result of this hash operation. Note that the hash of the concatenated value is no longer a leaf of the Merkle Hash Tree; it serves as an intermediate node generated from the two leaves h(fi) and h(fi'). Figure 13 depicts in detail how the node Merkle Hash Tree changes in this step.

4. Call the Merkle Hash Tree generation function CreateMerkle() for the node where fi' resides, generate the new node Merkle Hash Tree, and compute its root value NRoot[i,j], where i denotes the rack and j the node on which the block resides.

5. Call the Merkle Hash Tree generation function CreateMerkle() for the rack where fi' resides, generate the new rack Merkle Hash Tree, and compute its root value RRoot[i], where i denotes the rack on which the block resides.

6. Call the file Merkle Hash Tree generation function CreateMerkle() again, generate the new file Merkle Hash Tree, and compute its root value FRoot.

7. The CDC reports the update to the TPA, sending the update request UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])).

8. The TPA updates the file's verification information according to the content of the update request, for later inspection.
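As a minimal sketch of step 3, the following Python code models a node Merkle Hash Tree as a list of leaf hashes and shows how inserting fi' turns the leaf h(fi) into the intermediate node h(h(fi) || h(fi')), changing the root without changing the tree's logical shape. The helper names are illustrative (the patent names only MerkleInsert), and the promotion rule for an odd node is an assumed convention:

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-1, the hash function the scheme uses for data blocks
    return hashlib.sha1(data).digest()

def merkle_root(level):
    # Bottom-up root computation; an odd node is promoted unchanged
    # (a common convention; the patent does not fix this detail).
    level = list(level)
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def merkle_insert(leaves, i, new_hash):
    # Step 3: the position located by LTag(fi) now holds h(h(fi) || h(fi')),
    # an intermediate node built from the two leaves h(fi) and h(fi').
    out = list(leaves)
    out[i] = h(out[i] + new_hash)
    return out

blocks = [b"f0", b"f1", b"f2", b"f3"]
leaves = [h(b) for b in blocks]
root_before = merkle_root(leaves)                              # NRoot before insertion
root_after = merkle_root(merkle_insert(leaves, 1, h(b"f1'")))  # NRoot after insertion
```

The leaf list keeps its length, which mirrors the claim that insertion leaves the node tree's logical structure intact while every root on the path to NRoot changes.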

Data modification

Modification is the most frequent dynamic data operation in cloud computing applications. The scheme defines the basic modification operation as replacing the block to be modified with a new block, and stipulates that the modified block is stored on the same node as before the modification. Consequently, modification does not change the logical structure of the single server node's Merkle Hash Tree. For example, to modify the i-th data block fi of the file so that it becomes fi', the specific steps of the scheme are described as follows:

1. The CDC computes the hash value H(fi') of the new data block fi' and reuses the tag information LTag(fi) of fi as the tag information LTag(fi') of fi'.

2. Generate and run the update function Update(fi', H(fi'), LTag(fi)); the block to be modified is located via LTag(fi) and then modified.

3. Call the Merkle Hash Tree modification function MerkleUpdate(fi', H(fi'), LTag(fi')); the block to be modified is located via LTag(fi'), and H(fi) is replaced with H(fi'). Figure 14 depicts in detail how the node Merkle Hash Tree changes in this step.

4. Call the Merkle Hash Tree generation function CreateMerkle() for the node where fi' resides, generate the new node Merkle Hash Tree, and compute its root value NRoot[i,j], where i denotes the rack and j the node on which the block resides.

5. Call the Merkle Hash Tree generation function CreateMerkle() for the rack where fi' resides, generate the new rack Merkle Hash Tree, and compute its root value RRoot[i], where i denotes the rack on which the block resides.

6. Call the file Merkle Hash Tree generation function CreateMerkle(), generate the new file Merkle Hash Tree, and compute its root value FRoot.

7. The CDC reports the update to the TPA, sending the update request UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo, and RInfo are as defined in Section 3.4.

8. The TPA updates the file's verification information according to the content of the update request, for later inspection.
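Under the same list-of-leaf-hashes model (a sketch, not the patent's implementation), modification is the simplest reconstruction: MerkleUpdate swaps a single leaf value, after which every root on the path changes, and restoring the old hash restores the old root; this is exactly the property that lets the TPA detect an unreported change:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # SHA-1, as in the scheme

def merkle_root(level):
    # Bottom-up root; an odd node is promoted unchanged (assumed convention).
    level = list(level)
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def merkle_update(leaves, i, new_hash):
    # Step 3: H(fi) at the position located by LTag(fi') is replaced by H(fi').
    out = list(leaves)
    out[i] = new_hash
    return out

leaves = [h(b"f0"), h(b"f1"), h(b"f2"), h(b"f3")]
old_root = merkle_root(leaves)
new_root = merkle_root(merkle_update(leaves, 2, h(b"f2'")))
# Writing the old block hash back yields the original root again.
restored_root = merkle_root(
    merkle_update(merkle_update(leaves, 2, h(b"f2'")), 2, h(b"f2")))
```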

Data deletion

The scheme defines the deletion operation as deleting the data block that follows a specific block of the data file. For example, to delete the block fi+1 that follows the i-th data block fi, the specific steps of the scheme are described as follows:

1. Generate and run the deletion function Delete(fi+1, H(fi+1), LTag(fi+1)); the block to be deleted is located via LTag(fi+1) and then deleted.

2. Call the Merkle Hash Tree deletion function MerkleDelete(LTag(fi+1), LTag(fi)), which deletes h(fi+1) at position LTag(fi+1) and replaces the intermediate Merkle Hash Tree node h(h(fi)+h(fi+1)), generated by hashing h(fi) and h(fi+1), with the value h(fi). Note that although deletion removes two leaf nodes from the Merkle Hash Tree, its basic logical structure does not change. Figure 15 depicts in detail how the node Merkle Hash Tree changes in this step.

3. Call the Merkle Hash Tree generation function CreateMerkle() for the node where fi resides, generate the new node Merkle Hash Tree, and compute its root value NRoot[i,j], where i denotes the rack and j the node on which the block resides.

4. Call the Merkle Hash Tree generation function CreateMerkle() for the rack where fi resides, generate the new rack Merkle Hash Tree, and compute its root value RRoot[i], where i denotes the rack on which the block resides.

5. Call the file Merkle Hash Tree generation function CreateMerkle(), generate the new file Merkle Hash Tree, and compute its root value FRoot.

6. The CDC reports the update to the TPA, sending the update request UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo, and RInfo are as defined in Section 3.4.

7. The TPA updates the verification information of the USER's data file according to the content of the update request, for later inspection.

In addition, the scheme stipulates that if the deleted block is the only block on its node, step 2 operates directly on the rack Merkle Hash Tree; steps 3 and 4 are then skipped and execution resumes at step 5.
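A sketch of the deletion step under the same leaf-list model (illustrative names, not the patent's code): removing fi+1 collapses the intermediate node h(h(fi) || h(fi+1)) back to h(fi), so the rebuilt tree over the remaining blocks equals a tree built from scratch without the deleted block:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # SHA-1, as in the scheme

def merkle_root(level):
    # Bottom-up root; an odd node is promoted unchanged (assumed convention).
    level = list(level)
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]

def merkle_delete(leaves, i):
    # Step 2: h(f_{i+1}) is removed; at the level above, h(fi) now stands
    # where the intermediate node h(h(fi) || h(f_{i+1})) used to be.
    out = list(leaves)
    del out[i + 1]
    return out

leaves4 = [h(b"f0"), h(b"f1"), h(b"f2"), h(b"f3")]
after_delete = merkle_delete(leaves4, 2)   # delete the block after f2
root_after = merkle_root(after_delete)
```

With this convention, `root_after` coincides with the root of a fresh tree over f0, f1, f2, which is what allows the TPA to verify the post-deletion state against a recomputed NRoot.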

Analysis of the scheme

The scheme rests on certain security assumptions (the communication channel is reliable, the TPA is trustworthy, the CDC is faithful, and so on). The USER can therefore confidently delegate the verification and auditing of data files to the TPA, and no longer needs to maintain and compute the verification information personally. The scheme makes full use of the CDC's vast computing resources and storage space in the cloud computing environment, maximizing the large-scale, virtualized advantages of cloud computing and saving substantial user resources. Moreover, compared with scholars' previous results, the scheme supports all basic dynamic data operations in the cloud computing environment (insertion, deletion, modification, and so on). For integrity and consistency verification, the scheme uses the root values of a hierarchically partitioned Merkle Hash Tree as verification information, organically combining the advantages of that data structure for integrity and consistency verification with the distributed character of data storage in the cloud.
In constructing the Merkle Hash Tree and computing its root values, the scheme makes deft use of the LTag label that records the geographic location where each block is stored. This not only effectively reduces the large amount of time spent on internal CDC communication while computing the Merkle Hash Tree root values, but also means that after the USER performs a dynamic operation on a data file, the system need not use the hash values of all of the file's blocks to rebuild the file Merkle Hash Tree and recompute its root value, effectively raising the rate at which verification information is generated after dynamic operations. Furthermore, the LTag label and the hierarchical Merkle Hash Tree design allow the TPA to audit and verify an individual server node or rack that stores data blocks in the CDC. Compared with schemes that do not construct the Merkle Hash Tree hierarchically, this scheme can verify a specific storage node or rack through the checkM verification message, and when a verification inconsistency appears it can quickly and accurately locate the failed node. At the same time, because Merkle Hash Tree reconstruction does not require all of the file's data blocks, the scheme holds a considerable efficiency advantage in terms of time. Compared with scholars' previous results, the scheme of the present invention thus has clear efficiency advantages.

Table 1 compares the scheme described here with scholars' previous results in terms of support for dynamic operations, support for public auditing, the time complexity of verifying data integrity, and support for locating server nodes with inconsistent data blocks.

Table 1. Comparison of schemes

Figure BSA00000880170000151

In Table 1, the scheme proposed in prior art 2 supports only a limited number of integrity verifications and does not support data insertion; the literature gives no clear design of the implementation process. The comparison of the data in the table shows that the present scheme is not inferior to previous results in support for dynamic operations, support for public auditing, or the time complexity of verifying data integrity, while in support for locating inconsistent server nodes it has an advantage that the other results lack.

Assume the hash time of a single data block is T_h, the construction time of a Merkle Hash Tree is (LeafA−1)T_m (where LeafA is the number of leaf nodes of the tree), the unit communication time between nodes within the same rack is T_n, and the unit communication time between racks is T_r. The time-overhead formulas for constructing the file Merkle Hash Tree and for the dynamic insertion, modification, and deletion of data files can then be summarized as follows:

The construction time T_CreateMHT of a data file's file Merkle Hash Tree during file preprocessing is computed as shown in formula (1).

T_CreateMHT = Σ_{i=1}^{RackA} Σ_{j=1}^{Rack_iNA} [T_m × (Rack_iN_jBA − 1)] + Σ_{i=1}^{RackA} [T_m × (RackNA_i − 1)] + T_m × (RackA − 1) + Σ_{i=1}^{RackA} (T_n × RackNA_i) + T_r × RackA + T_h × FileBA    (1)

Here RackA is the total number of racks over which the file's blocks are distributed, RackNA_i (also written Rack_iNA) is the total number of nodes in the i-th rack, Rack_iN_jBA is the number of data blocks on the j-th node of the i-th rack, and FileBA (FileBlockA) is the total number of data blocks in the file.
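The cost model of formula (1) can be sketched directly from these definitions. Parameter and function names below are illustrative, and the T values are abstract unit times, not measurements:

```python
def t_create_mht(rack_blocks, t_m, t_n, t_r, t_h):
    """Formula (1). rack_blocks[i][j] = Rack_iN_jBA, the number of blocks
    on node j of rack i; len(rack_blocks) = RackA; len(rack_blocks[i]) =
    RackNA_i; the sum over everything = FileBA."""
    rack_a = len(rack_blocks)
    file_ba = sum(sum(rack) for rack in rack_blocks)
    total = 0.0
    for rack in rack_blocks:
        for node_ba in rack:
            total += t_m * (node_ba - 1)   # node Merkle Hash Trees
        total += t_m * (len(rack) - 1)     # rack Merkle Hash Tree
        total += t_n * len(rack)           # intra-rack communication
    total += t_m * (rack_a - 1)            # file Merkle Hash Tree
    total += t_r * rack_a                  # inter-rack communication
    total += t_h * file_ba                 # hashing every block once
    return total
```

For example, with two racks holding nodes of (2, 2) and (4) blocks and all unit times equal to 1, the terms are 5 + 1 + 3 + 1 + 2 + 8 = 20 time units.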

The time cost T_Insert of a data-file insertion operation is computed as shown in formula (2):

T_Insert = T_m × (Rack_iN_jBA − 1) + T_m × (RackNA_i − 1) + T_m × (RackA − 1) + T_n × RackNA_i + T_r × RackA + T_h    (2)

The time cost T_Delete of a data-file deletion operation is computed as shown in formula (3):

T_Delete = T_m × (Rack_iN_jBA − 1) + T_m × (RackNA_i − 1) + T_m × (RackA − 1) + T_n × RackNA_i + T_r × RackA    (3)

The time cost T_Update of a data-file modification operation is computed as shown in formula (4):

T_Update = T_m × (Rack_iN_jBA − 1) + T_m × (RackNA_i − 1) + T_m × (RackA − 1) + T_n × RackNA_i + T_r × RackA + T_h    (4)
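Formulas (2) through (4) differ only in whether a new block must be hashed: T_h appears for insertion and modification but not for deletion. One sketch therefore covers all three (names are illustrative):

```python
def t_dynamic_op(node_ba, rack_na, rack_a, t_m, t_n, t_r, t_h, new_block_hashed):
    """Formulas (2)-(4): rebuild the affected node tree (node_ba leaves),
    its rack tree (rack_na leaves) and the file tree (rack_a leaves),
    pay the communication costs, and hash the new block if there is one."""
    total = (t_m * (node_ba - 1) + t_m * (rack_na - 1) + t_m * (rack_a - 1)
             + t_n * rack_na + t_r * rack_a)
    if new_block_hashed:   # insertion (2) and modification (4)
        total += t_h
    return total           # deletion (3) omits t_h

t_insert = t_dynamic_op(4, 3, 2, 1.0, 0.5, 2.0, 0.1, True)
t_delete = t_dynamic_op(4, 3, 2, 1.0, 0.5, 2.0, 0.1, False)
t_update = t_dynamic_op(4, 3, 2, 1.0, 0.5, 2.0, 0.1, True)
```

Note that none of the three costs depends on the total number of blocks in the file, only on the affected node, rack, and file trees; this is the efficiency advantage of the hierarchical construction claimed above.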

In addition, simulation code for the scheme was written using the OpenSSL 0.9.81 library. The experimental environment was Ubuntu 10.04 running in a VMware Workstation 6.0 virtual machine on Windows XP, with a Pentium dual-core E5300 processor and 0.5 GB of allocated memory. The file size was 1 GB and the block size 0.5 MB.

The simulation compares, for varying numbers of nodes over which the data blocks are distributed, the reconstruction time of the hierarchical Merkle Hash Tree described here against that of a non-hierarchical Merkle Hash Tree after deletion, modification, and insertion operations on the data file.

Figure 16 shows the comparison of Merkle Hash Tree reconstruction times after a deletion operation on the data file.

Figure 17 shows the comparison of Merkle Hash Tree reconstruction times after a modification operation on the data file.

Figure 18 shows the comparison of Merkle Hash Tree reconstruction times after an insertion operation on the data file.

To make the comparison more pronounced, the experiment assumes that all blocks of the data file are stored evenly across different nodes of the same rack. When the number of nodes is 1, the hierarchical Merkle Hash Tree construction reduces to hashing all of the file's blocks to build the file Merkle Hash Tree, i.e., it is identical to the non-hierarchical construction. The simulation results show that, whether the file undergoes deletion, modification, or insertion, the hierarchical construction clearly saves time over the non-hierarchical one. Moreover, within a certain range of node counts, the construction time of the hierarchical Merkle Hash Tree decreases as the number of nodes increases. The hierarchical Merkle Hash Tree construction adopted here therefore has a clear advantage in time overhead over previous schemes.

As for storage overhead, at the initial stage of data-file storage the scheme attaches an extra 5-byte LTag to each data block and generates an LTag information list for every data file; the storage space occupied by the LTags and the list is determined by the number of blocks into which the file is divided. The verification information in the scheme consists of three parts: the node Merkle Hash Tree root values, the rack Merkle Hash Tree root values, and the file Merkle Hash Tree root value. Since the scheme uses SHA-1 as the hash function for block hashing, each Merkle Hash Tree root value is 160 bits, i.e., 20 bytes, and the storage occupied by a file's verification information is determined by the number of nodes and racks over which its blocks are distributed. In addition, during the initialization phase the CDC temporarily retains the hash values of all blocks; during the Merkle Hash Tree reconstruction phase of a dynamic operation it temporarily retains the hash values of all blocks on the server node holding the operated block; and during generation of the respondM verification reply it temporarily retains the hash values of all blocks of the node, rack, or file to be verified. All temporarily retained intermediate records are destroyed once the Merkle Hash Tree has been rebuilt and the verification information generated.
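The 5-byte LTag layout (2-byte block order, 1-byte rack number, 2-byte node number) can be sketched with a fixed-width encoding. Big-endian packing and the function names are assumptions; the patent fixes only the field widths:

```python
import struct

def pack_ltag(order: int, rack: int, node: int) -> bytes:
    # ">HBH" = big-endian: 2-byte order + 1-byte rack + 2-byte node = 5 bytes
    return struct.pack(">HBH", order, rack, node)

def unpack_ltag(tag: bytes):
    # Returns the (order, rack, node) triple encoded in a 5-byte LTag.
    return struct.unpack(">HBH", tag)

tag = pack_ltag(1024, 7, 300)   # block #1024, stored on node 300 of rack 7
```

These widths bound the addressable layout: at most 65536 block positions and node numbers, and 256 racks.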

The above is only the best embodiment of the present invention; any simple variation or equivalent substitution of the technical solution that is obvious to a person skilled in the art within the technical scope disclosed by the present invention falls within its scope of protection.

Claims (7)

1.一种基于哈希树的数据动态操作可验证性方法,其特征在于,包括以下步骤:1. A method for verifiability of data dynamic operation based on hash tree, it is characterized in that, comprising the following steps: A文件预处理A file preprocessing 文件进行预处理之前,USER会向CDC提出数据文件的存储请求,CDC按照其预先设定的访问控制规则对USER进行身份验证,通过CDC身份验证的合法用户便会获得进行文件存储的权限;Before the file is pre-processed, the USER will submit a data file storage request to the CDC, and the CDC will authenticate the USER according to its preset access control rules, and the legal user who passes the CDC authentication will obtain the permission to store the file; USER通过CDC的身份验证之后,CDC便开始接收USER需要存储的数据文件,并对其进行预处理:After USER passes the CDC authentication, CDC starts to receive the data files that USER needs to store and preprocess them: (1)首先,数据文件将被分成大小相同的若干个数据块,F→(f1,f2,f3,f4,f5,f6,...fn),(1) First, the data file will be divided into several data blocks of the same size, F→(f1, f2, f3, f4, f5, f6,...fn), (2)然后,CDC对每个数据块进行哈希操作并求出所有数据块的哈希值H(fi)(1≤i≤n),(2) Then, CDC performs a hash operation on each data block and calculates the hash value H(fi) (1≤i≤n) of all data blocks, (3)哈希操作之后,数据文件可以记为:F’→(f1+H(f1),f2+H(f2),f3+H(f3),...,fn+H(fn)),CDC将暂时保存所有数据块的哈希值,为接下来构造文件的验证数据结构Merkle Hash Tree做准备,(3) After the hash operation, the data file can be recorded as: F'→(f1+H(f1), f2+H(f2), f3+H(f3), ..., fn+H(fn)) , CDC will temporarily store the hash values of all data blocks in preparation for the next construction of the verification data structure Merkle Hash Tree of the file, (4)为了对数据块进行唯一的地理位置标记,方案为每个数据块添加了一个5字节大小的位置标签(LTag),标签由2字节的次序标记信息、1字节的机架标记信息和2字节的节点标记信息组成,次序标记信息记录的是数据块在所有数据块中的次序编号,机架标记信息记录的是该数据块存储在数据中心的具体机架编号,节点标记信息则标明数据块存储的节点服务器的具体编号,CDC会为每个数据文件维护一个LTag标签列表,记录该文件所有数据块的LTag标签信息;(4) In order to uniquely mark the geographic location of the data block, the scheme adds a 5-byte location tag (LTag) to each data block, and the tag consists of 2-byte sequence tag information and 1-byte rack The tag information is composed of 2-byte node tag information. 
The sequence tag information records the sequence number of the data block in all data blocks. The rack tag information records the specific rack number of the data block stored in the data center. The node The tag information indicates the specific number of the node server where the data block is stored, and CDC will maintain a list of LTag tags for each data file, recording the LTag tag information of all data blocks in the file; (5)标签添加完毕之后,所有数据块将会被CDC存入数据中心,数据块所存储的地理位置信息也会被记录在数据块自己的LTag标签之中,待所有数据块的LTag标签记录完毕,CDC将会为数据文件创建一个包含全部数据块地理位置信息的LTag标签信息列表,(5) After the tag is added, all data blocks will be stored in the data center by CDC, and the geographical location information stored in the data block will also be recorded in the LTag tag of the data block itself, and the LTag tags of all data blocks will be recorded After completion, CDC will create a LTag tag information list for the data file that contains the geographic location information of all data blocks. (6)随后,进行关键数据结构Merkle Hash Tree的构造,(6) Subsequently, construct the key data structure Merkle Hash Tree, 1)首先,CDC从Ltag标签列表中取出文件所有数据块的机架值Rack(LTag(fi)),对存储在不同机架中的数据块进行划分,存入相应机架的ListRack[]列表,1) First, CDC takes out the rack value Rack(LTag(fi)) of all data blocks in the file from the Ltag label list, divides the data blocks stored in different racks, and stores them in the ListRack[] list of the corresponding rack , 2)然后,CDC依次从不同机架的ListRack[]列表中取出所有数据块Ltag标签中的节点信息,对存储在相同机架不同节点的数据块再次进行划分,2) Then, the CDC sequentially takes out the node information in the Ltag label of all data blocks from the ListRack[] lists of different racks, and divides the data blocks stored in different nodes of the same rack again, 3)接着,对存储在相同机架相同节点的数据块,CDC将取出它们Ltag标签中的次序信息,进行按序排列,用数据块的哈希值作为叶子节点构造节点Merkle Hash Tree,并计算其根节点值,如果节点仅存有一个数据块,则该数据块的哈希值便为该节点Merkle Hash Tree的根节点值,,记为NRoot[i,j],其中i代表所在机架编号,j代表所在节点编号,3) Next, for the data blocks stored in the same rack and the same node, CDC will take out the sequence information in their Ltag 
tags, arrange them in order, use the hash value of the data block as the leaf node to construct the node Merkle Hash Tree, and calculate Its root node value, if the node has only one data block, the hash value of the data block is the root node value of the node Merkle Hash Tree, recorded as NRoot[i, j], where i represents the rack number, j stands for the node number, 4)完成文件存储的所有节点Merkle Hash Tree的构造和根节点值的计算工作之后,CDC将存储在相同机架的所有节点Merkle Hash Tree根节点值按节点顺序进行排序,将其作为叶子节点构造各个机架的Merkle Hash Tree,并计算其根节点值,如果机架仅有一个节点存储数据文件,则该节点的Merkle Hash Tree根节点值便为机架Merkle Hash Tree的根节点值,记为RRoot[i],其中i代表机架序号,4) After completing the construction of the Merkle Hash Tree of all nodes stored in the file and the calculation of the root node value, CDC will sort the root node values of the Merkle Hash Tree of all nodes stored in the same rack according to the order of the nodes, and use it as a leaf node construction Merkle Hash Tree of each rack, and calculate its root node value, if the rack has only one node to store data files, then the Merkle Hash Tree root node value of this node is the root node value of the rack Merkle Hash Tree, recorded as RRoot[i], where i represents the serial number of the rack, 5)最后,对所有机架Merkle HashTree的根节点值进行按序排列,将它们的根节点值作为叶子节点构造文件的Merkle Hash Tree,并计算其根节点值(如果仅有一个机架存储数据文件,则该机架的Merkle Hash Tree根节点值便为文件Merkle Hash Tree的根节点值),记为FRoot;5) Finally, arrange the root node values of the Merkle HashTree of all racks in order, use their root node values as the Merkle Hash Tree of the leaf node construction file, and calculate the root node value (if only one rack stores data file, then the Merkle Hash Tree root node value of the rack is the root node value of the file Merkle Hash Tree), denoted as FRoot; B数据文件的验证Verification of B data files (1)首先,CDC完成USER的数据文件存储工作,并向TPA提交USER所存储数据文件的相关验证信息,数据文件在被USER删除之前,只要发生了改动,CDC都将负责为数据文件生成最新的验证信息,并与TPA进行实时更新,(1) First, CDC completes the data file storage work of USER, and submits the relevant verification information of the data files 
stored by USER to TPA. Before the data files are deleted by USER, as long as any changes occur, CDC will be responsible for generating the latest data files. Verification information, and real-time update with TPA, (2)然后,CDC通知USER数据文件的存储及验证信息的生成工作已经完成,并告知USER可以开始委托TPA对数据文件进行完整性和一致性的验证,(2) Then, CDC notifies USER that the storage of data files and the generation of verification information have been completed, and informs USER that it can start entrusting TPA to verify the integrity and consistency of data files, (3)USER可以选择马上或者另选其他时间与TPA进行通信,来委托TPA对数据文件进行完整性和一致性验证,(3) USER can choose to communicate with TPA immediately or at another time to entrust TPA to verify the integrity and consistency of data files, (4)TPA在收到USER的审计委托请求之后,会根据USER的需求对数据文件进行定期或不定期的审计验证,并对所有审计操作保留验证日志以供USER日后查验,验证过程中,如果出现数据文件验证失败的情况,则TPA将通过邮件或短信等通信方式告知USER,同时要求CDC进行数据文件的副本恢复等补救措施,(4) After receiving the audit entrustment request from USER, TPA will conduct regular or irregular audit verification of data files according to USER's needs, and keep verification logs for all audit operations for future inspection by USER. During the verification process, if If data file verification fails, TPA will inform USER by email or SMS, and at the same time request CDC to perform remedial measures such as copy recovery of data files, 方案中数据文件的验证过程是指USER委托TPA所进行的定期不定期的数据文件完整性和一致性审查,TPA通过向CDC提出数据文件的完整性和一致性验证请求,并对CDC回复的验证信息进行审核,来完成USER所委托的验证工作,具体描述如下:The verification process of data files in the scheme refers to the periodic and irregular data file integrity and consistency review entrusted by USER to TPA. TPA submits a data file integrity and consistency verification request to CDC and verifies CDC’s reply. The information is reviewed to complete the verification work entrusted by USER. 
The specific description is as follows: 1)验证的起始阶段,TPA生成验证信息checkM(UInfo,FInfo,RInfo(FRoot,RRoot[i],NRoot[i,j])),checkM验证消息由三部分组成:一是用户的个人信息UInfo,UInfo可以帮助CDC迅速准确定位需要进行查验的数据文件的拥有者;二是需要查验的数据文件信息FInfo,FInfo明确的指出需要进行查验的数据文件;三是具体的Merkle Hash Tree根节点信息RInfo,RInfo由数据文件Merkle Hash Tree根节点值FRoot、指定机架Merkle Hash Tree根节点值RRoot[i]和指定节点Merkle Hash Tree根节点值NRoot[i,j]三部分组成,1) In the initial stage of verification, TPA generates verification information checkM(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i, j])), and the checkM verification message consists of three parts: one is the user's personal information UInfo, UInfo can help CDC quickly and accurately locate the owner of the data file that needs to be checked; the second is the data file information that needs to be checked FInfo, FInfo clearly points out the data file that needs to be checked; the third is the specific Merkle Hash Tree root node information RInfo, RInfo is composed of three parts: the Merkle Hash Tree root node value FRoot of the data file, the specified rack Merkle Hash Tree root node value RRoot[i], and the specified node Merkle Hash Tree root node value NRoot[i, j]. 2)验证消息checkM生成之后,TPA会通过通信网络将checkM传送给CDC,2) After the verification message checkM is generated, TPA will transmit checkM to CDC through the communication network, 3)CDC接收到验证信息后,首先会对checkM进行分解,确定TPA所要审查的相关内容,3) After CDC receives the verification information, it will first decompose checkM to determine the relevant content to be reviewed by TPA. 
4)然后,CDC使用上述A节介绍的相关Merkle Hash Tree生成算法重新生成并计算相应Merkle Hash Tree及其根节点值,4) Then, CDC uses the relevant Merkle Hash Tree generation algorithm introduced in Section A above to regenerate and calculate the corresponding Merkle Hash Tree and its root node value, 5)完成所有验证信息的生成工作之后,CDC将生成验证回复信息respondM(UInfo,FInfo,RInfo’(FRoot’,RRoot’[i],NRoot’[i,j])),其中,UInfo和FInfo的内容与checkM验证信息的前两部分保持一致,RInfo’则返回TPA需要审查的相关Merkle Hash Tree根节点值,包括指定的数据文件Merkle Hash Tree根节点值FRoot’、指定的机架Merkle Hash Tree根节点值RRoot’[i]和指定的节点Merkle Hash Tree根节点值NRoot’[i,j]三部分,5) After completing the generation of all verification information, CDC will generate verification reply information respondM(UInfo, FInfo, RInfo'(FRoot', RRoot'[i], NRoot'[i, j])), among them, UInfo and FInfo The content is consistent with the first two parts of the checkM verification information, and RInfo' returns the relevant Merkle Hash Tree root node values that TPA needs to review, including the specified data file Merkle Hash Tree root node value FRoot', the specified rack Merkle Hash Tree The root node value RRoot'[i] and the specified node Merkle Hash Tree root node value NRoot'[i, j] are three parts, 6)最后,TPA在收到respondM验证回复信息之后,将与自身保存的相关Merkle Hash Tree根节点值进行比对,如果结果一致,则审查失败,数据完整且与先前版本一致,反之,则说明数据与先前版本不一致,TPA将要求CDC进行数据文件的副本恢复等异常处理操作,并将审查结果告知USER;6) Finally, after receiving the respondM verification reply message, TPA will compare it with the relevant Merkle Hash Tree root node value saved by itself. If the result is consistent, the review will fail, and the data is complete and consistent with the previous version. 
C. Dynamic operations on data
C1. Data insertion
Insertion is the most basic dynamic operation on a data file and, compared with modification and deletion, it is also the most complex dynamic operation in the scheme. To insert a data block fi' after the i-th data block fi of the data file, the scheme proceeds as follows:
(1) The CDC computes the hash value H(fi') of the new data block fi' and creates the label information LTag(fi') for the data block fi' to be inserted.
(2) The CDC generates and runs the insertion function Insert(fi', H(fi'), LTag(fi)), which locates the exact insertion position through LTag(fi) and inserts the data block.
(3) The CDC calls the Merkle Hash Tree insertion function MerkleInsert(h(fi), h(fi'), LTag(fi)), which concatenates h(fi) and h(fi') and hashes the concatenated value; the CDC then replaces the hash value of the node at position LTag(fi) in the original node Merkle Hash Tree with this new result. The hash of the concatenated value is no longer a leaf node of the Merkle Hash Tree, but an intermediate node generated from the two leaf nodes h(fi) and h(fi').
(4) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the node where fi' resides, generates a new node Merkle Hash Tree, and computes its root node value NRoot[i,j], where i denotes the node holding the data block and j denotes the rack holding it.
(5) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the rack where fi' resides, generates a new rack Merkle Hash Tree, and computes its root node value RRoot[i], where i denotes the rack holding the data block.
(6) The CDC calls the file Merkle Hash Tree generation function CreateMerkle() again, generates a new file Merkle Hash Tree, and computes its root node value FRoot.
(7) The CDC reports the update to the TPA by sending the update request message UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo and RInfo are defined as in Section B.
(8) The TPA updates the verification information of the data file according to the content of the update request, for use in later audits.
C2. Data modification
Modification is the most frequent dynamic data operation in cloud computing applications. To modify the i-th data block fi of the data file, after which the i-th block becomes fi', the scheme proceeds as follows:
(1) The CDC computes the hash value H(fi') of the new data block fi' and reuses the label information LTag(fi) of fi as the label information LTag(fi') of fi'.
(2) The CDC generates and runs the update function Update(fi', H(fi'), LTag(fi)), which locates the data block to be modified through LTag(fi) and performs the modification.
(3) The CDC calls the Merkle Hash Tree modification function MerkleUpdate(fi', H(fi'), LTag(fi')), which locates the data block to be modified through LTag(fi') and replaces H(fi) with H(fi').
(4) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the node where fi' resides, generates a new node Merkle Hash Tree, and computes its root node value NRoot[i,j], where i denotes the node holding the data block and j denotes the rack holding it.
(5) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the rack where fi' resides, generates a new rack Merkle Hash Tree, and computes its root node value RRoot[i], where i denotes the rack holding the data block.
(6) The CDC calls the file Merkle Hash Tree generation function CreateMerkle(), generates a new file Merkle Hash Tree, and computes its root node value FRoot.
(7) The CDC reports the update to the TPA by sending the update request message UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo and RInfo are defined as in Section B.
(8) The TPA updates the verification information of the data file according to the content of the update request, for use in later audits.
C3. Data deletion
For example, to delete the data block fi+1 that follows the i-th data block fi of the data file, the scheme proceeds as follows:
(1) The CDC generates and runs the deletion function Delete(fi+1, H(fi+1), LTag(fi+1)), which locates the data block to be deleted through LTag(fi+1) and deletes it.
(2) The CDC calls the Merkle Hash Tree deletion function MerkleDelete(LTag(fi+1), LTag(fi)), which deletes h(fi+1) at position LTag(fi+1) and replaces the Merkle Hash Tree intermediate node h(h(fi)+h(fi+1)), generated by hashing h(fi) and h(fi+1), with the value h(fi). Although the deletion removes two leaf nodes from the Merkle Hash Tree, the basic logical structure of the tree is unchanged.
(3) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the node where fi resides, generates a new node Merkle Hash Tree, and computes its root node value NRoot[i,j], where i denotes the node holding the data block and j denotes the rack holding it.
(4) The CDC calls the Merkle Hash Tree generation function CreateMerkle() for the rack where fi resides, generates a new rack Merkle Hash Tree, and computes its root node value RRoot[i], where i denotes the rack holding the data block.
(5) The CDC calls the file Merkle Hash Tree generation function CreateMerkle(), generates a new file Merkle Hash Tree, and computes its root node value FRoot.
(6) The CDC reports the update to the TPA by sending the update request message UpdateRequest(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), where UInfo, FInfo and RInfo are defined as in Section B.
(7) The TPA updates the USER's data file verification information according to the content of the update request, for use in later audits.
2. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, in step (2) of Section A, the CDC performs a hash operation on each data block and obtains the hash values of all data blocks, and the scheme selects SHA-1 as the hash function for hashing the data blocks.
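The leaf layer fixed by claim 2 can be sketched as follows. The block size and helper names here are arbitrary assumptions for illustration, not values taken from the patent.

```python
# Illustrative sketch for claim 2: split a file's bytes into fixed-size data
# blocks and hash every block with SHA-1; the resulting 20-byte digests serve
# as the leaf nodes of the Merkle Hash Tree.
import hashlib

def split_into_blocks(data: bytes, block_size: int) -> list:
    """Cut the file content into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def leaf_hashes(blocks: list) -> list:
    """SHA-1 digest of each block; these digests become the tree's leaves."""
    return [hashlib.sha1(b).digest() for b in blocks]
```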
3. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, in step (6) of Section A, the key data structure Merkle Hash Tree is constructed. The scheme stipulates that the hash values of all data blocks of the data file serve as the leaf nodes of the Merkle Hash Tree. Unlike earlier work, which constructs the file Merkle Hash Tree by arranging the data blocks only according to their order information, the scheme consults the LTag label information of all data blocks and constructs the node Merkle Hash Tree, the rack Merkle Hash Tree and the file Merkle Hash Tree of the data file in turn, in the order of node first, then rack, then file, and computes their root node values.
4. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, in step 1) of (4) in Section B, in the initial stage of verification the TPA generates the verification message checkM(UInfo, FInfo, RInfo(FRoot, RRoot[i], NRoot[i,j])), and the scheme allows the TPA to select any one, two or three items of the Merkle Hash Tree root node information for audit.
5. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, for the data insertion operation of Section C1, the scheme defines insertion as inserting a new data block after a particular data block of the data file, and stipulates that the newly inserted data block is stored in the same server node as the particular data block preceding it; the insertion operation in the scheme therefore does not change the logical structure of the Merkle Hash Tree of a single server node.
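The LTag-guided, three-level construction of claim 3 can be sketched as below. The encoding of LTag as a (rack, node) tuple, the dict-based grouping, and the pairwise folding rule are assumptions made for this example, not the patented algorithm.

```python
# Hypothetical sketch of claim 3: each leaf hash carries an LTag giving its
# (rack, node) location; a node Merkle Hash Tree is built per server node,
# the node roots NRoot feed the rack trees RRoot, and the rack roots feed
# the file tree root FRoot.
import hashlib
from collections import defaultdict

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(leaf_hashes: list) -> bytes:
    """Pairwise-hash a list of leaves up to a single root value."""
    if not leaf_hashes:
        return sha1(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha1(a + b) for a, b in zip(level[0::2], level[1::2])]
    return level[0]

def build_roots(leaves_with_ltag: list):
    """leaves_with_ltag: [(leaf_hash, (rack, node)), ...] -> (NRoot, RRoot, FRoot)."""
    per_node = defaultdict(list)
    for leaf, (rack, node) in leaves_with_ltag:
        per_node[(rack, node)].append(leaf)
    nroot = {loc: merkle_root(hs) for loc, hs in per_node.items()}    # node trees
    per_rack = defaultdict(list)
    for (rack, node), root in sorted(nroot.items()):
        per_rack[rack].append(root)
    rroot = {rack: merkle_root(rs) for rack, rs in per_rack.items()}  # rack trees
    froot = merkle_root([r for _, r in sorted(rroot.items())])        # file tree
    return nroot, rroot, froot
```

This ordering (node, then rack, then file) mirrors the construction order stated in claim 3, so an audit can target any of the three levels, as claim 4 permits.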
6. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, for the data modification operation of Section C2, the scheme defines the basic modification operation as replacing the data block to be modified with a new data block, and stipulates that the modified data block is stored on the same node as before the modification; the modification operation in the scheme therefore does not change the logical structure of the Merkle Hash Tree of a single server node.
7. The hash-tree-based data dynamic operation verifiability method according to claim 1, characterized in that, for the data deletion operation of Section C3, the scheme defines deletion as removing the data block that follows a particular data block of the data file, and stipulates that if the deleted data block is the only data block on its node, step (2) of C3 operates directly on the rack Merkle Hash Tree, steps (3) and (4) are skipped, and execution continues directly from step (5).
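The dynamic operations of Section C (and claims 5 through 7) amount to editing the leaf list and re-deriving the affected root values. A simplified single-tree sketch follows, assuming one flat leaf list in place of the full node/rack/file hierarchy; the helper names are invented for the example.

```python
# Simplified sketch of the Section C dynamic operations on a single Merkle
# Hash Tree. Each operation edits the leaf hashes and recomputes the root,
# which the CDC would then report to the TPA in an UpdateRequest message.
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def merkle_root(leaf_hashes: list) -> bytes:
    """Pairwise-hash a list of leaves up to a single root value."""
    if not leaf_hashes:
        return sha1(b"")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha1(a + b) for a, b in zip(level[0::2], level[1::2])]
    return level[0]

def insert_after(leaves: list, i: int, block: bytes) -> bytes:
    """C1: insert the new block's hash right after position i, return new root."""
    leaves.insert(i + 1, sha1(block))
    return merkle_root(leaves)

def update(leaves: list, i: int, block: bytes) -> bytes:
    """C2: replace the hash at position i with the new block's hash."""
    leaves[i] = sha1(block)
    return merkle_root(leaves)

def delete_after(leaves: list, i: int) -> bytes:
    """C3: delete the block following position i, return new root."""
    del leaves[i + 1]
    return merkle_root(leaves)
```

In each case the returned root stands in for the FRoot value that the CDC sends to the TPA, which then refreshes its stored verification information.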
CN2013101325650A 2013-04-09 2013-04-09 Hash tree-based data dynamic operation verifiability method Pending CN103218574A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101325650A CN103218574A (en) 2013-04-09 2013-04-09 Hash tree-based data dynamic operation verifiability method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2013101325650A CN103218574A (en) 2013-04-09 2013-04-09 Hash tree-based data dynamic operation verifiability method

Publications (1)

Publication Number Publication Date
CN103218574A true CN103218574A (en) 2013-07-24

Family

ID=48816346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101325650A Pending CN103218574A (en) 2013-04-09 2013-04-09 Hash tree-based data dynamic operation verifiability method

Country Status (1)

Country Link
CN (1) CN103218574A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101997929A (en) * 2010-11-29 2011-03-30 北京卓微天成科技咨询有限公司 Data access method, device and system for cloud storage
US20130083926A1 (en) * 2011-09-30 2013-04-04 Los Alamos National Security, Llc Quantum key management
CN102546755A (en) * 2011-12-12 2012-07-04 华中科技大学 Data storage method of cloud storage system

Non-Patent Citations (3)

Title
SHUAI HAN ET AL: "Ensuring data storage security through a novel third party auditor scheme in cloud computing", 2011 IEEE International Conference on Cloud Computing and Intelligence Systems *
HAN Shuai: "Research on Key Technologies of Data Security Based on Cloud Computing", China Master's Theses Full-text Database *
YAN Xiangtao et al.: "Hash-Tree-Based Cloud Storage Integrity Detection Algorithm", Computer Science *

Cited By (45)

Publication number Priority date Publication date Assignee Title
CN104424404A (en) * 2013-09-07 2015-03-18 镇江金软计算机科技有限责任公司 Implementation method for realizing third-party escrow system through authorization management
WO2015131394A1 (en) * 2014-03-07 2015-09-11 Nokia Technologies Oy Method and apparatus for verifying processed data
US9589153B2 (en) 2014-08-15 2017-03-07 International Business Machines Corporation Securing integrity and consistency of a cloud storage service with efficient client operations
US10496313B2 (en) 2014-09-22 2019-12-03 Hewlett Packard Enterprise Development Lp Identification of content-defined chunk boundaries
CN104899525A (en) * 2015-06-12 2015-09-09 电子科技大学 Cloud data integrity proving scheme with improved dynamic operations
CN106372075B (en) * 2015-07-21 2020-01-10 杭州华为数字技术有限公司 Data comparison and processing method and device based on tree structure
CN106372075A (en) * 2015-07-21 2017-02-01 杭州华为数字技术有限公司 Tree structure based data comparison and processing methods and apparatuses
US10303887B2 (en) 2015-09-14 2019-05-28 T0.Com, Inc. Data verification methods and systems using a hash tree, such as a time-centric merkle hash tree
US10831902B2 (en) 2015-09-14 2020-11-10 tZERO Group, Inc. Data verification methods and systems using a hash tree, such as a time-centric Merkle hash tree
WO2017063323A1 (en) * 2015-10-13 2017-04-20 北京大学 Method for implementing cloud locking of sensitive data based on reconstruction technology
CN105303122B (en) * 2015-10-13 2018-02-09 北京大学 The method that the locking of sensitive data high in the clouds is realized based on reconfiguration technique
CN105303122A (en) * 2015-10-13 2016-02-03 北京大学 Method for realizing cloud locking of sensitive data on basis of reconstruction technology
CN105787389B (en) * 2016-03-02 2018-07-27 四川师范大学 Cloud file integrality public audit evidence generation method and public audit method
CN105787389A (en) * 2016-03-02 2016-07-20 四川师范大学 Cloud file integrity public audit evidence generating method and public auditing method
CN105812363A (en) * 2016-03-09 2016-07-27 成都爆米花信息技术有限公司 Data secure modification method for cloud storage space
CN106055597B (en) * 2016-05-24 2022-05-20 布比(北京)网络技术有限公司 Digital transaction system and account information query method used for same
CN106055597A (en) * 2016-05-24 2016-10-26 布比(北京)网络技术有限公司 Digital transaction system, and account information query method therefor
CN106230880A (en) * 2016-07-12 2016-12-14 何晓行 A kind of storage method of data and application server
CN106301789A (en) * 2016-08-16 2017-01-04 电子科技大学 Apply the dynamic verification method of the cloud storage data that linear homomorphism based on lattice signs
CN106301789B (en) * 2016-08-16 2019-07-09 电子科技大学 Using the dynamic verification method of the cloud storage data of the linear homomorphism signature based on lattice
CN106650503B (en) * 2016-12-09 2019-10-18 南京理工大学 IDA-based cloud data integrity verification and recovery method
CN106650503A (en) * 2016-12-09 2017-05-10 南京理工大学 Cloud side data integrity verification and restoration method based on IDA
CN107172071A (en) * 2017-06-19 2017-09-15 陕西师范大学 A kind of cloud Data Audit method and system based on attribute
CN107172071B (en) * 2017-06-19 2020-06-23 陕西师范大学 An attribute-based cloud data audit method and system
US11948182B2 (en) 2017-07-03 2024-04-02 Tzero Ip, Llc Decentralized trading system for fair ordering and matching of trades received at multiple network nodes and matched by multiple network nodes within decentralized trading system
US10937083B2 (en) 2017-07-03 2021-03-02 Medici Ventures, Inc. Decentralized trading system for fair ordering and matching of trades received at multiple network nodes and matched by multiple network nodes within decentralized trading system
CN111164934A (en) * 2017-08-07 2020-05-15 西门子股份公司 Pruning of certified trees
CN108270790A (en) * 2018-01-29 2018-07-10 佳木斯大学附属第一医院 A kind of radiotherapy information management system and management method
CN108270790B (en) * 2018-01-29 2020-07-10 佳木斯大学附属第一医院 Radiotherapy information management system and management method
CN109088850A (en) * 2018-06-22 2018-12-25 陕西师范大学 Batch cloud auditing method based on Lucas sequence positioning wrong file
CN109088850B (en) * 2018-06-22 2021-06-15 陕西师范大学 Batch cloud auditing method based on Lucas sequence to locate wrong files
CN108923932B (en) * 2018-07-10 2020-12-11 东北大学 A decentralized collaborative verification system and verification method
CN108923932A (en) * 2018-07-10 2018-11-30 东北大学 A kind of decentralization co-verification model and verification algorithm
CN109801066A (en) * 2018-12-13 2019-05-24 中国农业大学 The implementation method and device of long-range storage service
CN113632418A (en) * 2019-04-03 2021-11-09 特里布泰克解决方案有限公司 Device and method for integrity checking of sensor data streams
EP3999989A4 (en) * 2019-07-18 2023-03-29 Nokia Technologies Oy INTEGRITY CHECK FOR MULTIPLE STORAGE
WO2021007863A1 (en) 2019-07-18 2021-01-21 Nokia Technologies Oy Integrity auditing for multi-copy storage
US12141306B2 (en) 2019-07-18 2024-11-12 Nokia Technologies Oy Integrity auditing for multi-copy storage
CN114223233A (en) * 2019-08-13 2022-03-22 上海诺基亚贝尔股份有限公司 Data security for network slice management
US10992459B2 (en) 2019-08-30 2021-04-27 Advanced New Technologies Co., Ltd. Updating a state Merkle tree
CN110688377B (en) * 2019-08-30 2020-07-17 阿里巴巴集团控股有限公司 Method and device for updating state Merck tree
CN110688377A (en) * 2019-08-30 2020-01-14 阿里巴巴集团控股有限公司 Method and device for updating state Merck tree
CN111596862A (en) * 2020-05-20 2020-08-28 南京如般量子科技有限公司 Independent optimization method and system for block chain historical transaction data
CN111596862B (en) * 2020-05-20 2022-11-01 南京如般量子科技有限公司 Independent optimization method and system for block chain historical transaction data
CN111625258A (en) * 2020-05-22 2020-09-04 深圳前海微众银行股份有限公司 Mercker tree updating method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN103218574A (en) Hash tree-based data dynamic operation verifiability method
CN106528775B (en) Private block chain operation support system supporting logic multi-chain and working method thereof
CN106233259B (en) The method and system of more generation storing datas is retrieved in decentralized storage networks
CN110334053A (en) A data processing method based on blockchain
CN111831997A (en) A method for establishing trusted relationship between client and database
CN105243334B (en) A kind of data storage protection method and system
CN115208628B (en) Blockchain-based data integrity verification method
Duan et al. Towards practical auditing of dynamic data in decentralized storage
WO2021164194A1 (en) Reward point management method based on blockchain, and related apparatus
CN110958109A (en) A Lightweight Dynamic Data Integrity Audit Method Based on Hierarchical Merkle Hash Tree
CN110225012B (en) An Ownership Check and Update Method for Outsourced Data Based on Consortium Chain
Wang et al. Data Security Storage Model of the Internet of Things Based on Blockchain.
CN118690396B (en) Data storage method and system based on block chain
CN117574428A (en) Hidden query methods, devices, equipment and media for massive distributed storage
CN103618703B (en) A kind of cloud computing data security boundary protection method
Li et al. Or-edi: A per-edge one-round data integrity verification scheme for mobile edge computing
CN104881615B (en) A kind of efficient secret protection ciphertext connected reference operation demonstration method under cloud environment
CN112100415A (en) A Realization Method of Large Graph Database System with High Reliability Heterogeneous Platform
Miao et al. Blockchain-based electronic evidence storage and efficiency optimization
CN116484399A (en) Method and system for constructing ciphertext range search result completeness verification data structure
Huang et al. A privacy-preserving credit bank supervision framework based on redactable blockchain
Xu et al. Dynamic Proofs of Retrievability Based on Partitioning-Based Square Root Oblivious RAM.
Meng et al. Blockchain storage method based on erasure code
Chen et al. Adjacency‐Hash‐Table Based Public Auditing for Data Integrity in Mobile Cloud Computing
Yao et al. Research on multi cloud dynamic secure storage technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130724