CN104239575A

CN104239575A - Virtual machine mirror image file storage and distribution method and device

Info

Publication number: CN104239575A
Application number: CN201410524284.4A
Authority: CN
Inventors: 姜进磊; 武永卫; 杨广文; 赵勋; 何川
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2014-10-08
Filing date: 2014-10-08
Publication date: 2014-12-24

Abstract

The invention provides methods and devices for storing and distributing virtual machine image files. The storage method includes: dividing the image file into fixed-length image file blocks; and storing the image file blocks. Because the image file is divided into fixed-length blocks, a high compression ratio is ensured and the storage space occupied by virtual machine image files can be effectively saved. The distribution method includes: determining the image file blocks belonging to the virtual machine of the local client node; establishing connections with other client nodes based on a P2P protocol; when image file blocks required by the virtual machine of the local client node exist on other client nodes, obtaining those blocks from the other client nodes, and otherwise obtaining the required blocks from the data server; or, when image file blocks required by the virtual machines of other client nodes exist on the local client node, sending those blocks to the other client nodes. Using the P2P protocol to distribute image files among virtual machine clients effectively improves the efficiency and speed of virtual machine image file distribution.

Description

Method and device for storing and distributing virtual machine image files

Technical field

The invention relates to the technical field of data storage, and in particular to a method and device for storing and distributing virtual machine image files.

Background

"Cloud computing" and "cloud storage" are important technical concepts that have emerged in recent years and have had a lasting impact on the development of computer science and the Internet. Cloud computing and cloud storage hide the complex underlying infrastructure and management logic, and provide users with a simple, easy-to-use, dynamically adjustable resource pool. Users do not need expert knowledge of the cloud's internals, nor direct control over its underlying infrastructure, to use the computing and storage resources that cloud technology provides, and they can change their resource usage according to their own needs in order to save costs.

As major companies have built cloud data centers, the number of virtual machines in cloud environments has grown rapidly. The sharing of hardware resources among virtual machines and the increasing network communication load have degraded the performance of cloud computing services as a whole. How to raise the quality of service of a cloud system by improving service performance in the virtualized environment has therefore become an important topic in virtual machine research. A running virtual machine depends on its virtual machine image file, so in a data center hosting a large-scale virtual cluster, the design of the storage scheme for the massive number of virtual machine image files is particularly important.

When a new virtual machine is created in a cloud computing environment, the disk image it uses is usually copied from a template image. However, traditional SAN (Storage Area Network and SAN protocols) storage devices do not support fault tolerance: if the SAN device fails, the entire system can no longer work.

It can be seen that existing storage and distribution of virtual machine image files has the following problems:

1. Storing the huge number of virtual machine image files occupies an enormous amount of storage space;

2. Distribution of virtual machine image files relies solely on the image file data server, so the image file server becomes a network hotspot, which limits the speed and robustness with which virtual machines are created and run.

Summary of the invention

In view of the above problems, the present invention proposes a method and device for storing and distributing virtual machine image files, in order to solve the problems that mass storage of virtual machine image files wastes storage space and that their distribution is inefficient.

The present invention provides a virtual machine image file storage method, comprising the following steps:

dividing the image file into fixed-length image file blocks;

storing the image file blocks.

The present invention provides a virtual machine image file distribution method, comprising the following steps:

determining the image file blocks that belong to the virtual machine of the local client node;

establishing connections with other client nodes based on a P2P (Peer to Peer) protocol;

when it is determined that an image file block required by the virtual machine of the local client node exists on another client node, obtaining the image file block from that client node, and otherwise obtaining the required image file block from the data server; or, when it is determined that an image file block required by the virtual machine of another client node exists on the local client node, sending the image file block to that client node, wherein the image file blocks are obtained by dividing the virtual machine image file into fixed-length pieces.
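The block lookup order described in the distribution method can be illustrated with a minimal Python sketch. It is not the claimed implementation: peers and the data server are modeled as plain dictionaries mapping a block identifier to block data, and the names `fetch_block` and `serve_block` are hypothetical. A real system would perform these lookups over P2P network connections.

```python
from typing import Mapping, Optional, Sequence, Tuple


def fetch_block(
    block_id: str,
    peers: Sequence[Mapping[str, bytes]],
    server: Mapping[str, bytes],
) -> Tuple[bytes, str]:
    """Return (block data, source), preferring peers over the data server.

    Each peer and the server are modeled as in-memory dicts (an assumption
    of this sketch standing in for network requests).
    """
    for peer in peers:
        if block_id in peer:
            return peer[block_id], "peer"
    # No peer holds the block: fall back to the data server.
    return server[block_id], "server"


def serve_block(local: Mapping[str, bytes], block_id: str) -> Optional[bytes]:
    """Return a locally held block for another node, or None if absent."""
    return local.get(block_id)
```

In this sketch the data server is contacted only when no connected peer holds the block, which is how the transfer load shifts from the server to the client nodes.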

The present invention provides a virtual machine image file storage device, comprising:

a dividing unit, configured to divide the image file into fixed-length image file blocks;

a storage unit, configured to store the image file blocks.

The present invention provides a virtual machine image file distribution device, comprising:

a determination unit, configured to determine the image file blocks that belong to the virtual machine of the local client node;

a connection unit, configured to establish connections with other client nodes based on a peer-to-peer (P2P) protocol;

a distribution unit, configured to obtain an image file block from another client node when it is determined that an image file block required by the virtual machine of the local client node exists on that client node, and otherwise to obtain the required image file block from the data server; or, when it is determined that an image file block required by the virtual machine of another client node exists on the local client node, to send the image file block to that client node, wherein the image file blocks are obtained by dividing the virtual machine image file into fixed-length pieces.

The beneficial effects of the present invention are as follows:

In current virtual machine image file storage, the large number of image files means that their storage occupies a huge amount of space. In the storage scheme provided by the embodiments of the present invention, the virtual machine image file is divided into fixed-length image file blocks for storage. Because fixed-length division ensures a high compression ratio, the storage space occupied by virtual machine image files can be effectively saved.

In addition, existing virtual machine image file distribution relies solely on the image file server, which limits the speed and robustness with which virtual machines are created and run. To address this, the distribution scheme provided by the embodiments of the present invention uses a P2P protocol to clone and distribute image files quickly among virtual machine clients. With P2P distribution, every image file server and client node can act as a provider of image file blocks and exchange the blocks it stores. This effectively shifts the data transfer load of the image file server onto the client nodes, balances the transfer load, avoids network hotspots, shortens the waiting time and IO overhead of copying image files, and improves the efficiency and speed of virtual machine image file distribution.

Description of the drawings

Specific embodiments of the present invention are described below with reference to the accompanying drawings, in which:

FIG. 1 is a schematic flow diagram of the virtual machine image file storage method in an embodiment of the present invention;

FIG. 2 is a schematic diagram of the deduplication compression ratio achieved by the system at different image file block sizes in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the IO performance of the system at different image file block sizes in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the IO performance test results of the image file block storage scheme in an embodiment of the present invention;

FIG. 5 is a schematic diagram comparing the performance of the MD5 and SHA-1 algorithms for fingerprint calculation in an embodiment of the present invention;

FIG. 6 is a schematic diagram of image file block file storage in an embodiment of the present invention;

FIG. 7 is a schematic flow diagram of the virtual machine image file distribution method in an embodiment of the present invention;

FIG. 8 is a schematic diagram of an application scenario of the virtual machine image file distribution method in an embodiment of the present invention;

FIG. 9 is a schematic diagram of the virtual machine image file storage device in an embodiment of the present invention;

FIG. 10 is a schematic diagram of the virtual machine image file distribution device in an embodiment of the present invention;

FIG. 11 is a schematic diagram of the Bonnie++ system performance test results in an embodiment of the present invention;

FIG. 12 is a schematic diagram of the PostMark system performance test results in an embodiment of the present invention;

FIG. 13 is a schematic diagram of the Linux virtual machine performance test results in an embodiment of the present invention;

FIG. 14 is a schematic diagram comparing the time taken to transfer a virtual machine image file in an embodiment of the present invention;

FIG. 15 is a schematic diagram comparing virtual machine startup speed under on-demand transfer in an embodiment of the present invention.

Detailed description

To make the technical solutions and advantages of the present invention clearer, exemplary embodiments of the present invention are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not an exhaustive list of all embodiments.

FIG. 1 is a schematic flow diagram of the virtual machine image file storage method in an embodiment of the present invention. As shown in FIG. 1, storing a virtual machine image file may include the following steps:

Step 101: dividing the image file into fixed-length image file blocks;

Step 102: storing the image file blocks.

In implementation, the fixed-length image file block size may be N times 4KB, where N is a natural number.

Specifically, a fixed-length block division scheme is adopted in which the image file block size is 4KB or an integer multiple of 4KB, matching the basic data block size that the file system stores on disk. This ensures that all data written to disk can be aligned on disk cluster boundaries. Because operating system images and application images are both read-only and cannot be modified once written to the image file, fixed-length division can ensure a high compression ratio.
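As an illustration of Step 101, the following minimal Python sketch divides an image held in memory into blocks of N × 4KB. The function name `split_image` and the zero-padding of the final block are assumptions of this sketch; the text does not state how a trailing partial block is handled.

```python
CLUSTER_SIZE = 4 * 1024  # 4KB disk cluster size stated in the text


def split_image(data: bytes, n: int = 64) -> list[bytes]:
    """Split image bytes into fixed-length blocks of n * 4KB.

    The last block is zero-padded so every block has the same length
    (padding is an assumption of this sketch, not stated in the source).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    block_size = n * CLUSTER_SIZE
    blocks = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks.append(block.ljust(block_size, b"\x00"))
    return blocks
```

The default `n = 64` gives 256KB blocks. Because every block length is a multiple of 4KB, blocks written back to back stay aligned to 4KB cluster boundaries.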

The size of the image file block directly affects both the IO performance of the system and the storage compression ratio. An optimal block size parameter therefore needs to be chosen to balance IO performance against the storage compression ratio.

FIG. 2 is a schematic diagram of the deduplication compression ratio achieved by the system at different image file block sizes in an embodiment of the present invention. As shown in FIG. 2, the results were obtained by deduplicating 183 commonly used virtual machine image files with a total size of 2.31TB. These image files include multiple versions of the Windows and Linux operating systems, and the application software covers various combinations of scientific computing software, bioinformatics analysis software, database management systems, office systems, network service systems, integrated development environments and so on, which covers the vast majority of virtual machine application requirements. The disk cluster size used by all the virtual machine file systems is 4KB, so deduplication blocks smaller than 4KB do not noticeably improve the deduplication compression ratio. As the image file block size increases, the overall deduplication compression ratio of the system drops rapidly.
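The dependence of the deduplication compression ratio on block size can be reproduced on toy data with the sketch below. The definition used here, total bytes divided by unique bytes after deduplication, is an assumption, since the text does not give a formula. The function name `dedup_ratio` is likewise hypothetical.

```python
import hashlib


def dedup_ratio(images: list[bytes], block_size: int) -> float:
    """Return total bytes / unique bytes after fixed-length deduplication.

    A higher value means more redundancy was removed. The ratio definition
    is an assumption of this sketch.
    """
    unique = {}  # MD5 digest -> length of the block with that digest
    total = 0
    for image in images:
        for offset in range(0, len(image), block_size):
            block = image[offset:offset + block_size]
            total += len(block)
            unique[hashlib.md5(block).digest()] = len(block)
    unique_bytes = sum(unique.values())
    return total / unique_bytes if unique_bytes else 1.0
```

For two toy images that share 4KB regions, 4KB blocks find the shared data while 8KB blocks do not, which mirrors the trend described for FIG. 2.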

FIG. 3 is a schematic diagram of the IO performance of the system at different image file block sizes in an embodiment of the present invention. As shown in FIG. 3, the experimental data come from Liquid, the virtual machine image file storage system obtained by putting the storage method of the embodiments of the present invention into practice; it is named "Liquid" for convenience. The series labeled Raw in FIG. 3 is the performance of read and write operations on the local image file block store. Because the blocks read and written have been deduplicated, they are all distinct blocks, and their volume is smaller than the data actually written to the images. The series labeled as deduplicated takes the deduplication compression ratio into account, and therefore indicates the actual IO performance that the client file system provides to virtual machines. Smaller blocks cause more random disk reads and writes and thus degrade the IO performance of the system. As the image file block size grows, IO performance gradually improves, and it levels off once the block size reaches 256KB.

The experimental results in the embodiments of the present invention, together with practical experience, show that an image file block size in the range of 256KB to 1MB achieves both good system IO performance and a good deduplication compression ratio, strikes a balance between the two, and meets the vast majority of application requirements.

In implementation, the fixed-length image file block size may be 256KB.

In a specific implementation, the block size can be customized, and compressed storage with blocks of different sizes can be realized. Generally speaking, larger blocks mean less repetition among image blocks and therefore a lower compression ratio, while smaller blocks increase the amount of metadata and therefore reduce system IO performance. Testing shows that a suitable block size is 256KB, which balances IO performance against the storage compression ratio.

FIG. 4 is a schematic diagram of the IO performance test results of the image file block storage scheme in an embodiment of the present invention. As shown in FIG. 4, the virtual machine image file storage scheme of the embodiment is compared against an ordinary Ext4 (the fourth extended file system) file system design. The experimental environment is one server with a 32-core, 2GHz Intel(R) Xeon CPU E5-2640 processor and 32GB of memory, on which the virtual machine image file storage system Liquid provided in the embodiments of the present invention is installed. With 256KB as the default image file block size, image files of 1GB to 160GB are written and read and write performance is measured. The test results show that the optimized storage scheme provided in the embodiments of the present invention outperforms the Ext4 file system. The two differ little in image read performance, but the difference in write performance is clear: with Ext4, the amount of metadata grows significantly as the volume of stored image files expands, so its write performance drops sharply once more than 10GB is stored. The write performance of the storage scheme provided in the embodiments of the present invention, by contrast, remains relatively stable. As the volume of image files grows, its write speed stays above 40MB/s and its read performance stays at about 30MB/s, so overall it supports massive image file storage better than the Ext4-based scheme.

In implementation, image file blocks may be stored aligned on disk cluster boundaries.

In a specific implementation, the fixed-length image file blocks may be 4KB or an integer multiple of 4KB in size, and may be stored on disk aligned on disk cluster boundaries so that disk space is used effectively.

In implementation, the stored image file blocks may be deduplicated.

In a specific implementation, deduplication of the stored image file blocks may use either variable-length or fixed-length division of the image file. Variable-length division deduplicates more effectively, but it is complex to implement and computationally expensive. In the embodiments of the present invention the image file blocks are rarely modified, so fixed-length division is easy to implement and still achieves a good compression effect.

In implementation, an MD5 (Message Digest Algorithm 5) fingerprint may be calculated for each image file block to determine the fingerprint of the block; the stored image file blocks are deduplicated according to their fingerprints; and the fingerprint of each image file block is stored along with the block.

In a specific implementation, deduplication during the storage of virtual machine image files requires detecting whether an image file contains duplicate image file blocks. The MD5 message digest algorithm can be used to calculate the fingerprints of image file blocks, and blocks with the same fingerprint are treated as duplicate, redundant blocks. To select the fingerprint algorithm, the embodiments of the present invention tested and compared the performance of the MD5 algorithm and the SHA-1 (Secure Hash Algorithm) algorithm. Both were used to calculate the fingerprint of each fixed-length image file block and to find image blocks containing the same data for deduplicated storage. The tests ran with 1 to 16 concurrent threads and measured the difference in throughput between the two algorithms at the same thread count. FIG. 5 is a schematic diagram comparing the performance of the MD5 and SHA-1 algorithms for fingerprint calculation in an embodiment of the present invention. As shown in FIG. 5, MD5 has clearly higher computing throughput than SHA-1, and up to a point its speed advantage becomes more pronounced as the number of threads increases. SHA-1, by contrast, benefits little from additional CPU resources, and its throughput remains low even with multiple threads. MD5 is therefore the better algorithm for calculating fingerprints here. Of course, other algorithms that achieve the same function may be used instead of MD5.
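Fingerprint-based deduplication as described above can be sketched in Python with the standard `hashlib` module. The class `DedupStore` and the helper `store_image` are hypothetical names, and the store is an in-memory dictionary rather than the on-disk layout of the embodiment.

```python
import hashlib


class DedupStore:
    """In-memory block store keyed by MD5 fingerprint (illustrative only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # fingerprint -> block data

    def put(self, block: bytes) -> str:
        """Store a block once per distinct fingerprint; return the fingerprint."""
        fingerprint = hashlib.md5(block).hexdigest()
        # A block whose fingerprint is already present is treated as a duplicate.
        self.blocks.setdefault(fingerprint, block)
        return fingerprint

    def get(self, fingerprint: str) -> bytes:
        return self.blocks[fingerprint]


def store_image(store: DedupStore, blocks: list[bytes]) -> list[str]:
    """Store all blocks of an image; return its ordered list of fingerprints."""
    return [store.put(block) for block in blocks]
```

The returned fingerprint list is enough to rebuild the image by looking up each block in order, and duplicate blocks take up space only once.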

In a specific implementation, after the image file of an application or operating system has been stored, dividing the image file into blocks and calculating MD5 fingerprints while the image file is running may include:

storing image file blocks that need to be modified in different virtual machines in a first-level cache, and storing recently read read-only image file blocks in a second-level cache;

calculating MD5 fingerprints for the image file blocks in the second-level cache.

In a specific implementation, a two-level cache can be maintained in memory. Image file blocks that need to be modified in different virtual machines are stored in the first-level cache, which mainly holds the blocks that each virtual machine client modifies frequently while running the image file. The first-level cache can therefore be called a private cache, and no fingerprints are calculated for the blocks in it. The second-level cache mainly holds recently read read-only data blocks, and fingerprints are calculated for the image file blocks in it. In other words, a fingerprint needs to be calculated for a block in the first-level cache only when it is flushed into the second-level cache. Using a two-level cache effectively reduces the amount of fingerprint calculation: blocks in the first-level cache are modified for a specific virtual machine and cannot be used by other virtual machines, so their fingerprints need not be calculated. This ensures that frequent modifications to image file blocks complete quickly and greatly improves the operating efficiency of the system.
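The deferred fingerprint calculation of the two-level cache can be sketched as follows. The class `TwoLevelCache`, its method names, and the `hash_calls` counter are hypothetical and exist only to make the saving visible; eviction from the second-level cache is omitted here.

```python
import hashlib


class TwoLevelCache:
    """Private (level 1) and shared (level 2) block caches, illustrative only."""

    def __init__(self) -> None:
        # Level 1: per-VM modified blocks, keyed by (vm_id, block_no); no fingerprints.
        self.private: dict[tuple[str, int], bytes] = {}
        # Level 2: read-only blocks keyed by MD5 fingerprint.
        self.shared: dict[str, bytes] = {}
        self.hash_calls = 0  # counts fingerprint calculations (for illustration)

    def write(self, vm_id: str, block_no: int, data: bytes) -> None:
        """Modify a block in the private cache without computing a fingerprint."""
        self.private[(vm_id, block_no)] = data

    def flush(self, vm_id: str, block_no: int) -> str:
        """Move a block from level 1 to level 2, fingerprinting it exactly once."""
        data = self.private.pop((vm_id, block_no))
        self.hash_calls += 1
        fingerprint = hashlib.md5(data).hexdigest()
        self.shared.setdefault(fingerprint, data)
        return fingerprint
```

However many times a block is rewritten in the private cache, the sketch computes only one MD5 for it, at flush time.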

In implementation, the image file blocks kept in the second-level cache may be determined according to the LRU (Least Recently Used) algorithm.

In a specific implementation, the LRU algorithm can be used for cache management to speed up virtual machine access. LRU, the least recently used page replacement algorithm, was designed for virtual paged memory management. In operating system memory management, how to use a limited amount of memory economically to serve as many processes as possible has long been an important research direction. Virtual memory management is now the common approach: when memory is limited, part of external storage is used as virtual memory, and real memory holds only the information needed at the current point of execution. This greatly extends what the memory can do and greatly improves the concurrent processing capability of the computer. Virtual paged memory management divides the space required by a process into multiple pages, keeps only the currently required pages in memory, and places the remaining pages in external storage.

Virtual paged memory management reduces the memory a process requires, but it has the drawback of longer running time: while a process runs, some information held in external storage inevitably has to be swapped with information already in memory, and because external storage is slow, the time this takes is not negligible. A good replacement algorithm that reduces the number of reads from external storage can therefore further improve results.
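A minimal LRU cache, of the kind that could manage the second-level cache, can be written with `collections.OrderedDict`. The class name `LRUCache` and the capacity-based eviction policy shown here are assumptions of this sketch; the text does not specify the cache capacity or interface.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[bytes]:
        """Return the cached value and mark it most recently used, else None."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: Hashable, value: bytes) -> None:
        """Insert or update an entry, evicting the oldest one when full."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```

With a capacity of two, reading block `a` before inserting a third block causes `b`, not `a`, to be evicted.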

实施中，可以将镜像文件块合并为大数据块BLK(Block，数据块)文件后进行存储。During implementation, the image file blocks may be merged into a large data block BLK (Block, data block) file for storage.

实施中，可以获取BLK文件的大小、空闲位置以及镜像文件块的指纹特征所对应的位置偏移量；根据BLK文件的大小、空闲位置以及镜像文件块的指纹特征所对应的位置偏移量，存储镜像文件块。During implementation, the position offset corresponding to the size of the BLK file, the free position and the fingerprint feature of the mirror image file block can be obtained; according to the size of the BLK file, the free position and the position offset corresponding to the fingerprint feature of the mirror image file block, Store image file blocks.

实施中，可以采用位图文件bitmap记录BLK文件已使用和/或未使用的位置。During implementation, a bitmap file bitmap may be used to record the used and/or unused positions of the BLK file.

具体实施中，图4为本发明实施例中镜像文件块存储方案IO性能测试结果示意图，图6为本发明实施例中镜像文件块文件存储示意图，通过对海量镜像文件块文件进行合并管理，减少系统中海量的元数据操作，提高系统IO性能，具体如图6和图5所示，传统的虚拟机镜像文件存储方法是将所有的镜像文件块作为单独的存储文件放置在指定的目录下，这样该目录下将产生大量的镜像文件块文件，同时伴随着大量的文件元数据管理，降低系统性能。本发明实施中可以借鉴Haystack的原理来实现虚拟机镜像文件存储，即，将多个镜像文件块进行合并，将其存在一个的大数据块BLK文件中。In the specific implementation, Fig. 4 is a schematic diagram of the IO performance test results of the mirrored file block storage scheme in the embodiment of the present invention, and Fig. 6 is a schematic diagram of the mirrored file block file storage in the embodiment of the present invention, by merging and managing a large number of mirrored file block files, reducing Massive metadata operations in the system improve system IO performance. As shown in Figure 6 and Figure 5, the traditional virtual machine image file storage method is to place all image file blocks as separate storage files in a specified directory. In this way, a large number of image file block files will be generated in this directory, accompanied by a large amount of file metadata management, reducing system performance. In the implementation of the present invention, the principle of Haystack can be used for reference to realize virtual machine image file storage, that is, multiple image file blocks are merged and stored in one large data block BLK file.

The upper layer of the file system needs to manage the metadata of the BLK file, including its size, its free positions, and the offset corresponding to each fingerprint. A BMP (bitmap) file may be used to record the bitmap of used and free slots in the BLK file, and an IDX (index) file may be used to record the offset within the BLK file of the image file block corresponding to each fingerprint. All three files take the first byte of the fingerprint as their file name prefix. In detail:

BMP file: consulted whenever a new image file block is written or data is deleted. It records, one bit per slot, whether the data space at the corresponding offset in the BLK file is in use. Before an image file block is written, the BMP file is checked for free space; when an image file block is deleted, its bit is set to 0.

IDX file: records the metadata of the corresponding BLK file and is mainly used to find the offset in the BLK file of the image file block corresponding to a fingerprint. Every read of an image file block first looks up the offset in the IDX file and then reads the block from that offset in the BLK file. To speed up reads, the contents of the IDX file may be cached in memory.

BLK file: holds the image file block data corresponding to the fingerprints. The BLK file is essentially an array of image file blocks in which every element has the fixed block length, which in this embodiment is 4KB or an integer multiple of 4KB. The position of an image file block is determined by the position recorded for its fingerprint in the IDX file.

As shown in Fig. 6, these three files allow the contents of a large number of image file blocks to be merged into a single file. When a virtual machine user writes new image block data, a free position BLK_ID is first obtained from the BMP file and the bit for that BLK_ID is set to 1. The mapping from the block's fingerprint to BLK_ID is then written to the IDX file, and the offset corresponding to BLK_ID in the BLK file is handed to the user for the write. When the image block already exists in Liquid, it is enough to read the corresponding BLK_ID from the IDX file and then read the data from the BLK file.
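The write, read and delete paths described above can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the class name `BlockStore`, the 4KB slot size and the use of a Python list and dict as stand-ins for the BMP and IDX files are all assumptions, and the sharding of files by the first fingerprint byte is omitted.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed slot length; the text allows 4KB or a multiple of it


class BlockStore:
    """In-memory stand-ins for the BLK, BMP and IDX files (hypothetical names)."""

    def __init__(self, slots: int):
        self.blk = bytearray(slots * BLOCK_SIZE)  # BLK: array of fixed-length slots
        self.bmp = [0] * slots                    # BMP: 1 = slot used, 0 = slot free
        self.idx = {}                             # IDX: fingerprint -> BLK_ID

    def write(self, data: bytes) -> str:
        if len(data) != BLOCK_SIZE:
            raise ValueError("blocks must have the fixed block length")
        fp = hashlib.md5(data).hexdigest()
        if fp in self.idx:                        # block already stored, nothing to write
            return fp
        blk_id = self.bmp.index(0)                # first free slot (ValueError if full)
        self.bmp[blk_id] = 1                      # mark the slot as used
        self.idx[fp] = blk_id                     # record fingerprint -> BLK_ID
        offset = blk_id * BLOCK_SIZE
        self.blk[offset:offset + BLOCK_SIZE] = data
        return fp

    def read(self, fp: str) -> bytes:
        offset = self.idx[fp] * BLOCK_SIZE        # look up the offset, then read BLK
        return bytes(self.blk[offset:offset + BLOCK_SIZE])

    def delete(self, fp: str) -> None:
        self.bmp[self.idx.pop(fp)] = 0            # clear the bit so the slot can be reused
```

Writing the same content twice returns the same fingerprint and consumes only one slot, which is how the merged file also supports de-duplication.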

Having addressed the storage of virtual machine image files, an embodiment of the present invention turns to the problem that existing image file distribution relies solely on the image file server, which limits the speed of creating and running virtual machines as well as robustness, and provides a virtual machine image file distribution method. Fig. 7 is a schematic flowchart of the distribution method in an embodiment of the present invention. As shown in Fig. 7, the method may include the following steps:

Step 701: determine the image file blocks that belong to the virtual machine of this client node;

Step 702: establish connections with other client nodes based on the P2P protocol;

Step 703: when it is determined that another client node holds an image file block required by the virtual machine of this client node, obtain the block from that client node, and otherwise obtain the required block from the data server; or, when it is determined that the virtual machine of this client node holds an image file block required by the virtual machine of another client node, send the block to that client node. The image file blocks are obtained by dividing the virtual machine image file into fixed-length pieces.

Steps 701 and 702 have no strict ordering requirement. Those skilled in the art will understand that step 702 may be performed either after or before step 701; the present invention does not limit the order in which these steps are executed.

In a specific implementation, Fig. 8 is a schematic diagram of an application scenario of the virtual machine image file distribution method in an embodiment of the present invention. As shown in Fig. 8, the P2P-based data distribution mode transfers the data transmission load of the image server to the client nodes. Every image file server node and every client node in the system can act as a provider of image file blocks, and the nodes distribute information by exchanging the fingerprint sets of the data blocks they store. During implementation, each node (image file server nodes and client nodes alike) may send its load information to the nodes connected to it in heartbeat packets, which balances the load distribution in the system and helps avoid network hotspots and bottlenecks.

In a specific implementation, a client node preferentially obtains image file blocks from the client nodes connected to it. Only when none of those client nodes holds the fingerprint of the required image file block does the client node issue a read request to the image file server. This shifts the data transmission load of the image file server to the client nodes, balances the transmission load, and avoids network hotspots.

During implementation, the image file blocks obtained from other clients or from the data server may further be stored in a cache.

During implementation, when image file blocks obtained from other clients or from the data server are stored in the cache, the blocks kept in the cache may be determined by the least recently used (LRU) algorithm.
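A cache of fetched blocks governed by LRU could look like the sketch below. The class name `BlockCache` and the choice to count capacity in blocks rather than bytes are assumptions made for illustration.

```python
from collections import OrderedDict


class BlockCache:
    """LRU cache of image file blocks, keyed by fingerprint (hypothetical helper)."""

    def __init__(self, capacity: int):
        self.capacity = capacity          # maximum number of cached blocks
        self.blocks = OrderedDict()       # fingerprint -> block bytes, oldest first

    def get(self, fp):
        if fp not in self.blocks:
            return None                   # cache miss: caller fetches from a peer or server
        self.blocks.move_to_end(fp)       # mark as most recently used
        return self.blocks[fp]

    def put(self, fp, data):
        self.blocks[fp] = data
        self.blocks.move_to_end(fp)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
```

Because fixed-length blocks all have the same size, counting blocks is equivalent to bounding the cache by bytes.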

During implementation, the MD5 fingerprint of each image file block may be computed. The fingerprints are then used to determine whether another client node holds an image file block required by the virtual machine of this client node, or whether the virtual machine of this client node holds an image file block required by the virtual machine of another client node.
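As a small illustration of block fingerprinting, the sketch below splits an image into fixed-length blocks and computes one MD5 digest per block. The function name `fingerprints` is hypothetical; the default 256KB block length is one of the sizes mentioned for the splitting unit.

```python
import hashlib


def fingerprints(image: bytes, block_size: int = 256 * 1024) -> list:
    """Return the MD5 fingerprint (hex) of every fixed-length block of the image."""
    return [
        hashlib.md5(image[i:i + block_size]).hexdigest()
        for i in range(0, len(image), block_size)
    ]
```

Blocks with identical content produce identical fingerprints, so comparing fingerprint sets is enough to decide which blocks a peer can supply.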

In a specific implementation, after the image file of an application or operating system has been obtained through the virtual machine image file distribution method, running the image file involves dividing it into blocks and computing MD5 fingerprints, which may include:

During implementation, the image file blocks kept in the second-level cache may be determined by the LRU algorithm.

During implementation, determining from the fingerprints whether another client node holds an image file block required by the virtual machine of this client node may include: sending to the other client nodes the set of fingerprints of the image file blocks required by the virtual machine of this client node, so that the other client nodes can use the set to determine whether they hold the required blocks. Alternatively, determining from the fingerprints whether the virtual machine of this client node holds an image file block required by the virtual machine of another client node may include: receiving from another client node the set of fingerprints of the image file blocks required by that node's virtual machine, and determining from the set whether the required blocks are present.

In a specific implementation, each client node in the system may query the metadata server node for the locations of other nodes and maintain connections with those client nodes. It periodically fetches the latest set of image file block fingerprints from them so that the information it keeps about them stays sufficiently accurate. When an image file block with a given fingerprint is needed, the client node first randomly selects several of the client nodes connected to it and checks whether they hold the fingerprint of the required block. If the fingerprint is found, the block is obtained from that client node.

In the P2P image file distribution protocol, the algorithm for finding a peer data node takes as input the fingerprint of the image file block to be read. If the fingerprint is found at a peer data node, the corresponding image file block is located and read from that node; otherwise, the peer data node does not hold the block to be read.

Client nodes are queried in random order. This effectively avoids hotspots in the system and spreads the load evenly across the client nodes.
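The lookup described here can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the listing from the original publication: peers and the server are modeled as plain dictionaries from fingerprint to block, and `fetch_block` and `sample_size` are hypothetical names.

```python
import random


def fetch_block(fp, peers, server, sample_size=3, rng=random):
    """Try a random sample of connected peers first, then fall back to the server.

    peers  -- list of dicts mapping fingerprint -> block bytes (assumed model)
    server -- dict mapping fingerprint -> block bytes, holding every block
    """
    # Sampling without replacement yields the chosen peers in random order,
    # which spreads requests instead of always hitting the same node.
    candidates = rng.sample(peers, min(sample_size, len(peers)))
    for peer in candidates:
        if fp in peer:
            return peer[fp], "peer"
    return server[fp], "server"
```

In a real deployment the membership test would run against each peer's advertised fingerprint set rather than against its data.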

During implementation, the set of image file block fingerprints may be compressed with a Bloom filter data structure before transmission.

In a specific implementation, the Bloom filter is a binary vector data structure proposed by Burton Howard Bloom in 1970. It is efficient in both space and time and is used to test whether an element is a member of a set. The test can wrongly report an element as present, but it never reports a present element as absent, so each query returns either "in the set (possibly wrong)" or "not in the set (definitely not)". The Bloom filter thus trades accuracy for time and space. The usual way to decide whether an element belongs to a set is to store all elements and compare against them; linked lists and trees follow this approach, and as the set grows the space and time they need grow with it and lookups become slower. A Bloom filter instead uses a hash function to map an element to one point in an array of length m: if the point is 1 the element is taken to be in the set, and otherwise it is not. The drawback is that collisions become likely when many elements are tested. The remedy is to use k hash functions that map to k points: if all k points are 1 the element is taken to be in the set, and if any point is 0 the element is not in the set.

A Bloom filter has the following parameters:

m: the width of the bit array (number of bits);

n: the number of keys added to the filter;

k: the number of hash functions used;

f: the false positive rate.

The false positive rate f of a Bloom filter satisfies:

$f = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}$

Given m and n, the value of k that minimizes f is:

$k = \frac{m}{n} \ln 2 \approx \frac{9m}{13n} \approx 0.7 \frac{m}{n}$

The resulting f is:

$f = \left(\frac{1}{2}\right)^{k} \approx 0.6185^{m/n}$

From these formulas, for any given f:

$n = m \ln(0.6185) / \ln(f)$

and k hash functions are needed to reach this target:

$k = -\ln(f) / \ln(2)$

Since k must be an integer, a Bloom filter implementation should also use the earlier formula to obtain the actual f:

$f = \left(1 - e^{-kn/m}\right)^{k}$

The three formulas above are the key formulas for implementing a Bloom filter in a program.
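The three formulas can be evaluated directly. The helper names below (`optimal_k`, `capacity`, `false_positive_rate`) are hypothetical, and the numbers in the usage note are only an example sizing, not values from the embodiment.

```python
import math


def optimal_k(m: int, n: int) -> int:
    """Integer k closest to (m/n) * ln 2, at least 1."""
    return max(1, round(m / n * math.log(2)))


def capacity(m: int, f: float) -> int:
    """Largest n for which an m-bit filter can reach false positive rate f."""
    return int(m * math.log(0.6185) / math.log(f))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Actual f once k has been rounded to an integer."""
    return (1 - math.exp(-k * n / m)) ** k
```

For example, with m = 10000 bits and n = 1000 fingerprints, `optimal_k` gives k = 7, and the actual false positive rate is a little under 1%.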

The fingerprint set to be transmitted is compressed into a Bloom filter. Each Bloom filter is a bit array that uses m bits to represent a set of n fingerprints. When a new fingerprint is added, k hash functions yield k positions in the bit array, and the bits at those positions are all set to 1 to record the new fingerprint. To test whether a fingerprint belongs to a Bloom filter, the same k hash functions yield k positions in the bit array, and the bits at those positions are checked. The fingerprint is taken to belong to the Bloom filter if and only if all k bits are 1.
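A compact version of the add and test operations might look like this. The source does not say which k hash functions are used; deriving them by prefixing the item with the hash index and taking MD5 is an assumption of this sketch, as is the class name `BloomFilter`.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter with k salted-MD5 hash functions (illustrative only)."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)   # m bits, all initially 0

    def _positions(self, item: bytes):
        # Assumed hash family: MD5 of (index byte + item), reduced modulo m.
        for i in range(self.k):
            digest = hashlib.md5(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)  # set each of the k bits to 1

    def __contains__(self, item: bytes) -> bool:
        # True only if all k bits are 1; may be a false positive, never a false negative.
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))
```

A node would add the fingerprints of the blocks it holds and send `bits` to its peers, which is far smaller than sending the fingerprints themselves.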

Each client node in the system queries the metadata server node for the locations of other client nodes and maintains connections with them. It periodically fetches the latest Bloom filter array from each of them so that the information it keeps about those client nodes stays sufficiently accurate. When this client node needs the image file block with a given fingerprint, it first randomly selects several client nodes and checks whether their Bloom filter arrays contain the fingerprint. If the fingerprint is found in one or more of these client nodes, the block data is preferentially obtained from one of them. Only when no client node holds the fingerprint of the required block does the client node issue a read request to the image file server to obtain the block contents. This optimized P2P data distribution mode effectively transfers the data transmission load of the image server to the clients, balances the load distribution in the system, and avoids network hotspots and bottlenecks.

During implementation, the image file blocks may be compressed with a compression algorithm whose compression throughput is higher than the network bandwidth. Further, the compression algorithm may be Google Snappy.

In a specific implementation, image files have to be transmitted over the network. To reduce network bandwidth consumption as far as possible and maximize the effective transmission speed, an embodiment of the present invention may apply a simple compression step to the transmitted image file blocks to shrink each data packet. The compression algorithm must be fast, with throughput at least higher than the network bandwidth, so that it does not itself become the bottleneck. It also must not consume too much CPU, so as not to affect the running virtual machines. After comparing several commonly used, mature compression algorithms, this embodiment uses, for example, Google's Snappy compression library to compress the image file blocks. The main goal of that library is fast compression rather than a high compression ratio. Snappy still achieves a reasonable compression ratio, which makes it well suited to compressing network packets. The test results demonstrate the advantage of Snappy over the gzip and bzip2 data compression algorithms.

The Google Snappy algorithm is used as the example here because it is common, mainstream, and easy for those skilled in the art to use and understand. In principle other algorithms can also be used, as long as they achieve fast compression. Google Snappy serves only as a preferred embodiment to teach those skilled in the art how to implement the present invention; it does not mean that only Google Snappy can be used, and the algorithm may be chosen according to practical needs.
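Since the text allows any sufficiently fast codec, the compress-before-send step can be sketched with the Python standard library. Note the substitution: this sketch uses `zlib` at its fastest level as a stand-in, not Snappy, and the function names `pack_block` and `unpack_block` are hypothetical.

```python
import zlib


def pack_block(block: bytes) -> bytes:
    """Compress a block before sending; level 1 is zlib's fastest setting."""
    return zlib.compress(block, 1)


def unpack_block(payload: bytes) -> bytes:
    """Restore the original block on the receiving node."""
    return zlib.decompress(payload)
```

Whatever codec is chosen, the requirement stated above still applies: its throughput should exceed the network bandwidth so that compression does not become the bottleneck.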

Fast cloning of image files removes the waiting time and IO overhead of copying a new image file and greatly reduces the time a user spends creating a new virtual machine. On a cloud computing platform, the best practice for creating a new virtual machine is to copy a new virtual machine image file from a preconfigured image file template, which avoids reinstalling the operating system and applications every time. Most virtual machine image files are several GB in size. Copying an image file of that size byte by byte takes several minutes, and the total time grows linearly with the number of virtual machines to be created. To create new virtual machine image files efficiently, Liquid provides fast image copying, which creates a new virtual machine image file in a very short time.

For Liquid, an image file is represented by metadata that holds the fingerprints of all of its image file blocks. Copying an image file can therefore be replaced entirely by copying its metadata; the only additional work is adjusting the reference counts of the image file blocks. Because Liquid's underlying storage is content addressable storage and supports copy-on-write, modifying a cloned image file does not affect the contents of the template image file or of image files cloned later. The metadata of a virtual machine image file consists mainly of fingerprints and is much smaller than the image file itself. With a 256KB block size, the metadata of a 10GB image file is only 640KB, and copying it completes within milliseconds. Compared with copying the image file contents byte by byte, the image cloning provided by Liquid greatly shortens the time needed to create a new virtual machine image file.
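The metadata figure and the clone operation can be checked with a short sketch. It assumes that the metadata is nothing more than one 16-byte MD5 digest per block, which reproduces the 640KB figure; the function names and the use of a `Counter` for reference counts are illustrative choices.

```python
from collections import Counter

BLOCK_SIZE = 256 * 1024   # block length used in the example above
MD5_BYTES = 16            # an MD5 digest is 16 bytes


def metadata_size(image_bytes: int) -> int:
    """Bytes of fingerprint metadata for an image of the given size."""
    blocks = -(-image_bytes // BLOCK_SIZE)    # ceiling division
    return blocks * MD5_BYTES


def clone(image: list, refcount: Counter) -> list:
    """Clone an image by copying its fingerprint list and bumping reference counts."""
    for fp in image:
        refcount[fp] += 1
    return list(image)


def write_block(image: list, index: int, new_fp: str, refcount: Counter) -> None:
    """Copy-on-write: point this image at a new block, leave other images untouched."""
    refcount[image[index]] -= 1
    refcount[new_fp] += 1
    image[index] = new_fp
```

A 10GB image has 40960 blocks of 256KB, and 40960 x 16 bytes is 640KB, so `metadata_size(10 * 1024**3)` equals `640 * 1024`.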

Finally, image files can be transmitted over the network on demand, which keeps the amount of image content sent over the network to a minimum and effectively reduces network load. Liquid supports using copy-on-write to copy an image file block from another client node to the current client node, and then downloads the required image file blocks in real time as the virtual machine accesses them. This lets a client node start a virtual machine before the image file has been transferred to local storage, removing the wait for the image transfer and significantly shortening virtual machine startup time.

In addition, because the data transmitted over the network is limited to the parts accessed during startup, the amount of data transmitted stays small. Compared with downloading the entire image file before starting the virtual machine, this both saves waiting time and reduces network transmission overhead, which is a considerable advantage.
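On-demand transfer can be illustrated with a lazily populated image. The class `LazyImage` and its `fetch` callback are hypothetical; in practice `fetch` would be the peer-first lookup with server fallback described earlier.

```python
class LazyImage:
    """Image whose blocks are fetched only when first read (illustrative sketch)."""

    def __init__(self, fingerprints, fetch):
        self.fingerprints = fingerprints  # block index -> fingerprint
        self.fetch = fetch                # callable: fingerprint -> block bytes
        self.local = {}                   # blocks already present on this node
        self.fetched = 0                  # number of network fetches performed

    def read(self, index: int) -> bytes:
        fp = self.fingerprints[index]
        if fp not in self.local:          # first access: download just this block
            self.local[fp] = self.fetch(fp)
            self.fetched += 1
        return self.local[fp]
```

A virtual machine that touches only a few blocks while booting therefore triggers only that many transfers, and repeated or duplicate blocks are served locally.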

Based on the same inventive concept as the virtual machine image file storage method, an embodiment of the present invention also provides a virtual machine image file storage device. Because the device solves the problem on the same principle as the storage method, its implementation may refer to the implementation of the method, and repeated details are omitted.

Fig. 9 is a schematic diagram of the virtual machine image file storage device in an embodiment of the present invention. As shown in Fig. 9, the device may include:

a splitting unit 901, configured to split the image file into fixed-length image file blocks;

a storage unit 902, configured to store the image file blocks.

During implementation, the splitting unit may be further configured to split the image file into fixed-length image file blocks whose size is N times 4KB, where N is a natural number.

During implementation, the splitting unit may be further configured to split the image file into fixed-length image file blocks of 256KB.

During implementation, the storage unit may be further configured to store the image file blocks aligned to disk cluster boundaries.

During implementation, the storage unit may be further configured to de-duplicate the stored image file blocks.

During implementation, the device may further include:

a fingerprint unit, configured to compute the MD5 fingerprint of each image file block and thereby determine the fingerprint of the block;

the storage unit may be further configured to de-duplicate the stored image file blocks according to their fingerprints, and to store each block's fingerprint together with the block.

During implementation, the fingerprint unit, which operates when the image file is run after the image file of an application or operating system has been stored, may include:

a cache subunit, configured to store image file blocks that need to be modified in different virtual machines in a first-level cache, and to store recently read read-only image file blocks in a second-level cache;

a fingerprint subunit, configured to compute MD5 fingerprints for the image file blocks in the second-level cache;

a determining subunit, configured to determine the fingerprint of each image file block from the fingerprints computed by the fingerprint subunit.

During implementation, the cache subunit may be further configured to determine the image file blocks kept in the second-level cache according to the least recently used (LRU) algorithm.

During implementation, the storage unit may be further configured to merge the image file blocks into a large data block (BLK) file before storing them.

During implementation, the storage unit may be further configured to store the image file blocks according to the size of the BLK file, its free positions, and the position offset corresponding to the fingerprint of each image file block.

During implementation, the storage unit may be further configured to use a bitmap file to record which positions of the BLK file are used and/or unused.

The virtual machine image file storage device provided in the embodiment of the present invention stores a virtual machine image file by splitting it into fixed-length image file blocks. Because fixed-length splitting ensures a high compression ratio, the storage space taken by virtual machine image files is effectively reduced.

Based on the same inventive concept as the virtual machine image file distribution method, an embodiment of the present invention also provides a virtual machine image file distribution device. Because the device solves the problem on the same principle as the distribution method, its implementation may refer to the implementation of the method, and repeated details are omitted.

Fig. 10 is a schematic diagram of the virtual machine image file distribution device in an embodiment of the present invention. As shown in Fig. 10, the device may include:

a determining unit 1001, configured to determine the image file blocks that belong to the virtual machine of this client node;

a connection unit 1002, configured to establish connections with other client nodes based on the peer-to-peer (P2P) protocol;

a distribution unit 1003, configured to obtain an image file block from another client node when it is determined that the other client node holds a block required by the virtual machine of this client node, and otherwise to obtain the block from the data server; or, when it is determined that the virtual machine of this client node holds an image file block required by the virtual machine of another client node, to send the block to that client node. The image file blocks are obtained by dividing the virtual machine image file into fixed-length pieces.

实施中，该虚拟机镜像文件分发装置，可以进一步包括缓存单元，用于将从其他客户端或数据服务器获取的镜像文件块存放在缓存区中。In implementation, the virtual machine image file distribution device may further include a cache unit, configured to store image file blocks obtained from other clients or data servers in the cache area.

实施中，缓存单元，进一步用于根据近期最少使用算法LRU确定缓存区中的镜像文件块。In implementation, the cache unit is further configured to determine the image file blocks in the cache area according to the least recently used algorithm (LRU).

实施中，该虚拟机镜像文件分发装置，可以进一步包括指纹特征计算单元，可以用于计算镜像文件块的MD5指纹特征；In implementation, the virtual machine image file distribution device can further include a fingerprint feature calculation unit, which can be used to calculate the MD5 fingerprint feature of the image file block;

分发单元，可以进一步用于根据指纹特征确定其他客户端节点中是否存在本客户端节点虚拟机所需的镜像文件块，或者，根据指纹特征确定本客户端节点虚拟机中是否存在其他客户端节点虚拟机所需的镜像文件块。The distribution unit can be further used to determine whether there are image file blocks required by the client node virtual machine in other client nodes according to the fingerprint characteristics, or to determine whether other client nodes exist in the client node virtual machine according to the fingerprint characteristics The image file blocks required by the virtual machine.

In implementation, the distribution unit may be further used to: when determining according to the fingerprint features whether other client nodes hold image file blocks required by the virtual machine of this client node, send to the other client nodes the fingerprint feature set of the image file blocks required by the virtual machine of this client node, the fingerprint feature set being used by the other client nodes to determine whether they hold image file blocks required by the virtual machine of this client node; or, when determining according to the fingerprint features whether the virtual machine of this client node holds image file blocks required by the virtual machine of another client node, receive from that client node the fingerprint feature set of the image file blocks required by its virtual machine, and determine according to that fingerprint feature set whether the image file blocks required by that client node's virtual machine are present.

In implementation, the distribution unit may be further used to compress the image file block fingerprint feature set with a Bloom Filter data structure before sending the fingerprint feature set of the image file blocks required by the virtual machine of this client node to other client nodes.
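A Bloom filter replaces the explicit list of fingerprints with a fixed-size bit array, at the cost of occasional false positives and no false negatives. The sketch below is illustrative only: the bit-array size, the number of hash functions, and the way hash positions are derived from salted MD5 digests are all assumptions, not parameters given in the embodiment.

```python
import hashlib


class BloomFilter:
    """Hypothetical Bloom filter over block fingerprints (bytes)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting MD5 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

With these assumed parameters, 100 MD5 fingerprints (1600 bytes as raw digests) are summarized in a 1024-byte array. A receiving node tests each of its local fingerprints against the filter, and a false positive only means that an offered block turns out not to be needed.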

In implementation, the distribution unit may be further configured to compress image file blocks with a compression algorithm whose compression throughput is higher than the network bandwidth.

In implementation, the distribution unit may be further used to compress image file blocks with the Google Snappy compression algorithm.
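The idea of compressing each block with a fast algorithm before transmission can be sketched as below. Snappy is not part of the Python standard library, so this sketch substitutes `zlib` at its fastest level purely as a stand-in; the one-byte tag format and the function names `pack_block` and `unpack_block` are likewise assumptions for illustration.

```python
import zlib

RAW, COMPRESSED = 0, 1  # hypothetical one-byte tags for the wire format


def pack_block(block):
    """Compress a block with a fast setting; send it raw if compression does not help."""
    packed = zlib.compress(block, 1)  # level 1 favors speed over ratio
    if len(packed) < len(block):
        return bytes([COMPRESSED]) + packed
    return bytes([RAW]) + block


def unpack_block(payload):
    """Reverse pack_block on the receiving node."""
    tag, body = payload[0], payload[1:]
    if tag == COMPRESSED:
        return zlib.decompress(body)
    if tag == RAW:
        return body
    raise ValueError("unknown block tag")
```

The condition stated above matters because a compressor slower than the link would itself become the bottleneck; a fast compressor keeps the network busy while reducing the bytes sent.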

The virtual machine image file distribution device provided in the embodiment of the present invention uses the P2P protocol to clone and distribute image files quickly between virtual machine clients. Every image file server and client node can act as a provider of image file blocks and exchange stored image file blocks with the others. This effectively shifts the data transmission load of the image file server onto the client nodes, balances the data transmission load, avoids network hotspots, shortens the waiting time and IO overhead of copying image files, and improves the efficiency and speed of virtual machine image file distribution.

An example is described below. It uses both the virtual machine image file storage scheme and the virtual machine image file distribution scheme provided in the embodiments of the present invention.

Figure 11 is a schematic diagram of the Bonnie++ system performance test results in the embodiment of the present invention, Figure 12 is a schematic diagram of the PostMark system performance test results, and Figure 13 is a schematic diagram of the Linux virtual machine performance test results. As shown in Figures 11 to 13, in order to compare the performance of different image storage systems and to reflect the performance of the Liquid image storage system more realistically, the embodiment of the present invention runs the performance tests inside virtual machines. The experimental environment is deployed on Nova, the Tsinghua virtual computing platform.

First, the IO performance of the virtual machines is compared. The experiment uses two IO benchmarking tools, Bonnie++ and PostMark, to compare the IO performance of Raw-format and Qcow2-format images stored in the local file system with that of Raw-format images stored in Liquid. In this experiment, Liquid uses image block sizes from 16KB to 2MB. Every virtual machine image is a newly created 50GB file formatted with the ext4 file system. Each virtual machine has 2GB of memory and 1 vCPU and runs Ubuntu 10.10 with Linux kernel 2.6.35.

Referring to the results in Figures 11 and 12, Native IO on the local file system of the physical machine has the best performance and serves as the baseline for comparing the other image file storage solutions. Among the image formats, Raw image files have the best read and write performance, because the format is simple and adds little overhead. For Qcow2 image files and for image files stored in Liquid, write operations frequently involve allocating new data blocks, so write performance is noticeably affected. Even though Liquid also has to support deduplication when writing data, its IO speed is still slightly faster than that of Qcow2, because its internal caching mechanism ensures that write operations are carried out efficiently. Read speed is less affected than write speed, because the data that is read is also cached by the operating system. A reasonable data block size uses the cache more effectively: larger data blocks lead to a lower cache hit rate, while smaller data blocks add overhead because they require frequent random IO operations.

As shown in Figure 13, in the virtual machine boot speed test the Raw image format performs best, followed by the Qcow2 format. Virtual machine images stored in Liquid boot slightly more slowly, because the cache has not yet had time to warm up with the frequently used data blocks that provide IO acceleration. Smaller image file blocks lengthen the boot time, because more random IO operations are needed during the boot phase. In the tests that decompress and compile the Linux source code, Liquid outperforms the Qcow2 format. This is because the caching mechanism in Liquid keeps recently and frequently accessed image file blocks in memory to ensure good IO performance. These two tests generate a large number of small files. For the resulting large number of image file blocks, Liquid does not write the blocks directly to disk but holds them temporarily in the cache, so that later writes to the same image file block can be served quickly. When the cache fills up, Liquid writes the image file blocks to disk storage together in batches, which performs better than scattered small-block IO.

Figure 14 is a schematic comparison of the time spent transferring a virtual machine image file in the embodiment of the present invention. As shown in Figure 14, to illustrate the speed of the virtual machine image file distribution method provided in the embodiment, the time required to transfer an 8GB image file from 1 client node to 7 other client nodes was measured. The test used the image file of a freshly installed Ubuntu 10.10 and a data block size of 256KB. In these tests, transferring the image file directly with the scp (Secure Copy) tool takes the longest, because the source node of the image file becomes a hotspot in the system and its network bandwidth is saturated, and the encryption and decryption performed by scp add a considerable performance cost. NFS storage faces a similar problem, but performs somewhat better than scp because of optimizations in its implementation. The BitTorrent protocol spreads the data transmission load across all nodes, which keeps the source node from becoming a bottleneck, and its performance is far better than that of scp and NFS storage. However, the BitTorrent protocol does not take the redundant data inside the image into account, so duplicate data is still transmitted over the network. Liquid also uses P2P transmission, but it accounts for redundant data blocks and avoids transmitting duplicate data over the network, so it performs better than BitTorrent.

The performance of on-demand fetching is measured by the speed of starting a virtual machine. As shown in Figure 15, compared with the traditional approach of starting the virtual machine only after the image file has been fully downloaded, on-demand fetching can start the virtual machine before the download completes and satisfies the virtual machine's IO requests by transferring the required image data blocks in real time. The experiment shows that the boot itself takes longer with on-demand fetching than with the traditional approach, because every time an image file block is missing, an RPC (Remote Procedure Call) must be issued in real time to download that block, which has higher latency than local IO. As image file blocks become smaller, blocks are missing from the local disk more often, so more network IO operations are needed and the boot time grows. Even with this overhead, the overall time to bring up the virtual machine is shortened, so on-demand fetching effectively speeds up virtual machine startup. In addition, the embodiment of the present invention also ran other system performance tests, including the virtual machine boot speed test and the decompression and compilation of the Linux kernel source code. This test set includes both CPU-intensive and IO-intensive tests and is representative of common virtual machine workloads. The Linux kernel source code used in this test is version 3.0.4.
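The on-demand behavior described above, where a read triggers a remote fetch only for blocks that are not yet local, can be sketched as follows. The class `OnDemandImage` and the `fetch_remote` callback are hypothetical; in the embodiment the missing block would be obtained through an RPC to a peer or the data server rather than through a local function call.

```python
class OnDemandImage:
    """Hypothetical lazily populated image: blocks are fetched on first access."""

    def __init__(self, fetch_remote, block_size):
        self.fetch_remote = fetch_remote  # callable: block index -> block bytes (stands in for the RPC)
        self.block_size = block_size
        self.local = {}                   # block index -> bytes already on the local node
        self.remote_fetches = 0           # how many times a missing block was fetched

    def _block(self, index):
        if index not in self.local:       # block miss: fetch it remotely, then keep it
            self.local[index] = self.fetch_remote(index)
            self.remote_fetches += 1
        return self.local[index]

    def read(self, offset, length):
        """Serve a read, fetching only the blocks that the byte range touches."""
        if length <= 0:
            return b""
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = bytearray()
        for index in range(first, last + 1):
            data += self._block(index)
        start = offset - first * self.block_size
        return bytes(data[start:start + length])
```

A read that straddles a block boundary triggers two fetches, and repeating it triggers none, which is consistent with the observation that smaller blocks cause more misses and therefore more network round trips during boot.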

For convenience of description, the parts of the above device are described separately as modules or units divided by function. Of course, when implementing the present invention, the functions of the modules or units may be implemented in one or more pieces of software or hardware.

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them as well.

Claims

1. A virtual machine image file storage method, characterized by comprising the steps of:

Divide the image file into fixed-length image file blocks;

Store the image file blocks.

2. The method according to claim 1, wherein the fixed-length image file block size is N times 4KB, wherein N is a natural number.

3. The method according to claim 2, wherein the fixed-length image file block size is 256KB.

4. The method according to any one of claims 1 to 3, wherein when storing the image file blocks, they are aligned and stored according to disk cluster boundaries.

5. The method of claim 1, further comprising:

De-redundancy (deduplication) is performed on the stored image file blocks.

6. The method of claim 5, further comprising:

Perform Message-Digest Algorithm 5 (MD5) fingerprint feature calculation on the image file block to determine the fingerprint feature of the image file block;

Perform de-redundancy on the stored image file blocks according to the fingerprint features;

While storing the image file block, its fingerprint feature is stored.

7. The method according to claim 1, wherein the image file blocks are merged into a large data block (BLK) file for storage.

8. The method of claim 7, comprising:

Obtain the size of the BLK file, the free position and the position offset corresponding to the fingerprint feature of the image file block;

The image file block is stored according to the size of the BLK file, the free location, and the position offset corresponding to the fingerprint feature of the image file block.

9. The method of claim 8, comprising:

A bitmap file is used to record the used and/or unused positions of the BLK file.

10. A method for distributing virtual machine image files, comprising the steps of:

Determine the image file blocks that belong to the virtual machine of the client node;

Establish connections with other client nodes based on the peer-to-peer (P2P) protocol;

When it is determined that image file blocks required by the virtual machine of this client node exist on other client nodes, obtain the image file blocks from those client nodes, and otherwise obtain the required image file blocks from the data server; or, when it is determined that the virtual machine of this client node holds image file blocks required by the virtual machines of other client nodes, send the image file blocks to those client nodes, wherein the image file blocks are obtained by splitting the virtual machine image file into fixed-length blocks.

11. The method according to claim 10, further comprising: storing the image file blocks obtained from other clients or data servers in a cache area.

12. The method according to claim 11, wherein storing the image file blocks obtained from other clients or data servers in the cache area comprises: determining which image file blocks are kept in the cache area according to the least recently used (LRU) algorithm.

13. The method of claim 10, further comprising:

Calculate the MD5 fingerprint feature of the image file block;

Determine, according to the fingerprint features, whether other client nodes hold image file blocks required by the virtual machine of this client node, or determine, according to the fingerprint features, whether the virtual machine of this client node holds image file blocks required by the virtual machines of other client nodes.

14. The method according to claim 13, wherein determining, according to the fingerprint features, whether other client nodes hold image file blocks required by the virtual machine of this client node comprises: sending to the other client nodes the fingerprint feature set of the image file blocks required by the virtual machine of this client node, the fingerprint feature set being used by the other client nodes to determine whether they hold image file blocks required by the virtual machine of this client node;

Or, determining, according to the fingerprint features, whether the virtual machine of this client node holds image file blocks required by the virtual machine of another client node comprises: receiving from that client node the fingerprint feature set of the image file blocks required by its virtual machine; and determining, according to that fingerprint feature set, whether the image file blocks required by that client node's virtual machine are present.

15. The method according to claim 14, further comprising: compressing the image file block fingerprint feature set using a Bloom Filter data structure before transmission.

16. The method of claim 10, further comprising:

The image file blocks are compressed with a compression algorithm whose compression speed is higher than the network bandwidth speed.

17. The method according to claim 16, wherein the compression algorithm is the Google Snappy compression algorithm.

18. A virtual machine image file storage device, comprising:

A splitting unit, used to split the image file into fixed-length image file blocks;

The storage unit is used to store the image file blocks.

19. The device according to claim 18, wherein the splitting unit is further configured to split the image file into fixed-length image file blocks whose size is N times 4KB, wherein N is a natural number.

20. The device according to claim 19, wherein the splitting unit is further configured to split the image file into 256KB fixed-length image file blocks.

21. The device according to any one of claims 18 to 20, wherein the storage unit is further configured to store the image file blocks aligned to disk cluster boundaries.

22. The device according to claim 18, wherein the storage unit is further configured to perform de-redundancy on the stored image file blocks.

23. The apparatus of claim 22, further comprising:

A fingerprint feature unit, used to perform MD5 fingerprint feature calculation on the image file block to determine the fingerprint feature of the image file block;

The storage unit is further used to perform de-redundancy on the stored image file blocks according to the fingerprint features, and to store the fingerprint feature of each image file block when storing the block.

24. The device according to claim 18, wherein the storage unit is further configured to merge the image file blocks into a large data block BLK file for storage.

25. The device according to claim 24, wherein the storage unit is further configured to store the image file block according to the size of the BLK file, the free location, and the position offset corresponding to the fingerprint feature of the image file block.

26. The device according to claim 25, wherein the storage unit is further configured to use a bitmap file to record the used and/or unused positions of the BLK file.

27. A device for distributing virtual machine image files, comprising:

A determination unit is used to determine the image file blocks that belong to the virtual machine of the client node;

A connection unit, configured to establish connections with other client nodes based on the peer-to-peer (P2P) protocol;

A distribution unit, configured to: when it is determined that image file blocks required by the virtual machine of this client node exist on other client nodes, obtain the image file blocks from those client nodes, and otherwise obtain the image file blocks from the data server; or, when it is determined that the virtual machine of this client node holds image file blocks required by the virtual machines of other client nodes, send the image file blocks to those client nodes, wherein the image file blocks are obtained by splitting the virtual machine image file into fixed-length blocks.

28. The device according to claim 27, further comprising: a cache unit, configured to store image file blocks obtained from other clients or data servers in the cache area.

29. The device according to claim 28, wherein the cache unit is further configured to determine which image file blocks are kept in the cache area according to the least recently used (LRU) algorithm.

30. The apparatus of claim 27, further comprising:

A fingerprint feature calculation unit, configured to calculate the MD5 fingerprint feature of the image file block;

The distribution unit is further used to determine, according to the fingerprint features, whether other client nodes hold image file blocks required by the virtual machine of this client node, or to determine, according to the fingerprint features, whether the virtual machine of this client node holds image file blocks required by the virtual machines of other client nodes.

31. The device according to claim 30, wherein the distribution unit is further configured to: when determining according to the fingerprint features whether other client nodes hold image file blocks required by the virtual machine of this client node, send to the other client nodes the fingerprint feature set of the image file blocks required by the virtual machine of this client node, the fingerprint feature set being used by the other client nodes to determine whether they hold image file blocks required by the virtual machine of this client node; or, when determining according to the fingerprint features whether the virtual machine of this client node holds image file blocks required by the virtual machine of another client node, receive from that client node the fingerprint feature set of the image file blocks required by its virtual machine, and determine according to that fingerprint feature set whether the image file blocks required by that client node's virtual machine are present.

32. The device according to claim 31, wherein the distribution unit is further configured to compress the image file block fingerprint feature set with a Bloom Filter data structure before sending the fingerprint feature set of the image file blocks required by the virtual machine of this client node to other client nodes.

33. The device according to claim 27, wherein the distribution unit is further configured to compress the image file block with a compression algorithm whose compression speed is higher than a network bandwidth speed.

34. The device according to claim 33, wherein the distributing unit is further configured to compress the image file block using the Google Snappy compression algorithm.