Extremely cold object tape storage system, access and management method
Technical Field
The application belongs to the technical field of big data, and particularly relates to an extremely cold object tape storage system and to an access method and a management method for it.
Background
With the development of internet technology and the rapid evolution of large-scale data storage, and with the growth of e-commerce, electronic payment, and online file editing and storage, each user now stores a large amount of personal sensitive data on the network.
Currently, every individual using internet services generates a large amount of data every day, and the probability that this data is revisited gradually decreases over time. In software engineering, managing data according to its access frequency is called thermal management: low-heat data is stored on inexpensive storage media, and high-heat data is stored on storage media with high access speed.
Chinese patent CN202410373669.9 (application date 2024.03.29), titled "a method and device for recovering data in a distributed object storage system", proposes recovering backup data from a tape library to a distributed file system by using an object recovering component to obtain the whole content of a recovered object; scanning and analyzing the whole content of the recovered object to obtain metadata of each object; writing the metadata into a distributed database; and, when an object obtaining request initiated by a client is received, obtaining the corresponding object from the distributed file system according to the metadata of that object.
Chinese patent CN202110317844.9 (application date 2021.03.25), titled "a layered data storage system and method, a backup management server", provides a backup management server, an application server, a distributed object storage cluster and a tape library. The backup management server generates and sends a first backup request to the application server according to the residual storage capacity and backup strategy of the distributed object storage cluster, and generates and sends a second backup request to the distributed object storage cluster according to the residual storage capacity of the tape library and the retention period of the backup data. The application server is connected with the distributed object storage cluster and stores the corresponding data of the application server in the distributed object storage cluster according to the first backup request. The distributed object storage cluster is connected with the tape library and stores the corresponding data of the distributed object storage cluster in the tape library according to the second backup request. According to that patent, the method and the device can improve data backup efficiency and success rate and reduce operation and maintenance cost.
A further prior-art disclosure provides a data protection system and method based on a tape library. The data protection system comprises a data backup server and the tape library. The data backup server comprises a backup data storage module for storing data backed up from a production server together with the corresponding time point data sets; a data archiving module for establishing a data archiving task, archiving the data stored in the backup data storage module into the tape library in units of time point data sets according to the data archiving task, and recording the position of each data object in the tape library; and a position database for storing the position of each data object in the tape library. According to that disclosure, the data can be effectively protected, cost can be reduced by exploiting the advantages of magnetic tape, and the data on the tape can be conveniently managed.
In the above patents, the tape library generally serves as a backup database, and the data in the tape library is normally not used unless the primary data is lost.
In fig. 8, when the tape library is used as a backup database, the object library is partially or wholly stored, and the object metadata and the object data are stored together.
In fig. 9, when the tape library is used as a backup database, the object library is partially or wholly stored, the object metadata is separated from the object data and is stored in different tape spaces, and the object metadata is not responsible for indexing the object data.
In the prior art, the tape device serves as a whole-database backup of the object database. Used in this way, the tape storage device cannot provide true object-oriented management, and when data is read, the data in the tape device must be restored to the database as a whole.
As businesses develop, the granularity of cold data becomes smaller and smaller, and conversion between cold and hot data becomes increasingly random. For example, in the transaction records of a shopping website, data from several years ago is cold or extremely cold, yet it may need to be converted into warm data simply because a user queries their own consumption records. If a cold data system or tape backup system that stores data in batches is used, such requests force the tape backup data to be moved frequently into the warm data area, which greatly reduces system performance and rapidly increases power consumption.
Abbreviation and key term definitions
MB: abbreviation of megabyte (MByte), a unit of storage in a computer.
Object storage, also known as object-based storage, is a generic term used to describe a method of resolving and processing discrete units, which are referred to as objects.
Object Metadata (Metadata), which is a set of name-value pairs, is information describing the properties of an object. In object storage, object metadata includes creation time, modification time, storage type, etc. of the object, which can help manage and retrieve the object.
Object data (Data): the stored content of the managed object itself.
SSD storage device: in the present application, a device that uses an SSD as its storage medium. An SSD (Solid State Drive) is a storage device based on flash memory technology; unlike a traditional mechanical hard disk, an SSD has no mechanical moving parts.
Magnetic disk (disk) refers to a memory that stores data using magnetic recording technology. The magnetic disk is a main storage medium of a computer, can store a large amount of binary data, and can keep the data from losing after power is cut off.
Magnetic tape device: magnetic tape is a tape-shaped material with a magnetic layer, used for recording sound, images, digital or other signals; it is the most versatile and most widely produced magnetic recording material. It is typically made by coating a plastic film substrate (support) with a layer of granular magnetic material, or by vapor-depositing a layer of magnetic oxide or alloy film onto the substrate.
Disclosure of Invention
According to the method and the device, the extremely cold object metadata records the index relationship to the extremely cold object data, so that a two-layer index relationship (extremely cold object index to extremely cold object metadata, and extremely cold object metadata to extremely cold object data) can be realized and the tape device can support object-oriented storage and access. Through this two-layer index relationship, the tape storage device can be addressed and managed per object, tape data does not need to be read in batches, and the access efficiency of the device is greatly improved.
The extremely cold object tape storage system comprises an extremely cold object index layer, an extremely cold object data layer and an extremely cold object metadata layer. The extremely cold object index layer comprises a storage device for storing extremely cold object indexes; the extremely cold object data layer comprises a storage device C1 for storing extremely cold object data; and the extremely cold object metadata layer comprises a storage device D1 for storing extremely cold object metadata. The extremely cold object index comprises the name or number of an extremely cold object and the position information, on storage device D1, at which its extremely cold object metadata is stored. The extremely cold object metadata comprises the name or number of the extremely cold object and the position information, on storage device C1, of its extremely cold object data. The storage device C1 and the storage device D1 are both tape storage devices. To read an object, the position information on storage device D1 is found in the extremely cold object index according to the name or number; the extremely cold object metadata is then read from storage device D1, the position information on storage device C1 is taken from that metadata, and the extremely cold object data is read from storage device C1.
The extremely cold object index layers may include extremely cold level 1 object index layers, extremely cold level 2 object index layers, and extremely cold level 3 object index layers.
The extremely cold level 1 object index layer may include an SSD storage device for holding extremely cold level 1 object indexes.
The extremely cold level 2 object index layer may include a hard disk storage device for holding extremely cold level 2 object indexes.
The extremely cold level 3 object index layer may include a tape storage device for holding extremely cold level 3 object indexes.
The extremely cold object index may be stored in a database.
The extremely cold object index may be stored by a key-value storage system.
The system may further comprise a warm object data layer. The warm object data layer comprises a storage device B1 for storing warm object metadata and a storage device B2 for storing warm object data. The warm object metadata comprises the warm object name or number and the position information, on storage device B2, of the warm object data. The storage device B1 is an SSD storage device, and the storage device B2 is a magnetic tape device or a magnetic disk device. The object metadata is acquired first, and whether the data is on a magnetic disk or a magnetic tape is confirmed from the content of the object metadata. The warm object data is stored by an object storage system, and the warm object metadata is stored by a key-value distributed storage system.
The system may further comprise a cold object data layer. The warm object data layer comprises a storage device B3 for storing cold object metadata, and the cold object data layer comprises a storage device C3 for storing cold object data. The cold object metadata comprises the cold object name or number and the position information, on storage device C3, of the cold object data. The storage device B3 is an SSD storage device or a hard disk device, the storage device C3 is a tape storage device, and the cold object metadata is stored by a key-value distributed storage system.
A storage access method for extremely cold objects comprises: step C10, searching in an extremely cold object index library and judging whether a metadata number is found; if no metadata number is found, returning and ending; step D10, if a metadata number is found, obtaining the extremely cold object metadata from the extremely cold object metadata storage device according to the metadata number; and step D20, obtaining the extremely cold data from the extremely cold data storage device according to the extremely cold object metadata. The extremely cold object index is stored on an SSD, magnetic disk or magnetic tape storage device; the storage device of the extremely cold object metadata library is a tape device; and the storage device of the extremely cold data is a tape device.
Before step C10, the method may further comprise: step A10, searching in a warm object metadata base; step A20, judging whether warm object metadata is found; step A30, if found, obtaining the warm object data according to the warm object metadata and returning; step B10, otherwise searching in a cold object metadata base; step B20, judging whether cold object metadata is found; and step B30, if found, obtaining the cold object data according to the cold object metadata and returning. The storage device of the cold object metadata base is an SSD storage device or a disk device, the storage device of the cold object data is a tape device or a disk device, the storage device of the warm object metadata base is an SSD storage device, and the storage device of the warm object data is an SSD storage device.
Optionally, in step C, after the extremely cold object metadata is obtained, the extremely cold object metadata is moved to the cold object metadata base; and in step D20, after the cold object metadata is obtained, the cold object metadata is moved to the SSD storage device.
A storage management method for extremely cold objects comprises: periodically checking an object metadata base and obtaining the temperature attribute and the latest access time from the object metadata; if the temperature attribute is cold data and the time since the latest access exceeds a set temperature threshold TC, changing the temperature attribute in the object metadata to extremely cold data, migrating the object metadata to a tape storage device, recording the index position of the object metadata in an extremely cold object index base, and deleting the object metadata from the object metadata base.
Before the object metadata is migrated, the quantity of extremely cold metadata to be migrated is counted, and migration starts only if the quantity is larger than a migration threshold, the threshold being the minimum amount of data for a single write of the tape device.
The method may further comprise: if the temperature attribute is warm data and the time since the latest access exceeds a set temperature threshold TH, changing the temperature attribute in the object metadata to cold data, migrating the object data pointed to by the object metadata to a tape storage device, recording the post-migration tape position information in the object metadata, deleting the original object data and releasing its storage location. Before the object data is migrated, the quantity of object data to be migrated is counted, and migration starts only if the quantity is larger than a migration threshold, the threshold being the minimum amount of data for a single write of the tape device.
One technical effect of the technical solution is that, through the index relationship from extremely cold object metadata to extremely cold object data, a two-layer index relationship (extremely cold object index to extremely cold object metadata to extremely cold object data) can be realized, so that the tape device can support object-oriented storage and access.
One technical effect of the technical solution is that, through the two-layer index relationship, the tape storage device can be addressed and managed per object, tape data does not need to be read in batches, and device access efficiency is greatly improved.
One technical effect of the technical solution is that the extremely cold object tape storage system can be established and managed independently of any existing object storage system.
One technical effect of the technical solution is that the extremely cold object index layer comprises multiple levels, enabling heat management over a long time span.
One technical effect of the technical solution is that the extremely cold object index layer comprises multiple levels whose indexes are held on different storage media, which greatly reduces the cost of storing data that spans a long period.
One technical effect of the technical solution is that the extremely cold level 1 object index is stored on an SSD storage device, which improves lookup speed.
One technical effect of the technical solution is that the extremely cold level 2 object index is stored on a hard disk storage device, which has a relatively high access speed.
One technical effect of the technical solution is that the extremely cold level 3 object index is stored on a tape storage device, so that massive long-span extremely cold data can be stored and the storage cost is greatly reduced.
One technical effect of the technical solution is that massive long-span extremely cold data can be stored, and for data accessed at extremely low frequency, single objects can be addressed directly on the tape device, so cold data access is efficient.
One technical effect of the technical solution is that tiered storage of extremely cold data keeps waiting times within what users will tolerate, since users are willing to wait for extremely cold object data.
One technical effect of the technical solution is that warm objects and cold objects can be integrated into the same system, data gradually cools into extremely cold object data and extremely cold object metadata, and the cost of moving data is low.
One technical effect of the technical solution is that, when a cold object is converted into an extremely cold object, only its metadata needs to be moved and its object data does not, so the system's data movement cost is low.
Drawings
FIG. 1 is a schematic block diagram of an extremely cold object tape storage system;
FIG. 2 is an extremely cold object tape storage system including a warm object data layer;
FIG. 3 is an extremely cold object tape storage system including a cold object data layer;
FIG. 4 is a schematic diagram of an extremely cold object storage access method;
FIG. 5 is a schematic diagram of a warm object store access method;
FIG. 6 is a schematic diagram of an extremely cold object storage management method;
FIG. 7 is a schematic diagram of a warm object storage management method;
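FIG. 8 is a schematic diagram of a prior-art tape-based object storage backup method in which object metadata and object data are stored together;
FIG. 9 is a schematic diagram of a prior-art tape-based object storage backup method in which object metadata and object data are stored in different tape spaces.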
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings. The following description of the preferred embodiments is not intended to limit the present application; it merely illustrates the general principles of the application. The terms "first" and "second" and the letters "A" and "B" in the present application are used only for convenience of description and do not imply any temporal or spatial order. Letter and number combinations such as "TA", "TB" and "H" are likewise used only for convenience of description, and their meaning is determined by the terms they accompany.
Referring to fig. 1, the extremely cold object tape storage system comprises an extremely cold object index layer, an extremely cold object data layer and an extremely cold object metadata layer. The extremely cold object index layer comprises a storage device for storing extremely cold object indexes; the extremely cold object data layer comprises a storage device C1 for storing extremely cold object data; and the extremely cold object metadata layer comprises a storage device D1 for storing extremely cold object metadata. The extremely cold object index comprises the name or number of an extremely cold object and the position information, on storage device D1, at which its extremely cold object metadata is stored. The extremely cold object metadata comprises the name or number of the extremely cold object and the position information, on storage device C1, of its extremely cold object data. The storage device C1 and the storage device D1 are both tape storage devices. To read an object, the position information on storage device D1 is found in the extremely cold object index according to the name or number; the extremely cold object metadata is then read from storage device D1, the position information on storage device C1 is taken from that metadata, and the extremely cold object data is read from storage device C1.
The storage device C1 and the storage device D1 are tape storage devices; they may be physically separate devices or the same physical device. The extremely cold object index layer establishes an index relationship between an object name and its extremely cold object metadata, so that although the extremely cold object metadata is stored on a tape device, a single object can still be looked up through the extremely cold object index.
By storing extremely cold object metadata and extremely cold object data on tape devices, indexing the extremely cold object metadata through the extremely cold object index, and recording in the extremely cold object metadata the index relationship to the extremely cold object data, a two-layer index relationship (extremely cold object index to extremely cold object metadata, and extremely cold object metadata to extremely cold object data) is realized, and the tape device can support object-oriented storage and access.
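The two-layer lookup described above can be sketched as follows. This is a minimal illustration only, not the implementation of the application: the `TapeLocation` and `ColdMetadata` types, the `(device, offset)` form of the position information, and the in-memory dictionaries standing in for the index store and the tape devices are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeLocation:
    device: str  # tape device identifier (hypothetical format)
    offset: int  # position on that tape (hypothetical unit)


@dataclass
class ColdMetadata:
    name: str
    data_location: TapeLocation  # where the object data sits on device C1


# Index layer: object name -> location of its metadata on device D1.
index = {"order-2015-0001": TapeLocation("D1", 40)}

# Stand-ins for the two tape devices: location -> stored record.
metadata_tape = {
    TapeLocation("D1", 40): ColdMetadata("order-2015-0001", TapeLocation("C1", 900))
}
data_tape = {TapeLocation("C1", 900): b"invoice bytes"}


def read_extremely_cold(name):
    """Follow index -> metadata -> data; return None if the name is not indexed."""
    meta_location = index.get(name)
    if meta_location is None:
        return None
    metadata = metadata_tape[meta_location]    # first hop: read metadata from D1
    return data_tape[metadata.data_location]   # second hop: read data from C1
```

Only two targeted tape reads are needed per object in this sketch, which is the point of keeping the index outside the bulk tape data.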
A tape device winds in one direction and is slow to read, so addressing and managing single objects on conventionally organized tape data is time-consuming and laborious. Extremely cold objects involve a large volume of data and a low access frequency, but users' real-time requirements for this data are not high; for example, a user querying a purchase made several years ago is willing to wait. Storing extremely cold data on tape devices can therefore greatly reduce the equipment cost of the system. Using the two-level index reduces the cost of caching data, and extremely cold object metadata can be told apart from extremely cold object data without reading an entire tape.
The extremely cold object tape storage system can be built independently and managed independently of existing object storage systems.
The extremely cold object index layers may include extremely cold level 1 object index layers, extremely cold level 2 object index layers, and extremely cold level 3 object index layers.
Although storing extremely cold object indexes on SSD storage devices or disk devices is not expensive, for a large system such as a nationwide shopping platform, 10 or 20 years of shopping records amount to a very large volume of data even when only the indexes of the object data are counted. Extremely cold data can therefore be graded, for example by its distance from the current year: level 1 corresponds to data within 10 years of the current year, level 2 to data from 10 to 20 years from the current year, and level 3 to data more than 20 years from the current year.
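The grading by age can be sketched as a small function. The function name `index_level` and the handling of the exact boundaries (an age of exactly 10 years goes to level 2, exactly 20 years to level 3) are assumptions; the text does not say on which side the boundary years fall.

```python
def index_level(record_year: int, current_year: int) -> int:
    """Return the extremely cold index level (1, 2 or 3) for a record.

    Boundary handling is an assumption: ages of exactly 10 and 20 years
    are placed in the older level.
    """
    age = current_year - record_year
    if age < 10:
        return 1  # index held on SSD
    if age < 20:
        return 2  # index held on hard disk
    return 3      # index held on tape
```

For example, with 2024 as the current year, a record from 2020 would be graded level 1 and a record from 2000 level 3.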
The extremely cold level 1 object index layer may include an SSD storage device for holding extremely cold level 1 object indexes.
Storing the extremely cold level 1 object index on the SSD storage device improves lookup speed.
The extremely cold level 2 object index layer may include a hard disk storage device for holding extremely cold level 2 object indexes.
The extremely cold level 2 object index is stored on the hard disk storage device, which has a relatively high access speed.
The extremely cold level 3 object index layer may include a tape storage device for holding extremely cold level 3 object indexes.
For extremely cold level 3 object indexes stored on tape storage devices, in the extreme case three separate tape storage devices (those holding the index, the metadata and the data) must be accessed in turn, so lookup is at its slowest. The probability of this case is small, however, and an individual user can bear the time overhead.
The extremely cold object index may be stored in a database.
When the extremely cold object index is stored in a database, the extremely cold level 3 object index requires the whole sub-database on the tape storage device to be read into a cache medium, such as a magnetic disk, before it can be accessed. If the index is instead stored as a sequence table rather than in a database, it can be searched algorithmically, for example by binary search, based on relative positions on the tape, so that the storage position of the entry is located step by step.
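A binary search over such a sequence table might look like the sketch below. It assumes the index records are fixed-size, sorted by object name, and individually readable by record number; the `read_record` callback and the sample records are hypothetical stand-ins for positioning the tape and reading one entry.

```python
def find_on_tape(read_record, count, name):
    """Binary-search a name-sorted sequence table of `count` records.

    read_record(i) is assumed to seek to record i and return
    (object_name, metadata_location). Returns the location or None.
    """
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key, location = read_record(mid)  # one tape seek and read
        if key == name:
            return location
        if key < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


# Hypothetical sorted index table standing in for the tape contents.
records = [("obj-001", 10), ("obj-004", 25), ("obj-009", 61), ("obj-015", 98)]


def read_record(i):
    return records[i]
```

Each probe costs a tape seek, so the number of seeks grows with the logarithm of the table size rather than with the table size itself.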
The extremely cold object index may be stored by a key-value storage system.
For large data platforms, a key-value storage system may be used to store the extremely cold object indexes. An entry in the extremely cold object index is simple: it needs to hold only the object name, the tape device number and the position, so a large key-value storage system can be used to store massive numbers of entries.
As shown in FIG. 2, the system may further comprise a warm object data layer. The warm object data layer comprises a storage device B1 for storing warm object metadata and a storage device B2 for storing warm object data. The warm object metadata comprises the warm object name or number and the position information, on storage device B2, of the warm object data. The storage device B1 is an SSD storage device, and the storage device B2 is a magnetic tape device or a magnetic disk device. The object metadata is acquired first, and whether the data is on a magnetic disk or a magnetic tape is confirmed from the content of the object metadata. The warm object data is stored by an object storage system, and the warm object metadata is stored by a key-value distributed storage system.
The extremely cold object tape storage system may form part of a larger system that also integrates warm objects and cold objects, with data gradually cooling into extremely cold object data and extremely cold object metadata.
For an online document data platform, file object data and its object metadata are repeatedly read and modified within a short period after a file is generated, so their access frequency is high. A warm object data layer can be provided for this data; it stores the warm object metadata and the warm object data on SSD storage devices, and the warm object metadata records information such as the index position of the warm object data, its creation time and the nature of the data.
As shown in FIG. 3, the system may further comprise a cold object data layer. The warm object data layer comprises a storage device B3 for storing cold object metadata, and the cold object data layer comprises a storage device C3 for storing cold object data. The cold object metadata comprises the cold object name or number and the position information, on storage device C3, of the cold object data. The storage device B3 is an SSD storage device or a hard disk device, the storage device C3 is a magnetic tape storage device, and the cold object metadata is stored by a key-value distributed storage system.
For an online document data platform or a short video platform, once a file has existed for some time, it is modified less and less often and accessed and read less and less often, while the number of such files is huge. Timely reading of the metadata of this data must be retained, and a relatively fast read speed is also needed for the object data. Such data can be defined as cold data. Because cold data arises from warm data cooling down, the object metadata is kept where it is, the object data is moved to the cold data layer, and the attribute of the data in the object metadata is changed to cold object, which avoids moving the cold object metadata.
Cold object metadata is thus warm object metadata whose attribute has been changed. When the access frequency of the data falls further, the attribute of the data in the object metadata is changed to extremely cold object, the extremely cold object metadata is moved to a tape device, an extremely cold object index is built in the extremely cold object index layer, and the object metadata in the warm object data layer is then deleted.
Referring to FIG. 4, the extremely cold object storage access method comprises: step C10, searching in the extremely cold object index library and judging whether a metadata number is found; if no metadata number is found, returning and ending; step D10, if a metadata number is found, obtaining the extremely cold object metadata from the extremely cold object metadata storage device according to the metadata number; and step D20, obtaining the extremely cold data from the extremely cold data storage device according to the extremely cold object metadata. The extremely cold object index is stored on an SSD, magnetic disk or magnetic tape storage device; the storage device of the extremely cold object metadata library is a tape device; and the storage device of the extremely cold data is a tape device.
The extremely cold object tape storage system can be established and managed independently. When it reads data, the two-layer index relationship (extremely cold object index to extremely cold object metadata to extremely cold object data) allows the tape device to support object-oriented storage and access.
The extremely cold object index is stored on an SSD, disk or tape storage device, listed in order from fastest to slowest access. In a system that has SSD, disk and tape index tiers, the extremely cold object indexes are searched in that order, and only when the object is not found in the index on the tape storage device can it be concluded that the object is not stored in the system.
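The sequential search across index tiers can be sketched as below. The tier names, the dictionary-based tier indexes and the `(device, offset)` location tuples are assumptions for illustration; a real tier on tape would be searched with a tape-aware method rather than a dictionary lookup.

```python
def find_metadata_location(name, tiers):
    """Search index tiers in order (fastest first); return (tier, location) or None.

    `tiers` is a list of (tier_name, index) pairs, where each index maps an
    object name to the location of its metadata. None means the object is
    not stored in the system, which is only known after the last tier.
    """
    for tier_name, tier_index in tiers:
        location = tier_index.get(name)
        if location is not None:
            return tier_name, location
    return None


# Hypothetical three-tier index: level 1 on SSD, level 2 on disk, level 3 on tape.
tiers = [
    ("ssd", {"a": ("D1", 5)}),
    ("disk", {"b": ("D1", 70)}),
    ("tape", {"c": ("D1", 300)}),
]
```

Searching the fast tiers first means that only lookups for the oldest data, or for objects that do not exist, pay the cost of the tape-resident index.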
As shown in FIG. 5, before step C10 the method further comprises: step A10, searching in a warm object metadata base; step A20, judging whether warm object metadata is found; step A30, if found, obtaining the warm object data according to the warm object metadata and returning; step B10, otherwise searching in a cold object metadata base; step B20, judging whether cold object metadata is found; and step B30, if found, obtaining the cold object data according to the cold object metadata and returning. The storage device of the cold object metadata base is an SSD storage device or a disk device, the storage device of the cold object data is a tape device or a disk device, the storage device of the warm object metadata base is an SSD storage device, and the storage device of the warm object data is an SSD storage device.
Optionally, in step C, after the extremely cold object metadata is obtained, the extremely cold object metadata is moved to the cold object metadata base; and in step D20, after the cold object metadata is obtained, the cold object metadata is moved to the SSD storage device.
For a system that also holds warm or cold objects, data must be looked up in the order warm objects, cold objects, extremely cold objects, because the temperature state of the data changes over time.
Referring to fig. 6, an extremely cold object storage management method comprises: periodically checking an object metadata base and obtaining the temperature attribute and the latest access time from the object metadata; if the temperature attribute is cold data and the time since the latest access exceeds a set temperature threshold TC, changing the temperature attribute in the object metadata to extremely cold data, migrating the object metadata to a tape storage device, recording the index position of the object metadata in an extremely cold object index base, and deleting the object metadata from the object metadata base.
The object metadata is queried periodically, with the period chosen according to the characteristics of the service, for example once a day or once a month. The data in a large object data system is checked, its most recent access time is read, and its heat is judged; the object data is moved first, and then the object metadata is moved.
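One pass of the periodic check could be sketched as follows. The `ObjectMetadata` fields, the use of seconds for TC, and the list and dictionaries standing in for the tape device, the object metadata base and the extremely cold object index base are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ObjectMetadata:
    name: str
    temperature: str    # "warm", "cold" or "extremely cold"
    last_access: float  # seconds since the epoch (assumed unit)
    data_location: str  # where the object data is stored


def demote_cold_objects(metadata_base, tape_metadata, index, now, tc_seconds):
    """Demote cold objects idle for longer than TC; return the names moved."""
    moved = []
    for name, meta in list(metadata_base.items()):
        if meta.temperature == "cold" and now - meta.last_access > tc_seconds:
            meta.temperature = "extremely cold"
            tape_metadata.append(meta)            # migrate metadata to tape
            index[name] = len(tape_metadata) - 1  # record its position in the index
            del metadata_base[name]               # delete from the metadata base
            moved.append(name)
    return moved
```

Note that only the metadata record moves in this step; the object data is assumed to be on tape already from the earlier warm-to-cold transition.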
Before the object metadata is migrated, the quantity of extremely cold metadata to be migrated is counted, and migration starts only if the quantity is larger than a migration threshold, the threshold being the minimum amount of data for a single write of the tape device.
A tape device moves in a single direction, so writing in batches is preferable and can greatly improve device utilization. When data to be moved is identified, a to-be-moved label can be recorded in its object metadata, and the data is moved in one pass once the pending amount reaches the migration size.
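The label-then-batch behaviour can be sketched with a small accumulator. The class name `MigrationBatcher`, the measurement of quantity in bytes, and the `write_batch` callback standing in for a single tape write are assumptions; the text specifies only that migration starts when the pending quantity is larger than the tape device's minimum single-write amount.

```python
class MigrationBatcher:
    """Collect items marked for migration and write them to tape in one batch."""

    def __init__(self, min_write_bytes, write_batch):
        self.min_write_bytes = min_write_bytes  # migration threshold (assumed in bytes)
        self.write_batch = write_batch          # callback performing one tape write
        self.pending = []
        self.pending_bytes = 0

    def mark(self, name, size):
        """Label an item as to-be-moved; flush once the threshold is exceeded."""
        self.pending.append(name)
        self.pending_bytes += size
        if self.pending_bytes > self.min_write_bytes:
            self.flush()

    def flush(self):
        """Write all pending items in a single pass and reset the counters."""
        if self.pending:
            self.write_batch(list(self.pending))
            self.pending.clear()
            self.pending_bytes = 0
```

In this sketch, items that never accumulate past the threshold simply stay pending until a later check adds more, which matches the idea of avoiding small, inefficient tape writes.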
As shown in FIG. 7, the method further comprises: if the temperature attribute is warm data and the time since the latest access exceeds a set temperature threshold TH, changing the temperature attribute in the object metadata to cold data, migrating the object data pointed to by the object metadata to a tape storage device, recording the post-migration tape position information in the object metadata, deleting the original object data and releasing its storage location. Before the object data is migrated, the quantity of object data to be migrated is counted, and migration starts only if the quantity is larger than a migration threshold, the threshold being the minimum amount of data for a single write of the tape device.
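The warm-to-cold transition for a single object can be sketched as below; batching is omitted for brevity. The dictionary-based metadata record, the `warm_store` dictionary, the list standing in for the tape device and the use of seconds for TH are assumptions for illustration.

```python
def cool_warm_object(meta, warm_store, tape, now, th_seconds):
    """Move one warm object's data to tape if it has been idle longer than TH.

    `meta` is a dict with keys 'temperature', 'last_access' and 'location'.
    The metadata record itself stays where it is; only the data moves.
    Returns True if the object was cooled.
    """
    if meta["temperature"] != "warm" or now - meta["last_access"] <= th_seconds:
        return False
    old_location = meta["location"]
    tape.append(warm_store[old_location])       # migrate the object data to tape
    meta["location"] = ("tape", len(tape) - 1)  # record the new tape position
    meta["temperature"] = "cold"                # change the temperature attribute
    del warm_store[old_location]                # delete original data, free the slot
    return True
```

The original data is deleted only after the tape copy and the updated position are in place, so a failure part-way through leaves the warm copy intact in this sketch.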
While the invention has been illustrated and described in terms of a preferred embodiment and several alternatives, the invention is not limited by the specific description in this specification. Other alternative or equivalent components may also be used in the practice of the present invention.